Dealing with Dates in Your Data - dummies

By Nikhil Abraham

Dates can present problems in data. For one thing, many applications store dates as numeric values. However, the precise value of the number depends on the representation for the particular platform and could even depend on the users’ preferences. For example, Excel users can choose a date system that starts counting in 1900 or in 1904. The numeric encoding for each is different, so the same date can have two numeric values depending on the starting date.
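To make the 1900 versus 1904 difference concrete, here is a minimal sketch that converts a whole-number Excel date serial into a Python date under either system. The helper name excel_serial_to_date and its date1904 flag are hypothetical, made up for this illustration, and the sketch ignores fractional (time-of-day) serials.

```python
import datetime as dt


def excel_serial_to_date(serial, date1904=False):
    """Convert a whole-number Excel date serial to a datetime.date.

    Hypothetical helper, for illustration only.
    """
    if date1904:
        # 1904 date system: serial 0 is January 1, 1904.
        epoch = dt.date(1904, 1, 1)
    else:
        # 1900 date system: serial 1 is January 1, 1900, but Excel also
        # counts a nonexistent February 29, 1900. Using December 30, 1899
        # as the epoch is therefore correct only for serials of 61
        # (March 1, 1900) and later.
        epoch = dt.date(1899, 12, 30)
    return epoch + dt.timedelta(days=serial)


# The same calendar date has a different serial in each system.
print(excel_serial_to_date(42751))                 # 2017-01-16
print(excel_serial_to_date(41289, date1904=True))  # 2017-01-16
print(42751 - 41289)                               # 1462 days apart
```

The two serials differ by 1,462 days, which is why reading a workbook with the wrong date system shifts every date by about four years.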

In addition to problems of representation, you also need to consider how to work with time values. Creating a time value format that represents a value the user can understand is hard. For example, you might need to use Greenwich Mean Time (GMT) in some situations but a local time zone in others. Transforming between various times is also problematic.

Formatting date and time values

Obtaining the correct date and time representation can make performing analysis a lot easier. For example, you often have to change the representation to obtain a correct sorting of values. Python provides two common methods of formatting date and time.

The first technique is to call str(), which simply turns a datetime value into a string in a fixed default layout. The second technique, the strftime() method, requires more work because you must define how you want the datetime value to appear after conversion. When using strftime(), you must provide a string containing special directives, such as %d for the day of the month or %Y for the four-digit year, that define the formatting.

Now that you have some idea of how time and date conversions work, it’s time to see an example. The following example creates a datetime object and then converts it into a string using two different approaches:

import datetime as dt

now = dt.datetime.now()

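# Default string form, including microseconds
print(str(now))

# Custom form: abbreviated weekday, day, full month name, year
print(now.strftime('%a, %d %B %Y'))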

In this case, you can see that using str() is the easiest approach. However, as shown by the following output, it may not provide the output you need. Using strftime() is far more flexible because you control every part of the result.

2017-01-16 17:26:45.986000

Mon, 16 January 2017
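The earlier point about sorting can be sketched as follows. The three sample dates are made up for illustration. The sketch formats them two ways and sorts the resulting strings, showing that a day-first layout sorts alphabetically while a zero-padded year-month-day layout sorts in date order. (The month names produced by %B depend on the locale; English is assumed here.)

```python
import datetime as dt

# Hypothetical sample dates, chosen only for illustration.
dates = [
    dt.datetime(2017, 1, 16),
    dt.datetime(2016, 12, 25),
    dt.datetime(2017, 3, 2),
]

# Day-first strings sort by their leading digits, not chronologically.
day_first = sorted(d.strftime('%d %B %Y') for d in dates)

# Zero-padded year-month-day strings sort in true date order.
year_first = sorted(d.strftime('%Y-%m-%d') for d in dates)

print(day_first)
print(year_first)
```

In the first list, the March 2017 date comes first because its string starts with 02. In the second list, the December 2016 date correctly comes first.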

Using the right time transformation

Time zones and differences in local time can cause all sorts of problems when performing analysis. For that matter, some types of calculations simply require a time shift in order to get the right results. No matter what the reason, you may need to transform one time into another time at some point. The following examples show some techniques you can employ to perform the task.

import datetime as dt

now = dt.datetime.now()

timevalue = now + dt.timedelta(hours=2)

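# Current local time
print(now.strftime('%H:%M:%S'))

# The same time shifted two hours ahead
print(timevalue.strftime('%H:%M:%S'))

# Subtracting two datetime values yields a timedelta
print(timevalue - now)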

Calling timedelta() makes the time transformation straightforward because it creates an object that represents a span of time. You can use any of these parameter names with timedelta() to change a time and date value:

  • days
  • seconds
  • microseconds
  • milliseconds
  • minutes
  • hours
  • weeks
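As a brief sketch of how these parameters combine, the following code builds one timedelta from several of them and adds it to a fixed starting value. The starting date and the amounts are arbitrary choices for illustration.

```python
import datetime as dt

# A fixed starting point so the results are repeatable.
start = dt.datetime(2017, 1, 16, 17, 44, 40)

# Several parameters can be combined in a single call.
delta = dt.timedelta(weeks=1, days=2, hours=3, minutes=30)

print(delta)                  # 9 days, 3:30:00
print(start + delta)          # 2017-01-25 21:14:40
print(delta.total_seconds())  # 790200.0
```

Whatever parameters you supply, Python stores the result as days, seconds, and microseconds, which is why one week plus two days prints as 9 days.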

You can also manipulate time by performing addition or subtraction on time values. You can even subtract two time values to determine the difference between them. Here’s the output from this example:

17:44:40

19:44:40

2:00:00

Notice that now is the local time, timevalue is two hours later (the clock reading you would see in a time zone two hours ahead), and there is a two-hour difference between the two times. Neither value carries any time zone information, so the shift changes the clock reading only. You can perform all sorts of transformations using these techniques to ensure that your analysis always shows precisely the time-oriented values you need.
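When you need values that actually know their time zone, one possible approach is to attach a tzinfo object. The following minimal sketch assumes Python 3, where the standard library provides datetime.timezone, and uses a fixed two-hour offset chosen only to mirror the example above. Real time zones with daylight saving rules need more than a fixed offset.

```python
import datetime as dt

# A fixed moment expressed in UTC (the modern successor to GMT).
utc_time = dt.datetime(2017, 1, 16, 17, 44, 40, tzinfo=dt.timezone.utc)

# A fixed offset two hours ahead of UTC, assumed for illustration.
plus_two = dt.timezone(dt.timedelta(hours=2))

# astimezone() changes the clock reading but not the moment in time.
shifted = utc_time.astimezone(plus_two)

print(utc_time.strftime('%H:%M:%S %z'))  # 17:44:40 +0000
print(shifted.strftime('%H:%M:%S %z'))   # 19:44:40 +0200
print(shifted - utc_time)                # 0:00:00
```

Unlike adding a timedelta, this conversion keeps both values pointing at the same instant, so subtracting them gives a difference of zero.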