1 Overview

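In this tutorial, we’ll cover how to work with dates and times in R using the lubridate package. lubridate is installed along with the tidyverse, but with the tidyverse version listed below it is not attached by library(tidyverse), so it needs to be loaded on its own with library(lubridate).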

1.1 Packages (and versions) used in this document

## [1] "R version 4.0.2 (2020-06-22)"
##    Package Version
##  tidyverse   1.3.0
##  lubridate   1.7.9
##      dplyr   1.0.1

2 Importing/Manipulating Date and Time Data

Before we dive right into dates and times in R, let’s look at a few handy functions that will help you get an idea of how date and time values should look.

today()
## [1] "2020-08-11"
now()
## [1] "2020-08-11 16:02:50 CDT"
Sys.timezone()
## [1] "America/Chicago"

As you can see from the results of the previous code, R returns the date in the “year-month-day” format and the time in “hour-minute-second” in military (24 hour) format with the timezone.

2.1 parse_date_time()

One of the most common issues that you may run into when importing data that contains date or time information is that these values will be read in as character vectors or parsed incorrectly. The lubridate package makes this problem easy to solve. It has built-in functions to handle many of the commonly encountered formats for dates and times. However, we’ll start out with the more general function to help us get accustomed to how the functions in the lubridate package work and the arguments that they take.

parse_date_time(x = , orders = , tz = )

As you can see, parse_date_time() has three main arguments (there are other arguments detailed in the R documentation): x = is where we’ll specify the date(s) and time(s) that we want to work with, orders = is how we’ll specify the formatting of the info in x = (we’ll get into this in more depth shortly), and tz = is the timezone for the info in x =.

Let’s say that we want to pass in the date of the signing of the Declaration of Independence. We can easily do that.

Dec_of_Ind <- "7-4-1776"
Dec_of_Ind
## [1] "7-4-1776"
class(Dec_of_Ind)
## [1] "character"

We now have a character string of the date that we want to work with. Now we need to parse it with the parse_date_time() function. Note: For now we’ll ignore tz =.

Dec_of_Ind <- parse_date_time(x = Dec_of_Ind, orders = "mdy")
Dec_of_Ind
## [1] "1776-07-04 UTC"
class(Dec_of_Ind)
## [1] "POSIXct" "POSIXt"

As you can see, this took our date and rearranged it. That is because the international standard for dates (ISO 8601) goes from the biggest unit down to the smallest - Year, Month, Day, Hour, Minute, Second.

Using parse_date_time(), we can also read in dates that are in different formats. For example, we can look at the dates of the first fully functional digital computer and the release of the Apple II.

ENIAC <- "Feb 15th 1946"
AppleII <- "1977-10-6"

We can parse these dates together in the same line of code, but we need to specify both orders for the data.

parse_date_time(x = c(ENIAC, AppleII), orders = c("mdy", "ydm"))
## [1] "1946-02-15 UTC" "1977-06-10 UTC"

As you can see, these functions are able to read in both numeric months and month names. Month names are matched against your system’s locale, so the examples here assume an English locale.

2.1.1 Orders

You may have already figured out what the code in the orders = argument is referencing:

  • y = year
  • m = month
  • d = day

but there are other options as well:

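  • Y = year with century (four digits, e.g., 1969); in parse_date_time(), lowercase y matches both two-digit and four-digit years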
  • H = hours
  • M = minutes
  • S = seconds
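
The order letters can be combined to match whatever fields a string actually contains. As a minimal sketch, assume a hypothetical version of the launch time that was recorded without seconds (the string below is made up for illustration, and an English locale is assumed for the month name):

```r
library(lubridate)

# Hypothetical input: the launch time recorded without a seconds field.
launch_no_seconds <- "July 16, 1969 13:32"

# "mdyHM" = month, day, year, hours, minutes (no S, since there are no seconds).
launch <- parse_date_time(x = launch_no_seconds, orders = "mdyHM")
format(launch, "%Y-%m-%d %H:%M:%S")
## [1] "1969-07-16 13:32:00"
```

The missing seconds are simply filled in as zero.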

Let’s take a look at some dates associated with the first moon landing.

Apollo_launch <- "July 16, 1969 13:32:00"
Apollo_moon_land <- "July 20, 1969 20:17:00"
Apollo_moon_walk <- "July 21, 1969 02:56:15"
Apollo_earth_land <- "July 24, 1969 16:50:35"

Apollo <- c(Apollo_launch, Apollo_moon_land, Apollo_moon_walk, Apollo_earth_land)

2.1.2 Time zones

We’ll now parse these objects into R as date-time objects and go ahead and specify the timezone that we want this data to be in:

Apollo11 <- parse_date_time(x = Apollo, orders = "mdyHMS", tz = "UTC")
Apollo11
## [1] "1969-07-16 13:32:00 UTC" "1969-07-20 20:17:00 UTC"
## [3] "1969-07-21 02:56:15 UTC" "1969-07-24 16:50:35 UTC"

Now you’ve probably noticed that “UTC” is the default timezone when parsing date-time data. UTC stands for Coordinated Universal Time.

But what if we wanted to put the Apollo times into my local timezone?

parse_date_time(x = Apollo, orders = "mdyHMS", tz = "America/Chicago")
## [1] "1969-07-16 13:32:00 CDT" "1969-07-20 20:17:00 CDT"
## [3] "1969-07-21 02:56:15 CDT" "1969-07-24 16:50:35 CDT"

As you can see, that changed the timezone label, but it didn’t adjust the clock times. The values are now treated as Chicago times, which is wrong here because they were recorded in UTC. The with_tz() function will take whatever time (or vector of times) that you give it and convert it to the timezone that you specify, keeping the underlying moment in time the same.

with_tz(time = Apollo11, tzone = "America/Chicago")
## [1] "1969-07-16 08:32:00 CDT" "1969-07-20 15:17:00 CDT"
## [3] "1969-07-20 21:56:15 CDT" "1969-07-24 11:50:35 CDT"

If, however, you only want to change the timezone label without adjusting the clock times, you can use the force_tz() function.

force_tz(time = Apollo11, tzone = "America/Chicago")
## [1] "1969-07-16 13:32:00 CDT" "1969-07-20 20:17:00 CDT"
## [3] "1969-07-21 02:56:15 CDT" "1969-07-24 16:50:35 CDT"
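
One way to see the difference between the two functions is to compare each result with the original value. The sketch below uses only the launch time, and the object names are chosen here for illustration:

```r
library(lubridate)

launch_utc <- ymd_hms("1969-07-16 13:32:00", tz = "UTC")

# with_tz(): same instant, displayed on a Chicago clock.
launch_shown <- with_tz(time = launch_utc, tzone = "America/Chicago")
# force_tz(): same clock reading, relabelled as Chicago time (a different instant).
launch_forced <- force_tz(time = launch_utc, tzone = "America/Chicago")

launch_shown == launch_utc
## [1] TRUE
launch_forced == launch_utc
## [1] FALSE
difftime(time1 = launch_forced, time2 = launch_utc, units = "hours")
## Time difference of 5 hours
```

In other words, with_tz() only changes how a moment is displayed, while force_tz() moves it to a different moment (five hours later in this case).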

3 Specific Parsing Functions

Lubridate also has many built in functions for the common date-time formats that you will encounter. These include dates:

  • ymd()
  • ydm()
  • myd()
  • mdy()
  • dmy()
  • dym()

times:

  • hms()
  • hm()
  • ms()

and date-times:

  • ymd_hms()
  • ymd_hm()
  • ymd_h()
  • etc.

Now, back to our Apollo11 data. Instead of using parse_date_time(), we can use the appropriate specific function:

mdy_hms(Apollo, tz = "UTC")
## [1] "1969-07-16 13:32:00 UTC" "1969-07-20 20:17:00 UTC"
## [3] "1969-07-21 02:56:15 UTC" "1969-07-24 16:50:35 UTC"

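If you use a function whose order doesn’t fit the data, lubridate will usually return NA values along with a warning that some strings failed to parse, rather than silently producing a wrong date. The exception is ambiguous input: a string such as “01-02-2003” is valid under both mdy() and dmy(), so it’s still worth spot-checking a few parsed values.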
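
It can still be worth checking what a parsing function does with input that doesn’t clearly match it. The strings below are made-up examples rather than data from this tutorial, and quiet = TRUE is used only to keep the demonstration free of the parsing warning:

```r
library(lubridate)

# An ambiguous string (hypothetical example) parses under both functions, silently.
ambiguous <- "01-02-2003"
mdy(ambiguous)
## [1] "2003-01-02"
dmy(ambiguous)
## [1] "2003-02-01"

# A string that cannot fit the order (there is no month 13) comes back as NA.
# quiet = TRUE suppresses the parsing warning for this demonstration.
impossible <- mdy("13-02-2003", quiet = TRUE)
is.na(impossible)
## [1] TRUE
```

Counting the NA values after parsing, for example with sum(is.na(...)), is a quick way to spot strings that did not fit the chosen order.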

4 Retrieving elements

4.1 Day

Let’s assume that we know that the Apollo 11 mission occurred in July of 1969, and we only want to know information about the days on which it occurred. That may seem simplistic, but we have a good number of options for how we may want to look at the days.

First, the raw day information in the data.

day(Apollo11)
## [1] 16 20 21 24

Information about which day of the week.

wday(Apollo11)
## [1] 4 1 2 5

What day is coded as 1? Sunday or Monday? We can add the label = TRUE argument to get our answer.

wday(Apollo11, label = TRUE)
## [1] Wed Sun Mon Thu
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
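
If you would rather number the week from Monday, wday() has a week_start = argument. A minimal sketch using only the moon landing time, assuming the default lubridate.week.start option has not been changed:

```r
library(lubridate)

moon_landing <- ymd_hms("1969-07-20 20:17:00", tz = "UTC")  # a Sunday

# Default numbering: the week starts on Sunday, so Sunday is 1.
wday(moon_landing)
## [1] 1
# Monday-first numbering: the week starts on Monday, so Sunday is 7.
wday(moon_landing, week_start = 1)
## [1] 7
```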

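The day relative to the quarter of the year (i.e., Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec). Because July 1st is the first day of the third quarter, the result here is the same as the day of the month.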

qday(Apollo11)
## [1] 16 20 21 24

And the day relative to the year.

yday(Apollo11)
## [1] 197 201 202 205

We’ve looked at our options for days pretty extensively, but we have functions for other elements as well.

4.2 Year

We can get the raw year information.

year(Apollo11)
## [1] 1969 1969 1969 1969

Whether the years being examined are leap years.

leap_year(Apollo11)
## [1] FALSE FALSE FALSE FALSE

Which quarter of the year the dates are from.

quarter(Apollo11)
## [1] 3 3 3 3

And which half of the year the dates are from.

semester(Apollo11)
## [1] 2 2 2 2

4.3 Month

We can also ask which months are included in the dates. Again, we can specify whether we want them to have a label.

month(Apollo11)
## [1] 7 7 7 7
month(Apollo11, label = TRUE)
## [1] Jul Jul Jul Jul
## 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec

4.4 Time

We also have a variety of options for retrieving information pertaining to time data, such as which hour of the day the data is from.

hour(Apollo11)
## [1] 13 20  2 16

Which minute.

minute(Apollo11)
## [1] 32 17 56 50

Which second.

second(Apollo11)
## [1]  0  0 15 35

We can also ask for the time zone used.

tz(Apollo11) # tz = timezone
## [1] "UTC"

Whether the times occurred in the first half of the day.

am(Apollo11)
## [1] FALSE FALSE  TRUE FALSE

Or the second half.

pm(Apollo11)
## [1]  TRUE  TRUE FALSE  TRUE

And whether daylight saving time was in effect on each date. Apollo11 is stored in UTC, which never observes daylight saving time, so every value is FALSE.

dst(Apollo11) # if daylight savings was in effect
## [1] FALSE FALSE FALSE FALSE
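
The result of dst() depends on the timezone attached to the value, not just on the date. As a small sketch, here is the launch time checked in UTC and again after converting it to Chicago time:

```r
library(lubridate)

launch_utc <- ymd_hms("1969-07-16 13:32:00", tz = "UTC")

# UTC never observes daylight saving time.
dst(launch_utc)
## [1] FALSE
# The same instant viewed in Chicago falls in Central Daylight Time.
dst(with_tz(time = launch_utc, tzone = "America/Chicago"))
## [1] TRUE
```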

5 Working with Date-Time Data

5.1 Changing elements

The above functions are useful for examining our data, but they can also be used to change our data. For example, suppose we realized that we had entered our data incorrectly and that the Apollo11 mission actually happened in 1968 and not 1969.

year(Apollo11) <- 1968
Apollo11
## [1] "1968-07-16 13:32:00 UTC" "1968-07-20 20:17:00 UTC"
## [3] "1968-07-21 02:56:15 UTC" "1968-07-24 16:50:35 UTC"

But we know that isn’t true, so let’s put it back.

year(Apollo11) <- 1969
Apollo11
## [1] "1969-07-16 13:32:00 UTC" "1969-07-20 20:17:00 UTC"
## [3] "1969-07-21 02:56:15 UTC" "1969-07-24 16:50:35 UTC"
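
The same assignment pattern works for the other accessors, and update() can change several components in one call. This is a minimal sketch on a single value, with object names chosen here for illustration:

```r
library(lubridate)

moon_walk <- ymd_hms("1969-07-21 02:56:15", tz = "UTC")

# Setter form: change one component at a time on a copy.
on_the_hour <- moon_walk
minute(on_the_hour) <- 0
second(on_the_hour) <- 0

# update() returns a modified copy and sets both components at once.
on_the_hour_too <- update(moon_walk, minute = 0, second = 0)

format(on_the_hour, "%Y-%m-%d %H:%M:%S")
## [1] "1969-07-21 02:00:00"
on_the_hour == on_the_hour_too
## [1] TRUE
```

Because update() returns a new value instead of modifying in place, it fits naturally inside mutate() or a pipe.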

5.2 Rounding Elements

You might also encounter situations in which you wish to round elements of your date-time data. Whether you wish to round seconds, minutes, or days, lubridate has you covered.

5.2.1 round_date()

This function will round to the nearest specified element. For example:

round_date(x = Apollo11, unit = "minute")
## [1] "1969-07-16 13:32:00 UTC" "1969-07-20 20:17:00 UTC"
## [3] "1969-07-21 02:56:00 UTC" "1969-07-24 16:51:00 UTC"

This will round our values to the nearest minute - notice that the seconds are all equal to zero now.

5.2.2 ceiling_date()

This function will round up to the specified element.

ceiling_date(x = Apollo11, unit = "minute")
## [1] "1969-07-16 13:32:00 UTC" "1969-07-20 20:17:00 UTC"
## [3] "1969-07-21 02:57:00 UTC" "1969-07-24 16:51:00 UTC"

This code rounded our times up to the next minute - notice that the third time changed from round_date(). The first two times were already on a minute boundary, so they are unchanged.

5.2.3 floor_date()

This function will round down to the specified element.

floor_date(x = Apollo11, unit = "minute")
## [1] "1969-07-16 13:32:00 UTC" "1969-07-20 20:17:00 UTC"
## [3] "1969-07-21 02:56:00 UTC" "1969-07-24 16:50:00 UTC"

This code rounded all of our seconds down to the previous minute - notice that our fourth time changed from round_date().

We don’t just have to use minutes, though. We can also use:

  • “second”
  • “hour”
  • “day”
  • “week”
  • “month”
  • “quarter”
  • “halfyear”
  • “year”

We can also specify multiples of any units - which is pretty nifty. For example:

round_date(x = Apollo11, unit = "6 hours")
## [1] "1969-07-16 12:00:00 UTC" "1969-07-20 18:00:00 UTC"
## [3] "1969-07-21 00:00:00 UTC" "1969-07-24 18:00:00 UTC"

Notice that all of our times are now in multiples of six hours.
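
The larger units work the same way, which can be handy for grouping observations by day or month. A short sketch using only the moon landing time:

```r
library(lubridate)

moon_landing <- ymd_hms("1969-07-20 20:17:00", tz = "UTC")

# Start of the day and start of the month that contain the landing.
floor_date(x = moon_landing, unit = "day")
## [1] "1969-07-20 UTC"
floor_date(x = moon_landing, unit = "month")
## [1] "1969-07-01 UTC"
# Start of the following month.
ceiling_date(x = moon_landing, unit = "month")
## [1] "1969-08-01 UTC"
```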

6 Arithmetic and Interval Operations with Date-Time Data

Up to this point, you may have been asking yourself what the point of worrying about our date-time data is. Isn’t it just for keeping track of when an observation was recorded? It is helpful for that, but we can also perform some handy functions such as arithmetic and interval operations.

6.1 Arithmetic with date-time data

For example, what if we wanted to know how long the Apollo11 mission lasted from launch to return?

difftime(time1 = Apollo11[4], time2 = Apollo11[1])
## Time difference of 8.137905 days

We can also change the unit that the time is displayed in:

difftime(time1 = Apollo11[4], time2 = Apollo11[1],
         units = "mins")
## Time difference of 11718.58 mins

How long has it been since the Apollo11 mission ended?

difftime(time1 = now(), time2 = Apollo11[4])
## Time difference of 18646.18 days
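
If you need the result as a plain number (for example, to store it in a column), as.numeric() accepts a units = argument for difftime objects. A minimal sketch using the launch and return times, with object names chosen here for illustration:

```r
library(lubridate)

launch <- ymd_hms("1969-07-16 13:32:00", tz = "UTC")
splashdown <- ymd_hms("1969-07-24 16:50:35", tz = "UTC")

mission <- difftime(time1 = splashdown, time2 = launch)

# as.numeric() drops the "Time difference of ..." wrapper; units = picks the scale.
mission_hours <- as.numeric(mission, units = "hours")
mission_hours
## [1] 195.3097
```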

6.2 Interval operations with date-time data

Intervals of time have a specific start and end time. We have a few ways to create intervals. Let’s try them out on some dates from the American Revolution.

head(revolution, n = 1)
## # A tibble: 1 x 2
##   event           date         
##   <chr>           <chr>        
## 1 Boston Massacre March 5, 1770
tail(revolution, n = 1)
## # A tibble: 1 x 2
##   event           date             
##   <chr>           <chr>            
## 1 Treaty of Paris September 3, 1783

First, we need to make sure that the dates are formatted properly. Here we’re just using the tidyverse pipe (%>%) method of manipulating our data.

revolution <- revolution %>% 
  mutate(date = mdy(date))

Our first way of creating an interval is to use the %--% operator between the start and end elements of our desired interval.

int_amrev <- revolution$date[1] %--% revolution$date[18]
int_amrev
## [1] 1770-03-05 UTC--1783-09-03 UTC

Alternatively, we can use the interval() function with the same two elements.

int_amrev <- interval(revolution$date[1], revolution$date[18])
int_amrev
## [1] 1770-03-05 UTC--1783-09-03 UTC

Now we have an interval that ranges from March 5, 1770 to September 3, 1783. What are some things that we can do with it? One, we can see how long the interval is. We have a few ways of determining this information. First, we can use int_length(), which will return the length of this interval in seconds. If you wish to get the interval length in a different unit, you can divide the result by the number of seconds in that unit (for example, 86400 for days). Or you can ask for the interval as a period (as.period()) or a duration (as.duration()) - we’ll talk about the differences between these two things later on.

int_length(int_amrev) # this gives seconds
## [1] 425952000
as.period(int_amrev)
## [1] "13y 5m 29d 0H 0M 0S"
as.duration(int_amrev)
## [1] "425952000s (~13.5 years)"
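
As a sketch of the unit conversion, here is the same interval expressed in days. The interval is rebuilt from its two endpoints so that the example runs on its own, and time_length() is shown as an alternative to dividing by hand:

```r
library(lubridate)

int_amrev <- interval(ymd("1770-03-05"), ymd("1783-09-03"))

# Seconds divided by the number of seconds in a day.
int_length(int_amrev) / (60 * 60 * 24)
## [1] 4930
# time_length() does the same conversion by unit name.
time_length(int_amrev, unit = "day")
## [1] 4930
```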

We can also use the %within% operator to check whether other important events, such as the Declaration of Independence or Ben Franklin’s kite experiment, happened during this time frame.

Dec_of_Ind %within% int_amrev
## [1] TRUE
kite <- ymd("1752-6-1")
kite %within% int_amrev
## [1] FALSE
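
%within% is vectorized, so several dates can be checked in one call. The sketch below rebuilds the interval and the two dates used above so that it runs on its own:

```r
library(lubridate)

int_amrev <- interval(ymd("1770-03-05"), ymd("1783-09-03"))

# Declaration of Independence, then the kite experiment date used above.
events <- ymd(c("1776-07-04", "1752-06-01"))
events %within% int_amrev
## [1]  TRUE FALSE
```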

We can also check to see if two different intervals overlap each other with the int_overlaps() function. For example, the American Revolutionary War and the French Revolution (we’ll just create this in the code below).

int_overlaps(int1 = int_amrev, int2 = 
               interval(ymd("1789-5-5"), ymd("1799-11-9")))
## [1] FALSE

Nope. They don’t overlap.
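
For contrast, here is a case that does overlap. The second interval is an assumption added for illustration: it runs from the battles of Lexington and Concord (April 19, 1775) to the Treaty of Paris.

```r
library(lubridate)

int_amrev <- interval(ymd("1770-03-05"), ymd("1783-09-03"))
# Illustrative interval: the fighting itself, from Lexington and Concord
# to the Treaty of Paris.
int_war <- interval(ymd("1775-04-19"), ymd("1783-09-03"))

int_overlaps(int1 = int_amrev, int2 = int_war)
## [1] TRUE
```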

7 Important Additional Considerations

7.1 Daylight savings and leap years

Up to this point, you may or may not have considered the impact that daylight saving time (DST) and leap years will have on the work that you do with dates and times. For example, in 2019, DST in the United States ended on November 3rd. Let’s use the skills we’ve developed up to this point to do a little experiment - is November 4th the same length as November 3rd?

First, let’s create our dates that we’ll need.

nov_4 <- ymd_hms("2019-11-4 23:59:59", tz = "America/Chicago")
nov_3 <- ymd_hms("2019-11-3 23:59:59", tz = "America/Chicago")
nov_2 <- ymd_hms("2019-11-2 23:59:59", tz = "America/Chicago")

Next, let’s run our test. We’ll use difftime() to figure out the true length of November 4th and November 3rd - in that order.

difftime(time1 = nov_4, time2 = nov_3, units = "hours")
## Time difference of 24 hours
difftime(time1 = nov_3, time2 = nov_2, units = "hours")
## Time difference of 25 hours

Well that’s odd, isn’t it? Not if you’re keeping daylight savings time in mind, but some people may not keep these considerations in mind when analyzing date-time data. Thankfully, lubridate has built-in functionality to deal with these sorts of issues, but you need to make sure that you’re using the correct functions. This issue will lead us into our next topics - periods and durations.
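
As a brief preview of that distinction, the sketch below adds “one day” to the November 2nd value in two ways. ddays() creates a duration (a fixed number of seconds) and days() creates a period (a calendar unit):

```r
library(lubridate)

nov_2 <- ymd_hms("2019-11-2 23:59:59", tz = "America/Chicago")

# ddays(1) is a duration: exactly 86400 seconds later.
nov_2 + ddays(1)
## [1] "2019-11-03 22:59:59 CST"
# days(1) is a period: the same clock time on the next calendar day.
nov_2 + days(1)
## [1] "2019-11-03 23:59:59 CST"
```

The two results differ by the hour that was gained when DST ended.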

7.4 Months

As you know, not all months are created equal - at least where length is concerned. This can cause some rather unexpected issues if you aren’t careful with your programming. Let’s demonstrate this with the following code.

ymd("2019-jan-31") + months(1)
## [1] NA

What happened?
Well, since we created a period of 1 month and tried to add that to January 31st, R tried to make it February 31st; however, that doesn’t exist, so it returned an NA value. To solve circumstances like this, we have a few special operators in the lubridate package: %m+% and %m-%. Let’s try our earlier demonstration again using one of our new operators.

ymd("2019-jan-31") %m+% months(1)
## [1] "2019-02-28"

Wonderful! What would happen for a few other days?

ymd("2019-march-30") %m-% months(1)
## [1] "2019-02-28"
ymd("2019-jan-29") %m+% months(1)
## [1] "2019-02-28"

As you can see, when the original day doesn’t exist in the target month, these operators roll back to the last day of that month.
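
These operators also accept a vector of months, and the related add_with_rollback() function can roll forward to the first day of the following month instead. A minimal sketch starting from January 31st:

```r
library(lubridate)

jan_31 <- ymd("2019-01-31")

# One call, several offsets: each result rolls back to the last valid day.
jan_31 %m+% months(1:3)
## [1] "2019-02-28" "2019-03-31" "2019-04-30"

# add_with_rollback() can roll forward to the first day of the next month instead.
add_with_rollback(jan_31, months(1), roll_to_first = TRUE)
## [1] "2019-03-01"
```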