library(tidyverse)
library(lubridate)

For a more detailed discussion see Dates and Times Made Easy with lubridate.

At first glance working with dates and times doesn’t seem so complicated, but consider the following questions:

The answer to all these questions is: NO

There are also many regional differences in how date and time are written:

Imagine you need to analyse stock market data recorded at one-second intervals from 1980 to today. You would need to account for all of these details!

Create a date object

To get today's date and the current date and time:

today()
## [1] "2018-11-12"
now()
## [1] "2018-11-12 15:48:17 -03"
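
Both functions use the time zone of your computer by default. As a small sketch (assuming only the tzone argument of today() and now()), you can ask for the date and time in another zone and check which classes are returned. No output is shown for the first two calls because it depends on when the code is run:

```r
library(lubridate)

# Date and time in UTC instead of the local time zone
today(tzone = "UTC")
now(tzone = "UTC")

# today() returns a Date, now() returns a date-time (POSIXct)
class(today())
## [1] "Date"
class(now())
## [1] "POSIXct" "POSIXt"
```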

There are a number of ways to create a specific date object from a string. The function name gives the order of year (y), month (m) and day (d) in the string:

ymd("2018-04-29")
## [1] "2018-04-29"
mdy("April 29th, 2018")
## [1] "2018-04-29"
dmy("29-April-2018")
## [1] "2018-04-29"

An unquoted number also works:

ymd(20180429)
## [1] "2018-04-29"
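
These parsers are vectorised, so they can convert a whole column at once. The following sketch uses a made-up input vector to show what happens when one element cannot be read as a date: that element becomes NA and lubridate issues a warning (the quiet argument, used here, suppresses it).

```r
library(lubridate)

# Hypothetical input: the second string is not a date
raw <- c("2018-04-29", "not a date")

# quiet = TRUE suppresses the "failed to parse" warning
parsed <- ymd(raw, quiet = TRUE)

# Unparseable elements are returned as NA
is.na(parsed)
## [1] FALSE  TRUE
```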

To add time information, append an underscore and the time format (h, hm or hms) to the function name:

ymd_hm("2018-04-29 2:30 PM")
## [1] "2018-04-29 14:30:00 UTC"
dmy_hms("29-April-2018 2:30:45 PM")
## [1] "2018-04-29 14:30:45 UTC"
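
Note that the results above are in UTC, because no time zone was given. As a sketch, the tz argument of the parsing functions attaches a time zone, and with_tz() shows the same instant in a different zone. The zone "America/New_York" is just an illustrative choice:

```r
library(lubridate)

# Parse the string as New York local time
x <- ymd_hm("2018-04-29 2:30 PM", tz = "America/New_York")
x
## [1] "2018-04-29 14:30:00 EDT"
tz(x)
## [1] "America/New_York"

# The same instant, displayed in UTC (EDT is four hours behind UTC)
with_tz(x, "UTC")
## [1] "2018-04-29 18:30:00 UTC"
```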

As an example we will use the data set flights in the package nycflights13. It has airline on-time data for all flights departing NYC in 2013.

library(nycflights13)
flights %>% 
  print(n=4)
## # A tibble: 336,776 x 19
##    year month   day dep_time sched_dep_time dep_delay arr_time
##   <int> <int> <int>    <int>          <int>     <dbl>    <int>
## 1  2013     1     1      517            515         2      830
## 2  2013     1     1      533            529         4      850
## 3  2013     1     1      542            540         2      923
## 4  2013     1     1      544            545        -1     1004
## # ... with 3.368e+05 more rows, and 12 more variables:
## #   sched_arr_time <int>, arr_delay <dbl>, carrier <chr>, flight <int>,
## #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
## #   distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>

Let’s start by calculating the hour and minute of the departure from dep_time, which stores the time as a single number in the form HHMM (so 517 means 5:17). For this we can use %/% for integer division and %% for modulo:

12.34 %% 2
## [1] 0.34
12.34 %/% 2
## [1] 6
2 * (12.34 %/% 2) + 12.34 %% 2
## [1] 12.34
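
Applied to times stored as HHMM, dividing by 100 separates the hour from the minutes. A short sketch with a few example values (517 and 1004 appear in the flights data shown earlier; 2359 is added for illustration):

```r
# Example departure times in HHMM form
dep_time <- c(517, 1004, 2359)

# Integer division by 100 keeps the hour part
hour <- dep_time %/% 100
hour
## [1]  5 10 23

# The remainder after division by 100 is the minute part
minute <- dep_time %% 100
minute
## [1] 17  4 59

# Putting the parts back together recovers the original values
all(hour * 100 + minute == dep_time)
## [1] TRUE
```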

With this we can overwrite the hour and minute columns of flights:

flights %>% 
  mutate(hour=dep_time %/% 100,
         minute=dep_time %% 100) ->
  flights

In this tibble the parts of the date and time information are spread over several columns:

flights %>% 
  select(year, month, day, hour, minute) %>% 
  print(n=4)
## # A tibble: 336,776 x 5
##    year month   day  hour minute
##   <int> <int> <int> <dbl>  <dbl>
## 1  2013     1     1     5     17
## 2  2013     1     1     5     33
## 3  2013     1     1     5     42
## 4  2013     1     1     5     44
## # ... with 3.368e+05 more rows

To combine the different columns into one date/time object we can use the function make_datetime:

# add the departure column; all other columns of flights are kept
flights %>% 
  mutate(departure = 
      make_datetime(year, month, day, hour, minute)) ->
  flights
flights %>% 
  select(departure) %>% 
  print(n=4)
## # A tibble: 336,776 x 1
##   departure          
##   <dttm>             
## 1 2013-01-01 05:17:00
## 2 2013-01-01 05:33:00
## 3 2013-01-01 05:42:00
## 4 2013-01-01 05:44:00
## # ... with 3.368e+05 more rows
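
To see the whole sequence of steps in isolation, here is a minimal sketch on a small made-up tibble (the name toy and its values are illustrative and not part of nycflights13). It assumes only that make_datetime() takes year, month, day, hour and minute in that order and uses UTC unless a tz argument is given:

```r
library(dplyr)
library(lubridate)

# Toy data: three departures on 1 January 2013, times stored as HHMM
toy <- tibble(
  year = 2013L,
  month = 1L,
  day = 1L,
  dep_time = c(517L, 1004L, 2359L)
) %>%
  mutate(
    hour = dep_time %/% 100L,    # hour part of HHMM
    minute = dep_time %% 100L,   # minute part of HHMM
    departure = make_datetime(year, month, day, hour, minute)
  )

toy$departure[1]
## [1] "2013-01-01 05:17:00 UTC"
```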

Time Spans

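lubridate has a number of functions for doing arithmetic with dates. Subtracting two dates gives a time difference, which as.duration converts to an exact number of seconds. For example, my age is: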

today()
## [1] "2018-11-12"
my.age <- today() - ymd(19610602)
as.duration(my.age)
## [1] "1812844800s (~57.45 years)"
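
Because today() changes every day, the result above is only reproducible with a fixed reference date. The sketch below repeats the calculation with the date from the output above and also uses an interval, which is one way to get the age in completed years. The names birth, ref, span and age are illustrative:

```r
library(lubridate)

birth <- ymd("1961-06-02")
ref <- ymd("2018-11-12")   # fixed reference date instead of today()

# Subtracting two dates gives a difftime in days
span <- ref - birth
span
## Time difference of 20982 days

# The same span as an exact number of seconds
as.duration(span)
## [1] "1812844800s (~57.45 years)"

# An interval knows its start and end dates, so it can count whole years
age <- interval(birth, ref)
age %/% years(1)
## [1] 57
```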