library(tidyverse)
library(lubridate)
For a more detailed discussion see Dates and Times Made Easy with lubridate.
At first glance working with dates and times doesn’t seem so complicated, but consider the following questions:
Does every year have 365 days?
Does every day have 24 hours?
Does every minute have 60 seconds?
The answer to all these questions is: NO
In principle every year divisible by 4 is a leap year, unless it is also divisible by 100, in which case it is a leap year only if it is divisible by 400 as well. So 1800 and 1900 were not leap years, but 2000 was.
In countries that observe Summer Time (daylight saving time), one day each year has 23 hours and another has 25.
Even the leap-year rule is not enough to keep clock time aligned with the earth, whose rotation is slightly irregular, so every now and then a minute has 61 seconds. The extra second is called a leap second. Since this system of correction was introduced in 1972, 27 leap seconds have been inserted, the most recent on December 31, 2016 at 23:59:60 UTC.
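The first two points can be checked directly. The sketch below compares a hand-written version of the leap-year rule with lubridate's leap_year(), and then measures the length of a day as the time from one local midnight to the next. The helper day_length() and the zone America/New_York are assumptions chosen for illustration (clocks there changed on March 11 and November 4, 2018).

```r
library(lubridate)

years <- c(1800, 1900, 2000, 2016)

# The rule spelled out: divisible by 4, but not by 100 unless also by 400
is_leap <- (years %% 4 == 0 & years %% 100 != 0) | years %% 400 == 0
is_leap
leap_year(years)   # lubridate's built-in check, for comparison

# Hypothetical helper: hours between local midnight of `day` and the next one.
# America/New_York is only an example zone.
day_length <- function(day, tz = "America/New_York") {
  start <- ymd(day, tz = tz)
  end   <- ymd(as.character(as.Date(day) + 1), tz = tz)
  as.numeric(difftime(end, start, units = "hours"))
}

day_length("2018-03-11")   # clocks go forward on this day
day_length("2018-11-04")   # clocks go back on this day
day_length("2018-06-15")   # an ordinary day
```

Leap seconds are left out here because R's date-time classes, like most software, do not represent them.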
There are also many regional differences in how dates and times are written. For example, 04/05/2018 means April 5 in the United States but 4 May in most of Europe.
Imagine you need to analyse some stock market data from 1980 to today, recorded at one-second intervals. You would need to account for all of these details!
To get today’s date and the current time:
today()
## [1] "2018-11-12"
now()
## [1] "2018-11-12 15:48:17 -03"
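Both functions use the time zone of your computer unless told otherwise. As a minimal sketch, a different zone can be requested through the tzone argument; the zone names below are just examples:

```r
library(lubridate)

now(tzone = "UTC")            # the current time expressed in UTC
today(tzone = "Asia/Tokyo")   # the date in Tokyo, which may differ from yours

class(today())   # today() returns a Date object
class(now())     # now() returns a date-time (POSIXct) object
```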
There are a number of ways to create a specific date object from a string. The name of the function gives the order of year (y), month (m) and day (d) in the string:
ymd("2018-04-29")
## [1] "2018-04-29"
mdy("April 29th, 2018")
## [1] "2018-04-29"
dmy("29-April-2018")
## [1] "2018-04-29"
An unquoted number also works:
ymd(20180429)
## [1] "2018-04-29"
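Choosing the right function matters, because the same string can be read in more than one way. The short sketch below parses an ambiguous string under two conventions and also shows what happens with a date that does not exist. The input strings are made up for illustration.

```r
library(lubridate)

# The same string read with two different conventions
mdy("04/05/2018")   # month first: April 5, 2018
dmy("04/05/2018")   # day first:   May 4, 2018

# A date that does not exist is returned as NA (normally with a warning,
# which is suppressed here)
bad <- suppressWarnings(ymd("2018-02-30"))
is.na(bad)
```

Checking for NA after parsing a whole column is a cheap way to find malformed entries.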
To add time information, append an underscore and the time format (h, hm or hms) to the function name:
ymd_hm("2018-04-29 2:30 PM")
## [1] "2018-04-29 14:30:00 UTC"
dmy_hms("29-April-2018 2:30:45 PM")
## [1] "2018-04-29 14:30:45 UTC"
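As the output shows, the parsed times are labelled UTC by default. If the clock time was recorded somewhere else, the tz argument says so, and with_tz() shows the same instant in another zone. A small sketch, with America/New_York as an arbitrary example zone:

```r
library(lubridate)

# Interpret the clock time as local time in New York (example zone)
local <- ymd_hm("2018-04-29 2:30 PM", tz = "America/New_York")
local

# The same instant, displayed in UTC
utc <- with_tz(local, tzone = "UTC")
utc

# Only the display changes, not the moment in time
local == utc
```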
As an example we will use the data set flights in the package nycflights13. It has airline on-time data for all flights departing NYC in 2013.
library(nycflights13)
flights %>%
  print(n = 4)
## # A tibble: 336,776 x 19
## year month day dep_time sched_dep_time dep_delay arr_time
## <int> <int> <int> <int> <int> <dbl> <int>
## 1 2013 1 1 517 515 2 830
## 2 2013 1 1 533 529 4 850
## 3 2013 1 1 542 540 2 923
## 4 2013 1 1 544 545 -1 1004
## # ... with 3.368e+05 more rows, and 12 more variables:
## # sched_arr_time <int>, arr_delay <dbl>, carrier <chr>, flight <int>,
## # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
## # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
Let’s start by calculating the hour and minute of the departure from dep_time, which stores the time as a number of the form HHMM (so 517 means 5:17). For this we can use %/% for integer division and %% for modulo:
12.34 %% 2
## [1] 0.34
12.34 %/% 2
## [1] 6
2 * (12.34 %/% 2) + 12.34 %% 2
## [1] 12.34
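With this we find the hour and minute of each departure:
# Split the HHMM value (e.g. 517) into hour (5) and minute (17)
flights <- flights %>%
  mutate(hour = dep_time %/% 100,
         minute = dep_time %% 100)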
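To see what these two operators do to an HHMM value, here is a small self-contained sketch on a few made-up departure times, including a missing one (cancelled flights have no departure time):

```r
# A few made-up departure times in HHMM form, plus one missing value
dep_time <- c(517, 533, 1004, 2359, NA)

hour   <- dep_time %/% 100   # drop the last two digits
minute <- dep_time %% 100    # keep the last two digits

data.frame(dep_time, hour, minute)

# The two parts always recombine to the original value
all(100 * hour + minute == dep_time, na.rm = TRUE)
```

A missing dep_time simply gives a missing hour and minute, so no special handling is needed at this step.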
In this tibble the parts of the date and time are spread over several columns:
flights %>%
  select(year, month, day, hour, minute) %>%
  print(n = 4)
## # A tibble: 336,776 x 5
## year month day hour minute
## <int> <int> <int> <dbl> <dbl>
## 1 2013 1 1 5 17
## 2 2013 1 1 5 33
## 3 2013 1 1 5 42
## 4 2013 1 1 5 44
## # ... with 3.368e+05 more rows
To combine the different columns into one date/time object we can use the command make_datetime:
# Add the combined date-time as a new column, keeping all other columns
# (flights without a dep_time get a missing departure)
flights <- flights %>%
  mutate(departure =
           make_datetime(year, month, day, hour, minute))
flights %>%
  select(departure) %>%
  print(n = 4)
## # A tibble: 336,776 x 1
## departure
## <dttm>
## 1 2013-01-01 05:17:00
## 2 2013-01-01 05:33:00
## 3 2013-01-01 05:42:00
## 4 2013-01-01 05:44:00
## # ... with 3.368e+05 more rows
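The same idea on a single value, as a minimal sketch: the first departure in the table is built from its parts, and lubridate's accessor functions then pull the parts back out. The variable name dep is just for illustration.

```r
library(lubridate)

# First departure from the table above, built from its parts
dep <- make_datetime(year = 2013, month = 1, day = 1, hour = 5, min = 17)
dep

# make_date() is the date-only counterpart
make_date(year = 2013, month = 1, day = 1)

# Accessor functions pull the parts back out
hour(dep)
minute(dep)
wday(dep, label = TRUE)   # day of the week as a labelled factor
```

Note that make_datetime() labels the result UTC unless a tz argument is given.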
lubridate also has a number of functions for doing arithmetic with dates. For example, my age is:
today()
## [1] "2018-11-12"
my.age <- today() - ymd(19610602)
as.duration(my.age)
## [1] "1812844800s (~57.45 years)"
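The duration above counts seconds and converts them to years of 365.25 days. If you want the age in calendar years instead, an interval is one option. The sketch below uses a fixed reference date in place of today() so that the result is reproducible; the variable names are illustrative.

```r
library(lubridate)

birth <- ymd(19610602)
ref   <- ymd("2018-11-12")   # fixed reference date instead of today()

ref - birth                   # a difftime, measured in days

# An interval remembers its start and end dates
span <- interval(birth, ref)

time_length(span, "years")    # age as a decimal number of years
span %/% years(1)             # number of completed years
```

Integer division of an interval by years(1) counts birthdays that have passed, which is usually what people mean by their age.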