R Project: Delay and Weather

By Joseph Schmuller

Try out this R project to see how one variable might affect an outcome. It’s conceivable that weather conditions could influence flight delays. How do you incorporate weather information into the assessment of delay?

One nycflights13 data frame called weather provides the weather data for every day and hour at each of the three origin airports. Here’s a glimpse of exactly what it has:

> glimpse(weather,60)
Observations: 26,130
Variables: 15
$ origin      "EWR", "EWR", "EWR", "EWR", "EWR", "...
$ year        2013, 2013, 2013, 2013, 2013, 2013, ...
$ month       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ day         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ hour        0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 1...
$ temp        37.04, 37.04, 37.94, 37.94, 37.94, 3...
$ dewp        21.92, 21.92, 21.92, 23.00, 24.08, 2...
$ humid       53.97, 53.97, 52.09, 54.51, 57.04, 5...
$ wind_dir    230, 230, 230, 230, 240, 270, 250, 2...
$ wind_speed  10.35702, 13.80936, 12.65858, 13.809...
$ wind_gust   11.918651, 15.891535, 14.567241, 15....
$ precip      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ pressure    1013.9, 1013.0, 1012.6, 1012.7, 1012...
$ visib       10, 10, 10, 10, 10, 10, 10, 10, 10, ...
$ time_hour   2012-12-31 19:00:00, 2012-12-31 20:...

So the variables it has in common with flites_day are the first five (origin, year, month, day, and hour) and the last one (time_hour). To join the two data frames, use this code:

flites_day_weather <- flites_day %>%
  inner_join(weather, by = c("origin","year","month","day","hour","time_hour"))

Now you can use flites_day_weather to start answering questions about departure delay and the weather.
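
Before you dive in, it can help to check how many flights actually found a matching weather row, because inner_join() silently drops the ones that didn't. Here's a minimal sketch of that check. It assumes dplyr and nycflights13 are installed, and it uses the raw flights data frame as a stand-in for flites_day (which is built earlier in the project). The names flites_stand_in, join_keys, joined, and unmatched are made up for this illustration, so swap in your own data frame:

```r
library(dplyr)
library(nycflights13)

# Stand-in for flites_day (an assumption: flites_day is built
# elsewhere in the project, so raw flights is used here instead).
flites_stand_in <- flights

# The columns shared by the flight data and weather
join_keys <- c("origin", "year", "month", "day", "hour", "time_hour")

# Flights that have a matching weather row
joined <- flites_stand_in %>%
  inner_join(weather, by = join_keys)

# Flights with no matching weather row (dropped by inner_join())
unmatched <- flites_stand_in %>%
  anti_join(weather, by = join_keys)

# Compare the row counts
nrow(flites_stand_in)
nrow(joined)
nrow(unmatched)
```

If the number of unmatched flights is large, keep that in mind when you interpret any results based on the joined data.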

What questions will you ask? How will you answer them? What plots will you draw? What regression lines will you create? Will scale() help?
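
As one possible starting point (not the only way to go about it), the sketch below plots mean departure delay against visibility and then fits a regression on standardized weather variables. It assumes dplyr, ggplot2, and nycflights13 are installed, and it again rebuilds the join from the raw flights data frame as a stand-in for flites_day_weather. The choice of visib, wind_speed, and precip as predictors is just an assumption for illustration, as are the names delay_weather, delay_by_visib, and fit:

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Stand-in for flites_day_weather (an assumption: built here from
# raw flights rather than from flites_day).
delay_weather <- flights %>%
  inner_join(weather,
             by = c("origin", "year", "month", "day", "hour", "time_hour")) %>%
  filter(!is.na(dep_delay), !is.na(visib),
         !is.na(wind_speed), !is.na(precip))

# Mean departure delay at each visibility level
delay_by_visib <- delay_weather %>%
  group_by(visib) %>%
  summarize(mean_delay = mean(dep_delay), flights = n())

# Scatterplot of the means with a fitted regression line
ggplot(delay_by_visib, aes(x = visib, y = mean_delay)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Visibility (miles)", y = "Mean departure delay (min)")

# Multiple regression on standardized predictors: scale() puts the
# predictors on a common footing so their coefficients are comparable.
fit <- lm(dep_delay ~ scale(visib) + scale(wind_speed) + scale(precip),
          data = delay_weather)
summary(fit)
```

Because scale() converts each predictor to standard-deviation units, each coefficient estimates the change in departure delay (in minutes) associated with a one-standard-deviation change in that weather variable.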

And, when you’re all done, take a look at arrival delay (arr_delay).