TFL bike study

I want to study Excess rentals in TfL bike sharing. First get the data on how many bikes getting hired every single day. We can get the latest data by running the following codes:

url <- "https://data.london.gov.uk/download/number-bicycle-hires/ac29363e-e0cb-47cc-a97a-e216d900a6b0/tfl-daily-cycle-hires.xlsx"

# Download TFL data to temporary file
httr::GET(url, write_disk(bike.temp <- tempfile(fileext = ".xlsx")))

## Response [https://airdrive-secure.s3-eu-west-1.amazonaws.com/london/dataset/number-bicycle-hires/2021-09-23T12%3A52%3A20/tfl-daily-cycle-hires.xlsx?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAJJDIMAIVZJDICKHA%2F20211019%2Feu-west-1%2Fs3%2Faws4_request&X-Amz-Date=20211019T093128Z&X-Amz-Expires=300&X-Amz-Signature=757cf52cad4b0585a8e8c85955a7a9c4e50fe27df85e329626695d0abf891c07&X-Amz-SignedHeaders=host]
##   Date: 2021-10-19 09:31
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 174 kB
## <ON DISK>  C:\Users\lenovo\AppData\Local\Temp\RtmpMBxQ5N\file892c703b1506.xlsx

# Use read_excel to read it as dataframe
bike0 <- read_excel(bike.temp, sheet = "Data", range = cell_cols("A:B"))

# change dates to get year, month, and week
bike <- bike0 %>%
    clean_names() %>%
    rename(bikes_hired = number_of_bicycle_hires) %>%
    mutate(year = year(day), month = lubridate::month(day, label = TRUE), week = isoweek(day))

We can easily create a facet grid that plots bikes hired by month and year.

The distributions of bikes hired per month during May and Jun in 2020 is flatter compared with that of the previous years. The standard deviations of May and June, 2020, are also higher than the previous years, reflecting the fact that there more variations among days when very few bikes were rented and days when lots of bikes were rented. This wide dispersion is the evidence of Covid-19’s significant affect and restrictionson on people’s daily travel.

We then start to reproduce the following two graphs:

Monthly changes in TFL bike rentals between 2016 and 2019
TFL bike rentals’ weekly percentage changes from the expected rentals between 2016 and 2019. The two grey shaded rectangles correspond to Q2 (weeks 14-26) and Q4 (weeks 40-52).

For both of these graphs, we calculate the expected number of rentals per week or month between 2016-2019, and then see how each week/month of 2020-2021 compares to the expected rentals, using the calculation excess_rentals = actual_rentals - expected_rentals.

The mean of the number of bicycle hired is used to calculate the expected rentals, as mean takes the whole dataset into consideration and reprsents the average of the entire data.

Additionally, we uses these links as references when creating plots:

Monthly bike rental change

We first calculate the mean of monthly number of bicycles hired between 2016 and 2019, and then get the monthly changes in TFL bike rentals using excess_rental method. We set “up” for the positive excess rental (when monthly actual_rentals is greater than expected_rentals), and “down” for the negative excess rental (when monthly actual_rentals is less than expected_rentals).

bike_month_16_19 <- bike %>%
  filter(year>=2016&year<=2019)%>%
  group_by(month) %>%
  summarise(expected_rental=mean(bikes_hired))


bike_month <- bike %>%
  filter(year>=2016) %>%
  group_by(year,month) %>%
  summarise(bike_hired_month=mean(bikes_hired),.groups = 'drop')

bike_month_comp <- merge(bike_month,bike_month_16_19,by="month") %>%
  mutate(excess_rentals = bike_hired_month - expected_rental,
         up = ifelse(bike_hired_month>expected_rental, excess_rentals, 0), #up gives the diffrence between actual and expected rentals when actual>expected
         down = ifelse(bike_hired_month<expected_rental, excess_rentals, 0)) #down gives the diffrence between actual and expected rentals when actual<expected

Then we use geom_line and geom_ribbon to generate the lines for expected rental. The green area represents the rental changes where the actual monthly number of bicycles hired is greater than the expected rental, and the red area appears when the actual monthly number of bicycles hired is less than the expected rental. The plots of monthly changes in Tfl bike rentals between 2016 and 2019 are illustrated below:

ggplot(bike_month_comp,aes(month))+
  geom_line(aes(x=month,y=expected_rental,colour="Expected",group=year),size=1)+
  geom_line(aes(x=month,y=bike_hired_month,colour="Actual",group=year))+
  geom_ribbon(aes(ymin=expected_rental,
                  ymax=expected_rental+up,group=year),alpha=0.4,fill="#7DCD85")+ #plot the areas in green using up when actual>expected
  geom_ribbon(aes(ymin=expected_rental+down,
                  ymax=expected_rental,group=year),alpha=0.4,fill="#CB454A")+ #plot the areas in red when using down actual>expected
  scale_colour_manual("",breaks=c("Expected","Actual"),values=c("blue","black"))+ 
  facet_wrap(~year)+
  theme_minimal()  +
  theme(legend.position = "none",
        plot.title =element_text(size=16, face='bold',hjust = 0,margin = margin(10,0,10,0)),
        plot.subtitle =element_text(size=16, hjust = 0), #put titles in the middle
        axis.text.x = element_text(size=10),
        axis.text.y = element_text(size=12),
        axis.ticks.x = element_line(),
        axis.ticks.y=element_line(),
        axis.title.x = element_text(size=16,face='bold'),
        axis.title.y = element_text(size=16,face='bold'),
        ) +
  labs(title = "Monthly changes in TfL bike rentals", 
       subtitle = "Expected rentals shown in blue and calculated between 2016-2019, Actual rentals shown in black", 
       caption= "Source: TfL, London Data Store",
       x="Month", y="Bike Rentals")

Weekly percentage change of bike

We then calculate the mean of weekly number of bicycles hired, and the weekly percentage change between actual and expected bike rentals between 2016 and 2019. We set “up” for the positive percentage change (when weekly actual_rentals is greater than expected_rentals), and “down” for the negative percentage change (when weekly actual_rentals is less than expected_rentals).

glimpse(bike)

## Rows: 4,051
## Columns: 5
## $ day         <dttm> 2010-07-30, 2010-07-31, 2010-08-01, 2010-08-02, 2010-08-0~
## $ bikes_hired <dbl> 6897, 5564, 4303, 6642, 7966, 7893, 8724, 9797, 6631, 7864~
## $ year        <dbl> 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010~
## $ month       <ord> Jul, Jul, Aug, Aug, Aug, Aug, Aug, Aug, Aug, Aug, Aug, Aug~
## $ week        <dbl> 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32~

bike_week_16_19 <- bike %>%
  filter(year>=2016&year<=2019)%>%
  group_by(week) %>%
  summarise(expected_rental=mean(bikes_hired))


bike_week <- bike %>%
  filter(year>=2016) %>%
  filter(!(year==2021&week==53))%>%
  group_by(year,week) %>%
  summarise(bike_hired_week=mean(bikes_hired),.groups = 'drop')

bike_week_comp <- merge(bike_week,bike_week_16_19,by="week") %>%
  mutate(percentage_rentals_change = (bike_hired_week - expected_rental)/expected_rental,
         up = ifelse(bike_hired_week>expected_rental, percentage_rentals_change, 0), #up gives the percentage diffrence between actual and expected rentals when actual>expected
         down = ifelse(bike_hired_week<expected_rental, percentage_rentals_change, 0)) #down gives the percentage diffrence between actual and expected rentals when actual<expected

The two grey shaded rectangles correspond to Q2 (weeks 14-26) and Q4 (weeks 40-52) are also added.

ggplot(bike_week_comp,aes(week))+
  geom_rect(aes(xmin=14,xmax=26,ymin=-0.8,ymax=1.1,group=year),colour="grey",alpha=0.002)+ #plot rectangles for Q2 
  geom_rect(aes(xmin=40,xmax=52,ymin=-0.8,ymax=1.1,group=year),colour="grey",alpha=0.002)+ #plot rectangles for Q4
  geom_rug(aes(colour=ifelse(bike_hired_week>=expected_rental,">=0","<0")),sides="b")+ #plot rug for x asis
  scale_colour_manual(values=c("#CB454A","#7DCD85"),name="Actual vs Expected ", guide=FALSE)+
  geom_line(aes(x=week,y=percentage_rentals_change,group=year),colour="black",size=0.5)+
  geom_ribbon(aes(ymin=0,
                  ymax=up,group=year),alpha=0.4,fill="#7DCD85")+ #plot the areas in green using up when actual>expected
  geom_ribbon(aes(ymin=down,
                  ymax=0,group=year),alpha=0.4,fill="#CB454A")+ #plot the areas in red using up when actual>expected
  facet_wrap(~year)+
  theme_minimal()  +
  theme(legend.position = "none",
        plot.title =element_text(size=16, face='bold',hjust = 0,margin = margin(10,0,10,0)),
        plot.subtitle =element_text(size=16, hjust = 0), #put titles in the middle
        axis.text.x = element_text(size=10),
        axis.text.y = element_text(size=12),
        axis.ticks.x = element_line(),
        axis.ticks.y=element_line(),
        axis.title.x = element_text(size=16,face='bold'),
        axis.title.y = element_text(size=16,face='bold'),
        ) +
  labs(title = "Weekly changes in TfL bike rentals", 
       subtitle = "Percentage changes from the expected level of weekly rentals", caption= "Source: TfL, London Data Store",
       x="Week", y="Percentage Change in Bike Rentals") +
  scale_x_continuous(breaks=c(0,13,26,39,53),limits=c(0,53))+
  scale_y_continuous(labels=scales::percent)

Wrong content? Edit on Github.

Monthly bike rental change

Weekly percentage change of bike

💬 Comment: