Exploring the Ford GoBike System Dataset

by Uchechukwu Ozoemena

Investigation Overview

This presentation focuses on insights from the Ford GoBike System dataset, which contains information about individual rides made in a bike-sharing system covering the greater San Francisco Bay Area. Key findings were:

  • trip durations were rarely longer than 40 minutes, and most trips were taken by users between the ages of 20 and 70.
  • subscribers appeared to take shorter trips than customers, possibly because they use the bikes even for short distances to get the most value from their subscription.
  • older males tended to take trips between 4am and 5am, going against the general trend of night-time trips being taken by younger users.

Dataset Overview

The dataset contains 183,412 rows and 10 columns and was obtained from the Udacity website. The columns cover trip timing (start time, end time, and duration), trip location (start and end stations), and some user demographics. I wrangled the data to handle missing values and convert columns to the appropriate types before analyzing it and generating the visuals.
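A minimal sketch of the kind of wrangling described above, assuming pandas and the column names of the public Ford GoBike trip CSV (`duration_sec`, `start_time`, `end_time`, `user_type`, `member_birth_year`, `member_gender`). The helper name `wrangle`, the choice to drop rows with missing demographics, and the reference year used to compute age are assumptions for illustration, not necessarily what the accompanying notebook does; the rows are made up.

```python
import pandas as pd


def wrangle(raw: pd.DataFrame, reference_year: int = 2019) -> pd.DataFrame:
    """Clean a raw trip table (hypothetical helper; column names assumed)."""
    df = raw.copy()
    # Drop trips with missing demographic fields (assumed cleaning rule)
    df = df.dropna(subset=["member_birth_year", "member_gender"])
    # Timestamps arrive as strings; parse them into datetimes
    df["start_time"] = pd.to_datetime(df["start_time"])
    df["end_time"] = pd.to_datetime(df["end_time"])
    # Birth year is stored as float because of the NaNs; make it an integer now
    df["member_birth_year"] = df["member_birth_year"].astype(int)
    # Low-cardinality text columns become categoricals
    for col in ["user_type", "member_gender"]:
        df[col] = df[col].astype("category")
    # Derived columns used by the later visuals
    df["duration_min"] = df["duration_sec"] / 60
    df["age"] = reference_year - df["member_birth_year"]  # reference year is an assumption
    return df.reset_index(drop=True)


# Made-up rows for illustration only
raw = pd.DataFrame({
    "duration_sec": [600, 1800, 300],
    "start_time": ["2019-02-28 17:32:10", "2019-02-28 18:53:21", "2019-02-28 04:40:00"],
    "end_time": ["2019-02-28 17:42:10", "2019-02-28 19:23:21", "2019-02-28 04:45:00"],
    "user_type": ["Subscriber", "Customer", "Subscriber"],
    "member_birth_year": [1984.0, None, 1960.0],
    "member_gender": ["Male", "Female", "Male"],
})
clean = wrangle(raw)
print(clean[["duration_min", "age", "user_type"]])
```

Dropping incomplete rows is the simplest option; imputing values would keep more trips but could distort the age-based plots.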

Note that the cells above are set as "Skip"-type slides, so they will not appear when the notebook is rendered as HTML slides.

Concentration of trip durations

The following visual shows the spread of trip durations on a base-3 logarithmic scale. I chose a log scale because the transformation brought the distribution closer to normal, which made the spread of the data easier to see.
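Bins for such a plot can be spaced evenly in the exponent of 3, so that every bar has the same width on a base-3 log axis. This NumPy sketch is one way to do that; the helper name `log3_bin_edges`, the quarter-power step, and the durations (made-up values in minutes, which must be positive for the log) are assumptions.

```python
import numpy as np


def log3_bin_edges(values, step=0.25):
    """Bin edges evenly spaced in log base 3 that cover every value (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    # Integer exponents of 3 that bracket the smallest and largest values
    lo = np.floor(np.log(values.min()) / np.log(3))
    hi = np.ceil(np.log(values.max()) / np.log(3))
    # Equal steps in the exponent become equal widths on a log-3 axis
    exponents = np.arange(lo, hi + step, step)
    return 3.0 ** exponents


durations_min = np.array([1.0, 5.0, 20.0, 60.0, 200.0])  # made-up trip durations
edges = log3_bin_edges(durations_min)
counts, _ = np.histogram(durations_min, bins=edges)
print(edges[0], edges[-1], counts.sum())
```

To draw the figure, the same edges can be passed to `plt.hist(..., bins=edges)`; with matplotlib 3.3 or later, `plt.xscale("log", base=3)` puts the x-axis on a base-3 log scale.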

The figure shows that most trips last between 1 and 40 minutes. This concentration holds even when trip durations are broken down by age and gender. In the next visual, the left panel plots trip duration against age for the full dataset, and the right panel plots the same relationship for a random sample of 10,000 users.
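The two quantities behind this paragraph, the share of trips lasting 1 to 40 minutes and a fixed-size random sample for a less cluttered scatter plot, could be computed with pandas roughly as follows. This sketch assumes a DataFrame with a `duration_min` column; the helper names, the seed, and the made-up log-normal data are illustrative, not taken from the accompanying notebook.

```python
import numpy as np
import pandas as pd


def share_between(durations: pd.Series, low: float = 1, high: float = 40) -> float:
    """Fraction of trips whose duration lies in [low, high] minutes (inclusive)."""
    return durations.between(low, high).mean()


def sample_rows(df: pd.DataFrame, n: int = 10_000, seed: int = 42) -> pd.DataFrame:
    """Random sample without replacement, capped at the frame size so small frames don't raise."""
    return df.sample(n=min(n, len(df)), random_state=seed)


# Made-up durations drawn from a log-normal distribution, in minutes
rng = np.random.default_rng(0)
trips = pd.DataFrame({"duration_min": rng.lognormal(mean=2.3, sigma=0.7, size=50_000)})

print(f"share of trips lasting 1-40 min: {share_between(trips['duration_min']):.2%}")
sample = sample_rows(trips)
print(len(sample))  # 10000
```

Fixing `random_state` keeps the sampled panel identical each time the notebook is re-run.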

In both the full-data and sampled plots, durations cluster below 40 minutes. The next visual shows similar plots broken down by gender, and the same clustering below 40 minutes appears consistently across all gender groups.

These plots also show that most users are between 20 and 70 years old.
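The per-gender clustering below 40 minutes and the 20 to 70 age range can also be checked numerically. A minimal pandas sketch, assuming columns named `member_gender`, `duration_min`, and `age` (assumed names) and a made-up frame; shares are counted per trip, since each row is one ride.

```python
import pandas as pd


def share_under_40_by_gender(df: pd.DataFrame) -> pd.Series:
    """Fraction of each gender group's trips shorter than 40 minutes."""
    return (df["duration_min"] < 40).groupby(df["member_gender"], observed=True).mean()


def share_aged(df: pd.DataFrame, low: int = 20, high: int = 70) -> float:
    """Fraction of trips taken by riders aged low..high (inclusive)."""
    return df["age"].between(low, high).mean()


# Made-up trips for illustration only
trips = pd.DataFrame({
    "member_gender": ["Male", "Male", "Female", "Other"],
    "duration_min": [10.0, 50.0, 20.0, 5.0],
    "age": [25, 80, 40, 19],
})
print(share_under_40_by_gender(trips))
print(share_aged(trips))
```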

Differences in trip duration across user types and gender groups

The following visual shows the absolute and relative frequencies of each user type (Subscriber and Customer) and gender group (Male, Female, and Other).
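A pandas sketch of the counts and shares behind such a visual, assuming `member_gender` and `user_type` columns (assumed names) and made-up rows. `pd.crosstab` with `normalize="all"` divides every cell by the grand total, and `value_counts(normalize=True)` gives the marginal share of each user type.

```python
import pandas as pd

# Made-up trips for illustration only
trips = pd.DataFrame({
    "member_gender": ["Male", "Male", "Male", "Female", "Female", "Other"],
    "user_type": ["Subscriber", "Subscriber", "Customer", "Subscriber", "Customer", "Subscriber"],
})

# Absolute frequency of each gender / user-type combination
counts = pd.crosstab(trips["member_gender"], trips["user_type"])
# Relative frequency: each cell divided by the total number of trips
shares = pd.crosstab(trips["member_gender"], trips["user_type"], normalize="all")
# Marginal share of each user type on its own
user_type_shares = trips["user_type"].value_counts(normalize=True)

print(counts)
print(shares.round(3))
print(user_type_shares)
```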

The next visual allows us to compare the mean trip duration across the different user types and gender groups.

In all three gender groups, customers take longer trips than subscribers. For male and female users, who make up most of the dataset, customer trips are more than twice as long on average; in the "Other" group the gap is smaller. Customers appear far less often in the dataset than subscribers, which suggests they ride less frequently. Their longer trips may indicate that they use the bikes to cover longer distances, whereas subscribers may use them even for short commutes to get the most value from their subscription.
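The comparison in this paragraph reduces to a grouped mean and a ratio. A minimal pandas sketch, assuming `member_gender`, `user_type`, and `duration_min` columns and the user-type labels `Customer` and `Subscriber` (assumed names), run on made-up rows; a ratio above 2 corresponds to customers riding more than twice as long as subscribers.

```python
import pandas as pd


def mean_duration_table(df: pd.DataFrame) -> pd.DataFrame:
    """Mean trip duration per gender and user type, plus the customer/subscriber ratio."""
    table = (
        df.groupby(["member_gender", "user_type"], observed=True)["duration_min"]
        .mean()
        .unstack("user_type")  # one column per user type
    )
    # Plain string labels, so a new column can be added even if user_type was categorical
    table.columns = [str(c) for c in table.columns]
    table["customer_to_subscriber"] = table["Customer"] / table["Subscriber"]
    return table


# Made-up trips for illustration only
trips = pd.DataFrame({
    "member_gender": ["Male", "Male", "Male", "Male", "Female", "Female"],
    "user_type": ["Customer", "Customer", "Subscriber", "Subscriber", "Customer", "Subscriber"],
    "duration_min": [30.0, 40.0, 10.0, 14.0, 25.0, 10.0],
})
print(mean_duration_table(trips).round(2))
```

Means are sensitive to a few very long trips, so comparing medians with `.median()` in place of `.mean()` is a reasonable robustness check.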

Relationship between trip timing, age and gender

The following visual shows the relationship between age and trip start/end time.

The most notable trend occurs between the ages of 31 and 37, where start/end times fall increasingly earlier in the day as age increases. The figure also indicates that younger users tend to ride late at night and shortly after midnight, whereas middle-aged users tend to ride during daylight hours. There is a visible spike in the average age of early-morning riders, though it is not clear what drives it.
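The visual described here can be summarized as the mean rider age for each hour of the day. A pandas sketch, assuming datetime `start_time`/`end_time` columns and an `age` column (assumed names), with made-up rows:

```python
import pandas as pd


def mean_age_by_hour(df: pd.DataFrame) -> pd.DataFrame:
    """Mean rider age for each hour of day, by trip start and by trip end."""
    by_start = df.groupby(df["start_time"].dt.hour)["age"].mean()
    by_end = df.groupby(df["end_time"].dt.hour)["age"].mean()
    # Align on hour; hours with no trips on one side show up as NaN
    return pd.DataFrame({"start": by_start, "end": by_end}).rename_axis("hour")


# Made-up trips for illustration only
trips = pd.DataFrame({
    "start_time": pd.to_datetime(["2019-02-01 04:10", "2019-02-01 04:50", "2019-02-01 23:40"]),
    "end_time": pd.to_datetime(["2019-02-01 04:30", "2019-02-01 05:05", "2019-02-02 00:10"]),
    "age": [60, 50, 22],
})
print(mean_age_by_hour(trips))
```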

The following charts explore that relationship further by breaking down the data across gender groups.

These charts reveal that male riders account for the spike in the average age of early-morning riders seen in the previous visual; the other gender groups show no such spike. The pattern of younger riders taking later rides still holds for both males and females, though it is more pronounced among males. In the "Other" group, the average age of users dropped noticeably only for morning rides, suggesting that middle-aged and older users in this group were more willing to ride well into the evening.
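Breaking the hourly age summary down by gender amounts to adding `member_gender` as a second grouping key and pivoting the gender groups into columns. A sketch under assumed column names (`start_time`, `member_gender`, `age`), on made-up rows:

```python
import pandas as pd


def mean_age_by_gender_and_hour(df: pd.DataFrame) -> pd.DataFrame:
    """Mean rider age per start hour (rows) and gender group (columns)."""
    start_hour = df["start_time"].dt.hour.rename("hour")
    return (
        df.groupby([start_hour, df["member_gender"]], observed=True)["age"]
        .mean()
        .unstack("member_gender")  # one column per gender group; NaN where no trips
    )


# Made-up trips for illustration only
trips = pd.DataFrame({
    "start_time": pd.to_datetime(
        ["2019-02-01 04:15", "2019-02-01 04:45", "2019-02-01 04:30", "2019-02-01 22:05"]
    ),
    "member_gender": ["Male", "Male", "Female", "Male"],
    "age": [65, 55, 30, 24],
})
print(mean_age_by_gender_and_hour(trips))
```

Hours with very few trips (such as the early morning) produce noisy means, so printing the trip count per cell with `.size()` alongside the means helps judge whether a spike is meaningful.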

Recap of key findings

  • trip durations were rarely longer than 40 minutes, and most trips were taken by users between the ages of 20 and 70.
  • subscribers appeared to take shorter trips than customers, possibly because they use the bikes even for short distances to get the most value from their subscription.
  • older males tended to take trips between 4am and 5am, going against the general trend of night-time trips being taken by younger users.

Thank you for your time and attention!

You can find all the steps and code used to generate this report in the accompanying Jupyter Notebook.