Case Study - How Does a Bike-Share Navigate Speedy Success?

Part of the Google Analytics Case Study Project.

by Whoong Zi Wei


First Phase: Identify the business task.

In this case study, Cyclistic is a fictional company that offers a bike-share program featuring more than 5,800 bicycles and 600 docking stations across Chicago. The director, Lily Moreno, aims to convert casual riders into annual members in order to generate more revenue for the company. To support this goal, I have been asked to analyze how annual members and casual riders use Cyclistic bikes differently. I am provided with a dataset spanning one year, from which I will identify trends and generate insights. Casual riders are those who opted for single-ride or full-day passes, whereas members are those who opted for an annual subscription.

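In order to achieve my goal, I have narrowed down the questions to the following:

  1. What is the total number of trips for members and casuals, and what proportion of total trips do they represent?

  2. What are the average ride lengths for members and casuals respectively?

  3. What are the most common starting and ending stations for each?

  4. Are there particular days of the week on which most rides take place?

  5. Is there a preference for the rideable type for members and casuals?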


Prepare Phase

The data used for this analysis was collected by Cyclistic and spans April 2020 to April 2021, which is one year in length. For this case study, the dataset is assumed to be obtained directly from Cyclistic's own servers, in CSV format within zip folders (for the purpose of this exercise, this is the appropriate assumption). In reality, the dataset is provided by Motivate International Inc. under this license. As this is public data, rider privacy is protected: it contains no information that would allow me to trace a trip back to the original rider.

The data has the following fields (or columns):

As this is internal data, it is safe to assume that these data are unbiased and credible. Now, let's proceed to the process phase.

Process Phase

Tools to be used: Python.

Python is used for the entirety of this analysis, as the dataset has more than 3 million rows and would be unsuitable for spreadsheet software.

Data Cleaning Process

Visualize missing values
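As a rough sketch of this step, the missing values can be summarized per column before plotting anything. This assumes pandas; the helper name missing_summary and the tiny sample below are made up for illustration and are not the real dataset.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages per column, worst first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return summary.sort_values("missing", ascending=False)


# Made-up sample standing in for the real trip data (column names assumed).
sample = pd.DataFrame({
    "ride_id": ["a", "b", "c", "d"],
    "start_station_name": ["Clark St & Elm St", None, "Clark St & Elm St", None],
    "end_station_name": [None, None, "Streeter Dr & Grand Ave", None],
})

summary = missing_summary(sample)
print(summary)
```

The resulting table can then be passed to whichever plotting library is preferred, for example as a bar chart of the "missing" column.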

Let's get ride_duration so as to aid the cleaning process

Ensuring that columns with datetime information are properly in datetime format
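The two steps above could look something like the sketch below. It assumes the timestamp columns are named started_at and ended_at (my assumption, they are not named in this write-up) and that ride_duration is stored in seconds. The timestamps are made up.

```python
import pandas as pd

# Made-up sample; the second row ends before it starts on purpose.
sample = pd.DataFrame({
    "started_at": ["2020-04-26 17:45:14", "2020-12-15 08:00:00"],
    "ended_at": ["2020-04-26 18:12:03", "2020-12-15 07:59:30"],
})

# Make sure both timestamp columns are real datetimes, not strings.
# errors="coerce" turns unparseable values into NaT instead of raising.
for col in ["started_at", "ended_at"]:
    sample[col] = pd.to_datetime(sample[col], errors="coerce")

# Ride duration in seconds (assumed unit); negative values flag bad rows.
sample["ride_duration"] = (
    sample["ended_at"] - sample["started_at"]
).dt.total_seconds()

print(sample.dtypes)
print(sample)
```

Converting the columns first keeps the subtraction well defined; doing it the other way round would fail on string columns.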

Let's take a look at the missing values on end_stations and see why there are missing values
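One way to pull out these rows, as a sketch: filter on the end station columns and look at the first 15 matches. The column names end_station_name and end_station_id are assumptions, and the sample values (including the station id) are made up.

```python
import pandas as pd

# Made-up sample (column names and station id are assumptions).
sample = pd.DataFrame({
    "ride_duration": [12.0, 1500.0, 86400.0],
    "end_station_name": [None, "Clark St & Elm St", None],
    "end_station_id": [None, "176", None],
})

end_cols = ["end_station_name", "end_station_id"]

# Keep rows where any of the end station fields is missing.
missing_end = sample[sample[end_cols].isna().any(axis=1)]

# Inspect the first 15 such rows alongside their ride duration.
print(missing_end.head(15))
```

The same filter works for the start station columns by swapping the column list.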


Looking at the first 15 rows with missing values in the end_stations columns, the values appear to be missing either because the customer canceled the ride or for other reasons.


Let's now take a look at why start_stations are missing

The reasons they are missing seem mixed. For some rows, with only seconds of ride_duration, it can be safely assumed that the rider changed their mind. Now I will drop rows that have both the start and end station empty.


There are 86,439 rows with all four of the start and end station id/name values missing. As a result, I cannot safely determine whether the riders had merely changed their minds or whether there was a technical error at the time.
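A count like this can be obtained with a row-wise check across the four station columns. The sketch below assumes the columns are named as shown; the sample rows and station ids are made up.

```python
import pandas as pd

# Assumed names for the four station columns.
station_cols = [
    "start_station_name",
    "start_station_id",
    "end_station_name",
    "end_station_id",
]

# Made-up sample: only the second row is missing all four fields.
sample = pd.DataFrame({
    "start_station_name": ["Clark St & Elm St", None, None],
    "start_station_id": ["176", None, None],
    "end_station_name": ["Streeter Dr & Grand Ave", None, "Clark St & Elm St"],
    "end_station_id": ["35", None, "176"],
})

# True only where every one of the four station fields is missing.
all_missing = sample[station_cols].isna().all(axis=1)
n_all_missing = int(all_missing.sum())

print(n_all_missing)
```

Using all(axis=1) rather than any(axis=1) is what separates rows with no station information at all from rows missing just one side of the trip.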

After much deliberation and some research online, I realized that these missing values won't affect my analysis and conclusions. As becomes clear further on, I only need to be concerned with negative values in the ride duration.


Now I will determine which day of the week each bike was borrowed on and whether that day is a weekend or a weekday.
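A minimal sketch of this step, assuming the start timestamp lives in a column called started_at (an assumed name) and is already a datetime. The new column names day_of_week and is_weekend are also just illustrative.

```python
import pandas as pd

# Made-up start times: a Saturday, a Monday and a Sunday.
sample = pd.DataFrame({
    "started_at": pd.to_datetime([
        "2020-08-01 10:00:00",
        "2020-08-03 08:30:00",
        "2021-04-04 15:45:00",
    ]),
})

# Name of the weekday on which the ride started.
sample["day_of_week"] = sample["started_at"].dt.day_name()

# dayofweek is 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend.
sample["is_weekend"] = sample["started_at"].dt.dayofweek >= 5

print(sample)
```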


One last thing before proceeding to the analyze phase: I will now remove the underscores in the rideable_type column and capitalize the first letters so the values look better in visualizations later on.
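This clean-up can be done with pandas string methods. The sketch below assumes the raw values look like docked_bike, electric_bike and classic_bike, which is my guess based on the bike types discussed in this analysis.

```python
import pandas as pd

# Made-up sample with the assumed raw labels.
sample = pd.DataFrame({
    "rideable_type": ["docked_bike", "electric_bike", "classic_bike"],
})

# Replace underscores with spaces, then capitalize each word.
sample["rideable_type"] = (
    sample["rideable_type"]
    .str.replace("_", " ", regex=False)
    .str.title()
)

print(sample["rideable_type"].tolist())
```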

Analyze Phase

As a refresher, these are the questions I am interested in:


First Question: What is the total number of trips for members and casuals, and what proportion of total trips do they represent?


Let's look at the unique counts of members and casuals.
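As a sketch, value_counts gives both the counts and the proportions. The rider-type column name member_casual is an assumption and the five-row sample is made up; the second half simply re-checks the percentages using the trip counts reported in this section.

```python
import pandas as pd

# Made-up sample (column name assumed).
sample = pd.DataFrame({
    "member_casual": ["member", "casual", "member", "member", "casual"],
})

# Number of trips per rider type, and the same as a percentage of all trips.
counts = sample["member_casual"].value_counts()
shares = sample["member_casual"].value_counts(normalize=True).mul(100).round(1)
print(counts)
print(shares)

# Re-checking the proportions with the trip counts reported in this section.
member_trips = 2_260_001
casual_trips = 1_566_977
total_trips = member_trips + casual_trips
member_share = round(member_trips / total_trips * 100, 1)
casual_share = round(casual_trips / total_trips * 100, 1)
print(member_share, casual_share)
```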

There are 2,260,001 member trips and 1,566,977 casual trips. Evidently, over the 12 months of Cyclistic's bike program, members took more trips than casuals: about 59% of all trips were made by members, whereas nearly 41% were made by casual riders.


Second Question: What are the average ride lengths for members and casuals respectively?

Let's take a look at the summary statistics:

The minimum is actually on the negative side. Let's check how many of them are actually negative values.

There are 10,976 rows with a ride duration of less than 0 seconds, which may be due to either technical or human error. I will exclude these rows to generate more accurate insights.
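Counting and excluding the negative durations might look like the sketch below, assuming ride_duration is in seconds. The sample values are made up.

```python
import pandas as pd

# Made-up durations in seconds; two of them are negative.
sample = pd.DataFrame({
    "ride_duration": [1620.0, -30.0, 45.0, -5.0, 900.0],
})

# Boolean mask marking rides that appear to end before they start.
negative = sample["ride_duration"] < 0
n_negative = int(negative.sum())
print(n_negative)

# Keep only rides with a non-negative duration.
cleaned = sample.loc[~negative].copy()
print(cleaned["ride_duration"].describe())
```

Taking a copy of the filtered frame avoids chained-assignment warnings if new columns are added to it later.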

Now that I've excluded the rows with negative ride_duration values, the statistical summary makes more sense. The average ride is 27 minutes per session, which is quite reasonable.


What is the average ride duration of casual riders and member riders?
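A grouped mean answers this question. The sketch assumes a rider-type column named member_casual and ride_duration in seconds; the numbers are made up and only mimic the direction of the result.

```python
import pandas as pd

# Made-up sample (column names and units assumed).
sample = pd.DataFrame({
    "member_casual": ["casual", "casual", "member", "member", "member"],
    "ride_duration": [1800.0, 3000.0, 600.0, 900.0, 1200.0],
})

# Mean ride duration per rider type, converted from seconds to minutes.
avg_minutes = (
    sample.groupby("member_casual")["ride_duration"]
    .mean()
    .div(60)
    .round(1)
)

print(avg_minutes)
```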

Surprisingly, the average ride of casual riders is longer than that of members! Members don't actually ride longer than casual riders; it is the other way around.

Finally, let's now calculate the average ride time (minutes) on each month for each rider type
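One possible way to build this monthly table is to group by month and rider type and then pivot the rider types into columns. Column names started_at and member_casual are assumptions, ride_duration is assumed to be in seconds, and the sample is made up.

```python
import pandas as pd

# Made-up sample (column names and units assumed).
sample = pd.DataFrame({
    "started_at": pd.to_datetime([
        "2020-08-01 10:00:00",
        "2020-08-15 09:00:00",
        "2020-08-03 08:00:00",
        "2020-12-10 08:00:00",
        "2020-12-12 14:00:00",
    ]),
    "member_casual": ["casual", "casual", "member", "member", "casual"],
    "ride_duration": [1800.0, 3000.0, 600.0, 900.0, 1200.0],
})

# Year-month label such as "2020-08" so months from different years stay apart.
sample["month"] = sample["started_at"].dt.strftime("%Y-%m")

# Average ride time in minutes: one row per month, one column per rider type.
monthly_avg = (
    sample.groupby(["month", "member_casual"])["ride_duration"]
    .mean()
    .div(60)
    .unstack("member_casual")
)

print(monthly_avg)
```

Keeping the year in the month label matters here because the dataset includes April of both 2020 and 2021.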

It seems that average ride durations are longest from February until September and lower from October until January, for both member and casual riders.

Let's now look at whether the day of the week has any relationship with how frequently casual/member riders rent bikes.

As we can see above, most riders prefer to cycle on a Saturday. This is an interesting insight that may be useful later on.


Let's now see whether there's a difference between when member and casual riders tend to ride.
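A sketch of how the rides per weekday could be tallied for each rider type before plotting. It assumes columns named started_at and member_casual; the sample rides are made up.

```python
import pandas as pd

day_order = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# Made-up sample: three Saturday rides, one Monday, one Wednesday.
sample = pd.DataFrame({
    "started_at": pd.to_datetime([
        "2020-08-01 10:00:00",
        "2020-08-08 11:00:00",
        "2020-08-08 12:00:00",
        "2020-08-03 08:00:00",
        "2020-08-05 08:00:00",
    ]),
    "member_casual": ["casual", "casual", "member", "member", "member"],
})

# Count rides per weekday and rider type, keeping the calendar order
# and showing 0 for weekdays with no rides in the sample.
rides_by_day = (
    sample.assign(day_of_week=sample["started_at"].dt.day_name())
    .groupby(["day_of_week", "member_casual"])
    .size()
    .unstack("member_casual", fill_value=0)
    .reindex(day_order, fill_value=0)
)

print(rides_by_day)

# Weekday with the most rides across both rider types.
busiest_day = rides_by_day.sum(axis=1).idxmax()
print(busiest_day)
```

Reindexing by day_order is only there so the table and any chart built from it read Monday to Sunday instead of alphabetically.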

Interesting! If the company ever wants to run an advertisement to convince casual riders to convert to an annual membership, I can suggest that they focus it on Saturdays! Also, it seems that both members and casual riders prefer to ride on Saturdays, so advertising on a Saturday not only lets the company potentially convince casual riders to sign up for an annual subscription, but also encourages existing members to renew their membership!

Coincidentally, this answers the fourth question I wanted to answer: Are there particular days of the week on which most rides take place? I will therefore move on to the next question, and after that I will tackle the fifth and last question of my business objectives.


Next Question: What are the most common starting and ending stations for each?

To identify this, let's take a look at their respective stations' countplot.
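The counts behind such a plot can be sketched with value_counts. The helper name top_stations and the column name start_station_name are assumptions, and the sample (including "Made-up Station") is invented for illustration.

```python
import pandas as pd


def top_stations(df: pd.DataFrame, column: str, n: int = 5) -> pd.Series:
    """Return the n most frequent station names in the given column."""
    return df[column].value_counts().head(n)


# Made-up sample (column name assumed).
sample = pd.DataFrame({
    "start_station_name": [
        "Streeter Dr & Grand Ave",
        "Streeter Dr & Grand Ave",
        "Streeter Dr & Grand Ave",
        "Clark St & Elm St",
        "Clark St & Elm St",
        "Made-up Station",
    ],
})

top_start = top_stations(sample, "start_station_name", n=2)
print(top_start)
```

The same helper can be pointed at the end station column, or applied to a subset of rows filtered to one rider type.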

These top 5 locations are the most used as both start and end stations when members and casual riders are counted together.

Let's isolate member riders, as we are more interested in how to convert casual riders into members.

Last Question: Is there a preference for the rideable type for members and casuals?


Let's examine the rideable_type column!

As can be seen above, no matter if it's member or casual, the docked bike is the most popular, followed by electric bikes and classic bikes.

This data doesn't seem accurate if we look at it overall. Let's look at it by month instead.
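A monthly breakdown by bike type can be sketched as a grouped count. The column name started_at is an assumption, the rideable_type labels are assumed to be already cleaned, and the sample is made up.

```python
import pandas as pd

# Made-up sample (column names assumed, labels already cleaned).
sample = pd.DataFrame({
    "started_at": pd.to_datetime([
        "2020-04-10", "2020-04-20",
        "2020-08-01", "2020-08-02",
        "2020-12-05", "2020-12-06", "2020-12-07",
    ]),
    "rideable_type": [
        "Docked Bike", "Docked Bike",
        "Docked Bike", "Electric Bike",
        "Classic Bike", "Classic Bike", "Electric Bike",
    ],
})

# Year-month label so April 2020 and April 2021 are not merged.
sample["month"] = sample["started_at"].dt.strftime("%Y-%m")

# Trips per month and bike type; 0 where a type has no trips that month.
by_month = (
    sample.groupby(["month", "rideable_type"])
    .size()
    .unstack("rideable_type", fill_value=0)
)

print(by_month)
```

Adding the rider-type column to the groupby keys gives the same table split by members and casuals.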

It seems that August 2020 was the busiest month in terms of how many people rented bicycles. Docked bikes were the most used from April 2020 to November 2020, but after November 2020 they seem to have lost their popularity. With the summary statistics, I can confirm that from April to June 2020 there is only data pertaining to docked bikes. It also seems that electric bikes slowly became popular starting from August 2020. And for some reason, classic bikes seem to grow in popularity starting from December 2020.


To get a better picture, let's visualize popular rideable type bikes per month and separate them by whether they are members or casuals.

It seems that both casual and member riders suddenly preferred classic bikes starting in December 2020, and that their popularity slowly increased until April 2021.

Let's take a look at how many rider types overall rode in each month.

With this chart, it seems that the number of trips starts to rise in April and peaks in August. After August, the number of trips starts to fall.


Act Phase (Conclusion)

From the data that I have examined, I will give these recommendations:

  1. As casual riders actually ride longer on average than members, the potential to convince them to subscribe to the annual membership is quite high. Therefore, if the company were to offer a special promotion on the annual subscription targeting casual riders, they would likely be enticed by the offer. If the company focuses on convincing casual riders to use its services beyond just the weekends, that would increase the chances of converting a casual rider into a member.

  2. It is recommended that the company focus its advertising on the top 5 stations that are the most frequented as both start and end stations, which would increase its advertisement exposure. If the company prefers to target only the most commonly used start and end stations, then Streeter Dr & Grand Ave and Clark St & Elm St are the ones to target most heavily, as their rental volume is the highest among the top 5 stations.

  3. As riders, both members and casuals, ride the most on Saturdays, it is recommended that the company concentrate its advertising on that day in order not only to reach casual riders, but also to encourage members to renew their annual subscription.

  4. The average ride duration for members may be lower than for casuals because members rent bikes for commuting. My hypothesis is that people who commute with docked bikes (which happen to be the most rented of the three bike types in this data) typically do so only if the distance between their home and workplace is short, so their average ride duration is lower than that of casual riders. By the same reasoning, casual riders typically ride bikes for leisure. As such, it is advisable for the company to design its advertising campaigns with this in mind so as to make them more appealing to casual riders.

  5. At first, it would seem that the classic bike is the least popular bike among both members and casual riders when the data is viewed as a whole. However, on closer inspection classic bikes seem to grow in popularity starting in December 2020 and reach their peak popularity in April 2021 within the available data. Docked bikes have lost popularity for some reason, perhaps due to the climate or the easing of pandemic restrictions starting in April. It is recommended that the company focus on promoting either electric bikes or classic bikes from now on, as the data seems to indicate that these two bike types were growing in popularity at the start of 2021.

  6. It seems that the number of trips taken by both member and casual riders started to decline around September. However, it jumped back up to high volumes starting in February. This may be of interest to the company, as it seems to indicate that riders tend to ride more from March until August, and less from September until February.