This project focuses on analyzing farebox recovery, the fraction of transit revenue generated from fares, using data from the National Transit Database (NTD). The analysis examines key metrics, including farebox revenues, operating expenses, ridership (total number of trips), and vehicle miles traveled (VMT). Given the disruptions caused by the COVID-19 pandemic, the data reflects some post-pandemic irregularities. For consistency, we utilize 2022 data across all categories, recognizing the lag in reporting. The goal is not to make long-term forecasts but to gain insights into the performance and financial sustainability of various transit agencies.
Total Vehicle Revenue Miles (VRM) by Transit Agency
In this sample, the transit agency with the most total Vehicle Revenue Miles (VRM) was MTA New York City Transit, which reported a total of 10,832,855,350 VRM. This highlights the immense scale of operations for New York City’s transit system, reflecting its critical role in serving one of the largest metropolitan areas in the United States.
Total Vehicle Revenue Miles (VRM) by Transit Mode
When analyzing the data by transit mode, the Motorbus mode had the most significant operational miles, with a total of 49,444,494,088 VRM. This indicates that Motorbuses are the most utilized form of transportation in terms of distance covered.
NYC Subway Ridership in May 2024
For the month of May 2024, a total of 180,458,819 trips were taken on the New York City Subway (Heavy Rail). This figure represents the scale of ridership in one of the largest transit systems in the U.S. during that period.
NYC Subway Ridership Decline from April 2019 to April 2020
During the COVID-19 pandemic, NYC subway ridership fell dramatically between April 2019 and April 2020. The decline in total UPT was 261,525,797 trips, showcasing the severe impact of the pandemic on public transit systems.
Additional Transit Insights
Agency with the Least Total VRM: The Barnegat Bay Decoy & Baymen’s Museum had the least total VRM in this sample, reporting only 2,312 VRM. This suggests minimal operational activity for this agency.
Mode with the Least Total VRM: The Aerial Tramway mode had the least VRM among transit modes, with a total of 292,860 VRM, highlighting its smaller operational footprint compared to other modes.
Agency with the Least Total UPT: Enterprise Holdings, LLC had the fewest total Unlinked Passenger Trips, with only 858 trips in the dataset. This reflects the lower passenger activity of this particular agency.
To deepen the analysis of transit data, a new table was constructed from the USAGE dataset, focusing on the total Unlinked Passenger Trips (UPT) and Vehicle Revenue Miles (VRM) for 2022. This involved utilizing R functions like group_by, summarize, and filter, along with the year function to isolate data for 2022. The final table, named USAGE_2022_ANNUAL, included the key fields: NTD ID, Agency, Metro Area, Mode, UPT, and VRM. After summarizing the data, the table was ungrouped for further manipulation.
This new table was then merged with the FINANCIALS dataset using a left join, matched by NTD ID and Mode. The merge allowed for a comprehensive view of both operational usage and financial data, creating the USAGE_AND_FINANCIALS dataset, which was used to investigate farebox recovery and efficiency across transit systems.
Several key questions were addressed using this enriched dataset:
Most UPT in 2022: The New York City Subway (Heavy Rail) operated by the MTA had the highest UPT in 2022, with an impressive 1.79 billion trips. The system also reported 338 million VRM and collected over $2.32 billion in fares, making it the largest transit system by ridership.
Highest Farebox Recovery Ratio: The Transit Authority of Central Kentucky in the Elizabethtown–Radcliff, KY metro area achieved the highest farebox recovery ratio. Operating in the Vanpool mode, it reported 9,640 UPT and 94,027 VRM, collecting $97,300 in total fares. With operating expenses of $40,801, the agency achieved a farebox recovery ratio of 2.38, demonstrating highly efficient cost recovery despite its smaller operational scale.
Lowest Expenses per UPT: The North Carolina State University Motorbus system, based in Raleigh, NC, had the lowest expenses per UPT among motorbus services. With 2.31 million UPT and 531,555 VRM, it operated at low costs, reporting $2,727,412 in total expenses and zero fares, likely reflecting a fare-free model or heavy subsidy reliance. The motorbus system achieved an expense per UPT ratio of $1.18, highlighting its cost-effective operation despite not generating revenue from fares.
Highest Fares per UPT: The Altoona Metro Transit (Demand Response) in Altoona, PA, had the highest total fares per UPT, despite its very small scale. With just 26 trips, it managed to collect $17,058 in fares, indicating a significant fare collection per passenger trip.
Lowest Expenses per VRM: The New Mexico Department of Transportation’s Vanpool service in Las Cruces reported the lowest expenses per VRM, with 118,780 UPT and 1.75 million VRM, making it highly cost-efficient. It also generated $757,574 in total fares, showing a good balance between service provision and cost control.
Highest Fares per VRM: Chicago Water Taxi (Ferryboat) was the most efficient in terms of total fares per VRM, with 16,936 UPT and 600 VRM, generating $142,473 in fares. This highlights the high fare collection relative to the distance traveled, particularly for specialized modes like ferryboats.
Conclusion
In conclusion, identifying the most efficient transit system depends on how efficiency is defined. The Transit Authority in Elizabeth, NJ, stood out in terms of farebox recovery, while the New Mexico Vanpool and Raleigh Motorbus services excelled in cost efficiency per mile and per trip, respectively. Efficiency can be viewed through various lenses—fare recovery, ridership, or operational costs—each providing a different perspective on transit system performance.
Codes
Usage Table:
# A tibble: 6 × 8
`NTD ID` Agency metro_area Mode `3 Mode` month UPT VRM
<int> <chr> <chr> <chr> <chr> <date> <dbl> <dbl>
1 1 King County Seattle--Tacoma,… Dema… Bus 2002-01-01 135144 746158
2 1 King County Seattle--Tacoma,… Dema… Bus 2002-02-01 127378 656324
3 1 King County Seattle--Tacoma,… Dema… Bus 2002-03-01 136030 726578
4 1 King County Seattle--Tacoma,… Dema… Bus 2002-04-01 142204 736975
5 1 King County Seattle--Tacoma,… Dema… Bus 2002-05-01 144697 746158
6 1 King County Seattle--Tacoma,… Dema… Bus 2002-06-01 131833 696633
3.1) What transit agency had the most total VRM in this sample?
#3.1What transit agency had the most total VRM in this sample?USAGE %>%drop_na() %>%group_by(Agency) %>%summarize(total_VRM =sum(VRM, na.rm=TRUE)) %>%arrange(desc(total_VRM))%>%slice(1)
# A tibble: 1 × 2
Agency total_VRM
<chr> <dbl>
1 MTA New York City Transit 10832855350
3.2)What transit mode had the most total VRM in this sample?
#3.2What transit mode had the most total VRM in this sample?USAGE %>%drop_na() %>%group_by(Mode) %>%summarize(total_VRM =sum(VRM, na.rm=TRUE)) %>%arrange(desc(total_VRM))%>%slice(1)
3.3)How many trips were taken on the NYC Subway (Heavy Rail) in May 2024?
#3.3How many trips were taken on the NYC Subway (Heavy Rail) in May 2024?# Filter for Heavy Rail in NYC for May 2024NYC_Subway_trip_may_24 <- USAGE %>%mutate(month =format(ymd(month), "%Y-%m")) %>%filter(Mode =="Heavy Rail", Agency =="MTA New York City Transit", month =="2024-05") %>%summarize(total_trip =sum(UPT, na.rm =TRUE)) NYC_Subway_trip_may_24
# A tibble: 1 × 1
total_trip
<dbl>
1 180458819
3.5)How much did NYC subway ridership fall between April 2019 and April 2020?
#3.5How much did NYC subway ridership fall between April 2019 and April 2020?nyc_ridership_fall <- USAGE %>%mutate(month =format(ymd(month), "%Y-%m")) %>%filter(Agency =="MTA New York City Transit", month %in%c("2019-04", "2020-04")) %>%group_by(month) %>%summarize(total_UPT =sum(UPT, na.rm =TRUE)) %>%arrange(month)ridership_fall <- nyc_ridership_fall %>%summarize(fall = total_UPT[month =="2019-04"] - total_UPT[month =="2020-04"])ridership_fall
# A tibble: 1 × 1
fall
<dbl>
1 261525797
4.1) What transit agency had the least total VRM in this sample?
#4Find three more interesting transit facts in this data other than those above.#4.1 What transit agency had the least total VRM in this sample?USAGE %>%drop_na() %>%group_by(Agency) %>%summarize(total_VRM =sum(VRM, na.rm=TRUE)) %>%arrange(total_VRM)%>%slice(1)
# A tibble: 1 × 2
Agency total_VRM
<chr> <dbl>
1 Barnegat Bay Decoy & Baymen's Museum 2312
4.2)What transit mode had the least total VRM in this sample?
#4.2What transit mode had the least total VRM in this sample?USAGE %>%drop_na() %>%group_by(Mode) %>%summarize(total_VRM =sum(VRM, na.rm=TRUE)) %>%arrange(total_VRM)%>%slice(1)
4.3)What transit agency had the least total UPT in this sample?
#4.3What transit agency had the least total UPT in this sample?USAGE %>%drop_na() %>%group_by(Agency) %>%summarize(total_UPT =sum(UPT, na.rm=TRUE)) %>%arrange(total_UPT)%>%slice(1)
# A tibble: 1,307 × 9
`NTD ID` Agency metro_area Mode UPT VRM `Agency Name` `Total Fares`
<dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <dbl>
1 1 King Cou… Seattle--… Dema… 6.63e5 1.29e7 King County … 740726
2 1 King Cou… Seattle--… Dema… 6.63e5 1.29e7 King County … 91601
3 1 King Cou… Seattle--… Ferr… 4.00e5 5.12e4 King County … 1715265
4 1 King Cou… Seattle--… Moto… 5.40e7 6.16e7 King County … 56566150
5 1 King Cou… Seattle--… Moto… 5.40e7 6.16e7 King County … 280187
6 1 King Cou… Seattle--… Stre… 1.12e6 1.80e5 King County … 588495
7 1 King Cou… Seattle--… Trol… 9.58e6 2.64e6 King County … 10123486
8 1 King Cou… Seattle--… Vanp… 7.03e5 4.41e6 King County … 5484481
9 2 Spokane … Spokane, … Dema… 3.10e5 4.04e6 Spokane Tran… 457699
10 2 Spokane … Spokane, … Dema… 3.10e5 4.04e6 Spokane Tran… 73585
# ℹ 1,297 more rows
# ℹ 1 more variable: Expenses <dbl>
6.1)Which transit system (agency and mode) had the most UPT in 2022?
#6.1Which transit system (agency and mode) had the most UPT in 2022?most_UPT_2022 <- USAGE_AND_FINANCIALS %>%arrange(desc(UPT)) %>%slice_max(UPT, n =1) most_UPT_2022
# A tibble: 1 × 9
`NTD ID` Agency metro_area Mode UPT VRM `Agency Name` `Total Fares`
<dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <dbl>
1 20008 MTA New Y… New York-… Heav… 1.79e9 3.38e8 MTA New York… 2326782567
# ℹ 1 more variable: Expenses <dbl>
6.2)Which transit system (agency and mode) had the highest farebox recovery, defined as the highest ratio of Total Fares to Expenses?
#6.2Which transit system (agency and mode) had the highest farebox recovery, defined as the highest ratio of Total Fares to Expenses?highest_farebox_recovery <- USAGE_AND_FINANCIALS %>%mutate(farebox_recovery =`Total Fares`/ Expenses) %>%arrange(desc(farebox_recovery)) %>%slice_max(farebox_recovery, n =1)highest_farebox_recovery
6.3)Which transit system (agency and mode) has the lowest expenses per UPT?
#6.3Which transit system (agency and mode) has the lowest expenses per UPT?lowest_expenses_per_UPT <- USAGE_AND_FINANCIALS %>%mutate(expenses_per_UPT = Expenses / UPT) %>%# arrange(expenses_per_UPT) %>%slice_min(expenses_per_UPT, n =1) lowest_expenses_per_UPT
6.4)Which transit system (agency and mode) has the highest total fares per UPT?
#6.4Which transit system (agency and mode) has the highest total fares per UPT?highest_fares_per_UPT <- USAGE_AND_FINANCIALS %>%mutate(fares_per_UPT =`Total Fares`/ UPT) %>%arrange(desc(fares_per_UPT)) %>%slice_max(fares_per_UPT, n =1) highest_fares_per_UPT
6.5)Which transit system (agency and mode) has the lowest expenses per VRM?
#6.5Which transit system (agency and mode) has the lowest expenses per VRM?lowest_expense_per_VRM <- USAGE_AND_FINANCIALS %>%mutate(expense_per_VRM = Expenses / VRM) %>%arrange(expense_per_VRM) %>%slice_min(expense_per_VRM, n =1) lowest_expense_per_VRM
# A tibble: 1 × 10
`NTD ID` Agency metro_area Mode UPT VRM `Agency Name` `Total Fares`
<dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <dbl>
1 66339 New Mexic… Las Cruce… Vanp… 118780 1.75e6 New Mexico D… 757574
# ℹ 2 more variables: Expenses <dbl>, expense_per_VRM <dbl>
6.6)Which transit system (agency and mode) has the highest total fares per VRM?
#6.6Which transit system (agency and mode) has the highest total fares per VRM?highest_fares_per_VRM <- USAGE_AND_FINANCIALS %>%mutate(fares_per_VRM =`Total Fares`/ VRM) %>%arrange(desc(fares_per_VRM)) %>%slice_max(fares_per_VRM, n =1) highest_fares_per_VRM