Transit System Analysis Report

This project focuses on analyzing farebox recovery, the fraction of transit revenue generated from fares, using data from the National Transit Database (NTD). The analysis examines key metrics, including farebox revenues, operating expenses, ridership (total number of trips), and vehicle miles traveled (VMT). Given the disruptions caused by the COVID-19 pandemic, the data reflects some post-pandemic irregularities. For consistency, we utilize 2022 data across all categories, recognizing the lag in reporting. The goal is not to make long-term forecasts but to gain insights into the performance and financial sustainability of various transit agencies.

Total Vehicle Revenue Miles (VRM) by Transit Agency

In this sample, the transit agency with the most total Vehicle Revenue Miles (VRM) was MTA New York City Transit, which reported a total of 10,832,855,350 VRM. This highlights the immense scale of operations for New York City’s transit system, reflecting its critical role in serving one of the largest metropolitan areas in the United States.

Total Vehicle Revenue Miles (VRM) by Transit Mode

When analyzing the data by transit mode, the Motorbus mode had the most significant operational miles, with a total of 49,444,494,088 VRM. This indicates that Motorbuses are the most utilized form of transportation in terms of distance covered.

NYC Subway Ridership in May 2024

For the month of May 2024, a total of 180,458,819 trips were taken on the New York City Subway (Heavy Rail). This figure represents the scale of ridership in one of the largest transit systems in the U.S. during that period.

NYC Subway Ridership Decline from April 2019 to April 2020

During the COVID-19 pandemic, NYC subway ridership fell dramatically between April 2019 and April 2020. The decline in total UPT was 261,525,797 trips, showcasing the severe impact of the pandemic on public transit systems.

Additional Transit Insights

  • Agency with the Least Total VRM: The Barnegat Bay Decoy & Baymen’s Museum had the least total VRM in this sample, reporting only 2,312 VRM. This suggests minimal operational activity for this agency.

  • Mode with the Least Total VRM: The Aerial Tramway mode had the least VRM among transit modes, with a total of 292,860 VRM, highlighting its smaller operational footprint compared to other modes.

  • Agency with the Least Total UPT: Enterprise Holdings, LLC had the fewest total Unlinked Passenger Trips, with only 858 trips in the dataset. This reflects the lower passenger activity of this particular agency.

To deepen the analysis of transit data, a new table was constructed from the USAGE dataset, focusing on the total Unlinked Passenger Trips (UPT) and Vehicle Revenue Miles (VRM) for 2022. This involved utilizing R functions like group_by, summarize, and filter, along with the year function to isolate data for 2022. The final table, named USAGE_2022_ANNUAL, included the key fields: NTD ID, Agency, Metro Area, Mode, UPT, and VRM. After summarizing the data, the table was ungrouped for further manipulation.

This new table was then merged with the FINANCIALS dataset using a left join, matched by NTD ID and Mode. The merge allowed for a comprehensive view of both operational usage and financial data, creating the USAGE_AND_FINANCIALS dataset, which was used to investigate farebox recovery and efficiency across transit systems.

Several key questions were addressed using this enriched dataset:

  1. Most UPT in 2022: The New York City Subway (Heavy Rail) operated by the MTA had the highest UPT in 2022, with an impressive 1.79 billion trips. The system also reported 338 million VRM and collected over $2.32 billion in fares, making it the largest transit system by ridership.

  2. Highest Farebox Recovery Ratio: The Transit Authority of Central Kentucky in the Elizabethtown–Radcliff, KY metro area achieved the highest farebox recovery ratio. Operating in the Vanpool mode, it reported 9,640 UPT and 94,027 VRM, collecting $97,300 in total fares. With operating expenses of $40,801, the agency achieved a farebox recovery ratio of 2.38, demonstrating highly efficient cost recovery despite its smaller operational scale.

  3. Lowest Expenses per UPT: The North Carolina State University Motorbus system, based in Raleigh, NC, had the lowest expenses per UPT among motorbus services. With 2.31 million UPT and 531,555 VRM, it operated at low costs, reporting $2,727,412 in total expenses and zero fares, likely reflecting a fare-free model or heavy subsidy reliance. The motorbus system achieved an expense per UPT ratio of $1.18, highlighting its cost-effective operation despite not generating revenue from fares.

  4. Highest Fares per UPT: The Altoona Metro Transit (Demand Response) in Altoona, PA, had the highest total fares per UPT, despite its very small scale. With just 26 trips, it managed to collect $17,058 in fares, indicating a significant fare collection per passenger trip.

  5. Lowest Expenses per VRM: The New Mexico Department of Transportation’s Vanpool service in Las Cruces reported the lowest expenses per VRM, with 118,780 UPT and 1.75 million VRM, making it highly cost-efficient. It also generated $757,574 in total fares, showing a good balance between service provision and cost control.

  6. Highest Fares per VRM: Chicago Water Taxi (Ferryboat) was the most efficient in terms of total fares per VRM, with 16,936 UPT and 600 VRM, generating $142,473 in fares. This highlights the high fare collection relative to the distance traveled, particularly for specialized modes like ferryboats.

Conclusion

In conclusion, identifying the most efficient transit system depends on how efficiency is defined. The Transit Authority in Elizabeth, NJ, stood out in terms of farebox recovery, while the New Mexico Vanpool and Raleigh Motorbus services excelled in cost efficiency per mile and per trip, respectively. Efficiency can be viewed through various lenses—fare recovery, ridership, or operational costs—each providing a different perspective on transit system performance.

Codes

Usage Table:
# A tibble: 6 × 8
  `NTD ID` Agency      metro_area        Mode  `3 Mode` month         UPT    VRM
     <int> <chr>       <chr>             <chr> <chr>    <date>      <dbl>  <dbl>
1        1 King County Seattle--Tacoma,… Dema… Bus      2002-01-01 135144 746158
2        1 King County Seattle--Tacoma,… Dema… Bus      2002-02-01 127378 656324
3        1 King County Seattle--Tacoma,… Dema… Bus      2002-03-01 136030 726578
4        1 King County Seattle--Tacoma,… Dema… Bus      2002-04-01 142204 736975
5        1 King County Seattle--Tacoma,… Dema… Bus      2002-05-01 144697 746158
6        1 King County Seattle--Tacoma,… Dema… Bus      2002-06-01 131833 696633

3.1) What transit agency had the most total VRM in this sample?

#3.1What transit agency had the most total VRM in this sample?
USAGE %>%
  drop_na() %>%
  group_by(Agency) %>%
  summarize(total_VRM = sum(VRM, na.rm=TRUE)) %>%
  arrange(desc(total_VRM))%>%
  slice(1)
# A tibble: 1 × 2
  Agency                      total_VRM
  <chr>                           <dbl>
1 MTA New York City Transit 10832855350

3.2)What transit mode had the most total VRM in this sample?

#3.2What transit mode had the most total VRM in this sample?
USAGE %>%
  drop_na() %>%
  group_by(Mode) %>%
  summarize(total_VRM = sum(VRM, na.rm=TRUE)) %>%
  arrange(desc(total_VRM))%>%
  slice(1)
# A tibble: 1 × 2
  Mode       total_VRM
  <chr>          <dbl>
1 Motorbus 49444494088

3.3)How many trips were taken on the NYC Subway (Heavy Rail) in May 2024?

#3.3How many trips were taken on the NYC Subway (Heavy Rail) in May 2024?

# Filter for Heavy Rail in NYC for May 2024
NYC_Subway_trip_may_24 <- USAGE %>%
  mutate(month = format(ymd(month), "%Y-%m")) %>% 
  filter(Mode == "Heavy Rail", 
         Agency == "MTA New York City Transit",  
         month == "2024-05") %>%  
  summarize(total_trip = sum(UPT, na.rm = TRUE)) 

NYC_Subway_trip_may_24
# A tibble: 1 × 1
  total_trip
       <dbl>
1  180458819

3.5)How much did NYC subway ridership fall between April 2019 and April 2020?

#3.5How much did NYC subway ridership fall between April 2019 and April 2020?
nyc_ridership_fall <- USAGE %>%
  mutate(month = format(ymd(month), "%Y-%m")) %>%  
  filter(Agency == "MTA New York City Transit", 
         month %in% c("2019-04", "2020-04")) %>%  
  group_by(month) %>%
  summarize(total_UPT = sum(UPT, na.rm = TRUE)) %>%
  arrange(month)

ridership_fall <- nyc_ridership_fall %>%
  summarize(fall = total_UPT[month == "2019-04"] - total_UPT[month == "2020-04"])

ridership_fall
# A tibble: 1 × 1
       fall
      <dbl>
1 261525797

4.1) What transit agency had the least total VRM in this sample?

#4Find three more interesting transit facts in this data other than those above.
#4.1 What transit agency had the least total VRM in this sample?
USAGE %>%
  drop_na() %>%
  group_by(Agency) %>%
  summarize(total_VRM = sum(VRM, na.rm=TRUE)) %>%
  arrange(total_VRM)%>%
  slice(1)
# A tibble: 1 × 2
  Agency                               total_VRM
  <chr>                                    <dbl>
1 Barnegat Bay Decoy & Baymen's Museum      2312

4.2)What transit mode had the least total VRM in this sample?

#4.2What transit mode had the least total VRM in this sample?
USAGE %>%
  drop_na() %>%
  group_by(Mode) %>%
  summarize(total_VRM = sum(VRM, na.rm=TRUE)) %>%
  arrange(total_VRM)%>%
  slice(1)
# A tibble: 1 × 2
  Mode           total_VRM
  <chr>              <dbl>
1 Aerial Tramway    292860

4.3)What transit agency had the least total UPT in this sample?

#4.3What transit agency had the least total UPT in this sample?
USAGE %>%
  drop_na() %>%
  group_by(Agency) %>%
  summarize(total_UPT = sum(UPT, na.rm=TRUE)) %>%
  arrange(total_UPT)%>%
  slice(1)
# A tibble: 1 × 2
  Agency                   total_UPT
  <chr>                        <dbl>
1 Enterprise Holdings, LLC       858

5)Usage 2022 Annual:

#5
USAGE_2022_ANNUAL <- USAGE %>%
  filter(year(month) == 2022) %>%            
  group_by(`NTD ID`, Agency, metro_area, Mode) %>%
  summarize(UPT = sum(UPT, na.rm = TRUE),   
            VRM = sum(VRM, na.rm = TRUE), .groups = 'drop') %>%  
  ungroup()  

# View the result
USAGE_2022_ANNUAL
# A tibble: 1,141 × 6
   `NTD ID` Agency                                metro_area Mode     UPT    VRM
      <int> <chr>                                 <chr>      <chr>  <dbl>  <dbl>
 1        1 King County                           Seattle--… Dema… 6.63e5 1.29e7
 2        1 King County                           Seattle--… Ferr… 4.00e5 5.12e4
 3        1 King County                           Seattle--… Moto… 5.40e7 6.16e7
 4        1 King County                           Seattle--… Stre… 1.12e6 1.80e5
 5        1 King County                           Seattle--… Trol… 9.58e6 2.64e6
 6        1 King County                           Seattle--… Vanp… 7.03e5 4.41e6
 7        2 Spokane Transit Authority             Spokane, … Dema… 3.10e5 4.04e6
 8        2 Spokane Transit Authority             Spokane, … Moto… 6.60e6 6.49e6
 9        2 Spokane Transit Authority             Spokane, … Vanp… 9.06e4 9.06e5
10        3 Pierce County Transportation Benefit… Seattle--… Dema… 2.15e5 3.44e6
# ℹ 1,131 more rows

Usage and Financials:

#merge table together
USAGE_AND_FINANCIALS <- left_join(USAGE_2022_ANNUAL, 
           FINANCIALS, 
           join_by(`NTD ID`, Mode)) |>
    drop_na()

USAGE_AND_FINANCIALS
# A tibble: 1,307 × 9
   `NTD ID` Agency    metro_area Mode     UPT    VRM `Agency Name` `Total Fares`
      <dbl> <chr>     <chr>      <chr>  <dbl>  <dbl> <chr>                 <dbl>
 1        1 King Cou… Seattle--… Dema… 6.63e5 1.29e7 King County …        740726
 2        1 King Cou… Seattle--… Dema… 6.63e5 1.29e7 King County …         91601
 3        1 King Cou… Seattle--… Ferr… 4.00e5 5.12e4 King County …       1715265
 4        1 King Cou… Seattle--… Moto… 5.40e7 6.16e7 King County …      56566150
 5        1 King Cou… Seattle--… Moto… 5.40e7 6.16e7 King County …        280187
 6        1 King Cou… Seattle--… Stre… 1.12e6 1.80e5 King County …        588495
 7        1 King Cou… Seattle--… Trol… 9.58e6 2.64e6 King County …      10123486
 8        1 King Cou… Seattle--… Vanp… 7.03e5 4.41e6 King County …       5484481
 9        2 Spokane … Spokane, … Dema… 3.10e5 4.04e6 Spokane Tran…        457699
10        2 Spokane … Spokane, … Dema… 3.10e5 4.04e6 Spokane Tran…         73585
# ℹ 1,297 more rows
# ℹ 1 more variable: Expenses <dbl>

6.1)Which transit system (agency and mode) had the most UPT in 2022?

#6.1Which transit system (agency and mode) had the most UPT in 2022?
most_UPT_2022 <- USAGE_AND_FINANCIALS %>%
  arrange(desc(UPT)) %>%  
  slice_max(UPT, n = 1)   

most_UPT_2022
# A tibble: 1 × 9
  `NTD ID` Agency     metro_area Mode     UPT    VRM `Agency Name` `Total Fares`
     <dbl> <chr>      <chr>      <chr>  <dbl>  <dbl> <chr>                 <dbl>
1    20008 MTA New Y… New York-… Heav… 1.79e9 3.38e8 MTA New York…    2326782567
# ℹ 1 more variable: Expenses <dbl>

6.2)Which transit system (agency and mode) had the highest farebox recovery, defined as the highest ratio of Total Fares to Expenses?

#6.2Which transit system (agency and mode) had the highest farebox recovery, defined as the highest ratio of Total Fares to Expenses?

highest_farebox_recovery <- USAGE_AND_FINANCIALS %>%
  mutate(farebox_recovery = `Total Fares` / Expenses) %>% 
  arrange(desc(farebox_recovery)) %>%  
  slice_max(farebox_recovery, n = 1)

highest_farebox_recovery
# A tibble: 1 × 10
  `NTD ID` Agency       metro_area Mode    UPT   VRM `Agency Name` `Total Fares`
     <dbl> <chr>        <chr>      <chr> <dbl> <dbl> <chr>                 <dbl>
1    40191 Transit Aut… Elizabeth… Vanp…  9640 94027 Transit Auth…         97300
# ℹ 2 more variables: Expenses <dbl>, farebox_recovery <dbl>

6.3)Which transit system (agency and mode) has the lowest expenses per UPT?

#6.3Which transit system (agency and mode) has the lowest expenses per UPT?
lowest_expenses_per_UPT <- USAGE_AND_FINANCIALS %>%
  mutate(expenses_per_UPT = Expenses / UPT) %>%  # 
  arrange(expenses_per_UPT) %>%  
  slice_min(expenses_per_UPT, n = 1) 

lowest_expenses_per_UPT
# A tibble: 1 × 10
  `NTD ID` Agency     metro_area Mode     UPT    VRM `Agency Name` `Total Fares`
     <dbl> <chr>      <chr>      <chr>  <dbl>  <dbl> <chr>                 <dbl>
1    40147 North Car… Raleigh, … Moto… 2.31e6 531555 North Caroli…             0
# ℹ 2 more variables: Expenses <dbl>, expenses_per_UPT <dbl>

6.4)Which transit system (agency and mode) has the highest total fares per UPT?

#6.4Which transit system (agency and mode) has the highest total fares per UPT?
highest_fares_per_UPT <- USAGE_AND_FINANCIALS %>%
  mutate(fares_per_UPT = `Total Fares` / UPT)  %>%  
  arrange(desc(fares_per_UPT)) %>%  
  slice_max(fares_per_UPT, n = 1) 

highest_fares_per_UPT
# A tibble: 1 × 10
  `NTD ID` Agency       metro_area Mode    UPT   VRM `Agency Name` `Total Fares`
     <dbl> <chr>        <chr>      <chr> <dbl> <dbl> <chr>                 <dbl>
1    30011 Altoona Met… Altoona, … Dema…    26    75 Altoona Metr…         17058
# ℹ 2 more variables: Expenses <dbl>, fares_per_UPT <dbl>

6.5)Which transit system (agency and mode) has the lowest expenses per VRM?

#6.5Which transit system (agency and mode) has the lowest expenses per VRM?
lowest_expense_per_VRM <- USAGE_AND_FINANCIALS %>%
  mutate(expense_per_VRM = Expenses / VRM)  %>%  
  arrange(expense_per_VRM) %>%  
  slice_min(expense_per_VRM, n = 1) 

lowest_expense_per_VRM
# A tibble: 1 × 10
  `NTD ID` Agency     metro_area Mode     UPT    VRM `Agency Name` `Total Fares`
     <dbl> <chr>      <chr>      <chr>  <dbl>  <dbl> <chr>                 <dbl>
1    66339 New Mexic… Las Cruce… Vanp… 118780 1.75e6 New Mexico D…        757574
# ℹ 2 more variables: Expenses <dbl>, expense_per_VRM <dbl>

6.6)Which transit system (agency and mode) has the highest total fares per VRM?

#6.6Which transit system (agency and mode) has the highest total fares per VRM?

highest_fares_per_VRM <- USAGE_AND_FINANCIALS %>%
  mutate(fares_per_VRM = `Total Fares` / VRM)  %>%  
  arrange(desc(fares_per_VRM)) %>%  
  slice_max(fares_per_VRM, n = 1) 

highest_fares_per_VRM
# A tibble: 1 × 10
  `NTD ID` Agency       metro_area Mode    UPT   VRM `Agency Name` `Total Fares`
     <dbl> <chr>        <chr>      <chr> <dbl> <dbl> <chr>                 <dbl>
1    50521 Chicago Wat… Chicago, … Ferr… 16936   600 Chicago Wate…        142473
# ℹ 2 more variables: Expenses <dbl>, fares_per_VRM <dbl>