New York City is an epicenter of the global novel coronavirus pandemic. Through April 16, there were 1,458 confirmed cases per 100,000 residents in New York City. Always in the media eye, and larger than any other American city, New York City has become the symbol of the crisis, even as suburban counties nearby suffer higher rates of infection.
In a paper dated April 13, 2020, Jeffrey E. Harris of M.I.T. claims that “New York City’s multitentacled subway system was a major disseminator – if not the principal transmission vehicle – of coronavirus infection during the initial takeoff of the massive epidemic.” Oddly, he does not go on to offer evidence in support of this claim in his paper.
By contrast, as I will show, local infections were negatively correlated with subway use, even when controlling for demographic data. Although this correlation study does not establish causation, it characterizes the spread of the virus more reliably than the intuitions and visual inspections that Harris relies on.
Data
In an ongoing crisis with a shortage of tests, all infection and mortality data come with a major asterisk: we do not know the true extent of infection. Only when all-cause mortality data and more extensive testing data are available can any conclusions be confirmed. This study, like Harris’ and others, is subject to potentially massive measurement error.
Data from the American Community Survey (2018 5-year averages) show that commuting modes vary extensively across New York City. New York is broken into Community Districts (CDs), which generally correspond (on either a one-to-one or two-to-one basis) with Census Public Use Microdata Areas (PUMAs). These 55 areas contain between 110,000 and 241,000 people each. The most car-dependent PUMA (Staten Island CD3) has a car-commute share of 75%; the least car-dependent PUMA is Manhattan CD 1 & 2 with just 4% commuting by car. Generally, subway and automobile commuting are mirror images – the correlation is -.88 – even though a substantial share of New Yorkers use non-subway transit, walking, or biking to get to work. The subway commute share varies from 2.5% (Staten Island CD3 again) to 72% (Manhattan CD10 – Central Harlem). Other transit, mostly buses, is positively correlated with automobile share; both reflect the absence of subway access. However, other transit use never exceeds 29%.
In addition to commuting data, I report some ACS demographic data by PUMA.
New York City began publishing Zip code-level data on coronavirus tests and infections on April 1. These data reflect positive tests that result from virus exposures that took place up to two weeks prior. Ideally, one would prefer to use data from mid- to late-March to identify geographic patterns underlying the early spread. Citywide, 18,035 cases are reported from tests administered March 1 – March 20; we might think of these as the wave of cases contracted mainly before the city substantially shut down over the weekend of March 14. Tests administered March 21 to 30 added another 40,230 cases; these cases may have been contracted during the shutdown. Thus, it is possible that a majority of even the earliest available detailed-geography data are from post-shutdown infections.
As of April 1, the city could identify a clear coronavirus hotspot centered on Corona, Queens (because apparently the Grim Reaper has a cruel sense of humor). But by then the virus was everywhere.
I use a geographic correspondence file to ascribe Zip code level infection data to PUMAs. The borders are not generally coterminous, but many Zip codes are contained entirely in a single PUMA.
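The allocation step can be sketched as follows. This is a minimal illustration, assuming the correspondence file supplies, for each Zip code, the fraction of that Zip code falling in each PUMA (for example a population share). The weighting basis, the function name `allocate_to_pumas`, and all values below are hypothetical and are not the study's data.

```python
from collections import defaultdict


def allocate_to_pumas(zip_cases, crosswalk):
    """Distribute Zip-level case counts to PUMAs using allocation weights.

    zip_cases: {zip_code: case_count}
    crosswalk: {zip_code: [(puma, weight), ...]}; weights per Zip sum to 1
    """
    puma_cases = defaultdict(float)
    for zip_code, cases in zip_cases.items():
        for puma, weight in crosswalk[zip_code]:
            puma_cases[puma] += cases * weight
    return dict(puma_cases)


# Hypothetical inputs, for illustration only.
zip_cases = {"Z1": 100, "Z2": 60}
crosswalk = {
    "Z1": [("PUMA_A", 1.0)],                      # Zip entirely inside one PUMA
    "Z2": [("PUMA_A", 0.25), ("PUMA_B", 0.75)],   # Zip split across two PUMAs
}

puma_cases = allocate_to_pumas(zip_cases, crosswalk)
print(puma_cases)
```

Because each Zip code's weights sum to one, the citywide case total is preserved by the allocation; only its geographic assignment changes.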
Correlations
Table 1 shows that the April 1 case rate was positively correlated with automobile and, to a lesser degree, non-subway transit commute shares. Measures of affluence and access to healthcare are negatively correlated with the case rate. Asian share is uncorrelated with case rate.
The correlations strengthen over time: affluence is a very strong, negative predictor of (log) case growth during April, partially because affluent people have fled the city in large numbers.
Of course, many of these variables are correlated among themselves; income and bachelor’s-degree share are almost perfectly aligned, while subway and automobile shares are photo negatives. Thus, I use ordinary least squares to measure the controlled correlations (dropping some variables to avoid collinearity).
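To make "controlled correlation" concrete, here is a minimal sketch of ordinary least squares with two predictors, solved through the normal equations in plain Python. It is not the specification or software used for the regressions in this study. The variable names and the synthetic data (built so that the true coefficients are 2, 3, and -1) are assumptions chosen only to show that each coefficient is estimated holding the other predictor fixed.

```python
def ols(X, y):
    """Return OLS coefficients (intercept first) by solving the normal equations."""
    rows = [[1.0] + list(r) for r in X]   # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic (auto_share, income) pairs; hypothetical values, not the study's data.
X = [(0.1, 1.0), (0.3, 0.5), (0.5, 0.8), (0.7, 0.2), (0.6, 0.9)]
# Outcome constructed as 2 + 3 * auto_share - 1 * income, with no noise.
y = [2 + 3 * auto - income for auto, income in X]

coefs = ols(X, y)
print([round(c, 6) for c in coefs])
```

With two nearly collinear predictors, the matrix X'X becomes close to singular and the individual coefficients become unstable, which is the reason for dropping some variables.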
For reference, the average case rate per thousand is 4.6; the range is 2.9 to 8.2. A coefficient X can be easily interpreted: a ten percentage point difference in the independent variable is associated with an X/10 increase in the dependent variable. Thus, a PUMA with a 10 percentage point higher automobile commuting share is expected to have 0.32 more cases per thousand.
Put another way, a standard deviation increase in automobile commuting share (17 percentage points) accounts for almost half a standard deviation of the case rate.
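The arithmetic behind these two statements can be checked directly. The coefficient of 3.2 below is inferred from the 0.32 figure in the text (with automobile share expressed as a fraction between 0 and 1) rather than read from the regression table, so treat it as an assumption. The standard deviation of the case rate is not stated in the text, so the sketch stops at cases per thousand.

```python
# Inferred from the text: 0.32 cases per thousand per 10 percentage points.
coef_auto_share = 3.2    # assumption: share measured as a fraction (0 to 1)
ten_point_gap = 0.10     # a 10 percentage point difference in auto share
sd_auto_share = 0.17     # one standard deviation of auto share (from the text)

effect_ten_points = coef_auto_share * ten_point_gap
effect_one_sd = coef_auto_share * sd_auto_share

print(round(effect_ten_points, 2))  # 0.32 cases per thousand
print(round(effect_one_sd, 2))      # 0.54 cases per thousand
```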
The relationship between automobile share and COVID-19 case rate is the only significant one. It persists despite the outliers, not because of them, as Figure 2 shows.
Finally, in Figure 3, we see a very strong association between car commuting and the growth in case count after April 1. The three Staten Island PUMAs (in orange) occupy the upper right-hand corner of the graph. Although their infection rates were only a bit above average on April 1, their case counts grew fast. Regression 2 confirms the effect, and shows much more explanatory power from the same controls.
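For readers unfamiliar with log growth, a minimal sketch follows. It assumes that growth is measured as the natural log of the ratio of a later cumulative case count to the April 1 count; the exact end date and definition used in Regression 2 are not stated here, and the counts below are hypothetical.

```python
import math


def log_case_growth(cases_start, cases_end):
    """Natural-log growth in cumulative cases between two dates."""
    return math.log(cases_end / cases_start)


# Hypothetical cumulative counts for two areas (illustration only).
growth_a = log_case_growth(500, 1500)   # cases tripled
growth_b = log_case_growth(800, 1600)   # cases doubled

print(round(growth_a, 3), round(growth_b, 3))
```

Using the log of the ratio puts areas with different starting counts on a common proportional scale: a tripling counts the same whether it starts from 500 cases or 5,000.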
Robustness
To check the robustness of these controlled correlations, I ran Regression 1 five times, each time dropping one borough. Only when the Bronx or Queens is omitted does the coefficient for automobile commute share become insignificant at the 10 percent level, although its magnitude remains similar. The results do not, as one might reasonably suspect, rely on the uniqueness of Manhattan.
In Regression 3, below, I include transit shares of commuting instead of automobile shares. Both subway and other transit commute share are negatively associated with the April 1 case rate. (If I include both automobile share and subway share in the same regression, one of them becomes small and insignificant because of the collinearity between them.)
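The drop-one-borough check can be sketched as a loop that re-estimates the same model with each group omitted in turn. For brevity this sketch uses a bivariate slope as a stand-in for Regression 1, which includes additional controls. The function names and every data value are hypothetical.

```python
def slope(xs, ys):
    """Bivariate OLS slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def leave_one_group_out(records):
    """Re-estimate the slope once per group, omitting that group each time.

    records: list of (group, x, y) tuples.
    """
    results = {}
    for omitted in sorted({g for g, _, _ in records}):
        kept = [(x, y) for g, x, y in records if g != omitted]
        results[omitted] = slope([x for x, _ in kept], [y for _, y in kept])
    return results


# Hypothetical (borough, auto share, cases per thousand) rows; not the study's data.
records = [
    ("Bronx", 0.30, 5.1), ("Bronx", 0.35, 5.6),
    ("Brooklyn", 0.25, 4.0), ("Brooklyn", 0.40, 4.9),
    ("Manhattan", 0.05, 3.0), ("Manhattan", 0.10, 3.4),
    ("Queens", 0.45, 5.5), ("Queens", 0.55, 6.0),
    ("Staten Island", 0.70, 5.2), ("Staten Island", 0.75, 5.4),
]

results = leave_one_group_out(records)
for borough, b in results.items():
    print(f"without {borough}: slope = {b:.2f}")
```

If the estimate keeps its sign and rough size no matter which group is left out, no single borough is driving the result.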
Reviewing Harris (2020)
This study uses very different methods than Harris (2020) to analyze the same phenomenon and comes to the opposite conclusion. This section reviews Harris’ methods.
Harris first introduces a figure showing that subway use declined precipitously beginning around Wednesday, March 11th. New reported cases finally leveled off around March 16th, as subway use was cratering. As Harris notes, however, this is likely endogenous. Figure 3 below is a reproduction of Harris’ Figure 3, except instead of subway entries, the blue bars show meals eaten in restaurants (relative to a year prior) as measured by OpenTable. All sorts of activities declined in unison as the city became aware of the spreading disease.
Harris’ second piece of evidence is that subway ridership declined differentially during the crisis: least in Staten Island and the Bronx; most in Manhattan. Manhattan also slowed its COVID-19 growth rate most drastically. Harris claims that this is consistent with (though not proof of) subways as the primary vector of transmission.
However, if subways (or ferries) are the primary vector, why is Staten Island, with a 67 percent automobile commute share, just as susceptible to COVID-19 case growth as the rest of the city? The change in transit usage is plausibly consistent with Harris’ hypothesis; the level of transit usage is inconsistent with it.
Next, Harris shows us a map which suggests – visually – that the Q46 bus, which terminates at Long Island Jewish Medical Center, has spread coronavirus along Union Turnpike in Queens. Harris, to his credit, does not mention this: in a city so dense with bus routes and subway tracks, almost any spatially-correlated pattern will match some transit corridor. Harris does, however, insinuate that the Flushing Local might be a culprit, but only makes the suggestion via a narrative. He never comes out and says it.
Harris argues, perhaps reasonably, that subway lines (not stops) are the correct unit of analysis. But he does not use this analytical tool.
As the culmination of his argument, Harris presents a map of New York, with some of its subway lines shown, which suggests an obvious and immediate visual conclusion: COVID-19 infection rates, as of April 12, are highest in the least-dense, most automobile-dependent, peripheral parts of New York City. I reproduce his powerful image below.
Refuting Harris is quite difficult, since he makes few clear claims and develops no argument, either verbal or quantitative. Instead, each piece of data is caveated:
- “Simple comparison of the two trends in Figure 1 cannot by itself answer questions of causation.” (p. 4)
- “[It] would be inappropriate to draw firm conclusions from what would amount to a Manhattan-versus-the-rest study.” (p. 7)
- “[We’re] already at a juncture where some readers may react with extreme skepticism.” (p. 12)
- “An overall assessment of these research efforts would surely lead a scientific reviewer to conclude that cause-and-effect is difficult to prove.” (p. 16)
In fact, the only clear claim in Harris’ paper is the title: “The Subways Seeded the Massive Coronavirus Epidemic in New York City.” The data analysis presented in this study provides far more evidence against that title than Harris musters in its favor.
View of the World from 9th Avenue
Looking outside the boundaries of the five boroughs, New York’s experience does not appear to be anomalous. The five large suburban counties in New York State all report higher case rates than New York City (as of April 16), although their COVID-19 death rates are lower. Suburban counties in New Jersey report comparable case rates to New York City.
Globally, transit-dependent cities have not been hit particularly hard. Asian cities with extremely high rates of transit use, such as Hong Kong and Seoul, are among the safest places in the world at the moment. European transit hubs like London and Paris have fared less well, though they are nowhere near as hard-hit as New York. Alon Levy has shown that in Germany, transit-dependent cities do not appear to have systematically higher infection rates.
Policies, and perhaps culture, appear to have a large impact on infection rate. To the small extent that transportation options matter, automobiles appear to be more dangerous disease vectors than subways.
Discussion
One thorny issue remains: how could automobiles spread a virus? They carry at most a few passengers, who are often members of the same household anyway. Strangers’ hands don’t touch your steering wheel as they touch the straps and bars in a subway car. Like many people, I have avoided public transit since early March, but driven regularly.
There are two reasonable explanations for why coronavirus appears to spread more along roads than rails. First, subway-dependent people may have cut their travel more than car-dependent people. Since travel brings us in contact with others at our destinations (stores, jobs, restaurants), the excess drop in travel may have made subway people safer precisely because the subway seems so dangerous.
Second, and less obviously, subway-dependent people likely have more geographically-determined circles of contact. Car owners can move freely well beyond their immediate neighborhood. In the language of networks, non-car owners are more likely to approximate “neighbor flooding”; car owners to approximate “uniform gossip” (hat tip to Wesley Chow for this conceptual framework). That is, if a grocery store in a low-car-ownership neighborhood becomes an infectious spot, it is likely to infect a bunch of people who will all “reinfect” each other at the drug store and the park. In a car-oriented context, by contrast, infected grocery customers would drive off to different pharmacies and parks and infect other people.
Taken together, the global trends, suburb versus city infection rates, and neighborhood trends within New York suggest that transit-dependent cities are easier to protect from viral infections even when the transit system remains open. How to re-open the city safely remains a vital question, and strong, sensible safety measures, such as mask requirements and constant station cleaning, should be the default.
This study suggests that far more attention should be paid to the dangers of spreading coronavirus by car. In New York City, immediately increasing the tolls on the city’s bridges and tunnels would discourage people from coming in and out of the city, spreading the virus as they go.
In suburban locales fighting severe outbreaks, limited-access highways ought to be closed to most drivers. High travel speeds on empty highways allow drivers to rapidly spread disease to previously unaffected areas. Keeping drivers on low-speed local roads discourages people from indulging their wanderlust and helps geographically contain outbreaks. However, like the subway, roads and driving are an essential aspect of maintaining the crucial infrastructure – health, food, utilities, information – that allows us the luxury of a long-term lockdown. And as the economy reopens, car commuters will need to return to their usual routes. Drivers need to understand that they pose a risk of rapid geographic spread, and thus need to take extra precautions in interactions outside their own neighborhoods.