Tuesday, April 12, 2016

Is the new Health Inequality paper in JAMA driven by the exclusion of zero earners? Probably not.

A new paper in JAMA by Chetty et al. is getting a lot of press, and rightly so. Using tax and Social Security data, the authors calculate life expectancy separately by income level and geography. One particularly interesting set of results is the geographic distribution of life expectancy among low earners.

One shortcoming, however, is that the authors cannot calculate life expectancy for zero earners. The Social Security records (which they use to calculate mortality rates) do not reliably report the deaths of zero earners: the records cover only deaths of U.S. residents, and many zero earners are not U.S. residents. For example, they may have earned income in the U.S. once, giving them a Social Security Number or Tax Identification Number, but they no longer live (or never lived) in the U.S. To address this problem, the authors simply drop all zero earners.

The concern, then, is that the composition of the group of "low earners" might vary systematically with local policies or labor market conditions that encourage marginally attached individuals to earn positive income. These differences in composition could be driving geographical differences in the life expectancy of low earners. To take a specific example, suppose that a geographic area has a particularly effective job training policy to help displaced workers find re-employment (perhaps not the most empirically relevant example, but it works for didactic purposes). The policy pulls people who would otherwise be zero earners into the bottom of the positive earnings distribution. Then, the set of individuals in the lowest quartile of that distribution might have worse unobservables, and thus worse life expectancy, than in an otherwise identical area without the policy. This source of geographic variation in life expectancy for the working poor would be far less interesting than the geographic variation due to factors orthogonal to composition (e.g., health policy and health habits), which the authors are intending to uncover.
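A stylized sketch of this composition story, under strong assumptions that are mine and not the paper's: every person has a latent health index, health is perfectly ranked with earnings, and the only difference between two areas is how many people at the bottom earn zero. The function name and the numbers are hypothetical.

```python
def bottom_quartile_mean_health(n_people, n_zero_earners):
    """Mean latent health of the lowest quartile of positive earners.

    Assumption for illustration: person i has latent health i, and people
    are sorted so the least healthy are the ones who earn zero.
    """
    health = list(range(n_people))
    positive_earners = health[n_zero_earners:]          # zero earners are dropped
    quartile = positive_earners[: len(positive_earners) // 4]
    return sum(quartile) / len(quartile)

# Area A: 20 of 100 people earn zero. Area B: a job training policy
# moves 10 of those 20 into positive earnings.
area_a = bottom_quartile_mean_health(100, 20)
area_b = bottom_quartile_mean_health(100, 10)
print(area_a, area_b)  # 29.5 20.5
```

Both areas have the same underlying population, yet the measured bottom quartile in area B looks less healthy simply because more marginal workers are counted among the positive earners.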

To explore this in a quick-and-dirty way, I used the 2005-2014 American Community Survey (ACS) to examine the geographic correlation between (1) the fraction of adults (ages 40 to 61) with positive family income and (2) life expectancy of the lowest income quartile in a given commuting zone (using the data that the authors provide at healthinequality.org). If the authors' results were driven by this composition bias, we might expect this correlation to be negative: more workers means the bottom end of the distribution of workers might have worse unobservables and thus worse life expectancy.

To be a little bit more precise: I constructed a crosswalk from (1) the public use microdata area (PUMA) available in the ACS public use files from IPUMS to (2) commuting zones, with some help from the county-to-commuting-zone crosswalk that the authors provide on healthinequality.org, as well as the PUMA-to-county crosswalks constructed using the tools at the Missouri Census Data Center. This allows me to assign individuals from the 2005-2014 ACS to a given commuting zone (probabilistically in some cases when PUMAs don't map cleanly into commuting zones).
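One way the probabilistic assignment could work is sketched below. This is an assumption about the mechanics, not a description of the actual crosswalk: each PUMA carries allocation factors (the share of its population falling in each commuting zone), and each person's survey weight is split across commuting zones in those proportions. The PUMA and commuting zone labels, the allocation factors, and the sample records are all made up.

```python
from collections import defaultdict

# Hypothetical allocation factors: share of each PUMA's population
# that lives in each commuting zone (CZ). Shares sum to 1 per PUMA.
PUMA_TO_CZ = {
    "puma_A": {"cz_1": 1.0},                  # maps cleanly into one CZ
    "puma_B": {"cz_1": 0.25, "cz_2": 0.75},   # straddles two CZs
}

def assign_to_cz(people, puma_to_cz):
    """Emit one row per (person, CZ), splitting the survey weight by share."""
    rows = []
    for person in people:
        for cz, share in puma_to_cz[person["puma"]].items():
            rows.append({**person, "cz": cz, "weight": person["weight"] * share})
    return rows

def positive_income_share(rows):
    """Weighted fraction of people with positive family income, by CZ."""
    num, den = defaultdict(float), defaultdict(float)
    for r in rows:
        den[r["cz"]] += r["weight"]
        num[r["cz"]] += r["weight"] * r["positive_income"]
    return {cz: num[cz] / den[cz] for cz in den}

people = [
    {"puma": "puma_A", "weight": 100.0, "positive_income": 1},
    {"puma": "puma_B", "weight": 200.0, "positive_income": 0},
    {"puma": "puma_B", "weight": 100.0, "positive_income": 1},
]
rows = assign_to_cz(people, PUMA_TO_CZ)
shares = positive_income_share(rows)
print(shares)
```

Splitting weights keeps the total survey weight unchanged, so a person in a straddling PUMA contributes fractionally to each commuting zone rather than being counted twice.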

Then, I regress a dummy for having positive family earned income in the last 12 months (family being a subset of household) on (1) race/ethnicity dummies (Black, Asian, Hispanic), (2) year dummies, and (3) a set of commuting zone fixed effects. I save the estimated commuting zone fixed effects and treat them as my variable of interest.
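A minimal sketch of this first-stage regression, written as a linear probability model with NumPy on simulated data. It is not the code used for the post: the function name, the choice of reference categories, the absence of survey weights, and every number in the simulation are assumptions for illustration.

```python
import numpy as np

def cz_fixed_effects(y, cz, race, year,
                     race_levels=("black", "asian", "hispanic")):
    """OLS of y on a full set of CZ dummies plus race and year dummies.

    No intercept is included, so each CZ coefficient is that zone's
    fixed effect for the reference race group in the first sample year.
    """
    cz, race, year = np.asarray(cz), np.asarray(race), np.asarray(year)
    cz_levels = np.unique(cz)
    year_levels = np.unique(year)[1:]          # first year is the reference
    cols = [(cz == c).astype(float) for c in cz_levels]
    cols += [(race == r).astype(float) for r in race_levels]
    cols += [(year == t).astype(float) for t in year_levels]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return dict(zip(cz_levels.tolist(), beta[: len(cz_levels)]))

# Simulated individuals (all values are invented).
rng = np.random.default_rng(0)
n = 5000
cz = rng.choice(["cz_1", "cz_2", "cz_3"], size=n)
race = rng.choice(["other", "black", "asian", "hispanic"], size=n)
year = rng.integers(2005, 2015, size=n)        # 2005 through 2014
true_fe = {"cz_1": 0.80, "cz_2": 0.85, "cz_3": 0.90}
p = np.array([true_fe[c] for c in cz]) - 0.03 * (race == "black")
positive_income = (rng.random(n) < p).astype(float)

fe = cz_fixed_effects(positive_income, cz, race, year)
print(fe)
```

The returned dictionary plays the role of the saved commuting zone fixed effects: one number per zone, net of race/ethnicity and year.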

Then, I regress the 1st-quartile, race-adjusted life expectancy (averaged for men and women) on those commuting zone fixed effects. I get a positive coefficient, significant at the 1% level. (A more thorough analysis would cluster the standard errors in some way, though the best way to do this is not immediately obvious, since commuting zones don't map cleanly into states.) One simple story for this positive correlation is that places with strong labor demand (so a higher proportion of workers) have better outcomes, and this correlation is large enough to overcome any negative composition effects.
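The second-stage regression can be sketched as a one-regressor OLS across commuting zones. The example below uses simulated data with a positive slope built in by construction, so it says nothing about the real estimate. It reports heteroskedasticity-robust (HC1) standard errors as a simple stand-in; that is my substitution, not the clustering discussed above, and the function name is hypothetical.

```python
import numpy as np

def ols_robust(y, x):
    """OLS of y on a constant and x, with HC1 robust standard errors."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Sandwich estimator: (X'X)^-1 [sum e_i^2 x_i x_i'] (X'X)^-1, scaled by n/(n-k).
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = XtX_inv @ meat @ XtX_inv * n / (n - k)
    return beta, np.sqrt(np.diag(cov))

# Simulated commuting zones (all values are invented).
rng = np.random.default_rng(1)
cz_fe = rng.uniform(0.70, 0.95, size=500)                  # first-stage fixed effects
life_exp_q1 = 70.0 + 20.0 * cz_fe + rng.normal(0.0, 1.0, size=500)

beta, se = ols_robust(life_exp_q1, cz_fe)
print(f"slope = {beta[1]:.2f}, robust s.e. = {se[1]:.2f}")
```

The sign of `beta[1]` is the quantity of interest: a negative slope would be what the composition story predicts, a positive slope the opposite.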

In any case, this is the opposite of the sign that one would expect if the JAMA result were driven entirely by composition effects. This leads me to conclude that the exclusion of zero earners is NOT driving the results, and serves as evidence that the geographic variation described in the JAMA paper is a true effect.

