The modern ETI literature essentially starts with Feldstein (1995). Feldstein exploits the Tax Reform Act of 1986 (TRA86), which drastically reduced taxes on high earners but had smaller impacts on lower earners. Ultimately, this boils down to a difference-in-differences in a panel setting, where the first difference is pre-post, and the second difference is between high- and low-earners (who faced different tax changes). Feldstein estimated an incredibly large ETI---greater than 3 in some specifications, suggesting that distortions are huge and, in fact, we are on the "wrong" side of the Laffer Curve.
Since Feldstein, the ETI literature (or, the literature which I am familiar with, at least) has focused on "fixing" Feldstein. First, researchers try to control for the fact that counterfactual trends are not parallel between high- and low-earners (both because of mean reversion and increasing income inequality, which push in opposite directions). Second, they explore the longer-run implications, and focus on broader income measures which are less prone to transitory manipulation. The benchmark estimates of this form come from Gruber and Saez (2002). They find an ETI of 0.4, which falls to an insignificant 0.1 when they look at broad income. From this, Gruber and Saez conclude that we can increase taxes on the wealthy with only modest impacts on efficiency.
Along comes Raj Chetty, however, to show that this entire exercise is foolish. His point is simple: Gruber and Saez are relying on individuals to actively respond, over the medium-term, to relatively small changes in the net-of-tax rate. For various reasons, we might not expect this to be the case. First, individuals might not be aware of the tax change. Second, individuals might not be able to costlessly adjust along the intensive margin. Furthermore, after putting some structure on the problem, Chetty can place upper and lower bounds on the "true" elasticity based on the observed elasticity by assuming simply that agents always locate themselves such that their lifetime utility is within 1% of the (frictionless) optimum. And the "true" elasticity --- the elasticity that governs long-run behavior --- is what really matters.
The intuition is best given by the example on page 984 of Chetty. The solid black line represents the observed response. The dotted line shows the hypothesized "true" demand curve (the slope of which is the elasticity, since everything is in logs). The top panel shows the steepest possible demand associated with that observed response, and the bottom panel shows the most shallow possible demand.
Based on this framework, Chetty bounds the "true" elasticity given by Gruber and Saez's observed estimates as between 0.00 and 4.42 (see page 1000). Oomph.
Below, I put forward a simple model that shows an example of the types of optimization frictions that Chetty describes.
My beef with the ETI literature
Conceptually, we are interested in the following question: what would happen to taxable income in the long-run if the tax rate were exogenously increased? Therefore, I argue that the ideal explanatory variable would use the perceived tax rate, tP_i. This is not available in the data, so Gruber and Saez use the actual marginal tax rate faced by the agent. They take a single difference, before and after a tax reform. In particular, their model is the following, basically (where the x_i is meant to deal with mean reversion, etc):
Δ ln(y_i) = α + β Δ ln(1-t_i) + δ x_i + u_i
Obviously, they have to instrument for Δ ln(1-t_i) because the tax rate is endogenous; if you increase your earnings, you'll increase your tax rate because the tax schedule is progressive. To isolate the policy variation, they simply use Δ ln(1-t*_i), where t*_i is the tax rate that would apply in the post period, assuming no change in earnings from pre to post. For example, taxes went up for income above $450,000 starting in 2013 [I'm omitting some caveats here; let's pretend inflation is zero]. Let's suppose the pre-period is 2012 and the post-period is 2013. For someone who earns $500,000 in 2012, t*_i is 35% in 2012 and 39.6% [again, omitting caveats] in 2013, regardless of what he actually earns in 2013. For someone who earns $400,000 in 2012, t*_i is 35% in both years, regardless of what she actually earns in 2013.
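To make the instrument concrete, here is a minimal sketch of the calculation for the two earners in the example. The function names and the stylized two-rate schedule (35% everywhere in 2012, 39.6% above $450,000 in 2013, all other brackets ignored) are simplifications for illustration, not the actual tax code or Gruber and Saez's implementation.

```python
import math

THRESHOLD = 450_000      # bracket threshold from the example (inflation ignored)
RATE_LOW = 0.35          # assumed rate below the threshold, and everywhere in 2012
RATE_HIGH_2013 = 0.396   # assumed rate above the threshold starting in 2013


def marginal_rate(income, year):
    """Stylized marginal rate; every other bracket is ignored (an assumption)."""
    if year >= 2013 and income > THRESHOLD:
        return RATE_HIGH_2013
    return RATE_LOW


def instrument(income_pre, year_pre=2012, year_post=2013):
    """Change in ln(1 - t*), holding income fixed at its pre-period level."""
    t_pre = marginal_rate(income_pre, year_pre)
    t_star_post = marginal_rate(income_pre, year_post)
    return math.log(1 - t_star_post) - math.log(1 - t_pre)


for income in (500_000, 400_000):
    print(income, instrument(income))
```

The $400,000 earner gets an instrument of exactly zero, while the $500,000 earner gets ln(0.604/0.65), roughly -0.073, no matter what either of them earns in 2013.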
The problem is the following: it assumes that a $450,100 earner in 2013 acts as if he faces a discontinuously higher tax rate in 2013 than a $449,900 earner in 2013. This is unrealistic for two reasons. First, income fluctuates randomly. Second, people don't know their own taxable income; while they might know their gross income (e.g., their official yearly salary), taxable income is far less salient, since it subtracts deductions and exemptions.
Put another way: the $449,900 earner has been "treated" by the tax hike, because he will consider there to be some probability that his realization of 2013 taxable income will be above $450,000, both because of true fluctuations and because of non-salience of what "taxable income" really is. Critically, this probability is likely to be similar for the $449,900 earner and the $450,100 earner.
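A quick numerical illustration of that last claim, under the assumption that realized taxable income equals expected income plus a normally distributed shock. The $70,000 standard deviation is an illustrative choice, and `prob_above` is a hypothetical helper.

```python
import math


def prob_above(threshold, expected_income, sigma):
    """P(realized income > threshold) when income ~ Normal(expected_income, sigma)."""
    z = (threshold - expected_income) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))  # equals 1 - Phi(z)


SIGMA = 70_000  # assumed standard deviation of the income shock
p_low = prob_above(450_000, 449_900, SIGMA)
p_high = prob_above(450_000, 450_100, SIGMA)
print(p_low, p_high, p_high - p_low)
```

Both probabilities sit within a small fraction of a percentage point of one half, so the two earners face nearly the same chance of landing in the higher bracket.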
In the mechanics of the regression, this means that t_i is discretely different for the $449,900 and $450,100 earners. But tP_i is basically the same for these two agents. If we were using tP_i instead of t_i in the regression, then we would estimate a larger ETI: intuitively, we would estimate a smaller first-stage coefficient and an identical reduced form coefficient.
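The intuition can be written out using the standard result that, with one instrument and one endogenous regressor, the IV estimate is the ratio of the reduced-form coefficient to the first-stage coefficient. The symbols below are my own labels: ρ is the reduced-form coefficient (Δ ln(y_i) on the instrument), π is the first-stage coefficient when the regressor is built from t_i, and π^P is the first-stage coefficient when it is built from tP_i.

```latex
\hat{\beta}_{IV} = \frac{\hat{\rho}}{\hat{\pi}},
\qquad
\hat{\beta}^{P}_{IV} = \frac{\hat{\rho}}{\hat{\pi}^{P}} > \hat{\beta}_{IV}
\quad \text{whenever } 0 < \hat{\pi}^{P} < \hat{\pi} \text{ and } \hat{\rho} > 0 .
```

The reduced form does not involve the endogenous regressor at all, so swapping t_i for tP_i changes only the denominator.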
My simple model
So, here's my model. Suppose that individuals exert effort μ (measured in dollars), and the realization of pre-tax income is Z = μ + σX, where X is a standard normal random variable (which is realized after μ is chosen). Furthermore, let's specify (static) utility as θc - (1+1/γ)^(-1) μ^(1+1/γ), where consumption c = Z - T(Z) (and T(.) is the tax schedule). I'm considering θ to be heterogeneous across individuals, while γ is shared across the population.
(Note that this utility function abstracts away from income effects in the choice of μ, which I argue is reasonable given that Gruber and Saez estimate an income effect near zero. I recognize the irony of using one Gruber and Saez result to attack another.)
In the case where σ=0, the optimal choice of μ (which equals Z because I have eliminated all uncertainty in this special case) is given simply by (θ(1-t))^γ, where t is the marginal tax rate. Furthermore, it is straightforward to show that the ETI is equal to γ.
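For completeness, a sketch of the steps behind both claims, assuming the agent sits on a locally linear segment of the budget set, so that c = (1-t)μ + R for some constant R (virtual income, a symbol introduced here only for the derivation):

```latex
\begin{aligned}
U(\mu) &= \theta\bigl[(1-t)\mu + R\bigr] - \frac{\mu^{1+1/\gamma}}{1+1/\gamma} \\
U'(\mu) &= \theta(1-t) - \mu^{1/\gamma} = 0
  \;\Longrightarrow\; \mu^{*} = \bigl(\theta(1-t)\bigr)^{\gamma} \\
\ln \mu^{*} &= \gamma \ln\theta + \gamma \ln(1-t)
  \;\Longrightarrow\; \frac{\partial \ln \mu^{*}}{\partial \ln(1-t)} = \gamma
\end{aligned}
```

Because R drops out of the first-order condition, there is no income effect, and the elasticity with respect to the net-of-tax rate is γ for every value of θ.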
In the case where σ>0, we need to consider an expected utility maximization problem. The linearity in utility over consumption means that this problem is trivial when the tax function is linear (i.e., if we have a flat tax). But when the tax function is non-linear (e.g., progressive), the optimization is non-trivial, though straightforward.
In a simple case with two tax rates, the optimization problem turns out to have an elegant solution. The optimal choice of effort μ is the same as the σ=0 case, except that agents are acting as if they faced a convex combination of each tax rate, where the weight on each tax rate is the ex-ante probability that they will end up facing the tax rate in question, which is a function only of θ.
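A minimal sketch of how this solution could be computed, assuming a threshold K with rate t_low below it and t_high > t_low above it. Differentiating expected utility with respect to μ gives the first-order condition θ(1 - t̄(μ)) = μ^(1/γ), where t̄(μ) is the probability-weighted average of the two rates, so the optimum is a fixed point in μ. The function names and the bisection approach are my own choices for illustration, not necessarily how the original calculation was done.

```python
import math


def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def perceived_rate(mu, sigma, threshold, t_low, t_high):
    """Convex combination of the two rates, weighted by P(Z > threshold)."""
    p_high = 1.0 - norm_cdf((threshold - mu) / sigma)
    return (1.0 - p_high) * t_low + p_high * t_high


def optimal_effort(theta, gamma, sigma, threshold, t_low, t_high):
    """Solve mu = (theta * (1 - perceived_rate(mu)))**gamma by bisection.

    Assumes t_high > t_low, so the gap below is strictly decreasing in mu
    and the root lies between the two sigma = 0 solutions.
    """
    def foc_gap(mu):
        t_bar = perceived_rate(mu, sigma, threshold, t_low, t_high)
        return (theta * (1.0 - t_bar)) ** gamma - mu

    lo = (theta * (1.0 - t_high)) ** gamma  # effort if the high rate were certain
    hi = (theta * (1.0 - t_low)) ** gamma   # effort if the low rate were certain
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if foc_gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative inputs: theta chosen so that sigma = 0 effort would be $450,000 at 35%.
theta = 450_000 / 0.65
mu_star = optimal_effort(theta, gamma=1.0, sigma=70_000,
                         threshold=450_000, t_low=0.35, t_high=0.396)
print(mu_star, perceived_rate(mu_star, 70_000, 450_000, 0.35, 0.396))
```

The printed effort lies strictly between the two certainty solutions, and the printed rate is the single "as if" tax rate that rationalizes it.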
How does this relate to Gruber and Saez? Consider two individuals that straddle $450,000 in 2013, and assume that counterfactual trends are, in fact, parallel (i.e., abstract away from mean reversion and divergence in the income distribution). The Gruber and Saez model assumes that the guy on the low side acted in 2013 as if he faced a 35% tax rate, while the guy on the high side acted in 2013 as if he faced the 39.6% tax rate. If these individuals are instead behaving according to my model, they'll both tend to have similar values of θ, so they'll tend to be acting in 2013 as if they're facing approximately the same tax rate as each other.
To explore how this might matter quantitatively, I ran the Gruber and Saez empirical model on a simulated dataset that behaved according to my model. (One empirically important caveat: to prevent mean reversion, I assume that X_i,post follows a normal distribution with variance one and mean equal to the realization of X_i,pre, and agents know this.) The inputs to this simulation are as follows:
- I let θ be distributed normally in the population, such that the mean value of θ would lead to an optimal choice of effort of $450,000 in the σ=0 case, and I let θ have a standard deviation equal to one fourth its mean.
- I let σ equal $70,000.
- I let γ equal 1.
What is my conclusion from this? My model doesn't explain anywhere close to the entire Chetty bound --- my model (with its wholly arbitrary inputs) says that the observed elasticity could be something like 60% of the truth, while the Gruber/Saez estimate is about 3% of Chetty's upper bound. But this simple model provides a relatively tractable example of the sorts of factors pushing Gruber/Saez-type estimates to be too close to zero.