Tuesday, October 20, 2015

Are coin flips memoryless?

There's a working paper going around by Miller and Sanjurjo, cited in a New York Times article, that seems to be arguing the impossible: that, in a sequence of flips of a fair coin, the probability of flipping heads is smaller than 1/2 if the previous flip was heads.

The working paper argues that this is relevant to the "hot hand" debate. E.g., is a basketball player more likely to hit his next shot if he hit his previous shot? The seminal paper in this literature, Gilovich, Vallone, and Tversky (1985), found that the conditional probability of success given previous success was close to the unconditional probability of success, concluding that each shot was roughly independent. But if the laws of probability as we know them are wrong, and independence would somehow imply a decline in the conditional probability of success given previous success, then a finding of conditional probability equal to unconditional would actually be evidence in favor of the hot hand hypothesis.

This claim, for lack of a better word, appears to be wrong.

Edit: See my most recent entry for why I misunderstood Miller & Sanjurjo's claim with respect to the Gilovich et al. study. Basically, I was looking at the wrong part of the Gilovich paper! My exposition of the Miller & Sanjurjo result is still valid, though.

The thought experiment that Miller and Sanjurjo are considering is as follows. Flip a fair coin S times, where S is relatively small (e.g., four). Calculate the empirical conditional probability of heads, given that the previous flip was heads. Call this empirical conditional probability Pt. Then, repeat this trial (i.e., all S flips) a large number N times. Compute the average of Pt across the N trials, leaving out any trial in which no flip follows a heads (for such a trial, Pt is undefined). They show that this average of Pt is less than the unconditional probability of heads.
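A quick Monte Carlo sketch of this thought experiment follows. It is only an illustration, not Miller and Sanjurjo's code. The function names `trial_pt` and `average_pt` are hypothetical, and it assumes that trials where Pt is undefined (no flip follows a heads) are simply dropped. With S = 4, the trial-weighted average comes out well below 0.5, around 0.4.

```python
import random

def trial_pt(flips):
    """Empirical P(heads | previous flip was heads) within one trial.

    flips is a list of booleans (True = heads). Returns None when no flip
    follows a heads, i.e. the trial has no eligible flips.
    """
    # An "eligible" flip is any flip whose predecessor was heads.
    eligible = [flips[i] for i in range(1, len(flips)) if flips[i - 1]]
    if not eligible:
        return None
    return sum(eligible) / len(eligible)  # True counts as 1

def average_pt(S=4, N=100_000, seed=0):
    """Trial-weighted average of Pt over N simulated trials of S fair flips."""
    rng = random.Random(seed)
    pts = []
    for _ in range(N):
        flips = [rng.random() < 0.5 for _ in range(S)]
        pt = trial_pt(flips)
        if pt is not None:  # assumption: trials with undefined Pt are dropped
            pts.append(pt)
    return sum(pts) / len(pts)

print(average_pt())
```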

What's going on here? Basically, it's all about how they weight this average.

Suppose we were to weight each "eligible" flip equally, across all trials. (By "eligible," I mean a flip whose previous flip was heads.) In other words, you define your denominator as the number of eligible flips across all N*S total flips, and you define your numerator as the number of heads that were realized in those eligible flips. That ratio will converge in probability to the conditional probability of heads as N goes to infinity (holding S fixed), which remains equal to the unconditional probability because the flips are independent. Everything still works.
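In symbols (the notation here is mine, not Miller and Sanjurjo's), let x_{t,s} be 1 if flip s of trial t is heads and 0 otherwise, with Pr(heads) = p. The flip-weighted estimator pools counts across all trials. By the law of large numbers, the numerator and denominator divided by N converge to (S-1)p^2 and (S-1)p, so as N grows with S fixed:

$$
\hat{p}_{\text{flip}} = \frac{\sum_{t=1}^{N}\sum_{s=2}^{S} x_{t,s-1}\,x_{t,s}}{\sum_{t=1}^{N}\sum_{s=2}^{S} x_{t,s-1}} \;\xrightarrow{p}\; \frac{(S-1)\,p^{2}}{(S-1)\,p} = p .
$$

The trial-weighted average instead averages the per-trial ratios over the set \mathcal{T} of trials with at least one eligible flip. An average of ratios need not equal the ratio of the summed counts:

$$
\bar{P} = \frac{1}{|\mathcal{T}|}\sum_{t \in \mathcal{T}} P_t, \qquad P_t = \frac{\sum_{s=2}^{S} x_{t,s-1}\,x_{t,s}}{\sum_{s=2}^{S} x_{t,s-1}} .
$$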

When we weight by trial, by contrast, we're not weighting each eligible flip equally. Suppose S = 4. Say that Trial A has four heads, and Trial B has one head (and suppose this head was one of the first three flips). The empirical conditional probability for Trial A is one. The empirical conditional probability for Trial B is zero, since the only flip following its head is tails. The average across these two trials is 1/2. But the average across eligible flips is larger. Trial A has three eligible flips (all except for the first), and Trial B has one eligible flip. In these eligible flips, Trial A had three successes and Trial B had zero. The flip-weighted average is 3/4, larger than the trial-weighted average of 1/2.
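The Trial A / Trial B arithmetic can be checked in a few lines. This is a minimal sketch. The helper name `eligible_outcomes` is hypothetical, and Trial B is assumed to be the specific sequence HTTT. Any placement of its single head among the first three flips gives the same numbers.

```python
from fractions import Fraction

def eligible_outcomes(flips):
    """Outcomes of the flips whose predecessor was heads ('H')."""
    return [flips[i] for i in range(1, len(flips)) if flips[i - 1] == "H"]

trial_a = "HHHH"  # four heads
trial_b = "HTTT"  # assumed example: one head, among the first three flips

per_trial = []                      # Pt for each trial
pooled_heads = pooled_eligible = 0  # counts pooled across both trials
for trial in (trial_a, trial_b):
    outcomes = eligible_outcomes(trial)
    heads = outcomes.count("H")
    per_trial.append(Fraction(heads, len(outcomes)))
    pooled_heads += heads
    pooled_eligible += len(outcomes)

trial_weighted = sum(per_trial) / len(per_trial)
flip_weighted = Fraction(pooled_heads, pooled_eligible)
print(trial_weighted, flip_weighted)  # 1/2 3/4
```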

More generally, when there are fewer eligible flips in a trial (i.e., fewer heads), those eligible flips are overweighted in a trial-weighted average relative to a flip-weighted average. And since the number of eligible flips will be correlated positively with the empirical probability of heads given previous heads (they're both closely related to the number of heads in the trial!), the trial-weighted average is biased downward relative to the flip-weighted average. Since the flip-weighted average consistently estimates the unconditional probability, the trial-weighted average will underestimate it. That's all that's going on here.
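To see the bias without simulation noise, one can enumerate all 2^S equally likely trials for small S and compute both averages exactly with fractions. This is a sketch with a hypothetical function name, `exact_averages`. It assumes trials with no eligible flip are dropped from the trial-weighted average, since Pt is undefined for them. Such trials contribute nothing to the pooled counts either way.

```python
from fractions import Fraction
from itertools import product

def exact_averages(S):
    """Exact trial- and flip-weighted averages over all 2**S equally likely trials."""
    pt_values = []
    heads_after_heads = eligible_total = 0
    for flips in product("HT", repeat=S):
        # Eligible flips: those whose previous flip was heads.
        outcomes = [flips[i] for i in range(1, S) if flips[i - 1] == "H"]
        if not outcomes:  # Pt undefined: drop from the trial-weighted average
            continue
        heads = outcomes.count("H")
        pt_values.append(Fraction(heads, len(outcomes)))
        heads_after_heads += heads
        eligible_total += len(outcomes)
    trial_weighted = sum(pt_values) / len(pt_values)
    flip_weighted = Fraction(heads_after_heads, eligible_total)
    return trial_weighted, flip_weighted

print(exact_averages(4))  # (Fraction(17, 42), Fraction(1, 2))
```

For S = 4, 14 of the 16 sequences have at least one eligible flip, and their Pt values average to 17/42 (about 0.405). The pooled, flip-weighted ratio is exactly 1/2 (12 heads among 24 eligible flips).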

[Edit: Most of my interpretation below is at best incomplete, at worst wrong. See my more recent entry.]

So, what should we make of the claim that "hot-hand" studies of the form of Gilovich, Vallone, and Tversky (1985) (GVT) have been misinterpreted? If GVT weighted each game (or half, or quarter) equally, then Miller and Sanjurjo would have a point. But as far as I can tell, GVT weight each "eligible shot" equally (see their Table 1). So their conclusions are unaffected by Miller and Sanjurjo's claims; this grand interpretation of this working paper is wrong.

That said, if there are empirical settings in which researchers erroneously weight by trial instead of by flip/shot/whatever, then Miller and Sanjurjo have made an important contribution. Can we think of such empirical settings?
