## Friday, February 04, 2011

### Streaking

"What, never?"
"No, never!"
"What, never?"
"Well, hardly ever!"
From Pinafore by Gilbert and Sullivan

This semester I'm teaching a Behavioral Economics course for undergrads. Last week we were to read one of my all-time favorite essays, The Streak of Streaks, by Stephen Jay Gould. When I started to blog back in 2005, I used Gould as a role model for my writing, with this particular essay the pièce de résistance. It seemed a great mixture of personal narrative with themes of scientific importance. In this case the latter are about rare and not so rare events and how we humans (mis)interpret the cause. The underlying scientific question is this: Do players on occasion get on a "hot streak" in sports, or do we fans attribute that to the players with no scientific basis to back up the attribution?

Given the average performance of players, it turns out the length of the streak matters a great deal in making this determination. Shorter streaks are perfectly well explained by pure chance. With flips of a coin, getting 10 heads in a row is certainly possible. The probability of it is a little less than one in a thousand (1 in 1,024) or, turned around, with 1,000 flips it's not unreasonable to get a streak of 10 heads in a row. Now here's the thing about longer streaks. The probability of getting 20 heads in a row with coin flipping is not one in 2,000, as some might believe based on the previous observation. It's more like one in a million (since a million is 1,000 x 1,000). Then 30 heads in a row happens one in a billion, 40 heads in a row one in a trillion, 50 heads in a row one in a quadrillion, and on and on way past our abilities to conceptualize the size of these numbers. Gorillas with typewriters might eventually reproduce Shakespeare, but in the time it is likely to take all intelligent life in the universe will have vanished.
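For readers who want to check the arithmetic, here is a minimal Python sketch. It assumes a fair coin and independent flips; the function names `prob_all_heads` and `prob_run_within` are made up for this illustration. The first function gives the chance that n specific flips all come up heads. The second estimates the "turned around" question, the chance of seeing at least one run of heads of a given length somewhere within a longer sequence of flips, by tracking the length of the current run of heads.

```python
def prob_all_heads(n, p=0.5):
    """Chance that n independent flips all land heads (p = chance of heads)."""
    return p ** n


def prob_run_within(flips, run, p=0.5):
    """Chance of at least one run of `run` heads somewhere in `flips` tosses.

    state[k] holds the probability that no full run has occurred yet and the
    sequence currently ends in exactly k consecutive heads.
    """
    state = [0.0] * run
    state[0] = 1.0
    for _ in range(flips):
        new = [0.0] * run
        new[0] = (1 - p) * sum(state)      # a tail resets the current run
        for k in range(run - 1):
            new[k + 1] = p * state[k]      # a head extends the current run
        # A head arriving at length run - 1 completes the streak; that
        # probability leaves `state` and is recovered at the end as 1 - sum.
        state = new
    return 1.0 - sum(state)


if __name__ == "__main__":
    for n in (10, 20, 30, 40, 50):
        print(f"{n} heads in a row: about 1 in {1 / prob_all_heads(n):,.0f}")
    print("run of 10 somewhere in 1,000 flips:",
          round(prob_run_within(1000, 10), 3))
```

Each additional 10 heads multiplies the odds against by 1,024, which is why the numbers climb from thousands to millions to billions instead of merely doubling.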

On the science, then, Gould's piece is about how, for short streaks, we seem to misperceive the cause and attribute the outcome not to chance but to the player getting hot, while as for long streaks, well, most of us haven't observed them. So time and again we go back to the great DiMaggio and the summer of 1941, when America was not yet at war and baseball was sublime. (Ted Williams hit over .400 the same year, the last time it's been done.) DiMaggio's was a very long streak indeed. The odds are that he was hot.

In my teaching these days I'm trying something new: having the students write about what they read before we discuss it in class. Part of that is to give me a sense of what they are thinking about the subject. In this class the students are mainly econ majors. Surely the vast majority of the students in the class have had at least one course in probability and another in statistics. But it's not clear whether that training has penetrated their thinking when the subject isn't obviously about economics. Many of these kids are big baseball fans, as evidenced by their writing. Does knowing the one affect their views about the other?

One of Gould's salient points, borrowing from the research of Kahneman and Tversky, is that we humans tend to put things in categories and then reason about the categories rather than about the things themselves. With that, "not never" and "hardly ever" end up in the same box. Most of the time that's ok because for all intents and purposes it doesn't matter. Once in a while, however, it does matter, and for this topic in particular an economist would view it as an egregious error to combine the two. By way of their writing, some of the students are showing they are still more human than economist.

With rare events must come irony. We had a rare event here last Wednesday. The University canceled classes because of the severe weather. That was the day I was to teach the Gould paper. More afraid of the weather than most others on Campus because of a bad fall I took a few years ago, I had already decided not to hold a face-to-face class and made arrangements to teach the class online. By the time the Campus made its announcement I had already informed the students of my intentions. So I decided to go through with that, though I made the session optional. It seemed a tactically sensible decision at the time, but what does an instructor do after the fact given that turnout for the optional session was low? Since I have high regard for the ideas in that essay, it didn't seem appropriate to just move on. But I also didn't want to simply redo what we had already done online. That wouldn't be fair to the students who did show up there.

Not being able to let go of this conundrum, I built this simulation in Excel, which is explained in the video below. (It has macros, which must be enabled, and it must be opened with Excel 2007 or 2010.) Several years ago I built a random walk simulation to convince students that it's very hard, in looking at stock prices, to parse out any deterministic pattern in the time series. This new simulation is more general than the old one in that it allows for a first-order Markov process to determine the time path. The simple random walk is one case, but there can be drift and there can be serial correlation. The other part which is new is that I built a way to count the length of the longest streaks in the series that results from a given simulation. So it is possible to consider the underlying Markov model and the streak it produced as a pair, which is pretty far down the path of doing the inference that goes from the streak to the model likely to have generated it.
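For those without Excel, the same idea can be roughed out in a few lines of Python. This is only a sketch of the general technique, not a port of the workbook: the function names, the two transition parameters, and the choice to start from a coin-flip initial state are all assumptions made for the illustration. Each step is up (+1) or down (-1), and the chance of an up step depends only on the previous step, which is what makes the process first-order Markov.

```python
import random
from itertools import accumulate, groupby


def simulate_steps(n, p_up_after_up=0.5, p_up_after_down=0.5, seed=None):
    """Draw n up/down steps (+1 or -1) from a two-state first-order Markov chain.

    Both parameters at 0.5 gives the simple random walk; both equal but not 0.5
    gives drift; unequal values give serial correlation.
    """
    rng = random.Random(seed)
    prev = 1 if rng.random() < 0.5 else -1   # assumed: arbitrary starting state
    steps = []
    for _ in range(n):
        p_up = p_up_after_up if prev == 1 else p_up_after_down
        prev = 1 if rng.random() < p_up else -1
        steps.append(prev)
    return steps


def longest_streak(steps):
    """Length of the longest run of identical consecutive steps."""
    return max((len(list(group)) for _, group in groupby(steps)), default=0)


def price_path(steps, start=0):
    """Cumulative sum of the steps, i.e. the time path itself."""
    return list(accumulate(steps, initial=start))


if __name__ == "__main__":
    for label, up_up, up_down in [("random walk", 0.5, 0.5),
                                  ("drift", 0.6, 0.6),
                                  ("serial correlation", 0.8, 0.2)]:
        steps = simulate_steps(1000, up_up, up_down, seed=2011)
        print(label, "-> longest streak:", longest_streak(steps))
```

Running the three cases side by side with the same seed pairs each model with the streak it produced, which is the comparison needed to start reasoning backward from an observed streak to the process behind it.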

Some of this I will show in class on Monday. I hope also to get back on track with our original schedule. We'll see.