Friday, March 30, 2012

Small Samples, Hot Hands, and Flow

I'm now reading the part of Kahneman's book, Thinking, Fast and Slow, that is specifically about interpreting statistical information... incorrectly.  This post is about his chapter entitled "The Law of Small Numbers."  The thesis is that we humans like to impute causality, even when the results we see are simply a consequence of random processes.  (My personal favorite in this category, though not discussed in this chapter, is that once in a while, looking at the sky, you can see a human face in a cloud formation.)  In other words, we invent causal explanations where none exist, because we don't know how to attribute outcomes to randomness.  Much of this chapter I really liked.  The discussion showing that extreme outcomes are much more likely in smaller samples was very helpful; I hadn't seen the issue framed quite this way before.  It was also helpful to read that even research scientists tend to be overconfident about observations from small samples and to grossly underestimate the sample sizes they need to establish their conclusions.  I did have some trouble near the end of the chapter with the "hot hand" issue.  I'll try to explain that in a bit.
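To make the small-sample point concrete, here is a short Python sketch (my own illustration, not an example from the book) that computes the exact probability that n flips of a fair coin come out at least 80 percent heads or at least 80 percent tails.  The 80 percent cutoff and the name prob_extreme are arbitrary choices for illustration.

```python
from math import comb

def prob_extreme(n, cutoff_pct=80):
    """Exact probability that n fair coin flips give at least cutoff_pct
    percent heads or at least cutoff_pct percent tails."""
    favorable = sum(
        comb(n, k)  # number of flip sequences with exactly k heads
        for k in range(n + 1)
        # integer arithmetic avoids floating-point rounding at the cutoff
        if max(k, n - k) * 100 >= cutoff_pct * n
    )
    return favorable / 2 ** n

for n in (5, 20, 100):
    print(n, prob_extreme(n))
```

For 5 flips the probability is 12/32 = 0.375; for 20 flips it drops to about 0.012, and for 100 flips it is essentially zero.  Same coin, same process, yet the small samples look far more "extreme."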

I believe I have a fairly sophisticated and nuanced view of probability but a rather poor comprehension of statistics, at least as it is applied to hypothesis testing.  In this post I will try to illustrate my issues.  I should say first that I come to these questions with some background on the subject.  Stephen Jay Gould's "The Streak of Streaks" is one of my favorite essays.  Around the time it appeared, there was an informal discussion in the Econ department here among the econometricians and other applied economists about whether the "hot hand" was possible.  I tagged along on those discussions.  More recently, Jonah Lehrer's "The Truth Wears Off" has provoked considerable concern among some academics I know that much published behavioral science is based on inadequate samples.  Replication of results is not a rewarding activity as far as promotion and tenure are concerned, so attempts at replication happen less often than would be desirable from the perspective of science advancing knowledge.  Surprises are more interesting to read about than what we expect.  The consequence, it appears, is that even very well respected scientific journals tend to publish results driven by outliers, because the outliers are not recognized as such at the time of publication.

Last year in an undergraduate course I taught, I devoted a class session to these two pieces.  As is my wont, I made a simulation in Excel to illustrate some of the issues.  (The simulation uses macros, so the file is in xlsm format; you need Excel 2010 on a PC to run it.  The first worksheet is for the simulation itself.  Subsequent worksheets let the user paste the results of the simulation into column A, labeled state.  The worksheet then computes the length of the current "up" streak and the current "down" streak, as well as the maximum length of each streak.)  A good idea of the simulation can be had from this screen shot.
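The streak bookkeeping those worksheets do can be sketched in a few lines of Python.  This is a hypothetical reimplementation, not the spreadsheet's actual formulas; the function name streak_stats and the "up"/"down" labels are assumptions.

```python
def streak_stats(states):
    """Given a sequence of "up"/"down" states, return the (current up streak,
    current down streak) pair for every period plus the longest of each."""
    cur_up = cur_down = max_up = max_down = 0
    running = []
    for s in states:
        if s == "up":
            cur_up, cur_down = cur_up + 1, 0   # extend the up streak, reset down
        elif s == "down":
            cur_up, cur_down = 0, cur_down + 1  # extend the down streak, reset up
        else:
            raise ValueError(f"unexpected state: {s!r}")
        max_up = max(max_up, cur_up)
        max_down = max(max_down, cur_down)
        running.append((cur_up, cur_down))
    return running, max_up, max_down

running, longest_up, longest_down = streak_stats(
    ["up", "up", "down", "up", "up", "up", "down", "down"]
)
print(longest_up, longest_down)  # 3 2
```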

The simulation is about a simple Markov Chain.  There are two states, up and down, and a matrix of transition probabilities.  The user sets those probabilities by adjusting two parameters.  One is a drift parameter: an increase in drift raises both entries in the "up" column by the same amount.  The other is a correlation parameter: an increase in correlation raises both entries on the main diagonal by the same amount.  (In each case the other entry in a row falls correspondingly, since each row must sum to 1.)  The remaining control sets the initial state, either up or down.  To run the simulation, first erase the previous run by hitting the Reset button, then hit the Run Sim button.  It cranks away (pretty slowly on my computer), ultimately generating 1000 periods of data and plotting the graph of that.  (I know the Excel random number generator has been criticized in the past for not being truly random, with each draw independent.  I ignore that issue in this simulation.  In other words, it's good enough for government work.)
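A rough Python sketch of the same two-state chain follows.  The exact parameterization is my assumption, not necessarily the spreadsheet's: both transition probabilities start at a base of 0.5, drift is added to the "up" column, correlation is added to the main diagonal, and the other entry in each row adjusts so the row sums to 1.  Python's random module stands in for Excel's generator, and the names transition_matrix and simulate are hypothetical.

```python
import random

def transition_matrix(drift, corr, base=0.5):
    """Hypothetical parameterization. Rows are the current state
    (0 = up, 1 = down); columns are the next state (0 = up, 1 = down)."""
    p_up_given_up = base + drift + corr      # drift raises the up column, corr the diagonal
    p_up_given_down = base + drift - corr    # raising P(down | down) lowers P(up | down)
    for p in (p_up_given_up, p_up_given_down):
        if not 0.0 <= p <= 1.0:
            raise ValueError("drift and corr push a probability outside [0, 1]")
    # the down column is whatever keeps each row summing to 1
    return [[p_up_given_up, 1 - p_up_given_up],
            [p_up_given_down, 1 - p_up_given_down]]

def simulate(drift, corr, initial="up", periods=1000, seed=None):
    """Generate `periods` states following the initial state."""
    P = transition_matrix(drift, corr)
    rng = random.Random(seed)
    state = 0 if initial == "up" else 1
    path = []
    for _ in range(periods):
        # the next state depends only on the current state (Markov property)
        state = 0 if rng.random() < P[state][0] else 1
        path.append("up" if state == 0 else "down")
    return path

path = simulate(drift=0.1, corr=0.2, seed=42)
print(path[:10], path.count("up") / len(path))
```

For example, drift = 0.1 and corr = 0.2 give P(up | up) = 0.8 and P(up | down) = 0.4.  Setting corr = 0.5 with no drift makes both states absorbing, so a path that starts up stays up forever.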

There are two different explanations for streaks using the two-state Markov Chain approach.  The first is high drift but no correlation.  The second is high correlation but no drift.  (One can also have combinations with both high drift and high correlation.)  The first explanation produces streaks of ups but not so much streaks of downs.  The second produces streaks of both types.  Kahneman reports on the results of Tom Gilovich, Robert Vallone, and Amos Tversky on measuring the "hot hand" in basketball.  Apparently, they find strong evidence in support of the high drift explanation: some players are more accurate than others, but there is no serial correlation in their sequences of hits and misses.  Kahneman writes:

Analysis of thousands of sequences of shots led to a disappointing conclusion: there is no such thing as a hot hand in professional basketball, either in shooting from the field or scoring from the foul line. Of course, some players are more accurate than others, but the sequence of successes and missed shots satisfies all tests of randomness. The hot hand is entirely in the eye of the beholders, who are consistently too quick to perceive order and causality in randomness. The hot hand is a massive and widespread cognitive illusion.
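The contrast between the two explanations can be checked with a small experiment.  The self-contained sketch below again assumes a base of 0.5 for each transition probability: high drift with no correlation (drift of 0.3) gives P(up | up) = P(up | down) = 0.8, while high correlation with no drift (correlation of 0.3) gives P(up | up) = 0.8 and P(up | down) = 0.2.  The function name longest_streaks is hypothetical.

```python
import random

def longest_streaks(p_up_given_up, p_up_given_down, periods=1000, seed=0):
    """Simulate one path of a two-state chain that starts in the up state
    (the starting state itself is not counted) and return
    (longest up streak, longest down streak)."""
    rng = random.Random(seed)
    up = True           # current state: True = up, False = down
    prev = None         # state in the previous simulated period
    run = 0             # length of the streak ending in the current period
    best = {True: 0, False: 0}
    for _ in range(periods):
        p = p_up_given_up if up else p_up_given_down
        up = rng.random() < p
        run = run + 1 if up == prev else 1   # same state extends the streak
        best[up] = max(best[up], run)
        prev = up
    return best[True], best[False]

# high drift, no correlation: P(up | up) = P(up | down) = 0.8
print("drift only:      ", longest_streaks(0.8, 0.8))
# high correlation, no drift: P(up | up) = 0.8, P(up | down) = 0.2
print("correlation only:", longest_streaks(0.8, 0.2))
```

In the drift case the longest up streak typically runs to a couple of dozen periods while down streaks stay short; in the correlation case both kinds of streak typically get long.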

Up to this point, I believe I understand things.  But now my confusion begins.  The two-state Markov Chain explanation I've illustrated has the virtue of being very simple to understand.  From the point of view of statistical estimation, there are only two parameters to estimate: the probabilities in the up column.  (Since each row of probabilities has to sum to 1, knowing the probabilities in the up column implies knowing those in the down column.)  Occam's Razor favors the simpler explanation, all else equal.  The trouble is, all else is not equal.
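To see what estimating those two parameters involves, here is a minimal maximum-likelihood sketch in Python, assuming the data arrive as a sequence of "up"/"down" labels (hits and misses, say).  Each estimate is just the number of transitions into "up" from a given state divided by the number of transitions out of that state.  The function name estimate_up_column is hypothetical.

```python
def estimate_up_column(states):
    """Maximum-likelihood estimates of P(up | up) and P(up | down) from an
    observed sequence of "up"/"down" states, based on transition counts."""
    to_up = {"up": 0, "down": 0}   # transitions from each state into "up"
    total = {"up": 0, "down": 0}   # all transitions out of each state
    for prev, nxt in zip(states, states[1:]):
        total[prev] += 1
        if nxt == "up":
            to_up[prev] += 1
    # nan if a state never occurs before the final period (no transitions observed)
    return tuple(
        to_up[s] / total[s] if total[s] else float("nan") for s in ("up", "down")
    )

shots = ["up", "up", "down", "up", "up", "down"]   # e.g. hit, hit, miss, ...
print(estimate_up_column(shots))  # (0.5, 1.0)
```

If the two estimates come out close to each other in a long sequence, there is no sign of serial correlation, which is the pattern the hot-hand study reports for shooting.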

Chapter 3 of the book, "The Lazy Controller," is about System Two (the part of the mind that is rational and deliberate), which tires quickly from having to police System One (the part that is intuitive and fast).  But then Kahneman offers an exception that proves the rule.  The exception is called flow, which Kahneman describes as a state of effortless concentration.  The author and psychologist Mihaly Csikszentmihalyi has studied this state extensively and has written a book on the subject.  From what I know about Abraham Maslow, who provided the inspiration for Csikszentmihalyi, my sense is that the people Maslow called self-actualizers have peak experiences fairly frequently, but that anyone is capable of having a peak experience now and then; most people simply don't attain the state because their attention is on other things.

The question then arises whether high-caliber athletes are capable of flow experiences during athletic performance.  One of Csikszentmihalyi's students, Keith Sawyer, has a book called Group Genius, and one of its anecdotes is about the high performance of the Boston Celtics when Bill Russell was their center.  In that story it wasn't just the Celtics but their opponents too who played at a very high level, and the whole thing was likened to a jazz ensemble when "the music really cooks."  Taking my cue from Sawyer, I'd say flow seems possible in athletics, though perhaps it sometimes happens with only one individual on a team, who "carries the rest on his back."  If so, might not flow in sports occasionally manifest as the hot hand?  How is it that Kahneman can subscribe so fully to the notion of flow yet categorically deny that the hot hand exists in sports?

Puzzled by this, aware of my own limitations in understanding statistical information but also that I'm a reasonably skilled theorist, I suspect the two-state Markov Chain is too simple to sort out these ideas.  Positive serial correlation in the state is at best a very crude approximation of flow, and then only when in the "up" state.  Might one get a better approximation, still keeping the model a Markov Chain, by increasing the number of states?

After a few moments' thought, I settle on two state variables: direction (either up or down) and mindset (normal, flow, or funk).  I added funk, though Kahneman doesn't say a word about it, because if flow is responsible for hot hands, something analogous but with the opposite effect should be responsible for slumps.  (Also, this week, after hearing about the Supreme Court's treatment of the oral arguments for and against the Affordable Care Act, I went into a funk myself.)  So now we have a six-state Markov Chain (two directions times three mindsets) and hence 30 probability parameters to estimate: each of the six rows of the transition matrix has six entries that must sum to 1, leaving five free parameters per row.  I have no clue how much data would be sufficient to understand such a model and use it to determine whether hot hands are possible, but it's easy for me to imagine there isn't enough data to conclude anything on this score.
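The parameter count is simple arithmetic: with n states, each row of the transition matrix has n probabilities constrained to sum to 1, so there are n(n - 1) free parameters.  A quick Python check, where the state labels are just the ones named above and free_parameters is a hypothetical helper:

```python
from itertools import product

directions = ("up", "down")
mindsets = ("normal", "flow", "funk")
states = list(product(directions, mindsets))  # every (direction, mindset) pair

def free_parameters(n_states):
    """Each row of an n-by-n transition matrix holds n probabilities that must
    sum to 1, so only n - 1 of them are free."""
    return n_states * (n_states - 1)

print(len(states), free_parameters(len(states)))  # 6 30
print(free_parameters(2))                         # 2
```

Going from two states to six multiplies the number of parameters by fifteen, which is why the data requirements balloon.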

This brings me to my conclusion.  While we want "the explanation," I know from my days as an administrator that there are often multiple plausible explanations for what we observe, and ambiguity remains at the end.  In these cases, which occur all too often, we really don't know what's going on.  We may have a preferred explanation, but if we're honest we have to admit that other explanations are possible.  Why is it, then, that with a sports streak we can't entertain two possible explanations?  One is just the luck of the draw.  Sometimes in flipping a coin you do get eight heads in a row.  The coin is not hot.  The other explanation is that the player did get hot.
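How surprising are eight heads in a row?  For a single set of eight flips the chance is (1/2)^8 = 1/256, but over a long sequence the chance that such a run shows up somewhere is much higher.  The Python sketch below computes it exactly with a small dynamic program over the length of the current run of heads; the function name prob_run_of_heads is hypothetical, and the flips are assumed independent with a fixed probability of heads.

```python
def prob_run_of_heads(n, k, p=0.5):
    """Exact probability that n independent flips, each heads with
    probability p, contain at least one run of k or more consecutive heads."""
    # dist[j]: probability that no run of k heads has occurred yet and the
    # sequence currently ends in exactly j consecutive heads (0 <= j < k)
    dist = [1.0] + [0.0] * (k - 1)
    hit = 0.0  # probability that a run of k heads has already occurred
    for _ in range(n):
        new = [0.0] * k
        new[0] = sum(dist) * (1 - p)      # a tail resets the current run
        for j in range(k - 1):
            new[j + 1] = dist[j] * p      # a head extends the current run
        hit += dist[k - 1] * p            # a head completes a run of k
        dist = new
    return hit

print(prob_run_of_heads(8, 8))    # 0.00390625, i.e. 1/256
print(prob_run_of_heads(100, 8))
```

Trying prob_run_of_heads(1000, 8) as well shows how quickly the chance of a "streak" grows with the length of the sequence, with no hot coin required.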

This year with Illini men's basketball, we witnessed a truly great performance by Brandon Paul against Ohio State.  In basketball games of this sort, it is not just whether the shots go in or not; it's the quality of the shots taken and the defense the other team plays.  Paul made some incredibly difficult shots, the last two against the best guard defender in the league, Aaron Craft.  Later in the season, the entire team appeared to go into a funk: it went on a losing streak, played extremely poorly against the bottom teams in the league, and the team's center burst into tears near the end of one lopsided loss.  This wasn't just a case of bad luck, in my view and the view of many other Illini fans.  There were performance problems because the expectations of players and coaches had gotten out of whack.  Things had gone awry.

Human performance is not the same as drawing balls from an urn (with or without replacement).  Let's recognize that.  Let's agree that there are multiple possibilities to explain what we observe in human performance.  Maybe that's as far as we can go.
