Getting to know you,
Getting to feel free and easy
When I am with you,
Getting to know what to say
"Getting to Know You" from The King and I
Sometimes I think computing and the Internet do a number on our common sense, to the point where we start to worship false idols because doing so is trendy, rather than because there is any reasonable expectation that they'll deliver the goods. I want to push back on this some, and in this post I will do that by making two points. The first, I hope, is fairly obvious. We should distinguish data form - quantitative versus narrative, the first being what Likert-style questions on a survey produce so that responses can be aggregated across respondents, the second being what paragraph questions produce, which often defies ready aggregation - from the quality of said information. By information quality, I mean whether the information tells the observer anything of interest or if, instead, it is largely useless.
I believe there has been a confounding of data form and information quality: a strong preference for the analytic type of information, coupled with an assumption that the information will be high quality simply because there is a lot of it. This bias finds expression in "data driven decision making" and "data analytics" and has produced what I view to be pernicious business practices, some of which I will illustrate in this piece. That point, in itself, I believe is straightforward.
The next point, however, is equally important, perhaps more so. It is that poor use of low quality analytic information erodes trust in our institutions. If we are ever to restore trust, we must make efforts to counter the push toward analytic information regardless of its quality, and move toward privileging anecdotal information when it is high quality.
Let's begin with a very well known decision, one which readers are apt to regard with strong feelings. This is James Comey's decision to announce that the Clinton email probe would be reopened close to the 2016 election. Here is my post mortem on that decision. First, based on events subsequent to that choice, I think it fair to say that Comey made this choice for internal-to-the-FBI reasons. He was afraid of leaks that would prove embarrassing, so he was trying to get ahead of that. He was not trying to impact the election. Second, he must have been certain that Clinton would win, which is what the polls at the time showed. The polls are just the sort of analytic information that we tend to privilege. Third, it is probably impossible to interview Comey now and get him to do his own post mortem on that decision, but I assume he has a great deal of regret about it. He underestimated the consequence of that choice for the election.
The upshot is this. You can have a straight shooter, with access to the data, who is trying earnestly to make the right choice. This in no way is a safeguard against making a bonehead play. There still will be a lot of residual uncertainty. Using analytic data of this sort doesn't eliminate that. In retrospect, a decision can look quite bad in these circumstances.
The Comey decision, as bad as it was for the country, was a one-off. I want to turn to ongoing decisions. A particular practice that I'd like to take on in this piece is the online survey, delivered to people in two possible circumstances: (a) there was a recent transaction that involved a user and some stakeholder wants information about the user's experience, or (b) there was no transaction but the stakeholder has access to the user's email address, so the stakeholder solicits information about user experiences more broadly construed. This practice should largely be abandoned, because the quality of information it produces is poor. I know it is wishful thinking to hope for that, but below I will explain why the approach produces low information quality and consider alternative practices that would produce better information and thereby improve decision making.
First, let me consider one sort of data collection effort done in this vein, though mainly not done online, which is the course evaluations students fill out at the end of the semester. At the U of I, those are referred to as ICES. Students have no incentive to fill out these forms; they are done anonymously, and since they are delivered at the end of the semester the feedback that is provided will not impact the instruction they are getting (and it might not impact the instruction the next time around either, for the following reason). Course evaluations of this sort contribute to grade inflation. Students want high grades and report greater satisfaction when they expect to get a high grade, regardless of how much they learned in the course. So we get the phenomenon where satisfaction is reportedly high yet not much learning has taken place. George Kuh refers to this as the Disengagement Compact.
Drilling down a little on this, one should note that the type of questions one would ask if the main goal were to improve instruction is quite different from the type one would ask if the goal were to rate instruction. Improvement happens via formative questions, many of which are simply about describing current practice. Evaluation happens via summative questions, thumbs up or thumbs down. In theory, you might have a mixture of both types. In practice, the summative questions trump the formative ones in what people tend to care about. On the ICES, the first two questions are summative - rate the course and rate the instructor. The rest of the questions hardly matter.
These lessons from the course evaluations apply to the type of surveys I'm critiquing here. The user gets no benefit from doing the survey. (Recently I've gotten requests that offer a lottery chance at some prize in return for a response. It is impossible for the user to calculate the likelihood of winning that prize, so as to do the expected value calculation that would determine whether it is reasonable compensation for completing the survey. They do give an estimate of how long the survey takes to complete. My sense is that the compensation is far too little - and as a retiree I'm time abundant.) The user also isn't told how the person the user engaged with in the transaction will be impacted by the survey information.
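To make that expected value point concrete, here is a minimal sketch with entirely made-up numbers; since the invitations never disclose the odds, the prize value, odds of winning, and time estimate below are assumptions of mine, not figures from any actual survey.

```python
# A sketch of the expected-value comparison a respondent would need to make.
# All three inputs are hypothetical, chosen only to illustrate the arithmetic.
prize_value = 100.00        # assumed prize: a $100 gift card
odds_of_winning = 1 / 5000  # assumed: one winner among 5,000 respondents
minutes_to_complete = 10    # assumed time estimate from the invitation

expected_payment = prize_value * odds_of_winning                    # $0.02
implied_hourly_rate = expected_payment * 60 / minutes_to_complete   # $0.12 per hour

print(f"Expected compensation: ${expected_payment:.2f}")
print(f"Implied hourly rate:   ${implied_hourly_rate:.2f}")
```

Under assumptions anywhere near these, the implied wage is trivial, which is the sense in which the compensation seems far too little.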
A second issue comes up, though less so with the course evaluations, because much of the teaching practice there is reasonably well settled. In general, when doing formative assessment, the questioner, ignorant of the user experience ahead of time, won't know what to ask. So there needs to be some way for the user to define the relevant issues in the user's response and not be blocked from doing so by closed-ended questions that are off point. Focus groups may be preferable to surveys for just this reason, though focus groups are sometimes hard to assemble, which is why they are employed less extensively than they otherwise might be. Within the survey itself, paragraph questions really are better for getting at the right formative question in that they give the user the ability to define the issues. Of course, the person reading the responses then needs to determine whether a particular comment provides a guide for a sensible change to the process or if the comment is too idiosyncratic and should therefore be ignored. I know of no algorithmic way to make that determination. For me, it is more art than science.
Let me turn to a practice that I think can do things better, which might work if there are repeat transactions involving the same parties. This is to evaluate the individual transaction or a small set of related transactions, and do that repeatedly, so it is evident that the evaluation information gets utilized in the subsequent transactions. Further, this makes the evaluation seem part of an ongoing conversation, which is precisely what you would have if all the interactions between subject and evaluator were one-on-one.
I first did this sort of thing back in fall 2009, when I taught a CHP class that had evaluations of individual class sessions, writing this up in a post called More on Teaching with Blogs. The survey I used can be found in that post. It really was quite simple - 5 Likert questions on whether our discussion went well or not, emphasizing a variety of features that a good discussion should have, and then a paragraph question for general comments. While the process was interesting for a while and did educate the students some about my goals for the discussion, enthusiasm for it eventually waned once the novelty wore off. Further, because that class was small, only 17 students, I hadn't tried to repeat the approach in my teaching since.
I reconsidered this prior to the past fall semester. I had been struggling with attendance, so I wanted to do something to encourage students to come to class. But I also wanted students to experience "gift exchange" in our class, so while I ended up recording attendance, I didn't want that measure to directly impact the grade. What I came up with instead was for those who did come to class to have the option of filling out a survey just like the one in that post, and then giving the student a few bonus points for doing so. (This means that the surveys couldn't be done anonymously, because there was no way to give credit in that case.) After a few of these, but before I had said I'd stop delivering them, the forced response questions stopped having any meaning to me or to the students. Most of the comments were about whether the discussion was effective and whether it had only a few of the usual suspects participating or got more students involved. As with that CHP class, some students commented that the discussion was good even though they sat on the sidelines during it and only listened. This is just the sort of thing you learn from paragraph questions that you wouldn't anticipate ahead of time.
After we exhausted the bonus points I had planned to give out (and that were announced in the syllabus), I moved to an even simpler form, in case the students still wanted to give their comments on a class session. I got one comment for the remainder of the semester; that was it. Evidently, getting people to participate in giving this sort of feedback is a big deal, even when it is very easy for them to do so.
Now let me segue to other sorts of transactions where I'm the one being asked to fill out the survey. Later today I have a visit with my primary care physician, a routine visit that we do periodically. Invariably, after that visit I will be sent a link to a questionnaire provided by the health provider's portal and then forwarded to the email account I have linked with that portal. The first couple of times I filled these things out. But I have since stopped doing that. It is entirely unclear that there is any benefit to me for filling out such a survey. Instead consider the following alternative.
Suppose that rather than evaluate my visit, there was an ongoing conversation about my wellness. I've been taking blood pressure meds on an ongoing basis for several years. I have the ability to monitor my blood pressure at home, but I don't always do that, especially if I've not been exercising and have overindulged with food and drink. Suppose that periodically, perhaps monthly, I got a request from the nurse to upload my recent information, which would then be reviewed with perhaps a bit of back and forth electronically. My incentives for providing that would be quite different than they are for filling out those surveys. The information would be about my health and go directly to my healthcare provider. And my course of behavior might be modified a bit based on this sort of communication. This would be a much more holistic way to learn about my user experience and to promote my well being. Further, it might not take that much more time on the provider end to do this type of tracking. But it wouldn't be used to evaluate the doctor or the nurse on the quality of the job they are doing. That wouldn't be the purpose. My care and my health would be the purpose. I have interest in both of those.
Moving to quite a different context, I've thought a bit about how Amazon might improve its processes (I have a Prime account) to focus less on the individual transaction and more on the overall experience. Based on some discussion over the holidays with other members of my family, I'm quite convinced that most users don't understand Amazon pricing much at all - how long will videos be free to the user and when will they return to having a rental fee, for example? It is disappointing to watch something for free, then want to watch it again sometime later or watch another in that series, only to find that the video now carries a rental fee. It's the sort of experience that might drive the user to another provider. Is there a way for Amazon to both elicit user preferences this way and to educate the user about time windows and prices? If so, why don't they conceive of their evaluation effort more that way? It is true that for fundamentally new items, how many stars an item gets might matter to other users, as that pattern of evaluation is already well established. But is that the only thing that matters?
Let me continue with a brief discussion about analytic information obtained purely by clicks, where there is no survey whatsoever, as well as what optional comments add to this picture. How useful is either sort of information, and do they give the same picture or not? Here let me note there is a tendency toward dashboard presentation of analytic information rather than providing the full distribution, perhaps because some people would be overwhelmed if the full distributions were provided. Dashboards give aggregates or means only, but otherwise don't convey a sense of the underlying distribution. For example, I have a profarvan YouTube channel and it gives monthly information about the top ten videos, measured by minutes watched and then again by number of views. My number one video in December was on the Shapiro-Stiglitz model. There were 150 views and the average length per view was 4:21. The thing is, this video is more than a half-hour long. It goes through all the algebra that's in that paper, and there is quite a bit of it. What is one to make of this information that the average length of viewing is so much shorter than the full video?
Based on comments on the video I received a few years ago, I gather that a handful of students go through the whole thing rather carefully. The comments are from those students, who said it helped them really understand the paper. Many more students, however, take a quick look and that's it. A reality about online information today is that for many people it merits a glance, nothing more. So I would describe the distribution of users as bi-modal. There is a small mode of quite serious watchers and a much larger mode of casual viewers. Is that good or bad? Mostly, I don't know. When it is students currently taking my course and they are in the casual viewer category, that's disappointing to me, but I have enough other sorts of observations like this to know that it will happen. When it is students elsewhere, the viewing is entirely optional on their part, so this is consistent with them trying it out, mainly not finding it particularly useful, but once in a while there is a student for whom the video does hit the mark. Naturally, I'd like the mode with the serious watchers to be larger. But, I'm far less sure that I can do anything to impact that with the videos I do provide. There simply isn't the sort of information you'd need to figure out what would convert a casual viewer into a serious watcher or, indeed, whether that is even possible. In that sense the information YouTube does provide is quite rudimentary. We don't know why the person checked out the video, nor why the person stopped viewing. The data given are simply insufficient to answer those questions.
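To illustrate how a bimodal audience squares with the dashboard's single number, here is a minimal sketch with assumed proportions; the share of serious watchers and the typical casual viewing time below are my guesses for the sake of the arithmetic, not anything YouTube reports.

```python
# A sketch showing that a bimodal audience can produce the reported mean of 4:21
# even though almost nobody actually watches "about four minutes."
# The shares and times are assumptions, not data from the dashboard.
serious_share = 0.10     # assumed fraction of serious watchers
serious_minutes = 31.0   # roughly the full length of the video
casual_minutes = 1.4     # assumed length of a casual viewer's glance

mean_minutes = serious_share * serious_minutes + (1 - serious_share) * casual_minutes
print(f"Blended mean: {mean_minutes:.2f} minutes")  # about 4.36 minutes, i.e. roughly 4:21
```

The point of the exercise is that the mean alone is consistent with this story and with many others, which is why the dashboard number by itself tells you so little.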
Further, even the comments on the videos are coming from viewers I haven't previously met. So the comments are of a one-and-done form, or perhaps a little back and forth and then it is over. There is no ongoing conversation here. (My own students would likely comment on the class site rather than on YouTube. On the class site there is a better chance for the comments to become part of the ongoing class discussion.) Truly valuable anecdotal information comes when there is both an ongoing conversation and a great level of trust between the participants, so they are willing to open up and give "the skinny." Of course, for this to matter the person has to be an insider and have either information or a point of view that others would value having. You might think this type of inside information is restricted to VIPs. My experience, however, is that it is much broader than that, and that gathering information this way, time consuming to be sure, gives a much better picture of what is going on in a social setting than can be gotten merely by peering at the numbers.
Yet this information requires some human being to assemble it into a coherent picture. Does our current tendency to pooh-pooh anecdotal information speak to our inability to sketch such a picture? Or is it simply an argument against coming to a conclusion based on far too little information altogether? Can we agree that in a world of complexity, understanding what is going on is an arduous task? If so, can we then agree that information in all forms, analytic and anecdotal, can be helpful in producing that understanding, but that some information is of very low quality and so doesn't help that much? If we can agree on these things, that would be progress, and this piece will have found its mark.