Saturday, September 22, 2007

A Vulture of Evidence

Last week Bill Richardson wrote an opinion piece on No Child Left Behind --- scrap it. His view sounds right to me. My kids go to reasonably good schools in Champaign that suffer from none of the problems Kozol has identified about inner city schools. But even for them, we’ve seen how NCLB has narrowed the curriculum and pushed the teach-to-the-test approach, something my wife and I are not happy about. Yet Richardson’s pronouncement got me agonizing for another reason. The question is whether the bad outcomes associated with NCLB are a consequence of an approach to education that emphasizes measuring student performance, or whether instead the main problems with NCLB are a consequence of racial prejudice, and an evidentiary approach to student learning can work, and work well, when there aren’t other issues that dominate and handicap it. I was reminded of discussions I had in graduate school about whether communism was inherently a bad system or if, instead, it was the Soviet Union that was bad but communism might flourish under other circumstances. It’s hard to cling to the theoretical possibility that such a system might work well when there is such a prominent case in point indicating otherwise.
So I began to fret about what NCLB should be telling us in Higher Ed. Mostly independent of NCLB and Richardson’s pronouncement, I’ve been trying to find my own comfort zone thinking about accreditation. The College of Business where I work is going through its year of self-study for AACSB accreditation and at the same time my Campus is getting ready for its NCA review. And so I’ve been wondering whether these efforts will create their own pernicious consequences and should therefore be resisted. (Or, perhaps, we should comply with the letter but tacitly resist the spirit because overt resistance is too costly.) Within the learning technology field, I’ve heard nobody express this concern. There seems to be a lot of buy-in to the general idea of an evidentiary approach. So in this post I’m going to try to poke holes. My goal is to show my source of discomfort so as to encourage others to ask similar questions. Some of this stuff, I believe, needs to be debated, not just taken for granted.
Here I’m further reminded of a book I read while in graduate school, The Foundations of Statistics by Leonard Savage, where in the first chapter the author asserts that sometimes it is the foundations of a discipline which are the most controversial, even for disciplines that are flourishing. The foundational question for those who do advocate for an evidentiary approach is this: what exactly are we measuring? And for those who might want to follow Kozol’s lead but focus on the situation in Higher Ed, the question might be: what are the unintended consequences of the measurement efforts we do engage in and how harmful are those consequences? Those are my core questions as I try to poke holes.

Let me begin with testing and my own personal memories of that, some of which are still strong although I took my last class for credit in 1979 and defended my dissertation in 1981. There are many things I’ve learned subsequently and have since forgotten. But some of these memories of testing remain strong. Why is that? And is it good or bad?

Let me start with the SAT. I took it twice. The first time, I had a broken right arm which was in a cast not quite up to the elbow. I was nervous that it would affect my ability to fill in the bubble sheets. After I got my scores I joked that it only adversely affected my score on the Verbal, not the Math. At the time, there was a lot of emphasis on the word “aptitude” in the SAT name and on the claim that the test measured aptitude. I got a 790 on the Math. I assume that means one question wrong, but I don’t really know that. The next memory is blurred between that test, the next time I took the SAT, and the GRE, but on at least one of those exams there was a question on Roman numerals. To this day, I don’t remember which is the Roman numeral for 500 and which for 1000. I’m ok with Roman numerals up to 100 but not beyond that. So I likely garbled the M and the D, and got that question wrong. But there is no way that was an indicator of aptitude. After all, you can look these things up. I’ll get back to that in a bit.
My scores that first time were 620 V and 790 M --- definite math nerd. I took it a second time, hoping that the Verbal would go up, and got 650 V and 770 M. In The West Wing, my favorite TV series of the last few years, the President, who is supposed to be a genius, got a 1590 total the first time and took it again! The SAT, in particular, seems to play the same role for high testing students that tattoos play for society at large. Does it leave such a mark on students who don’t test so well? There is now a big industry devoted to getting tattoos removed. Are those who advocate for the evidentiary approach aware of the marks they may be leaving behind on the students from whom the evidence is collected?
Next let me talk about my last semester as an undergrad taking classes at Cornell. Having taken just one intro macro course in economics, I was advised to take the advanced graduate course in math econ – I had the math credential but no other economics. The course started out with 7 students in total, 4 grads and 3 undergrads. The grads all dropped, so for the final there were just the three of us undergrads. We had a take home final and it was brutal. I got a 65 on it (two problems right, partial credit on a third, and the fourth wrong) and with that I was put in my place. Because much of it was over my head I had made some major conceptual errors – trying to prove that a divergent series converges, that sort of thing. I later learned I was the high scorer among the three of us. One of the other students, who has since become quite a prominent economic theorist but who was a year or two behind me at the time, got a 20 on the test. I know that during the class I had the feeling that I could do the math (in spite of the serious mistakes on the final) but I had no sense at all about why any of the theorems were interesting. Absent that intuition it was hard to be passionate about learning the method of attack for the proofs. So perhaps I didn’t work hard enough for the course. But what else might the exam measure? I know that in many cases we talk of students being able to apply what they learn in a course to other problems they might confront outside the course. Just what other problems would I confront on this stuff? (In grad school I didn’t see this content at all, but when I came to Illinois I taught Undergrad Math Econ and used David Gale’s book, so I did ultimately return to some of this stuff, and the problem I did confront was how to teach it.)
I took several exams in grad school that I recall, but I want to focus on one in particular. Northwestern is on the quarter system and in the first quarter there I took probability from Ted Groves, a tough and rigorous course. The second quarter I took statistics from Ehud Kalai. He is a world class game theorist, but his heart wasn’t in teaching statistics. Also, my class got kind of burnt out by the Paper Chase approach to instruction we experienced the first quarter. So the course was less intense than the one Groves taught. Kalai asked us if we wanted a midterm. You can guess the answer. And the in-class final was open book. During that final I recall doing a problem where I read the book to learn the relevant statistics, not as a reference to what I already knew but rather to learn it for the first time; then I did the problem. I got 100 on that exam. And I finished in a little more than an hour, definitely the first one among my classmates to leave the room. Nobody else got 100. But I didn’t know much about statistics. I probably could have scored nearly as well if I had been given that test before the course was offered, because what I did know was how to solve math problems. I’m an ace at that. Clearly solving math problems is a correlated skill, but it is not the same as knowing statistics. In much else we teach we really want students to develop learning-to-learn skills so they can solve problems in situ. Knowing how to do math problems is a learning-to-learn skill for taking exams with math. Exams with math might want to test that to some degree. But they want to test knowledge of the subject where the math is applied as well. It’s very hard to parse the one from the other, at least with an open book test.
The best type of exam is oral. There is back and forth, and the follow-up questioning becomes situated in the previous responses. In this manner the examiner can learn quite a lot about the student’s knowledge, how robust it is, and how well the student can think through related issues based on what the student is presumed to know. In this sense every real conversation has an element of testing in it. Perhaps that is an observation to suppress. Most students who go through an oral exam, and here I’m thinking of doctoral studies, are extraordinarily nervous beforehand and during, and may be quite self-conscious when giving their responses. That type of fear can be an inhibitor and restrict the conversation; to go deeper into the issues the participants in the conversation should be relaxed. In the process the participants learn quite a lot about what each other is thinking. We can surmise that Aristotle certainly understood the competencies of the young Alexander.
Apart from testing and conversation, the other evidence we have of student learning is through the works they create – their writings and increasingly their multimedia creations – and through the in-class presentations they give. As the latter are typically not recorded (and many courses don’t entail in-class student presentations), there has been more focus on the former. And because we now live in a digital world, the works can readily be archived for later review, scrutiny, and reflection. This is the basis for the current fascination with ePortfolios. And at this level, that is for the good.
Originally, I was quite high on ePortfolios as a concept because I thought they were associated with longitudinal assessment, measuring growth and hence measuring learning, looking for increments rather than for snapshots of performance as most testing does. Every eager parent anticipates the weighing and the measuring of height of their infant, to be assured the child is healthy and growing properly. And most parents I know (admittedly a small sample) continue with the pencil marks on the basement wall to track the growth of their kids after those visits to the pediatrician have ended. Measuring child growth in this way is a labor of love. In primary school some of my kids’ teachers kept an archive (paper based) of the kids’ work and used that to show the kids’ progress when we had parent-teacher conferences. Primary school teaching, at least in the school my boys attended, had a labor of love aspect to it for which I’m quite grateful. That changed with middle school, as the kids rotate through the subjects with different teachers and more of the work they do is regular homework.
Longitudinal growth becomes harder to measure because the rate at which students grow intellectually slows, because the teachers become less invested in the individual students, and because the bulk of what we do measure is within-course learning. At the college level, ePortfolios are not about longitudinal growth. Instead, they are about “measuring competencies” via “genuine assessment.” In other words, ePortfolios are about setting a bar and seeing whether the student has cleared it. I’ve written a prior critique about measurement of this sort, about the interrelationship between where the bar is set and what to do about the imprecision in measurement, particularly as it pertains to the performance of students in the gray zone near the bar. I don’t see those advocating for the evidentiary approach talking about this issue much, if at all. They should.
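To make the gray-zone worry concrete, here is a little back-of-the-envelope simulation of my own devising (the numbers are made up purely for illustration: a bar set at 70 and measurement noise with a standard deviation of 5 points). The point is only that whatever misclassification the noise produces gets concentrated on the students whose true performance sits near the bar.

import random

# Invented setup: each student has a "true" competency score, but the
# assessment observes it with noise. The bar and the noise level are
# made up for illustration only.
random.seed(0)
BAR = 70          # hypothetical passing bar
NOISE_SD = 5      # hypothetical measurement error (standard deviation)
N = 10000

near = far = wrong_near = wrong_far = 0
for _ in range(N):
    true_score = random.uniform(50, 90)
    observed = true_score + random.gauss(0, NOISE_SD)
    misclassified = (true_score >= BAR) != (observed >= BAR)
    if abs(true_score - BAR) <= NOISE_SD:   # the gray zone near the bar
        near += 1
        wrong_near += misclassified
    else:
        far += 1
        wrong_far += misclassified

print(f"misclassification rate near the bar:  {wrong_near / near:.1%}")
print(f"misclassification rate away from it:  {wrong_far / far:.1%}")

With these made-up numbers, the students in the gray zone get misclassified at roughly ten times the rate of everyone else, and they are precisely the students the bar is supposed to sort. Here, I’m going to push on.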
There are some disciplines where a portfolio approach has a long tradition, e.g. design and writing studies, and instructors in those fields may be drawn by instinct to ePortfolios and welcome the technology for this purpose. But in many other fields an ePortfolio approach is alien, with traditional testing much more common. Faculty in such fields show little interest in ePortfolios and in the evidentiary approach more broadly. The push for this is coming from elsewhere.
It is coming from the Department of Education, the Spellings Commission Report on Higher Education, the various state and local governments which have become increasingly stingy about funding public higher education (this link requires access to the Chronicle), and from the accrediting agencies, who are feeling the heat from these other sources. As one of my colleagues in the Finance department here put it, we’re driven by a yuppie culture, consumerist to the extreme, where the key to any financial transaction from the buyer’s view is understanding what you’re paying for. With college tuition so high and still rising at a hyperinflationary rate, the consumerism demands an answer. That is the core driver.
The traditional approach has been a trust model, where at each point in the hierarchy there is some activity that indirectly affects quality assurance of teaching and learning. For example, academic departments put in huge effort on recruiting faculty. Recruitment is a big deal. Promotion and tenure, another big deal, gets reviewed at multiple levels --- department, college, and campus. Those and salary review are the main mechanisms for quality assurance. They are all indirect. As a faculty member, I have an ethical responsibility to teach my courses so that the content is correct for the course listing and the teaching approach is suitable for students at that level. Usually there is no direct monitoring of that, only the indirect results from course evaluations, which feed into the other indirect mechanisms, and the write-ups by the instructor documenting their teaching at the various junctures where they are evaluated along their career path.
The trust model was working reasonably well when I joined the faculty at Illinois in 1980. I’m not saying it was perfect, but it was reasonably effective. And the confidence in public higher education was high. That confidence is waning, as evidenced by the documentary Declining by Degrees. The question is whether the decline in confidence is purely a matter of perception (the quality is largely the same as it was in 1980, but in an industry clearly subject to Baumol’s Cost Disease the increased tuition has created the change in perception in and of itself) or whether there are cracks forming in the trust model and those cracks are largely the cause of the change in perception. Truthfully, I don’t know the answer to that question and I’m not sure how to produce an answer, though I think it is a good question to ask. But returning to the start of my post, my fear about the evidentiary approach that Bill Richardson’s editorial triggered is that it will lead to a complete breakdown of the trust model and overall worse results, particularly at public higher education institutions such as Illinois, creating a view of Higher Education akin to the view many now have of urban public schools. Let me explain how that might happen.
The best teaching is highly innovative, one experiment after another with the approach under constant modification. The learning of the instructor about how to teach most effectively drives the learning of the students; it serves both as inspiration and model. My friend Barbara Ganley is a great exemplar of this excellence in instruction. Innovation of this type leads to idiosyncrasy in the teaching, an idiosyncrasy we should welcome. This type of innovation can thrive under the trust model. That is not to say that the trust model drives the innovation but rather that creative instructors like Barbara can feel free to experiment.
The evidentiary approach is different. It demands a roll up or aggregation of the evidence. Goals in the course syllabus must somehow align with goals articulated by the department, which in turn must align with goals articulated by the college and so on up the hierarchy. The need for evidence that can be so aggregated might very well act as an inhibitor on instructor innovation, encouraging a more cookbook approach. (Much of Kozol’s criticism of NCLB is indeed that it has engendered a militaristic approach to instruction pushed from above that seemingly hinders the play and creativity of the students.) This would be the unintended but highly pernicious consequence of a total embrace of the evidentiary approach. We’re not there yet, but it seems to be a real possibility worth considering.
Of course it is possible to envision an alternative outcome with the evidentiary approach, one more benign. The goals that get articulated could be sufficiently broad as to not inhibit the good teaching at all (and not inhibit the bad teaching either); indeed, accreditation may have only a minimal impact at a place like Illinois while still being sufficient to alert the public about disreputable Diploma Mills.
Whether those represent two endpoints of a spectrum I’m not sure. It seems so to me, but so many of my colleagues are advocating for the evidentiary approach that I’m willing to admit I may be missing something. But mostly I feel there is a lot of fuzzy thinking out there about measuring learning and, as a consequence, not nearly enough attention paid to how the gathering of evidence might affect the teaching and learning practice as well as how it might affect the teachers and learners themselves. Further, there is the related issue of whether, if the measurement is done poorly, we should be doing it at all.
At Illinois we already know what the core learning issues are. These relate to large class size, particularly during the general education part of the curriculum, and the concomitant problem that students can become anonymous and consequently disengaged from their own learning. What we need to do is promote conversation, between students and faculty and between students and other students, particularly during the general education phase. We do that with our Living and Learning Communities. It is an excellent approach but it does not scale. We need some other alternative that does scale for the students who reside elsewhere. But it’s not the families of these other Freshmen who are really concerned about the value they are getting. The public that is most upset with us is the families of students who applied to Illinois but were not admitted, especially in the case where the parent did attend yet had a lower ACT score and GPA than the child.
Given that, accreditation is more of a tax on the institution (or on the college, in the case of disciplinary accreditation like AACSB) than an opportunity to engage in needed self-reflection. As long as it is perceived as such, like most taxpayers we’ll engage in mild tax avoidance and otherwise comply.
But will that approach satisfy the consumerist demand for information? And if not, will that force an outcome like what NCLB has produced? I don’t know, but I think we should talk about it.

Thursday, September 06, 2007

Form Factor

Yesterday I finally got around to purchasing Acrobat 8 Pro. Staff on my campus can get this through the Webstore, with purchase through a department at a very good price of $40 and purchase by individuals at a price of $137 (still much cheaper than the list price at the Adobe site). It turns out that the product comes bundled with another Adobe offering, Adobe LiveCycle Designer, which is a form builder tool. It makes forms within PDF files. This was immediately intriguing to me. As regular readers will note, I’ve been fascinated with the possibility of instructors collecting data from their students and then repurposing it for instructional use. For example, this post on the future of Learning Management Systems has a section specifically on data ingest and data manipulation.

Also, I’ve had some experience building dialogic presentations and self-test quizzes inside an Excel workbook, for example here and a different one here. The students really liked those (they give a good sense of how I think basic economic theory should be taught), so on the content side I was satisfied. But I had always wanted to collect the student responses and have them compiled into a single file. (There is some randomization in each spreadsheet based on the demographic info the students supply, so the answers will differ across students, even if they each got all of them correct.) The last time I taught I tried to do that compilation with Excel. It was a disaster. The linking required was quite unreliable. (And maybe there was fault with the person doing the linking. That would be me.) So I’ve had it in the back of my mind since then to find another way to achieve that data collection end.
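As an aside, the compilation I was after can also be scripted outside of Excel. Here is a rough sketch in Python using the openpyxl library; the folder name, the sheet name, and the cell range holding the responses are invented stand-ins for whatever the real spreadsheets would use, so treat it as an illustration of the idea rather than something tied to my actual quizzes.

from pathlib import Path
from openpyxl import Workbook, load_workbook

# Invented layout: each student returns a copy of the workbook with their
# answers in cells B2:B13 of a sheet named "Responses".
RESPONSE_CELLS = [f"B{row}" for row in range(2, 14)]

summary = Workbook()
out = summary.active
out.append(["student_file"] + [f"Q{i}" for i in range(1, 13)])

for path in sorted(Path("submissions").glob("*.xlsx")):
    # data_only reads the values Excel last calculated, not the formulas
    wb = load_workbook(path, data_only=True)
    sheet = wb["Responses"]
    out.append([path.name] + [sheet[cell].value for cell in RESPONSE_CELLS])

summary.save("compiled_responses.xlsx")

The point is simply that once the responses live in individual files, pulling them into one table is a loop, not a fragile web of links.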

I also did some work on the reverse problem. If you have a table of responses from a survey, with each row a response for a particular individual and each column a response to a particular question, can you find a convenient way to display the individual responses back inside the form in which the survey was rendered? It is hard to read the individual data in the table – much easier to read within the form. I built this spreadsheet as an example, not a systematic solution.
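The logic of that reverse mapping is simple enough to sketch in a few lines of Python as well; the file name, the column headings, and the question wording below are all invented for illustration. The idea is just to print one row of the table back with the survey wording attached, which is what the form view gives you for free.

import csv

# Invented survey wording, keyed by the column names used in the response table.
QUESTIONS = {
    "q1": "How often do you attend the discussion section?",
    "q2": "How useful are the online self-test quizzes?",
    "q3": "How would you rate the pace of the lectures?",
}

def show_response(table_file, respondent):
    """Print one individual's row back in a form-like layout."""
    with open(table_file, newline="") as f:
        for row in csv.DictReader(f):
            if row["name"] == respondent:
                print(f"Responses for {respondent}")
                for key, wording in QUESTIONS.items():
                    print(f"  {wording}\n    -> {row[key]}")
                return
    print(f"No response found for {respondent}")

show_response("survey_table.csv", "Mo Better")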

Given that background, I was like a kid with a new toy as I played with Acrobat Pro and LiveCycle Designer. I started in on the tutorial, but after 5 or 10 minutes of that I lost patience and tried to build my own form from there. It’s always refreshing to consider how one learns to use new software. I had never tried building forms in PDF files before. I was extremely impatient at first because I didn’t know how to get stuff done, and where I could make some progress I made tons of mistakes in design – the first form looked very ugly. But in the “learning by futzing” manner that I favor for getting familiar with an application (my approach has to be similar if not identical to Ericsson’s effortful study) I go through a variety of creations of my own design and negotiate that with my understanding of the software and the underlying capability of the software (those are not the same, but it would be nice if they converged eventually). I can’t learn software without those creations. Further, my personal sense of competence is wrapped up in being able to make something with the software that gets over my personally set bar – not bad, pretty decent, functional if not beautiful.

The first few didn’t make it. This is the fourth one I made, the first that used Tables in the layout. To view it you need version 8 or later of Acrobat or Acrobat Reader, not some other PDF viewer. While some readers of this blog might prefer non-proprietary solutions, my own view is that the functionality I’m after is pretty hard to deliver in a clean way, and since Acrobat Reader is freely available, this works fine. Also, I want to make a point of comparing this solution to other possible ways of gathering the data. For that purpose it would help if the reader took a look, so the reader has some understanding of why I’m high on this approach.

Let’s consider the form first from the point of view of someone filling it out. The look is fairly clean (and with some more experience in design it could be even cleaner). The main table is 4 by 3 with a pull down menu in each cell. Those work reasonably well; when I did the analogous thing in Excel sometimes the pull downs would be “sticky.” I probably should have increased the column width since the longest response runs over, but otherwise it looks ok.

I believe these forms have to be one page and no longer. (I need to verify that.) So it is useful to put a lot of response fields in a small amount of space. This also has the benefit of convincing the person completing the form that there isn’t too much work to be done. For this particular survey, the person fills out the table, perhaps adds some comments in the paragraph box, puts in some identity information and that’s it.

The key feature is the Submit by Email button, and I’m going to belabor the discussion of it because, as it turns out, using email is not the critical part, though it is nice how it can integrate with the user’s email tool if that is desired. What is critical, however, is that with that button and only Acrobat Reader, not the full Acrobat, the user can save the form with the data the user supplied already filled in. This is most obvious if, after clicking on the Submit by Email button, the user selects the second option (Internet Email) or the third option (Other) and then clicks the button in step 1 that allows the user to save the pdf file. One can then transmit that saved pdf to the collector of the data in any manner conceivable – it doesn’t have to be by email. But for obvious reasons, email is the least common denominator and they do have a nice integration with it.

From the perspective of the survey creator it is really a snap to get those responses into the data file. If they are submitted as email attachments, double clicking on the attachment is all that is needed. But it is also possible to bring in other completed surveys simply by navigating to the folder where those surveys are located and then selecting them. Here is the view of the survey designer who has collected some data. In this case there are three entries. The first is from “Mo Better,” a fictitious character I used when I completed the form myself. The second and the third are identical, actually submitted by my colleague Norma Scagnoli. In this case I deliberately wanted to see if it would accept the ingest from the same file more than once. There is the answer.
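Once the responses are in the data file, it also isn’t hard to get them out for further analysis. Assuming the collected form data is exported from Acrobat to a CSV file with one row per submission and one column per pull-down, a few lines of Python will tally any question. The file name and the field name below are invented – the real field names would be whatever the form designer assigned in LiveCycle Designer.

import csv
from collections import Counter

# Tally the answers to one of the pull-down questions across all submissions.
tallies = Counter()
with open("collected_responses.csv", newline="") as f:
    for row in csv.DictReader(f):
        tallies[row["q_row1_col1"]] += 1   # "q_row1_col1" is a made-up field name

for answer, count in tallies.most_common():
    print(f"{answer}: {count}")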

With that basic familiarity for how it works, let’s consider the benefits of the approach. The minuscule number of instructors who design their own databases and make Web forms to collect data will find nothing new here. But for the rest, this approach should be empowering. Heretofore those instructors had to rely on programmers to make the Web forms. Now they can dispense with the Web form altogether and build the PDF form themselves.

Learning Management Systems have some ability to collect this sort of data via the survey tool. I could do something analogous in our LMS, but then I wouldn’t have the ability to use pull-down menus (I’d have to use radio buttons instead) and I couldn’t deliver the survey as a single table. I’d need 12 separate multiple choice questions to gather the data. Apart from the convenience in presentation I mentioned earlier, by delivering it as separate questions the person completing the survey might not see the connections across questions that become transparent in the table view. Further, at least with the WebCT Vista survey tool, which is the one I’m most familiar with, horizontal space is eaten up by the indicator that records whether each question has been completed, and responses are saved on a question by question basis, so the form is filled with buttons (if the entire survey is delivered as a single scroll) and the look is clunky, not clean.

Further, from the instructor’s viewpoint looking at the collected data, one can look at the report as a whole and one can view an individual’s response, but that response appears as a row of data; it is not visible back in the form that the student completed. The LMS approach is clearly more secure, so I would not use the Adobe approach with grade-related information. Further, the LMS approach is likely time saving for the instructor in high enrollment courses. But in smaller classes, I’d likely favor using the Adobe tools.

Also, in case it isn’t obvious, by getting away from the LMS as the distribution vehicle, one doesn’t have to restrict the population for a survey to a class roster. It can be quite a different population. And, of course, it can be used in many other contexts, among the most obvious being IT training activities where paper surveys are currently administered at the end. Student groups who want to administer surveys might find it a real boon, as might committees, research groups, and perhaps even bloggers who’d like their readers to answer specific questions. (I’m not sure of that last one, but you can dream, can’t you?)

I think this is worth a try and I’d be very interested to hear from anyone who does.