The other day Academic Cog linked to a NYT article showing a very scary looking equation used to evaluate teacher performance. The piece described pretty well how standardized test scores are being used to decide who does and doesn't get tenure. Without getting too deeply into the methodological weeds, the basic approach is to try to isolate the effects that teachers have on student performance, controlling for other factors. The "value-added model" (or VAM) uses changes in student performance over time to rank teachers. These statistical estimates have been used to create unnervingly specific ratings for teachers, ratings that have been published in major newspapers. It's not often that social science statistics inspire such heated controversy (with major spreads in the NYT (above), USA Today, and the Columbia Journalism Review), but that's what happens when statistics determine the fate of millions of teachers.
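I'm not reproducing any state's actual formula here, but the core idea of a value-added estimate can be sketched in a few lines of Python. Everything in this sketch is invented for illustration: the "prior score predicts current score" baseline, the class sizes, and the noise levels are my assumptions, not anyone's real model.

```python
# Toy sketch of a value-added estimate (illustrative only, not any state's model).
# A teacher's "value added" is the average gain of their students relative to
# what their prior-year scores predicted.
import random

random.seed(0)

def predict_from_prior(prior):
    # Hypothetical baseline: a student is expected to match last year's score.
    return prior

def value_added(students):
    # students: list of (prior_score, current_score) pairs for one teacher's class
    gains = [current - predict_from_prior(prior) for prior, current in students]
    return sum(gains) / len(gains)

# A simulated class of 30 students: the "true" teacher effect is +2 points,
# but each student's score also carries random test noise.
classroom = [(p, p + 2 + random.gauss(0, 5)) for p in range(60, 90)]
print(round(value_added(classroom), 1))
```

Even in this cartoon version, the estimate only recovers the teacher's "true" +2 effect approximately, because the test noise never fully averages out over a single class.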
So what's wrong with the idea of using objective measures of student achievement to assess teacher performance? Well, as it turns out, quite a lot. Two pretty good reviews of problems with the VAM were published last year, one by Sean Corcoran for Annenberg and the other by Baker et al. for the Economic Policy Institute. To summarize:
1. Imprecision: the statistical models have too much built-in error to make reliable determinations of individual teacher performance. A report by NCES suggests that there is at least a 25% chance of misdiagnosis (they suggest school-level analysis instead).
2. Instability: teacher and student performance tends to bounce around from year to year. A teacher who is considered very good in one year might be labeled as terrible in another.
3. Poorly constructed assessments: state standardized tests have all sorts of problems, such as being predictable (and thus easy to game). There's also some reason to think that the choice of assessment has a big effect on how a teacher performs. A major problem is that some tests have ceilings that are too low. If you're teaching a class in which 75% of your students score in the 90th percentile, are you a bad teacher because they haven't gotten better by the end of the year?
4. Model specification problems: it is very difficult to make sure that you have appropriately controlled for all the various confounding variables. As complex as Academic Cog's equation looked, the biggest concern may be that it controls for too few variables.
5. Narrow subjects: there's an unfounded assumption that standardized tests accurately measure the whole of student knowledge in a given subject. In fact what they do is test how well students are able to answer questions that are amenable to standardized testing. Lots of things are hard to ask about in a standardized format. To make matters worse, even if we choose to believe that the assessments that have been created are good enough to be used to evaluate students accurately, what do we do for teachers who aren't teaching math or reading in grades 3-8? Are we just going to eliminate the other subjects? Or are only elementary and middle school teachers going to be on the hook?
6. Perverse incentives: a whole laundry list, not least of which is that teachers and principals are going to have every reason in the world to do nothing but testing drills all year, not to mention outright cheating.
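For the statistically inclined, points 1 and 2 above are easy to see in a toy simulation. Every parameter here is invented (the spread of "true" teacher effects, the amount of estimation noise); the point is only that when the noise in a single year's estimate rivals the real differences between teachers, the ratings churn.

```python
# Toy simulation of imprecision and instability in yearly teacher ratings.
# All parameters are invented for illustration.
import random

random.seed(1)

N_TEACHERS = 100
TRUE_SD = 2.0    # assumed spread of real teacher effects
NOISE_SD = 3.0   # assumed estimation error in any single year

true_effects = [random.gauss(0, TRUE_SD) for _ in range(N_TEACHERS)]

def yearly_rating(effect):
    # One year's noisy estimate of a teacher's true effect.
    return effect + random.gauss(0, NOISE_SD)

def top_quartile(ratings):
    # Indices of the 25 highest-rated teachers.
    cutoff = sorted(ratings, reverse=True)[N_TEACHERS // 4 - 1]
    return {i for i, r in enumerate(ratings) if r >= cutoff}

year1 = [yearly_rating(e) for e in true_effects]
year2 = [yearly_rating(e) for e in true_effects]

overlap = len(top_quartile(year1) & top_quartile(year2))
print(f"{overlap} of 25 'top' teachers kept their top rating the next year")
```

Run it and far fewer than 25 of the "top" teachers stay on top the following year, even though nothing about their actual teaching changed between the two years.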
Good statistical models for assessing student and teacher performance are very desirable, but what worries me most is that they are being used for purposes for which they were never intended. They're designed to determine general, aggregate influences, not to label a specific teacher "good" or "bad." Policymakers and neoliberal activists are misusing these tools. The research on using data to improve student performance states clearly that standardized tests should be just one of a range of assessment tools - not the final word. Value-added models aren't being used to identify effective teaching practices; rather, they are being used to identify scapegoats, to lay blame on individual "bad" teachers and to fire them. More on the whole "bad teacher" obsession of neoliberals next time.
Great assessment of the situation. I've bookmarked those links you posted so I can read more later. My state is going to this model, and one concern you didn't raise was how one determines the value-added score for those educators teaching subjects where standardized tests aren't utilized (the arts, phys ed, etc.). No one, to my knowledge, has been willing to take on that, so to speak, elephant in the room. I mean, that's a lot of educators that have been left out of the equation. By RageyOne, at 1:38 PM
Your last few lines are spot-on, in my opinion. There are too many things that need to be considered before one states that a value-added model is The way to go to determine if a teacher is being effective.
Thanks Ragey. I tried to mention the problem of tests only being in math and reading in the text, but there were so many other points I wanted to hit that I might not have been clear enough. It's a huge problem! I mean, how in the world are we supposed to come up with a good metric for evaluating art classes, for pete's sake? And there will be tremendous pressure to shift the focus within disciplines to topics that lend themselves to standardized testing. So for history we'll have lots of memorization of dates and the like. It's a total nightmare. By Arbitrista, at 9:03 AM
I agree, a total nightmare. I'm in awe of some of the decisions being made in my state. I'm totally intrigued, and in a bit of disbelief, as to how the powers-that-be plan on pulling off providing a value-added score to those in non-tested grades & subjects.By RageyOne, at 8:30 PM