The New York Times has a symposium in which eight contributors debate the value of tests in evaluating teacher effectiveness.
First up, the panelists who favor some use of value-added tests.
Lance Izumi on the plight of parents who can’t act on the data: “If, based on teacher assessment results, parents learn that their child’s teacher is lousy, then what’s their option? Most kids will continue to be stuck in assigned classrooms regardless of their teacher’s performance. Waiting for a teacher to be remediated or fired could take years, by which time their child’s learning could be derailed.”
Marcus Winters says we should use value-added tests, though cautiously: “It would be irresponsible to use only value-added analysis to evaluate teachers. Nonetheless, imperfect value-added assessment is surely an improvement upon the current system, which makes no meaningful attempt to differentiate teachers by their effectiveness.”
Vern Williams says that value-added tests must be used in context: “One student with severe emotional issues can change the entire social and academic character of a classroom. Such situations are rarely if ever explained when value-added results are reported. These results should therefore be used carefully as part of a teacher’s evaluation when appropriate.” He also mentions using school-wide measures, which I think might be appropriate.
Kevin Carey says, “Value-added results should be interpreted carefully, in light of statistical margins of error. But perfection can’t be the enemy of the good, and annual testing is here to stay.” He adds that teachers should also be judged by peer evaluations and “more rigorous classroom observations,” which sounds about right.
Amy Wilkins sums up the logic of value-added testing this way: “Instead of relying on a single end-of-the-year test score, it examines growth over the course of a school year. So even when a student enters a classroom far below grade level, if that student makes big learning gains, the teacher gets credit for those gains. In fact, she gets far more credit for that student than for one who started the year a little above average but ended in the same place.” She concludes, “No one is suggesting that ‘value-added’ measures be the sole criteria of teacher reviews,” and points out that in Los Angeles, teachers expressed “frustration” that they aren’t being given this information.
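To make the logic Wilkins describes concrete, here is a minimal sketch in Python. It assumes the simplest possible stand-in for value-added, a raw gain score (end-of-year score minus start-of-year score). The models actually used by districts are statistical and more involved than this, and the function names and scores below are invented purely for illustration.

```python
from statistics import mean

def student_gain(start_score, end_score):
    """Growth over the school year: end-of-year score minus start-of-year score."""
    return end_score - start_score

def teacher_average_gain(students):
    """Average gain across a teacher's students, given (start, end) score pairs."""
    return mean(student_gain(start, end) for start, end in students)

# Hypothetical scores on a 0-100 scale, invented for illustration only.
far_below_grade_level = (30, 55)   # starts low, makes a big gain
a_little_above_average = (60, 60)  # starts higher, ends where she started

classroom = [far_below_grade_level, a_little_above_average]
for start, end in classroom:
    print(f"start={start} end={end} gain={student_gain(start, end)}")
print("average gain for this teacher:", teacher_average_gain(classroom))
```

Under this simplified measure, the credit comes from the first student's growth, not from where either student finished, which is the point Wilkins is making.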
Now, those who oppose their use.
Linda Darling-Hammond says, “studies repeatedly show that these measures are highly unstable for individual teachers,” which of course is a serious methodological problem that calls into question the validity of such tests. She decries “evaluating and rewarding teachers primarily on the basis of state test score gains,” a proposition I don’t see advocated by anyone in that forum. She prefers “the career ladder evaluations in Denver and Rochester, the Teacher Advancement Program and the rigorous performance assessments used for National Board Certification, all of which link evidence of student learning to what teachers do in teaching curriculum to specific students.”
Jesse Rothstein prefers “more frequent visits from trained evaluators and master teachers,” which he says “will require substantial additional resources.” He points out that student gains can fade over time. While he casts this as an argument against value-added testing, I think it points to the importance of having good teachers throughout a student’s career.
Diane Ravitch, the current darling of the education establishment for her about-face on school choice (she was for it before she was against it), says, “There is no technocratic fix for the problems of American education,” apparently thinking that value-added tests qualify as a “technocratic” fix. No, I think they are an attempt to add another dimension to the evaluation process. She also mentions problems of cross-time validity (something to consider) and the possibility that tests will narrow the curriculum.
Summary: It’s really a mess, isn’t it? Reading these articles takes me back to my beginning classes in graduate school. Validity and reliability are key concepts for any researcher. Validity, roughly speaking, means whether your measurements measure what you want them to measure. Reliability, on the other hand, asks whether you get the same result when you take a second measurement, given that nothing has changed.
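One common way to put a number on reliability is to measure the same thing twice and see how closely the two sets of results agree. The sketch below does that with a Pearson correlation, written out by hand in Python. The teacher ratings are invented for illustration and are not drawn from any study mentioned by the panelists; the `pearson` helper is my own and assumes each list contains some variation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical ratings for the same five teachers in two consecutive years.
year_one = [12.0, 5.0, -3.0, 8.0, 0.0]
year_two = [4.0, 9.0, 1.0, -2.0, 6.0]

print("year-to-year correlation:", round(pearson(year_one, year_two), 2))
```

A value near 1 would mean teachers keep roughly the same standing from one year to the next; a value near 0 would mean this year's rating tells you little about next year's, which is the kind of instability Darling-Hammond is worried about.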
These are serious questions that need to be addressed. But as one of the panelists said, we should not let the pursuit of the perfect be the enemy of the good. And what we have now is, too often, “not good.” As a scientist, I may want to see another ten years’ worth of research into this matter. As a human being who knows that thousands if not millions of children suffer from poor teachers, I am saddened to think that their futures will be compromised as we seek the “perfect” means of evaluating teachers.