The perils and promise of value-added measures of teacher quality

One “final frontier” in school policy is this: Can we determine which teachers are better than others, and if so, how do we do that and what do we do going forward?

The Urban Institute has recently published a paper by Eric A. Hanushek and Steven G. Rivkin (Using Value-Added Measures of Teacher Quality) that describes the perils and promises of such an approach. (In brief, the idea is to test students at the beginning of the year, test them again at the end of the year, and see how much they’ve learned. That is, don’t judge a teacher by the absolute test scores of his or her students, but look instead at how much the students learned over the year. Ideally, do this for several years.) What follows is not quite a summary, but it is my attempt to rephrase in “normal” (non-academic) English some of the key points of the paper.

What we know already

We know two things from the literature on student achievement:

One. Not all teachers are equally effective. (Effectiveness is defined as “value added to student achievement or future academic attainment or earnings.”)

Two. When deciding whether to hire a teacher or how much to pay one, schools look at the number of postgraduate credits, years of experience, and scores on licensing examinations. Yet these items “explain little of the variation in teacher quality.” The one exception is that teachers with very little experience are less effective than those with some.

Getting technical

Researchers typically use an “education production function” to evaluate the effects of a teacher. The function is a mathematical formula (roughly, a regression analysis) that predicts a student’s achievement in year 2 from the student’s achievement in year 1 plus a number of other variables. The variables include:

  • School and peer factors
  • Family and neighborhood factors
  • Other variables we don’t know
  • The effects of the teacher
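To make the idea concrete, here is a minimal sketch of what fitting such a production function might look like. This is my own illustration, not code from the paper: the data are simulated, the variable names and effect sizes are invented, and the teacher effects are estimated with a plain least-squares fit using dummy variables for teachers.

```python
# Sketch of a simple "education production function" fit with NumPy.
# All names and numbers are illustrative assumptions, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

n_students, n_teachers = 600, 6
teacher = rng.integers(0, n_teachers, n_students)     # teacher assignment
true_effect = np.linspace(-0.3, 0.3, n_teachers)      # hypothetical teacher effects
score_y1 = rng.normal(0, 1, n_students)               # year-1 achievement
family = rng.normal(0, 1, n_students)                 # family/neighborhood proxy

# Year-2 score = persistence of year-1 score + family factor
#                + teacher effect + unobserved noise
score_y2 = (0.7 * score_y1 + 0.2 * family
            + true_effect[teacher]
            + rng.normal(0, 0.5, n_students))

# Design matrix: prior score, family proxy, and one dummy column per teacher
dummies = np.eye(n_teachers)[teacher]
X = np.column_stack([score_y1, family, dummies])
coef, *_ = np.linalg.lstsq(X, score_y2, rcond=None)

estimated_effects = coef[2:]   # per-teacher value-added estimates
```

With enough students per teacher, the estimated effects land close to the true ones; with small classes or few years of data, the same estimates get noisy, which is one of the “perils” the paper discusses.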

How much difference can having an effective teacher make?

Research has found that there’s more variation in teacher effectiveness for math than there is for reading.

Here’s one way of stating the power of teacher effectiveness:

Moving a student from a teacher at the 25th percentile of quality rankings to one at the 75th percentile would raise that student from the 50th percentile in math performance to the 59th.

To put that number in perspective, that gain would eliminate at least 20 percent of the black-white achievement gap. It would also be equivalent to reducing class size by 10 students.
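As a back-of-envelope check (my own arithmetic, not the paper’s), we can ask how big that gain is in standard-deviation terms. Assuming test scores are roughly normally distributed, moving from the 50th to the 59th percentile corresponds to a gain of about 0.23 standard deviations:

```python
# Back-of-envelope: how many standard deviations is a move from the
# 50th to the 59th percentile, assuming normally distributed scores?
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def norm_ppf(p, lo=-10.0, hi=10.0):
    """Invert the CDF by bisection (adequate for a sanity check)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

gain_sd = norm_ppf(0.59) - norm_ppf(0.50)
print(round(gain_sd, 2))  # → 0.23
```

That 0.23-standard-deviation figure is the common currency in which education researchers compare interventions, which is why a single teacher’s effect can be stacked up against things like class-size reductions.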

Eliminating the bottom 6 to 10 percent of teachers could “have strong impacts on student achievement, even if these teachers were replaced permanently by just average teachers.”

What role should value-added measures play in personnel policies?

So why don’t we use value-added measurements of teacher effectiveness in personnel decisions? There are “concerns about accuracy, fairness, and potential adverse effects of incentives based on limited outcomes” [e.g., “teaching to the test”]. But that doesn’t mean we should abandon all uses of such measurements.

The last paragraph is worth quoting in full:

All in all, cataloging the potential imperfections of a value-added measurement is simple, but so is cataloging the imperfections of the current system with limited performance incentives and inadequate evaluations of teachers and administrators. Potential problems certainly suggest that statistical estimates of quality based on student achievement in reading and mathematics should not constitute the sole component of any evaluation system. Even so, the key policy question is whether value-added measures, despite shortcomings, can provide valuable information to improve personnel decisions that currently rely on limited information about teacher effectiveness and often provide weak performance incentives to teachers and administrators. The case for objective measures is likely strongest in urban or rural areas where there is more limited competition among public and private schools. In such places, a hybrid approach to evaluation in which value-added measures constitute one of several components may have great promise.

[emphasis added]
