Are Teacher Evaluations Education ‘Reform’s’ Biggest Bust?

Jeff Bryant

Would you like your job performance judged by a 5-year-old?

That’s a relevant question for public school teachers in Hawaii, where the state’s new teacher evaluation system bases 10 percent of their job performance rating on what children as young as 5 years old think.

Although 10 percent may not seem like a whole lot, in a metric-based evaluation system where harsh judgments of “effective” versus “needs improvement” can swing either way based on a point or two, 10 percent can be 100 percent of the reason for a bad grade.

But the child’s portion is not the sole problem Hawaiian teachers are having with their new evaluation system, which will ultimately affect their pay and can subject them to penalties as severe as termination.

As the article cited above reported, a recent survey conducted jointly by the state Department of Education and the teachers’ union found that “as many as four in five” teachers responding to the survey have problems with the new evaluations, ranging from “confusion … to skepticism about its fairness”.

Hawaii isn’t the only state having problems with new teacher evaluation systems that are being rolled out across the nation at the encouragement – others would contend, coercion – of the federal government.

According to Education Week, at least a dozen states have asked the U.S. Department of Education to allow them delays in rolling out new teacher evaluation systems. Two of those states, Maryland and North Carolina, received Race to the Top grants that committed them to erecting new teacher evaluation systems. The rest of the states pledged to implement new evaluation systems in order to receive waivers from the federal No Child Left Behind law.

In states that have had more success at implementing new teacher evaluations, the results have been decidedly underwhelming.

As Education Week reported last year, “In Michigan, 98 percent of teachers were rated effective or better under new teacher-evaluation systems recently put in place. In Florida, 97 percent of teachers were deemed effective or better. Principals in Tennessee judged 98 percent of teachers to be ‘at expectations’ or better last school year, while evaluators in Georgia gave good reviews to 94 percent of teachers taking part in a pilot evaluation program.”

Indiana’s new evaluation program found, “88 percent of teachers and administrators were rated as either effective or highly effective under the system; only about 2 percent need improvement, and less than a half a percent were deemed ineffective.”

In many of these states where supposedly under-performing teachers have been spotted, there are numerous anecdotes that the labeling has been either highly questionable or blatantly mistaken. Teachers in Florida, for instance, have their performance rated using the test scores of students they’ve never even taught. Really!

More recently, in Washington state, the stakes over new teacher evaluations ratcheted up further when legislators refused to commit the state to new evaluations, which could result in the state losing control over roughly $38 million in federal funds for schools serving low-income students.

And just this week, a key underpinning to the whole teacher evaluation program pushed by the Obama administration was cast into doubt. As Education Week’s Stephen Sawchuk again reported, the American Statistical Association, “the world’s largest community of statisticians,” examined the practice of basing teachers’ performance evaluations on students’ standardized test scores – a key criterion for getting Race to the Top money or an NCLB waiver – and warned against this approach.

All this controversy over teacher evaluations could be just a grand debate among academicians and policy wonks if it weren’t for the fact that these new schemes are doing great harm to teachers and, consequently, the students in their charge.

Further, results from these questionable measures are being used to construct all sorts of new mandates, building an even more imposing policy edifice on a foundation of sand. In the meantime, the only response from those in charge has been to “stay the course.”

How We Got Here

The idea of basing the fate of teachers, even whole schools, on how students perform on standardized tests started as a theory cooked up by what University of Texas education professor Julian Vasquez-Heilig has called “a motley alliance” between civil rights proponents and politically influential foundations and organizations who advocate for more private sector control of public education.

Civil rights proponents have long maintained, correctly, that schools that serve the most disadvantaged children have tended to have the least qualified teachers, as measured by preparation, experience, and other factors. So in the interest of “equity,” these advocates insist that in order for states to receive grant money from big ED, or have certain federal laws waived, they must impose teacher evaluation systems that use standardized test scores to evaluate teacher “effectiveness” and then distribute the “most effective” teachers more widely to underserved schools.

The goal of privatization proponents, on the other hand, is to reduce the role of the state in public education and shift the education system toward being a profit-making enterprise. Labeling teachers “effective” or “ineffective” serves their purposes because when they can define the act of teaching as an “output” – a rating based on student test scores – they can use that as leverage to strip teachers’ unions of their influence and argue for giving pay raises and merit rewards to smaller pools of only the most “effective” teachers.

Any attempt to ease the impractical and often erroneous mandate to evaluate teachers based on student test scores is met with strong opposition from these two seemingly incompatible parties.

With the backing of what appears to be a “bipartisan” constituency, lawmakers and policy wonks line up to support these evaluation systems, even though the technical aspects are largely unresolved.

Reducing Teaching To A Math Problem

Reflecting on the new ASA study referenced above, education journalist Valerie Strauss wrote on her blog at The Washington Post that current methods of evaluating teachers “purport to be able to take student standardized test scores and measure the ‘value’ a teacher adds to student learning through complicated formulas,” but “these formulas can’t actually do this with sufficient reliability and validity.”

As Rutgers professor Bruce Baker has stated on his School Finance 101 blog, “different choices of statistical model or method for estimating teacher ‘effect’ on test score growth … even subtle changes … can significantly change individual teacher’s ratings and significantly reshuffle teachers across rating categories.”

Does that sound like a valid and reliable evaluation to you?

Yet in the meantime, politicians and policy makers are advocating for these evaluations – measures that they by and large do not understand, as Kevin Welner of the National Education Policy Center recently observed. Writing on the blog site of education historian Diane Ravitch, Welner explained, “The math is just too complex … vectors capturing the effect of lagged scores, mathematical descriptions of Bayesian estimates, and within-student covariance matrices … has the obvious effect of placing policy makers at the mercy of whichever experts they choose to listen to.”

Welner concluded, “We should, at the very least, recognize and acknowledge the reality that these policies are being adopted by policy makers who pretty much have no clue what it is that they’re putting in place.”

Consequently, the results on the ground rarely resemble the neat and clean explanations given to lawmakers by the “experts.” In Connecticut, for instance, the new teacher evaluation system has resulted in something “conceptually appealing,” according to David Title, a superintendent of schools in that state, but “very difficult to do technically … There are so many different variables that impact student achievement … What you’re not able to do, in my view, is prove cause and effect.”

The Impact On Teachers

With so little understanding of the technical difficulties with teacher evaluations, it’s even more important to consider, as Welner argued, “the non-technical evidence” of teacher evaluations.

Indeed many of the non-technical results of new teacher evaluations should make their use questionable to anyone with an ounce of common sense.

In Hawaii, teachers are understandably concerned about whether “children as young as 5 years old who evaluate them will put ‘thought and effort into their answers.’” The data input for the system “obliges teachers to prioritize testing over more constructive forms of teaching,” according to one veteran teacher. And more of the time that teachers once spent on instruction has been diverted to filling out paperwork.

In Connecticut, school principals have to complete 17 reports throughout the year for every single teacher in their school, and teachers have to spend hours completing evaluation goals, student learning targets, and observation records – all of which take time from educating students.

In Rochester, N.Y., teachers are suing the state because the new evaluations don’t take into account any student factors such as poverty levels and number of absences.

New Jersey teacher-blogger Jersey Jazzman recently observed that the proposed evaluation system for his state “relies on Student Growth Percentiles (SGPs), based on standardized tests, to evaluate teachers. Yet the very man who is the inventor of SGPs has said that they cannot be used to determine a teacher’s effect on student learning!” (emphasis original)

As Diane Ravitch observed in one of her numerous blog posts about faulty teacher evaluations, “My government spent billions to find teachers to fire, and all we got was confusion.”

The Benefits Of All This?

Meanwhile, no one advocating for test-based teacher evaluations can be satisfied.

Those who advocate for teacher evaluations that result in distribution of more high-quality teachers to schools serving low-income minority students don’t have much to celebrate. Tennessee, for instance, has had a teacher evaluation system based on student test scores in place longer than most other states. When that state outpaced the rest of the nation in gains on the most recent National Assessment of Educational Progress, some credited it to the state’s teacher evaluation system, among other “reforms.”

However, astute blogger Audrey Amrein-Beardsley, who devotes her site to the issue of teacher evaluations, looked at Tennessee’s accomplishment more closely and found the state has an expanding achievement gap. “The state’s lowest socioeconomic students continue to perform poorly on the test.” Tennessee didn’t make gains significantly different from many other states. And “other states with similar accountability instruments and policies (e.g., Colorado, Louisiana) did not make similar gains, while states without such instruments and policies (e.g., Kentucky, Iowa, Washington) did make similar gains.”

Teachers in Tennessee, in fact, have filed two lawsuits against the state for its unfair evaluation scheme.

For those who want to see a greater role for the private sector in public education, the results are frustrating, too. Writing at the conservative journal National Affairs, Frederick Hess of the American Enterprise Institute recently blamed this colossal policy failure on “getting public employees to actually do what policymakers think they’ve told them to do.”

Insisting, “the right response to these disappointing trends is certainly not to abandon the reform agenda,” Hess declared a need for “a complete reform movement” that is “willing and able to rethink old norms,” given his observation that “schools and districts do not go out of business.”

Certainly, one of those “old norms” is that the quality of teachers matters a great deal to our students. But it makes no sense to believe we’ll get better quality teachers by treating them this badly.

Even a 5-year-old could see that.
