Principals continue to rate nearly all teachers as “effective,” despite states’ efforts in recent years to make evaluations tougher, two new studies show.
And there’s good evidence that those scores are inflated: When principals are asked their opinions of teachers in confidence and with no stakes attached, they’re much more likely to give harsh ratings, the researchers found.
That’s in part because principals want to maintain good relationships with their teachers, which can be tough to do when they have to confront them with bad reviews, the researchers say. For some principals, though, the hesitation to give low scores is a product of being strapped for time.
“It’s very, very time-consuming to document poor performance,” said Marilyn Boerke, a former principal who is the director of talent development for the Camas school district in Washington state. “At the end of the year, if you haven’t repeatedly gone into the classroom and given the teacher suggestions for improvements, it’s not really fair to give a poor evaluation.”
In 2009, TNTP (formerly the New Teacher Project) published a striking report, “The Widget Effect,” which found that less than 1 percent of teachers were being rated as unsatisfactory. Since then, many states have worked to put more-rigorous evaluation systems in place, including by incorporating student test scores.
But according to the pair of new studies, little has changed. On formal district evaluations, nearly all teachers continue to be deemed effective.
“We’ve invested a lot in making these systems rigorous, and yet they still seem to identify the vast majority of teachers as effective, especially when you look at the observation ratings from principals,” said Jason Grissom, an associate professor of public policy and education at Vanderbilt University, who co-authored one of the studies with Susanna Loeb, an education professor at Stanford University.
‘Somebody’s Job Is in Your Hands’
That study, published recently in the journal Education Finance and Policy, analyzed how 100 principals from Miami-Dade County public schools rated the same teachers in two different settings: a confidential one-on-one with the researchers and the formal district evaluation.
On district evaluations, which could have consequences for compensation and employment, nearly every teacher was rated as “effective” or “very effective” on all the standards measured. In the confidential setting, the scores were still positive overall, but principals were much more likely to give low ratings.
In fact, the teachers who received scores of “very ineffective” on the low-stakes assessment were, on average, deemed “effective” on the high-stakes evaluation.
“The stakes here are really important,” said Grissom. “When they talk to the researchers, there are no stakes attached: we’re not going to do anything, it doesn’t count for anything.” It makes sense a principal would in that case give “a true assessment,” he said.
The tendency to be more lenient on a district evaluation is understandable, said Jennifer E. Nauman, the principal at Shields Elementary School in Lewes, Del. “Somebody’s job is in your hands,” she said. “The rubric is very subjective.”
Another study, to be published soon in Educational Researcher, also found a disconnect between what principals said about their teachers privately and in a formal review.
The researchers, Matthew Kraft, an assistant professor of education and economics at Brown University, and Allison Gilmour, now an assistant professor of special education at Temple University, surveyed more than 200 principals in a large urban district in the Northeast. Again, evaluators identified far more teachers as weak in a confidential survey than they did on the formal district evaluations.
For instance, the 2014-15 data show that evaluators perceived 19 percent of teachers as below proficient, but they rated only about 6 percent of teachers that way on the district assessment.
Kraft and Gilmour’s study also looked broadly at teacher ratings in 24 states that had overhauled their evaluation systems.
Nearly all teachers in most of those states continued to get positive ratings. Hawaii was the least likely to designate teachers as ineffective or needing improvement.
But New Mexico was an outlier. There, about 1 in 4 teachers were rated as either minimally effective or ineffective, the state’s two lowest categories.
While nearly every other state had less than 1 percent of teachers in the ineffective category, New Mexico had 5 percent in that lowest designation.
But how long New Mexico retains its outlier status remains to be seen. Teachers there have fiercely pushed back on the stringent evaluation policies, which have been dubbed the toughest in the country. And the governor recently announced the state would be making major changes to the system.
A Matter of Time
So what’s behind these almost universally high ratings from principals? Some say it’s the need for positive relationships with their staffs.
With the district evaluations, “teachers know what the rating is,” Grissom said. “In many systems, that involves a postconference. If I gave you low ratings, that would be very uncomfortable for me to talk to you about. … We have to take seriously the fact that teacher evaluation is a relational enterprise.”
In interviews for the Kraft and Gilmour study, principals talked about personal discomfort as well. One veteran principal is quoted in the report as saying, “The most difficult part of the job is probably to deliver those difficult messages, and not everyone is capable of that.”
But other principals not involved in the studies push back on that notion.
“Those are challenging conversations, and you don’t want to hurt someone’s feelings,” said Boerke. “But the principals I know do not shy away from those conversations.”
Dwayne Young, who was an administrator in Fairfax County, Va., for 17 years before recently retiring, said giving honest feedback isn’t hard for administrators, but assessing the complex process of teaching can be.
“Principals do strive to have great relationships,” he said. “But I don’t think they would not evaluate someone according to what they believe to be really good instruction.”
Concerns about teacher turnover can also lead to high ratings, some say.
“It would be a rational response for a principal to think, if I give this person a low score, they might get angry and leave my school,” said Grissom, “or they might be dismissed, and then I have to replace this person, and I might be facing a hiring pool that doesn’t look appreciatively better than the teacher who would leave.”
Among the largest factors, though, many say, is time.
“We’re spread so thin as administrators,” said Boerke of the Camas school district. “When all’s said and done and it’s June and you’re responsible for submitting 32 evaluations, you’d err on the side of effective if you don’t have the documentation to prove ineffective.”
Interestingly, a closer look at the scores given in the high-stakes evaluations showed that principals actually were differentiating between teachers. They were just doing so within the “effective” categories.
Even though nearly all teachers got 3s and 4s (on a 4-point scale), which labeled them “effective,” the 3s seemed to be going to the weaker teachers, Grissom and Loeb found. And teachers with the lower evaluation scores also had lower value-added measures, which aim to determine how well a teacher is doing using student test scores.
“There is a difference between a teacher rating of effective and highly effective,” said Grissom. “It’s just not the level of differentiation that when these systems rolled out people thought they would see.”