The Answer Sheet: ‘Well I’ll Be VAMned!’ Why Using Student Test Scores to Evaluate Teachers is a Sham
New York Gov. Andrew Cuomo. (EPA/Jason Szenes)
If by now you don’t know what VAM is, you should. It’s shorthand for value-added modeling (or value-added measurement), developed by economists as a way to determine how much “value” a teacher brings to a student’s standardized test score. These formulas are said by supporters to be able to factor out things such as a student’s intelligence, whether the student is hungry, sick or is subject to violence at home, or any other factor that could affect performance on a test beyond the teacher’s input. But assessment experts say that such formulas can’t really do that accurately and reliably. In fact, the American Statistical Association issued a report in 2014 on VAM and said: “VAMs are generally based on standardized test scores and do not directly measure potential teacher contributions toward other student outcomes.”
Still, the method has been adopted as part of teacher evaluations in most states, with support from the Obama administration, and used for high-stakes decisions about teachers’ jobs and pay. “Growth” scores also rely on student test scores to evaluate teachers but don’t control for outside factors.
Use of student test scores to evaluate teachers has created some situations in schools that are, simply, ridiculous. In New York City, for example, an art teacher explained in this post how he was evaluated on math standardized test scores and saw his evaluation rating drop from “effective” to “developing.” Why was an art teacher evaluated on math scores? There are only tests for math and literacy, so all teachers are in some way linked to the scores of those exams. (Really.) In Indian River County, Fla., an English Language Arts middle school teacher named Luke Flynt learned that his highest-scoring students hurt his evaluation because of the peculiarities of how he and his colleagues are assessed. (You can read about that here.)
Here’s a piece by educator Carol Burris showing, with data, how using “growth” scores to evaluate teachers in New York is something of a sham. Burris just retired after 15 years as principal of South Side High School in the Rockville Centre School District in New York. She was named New York’s 2013 High School Principal of the Year by the School Administrators Association of New York and the National Association of Secondary School Principals, and was tapped as the 2010 New York State Outstanding Educator by the School Administrators Association of New York State. She retired early, she said, to advocate for public education in new ways.
By Carol Burris
Well I’ll be VAMned! Using growth scores to evaluate teachers is producing miracles in the state of New York!
Why, just look at teacher scores from Rochester, N.Y. In 2013, 26 percent of Rochester teachers got “ineffective” growth scores based on student performance on the state tests. Just one year later, that figure dropped to 4 percent! In the same year, the percentage of teachers in Yonkers who got “ineffective” growth scores fell from 18 percent to 5 percent.
But wait. Where did all the “ineffective” teachers go? Looks like they are now hiding out in some of New York’s highest performing suburban districts. It has to be. The percentage of teachers with “ineffective” test growth scores in Scarsdale went from 0 percent to 19 percent, and it jumped from 0 percent to 13 percent in Roslyn on Long Island. Those “ineffectives” have even slipped into Jericho, New York, where 8 percent of those teachers are now “ineffective,” even though 81 percent of Jericho students were proficient on the Common Core math tests, far exceeding the state average of 35 percent.
Take that, white suburban moms. Remember when Education Secretary Arne Duncan told you that your schools aren’t quite as good as you thought?
My tongue-in-cheek account of New York teacher growth scores is not intended to imply that suburban teachers are better than urban teachers, nor that the reverse is true. And it is certainly not an argument that the teacher growth scores in 2013 were right, and the growth scores in 2014 were wrong. The above examples demonstrate the silliness of a system that produces such wild swings in ratings over the course of a year.
The New York “growth score” system (a modified VAM) is a closed model that sets each teacher against the rest. By design, it will produce about the same number of “ineffective” and “highly effective” teachers every year. All of the test scores in the state could dramatically improve, and there would still be the same percentage of “ineffective” teachers. And all of the state scores could precipitously fall and there would still be roughly the same percentage of “highly effective” teachers. The above examples simply illustrate how the deck chairs on the Titanic shift.
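To see why a closed, rank-based model behaves this way, consider a minimal sketch in Python. It is not the New York State Education Department's actual formula: the function names, the simulated scores, and the 10 percent cutoffs at each end are all assumptions made up for illustration. The only thing it shares with the system described above is that each teacher is rated by position relative to everyone else.

```python
import random

def rate_by_rank(scores, bottom_pct=10, top_pct=10):
    """Label teachers by rank among all teachers.

    The cutoffs are hypothetical, chosen only for illustration.
    """
    n = len(scores)
    # Indices of teachers ordered from lowest to highest score.
    order = sorted(range(n), key=lambda i: scores[i])
    n_bottom = n * bottom_pct // 100
    n_top = n * top_pct // 100
    labels = ["middle"] * n
    for i in order[:n_bottom]:
        labels[i] = "ineffective"
    for i in order[n - n_top:]:
        labels[i] = "highly effective"
    return labels

def share(labels, label):
    """Fraction of teachers carrying the given label."""
    return labels.count(label) / len(labels)

# Simulated (made-up) growth scores for 1,000 teachers.
random.seed(0)
baseline = [random.gauss(50, 10) for _ in range(1000)]
improved = [s + 20 for s in baseline]  # every score rises by 20 points
declined = [s - 20 for s in baseline]  # every score falls by 20 points

for name, scores in [("baseline", baseline),
                     ("improved", improved),
                     ("declined", declined)]:
    labels = rate_by_rank(scores)
    print(name,
          share(labels, "ineffective"),
          share(labels, "highly effective"))
```

All three lines print the same shares. Because the labels depend only on rank, raising or lowering every score changes nothing about how many teachers land in each category.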
It is an impossible mission to create a valid and fair formula by which to rate teachers using student test scores. In both 2013 and 2014, the New York State Education Department included 20 different variables in the growth model, variables meant to account for factors such as poverty, special education and English language learning status. Should other factors be included, or should some be excluded? Are the included variables the right ones? Are there too many or too few? How heavily should one put one’s thumb on the scale to account for differences among students when rating their teachers? No one really knows.
Let’s look at more outcomes. Should we believe that there are, as a percentage, more than twice as many “ineffective” teachers in high-scoring Nassau and Westchester County schools as there are in New York City? There are also disparities in scores of “highly effective.” In 2014, 10 percent of the teachers in Brooklyn (Kings County) got “highly effective” state-generated growth scores. Not one teacher in Roslyn or Scarsdale did. Yet one year prior, 13 percent of Scarsdale teachers got “highly effective” scores from NYSED. Did they all stop doing their job?
This would be amusing except for the fact that there are real life consequences for teachers and principals, and thanks to Gov. Andrew Cuomo and the legislature, those consequences are now far worse. The recently passed APPR legislation gives test scores equal weight with observations in teacher and principal evaluations. Soon all teachers with “Ineffective” growth scores will be on an improvement plan or on the road to termination. If you are untenured, and you receive an “Ineffective” growth score in even one of your four probationary years, you cannot receive tenure. And that happens no matter how highly regarded you are by parents, students, colleagues or your boss.
This system will tear at the moral fabric of New York public schools. Teachers, principals and superintendents will struggle as they choose between making sensible day-to-day decisions in the best interest of children and avoiding the negative consequences of VAM.
Think about a veteran fourth-grade teacher in Rochester who just received an “ineffective” score. She knows that her yearly social studies project builds her students’ creative talents. She has watched learners become more skilled and confident presenters, especially the English language learners in her class. Parents tell her how excited their children are to come to school.
What does she do? Does she abandon it to make time for Common Core worksheets? Are hands-on activities replaced by reading informational text?
What do principals in places such as Scarsdale, Jericho and Roslyn do? Do they tell parents that all of the enrichment programs in the arts that their communities treasure need to go to make time for double math and ELA in an attempt to pump up already high scores? Do they abandon their high quality curriculum and replace it with modules from Engage New York?
And what of superintendents? Do they work with principals to move teachers, in whom they have every confidence, to non-tested grades in order to protect them from another possible “Ineffective” growth score? Do they encourage principals to manipulate class rosters to give that teacher the best chance?
Who will want to be a school leader when tenure, which is an invaluable tool for making difficult and often unpopular decisions, depends on a capricious growth score over which a principal has no real control? Will principals force teachers to engage in test-prep? Will they manipulate teacher grade level placements in the hope of improving their own score? Will they need to waste their time creating improvement plans for teachers who do not need them, resulting in less time for students and resentment among their faculty?
In case readers believe that the above questions are little more than fear-mongering and negative predictions, I will tell you that I have heard some New York educators actively discuss, consider and even engage in the above since the law became even worse.
Even if those growth scores were remotely accurate, you would not want to use them to evaluate teachers because of the inevitable consequences and changes in adult behavior that will result. And kids will always pay the price.
At the end of the legislative session, in what appeared to be a feeble attempt to undo some of the damage done by once again fiddling with teacher evaluation, the legislature inserted additional language into the budget bill. It requires that the state-provided growth score model “take into consideration” student characteristics such as poverty, disability, English language learning status and prior achievement. It is clear that the legislature and governor had no idea that all of these factors have been in the New York growth score model for years.
And yet, without even the slightest understanding of how VAM works, they have elevated its impact to 50 percent.
Perhaps they, or their assistants, might read this blog. And then they can explain to parents how any of this will improve their public schools and help their children learn.
This blog post has been shared by permission from the author.
The views expressed by the blogger are not necessarily those of NEPC.