Assessing Ourselves to Death
I have two points to make. The first is something that I think everyone knows: Educational outcomes, such as graduation and test scores, are signals of or proxies for the traits that lead to success in life, not the cause of that success.
For example, it is well-documented that high school graduates earn more, on average, than non-graduates. Thus, one often hears arguments that increasing graduation rates will drastically improve students’ future prospects, and the performance of the economy overall. Well, not exactly.
The piece of paper, of course, only goes so far. Rather, the benefits of graduation arise because graduates are more likely to possess the skills – including the critical non-cognitive sort – that make people good employees (and, on a highly related note, because employers know that, and use credentials to screen applicants).
We could very easily increase the graduation rate by easing requirements, but this wouldn’t do much to help kids advance in the labor market. They might get a few more calls for interviews, but over the long haul, they’d still be at a tremendous disadvantage if they lacked the required skills and work habits.
Moreover, employers would quickly catch on, and adjust course accordingly. They’d stop relying as much on high school graduation to screen potential workers. This would not only deflate the economic value of a diploma, but high school completion would also become a less useful measure for policymakers and researchers.
This is, of course, one of the well-known risks of a high-stakes focus on metrics such as test scores. Test-based accountability presumes that tests can account for ability. We all know about what is sometimes called “Campbell’s Law,” and we’ve all heard the warnings and complaints about so-called “teaching to the test.” Some people take these arguments too far, while others are too casually dismissive. In general, though, the public (if not all policymakers) has a sense that test-based accountability can be a good thing so long as it is done correctly and doesn’t go too far.
Now, here’s my second point: I’m afraid we’ve gone too far.
I am not personally opposed to a healthy dose of test-based accountability. I believe that it has a useful role to play, both for measuring performance and for incentivizing improvement (and, of course, the use of testing data for research purposes is critical). I acknowledge that there’s no solid line, and I realize that what I’m saying is not at all original, but I’m at the point where I think we need to stop putting more and more faith in instruments that are not really designed to bear that burden.
One can often hear people say that test-based accountability won’t “work.” The reality, however, is that it probably will.
If we mold policy such that livelihoods depend on increasing scores, and we select and deselect people and institutions based on their ability to do so, then, over time, scores will most likely go up.
The question is what that will mean. A portion of this increase will reflect a concurrent improvement in useful skills and knowledge. But part of it will not (e.g., various forms of score inflation). To the degree the latter is the case, not only will it not help the students, but we will have more and more trouble knowing where we stand. Researchers will be less able to evaluate policies. We’ll end up celebrating and making decisions based on success that isn’t really success, and that’s worse than outright failure.
Obviously, this is all a matter of balancing the power of measurement and incentives against the risks. We most certainly should hold schools accountable for their results, and there are, at least at the moment, relatively few feasible alternatives to standardized tests. Furthermore, states have ways to keep track of tests’ validity, such as comparing them with the results of low-stakes tests, so we’re not quite flying blind here (though, even at this early stage, some of these comparisons are not exactly encouraging, and we sometimes seem unaware of what it means to have to resort to low-stakes tests to justify high-stakes test-based policies).
But think about what’s been happening – the big picture. Tests have been used for decision making for a long time, but, over the past decade or so, U.S. public schools have been held formally accountable for those outcomes. The pressure to boost scores is already very high – I would say too high in some places – but it’s now shifting into overdrive. More and more schools are being subjected to closure, restructuring, reconstitution, and other high-stakes consequences based mostly on how their students’ test scores turn out. Several states are awarding grant money and cash bonuses using test results. Schools are receiving grades and ratings, and, just like their students, their futures depend on them.
In many places, the jobs and reputations of superintendents and principals rise and fall with scale scores and proficiency rates. Such increases are a necessary (though hopefully not sufficient) condition for being considered a success. Every year, the release of data makes headlines. Mayors run campaigns on them. Districts hire publicity experts to present results in the most favorable light.
Moreover, over just 2-3 short years, it has become the norm to evaluate teachers based to varying degrees on their students’ testing outcomes. Non-test measures are often deemed suitable based on their correlation with test measures. Teachers are the core of any education system, and we are increasingly moving toward hiring, paying, and firing them using standardized tests (which, by the way, most of them don’t particularly trust).
New assessments – in additional grades and subjects – are being designed largely for accountability purposes. There is a growing movement to hold teacher preparation programs accountable in part for the test-based productivity of their graduates. Websites and other resources are proliferating, allowing parents to choose schools (and even teachers) using testing data. Districts hire high-priced consultants specifically to boost achievement outcomes. We are even experimenting with test-based incentives for students.
Any one of these developments, or a group of them, might very well be a good thing. As a whole, however, they show how, at every level of our system, we are increasingly allocating resources and picking winners and losers – people and institutions – based in whole or in part on scores. This is a fundamental change in the relationships and structure of U.S. schools.
(And, making things worse, the manner in which the data are used and/or interpreted is often inappropriate.)
Few if any other nations in the world have gone this far. That doesn’t make it wrong, but it does mean that we have little idea how this will turn out.
I suspect that our relentless, expanding focus on high-stakes testing has already eroded the connection between scores and future outcomes. Some of this erosion is inevitable and even tolerable, but the more it occurs, the less able we’ll be to have any sense of what works or where we are. I think that research on this connection and how it is changing over time is among the most important areas in education policy today.
And I’m troubled by the possibility that, if we don’t pull back the reins, this research may eventually show that we pushed the pendulum to its ultimate breaking point and structured a huge portion of our education system around measures that were only useful in the first place because we didn’t use them so much.
- Matt Di Carlo