Numbers Games: The Great SAT and Grade Inflation Scams
Recently, I discussed some guidelines for navigating claims about education evidence used to inform education reform, specifically how research is framed in those claims. In the weeks since that piece, hand-wringing about dropping SAT scores has been joined by charges of grade inflation within teacher education. Both highlight that we remain trapped in numbers games that ensure we are all destined to lose in the long run.
First, Rick Hess, at his Education Week blog Straight Up, offered three consecutive posts claiming that teacher education is a failure, based on his charges of grade inflation.
Claims of grade inflation have existed for a century (or more), which leads me to wonder when the golden age of deflated grades existed. Parallel to Hess’s charges came the perennial concerns about SAT scores, ironically well represented by E. D. Hirsch in the New York Times, misreading the decline in SAT verbal scores:
“This is very worrisome, because the best single measure of the overall quality of our primary and secondary schools is the average verbal score of 17-year-olds. This score correlates with the ability to learn new things readily, to communicate with others and to hold down a job. It also predicts future income.”
While I agree with Hess about the need to examine the quality of an education degree and with Hirsch about the negative consequences of the accountability era, I cannot accept the misuse of data to reach those valid conclusions.
Grade Inflation, Really?
The grade inflation claim, whether it is used to further bash teacher education and teacher quality (as Hess does) or to suggest once again that education is in a perpetual state of decline (as has been the case for over 150 years), offers another opportunity to examine closely the claims made about education, the agendas behind those claims, and strategies for culling valuable conclusions out of the entire process.
First, concerning grade inflation, a point of logic must be confronted. If we genuinely have been experiencing grade inflation (which suggests that educators offer grades above what students deserve—either out of negligence, ineptitude, or some sort of brazen and cavalier attitude) and standardized testing such as the SAT is an objective, and thus not inflated, reflection of student ability (although we tend to believe all tests are about “achievement”), how do we explain that GPA remains a better predictor of student success than the SAT?
Both claims of grade inflation and of the objective nature of standardized testing are misleading and far more complicated than we suggest.
Further, claims of grade inflation are framed inside an assumption that grades have a static meaning and purpose among teachers and students. For example, if grades are intended to label and sort students, that is a far different purpose than grades serving as a mechanism for supporting better teaching and deeper learning. Teachers who use grades to label and sort assign “A’s” that are far different from the “A’s” assigned by teachers who view assessment as a subset of teaching and learning. (I, for example, require and allow student revision in pursuit of student growth, and I reject averaging as a means to calculate grades. That is a far different philosophy of grades than the one held by someone seeking to label and sort a class of students. I guarantee my students' GPAs will be higher, but more authentic, than those of students in a traditional classroom.)
To be clear, we do not (and cannot) have a solid baseline of data to make any pronouncement about grade inflation because grades are used in a wide variety of ways and within a wide range of philosophical and statistical norms.
For a brief example, to determine whether grades are inflated, we would all have to start with a tenuous premise, the bell curve, resulting in (for simplicity) something like 10% As, 20% Bs, 40% Cs, 20% Ds, and 10% Fs for a normal distribution of students. Even if this distribution were predictive (and I find that hard to accept), we never have classrooms that are normal distributions of students (we may have all As in a class or all Ds; who knows?). But even if we did, here is the problem with both embracing the possibility of a normal distribution of grades and then raising concerns about inflated grades: If we use grades to rank and sort, and then to cull who proceeds in the education process, don’t we have to end with students making all As? (If not, that is, if we force unique populations into a bell curve, then we contradict the very premise we started with.)
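To make the bell-curve arithmetic concrete, here is a minimal Python sketch of grading on a curve strictly by rank, using the illustrative 10/20/40/20/10 split from the paragraph above. The function name `curve_grades` and the sample scores are hypothetical, invented only for illustration. The point is that a rank-based curve still hands out Ds and an F to a class in which every student scored 90 or above.

```python
# Hypothetical sketch: grading "on a curve" purely by rank.
# Cumulative shares follow the illustrative 10% A / 20% B / 40% C /
# 20% D / 10% F split; raw scores matter only for ordering students.
CURVE = [("A", 10), ("B", 20), ("C", 40), ("D", 20), ("F", 10)]  # percent


def curve_grades(scores):
    """Return one letter grade per score, assigned by rank alone."""
    n = len(scores)
    # Student indices ordered from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    grades = [""] * n
    start, cumulative = 0, 0
    for letter, percent in CURVE:
        cumulative += percent
        end = cumulative * n // 100  # integer cutoff; reaches n at 100%
        for i in order[start:end]:
            grades[i] = letter
        start = end
    return grades


# A hypothetical "elite" class: every student scored 90 or above.
strong_class = [99, 98, 97, 96, 95, 94, 93, 92, 91, 90]
print(curve_grades(strong_class))
# ['A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'F']
```

The student who scored 90 fails, not because of anything about a 90, but because the curve requires someone to occupy that slot. That is the contradiction of forcing a culled, high-achieving population into a normal distribution.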
Now, let’s look at how Hess makes charges specifically against education colleges and departments: by comparing grades among elite students (those culled and allowed to move on to higher education), thus seemingly comparing like populations, such as education majors and chemistry majors.
Hess’s charge of grade inflation in education majors begins with the unsubstantiated (and unspoken) assumption that there exists some norm of grades against which education GPAs can be compared. Why are those lower grades deemed “accurate,” but the education GPAs “inflated”? The answer rests on the corrosive and easily manipulated assumption that lower is better and harder (or, in the warped language many use, more “rigorous”).
With my comments above about the bell curve in mind, consider this: Many content areas and departments, committed to grades as a device to label and sort students (in order to “weed out” weaker students from their department and field), force their introductory courses onto a bell curve (yes, shaping a uniquely elite population into a curve that reflects, in theory, a normal distribution of students). These practices, which directly deflate grades, are never challenged, discussed, or examined, especially while we are charging other areas with grade inflation.
In short, claims of grade inflation are almost always ideology masked by a numbers game that depends on assumptions about the purposes of grades, the nature of grades, and relatively warped views of teaching and learning. (GPAs by discipline likely reflect how each field views assessment far more than they offer any evaluative reflection of the so-called rigor inherent in that field.) It is just as possible, if not likely, that many fields are practicing grade deflation as it is that education degrees are reflecting grade inflation, though I suspect that determining either is both impossible and a waste of energy.
SAT: What Is It Good For? Absolutely Nothing
Hirsch’s comment above about the SAT, once again, triggers and reflects significant errors in what we say about education data and the conclusions we draw from that data.
We appear to have a pathological obsession with low SAT scores, never considering an important rule of thumb for test data: Never use test data for purposes other than those for which the instrument was designed (see Bracey).
While Hirsch claims that the SAT verbal score provides important data for, apparently, everything (learning, communication, jobs, income), he fails to note that this test has only one purpose: to predict success in the freshman year of college (which, as I noted above, it does less well than simple GPA). That we can use SAT scores to label, sort, and rank does not justify doing so, especially since the College Board itself warns against just that.
Further, the cries over lower scores ignore the very statistical dynamics that those enamored with numbers claim to trust (see my concerns above about both embracing and distorting the bell curve theory): The plummeting SAT scores, always more closely correlated with out-of-school factors than with anything else, have declined as they should have over the past seven decades because the population of test takers has shifted from an elite population toward a normal distribution. If SAT scores were to rise while the population shifted toward the norm, that would be a sign that the data are corrupted (likely by SAT-prep strategies or by inherent flaws in the test itself, such as links to the socioeconomic status of the test takers and remaining biases related to race and gender [1]).
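The composition effect in that argument is simple arithmetic. The minimal Python sketch below uses invented numbers: the group sizes and the 550 and 450 group averages are hypothetical, not actual SAT data. Each group's own average stays fixed, yet the pooled average falls as a larger share of test takers comes from the broader group.

```python
# Hypothetical illustration of a composition effect. All numbers are
# invented; they are not actual SAT statistics.


def pooled_mean(groups):
    """Weighted mean over (number_of_test_takers, group_mean) pairs."""
    total = sum(count for count, _ in groups)
    return sum(count * mean for count, mean in groups) / total


# Two groups whose own averages never change: 550 and 450.
early_pool = [(900, 550), (100, 450)]  # test takers mostly from the elite group
later_pool = [(400, 550), (600, 450)]  # a much broader pool of test takers

print(pooled_mean(early_pool))  # 540.0
print(pooled_mean(later_pool))  # 490.0
```

Neither group performed worse, yet the overall average dropped by 50 points. Under this kind of shift, a rising average would be the result that warrants suspicion.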
Is Hirsch correct about the negative impact the accountability movement has had on the quality of teaching and learning in the U.S.?
“In the decades before the Great Verbal Decline, a content-rich elementary school experience evolved into a content-light, skills-based, test-centered approach.”
Yes, he is, but his use of the SAT isn’t a valid avenue for that claim; it is simply a convenient way to make the claim because it triggers misconceptions common among the public. (The College Board itself recognized both the statistical trap created by the growing population taking the SAT and the impending public relations nightmare of dropping scores, which led to the re-centering of the SAT in the mid-1990s.)
Claims of grade inflation and plummeting SAT scores make for easy ideological rants, and even for some credible charges against our schools and suggestions for reform, but more often than not, the claims themselves rest on a basis that is flawed, overly simplistic, or self-contradictory.
I suspect we can all do better than that, unless we are more enamored with the numbers games themselves than with creating the education all children deserve.
References
Santelices, M. V., & Wilson, M. (2010, Spring). Unfair treatment? The case of Freedle, the SAT, and the standardization approach to differential item functioning. Harvard Educational Review, 80(1), 106-133.
Spelke, E. S. (2005, December). Sex differences in intrinsic aptitude for mathematics and science? American Psychologist, 60(9), 950-958.
This blog post has been shared by permission from the author.
The views expressed by the blogger are not necessarily those of NEPC.