What’s the point of standardized tests?

[O]n Wednesday, Barbara Byrd-Bennett, the [Chicago] district CEO, said that she still has big concerns about the [PARCC] test and doesn’t want to administer it to students this spring:

“The purpose of standardized assessments is to inform instruction. At present, too many questions remain about PARCC to know how this new test provides more for teachers, students, parents, and principals than we are already providing through our current assessment strategies.” [from a recent Washington Post article]


No, alas. The purpose of large-scale standardized tests is to audit performance, not to provide an authentic, transparent, and therefore useful assessment of key goals. Nor, by the way, do current district tests in Chicago or most anywhere else meet her standard. How can any test inform instruction if it stays secure?

This is dismaying to realize, but it has always been thus. Large external tests – including most district and many school tests – are almost never designed to be exemplary feedback. How could they be, if the test remains secure and if the results come back weeks or months later in only a general form? In addition, because such tests rely on proxies for authentic work – and that is what a multiple-choice item is, a proxy – the feedback, such as it is, is practically useless.

Thought experiment. To see this oddity more clearly, imagine if we tested sports teams the way we test students now. Imagine if league championships in soccer or basketball were decided not by “local” games but by district, state, and national tests made up of secure, drill-like “items.” Now imagine that the test gave you no feedback as you shot, i.e., you didn’t know whether each shot – yours or the other team’s – went in. Imagine waiting weeks – ’til the middle of the next sports season – for the “official results,” yet having no access to either the tasks or the scoring methods behind those results. Who would learn to be a better ballplayer (or coach) under such a system?

So it is wrong to claim that mass secure testing provides helpful feedback and accountability. Not when the whole system is a bunch of secrets. (Cf. the pending lawsuit against New York State, brought by a teacher whose accountability score cannot be challenged or analyzed; this defies common-sense fairness.)

One clear solution: complete openness. A pedagogically sensible solution is to release all tests and the item analysis after the tests are given, so that teachers and students can learn from them. This is what Florida used to do, in a clear and elegant way, with its old FCAT, and what Massachusetts still does (though less often). Here is an example of the helpful data provided for each test question on the old FCAT:

[Image: FCAT item analysis for a question on author’s purpose]

The Massachusetts MCAS remains the most open state testing system. You can go to the MCAS site and see results from the last decade (and thus use their tests and analysis for your own purposes locally today):

[Image: MCAS grade 4 item analysis for an algebra equation question]

Ohio offers something more thorough in its K-8 testing (even though few items are released). Look at all this useful information:

[Image: Ohio released test item with accompanying item analysis]

Great, you say! No – because Florida long ago stopped releasing such information on its tests, and Massachusetts now releases only a few items, like Ohio.

The culprit? Cost. It is very expensive to release all the test items. And the situation has gotten far worse in recent years: test companies strike deals with states to protect their intellectual property, demanding that few items be released. And state departments of education let them get away with it.

So, any realistic hope of making test results formative is disappearing, to the harm of learning.

An old lament. Here’s what bugs me. Many of us made this argument 20-25 years ago. (Here is my piece from 20 years ago: immorality-of-test-security-wiggins.) The limited value of secret, one-shot standardized tests as feedback has been known for decades. They may be acceptable as low-stakes audits; they are wretched as feedback mechanisms and as high-stakes audits. Why don’t audits work when they are high-stakes tests (unlike, say, NAEP or PISA)? Because then everyone tries to “game” them through test prep. This inevitability was discussed by George Madaus and others 40 years ago.

Openness is everything in a democracy. Without such openness, what difference does it make if PARCC or SB offers better test questions – if we still do not know the specific, question-by-question results? There can be no value or confidence in an assessment system in which all the key information remains a secret. Indeed, in some states a teacher can be fired for looking at the test!

PARCC or no PARCC, educators and educational associations should demand that any high-stakes test be released after it is given, supported by the kind of item analysis noted above. We don’t need merely better test questions; we need better feedback from all tests. Fairness, as well as educational improvement, demands it. And PS: the same is true for district tests.


PS: A few people have written asking why I call large-scale multiple-choice tests “audits” of performance. They are audits in the same way your business is audited, or in the same way you go once a year for a physical at your doctor’s office. Neither the audit nor the physical is the true measure; each is an efficient indirect indicator – a proxy – for the real thing. Your aim is not to get good at the physical or the audit. It’s the other way around: if you or your business is “healthy,” it shows up on the proxy test. That’s why test prep as the only local strategy is dumb: it’s like practicing all year for the doctor’s physical; it confuses cause with effect. The point is for local assessment to be so rigorous and challenging that the kids easily pass the state “physical” once per year.