The title of this post refers to the title of an article I wrote twenty years ago: The Immorality of Test Security. It is basically immoral to hold people accountable for improved results on tests that are so secure that teachers aren’t even allowed, in some cases, to see them. And as more states back off releasing tests and allowing teachers to score them, it seems timely to revisit the argument. (See these two articles on the change that makes the Regents exams no longer teacher-scored.)

As I have long written, I have no problem with the state doing a once-per-year audit of performance. But what far too many policy-makers and measurement wonks fail to understand is that if the core purpose of the test is to improve performance, not just audit it, then most test security undercuts that purpose. Look, I get the point of security: you can get at understanding far more easily and efficiently (hence, cheaply) if the student does not know the specific question that is coming, and I’m OK with that. But complete test security after the fact serves only the test-makers: they get to re-use items (and do so with little oversight), and they make the entire test more of a superficial dipstick, using proxies for real work, than a genuine test of transparent and worthy performance.

To claim that such an audit can improve student (and teacher) performance over time is thus harmful nonsense. You don’t have to be anti-accountability – I am not – to see the illogic here. How can secrecy advance performance? Foolish test prep is just one obvious bad consequence of the policy: because people don’t know what lies behind the curtain, they mimic the format of the test instead of its rigor, and thus make matters even worse.

To grasp the harm of security after the fact, imagine complete test security in music: imagine if the state music test required students to play pieces of music they had never seen before the test. Now, imagine that the young musicians cannot hear themselves play as they perform for the test (i.e., they can’t really know how they are doing as they perform). Now, imagine that the results come back months later via an abstract “item analysis” completely divorced from the specific musical passages. Who could possibly improve under these conditions, be they the student or the teacher? Who could have faith in the validity of such a test? Again, the test may succeed as a quick-and-dirty audit, but it utterly fails as a feedback and improvement system.

I find it sad that Massachusetts is backing off its longstanding practice of releasing the entire MCAS test right after it is given. As I have written and said many times, MCAS has been a model for how to do large-scale testing right. Not only have all the tests been released for over a decade, but the item analysis for each question is extraordinarily useful (go here). By seeing how often students miss items that require inference and transfer, you gain more faith in the test – and you realize that mere “coverage” is poor preparation for it. Now, consider: is it a mere coincidence that Massachusetts has been the top-performing state for the last few years, as judged by NAEP results? I think not.

Security involves not only the items. When writing is scored by the state or a testing company, we deprive all teachers of the opportunity to understand what counts as performance to standards. That’s why it is vital that large-scale assessment involve teachers in scoring student work. As anyone in the AP or IB world knows, the collective scoring of work is as interesting and informative as any professional development can be. Check out my daughter’s great blog post on the IB as a model to emulate, based on her experience as an IB teacher and now as a reader.

Such collective scoring also has the desirable effect of making teachers more sensitive to the problem of inconsistent grading among teachers. Indeed, in the IB and in Canada and Great Britain, teachers are required to meet and “moderate” their judgments by scoring assessments together: that is, to learn the prevailing norms and standards for scoring and then adjust their own scoring and grading accordingly. As I have written, this is a sorely needed solution to the problem of worthless report cards in a standards-based world.

In short, don’t conflate audits with feedback systems. Nothing in the new assessments is likely to improve performance, no matter how much better the items, if teachers and students are prevented from learning from their specific successes and weaknesses. We must fight to ensure that teachers play a role in scoring work and have access to the tests after they are given. As far as I know, the two testing consortia have not addressed this issue. (Can readers confirm or refute this?) I encourage all readers to pressure the consortia and their own state departments of education on the issue.