If you agree that the track we are on, high-stakes one-shot testing of every student against the Common Core, is unproductive and unsustainable, I have a modest proposal for how to ditch the tests but move the Common Core Standards forward.
Let’s use matrix sampling in national testing, as NAEP has always done and as CAP in California used to do. Matrix sampling means that no student sees all or even most of the questions, and that individual student scores need not be reported (or, if they are reported, they are less reliable than school results). That way, building-level and district-level results would be the focus, as they arguably should be. The test could then use many more tasks and types of tasks, distributed over many students, to give us valid and reliable data on all the Standards, something we cannot get now because of time constraints and the need for individual scores to be precise and comparable. And, last but not least, the test for the individual student could be short.
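To make the idea concrete, here is a minimal sketch of matrix sampling in Python. It is an illustration under assumptions of my own, not NAEP's or CAP's actual procedure: the function names, the pool of 60 items, the class of 30 students, and the 10-item form length are all hypothetical.

```python
from collections import defaultdict

def assign_forms(student_ids, item_ids, items_per_student):
    """Give each student a short, rotating slice of the full item pool."""
    n_items = len(item_ids)
    forms = {}
    for k, student in enumerate(student_ids):
        start = (k * items_per_student) % n_items
        forms[student] = [item_ids[(start + j) % n_items]
                          for j in range(items_per_student)]
    return forms

def coverage(forms):
    """Count how many students were given each item."""
    counts = defaultdict(int)
    for items in forms.values():
        for item in items:
            counts[item] += 1
    return dict(counts)

# Hypothetical numbers: 30 students, a 60-item pool, 10 items per student.
students = [f"s{i}" for i in range(30)]
items = [f"q{i}" for i in range(60)]
forms = assign_forms(students, items, items_per_student=10)
counts = coverage(forms)
```

In this toy setup each student answers only 10 of the 60 questions, yet every question is still answered by 5 students, so the school as a whole is measured on the full pool.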
This approach would also allow for teacher accountability to head back where it belongs: a local decision, on both criteria and policy. And it would thereby rid us of some of the current ridiculous schemes that require such practices as the music teacher being partially evaluated using school English test scores.
The further benefit of this approach is that we could then couple it with a policy requirement that all schools and districts demonstrate that local assessment is of high quality: that it is calibrated to national standards, and that policies and practices are in place to ensure quality control in local assessment. That matters because, despite all the work on standards over 25 years, most local assessment systems are still neither valid nor rigorous, as I have learned the hard way in working with hundreds of schools on assessment.
Yes, I know: some people insist on having individual student scores for various reasons (incentive for students and teachers, data, reporting to parents, etc.). Yet, through well-known psychometric practice (Item Response Theory, or IRT), we can approximate student scores with SUFFICIENT reliability for the context of the assessment.
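For readers who want to see why different booklets need not prevent comparable scores, here is a rough sketch using the simplest IRT model, the one-parameter (Rasch) model. It is my own illustration, not NAEP's scaling procedure: the item difficulties are made up and assumed to be already known, and the function names are hypothetical.

```python
import math

def p_correct(theta, b):
    """Rasch model: chance that a student of ability theta answers an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses, difficulties, iterations=25):
    """Maximum-likelihood ability estimate from 0/1 responses (Newton's method).

    Assumes the responses are mixed; an all-right or all-wrong pattern
    has no finite estimate under this model.
    """
    theta = 0.0
    for _ in range(iterations):
        probs = [p_correct(theta, b) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        step = gradient / information
        theta += max(-1.0, min(1.0, step))  # cap the step to keep the search stable
    return theta

# Two hypothetical students with the same raw score (3 of 4) on different booklets.
easy_booklet = [-2.0, -1.0, -1.0, 0.0]   # assumed item difficulties
hard_booklet = [0.0, 1.0, 1.0, 2.0]      # the same items shifted up by 2
theta_a = estimate_ability([1, 1, 1, 0], easy_booklet)
theta_b = estimate_ability([1, 1, 1, 0], hard_booklet)
```

Percent correct would rate these two students the same. The model places the student who got 3 of 4 on the harder booklet higher on the common scale, which is the sense in which results from different booklets become comparable.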
Here is the NAEP account of how this works:
To ensure that the item pool covered broad areas, the booklets were assembled using a variation of matrix sampling called Balanced Incomplete Block (BIB) spiraling. Like matrix sampling, BIB spiraling presents each item to a substantial number of students but also ensures that each pairing of items is presented to some students. The result was that the correlation between any pair of items could be computed, albeit with a smaller number of students than responded to a single item.
The major design feature in 1983 was scaling the assessment data using Item Response Theory (IRT). At that time, IRT was used mainly to estimate scores for individual students on tests with many items. IRT was fundamental to summarizing data in a meaningful way. Basically, IRT is an alternative to computing the percent of items answered correctly. Given its assumptions, IRT allowed the placing of results for students given different booklets on a common scale.
A “balanced incomplete block (BIB) spiraling” design ensures that students receive different interlocking sections of the assessment, enabling NAEP to check for any unusual interactions that may occur between different samples of students and different sets of assessment questions. This procedure assigns blocks of questions in a manner that “balanced” the positioning of blocks across booklets and “balanced” the pairing of blocks within booklets according to content. The booklets are “incomplete” because not all blocks are matched to all other blocks. The “spiraling” aspect of this procedure cycles the booklets for administration so that, typically, only a few students in any assessment session receive the same booklet (Messick, Beaton, and Lord 1983).
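As a rough illustration of the design described in that account, here is a small Python sketch of a textbook balanced incomplete block layout with spiraled distribution. It is not NAEP's actual booklet map: the 7 blocks, the 3-block booklets, the cyclic construction, and the function names are assumptions of mine. It follows the classical definition in which every pair of blocks shares a booklet.

```python
from collections import Counter
from itertools import combinations, cycle, islice

def cyclic_bib(n_blocks=7, base=(0, 1, 3)):
    """Build booklets by shifting one base booklet around the blocks."""
    return [tuple((b + shift) % n_blocks for b in base)
            for shift in range(n_blocks)]

def pair_counts(booklets):
    """Count how many booklets contain each pair of blocks."""
    counts = Counter()
    for booklet in booklets:
        for pair in combinations(sorted(booklet), 2):
            counts[pair] += 1
    return counts

def spiral(booklets, n_students):
    """Hand out booklets in rotation so neighbors in a session get different ones."""
    return list(islice(cycle(range(len(booklets))), n_students))

booklets = cyclic_bib()
pairs = pair_counts(booklets)
session = spiral(booklets, n_students=10)
```

With these assumed numbers, each student sees only 3 of the 7 blocks, each block appears once in each booklet position, and each of the 21 possible block pairs lands together in exactly one booklet, which is what allows correlations between any two blocks to be computed.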
Matrix sampling with IRT scores and a local assessment quality-control policy is win-win: it lowers the stakes that lead to test prep; it makes teacher accountability more valid and more locally owned; and it ensures that educators locally take firm hold of the problem of unreliable and invalid local assessments.
A bit complex and with some compromise – but it HAS to be better than the current path…