Yes, we know: kids “don’t know much about history” in the words of the immortal Sam Cooke. The weak NAEP results are basically unchanged since this test began – ironically, under Diane Ravitch. There might be another irony here: are the poor results perhaps related to history teachers talking too much – the question I raised in the post that drew longer and more thoughtful responses than any other?
What caught my eye, though, was one released item on the test:
I am not confident we can infer that results on this question correlate with understanding of the 60s. As I noted in my series on the validity of questions and tasks, validity is tricky because “face validity” can mislead. However, it strikes me as likely that an 8th grader in 2015 could know a fair amount about 60s history and still get this wrong (which might call its validity as a question into doubt). All but the first answer are plausible responses IF you do not know this bit of song lyric (and there is no reason why you should).
Knowledge of the lyric is not what is being tested, of course: the challenge for the test-taker is to know, generally, that spirituals and mass singing were common to the Civil Rights era. Yet there is nothing specific enough in the lyric to link it easily to only that one choice. So, the item seems questionable to me – especially since we are talking about 8th graders here, not 11th graders.
So, I tweeted out my concern. Well, this is the beauty of the Internet: within an hour there were dozens upon dozens of responses, in a lively dialogue.
Let’s continue the dialogue here, shall we? Is this a sound test question or not?**
PS: I have more faith in NAEP than some of its critics. And many of the released items seemed just fine to me, for example:
But this is why transparency in testing is so important. All tests should be released after they have been given, as I have long argued and recently re-argued. Otherwise, we cannot have faith in the results – or demand better questions.
** I am fully aware that validity in a test is not actually measured this way. Individual questions and their results have to be correlated with many other results in order to determine if the question is a good one. In other words, what matters is not whether 44% of students got this question right, but which 44% got it right. If the students who got this right were also the same students who got other hard questions right, then validity can be established. Technically, the validity of the question is threatened only if the less able students got this one right and the strong students got it wrong.
This is why the famous pineapple test question in NY ELA could be a valid question even if a majority of kids were totally befuddled by it.
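To make the footnote's point concrete, here is a rough sketch in Python of the kind of check it describes: correlating each student's result on one item (right or wrong) with that student's score on the rest of the test. In classical test theory this is usually called item discrimination, and the statistic is a point-biserial correlation. The function name and the nine student scores are invented for illustration; this is not how NAEP actually analyzes its items, only a sketch of the general idea.

```python
from math import sqrt


def item_discrimination(item_correct, rest_scores):
    """Pearson correlation between 0/1 item results and scores on the other items.

    With a 0/1 variable this is the point-biserial correlation.
    Positive: stronger students tended to get the item right.
    Negative: weaker students tended to get it right (a warning sign).
    """
    n = len(item_correct)
    if n != len(rest_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 students")
    mean_x = sum(item_correct) / n
    mean_y = sum(rest_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(item_correct, rest_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_correct)
    var_y = sum((y - mean_y) ** 2 for y in rest_scores)
    if var_x == 0 or var_y == 0:
        raise ValueError("no variation in item results or in rest scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: 9 students, scores on the REST of the test (made up).
rest_scores = [9, 8, 8, 7, 5, 4, 4, 3, 2]

# Two scenarios with the same percent correct (4 of 9, about 44%).
strong_got_it = [1, 1, 1, 1, 0, 0, 0, 0, 0]  # the top scorers got it right
weak_got_it = [0, 0, 0, 0, 0, 1, 1, 1, 1]    # the bottom scorers got it right

print(round(item_discrimination(strong_got_it, rest_scores), 2))
print(round(item_discrimination(weak_got_it, rest_scores), 2))
```

Both scenarios have the identical 44% correct, yet the first yields a positive correlation and the second a negative one, which is exactly the "which 44%" distinction.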