My reply to Willingham, Part 2

In Part 1 of my reply to Willingham’s article on reading comprehension strategies, published recently in the Washington Post, I took issue with his reasoning and analogies. In Part 2, let me get right to the evidence: I propose that he has quoted very selectively and drawn questionable conclusions from the research he does cite.

In his Washington Post article, he doesn’t cite research specifically; rather, he refers us to a paper he co-authored with Lovette, published in the Teachers College Record. Oddly, that article is almost identical to the Post article. The only differences are that he provides a paragraph of citations to support the claims he wishes to make about the duration of strategy instruction, and that he changes the baseball analogy to golf.

Here is the key paragraph from his Washington Post article:

Gail Lovette and I (2014) found three quantitative reviews of RCS instruction in typically developing children and five reviews of studies of at-risk children or those with reading disabilities. All eight reviews reported that RCS instruction boosted reading comprehension, but NONE reported that practice of such instruction yielded further benefit.

Here are the two related paragraphs from the TC Record article; note the different, less sweeping conclusion:

RCS instruction has a serious limitation. Its success is not due to the slow‐but‐steady improvement of comprehension skills, but rather to the learning of a bag of tricks. The strategies are helpful but they are quickly learned and don’t require a lot of practice.

And there is actually plenty of data showing that extended practice of RCS instruction yields no benefit compared to briefer review. We know of eight quantitative reviews of RCS instruction, some summarizing studies of typically developing children (Fukkink & de Glopper, 1998; Rosenshine, Meister, & Chapman, 1996; Rosenshine & Meister, 1994) and some summarizing studies of at‐risk children or those identified with a learning disability (Berkeley, Scruggs, & Mastropieri, 2009; Elbaum, Vaughn, Tejero Hughes, & Watson Moody, 2000; Gajria, Jitendra, Sood, & Sacks, 2007; Suggate, 2010; Talbott, Lloyd, & Tankersley, 1994); none of these reviews show that more practice with a strategy provides an advantage. Ten sessions yield the same benefit as fifty sessions. The implication seems obvious; RCS instruction should be explicit and brief.

Thus, in his Washington Post article he actually overstates what he and Lovette claimed in the original article. There is no justification for his claim in the Post article that “All eight reviews reported that RCS instruction boosted reading comprehension, but NONE reported that practice of such instruction yielded further benefit.” In fact, even the original claim is overstated, as we shall see.

What the cited studies actually say. Here is what we learn when we go to the studies that Willingham cites to make his point:

  • The Rosenshine, Meister, and Chapman (1996) study looked at only one strategy – generating questions about the text – not many reading strategies. Yet it appears that Willingham’s sweeping conclusion about all strategy instruction – that 10 sessions are as good as 50 – was drawn from this one analysis. Here is the text and chart from that study:

Length of training. The median length of training for studies that used each type of procedural prompt is shown in Table 4. We uncovered no relationship between length of training and significance of results. The training period ranged from 4 to 25 sessions for studies with significant results, and from 8 to 50 sessions for studies with nonsignificant results.

[Table 4 from Rosenshine, Meister, & Chapman (1996): median length of training, by type of procedural prompt]

  • Here, from a second study Willingham cites (Gajria et al., 2007), are the authors’ cautious comments on the amount of time spent on strategies:

Unfortunately, the limited database does not allow us to infer the capacity of strategy use to achieve maintenance or transfer. Also, more research is needed to draw conclusions about the duration and length of treatments needed to positively affect maintenance and transfer effects. Although the database is larger for treatment intensity than for maintenance and transfer effects, we cannot make persuasive conclusions about the potential relationship between these variables.

  • Willingham is clearly leaning on the research in Elbaum et al. (2000), since that study shows that duration of treatment is not as salient as we might think. (Suggate and Gajria et al. also quote from this study.) However, Willingham chose not to mention a critical distinction in that study that bears on his claim. Here is the salient section:

Intervention intensity was examined in two ways: by duration, coded as the number of weeks over which the intervention was carried out, and total instructional time, coded as the number of hours of instruction provided to each student. Information on the duration of the intervention was available for 30 samples of students; information on total instructional time was available for 27 samples. The interventions ranged in duration from 8 to 90 weeks and in total instructional time from 8 to 150 hr. Duration of the intervention was reliably associated with the variation in effect sizes, QB(1) = 7.9; interventions lasting up to 20 weeks had a mean weighted effect size of 0.65, compared with 0.37 for those lasting longer than 20 weeks. Total instructional time, however, was not reliably associated with effect size variation, QB(1) = 0.35. We further examined the relation between intervention duration and intensity. The mean instructional time for interventions lasting up to 20 weeks was 63 hr; the mean time for interventions lasting longer than 20 weeks was 61 hr. Duration and total instructional time did not significantly covary (r = .116, ns). This finding suggested that the same amount of instructional time, delivered more intensively, tends to have more powerful effects. [emphasis added]
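
To see what this kind of comparison involves: Elbaum et al. are running a between-groups homogeneity test (QB) on weighted effect sizes – group the studies by duration, then ask whether the group means differ by more than sampling error allows. Here is a minimal sketch of that computation; the per-study effect sizes and variances are hypothetical, chosen only to mirror the reported pattern:

```python
# A minimal sketch of the between-groups homogeneity test (Q_B) used in
# fixed-effect meta-analyses such as Elbaum et al. (2000).
# NOTE: the per-study effect sizes and variances below are HYPOTHETICAL.
from scipy.stats import chi2

def weighted_mean(effects, variances):
    """Inverse-variance weighted mean effect size, plus its total weight."""
    weights = [1 / v for v in variances]
    mean = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    return mean, sum(weights)

# Hypothetical studies grouped by intervention duration
short_duration = ([0.80, 0.55, 0.62], [0.02, 0.03, 0.025])  # <= 20 weeks
long_duration  = ([0.30, 0.45, 0.35], [0.02, 0.03, 0.025])  # > 20 weeks

group_stats = [weighted_mean(d, v) for d, v in (short_duration, long_duration)]

# Grand weighted mean across all studies
total_weight = sum(w for _, w in group_stats)
grand_mean = sum(m * w for m, w in group_stats) / total_weight

# Q_B: do the duration groups differ by more than chance allows?
q_between = sum(w * (m - grand_mean) ** 2 for m, w in group_stats)
df = len(group_stats) - 1
print(f"Q_B({df}) = {q_between:.2f}, p = {chi2.sf(q_between, df):.3f}")
# With these made-up numbers: Q_B(1) ≈ 6.20, p ≈ .013
```

The point of Elbaum et al.’s second test is that when you swap hours of instruction for weeks of duration as the grouping variable, QB collapses to 0.35 – which is why intensity, not sheer duration, looks like the active ingredient.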

Furthermore, the Elbaum study focused exclusively on one-on-one tutoring in both phonics and strategies, not teacher instruction and student practice of strategies in class. Even so, most of the interventions were far longer than Willingham lets on. For example:

One study that contrasted a standard Reading Recovery program with a modified Reading Recovery program (Iversen & Tunmer, 1993) reported that students in the modified program were discontinued after an average of 41.75 lessons, compared with 57.31 lessons for students in the standard program. The effect size for students in the modified program was comparable to that of students in the standard program, suggesting that it is possible to achieve the same outcomes in a much shorter period of time by modifying the content of instruction. This finding suggests that efficiency, or the amount of progress over time, may be a useful variable to consider in conducting future studies.

That’s a far cry from “10 quick lessons” …
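
The quote’s closing suggestion about “efficiency” is easy to make concrete. A quick sketch, using the Iversen & Tunmer lesson counts quoted above and a hypothetical common effect size standing in for the “comparable” outcomes the authors report:

```python
# "Efficiency" as progress per lesson (Elbaum et al.'s suggestion).
# Lesson counts are from the Iversen & Tunmer (1993) comparison quoted above;
# the shared effect size of 0.70 is HYPOTHETICAL, standing in for the
# "comparable" outcomes reported for the two programs.
standard_lessons = 57.31
modified_lessons = 41.75
effect_size = 0.70  # hypothetical

for name, lessons in (("standard", standard_lessons), ("modified", modified_lessons)):
    print(f"{name}: {effect_size / lessons:.4f} effect-size units per lesson")

savings = 1 - modified_lessons / standard_lessons
print(f"The modified program used {savings:.0%} fewer lessons")  # ≈ 27%
```

Even the “efficient” version, in other words, still took roughly 42 one-on-one lessons – nowhere near a handful of quick sessions.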

More disconcertingly, not once in either article does Willingham discuss the results of the five most well-known and well-studied interventions using multiple strategies – PALS, POSSE, CSR, TSI, and CORI – all of which show significant gains from a substantial investment of time, and many of which are highlighted in the various meta-analyses.

Here, for example, is the data on PALS:

  • In the study, 20 teachers implemented PALS for 15 weeks, and another 20 teachers did not. Students in the PALS classrooms demonstrated greater reading progress on all three measures of reading achievement used: words read correctly during a read-aloud, comprehension questions answered correctly, and missing words identified correctly in a cloze (maze) test. The program was effective not only for students with learning disabilities but also for students without disabilities, including low and average achievers.

Michael Pressley, author of Reading Instruction That Works, arguably did more direct and indirect research on reading strategies than anyone, and his work is cited in almost every review of research. Here is what he says about duration and results:

  • As far as policymakers are concerned, however, the gold standard is that an educational intervention make a difference with respect to performance on standardized tests. What was striking in these validations was that a semester to a year of transactional strategies instruction made a definitive impact on standardized tests…

In light of this set of quotes, does the following Willingham conclusion seem warranted to you?

RCS instruction has a serious limitation. Its success is not due to the slow‐but‐steady improvement of comprehension skills, but rather to the learning of a bag of tricks. The strategies are helpful but they are quickly learned and don’t require a lot of practice.

“Tricks” and transfer. Willingham is clearly having some fun referring to the strategies as “tricks,” but he might have taken a page from the research he cites instead. In Rosenshine, Meister, and Chapman, the authors say this about the strategies:

In contrast, reading comprehension, writing, and study skills are examples of less-structured tasks. Such a task cannot be broken down into a fixed sequence of subtasks or steps that consistently and unfailingly lead to the desired end result. Unlike well-structured tasks, less-structured tasks are not characterized by fixed sequences of subtasks, and one cannot develop algorithms that students can use to complete these tasks. Because less-structured tasks are generally more difficult, they have also been called higher-level tasks. However, it is possible to make these tasks more manageable by providing students with cognitive strategies and procedures.

A cognitive strategy is a heuristic. That is, a cognitive strategy is not a direct procedure or an algorithm to be followed precisely but rather a guide that serves to support learners as they develop internal procedures that enable them to perform higher-level operations. Generating questions about material that is read is an example of a cognitive strategy. Generating questions does not lead directly, in a step-by-step manner, to comprehension. Rather, in the process of generating questions, students need to search the text and combine information, and these processes help students comprehend what they read.

Such heuristic thinking is essential to transfer; it’s hardly a trick, as we know from all the research on how general ideas and schemas bridge seemingly unique experiences (cf. Chapter 3 in How People Learn). Yet Willingham does not mention transfer once, though almost every study he cites worries about it. Why worry? Because results on the experimental post-test, designed by the researchers to assess their intervention on specific strategies, are typically much higher than results on a later standardized test of reading comprehension, where no prompts or reminders about the particular intervention are provided – that is, where transfer is required.

Here are two relevant quotes, the first from the paper by Gajria et al. cited by Willingham, and the second from Allington and McGill-Franzen in the Handbook of Research on Reading Comprehension, which I have quoted from before:

Unfortunately, the limited database does not allow us to infer the capacity of strategy use to achieve maintenance or transfer. Also, more research is needed to draw conclusions about the duration and length of treatments needed to positively affect maintenance and transfer effects (Gersten et al., 2001). Although the database is larger for treatment intensity than for maintenance and transfer effects, we cannot make persuasive conclusions about the potential relationship between these variables. Furthermore, few studies helped children develop a deep understanding of complex text by effectively processing structural elements of expository text (e.g., Bakken et al., 1997; Smith & Friend, 1986) or stressed the social aspect of collaborative learning (e.g., Englert & Mariage, 1991; Klingner et al., 2004; Lederer, 2000) that Gersten et al. (2001) noted is critical to mediate learning and transfer effects.

Improving performance is possible. However there is less evidence that comprehension focused interventions produce either autonomous use of comprehension strategies or longer-term improvements in comprehension proficiencies. The lack of evidence stems from the heavy reliance on smaller sample sizes and shorter-term intervention designs as well as limited attention to a gold standard of transfer of training to autonomous use.

Arguably, transfer can only be achieved through many interventions, a gradual-release model, and lots of practice of multiple strategies simultaneously over a long period of time – as the research repeatedly says and as common sense tells us.

Indeed, to close with one more sports analogy: the drills do not transfer easily to the game. It basically takes a full season of scrimmages, debriefings, and lots of practice applying the drills to game situations to make that transfer happen. Nor are the drills “tricks,” even though they ultimately fade away into fluent, automatic performance. And that’s arguably a more apt analogy for reading the research than Willingham’s discussion of sport and furniture-building.

 

PS: I neglected in the first post to copy and paste my comments on one of the other research studies that Willingham cites: Sheri Berkeley, Thomas E. Scruggs, and Margo A. Mastropieri (2010).

Here is what they say about intervention duration:

For criterion referenced measures, mean weighted treatment effect sizes were highest for treatments of medium duration (more than 1 week but less than 1 month). Differences among treatments of varying length were statistically different according to a homogeneity test, Q(2, N = 30) = 6.68, p = .04. However, differences on norm-referenced tests by study duration were not statistically significant (p = .83). That treatments of moderate length were associated with higher effect sizes than either shorter or longer treatments is not easily explained.
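
As a side note, the reported statistic is internally consistent: a Q of 6.68 on 2 degrees of freedom does correspond to the p = .04 they report. A one-line check against the chi-square reference distribution:

```python
# Sanity check: Q(2) = 6.68 from Berkeley et al. should give p ≈ .04
from scipy.stats import chi2
print(round(chi2.sf(6.68, df=2), 3))  # 0.035, which rounds to the reported .04
```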

[Table from Berkeley, Scruggs, & Mastropieri (2010): mean weighted effect sizes by treatment duration]

Note that only three of the studies examined took place over more than 1 month, due to the parameters of their review (a focus on remedial instruction for students with special needs). As we have seen, many such studies exist for regular students, with strong effect sizes. Nor do these data quite support Willingham’s conclusion about the value of practice.
