A recent piece in Education Week on the need to ensure that Standards are pegged in a K-16 way reminded me that people are still confused about what standards are.

There are three different aspects to a so-called standard; a standard is not just one demand. Whether stated or implied in standards documents, when we talk about students “meeting standards,” there are three aspects involved: content, process, and performance. The content standard says what they must know. The process standard says what they should be able to do (in terms of a discrete skill or process). A performance standard says how well they must do it and in what kind of complex performance.

Here is a simple example, using track and field:

    • Content = know the techniques and rules of jumping
    • Process = be able to make technically sound jumps
    • Performance = be able to high jump six feet

See the difference? You have not really “met” the standard of “jumping” if you merely know the techniques and can use them. The key question is: how high can you jump in the event called the high jump?

Let’s apply this to writing, an equally straightforward example, using a Common Core Standard: Write arguments to support claims in an analysis of substantive topics or texts, using valid reasoning and relevant and sufficient evidence.

    • Content = know and recognize what a good argument is
    • Process = be able to write an argument that uses “valid reasoning” and “relevant” and “sufficient” evidence (i.e., write a “good” essay)
    • Performance = be able to write such an argument to a high standard of rigor, i.e., as well as or better than the specific high-level exemplars provided (not just “good” arguments but arguments “up to a high valid external standard”)

Notice, again, that meeting the content and process aspects of the standard is necessary but not sufficient. The key question we must be able to answer is: is the student’s argument writing really good? The ultimate “standard” is set by the exemplars. The exemplars are chosen to represent valid wider-world “standards of performance” (for example, they are “good” essays from college freshmen), and they answer, in concrete terms, the question of how good is good enough.

In other words, a rubric is not sufficient and can never be sufficient for setting the performance standard: I only really know the standard for “valid” and “relevant” reasoning when you show me valid examples that are “up to standard” – that’s why we say that examples “anchor” a performance assessment.

Without such anchors, we get into arguments. For example, you might say that a student’s reasoning is “valid” and “relevant” but I might say: boy, you have low standards; that’s pretty weak performance; you’re setting the bar for meeting that process standard pretty low. Our argument can only be resolved by looking at “valid” exemplars for each score point on the rubric. (I will return to the challenge of agreeing on valid anchors in my next blog entry).

Where this gets even trickier is in subjects like math, science and history. Up to this point we haven’t had to worry about the performance task itself – i.e. whether the task is up to standard. So far it has been uncontroversial and clear: high jumpers should jump the high jump; arguers should write good essays. But almost all local math, history and science assessments involve inauthentic and unrigorous tasks that fail to challenge the student sufficiently; they tend to be superficial discrete items that don’t demand complex thinking and use of content in context. Consider the following Car Talk puzzler from last week:

A farmer has a 40-pound stone and a balance beam that he uses to measure his sacks of grain. He lends the stone to a friend; the friend breaks it into 4 pieces by accident. The farmer is delighted, however: the 4 pieces permit any item between 1 and 40 pounds to be weighed accurately on the balance beam. What are the sizes of the four pieces?

Note that the math is “easy” – an 8th grader knows all the relevant math – but the task is “very hard” since there is no scaffold or clarity about how to find the answer, much trial and error is required, and the reasoning involves numerous steps. (To find out the answer, go here).
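To make the “easy math, hard task” point concrete, here is a minimal brute-force sketch of the puzzler in Python. It rests on assumptions the puzzler does not state outright: the pieces weigh whole pounds, the items to be weighed are whole pounds, and a piece may sit on either pan of the balance. The function names are my own, for illustration only. Be warned that running it gives away the answer.

```python
from itertools import combinations_with_replacement, product


def weighable(pieces):
    """Return every positive whole weight these pieces can measure.

    Each piece is placed on the pan opposite the grain (+1), left off the
    balance (0), or placed on the same pan as the grain (-1).
    """
    totals = set()
    for signs in product((-1, 0, 1), repeat=len(pieces)):
        total = sum(sign * piece for sign, piece in zip(signs, pieces))
        if total > 0:
            totals.add(total)
    return totals


def solve_stone(total=40, count=4):
    """Try every split of `total` pounds into `count` whole-pound pieces."""
    target = set(range(1, total + 1))
    solutions = []
    for pieces in combinations_with_replacement(range(1, total + 1), count):
        if sum(pieces) == total and weighable(pieces) == target:
            solutions.append(pieces)
    return solutions


if __name__ == "__main__":
    print(solve_stone())  # prints [(1, 3, 9, 27)]
```

The computer needs nothing beyond addition and subtraction to check each candidate. What makes the task hard for a student is deciding how to organize the search, which is exactly the part the puzzler leaves unscaffolded.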

So, we begin to realize that a performance standard actually has two distinct elements to it, as the writing anchor-paper issue and puzzler reveal:

  • Are the tasks the student is being asked to do “up to standard” as tasks? Are the tasks sufficiently challenging?
  • Is the level of achievement on such tasks “up to standard”? Is the rigor of the performance sufficiently demanding (as reflected in how demanding the judge is in scoring)?

Almost all math, history, and science assessment questions – either locally or on state tests – fail to meet this 2-part standard for standards: the tasks are simplistic and out of context, and the student is typically expected to meet only a low-level performance demand (i.e. the answer on the simplistic question need only be “correct” as opposed to a very precise, polished, well-argued, and convincing answer to a complex question).

Here is a great example of this point from the excellent book Driven by Data:

  • Consider the following test questions:
    • What is 50% of 20?
    • What is 67% of 81?
    • Shawn got 7 correct answers out of 10 possible answers. What % did he get correct?
    • JJ Redick was on pace to set an NCAA record in career free throw %. He had made 97 of 104; what was his %?
    • In his first tournament game, Redick missed his first 5 free throws. How far did his free-throw % drop?

The authors write:

“Though these questions differ tremendously in scope, difficulty, and design, all of them are ‘aligned’ to the NJ state standard Understand and use ratios, proportions, and percents in a variety of situations… The level of mastery that will be reached is determined entirely by what sort of questions students are expected to answer.”
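One way to see how much the five questions differ is to work them side by side. The Python sketch below is my own illustration, not something from the book. The helper names are hypothetical, and the last answer assumes that “how far did it drop” means the change in percentage points after the 5 misses are added to his attempts (97 of 109).

```python
def percent_of(rate, whole):
    """Return `rate` percent of `whole`."""
    return rate * whole / 100


def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


q1 = percent_of(50, 20)         # 10.0
q2 = percent_of(67, 81)         # 54.27
q3 = percent(7, 10)             # 70.0
q4 = percent(97, 104)           # about 93.27

# Assumption: 5 straight misses add 5 attempts and 0 makes.
after_misses = percent(97, 104 + 5)   # about 88.99
q5 = q4 - after_misses                # a drop of about 4.28 points

if __name__ == "__main__":
    for label, value in [("Q1", q1), ("Q2", q2), ("Q3", q3), ("Q4", q4), ("Q5", q5)]:
        print(label, round(value, 2))
```

The first four answers each take a single call. Only the last question requires the student to decide what the new denominator is and what “drop” means before any arithmetic can happen.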

This is why so many students and teachers find out the hard way that most local assessment is inadequate for preparing students for most standardized tests. Standardized tests in general have more challenging questions (and rigorous scoring, when the question is constructed-response) than local assessment, as studies using Bloom’s Taxonomy, Webb’s Depth of Knowledge or other such schemes have long shown. Countless teachers and administrators fixate on the format of the external test instead of the degree of challenge and rigor in such tests.

Here’s one more revealing contrast, again from Driven by Data:

  • Solve the following quadratic equation: x² – x – 6 = 0
  • Given the following rectangle with the lengths shown below, find x: (the rectangle is labeled Area = 6, and its sides are labeled x and x – 1)

The authors note: “Question 1 is taken straight from an algebra 1 textbook test question; Question 2 is from the SAT.”
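The two questions rest on the same algebra, since a rectangle with sides x and x – 1 and area 6 gives x(x – 1) = 6, which is x² – x – 6 = 0. Here is a minimal Python sketch of both, with a hypothetical helper name of my own. The rectangle version adds a step the textbook version never asks for: the student has to set up the equation and then reject the root that makes no sense as a length.

```python
import math


def solve_quadratic(a, b, c):
    """Return the real roots of a*x**2 + b*x + c = 0 in ascending order."""
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []  # no real roots
    root = math.sqrt(discriminant)
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})


# Question 1: solve x^2 - x - 6 = 0 directly.
roots = solve_quadratic(1, -1, -6)           # [-2.0, 3.0]

# Question 2: same equation, but both sides x and x - 1 must be positive,
# so only roots greater than 1 describe a real rectangle.
valid_sides = [x for x in roots if x > 1]    # [3.0]

if __name__ == "__main__":
    print(roots, valid_sides)
```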

In a thorough review I did of all the released math and English questions from the MCAS tests in Massachusetts at one HS grade level, 38% of the questions demanded “extensive inference.” In the local tests in the 2 districts I studied, only 9% of the questions did. Both figures come from coding the questions with rubrics and scoring materials I developed. (You can download those materials here: Assess Project Coding & Rubrics 2011).

So, we must never forget that “meeting standards” means more than just teaching and testing content and skills. Only when students can use content and skill, in challenging performance tasks, to a high degree of rigor, are we allowed to say that they “meet standards.”