Assessment, grading and rigor: toward common sense and predictable outcomes on tests

Over the last few months I have worked with a number of high schools and middle schools where the grading and assessment practices simply do not work in a world of standards. Out of concern about demoralizing students with low grades, these schools are not making local assessment rigorous enough. The solution is straightforward: don’t thoughtlessly translate scores into grades.

The problem. Schools have to meet standards, and local assessment should prepare kids to deal with the standards as tested by PARCC and Smarter Balanced (SB). But the new tests are harder and more rigorously scored than most local tests. So, test scores will have to be low. (Anyone following NAEP results has known this for years, alas.) This seems to run headlong into a long tradition of grading whereby we do not want to punish kids with low grades (akin to the outrage over sharply lower school scores on accountability measures this year).

Yet, there seems to be no alternative: to significantly raise local standards of performance seems to mean we have to lower student grades. Or, conversely, we can keep our current average grade of a B for students locally, but then have less rigor than is needed to prepare kids for the tests – and predict results on them (which local assessment should surely do if it is valid and useful).

Note that so-called “standards-based grading” does NOT inherently solve this problem. Just because we go to standards-based grading doesn’t mean the grading is rigorous. In fact, if you look at schools that use standards-based grading, it is rare for students to get “scores” that are vastly different from the range of “grades” in such schools previously. That is, we are doing standards-based grading in a norm-referenced framework! The local failure was to assume that assessing against the standards was sufficient to establish rigor. It is not; it cannot work by itself.

So, what is rigor? Rigor is not established by the teaching. It is therefore not established by framing teaching against standards, either. Rigor is established by our expectations: how we evaluate and score student work. That means that rigor is established by three different elements of assessment:

  1. The difficulty of the task or questions
  2. The difficulty of the criteria, as established by rubrics
  3. The level of achievement expected, as set by “anchors” or cut scores.

Many districts and schools don’t even meet the first of these now. Routinely, when my colleagues and I audit local assessment, the local tests are much easier than the external tests – even in pretty good districts. The usual explanation? The problem of fair/normed grading! [Update: a new report from Princeton University backs down from the policy of limiting the number of As given, which shows the power of local norms to frame official grading policies.]

Note, too, from these three elements that even a difficult task and high-quality rubric aren’t enough to establish rigor. The task could be challenging and the criteria demanding – but if the expectations for student products or performance are very low (as established by either specific models or local norms), then the assessment is not rigorous. That’s why having a “cut” score of 40 or 50 on the state tests is a terrible solution – IF the goal is to communicate standards-based results vs. finding a way to pass most kids.

Think of the high jump or pole vault in track: you could enter a challenging event and be judged against the true criteria, but if the height you have to clear is absurdly low, then the assessment is not rigorous – even though it is “standards-based” testing and scoring.

Solution: avoid thoughtless calculations based on false equivalencies. Stick with track and field to see the solution: we need not and in fact never do calculate the “grade” for the athlete by mechanically turning the height they jump into a grade by some arbitrary but easy-to-use formula. To do so would greatly lower grades and provide powerful disincentives for the less skilled athletes. On the contrary, we judge progress and performance relative to early jump heights and look for “appropriate” growth, based on effort and gains in height. (I blogged previously about this point at greater length here and here.) However, the expectations for all jumpers are high and constantly increasing.

The same solution is needed locally in academics, if genuine standards are going to be used to alert students as to where they are without discouraging them. (This is the idea behind the SLOs and SGOs in many states.)

So, numerous times a year, students’ work needs to be evaluated against the external standards (as established by high-quality tests and student work samples). “But we have to give grades all year in our online grade book!” I know. But instead of turning their “score” into a “grade” by some unthinking formula, we use our wisdom and judgment to factor in fairness, growth, and effort on some uniform basis.

Suppose, for example, that in a writing assessment done against national standards, we anchor the assessment by national samples culled from released tests. Further suppose that a 6-point rubric is used. Now, assume that in the first administration, say in October, almost all students get a 1 or a 2 (where those are the lowest scores on the scale). Here’s what we might say BEFORE the scores are given to students and turned into grades:

“Guys, I’m scoring you against the best writing in the state. So, your first grade this fall will reflect a fair assessment of where you are now. A score of 1 will equal a B-. A score of 2 will equal a B+. Any score above a 2 is an A – for the first semester.

“Next semester, in the winter, to get those same grades, you will have to move up one number on the scale. And by spring, you will have to move up 2 numbers to get those grades.”
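For anyone who has to enter this into an online grade book, here is a minimal sketch of that sliding translation in Python. Only the fall mapping (1 = B-, 2 = B+, above 2 = A) and the one-point shift per term come from the example above; the names `TERM_SHIFT` and `grade_for` are hypothetical, and the "below B-" label is my placeholder, since the example does not say what grade a score under the B- bar should earn.

```python
# Hypothetical names throughout. Only the fall mapping and the shift of one
# rubric point per term come from the example in the post.
TERM_SHIFT = {"fall": 0, "winter": 1, "spring": 2}


def grade_for(score: int, term: str) -> str:
    """Translate a 1-6 rubric score into a letter grade for the given term."""
    if not 1 <= score <= 6:
        raise ValueError("rubric score must be between 1 and 6")
    # Subtracting the term's shift means a student must score higher
    # later in the year to earn the same grade.
    adjusted = score - TERM_SHIFT[term]
    if adjusted > 2:
        return "A"
    if adjusted == 2:
        return "B+"
    if adjusted == 1:
        return "B-"
    # Assumption: the post does not specify a grade below the B- bar.
    return "below B-"


# Show the full translation table, one row per term.
for term in TERM_SHIFT:
    print(term, [grade_for(score, term) for score in range(1, 7)])
```

The rubric and the anchors never change; only the translation into a grade moves, which is the whole point.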

This already happens, of course, in AP and IB courses. So, it should be relatively easy to do so in all courses.

We have thus solved the problem: grades become fair, standards are made clear, and there are incentives to improve over time.

A postscript: Rigor and verbs

Rigor is not established by the unthinking use of Webb or Bloom or other verbs. Here, for example, is a widely available chart that I found on the NJ Dept of Education website, in which Webb’s rubrics have been turned into a leveled chart of supposedly appropriate verbs:

A moment’s thought after looking over these verbs should make you say: Huh? How can the verb, itself, determine the rigor? Couldn’t the rigor of so-called high-level verbs be compromised by a simplistic task and scoring system? Vice versa: can’t we imagine some of the low-level verbs occurring in highly-challenging and rigorous assessments? (e.g. Who, what, when, and why in a complex journalism case would be rigorous work.)

Take “predict” for example. It is viewed as relatively low-level – Level 2. But what if I ask you to predict the effects on plants of using special soil, food, and artificial lights, and I score you against industry-level standards? Vice versa: suppose I ask you to critique a drawing against the criterion “pretty”. Pretty low level stuff.

You can find the rubrics Norman Webb developed below the post to see how the circular verb chart by itself completely misses the point. Note, especially, his description of what Level 4 demands in terms of cognitive and task demands.

So, just throwing some verbs around as starters for “rigorous” tasks is not enough to address the first element of rigor, the difficulty of the task. Rigorous tasks are a function of cognitive demand and situational complexity, not just the verb used. Self-assess against our audit matrix to test your tests, therefore:

1. Audit Matrix for Assessments





Webb rubric for math:


Level 1: Recall Level 1 includes the recall of information such as a fact, definition, term, or simple procedure, as well as performing a simple algorithm or applying a formula. That is, in mathematics a one-step, well-defined, and straight algorithmic procedure should be included at this lowest level. Other key words that signify a Level 1 activity include “identify,” “recall,” “recognize,” “use,” and “measure.” Verbs such as “describe” and “explain” could be classified at different levels depending on what is to be described and explained.

Level 2: Skill/Concept Level 2 includes the engagement of some mental processing beyond a habitual response. A Level 2 assessment item requires students to make some decisions as to how to approach the problem or activity, whereas a Level 1 item requires students to demonstrate a rote response, perform a well-known algorithm, follow a set procedure (like a recipe), or perform a clearly defined series of steps. Key words and phrases that generally distinguish a Level 2 item include “classify,” “organize,” “estimate,” “make observations,” “collect and display data,” and “compare data.” These actions imply more than one step. For example, to compare data may require first identifying characteristics of the objects and then grouping or ordering the objects.

Level 3: Strategic Thinking Level 3 requires reasoning, planning, using evidence, and a higher level of thinking than the previous two levels. In most instances, requiring students to explain their thinking is a Level 3 activity. Activities that require students to make conjectures are also at this level. The cognitive demands at Level 3 are complex and abstract. The complexity does not result from the fact that there are multiple answers, a possibility at both Levels 1 and 2, but because the task requires more demanding reasoning. An activity, however, that has more than one possible answer and requires students to justify the response they give would most likely be a Level 3 activity. Other Level 3 activities include drawing conclusions from observations, citing evidence and developing a logical argument for concepts, explaining phenomena in terms of concepts, and using concepts to solve problems.

Level 4: Extended Thinking Level 4 requires complex reasoning, planning, developing, and thinking—most likely over an extended period of time. The extended time period is not a distinguishing factor if the required work is only repetitive and does not require applying significant conceptual understanding and higher-order thinking. For example, if a student has to take the water temperature from a river each day for a month and then construct a graph, this would be classified as a Level 2 activity. However, if the student is to conduct a river study that requires taking into consideration a number of variables, this would be a Level 4 activity.

At Level 4, the cognitive demands of the task should be high and the work should be very complex. Students should be required to make several connections—relate ideas within the content area or among content areas—and have to select one approach among many alternatives on how the situation should be solved. Level 4 activities include designing and conducting experiments; making connections between a finding and related concepts and phenomena; combining and synthesizing ideas into new concepts; and critiquing experimental designs.


Webb rubric for reading:

Level 1: Recall of Information Level 1 requires students to receive or recite facts or to use simple skills or abilities. Oral reading that does not include analysis of the text as well as basic comprehension of a text is included. Items require only a shallow understanding of text presented and often consist of verbatim recall from text or simple understanding of a single word or phrase.

Level 2: Basic Reasoning about text Level 2 includes the engagement of some mental processing beyond recalling or reproducing a response; it requires both comprehension and subsequent processing of text or portions of text. Inter-sentence analysis of inference is required. Some important concepts are covered, but not in a complex way. Standards and items at this level may include words and phrases such as “summarize,” “interpret,” “infer,” “classify,” “organize,” “collect,” “display,” “compare,” and “determine whether fact or opinion.” Literal main ideas are stressed. A Level 2 assessment item may require students to apply some of the skills and concepts that are covered at Level 1.

Level 3: Complex Reasoning about text. Deep knowledge becomes more of a focus at Level 3. Students are encouraged to go beyond the text; however, they are still required to show understanding of the ideas in the text. Students may be encouraged to explain, generalize, or connect ideas. Standards and items at Level 3 involve reasoning and planning. Students must be able to support their thinking. Items may involve abstract theme identification, inference across an entire passage, or students’ application of prior knowledge. Items may also involve more superficial connections between texts.

Level 4: Extended Reasoning. Higher-order thinking is central and knowledge is deep at Level 4. The standard or assessment item at this level will probably be an extended activity, with extended time provided. The extended time period is not a distinguishing factor if the required work is only repetitive and does not require applying significant conceptual understanding and higher-order thinking. Students take information from at least one passage and are asked to apply this information to a new task. They may also be asked to develop hypotheses and perform complex analyses of the connections among texts.


Webb rubric for writing:

Level 1: Recall of Information Level 1 requires the student to write or recite simple facts. This writing or recitation does not include complex synthesis or analysis, only basic ideas. The students are engaged in listing ideas or words as in a brainstorming activity prior to written composition, are engaged in a simple spelling or vocabulary assessment, or are asked to write simple sentences. Students are expected to write and speak using standard English conventions. This includes using appropriate grammar, punctuation, capitalization, and spelling.

Level 2: Basic Reasoning Level 2 requires some mental processing. At this level, students are engaged in first-draft writing or brief extemporaneous speaking for a limited number of purposes and audiences. Students are beginning to connect ideas using a simple organizational structure. For example, students may be engaged in note-taking, outlining, or simple summaries. Texts may be limited to one paragraph. Students demonstrate a basic understanding and appropriate use of such reference materials as a dictionary, thesaurus, or Web site.

Level 3: Complex Reasoning Level 3 requires some higher-level mental processing. Students are engaged in developing compositions that include multiple paragraphs. These compositions may include complex sentence structure and may demonstrate some synthesis and analysis. Students show awareness of their audience and purpose through focus, organization, and the use of appropriate compositional elements. The use of appropriate compositional elements includes such things as addressing chronological order in a narrative or including supporting facts and details in an informational report. At this stage, students are engaged in editing and revising to improve the quality of the composition.

Level 4: Extended Reasoning Higher-level thinking is central to Level 4. The standard at this level is a multi-paragraph composition that demonstrates synthesis and analysis of complex ideas or themes. There is evidence of a deep awareness of purpose and audience. For example, informational papers include hypotheses and supporting evidence. Students are expected to create compositions that demonstrate a distinct voice and that stimulate the reader or listener to consider new perspectives on the addressed ideas and themes.



