To develop an exemplary teacher evaluation system, we need clear, valid, and robust standards for such a system. So, before offering my own version, as recently promised, I offer below a set of standards for use in building, critiquing, or improving any such system – including my own, to be posted next time.
The purpose of any proper evaluation is legitimate accountability and helpful feedback. Accountability means that we must be both responsible and responsive to feedback against legitimate organizational goals. Humans need to be held accountable because we have blind spots as well as good intentions. So, formal feedback against results is useful for both organization and employee in a healthy system.
Evaluation asks: how are we doing against our obligations? In schools, that means asking: how well are students engaging, learning, and achieving? What have been our personal successes as causers of learning? What (inevitable) improvements do the results suggest, so that we can better honor our responsibilities?
Thus, for any evaluation to be legitimate and helpful, it must be –
- Outcome-based, using salient performance-based job descriptions & indicators
- Evidence-based, in which all key inferences are supported by data
- Valid, based on Mission and key learning goals and tasks, with no arbitrary value-added targets, tests, performance criteria, or weighting of criteria
- Reliable, based on multiple measures, evidence, and feedback sources over time
- Transparent, based on direct evidence that provides a clear account of achievement as well as helpful and actionable feedback
- Honest about employee strengths and weaknesses relative to goals
- Fair, based on opportunities to show one’s results and strengths, in context; and where one can appeal a rating that one believes is unfair
- Growth focused, to encourage ongoing learning and constant adjustments, not unimaginative compliance
- Credible to all stakeholders; not hypocritically imposed unilaterally
- Feasible in terms of time for sufficient evidence collection by supervisors and discussion with employees
- Effective, whereby evaluations have substantive consequences that align with institutional interests and personal aspirations
I trust you will find these standards sensible on their face, but let me note a few of their implications. By these standards –
- The current systems in place in New York and New Jersey (among other states) are utterly unacceptable. Because they rely in part on ‘secure’ tests, with unreleased items and no item-by-item analysis, they offer zero transparency. (The decision by many states in recent years, Florida among them, to stop releasing tests is truly wrong-headed and unethical.) The current value-added measures – while ‘growth focused’ in theory – are based on completely arbitrary growth targets derived from non-transparent psychometrics, rather than on direct and actionable feedback that one can use to improve. This is true even if the value-added scores are psychometrically sound, which many researchers doubt when the scores are based on just one year of data. (See my prior post on the analogy of basketball players and coaches playing without direct evidence of their achievement.) No teacher evaluation system can be valid and credible to many stakeholders without a careful look at student work on meaningful academic tasks.
- Almost all evaluation systems based on one or two classroom observations lack an outcomes focus, honesty, validity, reliability, and a growth focus; and they lack credibility to most stakeholders (since, historically, almost everyone has been rated as fine). Evaluations based on a few observations – the dominant approach nationally – are especially unreliable and invalid when they focus on teacher behavior as opposed to teacher accomplishment, i.e., if they focus on the teacher and students rather than on the learning. A teacher evaluation system that ignores teacher assessments and results (and student feedback) is invalid on its face.
- District-test-score-based evaluation systems are unreliable and mostly invalid. There are too few data points and too few varied points of view from which to triangulate the data, direct assessment of complex performance related to Mission is typically missing, and there is no opportunity to address one’s context or special situation.
- Any generic evaluation system (e.g., the Danielson or Marzano framework) is likely to be invalid, since it fails to evaluate according to local Mission statements, school and program aims, and personal goals; and too many of the dimensions in those frameworks have little to do with core outcomes.
Though I’ll save my own system for next time, readers will be able to predict many of its elements simply by considering the standards proposed and the weaknesses just cited. More to the point, one can easily imagine better evaluation systems by looking at those used in top professional and corporate organizations, where many of these standards have been deliberately addressed.
Put bluntly: many people outside of education would not stand for the evaluation systems inside education, and so it is rank hypocrisy for leaders in those organizations to propose most of the current schemes.
Here are some other helpful resources on teacher evaluation standards and processes:
Readers should post links to other documents that offer sensible evaluation principles or policies.