The following is from a Word file on my computer:

Imagine if school basketball seasons ended in a special test of discrete drills, on the last day of the winter, in which the players – and coaches – did not know in advance which drills they would be asked to do. Imagine further that they would not know which shots went in the basket until the summer, after the season and school were over. Imagine further that statisticians each year invented a different (and secret) series of such “tests of basketball,” along with formulae for generating a single score against a standard. Finally, imagine a reporting system in which the coach and players receive only the single scores – without knowing exactly which specific drills were done well and which were not.

The inevitable would then happen (since these new basketball test results would be reported in the newspaper). Imagine what happens to coaches and coaching. Coaches would stop worrying about complex performance (i.e., real games) entirely and concentrate on having students practice the drills most likely to be tested – at the expense of student engagement and genuine learning.

Who would improve at the real game under these conditions?

Yet this is what is happening nationally in the name of “accountability of schools”: a handful of tests that provide woefully sketchy and delayed feedback, on tasks that do not reflect real achievement.

Where the results are hard-to-fathom proxies for genuine performance.

Where the test is unknown until test day.

Where the feedback comes after the end of the school year, so it cannot be used to improve the performance of the students tested (and their coaches’ coaching). And where the feedback is inscrutable.

Where “coaches” end up pressured to focus on a handful of superficial indicators instead of the larger aims of learning.

We should not be surprised, then, that there is a rising tide of disenchantment with current testing in many professional and public quarters. Key educational organizations do not support the current approach. Nor do citizen groups as diverse as the PTA and the School Boards Association. Not because these groups want to avoid accountability for schools, but because the current approach doesn’t provide it.

No one’s interests – not those of policy-makers, taxpayers, parents, teachers, and especially students – are adequately served by a so-called accountability system that relies exclusively on a handful of secret “audit” tests. The current approach causes an impoverished “teaching to the test items” instead of rich and creative instruction and assessment.  Policymakers need to understand why the effect is harmful even if the intentions are noble.

What is genuine accountability? As the analogy suggests, the current system really offers only the illusion of accountability.

Accountability is ‘responsibility for’ and ‘responsiveness to’ results, as the dictionary reminds us. Teachers whom the public sometimes deems unwilling to be held accountable are the same educators who serve as athletic coaches and as teachers in the performing and vocational arts – where they are happy to be held responsible for performance results, since the tasks are worthy, the scores are valid and (over time) reliable, and the whole system is public and fair. But if “coaches” and “players” never know from test to test the specific “game” upon which evaluation will be made, how can they be truly responsible for the results? And when the results come back in the summer (in cryptic form), how can teachers actually be responsive to them? In other words, as a feedback system our current tests and value-added scores are a complete failure, regardless of the worthwhile attention to standards they foster.

We propose a better way, a new assessment and accountability system, based on common-sense principles about how people get better – teachers as well as students. A more responsive system based on helpful and timely feedback designed to improve learning and teaching, not just audit it. A system that makes local assessment and teacher judgment more central to state accountability. A system designed to provide incentives for school renewal and ongoing professional development. A system that will inspire more creative teaching instead of more fearful, compliant behavior.

Why did I say that this is ‘on my computer’? Because it is a paper I wrote over a decade ago, in support of a proposed accountability system for the state of New Jersey that was commissioned and endorsed by three different state organizations in response to the first wave of state testing.

The situation today is even worse because the stakes are higher, the arcane mathematical formulae are even more indecipherable, and fewer and fewer states release items and item analyses after testing for careful and helpful study.

As I have long said, accountability is important. It’s how we improve, based on legitimate feedback and responsiveness to results. The current system is not only a sham; it is pulling the Common Core Standards down with it as ‘collateral damage,’ even though there is nothing in the Standards to link them to these wretched accountability policies.

I have a single question for all critics or doubters on this subject: would you happily be held accountable under the equivalent of these new state-run systems? There is only one word – harsh though it is – to apply in this situation. Hypocrisy.

If you are expressly or tacitly behind this one-mysterious-score, hopelessly-ineffective feedback system, then you are a hypocrite.

I say this with confidence. No one would happily work under such a policy, in any field. You would fight it in your own job. Shame on the unions for going along. Shame on the national professional associations for not opposing it. (In Massachusetts, the associations DID work together and resisted the single accountability score that New York and New Jersey put in place.)

The Standards can help us. The current draconian and hypocritical state policies are likely to kill not just the Standards but public education.

In my next post, I will describe the accountability system I proposed 13 years ago. (It is feasible: it was based on actual experience in North Carolina for the Standards and Accountability Commission chaired by Governor Hunt, in which over a two-year period we developed locally scored performance tasks and portfolios that were used in a statewide pilot with positive results and feedback.)

23 Responses

  1. Nice! Great analogy. It really hit home for me the pointlessness of that final test. Christ spoke using parables and stories, and this has got to be the most powerful way of getting a message across. Is it any wonder that Finland’s education ‘system’ doesn’t do state tests!?
    It must say something about trust and control. Accountability can be done via data or ‘controls,’ or by trust and encouragement/professional development.
    John Stradwick http://www.mdis.net

  2. It occurs to me as I read this post that many educators are framing their arguments incorrectly. I don’t know many educators who are against improving performance and learning but I know numerous who are anti-test. It’s not that tests are inherently bad. In the case of standardized testing in the US, however, it is that the system of measurement and feedback is deeply flawed. The argument against testing, in my opinion, has been oversimplified.
    Instead, we need a critical mass of educators, informed parents and students to reframe the problem and push for assessment and accountability systems that are designed to improve learning, performance, and teaching as you stated so profoundly in this post. Thank you for this insightful reframing of the problem!

    • I completely agree! The problem is currently poorly framed. The challenge is to invent a powerful improvement system, not a compliance system. The tests can be used for good measure as well as bad. In my next post I will propose a better use of such tests as well as a way to give teachers more ownership – hence, responsibility – for local results.

  3. Hi Grant,
    Another great post.
    I’d be curious what your opinion is of AP exams, specifically the AP Calculus exam. As you know, all free-response questions are released every year (just days after the test is administered), and the College Board does provide some aggregated feedback to the teacher (although this doesn’t come until the summer).
    I realize these tests are given for a different purpose than the state tests you are referring to, but I’m wondering where they would fit in with your “basketball final” analogy. Thanks.

    • I’m OK with AP for a number of reasons. (OK, not wildly enthusiastic.) Same with IB. I’ll address this in my follow-up. I’m OK with it because, though not ideal, there is plenty of transparency and teacher ownership via the scoring process and the release of tests; and everyone chooses to be in the program, kids and teachers. Ideally, though, AP would include more local work, as Biology has tried to do, and a pre-test.

      • As a physics teacher, I’m not enthralled with the AP exam (speaking of the C exam only, as the B is a jumbled mess of too many topics), but it is certainly not bad. For a national standardized exam, it does a pretty good job of probing conceptual understanding as well as specific skills. What I find utterly bewildering about the whole concept of the AP is the stated goal of replicating a college course. I find this absurd for several reasons. First, my students take five other courses while taking my senior-level, calculus-based physics course (not designated “AP,” rather built around Carnegie Mellon’s freshman physics experience). I have them in class/lab (it all runs together) for 4 hours a week and can assign about 3 hours of homework a week (school guidelines). My son’s physics class in college meets for 3 hours of class, 3 hours of lab, and an optional hour of problem-solving time. They are expected to spend seven hours a week working outside of this classroom/lab time. Now a little math: seven hours of time on physics in my high school class vs. 13-14 hours on physics in college. Is it any wonder that students who take AP classes report staying up until after midnight regularly, and that the course is a frenzied slog (yes, I meant to put frenzied alongside slog) through content that leaves them exhausted? I feel great about my year when I get a few topics into the second semester by the end of the year.
        Finally (and for me this is enough), when is the exam? Oh, let’s see, a month before the end of the semester! When I’ve asked college professors if they would consider chopping their class/lab time in half and giving the final exam a month early, they just look at me like I’m crazy. “That wouldn’t be freshman physics.” Exactly.
        This complaint of mine is different from disagreeing with the validity of the AP exam. Again, I think it is ok. But I have little regard for the typical courses that supposedly prep students for the exam. And colleges seem to agree with me. My son was told that not even an AP score of 5 would get students out of first semester physics at his college. But when he showed his electronic portfolio from my class, his professor said “Darn, this has never happened,” and let him skip the first semester. I’ve had other students not even take the AP and still skip first semester physics by showing their portfolios (and in one case sitting through an impromptu oral exam by a department head). Isn’t this a better approach, anyway?

        • Thanks for this analysis. I have the same mixed feelings. And so do the colleges. Fewer and fewer want to give credit, and I learned from Eric Mazur at Harvard that Harvard data shows that AP kids do worse than non-AP kids in their courses (though this was 5-6 years ago, before the new course overhauls). It’s also worth keeping in mind that AP used to be modeled on ‘typical’ freshman courses and was a way for mostly prep school and elite suburban kids to skip over that typical course. The massive use of AP in all schools has made the tension harder to handle, and the new AP courses – not designed to be ‘typical’ but to be ‘model’ courses for in-depth work – have not been happily accepted by some colleges. So, it’s bound to be a compromise; it is.

        • Excellent comments, Mr. Hammond. You mentioned the timing of the test in the school year, so I thought I’d leave a comment on that. Timing is very important, as you note, and given the school year, the need to grade the tests in a timely manner (and not mess up placement for the next school year), and the competition for test dates on the high school calendar – this is what has had to happen. I have seen the high school testing calendar and it is obscene. A huge percentage of the job of admin now is to form and constantly edit this highly volatile, ever-changing testing calendar.
          The most ridiculous timing issue I have seen yet is the timing of the middle school EOC tests. Here in FL, we have changed our testing for high-school-level math to an End of Course exam (EOC) instead of the FCAT standardized test (only in upper-level math). So, when a middle school student takes Algebra 1 or Geometry, they also must satisfy the FCAT math requirement that all middle schoolers take. No big deal? Wrong. Teachers are typically made to stop teaching Algebra and Geometry about a month ahead of the FCAT math so that they can review, since they have found that even the upper-level math kids forget some of the basics and need that review. And, since these students are in Algebra and Geometry, they are not getting 7th grade math (or 8th). So, they “train” them for a month prior to FCAT. Then, these same students who had a month taken out of their academic year in Algebra (or Geometry) must take the EOC a full month prior to the end of school. So, instead of spending 9 full months learning Algebra or Geometry, they get the same amount of material over 7 months. And that doesn’t include all the days lost to the practice diagnostics given in Sept., Dec., Jan., and May that are required of all students. Allowing them to opt out of FCAT math was considered; it was decided to leave that up to the school principals. What they are finding out, though, is that principals generally do not want to do this, since it lowers their school grade (FCAT is weighted heavily in the formula) and their teacher grades. So, the schools and teachers lose money – and if they do too badly risk DA and closure – if they let their best students opt out of testing. So, they don’t. This is about the worst example of how the school calendar and high-stakes testing have led to reduced focus on what is best for the student.

  4. I absolutely love these posts. They provide the sense that I am not nuts! This post is so true and I can really identify with the Basketball Analogy. Thanks!
    For Our Kids,
    Deb
    Debra S. Young Administrator WPLC – Webster 585-216-0132 debra_young@websterschools.org
    The challenge of leadership is to be strong, but not rude; be kind, but not weak; be bold, but not a bully; be thoughtful, but not lazy; be humble, but not timid; be proud, but not arrogant; have humor, but without folly. – Jim Rohn

  5. Thirteen years ago – that is exactly the age of the packet containing the “collection of evidence” document. Imagine: I arrive at the “proficiency based grading” meeting, paper in hand. I tell everyone that the work has already been done, and hold up a booklet produced by my district dated 2001. The messenger, me, is looked at as if he is daft. I email this same info to the SUP, School Board members, others – just silence. When we here in Oregon were moving toward the CIM and CAM, the student portfolio and some standardized test scores were the basis of our determination of proficiency. The students had to produce evidence of mastery. In this business, anyone with a memory is executed. Good luck, Grant – and keep it up!!

  6. Grant,
    I am blown away by your post! Sports is a great analogy for a positive way of assessment. Our current CCSS assessments are way off course, and it is going wrong on so many levels, especially when funders (i.e., the GOVERNMENT and local DEs) judge success and failure on one or two tests.
    Regards.
    Harry, teacher in LIC, NY

  7. This is right on point, Grant. Thank you! I have never had any problem with any test. It’s how it is given, interpreted, used for accountability, etc. (all the things you stated). As one of the commenters here stated, there is a problem with parents and teachers rallying against the test. The state then responds by making changes to the test. No one is satisfied, because that is not the problem. The only ones who gain are the businesses who make the tests and the practice instruments for the tests. Enough!
    I look forward to reading your follow-up on how testing can be accountable without the problems we have currently.

  8. Grant,
    I read your blog daily but am commenting for the first time. Posts like this one keep me coming back.
    I’m anxious to see what you come up with in your research of “outlier” schools. Are you embarking on a formal study or will you post your research here? Will you be posting anything soon?! I really am curious.
    And, in light of your great post, have you considered whether outliers’ impressive scores on state-administered standardized tests correlate with a pedagogy, an approach to teaching and learning, that is explicitly designed to maximize said scores? Is that pedagogy really what we want to study and replicate? Maybe the answer is yes, and the outliers can teach us more than how to prep for flawed assessments, but I don’t see that right now.
    No snark intended here; I’m genuinely curious what your thinking is.

    • Well, that’s what I want to research. I honestly don’t know the answer. I suspect in many cases it’s just a matter of having all staff be on the same page about some norms and practices rather than ‘best practice’. I want also to do interventions using UbD and other ‘best practices’ in some places as part of the work. We will soon be announcing some very positive results after a year of intensive work in a poor urban district in the midwest where their scores jumped significantly after a year of monthly UbD training and coaching.

  9. I’m not really sure if anyone really has an idea about this whole testing mess. What needs to be measured? What level is truly proficient? What if students are not proficient – what do we do? How should the tests be administered? Should there even be tests? How long should they be? How much weight should this/these test(s) have? Are we ready to “man up” when we get the results? Should there be only one test? Should a future woodworker/writer/programmer/etc. know the same as a future chemical engineer?
    How do you hold someone accountable when we don’t know what we are measuring or how we should measure it? What percent of the population agrees on any one testing-accountability format?
    Also, what does a low score really mean? So a superb artist scores low in history – is that person a failure? An athlete who is also a “people person” and who scores low – failure? Is the school a failure? What about a math whiz who is terrible at English? So what do we as a society feel about those people failing? Is it failing? Did those teachers “fail as a teacher” for those students? For a student who is capable… sure, but how do you sort that out?
    I don’t have the answers because I personally am so conflicted with the whole process. I do think that we in education get too full of ourselves and think that this is so very important, when maybe it is, maybe it isn’t. When is the last time someone (not in academia) asked you for:
    1. Your primary/secondary school grades?
    2. Your ITBS, MAP, Stanford, TABS, Regents, etc. scores?
    3. Your PSAT scores?
    4. Your SAT/GRE/MAT scores?
    People do ask for GPAs right out of college, but let’s be honest – no one cares what GPA/test scores top athletes, programmers, etc. have – they care about performance. Does anyone care what grade Dr. Grant Wiggins made in English Comp I, or what his test scores were? Do you ask your electrician for references, or do you ask for grades/test scores?
    Outside of academia, it’s job performance – not test scores. Not many jobs have performance tests because it would be way too complicated and not take into account factors that we love: creativity, social skills, attitude, speed, attendance (promptness), etc. Maybe accountability is way too complicated for one test/input to measure accurately…?
    I don’t know that we will ever answer the “test” question(s) and accountability problem, because we as a society don’t really know what the question should be. The stakes are high, but what are our expectations? “Vague,” “contradictory,” and “wavering” are kinda hard to measure.

    • All good questions. Years ago I wrote about this problem – see my book Educative Assessment. My basic point was that ‘assessment’ is fundamentally different from ‘testing’ and that the history of ‘testing’ is an ugly one because the whole issue of the purpose of the ‘test’ gets lost once the policy-makers get ahold of it. The title of the book was my answer: the purpose of a test in education is to improve performance, not just measure it. Mere measuring is done as cheaply as possible – it’s an audit, as I often call it – to emphasize differences between people in order to make a decision. So, you see this at work in all state tests: not designed to be useful but designed to rate. Famously, also, Lauren Resnick said 35 years ago that American students were among the most tested but the least examined in the world.
      40 years ago David McClelland, a well-known psychologist from Harvard, wrote a seminal paper blasting the tradition of testing and raised the issue you also raise: testing for genuine competence. That was really where my own work in assessment reform got started. I read that paper and the famous study of colleges focused on proficiency called On Competence, and Ted Sizer asked me to develop this idea of his called ‘diploma by exhibition of mastery’ for the Coalition of Essential Schools. That led to my first major article on authenticity in assessment, in the Kappan in the 1980s.
      So, this is a direction and challenge of longstanding. There are so many political, financial, technical, and logistical aspects to this challenge that progress is slow. And sometimes it seems like we have regressed. But keep in mind when I started talking about and showing performance tasks and rubrics 30 years ago, the testing community scoffed. Now it’s universal.

  10. I subscribe to an e-newsletter from NEPC (Natl Education Policy Ctr, at the Univ. of Colorado).
    http://nepc.colorado.edu. On Oct. 22, a report on assessment came out. I quote excerpts:
    Can We Reverse the Wrong Course on Data and Accountability?
    http://us4.campaign-archive2.com/?u=b4ad2ece093459cbf2afb759f&id=e7f6fcba2e&e=04eae22583
    New NEPC report and model legislation offer a positive alternative to today’s poor uses of student data and punitive approaches to accountability
    BOULDER, CO (October 22, 2013) – A new report by two professors at Boston College urges American schools to use data and accountability policies in the more successful ways now seen in high-performing countries and in other sectors of U.S. society.
    In their report, Data-Driven Improvement and Accountability, authors Andy Hargreaves, the Thomas More Brennan Professor of Education in the Lynch School of Education, and Henry Braun, the Boisi Professor of Education and Public Policy in the Lynch School of Education, find that the use of data in the U.S. is too often limited to simply measuring short-term gains or placing blame, rather than focusing on achieving the primary goals of education. The report is published by the National Education Policy Center (NEPC), which is housed at the University of Colorado Boulder.
    The report’s findings have national significance because data-driven improvement and accountability (DDIA) strategies in schools and school systems are now widespread. When used thoughtfully, DDIA provides educators with valuable feedback on their students’ progress by pinpointing where the most useful interventions can be made. DDIA also can give parents and the public accurate and meaningful information about student learning and school performance.
    However, in the United States, measures of learning are usually limited in number and scope, and data suggesting poor performance by schools and teachers are often used punitively. …
    To ensure that student improvement becomes the main driver of DDIA – and not simply an afterthought to accountability concerns – Hargreaves and Braun offer two key recommendations:
    * Base professional judgments and interventions on a wide range of evidence and indicators that properly reflect what students should be learning.
    * In line with best practice in high-performing countries and systems, design systemic reforms to promote collective responsibility for improvement, with top-down accountability serving as a strategy of last resort when this falls short.
    A report containing model legislation, authored by attorney Kathy Gebhardt, accompanies Data-Driven Improvement and Accountability. The legislation, based on the Hargreaves and Braun brief, details a legal structure that would use data effectively to create a multi-level system of accountability designed for school improvement. …
    http://nepc.colorado.edu/publication/data-driven-improvement-accountability
    [A teacher-leader and I think that the draft legislation can be improved, in Section 103 (Classroom Assessments). Take a look.]

  11. I subscribe to an e-newsletter from ACHIEVE. This came on Nov. 21. I quote statements on assessment, fyi, since they pertain to Grant’s excellent blogpost. They don’t have the vision of assessment to improve learning, do they! How can they be influenced? — Jane Jackson, Co-Director, Modeling Instruction Program, Arizona State University.
    New Report Details States’ Progress on College and Career Readiness
    Washington, D.C. – November 20, 2013 – With all 50 states and the District of Columbia having adopted college- and career-ready standards, Achieve’s eighth annual “Closing the Expectations Gap” report, released today, shows how all states are aligning those standards with policies and practice to better ensure that all students are academically prepared for life after high school.
    “All 50 states deserve credit for confronting the expectations gap – that is the gap between what it takes to earn a high school diploma and what the real world actually expects graduates to know and be able to do,” said Mike Cohen, Achieve’s president. “But raising standards is just the start. Supporting teachers and leaders with the time and tools they need to change classroom practice is critical, and many states are doing just that. It is also important to align graduation requirements, assessments and accountability policies to college- and career-ready standards. This work is complicated and it will take time to get it right. Governors, chiefs and other state and districts leaders must continue to make the work a top priority …

    Assessments: Today, 19 states have or will administer college- and career-ready high school assessments capable of producing a readiness score that postsecondary institutions use to make placement decisions. The 42 states and District of Columbia participating in the Partnership for Assessment of Readiness for College and Careers (PARCC) or the Smarter Balanced Assessment Consortium working to develop CCR assessments will face many key decisions in the months and years ahead, including how these next generation assessments can support aligned and rigorous instruction, how to ensure postsecondary use of the results, and how and whether to factor the results of new assessments into high-stakes graduation decisions for students. …
    The full report is at
    http://www.achieve.org/files/2013ClosingtheExpectationsGapReport.pdf
    I quote page 19 of the report, on assessment:
    A total of 19 states currently administer, or have adopted policies to administer, assessments that meet Achieve’s criteria for a CCR assessment; five use state-developed CCR assessments while the remaining 14 require all high school students to take a college admissions test such as the ACT or SAT in addition to other statewide assessments.
    [page 20:] Some states administer the ACT or SAT to all students (typically in the 11th grade) and use student scores as measures of college readiness. While these tests have credibility in postsecondary education as a college-ready indicator, particularly for admissions, they are of unknown alignment to the CCSS. The College Board has announced that it is overhauling both the SAT and PSAT to align with the CCSS.

  12. Forgive my ignorance here. I’m a teacher at an international school in Taiwan. How are Common Core Standards seen as ‘collateral damage’ with this accountability practice?

    • Because the Standards are now in jeopardy politically as a result of the negativity surrounding the accountability systems being put in place that depend upon tests against those Standards. The accountability policy and poor implementation have led many people to blame those problems on Common Core when they are unrelated to Standards per se.
