20 years later: the immorality of test security, revisited

The title of this post refers to the title of an article I wrote twenty years ago: The Immorality of Test Security. It is basically immoral to hold people accountable for improved results on tests that are so secure that teachers aren’t even allowed, in some cases, to see them. And as more states back off releasing tests and allowing teachers to score them, it seems timely to revisit the argument. (See this and this article on the changes to make the Regents exam no longer teacher scored).
As I have long written, I have no problem with the state doing a once-per-year audit of performance. But what far too many policy-makers and measurement wonks fail to understand is that if the core purpose of the test is to improve performance, not just audit it, then most test security undercuts the purpose. Look, I get the point of security: you can get at understanding far more easily and efficiently (hence, cheaply) if the student does not know the specific question that is coming; I’m ok with that. But complete test security after the fact serves only the test-makers: they get to re-use items (and do so with little oversight), and they make the entire test more of a superficial dipstick, using proxies for real work, than a genuine test of transparent and worthy performance.
To claim that such an audit is able to improve student (and teacher) performance over time is thus harmful nonsense. You don’t have to be anti-accountability – I am not – to see the illogic here. How can secrecy advance performance? Foolish test prep is just one obvious bad consequence of the policy: people mimic the format of the test instead of its rigor, in their ignorance as to what lies behind the curtain, and thus make matters even worse.
To grasp the harm of security after the fact, imagine complete test security in music: imagine if the state music test required students to play pieces of music unknown to them prior to the test. Now, imagine that the young musicians cannot hear themselves play as they perform for the test (i.e. they can’t really know how they are doing as they perform). Now, imagine that the results come back months later via an abstract “item analysis” completely divorced from the specific musical passages. Who could possibly improve under these conditions, be they the student or the teacher? Who could have faith in the validity of such a test? Again, the test may succeed as a quick and dirty audit but it utterly fails as a feedback and improvement system.
I find it sad that Massachusetts is backing off its longstanding practice of releasing the entire MCAS test right after it is given. As I have written and said many times, MCAS has been a model for how to do large-scale testing right. Not only have all the tests been released for over a decade, but the item analysis for each question is extraordinarily useful (go here). By seeing how often students get items wrong that require inference and transfer you gain more faith in the test – and you realize that mere “coverage” is poor preparation for the test. Now, consider: is it a mere coincidence that Massachusetts has been the top-performing state for the last few years, as judged by NAEP results? I think not.
Security involves not only the items. When writing is scored by the state or a company, we deprive all teachers of the opportunity to understand what counts as performance to standards. That’s why it is vital that large-scale assessment involve teachers in scoring student work. As anyone in the AP or IB world knows, the collective scoring of work is as interesting and informative as any professional development can be. Check out my daughter’s great blog post on the IB as a model to emulate based on her experience as an IB teacher and now reader.
Such collective scoring also has the desirable effect of making teachers more sensitive to the problem of teacher inconsistency in grading. Indeed, in IB and in Canada and Great Britain, teachers are required to get together to “moderate” their judgments via scoring the assessments, i.e. learn what the prevailing norms/standards are in scoring and use that information to adjust their own personal scoring/grading in the future, accordingly. This is a sorely needed solution to the problem of worthless report cards in a standards-based world, as I have written.
In short, don’t conflate audits with feedback systems. Nothing in the new assessments will likely improve performance, no matter how much better the items, if teachers and students are prevented from learning from their specific successes and weaknesses. We must fight to ensure that teachers play a role in scoring work and in having access to the tests after they are given. As far as I know, this issue has been unaddressed by the 2 testing consortia. (Can readers confirm or refute this?) I encourage all readers to pressure them and their own state department of education on the issue.

7 Responses

Jupiter Mom says:

May 4, 2013 at 11:14 am

I am amazed that you wrote about this 20 years ago. How far have we come? Not very I guess. This has been a huge issue for me as well. You mention it is not right for teachers- but it is also not right for parents to not be able to see how their child did on a test. We get a general score and then they break that down to 4 key elements for each test and how the student did on that key element. This is meaningless. We never see the tests after they are given to our kids. We don’t know if there are errors made in scoring, “dumb” errors made by the child, or flat out they just had no idea type of errors. Testing occurs in April and May. Scores come in the mail over the summer sometime. How is a student to learn from this process? It’s bewildering. And how can a parent know that placement in a course, based on the student’s test score, is an accurate reflection of that student’s abilities? We never do.

- grantwiggins says:
  
  May 4, 2013 at 11:30 am
  
  Indeed, it is astonishing that we have tolerated it for as long as we have. In many cases it boils down to money: it is very expensive, obviously, to develop a new test each time. So, the politics at the state and national level are not on the side of the angels, alas. But as IB shows, this is do-able if you start from different assumptions. Indeed, for 100 years NY Regents have been scored by teachers, so it’s not like it’s an untried idea. With more regional scoring so you don’t score your own kids’ work, it would be quite feasible and cost-effective (as long as unions permitted the assignment as part of the job).
  
  - CitizensArrest says:
    
    May 4, 2013 at 6:01 pm
    
    I suspect that unions would accept this as long as in the contract, their membership were given the time and resources to do the scoring and not be expected to take this work home to do on what should be their own time. There’s enough of that already, and the question of test security then becomes a real issue. Resources are as simple as a room where all the teachers can sit and score together, something that should further enhance test security. The best security enhancement is to remove the high stakes now associated with testing, returning them as you said to being actual feedback systems designed to facilitate improvement. We have already gone too far in the opposite direction these days.
    
Mary Whitehouse (@MaryUYSEG) says:

May 6, 2013 at 4:44 am

So different here in the UK.
National testing is marked by examiners, many of whom are also teachers. Marking external examinations is seen as an important mode of professional development. I know personally the enormous insight I gained from finding out how the examiner was thinking. After grades are published the mark schemes are available to schools. At a later date all past papers and mark schemes are available publicly for parents, students and anyone else who is interested.
In addition, schools can request to see the script of any student and the marks awarded for each part question. There is a fee for this, so schools tend to only request the scripts when there is a query over the final grade.

mrtheriaultfvhs says:

May 13, 2013 at 12:22 pm

When my co-worker travels to grade AP English Literature Essays, she comes back and tells me that it is some of the best professional development she has ever had. What a wasted opportunity to talk about and learn about assessment strategies and student work. We should be doing this on a national scale for ALL testing if we truly believe in the value of the assessment.

Janet Abercrombie says:

June 2, 2013 at 1:32 am

Odd that teachers cannot see the tests but textbook companies can :).

smkelly8 says:

September 15, 2013 at 7:19 pm

Reblogged this on Diary of a Temporary Full Time Foreign EFL Instructor and commented:
More trenchant insights in our age of standardized testing.

