If you agree that the track we are going down – high-stakes, one-shot testing of every student on the Common Core – is unproductive and unsustainable, I have a modest proposal for how to ditch the tests but still move the Common Core standards forward.
Let’s use matrix sampling in national testing, as NAEP has always done it and as CAP in California used to do it. Matrix sampling means that no student sees all or even most of the questions, and that individual student scores need not be reported (or, if they are reported, they are less reliable than school results). That way, building-level and district-level results would be the focus, as arguably they should be. And the test could then use many more tasks and types of tasks, distributed over many students, giving us valid and reliable data on all the Standards – something we cannot now achieve, given time constraints and the need for individual scores to be precise and comparable. And, last but not least, the test for the individual student could be short.
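To make this concrete, here is a minimal sketch in Python of how such a design can work. The seven-block pool and three-block booklets are hypothetical, chosen only because they form a tidy balanced design; operational NAEP designs are much larger.

import itertools

# Hypothetical pool of seven item blocks, combined three to a booklet.
BLOCKS = ["A", "B", "C", "D", "E", "F", "G"]

# A classic (7, 3, 1) balanced incomplete block design: every pair of
# blocks appears together in exactly one booklet, so the correlation
# between any two items can still be estimated, yet each student sees
# only three of the seven blocks.
BOOKLETS = [
    ("A", "B", "C"), ("A", "D", "E"), ("A", "F", "G"),
    ("B", "D", "F"), ("B", "E", "G"), ("C", "D", "G"), ("C", "E", "F"),
]

# Sanity check: each pair of blocks co-occurs in exactly one booklet.
for pair in itertools.combinations(BLOCKS, 2):
    assert sum(set(pair) <= set(booklet) for booklet in BOOKLETS) == 1

def spiral(students):
    """'Spiral' booklets through the roster so that few students in any
    one testing room receive the same booklet."""
    return {s: BOOKLETS[i % len(BOOKLETS)] for i, s in enumerate(students)}

assignments = spiral(["student_%d" % n for n in range(25)])
print(assignments["student_0"])   # ('A', 'B', 'C')
print(assignments["student_1"])   # ('A', 'D', 'E')

The point is simply that full coverage of the item pool is achieved across students rather than within any one student’s test, which is what frees up room for many more tasks and task types.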
This approach would also allow teacher accountability to head back to where it belongs: a local decision, on both criteria and policy. And it would thereby rid us of some of the current ridiculous schemes that require such practices as evaluating the music teacher partly on the school’s English test scores.
A further benefit of this approach is that we could couple it with a policy requirement that all schools and districts demonstrate that local assessment is of high quality: calibrated to national standards, with policies and practices in place to ensure quality control. Because despite all the work on standards over the past 25 years, most local assessment systems are still neither valid nor rigorous, as I have learned the hard way in working with hundreds of schools on assessment.
Yes, I know: some people insist on having individual student scores for various reasons (incentive for students and teachers, data, reporting to parents, etc.). Yet, through well-known psychometric practice (Item Response Theory, or IRT), we can approximate student scores with SUFFICIENT reliability for the context of the assessment; a toy sketch of the idea follows the NAEP excerpt below.
Here is the NAEP account of how this works:

To ensure that the item pool covered broad areas, the booklets were assembled using a variation of matrix sampling called Balanced Incomplete Block (BIB) spiraling. Like matrix sampling, BIB spiraling presents each item to a substantial number of students but also ensures that each pairing of items is presented to some students. The result was that the correlation between any pair of items could be computed, albeit with a smaller number of students than responded to a single item.

The major design feature in 1983 was scaling the assessment data using Item Response Theory (IRT). At that time, IRT was used mainly to estimate scores for individual students on tests with many items. IRT was fundamental to summarizing data in a meaningful way. Basically, IRT is an alternative to computing the percent of items answered correctly. Given its assumptions, IRT allowed the placing of results for students given different booklets on a common scale.

A “balanced incomplete block (BIB) spiraling” design ensures that students receive different interlocking sections of the assessment, enabling NAEP to check for any unusual interactions that may occur between different samples of students and different sets of assessment questions. This procedure assigns blocks of questions in a manner that “balanced” the positioning of blocks across booklets and “balanced” the pairing of blocks within booklets according to content. The booklets are “incomplete” because not all blocks are matched to all other blocks. The “spiraling” aspect of this procedure cycles the booklets for administration so that, typically, only a few students in any assessment session receive the same booklet (Messick, Beaton, and Lord 1983).
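Since readers may wonder what the IRT step looks like, here is a toy sketch using the one-parameter (Rasch) model with made-up item difficulties. NAEP’s operational procedures are far more elaborate, but the core idea is the same.

import math

def p_correct(theta, b):
    """Rasch (one-parameter IRT) model: probability that a student of
    ability theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_theta(responses, steps=500, lr=0.1):
    """Crude maximum-likelihood ability estimate via gradient ascent.
    responses: (item_difficulty, answered_correctly) pairs from
    whichever booklet this student happened to receive."""
    theta = 0.0
    for _ in range(steps):
        # Gradient of the Rasch log-likelihood: observed minus expected.
        grad = sum(int(right) - p_correct(theta, b) for b, right in responses)
        theta += lr * grad
    return theta

# Two students saw different (here, non-overlapping) items, yet both land
# on the same ability scale because the items' difficulties were
# calibrated on a common scale beforehand.
booklet_1 = [(-1.0, True), (0.0, True), (1.0, False)]
booklet_2 = [(-0.5, True), (0.5, False), (1.5, True)]
print(round(estimate_theta(booklet_1), 2))
print(round(estimate_theta(booklet_2), 2))

Because the estimate depends only on the difficulties of the items a student actually saw, students who took different booklets still receive comparable scores – the “common scale” the NAEP account describes.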

Matrix sampling with IRT scores, plus a local-assessment quality-control policy, is a win-win: it lowers the stakes that lead to test prep, makes teacher accountability more valid and more locally owned, and ensures that educators take firm hold of the problem of unreliable and invalid local assessments.
A bit complex and with some compromise – but it HAS to be better than the current path…

12 Responses

  1. It would also lessen the amount of direct (giving kids answers) or indirect (everyone gets extra recess if we try hard) ‘cheating’ that occurs everywhere.

    • Agreed – though, believe it or not, many people over the years have claimed that NAEP results suffer because kids have no incentive to care about them. So, extra recess might be a good ‘carrot’!!

  2. Will the adaptive aspect of SBAC and PARCC mean that they can get the same (or more) information on student proficiency from fewer questions? I was hoping so.

  3. In your travels have you seen a rigorous, valid, and reliable measure of reading comprehension in elementary school? Our district is using the level-set test from Metametrics and some of us are skeptical of its validity.

    • Reading assessment is a black hole – so hard to have trust in any one measure. I think the key is triangulated data – running records, comprehension tests, portfolios of work. But for comprehension per se, I think DRP tests have the virtue of being quick and transparent.

  4. Triangulation makes a lot of sense. Most of us are too quick to settle for the quick but less reliable solutions. I’ll definitely take a look at the DRP.
    Cheers

  5. I love the idea of BIB spiraling. I’m afraid that issues with SBAC and PARCC testing may lead more and more states to throw out CCSS as a whole. Can our decision-makers separate the framework itself from the high-stakes assessment?

    • It’s a good question, and the answer is: we do not know. It could be that states form consortia, as in Northern New England; it could be that states go back to going it alone. It is unclear to me whether the key policy people have the stomach for a protracted battle over national testing.
