As I have often written here, the Common Core Standards are just common sense – but the devil is in the details of implementation. And given the excessive secrecy surrounding the test items and their months-later analysis, educators are in the unfortunate and absurd position of having to guess what the opaque results mean for instruction. It might be amusing if high-stakes teacher accountability were not attached to the results.
So, using the sample of released items in the NY tests, I spent some time this weekend looking over the 8th grade math results and items to see what was to be learned – and I came away appalled at what I found.
Readers will recall that the whole point of the Standards is that they be embedded in complex problems that require both the content and the practice standards. But what were the hardest questions on the 8th grade test? Picayune, isolated, and needlessly complex calculations of numbers using scientific notation. And in one case, an item is patently invalid in its convoluted use of the English language to set up the prompt, as we shall see.
As I have long written, there is a sorry record in mass testing of sacrificing validity for reliability. This test seems like a prime example: score what is easy to score, regardless of the intent of the Standards. There are 28 8th grade math standards. Why do arguably less important standards, like those for scientific notation, have at least 5 items related to them? (Who decided which standards were most important? Who decided to test the standards in complete isolation from one another simply because that is psychometrically cleaner?)
Here are the released items related to scientific notation:
[Screenshots of the released scientific-notation items, including the Saturn item discussed below.]
 
It is this last item that put me over the edge.
 
The item analysis. Here are the results from the BOCES report to one school for the questions related to scientific notation. The first number, cast as a decimal, is the percentage of correct answers statewide in NY: on question #8, only 26% of students in the state got it right. The next decimals are the percentages for a specific district and school: in this district 37% answered correctly, and in this school 36% did. The two remaining numbers are the differences between the district and school percentages and the state percentage (.11 and .10, respectively).
[Item-analysis screenshots for questions #22, #14, #13, #11, and #8.]
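To make the arithmetic behind those five numbers concrete, here is a tiny sketch (not part of the report itself), using only the figures quoted above for question #8:

```python
# Illustrative only: how the five numbers on one row of the BOCES report relate.
# The figures are the ones quoted above for question #8.
state, district, school = 0.26, 0.37, 0.36   # proportion answering correctly

district_gap = round(district - state, 2)    # 0.11 above the state
school_gap = round(school - state, 2)        # 0.10 above the state

print(state, district, school, district_gap, school_gap)
```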
 
Notice that, on average, only 36% of New York State 8th graders got these 5 questions right, pulling down their overall scores considerably.
Now ask yourself: given the poor results on all 5 questions – questions that involve isolated and annoying computations, hardly central to the import of the Standards – would you be willing to consider this a valid measure of the Content and Practice Standards in action? And would you be happy if your accountability scores went down as a teacher of 8th grade math, based on these results? Neither would I.
There are 28 Standards in 8th grade math; scientific notation accounts for 4 of them. Surely, from an intellectual point of view, the many standards on linear relationships and the Pythagorean theorem are of greater importance than scientific notation. Yet the released items and the arithmetic suggest each standard was assessed 3-4 times in isolation before the few constructed-response items. Why 5 items for scientific notation?
It gets worse. In the introduction to the released tests, the following reassuring comments are made about how items will be analyzed and discussed:
[Screenshot of the introduction to the released items, describing the annotations and rationales provided for each item.]
 
Fair enough: you cannot read the student’s mind. At least you DO promise me helpful commentary on each item. But note the third sentence: “The rationales describe why the wrong answer choices are plausible but incorrect and are based on common errors in computation.” (Why only computation? Is this an editorial oversight?) Let’s look at an example for arguably the least valid question of the five:
[Screenshot of the Saturn item, with its convoluted prompt.]
Oh. It is a valid test of understanding because you say it is valid. Your proof of validity comes from simply reciting the standard and saying this item assesses that.
Wait, it gets even worse. Here is the “rationale” for the scoring, with commentary:
[Screenshot of the scoring rationale provided for the Saturn item.]
 
Note the difference in the rationales provided for wrong answers B and C: “may have limited understanding” vs. “may have some understanding… but may have made an error when obtaining the final result.”
This raises a key question unanswered in the item analysis and in the test specs. Does a computational error equal a lack of understanding? Should Answers B and C be scored the same? (I think not, given the intent of the Standards.) The student “may have some understanding” of the Standard – or may not. Were Answers B and C treated equally? We do not know; we can’t know, given the test security.
So, all you are really saying is: wrong answer.
“Answers A, B, C are plausible but incorrect. They represent common student errors made when subtracting numbers expressed in scientific notation.” Huh? Are we measuring subtraction here or understanding of scientific notation? (Look back at the Standard.)
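To see why the distinction matters, here is a worked example with made-up numbers (the actual figures are in the item screenshot, which is not reproduced here): suppose a question asks how much greater one distance is than another, both given in scientific notation.

```latex
% Invented numbers, for illustration only -- not the values from the actual item.
\[
9.5\times10^{8}\ \text{km} \;-\; 5.8\times10^{7}\ \text{km}
 \;=\; 9.5\times10^{8} - 0.58\times10^{8}
 \;=\; 8.92\times10^{8}\ \text{km}
\]
% An answer of 3.7 x 10^8 comes from subtracting the coefficients without first
% rewriting to a common power of ten -- a computational slip by a student who may
% still grasp the notation. An answer of 3.7 x 10^1 subtracts the exponents as
% well -- a genuinely conceptual confusion about what the notation means.
```

Those two wrong answers are nowhere near equivalent as evidence of understanding, yet the rationales give us no way to tell whether the scoring or the analysis distinguishes them.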
Not once does the report suggest an equally plausible analysis: students were unable to figure out what this question was asking!!! The English is so convoluted, it took me a few minutes to check and double-check whether I parsed the language properly:
[The Saturn item again.]
 
Plausible but incorrect… The wrong answers are “plausible but incorrect.”  Hey, wait a minute: that language sounds familiar. WTF?!? – that’s what it says under every other item! For example:
[Screenshots of the identical “plausible but incorrect” language under other items, including one on linear functions.]
All they are doing is copying and pasting the SAME sentence, item after item, and then substituting in the standard being assessed!!  Aren’t you then merely saying: we like all our distractors equally because they are all “plausible” but wrong?
Understanding vs. computation. Let’s look more closely at another set of rationales for a similar problem, to see if we see the same jumbling together of conceptual misunderstanding and minor computational error. Indeed, we do:
[Screenshot of the rationales for an integer-exponents item.]
Look at the rationale for B, the correct answer: it makes no sense. Yes, the answer is 4 squared, which is an expression equivalent to the prompt. But then they say: “The student may have correctly added the exponents.” That very insecure conclusion is then followed, inexplicably, by great confidence: “A student who selects this response understands the properties of integer exponents…” – which is, of course, just the Standard restated. Was this blind recall of a rule, or is it evidence of real understanding? We’ll never know from this item and this analysis.
In other words, all the rationales are doing, really, is claiming that the item design is valid – without evidence. We are in fact learning nothing about student understanding, the focus of the Standard.
Hardly the item analysis trumpeted at the outset.
Not what we were promised. More fundamentally, these are not the kinds of questions the Common Core promised us. Merely making the computations trickier is cheap psychometrics, not an insight into student understanding. They are testing what is easy to test, not necessarily what is most important.
By contrast, here is an item from the test that assesses for genuine understanding:
[Screenshot of an item on linear vs. nonlinear functions.]
This is a challenging item – perfectly suited to the Standard and the spirit of the Standards. It requires understanding the hallmarks of linear and nonlinear relations and doing the needed calculations based on that understanding to determine the answer. But this is a rare question on the test.
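For contrast with the scientific-notation items, here is a minimal sketch of the kind of reasoning such an item rewards – checking whether a relationship is linear by testing for a constant rate of change. The table values below are invented, not taken from the test.

```python
# Illustrative only: decide whether a table of (x, y) values could come from a
# linear function by checking for a constant rate of change between points.

def is_linear(points, tol=1e-9):
    """points: list of (x, y) pairs with distinct x-values, sorted by x."""
    (x0, y0), (x1, y1) = points[0], points[1]
    slope = (y1 - y0) / (x1 - x0)
    return all(
        abs((yb - ya) / (xb - xa) - slope) <= tol
        for (xa, ya), (xb, yb) in zip(points, points[1:])
    )

# Invented examples: y = 3x + 1 is linear; y = x**2 is not.
print(is_linear([(0, 1), (1, 4), (2, 7), (3, 10)]))  # True
print(is_linear([(0, 0), (1, 1), (2, 4), (3, 9)]))   # False
```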
Why should the point value of this question be the same as the scientific notation ones?
In sum: questionable. This patchwork of released items, bogus “analysis,” and copy-and-paste “commentary” gives us little insight into the key questions: where are my kids in terms of the Standards? What must we do to improve performance against these Standards?
My weekend analysis, albeit informal, gives me little faith that the Standards were operationally understood in this design – absent further data on how item validity was established, on whether any attempt was made to distinguish computational from conceptual errors in design and scoring, and on whether the test makers even understand the difference between computation and understanding.
It is thus inexcusable for such tests to remain secure, with item analysis and released items dribbled out at the whim of the DOE and the vendor. We need a robust discussion as to whether this kind of test measures what the Standards call for, a discussion that can only occur if the first few years of testing lead to a release of the whole test after it is taken.
New York State teachers deserve better.
 
 

30 Responses

  1. Regarding the question that “put you over the edge” – all the answers are wrong in my opinion, because they do not include the UNITS being used – a real problem of measurement of course includes units…

    • Well put, Gordon. From a science perspective, neither does the 2nd nor the 5th. Both problems also have “correct” answers with more significant figures than the numbers from which they were calculated. The second one should be 2 x 10^9 and the 5th 8.2 x 10^8. Just another example of the languages of the disciplines failing to resonate.

  2. I agree with your sentiment regarding what can be learned from the wrong answers – not much. I do think you need to cut a little slack, however. The first and third questions would NOT be described as dealing with scientific notation but rather with the algebra of numbers containing exponents. Though scientific notation employs that sort of algebra, one does not need to know scientific notation to solve such problems.
    I’ve always struggled with the idea that complex questions, like the aquarium problem, that demand two operations should be turned into two questions. Then the individual answers might shed light on whether or not a student knew how to, say, convert a number into scientific notation or whether they have trouble with the algebraic manipulation of the scientific notation. If we are just assessing whether a kid “got it right” or not, then I guess a single question would do–hardly formative.
    It’s funny that the question you mentioned as most favorable is one I found troublesome in a different way. “Which of these describes a non-linear function?” might provoke a student to look for a description or a definition – a typical example of math language in conflict with other sorts of understanding. “Which of these is an example of a non-linear function?” provides clarity and a bit of direction to the student as to how to assess the possible answers.

  3. The selling point for PARCC is that we will get timely data that will help us create learning pathways for our students. This data doesn’t tell us anything. Even more importantly, as a classroom teacher, it tells us nothing about the individual students in front of us. Does our whole school need to focus on scientific notation or is there a group that gets it? It would help to know who understands and how they understand so we can learn from them to help others.

  4. Sad. Eighth grade is a pretty good year for the CCSS, with the introduction of functions and the move from proportional to linear models.
    So maybe this is just a story of psychometricians who have had their run of the test. Maybe the good of eighth grade is harder to assess reliably than the mundane. I wonder what part technology plays, though. The technological story here is that the technology of selected response items is easier to manipulate than the technology that would allow a student to sketch a non-linear function (for example). So those items either get booted over to the constructed response section or they aren’t assessed at all. If we can get the technology right, the assessments may not improve, but at least the psychometricians would have one less excuse.

    • Your line “the good of eighth grade is harder to assess” is right on the money. Students in the 8th grade may already be in high school Algebra or even higher level classes, or still in the process of crossing that bridge from proportional to linear models, from concrete to the abstract. There is an acceptable range for first words and first steps. Why not for Algebraic reasoning?

  5. In my opinion, struggling students usually answer the first item you’ve shared by computing the powers rather than by using the properties of exponents, so it hardly assesses understanding or use of those properties. In the sample of released items we find:
    “A student who selects this response understands how to know and apply the properties of integer exponents to generate equivalent numerical expressions.”
    That is wishful thinking, at best.

  6. Grant, you know I don’t favor the standards, and I certainly don’t favor high stakes tests. Still, if the issues you illustrate here are to be fixed, we’d have to begin by having real teachers write the standards. Have real teachers, including some who wrote the standards, create the test, and have real teachers, including those writing the standards and the test items, assess the results.
    All of this, though, will take more wasted dollars. Shelve the entire misguided process, and the conversation becomes moot. Plus, education will take a quantum leap forward.

  7. I think he gets the posts but not positive Should I send it?

    Nicole Santora, Ed.D.
    Administrative Supervisor of Curriculum and Instruction
    Freehold Regional High School District
    11 Pine Street
    Englishtown, New Jersey 07726
    732-792-7300 ext.8536

  8. NY’s “common core tests” of the past few years have been a disaster from start to finish. When trying to understand why NY ever supported changing its tests prior to PARCC, analyses like this make us scratch our heads and sadly conclude that politicized teacher blame and industry money from testing, textbooks, and certain charters pushed moves like these. These questions are unnecessarily obfuscated, the rationale behind each piece is either kept from us (as Pearson does with so many of its tests) or useless (like these), and the real means of increasing “rigor” was simply to escalate the cut score.
    Granted, my background makes me more able to critique ELA / literacy exams than math, but what I’ve seen from PARCC so far looks better than this. Fingers crossed.

  9. I agree with this analysis, and am frustrated that New York has delayed its switch to PARCC for at least two years. I sincerely hope that Grant isn’t writing the same type of post in a year about the PARCC and Smarter Balanced tests, because if they aren’t any better, the already tenuous common standards movement will be nearly impossible to sustain.

    • Didn’t Pearson create the PARCC assessments? Pearson created the NYS assessments. What makes you think they will be any different?

  10. I am a 25-year veteran science teacher, K through university. As I was chasing my “Piled Higher and Deeper” degree, I happened upon an article (forgive me for not having the reference available) that speaks not only to CCSS but also to NCLB (or, as my colleagues call it, NTLS – No Teacher Left Standing).
    The article argued that assessments can assess student learning, teaching, or program development. However, when one assessment assesses all three components at the same time, compromises take place: teaching becomes shallow, and student learning takes a back seat to teacher evaluation and program development.
    After all, shouldn’t the focus of assessment be the authentic measure of student learning growth, above all else?
    Perhaps other standards of teaching effectiveness and program effectiveness need to be developed. In my state, we have a teaching rubric by which we are evaluated; however, CCSS presents a much higher stakes evaluation of teacher effectiveness and causes my colleagues much higher concern than the evaluation rubric.

  11. I do not know where you teach, but my population consists of severely learning-disabled fifth graders. Over half my class are second-language learners. Like you, I find the CC standards for math to be common-sense ideas, not really new so much as fostering vocabulary and problem-solving skills. My kids went farther than ever because we were able to spend a lot more time on each skill. Mind you, most of my class came into my room functioning at a kindergarten-through-second-grade level on their norm-referenced tests, and by the time I had to retest them for their IEPs, my lowest kids (I had 5 who were actually developmentally delayed, not LD) were functioning at a 3rd grade level and my three highest kids tested at a 6th grade level. They felt really good about themselves. And then the NYS Beast came. This is the second year in a row where there were so many errors in the presentation of the test that I was floored, and many of the questions were intentionally set up for failure. The perverse view of NYSED is that the majority of students will fail regardless of whether they are gen. ed., special ed., or ELL. It’s what King wants, it’s what Cuomo wants, and it’s what Pearson wants. As I said, I agree with you that the CC standards reflect a common-sense approach for all students, but the testing methodology is draconian.

  12. I am a grade 8 math teacher in NJ. The question you cited about nonlinear functions is challenging – too challenging, I think, for the developmentally average 13 year old. AND our students will be taking the PARCC test on a Chromebook, with no paper to sketch out a graph if they can’t reason out the answer mentally. You’re concerned with the number of questions that deal with scientific notation. I’m totally flabbergasted by the number of standards related to rotations and other transformations. Again, no paper, and a Chromebook test.
    I am very concerned by the fact that, as with most standardized math tests, one can never know just from a score whether the student made a computational error, a conceptual error, or even a READING error (my all-time favorite test question, about 5 years ago, on measurement: “the 35 mm in a camera is a) the opening of the lens, b) the space of the exposed film, c) the distance between the sprockets, or d) the width of the film” – all this in an age when kids used digital cameras and cell phones)!
    But the thing I’m most upset about is not what my evaluation will look like, but the fact that the new standards were not phased in. Our average 8th grade students are now expected to master concepts that make up the first half of the high school Algebra course. My students are expected to have mastered material that was never covered before the start of the year. Also, in our district, if a student is developmentally ready for Algebra or above, they are already in those classes. The tears in my class break my heart, and I will never forgive the “deformers” for sucking the joy out of students who can learn successfully but at their own pace. As the saying goes, “if you judge an elephant on its ability to climb a tree, it will always be a failure.”
    Please forgive my rant.

    • Forgiven. I find it unfortunate that 8th grade is now where you are expected to learn the core of Algebra I. That has not worked out well in districts that have tried it in the last few years.

  13. Thank you for looking out for teachers and students. I would just like to give credit where credit is due.
    From my experience, New York has been very transparent about the state test compared to New Jersey and other states. The fact that NY will publish annotated test items is a treat that not every state provides. I teach in NJ now, but after having taught in NY, I often visit engageny.org to get some ideas about what to expect on a Common Core-aligned test. Maybe NJ is publishing something? If so, I’d like to know where to get that information.
    As tempting as it is to slam the annotated test items, I’d rather provide helpful feedback to NY so they continue to provide that resource. The resource may not be perfect, but it’s something that could be improved. Do you think NYSED should stop providing the resource altogether? They could save themselves all the hassle and the criticism. As an 8th grade math teacher, I certainly hope they continue to provide those items each year.

    • I am with you completely – I wrote a recent post on the fundamental need for transparency via released tests. Better to have what NY offers than what NJ does, for sure. But the reports I criticized were poor. I would rather just see all the released items and the correct-answer percentages than have them waste all that print on bogus analysis of some items. But if the choice were some items vs. none – some, for sure. Neither, however, is optimal. What I do not understand is why the Regents do not demand such openness, something done with Regents exams for 100 years.

  14. Hi! Math educator, here. Mr. Wiggins, you spend a great deal of this post criticizing this planet problem as an example of “Picayune, isolated, and needlessly complex calculations of numbers using scientific notation.” You’re right – quite literally so: calculations are *not necessary* to solve this problem. A conceptual understanding of scientific notation – presumably one given to you by the Common Core – can give you the power to solve this without a single calculation.
    The idea behind teaching scientific notation is that kids get an idea of how big these numbers are. So we have two numbers and we want to know the difference between them. Choices A and B are immediately eliminated: their orders of magnitude (sizes) make them impossible options. The author seems unusually stuck on B, but the difference between two positive numbers cannot be larger than the larger of the two. So the student is left to choose between C and D. They differ in size by a factor of ten. So you ask yourself: what’s the difference between several tens and several hundred? (It’s in the hundreds.) Several hundred and several thousand? (It’s in the thousands.) Several thousand and several ten thousands? (It’s in the ten thousands.) You get the idea. The answer has to have an exponent that matches the larger number, so D is your answer, which is the correct solution.
    This problem is solvable “by inspection,” which is fancy math talk for “just by looking at it,” but *only* if you have a solid understanding of scientific notation. I am only a little depressed that you devoted this much time to eviscerating a question when the quickest way to solve it does in fact hinge upon a student’s underlying understanding of the subject matter. A plug and chug approach is not necessary and certainly a last resort. I’d imagine how this unit is taught in the classroom would make this particular approach even clearer.
    Anyway, thank you for showing exactly why the Common Core would be a great improvement on our current system, even if you didn’t mean to.

  15. What scares me, Mr. Wiggins, as a teacher in Georgia, is that I doubt test items will be released after the test is administered, we won’t be able to discuss or look at test items, and we really do not know what to expect from our state’s version of the Common Core assessments. This year I won’t be evaluated on student growth, but next year I will be. How unfair that I’m evaluated on a test where my resources consist of a test bank of many items from the days of the CRCT! I know the standards, and I try to teach reading, writing, language, and the whole shebang, but I’m not sure how to teach students in a way that they will master the test. The test may have language/grammar questions out of context – from years past. At the same time, I have to teach them to compare two texts for tone, craft, theme, main idea, etc. Plus, students need to be able to respond to the text in writing. All this in 55 minutes with one class for reading, writing, and language. I remember when you spoke in Cobb years ago and said that it would take 2-3 years to teach the standards with the depth they deserve, and those were the GPS standards. My goodness – I think the same is still true with the Common Core, especially the way some of the assessments are designed. I’m not sure how to do the best at what I do and help student growth move up and to the right (especially when the tests are so poorly designed and ask so many language questions out of context). Plus, I have no idea what the Georgia CTB McGraw-Hill version of the test will look like. Will it look like the NY version of CC? Advice on how I move forward? Or do I just keep feeling my way through the dark, doing what’s best for kids in reading and writing, and hoping for the best? Truly, in all my years teaching I have never felt such pressure.

  16. Hmm, the question seems straightforward and easy to me… but I’m an adult. I think any kid who understands scientific notation wouldn’t have a problem with it.
