We interrupt this general look at test validity to comment on some very important educational research that was just made public (though news of the findings appeared a few months ago on the APA website).
In an exhaustive study that used MCAS test scores from Massachusetts and various tests of cognition (related to working memory, processing speed, and fluid reasoning), researchers from Harvard, Brown, and MIT examined the relationship between achievement in school, as measured by standardized tests, and student cognition.
We already knew that these cognitive skills are fundamental in advancing or inhibiting intellectual achievement generally and school achievement specifically:

These maturing mental abilities are thought to broadly underpin learning and cognitive skills. Variation in these measures predicts performance on a wide range of tasks among adults, including comprehension (Daneman & Carpenter, 1980), following directions, vocabulary learning, problem solving, and note-taking (Engle, Kane, & Tuholski, 1999). Critically, these cognitive abilities are associated with academic performance. Executive function measured in preschool predicts performance on math and literacy in kindergarten (Blair & Razza, 2007), and parental reports of attention span persistence in 4-year-olds predicts college completion at age 25 (McClelland, Acock, Piccinin, Rhea, & Stallings, 2013). Likewise, [working memory] skill correlates with math and reading ability among 5- and 6-year olds (Alloway & Alloway, 2010) and among 11- and 12-year olds (St Clair-Thompson & Gathercole, 2006), and predicts mathematics and science achievement among adolescents (Gathercole et al., 2004). Thus, cognitive skills appear to promote or constrain learning in school.

Given that results on tests of cognition predict achievement, might it work in the other direction? In other words, do results on achievement tests predict cognitive abilities?

What is unknown, and crucial for informing educational policy, is whether general educational practices that increase academic performance also have a positive impact on basic cognitive skills. Schools traditionally focus on teaching knowledge and skills in content areas, such as mathematics and language arts. Use of such knowledge can be referred to as crystallized intelligence (Cattell, 1967). In contrast, fluid intelligence refers to the ability to solve novel problems independent of acquired knowledge; the cognitive measures in the present study are indices of fluid intelligence. Do schools where students are experiencing high levels of academic success in crystallized intelligence achieve this success by promoting the growth of fluid cognitive abilities? The strong relation between cognitive ability and academic performance suggests that schools that are particularly effective in improving academic performance may also improve domain-independent cognitive skills.

And so: what did the researchers find?
Oops. Better achievement on state standardized tests yields little or no gain on these cognitive skills:

Which school students attended explained substantial variance in students’ achievement scores, but not in measures of their cognitive skills…. These findings suggest that school practices that influence standardized achievement tests have limited effects on the development of cognitive skills associated with processing speed, working memory, or fluid reasoning…. These findings raise the question of what kinds of abilities are indexed by high-stakes statewide standardized tests that are widely used as a measure of educational effectiveness.

The finding that variation in schooling influences crystallized but not fluid intelligence is consistent with a population study of over 100,000 males in Sweden (Carlsson, Dahl, & Rooth, 2012).
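
To make the key statistical idea concrete (what it means for the school a student attends to "explain variance" in one measure but not another), here is a minimal, hypothetical sketch in Python. The numbers are made up; this illustrates the concept only, not the study's data or method.

```python
import numpy as np

rng = np.random.default_rng(0)
n_schools, n_per_school = 20, 50
school = np.repeat(np.arange(n_schools), n_per_school)

# Made-up pattern mirroring the reported finding: schools differ a lot on
# achievement, but hardly at all on a fluid-cognition measure.
school_effect = rng.normal(0.0, 1.0, n_schools)
achievement = 10.0 * school_effect[school] + rng.normal(0.0, 10.0, school.size)
cognition = 1.0 * school_effect[school] + rng.normal(0.0, 10.0, school.size)

def variance_explained_by_school(y, school):
    """Eta-squared: between-school variance as a share of total variance."""
    grand_mean = y.mean()
    between = sum(
        (school == s).sum() * (y[school == s].mean() - grand_mean) ** 2
        for s in np.unique(school)
    )
    total = ((y - grand_mean) ** 2).sum()
    return between / total

print("achievement:", round(variance_explained_by_school(achievement, school), 2))
print("cognition:  ", round(variance_explained_by_school(cognition, school), 2))
```

In this toy example the first number comes out large and the second near zero; a school system that moves achievement scores without moving the cognitive measures is, roughly, the pattern the researchers report.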

As the researchers point out, such skills are improvable by teaching that targets them deliberately:

Although school-level educational practices that enhance standardized test scores may not increase broader, fluid cognitive abilities, there is evidence that targeted interventions—both in and out of school—may increase cognitive ability. Preschoolers enrolled in a year-long executive function training curriculum improved performance on untrained executive function tests (Diamond, Barnett, Thomas, & Munro, 2007). Children receiving an intervention emphasizing the development of cognitive and conceptual skills…from birth to either 5 or 8 years of age, performed better on both standardized intelligence (IQ) and academic tests (Campbell, Ramey, Pungello, Sparling, & Miller-Johnson, 2002). Teaching inductive reasoning to third and fourth grade students improved performance on untrained reasoning tests and a fluid reasoning measure if the intervention lasted for two years (de Koning, Hamers, Sijtsma, & Vermeer, 2002). … Eight-week training in after-school programs focused on either reasoning or speed training selectively enhanced performance in 7-9 year-olds (Mackey, Hill, Stone, & Bunge, 2011).

So, we are left with a vital question, once we realize the importance of traditional school tests (both standardized and locally designed): if schooling is supposedly key to adult success, what happens when schooling separates content knowledge from thinking skills and measures (and thus teaches) only the former? And might our stubborn problem of the achievement gap be based on measuring and teaching the wrong things?

It is unknown, however, how a selective enhancement of crystallized intelligence, without the enhancement of typically correlated fluid intelligence, translates into long-term benefits for students, and whether additional enhancement of fluid intelligence would further bolster long-term educational and SES outcomes.

No one familiar with my work will be surprised by these findings. They buttress the last 15 years of work on Understanding by Design and warn of what happens if the curriculum is reduced to the teaching and testing of discrete content.
Yes, of course: it’s one study. And as researchers always say, “more research is needed.” And there are caveats in the research, noted by the authors (and thus, ironically, a validity question of its own, given the specific evidence gathered relative to the broader goals of the research, just as was discussed in the previous post).
However, this research underscores many other findings, summarized in the National Academy of Sciences’ seminal text How People Learn. This latest finding helps explain why transfer is far rarer than we want and expect it to be, given all the teaching. It helps explain the science and mathematics misconception literature, which highlights the non-fluent rigidity of naïve concepts and knowledge.
And, as the authors note, it raises troubling questions about the validity of the typical achievement tests used to evaluate students and school effectiveness. Because if the tests reward content knowledge but not powerful thinking (even though all the standards highlight important thinking), then the tests may be yielding invalid inferences and thus very harmful consequences.
Which won’t surprise anyone who has been paying attention to the reform agenda of the last 50 years. But it should make a whole lot of traditionalists – in psychometrics as well as in classrooms – do some re-thinking.
[Thanks to Rob Ackerman for alerting me to this research]


20 Responses

  1. I do wonder, for all the acclaim the “skill and drill” test-prep factory schools receive, whether the students are truly learning for the long term. Or, in simple terms, is the success of some of these schools overrated?
    -Rob Ackerman

  2. Thanks for the post, and especially for the alert to the new research. For clarification, is the MCAS more of a content-oriented assessment? Does the study clarify the type of standardized test involved? The reason I’m asking is that our state uses a college entrance exam (the ACT) as an indicator, which I have always thought of as a skill-based test and therefore a more valid indication of cognitive growth. I will try to get my hands on the study and read it myself. Thanks for any comment you have to offer.

    • The ACT is different from the MCAS, which focuses on content standards. The ACT (and the SAT) focus on what might be called analytical and linguistic skills. Whether the same results would hold true is unclear to me without a close look at the test. But given that students’ high school grades are a better predictor of college performance than the SAT or ACT, we can probably conclude that there is something interesting going on here.

  3. I wonder if some of the difficulty we have in discussing this is because we may have an unclear idea of what fluid cognitive abilities really are. Sure, we readily acknowledge that we want our students to be creative problem solvers who can work with others, but when pressed for specific examples, we revert to listing “recall” items or rote skills. I know I struggle with this because my own education focused primarily on the learning of skills and the recall of facts.
    Maybe our conversation could reach more people if we spent more time showing specific examples of cognitive abilities in action? Your sports analogies are useful, but not everyone is involved in sports, so business or community examples would be more real.

    • Your point is well taken: people need a way of making sense of these kinds of studies through experiences that make the ideas real. I think a helpful point here is that ‘fluency’ is the ability to avoid getting stuck in routines, stereotypes, and rules; a good example is thinking that an essay must have five paragraphs. Memory and processing speed are easy to see in video games. A person on the floor of the New York Stock Exchange has to have these cognitive capacities, too. Yes: let’s better understand the real-world facility in using such skills.

  4. The key to your argument lies in the evidence that fluid cognitive abilities can be taught. I am skeptical about this, despite what the researchers in this particular paper say. Carl Bereiter wrote a seminal chapter on the topic of teaching such transferable thinking skills in his book “Education and Mind in the Knowledge Age”. In discussing thinking skills programmes, he notes that “The measures used to evaluate thinking usually embody the same assumptions as the program being evaluated. They usually consist of brief, trivial tasks similar to the exercises used in the program and they offer no evidence that improved scores predict any improvement in real-world performance.”
    I suspect that the evidence that such skills can be taught falls foul of this argument. In other words, students are trained in the exact activities that are in the fluid cognitive abilities assessment and – no surprise – they then perform better on this assessment. Far from improving their fluid cognitive ability, such an intervention simply reduces the validity of the assessment in determining it.
    If we cannot really change things such as our working memory, then the logic of cognitive load theory suggests that we might be able to mitigate these limitations by building reserves of knowledge in our long-term memory. Indeed, this seems to be what education has largely evolved to do. I am not claiming that there are no generic, transferable skills that can be taught. There are. However, they are likely to be quite limited in scope. The long, hard slog of building domain knowledge is still required.
    Such work is often disparaged as the teaching of ‘rote, disconnected facts’ (I’ve never really understood why facts must necessarily be rote or disconnected). However, the quest for a thinking skills alternative is, by the evidence available, the quest for El Dorado.
    You might enjoy my post “Thinkamajiks” on this issue. http://websofsubstance.wordpress.com/2013/12/28/thinkamajiks/

    • While I appreciate the need for caution in saying to what extent basic cognition is improvable, I think there are enough studies to show that transfer across domains is possible. Many so-called thinking skills programs are not representative of what I think the authors are getting at. I did my dissertation, in part, on the critical thinking skills literature and I found it to be a mess.
      And again I think you are guilty of caricature in talking about ‘rote’ learning. The problem is exactly what the study authors describe about crystallized vs. fluid thinking. The misconception and transfer literature is quite clear that transfer IN THE CONTENT AREA is greatly impeded by conventional methods of instruction – e.g. providing students with simple rules of thumb or an initial explanation and then not deepening it (e.g. the problem of the five-paragraph essay or the equal sign in math). Look at the chapter on transfer in How People Learn and I think you’ll find that the research is clear: the manner in which content is addressed can inhibit or advance transfer, in ways that link closely to this recent study. It’s the same reason why Mazur’s physics students do better than traditionally taught students: he constantly confronts their misconceptions (from the FCI) by eliciting them and then examining them via feedback and discussion.
      No one I know says domain knowledge is not important. The question, rather, is how to make it fluent. And there is lots of research to show that conventional pedagogy is more ineffective than we would wish. There is, in fact, no other reasonable way to explain the science misconception literature or the glaring errors on tests where material that was ‘taught’ was not learned by most students.
      I plan a friendly rebuttal to you and edrealist on this subject next month. Good conversation.

  5. A large part of cognitive ability, working memory, etc., is genetically determined. It may be improved to small degrees, but there is not much research supporting that. Saying it can be “taught” is similar to saying you can teach someone how to run a 10-second 100-meter dash. Good luck.
    I interpret this study as saying that schools cannot generally improve cognitive ability. Hence, there should not be one standard; rather, students’ different levels of cognitive ability should be taken into account in assessment.

    • I don’t think your interpretation is warranted by the study. Your analogy is also misleading: of course almost none of us can be taught to run under 10 seconds in the 100-meter dash. Neither can we be taught to be the next Einstein. But this doesn’t show that there cannot be considerable improvement due to training. To adopt your analogy: the Olympic gold-medal times in the 100-meter dash 100 years ago are now surpassed regularly by high school students. This is because current high school students have better training methods than Olympians did 100 years ago.
      The problem with a quick suggestion of “genetic determinism” (whatever that means, exactly – genes only have causal effects in interaction with the environment) is that we almost never know what the norms of reaction are for a given genotype. Even in the case of identical twins raised in two different environments, we only get two data points for that genotype. So the truth is that in most cases, with skills as complex as cognitive ones, we are not in a position to tell how constraining one’s genetic endowment is.

      • “we are not in a position to tell how constraining one’s genetic endowment is”
        Yes, we are. Read a bit: http://www.ncbi.nlm.nih.gov/pubmed/24002885
        Or any of the thousands of genetic studies out there.
        My analogy is not misleading, because most people do not have the ABILITY to run under 10 seconds. Better training, higher expectations, etc., can all have an impact. But ability accounts for the majority of individual performance outcomes.
        This shouldn’t be controversial given the genetic research out there.

        • Mr. Metzger,
          I am a track coach. Are you suggesting that, since most people are not genetically gifted enough to run under 10 seconds, we can simply stop coaching athletes toward becoming better sprinters because their genetics have already determined their outcome?

          • No, this is not what I am saying at all. Coaching improves outcomes, but most people do not have the physical ability to run under 10 seconds. Grant talks about setting “one standard”. My point is that if you put the standard at 10 seconds, very few will be able to run it, regardless of training, coaching, etc., because that standard is beyond the ability of most people.
            My issue is with setting standards, not with coaching or expectations. Standards must in some way reflect the underlying ability of students; otherwise they are worthless.

          • I think you are confusing standards with expectations. The standards for college-ready work are what they are – that’s a given. Reasonable expectations are something else entirely. If you want to make it on Top Chef you had better be able to meet their standards…

          • I don’t believe so. I am saying you can’t expect someone to meet a standard if they don’t have the ability to do so. I am also saying that the standards themselves are implicitly based upon what people can actually do.
            The definition of “college-ready work” is not set in stone. It must be implicitly based upon what some previous college students have accomplished.
            If someone wants to set a criterion-referenced standard with no connection to ability, then it can become meaningless. Say, for example, that a person would have to run 100 meters in under 5 seconds to avoid being eaten by a lion. It’s a clear standard – either you reach it or you’re done. But if you put that standard out there – run 100 meters in under 5 seconds – absolutely no one on the planet can do it. So it would be a “standard”, but it would be worthless since no one has the ability to reach it.

          • But your example is off. “College ready” is a defensible criterion, even if there are, in fact, different standards because Harvard is not Lenox Community College. But that was my whole point about the track example: you need to know what the predictive standard is. That’s why the ACT and SAT have stayed in business even if we think they shouldn’t – because there is a standard of predictive value.
            It’s no different for the arts. You cannot hope to get into Juilliard or Berklee without an ability to sing or play at a certain level. You can’t expect to get a black belt if you only have a white belt. You can’t expect to get a job as a translator if you are only at Novice High on the ACTFL scale. It is a disservice to kids not to communicate these standards and where they stand against them – that’s all I am saying.

      • Thanks for the post in the New Yorker. I wanted to share a conversation I had with a colleague at our school, spawned by this blog and the article posted above, regarding standardized tests, specifically the ACT.
        Over the past few years, along with many other schools, our administration has been focused on driving ACT scores up. The thought here is that since the ACT is a skills-based test, a score increase shows an improvement in transferable skills, a higher likelihood of student success beyond high school, and a measure that says we are fulfilling our responsibility as educators. To a certain degree there is some truth to this, but I have found some interesting patterns of behavior. Teachers have started doing ACT prep. They have even gone to the lengths of putting example questions in their exams.
        These actions had me questioning my colleague: “Doesn’t the idea of test prep reduce the skill to something non-transferable? If the skills assessed on the ACT are transferable, shouldn’t we be able to address each skill within our own discipline rather than reducing our instruction to ACT prep?” My colleague agreed that if we emphasize the learning of these skills within the context of our disciplines, then we should see improvements in ACT scores. However, he also supported the use of test prep, using an analogy from football: “Recruiters often use standardized tests such as the 40-yard dash to assess players’ skill sets in football. They use other tests for agility and explosive power as well. Coaches tend to give these same tests throughout the season, not only so the players perform well for recruiters, but also to track their improvement over the course of the season.”
        That is when it dawned on me. Football coaches and recruiters use a much smarter assessment than the ACT. They assess specific skills, like speed, with parameters that allow for comparison to a specific target: he ran the 40-yard dash in 4.35 seconds, as opposed to a composite of 28 on the ACT. Further, there is a cause-and-effect relationship between what a coach can do and improvement in the 40-yard dash (for example, more speed drills). Since the ACT assesses skills in a variety of contexts without isolating anything in particular, the scores become vague, and it is a daunting task for instructors to figure out how to improve that skill set in their students. In addition, I suspect the interplay between the different skills is relatively unique to these standardized tests, so the scores can be interpreted as a great indicator of how well a person will do on a standardized test. (Just because an athlete has a lot of speed and agility does not necessarily mean that the athlete will be great at both basketball and football.)
        So how do you improve scores with such a vague value? By test prep, of course! So it seems we have the perfect system in place to reinforce test preparation rather than true disciplinary literacy. I’m not saying nothing can be gained from these tests, but someone ought to lay down the rule that a standardized test score is merely a vague, broad value that should not be used as an end for planning instruction.

        • You are going to love my next post, then, because it addresses this very issue. If the test is a proxy and not a direct measure of the real thing, it MAY serve psychometrics but it hurts education. And that’s what I will argue, based on a seminal paper from 40 years ago that argued this very point.
