One of my favorite stories concerns the legendary basketball coach John Wooden, who always gave himself a research project in the off-season. As recounted in the insightful and practical book You Haven’t Taught Until They Have Learned, one year Wooden’s UCLA Bruins had done a poor job of shooting free throws. What did Wooden do? He called up the coaches of the best foul-shooting teams, as well as the best foul-shooting players, to find out what they did in practice. He learned a vital lesson: too often, he realized, UCLA foul shooting was not done under authentic game conditions in everyday practice.
So, he changed the routine: players would scrimmage, and some would be subbed out. The ones removed would run sprints, after which they had a brief, set amount of time to shoot a few free throws on a side basket while the scrimmage went on. Players gasping for breath, with only seconds to shoot – just like in a real game. Needless to say, the following year his team led the league in free-throw shooting.
What’s your summer research project? What deficits do you need to ponder and research before school starts in the fall? Let me propose two projects and a general method for doing action research next school year.

Student misconceptions. We have long known that students fail to understand essential (though often counter-intuitive) concepts. There is now a 30-year research history of such hard-to-eradicate errors in the sciences, for example: see here, here, and here. What stubborn misconceptions did your students have difficulty escaping this past year? Addressing these yields solid gains.

Student self-assessment. We know from research (especially Hattie’s) that students’ ability to predict their grades accurately, their metacognitive ability, and their self-assessment on authentic tasks are highly correlated with large gains in achievement. How might you make student self-assessment more central to your work (and your measure of progress) next year?

Go for the gain: pre and post. The general method for doing useful personal research (in these two areas or others) is to construct a pre- and post-assessment system so that you can formally track how much progress you make next year. Indeed, the science-misconception literature is typically based on pre- and post-assessment using a test of misconceptions, such as the longstanding Force Concept Inventory in physics.* (Here are some follow-up interviews on the FCI.) You can find other science misconception tests here. Here and here are some resources on math misconceptions. In other subjects, what might make a good pre- and post-test using the same questions? A simple way to get started is to use a key Essential Question as the pre- and post-assessment question.
A more formal way of developing a baseline, ongoing, and final assessment of performance/understanding is to track the effect size of your teaching. Mathophobes, don’t freak: it is very easy to calculate. Effect size = (post-test average − pre-test average) / (average of the two standard deviations); that is, the class average on the post-test minus the class average on the pre-test, divided by the average of the standard deviations on the two tests.
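If you’d rather script it than build a spreadsheet, here is a minimal sketch of the calculation in Python (the scores are hypothetical placeholders; substitute your own class lists):

```python
# Minimal sketch of the pre/post effect-size calculation described above:
# (post-test mean - pre-test mean) / average of the two standard deviations.
from statistics import mean, stdev

def effect_size(pre, post):
    """Class-level effect size, expressed in standard-deviation units."""
    avg_sd = (stdev(pre) + stdev(post)) / 2
    return (mean(post) - mean(pre)) / avg_sd

# Hypothetical scores for one class; replace with your own gradebook data.
pre_scores  = [55, 60, 62, 48, 70, 66, 59, 73]
post_scores = [68, 71, 70, 60, 82, 80, 66, 85]

print(f"Effect size: {effect_size(pre_scores, post_scores):.2f}")
```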
If you own Visible Learning for Teachers by John Hattie, Appendix E offers a brief, easy-to-understand, and practical discussion of effect size and how to calculate it. He even walks you through designing an Excel spreadsheet that calculates it automatically once you input student grades/scores/times. There are other resources here and here.
A virtue of using such effect-size calculations is that you can compare not only tests composed of different numbers of questions but also very different kinds of tests and measuring systems. For example, history teachers and track coaches can have a common metric that permits progress to be compared across their two different measuring systems (decreases in running times vs. increases in grades), as the sketch below shows.
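Concretely, reusing the effect_size sketch above (the times are hypothetical, and since lower running times are better, the sign is flipped):

```python
# Hypothetical 400m times (seconds) for four runners, start vs. end of season.
pre_times  = [62.1, 59.4, 64.8, 61.0]
post_times = [60.3, 58.1, 62.5, 59.2]

# Same effect_size() as above; negate because improvement means *lower* times.
print(f"Track effect size: {-effect_size(pre_times, post_times):.2f}")
```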
Another virtue of effect-size calculations is that you can compare your overall results to all the effect sizes in Hattie’s book, and in particular against the key effect size of .40. Why is .40 key? Because Hattie, having exhaustively studied effect sizes across the research literature, found that .40 is the average gain of all educational interventions; it is also the normal gain in a typical class after a year of study. In other words, if you get an effect size of .6 or .7, you are achieving a far greater gain with your kids than would be expected on the basis of normal growth alone. If you are getting gains of only .3 or .4, then your teaching is not making a significant difference in the area(s) you targeted. You may also want to look at Hattie’s list of effect sizes for the most commonly used interventions in education to get a better feel for what effect sizes are typical and what great gains are possible. Some of the findings will surprise and motivate you.
The final virtue of looking at gain or effect size is that the assessment is completely fair and credible. You are looking at progress with the kids you have, on tests you choose; you are not comparing apples to oranges, holding your students to unreasonable expectations, or stuck with dopey tests that strike you as irrelevant to your goals. As a result, as with computer games, swimming, and running, you and your students will feel more in control of achievement and overcome the fatalism that infects so much of education.
When I taught briefly at Brown University, I used the same paper topic as the first and last assignment of my education course. I also had students add an appendix to the second paper describing their reaction to looking back at the first version. Many of the Brown students said that the exercise was among the most enlightening and gratifying they had experienced as students. One young man said it perfectly: “I had no idea how much I had learned!”
Have a thoughtful summer.
* In the original post, I had provided links to the FCI and was quickly (and properly) reprimanded by the official guardians and distributors of the test. To obtain the test and the right to use it (along with lots of other helpful info), go here.


22 Responses

  1. It was great to meet you in St. Louis last week, Grant. I asked you about David Hestenes and lo and behold, you write this fantastic blog post.
    I understand how Hattie et al. calculate gain, but all I have ever used is normalized gain. John Clement had this to say yesterday on the PHYS-L digest:
    “Effect size = (post – pre)/STD
    An effect size of 1 is considered enormous, and many studies do not get effect sizes larger than .5. Many PER practitioners get effect sizes greater than 1.
    But this definition of gain has the problem that it is skewed by the size of the pre-test and is also highly dependent on class homogeneity. Just a straight post – pre has a large dependence on the pre-test. So Hake came up with Hake gain, or normalized gain:
    Normalized gain = (post – pre)/(max score – pre)
    This is relatively independent of the pre-test score and is how most PER results are quoted. You can convert it to a percentage, and that is how I like to quote it. It indicates what percentage of unknown material was learned during the class.”
    Does it matter which I use, in your opinion? And what are your thoughts on trying to get other teachers to try this too? Thanks again for a great article. Now I have a bunch of links to read!
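    To make the contrast concrete, here is a minimal sketch with made-up percentages showing how the two measures can come out differently on the same data (reading Clement’s STD as the pre-test standard deviation):

    ```python
    # Made-up pre/post percentages for one small, homogeneous class, just to
    # contrast the two measures quoted above.
    from statistics import mean, stdev

    pre  = [45, 50, 55, 60, 40]
    post = [70, 75, 80, 85, 65]
    max_score = 100

    effect_size = (mean(post) - mean(pre)) / stdev(pre)                   # gain in pre-test SDs
    normalized_gain = (mean(post) - mean(pre)) / (max_score - mean(pre))  # fraction of possible gain

    print(f"Effect size:     {effect_size:.2f}")   # large, inflated by class homogeneity
    print(f"Normalized gain: {normalized_gain:.2f}")
    ```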

    • Hey, Jim, I recall our chats fondly. I don’t think it matters at all which you use. Nothing is perfect, especially with small sample sizes. I didn’t want to go into the statistical complexity – that would scare off a lot of readers. But it’s worth playing around with different measures and letting us know what you find. (I also didn’t address the issue of validating the test(s) used, or of equating tests if you want to use a different summative test than the pre-test.) Have a great summer, and let us know how it goes next year!

  2. Thank you for sharing your wisdom, experience, and challenging insights here! I am a regular reader and beneficiary of your generosity, as are others I am sharing your blog with. And thanks for the summer challenge; I intend to take it. I think a couple of links above were missing from “There is now a 30-year research history of such hard to eradicate errors in the sciences, for example: see here and here.” I’d love to follow those if you get the chance to post them. Thanks!

  3. As usual, your posts are thought-provoking and insightful. At our elementary school, it hasn’t dawned on me to have our teachers identify the greatest misconceptions out there. Fraction parts and wholes would be one.
    -rob

    • My favorites, some from my own kids:
      1. Dad, I get that there were originally 13 colonies, but then how did they move all that land to make the whole country? (Look at how, on school maps, the 13 colonies seem to sit ‘out’ in the ocean.)
      2. Dad, is Spanish just English pronounced differently?
      3. A 4th grader: I was mad, Ms. Jones: we flew cross-country and not once below did I see lines of latitude or longitude!
      4. Most elementary students think Antarctica is much bigger than most continents because of Mercator distortion.
      5. A well-known math one, usually tested on state and national tests: which is bigger, 4.01 or 4.000011?

  4. Dear Grant,
    Please remove your link to the FCI from this blog immediately. We try hard to protect the confidentiality of this instrument. Teachers and researchers depend on us to keep this assessment out of the hands of students. The American Modeling Teachers Association holds the copyright to this instrument. If you wish to discuss this further, please contact the executive officer at amtaexec@realstem.com.
    AMTA appreciates your desire to promote good assessment. We also appreciate your (and your readers’) understanding of the need to maintain test security. If teachers desire access to this assessment they can contact fcimbt@verizon.net.

  5. Grant, thank you so much for this important post and references.
    To widen the topic: Derek Muller completed his doctoral dissertation at the University of Sydney by researching the question of what makes for effective multimedia to teach physics. “Results showed that treatments containing alternative conceptions involved higher cognitive load and resulted in higher post-test scores …”
    His guest blogpost summarizes his doctoral research. He included two short videos of his research interviews; they are powerful evidence that direct expository lectures and videos are usually ineffective for novice learners. Including misconceptions can increase learning.
    http://fnoschese.wordpress.com/2011/03/15/what-puts-the-pseudo-in-pseudoteaching/
    Muller curates the science video channel Veritasium: http://www.youtube.com/1veritasium.
    It includes a playlist of his 31 short videos on misconceptions in science, crafted using results from his doctoral research. He is good-humored as he interviews adults. I like watching his videos, and I find them insightful.
    Derek followed up with this guest blogpost:
    http://fnoschese.wordpress.com/2011/03/17/khan-academy-and-the-effectiveness-of-science-videos/
    The 60th comment is by Derek.
    Derek’s work on misconceptions should inform face-to-face class discussions.

  6. These videos taught me a lot about the importance of misconceptions: http://www.learner.org/resources/series26.html — and how important it is to dig **deeply** for them. We were asked at a conference (as a demonstration of using software for group responses) whether the matter in a stick of wood came from dirt, air, water, or the sun… and got to see just how overwhelmingly many people answered wrongly (which is different from not knowing). Smart kids with good teachers consciously trying to address misconceptions still don’t get it right.

  7. Please, please speak with a trusted colleague or friend with a statistics background before you continue recommending that individual teachers calculate effect sizes based on pre and post tests. This plays right into the rampant misuse of data and statistics in education.
    More homogeneous classes will have less variation, hence the effect size will be inflated relative to that of a less homogeneous class. The statistic is likely to be so volatile that it will be almost meaningless. Etc., etc.

    • I’m well aware of the challenge, as is Hattie. I deliberately kept it simple. All the recommended resources caution people about sample size and homogeneity. I think your (mildly insulting) comment misses the point of my recommendation and – worse – fails to offer any useful alternative. So, I’d rather they had half a loaf for their own use than none.

      • Your John Wooden example illustrates some of the difficulty of measuring improvement. In his last 9 years, the team’s free-throw percentages were 67, 69, 65, 70, 65, 70, 62, 70, and 72. UCLA had a higher percentage than their opponents in 4 of those 9 years and were tied with them, at 72%, in the last year. Even individual players see their percentages jump around a lot – LeBron has been anywhere from 70% to 77%. There is so much noise, or variation, from year to year that it would be difficult to tell whether a particular strategy or technique produced a small gain.
        My suggestion is to be particularly observant of the mood, energy, and learning in your classroom as you implement changes. A million pieces of information go into forming that impression, most of which are not directly comparable to prior years with different students. If you do a pre & post test, great! That is a good way to put a little pressure on yourself to follow through on your goal. Look over the results and see if there is anything really surprising or different from previous years. But, don’t get wrapped up in computing a number that is measuring change due mostly to factors other than what you are trying to measure. Also, the more focus you put on that number, the more likely you are to unconsciously “game” your own test – at that point, the effect size measurement has actually become destructive. Your gut reaction is worth a lot more than an effect size statistic for a class or two of students.

    • Dr. Richard Hake’s method of calculating what is called normalized gain is simple and effective:
      Normalized gain = (post-test % – pre-test %) / (100 – pre-test %). I must admit, I have never taken statistics and barely get standard deviations. Normalized gain, however, makes sense to me.
      How much of what they could grow did they grow? My question is: what score in normalized gain correlates to Hattie’s 0.4 (I think that is the magic number) gain?

      • Jim and all,
        A good question! I was wondering that, too. So I looked at Richard Hake’s posts and articles and found that he compares effect size with normalized gain for the Force Concept Inventory in the first part of his online article:
        Hake, R. 2002. Lessons from the physics education reform effort. Conservation Ecology 5(2): 28. [online] URL: http://www.consecol.org/vol5/iss2/art28/
        He refers to his 1998 article:
        Hake, R.R. 1998a. “Interactive-engagement vs traditional methods: A six thousand-student survey of mechanics test data for introductory physics courses,” Am. J. Phys. 66(1): 64-74.
        [online] URL: http://www.physics.indiana.edu/~sdi/ajpv3i.pdf
        He computed an average effect size of 2.16 for 24 interactive engagement physics courses (1620 students) for which the average normalized gain was 0.50.
        He computed an average effect size of 0.88 for 9 traditional physics courses (1843 students) for which the average normalized gain was 0.24.

  8. I think you’re slightly misrepresenting the effect size teachers should ordinarily expect to achieve in their classes. In his original Visible Learning book, Hattie states that the average effect size a teacher would achieve over a year’s time without deliberate interventions is somewhere in the range of 0.2–0.4; a “year of growth” with merely average teaching is somewhere around an effect size of 0.15.
    All this is to say that I don’t think an effect size of 0.4 is anything to sneeze at (“not a significant difference”). We’d ideally want to do better than that, and if you’re successfully implementing an intervention in your class you should hope to exceed that bar, but above 0.4 is “above average” for Hattie, and better than 0.6 is “excellent.”
    Hattie also stresses that you need to take into account the effort required to implement an intervention, since some lower-effect-size factors are much easier to implement and still significantly improve outcomes.
