
Opinion

Are Fourth Graders Who Don't Test Like Seventh Graders Really Failures?

By James R. Harvey | April 30, 2014 | 6 min read

This week Madhabi Chatterji, associate professor and founding director of the Assessment and Evaluation Research Initiative at Teachers College, Columbia University, and James Harvey, executive director of the National Superintendents Roundtable, wrap up this month-long conversation between measurement experts and educators on the front line. Meiko Lin, who managed the blog on behalf of Chatterji and Harvey, helped guide the discussion by formulating some questions about unresolved issues. Read their final thoughts and "takeaways" from Assessing the Assessments.


Meiko Lin: What emerged from this exercise? What gaps were left in the discussion?

James Harvey: This was an extremely useful undertaking. It revealed the eagerness of both measurement experts and front-line educators to engage around assessment issues. There seemed to be considerable concern on both sides about the overuse and potential misuse of testing. Madhabi Chatterji's initial commentary directly addressed these issues. I believe we got an excellent sense of the impact the assessment-for-accountability movement of the last decade has had at the district, school, and classroom levels. From a practitioner's perspective, I learned that there is a strong sense of shared interest between academics and superintendents in strengthening formative assessment. There also seems to be some concern on the part of academics about the importance of learning from the assessment mistakes of the past.

There also seems to be disagreement on some points that could be empirically settled. On one hand, we have the assertion that education plays a crucial role in economic competitiveness and the (undoubtedly true) argument that U.S. performance cannot be attributed solely to poverty. On the other, we have the arguments that education, per se, plays at best a limited role in competitiveness and that 40 percent or more of outcomes by nation, on average, are related to socio-economic factors.
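As a rough illustration of what a claim like "40 percent of outcomes are related to socio-economic factors" typically means in these studies, here is a minimal sketch of the underlying computation: the share of variance in scores explained by a socio-economic index, i.e., the squared correlation (R^2). Every number below is invented for illustration only.

```python
# A toy sketch of "share of outcomes related to SES": the squared
# Pearson correlation (R^2) between a country-level SES index and mean
# test scores. All numbers below are invented for illustration.
ses = [-1.2, -0.8, -0.3, 0.0, 0.4, 0.9, 1.3]   # hypothetical SES index
scores = [470, 455, 500, 480, 520, 495, 515]   # hypothetical mean scores

n = len(ses)
mean_x = sum(ses) / n
mean_y = sum(scores) / n

cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(ses, scores)) / n
var_x = sum((x - mean_x) ** 2 for x in ses) / n
var_y = sum((y - mean_y) ** 2 for y in scores) / n

r_squared = cov ** 2 / (var_x * var_y)
print(f"share of variance associated with SES: {r_squared:.0%}")  # ~59% here
```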

So, one gap is that we were never really able to iron out (or even highlight) these differences. It's not clear the blog had the capacity to change closely held beliefs on either side. Another is that I don't think we did justice to the issue of value-added measurement, either the arguments for it or those against it.

Finally, we didn't really address the implications of the fact that NCES continues to insist in each of its major reports that the NAEP benchmarks "should continue to be used on a trial basis and should be interpreted with caution." Yet each of the Common Core assessment consortia has defined the NAEP proficiency benchmark as the acceptable level of performance for every student in the United States. That decision by itself is enough to explain why the passing rates of New York students dropped like a stone when the Common Core assessments were implemented on a trial basis. Why should schools be deemed failures because only a third of fourth graders are comfortable with material pitched at that level?
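The arithmetic behind that drop is worth making explicit: a passing rate is simply the share of the score distribution above the cut score, so moving the cut from a "basic" level to a NAEP-proficient-like level mechanically shrinks it. The sketch below uses a hypothetical normal score distribution and invented cut scores, not actual New York data.

```python
# Illustrative only: how raising a cut score lowers the passing rate.
# The score scale, distribution, and cut scores are all hypothetical.
from statistics import NormalDist

scores = NormalDist(mu=500, sigma=100)   # hypothetical score distribution

old_cut = 450    # a lower, "basic"-style cut score (invented)
new_cut = 560    # a higher, NAEP-proficient-style cut score (invented)

pass_old = 1 - scores.cdf(old_cut)
pass_new = 1 - scores.cdf(new_cut)

print(f"passing rate at the old cut: {pass_old:.0%}")   # about 69%
print(f"passing rate at the new cut: {pass_new:.0%}")   # about 27%
```

Nothing about the students changed between the two lines; only the definition of "passing" did.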

Meiko Lin: How do we guarantee that all stakeholders have a role in assessing teaching and learning? What should we expect from each of them?

James Harvey: This is a powerful and difficult question. It turns on how measurement professionals and testing companies, on one hand, and policymakers, educational leaders, principals, and teachers, on the other, understand what each side is doing. One must not make the mistake of assuming that educators are not interested in assessing learning. They have always assessed it. The issues that need discussion from the educational side include: Should computer-based formative national assessments replace teacher-developed assessments? What has a reductive emphasis on reading and language arts done to the curriculum? What has it done to the school calendar? Does accountability at the school and district levels really require that every child, in every school, be tested every year, in every subject? While each assessment may be justified for different purposes in its own terms, the accumulation of assessments overwhelms schools, as several contributors pointed out.

From the measurement side, I believe the testing companies (whether profit-making or nonprofit) need to do a much better job of explaining how they construct and administer these assessments. While it may be difficult to explain the complexities of large-scale assessments to educators, the reality is that teachers, principals, and district administrators are entitled to a much clearer sense of how much judgment goes into selecting items, samples, and the matrices that determine which students get which testing booklet. It is hard to find much evidence of practitioner involvement in the rapid development of the Common Core or its assessments. I may be overstating it when I say that I've often thought the modified Angoff procedures used to establish benchmarks on some of these large-scale tests are little better than throwing darts at a blackboard, but I do know that many measurement experts themselves have serious questions about this process.
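For readers unfamiliar with the procedure: in a modified Angoff study, each judge estimates the probability that a minimally proficient ("borderline") student would answer each item correctly, and those judgments are aggregated into a cut score. Below is a minimal sketch of that aggregation with invented ratings; real studies involve many more items, multiple rating rounds, and feedback between rounds.

```python
# A minimal sketch of the basic modified Angoff aggregation. Each judge
# rates the probability that a borderline ("minimally proficient")
# student answers each item correctly; all ratings here are invented.
ratings = [
    [0.70, 0.55, 0.40, 0.85, 0.60],  # judge 1's ratings for 5 items
    [0.65, 0.50, 0.35, 0.80, 0.55],  # judge 2
    [0.75, 0.60, 0.45, 0.90, 0.65],  # judge 3
]

# Each judge's implied cut score: the expected number of items a
# borderline student would answer correctly (sum of the probabilities).
judge_cuts = [sum(judge) for judge in ratings]

# The panel's recommended cut score is the average across judges.
cut_score = sum(judge_cuts) / len(judge_cuts)

print([round(c, 2) for c in judge_cuts])  # [3.1, 2.85, 3.35]
print(f"{cut_score:.2f} of 5 items")      # 3.10 of 5 items
```

Even in this toy version, the result plainly depends on who the judges are and how each one pictures the "borderline" student, which is where much of the expert skepticism about the procedure comes from.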

Meiko Lin: What are some of the ways you would recommend that the general public interpret international large-scale assessment (ILSA) results? What can we infer and what can't we infer from ILSA results?

James Harvey: I think I'd ask the general public and policymakers to remember the words of Daniel Kahneman, Princeton University emeritus professor of psychology and winner of the 2002 Nobel Prize in Economics. In Thinking, Fast and Slow, Kahneman observed, "The errors of a theory are rarely to be found in what it asserts explicitly; they hide in what it ignores or tacitly assumes."

He explained this by pointing to a weakness he observed in himself: "I call it theory-induced blindness: once you have accepted a theory and used it as a tool in your thinking, it is extraordinarily difficult to notice its flaws." In fact, a theory about how the world works can easily mislead us as to what is actually going on.

I truly thought Oren Pizmony-Levy's contribution to the blog broke new ground. He reminded us that ILSAs were developed not to rank countries against each other but to provide educators and policymakers within each country a better sense of how successful they were in pursuit of their own education goals. What the ILSAs (and to some extent NAEP) now tacitly assume is that they are assessing school outcomes. I believe they are ignoring what is actually going on and, in many ways, misleading policymakers as to what needs to be done. Granted, policymakers, in demanding report cards that rank countries, are complicit in this. I think measurement experts should be in the vanguard of those questioning these demands. What ILSA rankings are ignoring is the social context in which schools around the world function. They make the mistake of believing that because they assess learning in schools, that's where it all takes place. What they are measuring is not school success alone (although that's part of it), but each society's commitment to the well-being of its next generation.

The opinions expressed in Assessing the Assessments are strictly those of the author(s) and do not reflect the opinions or endorsement of this publication.