Emerging technology and research on learning have the potential to dramatically improve assessments, if educators and policymakers take a more balanced approach to how assessments are used.
That's the conclusion of two years of analysis by the Gordon Commission on the Future of Assessment in Education, a panel of top education research and policy experts that was launched in 2011 with initial funding from the Educational Testing Service.
In a report that was set for release this week, the commission lays out a 10-year plan for states to develop systems of assessment that go beyond identifying student achievement for accountability purposes and toward improving classroom instruction and giving greater insight into how children learn.
Joanne Weiss, the chief of staff to U.S. Secretary of Education Arne Duncan but not part of the commission, said the report "shines a needed spotlight on the future of assessment, pushing us to make the next stages of this vital work coherent, coordinated, and sustainable."
"When we get assessment right, it helps families, teachers, schools, and systems tailor learning to students' needs and make wise decisions," Ms. Weiss said in a statement. "Today, we stand on the cusp of the biggest advances in assessment in a generation, with assessments that are more useful and less intrusive, thanks in part to advances in education technology."
At a time when student performance on state tests is used to judge everything from teacher effectiveness to school improvement to a high school senior's right to a diploma, many in the education world have been pushing hard for better assessments.
Interest in the so-called "next generation" assessments being developed for the Common Core State Standards is so high that last summer visitors crashed the Internet servers of the Partnership for Assessment of Readiness for College and Careers, or PARCC, one of the consortia developing the tests, when it posted sample test items.
Not 'Revolutionary'
Both PARCC and the Smarter Balanced Assessment Consortium are building computer-based testing systems accompanied by benchmarking tools to help guide instruction. However, the Gordon Commission says the common-core tests planned for rollout in the 2014-15 academic year, "while significant, will be far from what is ultimately needed for either accountability or classroom instructional-improvement purposes."
The common-assessment consortia "are trying hard to reform what we currently do, and the commission has been thinking about revolutionary change," said Edmund W. Gordon, the commission's chairman and a professor emeritus of psychology at Yale University and Teachers College, Columbia University.
"Assessment has been almost hung up on a commitment to help account for status and to use those assessments of prior achievements to hold individuals and systems accountable," Mr. Gordon said in an interview.
By contrast, the commission argues that future educators should use systems of aligned assessments, which would inform instruction through a balance of fine-grained classroom diagnostic tests, challenging tasks and projects, and even analytic tools to sift through background data produced by students in the classroom or online.
Such tools would be used in conjunction with larger-grained accountability tests, which are administered less frequently and tend to have too long a turnaround time to be used to help teachers.
For example, middle school students learning to subtract mixed numbers might use several different methods and substeps to solve different types of problems within that unit, and a teacher might give multiple formative tests on the subject. Formative tests are diagnostic tools that measure a student鈥檚 growth in an academic area over time. In contrast, summative tests provide a snapshot of student achievement at a specific point and are more commonly used for accountability.
"It makes a lot of sense to check along the way to see where your kids are doing well and getting hung up," said Robert J. Mislevy, a member of the commission and the chairman in measurement and statistics at the Princeton, N.J.-based ETS, which has helped design the National Assessment of Educational Progress, the SAT, Advanced Placement tests, and other well-known exams.
But in an accountability test, he said, a state education chief may need only a representative sample of students to be given a handful of mixed-number-subtraction problems to get a picture of how well the state鈥檚 students understand that area.
"To have 20 or 30 problems for every 5th grader to take, that's a waste of time," Mr. Mislevy said.
Assessment Council
Roy Pea, a professor of education and learning sciences at Stanford University, who was not part of the commission, agreed that tests developed for accountability purposes "largely ignore" the need for formative diagnostic tests used to improve instruction.
"There are boundless benefits to endorsing [the commission's] proposal of transforming assessment to render it for education so as to inform and guide daily progress in learning and development, supporting education's primary learning and teaching processes with richer pedagogies informed by the learning sciences," he said in a statement.
The commission calls for states to create a permanent "council on educational assessments," modeled on the Education Commission of the States and supported with a small tax on sales of tests.
The council would, among other tasks, evaluate the effectiveness of the common-core assessments; help set performance-level benchmarks for cross-state tests; provide professional development for teachers and the public on how to use different tests; and develop and study policies and protocols to protect students' privacy while allowing the use of assessment data for research.
The Gordon Commission also urges that the next iteration of the Elementary and Secondary Education Act (the federal government's centerpiece education law, currently called the No Child Left Behind Act) encourage states and districts to experiment with new, even "radically different" forms of assessments.
For example, Mr. Mislevy pointed to diagnostic systems now used in computer-based programs such as Carnegie Learning and Khan Academy, in which students work through individual topics at their own pace, taking brief tests of their mastery along the way, with feedback delivered to the student and teacher on individual processes or misconceptions that cause the student problems.
The panel members also advocate developing more tools to collect information as students work through a task in the classroom, in the same way that some programs are beginning to analyze background data generated by students working online.
"It's assessment, not testing per se," said Jim Pellegrino, a co-chairman of the commission and a co-director of the Learning Sciences Research Institute at the University of Illinois at Chicago. Rather than trying to build a single test that will cover content and other cognitive competencies, Mr. Pellegrino envisioned, for example, giving teams of students a series of challenging mathematics problems to tackle as a group, and then observing both their ultimate answer and how they collaborate to solve it.
"That's how you get these other dimensions of competence into the picture, but it's very difficult to create a single test," he said. "It's why a dropped-in-from-the-sky accountability test, no matter how well designed, can't give you everything you want to know about the competencies of students."
At the Margins
Mr. Mislevy of the ETS said he believes the biggest assessment breakthroughs will come at the margins, through individual groups like Carnegie and Khan, rather than the "big machine" of the federal and state testing industries.
"And maybe that's OK," he said. "Making things happen in the big machine is hard. You need to be more quick, nimble, easy to fail. The big machine doing it all at once is a bad place to try new things and fail at scale."
The commission acknowledges that its paper does not grapple with several big hurdles in developing more-comprehensive assessment systems, among them the cost of developing complex test items and the widely disparate digital infrastructures of the schools that would use the tests.