In states across the country, field-testing of the exams that will measure students' mastery of the Common Core State Standards is well underway. Much attention is focused on the questions that this "testing of the test" will inevitably raise about bandwidth, access for special populations, and standard-setting.
But behind those questions lurks a more conceptual one: In terms of overall execution, how do the exams crafted by the two main state testing coalitions (the Smarter Balanced Assessment Consortium and the Partnership for Assessment of Readiness for College and Careers, or PARCC) stack up to what they promised in their $360 million bids for federal funding?
There are two ways to consider the question. One is a glass-half-full reading, which focuses on the exams' technological advances and embrace of performance-based assessment. On the flip side, a confluence of political, technical, and financial constraints has led to some scaling back of the ambitious plans the consortia first laid out.
On the technology front, for instance, most students in 2014-15 will take the exams on computers rather than fill in bubble sheets. The Smarter Balanced assessment will adapt in difficulty to each student's skill level, potentially providing better information about strengths and weaknesses.
In addition, students taking the PARCC test will write essays drawing on multiple reading sources. And to a level not seen since the 1990s, students taking both exams will be engaged in "performance" items that ask them to analyze and apply knowledge, explain their mathematical reasoning, or conduct research.
Still, the exams, nearing their final stage, contain some notable changes from the designs initially put forward by the consortia nearly four years ago. Both have scaled back the length or complexity of some test elements, and their development of tools and supports for teachers has lagged behind the construction of the year-end tests that will be used to generate school ratings.
In sum, say testing experts, what the consortia have accomplished thus far is more like a first draft of their original goals.
"Both consortia will have tests in 2014-2015 that will be better than almost all existing state tests, if not all. Neither will be as good as promised in their response to the department's [request for proposals]," said Scott Marion, an associate director at the Dover, N.H.-based National Center for the Improvement of Educational Assessment, which advises both consortia. "But if they can survive until 2018, '19, '20, they actually might have something pretty good that comes close to living up to their promises."
Great Expectations
The U.S. Department of Education's Race to the Top program envisioned an entirely different approach to testing, one that would provide more helpful and timely information to teachers and students.
Both PARCC and Smarter Balanced proposed testing systems that coupled extended, performance-based tasks with traditional items. And they promised to provide tools and resources that would help teachers translate year-end testing targets into instructional units.
Interviews with consortia officials and advisers paint a picture of somewhat different working cultures in the two organizations. PARCC, a more centralized body, took a conservative approach deeply grounded in measuring the common standards faithfully. Smarter Balanced, by contrast, was more loosely structured and open to experimentation, driven by its belief in the power of adaptive testing.
Both groups will continue to use some multiple-choice or machine-scored questions, but many of those items have been enhanced, allowing students to select multiple answers, for instance, or to drag and drop text from reading passages to cite evidence.
From the beginning, though, the development of the performance-based tasks has been a heavy lift. States initially came to the consortia with very different understandings about what such testing might entail.
"States that tended to have multiple choice thought all it meant was an open-ended question," said Shelbi Cole, the director of mathematics for Smarter Balanced. "Others, mainly those on the East Coast, thought, 'No, it's two weeks long.' They were on really different sides of the continuum."
And the novelty of some of those formats often meant training and retraining item-writers.
"They needed time to innovate and learn to break free from old writing templates and rules," said Bonnie Hain, the former English/language arts senior adviser for PARCC.
One notable technological issue affected design and price point. Both consortia had expressed interest in using "artificial intelligence" scoring to ease the burden of hand-scoring answers. But as it became clear that AI scoring would not be ready to measure the evidence-based reading and writing skills demanded by the common core, both consortia decided to rely on trained educators to score students' responses to the performance-based tasks. (Each group plans to carry out additional studies of AI scoring, in the hope that it might become feasible in the future.)
A Tough Sell
In essence, testing experts say, the consortia faced one of the quandaries of performance-based assessment: It makes for longer, more expensive exams, a tough sell at a time when resistance to standardized testing and its effect on curriculum is growing in some quarters.
"The point the consortia are emphasizing is that it's very good testing in a sense, and will tell you things we haven't been able to tell you before," said Derek Briggs, a professor of research and evaluation methodology at the education school at the University of Colorado at Boulder who serves on the technical-advisory panels for both consortia. "But it's still a hard sell to a lot of parents and children and people who are already skeptical about testing."
Although a vast majority of states have adopted the common core, they won't all be using the same assessments to gauge learning tied to those standards. Many will use tests developed by the PARCC or Smarter Balanced consortium; others will go their own way.
Such constraints affected both consortia's initial proposals. PARCC early on discarded a plan to scatter three smaller tests, administered at equal intervals over the course of the year, in favor of one window for multiple performance tasks followed by a year-end, machine-scored component.
"There was a lot of sensitivity about not trying to influence implementation of the standards in terms of the curriculum and the sequence of instruction," said Jeffrey Nellhaus, the director of policy, research, and design for PARCC.
Smarter Balanced, meanwhile, reduced the number of performance tasks in each subject from three in the initial application to one, consisting of several steps.
"The price point people felt they could manage politically has meant we're doing less than we could have done, and it will not signal as firmly that we want kids to demonstrate their learning," said Linda Darling-Hammond, a Stanford University education professor who advises the Smarter Balanced consortium.
Smarter Balanced has kept, however, a classroom-based introduction and activity for each performance-based segment meant to help level the playing field for students who come to the exam with different levels of background knowledge.
Some of the consortia's decisions also reflect the parameters of the Education Department's grant criteria. The federal agency wanted the year-end tests to go live in the 2014-15 school year, a short timeline for producing the level of complexity demanded, testing experts say.
Meanwhile, the K-12 testing policy inscribed in the No Child Left Behind Act remained unchanged. That meant certain ideas, such as testing samples of students rather than every child, couldn't be entertained.
Standards' Shifts
Those constraints, though, shouldn't detract from some real breakthroughs, according to testing experts. Performance testing in K-12 has never been done at the scale it will occur once the two groups' tests go live, they say. And the consortia's advances in that area directly respond to the instructional shifts in the common core.
The performance-based math items created by Smarter Balanced aim to measure whether students exhibit the set of mathematical practices identified in the standards, Ms. Cole noted, such as making sense of problems and persevering in solving them, and reasoning abstractly and quantitatively.
"States have claimed anything involving words is problem-solving," she said. "We are asking students to take some steps in terms of sense-making, which is very different from finding a keyword in a problem like 'altogether' and knowing that you have to add."
There are innovations in the exams' approaches to measuring reading skills, too. Traditionally, states' reading tests have relied on "commissioned" passages written explicitly for the exam and stripped of interesting, varied syntactical features.
"We wanted authentic texts, because one of the critical things in the common core is that text should be rich and worthy of reading," said Ms. Hain, now a consultant to PARCC. "What you find with a lot of commissioned texts is that they're pablum. The structure is not worth discussing because it's the same old boring, dry, deductive statement, a main idea in the first paragraph, and then three details and a closing sentence."
As a result, she said, PARCC has committed to using only "permissioned" texts drawn from actual novels, books, and journal articles in its reading tests.
About a third of Smarter Balanced reading texts are commissioned, but that group, too, says that using authentic texts is a priority. It has struck an agreement with the Copyright Clearance Center, a Danvers, Mass.-based company that irons out copyright permissions for texts not yet in the public domain.
If many of the breakthroughs focus on the year-end tests, there is a general sense that the development of the supplemental, nonstandardized supports for educators, such as model units, videos, and formative assessments, lags behind.
"While it's not the case that they've done nothing on interim and formative assessment (they have), the first priority is the one with the most accountability in it. You could argue that clearly the formative and interim features have not gotten, in either consortium, the same degree of attention," said Mr. Briggs of the University of Colorado.
Smarter Balanced didn't contract with vendors to begin building its Digital Library until early 2013. That resource, which the group hopes to unveil this summer, will include online training modules, exemplar units, and teacher-submitted resources.
PARCC, meanwhile, is still seeking a contractor for its Partnership Resource Center, an online site that will host released test items, model curricula frameworks, and formative-assessment tools, some provided by states' own repositories.
Although many instructional experts support those efforts, they worry that they are coming too late, since teachers are facing instructional challenges now.
Teachers are aware of the end goals espoused in the common-core standards, but need more support in learning how to break them into manageable units, said Margaret Heritage, an assistant director for professional development at the National Center for Research on Evaluation, Standards, and Student Testing at the University of California, Los Angeles.
"My concern for teachers is getting a handle on these standards and understanding the depth of them, and what it's going to take to reach these deeper-level learnings the standards require," said Ms. Heritage, who sat on Smarter Balanced's formative-assessment advisory panel.
PARCC does have an optional diagnostic exam, which teachers can use to better pinpoint students' weaknesses, said Mr. Nellhaus. And Smarter Balanced is now deep in the work of creating the teacher supports.
More than 1,400 K-12 teachers are now helping to generate the resources for the Smarter Balanced digital library and to vet them using common criteria, according to Chrys Mursky, the group's director of professional learning.
Questions Remain
Finally, a few elements remain open points of concern.
Smarter Balanced's adaptive-test model has raised a tricky policy dilemma: whether students who are demonstrably performing significantly above or below proficiency should be given test questions outside their grade level.
To date, the federal Education Department has forbidden that practice, citing the requirements of the NCLB law. Smarter Balanced plans to make its case to the agency, with the input of a variety of advocacy groups and assurances that it will institute plenty of safeguards, said Joe Willhoft, the executive director of Smarter Balanced.
"If we have a 4th grade student who is very good in math, we want to open up the pool for them to see harder items," he said. "But we don't want to give them something about the Pythagorean theorem. We want to be sure that if they get it wrong, it's because they don't know the math, not that they've just never seen it before."
Another element lies outside the consortia's direct control but is of equal import: Will districts' notoriously variable technological capacity be able to support a full testing schedule for all students?
"There is the fear certainly in some quarters that it will be an Obamacare-type disaster," said one consultant on the exams who spoke on the condition of anonymity because he continues to work with consortia officials. "If that occurs, it will be more the fault of the Education Department, which insisted on rushing it along."
Both consortia hope that any such glitches will occur during field-testing, allowing enough time for corrections, because any mistake after that point could be costly.
Indeed, support for common assessments seems less assured than for the standards themselves.
Only one state, Indiana, had reversed its adoption of the standards as of mid-April. But criticism of the testing has led several states, including Florida, Georgia, and Pennsylvania, to decide against using the consortia tests. And there are external pressures, too, as a variety of nonprofit and for-profit vendors begin to build suites of tests to compete for market share with the consortia products.
With such pressures looming, many in the assessment community hope the consortia's efforts will continue to grow stronger over time. The tests mark an important shift away from the basic skills that the NCLB-era exams tended to measure, they argue.
"It's important for people to give the consortia a little bit of charity, given the size of the task," said the University of Colorado's Mr. Briggs. "I worry that if they don't have it perfect from the start, then people will want to pull the plug. And then we'd be back to having assessments that look an awful lot like what we had before."