Corrected: A previous version of this story misspelled the first name of Jesse Register, the director of schools for the Nashville, Tenn., school district.
Next week marks a major milestone in an assessment project of unprecedented scope: the start of field-testing season for new, shared tests of a common set of academic standards.
Between March 24 and June 6, more than 4 million students in 36 states and the District of Columbia will take near-final versions of the tests in mathematics and English/language arts. Those exams, tied to the Common Core State Standards that all but a handful of states have adopted, were created by a bevy of vendors hired at the request of two groups of states: the Smarter Balanced Assessment Consortium and the Partnership for Assessment of Readiness for College and Careers, or PARCC.
"I don't think a trial of this magnitude has been done anytime in the history of student testing in the U.S.," said Keith Rust, a vice president at the Rockville, Md.-based Westat, where he oversees the sampling of schools and students for the National Assessment of Educational Progress, or NAEP.
The exercise won't produce detailed, scaled scores of student performance; that part is still a year away. Instead, this spring's field-testing is a crucial part of the assessments' design stage, undertaken to see what works and what doesn't. Questions like these are on test-makers' minds: Will schools' hardware and bandwidth be able to handle large-scale, computer-based testing? Do the tests work equally well on desktops, laptops, and tablets? Which items might confuse or overwhelm students?
Immense stakes are riding on the field tests. The federal government is watching closely to see how well its $360 million investment, awarded in grants to the state consortia developing the exams, is paying off so far, especially since it has let more than a dozen states drop all or part of their current testing regimens in order to participate fully in the field tests.
See a graphical breakdown of the two state consortia's field-testing plans.
States that pledged loyalty to the project need to see that they can rely on the tests, since those states plan to base crucial decisions on them, such as how to evaluate schools, teachers, and students, within a year or two after the final tests are available in spring 2015.
School districts have made massive investments in technology to manage the consortium tests, and have spent countless hours preparing teachers, students, and parents for the new system, all on the faith that enduring the inevitable problems during the transition will pay off in a much better assessment than what they've been using. Amid a wave of anti-testing sentiment, many parents and activists are poised to seize on problems in field-testing as one more sign that large-scale testing is misguided.
A Combustible Moment
Those elements create a combustible moment: An experiment deliberately designed to uncover weaknesses in a high-profile test takes place under intense public scrutiny.
"The consortia are going to have to be pretty confident they'll see minor glitches, but not major problems," Mr. Rust said. "You wouldn't want to go into this on a wing and a prayer. If it goes badly wrong, it shakes people's confidence that it will be right the next time."
In fact, just days before the planned March 18 start date for field-testing by Smarter Balanced, the organization took the major step of postponing the launch by one week to allow time for what Jacqueline King, a spokeswoman for the consortium, called some final "quality checking." She said the delay was not about the test's content, but rather ensuring that all the important elements, including the software and accessibility features (such as read-aloud assistance for certain students with disabilities), were working together seamlessly.
The Partnership for Assessment of Readiness for College and Careers and Smarter Balanced Assessment Consortium say their computer-based tests will offer an array of accessibility features. Many of these features can be used by any student, but some are geared specifically to students with disabilities, or to English-language learners. The field tests will offer the first opportunity for students to try the accommodations in a test situation.
SOURCE: Partnership for Assessment of Readiness for College and Careers; Smarter Balanced Assessment Consortium
There are some key differences between the field tests and the fully operational assessments that will be used in the spring of 2015. Length, for instance: Students will typically be involved in three to four hours of field-testing, less than half as long as what they'll face next spring.
In the real PARCC test, students will take both a multiple-choice, end-of-year component and a more extended and complex performance-based section. On the field test, only 25 percent to 30 percent of students will take both pieces, and only in one subject, said Jeffrey Nellhaus, the director of assessment for PARCC. The rest will take either the end-of-year or performance segment.
The Smarter Balanced operational test in 2015 will be computer-adaptive, adjusting the difficulty of questions to the student's skill level, but the field test, for the most part, will not be. A small number of students will get the adaptive version at the end of the field-testing window, said Ms. King. That's because test-makers will use the questions students answer earlier in the field test to calibrate the adaptivity of the test engine later in the field-testing window.
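For readers curious how adaptivity works mechanically, here is a minimal, hypothetical sketch of an adaptive selection loop: after each answer, the engine nudges its estimate of the student's ability and serves the unused item whose calibrated difficulty is closest to that estimate. The difficulty values and the simple up/down update rule are invented for illustration; real engines rely on item-response-theory models calibrated from exactly the kind of field-test data described here.

```python
def run_adaptive(items, answer, start=0.0, step=1.0):
    """Simulate one adaptive session (illustrative only).

    items: {item_id: calibrated difficulty} -- hypothetical values
    answer: callable returning True if the student answers correctly
    """
    ability, remaining = start, dict(items)
    responses = []
    while remaining:
        # Serve the unused item whose difficulty best matches the estimate.
        item = min(remaining, key=lambda i: abs(remaining[i] - ability))
        remaining.pop(item)
        correct = answer(item)
        responses.append((item, correct))
        # Nudge the estimate up or down, with shrinking adjustments.
        ability += step if correct else -step
        step *= 0.7
    return ability, responses
```

This is why the early field-test responses matter: without calibrated difficulty values for each item, the selection step has nothing to steer by.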
Representative Samples
While some schools volunteered to participate in the field tests, most were chosen by their state or their state's consortium as the multistate groups sought to build demographically representative samples of students. The result is a distribution of students taking the field tests that is wide nationally but not, in general, deep in individual schools.
The PARCC field tests will involve about 10 percent of the students in the participating states and districts, but they are scattered across half the schools. That pattern is deliberate and beneficial, Mr. Nellhaus said.
"A more spread-out testing pattern," he said, "means that you won't get a clustering effect in the sampling" that could magnify the impact of anomalous conditions in any one place. "It also avoids a heavy impact on school life."
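The clustering effect he describes can be shown with a toy simulation, using entirely invented numbers: when a sample concentrates many students in a few schools, an anomaly at a single site (here, a hypothetical score penalty at one school) swings the overall estimate far more than a sample spread thinly across many schools.

```python
import random
import statistics

random.seed(1)
# 50 schools of 100 students each; one school is hit by an anomaly.
schools = {s: [random.gauss(50, 10) for _ in range(100)] for s in range(50)}
schools[0] = [x - 30 for x in schools[0]]

def sample_mean(n_schools, per_school):
    """Mean score from a sample of n_schools, per_school students each."""
    chosen = random.sample(list(schools), n_schools)
    return statistics.mean(x for s in chosen
                           for x in random.sample(schools[s], per_school))

# Same total sample size (500 students), two designs:
clustered = [sample_mean(5, 100) for _ in range(200)]  # few schools, all tested
spread = [sample_mean(25, 20) for _ in range(200)]     # many schools, few each
```

In repeated draws, the clustered design's estimates fluctuate much more from replication to replication, which is the magnified impact a spread-out pattern avoids.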
Most students are taking the field tests in only math or English/language arts; a subset will be tested in both subjects. Some states, however, such as California, Connecticut, Idaho, Montana, and South Dakota, have chosen to wade much deeper into the field-test exercise. They're involving all, or nearly all, of their students. While that takes a greater toll on schools' time and focus, leaders in those states decided that the payoff would justify the effort.
Those states were among the ones that obtained waivers from the U.S. Department of Education to cut back or eliminate their existing state tests to free up time to try the field tests. Since the new tests aren't final, the data they produce can't be used for accountability purposes, so the federal government has agreed to let the waiver states hold their accountability ratings steady for another year.
"We decided that it was a great opportunity for students to experience the test when it doesn't count," said Deborah V.H. Sigman, the deputy state superintendent of education in California, where 95 percent of the students will answer Smarter Balanced field test items in both subjects, and the rest will take the test only in one content area.
"It's also a way for adults in the [school] building to think about what they need to do to optimize the experience for next year."
Some districts are doing more comprehensive field-testing than their states. The suburban system in Burlington, Mass., 15 miles west of Boston, chose to give the PARCC field test to every student in grades 3-11 in both subjects. Eric Conti, the superintendent of the 3,600-student district, said he thinks it's good for adults and students to experience something "as close to the real thing" as possible.
A federal waiver allows Massachusetts students participating in the PARCC field test to skip the state鈥檚 regular testing under the Massachusetts Comprehensive Assessment System, although 10th graders still must take the MCAS to graduate.
Burlington was originally chosen by PARCC to do only the paper-and-pencil version of the field test, and only in some classrooms, in grades 3, 4, 8, and 10, Mr. Conti said. But he wanted to put his district's technological readiness to the test (it has a computer for every student), so he appealed to the state for permission to use the computer-based version with all children in tested grades, he said.
The district has made a deliberate research subject of itself, not only with PARCC, but with the Cambridge, Mass.-based Rennie Center for Education Research and Policy. Working with the state teachers' union, the superintendents' association, and the state education department, the Rennie Center will examine what happens in different field-testing scenarios in Burlington and in Revere, a small urban district near Boston.
Burlington, for instance, will "livestream" the field test, so any loss of its network connections will interrupt the exam availability, Mr. Conti said. Revere, on the other hand, will "cache" the field test, downloading it and pumping it out locally. Burlington is trying the field test on varying devices, including iPads, Chromebooks, Mac desktops, and PCs, in a bid to see what works well and what doesn't.
"It makes no sense to show off technologically," Mr. Conti said. "We could probably test all our kids in three days. Our network could handle it. Instead, it will be a three-week disruption.
"But the point is to see what happens," he said. "As a superintendent, I plan 18 months in advance. When we do it live a year from now, it will impact my budget if we have to make changes. I'd rather know that sooner than later."
Balancing Opposition, Potential
The Nashville, Tenn., school system illustrates both the promise and the risks districts face when taking part in what the consortium test designers call "testing the test." About 10 percent of the district's 83,000 students will take the PARCC field test, either in math or in English/language arts.
Jesse Register, the district's director of schools, said he thinks the experience will "take away the fear of the unknown" for teachers, students, and parents. It also complements the work the district has been doing to invest heavily in technological infrastructure and in training teachers to use technology to differentiate instruction, he said.
Since Nashville's schools enroll one-third of Tennessee's English-language learners, Mr. Register considers his district's participation pivotal to ensuring the PARCC test works well for students whose native language isn't English. "For our data to be included in how PARCC is going is important to influencing the design of the test," he said.
Even as the Nashville schools inform a potentially better test, the district is treading on bumpy turf. Without a federal waiver for Tennessee, Nashville's students will have to take both the PARCC field tests and the state's regular assessments. And that likely will draw some criticism, Mr. Register said.
"We're getting some pushback now about too much assessment," he said. "We have to communicate very effectively with our parents and with our teachers to make sure this doesn't become a negative."
Looking for Weak Spots
More than a few worries are shadowing the landscape as field-testing gets underway. Technological capacity is high on the list.
"We have 60 computers in one computer lab in our school. Our tech people are worried about our servers," said Kristin Winder, a 6th grade teacher in Great Falls, Mont.
One district experienced such problems in the run-up to the PARCC field tests that it decided against participating. District sources said their faith was undermined by last-minute changes in test dates, student files uploaded but then lost, and other logistical and communications slip-ups.
"We simply couldn't allow our system's first experience with PARCC to be a negative one," a district official said in a confidential email obtained by Education Week. "We believe it would have undermined our work and our staff. Students and parents deserve better."
The complexity of mounting field tests on such a large scale is daunting. PARCC's field-test-administration manual weighs in at 180 pages. The readiness exercise has spawned countless memos and staff meetings across states and districts as systems gear up for the field test. Educators have spent time trying practice tests with students, and administrators overseeing the coming exams have experimented with "training tests."
"You can imagine the planning it takes to put something like this in place," Ms. Sigman said of preparing for the Smarter Balanced field tests.
Some see big benefits in all that planning, as it provides a glimpse into how the common standards should inform instruction and a preview of the forthcoming tests. Others see those hours as a tragic mischanneling of education energy and resources.
"Schools are spending all this money trying to get wired and ready for PARCC and Smarter Balanced. And who's getting that money? Corporations," said Peggy Robertson, an Aurora, Colo., literacy coach who co-founded United Opt Out National, which seeks to eliminate high-stakes standardized tests. "The less money schools have, the more likely it is that they'll fail. All of it is a setup for charter schools and the privatization of public education."
Teams from each consortium will be watching many aspects of field-testing closely to figure out what works well and what doesn鈥檛.
Questions of technology loom large: How many children can a given school test at one time? If a teacher is streaming video in her classroom while other children take the test down the hall, will it overload the system?
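The capacity question lends itself to a back-of-the-envelope calculation. The sketch below estimates how many students could test at once on a shared link while holding back bandwidth for other traffic, such as a video stream down the hall; the per-student rate, link size, and reserve fraction are all hypothetical placeholders, not consortium requirements.

```python
def max_concurrent_testers(link_kbps, per_student_kbps, reserved_fraction=0.3):
    """Estimate how many students can test at once on a shared link.

    reserved_fraction holds back capacity for other building traffic.
    All figures are hypothetical, for illustration only.
    """
    usable = link_kbps * (1 - reserved_fraction)
    return int(usable // per_student_kbps)

# e.g., a 100 Mbps link at 100 kbps per student, 30% held in reserve:
# max_concurrent_testers(100_000, 100) -> 700 students
```

A school with 60 lab computers, like the one Ms. Winder describes, would sit well under such a ceiling; the harder question is what happens to the servers and the local network in between.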
The teams are looking for many other outcomes as well. What kinds of answers does a given question elicit from a range of students? Test designers will have detailed student-level information, pegged to unique new identifiers to protect students' identities, to enable them to see if some questions stump subgroups of students, such as those in a given area of the country or those from certain racial or socioeconomic backgrounds. Do students who perform well on most parts of the field test consistently trip on some items?
Those kinds of observations will lead to a weeding-out or revision of questions, typically as many as 10 to 20 percent of the total, Ms. King of Smarter Balanced said.
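The weeding-out rests on comparisons of exactly that kind. A simplified sketch of one such screen, with an invented threshold and data layout, flags items where the share answering correctly differs sharply between a subgroup and everyone else, queueing them for review, revision, or removal:

```python
def flag_items(responses, gap_threshold=0.20):
    """Flag items with a large percent-correct gap between groups.

    responses: {item_id: {"focal": [0/1 scores], "reference": [0/1 scores]}}
    Threshold and structure are illustrative, not consortium practice.
    """
    flagged = []
    for item, groups in responses.items():
        p_focal = sum(groups["focal"]) / len(groups["focal"])
        p_ref = sum(groups["reference"]) / len(groups["reference"])
        if abs(p_focal - p_ref) > gap_threshold:
            flagged.append(item)
    return flagged
```

Real item screening uses formal differential-item-functioning statistics rather than a raw gap, but the logic is the same: a question that behaves very differently for one group of students gets a second look.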
Other questions involve how to scale and score the tests. PARCC officials, for instance, will be considering whether to treat the end-of-year portion and the performance-task portion as separate exams, with separate scales and scores, or to combine them into "one big test," Mr. Nellhaus said. And if they are combined, should the two pieces be weighted differently?
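The weighting question is simple to state in miniature: a combined score is a weighted average of the two sections, and moving the weight changes which section drives the result. The weights below are hypothetical, not anything PARCC has adopted.

```python
def composite(end_of_year, performance, weight_eoy=0.5):
    """Blend two section scores into one composite.

    weight_eoy is the share given to the end-of-year section;
    the performance task gets the remainder. Weights are hypothetical.
    """
    return weight_eoy * end_of_year + (1 - weight_eoy) * performance

# composite(80, 60) -> 70.0 (equal weights)
# composite(80, 60, weight_eoy=0.75) -> 75.0 (end-of-year counts more)
```

Field-test data helps settle that choice by showing how reliably each section measures on its own, and how much the two agree.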
In the end, the two consortia are keenly aware that they鈥檙e asking a lot of participating schools and districts: major time investments and schedule disruptions for what amounts to a research project to refine the test.
"This is why we do this," Ms. King said. "To see what works and what doesn't."
Even some of those most committed to the project are feeling trepidation. One district official who described himself as "knee deep" in preparations said he is bracing for blowback from his staff and his parent community if even moderate problems arise with the test.
"I just hope it's worth it in the end," he said.