Benchmark Assessments Offer Regular Checkups On Student Achievement


School districts worried about how students will perform on end-of-the-year state tests are increasingly administering “benchmark assessments” throughout the year to measure students’ progress and provide teachers with data about how to adjust instruction.

Nearly seven in 10 superintendents surveyed for Education Week this past summer said they periodically give districtwide tests, and another 10 percent said they planned to do so this school year. Such tests typically are aligned to state or district standards for academic content and given three to five times during the year. Some are given as often as monthly.

Victoria Todd, a 3rd grader at London Towne Elementary, finishes one of 38 problems on a Benchmark Assessment Resource Tool test.
Christopher Powers/Education Week

Most benchmark assessments take one hour each for reading and mathematics, but may include other subjects. Extensive reporting systems break down test results by the same student categories required under the federal No Child Left Behind Act, such as by race, income, disability, and English proficiency, in addition to providing individual progress reports at the district, school, classroom, and student levels.

“I do believe that three years from now, certainly five years from now, no one will remember a time when there weren’t benchmarks,” said Robert E. Slavin, the director of the Center for Data-Driven Reform in Education, at Johns Hopkins University.

Burgeoning Market

That’s certainly what test vendors hope. Last year, Eduventures Inc., a market-research firm based in Boston, identified benchmark assessments as one of two high-growth areas in the assessment industry, alongside state exams, with a compound annual growth rate of greater than 15 percent. The company predicted that by 2006, what it called “the formative-assessment market,” using a term sometimes treated as a synonym for benchmark assessment, would generate $323 million in annual revenues for vendors.


But while many assessment experts agree that the idea of frequent testing of students to monitor their learning and adjust instruction is sound, some also warn that districts should take a close look at what they’re getting for their money and how they are using such exams.

“You might say that the message here is, ‘Get a second opinion,’ ” said Grant Wiggins, the president of Authentic Education, a Hopewell, N.J.-based consulting service that works with districts.

It’s no secret why districts are turning to benchmark tests. The No Child Left Behind Act, signed into law by President Bush in January 2002, and states’ own accountability systems have created a high-stakes environment in which both districts and schools can face penalties for failing to meet performance targets.

View a complete collection of stories in this Education Week special report, Testing Takes Off.

In this standards-based environment, the feeling is that the sooner and more often schools have information about how they’re doing against the standards, the better.

“The reason that there is a boom in benchmark assessments is that most states and school systems are providing nothing more than autopsy reports right now,” said Douglas B. Reeves, the founder of the Center for Performance Assessment, a private consulting organization based in Denver that works with districts to design fair and rigorous assessments and classroom activities. “They tell you why the patient died at the end of the year, and then marveled that the patient didn’t get any better.”

Studies by the Washington-based Council of the Great City Schools, the Austin, Texas-based National Center for Educational Accountability, and others have found that one feature of high-achieving districts is their use of periodic, benchmark assessments to track student achievement and make adjustments.

“Good formative assessments, good benchmark assessments,” Mr. Reeves said, “provide feedback throughout the year, and that is far more fair to principals and teachers, provided they are used wisely.”

Vendors Vary

In the past few years, according to Eduventures’ 2004 report, “Testing in Flux,” new competitors have flooded the formative-assessment market, including:

• Major test publishers, such as the New York City-based CTB/McGraw-Hill and the San Antonio-based Harcourt Assessment;

• Test-preparation companies, including the New York City-based Princeton Review;

• For-profit providers that specialize in linking assessment results with prescribed remediation plans and curricula, such as the San Diego-based Compass Learning and the New York City-based Kaplan K-12 Learning Services;

• Nonprofit organizations, such as the Portland, Ore.-based Northwest Evaluation Association; and

• Suppliers of “whole-school-reform models,” such as the New York City-based Edison Schools Inc. and Mr. Slavin’s Baltimore-based Success for All Foundation, which designed the 4Sight assessment series.

The products of such suppliers range from formatted tests linked to the standards in individual states, to item banks that districts and schools can use to develop their own assessments, to online testing, scoring, and reporting systems.

Skimming the Surface?

Lorrie A. Shepard, the dean of the school of education at the University of Colorado at Boulder, voices caution about the trend.

A 2004 report predicted that the market for benchmark or formative assessments would expand by a compound annual growth rate of more than 15 percent from 2003 to 2006.

TEST MARKET
New competitors have emerged in recent years to supply school districts with benchmark assessments. They include:

MAJOR TEST PUBLISHERS, such as CTB/McGraw-Hill, based in New York City, and the San Antonio-based Harcourt Assessment;

TEST-PREPARATION COMPANIES, including the Princeton Review, based in New York City;

SUPPLIERS of whole-school-reform models, such as Edison Schools Inc., of New York, and the Success for All Foundation, of Baltimore;

FOR-PROFIT PROVIDERS that specialize in linking assessment results with prescribed remediation plans and curricula, such as the San Diego-based Compass Learning and the New York City-based Kaplan K-12 Learning Services;

NONPROFIT ORGANIZATIONS, such as the Northwest Evaluation Association, in Portland, Ore.

SOURCE: Eduventures Inc., Education Week

While “not all formal benchmarking systems are bad,” she said, she worries about the effects of using 15- or 20-item multiple-choice tests that mirror the format of state exams to drive classroom instruction.

Previous research by Ms. Shepard and others has found that students who do well on one set of standardized tests do not perform as well on other measures of the same content, suggesting that they have not acquired a deep understanding.

“The data-driven-instruction fad means earlier and earlier versions of external tests being administered at quarterly or monthly intervals,” Ms. Shepard said. “The result is a long list of discrete skill deficiencies requiring inexperienced teachers to give 1,000 mini-lessons.”

Good benchmark assessments, she suggested, should include rich representations of the content students are expected to master, be connected to specific teaching units, provide clear and specific feedback to teachers so that they know how to help students improve, and discourage narrow test-preparation strategies.

Rather than trying to assess everything, added Mr. Reeves, the best benchmark tests focus on the most important state or district content standards. And they provide results almost immediately, in simple, easy-to-use formats, he said.

The National Center for Educational Accountability stresses that good benchmark assessments measure performance “on the entire curriculum at a deep level of understanding.” They also begin before grade 3 in both reading and math and provide a process to ensure that data on student performance are reviewed and acted upon by both districts and schools, the center says. In addition to such tests, it adds, districts may provide unit or weekly assessments that principals and teachers can use to monitor student progress.

Approaches Differ

But in talking about benchmark assessments, not everyone means the same thing.

According to Mr. Slavin, some benchmark tests, like 4Sight, are designed primarily to predict students’ performance on end-of-the-year state exams. They measure the same set of knowledge and skills at several points during the school year to see if students are making progress and to provide an early warning of potential problems.

Other benchmarks are tied more closely to the curriculum, and to the knowledge and skills students are supposed to have learned by a particular time. For example, a skill-by-skill benchmark series in math might focus on fractions in November, decimals in January, geometry in March, and problem-solving in May, rather than testing all skills at the same time, Mr. Slavin said.

Such benchmarks serve as pacing guides for teachers and schools, providing information on whether students have learned the curriculum they’ve just been taught. Some companies claim their tests serve both purposes, predicting students’ ultimate success on state tests and gauging how they’re progressing through the curriculum.

Historically, vendors would design one set of benchmark tests for the entire country. Now they craft tests for each state, starting with the larger ones.

While not everyone means the same thing by the term, benchmark assessments typically:

• Are given periodically, from three times a year to as often as once a month;

• Focus on reading and mathematics skills, taking about an hour per subject;

• Reflect state or district academic-content standards; and

• Measure students’ progress through the curriculum and/or on material in state exams.

SOURCE: Education Week

Many companies also work with districts to design the districts’ own assessments, tied to state and district standards, or permit districts and schools to modify previously formatted exams. Some vendors provide large, computerized pools of item banks that teachers and schools can use to create their own classroom tests and check students’ progress on state standards.

Stuart R. Kahl, the president of Measured Progress, a Dover, N.H.-based testing company, says that while item banks hold great promise, because they permit teachers to design tests that can be used during the ongoing flow of instruction, one issue is whether teachers are prepared to use them appropriately.

“Now we’re putting individual items in the hands of teachers,” he said, “saying, ‘You construct the test; make it as long or as short as you want.’ Do we think they have the understanding to know how much stock they can put in the generalizations they make from such exams?”

Some also worry that as vendors have rushed in, quality has not kept pace. The Eduventures report noted that many vendors have marketed formative assessments “on the basis of the quantity of exam items, as opposed to those items’ quality.” For example, companies may tout having tens of thousands of exam items, it said, although many of the items have not been extensively field-tested or undergone a rigorous psychometric review.

“I think vendors in our space have found it challenging,” said Marissa A. Larsen, the senior product manager for assessment at the Bloomington, Minn.-based Plato Learning Inc., whose eduTest online assessment system is now used in more than 3,000 schools.

While districts sometimes apply the same psychometric standards to benchmark tests that are applied to high-stakes state exams, she said, “in many cases, that’s not what vendors in this space are trying to do. If we did that, it would be well beyond what districts could afford to buy for formative systems.”

Critics also say that even the best benchmark assessments are more accurately described as “early warning” or “mini-summative” tests, rather than as true “formative” assessments, which are meant to help adjust teaching and learning as it’s occurring. In contrast, summative tests are designed to measure what students have learned after instruction on a subject is completed.

“Formative assessments are while you’re still teaching the topic, providing on-the-spot corrections,” said Mr. Kahl. “With benchmark assessments, you’re finished. You’ve moved on. Not that you don’t get individual student information, but at that stage, it’s remediation.”

What Is ‘Formative’?

Yet Eric Bassett, the research director for Eduventures, said the terms formative and benchmark assessments are often used interchangeably in the commercial education market.

And that, some critics say, is precisely the problem.

“I recognize that I’ve lost the battle over the meaning of the term ‘formative assessment,’ ” said Dylan Wiliam, a senior researcher at the Educational Testing Service, based in Princeton, N.J.

In the 1990s, he wrote an influential review that found that improving the formative assessments teachers used dramatically boosted student achievement and motivation. Now that same evidence, he fears, is being used to support claims about the long-term benefits of benchmark assessments that have yet to be proven. “There’s a lack of intellectual honesty there,” Mr. Wiliam said. “We just don’t know if this stuff works.”

He and others say the money, time, and energy invested in benchmark assessments could divert attention from the more potent lever of changing what teachers do in classrooms each day, such as the types of questions they ask students and how they comment on students’ papers.

“If you’re looking, as you should be, at the full range of development that you want kids to engage in, you’re going to have to look at their work products, their compositions, their math problem-solving, their science and social-studies performance,” said Mr. Slavin of Johns Hopkins.

Mr. Wiggins of Authentic Education said that while some commercially produced benchmark assessments are far from ideal, they’re better than nothing. “I would rather see a district mobilizing people to analyze results more frequently,” he said. “That’s all to the good.”

The key point, he and others stress, is what use is made of the data.

“It’s only a diagnosis,” Mr. Slavin said. “If you don’t do anything about it, it’s like going to the doctor and getting all the lab tests, and not taking the drug.”

Lynn Olson

Lynn Olson was managing editor of special projects for Education Week. She also covered national policy (including P-16 issues, NCLB standards, accountability, and reform), assessment, and testing.