Overboard on Testing?

For the first six weeks of every school year, Tege Eric Lewis puts away his math books and gets out the overhead projector to prepare students for Indiana’s statewide testing program. “Basically, I go through specific overheads and questions that are just like what they’re going to have on the [test],” says the math teacher at Francis Joseph Reitz High School in Evansville, Ind. “We absolutely teach to the test.”

In Florida, Susan Reifenberg spends each homeroom period helping her students get ready for the state assessment: practicing test items and reviewing questions on the overhead projector.

“It’s important that we test the students,” sighs the social studies teacher at Brandon High School in Brandon, Fla., “but there is more in life than just passing a test.”

In the push to raise academic standards and achievement in American schools, no strategy has stirred fiercer debate than statewide testing and the use of those results in state accountability systems.

Assessments have always been viewed as an essential part of the standards movement. The reason is simple: If you cannot measure performance, you cannot know whether students have met the standards or identify the areas in which schools and districts need to improve.

But now, some argue that the heavy emphasis on test results to compare schools and districts, dole out rewards and punishments to schools, and decide if students graduate or advance to the next grade has gone too far. In a system where tests were supposed to play a key--but not an exclusive--role in improving performance, they have come to dominate. The issue is especially crucial when a student’s fate is made to hinge on passing a test.

“Testing is of particular importance,” says Bob Chase, the president of the National Education Association. “There is no way that you cannot have assessment to ensure that things are being done right.”

But, he adds, “I can’t tell you how many times I go into schools, and teachers talk to me, totally off the record, about how everything is driven by these assessments: how they know they’re not teaching things that kids need to know; how they know they’re not teaching how to learn because it’s not on these tests. That doesn’t mean that the materials that are on tests aren’t important. But there are other things that should be taken into consideration.”

What’s more, experts say that even though state testing programs have improved markedly in recent years, far more attention must be paid to the quality of such tests and the extent to which they reflect the standards they’re designed to measure.

Teaching to the Test?

No one denies that tests are important. They help signal what students should be learning; identify gaps in children’s knowledge and skills; highlight the unequal achievement among racial, ethnic, and income groups; and provide schools with data to modify instruction. A common feature of schools and districts that have made impressive learning gains is their continuing, close attention to data.

But evidence suggests that, without a better balance, the current emphasis on test scores is leading to some undesirable practices. Beyond the highly publicized, but relatively rare, cases of cheating is the nagging disquiet that tests may be driving instruction to focus on the wrong content.

In a survey for Quality Counts 2001, more than six in 10 public school teachers said state standards have led to teaching that focuses “too much” or “somewhat too much” on state tests. About two-thirds said state testing was forcing them to concentrate too much on information that would be tested to the detriment of other important areas. Twenty-two percent reported that they have amended what they teach to fit what is on tests “a great deal”; 43 percent have done so “somewhat.” And nearly eight in 10 reported instructing their classes in test-taking skills either a “great deal” or “somewhat.”

“What happens is it becomes very clear that you have to teach to the test,” says Eva Morris, who teaches pre-algebra at South Park Middle School in Corpus Christi, Texas.

“The bad thing is that a lot of it is short-term memory,” she contends. “A lot of it is not going to be life-enhancing. There’s not enough time for enrichment. And there’s also not enough time for review because there’s so much to cover.”

Other data reveal similar findings. A survey of 245 New Jersey teachers, for example, asked how often teachers engaged in instructing their students about a variety of test-taking strategies throughout the year and in the month immediately before the exams were given. Teachers in the so-called Abbott districts, a group of urban, mostly high-poverty districts named in a decades-old school finance lawsuit, were more likely to report such practices than teachers in the wealthiest districts.

In a 1998 study of Chicago elementary schools, researchers Julia B. Smith, BetsAnn Smith, and Anthony S. Bryk concluded that the demand for high test scores had actually slowed down instruction, as teachers stopped introducing new material to review and practice for upcoming exams.

In states that specify the time students are expected to spend on state exams, the mean testing time per year is five hours, 19 minutes. The figure excludes the time students devote to district and classroom assessments. The accumulation of such tests, combined with the time teachers spend preparing students for them, likely contributes to educators’ sense that tests are overwhelming instruction.

Brian M. Stecher, a senior social scientist with the Santa Monica, Calif.-based RAND Corp., has surveyed teachers in Kentucky, Vermont, and Washington state about changes in their classroom practices related to state assessments. He found that the tests were influencing classrooms in both positive and negative ways.

On the positive side, for example, portfolio use in Kentucky and Vermont had sent a clear signal to teachers that they needed to work on problem-solving in mathematics and on the written communication of mathematical ideas. Teachers also added new content to their classrooms to reflect what was on the assessments.

But teachers also were shifting the amount of time allocated to various subjects, depending on what was tested. “In Kentucky, teachers shifted as much as an hour a week into and out of mathematics instruction, depending on whether math was tested at their grade level,” Stecher says.

The same was true in Washington state, where 4th grade teachers reported decreasing the amount of time given to non-tested subjects--such as health, the arts, science, and social studies--and increasing the time for math, writing, and reading. In the Quality Counts survey, 60 percent of teachers said new state standards had made no difference in the amount of time their schools spend on art, music, and sports.

Connecticut’s board of education became so concerned about the danger of overemphasizing test results that last fall it issued a public warning: Giving too much attention to state test scores, the board declared, could narrow the curriculum and result in inappropriate instructional practices. “Focused preparation for state tests,” it urged, “should be a small fraction of a yearlong comprehensive curriculum that balances the competencies assessed on the state tests with other critical skills and objectives.”

‘Helping Students Learn More’

One reason educators might be overemphasizing tests, experts suggest, is that many of the academic standards crafted by states in the early 1990s lacked clarity and specificity. That made it easy for teachers to rely on the tests for guidance. In some states, that is still true. Achieve, a Cambridge, Mass.-based nonprofit group created to promote standards-based school improvement, and other organizations have found that some state standards are still too vague and all-encompassing to provide teachers with enough information about what to teach. (See related story, Page 33.)

But the most obvious reason is the pivotal role tests play in state accountability systems. A 50-state survey for Quality Counts shows that 11 states identify low-performing schools solely on the basis of test scores. Sixteen include such additional measures as student attendance and dropout rates, although those rarely count enough to alter a school’s rating.

Eighteen states require students to pass a test to receive a high school diploma. Seven require youngsters to pass a test to be promoted in specified grades, or they plan to do so in the future.

In North Carolina, where schools can receive bonuses for high performance or test-score improvements or be punished for chronically low test results, former Gov. James B. Hunt Jr. argues that “high-stakes testing is really helping students learn more and be more successful.”

As proof, the Democrat notes that the percentage of students who perform at or above “grade level” in the state has risen by 32 percent since the testing program began, while the number of schools identified as schools of “excellence” or “distinction” has increased dramatically. “People do respond to real consequences,” says Hunt, who left office this month.

Of course, to some, having state tests drive instruction was always the idea. If the assessments are good enough, goes one theory, they will be worth teaching to.

“If the test is not measuring what was taught, what is it measuring?” asks Diane Ravitch, a senior scholar at New York University. “I don’t have a problem with teaching to the test if it’s a good test, a test that accurately reflects the curriculum.”

In the 1980s, researchers found that the minimum-competency tests then popular in states were narrowing the curriculum, encouraging teachers to focus on test-taking strategies, and fostering a drill-and-skill mentality. Those complaints led many to call for a new generation of assessments that would be more challenging, provide models for good teaching, and more closely reflect the desired curriculum. In the early 1990s, states such as California, Kentucky, and Vermont and such groups as the New Standards project pioneered work on those newfangled performance assessments and portfolios, although California subsequently dropped its efforts.

‘Assessment Hypocrisy’

In the past decade, many states have expanded their testing programs to incorporate a better mix of multiple-choice and open-ended questions that can probe students’ grasp of higher-level skills. But many argue that much more must be done to improve the quality of state assessments and to open them up to public scrutiny. (See related story, Page 27.)

“You simply can’t accomplish the goals of this movement if you’re using off-the-shelf, relatively low-level tests,” asserts Robert B. Schwartz, the president of Achieve. “Tests have taken on too prominent a role in these reforms, and that’s, in part, because of people rushing to attach consequences to them before, in a lot of places, we’ve really gotten the tests right.”

Research by Achieve suggests that while state tests have improved, many do not adequately match their states’ academic standards. Often, they measure some standards, but not others. And they tend to emphasize less demanding knowledge and skills, rather than the more ambitious academic content spelled out in the standards documents. (See story, Page 33.)

While most states have added short-answer questions to their testing programs, for example, few have invested in assessments that use student portfolios or extended-performance tasks, beyond their writing exams. Some states, such as Arizona, California, Kentucky, and Wisconsin, have abandoned or pulled back on earlier, more ambitious efforts.

In part, that was because early studies suggested that the new assessments did not yield as consistent results as multiple-choice tests. Moreover, the tests were much more time-consuming and costly than off-the-shelf, norm-referenced exams. In Iowa, for example, the cost of administering the Iowa Tests of Basic Skills is 93 cents per student, less than the cost of french fries and a Big Mac at McDonald’s.

“Basically, we haven’t made the case to the political folks that they should be spending $12 or $14 a test for a student, rather than $2 or $3 a test,” says Marshall S. Smith, a professor of education at Stanford University who was the acting deputy secretary of the U.S. Department of Education under President Clinton. “The irony here is that the amount of money is so small compared to the amount of money that states spend educating a student.” In 1999, the average total per-pupil expenditure in the United States was $6,408.

In states that use richer measures of classroom instruction, teachers say they’re useful. In Maryland, where the exams ask students to apply their knowledge to solve problems that often span multiple subjects, 7th grade teacher Meredeth Haley says the state tests have been “a good thing.”

“I think it’s because of the type of test they’re using,” says the teacher at Lansdowne Middle School, which in 1999 posted the largest improvement in state test scores in Baltimore County. The skills demanded on the exams “are the skills students are going to need,” she says, such as communicating their thoughts in writing, graphing and interpreting data, and synthesizing information. But few states have followed Maryland’s lead. Research also suggests that high-quality, ongoing assessments designed by classroom teachers are linked with gains in student learning.

W. James Popham, a professor emeritus in the school of education at the University of California, Los Angeles, says states may be using the “wrong test for the right job.” Norm-referenced tests, such as the Iowa Tests of Basic Skills, were designed primarily to compare the performance of students against one another, not against a body of content to be mastered, he points out. And it’s extremely difficult to show progress on such exams based on changes in classroom instruction. Although some of the newer tests, such as the Stanford Achievement Test-9th Edition, may be customized to include additional items that more closely reflect a state’s standards, Popham says, the match is often superficial at best.

In addition, many states’ standards are still so vague, numerous, and ambitious that it’s impossible to measure them all. Popham urges states to divide their standards into three categories--absolutely essential, highly desirable, and desirable--and to craft assessments for only the first of those groups. “To pretend that we’re measuring all this stuff is a form of assessment hypocrisy,” he contends.

Teachers also say they need help in using test scores to analyze their teaching and improve instruction. Seven in 10 teachers surveyed for Quality Counts said they use test results “a great deal” or “somewhat” to help diagnose what individual students need. Even more said they use the results, more generally, to diagnose what they should be teaching.

But only 17 percent of teachers said they have “plenty” of access to training on how to interpret test scores diagnostically. And nearly half said they had received no such training in the past year.

In a review of the standards-based agenda in eight states and 22 districts, researcher Margaret E. Goertz found that districts were paying far more attention to test data than they used to, but that most educators had difficulty linking test results to the kinds of changes needed in classrooms.

Only four states let teachers know how each student performed on every multiple-choice test item. Only nine send teachers their own students’ scored work on essay questions.

Teachers report that scoring state assessments, such as student essays, is a valuable professional-development tool and helps them better understand state standards. Yet, only four states currently require classroom teachers to grade state exams. Twelve involve some classroom teachers in such activities.

Tests as Gatekeepers

But the biggest problem--and the reason tests have become the focus of so much contention--is how they are used in state accountability systems. Many now rely exclusively, or almost exclusively, on test scores to reward or punish schools. A growing number of states are basing decisions about individual students, such as whether they receive a diploma or advance to the next grade, on whether they pass an exam. That’s true despite advice from measurement experts that no single test score should ever be used to make such high-stakes decisions about young people.

It’s when the academic fate of individual students is at stake that some parents, in such states as Arizona, Massachusetts, Ohio, and Virginia, have risen up to call for the tests’ demise or for a modification of the rules.

In Arizona, Lynn and Bonnie Sweet, the parents of a 17-year-old in the Mesa district, wrote to Gov. Jane Dee Hull urging her to seek the repeal of the Arizona Instrument to Measure Standards test, which students will have to pass in writing and reading to graduate beginning in 2002.

Among other concerns, the Sweets complain that no adequate study guides are available to prepare for the exams and that the tests are not adequately aligned with the curriculum. “The AIMS test is setting our students up for failure,” says Bonnie Sweet, whose son, Michael, maintains a B average.

Many states insist that they are not relying on a single test score because students have multiple opportunities to retake the exams. In addition, states often require students to complete a minimum number of course credits to graduate.

But Lorrie A. Shepard, a professor of education at the University of Colorado at Boulder, says the problem is that the “tests become the gatekeeper.”

“The question is, if students failed the test twice, would there be some other way that they could prove that they had the competencies? And, if not, states really are not using multiple measures,” she argues.

Some experts suggest using a “compensatory” model in which a student’s strong performance in one area, such as coursework, could offset low performance on a graduation exam; or a solid score on one subject tested could offset a low score in another subject. Others suggest providing “advanced” or “endorsed” diplomas to students who do well on such tests rather than withholding diplomas from students who fail the exams. Six states offer students incentives in the form of scholarships.

‘A Fallible Technology’

Some experts warn that the demands now being placed on assessments by state accountability systems simply may exceed the technology.

In the last few years, for example, scoring errors and delays have been reported in California, Minnesota, and New York City. Almost 8,000 Minnesota high school students, including 336 seniors set to graduate last spring, were told they had failed the math portion of the state’s basic-skills test when they had not.

Indeed, a test score, like any other source of information about a student, is not exact. David Rogosa, a professor of educational statistics at Stanford University, has calculated that a student whose “real achievement” on the Stanford-9 is at grade level--or the 50th percentile--will score within 5 percentage points of that level only about 30 percent of the time on the math exam and 42 percent of the time on the reading exam.

“I think we’ve got to realize that testing is a fallible technology, and that has to be a starting point,” says George F. Madaus, a professor of education at Boston College. “Once we start with that, then if kids or schools don’t do well, we can either look for other measures or go in and try to find out why, and not immediately start retaining kids in grade or denying diplomas or putting schools in receivership until we know a lot more than we know right now.”

“Tests can play a part,” he says, “but not the ultimate part.”

In Wisconsin, for example, state lawmakers mandated that students pass a test to graduate or to be promoted to grades 5 and 9. But, following protests from parents and educators, the legislature reversed itself. Now, districts must draft policies that rely on multiple criteria, including test scores, a student’s academic performance, and teacher recommendations.

“Initially, I was resistant to [the use of multiple criteria],” acknowledges H. Gary Cook, the director of the office of education accountability in the state education department. “I’ve changed my opinion. I think it really forces districts to consider all the pieces of evidence in a student’s performance to determine whether they should advance to the next grade or graduate.”

But, in general, experts say that just what is meant by “multiple measures” and what kinds of information states and districts should consider in making important decisions about students and schools remains unclear. “We really don’t have a good handle on how to do that,” says Daniel M. Koretz, a senior social scientist at RAND. On the other hand, he says, “I don’t think we have, on the horizon, any prospect of a testing process so good that schools that improve their scores on it will be doing everything we want schools to do.”

Others suggest that states also need to strike a better balance between state, district, and classroom assessments, so that schools aren’t inundated with tests. States should focus on the bottom-line skills that they think students must master to graduate, advises Stanley A. Rabinowitz, the director of assessment and standards-development services at WestEd, a federally financed research center. “The state’s responsibility is to ensure that the kids can work and vote,” he says.

The one solution that is not feasible, most agree, is to get rid of tests, says Richard F. Elmore, a professor of education at Harvard University: “If you assume that by attacking the tests, you will somehow fundamentally change the desire of the public at large and of policymakers to have information about individual student performance by school, you’re just wrong.”

Lynn Olson

Lynn Olson was managing editor of special projects for Education Week. She also covered national policy (including “P-16” issues, NCLB standards, accountability, and reform), assessment and testing.

A version of this article appeared in the January 11, 2001 edition of Education Week