Almost every time the results of an international test of student achievement are released to the world, the reaction among the American public and policymakers is like that of a parent whose child just brought home a disappointing report card.
Elected officials and academic experts question where U.S. students fell short: Was it our curriculum, our teaching, or a confluence of out-of-school factors? What did other nations do well? And what changes to American classrooms would help U.S. students make strides on the next round of tests?
Despite such reaction, many observers, even those who interpret the test scores very differently, say that American policymakers need to guard against simplistic interpretations of the results of PISA, TIMSS, or PIRLS, the acronyms for three high-profile tests given periodically to samples of students in dozens of countries. Similarly, researchers and test experts urge U.S. officials to be cautious in the lessons they draw from the impressive scores of high-performing Asian and European nations.
Back in 1983, the seminal report A Nation at Risk warned that U.S. schools were slipping, a trend that posed economic risks for the country, the authors said. It's a theme that has re-emerged in force today.
The National Commission on Excellence in Education, which issued the report a quarter-century ago, saw Germany and Japan as the United States' chief rivals, but policymakers had only limited means to judge the panel's gloomy hypothesis about American students' subpar skills.
Today, international measures provide leaders with test scores and other data. Yet determining what those results say is a vexing task, complicated by variables in demographics, policies, and social and cultural norms.
The tests offer little concrete information about why some countries score so well, making it difficult to mine lessons for school policy and practice, according to Daniel M. Koretz, a Harvard University researcher.
"We shouldn't go assuming that just because [high-performing] Finland did something, we can just adapt what they do to our schools and we know how it's going to turn out," Mr. Koretz said. The comparisons "are very good for helping us set expectations," he said, "but they don't tell us what's working or give us a new or better tool."
Applying lessons from high-scoring nations to the United States requires a careful analysis of how those countries educate their students, both in and out of school, figuring out what strategies are useful to American schools, and testing proposed changes before scaling them up, said Andreas Schleicher of the Organization for Economic Cooperation and Development.
"The temptation is to copy and paste education systems of high-performing countries into your own," said Mr. Schleicher, the head of education indicators for the Paris-based OECD, which oversees the Program for International Student Assessment, or PISA.
"But the best way to use the results," he said, "is to look at the drivers that make a particular education system successful or less successful, and think about how to configure those drivers in your own national context."
Different Outcomes
Even as international assessments have garnered increased attention among U.S. researchers and policymakers, there are broad disagreements about what those results say about American students鈥 performance.
One reason for those disputes is that U.S. test results look very different depending on the exam. American students, for instance, fare reasonably well on the Trends in International Mathematics and Science Study, or TIMSS, scoring above the 2007 international averages in 4th and 8th grade math and science by statistically significant margins. American 4th graders also scored high on the Progress in International Reading Literacy Study, or PIRLS, notching marks well above the international average.
But on PISA, the United States scored statistically below the 2006 international averages for industrialized countries in both science and math literacy.
TIMSS, like the domestic National Assessment of Educational Progress, primarily tests students鈥 knowledge of school-based curriculum, though it does so across countries.
The goal of PISA is different. It measures the skills students have acquired and their ability to apply them to real-world contexts. Unlike TIMSS, PISA evaluates not only in-school learning, but also abilities students have picked up outside of school. PISA also tests students of a specific age, 15, rather than a grade; most U.S. students, though not all, are 10th graders, and the grade levels of students in different countries can vary, federal officials say.
While the PISA results are more discouraging, that test is arguably the most relevant standard for judging U.S. students, said Gary W. Phillips, a vice president and chief scientist at the Washington-based American Institutes for Research. TIMSS groups the United States with many developing nations with far fewer resources, he noted. PISA, by contrast, compares American students against only relatively wealthy, industrialized nations.
"What you should be doing is comparing yourself to your economic competitors," said Mr. Phillips, who has studied the performance of U.S. states and cities internationally. "To me, the OECD," whose members are all industrialized countries, "is a good average to be comparing yourself against."
When it comes to gauging the ability of American students against foreign peers, he said: "It depends on your goal. We should be discouraged if our goal is to be at the top level. Being in the middle of the pack is where we show up."
The United States participates in three major international exams, which test students of different ages in different subjects for different purposes.
TIMSS: The Trends in International Mathematics and Science Study gauges students' math and science skills at two grade levels. Thirty-six jurisdictions took part at the 4th grade level in 2007, and 48 participated at the 8th grade level that year. Both industrialized and developing nations take part. Like the primary U.S. test, the National Assessment of Educational Progress, or NAEP, TIMSS measures students' knowledge of school-based curriculum.
PISA: The Program for International Student Assessment tests math, science, and reading skills that students pick up in and out of school. It assesses students at a specific age, 15, rather than at a grade, and measures their ability to apply knowledge to real-world contexts. Thirty industrialized nations and 27 other jurisdictions took part in 2006.
PIRLS: The Progress in International Reading Literacy Study evaluates 4th graders' reading comprehension of both literary and informational texts. In 2007, 40 jurisdictions took part. U.S. scores were mostly unchanged from 2001, though the United States surpassed a majority of the participating nations.
Source: Education Week
Mark S. Schneider, a former commissioner of the National Center for Education Statistics, also sees reasons for U.S. policymakers to be discouraged. In a Commentary essay published in Education Week in December, he examined the "effect sizes," or standardized statistical differences, of the gaps between the United States and top-performing countries on TIMSS and PISA. He then compared those effect-size margins with those separating high- and low-scoring states on the primary U.S.-based test, NAEP.
For instance, the distance separating the United States from high-performing countries, such as South Korea, on the PISA math exam is comparable to the one separating Mississippi and Massachusetts, the states with the lowest and highest average scores on the 8th grade math NAEP, according to Mr. Schneider, now a colleague of Mr. Phillips' at the AIR.
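The kind of "effect size" Mr. Schneider relies on is a standardized mean difference, most often Cohen's d. The sketch below shows the arithmetic with invented score statistics; the means and standard deviations are illustrative only, not actual NAEP or PISA figures.

```python
# Illustrative only: the means and standard deviations below are made up,
# not actual NAEP or PISA results.
def cohens_d(mean_a, mean_b, sd_a, sd_b):
    """Standardized mean difference (Cohen's d) using a pooled SD."""
    pooled_sd = ((sd_a ** 2 + sd_b ** 2) / 2) ** 0.5
    return (mean_a - mean_b) / pooled_sd

# Hypothetical average scale scores and SDs for two groups of students.
d = cohens_d(mean_a=300, mean_b=274, sd_a=36, sd_b=36)
print(round(d, 2))  # 0.72
```

Expressing gaps this way is what lets a country-to-country difference on PISA be set side by side with a state-to-state difference on NAEP, even though the two tests use different score scales.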
Abilities Understated
Others, however, say policymakers and the news media misinterpret the data and vastly overstate American students鈥 shortcomings on international exams.
In a recent essay, Hal Salzman, a professor of public policy at Rutgers University, in New Brunswick, N.J., noted that the United States produced a smaller percentage of high-performing science students on the 2006 PISA than countries like Finland, the United Kingdom, and Australia. Yet the United States' population, at 307 million, dwarfs that of those countries, and so in raw numbers, it produces many more top-tier students, "the lion's share of the world's best," than its so-called competitors, wrote Mr. Salzman, with co-author Lindsay Lowell.
Mr. Salzman and Mr. Lowell of Georgetown University, in Washington, also have argued that U.S. schools are, contrary to popular opinion, producing sufficient numbers of talented K-12 students to satisfy America鈥檚 economic needs in math and science. Students tend to drop out of the pipeline later, in higher education and the workforce, they say.
"The tests do not support the conclusions that are being made" about U.S. students' lack of skill, Mr. Salzman said in an interview. "We're producing students on par with anybody else in the world."
As evidence, he cites the strong performance of two states, Massachusetts and Minnesota, which took part in the 2007 TIMSS and scored above international and U.S. averages in almost every math and science category. Those states have also fared well on NAEP. ("Standards Help Minn. Vie With Top Nations," Jan. 21, 2009.)
"Let's look at Massachusetts and Minnesota before we look at Finland," Mr. Salzman said. "One would think that there's more transferability in what they've done than there is in looking at foreign countries."
Mr. Schneider, despite his views of the gaps between U.S. and top-performing foreign students, also cautioned against using the tests to draw broad conclusions about school policies.
TIMSS and PISA, like NAEP, are relatively "blunt instruments," the former statistics chief said. They do not produce longitudinal data, tracking the same students over time, which would be useful in pinpointing the particular policies influencing student performance.
As it now stands, international exams cannot explain whether it was a high-performing country's math curriculum, its teacher salaries, or another factor that produced strong results. Only high-quality studies can reveal that, he said.
The tests yield "important hypotheses, which we should be testing more rigorously," Mr. Schneider said. A dozen factors could be behind a nation's test score, he added.
Benchmarking Value?
Limitations aside, many observers say U.S. policymakers can, in fact, draw important lessons from international test scores.
Several observers say they are encouraged by U.S. Secretary of Education Arne Duncan's statements that he would like to see states benchmark their assessments against international standards. Mr. Phillips, of the American Institutes for Research, has conducted studies that compare individual U.S. states' and cities' test results against those of foreign nations by linking NAEP and TIMSS scores.
This is the fourth and final installment of a yearlong, occasional series examining the impact of the 1983 report A Nation at Risk.
The first installment was published on April 23, 2008, as the 25th anniversary of the report was being marked. It explored concerns about global competition and efforts by policymakers and educators to benchmark American performance against that of students in competitor nations.
The second, published September 24, 2008, looked at U.S. progress toward finding more time for children鈥檚 learning.
The third installment, published February 25, 2009, focused on charter quality and came a quarter-century after A Nation at Risk declared that a 鈥渞ising tide of mediocrity鈥 was eroding U.S. education.
Researchers can go much further in creating such state-to-nation comparisons, he argues. In fact, Mr. Phillips expects to release a study soon that assigns comparable letter grades to countries, states, and school districts through a statistical "crosswalk" between NAEP and TIMSS data.
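A "crosswalk" between two test scales can be built in several ways; one of the simplest is mean-sigma linear linking, which re-expresses a score from one scale on another by matching the two scales' means and standard deviations. The sketch below uses invented moments, not actual NAEP or TIMSS statistics, and is only a simplified stand-in for the projection methods Mr. Phillips uses.

```python
# Illustrative sketch of mean-sigma linear linking between two score scales.
# All numbers are invented, not actual NAEP or TIMSS statistics.
def linear_link(score, mean_from, sd_from, mean_to, sd_to):
    """Map a score from one scale to another by matching means and SDs."""
    z = (score - mean_from) / sd_from  # standardize on the source scale
    return mean_to + z * sd_to         # re-express on the target scale

# Hypothetical: a state's NAEP average re-expressed on a TIMSS-like scale.
projected = linear_link(score=292, mean_from=281, sd_from=35,
                        mean_to=500, sd_to=80)
print(round(projected))  # 525
```

The appeal of such a link is that a state that tested only on NAEP can still be placed, approximately, among countries that tested on TIMSS; its limitation is that it assumes the two tests measure close to the same thing.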
While Mr. Salzman believes many of the worries about poor U.S. test performance are exaggerated, he sees a lesson in PISA and TIMSS data that he believes should not be glossed over. Although the United States produces sizable numbers of top-tier students, it also leaves vast numbers in the low-performing category, a poor showing that has consequences for the economy, he says.
"You need a good, middle-skilled [population of workers], or else innovations won't work anywhere," Mr. Salzman said. In technology, medicine, and other fields, he said, "the value of innovation depends on your ability to implement it."
Many state leaders are taking heed of the demand for U.S. schools to be world-class. The Council of Chief State School Officers and the National Governors Association are working on developing international benchmarks of what students should know and embedding them in state standards. The project includes detailed comparisons of standards in some U.S. states with those of several Asian countries. The benchmarks could be assessed either through state participation in the international tests, or by including similar measures in state or national assessments, like NAEP, according to CCSSO Executive Director Gene Wilhoit.
"It is important to us to look at ... giving students educational opportunities of similar rigor and with similar expectations of the highest-performing nations in the world," Mr. Wilhoit said. "To outperform other countries is not as much the goal as it is to make sure we are, in this country, providing ... what we predict are going to be the essential knowledge and skills we think these students are going to have to have in the future."
For Massachusetts, measuring the skills of its students on TIMSS did not come cheap: The state spent $600,000 to participate, officials said.
To Mitchell D. Chester, the state commissioner of education, the value of international testing depends largely on policymakers鈥 willingness to probe beneath the raw scores to see what the data say about teaching, the performance of subgroups of students, and other factors.
His state, for instance, is planning a detailed analysis of its TIMSS scores, focusing on performance gaps between boys and girls, and the content of Massachusetts' math and science courses, compared with those of foreign nations.
Although it is important to consider the major political and cultural differences between the United States and high-performing countries in weighing American students鈥 test performance, Mr. Chester believes it is just as important that policymakers not use those dissimilarities as excuses.
"It's easy to say, 'They can get away with that in Finland,' but not in the United States," he said. "That can be a limiting perspective. We can often be too dismissive."