
How the Replication Crisis Works—and Why It Shakes Science

Roughly half of tested social-science findings cannot be reproduced by independent researchers, exposing deep structural problems in how studies are published, funded, and incentivized—and spurring reforms that could reshape the scientific enterprise.

The Problem Behind the Headlines

Science is supposed to be self-correcting. A researcher publishes a finding, other scientists repeat the experiment, and the result either holds up or it doesn't. In practice, though, this feedback loop has been breaking down for years. The replication crisis—also called the reproducibility crisis—refers to the widespread inability of independent researchers to reproduce published scientific results. It affects psychology, medicine, economics, and nearly every empirical discipline.

The scale is staggering. A landmark 2015 effort by the Open Science Collaboration attempted to replicate 100 published psychology studies. Although 97% of the originals reported statistically significant results, only 36% held up under replication. In cancer biology, scientists at the pharmaceutical firm Amgen tried to confirm 53 landmark preclinical papers and succeeded with just six—an 89% failure rate.

A Massive New Test

The most comprehensive effort yet, the SCORE project (Systematizing Confidence in Open Research and Evidence), was published in Nature in April 2026. Funded by the U.S. Defense Advanced Research Projects Agency with nearly $8 million, the seven-year programme enlisted 865 researchers to analyse roughly 3,900 social-science papers published between 2009 and 2018 across 62 journals spanning economics, psychology, political science, education, and more.

The results were sobering. Of 274 claims subjected to direct replication, only 55.1% produced statistically significant results in the original direction. At the paper level, just 49.3% replicated successfully. Replication rates varied across disciplines—from 42.5% in some fields to 63.1% in others—but no discipline was spared. Worse still, even the studies that did replicate showed effect sizes less than half of what had originally been reported.

Why So Many Studies Fail

Several structural forces drive the crisis:

  • Publication bias. Journals have historically preferred novel, positive findings. A study that finds a dramatic effect gets published; a study that finds nothing languishes in the "file drawer." This creates a literature skewed toward flashy but fragile results.
  • Low statistical power. Many studies use sample sizes too small to reliably detect real effects. Estimates suggest the average statistical power in psychology hovers around 35%, meaning most studies are underpowered from the start.
  • Researcher degrees of freedom. At every stage—from hypothesis formation to data analysis—scientists face choices that are not fully constrained by best practices. Flexible decisions about which data to exclude, which variables to test, and when to stop collecting data can inflate false-positive rates, sometimes unintentionally (see the simulation sketch after this list).
  • Publish-or-perish incentives. Career advancement depends on publishing frequently in high-impact journals, which rewards speed and novelty over rigor and replication.
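
Two of these forces—low statistical power and flexible stopping rules—are easy to make concrete with a short simulation. The sketch below is a hypothetical illustration, not an analysis drawn from any of the studies discussed here: it estimates how often a 25-person-per-group study detects a modest true effect, and how often a study with no true effect at all reaches significance when the researcher keeps adding participants until p < .05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
RUNS = 5000

def fixed_n_study(effect=0.0, n=25, alpha=0.05):
    """One two-group study with a fixed, pre-registered sample size.
    Returns True if the group difference comes out significant."""
    a = rng.normal(loc=effect, size=n)
    b = rng.normal(loc=0.0, size=n)
    return stats.ttest_ind(a, b).pvalue < alpha

def optional_stopping_study(n_start=20, n_max=60, step=10, alpha=0.05):
    """One two-group study with NO true effect, where data collection
    continues in batches until p < .05 or the sample budget runs out."""
    a = list(rng.normal(size=n_start))
    b = list(rng.normal(size=n_start))
    while True:
        if stats.ttest_ind(a, b).pvalue < alpha:
            return True
        if len(a) >= n_max:
            return False
        a.extend(rng.normal(size=step))
        b.extend(rng.normal(size=step))

# Power of a small study of a modest true effect (standardized d = 0.4):
power = sum(fixed_n_study(effect=0.4) for _ in range(RUNS)) / RUNS
# False-positive rate with no effect and a fixed sample size (should be near alpha):
fp_fixed = sum(fixed_n_study(effect=0.0, n=60) for _ in range(RUNS)) / RUNS
# False-positive rate with no effect but flexible stopping:
fp_flexible = sum(optional_stopping_study() for _ in range(RUNS)) / RUNS

print(f"power at n=25 per group, d=0.4:      {power:.2f}")       # roughly 0.2-0.3
print(f"false positives, fixed n:            {fp_fixed:.2f}")    # about 0.05
print(f"false positives, optional stopping:  {fp_flexible:.2f}") # well above 0.05
```

Under these assumptions the fixed-sample study misses the real effect most of the time, while the flexible study "finds" a nonexistent effect far more often than the nominal 5% error rate suggests—two sides of the same structural problem.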

What Actually Predicts Reproducibility

The SCORE project uncovered one factor that correlated strongly with whether a study could be reproduced: data availability. Only about one-third of the papers in the sample had made their underlying data and computer code readily accessible. Those that did were significantly more likely to replicate. Transparency, it turns out, is among the strongest predictors of reliability.

Reforms Taking Root

The crisis has already begun to reshape scientific practice. Registered reports—a publishing format in which researchers submit their methods and analysis plans for peer review before collecting data—are now offered by hundreds of journals. Because publication is guaranteed regardless of outcome, this eliminates the incentive to chase positive results.

Open-science practices are spreading as well. Major funders such as the U.S. National Institutes of Health and the European Research Council increasingly require data sharing. Tools like StatCheck automatically scan papers for statistical inconsistencies. Grassroots communities, including the Center for Open Science, provide training and infrastructure for transparent research.
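
The core idea behind such checks is that a reported test statistic and its degrees of freedom imply a p-value, so a stated p-value that does not match can be flagged automatically. The sketch below is a simplified illustration of that idea, not StatCheck's actual implementation; the `check_t_report` helper and its crude tolerance are hypothetical, whereas the real tool applies rounding-aware rules to statistics extracted from paper text.

```python
from scipy import stats

def check_t_report(t, df, reported_p, tol=0.005):
    """Recompute the two-tailed p-value implied by a reported t-statistic
    and degrees of freedom, and flag it if it differs from the p-value
    stated in the paper by more than `tol` (a crude tolerance)."""
    recomputed_p = 2 * stats.t.sf(abs(t), df)
    consistent = abs(recomputed_p - reported_p) <= tol
    return recomputed_p, consistent

# "t(28) = 2.20, p = .04" -- the implied p is about .036, so this passes.
p, ok = check_t_report(t=2.20, df=28, reported_p=0.04)
print(f"implied p = {p:.3f}, consistent: {ok}")

# "t(28) = 1.70, p = .03" -- the implied p is about .10, so this is flagged.
p, ok = check_t_report(t=1.70, df=28, reported_p=0.03)
print(f"implied p = {p:.3f}, consistent: {ok}")
```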

These reforms are not a silver bullet. Changing incentive structures—how hiring committees evaluate candidates, how grants are awarded—remains slow. But the direction of travel is clear: science is learning to check its own work, one replication at a time.
