Most research does not fail because the science is poor, the investigators are careless, or the reviewers get it wrong. Most research fails because the systems surrounding it are misaligned with the realities of how evidence is produced, implemented, and sustained.

This kind of failure is rarely dramatic. It does not look like retractions, scandals, or obvious errors. Instead, it looks like studies that technically succeed but never meaningfully influence practice, policy, or health outcomes. It looks like promising findings that arrive too late, designs that cannot survive real-world conditions, and research programs that quietly dissolve without synthesis or legacy.

These are not individual failures. They are systemic ones.

I worked at the National Institutes of Health for two decades, departing with a wave of others in 2025. Leaving clarified something I had observed for years but struggled to name: The structures surrounding our scientific process are failing us in profound ways.


Recognizing that these structures need rethinking does not mean abandoning rigor, peer review, or high standards. It does not mean lowering expectations or accepting sloppy science in the name of relevance.

On the contrary, it means taking rigor seriously enough to apply it to the full research lifecycle — designing studies, programs, portfolios, and scientific careers with incentives, scale, and real-world conditions in mind from the outset. It means acknowledging uncertainty earlier rather than later, and building governance and analytic approaches that can accommodate it. 

In other words, it means strengthening the systems around science so that strong research has a better chance to impact the ideas and policies that affect our everyday lives.

What I Mean by “Failure”

Research often fails for two main reasons: limitations in the original study design, and implementation barriers that prevent findings from benefiting the people they were meant to help.

Study design limitations often show up as results that are too narrow to be useful beyond the original setting — a problem researchers call low external validity. 

NIH-funded studies must pass a stringent peer review process, which encourages investigators to propose highly controlled designs. That rigor is valuable, but it can produce findings that do not replicate in everyday clinical or community settings. 

Consider a study on the effectiveness of a new teaching method that enrolled only gifted children at a private school in a small rural town. The results could not reliably inform teaching practice in diverse public schools. The science might be impeccable; the usefulness, limited.

A related problem arises when investigators overemphasize internal validity — the ability to rule out alternative explanations for findings — at the expense of relevance. 

For example, a researcher studying a chronic diabetes management intervention might exclude participants with co-occurring depression in order to isolate the effect of the intervention. This is methodologically defensible, but depression affects people with diabetes at roughly twice the rate of the general population. Designing interventions without that group in mind substantially limits the public health value of the findings.


Implementation barriers present a different kind of obstacle. Over the past several decades, NIH has funded high-quality studies on early childhood prevention programs with demonstrated short- and long-term benefits. Schools and child welfare agencies would value evidence-based interventions developed through rigorous research. 

Yet embedding even the most well-supported programs into real-world service systems is a complicated process. It requires navigating organizational culture, training resources, leadership priorities, teacher workloads, local political climate, funding constraints, and procurement processes. 

The day-to-day work of implementation typically falls on already-overworked professionals. It is no surprise that many excellent programs developed specifically for young children are never fully implemented.

The same dynamic plays out in behavioral health. Rates of behavioral health problems in the United States remain high, and a growing body of NIH-funded research has produced interventions with demonstrated effectiveness. 


Yet access to these treatments remains low. Integrating behavioral health services into primary care settings — particularly Federally Qualified Health Centers that serve low-income populations — is one of the most promising strategies for reaching people at scale. It is also far harder than it sounds, requiring significant organizational change in resource-constrained environments.

Finally, some research findings simply disappear into the literature. Approximately 39 percent of NIH-funded clinical trials go unpublished, meaning their findings never reach the clinicians, policymakers, or communities that might act on them. 

Even published findings often lack any clear mechanism for reaching the practitioners and decision-makers who could use them. The gap between what science produces and what the world receives is wider than most researchers acknowledge.

To those outside research, the pattern can look like inefficiency or waste. 

To those inside it, the failure usually reflects something more structural: a persistent misalignment between incentives, scale, and reality.

Incentives Shape Outcomes More Than We Admit

Science likes to think of itself as driven purely by curiosity and the search for truth. But like any human enterprise, it runs on incentives, and those incentives shape everything from what questions get asked to whose careers survive. Whether we acknowledge it or not, what gets proposed, reviewed favorably, funded, published, and rewarded ultimately determines the kind of knowledge we produce.

Many of these incentives are well intentioned. We value innovation, novelty, and methodological rigor. We reward clarity of hypotheses and precision of design. But over time, these preferences can drift away from the kinds of research questions that matter most in complex, real-world settings.

The pressures begin at the most personal level: how scientists get paid. 


Many researchers in the U.S. work under what is known as a “soft money” system, meaning their salaries depend entirely or largely on the grants they win rather than on stable institutional support. 

When a grant runs out, so does the paycheck. This creates a career built on financial instability, where staying employed means constantly hunting for the next award, often with success rates as low as 8 to 12 percent. 

The result is that scientists spend enormous amounts of time writing proposals instead of doing research, and many talented people eventually leave the field altogether, worn down by the relentless uncertainty.

This precariousness does not just harm individual researchers; it quietly bends the direction of science itself. When a livelihood depends on winning the next grant, the rational move is to propose work that is likely to be funded: safe, incremental, predictable. 

Bold ideas with uncertain outcomes are a financial gamble few can afford to take. In this way, the soft-money model nudges science away from ambitious, long-term thinking and toward whatever happens to be trendy or fundable at the moment.

The review process that decides which proposals get funded reinforces these same pressures. 


The NIH operates in an environment so competitive that often only the top 10 percent of proposals receive support. Under that kind of pressure, reviewers tend to search for reasons to eliminate a proposal rather than reasons to champion it. A single identified flaw can sink an application, while genuine strengths struggle to compensate. 

Research suggests that critical feedback lowers scores more readily than positive feedback raises them, meaning the system is better designed to avoid failure than to recognize promise. The practical effect is that novel, unproven ideas are routinely passed over in favor of research that is already nearly complete, and well-established scientists at prestigious institutions hold a structural advantage over early-career researchers trying to break through.

Taken together, these forces create a self-reinforcing cycle. Scientists learn, rationally and correctly, that caution is rewarded and risk is penalized. Institutions have largely offloaded financial responsibility onto individual researchers. And the machinery of peer review, however well-intentioned, can end up selecting for the ordinary over the extraordinary.

None of this reflects bad faith. It reflects a system optimized for producing publishable science rather than actionable knowledge, one that rewards writing new grant applications and accumulating peer-reviewed publications over doing the harder, slower work of translating findings into real-world practice. 

When incentives are misaligned, even strong investigators doing careful work can end up contributing to a literature that grows larger without becoming meaningfully more useful. Changing the kind of science we get will require honestly reckoning with the kind of system we have built.

Scale Is Treated as an Afterthought

One of the most common and underappreciated reasons that medical research fails is surprisingly simple: Researchers do not think early enough about whether their study can actually scale up. 

A small pilot study might run smoothly with one team at one hospital, but when the time comes to expand to dozens of sites with hundreds or thousands of patients, hidden problems suddenly become impossible to ignore. 


At a small scale, coordination can be informal, data can be managed through conversation rather than rigid systems, and governance can be lightweight because the number of people involved is small. These arrangements often work well enough in the early stages, which is precisely why the warning signs go unnoticed.

But those assumptions do not survive in large-scale, multisite trials. Workflows that were manageable with a small team fall apart across dozens of sites. Recruiting enough patients proves far harder than expected; difficulty finding and enrolling eligible participants is the reason more than half of all clinical trials end prematurely. 

Study procedures that worked in a tightly controlled setting often turn out to be too complicated for real-world clinics to follow consistently, leading to protocol deviations and unreliable data. 

Organizations discover too late that they lack the staff, funding, or infrastructure to keep up with demand. 

And interventions that succeeded in one carefully selected community sometimes do not translate to different hospitals, different patient populations, or different local contexts. 

None of these are inevitable problems, but they share a common cause: the question of whether the study could actually work at scale was asked too late, long after the design was locked in and the resources committed.

I observed this pattern in large multisite research consortia during my time as a program officer at NIH. Individual projects were often scientifically strong, but the multisite program as a whole struggled in practice because collaboration was assumed rather than designed. 

Shared measures, decision rules, communication pathways, clear site leadership roles, and accountability structures were treated as administrative details rather than scientific necessities. 

Sites that failed to meet recruitment targets or repeatedly violated study protocols would be closed, and investigators would scramble to identify and onboard replacements, or the study would continue with a substantially reduced sample size.

When scale is underestimated, programs fragment. The result is science that underperforms relative to the investment made.

Reality Arrives Late

A third reason good research fails is that real-world conditions are often treated as a complication to be managed later, rather than a constraint to be designed around from the beginning.

Health systems change. Policies shift. Populations evolve. External shocks, such as the COVID-19 pandemic, can upend even the most carefully planned studies. Designs that assume stable conditions, ideal implementation, or consistent participation are especially vulnerable to these disruptions.

When reality intrudes late in the research process, investigators and funders are forced to respond reactively. Protocols are amended, timelines extended, and aims revised under pressure. 

Sometimes this adaptation works. 


During the COVID-19 pandemic, hundreds of NIH clinical trials were immediately affected. Investigators whose designs were flexible enough to transition to remote intervention delivery and data collection were often able to continue with little disruption. 

Many others, whose protocols never anticipated a world where in-person visits would be impossible, faced prolonged suspensions.

The downstream consequences could be severe. In one stepped-wedge trial I oversaw as a program official, COVID-related delays created an extended gap between baseline assessments and the start of the intervention. 

That gap required adding a second baseline assessment period, which in turn drove up the cost of the entire trial. This was a consequence that earlier contingency planning could have mitigated.

The problem is not that research encounters reality. The problem is when it does so without adequate flexibility, contingency planning, or methodological tools to adapt meaningfully. 

In those cases, rigor is preserved on paper, but relevance is lost in practice.

Looking Forward

The good news is that these failures are not inevitable. They are the result of design choices and structural incentives, many of which can be reconsidered. Program officers, funders, institutions, and investigators all have roles to play in realigning those incentives, planning for scale, and integrating real-world conditions earlier in the research process. 

Improving science means applying rigor not just to individual studies, but to the full ecosystem in which science is produced, communicated, and used.

Elizabeth Ginexi served the National Institutes of Health for 22 years, shaping biomedical and behavioral health research strategy. During her tenure, she managed a portfolio of 305 grants totaling $132 million across several institutes and centers, and co-authored 18 funding initiatives that generated 1,087 projects totaling $778 million across mental health, substance use, pain management, data science, and health services research. She advised hundreds of principal investigators on what federal agencies look for in competitive proposals.