AI Didn’t Create the Medical Research Crisis—But It’s Scaling It
As AI-generated research floods scientific journals, some say we’re valuing speed over quality. Columbia’s George Hripcsak says we are focusing on the wrong metric.
This summer, news outlets including Nature reported a surge of formulaic biomedical papers—many suspected to be generated by AI. Conversations in the media and beyond raised concerns about how scientific findings are evaluated and trusted.
George Hripcsak, the Vivian Beaumont Allen Professor of Biomedical Informatics and founding chair of the OHDSI (Observational Health Data Sciences and Informatics) collaborative, thinks the question itself needs reframing.
The issue, he argues, isn’t how quickly we’re publishing—it’s whether we’re doing science with the structure and transparency needed to make the results trustworthy. And when that structure is in place, science isn’t just more reliable—it can actually move faster.
For those unfamiliar, what is OHDSI and what does it do?
OHDSI, short for Observational Health Data Sciences and Informatics, is an international research community working to improve how we generate evidence from real-world health data such as electronic health records, insurance claims, and public health databases. It provides a shared system: a common way to structure data, open-source tools for analyzing it, and a set of practices to ensure results are reliable. Researchers in different countries can run the same study using the same methods, then compare outcomes to see what holds up across populations. The goal is to make medical research more transparent, reproducible, and actionable.
The Nature article warned that AI is helping flood the literature with unreliable science. What’s your take?
We’ve actually had this problem for a long time—AI just speeds it up. Observational research has always produced contradictory findings. One year coffee causes cancer, the next year it cures it. Same data, opposite conclusions. Sometimes that’s because the question is genuinely hard, but often it’s because most research doesn’t use the kind of structured methods that would make the results reliable.
Now, open data makes it easier to run studies, and generative AI makes it easier to write them. But the issue isn’t speed or technology; it’s the lack of structure.
Some prominent scientists have warned that we’re prioritizing volume over rigor. How do you respond to that?
Speed is the wrong metric. We need high-volume evidence—we just need it to be reliable. Doing a bad study slowly doesn’t make it better. If someone produces reliable evidence quickly, that should be rewarded.
In fact, we’ve found that when you build structure into the process—when you pre-specify the design, validate your methods, and use shared tools—you don’t just get more reliable results. You get them faster. That’s what OHDSI is doing. The framework doesn’t slow things down—it makes large-scale science possible.
So what does reliable research look like?
In OHDSI, we follow ten principles. I usually summarize them as openness and verification.
Practically, that means a few things: we pre-specify our analysis plans, use open-source software, and publish diagnostics before looking at results. We test for bias and confounding, and we run studies across multiple databases and populations to see whether results generalize. If the diagnostics fail, we stop. If they pass, we publish the result—even if it’s inconclusive. That’s how we avoid publication bias. The goal isn’t to find something novel every time. It’s to produce evidence that others can trust and build on.
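As a rough illustration of the "diagnostics first, results second" rule described above, here is a minimal Python sketch. It is not OHDSI's software or API: the `Diagnostic` class, the `run_gated_study` function, the diagnostic names, and every threshold are hypothetical, chosen only to show the gating logic. The point is that the effect estimate is never computed unless every pre-specified check passes, and that a passing study is reported whatever the estimate turns out to be.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Diagnostic:
    """One pre-specified check; the threshold is fixed before results are seen."""
    name: str           # hypothetical label, e.g. "covariate_imbalance"
    value: float        # measured without looking at the outcome estimate
    max_allowed: float  # hypothetical threshold written into the protocol

    def passed(self) -> bool:
        return self.value <= self.max_allowed


def run_gated_study(
    diagnostics: List[Diagnostic],
    estimate_effect: Callable[[], float],
) -> Dict[str, object]:
    """Compute the effect estimate only if every diagnostic passes."""
    failed = [d.name for d in diagnostics if not d.passed()]
    if failed:
        # Stop here: the outcome analysis is never run, so it cannot
        # influence the decision to abandon the study.
        return {"status": "stopped", "failed": failed, "estimate": None}
    # All checks passed: report the estimate, conclusive or not.
    estimate: Optional[float] = estimate_effect()
    return {"status": "published", "failed": [], "estimate": estimate}


# Illustrative values only; none of these numbers come from a real study.
checks = [
    Diagnostic("covariate_imbalance", value=0.04, max_allowed=0.10),
    Diagnostic("negative_control_error", value=0.30, max_allowed=0.25),
]
print(run_gated_study(checks, estimate_effect=lambda: 1.02))
```

In this sketch the second check exceeds its threshold, so the study stops and the estimate function is never called. Passing the estimate as a function rather than a number is the design choice that enforces the order of operations.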
Some journals have discouraged the use of causal language in observational studies. Is that helpful?
Not really. I understand why they’ve done it—there’s a long history of people overinterpreting retrospective studies. But taking away the word cause doesn’t fix the issue. Readers still interpret results that way.
A better approach is to use methods that support causal questions when asked properly—through pre-specification, diagnostics, and transparency. That’s the kind of structure we’re building, and journals like JACC are beginning to adopt it.
What role can AI play, and do you use it?
It can be useful—but only with guardrails. LLMs are designed to produce plausible statements, not true ones. They reinforce what we already believe.
If we’d had generative AI in Galileo’s time, we’d still think the Earth is the center of the universe. I sometimes use AI to extract variables or assist with literature reviews—but I always double-check the results. We never use it to draw conclusions. In OHDSI, we rely on tested models, diagnostics, and transparent reasoning.
What would it take for more researchers to adopt the kind of structure you’re describing?
A lot of the methods already exist—we’re using them. The question is whether they become standard. If journals require diagnostics, if reviewers look for pre-specification, and if results are published regardless of excitement, we’ll get a more trustworthy evidence base. That’s what we’re trying to support. And when the process is clear, it actually moves faster—not slower.