
You don’t recall where they are. You freeze up, and then pull yourself together. Maybe they are on the very laptop you are using. You fire a few plausible-sounding names into a string of search queries and get nothing back. No results.
A longstanding challenge
Such searches are not uncommon. And the problem has persisted for years. When Molecular Brain editor-in-chief Tsuyoshi Miyakawa asked the authors of 41 manuscripts for the raw data underlying their results, 97% couldn’t deliver, he wrote in 2020. The authors of 21 of the 41 manuscripts withdrew them rather than provide raw data. Of the remaining 20, Miyakawa rejected 19 for insufficient data quality.
Even when data is available, replication can fail. A $2-million, 8-year effort to replicate influential cancer studies found that fewer than half stood up to scrutiny. The Reproducibility Project: Cancer Biology originally planned to repeat 193 experiments from 53 high-profile papers, but couldn’t complete them all. Barriers such as uncooperative authors and vague protocols meant the researchers finished just 50 experiments, at an average cost of $53,000 and 197 weeks per study.
The problem extends beyond life sciences. A 2011 study published in PLOS One, which examined articles in top journals by impact factor, found that in chemistry, 0% of articles made data publicly available, though 5.7% indicated data was available on request. Physics fared no better: none of the articles with original data made that data publicly available in a repository. The study’s authors noted that physics journals “do little to share research data in a systematic way, at least in the top journals by impact factor,” and that some authors may “print many graphics that summarize the research data, but do not provide direct access to the underlying data.” Meanwhile, a 2016 Nature survey found that more than “70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments.”
Some signs of improvement
In the positive column, the recently published FAIR Business Survey suggests that some R&D-heavy organizations are getting better at treating data as a reusable asset rather than a byproduct of research. The Pistoia Alliance surveyed 36 life-science organizations and conducted follow-up interviews with 12 companies to understand why they’re investing in FAIR (Findable, Accessible, Interoperable, Reusable) data principles. The results reveal four primary business drivers: “trusted data” that enables AI capabilities and ensures compliance; “cost savings” through reduced duplication and higher resource productivity; “speed” improvements in time-to-market and decision-making; and “effectiveness” gains that unlock insights and innovations previously impossible with fragmented datasets.
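To make the acronym concrete, here is a minimal, hypothetical sketch of what a FAIR-style dataset record might look like in practice. The field names, URLs, and the helper function are illustrative assumptions, not part of the Pistoia Alliance survey or any particular metadata standard (real deployments typically follow schemas such as DataCite or schema.org/Dataset).

```python
# Illustrative only: a minimal, hypothetical FAIR-style dataset record.
# Field names and URLs are placeholders, not a real standard or repository.

dataset_record = {
    # Findable: a persistent identifier plus rich, searchable metadata
    "identifier": "doi:10.1234/example-assay-2024",   # placeholder DOI
    "title": "Kinase inhibitor dose-response assay (example)",
    "keywords": ["kinase", "dose-response", "IC50"],

    # Accessible: retrievable over a standard protocol, with an explicit policy
    "access_url": "https://repository.example.org/datasets/example-assay-2024",
    "access_policy": "open",          # or "controlled", with a request process

    # Interoperable: community formats and shared vocabularies/units
    "format": "text/csv",
    "units": {"concentration": "nmol/L", "response": "percent_inhibition"},

    # Reusable: a clear license plus enough provenance to repeat the analysis
    "license": "CC-BY-4.0",
    "provenance": {
        "instrument": "plate reader (model recorded in methods)",
        "protocol_ref": "https://protocols.example.org/assay-v2",
    },
}


def missing_fair_fields(record: dict) -> list[str]:
    """Flag commonly expected FAIR-style fields that are absent from a record."""
    expected = ["identifier", "access_url", "format", "license", "provenance"]
    return [field for field in expected if field not in record]


if __name__ == "__main__":
    print(missing_fair_fields(dataset_record))  # -> [] for the record above
```

The point of the sketch is simply that “FAIR” cashes out as concrete metadata and access choices made at publication time, which is what makes the data findable by the search described at the top of this piece.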
The companies seeing the biggest returns started their FAIR journeys five or more years ago and are “now realizing tangible benefits.” These more mature organizations tend to have “high-level executive support,” stronger “data management processes,” and “enhanced operational efficiency” to show for it.
One organization cut data search time from three days to hours. Another “shortened the duration of clinical trials.” Several respondents noted that FAIR data management eliminated manual curation work.
Yet the survey also revealed a persistent challenge: the biggest barrier isn’t technical but cultural. “A fundamental mindset change within the business is required to realize the full value of data-driven transformation,” the report concluded.