Reproducibility remains a persistent problem despite numerous efforts over the years to tackle the issue. But solutions are on the horizon, as new computational tools make it easier for researchers to evaluate citations and choose which results must be validated to move their own projects forward.
Close to one million articles are published yearly in biomedical journals alone — ample evidence of the pace at which science is accelerating. Yet how much of this published work is reproducible — replicable by other researchers — and how much represents one-off findings? There are no clear answers, mainly because reproducing experiments has long posed challenges.
A decade ago, Stanford University professor John P. A. Ioannidis addressed the issue in a paper that explained “Why Most Published Research Findings Are False.” Biases in study design, data, analysis and presentation were among the factors that led to “false” — and therefore unreproducible — findings.
In 2014, the US National Institutes of Health held a joint workshop with Nature Publishing Group and Science to identify ways that scientific publishers could “enhance rigor and … support research that is reproducible, robust, and transparent.” Following the workshop, Nature produced a special collection on reproducibility in response to “growing alarm about results that cannot be reproduced.” Reasons for lack of reproducibility included “increased levels of scrutiny, complexity of experiments and statistics, and pressures on researchers.”
The editor-in-chief and managing editor of The American Journal of Pathology, one of 30 journals invited to participate in the NIH workshop, subsequently wrote “Science Isn’t Science If It Isn’t Reproducible.” The editorial notes that journals are “left with the daunting task of improving review processes to ensure that published results meet high standards for repetition and accuracy in reporting.”
Lack of reproducibility is not just an academic concern. It has costly consequences for industry. Pharmaceutical company researchers rarely move forward without first validating experiments of interest that are reported in the literature. But, because so much that is reported can’t be reproduced, they need to be able to sift through articles using metrics that allow them to identify their best bets. This reduces the amount (and cost) of any repeat work that needs to be done in their own validation processes.
Applying quality metrics
On the most basic level, researchers need products that can rapidly pinpoint quality experiments by letting users choose which specific metrics to apply to get the most relevant results. Scientists have told us that instead of relying on an opaque algorithm that yields a single score, they want to be able to toggle various metrics on and off to reach their own conclusions.
Many researchers begin to assess reproducibility by looking at the number of times a particular fact has been found across various publications. If, for example, they were doing something simple, like running a search for biomarkers of a specific disease, they might find five biomarker candidates that have been mentioned once in a single paper, and 10 that have been mentioned in multiple papers. That would allow them to exclude biomarkers that were not mentioned as often, with the caveat that only findings that have been around for a while are likely to garner multiple citations.
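The counting step described above can be sketched in a few lines. This is a minimal illustration, not any particular product's implementation; the biomarker names and DOIs are invented placeholders.

```python
from collections import Counter

# Hypothetical search hits: each pairs a biomarker with the paper (by DOI)
# that mentions it. All names and identifiers are illustrative.
hits = [
    ("BM-1", "10.1000/paper-a"),
    ("BM-1", "10.1000/paper-b"),
    ("BM-1", "10.1000/paper-c"),
    ("BM-2", "10.1000/paper-a"),
    ("BM-3", "10.1000/paper-d"),
]

# Count the number of *distinct* papers mentioning each biomarker;
# set() removes duplicate mentions within the same paper.
papers_per_biomarker = Counter()
for biomarker, doi in set(hits):
    papers_per_biomarker[biomarker] += 1

# Keep only biomarkers reported in more than one paper.
multi_paper = {b for b, n in papers_per_biomarker.items() if n > 1}
print(multi_paper)  # only BM-1 appears in multiple papers
```

The caveat from the text applies here too: a low count may simply mean a finding is recent, so the threshold is a filter to be toggled, not a verdict.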
Researchers need to be able to further refine their search by ensuring that a particular fact is mentioned as an experimental result in the “results” section of a paper — i.e., that it is definitely a scientific finding, not a citation of someone else’s work mentioned in the introduction, discussion or conclusion sections. The fact should be reported each time by a completely different research group to help eliminate citation bias.
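The two refinements above — keeping only mentions from the “results” section, and counting only distinct research groups — might be combined as follows. The record structure and group labels are assumptions for illustration.

```python
# Hypothetical mention records: where a fact appeared and which group
# reported it. Structure and labels are invented for this sketch.
mentions = [
    {"fact": "gene X up in disease Y", "section": "results",      "group": "Lab A"},
    {"fact": "gene X up in disease Y", "section": "introduction", "group": "Lab B"},
    {"fact": "gene X up in disease Y", "section": "results",      "group": "Lab A"},
    {"fact": "gene X up in disease Y", "section": "results",      "group": "Lab C"},
]

def independent_results(mentions, fact):
    """Distinct groups that reported `fact` as an experimental result,
    ignoring citations in introduction, discussion or conclusion sections."""
    return {m["group"] for m in mentions
            if m["fact"] == fact and m["section"] == "results"}

groups = independent_results(mentions, "gene X up in disease Y")
print(len(groups))  # two independent groups: Lab A and Lab C
```

Lab B's mention is excluded because it cites the finding rather than reporting it, and Lab A's repeat report counts only once — the deduplication that helps guard against citation bias.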
To assess the stature of the authors who reported a particular result, many scientists refer to the author’s “h-index”: the largest number h such that h of that person’s papers have each received at least h citations. Yet we know that here, too, context is critical. For example, Dr. A may publish a paper on heart disease that receives 50 citations — but at closer look, they all say the authors have proven that Dr. A’s finding is incorrect.
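The h-index itself is mechanical to compute from an author's citation counts, as this short sketch shows (the citation counts are invented):

```python
def h_index(citations):
    """Largest h such that the author has h papers with >= h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank  # the paper at this rank still clears the bar
        else:
            break
    return h

# Invented example: six papers with these citation counts.
print(h_index([50, 18, 7, 5, 4, 2]))  # 4
```

Note what the metric cannot see: the 50-citation paper contributes exactly as much whether those citations confirm or refute the work — which is why, as the example of Dr. A shows, raw citation metrics need context.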
We at Elsevier are beginning to enable the application of sentiment analysis — an emerging data-mining strategy that identifies words and phrases that indicate opinions, attitudes and lack of certainty (e.g., “suggests,” “seems to indicate”) — to biomedical literature searches, as well as to potentially relevant input that appears in social media. Although it’s still early days, with challenges that include the need for linguistic resources, as well as ambiguity of words and of intent (e.g., irony, sarcasm), we believe sentiment analysis will play an increasingly important role in streamlining data analysis in the near future.
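At its simplest, detecting the certainty cues mentioned above amounts to matching a lexicon of hedging phrases against text. The word list and function below are a toy sketch, not Elsevier's actual sentiment-analysis pipeline, which as noted depends on much richer linguistic resources.

```python
import re

# Illustrative hedging cues; a production system would use curated
# linguistic resources rather than a hand-picked list.
HEDGES = ["suggests", "seems to indicate", "may", "might", "appears to"]

def hedge_count(sentence):
    """Count hedging phrases in a sentence (whole-word, case-insensitive)."""
    text = sentence.lower()
    return sum(1 for h in HEDGES
               if re.search(r"\b" + re.escape(h) + r"\b", text))

print(hedge_count("Our data suggests that protein Z may drive growth."))  # 2
```

Even this crude counter hints at why the problem is hard: lexical matching says nothing about ambiguity of intent such as irony or sarcasm, which is exactly where the challenges described above arise.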
Call to action
Action is required on the part of both journal publishers and researchers if reproducibility is to become easier, faster and more cost-effective going forward. Some journals are already providing expanded “methods” sections that incorporate all the steps taken for a particular experiment, as well as supplementary information and, ideally, the actual dataset in online editions. Then, if other researchers are so inclined, they can do the statistical work themselves before going any further with their own experiments. It’s also worth noting that, while the NIH workshop participants recommended that journals have a mechanism to check the statistical accuracy of submissions, some are requiring reviewers to determine whether appropriate statistical methods were used if authors conclude that a finding is statistically significant.
Collaborations between academia and industry, especially early on in the research process, will help ensure that industry scientists can replicate findings using the same methods that were used in the original work. This is important because studies may come out of academia simply to get the work published and “out there” quickly, without concern for reproducibility. This causes a quandary for industry researchers who may want to follow what seems like a promising lead, but have to start at square one, going through a process of validating the authors, as well as the methods.
Collaborations among publishers are also important. The NIH workshop recommendations suggest establishing best practice guidelines for the reporting of various components of an experiment, including image-based data and biological material such as antibodies, cell lines and animals. We agree this is the right way to go, and that it will enhance the reproducibility of biomedical experiments in years to come.
Jaqui Hodgkinson is VP, Product Development, at Elsevier.