For the modern biologist, large-scale OMICs studies—which map all of the genes, proteins, RNA and more that underlie a biological system—are standard tools of the trade. But interpreting these big-data outputs to generate meaningful information is far from routine: Analyzing the results requires sophisticated tools and highly trained computational scientists. These efforts can be costly and time intensive even for experts—taking anywhere from days to weeks to generate actionable information.
Now, scientists from Sanford Burnham Prebys, the Genomics Institute of the Novartis Research Foundation (GNF) and the University of California, San Diego have unveiled an open-access, web-based portal that integrates more than 40 advanced bioinformatics data sources to allow non-technical users to generate insights in one click. Called Metascape (http://metascape.org), the tool removes data analysis barriers, allowing researchers to spend more time on important biological questions and less time building and troubleshooting a data analysis workflow. The platform was described today in Nature Communications.
“Biologists seek answers to some of today’s most devastating diseases—from cancer to Alzheimer’s to infectious diseases, such as HIV or influenza (flu),” says Sumit Chanda, Ph.D., senior author of the study and director of the Immunity and Pathogenesis Program at Sanford Burnham Prebys. “By developing Metascape, we hope to help biologists to better understand their own data so they can uncover information that will lead to novel disease targets, improved vaccines and new drugs to treat challenging diseases.”
Adds Yingyao Zhou, Ph.D., first author of the study and director of Data Science and Data Engineering at GNF, “Even for computational scientists, compiling and analyzing large OMICs datasets can be a difficult and time-consuming task. Metascape provides biologists with a platform from which they can access the power of numerous analysis tools all within a simple interface and generate an easy-to-interpret report.”
In the paper, the researchers detail the features and capabilities of Metascape using three previously published genetic screens that sought to identify factors involved in flu virus replication. Metascape’s workflow integrates and analyzes information from more than 40 popular, open-access databases spanning 10 common model organisms to produce an easy-to-interpret report in about a minute (larger datasets may require more time).
“Metascape has already facilitated the analysis and interpretation of large OMICs datasets in more than 330 published scientific studies. Due to its ease of use, we expect that it will soon become an indispensable platform that will help scientists decipher critical results in the era of big data,” adds Lars Pache, Ph.D., a study author and research assistant professor at Sanford Burnham Prebys.
The researchers demonstrated two modes: a basic analysis, which applies commonly used analysis practices, and an advanced analysis, which allows control of individual settings. Metascape automatically generates a PowerPoint presentation, an Excel document and additional visual reporting tools, which make the results easier to communicate. To keep Metascape’s data as current as possible, the researchers adopted a two-phase approach in which a robot automatically crawls the data sources and the results then undergo manual quality control.
Next, the scientists are turning to artificial intelligence to deepen the insights Metascape can provide. “By applying new machine learning tools to Metascape, we can help biologists uncover more nuances in their data that help scientists even better prioritize the direction they want to take their research,” says Zhou.