Combining information about plants, microbes, and the complex biomolecular interactions that take place inside these organisms into a single, integrated “knowledgebase” will greatly enhance scientists’ ability to access and share data, and use it to improve the production of biofuels and other useful products.
In the decade that has passed since the completion of the first
draft sequence of the human genome, biologists have grown increasingly aware of
a problem ironically generated by the success of their work. Biological experiments
in the age of genomics—including DNA sequencing, gene expression profiles,
studies of cell-signaling pathways, protein binding, and other information-rich
inquiries—generate quantities of raw data so immense that they threaten to
overwhelm researchers’ ability to make sense of them.
Two Cold Spring Harbor Laboratory (CSHL) investigators are among
the leaders of a multi-institutional effort announced by the U.S. Department of
Energy (DOE) to address the problem in one particular area of research
involving plant and microbial life. The team has been awarded funding to merge
many separate streams of biological information into a single, integrated
cyber-“knowledgebase” (called Kbase) focused specifically on these
two fundamentally important forms of life.
A knowledgebase is an essential tool of systems biology—an
approach to the study of life that depends on integrating multiple information
types and bringing them into meaningful relation, providing a basis to measure
and model biological activity within an organism or across groups of organisms.
A particularly exciting aspect of the project is that it will enable scientists
to discover currently unknown relationships that exist between species and
between groups of species and the surrounding environment—interrelated and
interdependent communities of microbes and plants, in this case.
“In contrast to a conventional database, a knowledgebase is
really an entire body of knowledge,” explains Doreen Ware, PhD, of the
U.S. Department of Agriculture and a CSHL Adjunct Associate Professor. “In
Kbase we will focus on a specific assortment of plants and microbes that the
Energy Department hopes to exploit to produce biofuels, to sequester carbon in
the ecosystem, and to clean up environmental pollution.” Ware has been
named principal investigator of the portion of Kbase devoted to plant life.
Quantitative biologist Michael Schatz, PhD, a CSHL Assistant
Professor, is a coinvestigator on Kbase. He describes a key dimension of
the project. “It’s not as if we have been asked to go out and grow or
collect plants and microbes,” he says. “What we’ve really been
challenged to do by the Department of Energy is to find ways of integrating
different kinds of data and different kinds of tools that can be used to analyze
those data.”
Schatz offers the analogy of Google, which enables anyone with
internet access “to tap into all of human activity, all of human
knowledge,” to the extent it has been recorded in digital form. Today, he
notes, there is no portal like Google for scientists who work with plants and
microbes. “There are many different ‘silos’ of information that have been
painstakingly collected; and there are a number of existing tools that bring
some strands of data into relation. But there is no overarching tool that can
be used across silos,” Schatz says.
“We think by creating such a collection of tools and data
sources, we’re going to be able to facilitate question-asking about huge
datasets. It is our hope that this will help us make progress on improved ways
to generate biofuels or on how to get the maximum yield out of plants even when
the climate is very hot, dry, or wet. All of this knowledge is extractable from
data that has already or is now being generated. The challenge is how, in a
sense, to liberate it, so it can be put to use.”
Thanks to the power of cloud computing, scientists across
institutions will be able to query Kbase in a highly flexible fashion, and on a
democratized basis, since Kbase will be accessible to scientists everywhere.
This will eliminate the need for research teams to gather and store
essentially similar data sets separately before they can conduct experiments.
The entire Kbase effort, spanning plants, microbes, and
metacommunities (microbes in the context of the vast communities in which they
live, both in the environment and within other living things), will be led by
Adam Arkin of Lawrence Berkeley National Laboratory. Coprincipal investigators
include Rick Stevens of Argonne National Laboratory, Robert Cottingham of Oak
Ridge National Laboratory, and Sergei Maslov of Long
Island’s Brookhaven National Laboratory, who, in concert with
CSHL’s Ware, will be deeply involved in the plant section of Kbase.