What secrets does a handful of soil hold? Greengenes, one of the world’s largest databases of microbial fingerprints, is helping scientists worldwide better understand the diversity of microbes and how they can help us develop clean energy technologies and fight disease, among many other applications. Photo: Berkeley Lab
A handful of muck or a bucket of water can teem with millions of
microorganisms—a few of which could be the next big thing when it comes to
learning how to create biofuels or understanding the planet’s carbon cycle.
This search for the movers and shakers of the microbial world is getting
easier thanks to a database of “fingerprints” maintained by Lawrence Berkeley
National Laboratory (Berkeley Lab) scientists that surpassed one million
entries earlier this year.
The database, called Greengenes, is one of the world’s largest collections
of high-quality DNA sequences of 16S ribosomal RNA genes. These genes, which
encode part of the cell’s protein-making machinery, are found in all microbes,
and in general each species has a unique
variation. They’re genetic IDs, the one thing that can finger a specific
microbe in a crowded lineup, if you know which 16S rRNA belongs to which
microbe.
That’s where Greengenes comes in. Researchers from around the world can
access the database online and enter 16S rRNA sequences extracted from samples
of soil, water, and even intestinal bacteria. A match with a sequence in
Greengenes is a giveaway that a specific microbe is in the sample. If there’s
not a match, perhaps a new species has been discovered.
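The matching step described above can be sketched in a few lines of Python. This is a toy illustration only, not the method Greengenes itself uses: it scores similarity by the overlap of short subsequences (k-mers), and the function names, the k-mer length, and the 0.9 threshold are all assumptions chosen for the example.

```python
def kmers(seq, k=4):
    """Return the set of all length-k substrings of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=4):
    """Jaccard similarity of two sequences' k-mer sets (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def best_match(query, references, threshold=0.9, k=4):
    """Find the reference most similar to the query.

    Returns (name, score) when the best score reaches the threshold,
    otherwise (None, score), which hints at a possibly unknown microbe.
    The threshold is an arbitrary value chosen for illustration.
    """
    best_name, best_score = None, 0.0
    for name, ref in references.items():
        score = similarity(query, ref, k)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Hypothetical reference set: the names and sequences are made up.
references = {
    "microbe_A": "ACGTACGTTGCAAGCTTAGC",
    "microbe_B": "TTGACCGGTATAGCCGATCG",
}
print(best_match("ACGTACGTTGCAAGCTTAGC", references))
print(best_match("GGGGGGGGGGCCCCCCCCCC", references))
```

Real 16S rRNA genes run to roughly 1,500 bases, and production tools typically rely on sequence alignment rather than simple k-mer overlap, so treat this only as a picture of the idea: a close hit suggests a known microbe, and no hit suggests something new.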
In this way, Greengenes is fast becoming a go-to resource for scientists
seeking to better understand what microbes do, their diversity, and what we can
learn from them. The database launched in 2002 and now gets about 100 citations
per year in scientific papers.
“Our goal is to develop the highest quality reference set so scientists can
use it to better understand life at the microscopic scale. We want to cover as
much microbial diversity on Earth as possible,” says Todd DeSantis, a scientist
in Berkeley Lab’s Earth Sciences Division who led the development of the
database under the auspices of Gary Andersen’s lab.
In one of its many hits, Stanford University scientists used the database to
discover a microorganism in San Francisco Bay sediments that plays a role in the
carbon and nitrogen cycles. The scientists could see the ammonia-oxidizing
archaeon under the microscope, but they couldn’t grow it in the lab. They
extracted its DNA, sequenced it, and compared it to known strains in Greengenes.
It was unique, and a new organism was named: Candidatus Nitrosoarchaeum limnia SFB1.
Italian scientists used Greengenes to detect bacteria from human skin and cleanrooms on Leonardo da Vinci’s Codex Atlanticus, which has been handled by monks and historians for centuries. To better protect art and texts, the scientists now recommend rigorous monitoring of the conditions in storage facilities and improvements to handling procedures. Image: Mario Taddei via Wikimedia Commons
A Cornell University-led team used Greengenes to identify microbes that efficiently
convert industrial wastewater into methane. Their work could help scientists
engineer microbial communities that are optimized to digest wastewater and emit
methane for use as an energy source.
Elsewhere, a team from the University of Milan used the database to analyze
bacterial DNA from stains on the pages of Leonardo da
Vinci’s multi-volume Codex Atlanticus. They found matches to bacteria
previously isolated from cleanrooms and human skin, which led the team to
recommend new ways to protect texts from deterioration.
And a Danish team used the database to improve the treatment of necrotizing
enterocolitis, a disease marked by inappropriate bacteria colonizing an infant’s
intestines.
Expect more uses for Greengenes as it continues to grow. When scientists
find a 16S rRNA gene in the course of their research, they submit its sequence
to one of many gene databanks. Greengenes scours these databanks for new
entries. When it finds one, it uses a computer program to compare the sequence
to other 16S rRNA genes and to ensure its quality. Only the best and most
complete sequences are added.
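As a rough picture of what such a quality screen might look like, here is a minimal Python sketch. It is not Greengenes’ actual pipeline: the two checks (minimum length and fraction of ambiguous bases), the function names, and the cutoff values are assumptions made for illustration.

```python
VALID_BASES = set("ACGT")


def passes_quality(seq, min_length=1200, max_ambiguous_fraction=0.01):
    """Return True if a candidate sequence looks complete and clean.

    Both cutoffs are illustrative assumptions, not published criteria.
    Any character other than A, C, G, or T counts as ambiguous.
    """
    seq = seq.upper()
    if not seq or len(seq) < min_length:
        return False
    ambiguous = sum(1 for base in seq if base not in VALID_BASES)
    return ambiguous / len(seq) <= max_ambiguous_fraction


def select_references(candidates, **cutoffs):
    """Keep only the candidates that pass the quality screen."""
    return {
        name: seq
        for name, seq in candidates.items()
        if passes_quality(seq, **cutoffs)
    }


# Hypothetical candidates pulled from a public databank.
candidates = {
    "full": "ACGT" * 300,   # long and unambiguous
    "short": "ACGT" * 10,   # too short to be a complete gene
    "noisy": "ACGN" * 300,  # long, but a quarter of the bases are unknown
}
print(sorted(select_references(candidates)))
```

A real curation step would also compare each candidate against existing 16S rRNA genes, as the article describes; that comparison is left out here to keep the sketch short.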
“There are tens of millions of 16S-like sequences in public databases, but
we only want the highest quality sequences to use as references,” says DeSantis.