The year 2011 brought two of the deadliest bacterial outbreaks the world has seen
in the last 25 years. Together, the two epidemics accounted for more than 4,200
cases of infectious disease and 80 deaths. Software developed at the Georgia
Institute of Technology (Georgia Tech) was used to help characterize the bacteria
that caused each outbreak. This characterization helps scientists better
understand the underlying microbiological features of the disease-causing
organisms, and the approach shows promise for supporting faster, more efficient
outbreak investigations in the future.
From 2008 to 2010, a team of bioinformatics graduate students led by King Jordan,
an associate professor in the School of Biology, worked in close collaboration
with the Centers for Disease Control and Prevention (CDC) to create an integrated
suite of computational tools for analyzing microbial genome sequences. At the
time, CDC scientists needed a fast, accurate system that could automate the
analysis of sequenced genomes from disease-causing bacteria, and they turned to
the Jordan laboratory at Georgia Tech to help develop one. The Georgia Tech team
created an open-source software package, the Computational Genomics Pipeline
(CG-pipeline), to meet CDC’s need. The platform is now used worldwide in public
health research and response efforts.
“Determining the order of DNA bases for an entire genome has become
relatively cheap and easy in recent years because of technological advancements,”
said Jordan. “The hard part is figuring out what the genome sequence information means. Our
software takes that next step. It analyzes the sequences, finds the genes and
provides clues as to which genes are involved in making people sick. Manually,
this process used to take weeks, months or a year. Now it takes us about 24
hours.”
The CG-pipeline software was used to analyze the summer 2011 outbreak of severe
Escherichia coli (E. coli) infections that began in Germany and eventually
sickened people in 16 European countries, Canada and the United States. It was
one of the largest E. coli outbreaks in history, with 4,075 confirmed cases and
50 deaths worldwide.
The bacterium was traced to sprouts. While working at a genome sequencing
company, Andrey Kislyuk, a graduate of the Bioinformatics PhD program who helped
Jordan create the software, used the CG-pipeline to understand why the outbreak
strain was so virulent.
“The software was used to determine that two previously distinct strains of E.
coli combined to form a single hyper-virulent strain,” said Kislyuk. “The
resulting hybrid strain seems to be more lethal than either of the parent
strains.”
Lee Katz, another Bioinformatics PhD graduate who helped design and implement
the pipeline, analyzed the bacteria behind the 2011 listeriosis outbreak in the
United States while working at the CDC. That outbreak was traced to cantaloupes
from a single farm in Colorado that were contaminated with Listeria. Over several
months, there were 146 confirmed cases of listeriosis and 30 deaths, making it
the deadliest foodborne illness outbreak in the U.S. in 25 years. Using the
CG-pipeline, Katz identified an important epidemiological genomic marker that
will help track invasive strains of Listeria.
The CG-pipeline software platform can be used to analyze any microbial
genome sequence. It has already been applied to bacteria that cause a variety
of infectious diseases, including cholera, salmonella and bacterial meningitis.
Katz continues to work closely with the Jordan laboratory to improve the
software. The collaboration is an important part of CDC’s efforts to use
software developed at Georgia Tech to mine genome sequence information in the
service of public health.