First-of-its-kind
discovery used a revolutionary data-crunching computer program running on
48 computer processors for 4 weeks to complete 32 billion searches
Analyzing
massive amounts of data officially became a national priority recently
when the White House Office of Science and Technology Policy announced
the Big Data Research and Development Initiative. A multi-disciplinary
team of University of Missouri researchers rose to the big data
challenge when they solved a major biological question by using a
groundbreaking computer algorithm to find identical DNA sequences in
different plant and animal species.
“Our
algorithm found identical sequences of DNA located at completely
different places on multiple plant genomes,” said Dmitry Korkin, lead
author and assistant professor of computer science. “No one has ever
been able to do that before on such a scale.”
“Our
discovery helps solve some of the mysteries of plant evolution,” said
Gavin Conant, co-author and assistant professor of animal sciences.
“Basic research on the plant genome provides raw materials and improves
techniques for creating medicines and crops.”
Previous
studies found long strings of identical code in the DNA of different
animal species. But before this new MU research, which was published in
the Proceedings of the National Academy of Sciences,
computer programs had never been powerful enough to find identical
sequences in plant DNA, because the identical sections are not located at
the same positions in each genome.
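To illustrate why position-independent searching matters, here is a minimal sketch of one generic way to find sequences that are identical across several genomes even when they sit at different positions: indexing every fixed-length substring (a "k-mer") by where it occurs. This is not the MU team's algorithm, which the article does not describe; the function name `shared_kmers`, the toy sequences and the species labels are all hypothetical, and a real genome-scale search would need far more memory-efficient data structures.

```python
from collections import defaultdict


def shared_kmers(genomes, k):
    """Return every length-k substring that occurs in all genomes,
    mapped to its start positions in each genome.

    Illustrative only: a generic k-mer index, not the published method.
    """
    # index[kmer][genome_name] -> list of 0-based start positions
    index = defaultdict(lambda: defaultdict(list))
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]][name].append(i)
    # Keep only k-mers seen in every genome, wherever they are located.
    return {
        kmer: dict(hits)
        for kmer, hits in index.items()
        if len(hits) == len(genomes)
    }


# Hypothetical toy "genomes": the shared element GACCGTA sits at a
# different offset in each sequence.
toy = {
    "species_a": "TTGACCGTAGGA",
    "species_b": "GACCGTAATTTT",
    "species_c": "CCCCCGACCGTA",
}
result = shared_kmers(toy, 7)
print(result)
# {'GACCGTA': {'species_a': [2], 'species_b': [0], 'species_c': [5]}}
```

Because the lookup is keyed on the sequence itself rather than on its coordinates, the match is found even though the shared element starts at offsets 2, 0 and 5 in the three toy sequences.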
The
genomes of six animals (dog, chicken, human, mouse, macaque and rat)
were compared to each other. Likewise, six plant species (Arabidopsis,
soybean, rice, cottonwood, sorghum and grape) were compared to each
other. Comparing all the genetic sequences took 4 weeks, with each of 48
computer processors doing about 1 million searches per hour, for a grand
total of approximately 32 billion searches.
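The figures quoted above are consistent with one another if the rate of 1 million searches per hour is read as a per-processor rate. That reading is an assumption, since the article does not say so explicitly, but it is the one that reproduces the stated total. A quick arithmetic check:

```python
PROCESSORS = 48
HOURS = 4 * 7 * 24  # 4 weeks of continuous running = 672 hours
# Assumption: the quoted rate applies to each processor, not to all 48 combined.
SEARCHES_PER_PROCESSOR_PER_HOUR = 1_000_000

total = PROCESSORS * HOURS * SEARCHES_PER_PROCESSOR_PER_HOUR
print(f"{total:,}")  # 32,256,000,000
```

If the rate applied to all 48 processors combined, the same four weeks would yield only about 672 million searches, well short of the 32 billion reported.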
Although
the scientists found identical sequences between plant species, just as
they did between animal species, they suggested that the sequences evolved
differently in plants than in animals.
“You
would expect to see convergent evolution, but we don’t,” Conant said.
“Plants and animals are both complex multi-cellular organisms that have
to deal with many of the same environmental conditions, like taking in
air and water and dealing with weather variations, but their genomes
code for solutions to these challenges in different ways.”
The
MU team’s research laid the groundwork for future studies into the
reasons plants and animals developed different genetic mechanisms and
how they function. Their basic research created a foundation for
discoveries that may improve human life. Besides advancing genetic
science’s potential to fight disease, the code-analyzing computer
program itself could help in the development of new medicines.
“The
same algorithm can be used to find identical sequential patterns in an
organism’s entire set of proteins,” said Korkin. “That could potentially
lead to finding new targets for existing drugs or studying these drugs’
side effects.”
The
PNAS paper, titled “Long Identical Multispecies Elements in Plant and
Animal Genomes,” involved collaboration between the Universities of
Missouri, California and Arizona. The computer algorithm was developed
by Jeff Reneker, a senior research informatician at MU’s Center for
Computational Biology and Medicine, during his doctoral study at the MU
Computer Science Department under the supervision of Chi-Ren Shyu,
Director of the MU Informatics Institute.
Source: University of Missouri-Columbia