Aiding Environmental Cleanup
Comparative analysis sheds light on bioremediation potential
Many varieties of the Shewanella bacterium have the remarkable ability to transfer electrons to various heavy metals, including environmental contaminants such as iron, uranium, and chromium. For some heavy metals, this electron transfer reduces their mobility in the soil. This makes Shewanella a potentially powerful solution for environmental cleanup at DOE legacy waste sites. However, individual species have diverse respiratory capabilities and differ considerably in characteristics such as nutrient requirements and tolerance of salinity, temperature, and barometric pressure. The differences reflect the wide variety of habitats in which these organisms live.
Examining the genome data for several Shewanella species gives researchers a means to identify the genes and proteins responsible for the unique characteristics of each species. Such comparative analyses offer a way to explore the common and unique genes and the functional protein groups encoded by each species. The information enables scientists to form better hypotheses about the species’ physiology and bioremediation potential.
Homology
Researchers used a MeDICi-generated pipeline to analyze a collection of openly available genome data sets for 10 species of the Shewanella bacterium. The MeDICi pipeline first sorted proteins into groups of known functions to determine evolutionary relationships, or homology, between distantly related protein sequences. That was the job of SHOT, a PNNL program now in development.
SHOT performed homology calculations between the proteins from all 10 Shewanella species (42,300 proteins in total) and a basis set of 4,352 proteins of known structure and function from the Structural Classification of Proteins (SCOP) database. SHOT drew associations between the Shewanella proteins and proteins in the SCOP database. This gave an efficient categorization of the Shewanella proteins based on remote homology to SCOP proteins of known structure and function.
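The categorization step can be pictured with a small sketch. The Python below is not SHOT, whose scoring method is not described here. It assumes each homology result arrives as a hypothetical (query protein, SCOP entry, score) triple in which a higher score means stronger similarity, and it applies an arbitrary cutoff. All identifiers and scores are invented for illustration.

```python
from collections import defaultdict


def categorize_by_best_hit(hits, min_score=0.0):
    """Assign each query protein to its highest-scoring SCOP entry.

    hits: iterable of (query_id, scop_id, score) tuples (assumed format).
    Returns a dict mapping scop_id to a sorted list of query_ids.
    """
    best = {}  # query_id -> (score, scop_id)
    for query_id, scop_id, score in hits:
        if score < min_score:
            continue  # drop weak associations below the cutoff
        if query_id not in best or score > best[query_id][0]:
            best[query_id] = (score, scop_id)

    groups = defaultdict(list)
    for query_id, (_, scop_id) in best.items():
        groups[scop_id].append(query_id)
    return {scop_id: sorted(members) for scop_id, members in groups.items()}


# Made-up example data, not real SHOT output.
example_hits = [
    ("speciesA_p1", "scop_1", 0.92),
    ("speciesA_p1", "scop_2", 0.40),
    ("speciesB_p7", "scop_1", 0.81),
    ("speciesC_p3", "scop_2", 0.77),
    ("speciesC_p9", "scop_2", 0.10),
]
categories = categorize_by_best_hit(example_hits, min_score=0.5)
```

Each key in the result, together with its list of members, corresponds to one group of Shewanella proteins associated with a single SCOP protein.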
The pipeline routed the SHOT output to PNNL’s Starlight visualization software to display a visual metaphor of the results. The categorization of Shewanella proteins resulted in a collection of pie-chart graphs in which the center node is a SCOP protein that may be linked to many Shewanella proteins. Researchers browsed this large dataset for patterns that indicate the ability of the Shewanella species to interact with metals. Using the filtering capabilities of Starlight, they identified several graphs of interest, which drove a second round of high-performance computing to determine the direct homology relationships of the Shewanella proteins.
Cutting time to solution
The second part of the MeDICi pipeline aimed to detect similarity between Shewanella protein sequences. PNNL’s ScalaBLAST HPC software and an open-source tool called InParanoid identified the orthologs. Orthologs are proteins directly descended from a common ancestral protein and, therefore, are likely to perform the same function.
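One widely used simplification of ortholog detection is the reciprocal best hit rule: two proteins from different species are paired when each is the other's top-scoring match. The sketch below illustrates only that rule. It is not the actual implementation of InParanoid or ScalaBLAST, and the input format, protein names, and scores are assumptions made for the example.

```python
def best_hits(scores):
    """Map each query protein to its highest-scoring target.

    scores: dict of (query_id, target_id) -> similarity score,
    where a higher score means a better match (assumed format).
    """
    best = {}  # query_id -> (target_id, score)
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Invented similarity scores between proteins of two hypothetical species.
a_vs_b = {
    ("a1", "b1"): 250.0,
    ("a1", "b2"): 90.0,
    ("a2", "b2"): 180.0,
    ("a3", "b1"): 120.0,
}
b_vs_a = {
    ("b1", "a1"): 245.0,
    ("b1", "a3"): 118.0,
    ("b2", "a2"): 175.0,
}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy data, a3 finds b1 as its best match, but b1 prefers a1, so a3 is left without a putative ortholog.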
The analysis produced a collection of ortholog graphs in which each node represented a protein, and each edge represented an orthologous relationship between two proteins. Examination of these graphs using Starlight allowed researchers to identify functions that are present or absent in a small number of individual species. They used this information to formulate hypotheses based on the known capabilities of the species to, for example, survive in extreme environments.
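The ortholog graphs described above can be sketched in a few lines. The example below assumes, for illustration only, that each connected component of the graph is treated as one ortholog group and that a lookup table maps each protein to its species. Neither assumption is confirmed as the method the researchers used, and the protein and species names are invented.

```python
from collections import defaultdict


def ortholog_groups(edges):
    """Return the connected components of an undirected ortholog graph.

    edges: iterable of (protein_id, protein_id) pairs, one per
    orthologous relationship.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    groups = []
    for start in sorted(adjacency):  # sorted for a deterministic order
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(component)
    return groups


def missing_species(group, species_of, all_species):
    """List the species that have no protein in the given group."""
    present = {species_of[protein] for protein in group}
    return sorted(set(all_species) - present)


# Invented proteins from three hypothetical species A, B and C.
species_of = {"a1": "A", "b1": "B", "c1": "C", "a2": "A", "b2": "B"}
edges = [("a1", "b1"), ("b1", "c1"), ("a2", "b2")]
all_species = ["A", "B", "C"]

groups = ortholog_groups(edges)
absent = [missing_species(g, species_of, all_species) for g in groups]
```

Here the second group has no member from species C, which is the kind of presence or absence pattern a researcher might then examine more closely.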
The MeDICi integration framework enables scientists to interactively use the SHOT output visualization to create ScalaBLAST tasks that run on a multiprocessor architecture. This is similar to computational steering, in which a user interacts with a high-performance calculation while it is running to influence the result. In this case, however, the second part of the calculation can be repeated and refined as many times as necessary for the user to evaluate a biological hypothesis.
Ian Gorton is the chief architect for PNNL’s Data Intensive Computing program. Christopher S. Oehmen and Jason E. McDermott are senior research scientists in the laboratory’s Bioinformatics and Computational Biology group, Fundamental and Computational Sciences Directorate. They may be reached at editor@ScientificComputing.com.