Fifty years after the pioneering discovery that a protein’s three-dimensional
structure is determined solely by the sequence of its amino acids, an
international team of researchers has taken a major step toward
fulfilling a tantalizing promise: predicting the structure of a protein
from its sequence alone. This advance could open a series of doors for
previously intractable research into important biological processes and
development of novel therapeutic drugs.
The team from Harvard Medical School (HMS), Politecnico di Torino / Human
Genetics Foundation Torino (HuGeF) and Memorial Sloan-Kettering Cancer
Center in New York (MSKCC) reported their results on Dec. 7 in the
journal PLoS ONE.
In molecular biology and biomedical engineering, knowing the shape of
protein molecules is key to understanding how they perform the work of
life, how diseases arise and how drugs can be designed. Normally, scientists
determine the shape of protein molecules by expensive and complicated
experiments, but for most proteins these experiments have not yet been
done, leaving many crucial biological questions unanswered.
In principle, this problem could be solved by computing a protein’s shape
from its sequence alone, which is relatively easy to determine from its
DNA. Despite limited success for some smaller proteins, however, this
challenge has remained essentially unsolved. The
difficulty lies in the astronomically large number of possible shapes
for each protein; without any shortcuts, it would take a supercomputer
many years to explore all of these options and find the right one for
even a small protein.
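The scale of that search can be illustrated with a rough back-of-the-envelope calculation. The sketch below is not from the researchers; the number of conformations per amino acid, the chain length and the computer's speed are all assumed values chosen only for illustration.

```python
# Back-of-the-envelope estimate; every number here is an illustrative assumption.
STATES_PER_RESIDUE = 3        # assumed coarse conformations per amino acid
CHAIN_LENGTH = 100            # assumed length of a small protein
SHAPES_PER_SECOND = 10**15    # assumed rate for a hypothetical supercomputer
SECONDS_PER_YEAR = 365 * 24 * 3600


def exhaustive_search_years(states, length, rate):
    """Years needed to enumerate states**length shapes at `rate` shapes per second."""
    total_shapes = states ** length
    return total_shapes / rate / SECONDS_PER_YEAR


years = exhaustive_search_years(STATES_PER_RESIDUE, CHAIN_LENGTH, SHAPES_PER_SECOND)
print(f"{years:.1e} years")
```

Under these assumptions the exhaustive search would take on the order of 10^25 years, which is why any practical method needs a shortcut.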
“Experimental structure determination has a hard time keeping up with the explosion
in genetic sequence information,” said Debora Marks, a mathematical
biologist in the Department of Systems Biology at HMS, who worked
closely with Lucy Colwell, a mathematician who recently moved from
Harvard to Cambridge University. The two researchers collaborated with
physicists Riccardo Zecchina and Andrea Pagnani in Torino in a team
effort initiated by Marks and computational biologist Chris Sander of
the Computational Biology Program at MSKCC, who had earlier attempted a
similar solution to the problem when substantially fewer sequences were
available.
The international team tested a bold premise: that evolution can provide a
roadmap to how a protein folds. Their approach combined three key
elements: evolutionary information accumulated over many millions of
years; data from high-throughput genetic sequencing; and a key method
from statistical physics, co-developed in the Torino group with Martin
Weigt, who recently moved to the University of Paris.
“Collaboration was key,” Sander said. “As with many important discoveries in science,
no one could provide the answer in isolation.”
Using the accumulated evolutionary information, in the form of the sequences
of thousands of proteins grouped into families of proteins likely to
have similar shapes, the team developed an algorithm to infer which
parts of a protein interact to determine its shape. With these internal
protein interactions in hand, the researchers used widely used
molecular simulation software developed by Axel Brunger at Stanford
University to generate the atomic details of the protein shape.
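The idea of reading interactions out of a protein family can be sketched with a toy example. The code below is not the team's algorithm: it swaps in plain mutual information between alignment columns, a simpler co-variation score, and the four-letter sequences in `toy_family` are made up for illustration. Positions whose amino acids change together across the family score highest, hinting that they may be in contact.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of aligned sequences."""
    n = len(alignment)
    count_i = Counter(seq[i] for seq in alignment)
    count_j = Counter(seq[j] for seq in alignment)
    count_ij = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


def rank_column_pairs(alignment):
    """Score every pair of columns and return them sorted, strongest first."""
    length = len(alignment[0])
    scores = {
        (i, j): column_mutual_information(alignment, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy family: positions 1 and 3 always change together (K with E, R with D).
toy_family = ["AKLE", "AKLE", "ARLD", "ARLD", "AKME", "ARMD"]
ranked = rank_column_pairs(toy_family)
top_pair, top_score = ranked[0]
print(top_pair, round(top_score, 3))
```

A pairwise score like this cannot tell direct contacts from correlations that are passed along through a third position, which is one reason a more global statistical model, such as the statistical-physics method the article mentions, is attractive.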
Using this process, the team was able, for the first time, to compute
remarkably accurate shapes from sequence information alone for a test
set of 15 diverse proteins, with no protein size limit in sight.
“Alone, none of the individual pieces are completely novel, but apparently
nobody had put all of them together to predict 3D protein structure,”
Colwell said.
The researchers caution that their method has some weaknesses.
Experimental structures, when available, generally are more accurate in
atomic detail, and the method works only when researchers have genetic
data for large protein families. Advances in DNA sequencing, however, have
yielded a torrent of such data that is forecast to continue growing
exponentially in the foreseeable future.
The next step, the researchers say, is to predict the structures of
unsolved proteins currently being investigated experimentally, before
exploring the large uncharted territory of currently unknown protein
structures. “Synergy between computational prediction and experimental
determination of structures is likely to yield increasingly valuable
insight into the large universe of protein shapes that crucially
determine their function and evolutionary dynamics,” Sander said.
Full article: Protein 3D Structure Computed from Evolutionary Sequence Variation