706 | Nature | Vol 577 | 30 January 2020
Article
Improved protein structure prediction using
potentials from deep learning
Andrew W. Senior1,4*, Richard Evans1,4, John Jumper1,4, James Kirkpatrick1,4, Laurent Sifre1,4,
Tim Green1, Chongli Qin1, Augustin Žídek1, Alexander W. R. Nelson1, Alex Bridgland1,
Hugo Penedones1, Stig Petersen1, Karen Simonyan1, Steve Crossan1, Pushmeet Kohli1,
David T. Jones2,3, David Silver1, Koray Kavukcuoglu1 & Demis Hassabis1
Protein structure prediction can be used to determine the three-dimensional shape of
a protein from its amino acid sequence1. This problem is of fundamental importance
as the structure of a protein largely determines its function2; however, protein
structures can be difficult to determine experimentally. Considerable progress has
recently been made by leveraging genetic information. It is possible to infer which
amino acid residues are in contact by analysing covariation in homologous
sequences, which aids in the prediction of protein structures3. Here we show that we
can train a neural network to make accurate predictions of the distances between
pairs of residues, which convey more information about the structure than contact
predictions. Using this information, we construct a potential of mean force4 that can
accurately describe the shape of a protein. We find that the resulting potential can be
optimized by a simple gradient descent algorithm to generate structures without
complex sampling procedures. The resulting system, named AlphaFold, achieves high
accuracy, even for sequences with fewer homologous sequences. In the recent Critical
Assessment of Protein Structure Prediction5 (CASP13), a blind assessment of the state
of the field, AlphaFold created high-accuracy structures (with template modelling
(TM) scores6 of 0.7 or higher) for 24 out of 43 free modelling domains, whereas the
next best method, which used sampling and contact information, achieved such
accuracy for only 14 out of 43 domains. AlphaFold represents a considerable advance
in protein-structure prediction. We expect this increased accuracy to enable insights
into the function and malfunction of proteins, especially in cases for which no
structures for homologous proteins have been experimentally determined7.
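As a rough illustration of the idea in the abstract (predicted inter-residue distances turned into a potential that plain gradient descent can minimize), the following Python sketch uses a toy harmonic penalty on pairwise distances. The harmonic form, the Cartesian point representation, the step size, and the names energy_and_grad, descend and pairwise_distances are assumptions made for this example; they are not the potential or parameterization used by AlphaFold.

```python
import math
import random


def pairwise_distances(coords):
    """Return the full matrix of Euclidean distances between 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def energy_and_grad(coords, target):
    """Toy potential (an assumption, not the paper's):
    0.5 * sum over pairs i < j of (d_ij - target_ij)^2, plus its gradient."""
    n = len(coords)
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            delta = [coords[i][k] - coords[j][k] for k in range(3)]
            # Guard against division by zero for coincident points.
            dist = math.sqrt(sum(c * c for c in delta)) or 1e-9
            err = dist - target[i][j]
            energy += 0.5 * err * err
            for k in range(3):
                g = err * delta[k] / dist
                grad[i][k] += g  # derivative with respect to coords[i]
                grad[j][k] -= g  # equal and opposite for coords[j]
    return energy, grad


def descend(target, steps=500, lr=0.02, seed=0):
    """Plain fixed-step gradient descent from a random starting structure."""
    rng = random.Random(seed)
    coords = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in target]
    for _ in range(steps):
        _, grad = energy_and_grad(coords, target)
        for point, g in zip(coords, grad):
            for k in range(3):
                point[k] -= lr * g[k]
    return coords
```

For example, computing target = pairwise_distances(points) for any known set of 3D points and then calling descend(target) should give coordinates with a lower toy energy than the random start. Because distances are unchanged by rotation, translation and reflection, the result may differ from the original points by any of these.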
Proteins are at the core of most biological processes. As the function of
a protein is dependent on its structure, understanding protein struc-
tures has been a grand challenge in biology for decades. Although
several experimental structure determination techniques have been
developed and improved in accuracy, they remain difficult and time-consuming2.
As a result, decades of theoretical work has attempted to
predict protein structures from amino acid sequences.
CASP (ref. 5) is a biennial blind protein structure prediction assessment
run by the structure prediction community to benchmark progress in
accuracy. In 2018, AlphaFold joined 97 groups from around the world in
entering CASP13 (ref. 8). Each group submitted up to 5 structure predictions
for each of 84 protein sequences for which experimentally determined
structures were sequestered. Assessors divided the proteins into 104
domains for scoring and classified each as being amenable to template-
based modelling (TBM, in which a protein with a similar sequence has
a known structure, and that homologous structure is modified in
accordance with the sequence differences) or requiring free model-
ling (FM, in cases in which no homologous structure is available), with
an intermediate (FM/TBM) category. Figure 1a shows that AlphaFold
predicts more FM domains with high accuracy than any other system,
particularly in the 0.6–0.7 TM-score range. The TM score, which ranges
between 0 and 1, measures the degree of match of the overall (backbone)
shape of a proposed structure to a native structure. The assessors
ranked the 98 participating groups by the summed, capped z-scores of
the structures, separated according to category. AlphaFold achieved
a summed z-score of 52.8 in the FM category (best-of-five) compared
with 36.6 for the next closest group (322). Combining FM and TBM/FM
categories, AlphaFold scored 68.3 compared with 48.2. AlphaFold is
able to predict previously unknown folds to high accuracy (Fig. 1b).
Despite using only FM techniques and not using templates, AlphaFold
also scored well in the TBM category according to the assessors’ formula
(the 0-capped z-score), ranking fourth for the top-one model or first
for the best-of-five models. Much of the accuracy of AlphaFold is due
to the accuracy of the distance predictions, which is evident from the
high precision of the corresponding contact predictions (Fig. 1c and
Extended Data Fig. 2a).
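The two scoring quantities mentioned above can be sketched in Python as follows. The TM-score expression is the standard one (a length-dependent scale d0 and a sum over aligned residue pairs), but this sketch evaluates a single fixed superposition rather than searching for the best one, and the 0.5 floor on d0 for short chains is an assumption. The z-score function is only a simplified reading of "summed, capped z-scores": the assessors' exact procedure is not given in this text, and the names tm_d0, tm_score and summed_capped_z are hypothetical.

```python
import math
import statistics


def tm_d0(target_length):
    """Length-dependent distance scale of the TM-score, in angstroms.
    The 0.5 floor for short chains is an assumption of this sketch."""
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(deviations, target_length):
    """TM-score for one fixed superposition.

    deviations: distances (angstroms) between each aligned residue pair
    after the model has already been superposed on the native structure.
    The full metric takes the maximum of this value over superpositions.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in deviations) / target_length


def summed_capped_z(scores_by_domain, group, floor=0.0):
    """Sum one group's per-domain z-scores, capping each from below at floor.

    scores_by_domain: list of dicts mapping group name -> score, one per domain.
    Simplified: z-scores use the population standard deviation over all
    groups in a domain, with no outlier handling.
    """
    total = 0.0
    for scores in scores_by_domain:
        values = list(scores.values())
        spread = statistics.pstdev(values)
        if group not in scores or spread == 0:
            continue  # no contribution from this domain
        z = (scores[group] - statistics.mean(values)) / spread
        total += max(z, floor)
    return total
```

A model that matches the native structure exactly gives tm_score([0.0] * n, n) equal to 1, and a model in which every aligned residue deviates by exactly d0 scores 0.5. With floor=0.0, a below-average prediction on one domain contributes nothing rather than subtracting from the group's total.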
https://doi.org/10.1038/s41586-019-1923-7
Received: 2 April 2019
Accepted: 10 December 2019
Published online: 15 January 2020
1DeepMind, London, UK. 2The Francis Crick Institute, London, UK. 3University College London, London, UK. 4These authors contributed equally: Andrew W. Senior, Richard Evans, John Jumper,
James Kirkpatrick, Laurent Sifre. *e-mail: andrewsenior@google.com