A Bayesian View on Cryo-EM Structure Determination

Sjors H. W. Scheres

MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, UK

Received 4 August 2011; received in revised form 27 October 2011; accepted 3 November 2011

Available online 12 November 2011

Edited by W. Baumeister

Keywords: cryo-electron microscopy; three-dimensional reconstruction; maximum a posteriori estimation

Three-dimensional (3D) structure determination by single-particle analysis

of cryo-electron microscopy (cryo-EM) images requires many parameters to

be determined from extremely noisy data. This makes the method prone to

overfitting, that is, when structures describe noise rather than signal, in

particular near their resolution limit where noise levels are highest. Cryo-

EM structures are typically filtered using ad hoc procedures to prevent

overfitting, but the tuning of arbitrary parameters may lead to subjectivity

in the results. I describe a Bayesian interpretation of cryo-EM structure

determination, where smoothness in the reconstructed density is imposed

through a Gaussian prior in the Fourier domain. The statistical framework

dictates how data and prior knowledge should be combined, so that the

optimal 3D linear filter is obtained without the need for arbitrariness and

objective resolution estimates may be obtained. Application to experimental

data indicates that the statistical approach yields more reliable structures

than existing methods and is capable of detecting smaller classes in data sets

that contain multiple different structures.

© 2011 Elsevier Ltd. All rights reserved.

Introduction

With recent reports on near-atomic-resolution

(i.e., 3–4 Å) structures for several icosahedral viruses

and resolutions in the range of 4–6 Å for complexes

with less or no symmetry, cryo-electron microscopy

(cryo-EM) single-particle analysis has entered the

exciting stage where it may be used for de novo

generation of atomic models.¹ However, the observation

that reported resolutions vary significantly

for maps with otherwise similar features² is an

indication that existing reconstruction methods

suffer from different degrees of overfitting. Over-

fitting occurs when the reconstruction describes

noise instead of the underlying signal in the data,

and often, these noisy features are enhanced during

iterative refinement procedures. Thereby, overfit-

ting is not merely an issue of comparing the

resolution of one reconstruction with another but

represents a major obstacle in the objective analysis

of cryo-EM maps. In particular, without a useful

cross-validation tool, such as the free R-factor in X-ray

crystallography,³ overfitting may remain undetected

and a map may be interpreted at a resolution

where the features are mainly due to noise.

At the heart of the problem lies the indirectness of

the experimental observations. A reasonably good

model is available for the image formation process.

Given a three-dimensional (3D) structure, this so-

called forward model describes the appearance of

the experimental images. However, the problem of

single-particle reconstruction is the inverse one and

is much more difficult to solve. The structure

determination task is further complicated by the

lack of information about the relative orientations of

all particles and, in the case of structural variability

in the sample, also their assignment to a structurally

unique class. These data are lost during the

experiment, where molecules in distinct conforma-

tions coexist in solution and adopt random orienta-

tions in the ice. In mathematics, this type of problem

where part of the data is missing is called incomplete.

Moreover, because the electron exposure of the

sample needs to be strictly limited to prevent

radiation damage, experimental cryo-EM images are

extremely noisy. The high levels of noise together

with the incompleteness of the data mean that

cryo-EM structures are not fully determined by the

experimental data and are therefore prone to

overfitting. In mathematical terms, the cryo-EM

structure determination problem is ill-posed.

E-mail address: scheres@mrc-lmb.cam.ac.uk.

Abbreviations used: 3D, three-dimensional; cryo-EM, cryo-electron microscopy; SNR, signal-to-noise ratio; ML, maximum likelihood; MAP, maximum a posteriori; 2D, two-dimensional; FSC, Fourier shell correlation; EF-G, elongation factor G; CTF, contrast transfer function.

doi:10.1016/j.jmb.2011.11.010

J. Mol. Biol. (2012) 415, 406–418

Contents lists available at www.sciencedirect.com

Journal of Molecular Biology

journal homepage: http://ees.elsevier.com.jmb

0022-2836/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.

Ill-posed problems can be tackled by regulariza-

tion, where the experimental data are complemen-

ted with external or prior information so that the

two sources of information together fully deter-

mine a unique solution. A particularly powerful

source of prior information about cryo-EM re-

constructions is smoothness. Because macromole-

cules consist of atoms that are connected through

chemical bonds, the scattering potential will vary

smoothly in space, especially at less than atomic

resolution. The concept of imposing smoothness to

prevent overfitting is widely used in the field

through a variety of ad hoc filtering procedures. By

limiting the power of the reconstruction at those

frequencies where the signal-to-noise ratio (SNR) is

low, these filters impose smoothness on the

reconstructed density in real space. Traditionally,

filtering procedures have relied on heuristics, that

is, to some extent, existing implementations are all

based on arbitrary decisions. Although potentially

highly effective (and this is illustrated by the high-

resolution structures mentioned above), the heuris-

tics in these methods often involve the tuning of

free parameters, such as low-pass filter shape and

effective resolution (e.g., see Ref. 4). Thereby, the

user (or, in some cases, the programmer) becomes

responsible for the delicate balance between getting

the most out of the data and limiting overfitting,

which ultimately may lead to subjectivity in the

structure determination process.

The recent attention to statistical image processing

methods⁵ could be explained by a general interest in

reducing the amount of heuristics in cryo-EM

reconstruction procedures. Rather than combining

separate steps of particle alignment, class averaging,

filtering, and 3D reconstruction, each of which may

involve arbitrary decisions, the statistical approach

seeks to maximize a single probability function.

Most of the statistical methods presented thus far

have optimized a likelihood function, that is, one

aims to find the model under which the observed

data are most probable. This has important theoretical

advantages, as the maximum likelihood (ML)

estimate is asymptotically unbiased and efficient.

That is, in the limit of very large data sets, the ML

estimate is as good as or better than any other

estimate of the true model (see Ref. 6 for a recent

review on ML methods in cryo-EM). In practice,

however, data sets are not very large, and also in the

statistical approach, the experimental data may

need to be supplemented with prior information in

order to define a unique solution. In Bayesian

statistics, regularization is interpreted as imposing

prior distributions on model parameters, and the

ML optimization target may be augmented with

such prior distributions. Optimization of the result-

ing posterior distribution is called regularized likeli-

hood optimization, or maximum a posteriori (MAP)

estimation (see Ref. 7).

In this paper, I will show that MAP estimation

provides a self-contained statistical framework in

which the regularized single-particle reconstruction

problem can be solved with only a minimal amount

of heuristics. As a prior, I will use a Gaussian

distribution on the Fourier components of the signal.

Neither the use of this prior nor that of the Bayesian

treatment of cryo-EM data is a new idea. Standard

textbooks on statistical inference use the same prior

in a Bayesian interpretation of the commonly used

Wiener filter (e.g., see Ref. 7, pp. 549–551), and an

early mention of MAP estimation with a Gaussian

prior in the context of 3D EM image restoration was

given by Carazo.⁸ Nevertheless, even though these

ideas have been around for many years, the

Bayesian approach has thus far not found wide-

spread use in 3D EM structure determination (see

Ref. 9 for a recent application). This limited use

contrasts with other methods in structural biology.

Recently, Bayesian inference was shown to be highly

effective in NMR structure determination,¹⁰ while

the Bayesian approach was introduced to the field of

X-ray crystallography many years ago¹¹ and MAP

estimation is now routinely used in crystallographic

refinement.¹²

In what follows, I will first describe some of the

underlying theory of existing cryo-EM structure

determination procedures to provide a context for

the statistical approach. Then, I will derive an

iterative MAP estimation algorithm that employs a

Gaussian prior on the model in Fourier space.

Because statistical assumptions about the signal

and the noise are made explicit in the target

function, straightforward calculus in the optimiza-

tion of this target leads to valuable new insights into

the optimal linear (or Wiener) filter in the context of

3D reconstruction and the definition of the 3D SNR

in the Fourier transform of the reconstruction.

Moreover, because the MAP algorithm requires

only a minimum amount of heuristics, arbitrary

decisions by the user or the programmer may be

largely avoided, and objectivity may be preserved. I

will demonstrate the effectiveness of the statistical

approach by application to three cryo-EM data sets

and compare the results with those obtained using

conventional methods. Apart from overall improve-

ments in the reconstructed maps and the ability to

detect smaller classes in structurally heterogeneous

data sets, the statistical approach reduces overfitting

and provides reconstructions with more reliable

resolution estimates.


Theory

Conventional methods

Many different procedures have been implemen-

ted to determine 3D structures from cryo-EM

projection data. The following does not seek to

describe all of them but, rather, aims to provide an

accessible introduction to the Bayesian approach

described below. For an extensive review of existing

cryo-EM methods, the reader is referred to the book

by Frank⁴ or to the more recent volumes 481–483 of

the book series Methods in Enzymology.¹³

Almost all existing implementations for cryo-EM

structure determination employ the so-called weak-

phase object approximation, which leads to a linear

image formation model in Fourier space:

$$X_{ij} = \mathrm{CTF}_{ij} \sum_{l=1}^{L} P^{\phi}_{jl} V_l + N_{ij} \qquad (1)$$

where:

• $X_{ij}$ is the $j$th component, with $j = 1,\ldots,J$, of the two-dimensional (2D) Fourier transform of the $i$th experimental image $X_i$, with $i = 1,\ldots,N$.

• $\mathrm{CTF}_{ij}$ is the $j$th component of the contrast transfer function for the $i$th image. Some implementations, such as EMAN,¹⁴ include an envelope function on the contrast transfer function (CTF) that describes the fall-off of signal with resolution. Other implementations, such as FREALIGN,¹⁵ ignore envelope functions at this stage and correct for signal fall-off through B-factor sharpening of the map after refinement.¹⁶ The latter intrinsically assumes identical CTF envelopes for all images.

• $V_l$ is the $l$th component, with $l = 1,\ldots,L$, of the 3D Fourier transform $V$ of the underlying structure in the data set. Estimating $V$ is the objective of the structure determination process. For the sake of simplicity, only the structurally homogeneous case is described here. Nevertheless, Eq. (1) may be expanded to describe structural heterogeneity, that is, data sets that contain more than one underlying 3D structure, by adding a subscript: $V_k$, with $k = 1,\ldots,K$. Often, $K$ is assumed to be known,¹⁷ so that each experimental image can be described as a projection of one of $K$ different structures, each of which needs to be estimated from the data.

• $P^{\phi}$ is a $J \times L$ matrix of elements $P^{\phi}_{jl}$. The operation $\sum_{l=1}^{L} P^{\phi}_{jl} V_l$ for all $j$ extracts a slice out of the 3D Fourier transform of the underlying structure, and $\phi$ defines the orientation of the 2D Fourier transform with respect to the 3D structure, comprising a 3D rotation and a phase shift accounting for a 2D origin offset in the experimental image. Similarly, the operation $\sum_{j=1}^{J} P^{\phi T}_{lj} X_{ij}$ for all $l$ places the 2D Fourier transform of an experimental image back into the 3D transform. According to the projection-slice theorem, these operations are equivalent to the real-space projection and "back-projection" operations. Some implementations calculate (back)-projections in real space, such as XMIPP;¹⁸ other implementations, such as FREALIGN,¹⁵ perform these calculations in Fourier space.

• $N_{ij}$ is noise in the complex plane. Although explicit assumptions about the statistical characteristics of the noise are not often reported, commonly employed Wiener filters and cross-correlation goodness-of-fit measures rely on the assumption that the noise is independent and Gaussian distributed.
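The image formation model of Eq. (1) can be illustrated with a minimal NumPy sketch. It is restricted to a single, axis-aligned orientation, for which the slice operation needs no interpolation. The function names (`central_slice`, `toy_ctf`, `simulate_image`) and the CTF expression are hypothetical choices made for illustration only; they are not taken from any of the packages mentioned above.

```python
import numpy as np

def central_slice(volume):
    """Slice of the 3D DFT at k_z = 0, i.e. the slice for one fixed orientation."""
    return np.fft.fftn(volume)[:, :, 0]

def toy_ctf(n, defocus=40.0):
    """Hypothetical CTF, -sin(defocus * |k|^2); real microscope CTFs have more terms."""
    k = np.fft.fftfreq(n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    return -np.sin(defocus * (kx**2 + ky**2))

def simulate_image(volume, sigma, rng, defocus=40.0):
    """Eq. (1) for one image: CTF times slice plus complex Gaussian noise.

    Noise is drawn independently per component; the Hermitian symmetry that
    the transform of a real image would have is ignored in this sketch.
    """
    n = volume.shape[0]
    signal = toy_ctf(n, defocus) * central_slice(volume)
    noise = sigma * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    return signal + noise

rng = np.random.default_rng(0)
volume = rng.random((8, 8, 8))
# Projection-slice theorem: the k_z = 0 slice equals the 2D DFT of the z-projection.
slice_matches_projection = np.allclose(
    central_slice(volume), np.fft.fft2(volume.sum(axis=2))
)
X = simulate_image(volume, sigma=1.0, rng=rng)
```

For arbitrary orientations, the slice has to be interpolated from the 3D transform, which is the complication that the matrix $P^{\phi}$ absorbs in Eq. (1).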

After selection of the individual particles from the

digitized micrographs, the experimental observa-

tions comprise N images Xi. From the micrographs,

one may also calculate the CTFs, which are then kept

constant in most procedures. The estimation of V

from all $X_i$ and $\mathrm{CTF}_i$ is then typically accomplished

by an iterative procedure (called refinement) that

requires an initial, often low-resolution, 3D reference

structure $V^{(0)}$. As this paper is primarily

concerned with refinement, the reader is referred

to the books mentioned above for more information

about how these starting models may be obtained.

At every iteration (n) of the refinement process,

projections of $V^{(n)}$ are calculated for many different

orientations $\phi$ and compared with each of the

experimental images. Based on some goodness-of-fit

measure, an optimal orientation $\phi_i^*$ is assigned to

each image. All images are then combined into a 3D

reconstruction that yields the updated model $V^{(n+1)}$.

Many different reconstruction algorithms are avail-

able, but their description falls outside the scope of

this paper (again, the reader is referred to the books

mentioned above). In what follows, I will focus on a

class of algorithms that has been termed direct

Fourier inversion and will mostly ignore complica-

tions due to interpolations and nonuniform sam-

pling of Fourier space. The update formula for V

may then be given by (for all l):

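$$V_l^{(n+1)} = \frac{\sum_{i=1}^{N} \sum_{j=1}^{J} P^{\phi_i^* T}_{lj} \, \mathrm{CTF}_{ij} X_{ij}}{\sum_{i=1}^{N} \sum_{j=1}^{J} P^{\phi_i^* T}_{lj} \, \mathrm{CTF}_{ij}^2} \qquad (2)$$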


and this procedure is typically repeated until

changes in $V$ and/or $\phi_i^*$ become small. It is

important to realize that this refinement is a local

optimization procedure that is prone to becoming

stuck in local minima (and the same is true for the

statistical approach outlined below). Consequently,

the initial reference structure $V^{(0)}$ may have an

important effect on the outcome of the refinement,

as wrong initial models could lead to incorrect

solutions. Still, if one ignores local minima and if the

goodness-of-fit measure used in the assignment of

all $\phi_i^*$ is a least-squares or cross-correlation criterion,

then one could argue that this procedure provides a

least-squares estimate of the true 3D structure.
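The least-squares reading of Eq. (2) can be made explicit with a short derivation. As a simplifying assumption made here for illustration, each row of $P^{\phi_i^*}$ is taken to contain a single entry equal to 1, so that every 2D component maps onto exactly one 3D component and interpolation is ignored.

```latex
% Assumption (illustration only): P^{\phi_i^*}_{jl} \in \{0, 1\}, one unit entry per row j.
\begin{align}
E(V) &= \sum_{i=1}^{N} \sum_{j=1}^{J}
        \Bigl| X_{ij} - \mathrm{CTF}_{ij} \sum_{l=1}^{L} P^{\phi_i^*}_{jl} V_l \Bigr|^2 \\
\frac{\partial E}{\partial \overline{V_l}} &=
        -\sum_{i=1}^{N} \sum_{j=1}^{J} P^{\phi_i^*}_{jl} \, \mathrm{CTF}_{ij}
        \bigl( X_{ij} - \mathrm{CTF}_{ij} V_l \bigr) = 0 \\
\Rightarrow \quad V_l &=
        \frac{\sum_{i=1}^{N} \sum_{j=1}^{J} P^{\phi_i^* T}_{lj} \, \mathrm{CTF}_{ij} X_{ij}}
             {\sum_{i=1}^{N} \sum_{j=1}^{J} P^{\phi_i^* T}_{lj} \, \mathrm{CTF}_{ij}^2}
\end{align}
```

The last line has the form of Eq. (2). It also shows why the estimate is unstable wherever the summed squared CTF values in the denominator are close to zero: nothing in the squared-error criterion limits the amplification of noise at those components.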

However, as explained in the Introduction, the observed

data alone are not sufficient to uniquely

determine the correct solution. Consequently, without

the inclusion of additional, prior information, $V$

may become very noisy, especially at frequencies

where many CTFs have zero or small values and at

high frequencies where SNRs are lowest. Many

existing implementations reduce the noise levels in

V by means of a so-called Wiener filter. This image

restoration method is based on minimization of the

mean-square error between the estimate and the

true signal and effectively regularizes the ill-posed

problem by introducing prior knowledge about the

correlation structure of the signal and the noise.⁸

Most often, Wiener filter expressions are given for

the case of 2D averaging, as relatively little work has

been published on the Wiener filter for 3D

reconstruction.¹⁹ If one assumes that both the signal

and the noise are independent and Gaussian

distributed with power spectra $\tau^2(\upsilon)$ for the signal

and power spectra $\sigma_i^2(\upsilon)$ for the noise, with $\upsilon$ being

the frequency, then (variants of) the following

expression for the Wiener filter for 2D averaging

are often reported:²⁰

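$$A_j = \frac{\sum_{i=1}^{N} \frac{\tau^2(\upsilon)}{\sigma_i^2(\upsilon)} \, \mathrm{CTF}_{ij} X_{ij}}{\sum_{i=1}^{N} \frac{\tau^2(\upsilon)}{\sigma_i^2(\upsilon)} \, \mathrm{CTF}_{ij}^2 + 1} \qquad (3)$$

where $A_j$ is the $j$th component of the 2D Fourier

transform of average image $A$.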

The addition of one in the denominator of Eq. (3) reduces noise by reducing the power in the average for those Fourier components where $\sum_{i=1}^{N} \frac{\tau^2(\upsilon)}{\sigma_i^2(\upsilon)} \mathrm{CTF}_{ij}^2$ is small. One could discern two effects of the Wiener filter, the first of which is recognized much more often than the second. (i) The Wiener filter corrects for the CTF, that is, $A$ will represent the original signal, unaffected by the CTF. (ii) The Wiener filter also acts as a low-pass filter. If one ignores the CTF in the Wiener filter expression by setting all $\mathrm{CTF}_{ij}$ in Eq. (3) equal to 1, then a filter remains that solely depends on the resolution-dependent SNR $\left( \tau^2(\upsilon) / \sigma_i^2(\upsilon) \right)$. Since the SNR in macromolecular cryo-EM images typically drops quickly with resolution (e.g., see Fig. 3a), this will effectively be a low-pass filter.
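The low-pass behavior of Eq. (3) can be checked numerically. The sketch below is a minimal illustration that assumes the power spectra are already known and stored per Fourier component; the function name `wiener_average` and the toy values of $\tau^2$ and $\sigma_i^2$ are hypothetical.

```python
import numpy as np

def wiener_average(X, ctf, tau2, sigma2):
    """Eq. (3) for N images with J Fourier components each.

    X, ctf, sigma2: arrays of shape (N, J); tau2: array of shape (J,).
    """
    snr = tau2[None, :] / sigma2          # tau^2 / sigma_i^2 per image and component
    numerator = np.sum(snr * ctf * X, axis=0)
    denominator = np.sum(snr * ctf**2, axis=0) + 1.0
    return numerator / denominator

# Usage: N noiseless copies of a flat signal, with all CTFs set to 1.
N, J = 10, 4
signal = np.ones(J, dtype=complex)
tau2 = np.array([1.0, 0.1, 0.01, 0.001])   # assumed signal power, falling with frequency
sigma2 = np.ones((N, J))                    # assumed white noise power
A = wiener_average(np.tile(signal, (N, 1)), np.ones((N, J)), tau2, sigma2)
gain = A.real / signal.real                 # fraction of the signal that survives the filter
```

With all CTFs equal to 1, the gain reduces to $N\,\mathrm{SNR} / (N\,\mathrm{SNR} + 1)$ per component, so components with low SNR are damped even though the input contains no noise at all.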

In the case of 3D reconstruction, consensus about the Wiener filter has not yet been reached, and existing implementations have worked around this problem by employing a variety of ad hoc procedures.¹⁹ Two common approximations are to apply Wiener filtering to 2D (class) averages and/or to assume that $\tau^2(\upsilon)/\sigma_i^2(\upsilon)$ is a constant, the so-called Wiener constant. Examples of these two approximations may be found in EMAN¹⁴ and FREALIGN,¹⁵ respectively. If one assumes that the SNR is a constant 1/C, then 3D reconstruction with Wiener filtering has been expressed as (e.g., see Ref. 15):

$$V_l^{(n+1)} = \frac{\sum_{i=1}^{N} \sum_{j=1}^{J} P^{\phi_i^* T}_{lj} \, \mathrm{CTF}_{ij} X_{ij}}{\sum_{i=1}^{N} \sum_{j=1}^{J} P^{\phi_i^* T}_{lj} \, \mathrm{CTF}_{ij}^2 + C} \qquad (4)$$

In many software packages, the heuristics in the

Wiener filter implementation have resulted in

additional free parameters, such as the Wiener

constant (C). Moreover, as existing implementa-

tions typically fail to adequately reproduce the

low-pass filtering effect of the true Wiener filter, it

is common practice to apply ad hoc low-pass filters

to V during the iterative refinement. This typically

involves the tuning of even more parameters, such

as effective resolution and filter shape. Suboptimal

use of these arbitrary parameters may lead to the

accumulation of noise in the reconstructed density

and overfitting of the data. Consequently, a certain

level of expertise is typically required to obtain the

optimal estimate of V, which may ultimately lead

to subjectivity in the cryo-EM structure determina-

tion process.

A Bayesian view

The statistical approach explicitly optimizes a

single target function. Imagining an ensemble of

possible solutions, the reconstruction problem is

formulated as finding the model with parameter set

Θ that has the highest probability of being the

correct one in the light of both the observed data X

and the prior information Y. According to Bayes'

law, this so-called posterior distribution factorizes

into two components:

$$P(\Theta \mid X, Y) \propto P(X \mid \Theta, Y) \, P(\Theta \mid Y) \qquad (5)$$

where the likelihood P(X|Θ,Y) quantifies the proba-

bility of observing the data given the model, and the

prior P(Θ|Y) expresses how likely that model is

given the prior information. The model Θ̂ that

optimizes P(Θ|X,Y) is called the MAP estimate.

[Note that previously discussed ML methods

optimize P(X|Θ,Y).]
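Written in logarithmic form, Eq. (5) makes the relation between the two estimates explicit. The subscripts MAP and ML below are labels introduced here for illustration.

```latex
\begin{align}
\log P(\Theta \mid X, Y) &= \log P(X \mid \Theta, Y) + \log P(\Theta \mid Y) + \text{const} \\
\hat{\Theta}_{\mathrm{MAP}} &= \arg\max_{\Theta}
    \bigl[ \log P(X \mid \Theta, Y) + \log P(\Theta \mid Y) \bigr] \\
\hat{\Theta}_{\mathrm{ML}} &= \arg\max_{\Theta} \log P(X \mid \Theta, Y)
\end{align}
```

The two estimates coincide when the prior is flat, that is, when $\log P(\Theta \mid Y)$ does not depend on $\Theta$. Any non-flat prior adds a penalty term to the likelihood target, which is how regularization enters.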


The statistical approach employs the same image

formation model as described in Eq. (1) but

explicitly assumes that all noise components $N_{ij}$

are independent and Gaussian distributed. The

variance $\sigma_{ij}^2$ of these noise components is unknown

and will be estimated from the data. Variation of $\sigma_{ij}^2$

with resolution allows the description of nonwhite

or colored noise. The assumption of independence

in the noise allows the probability of observing an

image given its orientation and the model to be

calculated as a multiplication of Gaussians over all

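its Fourier components,²¹ so that:

$$P(X_i \mid \phi, \Theta, Y) = \prod_{j=1}^{J} \frac{1}{2\pi\sigma_{ij}^2} \exp\left( \frac{\bigl| X_{ij} - \mathrm{CTF}_{ij} \sum_{l=1}^{L} P^{\phi}_{jl} V_l \bigr|^2}{-2\sigma_{ij}^2} \right) \qquad (6)$$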

The correct orientations ϕ for all images are not

known. They are treated as hidden variables and are

integrated out. The corresponding marginal likeli-

hood function of observing the entire data set X is

then given by:

$$P(X \mid \Theta, Y) = \prod_{i=1}^{N} \int_{\phi} P(X_i \mid \phi, \Theta, Y) \, P(\phi \mid \Theta, Y) \, d\phi \qquad (7)$$

where P(ϕ|Θ,Y) expresses prior information about

the distribution of the orientations. These distribu-

tions may include Gaussian distributions on the

origin offsets (e.g., see Ref. 6) but their exact

expression and the corresponding parameters will

be ignored in what follows.
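Eqs. (6) and (7) can be sketched in code for a single image. The sketch assumes that the integral over orientations is replaced by a sum over a small set of K candidate orientations and that the reference slices for those orientations are already computed; the function names and toy values are hypothetical.

```python
import numpy as np

def log_likelihood(X, ctf, refs, sigma2):
    """Log of Eq. (6) for each of K candidate orientations.

    X, ctf, sigma2: shape (J,); refs: shape (K, J), one reference slice
    (the sum over l of P_jl V_l) per orientation.
    """
    resid2 = np.abs(X[None, :] - ctf[None, :] * refs) ** 2
    return np.sum(-np.log(2 * np.pi * sigma2) - resid2 / (2 * sigma2), axis=1)

def log_marginal(loglik, log_prior):
    """Log of one factor of Eq. (7), with the integral replaced by a sum over K orientations."""
    a = loglik + log_prior
    m = a.max()                       # subtract the maximum to avoid underflow in exp
    return m + np.log(np.sum(np.exp(a - m)))

J = 4
refs = np.array([[1, 1, 1, 1], [1, -1, 1, -1]], dtype=complex)  # two toy reference slices
ctf = np.ones(J)
sigma2 = np.full(J, 0.5)
X = ctf * refs[0]                     # noiseless image of orientation 0
ll = log_likelihood(X, ctf, refs, sigma2)
lm = log_marginal(ll, np.log(np.full(2, 0.5)))   # uniform prior over the two orientations
```

Working with logarithms is a practical choice rather than part of the model: the product over many Fourier components in Eq. (6) would otherwise underflow for images of realistic size.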

Calculation of the prior relies on the assumption of

smoothness in the reconstruction. Smoothness is

encoded in the assumption that all Fourier components

$V_l$ are independent and Gaussian distributed

with zero mean and unknown variance $\tau_l^2$, so that:

$$P(\Theta \mid Y) = \prod_{l=1}^{L} \frac{1}{2\pi\tau_l^2} \exp\left( \frac{|V_l|^2}{-2\tau_l^2} \right) \qquad (8)$$

The assumption of zero-mean Fourier compo-

nents of the underlying 3D structures may seem

surprising at first. However, given that Fourier

components may point in any (positive or negative)

direction in the complex plane, their expected value

in the absence of experimental data will indeed be

zero. The regularizing behavior of this prior

actually arises through its scale parameter $\tau_l^2$. By imposing

small values of $\tau_l^2$ on high-frequency components

of $V$, one effectively limits the power of the

signal at those frequencies, which acts like a low-

pass filter in removing high-frequency noise, and

thus imposes smoothness. Note that the explicit

assumptions of independent, zero-mean Gaussian

distributions for both the signal and the noise in the

statistical approach are the same ones that underlie

the derivation of the Wiener filter described above.
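How the prior acts as a filter can be seen in a simplified case, treated here under assumptions made only for illustration: the orientations are known, $\sigma_i^2$ and $\tau^2$ are fixed, and each image $i$ contributes exactly one observation $X_i = \mathrm{CTF}_i V + N_i$ of a given component $V$.

```latex
% Assumptions (illustration only): known orientations, fixed variances,
% one observation of the component V per image.
\begin{align}
\log P(V \mid X, Y) &= -\sum_{i=1}^{N} \frac{|X_i - \mathrm{CTF}_i V|^2}{2\sigma_i^2}
                       - \frac{|V|^2}{2\tau^2} + \text{const} \\
0 &= \sum_{i=1}^{N} \frac{\mathrm{CTF}_i \,(X_i - \mathrm{CTF}_i V)}{2\sigma_i^2}
     - \frac{V}{2\tau^2} \\
\Rightarrow \quad \hat{V} &=
     \frac{\sum_{i=1}^{N} \mathrm{CTF}_i X_i / \sigma_i^2}
          {\sum_{i=1}^{N} \mathrm{CTF}_i^2 / \sigma_i^2 + 1/\tau^2}
   = \frac{\sum_{i=1}^{N} \frac{\tau^2}{\sigma_i^2} \mathrm{CTF}_i X_i}
          {\sum_{i=1}^{N} \frac{\tau^2}{\sigma_i^2} \mathrm{CTF}_i^2 + 1}
\end{align}
```

The second form of the last line has the same structure as Eq. (3). A small $\tau^2$ makes the term $1/\tau^2$ dominate the denominator and pulls the estimate towards zero, whereas a very large $\tau^2$ recovers the unregularized least-squares estimate.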

Eqs. (6)–(8) together define the posterior distribution

as given in Eq. (5). For a given set of images $X_i$

and their CTFs, one aims to find the best values for

all $V_l$, $\tau_l^2$, and $\sigma_{ij}^2$. Optimization by expectation

maximization²² yields the following algorithm (also

see Fig. 1):

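$$V_l^{(n+1)} = \frac{\sum_{i=1}^{N} \int_{\phi} \Gamma_{i\phi}^{(n)} \sum_{j=1}^{J} P^{\phi T}_{lj} \frac{\mathrm{CTF}_{ij} X_{ij}}{\sigma_{ij}^{2(n)}} \, d\phi}{\sum_{i=1}^{N} \int_{\phi} \Gamma_{i\phi}^{(n)} \sum_{j=1}^{J} P^{\phi T}_{lj} \frac{\mathrm{CTF}_{ij}^2}{\sigma_{ij}^{2(n)}} \, d\phi + \frac{1}{\tau_l^{2(n)}}} \qquad (9)$$

$$\sigma_{ij}^{2(n+1)} = \frac{1}{2} \int_{\phi} \Gamma_{i\phi}^{(n)} \Bigl| X_{ij} - \mathrm{CTF}_{ij} \sum_{l=1}^{L} P^{\phi}_{jl} V_l^{(n)} \Bigr|^2 d\phi \qquad (10)$$

$$\tau_l^{2(n+1)} = \frac{1}{2} \bigl| V_l^{(n+1)} \bigr|^2 \qquad (11)$$

where $\Gamma_{i\phi}^{(n)}$ is the posterior probability of $\phi$ for the $i$th image, given the model at iteration number $(n)$, which is calculated according to Eq. (12).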

Just like in related ML methods,⁶ rather than

assigning an optimal orientation $\phi_i^*$ to each image,

probability-weighted integrals over all possible

orientations are calculated. Apart from that, Eq. (9)

bears obvious resemblance to previously reported

expressions of the Wiener filter for 3D reconstruc-

tion [see Eq. (4)]. This may not come as a surprise,

since both derivations were based on the same

image formation model and the same statistical

assumptions about the signal and the noise. How-

ever, Eq. (9) was derived by straightforward

optimization of the posterior distribution and does

not involve any arbitrary decisions. As is typical for

parameter estimation inside the expectation–maxi-

mization algorithm, both the power of the noise and

the power of the signal are learned from the data in

an iterative manner through Eqs. (10) and (11),

respectively. The result is that Eq. (9) will yield an

estimate of V that is both CTF corrected and low-

pass filtered, and in which uneven distributions of

the orientations of the experimental images are

taken into account. As such, to my knowledge, this

expression provides the first implementation of the

intended meaning of the Wiener filter in the case of

3D reconstruction.
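One pass through Eqs. (9)–(12) can be sketched for a toy problem. The sketch rests on several assumptions that are not part of the text above: signals are one-dimensional, the only "orientation" parameter is a cyclic shift chosen from K candidates, the prior over shifts is uniform, and the integrals become sums. In Fourier space such a shift is a unit-modulus phase factor, so the complex conjugate of the phase plays the role of $P^{\phi T}$ and the denominator of Eq. (9) loses its dependence on $\phi$. The function name `em_iteration` is hypothetical.

```python
import numpy as np

def em_iteration(X, ctf, V, sigma2, tau2, shifts):
    """One pass of Eqs. (9)-(12) for a 1D toy problem with cyclic shifts.

    X, ctf, sigma2: shape (N, J); V, tau2: shape (J,); shifts: shape (K,).
    """
    J = X.shape[1]
    # A shift by s samples multiplies Fourier component k by exp(-2*pi*i*k*s/J).
    phase = np.exp(-2j * np.pi * np.outer(shifts, np.fft.fftfreq(J)))   # (K, J)
    refs = phase * V[None, :]                                            # (K, J)

    # Eq. (12): posterior weight of every shift for every image (uniform prior).
    resid2 = np.abs(X[:, None, :] - ctf[:, None, :] * refs[None]) ** 2   # (N, K, J)
    loglik = -np.sum(resid2 / (2 * sigma2[:, None, :]), axis=2)          # (N, K)
    loglik -= loglik.max(axis=1, keepdims=True)     # stabilize before exponentiating
    gamma = np.exp(loglik)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Eq. (9): probability-weighted back-placement with the 1/tau^2 regularizer.
    back = np.conj(phase)[None] * (ctf * X / sigma2)[:, None, :]         # (N, K, J)
    numerator = np.sum(gamma[:, :, None] * back, axis=(0, 1))
    denominator = np.sum(ctf**2 / sigma2, axis=0) + 1.0 / tau2
    V_new = numerator / denominator

    # Eq. (10): noise power from the weighted residuals against the old model.
    sigma2_new = 0.5 * np.sum(gamma[:, :, None] * resid2, axis=1)
    # Eq. (11): signal power from the updated model.
    tau2_new = 0.5 * np.abs(V_new) ** 2
    return V_new, sigma2_new, tau2_new, gamma
```

In this sketch the weights `gamma` spread each image over all candidate shifts instead of committing to the single best one, which mirrors the probability-weighted integrals described above. A practical implementation would additionally need interpolation for 3D rotations and some care when a noise estimate approaches zero.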

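The posterior probability $\Gamma_{i\phi}^{(n)}$ of orientation $\phi$ for the $i$th image, given the model at iteration number $(n)$, is calculated as:

$$\Gamma_{i\phi}^{(n)} = \frac{P(X_i \mid \phi, \Theta^{(n)}, Y) \, P(\phi \mid \Theta^{(n)}, Y)}{\int_{\phi'} P(X_i \mid \phi', \Theta^{(n)}, Y) \, P(\phi' \mid \Theta^{(n)}, Y) \, d\phi'} \qquad (12)$$

The relative contribution of the two additive terms

in the denominator of Eq. (9) also gives an objective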
