A Microhomology-Mediated Break-Induced Replication
Model for the Origin of Human Copy Number Variation
P. J. Hastings1*, Grzegorz Ira1, James R. Lupski1,2,3
1Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America, 2Department of Pediatrics, Baylor College of
Medicine, Houston, Texas, United States of America, 3Texas Children’s Hospital, Houston, Texas, United States of America
Abstract: Chromosome structural changes with nonre-
current endpoints associated with genomic disorders
offer windows into the mechanism of origin of copy
number variation (CNV). A recent report of nonrecurrent
duplications associated with Pelizaeus-Merzbacher dis-
ease identified three distinctive characteristics. First, the
majority of events can be seen to be complex, showing
discontinuous duplications mixed with deletions, inverted
duplications, and triplications. Second, junctions at end-
points show microhomology of 2–5 base pairs (bp). Third,
endpoints occur near pre-existing low copy repeats
(LCRs). Using these observations and evidence from
DNA repair in other organisms, we derive a model of
microhomology-mediated break-induced replication
(MMBIR) for the origin of CNV and, ultimately, of LCRs.
We propose that breakage of replication forks in stressed
cells that are deficient in homologous recombination
induces an aberrant repair process with features of break-
induced replication (BIR). Under these circumstances,
single-strand 3′ tails from broken replication forks will
anneal with microhomology on any single-stranded DNA
nearby, priming low-processivity polymerization with
multiple template switches generating complex rear-
rangements, and eventual re-establishment of processive
In the past few years, we have learnt that a major component of
the differences between individuals is variation in the number of
copies of segments of the genome, and of genes included in these
segments (copy number variation or CNV) (for definition of
abbreviations, see Table 1). A considerable portion of the genome
is involved in CNV [1–11], with estimates of up to 12%. CNV
can arise meiotically and also somatically, as shown by the
finding that identical twins can differ in CNV. CNV has been
a significant component of primate evolution [13–16]. Here we
draw on evidence on the mechanism of DNA transactions in
Escherichia coli, yeast, Drosophila, mammals, and human cancer to
derive a model for the origin of CNV based on the mechanism of
BIR occurring at sites of microhomology (microhomology-
mediated BIR or MMBIR).
Although we can see that considerable variation in copy
number is tolerated or is advantageous to its carrier, some genes
are dosage-sensitive, and duplication or deletion involving these
genes gives rise to human clinical phenotypes collectively referred
to as genomic disorders. This has allowed the ascertainment
of structural changes and thus the study of the origin of CNV. For
recurrent rearrangements, much CNV stems from homologous
recombination between segments that already occur as two or
more copies. When this happens, sequences that lie between the
repeats that recombine will be either duplicated or deleted, thus
changing the copy number. This process is referred to as nonallelic
homologous recombination, or NAHR. The repeated
sequences that recombine might occasionally be highly repetitive
sequences that occur widely in the human genome but are
usually sequences that occur only twice or a few times (i.e., low-
copy repeats, LCRs, or segmental duplications, SDs). The LCRs
tend to occur in clusters in highly complex regions of the genome.
These repeated segments might be short (about 10 kilobases (kb)),
or up to several hundreds of kb in length, and they occur in either
orientation. Some examples of genomic complex regions are
shown in Figure 1.
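The effect of NAHR on copy number described above can be illustrated with a toy string model. The following is a minimal sketch, not a method from this paper: the function name `nahr_products`, the repeat sequence, and the lowercase flanks are all hypothetical, and an unequal crossover is simplified to an exchange between the first two direct copies of one repeat.

```python
def nahr_products(chrom, repeat):
    """Toy unequal crossover between the first two direct copies of `repeat`.

    Returns the two reciprocal products: one with the sequence between
    the repeats deleted, one with that sequence duplicated.
    """
    first = chrom.find(repeat)
    second = chrom.find(repeat, first + len(repeat)) if first >= 0 else -1
    if first < 0 or second < 0:
        raise ValueError("need two copies of the repeat")
    left = chrom[:first]                          # flank before copy 1
    between = chrom[first + len(repeat):second]   # unique sequence between copies
    right = chrom[second + len(repeat):]          # flank after copy 2
    # Copy 1 of one chromatid recombines with copy 2 of the other.
    deletion = left + repeat + right
    duplication = left + repeat + between + repeat + between + repeat + right
    return deletion, duplication


# Hypothetical example: a 7-base "LCR" flanking a unique segment "ccc".
chrom = "aaa" + "GATTACA" + "ccc" + "GATTACA" + "ttt"
deletion, duplication = nahr_products(chrom, "GATTACA")
print(deletion)
print(duplication)
```

In this sketch the endpoints always fall inside the repeat, which is why rearrangements produced by NAHR recur at the same positions.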
The endpoints of CNVs that arose by NAHR occur in a few
positions where there is sufficient homology for homologous
recombination. Although many genomic disorders arise by NAHR,
some rearrangements have endpoints in many different
positions. These CNVs arose de novo by rearrangements at sites
that lack extensive homology. Recent evidence on the distribution
of nonpathological CNVs in two individuals suggests that most
differences in copy number from the reference sequence arose by
nonrecurrent events. Thus nonrecurrent chromosomal changes
arise quite frequently. Because the nonrecurrent events
presumably reflect the origin of most genome complexity, the
study of them is important to the understanding of genomic
disorders, genetic variability due to CNV, and human evolution.
Pelizaeus-Merzbacher disease (PMD; Online Mendelian Inher-
itance in Man (OMIM) accession code 312080; http://www.ncbi.
nlm.nih.gov/omim/) is a recessive X-linked genomic disorder
affecting the central nervous system that arises by nonrecurrent
chromosomal changes. The changes involve duplication, triplica-
tion, or deletion of the PLP1 gene. The clinical phenotype allows
identification of individuals showing nonrecurrent chromosomal
changes in the PLP region. In a study of the structural variation in
the genomes of patients with PMD, Lee et al. describe some
aspects of the fine structure of newly arising CNVs with
nonrecurrent endpoints and report three striking properties of
Citation: Hastings PJ, Ira G, Lupski JR (2009) A Microhomology-Mediated Break-
Induced Replication Model for the Origin of Human Copy Number Variation. PLoS
Genet 5(1): e1000327. doi:10.1371/journal.pgen.1000327
Published January 30, 2009
Copyright: © 2009 Hastings et al. This is an open-access article distributed
under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the
original author and source are credited.
Funding: This work was supported by grants from the National Institutes of
Health; R01 GM64022 to PJH and R01 GM80600 to GI.
* E-mail: email@example.com
Editor: Ivan Matic, Université Paris Descartes, INSERM U571, France
PLoS Genetics | www.plosgenetics.org1 January 2009 | Volume 5 | Issue 1 | e1000327
their structure that help us to understand the origin of CNVs.
First, the authors report that the novel junctions form at sites of
microhomology, i.e., lengths of homology 2 to 5 nucleotides long
that are too short to support homologous recombination. Such
junctions have been reported previously in cases of nonrecurrent
endpoints of deletions and duplications [19,23,24]. Second, they
observed that the new structures are complex, showing duplication
and deletion interspersed with nonduplicated or with triplicated
lengths, and showing duplicated segments in either orientation.
These characteristics were reported previously [25–31]. Third,
although these events did not arise by NAHR, the novel junctions
tend to occur in close proximity to LCRs [32–34]. Figures 2 and 3
illustrate examples of these complex non-recurrent events.
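The notion of microhomology at a junction can be made concrete by comparing the two reference sequences around the breakpoints: the bases that are identical in both sequences on either side of the join make the exact breakpoint position ambiguous. The following is a minimal sketch under that assumption; the function name `microhomology` and the toy sequences are hypothetical and do not reproduce the analysis of Lee et al.

```python
def microhomology(a, i, b, j):
    """Identity shared by `a` and `b` around the junction a[:i] + b[j:].

    Extends leftward and rightward from the breakpoints while the two
    reference sequences agree, and returns (length, shared_sequence).
    """
    left = 0
    while left < min(i, j) and a[i - 1 - left] == b[j - 1 - left]:
        left += 1
    right = 0
    while (i + right < len(a) and j + right < len(b)
           and a[i + right] == b[j + right]):
        right += 1
    return left + right, a[i - left:i + right]


# Hypothetical proximal and distal reference sequences sharing "GCAG".
proximal = "ACGTTTGCAGGATCC"
distal = "TTAACCGCAGTTAGG"
length, shared = microhomology(proximal, 8, distal, 8)
print(length, shared)
```

A shared stretch of 2 to 5 bases, as in this toy case, falls in the range reported for the PLP1 junctions and is far shorter than the homology needed for homologous recombination.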
Nonrecurrent rearrangements had previously been attributed to
a mechanism of nonhomologous end joining (NHEJ)
[19,20,24,33]. However, the characteristics of microhomology
junctions and structural complexity in these new structures, as
revealed by nucleotide sequencing and high-resolution array
comparative genomic hybridization, led Lee et al. to propose
that the rearrangements arose through a replication-based
mechanism termed FoSTeS (fork stalling and template switching),
a mechanism proposed previously for amplification in E. coli.
Replication-based models have also been proposed to explain the
origin of gross chromosomal rearrangements seen in a low
proportion of patients with cystic fibrosis and hemophilia A.
Analysis of deletions of the genes involved reveals complex
structures similar to those described for PLP1 [28,29,31].
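How a series of template switches can yield the complex structures described above (duplicated stretches interspersed with triplicated ones, in either orientation) can be sketched with a toy model. This is an illustration under stated assumptions, not the FoSTeS mechanism itself: the reference is a plain string, each switch is a hypothetical `(start, end, strand)` step, and the names `copy_templates` and `copy_number` are invented for this example.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def copy_templates(reference, path):
    """Concatenate the stretches copied along `path`.

    Each step is (start, end, strand); strand "-" means the stretch was
    copied from the opposite strand, giving an inverted segment.
    """
    pieces = []
    for start, end, strand in path:
        piece = reference[start:end]
        pieces.append(piece if strand == "+" else reverse_complement(piece))
    return "".join(pieces)


def copy_number(reference_length, path):
    """Count how many times each reference position was copied."""
    counts = Counter()
    for start, end, _ in path:
        counts.update(range(start, end))
    return [counts[pos] for pos in range(reference_length)]


# Hypothetical 16-base reference and a path with two template switches.
reference = "AAAACCCCGGGGTTTT"
path = [(0, 12, "+"), (4, 8, "-"), (4, 16, "+")]
print(copy_templates(reference, path))
print(copy_number(len(reference), path))
```

With this path, positions 4 to 7 are copied three times (once inverted) and positions 8 to 11 twice, while the flanks stay at one copy, which mirrors the mixture of duplication and triplication seen by array comparative genomic hybridization.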
Genome Rearrangements in Cancer
The amount of structural variation in cancer cells is sometimes
so extreme that it is not possible to determine which changes
occurred within the same event. However, it can be seen that
duplications are often discontinuous, and junction regions include
insertions of nearby, unlinked, and unknown sequences, and
deletions and inversions, showing that rearrangement events
in cancer cells are complex. Many studies report microhomology
at junctions of a large proportion of the structural variation (e.g.,
[37–39]). Studies of translocation endpoints in leukemia and other
cancers find that many junctions have microhomology and are
associated with insertions and deletions of various lengths [40–42].
These observations are compatible with at least some of the
genomic instability seen in tumor formation and progression
having stemmed from the same underlying mechanism as the
formation of nonrecurrent duplications in genomic disorders.
Involvement of Replication in Chromosomal
In the Lac assay system in E. coli, amplification of the lac
operon to 20–100 copies occurs in response to the stress of
starvation [44,45]. The novel junctions of the amplified segments
(amplicons) show that endpoints occurred at sites of microhomol-
ogy of 2–15 bp [35,46]. Some of the amplicons are complex,
containing both direct and inverted repeats. Many others cannot
Table 1. Abbreviations Used in the Text.
BIR Break-induced replication, a recombination-based mechanism for restarting broken replication forks.
CNV Copy number variation, variation within a population of the number of copies of a gene or length of genome.
DSB Double-strand break, a break in both strands of a DNA molecule.
FoSTeS Fork stalling and template switching, a replicative mechanism for changing chromosome structure.
LCR Low copy repeat, a length of genome that occurs twice or a few times.
MMBIR Microhomology-mediated break-induced replication, a replication-based mechanism of recombination between sequences with very little base identity, proposed here.
NAHR Nonallelic homologous recombination, homologous recombination occurring between low copy repeats.
NHEJ Nonhomologous end joining, a mechanism for repair of DNA double-strand breaks that does not require homology.
SD Segmental duplication, a repetition of a length of genome.
Figure 1. In silico analyses revealed complex genomic architecture in regions of nonrecurrent rearrangement. (A) The ~3 Mb
surrounding the PLP1 gene and (B) the ~4 Mb surrounding the MECP2 gene on the X chromosome contain numerous LCRs in various orientations
[33,106]. LCRs are represented by the colored block arrows, and like LCR copies are designated by color and letter for a given sequence. Orientation is
depicted by the direction of the block arrow.
49. Ikeda H, Shimizu H, Ukita T, Kumagai M (1995) A novel assay for illegitimate
recombination in Escherichia coli: stimulation of lambda bio transducing phage
formation by ultra-violet light and its independence from RecA function. Adv
Biophys 31: 197–208.
50. Albertini AM, Hofer M, Calos MP, Miller JH (1982) On the formation of
spontaneous deletions: the importance of short sequence homologies in the
generation of large deletions. Cell 29: 319–328.
51. Farabaugh PJ, Schmeissner U, Hofer M, Miller JH (1978) Genetic studies of
the lac repressor. VII. On the molecular nature of spontaneous hotspots in the
lacI gene of Escherichia coli. J Mol Biol 126: 847–857.
52. Shimizu H, Yamaguchi H, Ashizawa Y, Kohno Y, Asami M, et al. (1997)
Short-homology-independent illegitimate recombination in Escherichia coli:
distinct mechanism from short-homology-dependent illegitimate recombina-
tion. J Mol Biol 266: 297–305.
53. Bzymek M, Lovett ST (2001) Instability of repetitive DNA sequences: the role
of replication in multiple mechanisms. Proc Natl Acad Sci U S A 98:
54. Ponder RG, Fonville NC, Rosenberg SM (2005) A switch from high-fidelity to
error-prone DNA double-strand break repair underlies stress-induced muta-
tion. Mol Cell 19: 791–804.
55. Payen C, Koszul R, Dujon B, Fischer G (2008) Segmental duplications arise
from Pol32-dependent repair of broken forks through two alternative
replication-based mechanisms. PLoS Genet 4: e1000175. doi:10.1371/
56. Branzei D, Foiani M (2007) Template Switching: From Replication Fork
Repair to Genome Rearrangements. Cell 131: 1228–1230.
57. Merrihew RV, Marburger K, Pennington SL, Roth DB, Wilson JH (1996)
High-frequency illegitimate integration of transfected DNA at preintegrated
target sites in a mammalian genome. Mol Cell Biol 16: 10–18.
58. Morrow DM, Connelly C, Hieter P (1997) “Break-copy” duplication: a model
for chromosome fragment formation in Saccharomyces cerevisiae. Genetics 147:
59. McEachern MJ, Haber JE (2006) Break-Induced Replication and Recombi-
national Telomere Elongation in Yeast. Annu Rev Biochem 75: 111–135.
60. Smith CE, Llorente B, Symington LS (2007) Template switching during break-
induced replication. Nature 447: 102–105.
61. Lydeard JR, Jain S, Yamaguchi M, Haber JE (2007) Break-induced replication
and telomerase-independent telomere maintenance require Pol32. Nature 448:
62. Motamedi M, Szigety SK, Rosenberg SM (1999) Double-strand-break repair in
Escherichia coli: physical evidence for a DNA replication mechanism in vivo.
Genes Dev 13: 2889–2903.
63. Llorente B, Smith CE, Symington LS (2008) Break-induced replication: what is
it and what is it for? Cell Cycle 7: 859–864.
64. Hieter P, Mann C, Snyder M, Davis RW (1985) Mitotic stability of yeast
chromosomes: A colony color assay that measures nondisjunction and
chromosome loss. Cell 40: 381–392.
65. Deem A, Barker K, Vanhulle K, Downing B, Vayl A, et al. (2008) Defective
break-induced replication leads to half-crossovers in Saccharomyces cerevisiae.
Genetics 179: 1845–1860.
66. Schmidt KH, Wu J, Kolodner RD (2006) Control of translocations between
highly diverged genes by Sgs1, the Saccharomyces cerevisiae homolog of the
Bloom’s syndrome protein. Mol Cell Biol 26: 5406–5420.
67. Bauters M, Van Esch H, Friez MJ, Boespflug-Tanguy O, Zenker M, et al.
(2008) Nonrecurrent MECP2 duplications mediated by genomic architecture-
driven DNA breaks and break-induced replication repair. Genome Res 18:
68. Lovett ST, Hurley RL, Sutera VA Jr, Aubuchon RH, Lebedeva MA (2002)
Crossing over between regions of limited homology in Escherichia coli. RecA-
dependent and RecA-independent pathways. Genetics 160: 851–859.
69. Liskay RM, Letsou A, Stachelek JL (1987) Homology requirement for efficient
gene conversion between duplicated chromosomal sequences in mammalian
cells. Genetics 115: 161–167.
70. Reiter LT, Hastings PJ, Nelis E, De Jonghe P, Van Broeckhoven C, et al.
(1998) Human meiotic recombination products revealed by sequencing a
hotspot for homologous strand exchange in multiple HNPP deletion patients.
Am J Hum Genet 62: 1023–1033.
71. VanHulle K, Lemoine FJ, Narayanan V, Downing B, Hull K, et al. (2007)
Inverted DNA repeats channel repair of distant double-strand breaks into
chromatid fusions and chromosomal rearrangements. Mol Cell Biol 27:
72. Davis AP, Symington LS (2004) RAD51-dependent break-induced replication
in yeast. Mol Cell Biol 24: 2344–2351.
73. Le S, Moore JK, Haber JE, Greider CW (1999) RAD50 and RAD51 define
two pathways that collaborate to maintain telomeres in the absence of
telomerase. Genetics 152: 143–152.
74. Teng SC, Zakian VA (1999) Telomere-telomere recombination is an efficient
bypass pathway for telomere maintenance in Saccharomyces cerevisiae. Mol Cell
Biol 19: 8083–8093.
75. Bentley J, Diggle CP, Harnden P, Knowles MA, Kiltie AE (2004) DNA double
strand break repair in human bladder cancer is error prone and involves
microhomology-associated end-joining. Nucleic Acids Res 32: 5249–5259.
76. Corneo B, Wendland RL, Deriano L, Cui X, Klein IA, et al. (2007) Rag
mutations reveal robust alternative end joining. Nature 449: 483–486.
77. Lisby M, Barlow JH, Burgess RC, Rothstein R (2004) Choreography of the
DNA damage response: spatiotemporal relationships among checkpoint and
repair proteins. Cell 118: 699–713.
78. Pennington JM, Rosenberg SM (2007) Spontaneous DNA breakage in single
living cells of Escherichia coli. Nat Genet 39: 797–802.
79. Saleh-Gohari N, Bryant HE, Schultz N, Parker KM, Cassel TN, et al. (2005)
Spontaneous homologous recombination is induced by collapsed replication
forks that are caused by endogenous DNA single-strand breaks. Mol Cell Biol
80. McIlwraith MJ, Vaisman A, Liu Y, Fanning E, Woodgate R, et al. (2005)
Human DNA polymerase eta promotes DNA synthesis from strand invasion
intermediates of homologous recombination. Mol Cell 20: 783–792.
81. Kawamoto T, Araki K, Sonoda E, Yamashita YM, Harada K, et al. (2005)
Dual roles for DNA polymerase eta in homologous DNA recombination and
translesion DNA synthesis. Mol Cell 20: 793–799.
82. Cannistraro VJ, Taylor JS (2007) Ability of polymerase eta and T7 DNA
polymerase to bypass bulge structures. J Biol Chem 282: 11188–11196.
83. Roth DB, Chang XB, Wilson JH (1989) Comparison of filler DNA at immune,
nonimmune, and oncogenic rearrangements suggests multiple mechanisms of
formation. Mol Cell Biol 9: 3049–3057.
84. Young SD, Marshall RS, Hill RP (1988) Hypoxia induces DNA overreplication
and enhances metastatic potential of murine tumor cells. Proc Natl Acad
Sci U S A 85: 9533–9537.
85. Coquelle A, Toledo F, Stern S, Bieth A, Debatisse M (1998) A new role for
hypoxia in tumor progression: induction of fragile site triggering genomic
rearrangements and formation of complex DMs and HSRs. Mol Cell 2:
86. Subarsky P, Hill RP (2003) The hypoxic tumour microenvironment and
metastatic progression. Clin Exp Metastasis 20: 237–250.
87. Bindra RS, Schaffer PJ, Meng A, Woo J, Måseide K, et al. (2004) Down-
regulation of Rad51 and decreased homologous recombination in hypoxic
cancer cells. Mol Cell Biol 24: 8504–8518.
88. Bindra RS, Glazer PM (2007) Repression of RAD51 gene expression by E2F4/
p130 complexes in hypoxia. Oncogene 26: 2048–2057.
89. Huang LE, Bindra RS, Glazer PM, Harris AL (2007) Hypoxia-induced genetic
instability–a calculated mechanism underlying tumor progression. J Mol Med
90. Bindra RS, Crosby ME, Glazer PM (2007) Regulation of DNA repair in
hypoxic cancer cells. Cancer Metastasis Rev 26: 249–260.
91. McVey M, Adams M, Staeva-Vieira E, Sekelsky JJ (2004) Evidence for multiple
cycles of strand invasion during repair of double-strand gaps in Drosophila.
Genetics 167: 699–705.
92. Bindra RS, Glazer PM (2007) Co-repression of mismatch repair gene
expression by hypoxia in cancer cells: role of the Myc/Max network. Cancer
Lett 252: 93–103.
93. Mihaylova VT, Bindra RS, Yuan J, Campisi D, Narayanan L, et al. (2003)
Decreased expression of the DNA mismatch repair gene Mlh1 under hypoxic
stress in mammalian cells. Mol Cell Biol 23: 3265–3273.
94. Myung K, Chen C, Kolodner RD (2001) Multiple pathways cooperate in the
suppression of genome instability in Saccharomyces cerevisiae. Nature 411:
95. Lombardo M-J, Aponyi I, Rosenberg SM (2004) General stress response
regulator RpoS in adaptive mutation and amplification in Escherichia coli.
Genetics 166: 669–680.
96. Fishman-Lobell J, Haber JE (1992) Removal of nonhomologous DNA ends in
double-strand break recombination: the role of the yeast ultraviolet repair gene
RAD1. Science 258: 480–484.
97. Mortensen UH, Bendixen HC, Sunjevaric I, Rothstein R (1996) DNA strand
annealing is promoted by yeast Rad52 protein. Proc Natl Acad Sci U S A 93:
98. Tsukamoto Y, Kato J, Ikeda H (1996) Effects of mutations of RAD50, RAD51,
RAD52, and related genes on illegitimate recombination in Saccharomyces
cerevisiae. Genetics 142: 383–391.
99. Wu Y, Kantake N, Sugiyama T, Kowalczykowski SC (2008) Rad51 protein
controls Rad52-mediated DNA annealing. J Biol Chem 283: 14883–14892.
100. Lee K, Lee SE (2007) Saccharomyces cerevisiae Sae2- and Tel1-dependent
single-strand DNA formation at DNA break promotes microhomology-
mediated end joining. Genetics 176: 2003–2014.
101. Lupski JR (2007) An evolution revolution provides further revelation. Bioessays
102. Ohno S (1970) Evolution by gene duplication. Berlin, New York: Springer-
Verlag. 160 p.
103. Hurles M (2004) Gene duplication: the genomic trade in spare parts. PLoS Biol
2: e206. doi:10.1371/journal.pbio.0020206.
104. Hittinger CT, Carroll SB (2007) Gene duplication and the adaptive evolution
of a classic genetic switch. Nature 449: 677–681.
105. Spence JE, Perciaccante RG, Greig GM, Willard HF, Ledbetter DH, et al.
(1988) Uniparental disomy as a mechanism for human genetic disease.
Am J Hum Genet 42: 217–226.
106. Lee JA, Lupski JR (2006) Genomic rearrangements and gene copy-number
alterations as a cause of nervous system disorders. Neuron 52: 103–121.