Targeted parallel sequencing of large genetically-defined genomic regions for identifying mutations in Arabidopsis.
ABSTRACT Large-scale genetic screens in Arabidopsis are a powerful approach for molecular dissection of complex signaling networks. However, map-based cloning can be time-consuming or even hampered due to low chromosomal recombination. Current strategies using next generation sequencing for molecular identification of mutations require whole genome sequencing and advanced computational devices and skills, which are not readily accessible or affordable to every laboratory. We have developed a streamlined method using parallel massive sequencing for mutant identification in which only targeted regions are sequenced. This targeted parallel sequencing (TPSeq) method is more cost-effective, straightforward enough to be easily done without specialized bioinformatics expertise, and reliable for identifying multiple mutations simultaneously. Here, we demonstrate its use by identifying three novel nitrate-signaling mutants in Arabidopsis.
METHODOLOGY Open Access
Kun-hsiang Liu1,2*, Matthew McCormack1,2* and Jen Sheen1,2
Keywords: Next generation sequencing, EMS, PCR-amplified genomic library, Nitrate signalling, Positional cloning
Background
Genetic screens are a powerful approach for studying
diverse processes by isolating mutants showing phenotypes
directly or indirectly involved in biological pathways.
Identifying the molecular lesion underlying these
phenotypes is crucial for understanding the
mechanism of the process involved. Positional
cloning is commonly employed to reveal the molecular
identity of such mutations
[1]. However, despite the availability of the Arabidopsis
genome sequence, positional cloning from diverse
mutant screens can be time-consuming or even ham-
pered due to low chromosomal recombination in mega-
base-sized regions surrounding the mutation [1-4].
Next-generation sequencing (NGS) technology for
whole-genome sequencing (WGS) provides an alterna-
tive method for molecular characterization of mutations
[5]. However, the copious numbers of mutations gener-
ated during the mutagenesis processes become a
hindrance due to the presence of hundreds or thousands
of mutations unrelated to the specific phenotype. This
introduces a high degree of complexity in subsequent
WGS data analysis aimed at identifying mutations
responsible for the phenotypes. Specialized computa-
tional methods, hardware, and expertise, not available in
most laboratories, are typically needed to accomplish
the analysis. Backcrossing mutants to wild-type plants for
several generations can attenuate complexity by eliminating
unrelated mutations [6], but this is very time
consuming when using Arabidopsis. Improved
approaches, SHOREmap and Next-Gen Mapping
(NGM), combine integrated mapping with NGS and
have led to identification of EMS (ethyl methanesulfonate)-generated
mutation sites in Arabidopsis [7-9].
However, these strategies require whole genome sequencing,
so huge amounts of uninformative non-target
regions are sequenced, which is very costly and can be
impractical for many laboratories involved in genetic
studies. For example, based on published reports, characterizing
one mutant in Arabidopsis usually takes one
flow cell (7-8 lanes) using paired-end reads of 38-40
cycles [7-9]. The possibility of using only one lane of a
flow cell and a few F2 lines to identify mutations in a
* Correspondence: Khliu@molbio.mgh.harvard.edu; McCormack@molbio.mgh.harvard.edu
1Department of Molecular Biology and Center for Computational and
Integrative Biology, Massachusetts General Hospital, Boston, MA 02114, USA
Full list of author information is available at the end of the article
Liu et al. Plant Methods 2012, 8:12
http://www.plantmethods.com/content/8/1/12
PLANT METHODS
© 2012 Liu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.
single mutant is described in one report [7]. However,
the known mutations were detected only in
some cases using one-lane sequencing, owing to variable
and low coverage of the genome [7].
It is both costly and time consuming to associate a
single mutant phenotype with its underlying molecular
mutation. The ability to simultaneously characterize
multiple mutants reduces both cost and labor, and
greatly accelerates the association of genes with path-
ways. Recognizing the benefits of characterizing a large
number of mutants at a molecular level in order to dis-
sect complex signaling networks, and also being aware
of current technical and financial limitations, we have
created a streamlined method, targeted parallel sequen-
cing (TPSeq), for efficient and simultaneous identifica-
tion of multiple causative mutations in Arabidopsis and
other genetic model organisms. The method requires
only simple and quick mutant mapping using polymer-
ase chain reaction (PCR) markers accessible to every
laboratory [1-4]. We have used this method to simulta-
neously identify three novel nitrate-signaling mutants
with altered nitrate marker gene responses and nitrate-
based growth phenotypes.
Results and discussion
Isolation of nitrate signaling mutants by a dual-screen
Nitrate is central to plant gene regulation and growth.
However, little is known about the molecular mechan-
isms of nitrate signaling and also the genetic basis of
diverse nitrate-associated traits in plant growth and
development. Currently, a few transcription factors, pro-
tein kinases, microRNAs and a transporter-sensor have
been reported to participate in regulating nitrate-responsive
gene expression and growth in a context-dependent
manner [10-13]. Discovery of new signaling components
and the connection of existing regulatory nodes in the
nitrate-signaling network remain challenging.
A forward genetic screen is a powerful initial approach
for identifying novel signaling
components. We designed a dual genetic screen strategy
to isolate mutants involved in nitrate signaling. We first
screened for mutants with a deregulated nitrate-responsive
gene expression pattern. We selected nitrite
reductase (NIR) as our nitrate response marker gene
because NIR plays a critical role in the nitrate assimila-
tion pathway, it is encoded by a single gene, and NIR
expression can be rapidly and consistently induced by
nitrate [14]. In order to monitor nitrate responses, we
generated an Arabidopsis transgenic line harboring a
nitrate responsive luciferase (LUC) reporter driven by
the NIR gene promoter. In the first screen, two classes
of mutants were isolated by measuring LUC activities in
a 96-well plate assay. EMS-mutagenized seeds were
placed in a 96-well plate and LUC activities were
measured with a scintillation counter. The nitrate insen-
sitive (nis) mutant showed reduced LUC activity after
nitrate induction, whereas the nitrate constitutive
response (ncr) mutant exhibited higher LUC activity in
the absence of nitrate induction. Approximately 25,000
M2 seedlings were screened. A total of 273 nis mutants
and 65 ncr mutants were isolated during the first step of
the screen. Of these, 4 nis and 5 ncr mutants were
further confirmed in the second generation.
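The two mutant classes described above amount to a simple decision rule on paired LUC readings. The sketch below is illustrative only: the function name and both thresholds (`low_fraction`, `high_fold`) are hypothetical and are not values reported for the screen.

```python
def classify_luc_response(luc_kcl, luc_kno3, wt_kcl, wt_kno3,
                          low_fraction=0.5, high_fold=2.0):
    """Classify a seedling from its NIR-LUC activity.

    luc_kcl / luc_kno3: LUC activity without / with nitrate induction.
    wt_kcl / wt_kno3:   the same readings for the wild-type reporter line.
    low_fraction, high_fold: hypothetical cutoffs chosen for illustration.
    """
    # nis: reduced LUC activity after nitrate induction
    if luc_kno3 < low_fraction * wt_kno3:
        return "nis"
    # ncr: elevated LUC activity in the absence of nitrate
    if luc_kcl > high_fold * wt_kcl:
        return "ncr"
    return "wild-type-like"


# Made-up readings in arbitrary units, for illustration only
print(classify_luc_response(1.0, 20.0, wt_kcl=1.0, wt_kno3=100.0))
print(classify_luc_response(10.0, 100.0, wt_kcl=1.0, wt_kno3=100.0))
```

In practice any such cutoffs would need to be calibrated against replicate wild-type measurements on the same plate.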
As alterations in a nitrate-responsive marker gene
may or may not be linked to complex nitrate-associated
growth phenotypes, we performed a secondary screen
with nis and ncr mutants based on well-known nitrate-
associated traits. We conducted three distinct assays,
including nitrate (5 mM) promotion of lateral root
growth, high nitrate (50 mM) inhibition of lateral root
emergence, and nitrate-associated greening and leaf
expansion. This second screen yielded three mutants,
nis1, nis2 and ncr1, with reproducibly altered NIR-LUC
expression patterns (Figure 1A) and nitrate-associated
traits in the next generation. We further confirmed by
quantitative reverse transcription PCR (qRT-PCR) that
endogenous NIR gene expression displayed changes
in nitrate responses similar to those of the NIR-LUC transgene
in nis1, nis2 and ncr1 (Figure 1B). The nis1,
nis2 and ncr1 mutants represented new classes of nitrate
signaling mutants as they displayed nitrate-specific
response alterations in NIR promoter and transcript
regulation, which are not influenced by other nitrogen
sources, including ammonium or glutamine (Figure 1A
and 1B, and data not shown). Unexpectedly, these
mutants exhibit distinct nitrate-associated traits in sec-
ondary screens: nis1 is deficient in nitrate-promoted
root growth (Figure 1C), nis2 has small pale green leaves
(Figure 1D), whereas ncr1 lacks high-nitrate inhibition
of lateral root elongation (Figure 1E). The mutant phe-
notypes of nis1 and ncr1 are observed only on nitrate
medium, but the phenotype of nis2 persists in medium
with different nitrogen sources (Figure 1 and data not
shown).
Identification of mutation sites by TPSeq
To move toward a molecular understanding of nitrate signaling,
it is necessary to reveal the molecular identity of the
NIS and NCR genes. We have developed an efficient
and low-cost strategy, TPSeq, to simultaneously identify
multiple genetic mutations in Arabidopsis (Figure 2A).
Arabidopsis has long been used for genetic studies and
the entire genome was sequenced ten years ago. There
are many available molecular markers based on
sequence polymorphisms among Arabidopsis accessions,
which allow quick mapping to narrow down mutations
to much smaller target regions [1-4].
Quick mapping was performed by taking advantage of
simple PCR-based methods using simple sequence
length polymorphism (SSLP) or cleaved amplified polymorphic
sequence (CAPS) markers [1]. After quick
mapping, NCR1 was located in the interval between
13.89 Mb and 14.43 Mb on Chromosome II by isolating
287 independent recombinants. NIS1 was mapped to the
upper arm of Chromosome III between 2.82 Mb and
3.23 Mb by isolating 493 independent recombinants,
and NIS2 was mapped to the upper arm of Chromo-
some V between 4.66 Mb and 5.39 Mb by isolating 180
independent recombinants (Figure 2B). All three
mutants were recessive. The phenotypes of the mutants
co-segregated with characteristic LUC activities (Figure
1A). Theoretically, an initial 20-30 recombinants for
establishing the physical map and a total of 50-100
recombinants should be sufficient to narrow down the
location of the mutation to a 1-4 Mb region [1,3,15].
We suggest that isolation of ~150 or fewer recombinants
may be sufficient for TPSeq.
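As a quick check on how small the mapped targets are, the interval sizes can be computed from the rounded coordinates given above. This is a minimal sketch: the genome size constant is an assumption (roughly the size of the TAIR10 nuclear assembly), and the coordinates are the two-decimal Mb values from the text, so the results only approximate the ~534, 413 and 737 kb figures quoted for the regions.

```python
# Mapping intervals (chromosome, start Mb, end Mb) as reported in the text
intervals_mb = {
    "NCR1": ("II", 13.89, 14.43),
    "NIS1": ("III", 2.82, 3.23),
    "NIS2": ("V", 4.66, 5.39),
}

# Assumption: approximate size of the TAIR10 nuclear assembly in Mb
GENOME_MB = 119.0


def interval_kb(start_mb, end_mb):
    """Interval length in kb, rounded to the nearest kb."""
    return round((end_mb - start_mb) * 1000)


sizes_kb = {gene: interval_kb(start, end)
            for gene, (_, start, end) in intervals_mb.items()}
total_kb = sum(sizes_kb.values())
fraction_of_genome = (total_kb / 1000) / GENOME_MB

for gene, size in sizes_kb.items():
    print(f"{gene}: ~{size} kb")
print(f"Total target: ~{total_kb} kb "
      f"({fraction_of_genome:.1%} of an assumed {GENOME_MB:.0f} Mb genome)")
```

Under these assumptions the three targets together make up only a small percentage of the genome, which is the basis of the cost saving relative to whole-genome sequencing.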
After the mutation sites had been narrowed down to
three non-overlapping regions of approximately 534 kb,
413 kb and 737 kb, we applied TPSeq (Figure 2A) to
reveal the molecular identity of three non-overlapping
mutations. The first critical step of TPSeq was to
Figure 1 Phenotypic analysis of nis1, nis2, ncr1-1. A. Comparison of NIR-LUC activity in nis1, nis2 and ncr1-1. LUC activities were measured
after 2 h incubation with either 10 mM KCl or KNO3. The NIR-LUC transgenic line in Col is used as the wild type control. Three seedlings were
pooled and ground for protein concentration determination and LUC activity analysis. Values shown are means ± s.d. of three or four biological
replicates. B. Relative endogenous NIR expression in nis1, nis2 and ncr as measured by real-time PCR. Plants were treated with either 10 mM
KNO3 or KCl for 2 h. Relative expression of NIR is normalized to the expression of TUB4. The relative expression level is calculated relative to the
value of wild type treated with KCl. Values shown are means ± s.d. of three biological replicates. C. Altered root architecture in nis1. Plants were
grown on medium containing 2.5 mM ammonium succinate for 3 days and transferred to medium containing 5 mM KNO3 for 8 days. D. nis2
showing small pale-green leaves after plants were grown in soil for 33 days. E. The lateral root de-suppression phenotype in ncr1-1. Seedlings were
grown on medium containing 50 mM KNO3 as the sole nitrogen source for 14 days. Scale bar = 1 cm.
generate high-quality mutant libraries within the targeted
genomic regions from PCR-amplified DNA fragments
averaging ~7 kb (Additional file 1: Table S1). PCR primers
were designed so that each fragment overlapped the
neighbouring PCR fragment by 200-800 bp. More than 75% of the primer
pairs worked successfully to cover the targeted regions
with fragments of 6-10 kb using routine long-range
PCR reactions. For regions that failed to amplify, shorter
PCR products (1-6 kb) were designed and generated.
We covered 99.7% of the sequence in these three mutant
regions using this protocol. A total of 75 (nis1), 113
(nis2), and 138 (ncr1) amplicons were generated to cover
the targeted regions. After performing PCR, we used
agarose gel electrophoresis to confirm the amplicons and
separate them from non-specific PCR products. This step was important to lower
the DNA contamination in the library and to normalize
the coverage based on equal DNA molarity. Although
not expected for EMS mutagenesis, PCR analysis could
potentially reveal insertions, deletions or inversions in the
targeted genomic regions. For each mutant, normalized
PCR DNA fragments covering the targeted genomic
regions were pooled. In order to normalize DNA molarities
across mutants, the pooled PCR mixtures from
the three mutants were combined so that the DNA fragments
for each mutant were present in equal molarities.
The combined DNA fragments were physically sheared
to 200 bp, and then ligated to adaptors for NGS on an
Illumina HiSeq 2000 genome analyzer.
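The tiling and equimolar pooling steps described above can be sketched in a few lines. This is not the primer design procedure used in the study: `tile_region` and `equimolar_mass_ng` are hypothetical helpers, the 7 kb amplicon length and 500 bp overlap are example values within the ranges quoted in the text, real primer placement would also depend on sequence context, and the mass calculation assumes the common approximation of about 660 g/mol per base pair of double-stranded DNA.

```python
def tile_region(start, end, amplicon_len=7000, overlap=500):
    """Return (start, end) coordinates of overlapping amplicons covering [start, end)."""
    if overlap >= amplicon_len:
        raise ValueError("overlap must be smaller than the amplicon length")
    step = amplicon_len - overlap
    tiles = []
    pos = start
    while True:
        tile_end = min(pos + amplicon_len, end)
        tiles.append((pos, tile_end))
        if tile_end >= end:
            break
        pos += step
    return tiles


def equimolar_mass_ng(length_bp, pmol=0.1):
    """Mass in ng of a dsDNA fragment that corresponds to `pmol` picomoles.

    Assumes ~660 g/mol per base pair, so pmol * bp * 660 gives picograms.
    """
    return pmol * length_bp * 660.0 / 1000.0


# Example: tile a 20 kb stretch and compute the mass needed per amplicon
tiles = tile_region(0, 20_000)
for tile_start, tile_end in tiles:
    length = tile_end - tile_start
    print(tile_start, tile_end, f"{equimolar_mass_ng(length):.0f} ng for 0.1 pmol")
```

Because longer fragments weigh more per molecule, pooling by mass alone would over-represent short amplicons; scaling the mass by fragment length is what keeps the molar amounts, and hence the expected read coverage, comparable across amplicons.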
In our experiment, we covered 99.7% of the genomic
sequence in the three targeted mutation regions with
8.5 Gb of sequences generated by NGS (Table 1). In
keeping with our intention to make this method accessi-
ble to biology laboratories without specialized infor-
matics support, we have composed a detailed
bioinformatics analysis workflow that can be performed
on the web-based resource Galaxy [16-18]. After
uploading a FASTQ file provided by a sequencing facil-
ity, all the bioinformatics steps from alignment to SNP
(single nucleotide polymorphism) detection can be per-
formed in Galaxy following a simple protocol. This cir-
cumvents the need for sophisticated computer hardware
and specialized bioinformatic expertise, and makes the
bioinformatics analysis of NGS and mutant identifica-
tion practical and accessible to individual laboratories.
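The sequencing yield implies a very high average depth over the targets, which can be estimated with simple arithmetic. The sketch below uses the read length and aligned-read count from Table 1 and the region sizes given in the text. It assumes that every aligned read falls within the three targeted regions, so the result is an upper-bound estimate rather than a measured value.

```python
READ_LENGTH_BP = 45            # Table 1, read length
ALIGNED_READS = 160_990_234    # Table 1, sequences aligned to the reference
TARGET_BP = (534 + 413 + 737) * 1000  # three targeted regions, from the text


def mean_depth(n_reads, read_length_bp, target_bp):
    """Average per-base depth if all reads map within the target."""
    return n_reads * read_length_bp / target_bp


depth = mean_depth(ALIGNED_READS, READ_LENGTH_BP, TARGET_BP)
print(f"Estimated mean depth over {TARGET_BP / 1e6:.3f} Mb: ~{depth:.0f}x")
```

An average depth in the thousands leaves ample margin above a cutoff of a few tens of reads, although per-amplicon depth will vary with pooling accuracy and local sequence composition.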
After data analysis, a total of 99.7% of the genomic
sequence was covered to a depth of at least one read
(Table 2), with only a few small gaps representing AT-rich
sequences in the three targeted regions. To filter
out false-positive variants generated by PCR or
sequencing while retaining high coverage of the target
regions, a minimum read depth of 20 was set for subsequent analysis.
With this cutoff, a total of 98.9% of the
targeted genomic sequence was covered (Table 2). In
Galaxy, sequences were aligned to the Arabidopsis Col-0
genome TAIR10 using Bowtie [19] (Figure 2C). Variants
were called in Galaxy
using SAMtools pileup [20] and Filter pileup (Table 3).
After analysis, 14 variants were identified and confirmed
by Sanger sequencing (Table 4 and Figure 3A).
Among these confirmed variants, 2 are
Figure 2 Identifying mutations by TPSeq. A. Flowchart of the
TPSeq procedure. B. Physical map of mutations on Arabidopsis
chromosomes. Three mutants were mapped to different
chromosomes; the numbers of recombinants and the nearest
markers are shown. C. Coverage plot from TPSeq. The y-axis shows the
average read depth per 100 kb window. The x-axis shows the corresponding
location on the chromosome shown in B.
Table 1 Sequencing statistics
Library
Lane Yield (Mbases): 8,485
Read Length: 45
Clusters (raw): 4,593,946 ± 382,484
Clusters (PF): 3,842,809 ± 305,442
% PF Clusters: 83.67 ± 0.58
Total Sequences: 184,454,857
Sequences Align to Reference: 160,990,234 (87.28%)
PF: Pass Filter
Table 2 Coverage analysis
Depth   nis1     nis2     ncr1     Total
1×      99.36%   99.85%   99.65%   99.67%
5×      99.07%   99.44%   99.46%   99.36%
10×     99.06%   98.92%   99.44%   99.12%
15×     99.05%   98.69%   99.39%   99%
20×     99.05%   98.41%   99.32%   98.86%
100×    98.72%   90.16%   97.05%   94.44%
mutated in non-coding regions (intron and 3’UTR), 4
are intergenic and 8 are exonic.
Of the 8 exonic variants, 5 are missense and 2
are nonsense (Table 4). In theory, EMS
mutagenesis induces G/C to A/T base transitions. In
this study, we noticed that 3 of the
14 confirmed mutations were non-EMS type mutations and
they all occurred in nis1. We do not know whether
these mutations were caused by EMS mutagenesis or
another mechanism, but such non-typical mutations
have also been observed in other
studies where EMS was used [7,9].
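The distinction between EMS-type and non-EMS-type changes can be checked directly from the reference and variant bases listed in Tables 3 and 4. The sketch below is an illustration rather than part of the published Galaxy workflow; the function name is hypothetical, and a change is counted as EMS-type when it is a G-to-A or C-to-T substitution relative to the reference strand.

```python
# G/C to A/T transitions expected from EMS, written relative to the reference strand
EMS_TRANSITIONS = {("G", "A"), ("C", "T")}

# (mutant, chromosome, position, reference base, variant base) from Tables 3 and 4
variants = [
    ("ncr1", "II", 14_208_479, "G", "A"),
    ("ncr1", "II", 14_427_587, "G", "A"),
    ("nis1", "III", 2_849_685, "A", "C"),
    ("nis1", "III", 2_954_586, "G", "A"),
    ("nis1", "III", 3_007_742, "C", "G"),
    ("nis1", "III", 3_113_098, "G", "A"),
    ("nis1", "III", 3_114_003, "G", "T"),
    ("nis1", "III", 3_147_629, "G", "A"),
    ("nis2", "V", 4_851_838, "C", "T"),
    ("nis2", "V", 4_979_060, "C", "T"),
    ("nis2", "V", 4_984_678, "C", "T"),
    ("nis2", "V", 5_016_518, "C", "T"),
    ("nis2", "V", 5_020_510, "C", "T"),
    ("nis2", "V", 5_355_232, "C", "T"),
]


def is_ems_type(ref, alt):
    """True if the substitution is a canonical EMS-type transition."""
    return (ref.upper(), alt.upper()) in EMS_TRANSITIONS


non_ems = [v for v in variants if not is_ems_type(v[3], v[4])]
for mutant, chrom, pos, ref, alt in non_ems:
    print(f"{mutant} chr{chrom}:{pos} {ref}>{alt} is not an EMS-type change")
```

Run on the 14 confirmed variants, this flags the three non-EMS-type substitutions, all on Chromosome III in the nis1 region, in agreement with the text.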
Validation of mutations
We further validated the causal mutations linked to the
specific mutant phenotype. Six mutations have been
identified in the nis1 library based on the Arabidopsis
Col-0 reference genome TAIR10 (Table 4). Among
these mutations, there is only one (G to A) nonsense
mutation (Table 3) and this occurs in the first exon of
RPL4A (ribosomal protein large subunit 4A, At3g09630)
[21] (Figure 3A). To confirm that the altered root architecture
is indeed caused by this mutation, we showed that a construct
containing the genomic DNA fragment of RPL4A
complements the nis1 root phenotype (Figure
3B). Detailed characterization of the NIS1 functions in
nitrate signaling is beyond the scope of this method
paper and will be published separately.
In the nis2 library, six mutations have been uncovered.
One of the mutations (C to T) occurs in the coding
region of APG6/CLPB3 (albino or pale-green/casein
lytic proteinase B3, At5g15450), which converts a conserved
Arg residue to a His residue (Table 4). We demonstrated
that a T-DNA insertion mutant allele, apg6-3,
displays the small pale-green leaf phenotype of nis2-1
[22] (Figure 3A and Figure 3C). Thus, NIS2 encodes
APG6 with an important role for nitrate-associated leaf
greening and expansion. It has been shown that null
apg6 mutants cannot survive on soil unless first
Table 3 Summary of mutations generated by Galaxy’s Filter pileup
Chr.¹  Position    Ref.² base  Con.³ base  Con. Qual.⁴  SNP Qual.  Max. Mapping Qual.  Coverage  QA⁵ coverage  Total number of deviants  % deviant reads⁶
II     14,208,479  G           A           225          225        60                  3,801     3,407         3,379                     99.2
II     14,427,587  G           A           225          225        60                  7,847     4,268         4,211                     98.7
III    2,849,685   A           C           225          225        60                  341       319           316                       99.1
III    2,954,586   G           A           225          225        60                  7,820     2,236         2,208                     98.7
III    3,007,742   C           G           225          225        60                  1,438     1,177         1,175                     99.8
III    3,113,098   G           A           225          225        60                  4,170     3,790         3,776                     99.6
III    3,114,003   G           T           225          225        60                  937       568           567                       99.8
III    3,147,629   G           A           225          225        60                  2,539     1,558         1,537                     98.7
V      4,851,838   C           T           225          225        60                  750       669           668                       99.9
V      4,979,060   C           T           202          202        60                  172       66            65                        98.5
V      4,984,678   C           T           152          152        60                  134       18            16                        88.9
V      5,016,518   C           T           225          225        60                  563       509           504                       99
V      5,020,510   C           T           225          225        60                  2,961     1,333         1,319                     98.9
V      5,355,232   C           T           225          225        60                  7,841     6,875         6,821                     99.2
1. Chromosome
2. Reference base
3. Consensus base
4. Consensus Quality
5. Quality adjusted coverage
6. The percentage of total number of deviants/quality adjusted coverage
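Footnote 6 defines the last column of Table 3 as the number of deviant reads divided by the quality-adjusted (QA) coverage. The short sketch below recomputes that percentage from the two preceding columns and compares it with the reported value; the variable names are illustrative, and the numbers are copied from Table 3.

```python
# (chromosome, position, QA coverage, deviant reads, reported % deviant reads)
table3 = [
    ("II", 14_208_479, 3407, 3379, 99.2),
    ("II", 14_427_587, 4268, 4211, 98.7),
    ("III", 2_849_685, 319, 316, 99.1),
    ("III", 2_954_586, 2236, 2208, 98.7),
    ("III", 3_007_742, 1177, 1175, 99.8),
    ("III", 3_113_098, 3790, 3776, 99.6),
    ("III", 3_114_003, 568, 567, 99.8),
    ("III", 3_147_629, 1558, 1537, 98.7),
    ("V", 4_851_838, 669, 668, 99.9),
    ("V", 4_979_060, 66, 65, 98.5),
    ("V", 4_984_678, 18, 16, 88.9),
    ("V", 5_016_518, 509, 504, 99.0),
    ("V", 5_020_510, 1333, 1319, 98.9),
    ("V", 5_355_232, 6875, 6821, 99.2),
]


def percent_deviant(deviants, qa_coverage):
    """Percentage of quality-adjusted reads that differ from the reference."""
    return 100.0 * deviants / qa_coverage


recomputed = [round(percent_deviant(dev, qa), 1) for _, _, qa, dev, _ in table3]
for (chrom, pos, qa, dev, reported), value in zip(table3, recomputed):
    print(f"chr{chrom}:{pos}  {dev}/{qa} = {value}%  (reported {reported}%)")
```

A percentage close to 100 is what one would expect for a homozygous mutation in a recessive mutant; the position with the lowest QA coverage (18 reads) also shows the lowest percentage, which illustrates why a minimum depth cutoff is useful.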
Table 4 List of confirmed mutation sites
Mutant  Chr  Position    Base change  Annotation
nis1    III  2,849,685   A→C          intergenic
nis1    III  2,954,586   G→A          W→Stop
nis1    III  3,007,742   C→G          N→K
nis1    III  3,113,098   G→A          R→H
nis1    III  3,114,003   G→T          D→Y
nis1    III  3,147,629   G→A          L→F
nis2    V    4,851,838   C→T          Q→E
nis2    V    4,979,060   C→T          intergenic
nis2    V    4,984,678   C→T          intergenic
nis2    V    5,016,518   C→T          R→H
nis2    V    5,020,510   C→T          intron
nis2    V    5,355,232   C→T          3’UTR
ncr1    II   14,208,479  G→A          R→Stop
ncr1    II   14,427,587  G→A          intergenic
Chr: Chromosome