
The apricot (Prunus armeniaca L.) genome elucidates Rosaceae evolution and beta-carotenoid synthesis



Apricots, scientifically known as Prunus armeniaca L, are drupes that resemble and are closely related to peaches or plums. As one of the top consumed fruits, apricots are widely grown worldwide except in Antarctica. A high-quality reference genome for apricot is still unavailable, which has become a handicap that has dramatically limited the elucidation of the associations of phenotypes with the genetic background, evolutionary diversity, and population diversity in apricot. DNA from P. armeniaca was used to generate a standard, size-selected library with an average DNA fragment size of ~20 kb. The library was run on Sequel SMRT Cells, generating a total of 16.54 Gb of PacBio subreads (N50 = 13.55 kb). The high-quality P. armeniaca reference genome presented here was assembled using long-read single-molecule sequencing at approximately 70× coverage and 171× Illumina reads (40.46 Gb), combined with a genetic map for chromosome scaffolding. The assembled genome size was 221.9 Mb, with a contig NG50 size of 1.02 Mb. Scaffolds covering 92.88% of the assembled genome were anchored on eight chromosomes. Benchmarking Universal Single-Copy Orthologs analysis showed 98.0% complete genes. We predicted 30,436 protein-coding genes, and 38.28% of the genome was predicted to be repetitive. We found 981 contracted gene families, 1324 expanded gene families and 2300 apricot-specific genes. The differentially expressed gene (DEG) analysis indicated that a change in the expression of the 9-cis-epoxycarotenoid dioxygenase (NCED) gene but not lycopene beta-cyclase (LcyB) gene results in a low β-carotenoid content in the white cultivar "Dabaixing". This complete and highly contiguous P. armeniaca reference genome will be of help for future studies of resistance to plum pox virus (PPV) and the identification and characterization of important agronomic genes and breeding strategies in apricot.
Jiang et al. Horticulture Research (2019) 6:128 Horticulture Research
ARTICLE Open Access
Fengchao Jiang, Yingfeng Luo, Shenghan Gao, Meiling Zhang, Shuangyang Wu, Songnian Hu, Haoyuan Sun and Yuzhu Wang
The Rosaceae family provides most of the world's well-known temperate fruit crops, which are classified as pome or stone fruits according to their fruit morphology. As consumers' purchasing preferences shift globally from large-scale commodity fruit crops (e.g., apples, citrus, and pears) to smaller, unique fruit crops with increased nutritional value and a pleasing flavor, the rapid development of stone fruit crops (e.g., apricots, cherries, peaches and plums) has come to the forefront.
Apricot (Prunus armeniaca L.), which is now generally accepted as a fruit of Chinese origin with a cultivation history of more than 3000 years in China, is grown on every continent except Antarctica owing to its early harvesting season, unique aroma, delicious taste, high nutritional value and multiple uses. The rich diversity of apricot germplasms indicates that apricots can be grown even more widely and provide a higher proportion of the world's fruit production. Since the early 2000s, the fruit production and harvested orchard area of apricot have both increased worldwide, with 7.9 million Mt of fruit and 536,072 ha of orchard area recorded in 2017. Compared with 2002, world production and harvested area in 2017 had increased by 196.5% and 32.2%, respectively. The production of apricot and other Prunus species in Europe currently suffers severe damage from sharka disease, which is caused by the plum pox virus (PPV). PPV causes the most important viral disease affecting a number of Prunus species, including apricot. Therefore, it is necessary to construct an apricot genome to localize PPV resistance-related genes and to guide PPV resistance breeding of apricot.
Correspondence: Haoyuan Sun or Yuzhu Wang
Beijing Academy of Forestry and Pomology Sciences, 100093 Beijing, PR China
Apricot Engineering and Technology Research Center, National Forestry and Grassland Administration, 100093 Beijing, PR China
Full list of author information is available at the end of the article.
These authors contributed equally: Fengchao Jiang, Junhuan Zhang, Sen Wang
© The Author(s) 2019. Open Access: this article is licensed under a Creative Commons Attribution 4.0 International License.
Apricot fruits are enriched in β-carotene, which represents 60–70% of the total carotenoid content and gives the fruit its characteristic orange color. β-Carotene is the main precursor of vitamin A, one of the most important functional ingredients in apricots. Vitamin A is an essential nutrient for humans because it cannot be synthesized within the body. Thus, as a good source of β-carotene, apricots are highly beneficial for human health. In addition to its nutritional characteristics, apricot fruit has pharmacological significance owing to its high antioxidant content. Apricot and/or β-carotene treatment is believed to be effective in preventing the impairments caused by oxidative stress, methotrexate-induced intestinal damage and nephrotoxicity. Therefore, the generation of new apricot genotypes with higher levels of β-carotene in the fruit is a promising breeding objective. Understanding the regulation of β-carotenoid metabolism and biosynthesis will allow breeders to develop more effective methods for increasing β-carotenoid content and thereby achieve this breeding goal.
Over the last 20 years, genomics research in Rosaceae, and especially in Prunus, has made great advances. Because stone fruit tree genomes are small and simple, they are considered ideal candidates for association genetics approaches based on whole-genome sequence genotyping and genome-wide selection. Now that peach, mume and cherry genome sequences are available, genome-level comparative analysis across multiple genomes within a family has become possible, which provides valuable evolutionary insights and allows the transfer of knowledge between species. In this study, we aimed to construct a complementary genome of the apricot cultivar "Chuanzhihong" using third-generation PacBio technology combined with second-generation Illumina data to understand Rosaceae evolution, particularly the evolution of genes contributing to combating PPV in Prunus and to beta-carotenoid synthesis. A high-quality apricot reference genome sequence will afford great opportunities for further research on germplasm diversity and evolution.
Plant materials
The P. armeniaca L. cultivar "Chuanzhihong" is native to Hebei Province and has a cultivation history of more than 300 years. It is known as "Chuanzhihong" because of its red color and fruitfulness. With its good overall cultivation characteristics of red color, high yield, disease resistance, and late maturation, "Chuanzhihong" is a major variety in the northern region. The fruit has good commodity value because it can be stored for more than 10 days under normal conditions and is suitable for long-distance transportation. Leaves of "Chuanzhihong" at an early-to-mid developmental stage were collected for genome sequencing in the Yanqing District (40°35′N, 116°11′E), in the north of Beijing, China, and were immediately frozen in liquid nitrogen and stored at −80 °C until DNA was extracted.
Two Chinese apricot varieties, "Chuanzhihong" (yellow-fleshed fruit) and "Dabaixing" (white-fleshed fruit), from the garden of the Beijing Academy of Forestry and Pomology Sciences, Beijing, China, were selected for RNA sequencing to analyze the biosynthesis of β-carotene. Samples were collected at four developmental stages: green fruit with a soft kernel (G1), green fruit with a hard kernel (G2), color-turning fruit (CT) and fully ripe fruit (FR). Three biological replicates were collected, each consisting of ten almost identical fruits per stage. After the peel was removed, the pulp was quickly cut into small pieces, frozen in liquid nitrogen and stored at −80 °C until RNA was extracted. Stamen and seed tissues of "Chuanzhihong" from the same garden were also used for RNA sequencing. All RNA data were used for transcriptome-based gene prediction.
DNA and RNA sequencing
DNA sequencing
For PacBio sequencing, a DNA Template Prep Kit 1.0 was used to generate single-molecule real-time (SMRT) bell genomic libraries. DNA fragments of ~20 kb were obtained using a Covaris g-Tube, and the fragment size distribution was assessed using a Bioanalyzer 2100 12 K DNA Chip assay. The quality and quantity of the SMRTbell template (>10 kb) were checked using an Agilent Bioanalyzer and a Qubit fluorometer, respectively. According to the manufacturer's instructions, the PacBio Binding Kit 2.0 was used to generate a ready-to-sequence SMRTbell-Polymerase Complex. The genomic DNA was sequenced using SMRT Cells v3.0, yielding 16.54 Gb of subreads (Table S1). For short-read sequencing, libraries with insert sizes of 150, 180 and 500 bp were constructed according to the manufacturer's instructions, and 40.46 Gb of Illumina sequence was generated on the Illumina HiSeq 2000 platform (Table S2).
RNA sequencing
RNA from G1, G2, CT and FR fruits of the two cultivars ("Chuanzhihong" and "Dabaixing") was extracted using the RNAprep Pure Plant Kit (Polysaccharides & Polyphenolics-rich). The samples were processed using an RNA library preparation kit and then sequenced on the Illumina HiSeq 2000 platform. RNA from the stamen and seed tissues of "Chuanzhihong" was isolated using the NEBNext Poly(A) mRNA Magnetic Isolation Module, and libraries were prepared using the NEBNext mRNA Library Prep Master Mix Set. Paired-end sequencing with a read length of 100 bp was then conducted on the HiSeq 2500 platform.
Genome size estimation and heterozygosity
The genome size of P. armeniaca was estimated using GenomeScope. The quality of the Illumina reads was assessed using the FastQC program. Adapter sequences, PCR duplicates, contaminants, and low-quality sequences (Phred score < 30) were then removed. After chloroplast and mitochondrial sequences had been removed from the high-quality clean reads, the optimal k-mer size was determined using KmerGenie. The best k-mer size was then used for k-mer counting with Jellyfish. After the k-mer counts were converted into a histogram, the k-mer distribution was analyzed to estimate genome size and heterozygosity.
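As a rough illustration of the last step, genome size can be approximated from a k-mer histogram by dividing the total number of counted k-mers by the depth of the main coverage peak. The sketch below assumes the histogram has already been loaded as a depth-to-count mapping; the function name, the error cutoff `min_depth` and the toy values are illustrative assumptions. It is a simplification and not the model fitted by dedicated profiling tools, which also estimate heterozygosity.

```python
from typing import Dict


def estimate_genome_size(histogram: Dict[int, int], min_depth: int = 5) -> float:
    """Approximate genome size (bp) as total k-mers / depth of the main peak.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    min_depth is an assumed cutoff that discards low-depth k-mers, which are
    dominated by sequencing errors.
    """
    solid = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers is taken as the coverage peak.
    peak_depth = max(solid, key=solid.get)
    # Each distinct k-mer at depth d contributes d k-mer occurrences.
    total_kmers = sum(depth * n for depth, n in solid.items())
    return total_kmers / peak_depth


# Toy histogram (not apricot data): an error tail plus one peak at depth 20.
toy_histogram = {1: 900, 2: 100, 19: 50, 20: 100, 21: 50}
toy_size = estimate_genome_size(toy_histogram)
print(toy_size)
```

In a highly heterozygous genome the histogram has two peaks, so choosing the peak automatically as above can be misleading; that is one reason a fitted model is preferable in practice.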
De novo genome assembly
We first estimated the error rate of the long reads obtained from the PacBio platform using Illumina paired-end reads. We applied the Canu pipeline to assemble the long reads and the super-reads obtained from MaSuRCA into contigs with the following parameters: genomeSize = 300 m, corOutCoverage = 100, minReadLength = 1000, minOverlapLength = 1000, ErrorRate = 0.064 and batOptions. One copy of the contigs from heterozygous regions was retained by purging haplotigs. We then mapped the Illumina paired-end reads to the filtered contigs using bwa-mem and polished the contigs with Pilon. We constructed the linkage map and organized the contigs into pseudochromosomes with JCVI allmaps and SLAF markers.
Transcriptome assembly
Quality control (base correction, adapter trimming and read filtering) of the RNA-seq data from leaf, fruit, stamen and seed tissues (28 libraries in total) was performed using fastp with the default parameters, after which two approaches were used to reconstruct transcripts: de novo assembly and reference-guided assembly. For the de novo assembly, Trinity was used to reconstruct transcripts from the RNA-seq reads. For the reference-guided assembly, high-quality RNA-seq reads were mapped to the genome using Hisat2, and transcripts were built using Stringtie with the default settings. CD-HIT was used with the default parameters to cluster highly similar transcripts from the de novo assembly.
Evaluation of the assembled genome
We first mapped the Illumina paired-end reads to the genome using bwa-mem with the default parameters. Second, RNA-seq data from leaf, fruit, stamen and seed tissues were aligned using Hisat2 with the default settings. Finally, we used BUSCO to examine the single-copy orthologs (1375, Species: Arabidopsis) with OrthoDB.
Repeat element identification
A library of species-specific repeats was constructed using RepeatModeler with the default parameters, and RepeatMasker was used to identify repeat elements with both the species-specific library and the default library from the RepeatMasker database. Long terminal repeat (LTR) retrotransposons were detected with LTR-Finder. A phylogenetic tree was constructed to estimate the insertion times of LTR retrotransposons using the Inpactor pipeline.
Noncoding RNA prediction
The tRNA genes were detected with tRNAscan-SE using general eukaryote parameters. Ribosomal RNA (rRNA) genes were also identified. For miRNA prediction, we aligned the mature miRNA sequences in miRBase against the P. armeniaca genome with an e-value < 1e-5 and identity > 95%. Candidate sequences were extracted from the aligned regions together with 90 nucleotides of flanking sequence on each side and were used to predict RNA secondary structures using RNAfold with the default parameters. Based on the RNAfold analysis, candidate miRNAs were selected using the following criteria: (1) the candidate sequence was located in one of the hairpin precursor arms, (2) the minimum free energy of the hairpin structure was ≤ −20 kcal/mol, and (3) the hairpin was located in an intergenic region or intron. snRNAs and other ncRNAs were predicted using Infernal with the Rfam database.
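The three miRNA selection criteria above can be expressed as a simple filter. The following is a minimal sketch under stated assumptions: the `HairpinCandidate` record, its field names and the example candidates are hypothetical, and the energy cutoff assumes that the threshold in criterion (2) is a negative value of −20 kcal/mol applied as an upper bound.

```python
from dataclasses import dataclass


@dataclass
class HairpinCandidate:
    name: str
    mfe_kcal_per_mol: float  # minimum free energy reported by the folding step
    mature_in_arm: bool      # mature miRNA lies within one arm of the hairpin
    region: str              # e.g. "intergenic", "intron" or "exon"


# Assumption: the energy criterion is read as MFE <= -20 kcal/mol.
MFE_CUTOFF = -20.0
ALLOWED_REGIONS = {"intergenic", "intron"}


def passes_filters(candidate: HairpinCandidate) -> bool:
    """Apply criteria (1) arm location, (2) free energy and (3) genomic context."""
    return (
        candidate.mature_in_arm
        and candidate.mfe_kcal_per_mol <= MFE_CUTOFF
        and candidate.region in ALLOWED_REGIONS
    )


# Hypothetical candidates, for illustration only.
candidates = [
    HairpinCandidate("cand1", -35.2, True, "intergenic"),  # passes all three
    HairpinCandidate("cand2", -12.0, True, "intron"),      # hairpin too unstable
    HairpinCandidate("cand3", -40.1, False, "intron"),     # mature miRNA not in an arm
    HairpinCandidate("cand4", -28.7, True, "exon"),        # overlaps coding sequence
]
kept = [c.name for c in candidates if passes_filters(c)]
print(kept)
```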
Gene prediction
Transcriptome-based, homology-based and ab initio prediction methods were applied to predict gene models. For transcriptome-based prediction, the nonredundant, full-length transcripts from the de novo assembly were aligned to the genome to resolve gene structures using PASA. The transcripts from the genome-referenced assembly were processed with TransDecoder to obtain reliable transcripts with the longest open reading frames. For homology-based prediction, the protein sequences of Amygdaloideae genomes were aligned to the genome by SPALN with the default settings. The alignments were extended by 1 kb on each side of the hits to identify start and/or stop codons. For ab initio gene prediction, the training set (the transcripts obtained from transcriptome-based prediction with complete 5′ and 3′ UTRs) was used to generate a hidden Markov model (HMM), and Augustus and SNAP were employed to predict gene models in the repeat-masked genome. We used the gene models obtained from the three approaches to generate consensus gene models with EVidenceModeler (EVM) and polished the gene models using full-length transcripts according to the following rules: (1) Potential mistakes caused by frame-shift problems were rectified. (2) Priority was given in the following order: SPALN (conserved), PASA (expressed), and EVM (predicted). (3) If genes overlapped on the same strand, the longer one was retained. (4) Gene overlaps with different strand orientations were allowed. (5) Gene nesting in another gene's intron was allowed. Finally, we identified the UTRs and alternative splicing of the models with PASA.
Gene functions
Gene functions were assigned by searching the predicted proteins against public databases, including the UniProt and KEGG (Kyoto Encyclopedia of Genes and Genomes) databases, with an e-value < 1e-5. We aligned the proteins against the InterPro database using InterProScan to identify protein domains and transmembrane helices and to assign gene ontology (GO) terms. Transcription factor (TF) identification was carried out using an online web resource (http://planttfdb.
Gene families and synteny
We collected the protein sequences of Prunus armeniaca L. and 13 other species (Prunus persica (L.) Batsch, Prunus avium (L.) L., Prunus mume (mei), Prunus dulcis Miller, Malus domestica Borkh., Fragaria vesca L., Rosa chinensis Jacq., Vitis vinifera L., Pyrus bretschneideri Rehder, Populus trichocarpa Torr., Oryza sativa L., Arabidopsis thaliana (L.) Heynh. and Amborella trichopoda Baill.) to analyze gene families. An all-to-all BLASTP analysis of proteins with a length ≥ 30 aa was performed with an e-value < 1e-5. Based on the BLASTP results, paralogous and orthologous genes were identified using OrthoFinder with an inflation value of 1.5. The all-to-all BLASTP results between P. armeniaca and P. persica, P. armeniaca and P. dulcis, and P. armeniaca and P. avium were extracted, and the orthologous gene blocks on the chromosomes or pseudomolecules were identified using MCscanX with the default parameters.
Phylogenetic reconstructions and divergence time
Phylogenetic reconstruction was performed based on 269 single-copy genes extracted from the gene family analysis. We used MAFFT to construct a protein alignment for each single-copy gene family and removed gaps from the alignments using trimAl. Protein alignments with a length ≥ 30 aa were concatenated for subsequent analyses. The best substitution model for the alignment was estimated using ModelFinder with the default settings. The maximum likelihood tree was constructed using IQ-TREE with the best-fit model JTT+F+R3 and 1000 bootstrap replicates.
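The concatenation step can be pictured as building one long row per species from the per-family alignments that survive the length filter. A minimal sketch follows; the function name, the `Alignment` type alias and the toy sequences are assumptions made for illustration, and the 30-column threshold mirrors the length cutoff described above.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # species name -> aligned (gap-trimmed) protein sequence


def concatenate_alignments(alignments: List[Alignment],
                           species: List[str],
                           min_length: int = 30) -> Alignment:
    """Join per-family alignments into a single supermatrix row per species."""
    kept = []
    for aln in alignments:
        lengths = {len(aln[sp]) for sp in species}
        if len(lengths) != 1:
            raise ValueError("all sequences in one alignment must have equal length")
        # Keep only alignments that are long enough after trimming.
        if lengths.pop() >= min_length:
            kept.append(aln)
    # Families are joined in input order so columns stay comparable across species.
    return {sp: "".join(aln[sp] for aln in kept) for sp in species}


# Toy single-copy families (placeholder residues, not real proteins).
species = ["apricot", "peach"]
families = [
    {"apricot": "M" * 40, "peach": "M" * 40},  # kept: 40 columns
    {"apricot": "K" * 12, "peach": "K" * 12},  # dropped: shorter than 30 columns
    {"apricot": "L" * 35, "peach": "V" * 35},  # kept: 35 columns
]
supermatrix = concatenate_alignments(families, species)
print({sp: len(seq) for sp, seq in supermatrix.items()})
```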
The divergence time of each node in the phylogenetic tree was estimated based on the JC69 model in the MCMCTree program from the PAML package. The usedata parameter was set to 1 so that the likelihood function was calculated in the normal way. For the clock parameter, correlated rates following a log-normal distribution were used. In total, MCMCTree was run for 1,000,000 generations, with a burn-in of 10,000 iterations to reach a stable state. Three reported divergence times were used as calibrations: those between the Amygdaleae and Maleae (48.4 Ma), between P. trichocarpa and A. thaliana (100–120 Ma), and between monocots and eudicots (~240 Ma).
Gene family expansion and contraction analysis
The gene family count profile used as the input file for CAFE was obtained with OrthoFinder. The phylogenetic tree generated by IQ-TREE was converted to an ultrametric tree using r8s. The λ value was estimated with the CAFE program to identify the expansion or contraction of gene families based on a stochastic birth and death process model. We only considered gene families that were significantly expanded or contracted, with p values smaller than 0.01. Expansion and contraction were both assessed relative to the recent common ancestor (RCA) of the species concerned. We considered a gene family to be unchanged if the species and its RCA had the same gene copy number.
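The decision rule described here, comparing a species' copy number with that of its recent common ancestor and keeping only significant changes, can be sketched as follows. The function name and the example p values are hypothetical; in the actual analysis the ancestral counts and p values come from the birth and death model, which this sketch does not implement.

```python
def classify_family(species_copies: int, ancestor_copies: int,
                    p_value: float, alpha: float = 0.01) -> str:
    """Label one gene family relative to the recent common ancestor (RCA)."""
    if species_copies == ancestor_copies:
        return "unchanged"
    if p_value >= alpha:
        # A copy-number difference exists but does not pass the p < 0.01 filter.
        return "not significant"
    return "expanded" if species_copies > ancestor_copies else "contracted"


# Hypothetical families: (copies in species, copies in RCA, p value).
examples = {
    "family_A": (3, 2, 0.004),
    "family_B": (1, 4, 0.0001),
    "family_C": (5, 5, 1.0),
    "family_D": (3, 2, 0.2),
}
labels = {name: classify_family(*values) for name, values in examples.items()}
print(labels)
```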
Synonymous substitutions per synonymous site (Ks)
Orthologous and paralogous genes among P. armeniaca, P. persica, P. avium, P. mume, M. domestica and P. bretschneideri were extracted from the gene families. Protein alignments were constructed using MAFFT and converted into the corresponding CDS (nucleotide sequence) alignments. Then, the Ks for each pair of orthologous genes and paralogous genes was calculated using codeml (CodonFreq = 2, runmodel = 2) in the PAML package.
RNA-seq and WGCNA analysis
For genome-referenced mapping, the high-quality RNA-seq reads ("Chuanzhihong" and "Dabaixing"; G1, G2, CT and FR) were aligned to the genome using Hisat2, and the transcripts of each sample were built using Stringtie with the default parameters. The fragments per kilobase of transcript per million fragments mapped (FPKM) for each gene were calculated using Stringtie with the -G parameter, employing the genome gff3 annotation as the reference. Differentially expressed genes (DEGs) were identified using the edgeR R package (FDR < 0.05 together with a logFC cutoff). The expression levels of the genes involved in carotenoid metabolism and of the genes encoding transcription factors were used to construct a correlation network with the WGCNA R package.
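For reference, the FPKM unit normalizes a fragment count by both transcript length (in kilobases) and library size (in millions of mapped fragments). The sketch below shows only this textbook definition with made-up numbers; the function name is an assumption, and an assembler's internal estimate is derived from read coverage and may differ in detail.

```python
def fpkm(fragments: float, transcript_length_bp: float,
         total_mapped_fragments: float) -> float:
    """Fragments per kilobase of transcript per million fragments mapped."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # Dividing by (length / 1e3) and (library size / 1e6) gives the 1e9 factor.
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Made-up example: 500 fragments on a 2 kb transcript in a 20-million-fragment library.
example_value = fpkm(500, 2000, 20_000_000)
print(example_value)
```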
Genome assembly
The genome of apricot (P. armeniaca L.) (2n = 16) is small but highly heterozygous. The genome size and heterozygosity of P. armeniaca were estimated to be 220.36–220.56 Mb and 0.900–0.902%, respectively, according to evaluation with GenomeScope (best k-mer = 61, obtained with KmerGenie). After the purging of haplotigs, we obtained a haplotype assembly of 444 contigs with a total size of 221.9 Mb and a contig NG50 size of 1.02 Mb (Table 1). A total of 92.88% of the assembly was anchored to eight linkage groups using linkage maps, and the pseudomolecules ranged in size from 18.6 to 43.0 Mb (Fig. 1, Table S3). The Illumina reads were remapped to the genome, and single nucleotide polymorphisms were called to estimate the level of heterozygosity (0.96%). Fig. 1 shows that gene density and GC content were uniformly distributed across the eight chromosomes, whereas repeat density was not uniformly distributed either across the whole genome or within each chromosome.
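N50 and NG50 differ only in the total against which contiguity is measured: N50 uses the summed length of the assembly, whereas NG50 uses an estimated genome size. A minimal sketch, with a hypothetical function name and toy contig lengths rather than the apricot contigs:

```python
from typing import Iterable, Optional


def nx50(lengths: Iterable[int], reference_size: Optional[int] = None) -> int:
    """Return N50, or NG50 when an estimated genome size is supplied.

    The result is the length of the contig at which the running total of
    contig lengths, sorted from longest to shortest, first reaches half of
    the assembly size (N50) or half of reference_size (NG50).
    """
    ordered = sorted(lengths, reverse=True)
    total = reference_size if reference_size is not None else sum(ordered)
    target = total / 2
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return 0  # the assembly covers less than half of the reference size


toy_contigs = [80, 70, 50, 40, 30, 20]        # sums to 290
n50 = nx50(toy_contigs)                        # half of 290 is reached at the 70 contig
ng50_large = nx50(toy_contigs, 400)            # larger genome estimate lowers the value
ng50_small = nx50(toy_contigs, 150)            # smaller genome estimate raises the value
print(n50, ng50_large, ng50_small)
```

When the genome size estimate is smaller than the assembly, NG50 can only equal or exceed N50. The slightly higher NG50 than N50 reported for the assembly in Table 1 would be consistent with a genome size estimate just below the 221.9 Mb assembly size, although the exact value used is not stated here.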
Assembly validation
To assess the quality of the apricot genome, we aligned the Illumina clean data to the assembly and obtained a mapping ratio of 99.36%. We also quantified the coverage by the PacBio data, which was 99.87%. In addition, we aligned resequenced Prunus sibirica Illumina paired-end reads (SRR5046735) to the assembled genome and found that 98.69% of the reads could be mapped. The alignment rates of the RNA-seq reads from three different tissues (flower, fruit and seed) ranged from approximately 91.45% to 96.45% (Table S4). The BUSCO analysis showed that 98.0% of the complete genes could be detected in the assembly (Table S5).
Repeat annotation
Among the predicted repeats in the apricot genome, long terminal repeats (LTRs) comprised the largest proportion (13.43%) (Table S6). Unclassified elements ranked second, accounting for 12.17% of the genome. DNA-class repeat elements comprised 9.50% of the genome. The phylogenetic tree of LTR retrotransposons showed that the repeat elements clustered into four groups: Gypsy, Copia, retrovirus and others (Fig. S3). The mean divergence times of Gypsy and Copia were 0.97 and 0.88 Mya (million years ago), respectively, and both groups exhibited recent active transposition events (Fig. S4). Altogether, 38.28% of the genome was predicted to be repetitive, which is comparable with the repeat content observed in mume (45.0%) and sweet cherry (43.8%) but higher than that observed in peach (29.6%).
Gene prediction and functional annotation
A total of 30,436 protein-coding genes, with an average transcript length of 1641 bp, were predicted using a combination of homology-based, ab initio and transcriptome-based prediction methods (Table S7). The average gene density of apricot was 137 genes per Mb, which is higher than in peach (122 genes per Mb), mume (132 genes per Mb), sweet cherry (87 genes per Mb) and almond (112 genes per Mb). We identified 905 ribosomal RNAs (5S, 5.8S, 18S and 28S), 488 transfer RNAs, 353 small nuclear RNAs and 278 microRNAs (Table S8). The proportions of all gene models annotated to the Nr, InterPro, KEGG, GO, UniProt and transmembrane prediction (TMHMM) databases were 99.17%, 86.07%, 43.40%, 54.59%, 71.53% and 23.51%, respectively (Table S9). We also detected 1363 transcription factors in the apricot genome (Table S10).
Table 1 Genome features of P. armeniaca.
Feature              Assembly       Pseudomolecules
Size (bp)            221,901,797    206,096,285
Number               444            8
NG50 (bp)            1,020,063      25,125,992
N50 (bp)             1,018,044      25,125,992
GC content (%)       37.6           37.42
Maximum size (bp)    5,999,228      42,984,470
Minimum size (bp)    1159           18,857,615
Mean size (bp)       499,724        25,762,035
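Two of the summary figures can be recomputed directly from the sizes listed in Table 1: the gene density and the fraction of the assembly anchored to pseudomolecules. The short check below uses only numbers stated in the text; the function names are illustrative.

```python
def genes_per_mb(n_genes: int, size_bp: int) -> float:
    """Gene density in genes per megabase."""
    return n_genes / (size_bp / 1_000_000)


def percent_anchored(anchored_bp: int, assembly_bp: int) -> float:
    """Share of the assembly placed on pseudomolecules, in percent."""
    return 100 * anchored_bp / assembly_bp


PREDICTED_GENES = 30_436
ASSEMBLY_BP = 221_901_797          # Table 1, assembly size
PSEUDOMOLECULE_BP = 206_096_285    # Table 1, pseudomolecule size

density = genes_per_mb(PREDICTED_GENES, ASSEMBLY_BP)
anchored = percent_anchored(PSEUDOMOLECULE_BP, ASSEMBLY_BP)
print(round(density), round(anchored, 2))
```

Rounded, these reproduce the reported 137 genes per Mb and 92.88% anchoring, which suggests that the density was computed against the full assembly rather than the anchored portion.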
Genome evolution
The phylogenetic tree of apricot and related species was constructed using the maximum likelihood method, and the divergence times among branches of the tree were estimated (Fig. 2, Figs. S5, S6). The phylogenetic tree indicated that apricot was most closely related to P. mume (Japanese apricot) and that the two species split from their common ancestor ~5.53 million years ago. Among the four Prunus species, sweet cherry diverged earliest, at an estimated 10.92 million years ago (Fig. 2, Fig. S6).
A collinearity analysis of three closely related Prunus species (P. armeniaca, P. persica and P. dulcis) showed that the three species exhibited high collinearity. A total of 16,780 and 13,094 apricot genes were located in collinear blocks between apricot and peach and between apricot and almond, respectively (Fig. S7). Functional annotation showed that these collinear genes were mainly involved in the basic needs of organisms, including energy and other types of metabolism (Fig. S8).
Fig. 1 Genetic structure and variant density of apricot. (1) Pseudomolecules; (2) gene density; (3) GC content (per 100 Kb); (4) repeat density (per 100 Kb); (5) heatmap of the G1, G2, CT and FR stages.
Gene family analysis showed that during the evolution of apricot, 1324 gene families expanded and 981 families contracted, and 2300 apricot-specific genes arose (Fig. 2, Table S11). The genes from the expanded families were mainly enriched in phenylpropanoid biosynthesis (p = 0.0018) and flavonoid biosynthesis (p = 0.0019) (Fig. S8, Table S12). In addition, the citrate synthase family was expanded, with three copies in the apricot genome (the additional copy came from a recent species-specific tandem duplication event) and two in the other species in the Prunus genus. Gene expression analysis indicated that the three citrate synthase-encoding genes were highly expressed during fruit development. The functions of the apricot-specific genes were enriched in transport (sulfate transport (p = 6.0404e-9), anion transport (p = 6.0985e-6) and transmembrane transport (p = 5.1224e-5)) and in oxidation–reduction (p = 1.7853e-5). A total of 361 apricot-specific genes were transcriptionally active during at least one of the four fruit developmental stages, indicating that these genes may play an important role in the growth and development of apricot (Fig. S8, Table S13).
Although apricot has not experienced the whole-genome duplication events observed in apple and pear (Fig. S10), there were many large segmental duplication regions in the apricot genome. We identified 290 gene blocks in the apricot genome, involving a total of 2794 genes. These genes were mainly enriched in the plant–pathogen interaction (p = 0.00029) and phenylpropanoid biosynthesis (p = 0.0011) pathways (Fig. S8, Table S14).
MATHd evolution of PPV in Prunus
A comparative analysis of MATHd-orthologous regions in related Prunus species (P. armeniaca, P. avium, P. mume and P. persica) was performed to detail the evolutionary history of these important gene clusters (Fig. 3a). The species exhibited different copy numbers within these regions: 6 in P. armeniaca, 7 in P. mume, 7 in P. persica and 12 in P. avium. Phylogenetic tree analysis suggested that species-specific tandem duplication and perhaps gene loss events contributed to the architectural composition of these orthologous syntenic regions (Fig. 3b).
Carotenoid metabolism
Carotenoids play an important role in plant photosynthesis and lipid peroxidation and affect the color traits of fruits. We analyzed the dynamic changes in gene expression levels across four stages of apricot pulp development (G1, G2, CT and FR) (Fig. 4). Differentially expressed gene analysis showed that 2532, 4708, and 2033 genes differed between G1 and G2, G2 and CT, and CT and FR, respectively, of which 1, 9, and 3 genes were involved in carotenoid metabolism (Fig. 4, Fig. S11, Table S15A, B, C). Gene expression changed most markedly during the G2-to-CT phase of fruit ripening, especially for genes related to beta-carotene synthesis. The gene encoding LcyB (lycopene beta-cyclase), the key enzyme in the synthesis of carotene, was significantly upregulated, indicating that beta-carotene is rapidly synthesized during the CT phase. The genes encoding PSY (15-cis-phytoene synthase) and ZDS (zeta-carotene desaturase), which play an important role in the synthesis of the precursors of beta-carotene, were also upregulated between G2 and CT. Although the carotene content increased, there was no significant change in the expression of beta-carotene synthesis-related genes between CT and FR (Fig. 5), which indicates that high expression of the LcyB gene results in the accumulation of beta-carotene.
Fig. 2 Phylogenetic tree and gene family changes of apricot and related species.
We further explored the transcription factors involved
in the regulation of beta-carotene synthesis through
weighted gene coexpression network analysis (WGCNA)
(Figs. S12, S13). We identified 95 transcription factors
involved in the regulation of carotenoid metabolic
pathways, 12 of which were related to the LcyB gene (Table
S16). Among these 12 transcription factors, 9 were involved
in positive regulation, and 3 were involved in negative
regulation (Table S17).
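The positive/negative split reported above can be illustrated with a minimal sketch. It is not the WGCNA pipeline used in the study; it only shows how the sign of a Pearson correlation between a transcription factor's expression profile and that of LcyB could be used to label a candidate regulator. The gene names, expression values and the 0.8 cutoff are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def classify_regulators(target, candidates, min_abs_r=0.8):
    """Label each strongly correlated candidate by the sign of its correlation."""
    labels = {}
    for name, profile in candidates.items():
        r = pearson(profile, target)
        if abs(r) >= min_abs_r:  # assumed cutoff, not taken from the paper
            labels[name] = "positive" if r > 0 else "negative"
    return labels

# Hypothetical expression values across the four stages (G1, G2, CT, FR).
lcyb = [1.0, 2.0, 8.0, 9.0]
tfs = {
    "TF_a": [0.5, 1.1, 4.2, 4.4],  # rises with LcyB
    "TF_b": [7.9, 6.5, 1.2, 1.0],  # falls as LcyB rises
    "TF_c": [3.0, 1.0, 3.1, 0.9],  # no clear relationship
}
labels = classify_regulators(lcyb, tfs)
print(labels)
```

A correlation of this kind only suggests coexpression; it does not by itself establish that a transcription factor regulates LcyB.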
Fig. 3 MATHd proteins within apricot, P. mume, P. persica and P. avium. a Orthologous regions: P. armeniaca LG2 (8.28-8.32 Mb),
P. mume LG2 (33.32-33.41 Mb), P. persica Chr01 (8.66-8.60 Mb) and P. avium NW_018921264 (0.58-0.69 Mb); b the phylogenetic tree of
MATH proteins from apricot (blue), P. mume (green), P. persica (pink) and P. avium (dark blue).
The most important difference in the color of the white
and yellow cultivars was the beta-carotene content, but
DEG analysis showed no obvious changes in the expression
of genes related to beta-carotene synthesis (Fig. S14). We
found that the gene encoding the enzyme NCED
(9-cis-epoxycarotenoid dioxygenase), which catalyzes the
conversion of 9-cis-neoxanthin to xanthoxin, was highly
expressed, and its expression level was significantly
different between the two cultivars "Chuanzhihong" and
"Dabaixing" (Fig. S14). In the white cultivar, the
synthesis and decomposition of carotene were balanced, and
the newly synthesized carotenoids were converted into
xanthoxin, the precursor of abscisate, through enzyme
catalysis, especially by NCED, halting the accumulation
of carotene.
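The stage-to-stage and cultivar comparisons above rest on calling genes differentially expressed and then asking how many of them belong to the carotenoid pathway. The following is a minimal sketch of that bookkeeping, not the analysis performed in the study: the thresholds (absolute log2 fold change of at least 1, FDR below 0.05), the per-gene values and the pathway membership set are all assumptions chosen for illustration.

```python
def call_degs(results, min_abs_log2fc=1.0, max_fdr=0.05):
    """Return gene IDs passing both the fold-change and FDR thresholds."""
    return {
        gene
        for gene, (log2fc, fdr) in results.items()
        if abs(log2fc) >= min_abs_log2fc and fdr < max_fdr
    }

# Hypothetical per-gene results for one comparison: (log2 fold change, FDR).
g2_vs_ct = {
    "LcyB": (3.2, 0.001),
    "PSY": (1.8, 0.010),
    "ZDS": (1.4, 0.020),
    "NCED": (0.3, 0.600),    # small, non-significant change
    "geneX": (-2.5, 0.0004),  # significant, but outside the pathway
    "geneY": (2.0, 0.200),    # large change that fails the FDR cutoff
}
# Hypothetical pathway membership; not the annotation used in the paper.
carotenoid_genes = {"LcyB", "PSY", "ZDS", "NCED"}

degs = call_degs(g2_vs_ct)
carotenoid_degs = sorted(degs & carotenoid_genes)
print(len(degs), carotenoid_degs)
```

Intersecting the DEG set with a pathway gene set is what yields counts such as "1, 9, and 3 genes involved in carotenoid metabolism" for each comparison.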
Discussion
Fruit trees usually exhibit high heterozygosity, which
makes it more difficult to obtain high-quality complete
genomes, and this effect is particularly evident in the
released genomes of stone fruit trees. In the assembly of
the apricot genome, to overcome the problem caused by
heterozygosity, we first assembled the Illumina data into
super-reads by using MaSuRCA, and the super-reads and
corrected PacBio subreads were then assembled into
contigs constituting the diploid genome. After purging
haplotigs, the apricot contig N50 size was 1,018,044 bp,
and the contig N50 sizes of peach, cherry, mume and
almond were 294, 276, 31,772 and 77,040 bp, respectively,
which suggested that the P. armeniaca reference genome
was the most contiguous among the sequenced stone fruit
genomes. The BUSCO analysis showed that 98.0% of
complete genes were detected in the assembly, which
indicated that the genome quality of apricot was better
than those of the other published stone fruit genomes.
In brief, these results indicate that the genome of apricot
is relatively accurate and complete among the available
genomes of stone fruit trees.
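The contiguity comparison above relies on the N50 statistic. As a minimal sketch of how it is usually computed (the contig lengths below are toy values, not data from any of the assemblies discussed): contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the total assembled length. Passing an estimated genome size instead gives NG50, the variant reported in the abstract; the `genome_size` parameter name is an assumption of this sketch.

```python
def n50(lengths, genome_size=None):
    """Return N50 of contig lengths, or NG50 if genome_size is given.

    Returns None if the contigs never reach half of the reference total
    (possible only for NG50 or an empty input).
    """
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return None

contigs = [80, 70, 50, 40, 30, 20, 10]  # toy lengths, total 300
print(n50(contigs))                     # N50 against the assembly total
print(n50(contigs, genome_size=400))    # NG50 against an assumed genome size
```

Because N50 is normalized by each assembly's own total length, NG50 is the fairer figure when assemblies of different completeness are compared against the same genome size estimate.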
Fig. 4 Carotenoid metabolism pathway of apricot. The expression
levels of genes are shown in a bar chart with different colors
corresponding to different stages. A black asterisk indicates a
significant change (p < 0.05) between two adjacent stages.
Fig. 5 Changes in the color and beta-carotenoid content of
apricot fruits in different developmental stages.
Plum pox virus (PPV), the cause of sharka disease, is a
linear single-stranded RNA virus that affects Prunus
species. The selection of PPV-resistant genetic resources
in Prunus germplasms is an important approach for
resistance breeding that is currently effective against
PPV; with the breakthroughs regarding anti-PPV genes,
biotechnological strategies have been applied or may be
exploited to confer PPV resistance. Recent studies have
shown that apricot resistance to PPV is associated with the
pseudogenization/downregulation of two tandemly
duplicated MATHd genes. By comparing the changes in
MATHd orthologues in Prunus, we found that the associated
regions were vertically inherited from the ancestor
of Prunus species and that at least two tandemly arrayed
copies have been retained in each species; the loss of the
MATHd genes may result in susceptibility to PPV.
Considering these results together, it will be interesting
to test the roles of these genes in PPV infection through
molecular studies, genetic association studies and
molecular breeding within other Prunus species.
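To make "tandemly arrayed copies" concrete, the sketch below shows one simple way such arrays could be flagged from an ordered gene list. It is an illustration under stated assumptions, not the synteny and phylogenetic analysis used in the study: the gene IDs, positions, family labels and the `max_gap` tolerance are hypothetical.

```python
def tandem_arrays(genes, family, max_gap=1):
    """Find runs of same-family genes that sit next to each other.

    genes: list of (chromosome, order_index, gene_id, family_label),
           where order_index is the gene's rank along the chromosome.
    max_gap: number of intervening genes tolerated between two members.
    Returns a list of arrays, each a list of gene IDs (two or more members).
    """
    members = sorted(g for g in genes if g[3] == family)
    arrays, current = [], []
    for gene in members:
        chrom, idx = gene[0], gene[1]
        same_run = (
            current
            and chrom == current[-1][0]
            and idx - current[-1][1] <= max_gap + 1
        )
        if same_run:
            current.append(gene)
        else:
            if len(current) >= 2:
                arrays.append([g[2] for g in current])
            current = [gene]
    if len(current) >= 2:
        arrays.append([g[2] for g in current])
    return arrays

# Hypothetical annotation: four family members on LG2 and one on LG5.
genes = [
    ("LG2", 10, "g10", "MATHd"),
    ("LG2", 11, "g11", "MATHd"),
    ("LG2", 12, "g12", "other"),
    ("LG2", 13, "g13", "MATHd"),
    ("LG2", 40, "g40", "MATHd"),
    ("LG5", 3, "g03", "MATHd"),
]
print(tandem_arrays(genes, "MATHd"))
```

Allowing a small gap keeps an array intact when an unrelated gene has been inserted between two copies; setting `max_gap=0` requires strictly adjacent members.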
The balance of the biosynthesis and decomposition of
β-carotene may contribute to the color of apricot fruit.
First, β-carotene, as one of the metabolites important for
fruit quality, contributes to the yellow color of apricot.
Among fruit pigments, anthocyanins are the
main pigments contributing to the red, pink or
violet coloration of some fruit, such as apple, tomato,
purple sweet potato, and strawberry. However, in
apricot fruit, β-carotene is detectable and shows
significantly higher levels in the yellow flesh of
"Chuanzhihong" than in the white flesh of "Dabaixing". These
results showed that β-carotene is the main pigment of
apricot with yellow flesh (Fig. 5), which is supported by
studies from Curl and Roussos et al. There are also
many other fruit species with high levels of β-carotene,
such as citrus, carrot, mango, papaya, and tomato.
Moreover, the expression patterns of β-carotene
synthesis genes in plants are tissue- and stage-dependent.
Psy, pds, zds, lcy-e, crt-b, zep and nced3 are all expressed
in coffee leaves, flowers and shoots, but the transcript
levels differ among the three tissues. For tomato, citrus,
watermelon and other fruit-type crops, the genes related
to β-carotene biosynthesis appear to show the highest
transcript levels, and rapid β-carotene accumulation is
mainly observed, in the nearly mature stage. Lycopene
and β-carotene rapidly accumulate in the flesh of
"Cara Cara" citrus fruit during the two stages of fruit
enlargement and fruit ripening. The present study indicated
that the G2 to CT stages may be the corresponding key period
in apricot. As described in the literature, the color change
from green to red or yellow is very important in fruit
development as anthocyanins or carotenoids accumulate,
in addition to sugar, abscisic acid (ABA) and ethylene
(ETH), to promote fruit ripening. A similar result was
observed in this study (Fig. 5). During the G2 to CT phase
of fruit ripening, the expression levels of genes changed
more significantly, especially those of genes related to
β-carotene synthesis.
Multiple metabolic pathways are involved in this stage to
affect fruit development and ripening, including
β-carotene biosynthesis. The β-carotene content of
"Chuanzhihong" rapidly increased to a high level
beginning in the CT stage and reached its highest level at
the FR stage, which indicates maturity of the fruit.
Furthermore, all the DEGs related to the β-carotene
biosynthesis pathway were analyzed. In apricot, lcy-b may
make an important contribution to yellow flesh development
in the ripening fruit of "Chuanzhihong". As shown
in Fig. S10, lcy-b was significantly upregulated, indicating
that β-carotene was undergoing rapid synthesis during the
CT phase, which is similar to the trend in flesh
color. In contrast, for the other three important genes
for β-carotene biosynthesis (psy, pds, zds), the
expression levels were not consistent with the accumulation
of β-carotene. Previous studies have indicated
that the gene expression changes controlling β-carotene
biosynthesis in plants among different species or different
varieties are very complex. However, it is interesting that,
in the white cultivar "Dabaixing", the transcript level of
the NCED gene is much higher during fruit development,
especially in the last two stages, CT and FR, which is
contrary to what is observed in the yellow cultivar
"Chuanzhihong" (Fig. S13). The newly synthesized
carotenoids are rapidly converted into xanthoxin, the
precursor of abscisate, through enzyme catalysis, especially
that by NCED, halting the accumulation of carotene.
Thus, the balance of the biosynthesis and decomposition
of β-carotene may contribute to the color of apricot fruit,
providing the first report of the likely mechanism of fruit
color development in apricot. Further research on the
characterization and functional identification
of lcy-b and NCED, as well as the possible transcriptional
regulation mechanism of β-carotene in apricot,
will be carried out, which is valuable for basic research
and the future breeding of new apricot varieties rich in
β-carotene.
Conclusion
In this study, we provide the first report of the sequencing,
assembly and annotation of the apricot (Prunus armeniaca L.)
genome, along with the significant evolutionary features
of the main Prunus species. The MATHd genes were
shown to be vertically inherited from the ancestor of the
Prunus species, with at least two tandemly arrayed
copies retained in apricot, cherry, mume and peach. Changes
in the expression of the NCED (9-cis-epoxycarotenoid
dioxygenase) gene, rather than the LcyB (lycopene
beta-cyclase) gene, result in a low β-carotenoid
content in the white cultivar "Dabaixing". The
chromosome-scale assembly of apricot will provide more
important gene resources for future studies on stone fruit
crops, which is also valuable for efficiently screening
functional genes related to agronomic traits as well as for
GWAS (genome-wide association study) analysis and fine
QTL mapping. Taken together, the results of this study
indicate that it is feasible to use this genome as a tool for
improving breeding strategies.
Acknowledgements
This work was supported by the research of the National Key R&D Program of
China (2018YFD1000606-4), the Beijing Academy of Agriculture and Forestry
Fund for Young Scholars (QNJJ201702, QNJJ201925), the National Natural
Science Foundation of China (31401836), and the Municipal Natural Science
Foundation of Beijing (6162012).
Author details
Beijing Academy of Forestry and Pomology Sciences, 100093 Beijing, PR
China. Apricot Engineering and Technology Research Center, National
Forestry and Grassland Administration, 100093 Beijing, PR China.
Laboratory of Genome Sciences and Information, Beijing Institute of Genomics,
Chinese Academy of Sciences, 100101 Beijing, China.
Author contributions
Y.W. and H.S. conceived and designed the project. J.Z. collected and extracted
the genomic DNA and fruit RNA with assistance from F.J., L.Y., H.S. and M.Z. S.H.
and Y.L. supervised the bioinformatics analysis; S.W., F.J. and S.G. assembled the
apricot genome; F.J. and S.W. performed gene annotation and gene family
analysis; S.W., F.J. and Y.L. performed the analysis of transcriptome and PPV-
resistant genes; F.J., J.Z. and S.W. wrote the draft manuscript. All authors read
and approved the final version of the manuscript.
Data availability
The data that support the findings of this study have been deposited in the
CNSA of CNGBdb with accession number
CNP0000755. The genome assembly data have been deposited in the Genome
Database for Rosaceae.
Conflict of interest
The authors declare that they have no conflict of interest.
Supplementary Information accompanies this paper.
Received: 21 June 2019 Revised: 9 October 2019 Accepted: 23 October 2019
References
1. Folta, K. M. & Gardiner, S. E. Genetics and Genomics of Rosaceae (Springer, 2009).
2. Benichou, M. et al. Postharvest technologies for shelf life enhancement of temperate fruits. in Postharvest Biology and Technology of Temperate Fruits (eds Shabir Ahmad, M., Manzoor Ahmad, S. & Mohammad Maqbool, M.) 77–100 (Springer, 2018).
3. Kearney, J. Food consumption trends and drivers. Philos. Trans. R. Soc. B: Biol. Sci. 365, 2793–2807 (2010).
4. De Candolle, A. Origin of Cultivated Plants (Reprint 1964). (Hafner Publishing, New York, 1886).
5. Faust, M., Suranyi, D. & Nyujto, F. Origin and dissemination of apricot. Hortic. Res. 22, 225–260 (1998).
6. Zhebentyayeva, T., Ledbetter, C., Burgos, L. & Llácer, G. Apricot. Fruit Breeding 415–458 (Springer, 2012).
7. Zuriaga, E. et al. Genomic analysis reveals MATH gene(s) as candidate(s) for Plum pox virus (PPV) resistance in apricot (Prunus armeniaca L.). Mol. Plant Pathol. 14, 663–677 (2013).
8. Sass-Kiss, A., Kiss, J., Milotay, P., Kerek, M. & Toth-Markus, M. Differences in anthocyanin and carotenoid content of fruits and vegetables. Food Res. Int. 38, 1023–1029 (2005).
9. Dragovic-Uzelac, V., Levaj, B., Mrkic, V., Bursac, D. & Boras, M. The content of polyphenols and carotenoids in three apricot cultivars depending on stage of maturity and geographical region. Food Chem. 102, 966–975 (2007).
10. Roussos, P. A. et al. Apricot (Prunus armeniaca L.). in Nutritional composition of fruit cultivars (eds Simmonds, M. & Preedy, V.) 19–48 (Academic press,
11. Akin, E. B., Karabulut, I. & Topcu, A. Some compositional properties of main Malatya apricot (Prunus armeniaca L.) varieties. Food Chem. 107, 939–948
12. Ali, S., Masud, T. & Abbasi, K. S. Physico-chemical characteristics of apricot (Prunus armeniaca L.) grown in northern areas of Pakistan. Sci. Horticulturae 130, 386–392 (2011).
13. Vardi, N. et al. Potent protective effect of apricot and β-carotene on methotrexate-induced intestinal oxidative damage in rats. Food Chem. Toxicol. 46, 3015–3022 (2008).
14. Vardi, N., Parlakpinar, H., Ates, B., Cetin, A. & Otlu, A. The protective effects of Prunus armeniaca L (apricot) against methotrexate-induced oxidative damage and apoptosis in rat kidney. J. Physiol. Biochem. 69, 371–381 (2013).
15. Shulaev, V. et al. Multiple models for Rosaceae genomics. Plant Physiol. 147, 985–1003 (2008).
16. Verde, I. et al. The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat. Genet. 45, 487 (2013).
17. Zhang, Q. et al. The genome of Prunus mume. Nat. Commun. 3, 1318 (2012).
18. Shirasawa, K. et al. The genome sequence of sweet cherry (Prunus avium) for use in genomics-assisted breeding. DNA Res. 24, 499–508 (2017).
19. Sánchez-Pérez, R. et al. Mutation of a bHLH transcription factor allowed almond domestication. Science 364, 1095–1098 (2019).
20. Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017).
21. Andrews, S. FastQC. A quality control tool for high throughput sequence data. (2010).
22. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
23. Chikhi, R. & Medvedev, P. Informed and automated k-mer size selection for genome assembly. Bioinformatics 30, 31–37 (2013).
24. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
25. Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
26. Zimin, A. V. et al. The MaSuRCA genome assembler. Bioinformatics 29, 2669–2677 (2013).
27. Roach, M. J., Schmidt, S. A. & Borneman, A. R. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinforma. 19, 460 (2018).
28. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
29. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963 (2014).
30. Tang, H., Krishnakuar, V. & Li, J. jcvi: JCVI utility libraries. Zenodo. 10.5281/zenodo.31631 (2015).
31. Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494 (2013).
32. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Meth. 12, 357 (2015).
33. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290 (2015).
34. Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
35. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
36. Kriventseva, E. V. et al. OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res. 47, D807–D811 (2018).
37. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinforma. 25, 14 (2009). 4.10. 11-
38. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
39. Orozco-Arias, S. et al. Inpactor, integrated and parallel analyzer and classifier of LTR retrotransposons and its application for pineapple LTR retrotransposons diversity and dynamics. Biology 7, 32 (2018).
40. Lowe, T. M. & Chan, P. P. tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes. Nucleic Acids Res. 44, W54–W57 (2016).
41. Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007).
42. Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6, 26 (2011).
43. Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).
44. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
45. Iwata, H. & Gotoh, O. Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features. Nucleic Acids Res. 40, e161–e161 (2012).
46. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).
47. Korf, I. Gene finding in novel genomes. BMC Bioinforma. 5, 59 (2004).
48. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
49. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
50. Emms, D. M. & Kelly, S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, 157 (2015).
51. Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49–e49
52. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
53. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
54. Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K., Von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Meth. 14, 587 (2017).
55. Nguyen, L. T., Schmidt, H. A., Von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2014).
56. Yang, Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Bioinformatics 13, 555–556 (1997).
57. Hohmann, N., Wolf, E. M., Lysak, M. A. & Koch, M. A. A time-calibrated road map of Brassicaceae species radiation and evolutionary history. Plant Cell 27, 2770–2784 (2015).
58. Tuskan, G. A. et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313, 1596–1604 (2006).
59. Jaillon, O. et al. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449, 463 (2007).
60. De Bie, T., Cristianini, N., Demuth, J. P. & Hahn, M. W. CAFE: a computational tool for the study of gene family evolution. Bioinformatics 22, 1269–1271
61. Sanderson, M. J. r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics 19, 301–302 (2003).
62. Robinson, M. D., Mccarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
63. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinforma. 9, 559 (2008).
64. Rubio, M. et al. Gene expression analysis of Plum pox virus (Sharka) susceptibility/resistance in apricot (Prunus armeniaca L.). PLoS ONE 10, e0144670
65. Ilardi, V. & Tavazza, M. Biotechnological strategies and tools for Plum pox virus resistance: trans-, intra-, cis-genesis, and beyond. Front. Plant Sci. 6, 379 (2015).
66. Zuriaga, E., Romero, C., Blanca, J. M. & Badenes, M. L. Resistance to Plum Pox Virus (PPV) in apricot (Prunus armeniaca L.) is associated with down-regulation of two MATHd genes. BMC Plant Biol. 18, 25 (2018).
67. García-Gómez, B. E., Salazar, J. A., Dondini, L., Martínez-Gómez, P. & Ruiz, D. Identification of QTLs linked to fruit quality traits in apricot (Prunus armeniaca L.) and biological validation through gene expression analysis using qPCR. Mol. Breed. 39, 28 (2019).
68. García-Gómez, B. et al. Comparative analysis of SSR markers developed in exon, intron, and intergenic regions and distributed in regions controlling fruit quality traits in Prunus species: genetic diversity and association studies. Plant Mol. Biol. Report. 36, 23–35 (2018).
69. Winkel-Shirley, B. Flavonoid biosynthesis. A colorful model for genetics, biochemistry, cell biology, and biotechnology. Plant Physiol. 126, 485–493 (2001).
70. Espley, R. V. et al. Red colouration in apple fruit is due to the activity of the MYB transcription factor, MdMYB10. Plant J. 49, 414–427 (2007).
71. Gonzali, S., Mazzucato, A. & Perata, P. Purple as a tomato: towards high anthocyanin tomatoes. Trends Plant Sci. 14, 237–241 (2009).
72. Carbone, F. et al. Developmental, genetic and environmental factors affect the expression of flavonoid genes, enzymes and metabolites in strawberry fruits. Plant, Cell Environ. 32, 1117–1131 (2009).
73. Pons, E. et al. Metabolic engineering of β-carotene in orange fruit increases its in vivo antioxidant properties. Plant Biotechnol. J. 12, 17–27 (2014).
74. Rinaldo, A. R. et al. A grapevine anthocyanin acyltransferase, transcriptionally regulated by VvMYBA, can produce most acylated anthocyanins present in grape skins. Plant Physiol. 169, 1897–1916 (2015).
75. Borghesi, E. et al. Comparative physiology during ripening in tomato rich-anthocyanins fruits. Plant Growth Regul. 80, 207–214 (2016).
76. Yao, G. et al. Map-based cloning of the pear gene MYB114 identifies an interaction with other transcription factors to coordinately regulate fruit anthocyanin biosynthesis. Plant J. 92, 437–451 (2017).
77. Curl, A. L. The carotenoids of apricots. J. Food Sci. 25, 190–196 (1960).
78. Schweiggert, R. M., Mezger, D., Schimpf, F., Steingass, C. B. & Carle, R. Influence of chromoplast morphology on carotenoid bioaccessibility of carrot, mango, papaya, and tomato. Food Chem. 135, 2736–2742 (2012).
79. Simkin, A. J., Kuntz, M., Moreau, H. & Mccarthy, J. Carotenoid profiling and the expression of carotenoid biosynthetic genes in developing coffee grain. Plant Physiol. Biochem. 48, 434–442 (2010).
80. Carrari, F. & Fernie, A. R. Metabolic regulation underlying tomato fruit development. J. Exp. Bot. 57, 1883–1897 (2006).
81. Alquezar, B., Rodrigo, M. J. & Zacarías, L. Regulation of carotenoid biosynthesis during fruit maturation in the red-fleshed orange mutant Cara Cara. Phytochemistry 69, 1997–2007 (2008).
82. Dou, Jl et al. Effect of ploidy level on expression of lycopene biosynthesis genes and accumulation of phytohormones during watermelon (Citrullus lanatus) fruit development and ripening. J. Integr. Agr. 16, 1956–1967 (2017).
83. Ecarnot, M., Bączyk, P., Tessarotto, L. & Chervin, C. Rapid phenotyping of the tomato fruit model, Micro-Tom, with a portable VIS-NIR spectrometer. Plant Physiol. Biochem. 70, 159–163 (2013).
84. Su, L. et al. Carotenoid accumulation during tomato fruit ripening is modulated by the auxin-ethylene balance. BMC Plant Biol. 15, 114 (2015).
85. Wang, Q. H. et al. Transcriptome analysis around the onset of strawberry fruit ripening uncovers an important role of oxidative phosphorylation in ripening. Sci. Rep. 7, 41477 (2017).
Jiang et al. Horticulture Research (2019) 6:128 Page 12 of 12
... Mb for P. persica,~284.2 Mb for P. salicina,~658.9 Mb for M. domestica, and~503 Mb for R. chinensis [46,[51][52][53][54][55][56]. This phenomenon suggests that there is no direct correlation between the number of MPK and MKK genes and plant genome size. ...
... accessed on 6 October 2022); the genomes of P. mume var. tortuosa [50] and five other Rosaceae species, including Prunus armeniaca [52], Prunus persica [53], Prunus salicina [54], Malus domestica [55] and Rosa chinensis [56] were downloaded from the Genome Database for Rosaceae (https://www., ...
Full-text available
Protein kinases of the MAPK cascade family (MAPKKK–MAPKK–MAPK) play an essential role in plant stress response and hormone signal transduction. However, their role in the cold hardiness of Prunus mume (Mei), a class of ornamental woody plant, remains unclear. In this study, we use bioinformatic approaches to assess and analyze two related protein kinase families, namely, MAP kinases (MPKs) and MAPK kinases (MKKs), in wild P. mume and its variety P. mume var. tortuosa. We identify 11 PmMPK and 7 PmMKK genes in the former species and 12 PmvMPK and 7 PmvMKK genes in the latter species, and we investigate whether and how these gene families contribute to cold stress responses. Members of the MPK and MKK gene families located on seven and four chromosomes of both species are free of tandem duplication. Four, three, and one segment duplication events are exhibited in PmMPK, PmvMPK, and PmMKK, respectively, suggesting that segment duplications play an essential role in the expansion and evolution of P. mume and its gene variety. Moreover, synteny analysis suggests that most MPK and MKK genes have similar origins and involved similar evolutionary processes in P. mume and its variety. A cis-acting regulatory element analysis shows that MPK and MKK genes may function in P. mume and its variety’s development, modulating processes such as light response, anaerobic induction, and abscisic acid response as well as responses to a variety of stresses, such as low temperature and drought. Most PmMPKs and PmMKKs exhibited tissue-specifific expression patterns, as well as time-specific expression patterns that protect them through cold. In a low-temperature treatment experiment with the cold-tolerant cultivar P. mume ‘Songchun’ and the cold-sensitive cultivar ‘Lve’, we find that almost all PmMPK and PmMKK genes, especially PmMPK3/5/6/20 and PmMKK2/3/6, dramatically respond to cold stress as treatment duration increases. 
This study introduces the possibility that these family members contribute to P. mume’s cold stress response. Further investigation is warranted to understand the mechanistic functions of MAPK and MAPKK proteins in P. mume development and response to cold stress.
... and Succ., P. mandshurica (Maxim), P. brigantina Vill., and P. holosericeae Batal are also recognized [39]. Multiple genome sequences [40][41][42] have been reported since the first publication of the P. armeniaca ('Chuanzhihong') genome [43], which provide an opportunity to elucidate the connection between the genetic diversity and phenotypic variation for important agronomic traits in apricots. Further, the accessibility to genomes offers a more effective reference to explore transcriptional regulatory mechanisms of fruit color [43][44][45] and taste [45] in apricots. ...
... Multiple genome sequences [40][41][42] have been reported since the first publication of the P. armeniaca ('Chuanzhihong') genome [43], which provide an opportunity to elucidate the connection between the genetic diversity and phenotypic variation for important agronomic traits in apricots. Further, the accessibility to genomes offers a more effective reference to explore transcriptional regulatory mechanisms of fruit color [43][44][45] and taste [45] in apricots. Fruit size is one of the important quality traits in apricots [39], and multiple QTLs related to it have been obtained in recent reports [46][47][48]. ...
Full-text available
Fruit size is one of the essential quality traits and influences the economic value of apricots. To explore the underlying mechanisms of the formation of differences in fruit size in apricots, we performed a comparative analysis of anatomical and transcriptomics dynamics during fruit growth and development in two apricot cultivars with contrasting fruit sizes (large-fruit Prunus armeniaca ‘Sungold’ and small-fruit P. sibirica ‘F43’). Our analysis identified that the difference in fruit size was mainly caused by the difference in cell size between the two apricot cultivars. Compared with ‘F43’, the transcriptional programs exhibited significant differences in ‘Sungold’, mainly in the cell expansion period. After analysis, key differentially expressed genes (DEGs) most likely to influence cell size were screened out, including genes involved in auxin signal transduction and cell wall loosening mechanisms. Furthermore, weighted gene co-expression network analysis (WGCNA) revealed that PRE6/bHLH was identified as a hub gene, which interacted with 1 TIR1, 3 AUX/IAAs, 4 SAURs, 3 EXPs, and 1 CEL. Hence, a total of 13 key candidate genes were identified as positive regulators of fruit size in apricots. The results provide new insights into the molecular basis of fruit size control and lay a foundation for future breeding and cultivation of larger fruits in apricot.
... These results suggest that HD-Zip genes may also participate in carotenoid regulation, and the three up-regulated HD-Zip I genes may play positive roles in yellow-inner-leaf Chinese cabbage. A previous study found that the AtHB21 gene could regulate the NCED gene, which could decrease the carotenoid content [46,47,67] and promote ABA accumulation to improve heat stress tolerance [19]. BraA09g011460.3C was homologous to AtHB21 and was highly expressed after heat treatment and lowly expressed in high-carotenoid varieties, suggesting a regulatory effect on heat tolerance and the carotenoid content in B. rapa. ...
Full-text available
HD-Zip, a special class of transcription factors in high plants, has a role in plant development and responding to external environmental stress. Heat stress has always been an important factor affecting plant growth, quality, and yield. Carotenoid content is also an important factor affecting the color of the inner leaf blades of Chinese cabbage. In this study, the genomes of three Brassicaceae plants were selected: Chinese cabbage (Brassica rapa subsp. pekinensis), Brassica oleracea, and Brassica napus. We identified 93, 96, and 184 HD-Zip genes in the B. rapa, B. oleracea, and B. napus, respectively. The HD-Zip gene family was classified into four subfamilies based on phylogeny: I, II, III, and IV;. The results of cis-acting element analysis suggested that HD-Zip family genes may participate in various biological processes, such as pigment synthesis, cell cycle regulation, defense stress response, etc. Conserved motifs prediction revealed that three motifs exist among the four HD-Zip gene families and that different motifs exhibit significant effects on the structural differences in HD-Zips. Synteny, Ks, and 4DTv results displayed that genome-wide triplication events act in HD-Zip gene family expansion. Transcriptome data showed that 18 genes responded (>1.5-fold change) to heat stress in Chinese cabbage, and 14 of 18 genes were from the HD-Zip I subfamily. Three genes had up-regulation, and eight genes had down-regulation in high-carotenoid-content Chinese cabbage. The BraA09g011460.3C expression level was up-regulated after heat stress treatment and significantly reduced in varieties with high carotenoid content, indicating its potential for heat stress tolerance and carotenoid content regulation. This study provided important gene resources for the subsequent breeding of Chinese cabbage.
... Recently, chromosome-scale genomic sequences of many ornamental plants have been released. The genomes of many fruit trees that belong to the Prunus genus have also been released, such as apple Velasco et al., 2010), pear (Wu et al., 2013), peach (International Peach Genome et al., 2013), sweet cherry (Wang et al., 2020) and apricot (Jiang et al., 2019). Genomic resequencing of a large panel of germplasms was also conducted for a number of plants. ...
Key message: TFL1-like genes of the basal eudicot Platanus acerifolia have conserved roles in maintaining vegetative growth and inhibiting flowering, but may act through a distinct regulatory mechanism. Three TERMINAL FLOWER 1 (TFL1)-like genes were isolated and characterized from London plane tree (Platanus acerifolia). All genes have the conserved genomic organization and characteristics of the phosphatidylethanolamine-binding protein (PEBP) family. Sequence alignment and phylogenetic analysis indicated that two genes belong to the TFL1 clade, designated PlacTFL1a and PlacTFL1b, while the third was grouped in the BFT clade and named PlacBFT. qRT-PCR analysis showed that all three genes were primarily expressed in the vegetative phase, but the expression of PlacTFL1a was much higher and wider than that of PlacTFL1b, with the latter only detected at relatively low expression levels in apical and lateral buds in April. PlacBFT was mainly expressed in young stems of adult trees, followed by juvenile tissues. Ectopic expression of any TFL1-like gene in Arabidopsis produced phenotypes of delayed or repressed flowering. Furthermore, overexpression of the PlacTFL1a gene in petunia also resulted in extremely delayed flowering. In non-flowering 35:PlacTFL1a transgenic petunia plants, the FT-like gene (PhFT) was significantly up-regulated and the AP1 homologues PFG, FBP26 and FBP29 were significantly down-regulated in leaves. Yeast two-hybrid analysis indicated that only weak interactions were detected between PlacTFL1a and PlacFDL, and PlacTFL1a showed no interaction with PhFDL1/2. These results indicated that the TFL1-like genes of Platanus have conserved roles in repressing flowering, but probably act via a distinct regulatory mechanism.
... P. armeniaca (apricot) is an edible fruit mainly cultivated in Mediterranean climates (Hormaza 2002). Although it is grown all over the world except Antarctica, it is extensively cultivated in Turkey, Iran, Italy, Pakistan, France, Spain, the United States, and Morocco (Jiang et al. 2019). It is estimated that approximately 7.92 million tons of apricot is produced worldwide (Kılıç Topuz et al. 2018). ...
Prunus armeniaca (P. armeniaca) is a stone fruit that is widely consumed around the world with its appealing and delicious taste. Pectate lyases (Pels) cleave the α‑1,4‑glycosidic bond of polygalacturonic acid. Pels have a role in the development, ripening, and in providing intracellular entry by disrupting the integrity of the cell wall of pathogenic microorganisms. The three-dimensional structures of proteins and enzymes provide important data on their functional properties and catalytic roles. The protein structure of any Prunus is unknown. In this study, P. armeniaca protein homology models were modeled using ProMod3 and trRosetta deep learning algorithms in order to elucidate the molecular mechanism of ripening and cell wall degradation processes in which pectate lyase (Pel) plays a role. Substrate binding patterns were demonstrated by molecular docking. The (Pel) homology models from P. armeniaca were within x‑ray quality limits. The three-dimensional structure of P. armeniaca pectate lyase (PaPel) has been shown to have an unusual β‑folding formation that we have encountered in other Pels. We determined that ²⁹⁷RXPXXR³⁰² and ²³⁰WIDH²³³ residues preserved in known Pels were preserved in Pel from P. armeniaca. The ¹⁶⁸NVHI¹⁷¹ and ¹⁸⁹NVHI¹⁹² repeat sites in the vicinity of the catalytic site may be responsible for substrate stability. The lowest binding energy for the substrate was −4.18 kcal.mol⁻¹. The data presented in this study may provide an important model for elucidating the catalytic mechanism of development, ripening, and cell wall destruction processes of Prunus Pel.
Fruit production is an important part of the gross domestic product for many countries around the world especially to those who have a strong focus on agriculture. However, long-term maintenance and yield stability of fruit production may be threatened by the ongoing climate change and its consequences like extended drought periods, heavy rain events, and floodings. Genome editing, with its progressive technological developments, offers opportunities to adapt relevant fruit plant species to new climatic conditions. Among modern genome editing techniques, CRISPR/Cas, in particular, has the potential to support breeding for those fruit plant species with extended breeding cycles, e.g., perennial fruits. In this review, we discuss CRISPR/Cas and other genome editing techniques in detail and how these techniques can be applied to support the breeding of fruit plant species for adaptation to changing climates. The chronological history of CRISPR/Cas9 systems, their associated computational tools, genomic data sources, transformation methods along with their delivery vehicles, quality improvement, environmental-stress resiliency, limitations, and future perspectives will also be discussed with respect to securing future global fruit production.
Background: L-glutamate is involved in many important chemical reactions in horticultural products and improves postharvest disease resistance. Quality decline of apple fruit caused by senescence and fungal invasion often leads to tremendous losses during logistics. This study was performed to evaluate the variations of quality attributes and of carotenoid, sorbitol and sucrose metabolism in apples (cv. Qiujin) after L-glutamate dipping treatment. Results: L-glutamate immersion maintained high values of L*, a* and b*, flesh firmness, titratable acidity, as well as the total soluble solids, soluble sugar, reducing sugar and ascorbic acid contents in apples. L-glutamate also decreased mass loss, respiratory rate and ethylene release, and enhanced sucrose synthase-cleavage, acid invertase and neutral invertase activities, whereas it reduced sorbitol dehydrogenase, sucrose phosphate synthase, sucrose synthase synthesis and sorbitol oxidase activities in apples. Moreover, L-glutamate inhibited lutein, β-carotene and lycopene accumulation, and down-regulated phytoene synthase, lycopene β-cyclase, ζ-carotene desaturase, phytoene desaturase, carotenoid isomerase, ζ-carotene isomerase and carotenoid cleavage dioxygenase gene expression, but up-regulated 9-cis-epoxycarotenoid dioxygenase gene expression in apples. Conclusion: Postharvest L-glutamate dipping treatment can maintain apple quality by modulating key enzyme activity and gene expression in sorbitol, sucrose and carotenoid metabolism. This article is protected by copyright. All rights reserved.
Peach (Prunus persica L. Batsch) and apricot (Prunus armeniaca L.) are two species of economic importance for fruit production in the genus Prunus. Peach and apricot fruits exhibit significant differences in carotenoid levels and profiles. HPLC-PAD analysis showed that a greater content of β-carotene in mature apricot fruits is primarily responsible for orange color, while peach fruits showed a prominent accumulation of xanthophylls (violaxanthin and cryptoxanthin) with yellow color. There are two β-carotene hydroxylase genes in both peach and apricot genomes. Transcriptional analysis revealed that BCH1 expresses highly in peach but lowly in apricot fruit, showing a correlation with peach and apricot fruit carotenoid profiles. By using a carotenoid engineered bacterial system, it was demonstrated that there was no difference in the BCH1 enzymatic activity between peach and apricot. Comparative analysis about the putative cis-acting regulatory elements between peach and apricot BCH1 promoters provided important information for our understanding of the differences in promoter activity of the BCH1 genes in peach and apricot. Therefore, we investigated the promoter activity of BCH1 gene through a GUS detection system, and confirmed that the difference in the transcription level of the BCH1 gene resulted from the difference of the promoter function. This study provides important perspective to understanding the diversity of carotenoid accumulation in Prunus fruits such as peach and apricot. In particular, BCH1 gene is proposed as a main predictor for β-carotene content in peach and apricot fruits during the ripening process.
Nine important fruit quality traits—including fruit weight, stone weight, fruit diameter, skin ground colour, flesh colour, blush colour, firmness, soluble solids content and acidity content—were studied for two consecutive years in two F1 apricot progeny derived from the crosses ‘Bergeron’ × ‘Currot’ (B×C) and ‘Goldrich’ × ‘Currot’ (G×C). Results showed great segregation variability between populations, which was expected because of the polygenic nature and quantitative inheritance of all the studied traits. In addition, some correlations were observed among the fruit quality traits studied. QTL (quantitative trait loci) analysis was carried out using the phenotypic data and genetic linkages maps of ‘B×C’ and ‘G×C’ obtained with SSR and SNP markers. The most significant QTLs were localised in LG4 for soluble solids content and in LG3 for skin and flesh colour. In LG4, we can highlight the presence of candidate genes involved in D-glucose and D-mannose binding, while in LG3, we identified MYB genes previously linked to skin colour by other authors. In order to clearly identify the candidate genes responsible for the analysed traits, we converted the QTLs into expression QTLs and analysed the abundance of transcripts in the segregating genotypes ‘GC 2–11’ and ‘GC 3–7’ from the G×C population. Using qPCR, we analysed the gene expression of nine candidate genes associated with the QTLs identified, including transcription factors (MYB 10), carotenoid biosynthesis genes (LOX 2, CCD1 and CCD4), anthocyanin biosynthesis genes (ANS, UFGT and F3’5’H), organic acid biosynthesis genes (NAD ME) and ripening date genes (NAC). Results showed variable expression patterns throughout fruit development and between contrasted genotypes, with a correlation between validated genes and linked QTLs. The MYB10 gene was the best candidate gene for skin colour. In addition, we found that monitoring NAC expression is a good RNA marker for evaluating ripening progression.
Background: Recent developments in third-gen long read sequencing and diploid-aware assemblers have resulted in the rapid release of numerous reference-quality assemblies for diploid genomes. However, assembly of highly heterozygous genomes is still problematic when regional heterogeneity is so high that haplotype homology is not recognised during assembly. This results in regional duplication rather than consolidation into allelic variants and can cause issues with downstream analysis, for example variant discovery, or haplotype reconstruction using the diploid assembly with unpaired allelic contigs. Results: A new pipeline, Purge Haplotigs, was developed specifically for third-gen sequencing-based assemblies to automate the reassignment of allelic contigs, and to assist in the manual curation of genome assemblies. The pipeline uses a draft haplotype-fused assembly or a diploid assembly, read alignments, and repeat annotations to identify allelic variants in the primary assembly. The pipeline was tested on a simulated dataset and on four recent diploid (phased) de novo assemblies from third-generation long-read sequencing, and compared with a similar tool. After processing with Purge Haplotigs, haploid assemblies were less duplicated with minimal impact on genome completeness, and diploid assemblies had more pairings of allelic contigs. Conclusions: Purge Haplotigs improves the haploid and diploid representations of third-gen sequencing based genome assemblies by identifying and reassigning allelic contigs. The implementation is fast and scales well with large genomes, and it is less likely to over-purge repetitive or paralogous elements compared to alignment-only based methods. The software is available under a permissive MIT licence. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2485-7) contains supplementary material, which is available to authorized users.
OrthoDB provides evolutionary and functional annotations of orthologs. This update features a major scaling up of the resource coverage, sampling the genomic diversity of 1271 eukaryotes, 6013 prokaryotes and 6488 viruses. These include putative orthologs among 448 metazoan, 117 plant, 549 fungal, 148 protist, 5609 bacterial, and 404 archaeal genomes, picking up the best sequenced and annotated representatives for each species or operational taxonomic unit. OrthoDB relies on a concept of hierarchy of levels-of-orthology to enable more finely resolved gene orthologies for more closely related species. Since orthologs are the most likely candidates to retain functions of their ancestor gene, OrthoDB is aimed at narrowing down hypotheses about gene functions and enabling comparative evolutionary studies. Optional registered-user sessions allow on-line BUSCO assessments of gene set completeness and mapping of the uploaded data to OrthoDB to enable further interactive exploration of related annotations and generation of comparative charts. The accelerating expansion of genomics data continues to add valuable information, and OrthoDB strives to provide orthologs from the broadest coverage of species, as well as to extensively collate available functional annotations and to compute evolutionary annotations. The data can be browsed online, downloaded or assessed via REST API or SPARQL RDF compatible with both UniProt and Ensembl.
Motivation: Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as quality control, adapter trimming and quality filtering. These tools are often insufficiently fast as most are developed using high-level programming languages (e.g. Python and Java) and provide limited multi-threading support. Reading and loading data multiple times also renders preprocessing slow and I/O inefficient. Results: We developed fastp as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of the FASTQ data. This tool is developed in C++ and has multi-threading support. Based on our evaluation, fastp is 2–5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt despite performing far more operations than similar tools. Availability and implementation: The open-source code and corresponding instructions are available at
During the last three or four decades, temperate fruit production and consumption have increased tremendously, thanks to new developments and findings of research activities. These aspects concern better understanding of the physiology of the crops in order to develop appropriate technologies, which can ensure quality preservation and improve the availability of products for fresh consumption, as well as for processing. This chapter intends to highlight some major factors that have a direct effect on the preservation and/or deterioration of fresh temperate fruit during postharvest operations. Hence, it presents an overview of the available reports on the effects of postharvest technologies on the enhancement of shelf life and storage potential of various temperate fruits.
One particular class of Transposable Elements (TEs), called Long Terminal Repeat (LTR) retrotransposons, comprises the most abundant mobile elements in plant genomes. Their copy number can vary from several hundred to a few million copies per genome, deeply affecting genome organization and function. The detailed classification of LTR retrotransposons is an essential step to precisely understand their effect at the genome level, but remains challenging in large-sized genomes, requiring the use of optimized bioinformatics tools that can take advantage of supercomputers. Here, we propose a new tool: Inpactor, a parallel and scalable pipeline designed to classify LTR retrotransposons, to identify autonomous and non-autonomous elements, to build RT-based phylogenetic trees and to analyze their insertion times using High Performance Computing (HPC) techniques. Inpactor was tested on the classification and annotation of LTR retrotransposons in pineapple, a recently sequenced genome. The pineapple genome assembly comprises 44% of transposable elements, of which 23% were classified as LTR retrotransposons. Exceptionally, 16.4% of the pineapple genome assembly corresponded to only one lineage of the Gypsy superfamily: Del, suggesting that this particular lineage has undergone a significant increase in its copy numbers. As demonstrated for the pineapple genome, Inpactor provides comprehensive data of LTR retrotransposons’ classification and dynamics, allowing a fine understanding of their contribution to genome structure and evolution. Inpactor is available at
Background: Plum pox virus (PPV), causing Sharka disease, is one of the main limiting factors for Prunus production worldwide. In apricot (Prunus armeniaca L.) the major PPV resistance locus (PPVres), comprising ~ 196 kb, has been mapped to the upper part of linkage group 1. Within the PPVres, 68 genomic variants linked in coupling to PPV resistance were identified within 23 predicted transcripts according to peach genome annotation. Taking into account the predicted functions inferred from sequence homology, some members of a cluster of meprin and TRAF-C homology domain (MATHd)-containing genes were pointed as PPV resistance candidate genes. Results: Here, we have characterized the global apricot transcriptome response to PPV-D infection identifying six PPVres locus genes (ParP-1 to ParP-6) differentially expressed in resistant/susceptible cultivars. Two of them (ParP-3 and ParP-4), that encode MATHd proteins, appear clearly down-regulated in resistant cultivars, as confirmed by qRT-PCR. Concurrently, variant calling was performed using whole-genome sequencing data of 24 apricot cultivars (10 PPV-resistant and 14 PPV-susceptible) and 2 wild relatives (PPV-susceptible). ParP-3 and ParP-4, named as Prunus armeniaca PPVres MATHd-containing genes (ParPMC), are the only 2 genes having allelic variants linked in coupling to PPV resistance. ParPMC1 has 1 nsSNP, while ParPMC2 has 15 variants, including a 5-bp deletion within the second exon that produces a frameshift mutation. ParPMC1 and ParPMC2 are adjacent and highly homologous (87.5% identity) suggesting they are paralogs originated from a tandem duplication. Cultivars carrying the ParPMC2 resistant (mutated) allele show lack of expression in both ParPMC2 and especially ParPMC1. Conclusions: Accordingly, we hypothesize that ParPMC2 is a pseudogene that mediates down-regulation of its functional paralog ParPMC1 by silencing. 
As a whole, the results strongly support ParPMC1 and/or ParPMC2 as host susceptibility genes required for PPV infection, whose silencing may confer the PPV resistance trait. This finding may facilitate resistance breeding by marker-assisted selection and pave the way for gene editing approaches in Prunus.
Simple sequence repeats (SSRs) are genome domains located in both coding and non-coding regions in eukaryotic genomes. Although SSRs are often characterized by low polymorphism, their DNA-flanking sequences could be a useful source of DNA markers, which could help in genetic studies and breeding because they are associated with genes that control traits of interest. In this study, 56 genotypes from different Prunus species were used, including peach, apricot, plum, and almond (already phenotyped for several agronomical traits, including self-compatibility, flowering and ripening time, fruit type, skin and flesh color, and shell hardness). These Prunus genotypes were molecularly characterized using 28 SSR markers developed in exons, introns, and intergenic regions. All these genes were located in specific regions where quantitative trait loci (QTLs) for certain fruit quality traits were also located, including flowering and ripening times and fruit flesh and skin color. A total of 309 SSR alleles were identified in the whole panel of analyzed cultivars, with expected heterozygosity values of 0.61 (upstream SSRs), 0.17 (exonic SSRs), 0.65 (intronic SSRs), and 0.58 (downstream SSRs). These values confirm the low level of polymorphism of the exonic (gene-coding region) markers. Cluster and structural analysis based on SSR data clearly differentiated the genotypes according to either species (for the four species) and pedigree (apricot) or geographic origin (Japanese plum). In addition, some SSR markers mainly developed in intergenic regions could be associated with genes that control traits of interest in breeding and could therefore help in marker-assisted breeding. These findings highlight the importance of using molecular markers able to discriminate between the functional roles of the gene allelic variants.
How to make almonds palatable: The domesticated almond tree has been feeding humans for millennia. Derivation from the wild, bitter, and toxic almond required loss of the cyanogenic diglucoside amygdalin. Sánchez-Pérez et al. sequenced the almond genome and analyzed the genomic region responsible for this shift. The key change turned out to be a point mutation in a transcription factor that regulates production of P450 monooxygenases in the biosynthetic pathway for the toxic compound. Science, this issue p. 1095
Genomics of the Rosaceae. Edited by Kevin M. Folta and Susan E. Gardiner. The Plant Genetics and Genomics: Crops and Models book series provides current overviews and summaries of the state of the art in genetics and genomics for each of the important crop plants and genetic models for which such a volume does not now exist or is out of date. Volumes will focus on a single crop, species, or group of close relatives, including especially those plants that already have advanced genomic resources developed and preferably complete or advancing genome sequences. The Rosaceae family includes many significant fruit, nut, ornamental and wood crops. Traditionally their large stature, long juvenility periods, and often complicated genomes presented little opportunity for genetic or genomic inquiry. But the new millennium brings with it new challenges to production of the highly desirable products from this family, challenges that genetic and genomic tools may help resolve. Together necessity and scientific curiosity conspire to launch deep exploration of rosaceous crop biology. This volume originates at an acceleration point for Rosaceae genomics. A foundation of outstanding tools has been developed in a cross-section of species. The successes and failures of various approaches have been documented from model systems and inform our attack on questions within the valued crops within the Rosaceae. The text within describes the species and products of this plant family along with a synopsis of the current state of research presented from experts active on the front line of Rosaceae genomics research. Kevin M. Folta is a native of Chicago, Illinois USA. Kevin completed his Ph.D. work in identification of blue light regulated promoter elements and post-transcriptional light-regulated mRNA stability at the University of Illinois at Chicago.
Postdoctoral work at the University of Wisconsin centered on light regulated electrophysiology, gene expression and high-resolution growth monitoring. Kevin has maintained a unique research program in photomorphogenesis and a separate emphasis in strawberry genomics at the University of Florida in Gainesville, FL. He is active in public science education and has won awards for dedication to undergraduate research. Susan E. Gardiner grew up on a sheep farm in Christchurch, New Zealand. Home schooled as a child, Sue later obtained both her undergraduate degree and PhD in Biochemistry from Otago University in Dunedin, New Zealand. Her post-doctoral period was spent in Freiburg, Germany, where she worked to elucidate the mechanism of differential regulation of the enzymes of phenylpropanoid biosynthesis in a parsley cell culture system irradiated with UV light. In 1980, Sue joined the Plant Physiology Division of the Department of Scientific and Industrial Research (DSIR) in Palmerston North, New Zealand where she first worked on the elucidation of chloroplast glycerolipid biosynthesis and later developed techniques for distinguishing varieties of pasture grasses and legumes. Sue’s work on gene mapping in apple began in 1990 at DSIR and continued when DSIR was restructured into HortResearch in 1992. Today, Sue leads a team working to unravel the genetic architecture of traits of fruit crops central to New Zealand fruit industries. Technological advancements in Genomics at HortResearch proved to be critical in her success in identification of ‘breeder-friendly markers for traits of economic value.’ Sue has developed an extensive network of international collaborators in Rosaceae genomics, and recently completed a 2-year term as Chair of the International Rosaceae Genomics Initiative. Sue derives her greatest satisfaction from seeing her team’s marker technologies being used by breeders.