ABSTRACT: Polyploidization events are frequent among flowering plants, and the duplicate genes produced via such events contribute significantly to plant evolution. We sequenced the genome of wild radish (Raphanus raphanistrum), a Brassicaceae species that experienced a whole-genome triplication event prior to diverging from Brassica rapa. Despite substantial gene gains in these two species compared with Arabidopsis thaliana and Arabidopsis lyrata, ∼70% of the orthologous groups experienced gene losses in R. raphanistrum and B. rapa, with most of the losses occurring prior to their divergence. The retained duplicates show substantial divergence in sequence and expression. Based on comparison of A. thaliana and R. raphanistrum ortholog floral expression levels, retained radish duplicates diverged primarily via maintenance of ancestral expression level in one copy and reduction of expression level in others. In addition, retained duplicates differed significantly from genes that reverted to singleton state in function, sequence composition, expression patterns, network connectivity, and rates of evolution. Using these properties, we established a statistical learning model for predicting whether a duplicate would be retained postpolyploidization. Overall, our study provides new insights into the processes of plant duplicate loss, retention, and functional divergence and highlights the need for further understanding factors controlling duplicate gene fate.
ABSTRACT: Medicago truncatula, a close relative of alfalfa, is a preeminent model for studying nitrogen fixation, symbiosis, and legume genomics. The Medicago sequencing project began in 2003 with the goal of deciphering sequences originating from the euchromatic portion of the genome. The initial sequencing approach was based on a BAC tiling path, culminating in a BAC-based assembly (Mt3.5) as well as an in-depth analysis of the genome published in 2011.
Here we describe a further improved and refined version of the M. truncatula genome (Mt4.0) based on de novo whole genome shotgun assembly of a majority of Illumina and 454 reads using ALLPATHS-LG. The ALLPATHS-LG scaffolds were anchored onto the pseudomolecules on the basis of alignments to both the optical map and the genotyping-by-sequencing (GBS) map. The Mt4.0 pseudomolecules encompass ~360 Mb of actual sequences spanning 390 Mb, of which ~330 Mb align perfectly with the optical map, presenting a drastic improvement over the BAC-based Mt3.5, which contained only 70% of the sequence (~250 Mb) of the current version. Most of the sequences and genes that previously resided on the unanchored portion of Mt3.5 have now been incorporated into the Mt4.0 pseudomolecules, with the exception of ~28 Mb of unplaced sequences. With regard to gene annotation, the genome has been re-annotated through our gene prediction pipeline, which integrates EST, RNA-seq, protein and gene prediction evidence. A total of 50,894 genes (31,661 high confidence and 19,233 low confidence) are included in Mt4.0, which overlapped with ~82% of the gene loci annotated in Mt3.5. Of the remaining genes, 14% of the Mt3.5 genes have been deprecated to an "unsupported" status and 4% are absent from the Mt4.0 predictions.
Mt4.0 and its associated resources, such as genome browsers, BLAST-able datasets and gene information pages, can be found on the JCVI Medicago web site (http://www.jcvi.org/medicago). The assembly and annotation have been deposited in GenBank (BioProject: PRJNA10791). The heavily curated chromosomal sequences and associated gene models of Medicago will serve as a better reference for legume biology and comparative genomics.
ABSTRACT: From Jan. 1, 2009, to May 31, 2013, 15,287 respiratory specimens submitted to the Clinical Virology laboratory at Children's Hospital Colorado were tested for human coronavirus RNA by RT-PCR. Human coronaviruses HKU1, OC43, 229E, and NL63 co-circulated during each of the respiratory seasons but with significant year to year variability, and cumulatively accounted for 7.4-15.6% of all samples tested during months of peak activity. A total of 79 (0.5% prevalence) specimens were positive for human betacoronavirus HKU1 RNA. Genotypes HKU1 A and B were both isolated from clinical specimens and propagated on primary human tracheal-bronchial epithelial cells cultured at the air-liquid interface and were neutralized in vitro by human intravenous immunoglobulin and by polyclonal rabbit antibodies to the spike glycoprotein of HKU1. Phylogenetic analysis of deduced amino acid sequences of 7 full length genomes of Colorado HKU1 viruses and the spike glycoproteins from 4 additional HKU1 viruses from Colorado and 3 from Brazil demonstrated remarkable conservation of these sequences with genotypes circulating in Hong Kong and France. Within genotype A, all but one of the Colorado HKU1 sequences formed a unique subclade defined by 3 amino acid residue substitutions (W197F, F613Y, S752F) in the spike glycoprotein and exhibited a unique signature in the acidic tandem repeat in the N terminal region of the nsp3 subdomain. Elucidating the function of and mechanisms responsible for the formation of these varying tandem repeats will increase our understanding of the replication process and pathogenicity of HKU1 and potentially other coronaviruses.
Journal of General Virology 01/2014; · 3.13 Impact Factor
ABSTRACT: Despite the central importance of noncoding DNA to gene regulation and evolution, understanding of the extent of selection on plant noncoding DNA remains limited compared to that of other organisms. Here we report sequencing of genomes from three Brassicaceae species (Leavenworthia alabamica, Sisymbrium irio and Aethionema arabicum) and their joint analysis with six previously sequenced crucifer genomes. Conservation across orthologous bases suggests that at least 17% of the Arabidopsis thaliana genome is under selection, with nearly one-quarter of the sequence under selection lying outside of coding regions. Much of this sequence can be localized to approximately 90,000 conserved noncoding sequences (CNSs) that show evidence of transcriptional and post-transcriptional regulation. Population genomics analyses of two crucifer species, A. thaliana and Capsella grandiflora, confirm that most of the identified CNSs are evolving under medium to strong purifying selection. Overall, these CNSs highlight both similarities and several key differences between the regulatory DNA of plants and other species.
ABSTRACT: The hybrid pigeonpea (Cajanus cajan) breeding technology based on cytoplasmic male sterility (CMS) is currently unique among legumes and displays major potential for yield increase. CMS is defined as a condition in which a plant is unable to produce functional pollen grains. The novel chimeric open reading frames (ORFs) produced as a result of mitochondrial genome rearrangements are considered to be the main cause of CMS. To identify these CMS-related ORFs in pigeonpea, we sequenced the mitochondrial genomes of three C. cajan lines (the male-sterile line ICPA 2039, the maintainer line ICPB 2039, and the hybrid line ICPH 2433) and of the wild relative (Cajanus cajanifolius ICPW 29). A single, circular-mapping molecule of length 545.7 kb was assembled and annotated for the ICPA 2039 line. Sequence annotation predicted 51 genes, including 34 protein-coding and 17 RNA genes. Comparison of the mitochondrial genomes from different Cajanus genotypes identified 31 ORFs, which differ between lines within which CMS is present or absent. Among these chimeric ORFs, 13 were identified by comparison of the related male-sterile and maintainer lines. These ORFs display features that are known to trigger CMS in other plant species and represent the most promising candidates for CMS-related mitochondrial rearrangements in pigeonpea.
ABSTRACT: We used the Roche-454 platform to sequence from normalized cDNA libraries from each of two inbred lines of onion (OH1 and 5225). From approximately 1.6 million reads from each inbred, 27,065 and 33,254 cDNA contigs were assembled from OH1 and 5225, respectively. In total, 3,364 well supported single nucleotide polymorphisms (SNPs) on 1,716 cDNA contigs were identified between these two inbreds. One SNP on each of 1,256 contigs was randomly selected for genotyping. OH1 and 5225 were crossed and 182 gynogenic haploids extracted from hybrid plants were used for SNP mapping. A total of 597 SNPs segregated in the OH1 × 5225 haploid family and a genetic map of ten linkage groups (LOD ≥8) was constructed. Three hundred and thirty-nine of the newly identified SNPs were also mapped using a previously developed segregating family from BYG15-23 × AC43, and 223 common SNPs were used to join the two maps. Because these new SNPs are in expressed regions of the genome and commonly occur among onion germplasms, they will be useful for genetic mapping, gene tagging, marker-aided selection, quality control of seed lots, and fingerprinting of cultivars.
Theoretical and Applied Genetics 05/2013; · 3.66 Impact Factor
ABSTRACT: Soybean (Glycine max (L.) Merr.) resistance to any population of Heterodera glycines (I.), or Fusarium virguliforme (Aoki, O'Donnell, Homma & Lattanzi) required a functional allele at Rhg1/Rfs2. H. glycines, the soybean cyst nematode (SCN), was an ancient, endemic pest of soybean, whereas F. virguliforme, causal agent of sudden death syndrome (SDS), was a recent, regional pest. This study examined the role of a receptor-like kinase (RLK) GmRLK18-1 (gene model Glyma_18_02680 at 1,071 kbp on chromosome 18 of the genome sequence) within the Rhg1/Rfs2 locus in causing resistance to SCN and SDS.
A BAC (B73p06) encompassing the Rhg1/Rfs2 locus was sequenced from a resistant cultivar and compared with the sequences of two susceptible cultivars, among which 800 SNPs were found. Sequence alignments inferred that the resistance allele was an introgressed region of about 59 kbp, at the center of which GmRLK18-1 was the most polymorphic gene and encoded protein. Analyses were made of plants that were either heterozygous at, or transgenic (and so hemizygous at a new location) with, the resistance allele of GmRLK18-1. Those plants infested with either H. glycines or F. virguliforme showed that the allele for resistance was dominant. In the absence of Rhg4, GmRLK18-1 was sufficient to confer nearly complete resistance to both root and leaf symptoms of SDS caused by F. virguliforme and provided partial resistance to three different populations of nematodes (mature female cysts were reduced by 30-50%). In the presence of Rhg4, the plants with the transgene were nearly classed as fully resistant to SCN (females reduced to 11% of the susceptible control) as well as SDS. A reduction in the rate of early seedling root development was also shown to be caused by the resistance allele of GmRLK18-1. Field trials of transgenic plants showed an increase in foliar susceptibility to insect herbivory.
The inference that soybean has adapted part of an existing pathogen recognition and defense cascade (H. glycines; SCN and insect herbivory) to a new pathogen (F. virguliforme; SDS) has broad implications for crop improvement. Stable resistance to many pathogens might be achieved by manipulating the genes encoding a small number of pathogen recognition proteins.
ABSTRACT: This study compared the complete genome sequences of 16 NL63 strain human coronaviruses (hCoVs) from respiratory specimens of paediatric patients with respiratory disease in Colorado, USA, and characterized the epidemiology and clinical characteristics associated with circulating NL63 viruses over a 3-year period. From 1 January 2009 to 31 December 2011, 92 of 9380 respiratory specimens were found to be positive for NL63 RNA by PCR, an overall prevalence of 1 %. NL63 viruses were circulating during all 3 years, but there was considerable yearly variation in prevalence and the month of peak incidence. Phylogenetic analysis comparing the genome sequences of the 16 Colorado NL63 viruses with those of the prototypical hCoV-NL63 and three other NL63 viruses from the Netherlands demonstrated that there were three genotypes (A, B and C) circulating in Colorado from 2005 to 2010, and evidence of recombination between virus strains was found. Genotypes B and C co-circulated in Colorado in 2005, 2009 and 2010, but genotype A circulated only in 2005 when it was the predominant NL63 strain. Genotype C represents a new lineage that has not been described previously. The greatest variability in the NL63 virus genomes was found in the N-terminal domain (NTD) of the spike gene (nt 1-600, aa 1-200). Ten different amino acid sequences were found in the NTD of the spike protein among these NL63 strains and the 75 partial published sequences of NTDs from strains found at different times throughout the world.
Journal of General Virology 07/2012; 93(Pt 11):2387-98. · 3.13 Impact Factor
ABSTRACT: Canine alphacoronaviruses (CCoV) exist in two serotypes, type I and II, both of which can cause severe gastroenteritis. Here, we characterize a canine alphacoronavirus, designated CCoV-A76, first isolated in 1976. Serological studies show that CCoV-A76 is distinct from other CCoVs, such as the prototype CCoV-1-71. Efficient replication of CCoV-A76 is restricted to canine cell lines, in contrast to the prototypical type II strain CCoV-1-71, which replicates more efficiently in feline cells. CCoV-A76 can use the canine aminopeptidase N (cAPN) receptor for infection of cells, but is unable to use feline APN (fAPN). In contrast, CCoV-1-71 can utilize both. Genomic analysis shows that CCoV-A76 possesses a distinct spike, which is the result of a recombination between type I and type II CCoV that occurred between the N- and C-terminal domains (NTD and C-domain) of the S1 subunit. These data suggest that CCoV-A76 represents a recombinant coronavirus form, with distinct host cell tropism.
ABSTRACT: Defensins are a class of small and diverse cysteine-rich proteins found in plants, insects, and vertebrates, which share a common tertiary structure and usually exert broad-spectrum antimicrobial activities. We used a bioinformatic approach to scan the Vitis vinifera genome and identified 79 defensin-like sequences (DEFL) corresponding to 46 genes and allelic variants, plus 33 pseudogenes and gene fragments. Expansion and diversification of grapevine DEFL has occurred after the split from the last common ancestor with the genera Medicago and Arabidopsis. Grapevine DEFL localization on the 'Pinot Noir' genome revealed the presence of several clusters that likely evolved through local duplications. By sequencing reverse-transcription polymerase chain reaction products, we could demonstrate the expression of grapevine DEFL with no previously reported record of expression. Many of these genes are predominantly or exclusively expressed in tissues linked to plant reproduction, consistent with findings in other plant species, and some of them accumulated at fruit ripening. The transcripts of five DEFL were also significantly upregulated in tissues infected with Botrytis cinerea, a necrotrophic mold, suggesting a role of these genes in defense against this pathogen. Finally, three novel defensins were discovered among the identified DEFL. They inhibit B. cinerea conidia germination when expressed as recombinant proteins.
ABSTRACT: Legumes (Fabaceae or Leguminosae) are unique among cultivated plants for their ability to carry out endosymbiotic nitrogen fixation with rhizobial bacteria, a process that takes place in a specialized structure known as the nodule. Legumes belong to one of the two main groups of eurosids, the Fabidae, which includes most species capable of endosymbiotic nitrogen fixation. Legumes comprise several evolutionary lineages derived from a common ancestor 60 million years ago (Myr ago). Papilionoids are the largest clade, dating nearly to the origin of legumes and containing most cultivated species. Medicago truncatula is a long-established model for the study of legume biology. Here we describe the draft sequence of the M. truncatula euchromatin based on a recently completed BAC assembly supplemented with Illumina shotgun sequence, together capturing ∼94% of all M. truncatula genes. A whole-genome duplication (WGD) approximately 58 Myr ago had a major role in shaping the M. truncatula genome and thereby contributed to the evolution of endosymbiotic nitrogen fixation. Subsequent to the WGD, the M. truncatula genome experienced higher levels of rearrangement than two other sequenced legumes, Glycine max and Lotus japonicus. M. truncatula is a close relative of alfalfa (Medicago sativa), a widely cultivated crop with limited genomics tools and complex autotetraploid genetics. As such, the M. truncatula genome sequence provides significant opportunities to expand alfalfa's genomic toolbox.
ABSTRACT: The availability of genomic resources can facilitate progress in plant breeding through the application of advanced molecular technologies for crop improvement. This is particularly important in the case of less researched crops such as cassava, a staple and food security crop for more than 800 million people. Here, expressed sequence tags (ESTs) were generated from five drought-stressed and well-watered cassava varieties. Two cDNA libraries were developed: one from root tissue (CASR), the other from leaf, stem and stem meristem tissue (CASL). Sequencing generated 706 contigs and 3,430 singletons. These sequences were combined with those from two other EST sequencing initiatives and filtered based on the sequence quality. Quality sequences were aligned using CAP3 and embedded in a Windows browser called HarvEST:Cassava, which is made available. HarvEST:Cassava consists of a Unigene set of 22,903 quality sequences. A total of 2,954 putative SNPs were identified. Of these, 1,536 SNPs from 1,170 contigs and 53 cassava genotypes were selected for SNP validation using Illumina's GoldenGate assay. As a result, 1,190 SNPs were validated technically and biologically. The location of validated SNPs on scaffolds of the cassava genome sequence (v.4.1) is provided. A diversity assessment of 53 cassava varieties reveals some sub-structure based on the geographical origin, greater diversity in the Americas as opposed to Africa, and similar levels of diversity in West Africa and southern, eastern and central Africa. The resources presented allow for improved genetic dissection of economically important traits and the application of modern genomics-based approaches to cassava breeding and conservation.
Theoretical and Applied Genetics 11/2011; 124(4):685-95. · 3.66 Impact Factor
ABSTRACT: Evolution of the Brassica species has been recursively affected by polyploidy events, and comparison to their relative, Arabidopsis thaliana, provides a means to explore their genomic complexity.
A genome-wide physical map of a rapid-cycling strain of B. oleracea was constructed by integrating high-information-content fingerprinting (HICF) of Bacterial Artificial Chromosome (BAC) clones with hybridization to sequence-tagged probes. Using 2907 contigs of two or more BACs, we performed several lines of comparative genomic analysis. Interspecific DNA synteny is much better preserved in euchromatin than heterochromatin, showing the qualitative difference in evolution of these respective genomic domains. About 67% of contigs can be aligned to the Arabidopsis genome, with 96.5% corresponding to euchromatic regions, and 3.5% (shown to contain repetitive sequences) to pericentromeric regions. Overgo probe hybridization data showed that contigs aligned to Arabidopsis euchromatin contain ~80% of low-copy-number genes, while genes with high copy number are much more frequently associated with pericentromeric regions. We identified 39 interchromosomal breakpoints during the diversification of B. oleracea and Arabidopsis thaliana, a relatively high level of genomic change since their divergence. Comparison of the B. oleracea physical map with Arabidopsis and other available eudicot genomes showed appreciable 'shadowing' produced by more ancient polyploidies, resulting in a web of relatedness among contigs which increased genomic complexity.
A high-resolution genetically-anchored physical map sheds light on Brassica genome organization and advances positional cloning of specific genes, and may help to validate genome sequence assembly and alignment to chromosomes. All the physical mapping data is freely shared at a WebFPC site (http://lulu.pgml.uga.edu/fpc/WebAGCoL/brassica/WebFPC/; temporarily password-protected: account: pgml; password: 123qwe123).
ABSTRACT: Chickpea (Cicer arietinum L.) is an important legume crop in the semi-arid regions of Asia and Africa. Gains in crop productivity have been low, however, particularly because of biotic and abiotic stresses. To help enhance crop productivity using molecular breeding techniques, next generation sequencing technologies such as Roche/454 and Illumina/Solexa were used to determine the sequence of most gene transcripts and to identify drought-responsive genes and gene-based molecular markers. A total of 103,215 tentative unique sequences (TUSs) have been produced from 435,018 Roche/454 reads and 21,491 Sanger expressed sequence tags (ESTs). Putative functions were determined for 49,437 (47.8%) of the TUSs, and gene ontology assignments were determined for 20,634 (41.7%) of the TUSs. Comparison of the chickpea TUSs with the Medicago truncatula genome assembly (Mt 3.5.1 build) resulted in 42,141 aligned TUSs with putative gene structures (including 39,281 predicted intron/splice junctions). Alignment of ∼37 million Illumina/Solexa tags generated from drought-challenged root tissues of two chickpea genotypes against the TUSs identified 44,639 differentially expressed TUSs. The TUSs were also used to identify a diverse set of markers, including 728 simple sequence repeats (SSRs), 495 single nucleotide polymorphisms (SNPs), 387 conserved orthologous sequence (COS) markers, and 2088 intron-spanning region (ISR) markers. This resource will be useful for basic and applied research for genome analysis and crop improvement in chickpea.
ABSTRACT: Pigeonpea [Cajanus cajan (L.) Millsp.] is an important legume crop of rainfed agriculture. Despite concerted research efforts directed toward pigeonpea improvement, the stagnated productivity of pigeonpea during the last several decades may be attributed to the prevalence of various biotic and abiotic constraints, and the situation is exacerbated by the inadequate genomic resources available to undertake any molecular breeding programme for accelerated crop improvement. With the objective of enhancing genomic resources for pigeonpea, this study reports, for the first time, large scale development of SSR markers from BAC-end sequences and their subsequent use for genetic mapping and hybridity testing in pigeonpea.
A set of 88,860 BAC (bacterial artificial chromosome)-end sequences (BESs) were generated after constructing two BAC libraries by using HindIII (34,560 clones) and BamHI (34,560 clones) restriction enzymes. Clustering based on sequence identity of BESs yielded a set of >52K non-redundant sequences, comprising 35 Mbp or >4% of the pigeonpea genome. These sequences were analyzed to develop annotation lists and subdivide the BESs into genome fractions (e.g., genes, retroelements, transposons and non-annotated sequences). Parallel analysis of BESs for microsatellites or simple sequence repeats (SSRs) identified 18,149 SSRs, from which a set of 6,212 SSRs were selected for further analysis. A total of 3,072 novel SSR primer pairs were synthesized and tested for length polymorphism on a set of 22 parental genotypes of 13 mapping populations segregating for traits of interest. In total, we identified 842 polymorphic SSR markers that will have utility in pigeonpea improvement. Based on these markers, the first SSR-based genetic map comprising 239 loci was developed for this previously uncharacterized genome. Utility of the developed SSR markers was also demonstrated by identifying a set of 42 markers each for two hybrids (ICPH 2671 and ICPH 2438) for genetic purity assessment in commercial hybrid breeding programmes.
In summary, while BAC libraries and BESs should be useful for genomics studies, BES-SSR markers and the genetic map should be very useful for linking the genetic map with a future physical map as well as for molecular breeding in pigeonpea.