-
ABSTRACT: BACKGROUND: Advances in sequencing technology have boosted population genomics and made it possible to map the positions of transcription factor binding sites (TFBSs) with high precision. Here we investigate TFBS variability by combining transcription factor binding maps generated by ENCODE, modENCODE, our previously published data and other sources with genomic variation data for human individuals and Drosophila isogenic lines. RESULTS: We introduce a metric of TFBS variability that takes into account changes in motif match associated with mutation and makes it possible to investigate TFBS functional constraints instance-by-instance as well as in sets that share common biological properties. We also take advantage of the emerging per-individual transcription factor binding data to show evidence that TFBS mutations, particularly at evolutionarily conserved sites, can be efficiently buffered to ensure coherent levels of transcription factor binding. CONCLUSIONS: Our analyses provide insights into the relationship between individual and interspecies variation and show evidence for the functional buffering of TFBS mutations in both humans and flies. In a broad perspective, these results demonstrate the potential of combining functional genomics and population genetics approaches for understanding gene regulation.
Genome Biology 09/2012; 13(9):R49. · 6.63 Impact Factor
-
Karine Megy,
Scott J Emrich,
Daniel Lawson,
David Campbell,
Emmanuel Dialynas,
Daniel S T Hughes,
Gautier Koscielny,
Christos Louis,
Robert M Maccallum,
Seth N Redmond,
Andrew Sheehan,
Pantelis Topalis,
Derek Wilson
ABSTRACT: VectorBase (http://www.vectorbase.org) is a NIAID-supported bioinformatics resource for invertebrate vectors of human pathogens. It hosts data for nine genomes: mosquitoes (three Anopheles gambiae genomes, Aedes aegypti and Culex quinquefasciatus), tick (Ixodes scapularis), body louse (Pediculus humanus), kissing bug (Rhodnius prolixus) and tsetse fly (Glossina morsitans). Hosted data range from genomic features and expression data to population genetics and ontologies. We describe improvements and integration of new data that expand our taxonomic coverage. Releases are bi-monthly and include the delivery of preliminary data for emerging genomes. Frequent updates of the genome browser provide VectorBase users with increasing options for visualizing their own high-throughput data. One major development is a new population biology resource for storing genomic variations, insecticide resistance data and their associated metadata. It takes advantage of improved ontologies and controlled vocabularies. Combined, these new features ensure timely release of multiple types of data in the public domain while helping overcome the bottlenecks of bioinformatics and annotation by engaging with our user community.
Nucleic Acids Research 12/2011; 40(Database issue):D729-34. · 8.03 Impact Factor
-
Paul Flicek,
M Ridwan Amode,
Daniel Barrell,
Kathryn Beal,
Simon Brent,
Denise Carvalho-Silva,
Peter Clapham,
Guy Coates,
Susan Fairley,
Stephen Fitzgerald, [......],
Jennifer Harrow,
Javier Herrero,
Tim J P Hubbard,
Anne Parker,
Glenn Proctor,
Giulietta Spudich,
Jan Vogel,
Andy Yates,
Amonida Zadissa,
Stephen M J Searle
ABSTRACT: The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.
Nucleic Acids Research 11/2011; 40(Database issue):D84-90. · 8.03 Impact Factor
-
Paul J Kersey,
Daniel M Staines,
Daniel Lawson,
Eugene Kulesha,
Paul Derwent,
Jay C Humphrey,
Daniel S T Hughes,
Stephan Keenan,
Arnaud Kerhornou,
Gautier Koscielny, [......],
Mark D McDowall,
Karine Megy,
Uma Maheswari,
Michael Nuhn,
Michael Paulini,
Helder Pedro,
Iliana Toneva,
Derek Wilson,
Andrew Yates,
Ewan Birney
ABSTRACT: Ensembl Genomes (http://www.ensemblgenomes.org) is an integrative resource for genome-scale data from non-vertebrate species. The project exploits and extends technology (for genome annotation, analysis and dissemination) developed in the context of the (vertebrate-focused) Ensembl project and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. Since its launch in 2009, Ensembl Genomes has undergone rapid expansion, with the goal of providing coverage of all major experimental organisms, and additionally including taxonomic reference points to provide the evolutionary context in which genes can be understood. Against the backdrop of a continuing increase in genome sequencing activities in all parts of the tree of life, we seek to work, wherever possible, with the communities actively generating and using data, and are participants in a growing range of collaborations involved in the annotation and analysis of genomes.
Nucleic Acids Research 11/2011; 40(Database issue):D91-7. · 8.03 Impact Factor
-
Paul Flicek,
Bronwen L Aken,
Benoit Ballester,
Kathryn Beal,
Eugene Bragin,
Simon Brent,
Yuan Chen,
Peter Clapham,
Guy Coates,
Susan Fairley, [......],
Fiona Cunningham,
Ian Dunham,
Richard Durbin,
Xosé M Fernández-Suarez,
Javier Herrero,
Tim J P Hubbard,
Anne Parker,
Glenn Proctor,
James Smith,
Stephen M J Searle
ABSTRACT: Ensembl (http://www.ensembl.org) integrates genomic information for a comprehensive set of chordate genomes with a particular focus on resources for human, mouse, rat, zebrafish and other high-value sequenced genomes. We provide complete gene annotations for all supported species in addition to specific resources that target genome variation, function and evolution. Ensembl data is accessible in a variety of formats including via our genome browser, API and BioMart. This year marks the tenth anniversary of Ensembl and in that time the project has grown with advances in genome technology. As of release 56 (September 2009), Ensembl supports 51 species including marmoset, pig, zebra finch, lizard, gorilla and wallaby, which were added in the past year. Major additions and improvements to Ensembl since our previous report include the incorporation of the human GRCh37 assembly, enhanced visualisation and data-mining options for the Ensembl regulatory features and continued development of our software infrastructure.
Nucleic Acids Research 11/2009; 38(Database issue):D557-62. · 8.03 Impact Factor
-
Gautier Koscielny,
Vincent Le Texier,
Chellappa Gopalakrishnan,
Vasudev Kumanduri,
Jean-Jack Riethoven,
Francesco Nardone,
Eleanor Stanley,
Christine Fallsehr,
Oliver Hofmann,
Meelis Kull, [......],
Alexander Herrmann,
Jens G Reich,
Roderic Guigó,
Peer Bork,
Magnus von Knebel Doeberitz,
Jaak Vilo,
Winston Hide,
Rolf Apweiler,
Thangavel Alphonse Thanaraj,
Daniel Gautheret
ABSTRACT: The Alternative Splicing and Transcript Diversity database (ASTD) gives access to a vast collection of alternative transcripts that integrate transcription initiation, polyadenylation and splicing variant data. Alternative transcripts are derived from the mapping of transcribed sequences to the complete human, mouse and rat genomes using an extension of the computational pipeline developed for the ASD (Alternative Splicing Database) and ATD (Alternative Transcript Diversity) databases, which are now superseded by ASTD. For the human genome, ASTD identifies splicing variants, transcription initiation variants and polyadenylation variants in 68%, 68% and 62% of the gene set, respectively, consistent with current estimates for transcription variation. Users can access ASTD through a variety of browsing and query tools, including expression state-based queries for the identification of tissue-specific isoforms. Participating laboratories have experimentally validated a subset of ASTD-predicted alternative splice forms and alternative polyadenylation forms that were not previously reported. The ASTD database can be accessed at http://www.ebi.ac.uk/astd.
Genomics 01/2009; 93(3):213-20. · 3.02 Impact Factor