-
P J Kersey,
D Lawson, E Birney,
P S Derwent,
M Haimel,
J Herrero,
S Keenan,
A Kerhornou,
G Koscielny,
A Kähäri,
R J Kinsella,
E Kulesha,
U Maheswari,
K Megy,
M Nuhn,
G Proctor,
D Staines,
F Valentin,
A J Vilella,
A Yates
ABSTRACT: Ensembl Genomes (http://www.ensemblgenomes.org) is a new portal offering integrated access to genome-scale data from non-vertebrate species of scientific interest, developed using the Ensembl genome annotation and visualisation platform. Ensembl Genomes consists of five sub-portals (for bacteria, protists, fungi, plants and invertebrate metazoa) designed to complement the availability of vertebrate genomes in Ensembl. Many of the databases supporting the portal have been built in close collaboration with the scientific community, which we consider essential for maintaining the accuracy and usefulness of the resource. A common set of user interfaces (which include a graphical genome browser, FTP, BLAST search, a query optimised data warehouse, programmatic access, and a Perl API) is provided for all domains. Data types incorporated include annotation of (protein and non-protein coding) genes, cross references to external resources, and high throughput experimental data (e.g. data from large scale studies of gene expression and polymorphism visualised in their genomic context). Additionally, extensive comparative analysis has been performed, both within defined clades and across the wider taxonomy, and sequence alignments and gene trees resulting from this can be accessed through the site.
Nucleic Acids Research 11/2009; 38(Database issue):D563-9. · 8.03 Impact Factor
-
T J P Hubbard,
B L Aken,
S Ayling,
B Ballester,
K Beal,
E Bragin,
S Brent,
Y Chen,
P Clapham,
L Clarke, [......],
F Cunningham,
V Curwen,
R Durbin,
X M Fernandez-Suarez,
J Herrero,
A Kasprzyk,
G Proctor,
J Smith,
S Searle,
P Flicek
ABSTRACT: The Ensembl project (http://www.ensembl.org) is a comprehensive genome information system featuring an integrated set of genome annotation, databases, and other information for chordate, selected model organism and disease vector genomes. As of release 51 (November 2008), Ensembl fully supports 45 species, and three additional species have preliminary support. New species in the past year include orangutan and six additional low coverage mammalian genomes. Major additions and improvements to Ensembl since our previous report include a major redesign of our website; generation of multiple genome alignments and ancestral sequences using the new Enredo-Pecan-Ortheus pipeline and development of our software infrastructure, particularly to support the Ensembl Genomes project (http://www.ensemblgenomes.org/).
Nucleic Acids Research 12/2008; 37(Database issue):D690-7. · 8.03 Impact Factor
-
P Flicek,
B L Aken,
K Beal,
B Ballester,
M Caccamo,
Y Chen,
L Clarke,
G Coates,
F Cunningham,
T Cutts, [......],
V Curwen,
R Durbin,
X M Fernandez-Suarez,
J Herrero,
T J P Hubbard,
A Kasprzyk,
G Proctor,
J Smith,
A Ureta-Vidal,
S Searle
ABSTRACT: The Ensembl project (http://www.ensembl.org) is a comprehensive genome information system featuring an integrated set of genome annotation, databases and other information for chordate and selected model organism and disease vector genomes. As of release 47 (October 2007), Ensembl fully supports 35 species, with preliminary support for six additional species. New species in the past year include platypus and horse. Major additions and improvements to Ensembl since our previous report include extensive support for functional genomics data in the form of a specialized functional genomics database, genome-wide maps of protein-DNA interactions and the Ensembl regulatory build; support for customization of the Ensembl web interface through the addition of user accounts and user groups; and increased support for genome resequencing. We have also introduced new comparative genomics-based data mining options and report on the continued development of our software infrastructure.
Nucleic Acids Research 02/2008; 36(Database issue):D707-14. · 8.03 Impact Factor
-
ABSTRACT: High-throughput genome sequencing techniques have now reached vector biology with an emphasis on those species that are vectors of human pathogens. The first mosquito to be sequenced was Anopheles gambiae, the vector for Plasmodium parasites that cause malaria. Further mosquitoes have followed: Aedes aegypti (yellow fever and dengue fever vector) and Culex pipiens (lymphatic filariasis and West Nile fever). Species that are currently in sequencing include the body louse Pediculus humanus (typhus vector), the triatomine Rhodnius prolixus (Chagas disease vector) and the tick Ixodes scapularis (Lyme disease vector). The motivations for sequencing vector genomes are to further understand vector biology, with an eye on developing new control strategies (for example novel chemical attractants or repellents) or understanding the limitations of current strategies (for example the mechanism of insecticide resistance); to analyse the mechanisms driving their evolution; and to perform an exhaustive analysis of the gene repertory. The proliferation of genomic data creates the need for efficient and accessible storage. We present VectorBase, a genomic resource centre that is both involved in the annotation of vector genomes and acts as a portal for access to the genomic information (http://www.vectorbase.org).
Infection, genetics and evolution: journal of molecular epidemiology and evolutionary genetics in infectious diseases 02/2008; 9(3):308-13. · 3.22 Impact Factor
-
T J P Hubbard,
B L Aken,
K Beal,
B Ballester,
M Caccamo,
Y Chen,
L Clarke,
G Coates,
F Cunningham,
T Cutts, [......],
V Curwen,
R Durbin,
X M Fernandez-Suarez,
P Flicek,
A Kasprzyk,
G Proctor,
S Searle,
J Smith,
A Ureta-Vidal, E Birney
ABSTRACT: The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of chordate genome sequences. Over the past year the number of genomes available from Ensembl has increased from 15 to 33, with the addition of sites for the mammalian genomes of elephant, rabbit, armadillo, tenrec, platypus, pig, cat, bush baby, common shrew, microbat and European hedgehog; the fish genomes of stickleback and medaka and the second example of the genomes of the sea squirt (Ciona savignyi) and the mosquito (Aedes aegypti). Some of the major features added during the year include the first complete gene sets for genomes with low-sequence coverage, the introduction of new strain variation data and the introduction of new orthology/paralog annotations based on gene trees.
Nucleic Acids Research 02/2007; 35(Database issue):D610-7. · 8.03 Impact Factor
-
E Birney,
D Andrews,
M Caccamo,
Y Chen,
L Clarke,
G Coates,
T Cox,
F Cunningham,
V Curwen,
T Cutts, [......],
D Smedley,
J Smith,
A Stabenau,
J Stalker,
S Trevanion,
A Ureta-Vidal,
J Vogel,
S White,
C Woodwark,
T J P Hubbard
ABSTRACT: The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of large genome sequences. Over the last year the number of genomes available from the Ensembl site has increased from 4 to 19, with the addition of the mammalian genomes of Rhesus macaque and Opossum, the chordate genome of Ciona intestinalis and the import and integration of the yeast genome. The year has also seen extensive improvements to both data analysis and presentation, with the introduction of a redesigned website, the addition of RNA gene and regulatory annotation and substantial improvements to the integration of human genome variation data.
Nucleic Acids Research 02/2006; 34(Database issue):D556-61. · 8.03 Impact Factor
-
T Hubbard,
D Andrews,
M Caccamo,
G Cameron,
Y Chen,
M Clamp,
L Clarke,
G Coates,
T Cox,
F Cunningham, [......],
W Spooner,
A Stabenau,
J Stalker,
R Storey,
S Trevanion,
A Ureta-Vidal,
J Vogel,
S White,
C Woodwark, E Birney
ABSTRACT: The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of large genome sequences. Over the last year the number of genomes available from the Ensembl site has increased by 7 to 16, with the addition of the six vertebrate genomes of chimpanzee, dog, cow, chicken, tetraodon and frog and the insect genome of honeybee. The majority have been annotated automatically using the Ensembl gene build system, showing its flexibility to reliably annotate a wide variety of genomes. With the increased number of vertebrate genomes, the comparative analysis provided to users has been greatly improved, with new website interfaces allowing annotation of different genomes to be directly compared. The Ensembl software system is being increasingly widely reused in different projects showing the benefits of a completely open approach to software development and distribution.
Nucleic Acids Research 02/2005; 33(Database issue):D447-53. · 8.03 Impact Factor
-
E Birney,
D Andrews,
P Bevan,
M Caccamo,
G Cameron,
Y Chen,
L Clarke,
G Coates,
T Cox,
J Cuff, [......],
D Smedley,
J Smith,
W Spooner,
A Stabenau,
J Stalker,
R Storey,
A Ureta-Vidal,
C Woodwark,
M Clamp,
T Hubbard
ABSTRACT: The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organize biology around the sequences of large genomes. It is a comprehensive and integrated source of annotation of large genome sequences, available via interactive website, web services or flat files. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. The facilities of the system range from sequence analysis to data storage and visualization and installations exist around the world both in companies and at academic sites. With a total of nine genome sequences available from Ensembl and more genomes to follow, recent developments have focused mainly on closer integration between genomes and external data.
Nucleic Acids Research 02/2004; 32(Database issue):D468-70. · 8.03 Impact Factor
-
E Birney
Cold Spring Harbor Symposia on Quantitative Biology 02/2003; 68:213-5.
-
M Clamp,
D Andrews,
D Barker,
P Bevan,
G Cameron,
Y Chen,
L Clark,
T Cox,
J Cuff,
V Curwen, [......],
S Searle,
G Slater,
J Smith,
W Spooner,
A Stabenau,
J Stalker,
E Stupka,
A Ureta-Vidal,
I Vastrik, E Birney
ABSTRACT: The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of human, mouse and other genome sequences, available as either an interactive web site or as flat files. Ensembl also integrates manually annotated gene structures from external sources where available. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. These range from sequence analysis to data storage and visualisation and installations exist around the world both in companies and at academic sites. With both human and mouse genome sequences available and more vertebrate sequences to follow, many of the recent developments in Ensembl have focused on developing automatic comparative genome analysis and visualisation.
Nucleic Acids Research 02/2003; 31(1):38-42. · 8.03 Impact Factor
-
G Joshi-Tope,
I Vastrik,
G R Gopinath,
L Matthews,
E Schmidt,
M Gillespie,
P D'Eustachio,
B Jassal,
S Lewis,
G Wu, E Birney,
L Stein
Cold Spring Harbor Symposia on Quantitative Biology 02/2003; 68:237-43.
-
Y Okazaki,
M Furuno,
T Kasukawa,
J Adachi,
H Bono,
S Kondo,
I Nikaido,
N Osato,
R Saito,
H Suzuki, [......],
D Sasaki,
K Shibata,
A Shinagawa,
A Yasunishi,
M Yoshino,
R Waterston,
E S Lander,
J Rogers, E Birney,
Y Hayashizaki
ABSTRACT: Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.
Nature 01/2003; 420(6915):563-73. · 36.28 Impact Factor
-
T Hubbard,
D Barker, E Birney,
G Cameron,
Y Chen,
L Clark,
T Cox,
J Cuff,
V Curwen,
T Down, [......],
S Searle,
G Slater,
J Smith,
W Spooner,
A Stabenau,
J Stalker,
E Stupka,
A Ureta-Vidal,
I Vastrik,
M Clamp
ABSTRACT: The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.
Nucleic Acids Research 02/2002; 30(1):38-41. · 8.03 Impact Factor
-
S E Lewis,
S M J Searle,
N Harris,
M Gibson,
V Iyer,
J Richter,
C Wiel,
L Bayraktaroglu, E Birney,
M A Crosby,
J S Kaminker,
B B Matthews,
S E Prochnik,
C D Smith,
J L Tupy,
G M Rubin,
S Misra,
C J Mungall,
M E Clamp
ABSTRACT: The well-established inaccuracy of purely computational methods for annotating genome sequences necessitates an interactive tool to allow biological experts to refine these approximations by viewing and independently evaluating the data supporting each annotation. Apollo was developed to meet this need, enabling curators to inspect genome annotations closely and edit them. FlyBase biologists successfully used Apollo to annotate the Drosophila melanogaster genome and it is increasingly being used as a starting point for the development of customized annotation editing tools for other genome projects.
Genome biology 02/2002; 3(12):RESEARCH0082. · 6.63 Impact Factor
-
genesis 01/2002; 31(4):137-41. · 2.53 Impact Factor
-
ABSTRACT: Identification of the genes that cause oncogenesis is a central aim of cancer research. We searched the proteins predicted from the draft human genome sequence for paralogues of known tumour suppressor genes, but no novel genes were identified. We then assessed whether it was possible to search directly for oncogenic sequence changes in cancer cells by comparing cancer genome sequences against the draft genome. Apparently chimaeric transcripts (from oncogenic fusion genes generated by chromosomal translocations, the ends of which mapped to different genomic locations) were detected to the same degree in both normal and neoplastic tissues, indicating a significant level of false positives. Our experiment underscores the limited amount and variable quality of DNA sequence from cancer cells that is currently available.
Nature 03/2001; 409(6822):850-2. · 36.28 Impact Factor
-
E S Lander,
L M Linton,
B Birren,
C Nusbaum,
M C Zody,
J Baldwin,
K Devon,
K Dewar,
M Doyle,
W FitzHugh, [......],
K A Wetterstrand,
A Patrinos,
M J Morgan,
P de Jong,
J J Catanese,
K Osoegawa,
H Shizuya,
S Choi,
Y J Chen,
J Szustakowki
ABSTRACT: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
Nature 03/2001; 409(6822):860-921. · 36.28 Impact Factor
-
V G Cheung,
N Nowak,
W Jang,
I R Kirsch,
S Zhao,
X N Chen,
T S Furey,
U J Kim,
W L Kuo,
M Olivier, [......],
N A Doggett,
N P Carter,
E E Eichler,
D Haussler,
J R Korenberg,
C C Morton,
D Albertson,
G Schuler,
P J de Jong,
B J Trask
ABSTRACT: We have placed 7,600 cytogenetically defined landmarks on the draft sequence of the human genome to help with the characterization of genes altered by gross chromosomal aberrations that cause human disease. The landmarks are large-insert clones mapped to chromosome bands by fluorescence in situ hybridization. Each clone contains a sequence tag that is positioned on the genomic sequence. This genome-wide set of sequence-anchored clones allows structural and functional analyses of the genome. This resource represents the first comprehensive integration of cytogenetic, radiation hybrid, linkage and sequence maps of the human genome; provides an independent validation of the sequence map and framework for contig order and orientation; surveys the genome for large-scale duplications, which are likely to require special attention during sequence assembly; and allows a stringent assessment of sequence differences between the dark and light bands of chromosomes. It also provides insight into large-scale chromatin structure and the evolution of chromosomes and gene families and will accelerate our understanding of the molecular bases of human disease and cancer.
Nature 03/2001; 409(6822):953-8. · 36.28 Impact Factor
-
ABSTRACT: Now that the draft human genome sequence is available, everyone wants to be able to use it. However, we have perhaps become complacent about our ability to turn new genomes into lists of genes. The higher volume of data associated with a larger genome is accompanied by a much greater increase in complexity. We need to appreciate both the scale of the challenge of vertebrate genome analysis and the limitations of current gene prediction methods and understanding.
Nature 03/2001; 409(6822):827-8. · 36.28 Impact Factor
-
R Apweiler,
T K Attwood,
A Bairoch,
A Bateman, E Birney,
M Biswas,
P Bucher,
L Cerutti,
F Corpet,
M D Croning, [......],
A Kanapin,
Y Karavidopoulou,
R Lopez,
B Marx,
N J Mulder,
T M Oinn,
M Pagni,
F Servant,
C J Sigrist,
E M Zdobnov
ABSTRACT: Signature databases are vital tools for identifying distant relationships in novel sequences and hence for inferring protein function. InterPro is an integrated documentation resource for protein families, domains and functional sites, which amalgamates the efforts of the PROSITE, PRINTS, Pfam and ProDom database projects. Each InterPro entry includes a functional description, annotation, literature references and links back to the relevant member database(s). Release 2.0 of InterPro (October 2000) contains over 3000 entries, representing families, domains, repeats and sites of post-translational modification encoded by a total of 6804 different regular expressions, profiles, fingerprints and Hidden Markov Models. Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL (more than 1,000,000 hits from 462,500 proteins in SWISS-PROT and TrEMBL). The database is accessible for text- and sequence-based searches at http://www.ebi.ac.uk/interpro/. Questions can be emailed to interhelp@ebi.ac.uk.
Nucleic Acids Research 02/2001; 29(1):37-40. · 8.03 Impact Factor