Frances M G Pearl
Institute of Cancer Research · Division of Cancer Therapeutics
Publications (24)
Article: The CATH extended protein‐family database: Providing structural annotations for genome sequences
Frances M.G. Pearl, David Lee, James E. Bray, Daniel W.A. Buchan, Adrian J. Shepherd, Christine A. Orengo
ABSTRACT: An automatic sequence search and analysis protocol (DomainFinder) based on PSI-BLAST and IMPALA, and using conservative thresholds, has been developed for reliably integrating gene sequences from GenBank into their respective structural families within the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath_new). DomainFinder assigns a new gene sequence to a CATH homologous superfamily provided that PSI-BLAST identifies a clear relationship to at least one other Protein Data Bank sequence within that superfamily. This has resulted in an expansion of the CATH protein family database (CATH-PFDB v1.6) from 19,563 domain structures to 176,597 domain sequences. A further 50,000 putative homologous relationships can be identified using less stringent cut-offs and these relationships are maintained within neighbour tables in the CATH Oracle database, pending further evidence of their suggested evolutionary relationship. Analysis of the CATH-PFDB has shown that only 15% of the sequence families are close enough to a known structure for reliable homology modeling. IMPALA/PSI-BLAST profiles have been generated for each of the sequence families in the expanded CATH-PFDB and a web server has been provided so that new sequences may be scanned against the profile library and be assigned to a structure and homologous superfamily.
Protein Science 04/2009; 11(2):233-244. · 2.80 Impact Factor
Available from: PubMed Central
Article: MoKCa database--mutations of kinases in cancer.
Christopher J Richardson, Qiong Gao, Costas Mitsopoulous, Marketa Zvelebil, Laurence H Pearl, Frances M G Pearl
ABSTRACT: Members of the protein kinase family are amongst the most commonly mutated genes in human cancer, and both mutated and activated protein kinases have proved to be tractable targets for the development of new anticancer therapies. The MoKCa database (Mutations of Kinases in Cancer, http://strubiol.icr.ac.uk/extra/mokca) has been developed to structurally and functionally annotate, and where possible predict, the phenotypic consequences of mutations in protein kinases implicated in cancer. Somatic mutation data from tumours and tumour cell lines have been mapped onto the crystal structures of the affected protein domains. Positions of the mutated amino acids are highlighted on a sequence-based domain pictogram, as well as on a 3D image of the protein structure, and in a molecular graphics package integrated for interactive viewing. The data associated with each mutation are presented in the Web interface, along with expert annotation of the detailed molecular functional implications of the mutation. Proteins are linked to functional annotation resources and are annotated with structural and functional features such as domains and phosphorylation sites. MoKCa aims to provide assessments available from multiple sources and algorithms for each potential cancer-associated mutation, and to present these together in a consistent and coherent fashion to facilitate authoritative annotation by cancer biologists and structural biologists directly involved in the generation and analysis of new mutational data.
Nucleic Acids Research 12/2008; 37(Database issue):D824-31. · 8.03 Impact Factor
Available from: PubMed Central
Article: CATHEDRAL: a fast and effective algorithm to predict folds and domain boundaries from multidomain protein structures.
ABSTRACT: We present CATHEDRAL, an iterative protocol for determining the location of previously observed protein folds in novel multidomain protein structures. CATHEDRAL builds on the features of a fast secondary-structure-based method (using graph theory) to locate known folds within a multidomain context and a residue-based, double-dynamic programming algorithm, which is used to align members of the target fold groups against the query protein structure to identify the closest relative and assign domain boundaries. To increase the fidelity of the assignments, a support vector machine is used to provide an optimal scoring scheme. Once a domain is verified, it is excised, and the search protocol is repeated in an iterative fashion until all recognisable domains have been identified. We have performed an initial benchmark of CATHEDRAL against other publicly available structure comparison methods using a consensus dataset of domains derived from the CATH and SCOP domain classifications. CATHEDRAL shows superior performance in fold recognition and alignment accuracy when compared with many equivalent methods. If a novel multidomain structure contains a known fold, CATHEDRAL will locate it in 90% of cases, with <1% false positives. For nearly 80% of assigned domains in a manually validated test set, the boundaries were correctly delineated within a tolerance of ten residues. For the remaining cases, previously classified domains were very remotely related to the query chain so that embellishments to the core of the fold caused significant differences in domain sizes and manual refinement of the boundaries was necessary. To put this performance in context, a well-established sequence method based on hidden Markov models was only able to detect 65% of domains, with 33% of the subsequent boundaries assigned within ten residues. Since, on average, 50% of newly determined protein structures contain more than one domain unit, and typically 90% or more of these domains are already classified in CATH, CATHEDRAL will considerably facilitate the automation of protein structure classification.
PLoS Computational Biology 11/2007; 3(11):e232. · 5.22 Impact Factor
Chapter: The Cath Domain Structure Database
Christine A. Orengo, Frances M. G. Pearl, Janet M. Thornton
01/2005: pages 249-271; ISBN: 9780471721208
Available from: PubMed Central
Article: The CATH hierarchy revisited-structural divergence in domain superfamilies and the continuity of fold space.
Alison Cuff, Oliver C Redfern, Lesley Greene, Ian Sillitoe, Tony Lewis, Mark Dibley, Adam Reid, Frances Pearl, Tim Dallman, Annabel Todd, Richard Garratt, Janet Thornton, Christine Orengo
ABSTRACT: This paper explores the structural continuum in CATH and the extent to which superfamilies adopt distinct folds. Although most superfamilies are structurally conserved, in some of the most highly populated superfamilies (4% of all superfamilies) there is considerable structural divergence. While relatives share a similar fold in the evolutionarily conserved core, diverse elaborations to this core can result in significant differences in the global structures. Applying similar protocols to examine the extent to which structural overlaps occur between different fold groups, it appears this effect is confined to just a few architectures and is largely due to small, recurring super-secondary motifs (e.g., αβ-motifs, α-hairpins). Although 24% of superfamilies overlap with superfamilies having different folds, only 14% of nonredundant structures in CATH are involved in overlaps. Nevertheless, the existence of these overlaps suggests that, in some regions of structure space, the fold universe should be seen as more continuous.
Structure 09/2009; 17(8):1051-62. · 6.35 Impact Factor