José María Carazo

Spanish National Research Council, Madrid, Madrid, Spain

Publications (36) · 119.07 Total impact

  • Article: Integrating human and murine anatomical gene expression data for improved comparisons.
    ABSTRACT: Information concerning the gene expression pattern in four dimensions (species, genes, anatomy and developmental stage) is crucial for unraveling the roles of genes through time. There are a variety of anatomical gene expression databases, but extracting information from them can be hampered by their diversity and heterogeneity. aGEM 3.1 (anatomic Gene Expression Mapping) addresses the issues of diversity and heterogeneity of anatomical gene expression databases by integrating six mouse gene expression resources (EMAGE, GXD, GENSAT, Allen Brain Atlas database, EUREXPRESS and BioGPS) and three human gene expression databases (HUDSEN, Human Protein Atlas and BioGPS). Furthermore, aGEM 3.1 provides new cross analysis tools to bridge these resources. aGEM 3.1 can be queried using gene and anatomical structure. Output information is presented in a friendly format, allowing the user to display expression maps and correlation matrices for a gene or structure during development. An in-depth study of a specific developmental stage is also possible using heatmaps that relate gene expression with anatomical components. Availability: http://agem.cnb.csic.es. Contact: natalia@cnb.csic.es. Supplementary data are available at Bioinformatics online.
    Bioinformatics 11/2011; 28(3):397-402. · 5.47 Impact Factor
  • Article: Electron microscopy studies on the quaternary structure of p53 reveal different binding modes for p53 tetramers in complex with DNA.
    ABSTRACT: The multidomain homotetrameric tumor suppressor p53 has two modes of binding dsDNA that are thought to be responsible for scanning and recognizing specific response elements (REs). The C termini bind nonspecifically to dsDNA. The four DNA-binding domains (DBDs) bind REs that have two symmetric 10 base-pair sequences. p53 bound to a 20-bp RE has the DBDs enveloping the DNA, which is in the center of the molecule surrounded by linker sequences to the tetramerization domain (Tet). We investigated by electron microscopy structures of p53 bound to DNA sequences consisting of a 20-bp RE with either 12 or 20 bp nonspecific extensions on either end. We found a variety of structures that give clues to recognition and scanning mechanisms. The 44- and 60-bp sequences gave rise to three and four classes of structures, respectively. One was similar to the known 20-bp structure, but the DBDs in the other classes were loosely arranged and incompatible with specific DNA recognition. Some of the complexes had density consistent with the C termini extending from Tet to the DNA, adjacent to the DBDs. Single-molecule fluorescence resonance energy transfer experiments detected the approach of the C termini towards the DBDs on addition of DNA. The structural data are consistent with p53 sliding along DNA via its C termini and the DNA-binding domains hopping on and off during searches for REs. The loose structures and posttranslational modifications account for the affinity of nonspecific DNA for p53 and point to a mechanism of enhancement of specificity by its binding to effector proteins.
    Proceedings of the National Academy of Sciences 01/2011; 108(2):557-62. · 9.68 Impact Factor
  • Article: Moara: a Java library for extracting and normalizing gene and protein mentions.
    ABSTRACT: Gene/protein recognition and normalization are important preliminary steps for many biological text mining tasks, such as information retrieval, protein-protein interactions, and extraction of semantic information, among others. Despite dedication to these problems and effective solutions being reported, easily integrated tools to perform these tasks are not readily available. This study proposes a versatile and trainable Java library that implements gene/protein tagger and normalization steps based on machine learning approaches. The system has been trained for several model organisms and corpora but can be expanded to support new organisms and documents. Moara is a flexible, trainable and open-source system that is not specifically orientated to any organism and therefore does not require specific tuning in the algorithms or dictionaries utilized. Moara can be used as a stand-alone application or can be incorporated in the workflow of a more general text mining system.
    BMC Bioinformatics 03/2010; 11:157. · 2.75 Impact Factor
  • Article: Moara: a Java library for extracting and normalizing gene and protein mentions
    Mariana Neves, José María Carazo, Alberto Pascual-Montano
    ABSTRACT: Background: Gene/protein recognition and normalization are important preliminary steps for many biological text mining tasks, such as information retrieval, protein-protein interactions, and extraction of semantic information, among others. Despite dedication to these problems and effective solutions being reported, easily integrated tools to perform these tasks are not readily available. Results: This study proposes a versatile and trainable Java library that implements gene/protein tagger and normalization steps based on machine learning approaches. The system has been trained for several model organisms and corpora but can be expanded to support new organisms and documents. Conclusions: Moara is a flexible, trainable and open-source system that is not specifically orientated to any organism and therefore does not require specific tuning in the algorithms or dictionaries utilized. Moara can be used as a stand-alone application or can be incorporated in the workflow of a more general text mining system.
    BMC Bioinformatics. 01/2010;
  • Article: tmRNA.SmpB complex mimics native aminoacyl-tRNAs in the A site of stalled ribosomes.
    ABSTRACT: Bacterial ribosomes stalled on faulty, often truncated, mRNAs lacking stop codons are rescued by trans-translation. It relies on an RNA molecule (tmRNA) capable of replacing the faulty mRNA with its own open reading frame (ORF). Translation of the tmRNA ORF results in the tagging of the faulty protein for degradation and its release from the ribosome. We used single-particle cryo-electron microscopy to visualize tmRNA together with its helper protein SmpB on the 70S Escherichia coli ribosome in states subsequent to GTP hydrolysis on elongation factor Tu (EF-Tu). Three-dimensional reconstruction and heterogeneity analysis resulted in a 15 Å resolution structure of the tmRNA.SmpB complex accommodated in the A site of the ribosome, which shows that SmpB mimics the anticodon- and D-stem of native tRNAs missing in the tRNA-like domain of tmRNA. We conclude that the tmRNA.SmpB complex accommodates in the ribosomal A site very much like an aminoacyl-tRNA during protein elongation.
    Journal of Structural Biology 10/2009; 169(3):342-8. · 3.41 Impact Factor
  • Article: aGEM: an integrative system for analyzing spatial-temporal gene-expression information.
    ABSTRACT: The work presented here describes the 'anatomical Gene-Expression Mapping (aGEM)' Platform, a development conceived to integrate phenotypic information with the spatial and temporal distributions of genes expressed in the mouse. The aGEM Platform has been built by extending the Distributed Annotation System (DAS) protocol, which was originally designed to share genome annotations over the WWW. DAS is a client-server system in which a single client integrates information from multiple distributed servers. The aGEM Platform provides information to answer three main questions. (i) Which genes are expressed in a given mouse anatomical component? (ii) In which mouse anatomical structures are a given gene or set of genes expressed? And (iii) is there any correlation among these findings? Currently, this Platform includes several well-known mouse resources (EMAGE, GXD and GENSAT), hosting gene-expression data mostly obtained from in situ techniques together with a broad set of image-derived annotations. The Platform is optimized for Firefox 3.0 and it is accessed through a friendly and intuitive display: http://agem.cnb.csic.es
    Bioinformatics 08/2009; 25(19):2566-72. · 5.47 Impact Factor
  • Article: Introducing robustness to maximum-likelihood refinement of electron-microscopy data.
    Sjors H W Scheres, José María Carazo
    ABSTRACT: An expectation-maximization algorithm for maximum-likelihood refinement of electron-microscopy images is presented that is based on fitting mixtures of multivariate t-distributions. The novel algorithm has intrinsic characteristics for providing robustness against atypical observations in the data, which is illustrated using an experimental test set with artificially generated outliers. Tests on experimental data revealed only minor differences in two-dimensional classifications, while three-dimensional classification with the new algorithm gave stronger elongation factor G density in the corresponding class of a structurally heterogeneous ribosome data set than the conventional algorithm for Gaussian mixtures.
    Acta crystallographica. Section D, Biological crystallography 08/2009; 65(Pt 7):672-8. · 12.67 Impact Factor
  • Article: Maximum likelihood refinement of electron microscopy data with normalization errors.
    ABSTRACT: Commonly employed data models for maximum likelihood refinement of electron microscopy images behave poorly in the presence of normalization errors. Small variations in background mean or signal brightness are relatively common in cryo-electron microscopy data, and varying signal-to-noise ratios or artifacts in the images interfere with standard normalization procedures. In this paper, a statistical data model that accounts for normalization errors is presented, and a corresponding algorithm for maximum likelihood classification of structurally heterogeneous projection data is derived. The extended data model has general relevance, since similar algorithms may be derived for other maximum likelihood approaches in the field. The potential of this approach is illustrated for two structurally heterogeneous data sets: 70S E. coli ribosomes and human RNA polymerase II complexes. In both cases, maximum likelihood classification based on the conventional data model failed, whereas the new approach was capable of revealing previously unobserved conformations.
    Journal of Structural Biology 03/2009; 166(2):234-40. · 3.41 Impact Factor
  • Article: Crystal structure of a near-full-length archaeal MCM: functional insights for an AAA+ hexameric helicase.
    ABSTRACT: The minichromosome maintenance protein (MCM) complex is an essential replicative helicase for DNA replication in Archaea and Eukaryotes. Whereas the eukaryotic complex consists of 6 homologous proteins (MCM2-7), the archaeon Sulfolobus solfataricus has only 1 MCM protein (ssoMCM), 6 subunits of which form a homohexamer. Here, we report a 4.35-Å crystal structure of the near-full-length ssoMCM. The structure shows an elongated fold, with 5 subdomains that are organized into 2 large N- and C-terminal domains. A near-full-length ssoMCM hexamer generated based on the 6-fold symmetry of the N-terminal Methanothermobacter thermautotrophicus (mtMCM) hexamer shows intersubunit distances suitable for bonding contacts, including the interface around the ATP pocket. Four unusual beta-hairpins of each subunit are located inside the central channel or around the side channels in the hexamer. Additionally, the hexamer fits well into the double-hexamer EM map of mtMCM. Our mutational analysis of residues at the intersubunit interfaces and around the side channels demonstrates their critical roles for hexamerization and helicase function. These structural and biochemical results provide a basis for future study of the helicase mechanisms of the archaeal and eukaryotic MCM complexes in DNA replication.
    Proceedings of the National Academy of Sciences 01/2009; 105(51):20191-6. · 9.68 Impact Factor
  • Article: CentrosomeDB: a human centrosomal proteins database.
    ABSTRACT: Active research on the biology of the centrosome during the past decades has allowed the identification and characterization of many centrosomal proteins. Unfortunately, the accumulated data is still dispersed among heterogeneous sources of information. Here we present centrosome:db, which intends to compile and integrate relevant information related to the human centrosome. We have compiled a set of 383 likely human centrosomal genes and recorded the associated supporting evidence. Centrosome:db offers several perspectives to study the human centrosome including evolution, function and structure. The database contains information on the orthology relationships with other species, including fungi, nematodes, arthropods, urochordates and vertebrates. Predictions of the domain organization of centrosome:db proteins are graphically represented at different sections of the database, including sets of alternative protein isoforms, interacting proteins, groups of orthologs and the homologs identified with blast. Centrosome:db also contains information related to function, gene-disease associations, SNPs and the 3D structure of proteins. Apart from important differences in the coverage of the set of centrosomal genes, our database differs from other similar initiatives in the way information is treated and analyzed. Centrosome:db is publicly available at http://centrosome.dacya.ucm.es.
    Nucleic Acids Research 11/2008; 37(Database issue):D175-80. · 8.03 Impact Factor
  • Article: Image processing for electron microscopy single-particle analysis using XMIPP.
    ABSTRACT: We describe a collection of standardized image processing protocols for electron microscopy single-particle analysis using the XMIPP software package. These protocols allow performing the entire processing workflow starting from digitized micrographs up to the final refinement and evaluation of 3D models. A particular emphasis has been placed on the treatment of structurally heterogeneous data through maximum-likelihood refinements and self-organizing maps as well as the generation of initial 3D models for such data sets through random conical tilt reconstruction methods. All protocols presented have been implemented as stand-alone, executable Python scripts, for which a dedicated graphical user interface has been developed. Thereby, they may provide novice users with a convenient tool to quickly obtain useful results with minimal effort spent learning the details of this comprehensive package. Examples of applications are presented for a negative stain random conical tilt data set on the hexameric helicase G40P and for a structurally heterogeneous data set on 70S Escherichia coli ribosomes embedded in vitrified ice.
    Nature Protocols 02/2008; 3(6):977-90. · 8.36 Impact Factor
  • Article: Modeling experimental image formation for likelihood-based classification of electron microscopy data.
    ABSTRACT: The coexistence of multiple distinct structural states often obstructs the application of three-dimensional cryo-electron microscopy to large macromolecular complexes. Maximum likelihood approaches are emerging as robust tools for solving the image classification problems that are posed by such samples. Here, we propose a statistical data model that allows for a description of the experimental image formation within the formulation of 2D and 3D maximum-likelihood refinement. The proposed approach comprises a formulation of the probability calculations in Fourier space, including a spatial frequency-dependent noise model and a description of defocus-dependent imaging effects. The Expectation-Maximization-like algorithms presented are generally applicable to the alignment and classification of structurally heterogeneous projection data. Their effectiveness is demonstrated with various examples, including 2D classification of top views of the archaeal helicase MCM and 3D classification of 70S E. coli ribosome and Simian Virus 40 large T-antigen projections.
    Structure 11/2007; 15(10):1167-77. · 6.35 Impact Factor
  • Article: Quaternary structures of tumor suppressor p53 and a specific p53 DNA complex.
    ABSTRACT: The homotetrameric tumor suppressor p53 consists of folded core and tetramerization domains, linked and flanked by intrinsically disordered segments that impede structure analysis by x-ray crystallography and NMR. Here, we solved the quaternary structure of human p53 in solution by a combination of small-angle x-ray scattering, which defined its shape, and NMR, which identified the core domain interfaces and showed that the folded domains had the same structure in the intact protein as in fragments. We combined the solution data with electron microscopy on immobilized samples that provided medium resolution 3D maps. Ab initio and rigid body modeling of scattering data revealed an elongated cross-shaped structure with a pair of loosely coupled core domain dimers at the ends, which are accessible for binding to DNA and partner proteins. The core domains in that open conformation closed around a specific DNA response element to form a compact complex whose structure was independently determined by electron microscopy. The structure of the DNA complex is consistent with that of the complex of four separate core domains and response element fragments solved by x-ray crystallography and contacts identified by NMR. Electron microscopy on the conformationally mobile, unbound p53 selected a minor compact conformation, which resembled the closed conformation, from the ensemble of predominantly open conformations. A multipronged structural approach could be generally useful for the structural characterization of the rapidly growing number of multidomain proteins with intrinsically disordered regions.
    Proceedings of the National Academy of Sciences 08/2007; 104(30):12324-9. · 9.68 Impact Factor
  • Article: Loading a ring: structure of the Bacillus subtilis DnaB protein, a co-loader of the replicative helicase.
    ABSTRACT: Loading of the ring-shaped replicative helicase is a critical step in the initiation of DNA replication. Bacillus subtilis has adopted a two-protein strategy to load its hexameric replicative helicase: DnaB and DnaI interact with the helicase and mediate its delivery onto DNA. We present here the 3D electron microscopy structure of the DnaB protein, along with a detailed analysis of both its oligomeric state and its domain organization. DnaB is organized as an asymmetric tetramer that is comprised of two stacked components, one arranged as a closed collar and the other as an open sigma shape. Intriguingly, the 3D map of DnaB exhibits an overall architecture similar to the structure of the Escherichia coli gamma-complex, the loader of the ring-shaped processivity factor. We propose a model whereby each DnaB monomer participates in both stacked components of the tetramer and displays a different overall shape. This asymmetric quaternary organization could be a general feature of ring loaders.
    Journal of Molecular Biology 04/2007; 367(3):764-9. · 4.00 Impact Factor
  • Article: Flexible fitting in 3D-EM guided by the structural variability of protein superfamilies.
    ABSTRACT: A method for flexible fitting of molecular models into three-dimensional electron microscopy (3D-EM) reconstructions at a resolution range of 8-12 Å is proposed. The approach uses the evolutionarily related structural variability existing among the protein domains of a given superfamily, according to structural databases such as CATH. A structural alignment of domains belonging to the superfamily, followed by a principal components analysis, is performed, and the first three principal components of the decomposition are explored. Using rigid body transformations for the secondary structure elements (SSEs) plus the cyclic coordinate descent algorithm to close the loops, stereochemically correct models are built for the structure to fit. All of the models are fitted into the 3D-EM map, and the best one is selected based on cross-correlation measures. This work applies the method to both simulated and experimental data and shows that the flexible fitting was able to produce better results than rigid body fitting.
    Structure 08/2006; 14(7):1115-26. · 6.35 Impact Factor
  • Article: Structural basis for the cooperative assembly of large T antigen on the origin of replication.
    ABSTRACT: Large T antigen (LTag) from simian virus 40 (SV40) is an ATP-driven DNA helicase that specifically recognizes the core of the viral origin of replication (ori), where it oligomerizes as a double hexamer. During this process, binding of the first hexamer stimulates the assembly of a second one. Using electron microscopy, we show that the N-terminal part of LTag that includes the origin-binding domain does not present a stable quaternary structure in single hexamers. This disordered region, however, is well arranged within the LTag double hexamer after specific ori recognition, where it mediates the interactions between hexamers and constructs a separated structural module at their junction. We conclude that full assembly of LTag hexamers occurs only within the dodecamer, and requires the specific hexamer-hexamer interactions established upon binding to the origin of replication. This mechanism provides the structural basis for the cooperative assembly of LTag double hexamer on the cognate viral ori.
    Journal of Molecular Biology 05/2006; 357(4):1295-305. · 4.00 Impact Factor
  • Article: Quaternary polymorphism of replicative helicase G40P: structural mapping and domain rearrangement.
    ABSTRACT: Quaternary polymorphism is a distinctive structural feature of the DnaB family of replicative DNA hexameric helicases. The Bacillus subtilis bacteriophage SPP1 gene 40 product (G40P) belongs to this family. Three different quaternary states have been described for G40P homohexamers, two of them with C(3) symmetry, and the other with C(6) symmetry. We present three-dimensional reconstructions of the different architectures of G40P hexamers and a variant lacking the N-terminal domain. Comparison of the G40P and the deletion mutant structures sheds new light on the functional roles of the N and C-terminal domains, at the same time that it allows the direct structural mapping of these domains. Based on this new information, hybrid EM/X-ray models are presented for all the different symmetries. These results suggest that quaternary polymorphism of hexameric helicases may be implicated in the translocation along the DNA.
    Journal of Molecular Biology 05/2006; 357(4):1063-76. · 4.00 Impact Factor
  • Article: Optimization problems in electron microscopy of single particles.
    ABSTRACT: Electron Microscopy is a valuable tool for the elucidation of the three-dimensional structure of macromolecular complexes. Knowledge about the macromolecular structure provides important information about its function and how it is carried out. This work addresses the issue of three-dimensional reconstruction of biological macromolecules from electron microscopy images. In particular, it focuses on a methodology known as “single-particles” and makes a thorough review of all those steps that can be expressed as an optimization problem. In spite of important advances in recent years, there are still unresolved challenges in the field that offer an excellent testbed for new and more powerful optimization techniques.
    Annals of Operations Research 01/2006; 148:133-165. · 0.84 Impact Factor
  • Article: Fast maximum-likelihood refinement of electron microscopy images.
    Sjors H W Scheres, Mikel Valle, José-María Carazo
    ABSTRACT: Maximum-likelihood (ML) image refinement is a promising candidate to improve attainable resolution limits in 3D-EM. However, its large CPU requirements may prohibit application to 3D-structure optimization. We sped up ML image refinement by reducing its search space over the alignment parameters. Application of this reduced-search approach to a cryo-EM dataset yielded results practically identical to those of the original approach, but in approximately one day instead of one week of CPU time. This work has been implemented in the public domain package Xmipp. Documentation and download instructions may be found at: http://www.cnb.uam.es/~bioinfo
    Bioinformatics 10/2005; 21 Suppl 2:ii243-4. · 5.47 Impact Factor
  • Article: SPI-EM: towards a tool for predicting CATH superfamilies in 3D-EM maps.
    ABSTRACT: In this paper the theoretical framework used to build a superfamily probability in electron microscopy (SPI-EM) is presented. SPI-EM is a new tool for determining the homologous superfamily to which a protein domain belongs from its three-dimensional electron microscopy map. The homologous superfamily is assigned according to the domain-architecture database CATH. Our method follows a probabilistic approach applied to the results of fitting protein domains into maps of proteins and the computation of local cross-correlation coefficient measures. The method has been tested and its usefulness proven with isolated domains at a resolution of 8 Å and 12 Å. Results obtained with simulated and experimental data at 10 Å suggest that it is also feasible to detect the correct superfamily of the domains when dealing with electron microscopy maps containing multi-domain proteins. The inherent difficulties and limitations that multi-domain proteins impose are discussed. Our procedure is complementary to other techniques existing in the field to detect structural elements in electron microscopy maps like alpha-helices and beta-sheets. Based on the proposed methodology, a database of relevant distributions is being built to serve the community.
    Journal of Molecular Biology 02/2005; 345(4):759-71. · 4.00 Impact Factor

Institutions

  • 2011
    • Spanish National Research Council
      Madrid, Madrid, Spain
  • 2005–2010
    • Centro Nacional de Biotecnología (CNB)
      Madrid, Madrid, Spain
    • Universidad Autónoma de Madrid
      Madrid, Madrid, Spain
  • 2008
    • Complutense University of Madrid
      Madrid, Madrid, Spain
  • 2003
    • Universidad de Málaga
      • Departamento de Arquitectura de Computadores
      Málaga, Andalusia, Spain