Hongzhan Huang

University of Delaware, Newark, DE, USA

Publications (43) · 121.41 Total impact

  • Article: Structural and functional studies of S-adenosyl-L-methionine binding proteins: a ligand-centric approach.
    ABSTRACT: BACKGROUND: The post-genomic era poses several challenges. The biggest is the identification of biochemical function for protein sequences and structures resulting from genomic initiatives. Most sequences lack a characterized function and are annotated as hypothetical or uncharacterized. While homology-based methods are useful, and work well for sequences with sequence identities above 50%, they fail for sequences in the twilight zone (<30%) of sequence identity. For cases where sequence methods fail, structural approaches are often used, based on the premise that structure preserves function for longer evolutionary time-frames than sequence alone. It is now clear that no single method can be used successfully for functional inference. Given the growing need for functional assignments, we describe here a systematic new approach, designated ligand-centric, which is primarily based on analysis of ligand-bound/unbound structures in the PDB. Results of applying our approach to S-adenosyl-L-methionine (SAM) binding proteins are presented. RESULTS: Our analysis included 1,224 structures that belong to 172 unique families of the Protein Information Resource Superfamily system. Our ligand-centric approach was divided into four levels: residue, protein/domain, ligand, and family levels. The residue level included the identification of conserved binding site residues based on structure-guided sequence alignments of representative members of a family, and the identification of conserved structural motifs. The protein/domain level included structural classification of proteins, Pfam domains, domain architectures, and protein topologies. The ligand level included ligand conformations, ribose sugar puckering, and the identification of conserved ligand-atom interactions. The family level included phylogenetic analysis. CONCLUSION: We found that SAM bound to a total of 18 different fold types (I-XVIII). We identified 4 new fold types and 11 additional topological arrangements of strands within the well-studied Rossmann fold Methyltransferases (MTases). This extends the existing structural classification of SAM binding proteins. A striking correlation between fold type and the conformation of the bound SAM (classified as types) was found across the 18 fold types. Several site-specific rules were created for the assignment of functional residues to families and proteins that do not have a bound SAM or a solved structure.
    BMC Structural Biology 04/2013; 13(1):6. · 2.48 Impact Factor
  • Article: Use of the protein ontology for multi-faceted analysis of biological processes: a case study of the spindle checkpoint.
    ABSTRACT: As a member of the Open Biomedical Ontologies (OBO) foundry, the Protein Ontology (PRO) provides an ontological representation of protein forms and complexes and their relationships. Annotations in PRO can be assigned to individual protein forms and complexes, each distinguishable down to the level of post-translational modification, thereby allowing for a more precise depiction of protein function than is possible with annotations to the gene as a whole. Moreover, PRO is fully interoperable with other OBO ontologies and integrates knowledge from other protein-centric resources such as UniProt and Reactome. Here we demonstrate the value of the PRO framework in the investigation of the spindle checkpoint, a highly conserved biological process that relies extensively on protein modification and protein complex formation. The spindle checkpoint maintains genomic integrity by monitoring the attachment of chromosomes to spindle microtubules and delaying cell cycle progression until the spindle is fully assembled. Using PRO in conjunction with other bioinformatics tools, we explored the cross-species conservation of spindle checkpoint proteins, including phosphorylated forms and complexes; studied the impact of phosphorylation on spindle checkpoint function; and examined the interactions of spindle checkpoint proteins with the kinetochore, the site of checkpoint activation. Our approach can be generalized to any biological process of interest.
    Frontiers in genetics. 01/2013; 4:62.
  • Article: Community annotation and bioinformatics workforce development in concert--Little Skate Genome Annotation Workshops and Jamborees.
    ABSTRACT: Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome.
    Database: The Journal of Biological Databases and Curation 01/2012; 2012:bar064. · 2.07 Impact Factor
  • Article: A comprehensive protein-centric ID mapping service for molecular data integration.
    ABSTRACT: MOTIVATION: Identifier (ID) mapping establishes links between various biological databases and is an essential first step for molecular data integration and functional annotation. ID mapping allows diverse molecular data on genes and proteins to be combined and mapped to functional pathways and ontologies. We have developed comprehensive protein-centric ID mapping services providing mappings for 90 IDs derived from databases on genes, proteins, pathways, diseases, structures, protein families, protein interaction, literature, ontologies, etc. The services are widely used and have been regularly updated since 2006. AVAILABILITY: www.uniprot.org/mapping and proteininformation-resource.org/pirwww/search/idmapping.shtml CONTACT: huang@dbi.udel.edu.
    Bioinformatics 04/2011; 27(8):1190-1. · 5.47 Impact Factor
  • Conference Proceeding: An Automatic System for Extracting Figures and Captions in Biomedical PDF Documents.
    IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2011, Atlanta, GA, USA, 12-15 November, 2011; 01/2011
  • Article: Structure-guided rule-based annotation of protein functional sites in UniProt knowledgebase.
    ABSTRACT: The rapid growth of protein sequence databases has necessitated the development of methods to computationally derive annotation for uncharacterized entries. Most such methods focus on "global" annotation, such as molecular function or biological process. Methods to supply high-accuracy "local" annotation to functional sites based on structural information at the level of individual amino acids are relatively rare. In this chapter we will describe a method we have developed for annotation of functional residues within experimentally-uncharacterized proteins that relies on position-specific site annotation rules (PIR Site Rules) derived from structural and experimental information. These PIR Site Rules are manually defined to allow for conditional propagation of annotation. Each rule specifies a tripartite set of conditions whereby candidates for annotation must pass a whole-protein classification test (that is, have end-to-end match to a whole-protein-based HMM), match a site-specific profile HMM and, finally, match functionally and structurally characterized residues of a template. Positive matches trigger the appropriate annotation for active site residues, binding site residues, modified residues, or other functionally important amino acids. The strict criteria used in this process have rendered high-confidence annotation suitable for UniProtKB/Swiss-Prot features.
    Methods in molecular biology (Clifton, N.J.) 01/2011; 694:91-105.
  • Article: Protein-centric data integration for functional analysis of comparative proteomics data.
    ABSTRACT: High-throughput proteomic, microarray, protein interaction and other experimental methods all generate long lists of proteins and/or genes that have been identified or have varied in accumulation under the experimental conditions studied. These lists can be difficult for biologists to sort through and make sense of. Here we describe a next step in data analysis: a bottom-up approach to data integration. It starts with protein sequence identifications, maps them to a common representation of the protein, brings in a wide variety of structural, functional, genetic, and disease information related to proteins derived from annotated knowledge bases, and then uses this information to categorize the lists using Gene Ontology (GO) terms and mappings to biological pathway databases. We illustrate with examples how this can aid in identifying important processes from large complex lists.
    Methods in molecular biology (Clifton, N.J.) 01/2011; 694:323-39.
  • Article: Proteomic analysis of pathways involved in estrogen-induced growth and apoptosis of breast cancer cells.
    ABSTRACT: Estrogen is a known growth promoter for estrogen receptor (ER)-positive breast cancer cells. Paradoxically, in breast cancer cells that have been chronically deprived of estrogen stimulation, re-introduction of the hormone can induce apoptosis. Here, we sought to identify signaling networks that are triggered by estradiol (E2) in isogenic MCF-7 breast cancer cells that undergo apoptosis (MCF-7:5C) versus cells that proliferate upon exposure to E2 (MCF-7). The nuclear receptor co-activator AIB1 (Amplified in Breast Cancer-1) is known to be rate-limiting for E2-induced cell survival responses in MCF-7 cells and was found here to also be required for the induction of apoptosis by E2 in the MCF-7:5C cells. Proteins that interact with AIB1 as well as complexes that contain tyrosine phosphorylated proteins were isolated by immunoprecipitation and identified by mass spectrometry (MS) at baseline and after a brief exposure to E2 for two hours. Bioinformatic network analyses of the identified protein interactions were then used to analyze E2 signaling pathways that trigger apoptosis versus survival. Comparison of MS data with a computationally-predicted AIB1 interaction network showed that 26 proteins identified in this study are within this network, and are involved in signal transduction, transcription, cell cycle regulation and protein degradation. G-protein-coupled receptors, PI3 kinase, Wnt and Notch signaling pathways were most strongly associated with E2-induced proliferation or apoptosis and are integrated here into a global AIB1 signaling network that controls qualitatively distinct responses to estrogen.
    PLoS ONE 01/2011; 6(6):e20410. · 4.09 Impact Factor
  • Article: Representative proteomes: a stable, scalable and unbiased proteome set for sequence analysis and functional annotation.
    ABSTRACT: The accelerating growth in the number of protein sequences taxes both the computational and manual resources needed to analyze them. One approach to dealing with this problem is to minimize the number of proteins subjected to such analysis in a way that minimizes loss of information. To this end we have developed a set of Representative Proteomes (RPs), each selected from a Representative Proteome Group (RPG) containing similar proteomes calculated based on co-membership in UniRef50 clusters. A Representative Proteome is the proteome that can best represent all the proteomes in its group in terms of the majority of the sequence space and information. RPs at 75%, 55%, 35% and 15% co-membership threshold (CMT) are provided to allow users to decrease or increase the granularity of the sequence space based on their requirements. We find that a CMT of 55% (RP55) most closely follows standard taxonomic classifications. Further analysis of this set reveals that sequence space is reduced by more than 80% relative to UniProtKB, while retaining both sequence diversity (over 95% of InterPro domains) and annotation information (93% of experimentally characterized proteins). All sets can be browsed and are available for sequence similarity searches and download at http://www.proteininformationresource.org/rps, while the set of 637 RPs determined using a 55% CMT are also available for text searches. Potential applications include sequence similarity searches, protein classification and targeted protein annotation and characterization.
    PLoS ONE 01/2011; 6(4):e18910. · 4.09 Impact Factor
  • Article: Protein bioinformatics databases and resources.
    Chuming Chen, Hongzhan Huang, Cathy H Wu
    ABSTRACT: In the past decades, a variety of publicly available data repositories and resources have been developed to support protein-related information management, data-driven hypothesis generation and biological knowledge discovery. However, researchers trying to quickly find the appropriate resources for their problems face increasing confusion. In this chapter, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases and resources that are relevant to comparative proteomics research. We conclude the chapter by discussing the challenges and opportunities for developing new protein bioinformatics databases.
    Methods in molecular biology (Clifton, N.J.) 01/2011; 694:3-24.
  • Article: Omics-based molecular target and biomarker identification.
    ABSTRACT: Genomic, proteomic, and other omic-based approaches are now broadly used in biomedical research to facilitate the understanding of disease mechanisms and identification of molecular targets and biomarkers for therapeutic and diagnostic development. While the Omics technologies and bioinformatics tools for analyzing Omics data are rapidly advancing, the functional analysis and interpretation of the data remain challenging due to the inherent nature of the generally long workflows of Omics experiments. We adopt a strategy that emphasizes the use of curated knowledge resources coupled with expert-guided examination and interpretation of Omics data for the selection of potential molecular targets. We describe a downstream workflow and procedures for functional analysis that focus on biological pathways, from which molecular targets can be derived and proposed for experimental validation.
    Methods in molecular biology (Clifton, N.J.) 01/2011; 719:547-71.
  • Article: The Protein Ontology: a structured representation of protein forms and complexes.
    ABSTRACT: The Protein Ontology (PRO) provides a formal, logically-based classification of specific protein classes including structured representations of protein isoforms, variants and modified forms. Initially focused on proteins found in human, mouse and Escherichia coli, PRO now includes representations of protein complexes. The PRO Consortium works in concert with the developers of other biomedical ontologies and protein knowledge bases to provide the ability to formally organize and integrate representations of precise protein forms so as to enhance accessibility to results of protein research. PRO (http://pir.georgetown.edu/pro) is part of the Open Biomedical Ontology Foundry.
    Nucleic Acids Research 10/2010; 39(Database issue):D539-45. · 8.03 Impact Factor
  • Article: Molecular mechanisms mediating the effect of mono-(2-ethylhexyl) phthalate on hormone-stimulated steroidogenesis in MA-10 mouse tumor Leydig cells.
    ABSTRACT: Di-(2-ethylhexyl) phthalate, a widely used plasticizer, and its active metabolite, mono-(2-ethylhexyl) phthalate (MEHP), have been shown to exert adverse effects on the reproductive tract in developing and adult animals. As yet, however, the molecular mechanisms by which they act are uncertain. In the present study, we address the molecular and cellular mechanisms underlying the effects of MEHP on basal and human chorionic gonadotropin (hCG)-stimulated steroid production by MA-10 Leydig cells, using a systems biology approach. MEHP induced dose-dependent decreases in hCG-stimulated steroid formation. Changes in mRNA and protein expression in cells treated with increasing concentrations of MEHP in the presence or absence of hCG were measured by gene microarray and protein high-throughput immunoblotting analyses, respectively. Expression profiling indicated that low concentrations of MEHP induced the expression of a number of genes that also were expressed after hCG stimulation. Cross-comparisons between the hCG and MEHP treatments revealed two genes, Anxa1 and AR1. We suggest that these genes may be involved in a new self-regulatory mechanism of steroidogenesis. The MEHP-induced decreases in hCG-stimulated steroid formation were paralleled by increases in reactive oxygen species generation, with the latter mediated by the Cyp1a1 gene and its network. A model for the mechanism of MEHP action on MA-10 Leydig cell steroidogenesis is proposed.
    Endocrinology 07/2010; 151(7):3348-62. · 4.46 Impact Factor
  • Article: Protein Bioinformatics Infrastructure for the Integration and Analysis of Multiple High-Throughput "omics" Data.
    Adv. Bioinformatics. 01/2010; 2010.
  • Article: Phylogenomic analysis of marine Roseobacters.
    Kai Tang, Hongzhan Huang, Nianzhi Jiao, Cathy H Wu
    ABSTRACT: Members of the Roseobacter clade, which play a key role in the biogeochemical cycles of the ocean, are diverse and abundant, comprising 10-25% of the bacterioplankton in most marine surface waters. The rapid accumulation of whole-genome sequence data for the Roseobacter clade allows us to obtain a clearer picture of its evolution. In this study about 1,200 likely orthologous protein families were identified from 17 Roseobacter bacteria genomes. Functional annotations for these genes are provided by iProClass. Phylogenetic trees were constructed for each gene using maximum likelihood (ML) and neighbor joining (NJ). Putative organismal phylogenetic trees were built with phylogenomic methods. These trees were compared and analyzed using principal coordinates analysis (PCoA), approximately unbiased (AU) and Shimodaira-Hasegawa (SH) tests. A core set of 694 genes with vertical descent signal that are resistant to horizontal gene transfer (HGT) is used to reconstruct a robust organismal phylogeny. In addition, we discovered the 109 most likely HGT genes. The core set contains genes that encode ribosomal apparatus, ABC transporters and chaperones often found in the environmental metagenomic and metatranscriptomic data. These genes in the core set are spread out uniformly among the various functional classes and biological processes. Here we report a new multigene-derived phylogenetic tree of the Roseobacter clade. Of particular interest is the HGT of eleven genes involved in vitamin B12 synthesis as well as key enzymes for dimethylsulfoniopropionate (DMSP) degradation. These acquired genes are essential for the growth of Roseobacters and their eukaryotic partners.
    PLoS ONE 01/2010; 5(7):e11604. · 4.09 Impact Factor
  • Article: Protein Bioinformatics Infrastructure for the Integration and Analysis of Multiple High-Throughput "omics" Data.
    ABSTRACT: High-throughput "omics" technologies bring new opportunities for biological and biomedical researchers to ask complex questions and gain new scientific insights. However, the voluminous, complex, and context-dependent data are maintained in heterogeneous and distributed environments, and well-defined data standards and standardized nomenclature are lacking. Together these pose a major challenge that requires advanced computational methods and bioinformatics infrastructures for integration, mining, visualization, and comparative analysis to facilitate data-driven hypothesis generation and biological knowledge discovery. In this paper, we present the challenges in high-throughput "omics" data integration and analysis, introduce a protein-centric approach for systems integration of large and heterogeneous high-throughput "omics" data including microarray, mass spectrometry, protein sequence, protein structure, and protein interaction data, and use a scientific case study to illustrate how one can use varied "omics" data from different laboratories to make useful connections that could lead to new biological knowledge.
    Advances in Bioinformatics 01/2010;
  • Article: Systems integration of biodefense omics data for analysis of pathogen-host interactions and identification of potential targets.
    ABSTRACT: The NIAID (National Institute for Allergy and Infectious Diseases) Biodefense Proteomics program aims to identify targets for potential vaccines, therapeutics, and diagnostics for agents of concern in bioterrorism, including bacterial, parasitic, and viral pathogens. The program includes seven Proteomics Research Centers, generating diverse types of pathogen-host data, including mass spectrometry, microarray transcriptional profiles, protein interactions, protein structures and biological reagents. The Biodefense Resource Center (www.proteomicsresource.org) has developed a bioinformatics framework, employing a protein-centric approach to integrate and support mining and analysis of the large and heterogeneous data. Underlying this approach is a data warehouse with comprehensive protein + gene identifier and name mappings and annotations extracted from over 100 molecular databases. Value-added annotations are provided for key proteins from experimental findings using controlled vocabulary. The availability of pathogen and host omics data in an integrated framework allows global analysis of the data and comparisons across different experiments and organisms, as illustrated in several case studies presented here. (1) The identification of a hypothetical protein with differential gene and protein expressions in two host systems (mouse macrophage and human HeLa cells) infected by different bacterial (Bacillus anthracis and Salmonella typhimurium) and viral (orthopox) pathogens suggesting that this protein can be prioritized for additional analysis and functional characterization. (2) The analysis of a vaccinia-human protein interaction network supplemented with protein accumulation levels led to the identification of human Keratin, type II cytoskeletal 4 protein as a potential therapeutic target. (3) Comparison of complete genomes from pathogenic variants coupled with experimental information on complete proteomes allowed the identification and prioritization of ten potential diagnostic targets from Bacillus anthracis. The integrative analysis across data sets from multiple centers can reveal potential functional significance and hidden relationships between pathogen and host proteins, thereby providing a systems approach to basic understanding of pathogenicity and target identification.
    PLoS ONE 01/2009; 4(9):e7162. · 4.09 Impact Factor
  • Article: Integrated Bioinformatics for Radiation-Induced Pathway Analysis from Proteomics and Microarray Data.
    ABSTRACT: Functional analysis and interpretation of large-scale proteomics and gene expression data require effective use of bioinformatics tools and public knowledge resources coupled with expert-guided examination. An integrated bioinformatics approach was used to analyze cellular pathways in response to ionizing radiation. ATM, or ataxia-telangiectasia mutated, a serine-threonine protein kinase, plays critical roles in radiation responses, including cell cycle arrest and DNA repair. We analyzed radiation-responsive pathways based on 2D-gel/MS proteomics and microarray gene expression data from fibroblasts expressing wild-type or mutant ATM gene. The analysis showed that metabolism was significantly affected by radiation in an ATM-dependent manner. In particular, purine metabolic pathways were differentially changed in the two cell lines. The expression of ribonucleoside-diphosphate reductase subunit M2 (RRM2) was increased in ATM-wild type cells at both mRNA and protein levels, but no changes were detected in ATM-mutated cells. Increased expression of p53 was observed 30 min after irradiation of the ATM-wild type cells. These results suggest that RRM2 is a downstream target of the ATM-p53 pathway that mediates radiation-induced DNA repair. We demonstrated that the integrated bioinformatics approach facilitated pathway analysis, hypothesis generation and target gene/protein identification.
    Journal of Proteomics & Bioinformatics 06/2008; 1(2):47-60.
  • Article: An emerging cyberinfrastructure for biodefense pathogen and pathogen-host data.
    Nucleic Acids Research. 01/2008; 36:884-891.
  • Article: UniRef: comprehensive and non-redundant UniProt reference clusters.
    ABSTRACT: Redundant protein sequences in biological databases hinder sequence similarity searches and make interpretation of search results difficult. Clustering of protein sequence space based on sequence similarity helps organize all sequences into manageable datasets and reduces sampling bias and overrepresentation of sequences. The UniRef (UniProt Reference Clusters) provide clustered sets of sequences from the UniProt Knowledgebase (UniProtKB) and selected UniProt Archive records to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences. Currently covering >4 million source sequences, the UniRef100 database combines identical sequences and subfragments from any source organism into a single UniRef entry. UniRef90 and UniRef50 are built by clustering UniRef100 sequences at the 90 or 50% sequence identity levels. UniRef100, UniRef90 and UniRef50 yield a database size reduction of approximately 10, 40 and 70%, respectively, from the source sequence set. The reduced redundancy increases the speed of similarity searches and improves detection of distant relationships. UniRef entries contain summary cluster and membership information, including the sequence of a representative protein, member count and common taxonomy of the cluster, the accession numbers of all the merged entries and links to rich functional annotation in UniProtKB to facilitate biological discovery. UniRef has already been applied to broad research areas ranging from genome annotation to proteomics data analysis. UniRef is updated biweekly and is available for online search and retrieval at http://www.uniprot.org, as well as for download at ftp://ftp.uniprot.org/pub/databases/uniprot/uniref. Supplementary data are available at Bioinformatics online.
    Bioinformatics 06/2007; 23(10):1282-8. · 5.47 Impact Factor

Institutions

  • 2010–2012
    • University of Delaware
      • Department of Computer and Information Sciences
      Newark, DE, USA
    • Xiamen University
      • State Key Laboratory of Marine Environmental Science
      Xiamen, Fujian, China
  • 2002–2010
    • Georgetown University
      • Department of Biochemistry and Molecular and Cellular Biology
      Washington, DC, USA
  • 2007
    • National Institute on Aging
      Baltimore, MD, USA