Chengxin Zhang

University of Michigan | U-M · Department of Computational Medicine and Bioinformatics

Doctor of Philosophy

About

67
Publications
8,411
Reads
2,534
Citations
Citations since 2017
65 Research Items
2533 Citations
[Chart: yearly totals, 2017 to 2023; axis range 0 to 800]
Publications (67)
Preprint
Full-text available
RNAs are fundamental in living cells and perform critical functions determined by the tertiary architectures. However, accurate modeling of 3D RNA structure remains a challenging problem. Here we present a novel method, DRfold, to predict RNA tertiary structures by simultaneous learning of local frame rotations and geometric restraints from experim...
Article
Full-text available
Accurate identification of protein function is critical to elucidate life mechanisms and design new drugs. We proposed a novel deep-learning method, ATGO, to predict Gene Ontology (GO) attributes of proteins through a triplet neural-network architecture embedded with pre-trained language models from protein sequences. The method was systematically...
Article
Computational screening for potentially bioactive molecules using advanced molecular modeling approaches including molecular docking and molecular dynamic simulation is mainstream in certain fields like drug discovery. Significant advances in computationally predicting protein structures from sequence information have also expanded the availability...
Article
Full-text available
The multiple sequence alignment (MSA) is the entry point of many RNA structure modeling tasks, such as prediction of RNA secondary structure (rSS) and contacts. However, there are few automated programs for generating high quality MSAs of target RNA molecules. We have developed rMSA, a hierarchical pipeline for sensitive search and accurate alignme...
Article
Genome-wide protein-protein interaction (PPI) determination remains a significant unsolved problem in structural biology. The difficulty is twofold since high-throughput experiments (HTEs) often have a relatively high false-positive rate in assigning PPIs, and PPI quaternary structures are more difficult to solve than tertiary structures using trad...
Preprint
Full-text available
Although mmCIF is the current official format for deposition of protein and nucleic acid structures to the Protein Data Bank (PDB) database, the legacy PDB format is still the primary supported format for many structural bioinformatics tools. Therefore, reliable software to convert mmCIF structure files to PDB files is needed. Unfortunately, existi...
Article
Full-text available
Many distantly related structure pairs exhibit structural similarities that can only be fully captured by a non-sequential alignment program. We present US-align2, a unified protocol for both sequential and non-sequential alignment of proteins and nucleic acids. On manually curated reference alignments for protein structural pairs with non-sequenti...
Article
Full-text available
Structure comparison and alignment are of fundamental importance in structural biology studies. We developed the first universal platform, US-align, to uniformly align monomer and complex structures of different macromolecules—proteins, RNAs and DNAs. The pipeline is built on a uniform TM-score objective function coupled with a heuristic alignment...
Article
Most proteins in cells are composed of multiple folding units (or domains) to perform complex functions in a cooperative manner. Relative to the rapid progress in single-domain structure prediction, there are few effective tools available for multi-domain protein structure assembly, mainly due to the complexity of modeling multi-domain proteins, wh...
Preprint
Accurate identification of protein function is critical to elucidate life mechanisms and design new drugs. We proposed a novel deep-learning method, ATGO, to predict Gene Ontology (GO) attributes of proteins through a triplet neural-network architecture embedded with pre-trained self-attention transformer models. The method was systematically tested...
Article
Full-text available
The usefulness of live attenuated virus vaccines has been limited by suboptimal immunogenicity, safety concerns or cumbersome manufacturing processes and techniques. Here we describe the generation of a live attenuated influenza A virus vaccine using proteolysis-targeting chimeric (PROTAC) technology to degrade viral proteins via the endogenous ubi...
Article
Gene Ontology (GO) has been widely used to annotate functions of genes and gene products. We proposed a new method, TripletGO, to deduce GO terms of protein-coding and non-coding genes, through the integration of four complementary pipelines built on transcript expression profile, genetic sequence alignment, protein sequence alignment, and naïve pr...
Article
Full-text available
Ab initio protein structure prediction has been vastly boosted by the modeling of inter-residue contact/distance maps in recent years. We developed a new deep-learning model, DeepPotential, which accurately predicts the distribution of a complementary set of geometric descriptors including a novel hydrogen-bonding potential defined by C-alpha atom...
Article
Full-text available
CD13, an ectoenzyme on myeloid and stromal cells, also circulates as a shed, soluble protein (sCD13) with powerful chemoattractant, angiogenic and arthritogenic properties, which require engagement of a G protein-coupled receptor (GPCR). Here we identify the GPCR that mediates sCD13 arthritogenic actions as the bradykinin receptor B1 (B1R). Immunof...
Preprint
Full-text available
Structure comparison and alignment are of fundamental importance in structural biology studies. We developed the first universal platform, US-align, to uniformly align monomer and complex structures of different macromolecules (proteins, RNAs, and DNAs). The pipeline is built on a uniform TM-score objective function coupled with a heuristic alignme...
Article
Full-text available
Motivation: The full description of nucleic acid conformation involves eight torsion angles per nucleotide. To simplify this description, we previously developed a representation of the nucleic acid backbone that assigns each nucleotide a pair of pseudo-torsion angles (eta and theta defined by P and C4’ atoms; or eta’ and theta’ defined by P and C1’...
Article
Full-text available
RNA secondary-structure (rSS) assignment is one of the most routine forms of analysis of RNA 3D structures. However, traditional rSS assignment programs require full-atomic structures of the individual RNA nucleotides. This prevents their application to the modeling of RNA structures in which base atoms are missing. To address this issue, Coarse-gr...
Article
Full-text available
Progress in cryo-electron microscopy has provided the potential for large-size protein structure determination. However, the success rate for solving multi-domain proteins remains low because of the difficulty in modelling inter-domain orientations. Here we developed domain enhanced modeling using cryo-electron microscopy (DEMO-EM), an automatic me...
Article
Full-text available
The brain-expressed ubiquilins (UBQLNs) 1, 2 and 4 are a family of ubiquitin adaptor proteins that participate broadly in protein quality control (PQC) pathways, including the ubiquitin proteasome system (UPS). One family member, UBQLN2, has been implicated in numerous neurodegenerative diseases including ALS/FTD. UBQLN2 typically resides in the cy...
Article
Motivation: Accurate and efficient predictions of protein structures play an important role in understanding their functions. I-TASSER (Iterative Threading Assembly Refinement) is one of the most successful and widely used protein structure prediction methods in the recent community-wide CASP experiments. Yet, the computational efficiency of I-TASSE...
Preprint
Gene Ontology (GO) has been widely used to annotate functions of genes and gene products. We proposed a new method (TripletGO) to deduce GO terms of protein-coding and non-coding genes, through the integration of four complementary pipelines built on transcript expression profiling, genetic sequence alignment, protein sequence alignment and naive p...
Article
Full-text available
Sequence-based contact prediction has shown considerable promise in assisting non-homologous structure modeling, but it often requires many homologous sequences and a sufficient number of correct contacts to achieve correct folds. Here, we developed a method, C-QUARK, that integrates multiple deep-learning and coevolution-based contact-maps to guid...
Article
Full-text available
Aims: After myocardial infarction (MI), injured cardiomyocytes recruit neutrophils and monocytes/macrophages to myocardium, which in turn initiates inflammatory and reparative cascades, respectively. Either insufficient or excessive inflammation impairs cardiac healing. As an endogenous inhibitor of neutrophil adhesion, EDIL3 plays a crucial role...
Article
Background: Loeys-Dietz Syndrome (LDS) is an inherited disorder predisposing individuals to thoracic aortic aneurysm and dissection (TAAD). Currently, there are no medical treatments except surgical resection. Although the genetic basis of LDS is well-understood, molecular mechanisms underlying the disease remain elusive, impeding the development of...
Article
This article reports and analyzes the results of protein contact and distance prediction by our methods in the 14th Critical Assessment of techniques for protein Structure Prediction (CASP14). A new deep learning-based contact/distance predictor was employed based on the ensemble of two complementary coevolution features coupled with deep residual...
Article
In this article, we report 3D structure prediction results by two of our best server groups (“Zhang‐Server” and “QUARK”) in CASP14. These two servers were built based on the D‐I‐TASSER and D‐QUARK algorithms, which integrated four newly developed components into the classical protein folding pipelines, I‐TASSER and QUARK, respectively. The new comp...
Article
Full-text available
Structure prediction for proteins lacking homologous templates in the Protein Data Bank (PDB) remains a significant unsolved problem. We developed a protocol, C-I-TASSER, to integrate interresidue contact maps from deep neural-network learning with the cutting-edge I-TASSER fragment assembly simulations. Large-scale benchmark tests showed that C-I-...
Preprint
Full-text available
The brain expressed ubiquilins (UBQLNs) 1, 2 and 4 are a family of ubiquitin adaptor proteins that participate broadly in protein quality control (PQC) pathways, including the ubiquitin proteasome system (UPS). One family member, UBQLN2, has been implicated in numerous neurodegenerative diseases including ALS/FTD. UBQLN2 typically resides in the cy...
Article
Full-text available
The topology of protein folds can be specified by the inter-residue contact-maps and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural-networks. Compared to previous approaches, the major advan...
Article
Full-text available
Protein engineering is actively pursued in industrial and laboratory settings for high thermostability. Among the many protein engineering methods, rational design by bioinformatics provides theoretical guidance without time-consuming experimental screenings. However, most rational design methods either rely on protein tertiary structure informatio...
Preprint
Full-text available
Myoglobin is the major oxygen-carrying protein in vertebrate muscle. Previous studies identified a high myoglobin net charge in secondarily aquatic mammalian lineages, which serves to prevent aggregation at the extremely high intracellular myoglobin concentrations found in these species. However, it is unknown how aquatic birds that dive for extended...
Article
Numerous human diseases are caused by mutations in genomic sequences. Since amino acid changes affect protein function through mechanisms often predictable from protein structure, the integration of structural and sequence data enables us to estimate with greater accuracy whether and how a given mutation will lead to disease. Publicly available ann...
Article
When the JCVI-syn3.0 genome was designed and implemented in 2016 as the minimal genome of a free-living organism, approximately one-third of the 438 protein-coding genes had no known function. Subsequent refinement into JCVI-syn3A led to inclusion of 16 additional protein-coding genes, including several of unknown function, resulting in an improved g...
Article
Despite considerable research progress on SARS-CoV-2, the direct zoonotic origin (intermediate host) of the virus remains ambiguous. The most definitive approach to identify the intermediate host would be the detection of SARS-CoV-2-like coronaviruses in wild animals. However, due to the high number of animal species, it is not feasible to screen a...
Preprint
Full-text available
Genome-wide protein-protein interaction (PPI) determination (or interactome) remains a significant unsolved problem in structural biology. The difficulty is twofold since high-throughput experiments (HTEs) often have a high false-positive rate in assigning PPIs, and PPI quaternary structure is much more difficult to solve than tertiary structure using tra...
Preprint
Full-text available
Progress in cryo-electron microscopy (cryo-EM) has provided the potential for large-size protein structure determination. However, the solution rate for multi-domain proteins remains low due to the difficulty in modeling inter-domain orientations. We developed DEMO-EM, an automatic method to assemble multi-domain structures from cryo-EM maps throug...
Preprint
Full-text available
Prokaryotes and some unicellular eukaryotes routinely overcome evolutionary pressures with the help of horizontally acquired genes. In contrast, it is unusual for multicellular eukaryotes to adapt through horizontal gene transfer (HGT). Recent studies identified several cases of adaptive acquisition in the gut-dwelling multicellular fungal phylum N...
Preprint
Full-text available
The topology of protein folds can be specified by the inter-residue contact-maps and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural-networks. Compared to previous approaches, the major advan...
Article
Full-text available
Despite considerable research progress on SARS-CoV-2, the direct zoonotic origin (intermediate host) of the virus remains ambiguous. The most definitive approach to identify the intermediate host would be the detection of SARS-CoV-2-like coronaviruses in wild animals. However, due to the high number of animal species, it is not feasible to screen a...
Article
Full-text available
Motivation: Many protein function databases are built on automated or semi-automated curations and can contain various annotation errors. The correction of such misannotations is critical to improving the accuracy and reliability of the databases. Results: We proposed a new approach to detect potentially incorrect Gene Ontology (GO) annotations...
Article
As the infection of 2019-nCoV coronavirus is quickly developing into a global pneumonia epidemic, careful analysis of its transmission and cellular mechanisms is sorely needed. In this report, we first analyzed two recent studies which concluded that snakes are the intermediate hosts of 2019-nCoV and that the 2019-nCoV spike protein insertions shar...
Preprint
Full-text available
As the infection of 2019-nCoV coronavirus is quickly developing into a global pneumonia epidemic, careful analysis of its transmission and cellular mechanisms is sorely needed. In this report, we re-analyzed the computational approaches and findings presented in two recent manuscripts by Ji et al. (https://doi.org/10.1002/jmv.25682) and by Pradhan...
Article
Full-text available
Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of...
Article
Motivation: The success of genome sequencing techniques has resulted in a rapid explosion of protein sequences. Collections of multiple homologous sequences can provide critical information to the modeling of structure and function of unknown proteins. There is, however, no standard and efficient pipeline available for sensitive multiple sequence ali...
Article
Full-text available
Introduction: The ocean microbiome represents one of the largest microbiomes and produces nearly half of the primary energy on the planet through photosynthesis or chemosynthesis. Using recent advances in marine genomics, we explore new applications of oceanic metagenomes for protein structure and function prediction. Results: By processing 1.3...
Article
Full-text available
Accurate prediction of atomic-level protein structure is important for annotating the biological functions of protein molecules and for designing new compounds to regulate the functions. Template-based modeling (TBM), which aims to construct structural models by copying and refining the structural frameworks of other known proteins, remains the mos...
Article
Full-text available
In 2018, we reported a hybrid pipeline that predicts protein structures with I-TASSER and function with COFACTOR. I-TASSER/COFACTOR achieved high Gene Ontology (GO) prediction accuracies of Fmax=0.69 and 0.57 for molecular function (MF) and biological process (BP), respectively, on 100 comprehensively-annotated proteins. Now we report blinded analy...
Article
We report the results of residue‐residue contact prediction of a new pipeline built purely on the learning of coevolutionary features in the CASP13 experiment. For a query sequence, the pipeline starts with the collection of multiple sequence alignments (MSAs) from multiple genome and metagenome sequence databases using two complementary HMM‐based...
Article
Significance: More than 80% of eukaryotic proteins and 67% of prokaryotic proteins contain multiple domains. Due to the technical difficulties in structural biology, however, 65.3% of solved proteins in the Protein Data Bank contain only single-domain structures. Similarly, most computational approaches are optimized for single-domain structure pred...
Article
Full-text available
The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Here we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed...
Article
We report the results of two fully‐automated structure prediction pipelines, “Zhang‐Server” and “QUARK”, in CASP13. The pipelines were built upon the C‐I‐TASSER and C‐QUARK programs, which in turn are based on I‐TASSER and QUARK but with three new modules: (1) a novel multiple sequence alignment (MSA) generation protocol to construct deep sequence‐...
Article
Full-text available
The LOMETS2 server (https://zhanglab.ccmb.med.umich.edu/LOMETS/) is an online meta-threading server system for template-based protein structure prediction. Although the server has been widely used by the community over the last decade, the previous LOMETS server no longer represents the state-of-the-art due to aging of the algorithms and unsatisfac...
Article
Motivation: The contact-map of a protein sequence dictates the global topology of the structural fold. Accurate prediction of the contact-map is thus essential to protein 3D structure prediction, which is particularly useful for the protein sequences that do not have close homology templates in the PDB. Results: We developed a new method, ResPRE, to pre...
Article
After myocardial infarction (MI), injured cardiomyocytes recruit neutrophils, leading to extravasated monocytes polarized to inflammatory and reparative macrophages sequentially, which contribute to the healing and regenerative process. As an endogenous leukocyte-endothelial adhesion inhibitor, the role of EDIL3 in MI remains obscure. We found that...
Article
Motivation: Comparison of RNA 3D structures can be used to infer functional relationships among RNA molecules. Most of the current RNA structure alignment programs are built on size-dependent scales, which complicate the interpretation of structure and functional relations. Meanwhile, the low speed prevents the programs from being applied to large-sca...
Article
There is an increasing gap between the number of known protein sequences and the number of proteins with experimentally characterized structure and function. To alleviate this issue, we have developed the I-TASSER gateway, an online server for automated and reliable protein structure and function prediction. For a given sequence, I-TASSER starts wi...
Article
Full-text available
G protein-coupled receptors (GPCRs) constitute the key component of cellular signal transduction. Accurately annotating the biological functions of GPCR proteins is vital to the understanding of the physiological processes they are involved in. With the rapid development of text mining technologies and the exponential growth of biomedical literature, it...
Article
Understanding the function of human proteins is essential to decipher the molecular mechanisms of human diseases and phenotypes. Of the 17,470 human protein-coding genes in the neXtProt 2018-01-17 database with unequivocal protein existence evidence (PE1), 1,260 proteins do not have characterized functions. To reveal the function of poorly annotated huma...
Article
Homology-based transferal remains the major approach to computational protein function annotations, but it becomes increasingly unreliable when the sequence identity between query and template decreases below 30%. We propose a novel pipeline, MetaGO, to deduce Gene Ontology attributes of proteins by combining sequence homology-based annotation with...
Article
Full-text available
α-Helical transmembrane proteins are a ubiquitous and important class of proteins, but present difficulties for crystallographic structure solution. Here, the effectiveness of the AMPLE molecular replacement pipeline in solving α-helical transmembrane-protein structures is assessed using a small library of eight ideal helices, as well as search mod...
Article
Full-text available
We develop two complementary pipelines, “Zhang-Server” and “QUARK”, based on I-TASSER and QUARK pipelines for template-based modeling (TBM) and free modeling (FM), and test them in the CASP12 experiment. The combination of I-TASSER and QUARK successfully folds three medium-size FM targets that have more than 150 residues, even though the interplay...
Article
The COFACTOR web server is a unified platform for structure-based multiple-level protein function predictions. By structurally threading low-resolution structural models through the BioLiP library, the COFACTOR server infers three categories of protein functions including gene ontology, enzyme commission and ligand-binding sites from various analog...
Article
Full-text available
Understanding how gene-level mutations affect the binding affinity of protein–protein interactions is a key issue of protein engineering. Due to the complexity of the problem, using a physical force field to predict the mutation-induced binding free-energy change remains challenging. In this work, we present a renewed approach to calculate the impact...
Article
Full-text available
Motivation: The recently released PyMod GUI integrates many of the individual steps required for protein sequence-structure analysis and homology modeling within the interactive visualization capabilities of PyMOL. Here we describe the improvements introduced into version 2.0 of PyMod. Results: The original code of PyMod has been completely...
