Yaohang Li
Old Dominion University | ODU · Department of Computer Science

Ph.D., Florida State University

About

206
Publications
58,984
Reads
4,965
Citations
Additional affiliations
July 2010 - present
Old Dominion University
Position
  • Professor (Associate)

Publications (206)
Article
Full-text available
The convergence of Markov chain-based Monte Carlo linear solvers using the Ulam-von Neumann algorithm for a linear system of the form x = Hx + b is investigated in this paper. We analyze the convergence of the Monte Carlo solver based on the original Ulam-von Neumann algorithm under the conditions that ||H|| < 1 as well as ρ(H) < 1, where ρ(H) is...
Article
We report a new approach of using statistical context-based scores as encoded features to train neural networks to achieve secondary structure prediction accuracy improvement. The context-based scores are pseudo-potentials derived by evaluating statistical, high-order inter-residue interactions, which estimate the favorability of a residue adopting...
Article
The rapidly increasing number of protein crystal structures available in PDB has naturally made statistical analyses feasible in studying complex high-order inter-residue correlations. In this paper, we report a Context-based Secondary Structure Potential (CSSP) for assessing the quality of predicted protein secondary structures generated by variou...
Article
Accurately predicting loop structures is important for understanding functions of many proteins. In order to obtain loop models with high accuracy, efficiently sampling the loop conformation space to discover reasonable structures is a critical step. In loop conformation sampling, coarse-grain energy (scoring) functions coupled with reduced protei...
Article
The relative distance and orientation in contacting residue pairs play a significant role in protein folding and stabilization. We hereby devise a new knowledge-based, coarse-grained contact potential, so-called ICOSA, by correlating inter-residue contact distance and orientation in evaluating pair-wise inter-residue interactions. The rationale of...
Preprint
Full-text available
Recent advances in cellular research demonstrate that scRNA-seq characterizes cellular heterogeneity, while spatial transcriptomics reveals the spatial distribution of gene expression. Cell representation is the fundamental issue in the two fields. Here, we propose Topology-encoded Latent Hyperbolic Geometry (TopoLa), a computational framework enha...
Preprint
A likelihood analysis of the observables in deeply virtual exclusive photoproduction off a proton target, $ep \rightarrow e' p' \gamma'$, is presented. Two processes contribute to the reaction: deeply virtual Compton scattering, where the photon is produced at the proton vertex, and the Bethe-Heitler process, where the photon is radiated from the...
Article
Full-text available
Single-cell RNA sequencing (scRNA-seq) technologies have become essential tools for characterizing cellular landscapes within complex tissues. Large-scale single-cell transcriptomics holds great potential for identifying rare cell types critical to the pathogenesis of diseases and biological processes. Existing methods for identifying rare cell typ...
Article
We extend the Variational Autoencoder Inverse Mapper (VAIM) framework for the inverse problem of extracting Compton Form Factors (CFFs) from deeply virtual exclusive reactions, such as the unpolarized deeply virtual Compton scattering (DVCS) cross section. VAIM is an end-to-end deep learning framework to address the solution ambiguity issue in il...
Preprint
Deeply virtual exclusive scattering processes (DVES) serve as precise probes of nucleon quark and gluon distributions in coordinate space. These distributions are derived from generalized parton distributions (GPDs) via Fourier transform relative to proton momentum transfer. QCD factorization theorems enable DVES to be parameterized by Compton form...
Preprint
An overview of the recent activity of the newly funded EXCLusives with AI and Machine learning (EXCLAIM) collaboration is presented. The main goal of the collaboration is to develop a framework to implement AI and machine learning techniques in problems emerging from the phenomenology of high energy exclusive scattering processes from nucleons and...
Article
Full-text available
Recent advancements in genome assembly have greatly improved the prospects for comprehensive annotation of Transposable Elements (TEs). However, existing methods for TE annotation using genome assemblies suffer from limited accuracy and robustness, requiring extensive manual editing. In addition, the currently available gold-standard TE databases a...
Article
Motivation: Identifying cancer genes remains a significant challenge in cancer genomics research. Annotated gene sets encode functional associations among multiple genes, and cancer genes have been shown to cluster in hallmark signaling pathways and biological processes. The knowledge of annotated gene sets is critical for discovering cancer genes b...
Preprint
Full-text available
Complex networks, which are the abstractions of many real-world systems, present a persistent challenge across disciplines for people to decipher their underlying information. Recently, hyperbolic geometry of latent spaces has gained traction in network analysis, due to its ability to preserve certain local intrinsic properties of the nodes. In thi...
Article
Full-text available
Human leukocyte antigen (HLA) recognizes foreign threats and triggers immune responses by presenting peptides to T cells. Computationally modeling the binding patterns between peptide and HLA is very important for the development of tumor vaccines. However, it is still a big challenge to accurately predict HLA molecules binding peptides. In this pa...
Preprint
Full-text available
Single-cell RNA sequencing (scRNA-seq) technologies have been widely used to characterize cellular landscapes in complex tissues. Large-scale single-cell transcriptomics holds great potential for identifying rare cell types critical to the pathogenesis of diseases and biological processes. Existing methods for identifying rare cell types often rely...
Article
Background: Computational molecular docking plays an important role in determining the precise receptor-ligand conformation, which becomes a powerful tool for drug discovery. In the past 30 years, most computational docking methods have treated the receptor structure as a rigid body, although flexible docking often yields higher accuracy. The main disadvan...
Preprint
Full-text available
Motivation: Identifying cancer genes remains a significant challenge in cancer genomics research. Annotated gene sets encode functional associations among multiple genes, and cancer genes have been shown to cluster in hallmark signaling pathways and biological processes. The knowledge of annotated gene sets is critical for discovering cancer genes b...
Article
Full-text available
AI-supported algorithms, particularly generative models, have been successfully used in a variety of different contexts. This work employs a generative modeling approach to unfold detector effects specifically tailored for exclusive reactions that involve multiparticle final states. Our study demonstrates the preservation of correlations between ki...
Article
Full-text available
Adverse Drug Reactions (ADRs) have a direct impact on human health. As continuous pharmacovigilance and drug monitoring prove to be costly and time-consuming, computational methods have emerged as promising alternatives. However, most existing computational methods primarily focus on predicting whether or not the drug is associated with an adverse...
Article
Full-text available
Motivation: Cancer heterogeneity drastically affects cancer therapeutic outcomes. Predicting drug response in vitro is expected to help formulate personalized therapy regimens. In recent years, several computational models based on machine learning and deep learning have been proposed to predict drug response in vitro. However, most of these metho...
Preprint
AI-supported algorithms, particularly generative models, have been successfully used in a variety of different contexts. In this work, we demonstrate for the first time that generative adversarial networks (GANs) can be used in high-energy experimental physics to unfold detector effects from multi-particle final states, while preserving correlation...
Article
Full-text available
Motivation: Single-cell RNA sequencing (scRNA-seq) offers a powerful tool to dissect the complexity of biological tissues through cell sub-population identification in combination with clustering approaches. Feature selection is a critical step for improving the accuracy and interpretability of single-cell clustering. Existing feature selection me...
Article
Automated ICD coding is a multi-label prediction task that aims to assign patient diagnoses the most relevant subsets of disease codes. In the deep learning regime, recent works have suffered from large label sets and heavily imbalanced label distributions. To mitigate the negative effect in such scenarios, we propose a retrieve and rerank framework that...
Article
Full-text available
Motivation: Hi-C technology has been the most widely used chromosome conformation capture (3C) experiment that measures the frequency of all paired interactions in the entire genome, which is a powerful tool for studying the 3D structure of the genome. The fineness of the constructed genome structure depends on the resolution of Hi-C data. However,...
Article
Circular RNAs (circRNAs) are reverse-spliced and covalently closed RNAs. Their interactions with RNA-binding proteins (RBPs) have multiple effects on the progress of many diseases. Some computational methods are proposed to identify RBP binding sites on circRNAs but suffer from insufficient accuracy, robustness and explanation. In this study, we fi...
Article
Drug discovery and drug repurposing often rely on the successful prediction of drug-target interactions (DTIs). Recent advances have shown great promise in applying deep learning to drug-target interaction prediction. One challenge in building deep learning-based models is to adequately represent drugs and proteins that encompass the fundamental lo...
Article
Full-text available
We present a new machine learning-based Monte Carlo event generator using generative adversarial networks (GANs) that can be trained with calibrated detector simulations to construct a vertex-level event generator free of theoretical assumptions about femtometer scale physics. Our framework includes a GAN-based detector folding as a fast-surrogate...
Preprint
Human leukocyte antigen (HLA) is an important molecule family in the field of human immunity, which recognizes foreign threats and triggers immune responses by presenting peptides to T cells. In recent years, the synthesis of tumor vaccines to induce specific immune responses has become the forefront of cancer treatment. Computationally modeling th...
Article
Full-text available
Automatic International Classification of Diseases (ICD) coding is defined as a kind of text multi-label classification problem, which is difficult because the number of labels is very large and the distribution of labels is unbalanced. The label-wise attention mechanism is widely used in automatic ICD coding because it can assign weights to every...
Preprint
We develop a framework to establish benchmarks for machine learning and deep neural networks analyses of exclusive scattering cross sections (FemtoNet). Within this framework we present an extraction of Compton form factors for deeply virtual Compton scattering from an unpolarized proton target. Critical to this effort is a study of the effects of...
Article
Motivation: Identifying drug-target interactions is a crucial step for drug discovery and design. Traditional biochemical experiments can credibly and accurately validate drug-target interactions. However, they are also extremely laborious, time-consuming, and expensive. With the collection of more validated biomedical data and the advancement of c...
Article
The understanding of protein functions is critical to many biological problems such as the development of new drugs and new crops. To reduce the huge gap between the increase of protein sequences and annotations of protein functions, many methods have been proposed to deal with this problem. These methods use Gene Ontology (GO) to classify the func...
Article
The identification of drug–target relations (DTRs) is substantial in drug development. A large number of methods treat DTRs as drug-target interactions (DTIs), a binary classification problem. The main drawbacks of these methods are the lack of reliable negative samples and the absence of many important aspects of DTR, including their dose dependenc...
Article
Topologically associating domains (TADs) are local chromatin interaction domains, which have been shown to play an important role in gene expression regulation. TADs were originally discovered in the investigation of 3D genome organization based on High-throughput Chromosome Conformation Capture (Hi-C) data. Continuous considerable efforts have bee...
Article
Motivation: The identification of compound-protein interactions (CPIs) is an essential step in the process of drug discovery. The experimental determination of CPIs is known to consume large amounts of funding and time. Computational models have therefore become a promising and efficient alternative for predicting novel interactions between compou...
Chapter
The recently proposed generative adversarial network (GAN)-based event generator, the Feature Augmented and Transformed GAN (FAT-GAN), has shown an impressive capability of reproducing inclusive electron–proton scattering events at given collision energy. In contrast, many practical applications require the event generator to have the flexibility o...
Article
Full-text available
Essential proteins are considered the foundation of life as they are indispensable for the survival of living organisms. Computational methods for essential protein discovery provide a fast way to identify essential proteins. But most of them heavily rely on various biological information, especially protein-protein interaction networks, which limi...
Conference Paper
Full-text available
We apply generative adversarial network (GAN) technology to build an event generator that simulates particle production in electron-proton scattering that is free of theoretical assumptions about underlying particle dynamics. The difficulty of efficiently training a GAN event simulator lies in learning the complicated patterns of the distributions...
Conference Paper
Full-text available
Event generators in high-energy nuclear and particle physics play an important role in facilitating studies of particle reactions. We survey the state of the art of machine learning (ML) efforts at building physics event generators. We review ML generative models used in ML-based event generators and their specific challenges, and discuss various a...
Article
Identifying the frequencies of the drug-side effects is a very important issue in pharmacological studies and drug risk-benefit. However, designing clinical trials to determine the frequencies is usually time consuming and expensive, and most existing methods can only predict the drug-side effect existence or associations, not their frequencies. In...
Article
The ATC (Anatomical Therapeutic Chemical) code of a drug is a classification system designated by the World Health Organization Collaborating Center for Drug Statistics Methodology. Correctly identifying the potential ATC codes for drugs can accelerate drug development and reduce the cost of experiments. Several classifiers have been proposed in th...
Preprint
Full-text available
Event generators in high-energy nuclear and particle physics play an important role in facilitating studies of particle reactions. We survey the state-of-the-art of machine learning (ML) efforts at building physics event generators. We review ML generative models used in ML-based event generators and their specific challenges, and discuss various a...
Article
Full-text available
Biomolecular recognition between ligand and protein plays an essential role in drug discovery and development. However, it is extremely time and resource consuming to determine the protein–ligand binding affinity by experiments. At present, many computational methods have been proposed to predict binding affinity, most of which usually require prot...
Article
The rapid increase of genome data brought by gene sequencing technologies poses a massive challenge to data processing. To solve the problems caused by enormous data and complex computing requirements, researchers have proposed many methods and tools which can be divided into three types: big data storage, efficient algorithm design and parallel co...
Article
The identification of compound-protein relations (CPRs), which include compound-protein interactions (CPIs) and compound-protein affinities (CPAs), is critical to drug development. A common method for compound-protein relation identification is the use of in vitro screening experiments. However, the number of compounds and proteins is massive, a...
Article
Motivation: The Anatomical Therapeutic Chemical (ATC) system is an official classification system established by the World Health Organization for medicines. Correctly assigning ATC classes to given compounds is an important research problem in drug discovery, which can not only discover the possible active ingredients of the compounds, but also inf...
Conference Paper
Full-text available
Correctly identifying the potential Anatomical Therapeutic Chemical (ATC) codes for drugs can accelerate drug development and reduce the cost of experiments. However, most of the existing methods only analyze the first-level ATC code of drugs and lack the ability to learn basic features from sparsely known drug-ATC code associations. In this pap...
Article
In pharmaceutical sciences, a crucial step of the drug discovery is the identification of drug-target interactions (DTIs). However, only a small portion of the DTIs have been experimentally validated. Moreover, it is an extremely laborious, expensive, and time-consuming procedure to capture new interactions between drugs and targets through traditi...
Article
Full-text available
In the post-genomic era, proteomics has achieved significant theoretical and practical advances with the development of high-throughput technologies. Especially the rapid accumulation of protein-protein interactions (PPIs) provides a foundation for constructing protein interaction networks (PINs), which can furnish a new perspective for understandi...
Article
Motivation: Determining the structures of proteins is a critical step in understanding their biological functions. The crystallography-based X-ray diffraction technique is the main method for experimental protein structure determination. However, the underlying crystallization process, which needs multiple time-consuming and costly experimental steps, has...
Article
Full-text available
Background: One of the most essential problems in structural bioinformatics is protein fold recognition. In this paper, we design a novel deep learning architecture, so-called DeepFrag-k, which identifies fold discriminative features at fragment level to improve the accuracy of protein fold recognition. DeepFrag-k is composed of two stages: the firs...
Article
Full-text available
With the development of high-throughput technology and the accumulation of biomedical data, the prior information of biological entities can be calculated from different aspects. Specifically, drug-drug similarities can be measured from target profiles, drug-drug interactions and side effects. Similarly, different methods and data sources to calculate...
Chapter
The identification of drug-target interactions plays a crucial role in drug discovery and design. However, capturing interactions between drugs and targets via traditional biochemical experiments is an extremely laborious, expensive and time-consuming procedure. Therefore, the use of computational methods for predicting potential interactions to gu...
Preprint
Full-text available
We present a new strategy using artificial intelligence (AI) to build the first AI-based Monte Carlo event generator (MCEG) capable of faithfully generating final state particle phase space in lepton-hadron scattering. We show a blueprint for integrating machine learning strategies with calibrated detector simulations to build a vertex-level, AI-ba...
Article
Matrix completion, whose goal is to recover a matrix from a few observed entries, is a fundamental model behind many applications. Our study shows that, in many applications, the to-be-completed matrix can be represented as the sum of a low-rank matrix and a sparse matrix associated with side information matrices. The low-rank matrix depicts the gl...
Article
Full-text available
The rapid development of proteomics and high-throughput technologies has produced a large amount of Protein-Protein Interaction (PPI) data, which makes it possible to consider dynamic properties of protein interaction networks (PINs) instead of static properties. Identification of protein complexes from dynamic PINs becomes a vital scientific p...
Article
Full-text available
In recent years, accumulating studies have shown that long non-coding RNAs (lncRNAs) not only play an important role in the regulation of various biological processes but also are the foundation for understanding mechanisms of human diseases. Due to the high cost of traditional biological experiments, the number of experimentally verified lncRNA-di...
Article
Full-text available
A growing amount of evidence suggests that long non-coding RNAs (lncRNAs) play important roles in the regulation of biological processes in many human diseases. However, the number of experimentally verified lncRNA-disease associations is very limited. Thus, various computational approaches are proposed to predict lncRNA-disease associations. Curre...
Article
Full-text available
Drug repositioning can drastically decrease the cost and duration taken by traditional drug research and development while avoiding the occurrence of unforeseen adverse events. With the rapid advancement of high-throughput technologies and the explosion of various biological data and medical data, computational drug repositioning methods have been...
Preprint
Full-text available
We apply generative adversarial network (GAN) technology to build an event generator that simulates particle production in electron-proton scattering that is free of theoretical assumptions about underlying particle dynamics. The difficulty of efficiently training a GAN event simulator lies in learning the complicated patterns of the distributions...
Article
Full-text available
Knowledge of protein functions plays an important role in biology and medicine. With the rapid development of high-throughput technologies, a huge number of proteins have been discovered. However, there are a great number of proteins without functional annotations. A protein usually has multiple functions and some functions or biological processes r...
Article
Full-text available
Identification of potential drug–associated indications is critical for either approved or novel drugs in drug repositioning. Current computational methods based on drug similarity and disease similarity have been developed to predict drug–disease associations. When more reliable drug- or disease-related information becomes available and is integra...
Article
Recently, increasing evidence reveals that dysregulations of long non-coding RNAs (lncRNAs) are relevant to diverse diseases. However, the number of experimentally verified lncRNA-disease associations is limited. Prioritizing potential associations is beneficial not only for disease diagnosis, but also disease treatment, more important apprehending...
Article
Full-text available
Background: Essential proteins are crucial for cellular life and thus, identification of essential proteins is an important topic and a challenging problem for researchers. Recently lots of computational approaches have been proposed to handle this problem. However, traditional centrality methods cannot fully represent the topological features of...
Article
Full-text available
The explosion of digital healthcare data has led to a surge of data-driven medical research based on machine learning. In recent years, as a powerful technique for big data, deep learning has gained a central position in machine learning circles for its great advantages in feature representation and pattern recognition. This article presents a comp...
Article
Full-text available
Understanding and computationally predicting the protein folding process remains one of the most challenging scientific problems and has uniquely garnered the interdisciplinary efforts of researchers from the biological, chemical, physical and computational disciplines. Previous studies have demonstrated the importance of long-range interactio...
Article
Full-text available
Motivation: Protein-protein interactions (PPIs) play important roles in many biological processes. Conventional biological experiments for identifying PPI sites are costly and time-consuming. Thus, many computational approaches have been proposed to predict PPI sites. Existing computational methods usually use local contextual features to predict...