Yusen Zhang

Shandong University, Weihai, China · School of Mathematics and Statistics

Ph.D.


Publications: 91
Reads: 15,401
Citations: 1,180
Citations since 2016: 43 research items, 793 citations
Introduction
Yusen Zhang received his Ph.D. in computational mathematics from Dalian University of Technology, China. From 2004 to 2006, he was a postdoctoral fellow in Bioinformatics and Computer Applications at the Graduate University of the Chinese Academy of Sciences. He is a visiting scholar in the Department of Computer Science and Engineering at Washington University in St. Louis. His research focuses on algorithms in computational biology, bioinformatics, and statistical genetics.
Additional affiliations
December 2012 - December 2013
Washington University in St. Louis
Position
  • DNA fragments assembly
January 2006 - December 2014
Shandong University
Position
  • Shandong University (Weihai)
January 2003 - present
Dalian University of Technology
Education
February 2004 - February 2006
Chinese Academy of Sciences
Field of study
  • Bioinformatics and Computer Applications
September 2000 - July 2003
Dalian University of Technology
Field of study
  • Computational Mathematics

Publications (91)
Article
Full-text available
Background: Recent advances in next-generation sequencing technologies have helped investigators generate massive amounts of cancer genomic data. A critical challenge in cancer genomics is the identification of a few cancer driver genes whose mutations cause tumor growth. However, the majority of existing computational approaches underuse the co-occurre...
Article
Microcystis is a genus of cyanobacteria that is widely distributed across the world. It has attracted great attention because it produces the hepatotoxin microcystin (MC), which can inhibit eukaryotic protein phosphatases and pose a great risk to animal and human health. Due to the high diversity of morphospecies and genomes, it is still difficult to classify...
Article
Full-text available
Extensive clinical and biomedical studies have shown that the microbiome plays a prominent role in human health. Identifying potential microbe–disease associations (MDAs) can help reveal the pathological mechanisms of human diseases and support their prevention, diagnosis, and treatment. Therefore, it is necessary to develop effect...
Preprint
Full-text available
Background: Recent advances in next-generation sequencing technologies have helped investigators generate massive amounts of cancer genomic data. A critical challenge in cancer genomics is the identification of a few driver mutation genes from a much larger number of passenger mutation genes. However, the majority of existing computational approaches under...
Article
Background: Essential proteins play an important role in life processes and can be identified by experimental methods and computational approaches. Experimental approaches to identifying essential proteins are highly accurate but time- and resource-consuming. Objective: Herein, we present a computational model (PEPRF) to...
Article
Full-text available
Campylobacter jejuni is a leading cause of bacterial gastroenteritis in humans around the world. The emergence of bacterial resistance is becoming more serious; therefore, the development of new vaccines is considered an alternative strategy against drug-resistant pathogens. In this study, we investigated the pangenome of 173 C. jejuni strains and...
Article
Full-text available
lncRNA affects the expression of nearby protein-coding genes and interfaces with related RNA binding proteins to exert functions. It is necessary to develop new computational models, which can reduce the cost and time of the biological experiments and select the most promising lncRNA-protein pairs for experimental validation. In this work, we propo...
Article
Full-text available
Background: Microbes are closely related to human health and diseases. Identification of disease-related microbes is of great significance for revealing the pathological mechanism of human diseases and understanding the interaction mechanisms between microbes and humans, which is also useful for the prevention, diagnosis and treatment of human disea...
Article
Protein-protein interaction (PPI) not only plays a critical role in cell life activities, but is also important for discovering the mechanisms of biological activity, protein function, and disease states. Developing computational methods is of great significance for PPI prediction since experimental methods are time-consuming and laboriou...
Article
Lysine malonylation is one of the important post-translational modifications (PTMs) of proteins. Malonylated proteins can affect various cell functions of eukaryotes and prokaryotes and play an important role in metabolic pathways, mitochondrial functions, fatty acid oxidation and other life activities. However, accurate identification of the malo...
Article
Full-text available
Studies have shown that microRNAs (miRNAs) are closely associated with many human diseases, but we have not yet fully understood the role and potential molecular mechanisms of miRNAs in the process of disease development. Ordinary biological experiments are often costly, whereas computational methods can be used to quickly and effect...
Article
Full-text available
Background: The small number of samples and the curse of dimensionality hamper the better application of deep learning techniques for disease classification. Additionally, the performance of clustering-based feature selection algorithms is still far from being satisfactory due to their limitation in using unsupervised learning methods. To enhance...
Article
Cancer is a highly heterogeneous disease caused by dysregulation in different cell types and tissues. However, different cancers may share common mechanisms. It is critical to identify decisive genes involved in the development and progression of cancer, and joint analysis of multiple cancers may help to discover overlapping mechanisms among differ...
Preprint
Full-text available
Proteins play a significant part in life processes such as cell growth, development, and reproduction. Exploring protein subcellular localization (SCL) is a direct way to better understand the function of proteins in cells. Studies have found that more and more proteins belong to multiple subcellular locations, and these proteins are called multi-l...
Article
Full-text available
Identification of protein-protein interactions (PPIs) plays an essential role in the understanding of protein functions and cellular biological activities. However, the traditional experiment-based methods are time-consuming and laborious. Therefore, developing new reliable computational approaches has great practical significance for the identific...
Article
Full-text available
The task of predicting protein–protein interactions (PPIs) has been essential in the context of understanding biological processes. This paper proposes a novel computational model namely FCTP-WSRC to predict PPIs effectively. Initially, combinations of the F-vector, composition (C) and transition (T) are used to map each protein sequence onto numer...
Article
Full-text available
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translate...
Article
Full-text available
Background: In the search for therapeutic peptides for disease treatments, many efforts have been made to identify various functional peptides from large numbers of peptide sequence databases. In this paper, we propose an effective computational model that uses deep learning and word2vec to predict therapeutic peptides (PTPD). Results: Represent...
Article
Full-text available
Microbes are vital to human health. Identifying microbe-disease associations helps promote the diagnosis and treatment of human disease, as well as drug development. However, knowledge in this area still needs to be further improved. In this paper, a new computational model using matrix completion to predict human microbe-disease associations (...
Article
Full-text available
We generalize chaos game representation (CGR) to higher dimensional spaces while maintaining its bijection, keeping the method sufficiently representative and mathematically rigorous compared to previous attempts. We first state and prove the asymptotic property of CGR and our generalized chaos game representation (GCGR) method. The prediction foll...
Presentation
Full-text available
Our new R package at https://cran.r-project.org/package=qkerntool/ Abstract: Nonlinear machine learning tool for classification, clustering, dimensionality reduction, and visualization. This tool provides new insight into kernel methods through conditionally negative definite (CND) kernels, enabling us to use CND kernels in a much larger class of kernel...
Article
Full-text available
Background: Using knowledge-based interpretation to analyze omics data can not only yield essential information regarding various biological processes, but also reflect the current physiological status of cells and tissue. The major challenge in analyzing gene expression data, with a large number of genes and small samples, is to extract disease-rela...
Article
Full-text available
Background: Apoptosis is associated with some human diseases, including cancer, autoimmune disease, neurodegenerative disease and ischemic damage. Information on the subcellular localization of apoptosis proteins is very important for understanding the mechanism of programmed cell death and for the development of drugs. Therefore, the prediction of subcellul...
Article
Full-text available
Microorganisms residing in the human body play a vital role in metabolism, immune defense, nutrition absorption, cancer control and protection against pathogen colonization. Changes in microbial communities can cause human diseases. Based on known microbe-disease associations, we presented a novel computational model employing Random Walking with...
Article
Advances in sequencing technologies have led to a rapid increase in the number and diversity of biological sequences, which has facilitated progress in sequence research. In this paper, we present a new method for analyzing protein sequence similarity. We calculated the spectral radii of 20 amino acids (AAs) and put forward a novel 2-D graphical repres...
Article
As therapeutic peptides have been taken into consideration in disease therapy in recent years, many biologists have spent time and labor verifying various functional peptides from a large number of peptide sequences. In order to reduce the workload and increase the efficiency of identification of functional proteins, we propose a sequence-based model, q-...
Article
Full-text available
Analysis of drug–target interactions (DTIs) is of great importance in developing new drug candidates for known protein targets or discovering new targets for old drugs. However, the experimental approaches for identifying DTIs are expensive, laborious and challenging. In this study, we report a novel computational method for predicting DTIs using t...
Article
As therapeutic peptides have been taken into consideration in disease therapy in recent years, many biologists spent time and labor to verify various functional peptides from a large number of peptide sequences. In order to reduce the workload and increase the efficiency of identification of functional proteins, we propose a sequence-based model, q...
Article
Prediction of protein structural class plays an important role in protein structure and function analysis, drug design and many other biological applications. Prediction of protein structural class for low-similarity sequences is still a challenging task. Based on the theory of wavelet denoising, this paper presents a novel method of prediction of...
Article
Full-text available
Scientific Reports 7 : Article number: 46237 10.1038/srep46237 ; published online: 10 April 2017 ; updated: 04 May 2017 In the original version of this Article, Yusen Zhang was incorrectly affiliated with ‘Faculty of Science, University of Kragujevac, P.
Article
Full-text available
We develop a novel position-feature-based model for protein sequences by employing physicochemical properties of 20 amino acids and the measure of graph energy. The method puts the emphasis on sequence order information and describes local dynamic distributions of sequences, from which one can get a characteristic B-vector. Afterwards, we apply the...
Article
Full-text available
This paper proposes a new classification method for microarray data which utilizes K-means clustering combined with a modified signal-to-noise ratio based on graph energy (SNRGE) method. This method is employed to select a small subset of characteristic features from DNA microarray data. Compared with the signal-to-noise ratio (SNR) method proposed by Golu...
Article
Full-text available
In this contribution we introduced a novel graphical method to compare protein sequences. By mapping a protein sequence into 3D space based on codons and physicochemical properties of 20 amino acids, we are able to get a unique P-vector from the 3D curve. This approach is consistent with wobble theory of amino acids. We compute distance between seq...
Article
Gene selection is important for cancer classification based on gene expression data, because of high dimensionality and small sample size. In this paper, we present a new gene selection method based on clustering, in which dissimilarity measures are obtained through kernel functions. It searches for best weights of genes iteratively at the same tim...
Article
Full-text available
In this paper, we propose the graph energy of 20 amino acids and the 2D graphical representation of protein sequences based on six physicochemical properties of 20 amino acids and the relationship between them. Moreover, we could get a specific vector from the graphical curve of a protein sequence, and use this vector to calculate the distance betw...
Article
Full-text available
In this paper, we propose the graph energy and Laplacian energy of 20 amino acids based on the codons coding for the amino acids and apply them to put forward a novel 2-D graphical representation of proteins. The novel graphical representation has no circuit or degeneracy, uniquely represents proteins and allows one to easily and quickly visually obser...
Article
Full-text available
As the fundamental unit of eukaryotic chromatin structure, the nucleosome plays critical roles in gene expression and regulation by controlling physical access to transcription factors. In this paper, based on the geometrically transformed Tsallis entropy and two index-vectors, a valid nucleosome positioning information model is developed to describe t...
Article
Tyrosine sulfation is a post-translational modification widely distributed in eukaryotic proteins. Revealing its biological role, which is largely unknown, requires identifying more protein sulfotyrosine sites. However, previous computational methods only achieved limited accuracy. In this paper, we propose a novel tool named SulfoTyrP wit...
Article
Full-text available
The nucleosome is the basic structure of chromatin in eukaryotic cells. Nucleosome positioning plays a key role in the regulation of many biological processes like replication, transcription and DNA repair. In this paper, informational entropy and mutual information are applied to detect the information on nucleotide correlation stored in the nu...
Article
Sequence comparison is one of the major tasks in bioinformatics, which can be used to study structural and functional conservation, as well as evolutionary relations among the sequences. In this paper, we introduce the concept of distance frequency of amino acid pairs and propose a new numerical characterization of protein sequences, which converts...
Article
Full-text available
We introduce the analysis of DNA sequences based on the fuzzy integral, and compare it with some other existing methods. The similarity and phylogenetic analysis on two real data sets illustrate that the proposed approach is effective and feasible.
Article
In mammals, breeding is preceded by species-specific mating behaviours. In this study, we investigated whether parthenogenetic embryo quality could be improved by mating behaviours in mice. To investigate this hypothesis, female mice were mated with vasectomized Kunming white male mice after superovulation. Oocytes were collected and counted at 16 ...
Article
The Wiener index W is the sum of distances between all pairs of vertices of a connected graph. Recently, q-analogs of W were conceived, motivated by the theory of hypergeometric series. In this article formulas are obtained for computing the q-Wiener indices of some compound trees. These generalize expressions, earlier known to hold for W.
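As a small illustration of the definition quoted above (the sum of distances between all pairs of vertices of a connected graph), the following is a minimal Python sketch. It is not taken from the article: the function name `wiener_index`, the adjacency-dictionary input format, and the two example trees are assumptions made for illustration, and it computes only the ordinary Wiener index W, not the q-analogs.

```python
from collections import deque


def wiener_index(adj):
    """Return the Wiener index of a connected, unweighted, undirected graph.

    `adj` is a hypothetical input format: a dict mapping each vertex to a
    list of its neighbours.
    """
    total = 0
    for source in adj:
        # Breadth-first search gives shortest-path distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph must be connected")
        total += sum(dist.values())
    # Every unordered pair {u, v} was counted twice (once from each end).
    return total // 2


# Path on four vertices: 0 - 1 - 2 - 3
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
# Star with centre 0 and three leaves
star4 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(path4))  # 10
print(wiener_index(star4))  # 9
```

For the path, the pairwise distances are 1, 1, 1, 2, 2, 3, which sum to 10; for the star, three pairs are at distance 1 and three at distance 2, giving 9.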
Article
Nucleosomes are the basic structure of chromatin in eukaryotic cells, and they form the chromatin fiber interconnected by sections of linker DNA. Nucleosome positioning is of great significance for gene transcription regulation. In this paper, we consider the difference of absolute frequency of nucleotides between the nucleosome forming and nucleosom...
Article
Full-text available
We propose q-analogs of the Wiener index, motivated by the theory of hypergeometric series. The basic properties of these q-Wiener indices are established, as well as their relations with the Hosoya polynomial. Some possible chemical interpretations and applications of the q-Wiener indices are considered.
Article
Full-text available
In this paper we propose a novel method to compare RNA molecules. We transform an RNA secondary structure into a linear structural sequence not only differentiating paired bases from free bases but also considering the hydrogen bonds between paired bases. We also propose two suitable distance measures based on both the linear structural sequences a...
Article
Full-text available
The nucleosome is the basic structure of chromatin in eukaryotic cells, forming the chromatin fiber interconnected by sections of linker DNA. Nucleosome positioning is of great significance for the regulation of gene transcription. A few computational models have been proposed to predict in vivo nucleosome positioning on the genome directly from DNA sequen...
Article
Apoptosis proteins play a crucial role in the development and homeostasis of an organism. Obtaining information about subcellular location of these proteins is very important to understand the mechanism of programmed cell death. In this paper, based on the hydropathy characteristics, we introduce the frequency of 2-blocks and pK value of the α-NH+...
Article
Transcription factors (TFs) are proteins that control the first step of gene expression, the transcription of DNA into RNA sequences. The mechanism of transcriptional regulation can be much better understood if the category of transcription factors is known. We developed a new method for predicting the classification of transcription factors by inco...
Article
Full-text available
Determination of sequence similarity is one of the major steps in computational phylogenetic studies. During evolutionary history, not only mutations of individual nucleotides but also subsequent rearrangements have occurred. It has been one of the major tasks of computational biologists to develop novel mathematical descriptors for similari...