
Shaohong Zhang
- PhD
- Professor (Associate) at Guangzhou University
About
56 Publications
10,383 Reads
395 Citations
Publications (56)
Text readability assessment has gained significant attention from researchers in various domains. However, the lack of exploration into corpus compatibility poses a challenge as different research groups utilize different corpora. In this study, we propose a novel evaluation framework, Cross-corpus text Readability Compatibility Assessment (CRCA),...
Representation learning is a method to compute the corresponding vectorized representations of entities or relationships. It is one of the most basic and essential natural language processing tasks. Current computer domain knowledge modeling techniques have two flaws: (1) the neglect of fine-grained knowledge hierarchies, and (2) the lack of a unif...
With the continuous development of the information industry and the enormous market demand, the demand for Internet Technology (IT) training is also rapidly increasing. However, the epidemic’s sudden outbreak and the implementation of related policies have left training institutions in an existential crisis. To analyze the influencing factors of in...
Traditional course recommendation methods usually recommend courses based on content in a single language. However, current methods cannot make effective recommendations to learners who want to access cross-language course content. In computer science, foreign computer technology has first-mover advantages, so cross-language recommendat...
In recent years, the application of deep learning algorithms has dominated research on document readability prediction. Traditional methods rely excessively on manual feature extraction, and modern deep learning algorithms are severely time-consuming in deep feature extraction. On the other hand, there is a consider...
In this paper, we proposed a new privacy-preserving clustering framework. We proposed two different types of data transformation methods on clustering solution vectors and clustering ensemble consensus matrix in a unified way. The first one is encryption, which includes cryptography-based methods and hashing functions. The other one is the perturba...
The development of computer technology has brought great convenience to production and daily life, and has also led to rapid growth in the demand for computer talent. Training more and better computer talent has become an inevitable trend in the development of society. In the process of training computer talents, the cultivation o...
In the traditional description methods of computer science curriculum, the teaching plans and syllabus are generally used to describe the professional knowledge of computer sciences and the overall structure of the knowledge points of the curriculum. However, relying on traditional methods is not enough to grasp the general structure of computer sc...
With the rapid development of the computer industry, the computer processing performance has been continuously improved, and artificial intelligence has been continuously developed and innovated from the beginning of the concept. Artificial intelligence has become an indispensable part of the computer industry. China’s domestic emphasis on the arti...
With the continuous development of computer education, programming teaching as a core course of computer elementary science education is receiving more and more attention. Colleges and universities have started to combine the Online Judge System (OJ) to develop programming skills of students. OJ platforms have been domestically and internationally...
Motivation: Extensive efforts have been devoted to understanding the antigenic peptides binding to MHC class I and II molecules since they play a fundamental role in controlling immune responses and due to their involvement in vaccination, transplantation, and autoimmunity. The genes coding for the MHC molecules are highly polymorphic, and it is diffi...
Artificial Intelligence (AI) is one of the most popular technologies in recent years. Journals and conferences are widely viewed as major tools to track the development of technologies. Citation counting analysis is one of the most acknowledged metrics in spite of its controversial drawbacks. To the best of our knowledge, most methods based on ci...
In this paper, a number of pair-counting similarity measures associated with a general formulation of cluster ensemble are proposed. These measures are formulated based on our motivation to evaluate the consistency between an individual clustering solution and a cluster ensemble solution, or that between different cluster ensemble solutions, in a u...
In a number of biological studies, the raw gene expression data are not usually published due to different causes, such as data privacy and patent rights. Instead, significant gene lists with fold change values are usually provided in most studies. However, due to variations in data sources and profiling conditions, only a small number of common si...
Global transcriptional analyses have been performed with human embryonic stem cell (hESC)-derived cardiomyocytes (CMs) to identify molecules and pathways important for human CM differentiation, but variations in culture and profiling conditions have led to greatly divergent results among different studies. Consensus investigation to identify genes...
Differentiation of pluripotent human embryonic stem cells (hESCs) to the cardiac lineage represents a potentially unlimited source of ventricular cardiomyocytes (VCMs), but hESC-VCMs are developmentally immature. Previous attempts to profile hESC-VCMs primarily relied on transcriptomic approaches, but the global proteome has not been exam...
Recently, researchers seeking to understand, modify, and create beneficial traits in organisms have looked for evolutionarily conserved patterns of protein interactions. Their conservation likely means that the proteins of these conserved functional modules are important to the trait's expression. In this paper, we formulate the problem of identify...
Outlier ranking methods can provide a quantitative measure to evaluate the outlierness of data instances in data clustering and attract great interest in pattern recognition and data mining communities. However, it has been pointed out that the diverse scaling ranges of these scores bring difficulty to result interpretation. Moreover, popular outli...
In recent years, semi-supervised clustering has received considerable attention in the pattern recognition and data mining communities. This type of clustering algorithm takes advantage of partial prior knowledge, and significantly improved performance beyond traditional unsupervised clustering algorithms is observed. In general, the partial prior knowl...
Semantic similarity defined on Gene Ontology (GO) aims to provide the functional relationship between different GO terms. In this paper, a novel method, namely the Shortest Path (SP) algorithm, for measuring the semantic similarity on GO terms is proposed based on both GO structure information and the term's property. The proposed algorithm searche...
The immune system must detect a wide variety of microbial pathogens, such as viruses, bacteria, fungi and parasitic worms, to protect the host against disease. The display of antigenic peptides by MHC II (class II Major Histocompatibility Complex) molecules is a pivotal process to activate CD4+ TH cells (Helper T cells). The activated TH cells can diffe...
Human (h) embryonic stem cells (ESC) represent an unlimited source of cardiomyocytes (CMs); however, these differentiated cells are immature. Thus far, gene profiling studies have been performed with non-purified or non-chamber specific CMs. Here we took a combinatorial approach of using systems biology to guide functional discoveries of novel biol...
RNA structural motifs are recurrent structural elements occurring in RNA molecules. RNA structural motif recognition aims to find RNA substructures that are similar to a query motif, and it is important for RNA structure analysis and RNA function prediction. In view of this, we propose a new method known as RNA Structural Motif Recognition based on...
In this paper, Adjusted Rand Index (ARI) is generalized to two new measures based on matrix comparison: (i) Adjusted Rand Index between a similarity matrix and a cluster partition (ARImp), to evaluate the consistency of a set of clustering solutions with their corresponding consensus matrix in a cluster ensemble, and (ii) Adjusted Rand Index betwee...
Feature selection is widely established as one of the fundamental computational techniques in mining microarray data. Due to the lack of categorized information in practice, unsupervised feature selection is more practically important but correspondingly more difficult. Motivated by the cluster ensemble techniques, which combine multiple clustering...
RNA 3D motifs are recurrent substructures in an RNA subunit and are building blocks of the RNA architecture. They play an important role in binding proteins and consolidating RNA tertiary structures. RNA 3D motif searching consists of two steps: candidate generation and candidate filtering. We proposed a novel method, known as Feature-based RNA Mot...
In this paper, we propose a novel 3-D model retrieval framework, which is referred to as hybrid 3-D model associative retrieval. Unlike the conventional 3-D model similarity retrieval approach, the query model and the models obtained by 3-D model hybrid associative retrieval have the following properties: They belong to different model classes and...
Semantic similarity defined on Gene Ontology (GO) aims to provide the functional relationship between different biological processes, molecular functions, or cellular components. In this paper, a novel method, namely the Shortest Path (SP) algorithm, for measuring the semantic similarity on GO is proposed based on both the GO structure information...
Adjusted Rand Index (ARI) is one of the most popular measures to evaluate the consistency between two partitions of data sets in the area of pattern recognition. In this paper, ARI is generalized to a new measure, Adjusted Rand Index between a similarity matrix and a cluster partition (ARImp), to evaluate the consistency between a set of clustering...
Constrained clustering has recently become an active research topic. This type of clustering method takes advantage of partial knowledge in the form of pairwise constraints, and acquires significant improvement beyond the traditional unsupervised clustering. However, most of the existing constrained clustering methods use constraints which are se...
In this paper we propose a new partial closure-based constrained clustering algorithm. We introduce closures into the partial constrained clustering and we propose a new measurement to order the importance of the constrained closures. Experiments on public datasets demonstrate the advantages of our algorithm over the standard K-means and two state-o...
In this paper, we propose a filter-refinement scheme based on a new approach called Sorted Extended Gaussian Image histogram approach (SEGI) to address the problems of traditional EGI. Specifically, SEGI first constructs a 2D histogram based on the EGI histogram and the shell histogram. Then, SEGI extracts two kinds of descriptors from each 3D mode...
Image segmentation is a classical problem in areas such as image processing, multimedia, and medical imaging. Although there exist a lot of approaches to perform image segmentation, few of them study the image segmentation by the cluster ensemble approach. In this paper, we propose a new algorithm called the cluster ensemble algorithm (CEA) for...
Carrier frequency offsets (CFOs) largely degrade the performance of orthogonal frequency-division multiplexing (OFDM) systems because they may violate the orthogonality of sub-carriers and introduce inter-carrier interference. In this paper, by using some special pilots which take values from a constrained set and are interspersed in frequency-do...