About
Publications: 35
Reads: 17,404
Citations: 1,338
Additional affiliations
January 2019 - March 2022
August 2018 - January 2019
Education
September 2015 - August 2018
September 2012 - January 2015
September 2008 - June 2012
Publications (35)
The study aims at developing a neural network model to improve the performance of Human Phenotype Ontology (HPO) concept recognition tools. We used the terms, definitions, and comments about the phenotypic concepts in the HPO database to train our model. The document to be analyzed is first split into sentences and annotated with a base method to g...
The COVID-19 (coronavirus disease 2019) pandemic has had a significant impact on society, both because of the serious health effects of COVID-19 and because of public health measures implemented to slow its spread. Many of these difficulties are fundamentally information needs; attempts to address these needs have caused an information overload for...
In recent years, big data deluges have resulted in exciting data science opportunities. In particular, there is always a desire to extract the most from different data sources. To address it, a promising and recurring task is to perform feature selection and feature extraction. Specifically, the objective is to obtain the non-redundant and informat...
DNA computing is still in its infancy. Multiple aspects of DNA computing have been studied, but most of the research results have not been applied in practice. It has been shown to exhibit high data storage density and support efficient random data access. It also shows the potential to provide alternative facilities for...
Motivation:
Automatic phenotype concept recognition from unstructured text remains a challenging task in biomedical text mining research. Previous works that address the task typically use dictionary-based matching methods, which can achieve high precision but suffer from lower recall. Recently, machine learning-based methods have been proposed to i...
The COVID-19 pandemic has had a significant impact on society, both because of the serious health effects of COVID-19 and because of public health measures implemented to slow its spread. Many of these difficulties are fundamentally information needs; attempts to address these needs have caused an information overload for both researchers and the p...
Automatic phenotype concept recognition from unstructured text remains a challenging task in biomedical text mining research. Previous works that address the task typically use dictionary-based matching methods, which can achieve high precision but suffer from lower recall. Recently, machine learning-based methods have been proposed to identify bio...
Cyberbullying and hate speech are common issues in online etiquette. To tackle this problem of wide concern, we propose a text classification model based on convolutional neural networks for the de facto verbal aggression dataset built in our previous work and observe significant improvement, thanks to the proposed 2D TF-IDF features instead of...
A massive number of biological entities, such as genes and mutations, are mentioned in the biomedical literature. The capturing of the semantic relatedness of biological entities is vital to many biological applications, such as protein-protein interaction prediction and literature-based discovery. Concept embeddings—which involve the learning of v...
Capturing the semantics of related biological concepts, such as genes and mutations, is of significant importance to many research tasks in computational biology such as protein-protein interaction detection, gene-drug association prediction, and biomedical literature-based discovery. Here, we propose to leverage state-of-the-art text mining tools...
The recent advances in DNA sequencing technology, from first-generation sequencing (FGS) to third-generation sequencing (TGS), have constantly transformed the genome research landscape. Their data throughput is unprecedented, severalfold higher than that of past technologies. DNA sequencing technologies generate sequencing data that are big, sparse,...
Motivation:
Biomedical event extraction is fundamental for information extraction in molecular biology and biomedical research. The detected events form the central basis for comprehensive biomedical knowledge fusion, facilitating the digestion of massive information influx from literature. Limited by the event context, the existing event detectio...
Inspired by the success of the General Language Understanding Evaluation benchmark, we introduce the Biomedical Language Understanding Evaluation (BLUE) benchmark to facilitate research in the development of pre-training language representations in the biomedicine domain. The benchmark consists of five tasks with ten datasets that cover both biomed...
Motivation: Biomedical event detection is fundamental for information extraction in molecular biology and biomedical research. The detected events form the central basis for comprehensive biomedical knowledge fusion, facilitating the digestion of massive information influx from literature. Limited by the feature context, the existing event detectio...
The early detection of cancers has the potential to save many lives. Multianalyte blood test results could provide valuable information for early cancer detection. A recent attempt to detect multiple cancer types from multianalyte blood test results by logistic regression and random forest was shown to be successful in 2018 \cite{pmid29348365...
The Gene Expression Omnibus (GEO) repository harbours an exponentially increasing number of gene expression studies. The expression data, as well as the related metadata, provide an abundant resource for knowledge discovery. Each study in GEO focuses on the gene expression perturbation of a specific subject (e.g. gene, drug, and disease). The iden...
Transcription factors (TFs) are the major components of human gene regulation. In particular, they bind to specific DNA sequences and regulate neighboring genes in different tissues at different developmental stages. Non-synonymous single nucleotide polymorphisms in their protein-coding sequences could result in undesired consequences in humans. Th...
Motivation:
Cancer hallmark annotation is a promising technique that could discover novel knowledge about cancer from the biomedical literature. The automated annotation of cancer hallmarks could reveal relevant cancer transformation processes in the literature or extract the articles that correspond to the cancer hallmark of interest. It acts as...
Verbal aggression and cyberbullying are issues of wide concern in netiquette. In this article, we introduce a text mining system that can detect whether a given paragraph contains aggressive sentiment, and demonstrate its performance with different classification models. In addition, it is observed that our system works well on both our manu...
Understanding genome-wide protein-DNA interactions forms the basis for further focused studies. In particular, chromatin immunoprecipitation with sequencing (ChIP-Seq) technology enables us to measure the genome-wide occupancy of a DNA-binding protein of interest in vivo. Multiple ChIP-Seq runs thus have the inherent potential for us to decip...
In this paper, we propose a new algorithm for efficiently mining association rules in a corpus. Compared to classical transactional association rule mining problems, a corpus contains a large number of items and, moreover, far more itemsets, so traditional association rule mining algorithms cannot handle a corpus efficiently. To...