About
Publications: 304
Reads: 36,418
Citations: 3,274
Introduction
My research interests include computational Systems Biology and Bio-Medicine.
Publications (304)
Recently, Deep Neural Networks have been successfully utilized in many domains; especially in computer vision. Many famous convolutional neural networks, such as VGG, ResNet, Inception, and so forth, are used for image classification, object detection, and so forth. The architecture of these state-of-the-art neural networks has become deeper and co...
Motivation
High-throughput sequencing technology has revolutionized the study of metagenomics and cancer evolution. In a relatively simple environment, metagenomic sequencing data are dominated by a few species. By analyzing the alignment of reads from microbial species, single nucleotide polymorphisms can be discovered and the evolutionary histo...
Aim and objective:
In the past decade, drug design technologies have improved enormously. Computer-aided drug design (CADD) has played an important role in analysis and prediction in drug development, making the procedure more economical and efficient. However, computation with big data, such as ZINC containing more than 60 milli...
Indoor microbial communities have important implications for human health, especially in health-care institutes (HCIs). The factors that determine the diversity and composition of microbiomes in a built environment remain unclear. Herein, we used 16S rRNA amplicon sequencing to investigate the relationships between building attributes and surface b...
Background/purpose:
Aggregatibacter actinomycetemcomitans has emerged as one of the aetiological agents in periodontal disease. Although Type IV secretion systems (T4SSs) are widely distributed in many bacteria, the genetic features and distribution of T4SSs in A. actinomycetemcomitans remain unclear. In this study, we investigated the prevalence o...
Background/purpose:
Subgingival microorganisms are potentially associated with periodontal diseases. However, the correlation between the variance in the periodontal microbiome and the prevalence and severity of periodontitis remains unclear. The aim of this study was to determine the subgingival microbiota in Taiwanese individuals with severe chr...
Many transcribed RNAs are non-coding RNAs, including microRNAs (miRNAs), which bind to complementary sequences on messenger RNAs to regulate the translation efficacy. Therefore, identifying the miRNAs expressed in cells/organisms aids in understanding genetic control in cells/organisms. In this report, we determined the binding of oligonucleotides...
We propose an algorithm to solve the multiple patterns approximate string matching problem. Our algorithm uses a hashing table to quickly find all possible positions. Our algorithm is very efficient for multiple patterns with different lengths. We also implemented our algorithm to solve the short oligonucleotide alignment on a long reference DNA se...
Next-generation sequencing (NGS) technologies, such as Illumina/Solexa, ABI/SOLiD, and Roche/454 Pyrosequencing, are revolutionizing the acquisition of genomic data at relatively low cost. NGS technologies are rapidly changing the approach to complex genomic studies, opening a way to the development of personalized drugs and personalized medicine....
In the past, many computing technologies have been proposed and used to help biologists and chemists analyze biological and chemical data faster, in tasks such as homology detection, evolutionary analysis, function prediction, computer-aided drug design, and cheminformatics. Leveraging the power of these technologies, a lot of tools and services are valuabl...
In this study, we applied a 16S ribosomal RNA (rRNA) metagenomics approach to survey inanimate hospital environments (IHEs) in a respiratory care center (RCC). A total of 16 samples, including 9 from medical devices and 7 from workstations, were analyzed. Besides, clinical isolates were retrospectively analyzed during the sampling period in the RCC...
Background
Carbapenem-resistance in Acinetobacter baumannii has gradually become a global challenge. To identify the genes involved in carbapenem resistance in A. baumannii, the transcriptomic responses of the completely sequenced strain ATCC 17978 selected with 0.5 mg/L (IPM-2 m) and 2 mg/L (IPM-8 m) imipenem were investigated using RNA-sequencing...
We investigated the prevalence of a type IV secretion system (T4SS)-bearing plasmid among clinical isolates of carbapenem-resistant Acinetobacter baumannii (CRAB) using plasmid replicon typing. The complete sequence of a T4SS-bearing plasmid, pAB_CC, isolated from A. baumannii TYTH-1 was determined, and a comparative analysis of the T4SS gene modul...
Multidrug resistance (MDR) in Acinetobacter baumannii is increasingly reported and has become a significant public concern. The method responsible for the acquisition of resistance genes via integrons from the environment or intra-species in A. baumannii remains to be understood. This study was performed to investigate the transmission route of the...
Copy number variation (CNV) has been reported to be associated with disease and various cancers. Hence, identifying the accurate position and the type of CNV is currently a critical issue. There are many tools for detecting CNV regions, constructing haplotype phases on CNV regions, or estimating the numerical copy numbers. However, none of...
This study employed genomewide analysis to investigate potential resistance mechanisms in Acinetobacter baumannii following imipenem exposure. Imipenem-selected mutants were generated from the imipenem-susceptible strain ATCC 17978 by multistep selection resistance. Antibiotic susceptibilities were examined, and the selected mutants originated from...
Zanamivir and Oseltamivir are both sialic acid analog inhibitors of neuraminidase (NA), which is an important target in influenza A virus treatment. Quantitative structure-activity relationship (QSAR) analysis is a common computational method for correlating the structural properties of compounds or inhibitors with their biological activities. The pharmacophore...
Human dihydroorotate dehydrogenase (hDHODH) is a class-2 dihydroorotate dehydrogenase. Because it is extensively used by proliferating cells, its inhibition in autoimmune and inflammatory diseases, cancers, and multiple sclerosis is of substantial clinical importance. In this study, we had two aims. The first was to develop an hDHODH pharma-similar...
Inherited cardiac conduction diseases (CCD) are rare but are caused by mutations in a myriad of genes. Recently, whole-exome sequencing has successfully led to the identification of causal mutations for rare monogenic Mendelian diseases.
To investigate the genetic background of a family affected by inherited CCD.
We used whole-exome sequencing to s...
SUMOylation, as part of the epigenetic regulation of transcription, has been intensively studied in lower eukaryotes that contain only a single SUMO protein; however, the functions of SUMOylation during mammalian epigenetic transcriptional regulation are largely uncharacterized. Mammals express three major SUMO paralogues: SUMO-1, SUMO-2, and SUMO-...
Predicting protein functional classes such as localization sites and modifications plays a crucial role in function annotation. Given a tremendous amount of sequence data yielded from high-throughput sequencing experiments, the need of efficient and interpretable prediction strategies has been rapidly amplified. Our previous approach for subcellula...
Damage to DNA is caused by ionizing radiation, genotoxic chemicals or collapsed replication forks. When DNA is damaged or cells fail to respond, a mutation that is associated with breast or ovarian cancer may occur. Mammalian cells control and stabilize the genome using a cell cycle checkpoint to prevent damage to DNA or to repair damaged DNA. Chec...
An understanding of the activities of enzymes could help to elucidate the metabolic pathways of thousands of chemical reactions that are catalyzed by enzymes in living systems. Sophisticated applications such as drug design and metabolic reconstruction could be developed using accurate enzyme reaction annotation. Because accurate enzyme reaction an...
Combining multiple information retrieval (IR) systems has been shown to improve performance over individual systems. However, it remains a challenging problem to determine when and how a set of individual systems should be combined. In this paper, we investigate these issues using combinatorial fusion analysis and five data sets provided by TREC...
Severe gastroenteritis and foodborne illness caused by Noroviruses (NoVs) during the winter are a worldwide phenomenon. Vulnerable populations including young children and elderly and immunocompromised people often require hospitalization and may die. However, no efficient vaccine for NoVs exists because of their variable genome sequences. This stu...
Background/purpose:
BfmR, the response regulator component of the two-component system BfmRS, has important roles in biofilm formation and cellular morphology of Acinetobacter baumannii. Until now, the contribution of the sensor kinase BfmS to the virulence of this bacterium remains unknown. In this study, a bfmS knockout and complementation studi...
There have been increasing reports of bla(OXA-23)-carrying strains of carbapenem-resistant Acinetobacter baumannii (CRAB), which has become a significant public health concern in Taiwan. To determine the origin of these CRAB strains, the prevalence of CRAB and bla(OXA-23)-carrying CRAB in a regional hospital was analysed retrospectively. The genome...
Single nucleotide polymorphism (SNP) data derived from array-based technology or massive parallel sequencing are often flawed with missing data. Missing SNPs can bias the results of association analyses. To maximize information usage, imputation is often adopted to compensate for the missing data by filling in the most probable values. To better un...
Background
In the last decade, a considerable amount of research has been devoted to investigating the phylogenetic properties of organisms from a systems-level perspective. Most studies have focused on the classification of organisms based on structural comparison and local alignment of metabolic pathways. In contrast, global alignment of multiple...
Organisms used in this study. Edges represent the reactions catalyzed by enzymes in each metabolic network. All metabolic pathways were retrieved from KEGG [19].
Statistics for KEGG pathways between two pairs of organisms of Prochlorococcus and Synechococcus: The x axis represents KEGG pathway IDs, and the y axis represents the number of the constituent enzymes in the pathways. (a) (pma, pmc) from Prochlorococcus, and (pma, syx) from Prochlorococcus and Synechococcus, respectively. (b) (syw, syx) from Synec...
Comparison of reconstructed phylogenic trees. Left: Reconstruction by Chang et al. [17]. Right: Reconstruction by Zhang et al. [12]. Reprinted under the BioMed Central Open License agreement (BMC Bioinformatics).
Statistics for KEGG pathways between three pairs of organisms: (mlo, pae), (ccr, mlo) and (ccr, pae). The x axis represents KEGG pathway IDs, and the y axis represents the number of the constituent enzymes in the pathways. The two pathways ko00260 and ko00860 in the pair (mlo, pae) contain more functional orthologs than those in the pairs (ccr, mlo...
Statistics for KEGG pathways between two pairs of organisms in Lactobacillus: The x axis represents KEGG pathway IDs, and the y axis represents the number of the constituent enzymes in the pathways. (a) (lga, ljo) in obligate homofermentation, and (lfe, lga) from different fermentation types. (b) (lfe, lru) in obligate heterofermentation, and (lfe,...
Sequencing of microbial genomes is important because of microbial-carrying antibiotic and pathogenetic activities. However, even with the help of new assembling software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenetic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) predic...
Background
The opportunistic enterobacterium, Morganella morganii, which can cause bacteraemia, is the ninth most prevalent cause of clinical infections in patients at Changhua Christian Hospital, Taiwan. The KT strain of M. morganii was isolated during postoperative care of a cancer patient with a gallbladder stone who developed sepsis caused by b...
Supplementary Figure 1. The origin of replication was assigned based on the GC deviation of the genome using Ori-Finder (*.pdf)
Supplementary Table 2. M. morganii genes involved in multidrug efflux (*.pdf)
Supplementary Table 7. M. morganii genes involved in superoxide stress (*.pdf)
Supplementary Table 9. Specific CDSs in M. morganii compared to other Enterobacteriaceae (n = 14) and/or Proteeae (n = 5). (*.xls)
Supplementary Table 1. Resequencing analysis on assembled contigs (*.xls)
Supplementary Table 3. Flagellum-related genes and chemotaxis genes located in 58.8-kb locus of M. morganii (*.pdf)
Supplementary Table 5. M. morganii genes involved in lipopolysaccharide or enterobacterial common antigen biosynthesis (*.pdf)
Supplementary Table 8. 2 prophages and 12 degenerate prophages (*.xls)
Supplementary Table 10. Comparison of CDSs in M. morganii compared to other Enterobacteriaceae (n = 14) and Proteeae (n = 5). (*.xls)
Supplementary Table 4. Protein similarity search of Type III secretion system (T3SS) of M. morganii (*.pdf)
Supplementary Table 6. Protein similarity search of insecticidal toxin of M. morganii (*.pdf)
Acinetobacter baumannii has emerged recently as a major cause of health care-associated infections due to the extent of its antimicrobial resistance and its propensity to cause large nosocomial outbreaks. Here we report the genome sequence of Acinetobacter baumannii TYTH-1 isolated in Taiwan during 2008.
Combining multiple information retrieval systems is known to improve performance over that of the individual systems in many cases. In these cases, the performance improvement of the combined system is also known to be mainly due to: (a) performance of each of the individual systems, and (b) the divers...
Leptospirosis, a widespread zoonosis, is a re-emerging infectious disease caused by pathogenic Leptospira species. In Taiwan, Leptospira santarosai serovar Shermani is the most frequently isolated serovar, causing both renal and systemic infections. This study aimed to generate a L. santarosai serovar Shermani genome sequence and categorize its hyp...
Background
RNA interference (RNAi) is commonly applied in genome-scale gene functional screens. However, a one-on-one RNAi analysis that targets each gene is cost-ineffective and laborious. Previous studies have indicated that siRNAs can also affect RNAs that are near-perfectly complementary, and this phenomenon has been termed an off-target effect...
Tables S1-3. The lists of the siRNAs that were selected for first-round RNAi analysis with different parameters.
Figures S1 and S2. A simple example of a powerful subsequence P, and the dotplot of TF15 and TF21 sequences.
Proof that the problem of finding a qualified sequence r maximizing the size of T is NP-complete.
Methods. The details of our modified greedy algorithm for selecting qualified siRNAs to perform the first-round RNAi analysis.
Genetic robustness refers to a compensatory mechanism for buffering deleterious mutations or environmental variations. Gene duplication has been shown to provide such functional backups. However, the overall contribution of duplication-based buffering for genetic robustness is rather small. In this study, we investigated whether transcriptional com...
Aspergillus species are industrially and agriculturally important as fermentors and as producers of various secondary metabolites. Among them, fungal polyketides such as lovastatin and melanin are considered a gold mine for bioactive compounds. We used a phylogenomic approach to investigate the distribution of iterative polyketide synthases (PKS) i...
High-throughput technology for genotyping has made genome-wide associations possible. Single nucleotide polymorphism (SNP) data derived from array-based technology are usually flawed due to missing data, although they have generally high call rates and good concordance rates across different genotype calling schemes. Missing SNPs can bias the resul...
The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetecto...
Supplementary Table 1: The filename, LRi cutoff value, and physical position in samples used for InDel analysis in this study.
Supplementary Table 2: The filename, LRi cutoff value and repeat structure in samples used for microsatellite analysis in this study.
Supplementary Table 3: The primer pairs designed for amplification of 10 CODIS loci in t...
Taiwan red-feathered country chickens (TRFCCs) are one of the main meat resources in Taiwan. Due to the lack of any systematic breeding programs to improve egg productivity, the egg production rate of this breed has gradually decreased. The prediction by zone (PreZone) program was developed to select the chickens with low egg productivity so as to...
Proteins perform most important biochemical reactions in organisms, such as catalysis, signal transduction, and transport of nutrients. The urgent need for automatic annotation is due to the advent of high-throughput sequencing techniques in the post-genomic era. Proteins consist of domains which are elementary building units of protein folding,...
Microorganisms able to grow under artificial culture conditions comprise only a small proportion of the biosphere's total microbial community. Until recently, scientists have been unable to perform thorough analyses of difficult-to-culture microorganisms due to limitations in sequencing technology. As modern techniques have dramatically increased s...
Large-scale proteomic tree. There are 843 microbes included in this large-scale proteomic tree. The blue dotted line indicates that Archaea (lower part) and Bacteria (upper part) are separated by the probe-set clustering. Organisms of different phyla are labeled with different colors. The color codes are shown at the upper right corner of this figu...
The HGT+/- proteomic trees. This experiment involves the 415 microorganisms recorded in the HGT database that are known to possess horizontally transferred genes [44]. (a) Tree HGT+: the proteomic tree constructed using whole genomes. (b) HGT-: the proteomic tree constructed with horizontally transferred genes removed. For clarity, Tree HGT+ is div...
Sorting is a classic algorithmic problem and its importance has led to the design and implementation of various sorting algorithms on many-core graphics processing units (GPUs). CUDPP Radix sort is the most efficient sorting algorithm on GPUs and GPU Sample sort is the best comparison-based sorting algorithm. Although the implementations of these algorithms are efficien...
The next-generation sequencing (NGS)-based transcriptomic profiling method, often called RNA-seq, has been widely used to study global gene expression, alternative exon usage, new exon discovery, novel transcriptional isoforms and genomic sequence variations. However, this technique also poses many biological and informatics challenges to ex...
The paper has been withdrawn by the authors.
FMS-like tyrosine kinase 3 (FLT-3) is strongly correlated with acute myeloid leukemia, but no FLT-3-inhibitor cocomplex structure is available to assist the design of therapeutic inhibitors. Hence, we propose a dual-layer 3D-QSAR model for FLT-3 that integrates the pharmacophore, CoMFA, and CoMSIA. We then coupled the model with the fragment-based...
Carbohydrate binding modules (CBMs) are found in polysaccharide-targeting enzymes and increase catalytic efficiency. Because only a relatively small number of CBM structures have been solved, computational modeling represents an alternative approach in conjunction with experimental assessment of CBM functionality and ligand-binding properties. An a...
Recently, many new next-generation sequencing techniques have been proposed. These techniques can produce a lot of short reads rapidly. Hence, a number of tools have been developed to map these short reads to the genome. However, as more reads are sequenced and read lengths increase, these tools require high memory usage and huge compu...
Molecular methods for predicting prognosis in patients with oral cavity squamous cell carcinoma (OSCC) are urgently needed, considering its high recurrence rate and tendency for metastasis. The present study investigated the genetic basis of variations in gene expression associated with poor prognosis in OSCC using Affymetrix SNP 6.0 and Affymetrix...
Identifying key components in biological processes and their associations is critical for deciphering cellular functions. Recently, numerous gene expression and molecular interaction experiments have been reported in Saccharomyces cerevisiae, and these have enabled systematic studies. Although a number of approaches have been used to predict gene f...
Statistical results of correlation evaluation. For each identified module pair, we evaluated gene correlations within and between modules. Additional file 6 lists the final results including module pairs with significant number of gene correlations and modules with significant number of gene correlations.
Statistical evaluation of the cooperation of the identified module pairs. We evaluated the statistical significance of the cooperation of each module pair identified by our method and listed module pairs that significantly cooperate with genes functional in the cell cycle process or a specific phase. The column Pair_ID lists the unique identifier o...
Cooperative relationship mediated by Cdc28 and phase-related regulators. Additional file 10 lists cooperative module pairs that cooperate with essential regulators of the yeast cell cycle. The column Regulator lists regulators cooperating with a module pair.
Functional annotation results of the identified modules. Additional file 3 lists functional annotation results of the 82 modules. We annotated functions of the identified modules from biological processes of Gene Ontology and listed the most significant function of each module.
Supplementary discussions. Additional details of our method and discussions were described in Additional file 1.
Cell cycle-related gene set. The cell cycle-related gene set consisted of 985 genes from three types of benchmark sets, including genes significantly regulated in the cell cycle and genes annotated in functional categories of cell cycle and DNA processing [40].
Cell cycle-related genes of the identified modules. Genes that are cell cycle-regulated and/or functional in the cell cycle (cell cycle-related genes) (see the Methods section for additional details) were identified in each module. Additional file 5 lists cell cycle-related genes in the modules.
Correlated genes of the identified module pairs. Additional file 7 lists the correlated genes of each identified module pair.
Phase-regulated gene set. The phase-regulated gene set consisted of 416 genes with significant periodically changing expression identified by Cho et al. [45].
Modules containing Cdc28 and phase-related regulators. We listed modules identified by our method that contain Cdc28 and phase-related regulators.
Gene lists of the identified modules. Additional file 2 lists the genes in each identified module.
Protein-protein interaction data. Protein-protein interaction data for yeast were downloaded from the BioGRID database [13].
Multiple Sequence Alignment (MSA) is a computational biology tool for facilitating the study of DNA homology, phylogeny determinations and conserved motifs. Many MSA methods have been presented to align protein, DNA, and RNA sequences successfully but not for coding region sequences. Therefore, we propose a heuristic alignment method, CORAL-M, fo...
Signal transduction is the major mechanism through which cells transmit external stimuli to evoke intracellular biochemical responses. Understanding relationship between external stimuli and corresponding cellular responses, as well as the subsequent effects on downstream genes, is a major challenge in systems biology. Thus, a systematic approach t...
This study was undertaken to identify the genes in response to areca nut extract, a potential carcinogen of oral cancer.
Two oral cancer sublines chronically treated with areca nut extract were established. Methods such as microarray and immunohistochemistry were used to screen and validate the genes' altered expressions in areca nut extract-sublin...
Multiple sequence alignment is a scientific tool to assist the study of DNA homology, phylogeny determinations, and conserved motifs identification. Various heuristic MSA methods have been presented to obtain the resulting alignment for multiple sequences. Although these alignment tools are able to align protein, DNA, and RNA sequences successfully...