International Journal of Data Mining and Bioinformatics (INT J DATA MIN BIOIN)

Journal description

Mining bioinformatics data is an emerging area at the intersection of bioinformatics and data mining. The objective of the IJDMB is to facilitate collaboration between data mining researchers and bioinformaticians by presenting cutting-edge research topics and methodologies in the area of data mining for bioinformatics. This perspective acknowledges the interdisciplinary nature of research in data mining and bioinformatics and provides a unified forum for researchers, practitioners, students and policy makers to share the latest research and developments in this fast-growing multidisciplinary research area.

Current impact factor: 0.66

Impact Factor Rankings

2015 Impact Factor Available summer 2015
2013 / 2014 Impact Factor 0.655
2012 Impact Factor 0.393
2011 Impact Factor 0.429
2010 Impact Factor 0.681
2009 Impact Factor 0.933
2008 Impact Factor 0.667
2007 Impact Factor 0.636

Additional details

5-year impact 0.50
Cited half-life 0.00
Immediacy index 0.00
Eigenfactor 0.00
Article influence 0.12
Website International Journal of Data Mining and Bioinformatics website
Other titles IJDMB, Data mining and bioinformatics
ISSN 1748-5673
OCLC 318200707
Material type Document, Periodical, Internet resource
Document type Internet Resource, Computer File, Journal / Magazine / Newspaper

Publications in this journal

  • International Journal of Data Mining and Bioinformatics 01/2015; 11(3):314.
  • International Journal of Data Mining and Bioinformatics 01/2015; 11(3):331.
  • ABSTRACT: It has recently been shown that disease associated gene signatures can be identified by profiling tissue other than the disease related tissue. In this paper, we investigate gene signatures for Irritable Bowel Syndrome (IBS) using gene expression profiling of both disease related tissue (colon) and surrogate tissue (rectum). Gene specific joint ANOVA models were used to investigate differentially expressed genes between the IBS patients and the healthy controls, taking into account both intra and inter tissue dependencies among expression levels of the same gene. Classification algorithms in combination with feature selection methods were used to investigate the predictive power of gene expression levels from the surrogate and the target tissues. We conclude based on the analyses that expression profiles of the colon and the rectum tissue could result in better predictive accuracy if the disease associated genes are known.
    International Journal of Data Mining and Bioinformatics 01/2015; 11(3):301.
  • International Journal of Data Mining and Bioinformatics 01/2015;
  • ABSTRACT: Early stage infections caused by fungal/oomycete spores may not be detected until signs or symptoms develop. Serological and molecular techniques are currently used for detecting these pathogens. Next-generation sequencing (NGS) has potential as a diagnostic tool, due to the capacity to target multiple unique signature loci of pathogens in an infected plant metagenome. NGS has significant potential for diagnosis of important eukaryotic plant pathogens. However, the assembly and analysis of huge amounts of sequence is laborious, time consuming, and not necessary for diagnostic purposes. Previous work demonstrated that a bioinformatic tool termed Electronic probe Diagnostic Nucleic acid Analysis (EDNA) had potential for greatly simplifying detecting Fungal and Oomycete plant pathogens in simulated metagenomes. The initial study demonstrated limitations for detection accuracy related to the analysis of matches between queries and metagenome reads. This study is a modification of EDNA demonstrating a better accuracy for detecting Fungal and Oomycete plant pathogens.
    International Journal of Data Mining and Bioinformatics 01/2015; In press.
  • ABSTRACT: Computational annotation and prediction of protein structure is very important in the post-genome era due to the existence of many different proteins, most of which are yet to be verified. Mutual information based feature selection methods can be used in selecting such minimal yet predictive subsets of features. However, as protein features are organised into natural partitions, individual feature selection that ignores the presence of these views, dismantles them, and treats their variables intermixed along with those of others at best results in a complex un-interpretable predictive system for such multi-view datasets. In this paper, instead of selecting a subset of individual features, each feature subset is passed through a clustering step so that it is represented in discrete form using the cluster indices; this makes mutual information based methods applicable to view-selection. We present our experimental results on a multi-view protein dataset that is used to predict protein structure.
    International Journal of Data Mining and Bioinformatics 04/2014; 10(2):162-174.
  • ABSTRACT: Oligonucleotide sets are widely used in molecular biology to target a group of nucleic acid sequences using Polymerase Chain Reaction (PCR)-based technologies. Currently, the global matching efficiency of an oligonucleotide set is considered to be equal to the lowest matching efficiency calculated for the individual oligonucleotides. However, sequences matching the limiting oligonucleotide do not always match the other oligonucleotides of the set, resulting in a biased evaluation of the matching efficiency. The OligoSpecificitySystem program avoids this bias by calculating the real global matching efficiency of oligonucleotide sets. It can process all kinds of oligonucleotide sets, whatever the number of oligonucleotides, base pair degeneracy occurrences or mismatch occurrences.
    International Journal of Data Mining and Bioinformatics 01/2014; 9(4):417 - 423.
  • ABSTRACT: The large amount of textual knowledge in the existing biomedical literature is growing rapidly, and the creation of manual patterns from the available literature is becoming more difficult. There is an increasing demand to extract potential generic regulatory relationships from unlabelled data sets. In this paper, we describe a Semi-Supervised, Weighted Pattern Learning method (SSWPL) to extract such generic regulatory information from the literature. SSWPL can build new regulatory patterns according to predefined initial patterns from unlabelled data in the literature. These constructed regulatory patterns are then used to extract generic regulatory information from PubMed abstracts. The results presented herein demonstrate that our method can be utilised to effectively extract generic regulatory relationships from the literature by using learned, weighted patterns through semi-supervised pattern learning.
    International Journal of Data Mining and Bioinformatics 01/2014; 9(4):401 - 416.
  • ABSTRACT: The aim of the study is to evaluate gene component analysis for microarray studies. Three dimension reduction strategies, Principal Component Regression (PCR), Partial Least Squares (PLS) and Reduced Rank Regression (RRR), were applied to a publicly available breast cancer microarray dataset, and the derived gene components were used for tumor classification by Logistic Regression (LR) and Linear Discriminant Analysis (LDA). The impact of gene selection/filtration was evaluated as well. We demonstrated that gene component classifiers could reduce the high dimensionality of gene expression data and the collinearity problem inherent in most modern microarray experiments. In our study, gene component analysis could discriminate Estrogen Receptor (ER) positive breast cancers from negative cancers, and the proposed classifiers were successfully reproduced and projected into an independent microarray dataset with high predictive accuracy.
    International Journal of Data Mining and Bioinformatics 01/2014; 9(2):149-71.
  • ABSTRACT: Messenger RNA (mRNA) is the template for protein synthesis. It carries information from DNA in the nucleus to the ribosome sites of protein synthesis in the cell. The turnover process of mRNA is a chemical event with multiple small step reactions and the degradation of mRNA molecules is an important step in gene expression. A number of mathematical models have been proposed to study the dynamics of mRNA turnover, ranging from a one-step first order reaction model to the linear multi-component models. Although the linear multi-component models provide detailed dynamics of mRNA degradation, the simple first-order reaction model has been widely used in mathematical modelling of genetic regulatory networks. To illustrate the difference between these models, we first considered a stochastic model based on the multi-component model. Then a simpler stochastic model was proposed to approximate the linear multi-component model. We also discussed the delayed one-step reaction models with different types of time delay, including the constant delay, exponentially distributed delay and Erlang distributed delay. The comparison study suggested that the one-step reaction models failed to realise the dynamics of mRNA turnover accurately. Therefore, more sophisticated one-step reaction models are needed to study the dynamics of mRNA degradation.
    International Journal of Data Mining and Bioinformatics 01/2014; 10(1):18 - 32.
  • ABSTRACT: The prediction of operons is a critical step for the reconstruction of biochemical and regulatory networks at the whole genome level. In this paper, a novel operon prediction model is proposed based on Markov Clustering (MCL). The model employs a graph-clustering method by MCL for prediction and does not need a classifier. In the cross-species validation, the accuracies of E. coli K12, Bacillus subtilis and P. furiosus are 92.1, 86.9 and 87.3%, respectively. Experimental results show that the proposed method has a powerful capability of operon prediction. The compiled program and test data sets are publicly available at
    International Journal of Data Mining and Bioinformatics 01/2014; 9(4):424 - 443.
  • ABSTRACT: Understanding the interaction patterns among biological entities in a pathway can potentially reveal the role of the entities in biological systems. Although considerable effort has been devoted to this direction, querying biological pathways has remained relatively unexplored. Querying differs in principle in that we retrieve pathways satisfying a given property in terms of their topology or constituents. One such property is subnetwork matching using various constituent parameters. In this paper, we introduce a logic based framework for querying biological pathways using a novel and generic subgraph isomorphism computation technique. We develop a graphical interface called IsoKEGG to facilitate flexible querying of KEGG pathways based on isomorphic pathway topologies as well as matching any combination of node names, types, and edges. It allows editing KGML represented query pathways and returns all isomorphic patterns in KEGG pathways satisfying a given query condition for further analysis.
    International Journal of Data Mining and Bioinformatics 01/2014; 9(1):1-21.
  • ABSTRACT: In this paper, some methods for ensemble learning of protein fold recognition based on a decision tree (DT) are compared and contrasted against each other over three datasets taken from the literature. According to previously reported studies, the features of the datasets are divided into some groups. Then, for each of these groups, three ensemble classifiers, namely, random forest, rotation forest and AdaBoost.M1 are employed. Also, some fusion methods are introduced for combining the ensemble classifiers obtained in the previous step. After this step, three classifiers are produced based on the combination of classifiers of types random forest, rotation forest and AdaBoost.M1. Finally, the three different classifiers achieved are combined to make an overall classifier. Experimental results show that the overall classifier obtained by the genetic algorithm (GA) weighting fusion method, is the best one in comparison to previously applied methods in terms of classification accuracy.
    International Journal of Data Mining and Bioinformatics 01/2014; 9(1):89-105.
  • ABSTRACT: In recent years, mass spectrometry data analysis has become an important protein identification technique. The mass spectrometry technologies emerge as useful tools for biomarker discovery through studying protein profiles in various biological specimens. In mining mass spectrometry datasets, peak alignment is a critical issue among the preprocessing steps that affect the quality of analysis results. However, the existing peak alignment methods are sensitive to noise peaks across various mass spectrometry samples. In this paper, we proposed a novel algorithm named Two-Phase Clustering for peak Alignment (TPC-Align) to align mass spectrometry peaks across samples in the pre-processing phase. The TPC-Align algorithm sequentially considers the distribution of intensity values and the locations of mass-to-charge ratio values of peaks between samples. Moreover, TPC-Align algorithm can also report a list of significantly differential peaks between samples, which serve as the candidate biomarkers for further biological study. The proposed peak alignment method was compared to the current peak alignment approach based on one-dimension hierarchical clustering through experimental evaluations and the results show that TPC-Align outperforms the traditional method on the real dataset.
    International Journal of Data Mining and Bioinformatics 01/2014; 9(1):52-66.
  • ABSTRACT: Although Ant-Miner has been used with relative ease for datasets with categorical data and small-sized feature vectors, microarray datasets, which contain a few samples with a large number of genes, are a totally different story. The Ant-Miner is an ant colony optimisation algorithm that extracts predictive rules from datasets and intrinsically works on discrete values. This study has developed a new algorithm, "Enhanced Ant-Miner" (EAM), based on previous works. EAM deals with continuous attributes as well as categorical ones and presents its captured models in the form of predictive rules. EAM has been tested versus SVM, CN2, K-means and hierarchical clustering and the results show that EAM is the best in terms of predictive accuracy. Additionally, its agent-based nature gives it a much more attractive ability to speed up the whole process when compared to other trivial miners.
    International Journal of Data Mining and Bioinformatics 01/2014; 10(1):83 - 97.
  • ABSTRACT: We present NetLoc, a novel diffusion Kernel-based Logistic Regression (KLR) algorithm for predicting protein subcellular localisation using four types of protein networks including physical Protein-Protein Interaction (PPI) networks, genetic PPI networks, mixed PPI networks and co-expression networks. NetLoc is applied to yeast protein localisation prediction. The results showed that protein networks can provide rich information for protein localisation prediction, achieving Area Under Curve (AUC) score of 0.93. We also showed that networks with high connectivity and high percentage of co-localised PPI lead to better prediction performance. Investigation showed that NetLoc is a very robust approach which can produce good performance (AUC = 0.75) only using 30% of original interactions and capable of producing overall accuracy greater than 0.5 only with 20% annotation coverage. Compared to the previous network feature based prediction algorithm which achieved AUC scores of 0.49 and 0.52 on the yeast PPI network, NetLoc achieved significantly better overall performance with the AUC of 0.74.
    International Journal of Data Mining and Bioinformatics 01/2014; 9(4):386 - 400.
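
The NetLoc abstract above refers to a diffusion kernel over a protein network. As a rough illustration only, and not the authors' implementation, the sketch below computes the standard graph diffusion kernel K = exp(-beta * L), where L = D - A is the graph Laplacian. The function names, the value of beta, the truncated Taylor series and the four-node toy network are all assumptions made for this example.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_kernel(adjacency, beta=0.5, terms=30):
    """Approximate K = exp(-beta * L), with L = D - A, by a truncated Taylor series.

    Assumes a symmetric 0/1 adjacency matrix without self-loops; beta and
    terms are illustrative defaults, not values taken from the paper.
    """
    n = len(adjacency)
    # M = -beta * L: off-diagonal entries are beta * A, diagonal is -beta * degree.
    m = [[beta * adjacency[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        m[i][i] = -beta * sum(adjacency[i])
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    kernel = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        # term now holds M^k / k!
        term = [[value / k for value in row] for row in mat_mul(term, m)]
        kernel = [[kernel[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return kernel


# Hypothetical toy network: a path of four proteins, 0 - 1 - 2 - 3.
toy_network = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
toy_kernel = diffusion_kernel(toy_network)
for row in toy_kernel:
    print(" ".join(f"{value:.3f}" for value in row))
```

In this toy network the kernel value between two proteins shrinks as their distance along the path grows, which is the property that lets a kernel-based classifier weight nearby annotated proteins more heavily than distant ones.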