International Journal of Data Mining and Bioinformatics (INT J DATA MIN BIOIN)

Publisher: Inderscience

Journal description

Mining bioinformatics data is an emerging area at the intersection of bioinformatics and data mining. The objective of the IJDMB is to facilitate collaboration between data mining researchers and bioinformaticians by presenting cutting-edge research topics and methodologies in the area of data mining for bioinformatics. This perspective acknowledges the interdisciplinary nature of research in data mining and bioinformatics and provides a unified forum for researchers, practitioners, students and policy makers to share the latest research and developments in this fast-growing multidisciplinary research area.

Current impact factor: 0.50

Impact Factor Rankings

2015 Impact Factor Available summer 2016
2014 Impact Factor 0.495
2013 Impact Factor 0.655
2012 Impact Factor 0.393
2011 Impact Factor 0.429
2010 Impact Factor 0.681
2009 Impact Factor 0.933
2008 Impact Factor 0.667
2007 Impact Factor 0.636

Additional details

5-year impact: 0.72
Cited half-life: 4.30
Immediacy index: 0.08
Eigenfactor: 0.00
Article influence: 0.12
Website: International Journal of Data Mining and Bioinformatics website
Other titles: IJDMB, Data mining and bioinformatics
ISSN: 1748-5673
OCLC: 318200707
Material type: Document, Periodical, Internet resource
Document type: Internet Resource, Computer File, Journal / Magazine / Newspaper

Publisher details


  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author cannot archive a post-print version
  • Restrictions
    • 6 months embargo
  • Conditions
    • Cannot archive until publication
    • Author's pre-print and Author's post-print on author's personal website, institutional repository or subject repository
    • Publisher copyright and source must be acknowledged
    • Must link to journal webpage and/or DOI
    • Publisher's version/PDF cannot be used, unless covered by funding agency rules
    • Authors covered by funding agency rules may post the Publisher's Version/PDF in subject repositories after a 6 months embargo
    • Reviewed 10/02/2014
    • Author's post-print equates to Inderscience's Proof
  • Classification
    • yellow

Publications in this journal

  • ABSTRACT: The Cuckoo Search (CS) optimisation algorithm is used for feature selection in cancer classification using microarray gene expression data. Since gene expression data has thousands of genes and a small number of samples, feature selection methods can be used to select informative genes and improve the classification accuracy. Initially, the genes are ranked based on T-statistics, Signal-to-Noise Ratio (SNR) and F-statistics values. CS is then used to find the informative genes among the top-m ranked genes. The classification accuracy of the k-Nearest Neighbour (kNN) technique is used as the fitness function for CS. The proposed method is evaluated and analysed on ten different cancer gene expression datasets. The results show that CS gives 100% average accuracy for the DLBCL Harvard, Lung Michigan, Ovarian Cancer, AML-ALL and Lung Harvard2 datasets and that it outperforms the existing techniques on the DLBCL outcome and prostate datasets.
    International Journal of Data Mining and Bioinformatics 10/2015; 13(3):248 - 265. DOI:10.1504/IJDMB.2015.072092
  • ABSTRACT: Protein superfamily classification deals with the problem of predicting the family membership of a newly discovered amino acid sequence. Although previous researchers have already developed many trivial alignment methods, the present trend demands the application of computational intelligence techniques. As biological databases grow exponentially in size, retrieving and inferring essential knowledge in the biological domain becomes a very cumbersome task. This problem can be easily handled using intelligent techniques owing to their tolerance for imprecision, uncertainty, approximate reasoning and partial truth. This paper discusses the various global and local features extracted from full-length protein sequences that are used for the approximation and generalisation of the classifier. The various parameters used for evaluating the performance of the classifiers are also discussed. This review article can therefore point researchers towards improvements over the existing methods.
    International Journal of Data Mining and Bioinformatics 09/2015; 11(4):424. DOI:10.1504/IJDMB.2015.067957
  • ABSTRACT: As a new branch of data mining and knowledge discovery, biomedical text mining research is currently progressing rapidly. Biomedical named entity (BNE) recognition is a basic technique in biomedical knowledge discovery, and its performance has direct effects on further discovery and processing in biomedical texts. In this paper, we present an improved method based on a co-decision matrix framework for Biomedical Named Entity Recognition (BNER). The relationship between classifiers is exploited by using a co-decision matrix to exchange decision information among classifiers. Experiments are carried out on the GENIA corpus, with a best result of 75.9% F-score. Experimental results show that the proposed method, the co-decision matrix framework, can yield promising performance.
    International Journal of Data Mining and Bioinformatics 09/2015; 11(4):412. DOI:10.1504/IJDMB.2015.067956
  • ABSTRACT: In order to overcome the limitations of global modularity and the deficiency of local modularity, we propose a hybrid modularity measure, Local-Global Quantification (LGQ), which considers global modularity and local modularity together. LGQ adopts a suitable adjustable module feature parameter to control the balance between global detecting capability and local search capability in Protein-Protein Interaction (PPI) networks. Furthermore, we develop a new protein complex mining algorithm called Best Neighbour and Local-Global Quantification (BN-LGQ), which integrates the best neighbour node and modularity increment. BN-LGQ expands the protein complex by quickly searching for the best neighbour node of the current cluster and by calculating the modularity increment as a metric to determine whether the best neighbour node can join the current cluster. The experimental results show that BN-LGQ achieves better accuracy in predicting protein complexes and matches the reference protein complexes more closely than the MCL and MCODE algorithms. Moreover, BN-LGQ can effectively discover protein complexes with better biological significance in the PPI network.
    International Journal of Data Mining and Bioinformatics 09/2015; 11(4):458. DOI:10.1504/IJDMB.2015.067973
  • ABSTRACT: Early classification of time series has been receiving a lot of attention recently. In this paper we present a model, which we call the Early Classification Model (ECM), that allows for early, accurate and patient-specific classification of multivariate observations. ECM integrates the widely used Hidden Markov Model (HMM) and Support Vector Machine (SVM) models. It attained very promising results on the datasets we tested it on: in one set of experiments based on a published dataset of response to drug therapy in Multiple Sclerosis patients, ECM used only an average of 40% of a time series and was able to outperform some of the baseline models, which needed the full time series for classification. In the set of experiments on a sepsis therapy dataset, ECM was able to surpass the standard threshold-based method and the state-of-the-art method for early classification of multivariate time series.
    International Journal of Data Mining and Bioinformatics 09/2015; 11(4):392. DOI:10.1504/IJDMB.2015.067955
  • ABSTRACT: Named Entity Recognition and Classification (NERC) is an important task in information extraction in the biomedical domain. Biomedical Named Entities include mentions of proteins, genes, DNA, RNA, etc., which, in general, have complex structures and are difficult to recognise. In this paper, we propose a Single Objective Optimisation based classifier ensemble technique using the search capability of a Genetic Algorithm (GA) for NERC in biomedical texts. Here, GA is used to quantify the amount of voting for each class in each classifier. We use diverse classification methods such as Conditional Random Field and Support Vector Machine to build a number of models depending upon the various representations of the set of features and/or feature templates. The proposed technique is evaluated with two benchmark datasets, namely JNLPBA 2004 and GENETAG. Experiments yield overall F-measure values of 75.97% and 95.90%, respectively. Comparisons with the existing systems show that our proposed system achieves state-of-the-art performance.
    International Journal of Data Mining and Bioinformatics 09/2015; 11(4):365. DOI:10.1504/IJDMB.2015.067954
  • ABSTRACT: The main goal of this work is to produce machine learning models that predict the outcome of a mammography from a reduced set of annotated mammography findings. In the study we used a dataset consisting of 348 consecutive breast masses that underwent image-guided core biopsy performed between October 2005 and December 2007 on 328 female subjects. We applied various algorithms with parameter variation to learn from the data. The tasks were to predict mass density and to predict malignancy. The best classifier that predicts mass density is based on a support vector machine and has an accuracy of 81.3%. The expert correctly annotated 70% of the mass densities. The best classifier that predicts malignancy is also based on a support vector machine and has an accuracy of 85.6%, with a positive predictive value of 85%. One important contribution of this work is that our model can predict malignancy in the absence of the mass density attribute, since we can fill in this attribute using our mass density predictor.
    International Journal of Data Mining and Bioinformatics 09/2015; DOI:10.1504/IJDMB.2015.067319
  • ABSTRACT: Mining functional modules in Protein-Protein Interaction (PPI) networks is an important research task for revealing the structure-functionality relationships in biological processes. More recently, some swarm intelligence algorithms have been successfully applied in the field. This paper presents a new nature-inspired approach, ACC-FMD, which is based on ant colony clustering to detect functional modules. First, proteins with higher clustering coefficients are selected as ant seed nodes. Then, picking and dropping operations based on ant probabilistic models are developed and employed to assign proteins to the corresponding clusters represented by the seeds. Finally, the best clustering result in each generation is used to perform the information transmission by updating the similarity function. Experimental results on some benchmark datasets show that ACC-FMD outperforms the CFinder and MCODE algorithms and has comparable performance to the MINE, COACH, DPClus and Core algorithms in terms of the general evaluation metrics.
    International Journal of Data Mining and Bioinformatics 09/2015; 11(3):331. DOI:10.1504/IJDMB.2015.067323
  • ABSTRACT: In this paper we present a systems biology approach to the understanding of the miRNA-regulatory network in Colon Rectal Cancer (CRC). An initial set of significant genes in CRC was obtained by mining relevant literature. An initial set of cancer-related miRNAs was obtained from three databases (miRBase, miRWalk and Targetscan) and a GEO microarray experiment. First-principles methods were then used to generate the global miRNA-gene network. Significant miRNAs and associated transcription factors in the global miRNA-gene network were identified using topological and sub-graph analyses. Eleven novel miRNAs were identified, and three of them, hsa-miR-630, hsa-miR-100 and hsa-miR-99a, were further analysed to elucidate their role in CRC. The proposed methodology effectively made use of literature data and was able to show novel, significant miRNA-transcription associations in CRC.
    International Journal of Data Mining and Bioinformatics 08/2015; 11(1):1. DOI:10.1504/IJDMB.2015.066332
  • ABSTRACT: This paper describes a meta-learner inference system development framework which is applied and tested in the implementation of bioinformatic inference systems. These inference systems are used for the systematic classification of the best candidates for inclusion in bacterial metabolic pathway maps. This meta-learner-based approach utilises a workflow in which the user provides feedback with final classification decisions, which are stored in conjunction with analysed genetic sequences for periodic inference system training. The inference systems were trained and tested with three different data sets related to the bacterial degradation of aromatic compounds. The analysis of the meta-learner-based framework involved contrasting several different optimisation methods with various parameters. The obtained inference systems were also contrasted with other standard classification methods, with accurate prediction capabilities observed.
    International Journal of Data Mining and Bioinformatics 08/2015; 11(2):139. DOI:10.1504/IJDMB.2015.066775
  • ABSTRACT: Subgraphs that occur in complex networks with significantly higher frequency than those in randomised networks are called network motifs. Such subgraphs often play important roles in the functioning of those networks. Finding network motifs is a computationally challenging problem. The main difficulties arise from the fact that real networks are large and the size of the search space grows exponentially with increasing network and motif size. Numerous methods have been developed to overcome these challenges. This paper provides a comparative study of the key network motif discovery algorithms in the literature and presents their algorithmic details on an example network.
    International Journal of Data Mining and Bioinformatics 08/2015; 11(2):180. DOI:10.1504/IJDMB.2015.066777
  • ABSTRACT: The Resource Description Framework (RDF) is widely used for sharing biomedical data, such as gene ontology or the online protein database UniProt. SPARQL is a native query language for RDF, featuring regular expressions in queries for which exact values are either irrelevant or unknown. The use of regular expression indexes in SPARQL query processing improves the performance of queries containing regular expressions by up to two orders of magnitude. In this study, we address the update operation for regular expression indexes in RDF databases. We identify major performance problems of straightforward index update algorithms and propose a new algorithm that utilises unique properties of regular expression indexes to increase performance. Our contributions can be summarised as follows: (1) we propose an efficient update algorithm for regular expression indexes in RDF databases, (2) we build a prototype system for the proposed algorithm in C++ and (3) we conduct extensive experiments demonstrating the improvement of our algorithm over the straightforward approaches by an order of magnitude.
    International Journal of Data Mining and Bioinformatics 08/2015; 11(2):205. DOI:10.1504/IJDMB.2015.066767
  • ABSTRACT: Predicting the class of gene expression profiles helps improve the diagnosis and treatment of diseases. Analysing huge gene expression data, otherwise known as microarray data, is complicated by its high dimensionality. Hence, traditional classifiers do not perform well where the number of features far exceeds the number of samples. A good set of features helps classifiers to classify the dataset efficiently. Moreover, a manageable set of features is also desirable for the biologist for further analysis. In this paper, we propose a linear regression-based feature selection method for selecting discriminative features. Our main focus is to classify the dataset more accurately using fewer features than other traditional feature selection methods. Our method has been compared with several other methods, and in almost every case the classification accuracy is higher, using fewer features, than that of the other popular feature selection methods.
    International Journal of Data Mining and Bioinformatics 08/2015; 11(2):167. DOI:10.1504/IJDMB.2015.066776
  • ABSTRACT: With the latest development of the Surface-Enhanced Raman Scattering (SERS) technique, quantitative analysis of Raman spectra has shown potential and a promising trend of development for in vivo molecular imaging. Partial Least Squares Regression (PLSR) is the state-of-the-art method, but it relies only on training samples, which makes it difficult to incorporate complex domain knowledge. Based on probabilistic Principal Component Analysis (PCA) and the probabilistic curve fitting idea, we propose a probabilistic PLSR (PPLSR) model and an Expectation Maximisation (EM) algorithm for estimating its parameters. This model explains PLSR from a probabilistic viewpoint, describes its essential meaning and provides a foundation for developing future Bayesian nonparametric models. Two real Raman spectra datasets were used to evaluate this model, and experimental results show its effectiveness.
    International Journal of Data Mining and Bioinformatics 08/2015; 11(2):223. DOI:10.1504/IJDMB.2015.066768
  • ABSTRACT: Systems toxicology, a branch of toxicology that studies drug effects at the level of biological systems, offers exciting opportunities to discover toxicity-related sub-networks using high-throughput technologies. This paper takes a computational approach to systems toxicology and investigates the use of automated signalling path detection for the discovery of potential biomarkers of drug-induced non-immune neutropenia. The algorithm utilises a gene expression change measure to mine a large protein interaction network and identify chemical-toxicity signalling paths. Cytoscape-based analysis of detected signalling paths with statistically significant path expression scores reveals 'hub' proteins and a smaller sub-network of path proteins. The importance of 'hub' and drug-toxicity signalling path proteins in haematological and apoptotic signal transduction networks is investigated in order to understand the value of the automated signalling path detection approach.
    International Journal of Data Mining and Bioinformatics 08/2015; 11(1):102-14. DOI:10.1504/IJDMB.2015.066339
  • ABSTRACT: This paper presents a grey structure activity relationship analysis for the anti-inflammation effect of phenolic acid phenethyl esters in human neutrophils. To study the anti-inflammation effect, 14 compounds of phenolic acid phenethyl esters are synthesised, and the inhibition of superoxide anion generation (which is linked to an inflammation effect) induced by PMA and fMLP stimulants is measured. Next, the relationship weighting of each functional group of phenolic acid phenethyl esters is found by applying grey system theory to the measured data. Moreover, evident structure activity relationships are established to regulate the anti-inflammation effect of such compounds, e.g. the most important functional group affecting anti-inflammation in human neutrophils is revealed. In addition, some extended results are obtained based on the grey analysis. Interestingly, the analysed result is consistent with the actual situation. In comparison with traditional methods, applying grey theory reveals more characteristic information about the structure activity relationships of phenolic acid phenethyl esters while requiring fewer data samples.
    International Journal of Data Mining and Bioinformatics 08/2015; 11(2):244. DOI:10.1504/IJDMB.2015.066769
  • ABSTRACT: Protein degradation is critical for most cellular processes, and investigating the degradation signals in the sequence and structure is beneficial for analysing protein stability. In this paper, we investigated in depth the intrinsic factors affecting protein degradation based on sequence and structure features. The results indicated that there are more hydrophobic residues on the surface of short-lived proteins than on that of long-lived proteins. Secondary structure such as coil tends to be on the surface of short-lived proteins. There are more serine phosphorylation sites on the surface of short-lived proteins, and short-lived proteins are more likely than long-lived proteins to start degradation via the PEST motif signal. We also found that almost all N-terminal residues are exposed on the surface; therefore, the specific features of the solvent-accessible surface residues are the key factors affecting intracellular protein stability.
    International Journal of Data Mining and Bioinformatics 08/2015; 11(1):84-101. DOI:10.1504/IJDMB.2015.066338