International Journal of Data Mining and Bioinformatics (INT J DATA MIN BIOIN)

Publisher: Inderscience

Journal description

Mining bioinformatics data is an emerging area at the intersection of bioinformatics and data mining. The objective of IJDMB is to facilitate collaboration between data mining researchers and bioinformaticians by presenting cutting-edge research topics and methodologies in the area of data mining for bioinformatics. This perspective acknowledges the interdisciplinary nature of research in data mining and bioinformatics and provides a unified forum for researchers, practitioners, students and policy makers to share the latest research and developments in this fast-growing multidisciplinary research area.

Current impact factor: 0.50

Impact Factor Rankings

2015 Impact Factor: Available summer 2016
2014 Impact Factor: 0.495
2013 Impact Factor: 0.655
2012 Impact Factor: 0.393
2011 Impact Factor: 0.429
2010 Impact Factor: 0.681
2009 Impact Factor: 0.933
2008 Impact Factor: 0.667
2007 Impact Factor: 0.636

[Figure: Impact factor over time]

Additional details

5-year impact: 0.72
Cited half-life: 4.30
Immediacy index: 0.08
Eigenfactor: 0.00
Article influence: 0.12
Website: International Journal of Data Mining and Bioinformatics website
Other titles: IJDMB, Data mining and bioinformatics
ISSN: 1748-5673
OCLC: 318200707
Material type: Document, Periodical, Internet resource
Document type: Internet Resource, Computer File, Journal / Magazine / Newspaper

Publisher details


  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author cannot archive a post-print version
  • Restrictions
    • 6-month embargo
  • Conditions
    • Cannot archive until publication
    • Author's pre-print and Author's post-print on author's personal website, institutional repository or subject repository
    • Publisher copyright and source must be acknowledged
    • Must link to journal webpage and/or DOI
    • Publisher's version/PDF cannot be used, unless covered by funding agency rules
    • Authors covered by funding agency rules may post the Publisher's Version/PDF in subject repositories after a 6-month embargo
    • Reviewed 10/02/2014
    • Author's post-print equates to Inderscience's Proof

Publications in this journal

  • ABSTRACT: Most computational methods for identifying essential proteins focus on topological centrality in protein-protein interaction (PPI) networks. However, these methods have limitations, such as difficulty identifying essential proteins with low centrality values and poor performance on incomplete PPI networks. In this paper, protein complexes are shown to be an important factor in determining protein essentiality, and a new centrality measure, complex centrality, is proposed. The weighted average of complex centrality and subgraph centrality, called harmonic centrality (HC), is proposed to predict essential proteins. It combines PPI network topology with protein complex information and performs better than methods based on the PPI network alone. The improvement is greater when the PPI network is incomplete. Furthermore, a weighted PPI network is generated by integrating cellular localisation and biological process information into a PPI network. The performance of the HC measure improves by 5% in this weighted PPI network.
    International Journal of Data Mining and Bioinformatics 10/2015; 12(1). DOI:10.1504/IJDMB.2015.068951
  • ABSTRACT: Analysing the structure of gene networks is an important way to understand the regulatory mechanisms of organisms at the molecular level. In this work, gene mutual information networks are constructed from gene expression profiles in prostate tissues with and without cancer. To contrast the structural differences between normal and diseased networks, curves of four structural parameters are given as functions of the threshold. Threshold discrimination intervals and discrimination weights are then defined. A method for finding structural key genes with significant degree differences is proposed. Finding these key genes will help biomedical scientists to further investigate the pathogenesis of prostate cancer. Finally, a randomisation test is performed to show that these structural parameters can distinguish normal from prostate cancer networks, by comparison with the results on real data.
    International Journal of Data Mining and Bioinformatics 10/2015; 12(1). DOI:10.1504/IJDMB.2015.068950
  • ABSTRACT: In this paper, the effect of SNPs on expression levels in Nimblegen RNA expression microarrays is investigated. A vast number of replicates of probe pairs representing both alleles of SNPs at 14 loci allows accurate estimation of the difference in signal intensities both within and between probe pairs. The majority of probe pairs with sufficiently high expression show significant differences in expression levels within the pair, and the difference is concordant with the genotype of the samples. With two or more replicates of each probe, the allele-to-allele variance dominates the error in estimating the difference within a probe pair; ten replicates are needed for adequate power to call a true difference within a single probe pair. Using the expression level of the probe with the higher value within the probe pair gives more accurate estimates. When using probes at loci containing known SNPs, one should use probes containing both alleles of the SNP.
    International Journal of Data Mining and Bioinformatics 10/2015; 12(1). DOI:10.1504/IJDMB.2015.068949
  • ABSTRACT: With advances in metabolic engineering technologies, the genome of a host organism can be reconstructed to achieve desired phenotypes. However, due to the complexity and size of genome-scale metabolic networks, significant components tend to be invisible. We propose a two-step approach to improve metabolite production. First, we find the essential genes and identify the minimal genome by a single gene deletion process using Flux Balance Analysis (FBA). Second, we identify the significant pathway for metabolite production using gene expression data. A genome-scale model of Saccharomyces cerevisiae for the production of vanillin and acetate is used to test this approach. The results show that the approach reliably finds essential genes, reduces genome size and identifies production pathways that can further optimise the production yield. The identified genes and pathways can be extended to other applications, especially strain optimisation.
    International Journal of Data Mining and Bioinformatics 10/2015; 12(1):85. DOI:10.1504/IJDMB.2015.068955
  • ABSTRACT: Class imbalance learning has recently drawn considerable attention among researchers. In this area, a rare class is the class of primary interest for classification. Unfortunately, traditional machine learning algorithms fail to detect this class because a huge majority class overwhelms a tiny minority class. In this paper, we propose a new technique called CORE to handle the class imbalance problem. The objective of CORE is to strengthen the core of a minority class and reduce the risk of misclassifying minority instances near the borderline of a majority class. These core and borderline regions are defined by the applicability of a safe level. As a result, the minority class becomes more crowded and dominant. The experiments show that CORE can significantly improve the predictive performance on a minority class when the dataset is imbalanced.
    International Journal of Data Mining and Bioinformatics 10/2015; 12(1). DOI:10.1504/IJDMB.2015.068952
  • ABSTRACT: To identify the more effective software for pathway and network analysis of Kashin-Beck disease, gene microarrays, TranscriptomeBrowser, MetaCore and GeneMANIA were used for analysis. Three significant chondrocytic pathways and one network were identified by TranscriptomeBrowser; one significant pathway and one network were identified by MetaCore. BAX, APAF1, CASP6, BCL2, VEGF, SOCS3, BAK, TGFBI, TNFAIP6, TNFRSF11B and THBS1 were significant genes associated with the biological function of chondrocytes or cartilage in the TranscriptomeBrowser or MetaCore results. The interactions between the significant genes and their adjacent genes were searched and classified in GeneMANIA. In the pathway analysis results, TranscriptomeBrowser is superior to MetaCore at capturing pathway and co-expression interactions, whereas MetaCore is superior to TranscriptomeBrowser at capturing physical interactions. In the network analysis results, TranscriptomeBrowser contains more co-localisation interaction information, whereas MetaCore contains more co-expression interaction information.
    International Journal of Data Mining and Bioinformatics 10/2015; 12(1):100. DOI:10.1504/IJDMB.2015.068963
  • ABSTRACT: The position-specific scoring matrix (PSSM) has been widely used for identifying protein functional sites. However, it is 20-dimensional and contains many redundant features. The Kidera factors were reported to contain information relating to almost all physical properties of amino acids, but they require appropriate weighting coefficients to express these properties. We developed a novel method, named KSPSSMpred, which integrates the PSSM and the Kidera factors into a 10-dimensional matrix (KSPSSM) for ligand-binding site prediction. Flavin adenine dinucleotide (FAD) was chosen as a representative ligand for this study. When compared with five other feature-based methods on a benchmark dataset, KSPSSMpred performed the best. This study demonstrates that KSPSSM is an effective feature extraction method which can enrich the PSSM with information relating to 188 physical properties of residues and reduce the feature dimensions by 50% without losing the information included in the PSSM.
    International Journal of Data Mining and Bioinformatics 10/2015; 12(1). DOI:10.1504/IJDMB.2015.068954
  • ABSTRACT: The Cuckoo Search (CS) optimisation algorithm is used for feature selection in cancer classification using microarray gene expression data. Since gene expression data has thousands of genes and a small number of samples, feature selection methods can be used to select informative genes and improve the classification accuracy. Initially, the genes are ranked based on T-statistics, Signal-to-Noise Ratio (SNR) and F-statistics values. CS is then used to find the informative genes among the top-m ranked genes. The classification accuracy of the k-Nearest Neighbour (kNN) technique is used as the fitness function for CS. The proposed method is tested and analysed on ten different cancer gene expression datasets. The results show that CS gives 100% average accuracy for the DLBCL Harvard, Lung Michigan, Ovarian Cancer, AML-ALL and Lung Harvard2 datasets and outperforms existing techniques on the DLBCL outcome and prostate datasets.
    International Journal of Data Mining and Bioinformatics 10/2015; 13(3):248 - 265. DOI:10.1504/IJDMB.2015.072092
  • ABSTRACT: To overcome the limitations of global modularity and the deficiency of local modularity, we propose a hybrid modularity measure, Local-Global Quantification (LGQ), which considers global and local modularity together. LGQ adopts a suitable adjustable module feature parameter to control the balance between global detection capability and local search capability in Protein-Protein Interaction (PPI) networks. Furthermore, we develop a new protein complex mining algorithm called Best Neighbour and Local-Global Quantification (BN-LGQ), which integrates the best neighbour node and the modularity increment. BN-LGQ expands a protein complex by quickly searching for the best neighbour node of the current cluster and by using the modularity increment as a metric to determine whether that node can join the current cluster. The experimental results show that BN-LGQ achieves better accuracy in predicting protein complexes and a higher match with the reference protein complexes than the MCL and MCODE algorithms. Moreover, BN-LGQ can effectively discover protein complexes with better biological significance in the PPI network.
    International Journal of Data Mining and Bioinformatics 09/2015; 11(4):458. DOI:10.1504/IJDMB.2015.067973
  • ABSTRACT: As a new branch of data mining and knowledge discovery, biomedical text mining is currently progressing rapidly. Biomedical named entity (BNE) recognition is a basic technique in biomedical knowledge discovery, and its performance directly affects further discovery and processing in biomedical texts. In this paper, we present an improved method based on a co-decision matrix framework for Biomedical Named Entity Recognition (BNER). The relativity between classifiers is exploited by using a co-decision matrix to exchange decision information among classifiers. The experiments are carried out on the GENIA corpus, with a best result of 75.9% F-score. Experimental results show that the proposed co-decision matrix framework can yield promising performance.
    International Journal of Data Mining and Bioinformatics 09/2015; 11(4):412. DOI:10.1504/IJDMB.2015.067956
  • ABSTRACT: Protein superfamily classification deals with the problem of predicting the family membership of newly discovered amino acid sequences. Although many trivial alignment methods have already been developed by previous researchers, the present trend demands the application of computational intelligence techniques. With the exponential growth in the size of biological databases, retrieving and inferring essential knowledge in the biological domain has become a very cumbersome task. This problem can be easily handled using intelligent techniques because of their tolerance for imprecision, uncertainty, approximate reasoning and partial truth. This paper discusses the various global and local features extracted from full-length protein sequences which are used for the approximation and generalisation of the classifier. The various parameters used for evaluating the performance of the classifiers are also discussed. This review article can therefore point researchers in the right direction for improving on existing methods.
    International Journal of Data Mining and Bioinformatics 09/2015; 11(4):424. DOI:10.1504/IJDMB.2015.067957
  • ABSTRACT: Named Entity Recognition and Classification (NERC) is an important task in information extraction for the biomedical domain. Biomedical named entities include mentions of proteins, genes, DNA, RNA, etc., which in general have complex structures and are difficult to recognise. In this paper, we propose a Single Objective Optimisation based classifier ensemble technique using the search capability of a Genetic Algorithm (GA) for NERC in biomedical texts. Here, the GA is used to quantify the amount of voting for each class in each classifier. We use diverse classification methods such as Conditional Random Field and Support Vector Machine to build a number of models depending on the various representations of the set of features and/or feature templates. The proposed technique is evaluated on two benchmark datasets, namely JNLPBA 2004 and GENETAG. Experiments yield overall F-measure values of 75.97% and 95.90%, respectively. Comparisons with existing systems show that our proposed system achieves state-of-the-art performance.
    International Journal of Data Mining and Bioinformatics 09/2015; 11(4):365. DOI:10.1504/IJDMB.2015.067954
  • ABSTRACT: Mining functional modules in Protein-Protein Interaction (PPI) networks is an important research topic for revealing structure-functionality relationships in biological processes. More recently, some swarm intelligence algorithms have been successfully applied in the field. This paper presents a new nature-inspired approach, ACC-FMD, which uses ant colony clustering to detect functional modules. First, proteins with higher clustering coefficients are selected as ant seed nodes. Then, picking and dropping operations based on ant probabilistic models are developed and employed to assign proteins to the corresponding clusters represented by the seeds. Finally, the best clustering result in each generation is used to perform information transmission by updating the similarity function. Experimental results on some benchmark datasets show that ACC-FMD outperforms the CFinder and MCODE algorithms and has comparable performance to the MINE, COACH, DPClus and Core algorithms in terms of the general evaluation metrics.
    International Journal of Data Mining and Bioinformatics 09/2015; 11(3):331. DOI:10.1504/IJDMB.2015.067323
  • ABSTRACT: This paper describes a development framework for meta-learner inference systems, which is applied and tested in the implementation of bioinformatic inference systems. These inference systems are used for the systematic classification of the best candidates for inclusion in bacterial metabolic pathway maps. This meta-learner-based approach uses a workflow in which the user provides feedback with final classification decisions, which are stored together with the analysed genetic sequences for periodic inference system training. The inference systems were trained and tested with three different data sets related to the bacterial degradation of aromatic compounds. The analysis of the meta-learner-based framework involved contrasting several optimisation methods with various parameters. The resulting inference systems were also compared with other standard classification methods, and accurate prediction capabilities were observed.
    International Journal of Data Mining and Bioinformatics 08/2015; 11(2):139. DOI:10.1504/IJDMB.2015.066775
  • ABSTRACT: In this paper we present a systems biology approach to understanding the miRNA-regulatory network in colon rectal cancer (CRC). An initial set of significant genes in CRC was obtained by mining relevant literature. An initial set of cancer-related miRNAs was obtained from three databases (miRBase, miRWalk and Targetscan) and a GEO microarray experiment. First-principles methods were then used to generate the global miRNA-gene network. Significant miRNAs and associated transcription factors in the global miRNA-gene network were identified using topological and sub-graph analyses. Eleven novel miRNAs were identified, and three of them, hsa-miR-630, hsa-miR-100 and hsa-miR-99a, were further analysed to elucidate their role in CRC. The proposed methodology made effective use of literature data and was able to show novel, significant miRNA-transcription associations in CRC.
    International Journal of Data Mining and Bioinformatics 08/2015; 11(1):1. DOI:10.1504/IJDMB.2015.066332
  • ABSTRACT: Subgraphs that occur in complex networks with significantly higher frequency than those in randomised networks are called network motifs. Such subgraphs often play important roles in the functioning of those networks. Finding network motifs is a computationally challenging problem. The main difficulties arise from the fact that real networks are large and the size of the search space grows exponentially with increasing network and motif size. Numerous methods have been developed to overcome these challenges. This paper provides a comparative study of the key network motif discovery algorithms in the literature and presents their algorithmic details on an example network.
    International Journal of Data Mining and Bioinformatics 08/2015; 11(2):180. DOI:10.1504/IJDMB.2015.066777
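
The network motif definition quoted in the last abstract above (a subgraph that occurs significantly more often in a real network than in randomised networks) can be illustrated with a minimal Python sketch. It is not taken from any of the listed papers. It assumes the simplest possible motif (a triangle in an undirected network), a null model built from degree-preserving double edge swaps, and a z-score as the significance measure. The function names `count_triangles`, `degree_preserving_shuffle` and `triangle_z_score`, and the toy network, are hypothetical and chosen only for illustration.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def count_triangles(edges):
    """Count triangles in an undirected simple graph given as unique (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is found once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Randomise a graph by double edge swaps: (a, b), (c, d) -> (a, d), (c, b)."""
    edges = list(edges)
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        # Skip swaps that would create a self-loop or a duplicate edge.
        if len({a, b, c, d}) < 4 or new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def triangle_z_score(edges, n_random=100, seed=0):
    """Compare the observed triangle count with counts in randomised networks."""
    rng = random.Random(seed)
    observed = count_triangles(edges)
    null = [
        count_triangles(degree_preserving_shuffle(edges, 10 * len(edges), rng))
        for _ in range(n_random)
    ]
    mu, sigma = mean(null), pstdev(null)
    # The z-score is undefined when every randomised network gives the same count.
    z = (observed - mu) / sigma if sigma > 0 else None
    return observed, mu, z


if __name__ == "__main__":
    # Toy network (an assumption): two 4-cliques joined by a single bridge edge.
    toy = (
        list(combinations(range(4), 2))
        + list(combinations(range(4, 8), 2))
        + [(3, 4)]
    )
    print(triangle_z_score(toy))
```

Enumerating one fixed three-node pattern keeps the sketch short. The difficulty the abstract describes arises because the number of candidate subgraphs grows exponentially with motif size, which is what dedicated motif discovery algorithms are designed to handle.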