Bioinformatics (BIOINFORMATICS)

Publisher: Oxford University Press (OUP)

Journal description

The journal aims to publish high-quality peer-reviewed original scientific papers and excellent review articles in the fields of computational molecular biology, biological databases, and genome bioinformatics.

Current impact factor: 4.98

Impact Factor Rankings

2015 Impact Factor Available summer 2016
2014 Impact Factor 4.981
2013 Impact Factor 4.621
2012 Impact Factor 5.323
2011 Impact Factor 5.468
2010 Impact Factor 4.877
2009 Impact Factor 4.926
2008 Impact Factor 4.328
2007 Impact Factor 5.039
2006 Impact Factor 4.894
2005 Impact Factor 6.019
2004 Impact Factor 5.742
2003 Impact Factor 6.701
2002 Impact Factor 4.615
2001 Impact Factor 3.421
2000 Impact Factor 3.409
1999 Impact Factor 2.259

Additional details

5-year impact 8.14
Cited half-life 6.90
Immediacy index 1.17
Eigenfactor 0.20
Article influence 3.57
Other titles Bioinformatics (Oxford, England: Online)
ISSN 1367-4811
OCLC 39184474
Material type Document, Periodical, Internet resource
Document type Internet Resource, Computer File, Journal / Magazine / Newspaper

Publisher details

Oxford University Press (OUP)

  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author cannot archive a post-print version
  • Restrictions
    • 12 months embargo
  • Conditions
    • Pre-print can only be posted prior to acceptance
    • Pre-print must be accompanied by set statement (see link)
    • Pre-print must not be replaced with post-print, instead a link to published version with amended set statement should be made
    • Pre-print on author's personal website, employer website, free public server or pre-prints in subject area
    • Post-print in Institutional repositories or Central repositories
    • Publisher's version/PDF cannot be used
    • Published source must be acknowledged
    • Must link to publisher version
    • Set phrase to accompany archived copy (see policy)
    • Eligible authors may deposit in OpenDepot
    • The publisher will deposit in PubMed Central on behalf of NIH authors
    • Publisher last contacted on 19/02/2015
    • This policy is an exception to the default policies of 'Oxford University Press (OUP)'

Publications in this journal

  • ABSTRACT: Availability: The genefu package is available from Bioconductor. Source code is also available on GitHub. Contact: SUPPLEMENTARY INFORMATION: available at Bioinformatics online.
    Bioinformatics 11/2015; DOI:10.1093/bioinformatics/btv693
  • ABSTRACT: Availability: RNA-Enrich is available at or from supplemental material as R code.
    Bioinformatics 11/2015; DOI:10.1093/bioinformatics/btv694
  • ABSTRACT: Motivation: High-throughput sequencing technologies provide access to an increasing number of bacterial genomes. Today, many analyses involve the comparison of biological properties among many strains of a given species, or among species of a particular genus. Tools that can help the microbiologist with these tasks become increasingly important. Results: Insyght is a comparative visualization tool whose core features combine a synchronized navigation across genomic data of multiple organisms with a versatile interoperability between complementary views. In this work, we have greatly increased the scope of the Insyght public dataset by including 2688 complete bacterial genomes available in Ensembl, thus vastly improving its phylogenetic coverage. We also report the development of a virtual machine that allows users to easily set up and customize their own local Insyght server. Availability: CONTACT:
    Bioinformatics 11/2015; DOI:10.1093/bioinformatics/btv689
  • ABSTRACT: Motivation: A particular challenge of the current omics age is to make sense of the inferred differential expression of genes and proteins. The most common approach is to perform a gene ontology (GO) enrichment analysis, thereby relying on a database that has been extracted from a variety of organisms and that can therefore only yield reliable information on evolutionarily conserved functions. Results: We here present a web-based application for a taxon-specific gene set exploration and enrichment analysis, which is expected to yield novel functional insights into newly determined gene sets. The approach is based on the complete collection of curated high-throughput gene expression data sets for the model nematode Caenorhabditis elegans, including 1786 gene sets from more than 350 studies. Availability and implementation: WormExp is available at Contact: or SUPPLEMENTARY INFORMATION: available at Bioinformatics online.
    Bioinformatics 11/2015; DOI:10.1093/bioinformatics/btv667
  • ABSTRACT: Motivation: Biological systems are complex and challenging to model and therefore model reuse is highly desirable. To promote model reuse, models should include both information about the specifics of simulations and the underlying biology in the form of metadata. The availability of computationally-tractable metadata is especially important for the effective automated interpretation and processing of models. Metadata are typically represented as machine-readable annotations which enhance programmatic access to information about models. Rule-based languages have emerged as a modelling framework to represent the complexity of biological systems. Annotation approaches have been widely used for reaction-based formalisms such as SBML. However, rule-based languages still lack a rich annotation framework to add semantic information, such as machine-readable descriptions, to the components of a model. Results: We present an annotation framework and guidelines for annotating rule-based models, encoded in the commonly used Kappa and BioNetGen languages. We adapt widely adopted annotation approaches to rule-based models. We initially propose a syntax to store machine-readable annotations and describe a mapping between rule-based modelling entities, such as agents and rules, and their annotations. We then describe an ontology to both annotate these models and capture the information contained therein, and demonstrate annotating these models using examples. Finally, we present a proof of concept tool for extracting annotations from a model that can be queried and analyzed in a uniform way. The uniform representation of the annotations can be used to facilitate the creation, analysis, reuse and visualisation of rule-based models. Although examples are given using specific implementations, the proposed techniques can be applied to rule-based models in general. Availability and implementation: The annotation ontology for rule-based models can be found at The krdf tool and associated executable examples are available at Contact:
    Bioinformatics 11/2015; DOI:10.1093/bioinformatics/btv660
  • ABSTRACT: Motivation: DNA methylation is a key epigenetic modification that can modulate gene expression. Over the past decade, many studies have focused on profiling DNA methylation and investigating its alterations in complex diseases such as cancer. While early studies were mostly restricted to CpG islands or promoter regions, recent findings indicate that many important DNA methylation changes can occur in other regions and DNA methylation needs to be examined on a genome-wide scale. In this article, we apply the wavelet-based functional mixed model methodology to analyze the high-throughput methylation data for identifying differentially methylated loci across the genome. Contrary to many commonly-used methods that model probes independently, this framework accommodates spatial correlations across the genome through basis function modeling as well as correlations between samples through functional random effects, which allows it to be applied to many different settings and potentially leads to more power in detection of differential methylation. Results: We applied this framework to three different high-dimensional methylation data sets (CpG Shore data, THREE data, and NIH Roadmap Epigenomics data), studied previously in other works. A simulation study based on CpG Shore data suggested that in terms of detection of differentially methylated loci, this modeling approach using wavelets outperforms analogous approaches modeling the loci as independent. For the THREE data, the method suggests newly detected regions of differential methylation, which were not reported in the original study. Availability: Automated software called WFMM is available at CpG Shore data is available at NIH Roadmap Epigenomics data is available at Contact:
    Bioinformatics 11/2015; DOI:10.1093/bioinformatics/btv659
  • ABSTRACT: Motivation: Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a workflow with a minimal set of tools that would reach state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene finder that usually requires supervised training and uses information from RNA-Seq reads in the prediction step. Complementary strengths of GeneMark-ET and AUGUSTUS provided motivation for designing a new combined tool for automatic gene prediction. Results: We present BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS. As input, BRAKER1 requires a genome assembly file and a file in bam-format with spliced alignments of RNA-Seq reads to the genome. First, GeneMark-ET performs iterative training and generates initial gene structures. Second, AUGUSTUS uses predicted genes for training and then integrates RNA-Seq read information into final gene predictions. In our experiments, we observed that BRAKER1 was more accurate than MAKER2 when using RNA-Seq as the sole source for training and prediction. BRAKER1 does not require pre-trained parameters or a separate expert-prepared training step. Availability: BRAKER1 is available for download at and Contact:
    Bioinformatics 11/2015; DOI:10.1093/bioinformatics/btv661
  • ABSTRACT: Motivation: The recently released Oxford Nanopore MinION sequencing platform presents many innovative features opening up potential for a range of applications not previously possible. Among these features, the ability to sequence in real-time provides a unique opportunity for many time-critical applications. While many software packages have been developed to analyse its data, there is still a lack of toolkits that support the streaming and real-time analysis of MinION sequencing data. Results: We developed npReader, an open-source software package to facilitate real-time analysis of MinION sequencing data. npReader can simultaneously extract sequence reads and stream them to downstream analysis pipelines while the samples are being sequenced on the MinION device. It provides a command line interface for easy integration into a bioinformatics workflow, as well as a graphical user interface which concurrently displays the statistics of the run. It also provides an application programming interface for development of streaming algorithms in order to fully utilize the extent of nanopore sequencing potential. Availability and implementation: npReader is written in Java and is freely available at Contact: Minh Duc Cao and Lachlan J. M. Coin
    Bioinformatics 11/2015; DOI:10.1093/bioinformatics/btv658
  • ABSTRACT: Motivation: Accurate detection of differentially expressed genes between tumor and normal samples is a primary approach of cancer-related biomarker identification. Due to the infiltration of tumor-surrounding normal cells, the expression data derived from tumor samples would always be contaminated with normal cells. Ignoring such cellular contamination would deflate the power of detecting DE genes and further confound the biological interpretation of the analysis results. For the time being, there does not exist any differential expression analysis approach for RNA-seq data in the literature that can properly account for the contamination of tumor samples. Results: Without appealing to any extra information, we develop a new method 'contamDE' based on a novel statistical model that associates RNA-seq expression levels with cell types. It is demonstrated through simulation studies that contamDE could be much more powerful than the existing methods that ignore the contamination. In the application to two cancer studies, contamDE uniquely found several potential therapy and prognostic biomarkers of prostate cancer and non-small cell lung cancer. Availability and implementation: An R package contamDE is freely available at Contact: SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.
    Bioinformatics 11/2015; DOI:10.1093/bioinformatics/btv657
  • ABSTRACT: Motivation: In the Wright-Fisher diffusion, the transition density function (TDF) describes the time-evolution of the population-wide frequency of an allele. This function has several practical applications in population genetics, and computing it for biologically realistic scenarios with selection and demography is an important problem. Results: We develop an efficient method for finding a spectral representation of the TDF for a general model where the effective population size, selection coefficients, and mutation parameters vary over time in a piecewise constant manner. Availability: The method, called spectralTDF, is available at Contact:
    Bioinformatics 11/2015; DOI:10.1093/bioinformatics/btv627
  • ABSTRACT: Motivation: Predictive tools that model protein-ligand binding on demand are needed to promote ligand research in an innovative drug-design environment. However, it takes considerable time and effort to develop predictive tools that can be applied to individual ligands. An automated production pipeline that can rapidly and efficiently develop user-friendly protein-ligand binding predictive tools would be useful. Results: We developed a system for automatically generating protein-ligand binding predictions. Implementation of this system in a pipeline of Semantic Web technique-based web tools will allow users to specify a ligand and receive the tool within 0.5-1 day. We demonstrated high prediction accuracy for three machine learning algorithms and eight ligands. Availability: The source code and web application are freely available for download at They are implemented in Python and supported on Linux. Contact: SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Bioinformatics 11/2015; DOI:10.1093/bioinformatics/btv593
  • ABSTRACT: Motivation: The de novo identification of the initiation and termination zones (regions that replicate earlier or later than their upstream and downstream neighbours, respectively) remains a key challenge in DNA replication. Results: Building on advances in deep learning, we developed a novel hybrid architecture combining a pre-trained, deep neural network and a hidden Markov model (DNN-HMM) for the de novo identification of replication domains using replication timing profiles. Our results demonstrate that DNN-HMM can significantly outperform strong, discriminatively trained Gaussian mixture model-hidden Markov model (GMM-HMM) systems and six other reported methods that can be applied to this challenge. We applied our trained DNN-HMM to identify distinct replication domain types, namely the early replication domain (ERD), the down transition zone (DTZ), the late replication domain (LRD), and the up transition zone (UTZ), using newly replicated DNA sequencing (Repli-Seq) data across 15 human cells. A subsequent integrative analysis revealed that these replication domains harbour unique genomic and epigenetic patterns, transcriptional activity, and higher-order chromosomal structure. Our findings support the 'replication-domain' model, which states (1) that ERDs and LRDs, connected by UTZs and DTZs, are spatially compartmentalized structural and functional units of higher-order chromosomal structure, (2) that the adjacent DTZ-UTZ pairs form chromatin loops, and (3) that intra-interactions within ERDs and LRDs tend to be short-range and long-range, respectively. Our model reveals an important chromatin organizational principle of the human genome and represents a critical step toward understanding the mechanisms regulating replication timing. Availability: Our DNN-HMM method and three additional algorithms can be freely accessed at The replication domain regions identified in this study are available in GEO under the accession ID GSE53984. Contact: and SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Bioinformatics 11/2015; DOI:10.1093/bioinformatics/btv643
  • ABSTRACT: Motivation: Advances in high-throughput technologies have led to the acquisition of various types of -omic data on the same biological samples. Each data type gives independent and complementary information that can explain the biological mechanisms of interest. While several studies performing independent analyses of each dataset have led to significant results, a better understanding of complex biological mechanisms requires an integrative analysis of different sources of data. Results: Flexible modeling approaches, based on penalized likelihood methods and expectation-maximization (EM) algorithms, are studied and tested under various biological relationship scenarios between the different molecular features and their effects on a clinical outcome. The models are applied to genomic datasets from two cancer types in the Cancer Genome Atlas project: glioblastoma multiforme and ovarian serous cystadenocarcinoma. The integrative models lead to improved model fit and predictive performance. They also provide a better understanding of the biological mechanisms underlying patients' survival. Availability: Source code implementing the integrative models is freely available at along with example datasets and a sample R script applying the models to these data. The TCGA datasets used for analysis are publicly available at Contact: or SUPPLEMENTARY MATERIAL: Details of the EM algorithm are provided in the online Appendix.
    Bioinformatics 11/2015; DOI:10.1093/bioinformatics/btv653
  • ABSTRACT: Availability and implementation: The source code for TESS is freely available at Contact:
    Bioinformatics 11/2015; DOI:10.1093/bioinformatics/btv651
  • ABSTRACT: Motivation: Quantitative shape analysis is required by a wide range of biological studies across diverse scales, ranging from molecules to cells and organisms. In particular, high-throughput and systems-level studies of biological structures and functions have started to produce large volumes of complex high-dimensional shape data. Analysis and understanding of high-dimensional biological shape data require dimension reduction techniques. Results: We have developed a technique for nonlinear dimension reduction of 2D and 3D biological shape representations on their Riemannian spaces. A key feature of this technique is that it preserves distances between different shapes in an embedded low dimensional shape space. We demonstrate an application of this technique by combining it with nonlinear mean-shift clustering on the Riemannian spaces for unsupervised clustering of shapes of cellular organelles and proteins. Availability and implementation: Source code and data for reproducing results of this paper are freely available at The implementation was made in MATLAB and supported on MS Windows, Linux, and Mac OS. Contact:
    Bioinformatics 11/2015; DOI:10.1093/bioinformatics/btv648
  • ABSTRACT: Motivation: Advances in chromosome conformation capture and next-generation sequencing technologies are enabling genome-wide investigation of dynamic chromatin interactions. For example, Hi-C experiments generate genome-wide contact frequencies between pairs of loci by sequencing DNA segments ligated from loci in close spatial proximity. One essential task in such studies is peak calling, that is, detecting non-random interactions between loci from the two-dimensional contact frequency matrix. Successful fulfillment of this task has many important implications including identifying long-range interactions that assist in interpreting a sizable fraction of the results from genome-wide association studies. The task, distinguishing biologically meaningful chromatin interactions from massive numbers of random interactions, poses great challenges both statistically and computationally. Model-based methods to address this challenge are still lacking. In particular, no statistical model exists that takes the underlying dependency structure into consideration. Results: In this paper we propose a hidden Markov random field (HMRF) based Bayesian method to rigorously model interaction probabilities in the two-dimensional space based on the contact frequency matrix. By borrowing information from neighboring loci pairs, our method demonstrates superior reproducibility and statistical power in both simulation studies and real data analysis. Availability and implementation: The source code can be downloaded at: CONTACT: or SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Bioinformatics 11/2015; DOI:10.1093/bioinformatics/btv650
  • ABSTRACT: Availability: Web-server at Standalone also at (Mac OSX/Linux). Contact: SUPPLEMENTARY INFORMATION: Supplementary methods; Figures S1 to S5; Table S1.
    Bioinformatics 11/2015; DOI:10.1093/bioinformatics/btv645