Bioinformatics (BIOINFORMATICS)

Publisher: Oxford University Press (OUP)

Journal description

The journal aims to publish high-quality, peer-reviewed original scientific papers and excellent review articles in the fields of computational molecular biology, biological databases and genome bioinformatics.

Current impact factor: 4.98

Impact Factor Rankings

2015 Impact Factor Available summer 2016
2014 Impact Factor 4.981
2013 Impact Factor 4.621
2012 Impact Factor 5.323
2011 Impact Factor 5.468
2010 Impact Factor 4.877
2009 Impact Factor 4.926
2008 Impact Factor 4.328
2007 Impact Factor 5.039
2006 Impact Factor 4.894
2005 Impact Factor 6.019
2004 Impact Factor 5.742
2003 Impact Factor 6.701
2002 Impact Factor 4.615
2001 Impact Factor 3.421
2000 Impact Factor 3.409
1999 Impact Factor 2.259

[Figure: Impact factor over time]

Additional details

5-year impact: 8.14
Cited half-life: 6.90
Immediacy index: 1.17
Eigenfactor: 0.20
Article influence: 3.57
Website: Bioinformatics website
Other titles: Bioinformatics (Oxford, England: Online)
ISSN: 1367-4811
OCLC: 39184474
Material type: Document, Periodical, Internet resource
Document type: Internet Resource, Computer File, Journal / Magazine / Newspaper

Publisher details

Oxford University Press (OUP)

  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author cannot archive a post-print version
  • Restrictions
    • 12 months embargo
  • Conditions
    • Pre-print can only be posted prior to acceptance
    • Pre-print must be accompanied by set statement (see link)
    • Pre-print must not be replaced with post-print, instead a link to published version with amended set statement should be made
    • Pre-print on author's personal website, employer website, free public server or pre-prints in subject area
    • Post-print in Institutional repositories or Central repositories
    • Publisher's version/PDF cannot be used
    • Published source must be acknowledged
    • Must link to publisher version
    • Set phrase to accompany archived copy (see policy)
    • Eligible authors may deposit in OpenDepot
    • The publisher will deposit in PubMed Central on behalf of NIH authors
    • Publisher last contacted on 19/02/2015
    • This policy is an exception to the default policies of 'Oxford University Press (OUP)'
  • Classification
    • yellow

Publications in this journal

  • ABSTRACT: Motivation: Genome-wide association studies (GWAS) have been widely used in discovering the association between genotypes and phenotypes. Human genome data contain valuable but highly sensitive information. Unprotected disclosure of such information might put individuals' privacy at risk. It is important to protect human genome data. Exact logistic regression is a bias-reduction method based on a penalized likelihood to discover rare variants that are associated with disease susceptibility. We propose the HEALER framework to facilitate secure rare variants analysis with a small sample size. Results: We focus on algorithm design that reduces the computational and storage costs of learning a homomorphic exact logistic regression model (i.e., evaluating p-values of coefficients), where the circuit depth is proportional to the logarithmic scale of data size. We evaluate the algorithm performance using rare Kawasaki Disease (KD) datasets. Availability: Download HEALER at CONTACT: SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Bioinformatics 10/2015; DOI:10.1093/bioinformatics/btv563
  • ABSTRACT: Motivation: Current informatic techniques for processing raw chromatography/mass spectrometry data break down under several common, non-ideal conditions. Importantly, hydrophilic interaction liquid chromatography (a key separation technology for metabolomics) produces data which are especially challenging to process. We identify three critical points of failure in current informatic workflows: compound-specific drift, integration region variance, and naive missing value imputation. We implement the Warpgroup algorithm to address these challenges. Results: Warpgroup adds peak subregion detection, consensus integration bound detection, and intelligent missing value imputation steps to the conventional informatic workflow. When compared to the conventional workflow, Warpgroup made major improvements to the processed data. The coefficient of variation for replicate injections of a complex Escherichia coli extract was reduced by 19%. Integration regions across samples were much more robust. Additionally, many signals lost by the conventional workflow were "rescued" by the Warpgroup refinement, thereby resulting in greater analyte coverage in the processed data. Availability and implementation: Warpgroup is an open source R package available on GitHub at The package includes example data and XCMS compatibility wrappers for ease of use. Contact: and SUPPLEMENTARY INFORMATION: Supplementary information is available online.
    Bioinformatics 10/2015; DOI:10.1093/bioinformatics/btv564
  • ABSTRACT: Availability and implementation: missMethyl is an R package available from the Bioconductor project at Contact:
    Bioinformatics 10/2015; DOI:10.1093/bioinformatics/btv560
  • ABSTRACT: Motivation: Antibody amino-acid sequences can be numbered in order to identify equivalent positions. Such annotations are valuable for antibody sequence comparison, protein structure modelling and engineering. Multiple different numbering schemes exist; they vary in the nomenclature they use to annotate residue positions, their definitions of position equivalence and their popularity within different scientific disciplines. However, currently no publicly available software exists that can apply all the most widely used schemes or for which an executable can be obtained under an open license. Results: ANARCI is a tool to classify and number antibody and T-cell receptor amino-acid variable domain sequences. It can annotate sequences with the five most popular numbering schemes: Kabat, Chothia, Enhanced Chothia, IMGT and AHo. Availability: ANARCI is available for download under the GPLv3 license at A web interface to the program is available at the same address. Contact:
    Bioinformatics 10/2015; DOI:10.1093/bioinformatics/btv552
  • ABSTRACT: Availability and implementation: Cytoscape.js is implemented in JavaScript. Documentation, downloads, and source code are available at CONTACT:
    Bioinformatics 09/2015; DOI:10.1093/bioinformatics/btv557
  • ABSTRACT: Motivation: Simple bioinformatic tools are frequently used to analyse time-series datasets regardless of their ability to deal with transient phenomena, limiting the meaningful information that may be extracted from them. This situation requires the development and exploitation of tailor-made, easy-to-use, and flexible tools designed specifically for the analysis of time-series datasets. Results: We present a novel statistical application called CLUSTERnGO, which uses a model-based clustering algorithm that fulfils this need. This algorithm involves two components of operation. Component 1 constructs a Bayesian non-parametric model (Infinite Mixture of Piecewise Linear Sequences) and Component 2 applies a novel clustering methodology (Two-Stage Clustering). The software can also assign biological meaning to the identified clusters using an appropriate ontology. It applies multiple hypothesis testing to report the significance of these enrichments. The algorithm has a four-phase pipeline. The application can be executed using either command-line tools or a user-friendly Graphical User Interface. The latter has been developed to address the needs of both specialist and non-specialist users. We use three diverse test cases to demonstrate the flexibility of the proposed strategy. In all cases, CLUSTERnGO not only outperformed existing algorithms in assigning unique GO term enrichments to the identified clusters, but also revealed novel insights regarding the biological systems examined, which were not uncovered in the original publications. Availability: The C++ and Qt source code, the GUI applications for Windows, OS X and Linux operating systems and the user manual are freely available for download under the GNU GPL v3 license at Contact: SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Bioinformatics 09/2015; DOI:10.1093/bioinformatics/btv532
  • ABSTRACT: Motivation: The protein-DNA interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs, also known as DNA motifs) are critical activities in gene transcription. The identification of the DNA motifs is a vital task for downstream analysis. Unfortunately, the long-range coupling information between different DNA motifs is still lacking. To fill the void, in a first-of-its-kind study, we have identified the coupling DNA motif pairs on long-range chromatin interactions in humans. Results: The coupling DNA motif pairs exhibit substantially higher DNase accessibility than the background sequences. Half of the DNA motifs involved are matched to the existing motif databases, although nearly all of them are enriched with at least one gene ontology term. Their motif instances are also found to be statistically enriched in the promoter and enhancer regions. In particular, we introduce a novel measurement called motif pairing multiplicity, which is defined as the number of motifs that are paired with a given motif on chromatin interactions. Interestingly, we observe that motif pairing multiplicity is linked to several characteristics such as regulatory region type, motif sequence degeneracy, DNase accessibility, and pairing genomic distance. Taken together, we believe the coupling DNA motif pairs identified in this study can shed light on the gene transcription mechanism under long-range chromatin interactions. Availability: The identified motif pair data is compressed and available in the supplementary materials associated with this manuscript. Contact:
    Bioinformatics 09/2015; DOI:10.1093/bioinformatics/btv555
  • ABSTRACT: Motivation: Chemical cross-linking with mass spectrometry (XL-MS) provides structural information for proteins and protein complexes in the form of cross-linked residue proximity and distance constraints between reactive residues. Utilizing spatial information derived from cross-linked residues can therefore assist with structural modeling of proteins. Selection of computationally derived model structures of proteins remains a major challenge in structural biology. The comparison of site interactions resulting from XL-MS with protein structure contact maps can assist the selection of structural models. Availability and implementation: XLmap was implemented in R and is freely available: (
    Bioinformatics 09/2015; DOI:10.1093/bioinformatics/btv519
  • ABSTRACT: PathwaysWeb is a resource-based, well-documented web system that provides publicly available information on genes, biological pathways, Gene Ontology terms, gene-gene interaction networks (importantly, with the directionality of interactions), and links to key related PubMed documents. The PathwaysWeb API simplifies the construction of applications that need to retrieve and interrelate information across multiple, pathway-related data types from a variety of original data sources. PathwaysBrowser is a companion website that enables users to explore the same integrated pathway data. The PathwaysWeb system facilitates reproducible analyses by providing access to all versions of the integrated data sets. Although its Gene Ontology subsystem includes data for mouse, PathwaysWeb currently focuses on human data. However, pathways for mouse and many other species can be inferred with a high success rate from human pathways. Availability and Supplemental Files: PathwaysWeb can be accessed via the Internet at
    Bioinformatics 09/2015; DOI:10.1093/bioinformatics/btv554
  • ABSTRACT: Motivation: Cells communicate with their environment via signal transduction pathways. On occasion, the activation of one pathway can produce an effect downstream of another pathway, a phenomenon known as crosstalk. Existing computational methods to discover such pathway pairs rely on simple overlap statistics. Results: We present XTalk, a path-based approach for identifying pairs of pathways that may crosstalk. XTalk computes the statistical significance of the average length of multiple short paths that connect receptors in one pathway to the transcription factors in another. By design, XTalk reports the precise interactions and mechanisms that support the identified crosstalk. We applied XTalk to signaling pathways in the KEGG and NCI-PID databases. We manually curated a gold standard set of 132 crosstalking pathway pairs and a set of 140 pairs that did not crosstalk, for which XTalk achieved an AUC of 0.65, a 12% improvement over the closest competing approach. The AUC varied with the pathway, suggesting that crosstalk should be evaluated on a pathway-by-pathway level. We also analyzed an extended set of 658 pathway pairs in KEGG and a set of more than 7,000 pathway pairs in NCI-PID. For the top-ranking pairs, we found substantial support in the literature (81% for KEGG and 78% for NCI-PID). We provide examples of networks computed by XTalk that accurately recovered known mechanisms of crosstalk. Availability: We will make the XTalk software available at upon publication. Crosstalk networks are available at Contact:
    Bioinformatics 09/2015; DOI:10.1093/bioinformatics/btv549
  • ABSTRACT: We present a method to identify approximately independent blocks of linkage disequilibrium (LD) in the human genome. These blocks enable automated analysis of multiple genome-wide association studies. Availability (code): (data): CONTACT:
    Bioinformatics 09/2015; DOI:10.1093/bioinformatics/btv546
  • ABSTRACT: Motivation: The binding between a peptide and an MHC is one of the most important processes for the induction of an adaptive immune response. Many algorithms have been developed to predict peptide/MHC binding. However, no approach has yet been able to give structural insight into how peptides detach from the MHC. Results: In this study we used a combination of coarse graining, Hierarchical Natural Move Monte Carlo, and stochastic conformational optimization to explore the detachment processes of 32 different peptides from HLA-A*02:01. We performed 100 independent repeats of each stochastic simulation and found that the presence of experimentally known anchor amino acids affects the detachment trajectories of our peptides. Comparison with experimental binding affinity data indicates the reliability of our approach (AROC 0.85). We also compared to a 1000 ns Molecular Dynamics simulation of a non-binding peptide (AAAKTPVIV) and HLA-A*02:01. Even in this simulation, the longest published for peptide/MHC, the peptide does not fully detach. Our approach is orders of magnitude faster and as such allows us to explore peptide/MHC detachment processes in a way not possible with all-atom Molecular Dynamics simulations. Availability: The source code is freely available for download at
    Bioinformatics 09/2015; DOI:10.1093/bioinformatics/btv502
  • ABSTRACT: Motivation: Aptamers are synthetic nucleic acid molecules that can bind biological targets by virtue of both their sequence and three-dimensional structure. Aptamers are selected using SELEX, Systematic Evolution of Ligands by EXponential enrichment, a technique that exploits aptamer-target binding affinity. The SELEX procedure, coupled with high-throughput sequencing (HT-SELEX), creates billions of random sequences capable of binding different epitopes on specific targets. Since this technique produces enormous amounts of data, computational analysis represents a critical step to screen and select the most biologically relevant sequences. Results: Here we present APTANI, a computational tool to identify target-specific aptamers from HT-SELEX data and secondary structure information. APTANI builds on the AptaMotif algorithm (Hoinka et al., 2012), originally implemented to analyze SELEX data; extends the applicability of AptaMotif to HT-SELEX data; and introduces new functionalities, such as the ability to identify binding motifs, to cluster aptamer families or to compare output results from different HT-SELEX cycles. Tabular and graphical representations facilitate the downstream biological interpretation of results. Availability: APTANI is available at Contact: SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Bioinformatics 09/2015; DOI:10.1093/bioinformatics/btv545
  • ABSTRACT: Motivation: Technological advances that allow routine identification of high-dimensional risk factors have led to high demand for statistical techniques that enable full utilization of these rich sources of information for genetics studies. Variable selection for censored outcome data as well as control of false discoveries (i.e. inclusion of irrelevant variables) in the presence of high-dimensional predictors present serious challenges. This paper develops a computationally feasible method based on boosting and stability selection. Specifically, we modified the component-wise gradient boosting to improve the computational feasibility and introduced random permutation in stability selection for controlling false discoveries. Results: We have proposed a high-dimensional variable selection method by incorporating stability selection to control false discovery. Comparisons between the proposed method and the commonly used univariate and Lasso approaches for variable selection reveal that the proposed method yields fewer false discoveries. The proposed method is applied to study the associations of 2,339 common single-nucleotide polymorphisms (SNPs) with overall survival among cutaneous melanoma (CM) patients. The results have confirmed that BRCA2 pathway SNPs are likely to be associated with overall survival, as reported by previous literature. Moreover, we have identified several new Fanconi anemia (FA) pathway SNPs that are likely to modulate survival of CM patients. Availability: The related source code and documents are freely available at Contact:
    Bioinformatics 09/2015; DOI:10.1093/bioinformatics/btv517
  • ABSTRACT: Motivation: Several algorithms exist for detecting copy number variants (CNVs) from human exome sequencing read depth, but previous tools have not been well-suited for large population studies on the order of tens or hundreds of thousands of exomes. Their limitations include being difficult to integrate into automated variant-calling pipelines and being ill-suited for detecting common variants. To address these issues, we developed a new algorithm, Copy number estimation using Lattice-Aligned Mixture Models (CLAMMS), which is highly scalable and suitable for detecting CNVs across the whole allele frequency spectrum. Results: In this note, we summarize the methods and intended use-case of CLAMMS, compare it to previous algorithms, and briefly describe results of validation experiments. We evaluate the adherence of CNV calls from CLAMMS and four other algorithms to Mendelian inheritance patterns on a pedigree; we compare calls from CLAMMS and other algorithms to calls from SNP genotyping arrays for a set of 3164 samples; and we use TaqMan qPCR to validate CNVs predicted by CLAMMS at 39 loci (95% of rare variants validate; across 19 common variant loci, the mean precision and recall are 99% and 94%, respectively). In the supplementary materials (available at the CLAMMS GitHub repository), we present our methods and validation results in greater detail. Availability: (implemented in C) CONTACT:
    Bioinformatics 09/2015; DOI:10.1093/bioinformatics/btv547
  • ABSTRACT: Motivation: The Oxford Nanopore MinION sequencer, currently in pre-release testing through the MinION Access Programme (MAP), promises long reads in real-time from a cheap, compact, USB device. Tools have been released to extract FASTA/Q from the MinION base calling output and to provide basic yield statistics. However, no single tool yet exists to provide comprehensive alignment-based quality control and error profile analysis - something that is extremely important given the speed with which the platform is evolving. Results: NanoOK generates detailed tabular and graphical output plus an in-depth multi-page PDF report including error profile, quality and yield data. NanoOK is multi-reference, enabling detailed analysis of metagenomic or multiplexed samples. Four popular Nanopore aligners are supported and it is easily extensible to include others. Availability and implementation: NanoOK is open-source software, implemented in Java with supporting R scripts. It has been tested on Linux and Mac OS X and can be downloaded from A VirtualBox VM containing all dependencies and the DH10B read set used in the paper is available from A Docker image is also available from Docker Hub - see program documentation. Contact: Information: Program documentation is available at The complete E. coli report referred to below is provided as supplementary data.
    Bioinformatics 09/2015; DOI:10.1093/bioinformatics/btv540
  • ABSTRACT: Motivation: The contig orientation problem, which we formally define as the MAX-DIR problem, has at times been addressed cursorily and at times using various heuristics. In setting forth a linear-time reduction from the MAX-CUT problem to the MAX-DIR problem, we prove the latter is NP-complete. We compare the relative performance of a novel greedy approach with several other heuristic solutions. Results: Our results suggest that our greedy heuristic algorithm not only works well, but outperforms the other algorithms due to the nature of scaffold graphs. Our results also demonstrate a novel method for identifying inverted repeats and inversion variants, both of which contradict the basic single-orientation assumption. Such inversions have previously been noted as being difficult to detect and are directly involved in the genetic mechanisms of several diseases. Availability: CONTACT:
    Bioinformatics 09/2015; DOI:10.1093/bioinformatics/btv548
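
The abstract on coupling DNA motif pairs (DOI:10.1093/bioinformatics/btv555) defines motif pairing multiplicity as the number of motifs that are paired with a given motif on chromatin interactions. A minimal sketch of that count follows. The function name, the input format (a list of motif-name pairs) and the motif labels are assumptions made for illustration and are not taken from the paper or its supplementary data.

```python
from collections import defaultdict


def motif_pairing_multiplicity(pairs):
    """Map each motif to the number of distinct motifs it is paired with.

    `pairs` is a hypothetical input: an iterable of (motif_a, motif_b)
    tuples, one per coupling observed on a chromatin interaction.
    Repeated observations of the same pair are counted once.
    """
    partners = defaultdict(set)
    for a, b in pairs:
        partners[a].add(b)
        partners[b].add(a)
    return {motif: len(others) for motif, others in partners.items()}


# Made-up motif labels, for illustration only.
example_pairs = [
    ("M1", "M2"),
    ("M1", "M3"),
    ("M2", "M3"),
    ("M1", "M4"),
    ("M1", "M2"),  # duplicate observation, ignored by the set
]
multiplicity = motif_pairing_multiplicity(example_pairs)
```

Here `multiplicity["M1"]` is 3 because M1 is paired with M2, M3 and M4. Whether the published measure counts distinct partners or every observed pairing is not stated in the abstract, so the de-duplication above is a design choice of this sketch.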
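
The XTalk abstract (DOI:10.1093/bioinformatics/btv549) describes a statistic based on the average length of short paths that connect receptors in one pathway to transcription factors in another. The sketch below is a simplified illustration of that idea and is not the XTalk implementation: it uses a single shortest path per receptor and transcription factor pair, measured in hops by breadth-first search, and it omits the significance test entirely. The graph encoding, function names and node labels are all assumptions.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def average_receptor_to_tf_length(graph, receptors, tfs):
    """Mean shortest-path length over all reachable (receptor, TF) pairs.

    Returns None when no transcription factor is reachable from any receptor.
    """
    lengths = []
    for receptor in receptors:
        dist = shortest_path_lengths(graph, receptor)
        lengths.extend(dist[tf] for tf in tfs if tf in dist)
    return sum(lengths) / len(lengths) if lengths else None


# Toy directed interaction network (adjacency lists); labels are made up.
toy_network = {
    "R1": ["A"],
    "A": ["B", "TF1"],
    "B": ["TF2"],
    "R2": ["B"],
}
avg_length = average_receptor_to_tf_length(
    toy_network, receptors=["R1", "R2"], tfs=["TF1", "TF2"]
)
```

In the toy network the reachable pairs are R1 to TF1 (2 hops), R1 to TF2 (3 hops) and R2 to TF2 (2 hops), so the average is 7/3. A lower average suggests tighter coupling between the two pathways; turning it into a p-value would need a null model, which the abstract does not specify.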
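
The contig orientation abstract (DOI:10.1093/bioinformatics/btv548) mentions a greedy heuristic for the MAX-DIR problem without describing it. The following is a generic greedy sketch of contig orientation, offered only to make the problem concrete; it should not be read as the authors' algorithm. It assumes each scaffold link states whether two contigs should share an orientation and carries a weight; that encoding and all names are hypothetical.

```python
def greedy_orient(contigs, links):
    """Assign +1/-1 to each contig, visiting contigs in the given order.

    `links` is a list of (u, v, same_orientation, weight) tuples. Each contig
    takes the orientation that satisfies the larger total weight of links to
    contigs that have already been oriented.
    """
    adjacency = {c: [] for c in contigs}
    for u, v, same, weight in links:
        adjacency[u].append((v, same, weight))
        adjacency[v].append((u, same, weight))

    orientation = {}
    for contig in contigs:
        score = 0  # positive favours +1, negative favours -1
        for other, same, weight in adjacency[contig]:
            if other in orientation:
                wanted = orientation[other] if same else -orientation[other]
                score += wanted * weight
        orientation[contig] = 1 if score >= 0 else -1
    return orientation


def satisfied_weight(orientation, links):
    """Total weight of links whose orientation constraint is met."""
    return sum(
        weight
        for u, v, same, weight in links
        if (orientation[u] == orientation[v]) == same
    )


# Toy scaffold graph with one unavoidable conflict; values are made up.
toy_links = [
    ("A", "B", True, 3),
    ("B", "C", False, 2),
    ("A", "C", True, 1),
]
toy_orientation = greedy_orient(["A", "B", "C"], toy_links)
toy_score = satisfied_weight(toy_orientation, toy_links)
```

The three toy links cannot all be satisfied at once, which is the situation the abstract associates with inverted repeats and inversion variants. The greedy pass keeps the two heavier links (weight 5 of 6) and leaves the lightest one violated.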