Bioinformatics (BIOINFORMATICS)

Publisher: Oxford University Press


The journal aims to publish high-quality peer-reviewed original scientific papers and excellent review articles in the fields of computational molecular biology, biological databases and genome bioinformatics.

Impact factor 4.62

  • Website
    Bioinformatics website
  • Other titles
    Bioinformatics (Oxford, England: Online)
  • Material type
    Document, Periodical, Internet resource
  • Document type
    Internet Resource, Computer File, Journal / Magazine / Newspaper

Publisher details

Oxford University Press

  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author cannot archive a post-print version
  • Restrictions
    • 12 months embargo on science, technology, medicine articles
    • 2 years embargo on arts and humanities articles
    • Some titles may have different embargoes
  • Conditions
    • Pre-print can only be posted prior to acceptance
    • Pre-print must be accompanied by set statement (see link)
    • Pre-print must not be replaced with post-print; instead, a link to the published version with an amended set statement should be made
    • Pre-print on author's personal website, employer website, free public server or pre-prints in subject area
    • Post-print in Institutional repositories or Central repositories
    • Publisher version cannot be used except for Nucleic Acids Research articles
    • Published source must be acknowledged
    • Must link to publisher version
    • Set phrase to accompany archived copy (see policy)
    • Articles in some journals can be made Open Access on payment of additional charge
    • Eligible UK authors may deposit in OpenDepot
    • Publisher will deposit on behalf of NIH-funded authors to PubMed Central; Nucleic Acids Research authors must pay their fee first
    • Some titles may use different policies
  • Classification
    yellow

Publications in this journal

  • ABSTRACT: Recent progress in live-cell imaging and modeling techniques has resulted in generation of a large amount of quantitative data (from experimental measurements and computer simulations) on spatiotemporal dynamics of biological objects such as molecules, cells and organisms. Although many research groups have independently dedicated their efforts to developing software tools for visualizing and analyzing these data, these tools are often not compatible with each other because of different data formats. We developed an open unified format, Biological Dynamics Markup Language (BDML; current version: 0.2), which provides a basic framework for representing quantitative biological dynamics data for objects ranging from molecules to cells to organisms. BDML is based on Extensible Markup Language (XML). Its advantages are machine- and human-readability and extensibility. BDML will improve the efficiency of development and evaluation of software tools for data visualization and analysis. Availability: A specification and a schema file for BDML are freely available online at SUPPLEMENTARY INFORMATION: Supplementary data is available at Bioinformatics online. © The Author(s) 2014. Published by Oxford University Press.
    Bioinformatics 11/2014;
  • ABSTRACT: Motivation: Ustiloxins A and B are toxic cyclic tetrapeptides, Tyr-Val/Ala-Ile-Gly (Y-V/A-I-G), that were originally identified from Ustilaginoidea virens, a pathogenic fungus affecting rice plants. Contrary to our report that ustiloxin B is ribosomally synthesised in Aspergillus flavus, a recent report suggested that ustiloxins are synthesised by a non-ribosomal peptide synthetase in U. virens. Thus, we analysed the U. virens genome, to identify the responsible gene cluster. Results: The biosynthetic gene cluster was identified from the genome of U. virens based on homologies to the ribosomal peptide biosynthetic gene cluster for ustiloxin B identified from A. flavus. It contains a gene encoding precursor protein having five YVIG and three YAIG motifs for ustiloxins A and B, respectively, strongly indicating that ustiloxins A and B from U. virens are ribosomally synthesised. Availability: Accession codes of the U. virens and A. flavus gene clusters in NCBI are BR001221 and BR001206, respectively. Contact: or
    Bioinformatics 11/2014;
  • ABSTRACT: MOTIVATION: Isotope trace detection is a fundamental step for LC-MS data analysis that faces a multitude of technical challenges on complex samples. The Kalman filter application to isotope trace detection addresses some of these challenges; it discriminates closely eluting isotope traces in the m/z dimension, flexibly handles heteroscedastic m/z variances and does not bin the m/z axis. Yet the behavior of this Kalman filter application has not been fully characterized since no cost-free open-source implementation exists and incomplete evaluation standards for isotope trace detection persist. RESULTS: Massifquant is an open source solution for Kalman filter isotope trace detection that has been subjected to novel and rigorous methods of performance evaluation. The presented evaluation with accompanying annotations and optimization guide sets a new standard for comparative isotope trace detection. Compared to centWave, matchedFilter, and MZMine2 (alternative isotope trace detection engines), Massifquant detected more true isotope traces in a real LC-MS complex sample, especially low-intensity isotope traces. It also offers competitive specificity and equally effective quantitation accuracy. AVAILABILITY: Massifquant is integrated into XCMS with GPL license ≥ 2.0 and hosted by Bioconductor: Annotation data is archived at Parameter optimization code and documentation is hosted at CONTACT: or
    Bioinformatics 09/2014;
  • ABSTRACT: RNA-seq techniques generate massive amounts of expression data. Several pipelines (e.g. Tophat and Cufflinks) are broadly applied to analyse these data sets. However, accessing and handling the analytical output remains challenging for non-experts. We present the RNASeqExpressionBrowser, an open-source web interface that can be used to access the output from RNA-seq expression analysis packages in different ways as it allows browsing for genes by identifiers, annotations or sequence similarity. Gene expression information can be loaded as long as it is represented in a matrix-like format. Additionally, data can be made available by setting up the tool on a public server. For demonstration purposes, we have set up a version providing expression information from the barley genome. The source code and a showcase are accessible at:
    Bioinformatics 05/2014;
  • ABSTRACT: The ability to engineer control systems of gene expression is instrumental for synthetic biology. Thus, bioinformatic methods that assist such engineering are appealing because they can guide the sequence design and prevent costly experimental screening. In particular, RNA is an ideal substrate to de novo design regulators of protein expression by following sequence-to-function models. We have implemented a novel algorithm, RiboMaker, aimed at the computational, automated design of bacterial riboregulation. RiboMaker reads the sequence and structure specifications, which codify for a gene regulatory behavior, and optimizes the sequences of a small regulatory RNA and a 5' untranslated region for an efficient intermolecular interaction. To this end, it implements an evolutionary design strategy, where random mutations are selected according to a physicochemical model based on free energies. The resulting sequences can then be tested experimentally, providing a new tool for synthetic biology, and also for investigating the riboregulation principles in natural systems. Availability: Web server is available at Source code, instructions, and examples are freely available for download at CONTACT:
    Bioinformatics 05/2014;
  • ABSTRACT: Prions are self-templating protein aggregates that stably perpetuate distinct biological states and are of keen interest to researchers in both evolutionary and biomedical science. The best understood prions are from yeast and have a prion-forming domain with strongly biased amino acid composition, most notably enriched for Q or N. PLAAC is a web application that scans protein sequences for domains with Prion-Like Amino Acid Composition. Users can upload sequence files, or paste sequences directly into a textbox. PLAAC ranks the input sequences by several summary scores and allows scores along sequences to be visualized. Text output files can be downloaded for further analyses, and visualizations saved in PDF and PNG formats. Availability and Implementation: The Ruby-based web framework, and the command-line software (implemented in Java, with visualization routines in R) are available at: under the MIT license. All software can be run under OS X, Windows, and Unix.
    Bioinformatics 05/2014;
  • ABSTRACT: The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes, and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that the sensitivity is significantly higher than previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Availability and Implementation: Source code, implemented in C on a Linux system, and the datasets of protein sequences are freely available for download at
    Bioinformatics 05/2014;
  • ABSTRACT: Protein domains are fundamental units of protein structure, function and evolution, thus it is critical to gain a deep understanding of protein domain organization. Previous works have attempted to identify key residues involved in organization of domain architecture. Since one of the most important characteristics of domain architecture is the arrangement of secondary structure elements (SSEs), here we present a picture of domain organization through an integrated consideration of SSE arrangements and residue contact networks. In this work, by representing SSEs as main-chain scaffolds and side-chain interfaces and through construction of residue contact networks, we have identified the SSE interfaces well packed within protein domains as SSE packing clusters. 17334 SSE packing clusters were recognized from 9015 SCOP domains of less than 40% sequence identity. Similar SSE packing clusters were observed not only among domains of the same folds, but also among domains of different folds, indicating their roles as common scaffolds for organization of protein domains. Further analysis of 14 small single-domain proteins reveals a high correlation between the SSE packing clusters and the folding nuclei. Consistent with their important roles in domain organization, SSE packing clusters were found to be more conserved than other regions within the same proteins. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Bioinformatics 05/2014; 30(17).
  • ABSTRACT: Clustering of chemical and biochemical data based on observed features is a central cognitive step in the analysis of chemical substances, in particular in combinatorial chemistry, or of complex biochemical reaction networks. Very often, for reasons unknown to the researcher, this step produces disappointing results. Once the sources of the problem are known, improved clustering methods might revitalize the statistical approach of compound and reaction search and analysis. Here, we present a generic mechanism that may be at the origin of many clustering difficulties. The dynamical behaviors that complex biochemical reactions can exhibit upon variation of the system parameters are fundamental system fingerprints. In parameter space, shrimp-like or swallow-tail structures separate parameter sets that lead to stable periodic dynamical behavior from those leading to irregular behavior. We work out the genericity of this phenomenon and demonstrate novel examples of its occurrence in realistic models of biophysics. While we elucidate the phenomenon by considering the emergence of periodicity in dependence on system parameters in a low-dimensional parameter space, the conclusions from our simple setting are shown to continue to be valid for features in a higher-dimensional feature space, as long as the feature-generating mechanism is not too extreme and the dimension of this space is not too high compared to the amount of available data. For online versions of superparamagnetic clustering see:
    Bioinformatics 05/2014;
  • ABSTRACT: We created a fast, robust, and general C++ implementation of a single-nucleotide polymorphism (SNP) set enrichment algorithm to identify cell types, tissues, and pathways affected by risk loci. It tests trait-associated genomic loci for enrichment specificity to conditions (cell types, tissues, pathways) in a matrix of genes and conditions. We use a nonparametric statistical approach to compute empirical p-values by comparison to null SNP sets. As a proof of concept, we present novel applications of our method to four sets of genome-wide significant SNPs associated with red blood cell count, multiple sclerosis, celiac disease, and HDL cholesterol. CONTACT:
    Bioinformatics 05/2014;
  • ABSTRACT: Phylogenetic trees with hundreds of thousands of leaves are now being inferred from sequence data, posing significant challenges for visualization and exploratory analysis. Image data supplying valuable context for species in trees (and cues for exploring them) are becoming increasingly available in biodiversity databases and elsewhere, but have rarely been built into tree visualization software in a scalable way. Ceiba lets the user explore large trees and inspect image collection arrays (sets of "homologous" images) comprising mixtures of 2D and 3D image objects. Ceiba exploits recent improvements in graphics hardware, OpenGL toolkits, and many standard high-performance computer graphics strategies, such as texture compression, level of detail control, culling, animations, and image caching. Its tree layouts can be tuned by user-provided phylogenetic definitions of subtrees. The code has been extensively tested on phylogenies with up to 55,000 leaves and images. A manual, data sets, source code (distributed under GPL) and binaries for OS X are available at
    Bioinformatics 05/2014;
  • ABSTRACT: Skyline is a Windows client application for targeted proteomics method creation and quantitative data analysis. The Skyline document model contains extensive mass spectrometry data from targeted proteomics experiments performed using selected reaction monitoring (SRM), parallel reaction monitoring (PRM), and data independent and data dependent acquisition (DIA and DDA) methods. Researchers have written software tools that perform statistical analysis of the experimental data contained within Skyline documents. The new external tools framework allows researchers to integrate their tools into Skyline without modifying the Skyline codebase. Installed tools provide point-and-click access to downstream statistical analysis of data processed in Skyline. The framework also specifies a uniform interface to format tools for installation into Skyline. Tool developers can now easily share their tools with proteomics researchers using Skyline. Skyline is available as a single-click, self-updating web installation at This website also provides access to installable external tools and documentation.
    Bioinformatics 05/2014;
  • ABSTRACT: In Liquid Chromatography Mass Spectrometry/Tandem Mass Spectrometry (LC-MS/MS), it is necessary to link tandem MS identified peptide peaks so that protein expression changes between the two runs can be tracked. However, only a small number of peptides can be identified and linked by tandem MS in two runs, and it becomes necessary to link peptide peaks with tandem identification in one run to their corresponding ones in another run without identification. In the past, peptide peaks were linked based on similarities in retention time, mass, or peak shape after retention time alignment, which corrects mean retention time shifts between runs. However, the accuracy in linking is still limited, especially for complex samples collected from different conditions. Consequently, large scale proteomics studies that require comparison of protein expression profiles of hundreds of patients cannot be carried out effectively. In this paper, we consider the problem of linking peptides from a pair of LC-MS/MS runs, and propose a new method, PeakLink (PL), which uses information in both the time and frequency domain as inputs to a non-linear support vector machine (SVM) classifier. The PL algorithm first uses a threshold on a retention time likelihood ratio score to remove candidate corresponding peaks with excessively large elution time shifts, then PL calculates the correlation between a pair of candidate peaks after reducing noise through wavelet transformation. After converting retention time and peak shape correlation to statistical scores, an SVM classifier is trained and applied for differentiating corresponding and non-corresponding peptide peaks. PL is tested in multiple challenging cases, in which LC-MS/MS samples are collected from different disease states, different instruments, and different labs. Testing results show significant improvement in linking accuracy compared to other algorithms. Available online CONTACT:
    Bioinformatics 05/2014;
  • ABSTRACT: Several state-of-the-art methods for isoform identification and quantification are based on l1-regularized regression, such as the Lasso. However, explicitly listing the (possibly exponentially large) set of candidate transcripts is intractable for genes with many exons. For this reason, existing approaches using the l1-penalty are either restricted to genes with few exons, or only run the regression algorithm on a small set of pre-selected isoforms. We introduce a new technique called FlipFlop which can efficiently tackle the sparse estimation problem on the full set of candidate isoforms by using network flow optimization. Our technique removes the need for a preselection step, leading to better isoform identification while keeping a low computational cost. Experiments with synthetic and real RNA-Seq data confirm that our approach is more accurate than alternative methods and one of the fastest available. Source code is freely available as an R package from the Bioconductor web site and more information is available at
    Bioinformatics 05/2014;
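
The SNP set enrichment abstract listed above mentions computing empirical p-values by comparison to null SNP sets. The following is a minimal Python sketch of that general idea only, not the published C++ implementation. The function names, the mean-based score and the uniform random sampling of null gene sets are illustrative assumptions; the abstract does not say how the actual tool scores sets or matches null sets.

```python
import random


def empirical_p_value(observed, null_scores):
    """Share of null scores at least as large as the observed score.

    The +1 in numerator and denominator keeps the estimate above zero.
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


def set_score(genes, specificity):
    """Hypothetical enrichment score: mean specificity of the genes in a set."""
    return sum(specificity[g] for g in genes) / len(genes)


def enrichment_p_value(trait_genes, specificity, n_null=1000, seed=0):
    """Compare a trait gene set with random gene sets of the same size.

    Assumption: null sets are drawn uniformly at random from all genes.
    """
    rng = random.Random(seed)
    all_genes = sorted(specificity)
    observed = set_score(trait_genes, specificity)
    null = [
        set_score(rng.sample(all_genes, len(trait_genes)), specificity)
        for _ in range(n_null)
    ]
    return empirical_p_value(observed, null)


if __name__ == "__main__":
    # Toy data: specificity of each gene for one condition (made-up values).
    specificity = {f"gene{i}": i / 100 for i in range(100)}
    trait_genes = ["gene95", "gene97", "gene99"]
    print(enrichment_p_value(trait_genes, specificity))
```

With the toy data, the trait set holds three of the most specific genes, so few random sets of the same size score as high and the estimated p-value is small. The smallest value this estimator can return is 1 / (n_null + 1), so the number of null sets bounds the resolution.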