Bioinformatics (BIOINFORMATICS)

Publisher Oxford University Press

Description

The journal aims to publish high-quality, peer-reviewed original scientific papers and excellent review articles in the fields of computational molecular biology, biological databases and genome bioinformatics.

  • Impact factor
    5.47
  • Website
    Bioinformatics website
  • Other titles
    Bioinformatics (Oxford, England: Online)
  • ISSN
    1367-4811
  • OCLC
    39184474
  • Material type
    Document, Periodical, Internet resource
  • Document type
    Internet Resource, Computer File, Journal / Magazine / Newspaper

Publisher details

Oxford University Press

  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author cannot archive a post-print version
  • Restrictions
    • 12-month embargo on science, technology and medicine articles
    • 24-month embargo on arts and humanities articles
    • Some titles may have different embargoes
  • Conditions
    • Pre-print can only be posted prior to acceptance
    • Pre-print must be accompanied by set statement (see link)
    • Pre-print must not be replaced with the post-print; instead, a link to the published version with an amended set statement should be added
    • Pre-print may be posted on a personal website, employer website, free public server or subject-area pre-print server
    • Post-print may be deposited in institutional or central repositories
    • Publisher version cannot be used except for Nucleic Acids Research articles
    • Published source must be acknowledged
    • Must link to publisher version
    • Set phrase to accompany archived copy (see policy)
    • Articles in some journals can be made Open Access on payment of additional charge
    • Eligible UK authors may deposit in OpenDepot
    • Publisher will deposit to PubMed Central on behalf of NIH-funded authors; Nucleic Acids Research authors must pay their fee first
    • Some titles may use different policies
  • Classification
    yellow

Publications in this journal

  • Article: MetPP: A Computational Platform for Comprehensive Two-dimensional Gas Chromatography Time-of-flight Mass Spectrometry-based Metabolomics.
    ABSTRACT: MOTIVATION: Due to the high complexity of the metabolome, comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry (GC×GC-TOF MS) is considered a powerful analytical platform for metabolomics studies. However, GC×GC-TOF MS is not yet widely used in metabolomics because of the lack of a bioinformatics system for data analysis. RESULTS: We developed a computational platform entitled MetPP for the analysis of metabolomics data acquired on a GC×GC-TOF MS system. MetPP performs peak filtering and merging, retention index matching, peak list alignment, normalization, statistical significance tests and pattern recognition, using the peak lists deconvoluted from the instrument data as its input. The performance of MetPP was tested with two sets of experimental data acquired in a spike-in experiment and a biomarker discovery experiment, respectively. MetPP not only correctly aligned the spiked-in metabolite standards from the experimental data, but also correctly recognized their concentration difference between sample groups. In the biomarker discovery data, a total of 15 metabolites were recognized as having significant concentration differences between the sample groups, and these results agree with published results of histological analysis, demonstrating the effectiveness of MetPP for disease biomarker discovery. AVAILABILITY: The source code of MetPP is available at http://metaopen.sourceforge.net CONTACT: xiang.zhang@louisville.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Bioinformatics 05/2013;
  • Article: GAGE-B: An Evaluation of Genome Assemblers for Bacterial Organisms.
    ABSTRACT: MOTIVATION: A large and rapidly growing number of bacterial organisms have been sequenced by the newest sequencing technologies. Cheaper and faster sequencing technologies make it easy to generate very high coverage of bacterial genomes, but these advances mean that DNA preparation costs can exceed the cost of sequencing for small genomes. The need to contain costs often results in the creation of only a single sequencing library, which in turn introduces new challenges for genome assembly methods. RESULTS: We evaluated the ability of multiple genome assembly programs to assemble bacterial genomes from a single, deep-coverage library. For our comparison, we chose bacterial species spanning a wide range of GC content, and measured the contiguity and accuracy of the resulting assemblies. We compared the assemblies produced by this very-high-coverage, one-library strategy to the best assemblies created by two-library sequencing, and found that remarkably good bacterial assemblies are possible with just one library. We also measured the effect of read length and depth of coverage on assembly quality and determined the values that provide the best results with current algorithms.
    Bioinformatics 05/2013;
  • Article: A Statistical Framework for Power Calculations in ChIP-seq Experiments.
    ABSTRACT: MOTIVATION: ChIP-seq technology enables investigators to study genome-wide binding of transcription factors and mapping of epigenomic marks. Although the availability of basic analysis tools for ChIP-seq data is rapidly increasing, there has not been much progress on the related design issues. A challenging question in designing a ChIP-seq experiment is how deeply the ChIP and control samples should be sequenced. The answer depends on multiple factors, some of which can be set by the experimenter based on pilot/preliminary data. The sequencing depth of a ChIP-seq experiment is one of the key factors that determine whether all the underlying targets (e.g., binding locations or epigenomic profiles) can be identified with a targeted power. RESULTS: We developed a statistical framework named CSSP (ChIP-seq Statistical Power) for power calculations in ChIP-seq experiments by considering a local Poisson model, which is commonly adopted by many peak callers. Evaluations with simulations and data-driven computational experiments demonstrate that this framework can reliably estimate the power of a ChIP-seq experiment at different sequencing depths based on pilot data. Furthermore, it provides an analytical approach for calculating the depth required for a targeted power while controlling the false discovery rate at a user-specified level. Hence, our results enable researchers to use their own or publicly available data to determine the sequencing depths required for their ChIP-seq experiments, and potentially to make better use of the multiplexing functionality of the sequencers. Evaluation of power for multiple public ChIP-seq datasets indicates that, currently, typical ChIP-seq studies are well powered for detecting large fold changes of ChIP enrichment over the control sample, but have considerably less power for detecting smaller fold changes. AVAILABILITY: Available at www.stat.wisc.edu/~zuo/CSSP. CONTACT: keles@stat.wisc.edu.
    Bioinformatics 05/2013;
  • Article: A Combinatorial Approach to the Peptide Feature Matching Problem for Label-Free Quantification.
    ABSTRACT: MOTIVATION: Label-free quantification is an important approach for identifying biomarkers, as it measures the change in peptide quantity across different biological samples. One of the fundamental steps for label-free quantification is to match the peptide features detected in two datasets to each other. Although ad hoc software tools exist for feature matching, a combinatorial model for this problem has not yet been defined. RESULTS: A combinatorial model is proposed in this paper. Each peptide feature contains a mass value and a retention time value, which are used to calculate a matching weight between a pair of features. Feature matching seeks the maximum-weight matching between the two sets of features, after applying a to-be-computed time alignment function to all the retention time values of one set of features. This is similar to the maximum matching problem in a bipartite graph, but we show that the requirement of time alignment makes the problem NP-hard. Practical algorithms are also provided. Experiments on real data show that the algorithm compares favorably with other existing methods. CONTACT: binma@uwaterloo.ca.
    Bioinformatics 05/2013;
  • Article: iFUSE: integrated FUSion gene Explorer.
    ABSTRACT: We present iFUSE (integrated FUSion gene Explorer), an online visualization tool that provides a fast and informative view of structural variation data and prioritizes those breaks likely representing fusion genes. This application uses calculated breakpoints to determine fusion genes based on the latest annotation for genomic sequence information, and where relevant the structural variation (SV) events are annotated with predicted RNA and protein sequences. iFUSE takes as input either a Complete Genomics (CG) junction file, a FusionMap [Ge et al. (2011)] fusion detection report file, or a file already analysed and annotated by the iFUSE application on a previous occasion. RESULTS: We demonstrate the utility of iFUSE with case studies from tumour-normal SV detection derived from Complete Genomics whole-genome sequencing results. AVAILABILITY: iFUSE is available as a web service at http://ifuse.erasmusmc.nl CONTACT: s.hiltemann@erasmusmc.nl.
    Bioinformatics 05/2013;
  • Article: International Society for Computational Biology Honors Goncalo Abecasis with Top Bioinformatics/Computational Biology Award for 2013.
    Bioinformatics 05/2013;
  • Article: Adaptive reference-free compression of sequence quality scores.
    ABSTRACT: MOTIVATION: Rapid technological progress in DNA sequencing has stimulated interest in compressing the vast datasets that are now routinely produced. Relatively little attention has been paid to compressing the quality scores that are assigned to each sequence, even though these scores may be harder to compress than the sequences themselves. By aggregating a set of reads into a compressed index, we find that the majority of bases can be predicted from the sequence of bases that are adjacent to them and hence are likely to be less informative for variant calling or other applications. The quality scores for such bases are aggressively compressed, leaving a relatively small number at full resolution. Since our approach relies directly on redundancy present in the reads, it does not need a reference sequence and is therefore applicable to data from metagenomics and de novo experiments as well as to resequencing data. RESULTS: We show that a conservative smoothing strategy affecting 75% of the quality scores above Q2 leads to an overall quality score compression of 1 bit per value with a negligible effect on variant calling. A compression of 0.68 bit per quality value is achieved using a more aggressive smoothing strategy, again with a very small effect on variant calling. AVAILABILITY: Code to construct the BWT and LCP-array on large genomic data sets is part of the BEETL library, available as a GitHub repository at git@github.com:BEETL/BEETL.git. CONTACT: acox@illumina.com.
    Bioinformatics 05/2013;
  • Article: CMAP: Complement Map Database.
    ABSTRACT: The human complement system is increasingly perceived as an intricate protein network of effectors, inhibitors and regulators that drives critical processes in health and disease and extensively communicates with associated physiological pathways ranging from immunity and inflammation to homeostasis and development. A steady stream of experimental data reveals fascinating new connections at a rapid pace; while this opens unique opportunities for research discoveries, the comprehensiveness and large diversity of experimental methods, nomenclatures and publication sources make it highly challenging to keep up with the essential findings. With the Complement Map Database (CMAP), we have created a novel and easily accessible research tool to assist the complement community and scientists from related disciplines in exploring the complement network and discovering new connections. AVAILABILITY: http://www.complement.us/cmap CONTACT: lambris@upenn.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Bioinformatics 05/2013;
  • Article: pyDockWEB: a web server for rigid-body protein-protein docking using electrostatics and desolvation scoring.
    ABSTRACT: pyDockWEB is a web server for the rigid-body docking prediction of protein-protein complex structures using a new version of the pyDock scoring algorithm. Here we use a new custom parallel FTDock implementation, with adjusted grid size for optimal FFT calculations, and a new version of pyDock, which dramatically speeds up calculations while keeping the same predictive accuracy. Given the 3D coordinates of two interacting proteins, pyDockWEB returns the best docking orientations as scored mainly by electrostatics and desolvation energy. AVAILABILITY AND IMPLEMENTATION: The server does not require registration by the user and is freely accessible for academics at http://life.bsc.es/servlet/pydock CONTACT: juanf@bsc.es SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Bioinformatics 05/2013;
  • Article: TrioVis: a visualisation approach for filtering genomic variants of parent-child trios.
    ABSTRACT: TrioVis is a visual analytics tool developed for filtering on coverage and variant frequency for genomic variants from exome sequencing of parent-child trios. In TrioVis, the variant data are organised by grouping each variant based on the laws of Mendelian inheritance. Taking three Variant Call Format (VCF) files as input, TrioVis allows the user to test different coverage thresholds (i.e. different levels of stringency), to find the optimal threshold values tailored to their hypotheses, and to gain insights into the global effects of filtering through interaction. AVAILABILITY: Executables, source code and sample data are available at https://bitbucket.org/biovizleuven/triovis. Screencast is available at http://vimeo.com/user6757771/triovis. CONTACT: ryo.sakai@esat.kuleuven.be.
    Bioinformatics 05/2013;
  • Article: PconsC: Combination of direct information methods and alignments improves contact prediction.
    ABSTRACT: Recently, several new contact prediction methods have been published. They (i) use large sets of multiple aligned sequences and (ii) assume that correlations between columns in these alignments can be the result of indirect interactions. These methods are clearly superior to earlier methods when it comes to predicting contacts in proteins. Here, we demonstrate that combining predictions from two prediction methods, PSICOV and plmDCA, and two alignment methods, HHblits and jackhmmer, at four different e-value cutoffs provides a relative improvement of 20% over the best single method, exceeding 70% correct predictions for one contact prediction per residue. AVAILABILITY: The source code for PconsC along with supplementary data is freely available at http://c.pcons.net/ CONTACT: arne@bioinfo.se.
    Bioinformatics 05/2013;
  • Article: FYPO: The Fission Yeast Phenotype Ontology.
    ABSTRACT: MOTIVATION: To provide consistent, computable descriptions of phenotype data, PomBase is developing a formal ontology of phenotypes observed in fission yeast. RESULTS: The Fission Yeast Phenotype Ontology (FYPO) is a modular ontology that uses several existing ontologies from the Open Biological and Biomedical Ontologies (OBO) collection as building blocks, including the phenotypic quality ontology PATO, the Gene Ontology, and Chemical Entities of Biological Interest (ChEBI). Modular ontology development facilitates partially-automated, effective organization of detailed phenotype descriptions with complex relationships to each other and to underlying biological phenomena. As a result, FYPO supports sophisticated querying, computational analysis, and comparison between different experiments and even between species. AVAILABILITY: FYPO releases are available from the Subversion repository at the PomBase SourceForge project page (https://sourceforge.net/p/pombase/code/HEAD/tree/phenotype_ontology/). The current version of FYPO is also available on the OBO Foundry web site (http://obofoundry.org/). CONTACT: mah79@cam.ac.uk.
    Bioinformatics 05/2013;
  • Article: DAPPLE: a pipeline for the homology-based prediction of phosphorylation sites.
    ABSTRACT: While many experimentally-characterized phosphorylation sites exist for certain organisms, such as human, rat, and mouse, few sites are known for other organisms, hampering related research efforts. We have developed a software pipeline called DAPPLE that automates the process of using known phosphorylation sites from other organisms to identify putative sites in an organism of interest. AVAILABILITY: DAPPLE is available as a web server at http://saphire.usask.ca. CONTACT: brett.trost@usask.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Bioinformatics 05/2013;
  • Article: Detection of Significantly Differentially Methylated Regions in Targeted Bisulfite Sequencing Data.
    ABSTRACT: MOTIVATION: Bisulfite sequencing is currently the gold standard for obtaining genome-wide DNA methylation profiles in eukaryotes. In contrast to the rapid development of appropriate preprocessing and alignment software, methods for analyzing the resulting methylation profiles are still relatively limited. For instance, an appropriate pipeline to detect DNA methylation differences between cancer and control samples is still required. RESULTS: We propose an algorithm that detects significantly differentially methylated regions (DMRs) in data obtained by targeted bisulfite sequencing approaches, such as reduced representation bisulfite sequencing (RRBS). In the first step, this approach tests all target regions for methylation differences, taking spatial dependence into account. An FDR procedure controls the expected proportion of incorrectly rejected regions. In the second step, the significant target regions are trimmed to the actually differentially methylated regions. This hierarchical procedure detects DMRs with increased power compared with existing methods. AVAILABILITY: R/Bioconductor package BiSeq. CONTACT: katja.hebestreit@uni-muenster.de.
    Bioinformatics 05/2013;
  • Article: Twine: Display and Analysis of Cis-Regulatory Modules.
    ABSTRACT: Many algorithms analyze enhancers for overrepresentation of known and novel motifs, with the goal of identifying binding sites for direct regulators of gene expression. Twine is a Java GUI with multiple graphical representations ("Views") of enhancer alignments that displays motifs, as IUPAC consensus sequences or position frequency matrices, in the context of phylogenetic conservation to facilitate cis-regulatory element discovery. Thresholds of phylogenetic conservation and motif stringency can be altered dynamically to facilitate detailed analysis of enhancer architecture. Views can be exported to vector graphics programs to generate high-quality figures for publication. Twine can be extended via Java plugins to manipulate alignments and analyze sequences. AVAILABILITY: Twine is freely available as a compiled Java .jar package or Java source code at http://labs.bio.unc.edu/crews/twine/ CONTACT: steve_crews@unc.edu SUPPLEMENTARY INFORMATION: Supplementary figures S1-S3 are available at Bioinformatics online.
    Bioinformatics 05/2013;
  • Article: A poor man's BLASTX - high-throughput metagenomic protein database search using PAUDA.
    ABSTRACT: In the context of metagenomics, we introduce a new approach to protein database search called PAUDA, which runs approximately 10 000 times faster than BLASTX, while achieving about one third of the assignment rate of reads to KEGG orthology groups, and producing gene and taxon abundance profiles that are highly correlated to those obtained with BLASTX. PAUDA requires less than 80 CPU hours to analyze a dataset of 246 million Illumina DNA reads from permafrost soil for which a previous BLASTX analysis (on a subset of 176 million reads) reportedly required 800 000 CPU hours, leading to the same clustering of samples by functional profiles. AVAILABILITY: PAUDA is freely available from: http://ab.inf.uni-tuebingen.de/software/pauda. CONTACT: daniel.huson@uni-tuebingen.de, xiechao@bic.nus.edu.sg. SUPPLEMENTARY INFORMATION: Method details available from website.
    Bioinformatics 05/2013;
  • Article: pyGenClean: Efficient tool for genetic data clean up before association testing.
    ABSTRACT: Genetic association studies using high-throughput genotyping arrays need to process large amounts of data, on the order of millions of markers per experiment. The first step of any analysis with genotyping arrays is typically a thorough data clean-up and quality control to remove poor-quality genotypes and to generate metrics that inform the selection of individuals for downstream statistical analysis. We have developed pyGenClean, a bioinformatics tool to facilitate and standardize the genetic data clean-up pipeline for genotyping array data. In conjunction with a source batch-queuing system, the tool minimizes data manipulation errors, accelerates the completion of the data clean-up process and provides informative plots and metrics to guide decision making for statistical analysis. AVAILABILITY AND IMPLEMENTATION: pyGenClean is open-source Python 2.7 software and is freely available, along with documentation and examples, from http://www.statgen.org. CONTACT: louis-philippe.lemieux.perreault@umontreal.ca or marie-pierre.dube@statgen.org.
    Bioinformatics 05/2013;
  • Article: InterEvScore: A novel coarse-grained interface scoring function using a multi-body statistical potential coupled to evolution.
    ABSTRACT: MOTIVATION: Structural prediction of protein interactions currently remains a challenging but fundamental goal. In particular, progress in scoring functions is critical for the efficient discrimination of near-native interfaces among large sets of decoys. Many functions have been developed using knowledge-based potentials, but few make use of multi-body interactions or evolutionary information, even though multi-residue interactions are crucial for protein-protein binding and protein interfaces undergo significant selection pressure to maintain their interactions. RESULTS: This article presents InterEvScore, a novel scoring function using a coarse-grained statistical potential including two- and three-body interactions, which provides each residue with the opportunity to contribute in its most favorable local structural environment. Combination of this potential with evolutionary information considerably improves scoring results on the 54 test cases from the widely used protein docking benchmark for which evolutionary information can be collected. We analyze how our way of including evolutionary information gradually increases the discriminative power of InterEvScore. Comparison with several previously published scoring functions (ZDOCK, ZRANK and SPIDER) shows the significant progress brought by InterEvScore. AVAILABILITY: http://biodev.cea.fr/interevol/interevscore CONTACT: guerois@cea.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Bioinformatics 05/2013;
  • Article: Evidence for the dissemination of cryptic noncoding RNAs transcribed from intronic and intergenic segments by retroposition.
    ABSTRACT: MOTIVATION: Insertion of DNA segments is one mechanism by which genomes evolve. The bulk of genomic segments are now known to be transcribed into long and short noncoding RNAs (ncRNAs), promoter-associated transcripts, and enhancer-templated transcripts. These various cryptic ncRNAs are thought to be dispersed in the human and other genomes by retroposition. RESULTS: In this study, I report clear evidence for dissemination of cryptic ncRNAs transcribed from intronic and intergenic segments by retroposition. I used highly stringent conditions to find recently retroposed ncRNAs that had a poly(A) tract and were flanked by a target site duplication. I identified a total of 73 instances of retroposition in the human, mouse, and rat genomes (12, 36, and 25 instances, respectively). The inserted segments, in some cases, served as a novel exon or promoter for the associated gene, resulting in novel transcript variants. Some disseminated sequences showed sequence conservation across animals, implying a possible regulatory role. My results indicate that retroposition is one of the mechanisms for dispersion of ncRNAs. I propose that these newly inserted segments may play a role in genome evolution by potentially functioning as novel exons, promoters, or enhancers. CONTACT: yoonsoo.hahn@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Bioinformatics 05/2013;
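The GAGE-B entry above mentions measuring the contiguity of the resulting assemblies. One widely used contiguity statistic is N50; the Python sketch below computes it from a list of contig lengths. It is an illustration under that assumption, not GAGE-B's evaluation code, and the function name `n50` is hypothetical.

```python
def n50(contig_lengths):
    """Return the largest length L such that contigs of length >= L
    cover at least half of the total assembly length (0 for no contigs)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    return 0


print(n50([100, 80, 50, 30, 20]))
```

Here the contigs total 280 bp; the two longest (100 + 80 = 180 bp) are the first to cover at least half of that, so N50 is 80.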
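The CSSP entry above describes power calculations under a local Poisson model. As a simplified illustration only (CSSP itself also controls the false discovery rate across windows and estimates its parameters from pilot data), the sketch below computes the power of a one-sided Poisson test in a single window: the count threshold is the smallest value whose background tail probability is at most `alpha`, and power is the tail probability at the enriched rate. The function names, the fold-change parameterization and the default `alpha` are hypothetical.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i     # P(X = i) computed from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def window_power(background_rate, fold_change, alpha=1e-5):
    """Return (threshold, power) for a one-sided Poisson test in one window.

    background_rate: expected read count in the window under the control.
    fold_change: hypothetical enrichment of the ChIP rate over the background.
    """
    threshold = 0
    # Smallest count whose background tail probability is at most alpha.
    while poisson_tail(threshold, background_rate) > alpha:
        threshold += 1
    return threshold, poisson_tail(threshold, fold_change * background_rate)


if __name__ == "__main__":
    for fold in (2, 5, 10):
        threshold, power = window_power(background_rate=1.0, fold_change=fold, alpha=0.05)
        print(f"fold change {fold}: threshold {threshold}, power {power:.3f}")
```

Because expected read counts grow roughly in proportion to sequencing depth, one could rescale `background_rate` to explore deeper sequencing; with discrete counts, the resulting power need not increase smoothly.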
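The label-free quantification entry above treats feature matching as a maximum-weight matching once the time alignment is known, and notes that the problem becomes NP-hard when the alignment must also be computed. The Python sketch below covers only the easy case: it takes a fixed, hypothetical alignment function, uses a made-up weight that rewards small mass and retention-time differences, and finds the exact maximum-weight matching by dynamic programming over subsets of the second feature set. This is exponential in that set's size and meant only for tiny examples; it is not the authors' algorithm, and practical tools would use a polynomial assignment algorithm such as the Hungarian method for the fixed-alignment case.

```python
from functools import lru_cache


def match_weight(f, g, mass_tol=0.01, rt_tol=1.0):
    """Hypothetical weight for features (mass, retention_time); 0 if out of tolerance."""
    dm = abs(f[0] - g[0])
    dt = abs(f[1] - g[1])
    if dm > mass_tol or dt > rt_tol:
        return 0.0
    # Closer features in both mass and time get larger weights (at most 2.0).
    return (1 - dm / mass_tol) + (1 - dt / rt_tol)


def max_weight_matching(features_a, features_b, align=lambda t: t):
    """Exact maximum-weight matching for a fixed alignment of features_a's times.

    Exponential in len(features_b); suitable only for tiny illustrative inputs.
    Returns (total_weight, ((index_in_a, index_in_b), ...)).
    """
    aligned_a = tuple((m, align(t)) for m, t in features_a)
    features_b = tuple(features_b)

    @lru_cache(maxsize=None)
    def best(i, used):
        # used is a bitmask of the features in features_b that are already matched.
        if i == len(aligned_a):
            return 0.0, ()
        best_score, best_pairs = best(i + 1, used)  # leave feature i unmatched
        for j, g in enumerate(features_b):
            if used & (1 << j):
                continue
            w = match_weight(aligned_a[i], g)
            if w <= 0:
                continue
            score, pairs = best(i + 1, used | (1 << j))
            if score + w > best_score:
                best_score, best_pairs = score + w, ((i, j),) + pairs
        return best_score, best_pairs

    return best(0, 0)


run_a = [(500.000, 10.0), (600.000, 20.0)]
run_b = [(500.001, 12.0), (600.002, 22.0)]
print(max_weight_matching(run_a, run_b))  # (0.0, ()): time gaps exceed rt_tol
print(max_weight_matching(run_a, run_b, align=lambda t: t + 2.0))  # hypothetical shift
```

Without alignment no pair falls within the retention-time tolerance; with the hypothetical +2.0 shift both features pair up as `((0, 0), (1, 1))`, which illustrates why the alignment function drives the result.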
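The TrioVis entry above groups trio variants by the laws of Mendelian inheritance and lets the user vary coverage thresholds. A minimal Python sketch of that idea follows, assuming biallelic sites with genotypes encoded as alternate-allele counts (0, 1, 2) and a single per-site minimum depth across the trio. The category names and the input tuple layout are hypothetical and are not TrioVis's own.

```python
from collections import Counter

# Alternate alleles (0 or 1) that a parent with each genotype can transmit.
TRANSMISSIBLE = {0: (0,), 1: (0, 1), 2: (1,)}


def classify_trio(father, mother, child):
    """Group one biallelic site; genotypes are alternate-allele counts 0, 1 or 2."""
    possible = {a + b for a in TRANSMISSIBLE[father] for b in TRANSMISSIBLE[mother]}
    if child in possible:
        return "mendelian"
    if father == 0 and mother == 0:
        return "de novo candidate"  # child carries an allele neither parent has
    return "mendelian violation"


def group_variants(variants, min_depth):
    """Count categories among sites whose trio-wide minimum depth passes min_depth.

    Each variant is a hypothetical tuple (father_gt, mother_gt, child_gt, depth).
    """
    return Counter(
        classify_trio(father, mother, child)
        for father, mother, child, depth in variants
        if depth >= min_depth
    )


sites = [(0, 1, 1, 30), (0, 0, 1, 8), (2, 2, 0, 25), (1, 1, 2, 40)]
for threshold in (5, 10):
    print(threshold, dict(group_variants(sites, threshold)))
```

Raising the depth threshold from 5 to 10 drops the low-coverage de novo candidate, which mirrors the kind of stringency trade-off the abstract describes exploring interactively.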
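The BiSeq entry above states that an FDR procedure controls the expected proportion of incorrectly rejected regions. The standard Benjamini-Hochberg step-up procedure is shown below as a generic Python illustration; BiSeq's hierarchical approach (testing target regions, then trimming them) may use a different FDR method, so treat this as a swapped-in textbook example rather than the package's implementation. The p-values are made up.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return sorted indices of hypotheses rejected by the BH step-up procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    cutoff = 0
    # Step-up rule: find the largest rank k with p_(k) <= k * q / m.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff = rank
    # Reject every hypothesis whose p-value ranks at or below that k.
    return sorted(order[:cutoff])


region_p = [0.01, 0.2, 0.03, 0.04]  # hypothetical p-values for four target regions
print(benjamini_hochberg(region_p))
```

Because the rule searches for the largest qualifying rank, a p-value that fails its own threshold can still be rejected if a larger p-value further down the ranking passes; for example, `[0.04, 0.04]` rejects both at `q = 0.05`.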

Keywords

Biology
Computational Biology
Genome
Genomes
Information technology
Life sciences
