Bioinformatics (BIOINFORMATICS)

Publisher: Oxford University Press

Description

The journal aims to publish high-quality, peer-reviewed original scientific papers and excellent review articles in the fields of computational molecular biology, biological databases and genome bioinformatics.

  • Impact factor
    5.47
  • 5-year impact
    6.05
  • Cited half-life
    6.20
  • Immediacy index
    0.67
  • Eigenfactor
    0.16
  • Article influence
    2.61
  • Website
    Bioinformatics website
  • Other titles
    Bioinformatics (Oxford, England: Online)
  • ISSN
    1367-4811
  • OCLC
    39184474
  • Material type
    Document, Periodical, Internet resource
  • Document type
    Internet Resource, Computer File, Journal / Magazine / Newspaper

Publisher details

Oxford University Press

  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author cannot archive a post-print version
  • Restrictions
    • 12 month embargo on science, technology, medicine articles
    • 24 month embargo on arts and humanities articles
    • Some titles may have different embargoes
  • Conditions
    • Pre-print can only be posted prior to acceptance
    • Pre-print must be accompanied by set statement (see link)
    • Pre-print must not be replaced with post-print; instead, a link to the published version with amended set statement should be made
    • Pre-print may be posted on personal website, employer website, free public server or subject-area pre-print server
    • Post-print on Institutional or Central repositories
    • Publisher version cannot be used except for Nucleic Acids Research articles
    • Published source must be acknowledged
    • Must link to publisher version
    • Set phrase to accompany archived copy (see policy)
    • Articles in some journals can be made Open Access on payment of additional charge
    • Eligible UK authors may deposit in OpenDepot
    • Publisher will deposit to PubMed Central on behalf of NIH-funded authors; Nucleic Acids Research authors must pay their fee first
    • Some titles may use different policies
  • Classification
    yellow

Publications in this journal

  • ABSTRACT: RNA-seq techniques generate massive amounts of expression data. Several pipelines (e.g. TopHat and Cufflinks) are broadly applied to analyse these data sets. However, accessing and handling the analytical output remains challenging for non-experts. We present the RNASeqExpressionBrowser, an open-source web interface that can be used to access the output from RNA-seq expression analysis packages in different ways: it allows browsing for genes by identifiers, annotations or sequence similarity. Gene expression information can be loaded as long as it is represented in a matrix-like format. Additionally, data can be made available by setting up the tool on a public server. For demonstration purposes, we have set up a version providing expression information from the barley genome. The source code and a showcase are accessible at: http://mips.helmholtz-muenchen.de/plant/RNASeqExpressionBrowser/. CONTACT: k.mayer@helmholtz-muenchen.de.
    Bioinformatics 05/2014;
  • ABSTRACT: The ability to engineer control systems of gene expression is instrumental for synthetic biology. Thus, bioinformatic methods that assist such engineering are appealing because they can guide the sequence design and prevent costly experimental screening. In particular, RNA is an ideal substrate for the de novo design of regulators of protein expression following sequence-to-function models. We have implemented a novel algorithm, RiboMaker, aimed at the computational, automated design of bacterial riboregulation. RiboMaker reads the sequence and structure specifications, which encode a gene regulatory behavior, and optimizes the sequences of a small regulatory RNA and a 5' untranslated region for an efficient intermolecular interaction. To this end, it implements an evolutionary design strategy, where random mutations are selected according to a physicochemical model based on free energies. The resulting sequences can then be tested experimentally, providing a new tool for synthetic biology and also for investigating the riboregulation principles in natural systems. Availability: The web server is available at http://ribomaker.jaramillolab.org/. Source code, instructions, and examples are freely available for download at http://sourceforge.net/projects/ribomaker/. CONTACT: Guillermo.Rodrigo@issb.genopole.fr, Alfonso.Jaramillo@warwick.ac.uk.
    Bioinformatics 05/2014;
  • ABSTRACT: We present a graphical user interface (PyCorrFit) for fitting theoretical model functions to experimental data obtained by fluorescence correlation spectroscopy (FCS). The program supports many data file formats and features a set of tools specialized in FCS data evaluation. Availability and Implementation: The Python source code is freely available for download from the PyCorrFit web page at http://pycorrfit.craban.de. We offer binaries for Ubuntu Linux, Mac OS X, and Microsoft Windows. CONTACT: paul.mueller@biotec.tu-dresden.de and weidemann@biochem.mpg.de. SUPPLEMENTARY INFORMATION: Supplementary information and documentation are available at the PyCorrFit web page.
    Bioinformatics 05/2014;
  • ABSTRACT: Prions are self-templating protein aggregates that stably perpetuate distinct biological states and are of keen interest to researchers in both evolutionary and biomedical science. The best-understood prions are from yeast and have a prion-forming domain with strongly biased amino acid composition, most notably enriched for Q or N. PLAAC is a web application that scans protein sequences for domains with Prion-Like Amino Acid Composition. Users can upload sequence files or paste sequences directly into a textbox. PLAAC ranks the input sequences by several summary scores and allows scores along sequences to be visualized. Text output files can be downloaded for further analyses, and visualizations saved in PDF and PNG formats. Availability and Implementation: http://plaac.wi.mit.edu/. The Ruby-based web framework and the command-line software (implemented in Java, with visualization routines in R) are available at http://github.com/whitehead/plaac under the MIT license. All software can be run under OS X, Windows, and Unix. CONTACT: oliver.king@umassmed.edu, lindquist_admin@wi.mit.edu.
    Bioinformatics 05/2014;
  • ABSTRACT: The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes, and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a large number of inconsistencies, due to both natural variants and sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that its sensitivity is significantly higher than that of previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Availability and Implementation: Source code, implemented in C on a Linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/~julie/SIBIS. CONTACT: thompson@unistra.fr.
    Bioinformatics 05/2014;
  • ABSTRACT: Protein domains are fundamental units of protein structure, function and evolution; thus, it is critical to gain a deep understanding of protein domain organization. Previous studies have attempted to identify key residues involved in the organization of domain architecture. Since one of the most important characteristics of domain architecture is the arrangement of secondary structure elements (SSEs), here we present a picture of domain organization through an integrated consideration of SSE arrangements and residue contact networks. In this work, by representing SSEs as main-chain scaffolds and side-chain interfaces and by constructing residue contact networks, we have identified the SSE interfaces that are well packed within protein domains as SSE packing clusters. A total of 17,334 SSE packing clusters were recognized from 9,015 SCOP domains sharing less than 40% sequence identity. Similar SSE packing clusters were observed not only among domains of the same folds but also among domains of different folds, indicating their roles as common scaffolds for the organization of protein domains. Further analysis of 14 small single-domain proteins reveals a high correlation between the SSE packing clusters and the folding nuclei. Consistent with their important roles in domain organization, SSE packing clusters were found to be more conserved than other regions within the same proteins. CONTACT: taijiao@moon.ibp.ac.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Bioinformatics 05/2014;
  • ABSTRACT: We created a fast, robust, and general C++ implementation of a single-nucleotide polymorphism (SNP) set enrichment algorithm to identify cell types, tissues, and pathways affected by risk loci. It tests trait-associated genomic loci for enrichment specificity to conditions (cell types, tissues, pathways) in a matrix of genes and conditions. We use a nonparametric statistical approach to compute empirical p-values by comparison to null SNP sets. As a proof of concept, we present novel applications of our method to four sets of genome-wide significant SNPs associated with red blood cell count, multiple sclerosis, celiac disease, and HDL cholesterol. Availability: http://broadinstitute.org/mpg/snpsea. CONTACT: slowikow@broadinstitute.org.
    Bioinformatics 05/2014;
  • ABSTRACT: Clustering of chemical and biochemical data based on observed features is a central cognitive step in the analysis of chemical substances, in particular in combinatorial chemistry, or of complex biochemical reaction networks. Very often, for reasons unknown to the researcher, this step produces disappointing results. Once the sources of the problem are known, improved clustering methods might revitalize the statistical approach of compound and reaction search and analysis. Here, we present a generic mechanism that may be at the origin of many clustering difficulties. The variety of dynamical behaviors that complex biochemical reactions can exhibit as the system parameters vary constitutes a fundamental system fingerprint. In parameter space, shrimp-like or swallow-tail structures separate parameter sets that lead to stable periodic dynamical behavior from those leading to irregular behavior. We work out the genericity of this phenomenon and demonstrate novel examples of its occurrence in realistic models of biophysics. While we elucidate the phenomenon by considering the emergence of periodicity as a function of the system parameters in a low-dimensional parameter space, the conclusions from our simple setting are shown to remain valid for features in a higher-dimensional feature space, as long as the feature-generating mechanism is not too extreme and the dimension of this space is not too high compared to the amount of available data. For online versions of superparamagnetic clustering, see http://stoop.ini.uzh.ch/research/clustering. CONTACT: ruedi@ini.phys.ethz.ch.
    Bioinformatics 05/2014;
  • ABSTRACT: Skyline is a Windows client application for targeted proteomics method creation and quantitative data analysis. The Skyline document model contains extensive mass spectrometry data from targeted proteomics experiments performed using selected reaction monitoring (SRM), parallel reaction monitoring (PRM), and data-independent and data-dependent acquisition (DIA and DDA) methods. Researchers have written software tools that perform statistical analysis of the experimental data contained within Skyline documents. The new external tools framework allows researchers to integrate their tools into Skyline without modifying the Skyline codebase. Installed tools provide point-and-click access to downstream statistical analysis of data processed in Skyline. The framework also specifies a uniform interface for formatting tools for installation into Skyline. Tool developers can now easily share their tools with proteomics researchers using Skyline. Skyline is available as a single-click, self-updating web installation at http://skyline.maccosslab.org. This website also provides access to installable external tools and documentation.
    Bioinformatics 05/2014;
  • ABSTRACT: Phylogenetic trees with hundreds of thousands of leaves are now being inferred from sequence data, posing significant challenges for visualization and exploratory analysis. Image data supplying valuable context for species in trees (and cues for exploring them) are becoming increasingly available in biodiversity databases and elsewhere, but have rarely been built into tree visualization software in a scalable way. Ceiba lets the user explore large trees and inspect image collection arrays (sets of "homologous" images) comprising mixtures of 2D and 3D image objects. Ceiba exploits recent improvements in graphics hardware, OpenGL toolkits, and many standard high-performance computer graphics strategies, such as texture compression, level-of-detail control, culling, animations, and image caching. Its tree layouts can be tuned by user-provided phylogenetic definitions of subtrees. The code has been extensively tested on phylogenies with up to 55,000 leaves and images. A manual, data sets, source code (distributed under GPL) and binaries for OS X are available at http://sourceforge.net/projects/ceiba.
    Bioinformatics 05/2014;
  • ABSTRACT: In Liquid Chromatography Mass Spectrometry/Tandem Mass Spectrometry (LC-MS/MS), it is necessary to link tandem MS identified peptide peaks so that protein expression changes between two runs can be tracked. However, only a small number of peptides can be identified and linked by tandem MS in two runs, and it becomes necessary to link peptide peaks with tandem identification in one run to their corresponding peaks in another run without identification. In the past, peptide peaks have been linked based on similarities in retention time, mass, or peak shape after retention time alignment, which corrects mean retention time shifts between runs. However, the accuracy of linking is still limited, especially for complex samples collected under different conditions. Consequently, large-scale proteomics studies that require comparison of protein expression profiles of hundreds of patients cannot be carried out effectively. In this paper, we consider the problem of linking peptides from a pair of LC-MS/MS runs and propose a new method, PeakLink (PL), which uses information in both the time and frequency domains as inputs to a non-linear support vector machine (SVM) classifier. The PL algorithm first uses a threshold on a retention time likelihood ratio score to remove candidate corresponding peaks with excessively large elution time shifts; PL then calculates the correlation between a pair of candidate peaks after reducing noise through wavelet transformation. After converting retention time and peak shape correlation to statistical scores, an SVM classifier is trained and applied to differentiate corresponding and non-corresponding peptide peaks. PL is tested in multiple challenging cases, in which LC-MS/MS samples are collected from different disease states, different instruments, and different labs. Testing results show significant improvement in linking accuracy compared with other algorithms. Availability: available online. CONTACT: michelle.zhang@utsa.edu.
    Bioinformatics 05/2014;
  • ABSTRACT: compcodeR is an R package for benchmarking of differential expression analysis methods, in particular methods developed for analyzing RNA-seq data. The package provides functionality for simulating realistic RNA-seq count data sets, an interface to several of the most commonly used differential expression analysis methods and extensive functionality for evaluating and comparing different approaches on real and simulated data. compcodeR is available from http://bcf.isb-sib.ch/data/compcodeR CONTACT: Charlotte.Soneson@isb-sib.ch.
    Bioinformatics 05/2014;
  • ABSTRACT: A number of long noncoding RNAs (lncRNAs) have been identified by deep sequencing methods, but their molecular and cellular functions are known only for a limited number of lncRNAs. Current databases on lncRNAs are mostly for cataloguing purposes and do not provide the in-depth information required to infer functions. A comprehensive resource on lncRNA function is urgently needed. We present a database for functional investigation of lncRNAs that encompasses annotation, sequence analysis, gene expression, protein binding, and phylogenetic conservation. We have compiled lncRNAs for six species (human, mouse, zebrafish, fruit fly, worm, yeast) from ENSEMBL, HGNC, MGI, and lncRNAdb. Each lncRNA was analyzed for coding potential and phylogenetic conservation in different lineages. Gene expression data of 208 RNA-Seq studies (4995 samples), collected from GEO, ENCODE, modENCODE, and TCGA databases, were used to provide expression profiles in various tissues, diseases, and developmental stages. Importantly, we analyzed RNA-Seq data to identify co-expressed mRNAs that would provide ample insight into lncRNA functions. The resulting gene list can be subjected to enrichment analysis such as Gene Ontology or KEGG pathways. Furthermore, we compiled protein-lncRNA interactions by collecting and analyzing publicly available CLIP-seq or PAR-CLIP sequencing data. Finally, we explored evolutionarily conserved lncRNAs with correlated expression between human and six other organisms to identify functional lncRNAs. The entire content is provided through a user-friendly web interface. lncRNAtor is available at http://lncrnator.ewha.ac.kr/. CONTACT: sanghyuk@ewha.ac.kr.
    Bioinformatics 05/2014;
  • ABSTRACT: Several state-of-the-art methods for isoform identification and quantification are based on l1-regularized regression, such as the Lasso. However, explicitly listing the (possibly exponentially large) set of candidate transcripts is intractable for genes with many exons. For this reason, existing approaches using the l1 penalty are either restricted to genes with few exons or only run the regression algorithm on a small set of pre-selected isoforms. We introduce a new technique called FlipFlop, which can efficiently tackle the sparse estimation problem on the full set of candidate isoforms by using network flow optimization. Our technique removes the need for a preselection step, leading to better isoform identification while keeping a low computational cost. Experiments with synthetic and real RNA-Seq data confirm that our approach is more accurate than alternative methods and one of the fastest available. Source code is freely available as an R package from the Bioconductor web site (http://www.bioconductor.org/), and more information is available at http://cbio.ensmp.fr/flipflop. CONTACT: Jean-Philippe.Vert@mines.org.
    Bioinformatics 05/2014;
  • ABSTRACT: We present the RNASeqGUI R package, a graphical user interface (GUI) for the identification of differentially expressed genes across multiple biological conditions. This R package includes some well-known RNA-Seq tools, available at www.bioconductor.org. The RNASeqGUI package is not just a collection of known methods and functions; it is designed to guide the user through the entire analysis process. RNASeqGUI is mainly aimed at users who have little experience with command-line software, who can conduct analogous analyses through its simple graphical interface. Moreover, RNASeqGUI is also helpful for expert R users, since it drastically speeds up use of the included RNA-Seq methods. The RNASeqGUI package requires the RGtk2 graphical library (Lawrence et al., 2010) to run. This package is open source and is freely available under the GPL licence at http://bioinfo.na.iac.cnr.it/RNASeqGUI/Download CONTACT: rnaseqgui@na.iac.cnr.it SUPPLEMENTARY INFORMATION: A comprehensive user manual with a usage example is available at http://bioinfo.na.iac.cnr.it/RNASeqGUI.
    Bioinformatics 05/2014; 30(17):2514-6.
