What is bioinformatics? An introduction and overview

Source: CiteSeer


A flood of data means that many of the challenges in biology are now challenges in computing. Bioinformatics, the application of computational techniques to analyse the information associated with biomolecules on a large scale, has now firmly established itself as a discipline in molecular biology, and encompasses a wide range of subject areas, from structural biology and genomics to gene expression studies. In this review we provide an introduction and overview of the current state of the field. We discuss the main principles that underpin bioinformatics analyses, look at the types of biological information and databases that are commonly used, and finally examine some of the studies that are being conducted, particularly with reference to transcription regulatory systems.

2. Introduction

Biological data are flooding in at an unprecedented rate (1). For example, as of August 2000, the GenBank repository of nucleic acid sequences contained 8,214,000 entries (2) and the SWISS-PROT databas...

  • Source
    • "During the first decade of this century, an early step in the simplification of the use of bioinformatics was the development of workflow management software that allows the integration of multiple bioinformatics tools [2] [3]. Their implementation made automatization and large-scale handling of data processing possible."
    ABSTRACT: Sequencing the human genome began in 1994, and 10 years of work were necessary in order to provide a nearly complete sequence. Nowadays, NGS technologies allow sequencing of a whole human genome in a few days. This deluge of data challenges scientists in many ways, as they are faced with data management issues and with analysis and visualization drawbacks due to the limitations of current bioinformatics tools. In this paper, we describe how the NGS Big Data revolution changes the way data are managed and analysed. We present how biologists are confronted with an abundance of methods, tools, and data formats. To overcome these problems, we focus on Big Data Information Technology innovations from web and business intelligence. We underline the interest of NoSQL databases, which are much more efficient than relational databases. Since Big Data leads to a loss of interactivity with data during analysis due to high processing time, we describe solutions from Business Intelligence that allow one to regain interactivity whatever the volume of data. We illustrate this point with a focus on the Amadea platform. Finally, we discuss visualization challenges posed by Big Data and present the latest innovations with JavaScript graphic libraries.
    Full-text · Article · Jun 2015
  • Source
    • "SWISS-PROT and PIR-International not only store sequence information but also annotate the sequences and also provides functional and domain information. OWL is known as composite databases and filtered information from different primary databases to produce nonredundant information (Luscombe and Greenbaum, 2001). Protein databases are listed in Table 1."
    ABSTRACT: With the rise of the computer era, a new interdisciplinary science came into existence that unveiled many hidden biological phenomena and contributed hugely to the understanding of biological molecules. This science is known as bioinformatics, a science with the ability to address unanswered questions across the biological sciences. Bioinformatics has become a leading science in analyzing and predicting the composition of biological molecules, and has thus played a prominent part in the Human Genome Project. Bioinformatics is a combination of biotechnology and information technology, backed by other sciences (chemistry, statistics and mathematics). It has taken biological science, especially biotechnology, to new horizons by computerizing and organizing biological data. Among all biological molecules, proteins are thought to be the most complex, and tremendous efforts have been made to understand their structure and function. Through bioinformatics, proteins can be explored at three different levels. First, primary structure, which is often the amino acid sequence of the protein along with molecular weight, isoelectric point and many other parameters. Second, secondary structure analysis, which involves the analysis of substructures in a protein: helices and beta-pleated sheets, which are the most abundant secondary structural features; beta turns and loops, which are less abundant; and random coils, which are the unstructured or unclassifiable substructures. Third, tertiary structure, which is the combination of secondary structure components. In this chapter, we attempt to present bioinformatics approaches at all three levels, since exploring proteins at all these levels through bioinformatics tools and databases can provide more insight into protein structure and function, help in understanding the many hidden phenomena of diseases, and thus contribute hugely to developing new therapeutic regimens.
    Full-text · Chapter · Nov 2014
  • Source
    • "Bioinformatics is the field of studying biological activities of macromolecules, such as carbohydrates, lipids, proteins and nucleic acids [4], using computational technologies. In general there are three aims of bioinformatics [5]: the first is to maintain a database (such as a protein data bank, for three-dimensional macromolecules, or the IMGT/HLA database for maintaining HLA sequences) accessible for researchers to analyze it; the second is to develop tools that are helpful for analyzing these datasets and to understand the functions of macromolecules; and the third aim is to use these analysis tools for interpreting biologically meaningful information about the macromolecules."
    ABSTRACT: Visualization of high-dimensional data has always been a challenging task. Here we discuss and propose variants of non-linear data projection methods (Generative Topographic Mapping (GTM) and GTM with simultaneous feature saliency (GTM-FS)) that are adapted to be effective on very high-dimensional data. The adaptations use log-space values at certain steps of the Expectation Maximization (EM) algorithm and during the visualization process. We have tested the proposed algorithms by visualizing electrostatic potential data for Major Histocompatibility Complex (MHC) class-I proteins. The experiments show that the variants of the original GTM and GTM-FS worked successfully with data of more than 2000 dimensions, and we compare the results with other linear and nonlinear projection methods: Principal Component Analysis (PCA), Neuroscale (NSC) and Gaussian Process Latent Variable Model (GPLVM).
    Full-text · Article · May 2012