What is bioinformatics? An introduction and overview

Source: CiteSeer

ABSTRACT: A flood of data means that many of the challenges in biology are now challenges in computing. Bioinformatics, the application of computational techniques to analyse the information associated with biomolecules on a large scale, has now firmly established itself as a discipline in molecular biology, and encompasses a wide range of subject areas from structural biology and genomics to gene expression studies. In this review we provide an introduction and overview of the current state of the field. We discuss the main principles that underpin bioinformatics analyses, look at the types of biological information and databases that are commonly used, and finally examine some of the studies that are being conducted, particularly with reference to transcription regulatory systems.

2. Introduction

Biological data are flooding in at an unprecedented rate (1). For example, as of August 2000, the GenBank repository of nucleic acid sequences contained 8,214,000 entries (2) and the SWISS-PROT databas...

  • Source
    • "SWISS-PROT and PIR-International not only store sequence information but also annotate the sequences and provide functional and domain information. OWL is a composite database that filters information from different primary databases to produce non-redundant information (Luscombe and Greenbaum, 2001). Protein databases are listed in Table 1."
    ABSTRACT: With the rise of the computer era, a new interdisciplinary science came into existence that unveiled many hidden biological phenomena and contributed hugely to the understanding of biological molecules. This science is known as bioinformatics. It can help answer open questions across the biological sciences, has become a leading science in analyzing and predicting the composition of biological molecules, and thus played a prominent part in the Human Genome Project. Bioinformatics is a combination of biotechnology and information technology, backed by other sciences (chemistry, statistics and mathematics). It has taken biological science, especially biotechnology, to new horizons by computerizing and organizing biological data. Among all biological molecules, proteins are thought to be the most complex, and tremendous efforts have been made to understand their structure and function. Through bioinformatics, proteins can be explored at three different levels: first, primary structure, which is the amino acid sequence of the protein along with molecular weight, isoelectric point and many other parameters; second, secondary structure, which involves the analysis of substructures in a protein, i.e. helices and beta-pleated sheets (the most abundant secondary structural features), beta turns and loops (which are less abundant), and random coils (the unstructured or unclassifiable substructures); third, tertiary structure, which is the combination of the secondary structure components. In this chapter, we present bioinformatics approaches at all three levels, since exploring proteins at these levels with bioinformatics tools and databases can provide more insight into protein structure and function, help explain many hidden disease phenomena, and thus contribute hugely to developing new therapeutic regimens.
    Biotechnology Vol. 6: Bioinformatics and Computational Biology, 11/2014; chapter 7, pages 130-142; ISBN: 1-62699-021-2
  • Source
    • "Bioinformatics is the field of studying biological activities of macromolecules, such as carbohydrates, lipids, proteins and nucleic acids [4], using computational technologies. In general there are three aims of bioinformatics [5]: the first is to maintain a database (such as the Protein Data Bank, for three-dimensional macromolecules, or the IMGT/HLA database for maintaining HLA sequences) accessible for researchers to analyze; the second is to develop tools that are helpful for analyzing these datasets and for understanding the functions of macromolecules; and the third is to use these analysis tools for interpreting biologically meaningful information about the macromolecules."
    ABSTRACT: Visualization of high-dimensional data has always been a challenging task. Here we discuss and propose variants of non-linear data projection methods (Generative Topographic Mapping (GTM) and GTM with simultaneous feature saliency (GTM-FS)) that are adapted to be effective on very high-dimensional data. The adaptations use log space values at certain steps of the Expectation Maximization (EM) algorithm and during the visualization process. We have tested the proposed algorithms by visualizing electrostatic potential data for Major Histocompatibility Complex (MHC) class-I proteins. The experiments show that the proposed variants of GTM and GTM-FS worked successfully on data with more than 2000 dimensions, and we compare the results with other linear and nonlinear projection methods: Principal Component Analysis (PCA), Neuroscale (NSC) and Gaussian Process Latent Variable Model (GPLVM).
  • Source
    • "The bioinformatics work would have profound long-term consequences for medicine, leading to the explanation of the underlying molecular mechanism of diseases and thereby. However, not much is known about the structure, function, expression and regulation of more than 80% of human genes [19]. In order to assign a function to many genes, there is a need for computational methods for functional prediction of unknown genes, since experimentally determining the function of a protein is time-consuming."
    ABSTRACT: One major application of microarray technology lies in cancer classification. Thus far, a significant number of discoveries have been made and new biomarkers for various cancers have been detected from microarray data. Bio-inspired machine learning approaches are well suited to discovering the complex relationships between genes under controlled experimental conditions and to classifying microarray data by identifying a subset of informative genes embedded in a large data set that involves multiple classes and is contaminated by high-dimensional noise. In this paper, a hybrid system that integrates genetic algorithms and decision trees is proposed for gene expression analysis and the prediction of gene functionality for cancer classification. The learning capacity of the decision trees used in the base learning systems is boosted by a feature selection method. Preliminary experimental results demonstrate the capability of the hybrid system to mine accurate classification rules, with prediction performance comparable to traditional machine learning algorithms.
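
The GTM abstract above mentions using log-space values at certain steps of the EM algorithm for very high-dimensional data. As a minimal sketch of why that helps (not the authors' implementation), the Python below computes E-step responsibilities for a mixture of isotropic Gaussians with equal mixing weights and a shared precision `beta`. The function names, the mixture form and the toy data are all assumptions made for illustration.

```python
import math


def log_sum_exp(log_values):
    """Return log(sum(exp(v) for v in log_values)) in a numerically stable way."""
    peak = max(log_values)
    if peak == float("-inf"):
        return peak
    # Subtracting the largest value keeps at least one exponent equal to 0.
    return peak + math.log(sum(math.exp(v - peak) for v in log_values))


def log_gaussian(x, center, beta):
    """Log density of an isotropic Gaussian with precision beta (variance 1/beta)."""
    dim = len(x)
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return 0.5 * dim * math.log(beta / (2.0 * math.pi)) - 0.5 * beta * sq_dist


def responsibilities(x, centers, beta):
    """E-step for one data point, assuming equal mixing weights, done in log space."""
    log_p = [log_gaussian(x, c, beta) for c in centers]
    log_norm = log_sum_exp(log_p)
    return [math.exp(lp - log_norm) for lp in log_p]


if __name__ == "__main__":
    dim = 3000  # hypothetical dimensionality, chosen to force underflow
    x = [0.0] * dim
    centers = [[0.1] * dim, [-0.1] * dim]  # two centers equally far from x

    # Exponentiating a single log density directly underflows to zero here,
    # so normalising the raw densities would divide by zero.
    print(math.exp(log_gaussian(x, centers[0], 1.0)))

    # The log-space version still yields a valid distribution over centers.
    print(responsibilities(x, centers, 1.0))
```

In 3000 dimensions each density underflows to zero when exponentiated directly, whereas subtracting the largest log value before exponentiating keeps the ratio between components well defined.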

