What is bioinformatics? An introduction and overview

Source: CiteSeer

ABSTRACT A flood of data means that many of the challenges in biology are now challenges in computing. Bioinformatics, the application of computational techniques to analyse the information associated with biomolecules on a large scale, has now firmly established itself as a discipline in molecular biology, and encompasses a wide range of subject areas, from structural biology and genomics to gene expression studies. In this review we provide an introduction and overview of the current state of the field. We discuss the main principles that underpin bioinformatics analyses, look at the types of biological information and databases that are commonly used, and finally examine some of the studies that are being conducted, particularly with reference to transcription regulatory systems.

2. Introduction

Biological data are flooding in at an unprecedented rate (1). For example, as of August 2000, the GenBank repository of nucleic acid sequences contained 8,214,000 entries (2) and the SWISS-PROT databas...

  • Source
    ABSTRACT: With the rise of the computer era, a new interdisciplinary science came into existence that unveiled many hidden biological phenomena and contributed greatly to the understanding of biological molecules. This science, bioinformatics, has the ability to address unanswered questions across the biological sciences; it has become a leading science for analyzing and predicting the composition of biological molecules and has played a prominent part in the Human Genome Project. Bioinformatics is a combination of biotechnology and information technology, backed by other sciences (chemistry, statistics and mathematics). It has taken biological science, especially biotechnology, to new horizons by computerizing and organizing biological data. Among all biological molecules, proteins are thought to be the most complex, and tremendous effort has gone into understanding their structure and function. Through bioinformatics, proteins can be explored at three levels. First, primary structure: the amino acid sequence of the protein, along with molecular weight, isoelectric point and many other parameters. Second, secondary structure: the analysis of substructures in a protein, namely helices and beta-pleated sheets, which are the most abundant secondary structural features; beta turns and loops, which are less abundant; and random coils, which are the unstructured or unclassifiable substructures. Third, tertiary structure: the combination of secondary structure components. In this chapter, we present bioinformatics approaches at all three levels, because exploring proteins at these levels with bioinformatics tools and databases can provide more insight into protein structure and function, help explain many hidden phenomena of disease, and thus contribute to the development of new therapeutic regimens.
    Biotechnology Vol. 6: Bioinformatics and Computational Biology, 11/2014: chapter 7: pages 130-142; ISBN: 1-62699-021-2
  • Source
    ABSTRACT: Visualization of high-dimensional data has always been a challenging task. Here we discuss and propose variants of non-linear data projection methods (Generative Topographic Mapping (GTM) and GTM with simultaneous feature saliency (GTM-FS)) that are adapted to be effective on very high-dimensional data. The adaptations use log-space values at certain steps of the Expectation Maximization (EM) algorithm and during the visualization process. We have tested the proposed algorithms by visualizing electrostatic potential data for Major Histocompatibility Complex (MHC) class-I proteins. The experiments show that the adapted variants of GTM and GTM-FS worked successfully with data of more than 2000 dimensions. We compare the results with other linear and nonlinear projection methods: Principal Component Analysis (PCA), Neuroscale (NSC) and Gaussian Process Latent Variable Model (GPLVM).
  • Source
    ABSTRACT: One major application of microarray technology lies in cancer classification. Thus far, a significant number of discoveries have been made and new biomarkers for various cancers have been detected from microarray data. Bio-inspired machine learning approaches are well suited to discovering the complex relationships between genes under controlled experimental conditions, and to classifying microarray data by identifying a subset of informative genes embedded in a large, multi-class data set affected by high-dimensional noise. In this paper, a hybrid system that integrates genetic algorithms and decision trees is proposed for gene expression analysis and the prediction of gene functionality for cancer classification. The learning capacity of the decision trees used in the base learning systems is boosted by a feature selection method. Experiments present preliminary results that demonstrate the capability of the hybrid system to mine accurate classification rules, in comparison with traditional machine learning algorithms.
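
The GTM abstract above mentions using log-space values at certain steps of the EM algorithm. The abstract does not give the details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes isotropic Gaussian components with a shared inverse variance `beta` and equal prior weights (as in standard GTM), and the function names `log_sum_exp`, `log_gaussian` and `responsibilities` are hypothetical. In thousands of dimensions the raw densities underflow to zero, so the E-step responsibilities are normalised in log space instead.

```python
import math


def log_sum_exp(values):
    """Return log(sum(exp(v))) without leaving log space for the largest term."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gaussian(x, centre, beta):
    """Log density of an isotropic Gaussian with inverse variance beta (assumed form)."""
    d = len(x)
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return 0.5 * d * math.log(beta / (2.0 * math.pi)) - 0.5 * beta * sq_dist


def responsibilities(x, centres, beta):
    """E-step posterior over components for one data point, normalised in log space.

    Equal prior weights are assumed, so they cancel in the normalisation.
    """
    log_p = [log_gaussian(x, c, beta) for c in centres]
    log_norm = log_sum_exp(log_p)
    return [math.exp(lp - log_norm) for lp in log_p]
```

With a 2000-dimensional point, `math.exp(log_gaussian(...))` evaluates to zero in double precision, so dividing raw densities would fail, whereas `responsibilities` still returns values that sum to one.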

