What is bioinformatics? An introduction and overview

Source: CiteSeer

ABSTRACT A flood of data means that many of the challenges in biology are now challenges in computing. Bioinformatics, the application of computational techniques to analyse the information associated with biomolecules on a large scale, has now firmly established itself as a discipline in molecular biology, and encompasses a wide range of subject areas, from structural biology and genomics to gene expression studies. In this review we provide an introduction and overview of the current state of the field. We discuss the main principles that underpin bioinformatics analyses, look at the types of biological information and databases that are commonly used, and finally examine some of the studies that are being conducted, particularly with reference to transcription regulatory systems.

2. Introduction

Biological data are flooding in at an unprecedented rate (1). For example, as of August 2000, the GenBank repository of nucleic acid sequences contained 8,214,000 entries (2) and the SWISS-PROT databas...

  • ABSTRACT: This paper gives an overview of the main topics presented in the special session on bioinformatics and biomedical engineering. Bioinformatics consists of two subfields: the development of computational tools and databases, and the application of these tools and databases to generate biological knowledge and better understand living systems, with genomics and proteomics as the main subjects. A closely related area covers problems in medicine and biomedical engineering that require computer technologies and intelligent systems. The evolution of both disciplines is also presented, analyzed through the number of publications in the literature over the last twenty years.
    06/2009: pages 820-828;
  • ABSTRACT: Large-scale analyses of genetic sequence data are in progress around the world; one goal of these studies is to identify the species to which a sequence belongs. This is especially important when the source of the sequence is unknown. The complete genome sequence of the hemoglobin provides an excellent basis for studying the clustering of different species. In this paper, species classification based on the hemoglobin sequence is introduced for 13 different species. Two different classifiers are presented: one based on a neural network, and the other based on extracting 84 pattern features from the DNA sequence of hemoglobin and comparing them using the Euclidean distance. Both classifiers achieved a 100% recognition rate on the 13 species.
    12/2007: pages 279-281;
  • ABSTRACT: Visualization of high-dimensional data has always been a challenging task. Here we discuss and propose variants of non-linear data projection methods (Generative Topographic Mapping (GTM) and GTM with simultaneous feature saliency (GTM-FS)) that are adapted to be effective on very high-dimensional data. The adaptations use log space values at certain steps of the Expectation Maximization (EM) algorithm and during the visualization process. We have tested the proposed algorithms by visualizing electrostatic potential data for Major Histocompatibility Complex (MHC) class-I proteins. The experiments show that the proposed variants of GTM and GTM-FS work successfully with data of more than 2000 dimensions. We compare the results with other linear and nonlinear projection methods: Principal Component Analysis (PCA), Neuroscale (NSC) and Gaussian Process Latent Variable Model (GPLVM).
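
The last abstract above mentions using log space values in the EM algorithm but does not say how. A common way to do this in an E-step is the log-sum-exp trick, sketched below. This is only an illustration under stated assumptions, not the authors' method: the function name `log_responsibilities`, the isotropic Gaussian components, the equal priors, and the precision parameter `beta` are all hypothetical choices made for the example.

```python
import math


def log_responsibilities(x, centers, beta):
    """Return log posterior responsibilities of each center for point x.

    Assumes (for illustration only) isotropic Gaussian components with a
    shared precision `beta` and equal prior weights, so the shared
    normalizing constant cancels out.
    """
    # Unnormalized log density of x under each component.
    log_p = [
        -0.5 * beta * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        for c in centers
    ]
    # Log-sum-exp: subtract the maximum before exponentiating so that
    # at least one term is exp(0) = 1 and the sum cannot underflow to 0.
    m = max(log_p)
    log_norm = m + math.log(sum(math.exp(lp - m) for lp in log_p))
    return [lp - log_norm for lp in log_p]


# With 2000 dimensions, exp() of the raw log densities underflows to 0.0,
# so normalizing in linear space would divide by zero.
x = [0.0] * 2000
centers = [[1.0] * 2000, [1.1] * 2000]
log_r = log_responsibilities(x, centers, beta=1.0)
```

In this toy case the raw log densities are about -1000 and -1210, both far below the range where `math.exp` returns a nonzero double, while the log-space result stays finite and the responsibilities still sum to one.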

