Chapter

Bioinformatic Tools for Gene and Protein Sequence Analysis

10/2007; DOI:10.1385/1-59259-870-6:387

ABSTRACT The rapid development of efficient, automated DNA-sequencing methods has strongly advanced the genome-sequencing era, culminating in the determination of the entire human genome sequence in 2001 (1,2). An enormous amount of DNA sequence data is available, and databases are still growing exponentially (see Fig. 1). Analysis of this overwhelming amount of data, including hundreds of genomes from both prokaryotes and eukaryotes, has given rise to the field of bioinformatics. Bioinformatic tools have evolved rapidly in order to identify genes that encode functional proteins or RNA. This is an important task, considering that even in the best-studied bacterium, Escherichia coli, more than 30% of the identified open reading frames (ORFs) represent hypothetical genes with no known function. Future challenges of genome-sequence analysis will include the understanding of diseases, gene regulation, and metabolic pathway reconstruction. In addition, a set of methods for protein analysis summarized under the term proteomics holds tremendous potential for biomedicine and biotechnology (141). The large number of bioinformatic tools made available to scientists during the last few years has raised the problem of which to use and how best to obtain scientifically valid answers (3). In this chapter, we provide a guide to the most efficient way to analyze a given sequence or to collect information regarding a gene, protein, structure, or interaction of interest by applying current publicly available software and databases that mainly use the World Wide Web. All links to services or download sites are given in the text or listed in Table 1; the succession of tools is briefly summarized in Fig. 2.

  • Science. 01/2001; 291:1304-1351.
  • ABSTRACT: The development of efficient DNA sequencing methods has led to the achievement of the DNA sequence of entire genomes from (to date) 55 prokaryotes, 5 eukaryotic organisms and 10 eukaryotic chromosomes. Thus, an enormous amount of DNA sequence data is available and even more will be forthcoming in the near future. Analysis of this overwhelming amount of data requires bioinformatic tools in order to identify genes that encode functional proteins or RNA. This is an important task, considering that even in the well-studied Escherichia coli more than 30% of the identified open reading frames are hypothetical genes. Future challenges of genome sequence analysis will include the understanding of gene regulation and metabolic pathway reconstruction including DNA chip technology, which holds tremendous potential for biomedicine and the biotechnological production of valuable compounds. The overwhelming volume of information often confuses scientists. This review intends to provide a guide to choosing the most efficient way to analyze a new sequence or to collect information on a gene or protein of interest by applying current publicly available databases and Web services. Recently developed tools that allow functional assignment of genes, mainly based on sequence similarity of the deduced amino acid sequence, using the currently available and increasing biological databases will be discussed.
    Applied Microbiology and Biotechnology 01/2002; 57(5-6):579-92. · 3.69 Impact Factor
  • ABSTRACT: The availability of massive amounts of DNA sequence information has begun to revolutionize the practice of biology. As a result, current large-scale sequencing output, while impressive, is not adequate to keep pace with growing demand and, in particular, is far short of what will be required to obtain the 3-billion-base human genome sequence by the target date of 2005. To reach this goal, improved automation will be essential, and it is particularly important that human involvement in sequence data processing be significantly reduced or eliminated. Progress in this respect will require both improved accuracy of the data processing software and reliable accuracy measures to reduce the need for human involvement in error correction and make human review more efficient. Here, we describe one step toward that goal: a base-calling program for automated sequencer traces, phred, with improved accuracy. phred appears to be the first base-calling program to achieve a lower error rate than the ABI software, averaging 40%-50% fewer errors in the data sets examined independent of position in read, machine running conditions, or sequencing chemistry.
    Genome Research 03/1998; 8(3):175-85. · 14.40 Impact Factor


Bernd H A Rehm