Chapter

Bioinformatic Tools for Gene and Protein Sequence Analysis

DOI: 10.1385/1-59259-870-6:387

ABSTRACT The rapid development of efficient, automated DNA-sequencing methods has strongly advanced the genome-sequencing era, culminating in the determination of the entire human genome sequence in 2001 (1,2). An enormous amount of DNA sequence data is now available, and databases are still growing exponentially (see Fig. 1). Analysis of this overwhelming amount of data, including hundreds of genomes from both prokaryotes and eukaryotes, has given rise to the field of bioinformatics. Bioinformatic tools have been developed rapidly in order to identify genes that encode functional proteins or RNA. This is an important task, considering that even in the best-studied bacterium, Escherichia coli, more than 30% of the identified open reading frames (ORFs) represent hypothetical genes with no known function. Future challenges of genome-sequence analysis will include the understanding of diseases, gene regulation, and metabolic pathway reconstruction. In addition, a set of methods for protein analysis, summarized under the term proteomics, holds tremendous potential for biomedicine and biotechnology (141). The large number of bioinformatic tools made available to scientists during the last few years has raised the problem of which to use and how best to obtain scientifically valid answers (3). In this chapter, we provide a guide to the most efficient way to analyze a given sequence or to collect information regarding a gene, protein, structure, or interaction of interest by applying current publicly available software and databases that are mainly accessed via the World Wide Web. All links to services or download sites are given in the text or listed in Table 1; the succession of tools is briefly summarized in Fig. 2.

  • ABSTRACT: The development of efficient DNA sequencing methods has led to the achievement of the DNA sequence of entire genomes from (to date) 55 prokaryotes, 5 eukaryotic organisms and 10 eukaryotic chromosomes. Thus, an enormous amount of DNA sequence data is available and even more will be forthcoming in the near future. Analysis of this overwhelming amount of data requires bioinformatic tools in order to identify genes that encode functional proteins or RNA. This is an important task, considering that even in the well-studied Escherichia coli more than 30% of the identified open reading frames are hypothetical genes. Future challenges of genome sequence analysis will include the understanding of gene regulation and metabolic pathway reconstruction including DNA chip technology, which holds tremendous potential for biomedicine and the biotechnological production of valuable compounds. The overwhelming volume of information often confuses scientists. This review intends to provide a guide to choosing the most efficient way to analyze a new sequence or to collect information on a gene or protein of interest by applying current publicly available databases and Web services. Recently developed tools that allow functional assignment of genes, mainly based on sequence similarity of the deduced amino acid sequence, using the currently available and increasing biological databases will be discussed.
    Applied Microbiology and Biotechnology 01/2002; 57(5-6):579-92. DOI:10.1007/s00253-001-0844-0 · 3.81 Impact Factor
  • ABSTRACT: Elimination of the data processing bottleneck in high-throughput sequencing will require both improved accuracy of data processing software and reliable measures of that accuracy. We have developed and implemented in our base-calling program phred the ability to estimate a probability of error for each base-call, as a function of certain parameters computed from the trace data. These error probabilities are shown here to be valid (correspond to actual error rates) and to have high power to discriminate correct base-calls from incorrect ones, for read data collected under several different chemistries and electrophoretic conditions. They play a critical role in our assembly program phrap and our finishing program consed.
    Genome Research 03/1998; 8(3):186-94. · 13.85 Impact Factor
