Predicting the Start of Protein α-Helices Using Machine Learning Algorithms

DOI: 10.1007/978-3-642-13214-8_5
Source: DBLP

ABSTRACT Proteins are complex molecules synthesised by living organisms. They are a fundamental class of biomolecules and
perform a large number of functions in cell biology. Proteins can act as catalysts and can accelerate or inhibit chemical
reactions in the body. They also transport and store smaller molecules, and take part in movement, mechanical support,
immunity, and the control of cell growth and differentiation [25]. All of these functions depend on the protein's 3D structure.
The process by which the linear sequence of amino acids that composes a protein acquires its 3D shape is called
protein folding. Anfinsen's work [29] showed that the primary structure determines how a protein folds. Protein folding is so
important that, when it does not occur correctly, it may cause diseases such as Alzheimer's disease, Bovine Spongiform
Encephalopathy (BSE, commonly known as mad cow disease), Creutzfeldt-Jakob disease (CJD), Amyotrophic Lateral Sclerosis (ALS),
Huntington's disease, Parkinson's disease, and some cancer-related diseases.
