Identification of the subcellular localization of mycobacterial proteins using localization motifs.
ABSTRACT Mycobacterium, the most common disease-causing genus, infects billions of people and is notoriously difficult to treat. Understanding the subcellular localization of mycobacterial proteins can provide essential clues to protein function and drug discovery. In this article, we present a novel approach that focuses on local sequence information: localization motifs are generated by a merging algorithm and selected using a binomial distribution model. These localization motifs are then used as features for identifying the subcellular localization of mycobacterial proteins. Our approach yields more accurate results than previous methods. It was also tested on an independent dataset recently obtained from an experimental study, for which it provides a first, reasonably accurate prediction of subcellular localization. Our approach can also be used for large-scale prediction of new protein entries in the UniProtKB database and of protein sequences obtained experimentally. In addition, our approach identified many local motifs that are involved in subcellular localization and that also interact with the environment. Thus, our method may have widespread applications both in the study of mycobacterial protein function and in the search for potential targets for vaccine and drug design.
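
To illustrate the idea of selecting motifs with a binomial model and then using them as features, the following is a minimal sketch and not the procedure used in this work. It assumes that a motif is a plain substring, that its background probability is estimated as the fraction of background sequences containing it, and that a motif is kept when the upper-tail binomial probability falls below a threshold `alpha`. The function names, the threshold, and the toy sequences are all hypothetical, and the merging algorithm that generates candidate motifs is not reproduced here.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def select_motifs(candidates, target_seqs, background_seqs, alpha=0.05):
    """Keep candidate motifs that are over-represented in target_seqs.

    Assumption: a motif "occurs" in a sequence if it is a substring of it,
    and the background rate p is the fraction of background sequences
    containing the motif.
    """
    selected = []
    n = len(target_seqs)
    for motif in candidates:
        p = sum(motif in s for s in background_seqs) / len(background_seqs)
        k = sum(motif in s for s in target_seqs)
        p_value = binomial_tail(k, n, p)
        if p_value < alpha:
            selected.append((motif, p_value))
    return selected


def motif_features(sequence, motifs):
    """Encode a sequence as a binary presence/absence vector over motifs."""
    return [1 if m in sequence else 0 for m in motifs]


# Toy data (hypothetical sequences, for illustration only).
target = ["MKKLAA", "MKKTTT", "MKKGGG", "AAMKKA"]
background = ["AAAAAA", "GGGGGG", "TTTTTT", "MKKAAA", "CCCCCC"]

kept = select_motifs(["MKK", "AA"], target, background)
kept_motifs = [m for m, _ in kept]
features = motif_features("GGMKKG", ["MKK", "AA"])
```

In this toy setting the resulting binary vectors could be passed to any standard classifier; the choice of classifier is left open here.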