-
ABSTRACT: Understanding interactions between proteins and RNAs is essential for revealing the networks and functions of molecules in cellular systems. Many studies have analyzed interactions between protein residues and RNA bases. For interactions between protein residues, evidence supports the view that residues at interacting sites have co-evolved with the corresponding residues in the partner protein so that the interaction between the proteins is maintained. In our previous work, building on this idea, we calculated mutual information (MI) between residues from multiple sequence alignments of homologous proteins to identify interacting residue pairs in interacting proteins, and combined it with the discriminative random field (DRF), a special type of conditional random field (CRF) that is used in image processing to extract characteristic regions from an image. In this paper, we similarly use mutual information to predict interactions between protein residues and RNA bases. Furthermore, we introduce labels of amino acids and bases as features of a simple two-dimensional CRF instead of the DRF. To evaluate our method, we perform computational experiments on several interactions between Pfam domains and Rfam entries. The results suggest that the CRF model with MI and labels is more useful than the CRF model with MI alone.
2012 IEEE 6th International Conference on Systems Biology (ISB), Xi'an, China; 08/2012
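The mutual information score mentioned in the abstract above can be illustrated with a minimal sketch. It assumes two alignment columns of equal length, where position i in both columns comes from the same organism (one column from a protein alignment, one from an RNA alignment). The function name `mutual_information` and the toy columns are hypothetical, and the sketch ignores gaps and small-sample corrections, so it is not the authors' implementation.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    col_a and col_b are equal-length sequences of symbols; entry i of
    each column is assumed to come from the same organism.
    """
    if len(col_a) != len(col_b) or len(col_a) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    count_a = Counter(col_a)              # marginal counts, column A
    count_b = Counter(col_b)              # marginal counts, column B
    count_ab = Counter(zip(col_a, col_b)) # joint counts of symbol pairs
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b) simplifies to c * n / (count_a * count_b)
        mi += p_ab * log2(c * n / (count_a[a] * count_b[b]))
    return mi


# Toy columns (hypothetical data): a residue column and a base column.
covarying = mutual_information("KKRR", "AAGG")    # symbols change together
independent = mutual_information("KKRR", "AGAG")  # no association
```

In this toy case the perfectly covarying pair scores 1 bit and the independent pair scores 0, which is the contrast such methods rely on when ranking candidate residue-base pairs.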
-
ABSTRACT: To understand cellular systems and biological networks, it is important to analyze the functions and interactions of proteins and domains. Many methods for predicting protein-protein interactions have been developed. It is known that mutual information between residues at interacting sites can be higher than that at non-interacting sites. This rests on the idea that amino acid residues at interacting sites have coevolved with the corresponding residues in the partner proteins. Several studies have shown that such mutual information is useful for identifying contact residues in interacting proteins.
We propose novel methods using conditional random fields for predicting protein-protein interactions. We focus on the mutual information between residues and combine it with conditional random fields. In these methods, protein-protein interactions are modeled using domain-domain interactions. We perform computational experiments using protein-protein interaction datasets for several organisms and calculate the AUC (Area Under the ROC Curve) score. The results suggest that our proposed methods, with and without mutual information, outperform the EM (Expectation Maximization) method proposed by Deng et al., which is one of the best predictors based on domain-domain interactions.
We propose novel methods using conditional random fields with and without mutual information between domains. Our methods based on domain-domain interactions are useful for predicting protein-protein interactions.
BMC Systems Biology 01/2011; 5 Suppl 1:S8. · 3.15 Impact Factor
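As an aside on the evaluation metric named in the abstract above, the AUC can be computed as the fraction of (positive, negative) pairs that a predictor ranks correctly, counting ties as half. The following is a minimal sketch of that pairwise formulation; the function name `roc_auc` and the labels and scores are made up for illustration and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for interacting pairs, 0 for non-interacting pairs.
    scores: predicted interaction scores (higher means more likely).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for four protein pairs.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

Here three of the four positive-negative pairs are ordered correctly, so the score is 0.75. The quadratic loop is fine for small examples; a rank-based computation would be preferable for large datasets.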
-
ABSTRACT: Machine learning methods are now used for many biological prediction problems involving drugs, ligands or polypeptide segments of a protein. To build a prediction model, a so-called training data set of molecules with measured target properties is needed. For many such problems the size of the training data set is limited, as measurements have to be performed in a wet lab. Furthermore, the problems considered are often complex, so that it is not clear which molecular descriptors (features) may be suitable for establishing a strong correlation with the target property. In many applications all available descriptors are used. This can lead to difficult machine learning problems when thousands of descriptors are considered and only a few (e.g. fewer than one hundred) molecules are available for training.
The CoEPrA contest provides four data sets, which are typical for biological regression problems (few molecules in the training data set and thousands of descriptors). We applied the same two-step training procedure for all four regression tasks. In the first stage, we used optimized L1 regularization to select the most relevant features. Thus, the initial set of more than 6,000 features was reduced to about 50. In the second stage, we used only the selected features from the preceding stage applying a milder L2 regularization, which generally yielded further improvement of prediction performance. Our linear model employed a soft loss function which minimizes the influence of outliers.
The proposed two-step method showed good results on all four CoEPrA regression tasks. Thus, it may be useful for many other biological prediction problems where only a small number of molecules, each described by thousands of descriptors, are available for training.
BMC Bioinformatics 01/2011; 12:412. · 2.75 Impact Factor
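The two-stage procedure described in the abstract above (L1 regularization to select features, then a milder L2 regularization on the selected features only) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it uses a plain squared loss instead of the outlier-robust soft loss they describe, the regularization strengths are picked by hand rather than optimized, and the function names and toy data are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def coordinate_descent(X, y, l1, l2, n_iter=100):
    """Minimize (1/2n)||y - Xw||^2 + l1*||w||_1 + (l2/2)*||w||^2."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    resid = list(y)  # residual y - Xw, with w = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] + l2 == 0.0:
                continue
            # Correlation of feature j with the residual excluding feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, l1) / (col_sq[j] + l2)
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def two_step_fit(X, y, l1_strength, l2_strength):
    """Stage 1: L1 selects features. Stage 2: L2 refit on those features."""
    w1 = coordinate_descent(X, y, l1=l1_strength, l2=0.0)
    selected = [j for j, wj in enumerate(w1) if wj != 0.0]
    X_sel = [[row[j] for j in selected] for row in X]
    w2 = coordinate_descent(X_sel, y, l1=0.0, l2=l2_strength)
    return selected, w2


# Toy data (hypothetical): y is roughly 2*x0 - 1*x2, with a weak x1 term.
X = [[1, 1, 1, 1], [1, -1, -1, 1], [-1, 1, -1, 1], [-1, -1, 1, 1]]
y = [1.1, 2.9, -0.9, -3.1]
selected, weights = two_step_fit(X, y, l1_strength=0.5, l2_strength=0.1)
```

On this toy input the L1 stage keeps features 0 and 2 and drops the weak and irrelevant ones; the L2 refit then shrinks the surviving weights less than the L1 stage did, which is the motivation for a milder second stage. Real problems of the kind the abstract describes have far more features than samples, which this small example does not attempt to reproduce.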
-
ABSTRACT: Analysis of the functions and interactions of proteins and domains is important for understanding cellular systems and biological networks. Many methods for predicting protein-protein interactions have been developed. It is known that mutual information between residues at interacting sites can be higher than that at non-interacting sites. This rests on the idea that amino acid residues at interacting sites have coevolved with the corresponding residues in the partner proteins. Several studies have shown that such mutual information is useful for identifying contact residues in interacting proteins. Therefore, we focus on mutual information and propose a novel method using conditional random fields combined with mutual information between residues. In this method, protein-protein interactions are modeled using domain-domain interactions. We perform computational experiments and calculate the AUC (Area Under the Curve) score. The results suggest that our proposed model with mutual information is useful.
The Fourth International Conference on Computational Systems Biology (ISB2010), Suzhou, China; 09/2010