Hiroto Saigo

Max-Planck-Institut für Informatik, Saarbrücken, Saarland, Germany

Publications (13) · Total impact: 20.75

  • Article: Learning from past treatments and their outcome improves prediction of in vivo response to anti-HIV therapy.
    ABSTRACT: Infections with the human immunodeficiency virus type 1 (HIV-1) are treated with combinations of drugs. Unfortunately, HIV responds to treatment by developing resistance mutations. Consequently, the regions of the viral genome that encode the drug target proteins are sequenced and inspected for resistance mutations as part of routine diagnostic procedures to ensure effective treatment. To predict the response to a combination therapy, currently available computer-based methods take the genotype of the virus and the composition of the regimen as input. However, no available tool takes full advantage of knowledge about the order of, and the response to, previously prescribed regimens. Incorporating this history yields a high-dimensional feature space to which existing methods are difficult to apply directly. The machine learning system proposed in this work, sequence boosting, is tailored to exploiting such high-dimensional information, i.e., longitudinal features extracted from treatment histories, by building on recent advances in data mining and boosting. When applied to predicting the latest treatment outcome for 3,759 treatment-experienced patients from the EuResist integrated database, sequence boosting outperformed SVMs with RBF kernels. Moreover, sequence boosting provides easy access to the discriminative treatment information. Analysis of the feature importance values provided by our model confirmed known facts about HIV treatment. For instance, the use of potent and recently licensed drugs was beneficial for patients, and, conversely, patients who received NRTI monotherapies in the past have poor treatment prospects today. Furthermore, our model revealed novel biological insights. More precisely, previously used drugs combined with their in vivo responses are more informative than previously used drugs alone. Using this information improves the performance of systems for predicting therapy outcome.
    Statistical Applications in Genetics and Molecular Biology 01/2011; 10(1):Article 6. · 1.52 Impact Factor
  • Article: Reaction graph kernels predict EC numbers of unknown enzymatic reactions in plant secondary metabolism.
    ABSTRACT: Understanding secondary metabolic pathways in plants is essential for finding druggable candidate enzymes. However, the functions of many enzymes in organism-specific metabolic pathways have not yet been discovered. Toward identifying the functions of those enzymes, assigning EC numbers to the enzymatic reactions they catalyze plays a key role, since EC numbers categorize both enzymes and enzymatic reactions. We propose reaction graph kernels for automatically assigning EC numbers to unknown enzymatic reactions in a metabolic network. Reaction graph kernels compute the similarity between two chemical reactions by considering the similarity of the chemical compounds in each reaction and their relationships. In computational experiments based on the KEGG/REACTION database, our method predicted the first three digits of the EC number with 83% accuracy. We also exhaustively predicted missing EC numbers in plant secondary metabolic pathways. The predictions of reaction graph kernels for 36 unknown enzymatic reactions were compared with expert knowledge. Using the same evaluation data, we compared our method with E-zyme and showed that it assigns a larger number of accurate EC numbers. Reaction graph kernels are a new metric for comparing enzymatic reactions.
    BMC Bioinformatics 01/2010; 11 Suppl 1:S31. · 2.75 Impact Factor
  • Conference Proceeding: A Bayesian Approach to Graph Regression with Relevant Subgraph Selection.
    Silvia Chiappa, Hiroto Saigo, Koji Tsuda
    Proceedings of the SIAM International Conference on Data Mining, SDM 2009, April 30 - May 2, 2009, Sparks, Nevada, USA; 01/2009
  • Article: gBoost: a mathematical programming approach to graph classification and regression.
    Machine Learning. 01/2009; 75:69-89.
  • Conference Proceeding: Partial least squares regression for graph mining.
    Hiroto Saigo, Nicole Krämer, Koji Tsuda
    Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA, August 24-27, 2008; 01/2008
  • Conference Proceeding: Regression with interval output values.
    19th International Conference on Pattern Recognition (ICPR 2008), December 8-11, 2008, Tampa, Florida, USA; 01/2008
  • Article: Mining complex genotypic features for predicting HIV-1 drug resistance.
    Hiroto Saigo, Takeaki Uno, Koji Tsuda
    ABSTRACT: Human immunodeficiency virus type 1 (HIV-1) evolves in the human body, and exposure to a drug often causes mutations that increase resistance to that drug. To design an effective pharmacotherapy for an individual patient, it is important to accurately predict drug resistance from genotype data. Notably, resistance is not simply the sum of the effects of all mutations. Structural biological studies suggest that associations between mutations are crucial: even if mutation A or B alone does not affect resistance, a significant change might occur when the two mutations occur together. Linear regression methods cannot take these associations into account, while decision tree methods can reveal only limited associations. Kernel methods and neural networks implicitly use all possible associations for prediction but cannot select salient associations explicitly. Our method, itemset boosting, performs linear regression in the complete space of power sets of mutations. It implements a forward feature selection procedure in which, at each iteration, one mutation combination is found by an efficient branch-and-bound search. This method uses all possible combinations, and salient associations are shown explicitly. In experiments, our method worked particularly well for predicting resistance to nucleotide reverse transcriptase inhibitors (NRTIs). Furthermore, it successfully recovered many mutation associations known from the biological literature. Availability: http://www.kyb.mpg.de/bs/people/hiroto/iboost/. Supplementary data are available at Bioinformatics online.
    Bioinformatics 10/2007; 23(18):2455-62. · 5.47 Impact Factor
  • Article: Optimizing amino acid substitution matrices with a local alignment kernel.
    ABSTRACT: Detecting remote homologies by direct comparison of protein sequences remains a challenging task. We previously developed a similarity score between sequences, called a local alignment kernel, that performs well on this task in combination with a support vector machine. The local alignment kernel depends on an amino acid substitution matrix. Because the commonly used BLOSUM and PAM matrices were optimized for use with the Smith-Waterman algorithm, the matrices that are optimal for the local alignment kernel may differ. Unlike the local alignment score computed by the Smith-Waterman algorithm, the local alignment kernel is differentiable with respect to the amino acid substitution matrix, and its derivative can be computed efficiently by dynamic programming. We optimized the substitution matrix by classical gradient descent, using an objective function that measures how well the local alignment kernel discriminates homologs from non-homologs in the COG database. The local alignment kernel performs better with the matrices and gap parameters optimized by this procedure than with the matrices optimized for the Smith-Waterman algorithm. Furthermore, the matrices and gap parameters optimized for the local alignment kernel can also be used successfully by the Smith-Waterman algorithm. This optimization procedure yields useful substitution matrices both for the local alignment kernel and for the Smith-Waterman algorithm. The best homology-detection performance is obtained with the local alignment kernel.
    BMC Bioinformatics 02/2006; 7:246. · 2.75 Impact Factor
  • Article: A novel representation of protein sequences for prediction of subcellular location using support vector machines.
    ABSTRACT: As the number of complete genomes rapidly increases, accurate methods to automatically predict the subcellular location of proteins are increasingly useful to help their functional annotation. In order to improve the predictive accuracy of the many prediction methods developed to date, a novel representation of protein sequences is proposed. This representation involves local compositions of amino acids and twin amino acids, and local frequencies of distance between successive (basic, hydrophobic, and other) amino acids. For calculating the local features, each sequence is split into three parts: N-terminal, middle, and C-terminal. The N-terminal part is further divided into four regions to consider ambiguity in the length and position of signal sequences. We tested this representation with support vector machines on two data sets extracted from the SWISS-PROT database. Through fivefold cross-validation tests, overall accuracies of more than 87% and 91% were obtained for eukaryotic and prokaryotic proteins, respectively. It is concluded that considering the respective features in the N-terminal, middle, and C-terminal parts is helpful to predict the subcellular location.
    Protein Science 12/2005; 14(11):2804-13. · 2.80 Impact Factor
  • Article: Protein homology detection using string alignment kernels.
    ABSTRACT: Remote homology detection between protein sequences is a central problem in computational biology. Discriminative methods involving support vector machines (SVMs) are currently the most effective methods for the problem of superfamily recognition in the Structural Classification Of Proteins (SCOP) database. The performance of SVMs depends critically on the kernel function used to quantify the similarity between sequences. We propose new kernels for strings adapted to biological sequences, which we call local alignment kernels. These kernels measure the similarity between two sequences by summing up scores obtained from gapped local alignments of the sequences. When tested in combination with SVMs on their ability to recognize SCOP superfamilies on a benchmark dataset, the new kernels outperform state-of-the-art methods for remote homology detection. Software and data are available upon request.
    Bioinformatics 08/2004; 20(11):1682-9. · 5.47 Impact Factor
  • Article: Comparison of SVM-based methods for remote homology detection.
    Genome Informatics. 01/2002; 13:396-397.
  • Conference Proceeding: Iterative Subgraph Mining for Principal Component Analysis.
    Hiroto Saigo, Koji Tsuda
    Proceedings of the IEEE International Conference on Data Mining (ICDM 2008), IEEE Computer Society; eds. Fosca Giannotti, Dimitrios Gunopulos; 2008
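The sequence boosting abstract (anti-HIV therapy, 2011) describes extracting longitudinal features from the order of previously prescribed regimens and their in vivo responses. The Python sketch below illustrates only that feature space, under stated assumptions. Each past treatment is encoded as a hypothetical (regimen, response) event, and features are ordered subsequences of up to `max_len` events, with gaps allowed. It is not the published sequence boosting algorithm, which couples pattern mining with boosting. The names `subsequence_features` and `drugs_only` and the made-up patient history are illustrative only.

```python
from itertools import combinations

# Hypothetical event encoding: (regimen, response), in chronological order.
Event = tuple[str, str]


def subsequence_features(history: list[Event], max_len: int = 2) -> set[tuple[Event, ...]]:
    """Return every ordered subsequence of 1..max_len events.

    combinations() yields index tuples in increasing order, so the
    chronological order of events is preserved; gaps between events are allowed.
    """
    features: set[tuple[Event, ...]] = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(history)), k):
            features.add(tuple(history[i] for i in idx))
    return features


def drugs_only(history: list[Event]) -> list[Event]:
    """Mask the response so that only the regimen order remains informative."""
    return [(regimen, "?") for regimen, _ in history]


# Made-up history for illustration; not data from the paper.
history = [("AZT", "failure"), ("AZT+3TC", "failure"), ("TDF+FTC+EFV", "success")]
with_response = subsequence_features(history)
without_response = subsequence_features(drugs_only(history))
```

Comparing `with_response` with `without_response` mirrors, in a toy way, the abstract's contrast between drug history with responses and drug history alone. In a real system each feature would become a binary input to the learner.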
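The itemset boosting abstract (HIV-1 drug resistance, 2007) describes forward feature selection over the power set of mutations, where each round adds one mutation combination. The Python sketch below illustrates that idea under an assumed simple least-squares boosting update. Unlike the published method, it enumerates candidate itemsets exhaustively up to a size cap instead of using branch-and-bound search, so it only scales to small mutation sets. The function names, the `max_size` cap and the toy data are illustrative assumptions.

```python
from itertools import combinations


def itemset_boost(samples, y, n_iter=3, max_size=2):
    """Greedy least-squares boosting over mutation itemsets (sketch).

    samples: list of sets of mutation labels; y: list of resistance values.
    Each round picks the itemset S whose indicator feature
    x_S(i) = 1 if S is a subset of samples[i] else 0 most reduces the
    squared residual, then fits its weight by least squares.
    Candidates are enumerated exhaustively (no branch-and-bound pruning).
    """
    items = sorted(set().union(*samples))
    candidates = [frozenset(c) for k in range(1, max_size + 1)
                  for c in combinations(items, k)]
    intercept = sum(y) / len(y)
    residual = [v - intercept for v in y]
    model = []  # list of (itemset, weight)
    for _ in range(n_iter):
        best, best_gain, best_w = None, 0.0, 0.0
        for s in candidates:
            x = [1.0 if s <= m else 0.0 for m in samples]
            nx = sum(x)  # x is 0/1, so sum(x) equals the sum of x_i squared
            if nx == 0:
                continue
            dot = sum(r * xi for r, xi in zip(residual, x))
            gain = dot * dot / nx  # drop in squared error from fitting w * x
            if gain > best_gain:
                best, best_gain, best_w = s, gain, dot / nx
        if best is None:
            break  # no candidate reduces the residual
        model.append((best, best_w))
        residual = [r - best_w * (1.0 if best <= m else 0.0)
                    for r, m in zip(residual, samples)]
    return intercept, model


def predict(intercept, model, mutations):
    """Intercept plus the weights of all selected itemsets present in `mutations`."""
    return intercept + sum(w for s, w in model if s <= mutations)


# Toy data: resistance rises only when both mutations co-occur (made-up values).
samples = [{"M41L", "T215Y"}, {"M41L"}, {"T215Y"}, set()]
y = [3.0, 0.0, 0.0, 0.0]
intercept, model = itemset_boost(samples, y, n_iter=1)
```

On this toy data the first round selects the pair {M41L, T215Y} rather than either single mutation. This is the kind of association that a purely additive linear model over single mutations could not express.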
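The subcellular-location abstract (Protein Science, 2005) describes splitting each sequence into N-terminal, middle and C-terminal parts, with the N-terminal part further divided into four regions. The Python sketch below computes only the local amino acid composition for such a split. The twin-amino-acid and distance-frequency features are omitted. The part lengths (`n_len`, `c_len`) and the equal-width N-terminal regions are assumptions, not values taken from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment: str) -> list[float]:
    """Fraction of each of the 20 standard amino acids in `segment`."""
    n = len(segment)
    return [segment.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def local_composition_features(seq: str, n_len: int = 40, c_len: int = 20) -> list[float]:
    """Concatenate compositions of 4 N-terminal regions, the middle, and the C-terminus.

    n_len and c_len are hypothetical part lengths. The N-terminal part is split
    into four equal regions; any remainder when n_len is not a multiple of 4
    is ignored.
    """
    if len(seq) <= n_len + c_len:
        raise ValueError("sequence too short for the chosen part lengths")
    n_term = seq[:n_len]
    middle = seq[n_len:len(seq) - c_len]
    c_term = seq[len(seq) - c_len:]
    q = n_len // 4
    parts = [n_term[i * q:(i + 1) * q] for i in range(4)] + [middle, c_term]
    features: list[float] = []
    for part in parts:
        features.extend(composition(part))
    return features  # 6 parts x 20 amino acids = 120 values
```

The resulting fixed-length vector could then be passed to an SVM. This is where the abstract's representation plugs in, regardless of the exact region boundaries.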
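The string alignment kernel abstract (Bioinformatics, 2004) describes local alignment kernels that sum scores over all gapped local alignments of two sequences. The Python sketch below implements that sum with a dynamic-programming recursion over affine gaps, as we understand the published formulation. Each alignment contributes exp(beta × score), and the empty alignment contributes 1. The parameters `beta`, `gap_open` and `gap_extend` and the toy match/mismatch `toy_score` are illustrative assumptions; `toy_score` stands in for a real substitution matrix such as BLOSUM62. The code uses raw exponentials rather than logarithms for readability, so long sequences will overflow.

```python
import math


def la_kernel(x, y, score, beta=0.5, gap_open=11.0, gap_extend=1.0):
    """Sum of exp(beta * score) over all local alignments of x and y (sketch).

    M: alignments ending with x[i-1] aligned to y[j-1];
    X / Y: alignments currently inside a gap in y / in x (affine costs);
    X2 / Y2: running sums of M used to let alignments end anywhere.
    """
    n, m = len(x), len(y)
    od = math.exp(-beta * gap_open)    # weight for opening a gap
    oe = math.exp(-beta * gap_extend)  # weight for extending a gap

    def table():
        return [[0.0] * (m + 1) for _ in range(n + 1)]

    M, X, Y, X2, Y2 = table(), table(), table(), table(), table()
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = math.exp(beta * score(x[i - 1], y[j - 1])) * (
                1.0 + X[i - 1][j - 1] + Y[i - 1][j - 1] + M[i - 1][j - 1])
            X[i][j] = od * M[i - 1][j] + oe * X[i - 1][j]
            Y[i][j] = od * (M[i][j - 1] + X[i][j - 1]) + oe * Y[i][j - 1]
            X2[i][j] = M[i - 1][j] + X2[i - 1][j]
            Y2[i][j] = M[i][j - 1] + X2[i][j - 1] + Y2[i][j - 1]
    return 1.0 + X2[n][m] + Y2[n][m] + M[n][m]


def toy_score(a, b):
    """Hypothetical substitution score: +2 for a match, -1 for a mismatch."""
    return 2.0 if a == b else -1.0


k = la_kernel("HEAGAWGHEE", "PAWHEAE", toy_score)
```

Because the value sums over all alignments rather than keeping only the best one, it varies smoothly with the substitution scores. That is the differentiability the substitution-matrix optimization abstract (BMC Bioinformatics, 2006) relies on.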