See-Kiong Ng

National University of Singapore, Singapore, Singapore

Publications (92) · 119.68 Total impact

  • Hong Cao, Xiao-Li Li, D.Y.-K. Woon, See-Kiong Ng
    ABSTRACT: This paper proposes a novel Integrated Oversampling (INOS) method that can handle highly imbalanced time series classification. We introduce an enhanced structure preserving oversampling (ESPO) technique and synergistically combine it with interpolation-based oversampling. ESPO is used to generate a large percentage of the synthetic minority samples based on multivariate Gaussian distribution, by estimating the covariance structure of the minority-class samples and by regularizing the unreliable eigen spectrum. To protect the key original minority samples, we use an interpolation-based technique to oversample a small percentage of synthetic population. By preserving the main covariance structure and intelligently creating protective variances in the trivial eigen dimensions, ESPO effectively expands the synthetic samples into the void area in the data space without being too closely tied with existing minority-class samples. This also addresses a key challenge for applying oversampling for imbalanced time series classification, i.e., maintaining the correlation between consecutive values through preserving the main covariance structure. Extensive experiments based on seven public time series data sets demonstrate that our INOS approach, used with support vector machines (SVM), achieved better performance over existing oversampling methods as well as state-of-the-art methods in time series classification.
    IEEE Transactions on Knowledge and Data Engineering 12/2013; 25(12):2809-2822. · 1.89 Impact Factor
  • Henry Han, Xiao-Li Li, See-Kiong Ng, Zhou Ji
    ABSTRACT: While high-throughput technologies are expected to play a critical role in clinical translational research for complex disease diagnosis, the ability to accurately and consistently discriminate disease phenotypes by determining the gene and protein expression patterns as signatures of different clinical conditions remains a challenge in translational bioinformatics. In this study, we propose a novel feature selection algorithm: Multi-Resolution-Test (MRT-test) that can produce significantly accurate and consistent phenotype discrimination across a series of omics data. Our algorithm can capture those features contributing to subtle data behaviors instead of selecting the features contributing to global data behaviors, which seems to be essential in achieving clinical level diagnosis for different expression data. Furthermore, as an effective biomarker discovery algorithm, it can achieve linear separation for high-dimensional omics data with few biomarkers. We apply our MRT-test to complex disease phenotype diagnosis by combining it with state-of-the-art classifiers and attain exceptional diagnostic results, which suggests our method's advantage in molecular diagnostics. Experimental evaluation showed that MRT-test based diagnosis is able to generate consistent and robust clinical-level phenotype separation for various diseases. In addition, based on the seed biomarkers detected by the MRT-test, we design a novel network marker synthesis (NMS) algorithm to decipher the underlying molecular mechanisms of tumorigenesis from a systems viewpoint. Unlike existing top-down gene network building approaches, our network marker synthesis method depends less on the global network, which enables it to capture the gene regulators for different subnetwork markers, providing biologically meaningful insights for understanding the genetic basis of complex diseases.
    Journal of Bioinformatics and Computational Biology 12/2013; 11(6):1343010. · 0.93 Impact Factor
  • Source
    ABSTRACT: Background: Many biological processes are carried out by proteins interacting with each other in the form of protein complexes. However, large-scale detection of protein complexes has remained constrained by experimental limitations. As such, computational detection of protein complexes by applying clustering algorithms on the abundantly available protein-protein interaction (PPI) networks is an important alternative. However, many current algorithms have overlooked the importance of selecting seeds for expansion into clusters without excluding important proteins and including many noisy ones, while ensuring a high degree of functional homogeneity amongst the proteins detected for the complexes. Results: We designed a novel method called Probabilistic Local Walks (PLW) which clusters regions in a PPI network with high functional similarity to find protein complex cores with high precision and efficiency in O(|V| log |V| + |E|) time. A seed selection strategy, which prioritises seeds with dense neighbourhoods, was devised. We defined a topological measure, called common neighbour similarity, to estimate the functional similarity of two proteins given the number of their common neighbours. Conclusions: Our proposed PLW algorithm achieved the highest F-measure (recall and precision) when compared to 11 state-of-the-art methods on yeast protein interaction data, with an improvement of 16.7% over the next highest score. Our experiments also demonstrated that our seed selection strategy is able to increase algorithm precision when applied to three previous protein complex mining techniques. Availability: The software, datasets and predicted complexes are available at http://wonglkd.github.io/PLW
    BMC Genomics 10/2013; 14(5). · 4.40 Impact Factor
  • Willy Hugo, Wing-Kin Sung, See-Kiong Ng
    ABSTRACT: Many important biological processes, such as the signaling pathways, require protein-protein interactions (PPIs) that are designed for fast response to stimuli. These interactions are usually transient, easily formed, and disrupted, yet specific. Many of these transient interactions involve the binding of a protein domain to a short stretch (3-10) of amino acid residues, which can be characterized by a sequence pattern, i.e., a short linear motif (SLiM). We call these interacting domains and motifs domain-SLiM interactions. Existing methods have focused on discovering SLiMs in the interacting proteins' sequence data. With the recent increase in protein structures, we have a new opportunity to detect SLiMs directly from the proteins' 3D structures instead of their linear sequences. In this chapter, we describe a computational method called SLiMDIet to directly detect SLiMs on domain interfaces extracted from 3D structures of PPIs. SLiMDIet comprises two steps: (1) interaction interfaces belonging to the same domain are extracted and grouped together using structural clustering and (2) the extracted interaction interfaces in each cluster are structurally aligned to extract the corresponding SLiM. Using SLiMDIet, de novo SLiMs interacting with protein domains can be computationally detected from structurally clustered domain-SLiM interactions for PFAM domains which have available 3D structures in the PDB database.
    Methods in molecular biology (Clifton, N.J.) 01/2013; 939:9-20. · 1.29 Impact Factor
  • Source
    ABSTRACT: Living cells are realized by complex gene expression programs that are moderated by regulatory proteins called transcription factors (TFs). The TFs control the differential expression of target genes in the context of transcriptional regulatory networks (TRNs), either individually or in groups. Deciphering the mechanisms of how the TFs control the differential expression of a target gene in a TRN is challenging, especially when multiple TFs collaboratively participate in the transcriptional regulation. To unravel the roles of the TFs in the regulatory networks, we model the underlying regulatory interactions in terms of the TF-target interactions' directions (activation or repression) and their corresponding logical roles (necessary and/or sufficient). We design a set of constraints that relate gene expression patterns to regulatory interaction models, and develop TRIM (Transcriptional Regulatory Interaction Model Inference), a new hidden Markov model, to infer the models of TF-target interactions in large-scale TRNs of complex organisms. Besides, by training TRIM with wild-type time-series gene expression data, the activation timepoints of each regulatory module can be obtained. To demonstrate the advantages of TRIM, we applied it on yeast TRN to infer the TF-target interaction models for individual TFs as well as pairs of TFs in collaborative regulatory modules. By comparing with TF knockout and other gene expression data, we were able to show that the performance of TRIM is clearly higher than DREM (the best existing algorithm). In addition, on an individual Arabidopsis binding network, we showed that the target genes' expression correlations can be significantly improved by incorporating the TF-target regulatory interaction models inferred by TRIM into the expression data analysis, which may introduce new knowledge in transcriptional dynamics and bioactivation.
    Journal of Bioinformatics and Computational Biology 10/2012; 10(5):1250012. · 0.93 Impact Factor
  • Source
    ABSTRACT: Identifying disease genes from the human genome is an important but challenging task in biomedical research. Machine learning methods can be applied to discover new disease genes based on the known ones. Existing machine learning methods typically use the known disease genes as the positive training set P and the unknown genes as the negative training set N (a non-disease gene set does not exist) to build classifiers to identify new disease genes from the unknown genes. However, such classifiers are actually built from a noisy negative set N, as there can be unknown disease genes in N itself. As a result, the classifiers do not perform as well as they could. Instead of treating the unknown genes as negative examples in N, we treat them as an unlabeled set U. We design a novel positive-unlabeled (PU) learning algorithm PUDI (PU learning for disease gene identification) to build a classifier using P and U. We first partition U into four sets, namely, reliable negative set RN, likely positive set LP, likely negative set LN and weak negative set WN. The weighted support vector machines are then used to build a multi-level classifier based on the four training sets and positive training set P to identify disease genes. Our experimental results demonstrate that our proposed PUDI algorithm outperformed the existing methods significantly. Conclusion: The proposed PUDI algorithm is able to identify disease genes more accurately by treating the unknown data more appropriately as unlabeled set U instead of negative set N. Given that many machine learning problems in biomedical research do involve positive and unlabeled data instead of negative data, it is possible that the machine learning methods for these problems can be further improved by adopting PU learning methods, as we have done here for disease gene identification. Availability and implementation: The executable program and data are available at http://www1.i2r.a-star.edu.sg/∼xlli/PUDI/PUDI.html. Contact: xlli@i2r.a-star.edu.sg or yang0293@e.ntu.edu.sg. Supplementary data are available at Bioinformatics online.
    Bioinformatics 08/2012; 28(20):2640-7. · 5.47 Impact Factor
  • Eng-Yeow Cheu, Chai Quek, See-Kiong Ng
    ABSTRACT: Appetitive operant conditioning in Aplysia for feeding behavior via the electrical stimulation of the esophageal nerve contingently reinforces each spontaneous bite during the feeding process. This results in the acquisition of operant memory by the contingently reinforced animals. Analysis of the cellular and molecular mechanisms of the feeding motor circuitry revealed that activity-dependent neuronal modulation occurs at the interneurons that mediate feeding behaviors. This provides evidence that interneurons are possible loci of plasticity and constitute another mechanism for memory storage in addition to memory storage attributed to activity-dependent synaptic plasticity. In this paper, an associative ambiguity correction-based neuro-fuzzy network, called appetitive reward-based pseudo-outer-product-compositional rule of inference [ARPOP-CRI(S)], is trained based on an appetitive reward-based learning algorithm which is biologically inspired by the appetitive operant conditioning of the feeding behavior in Aplysia. A variant of the Hebbian learning rule called Hebbian concomitant learning is proposed as the building block in the neuro-fuzzy network learning algorithm. The proposed algorithm possesses the distinguishing features of the sequential learning algorithm. In addition, the proposed ARPOP-CRI(S) neuro-fuzzy system encodes fuzzy knowledge in the form of linguistic rules that satisfies the semantic criteria for low-level fuzzy model interpretability. ARPOP-CRI(S) is evaluated and compared against other modeling techniques using benchmark time-series datasets. Experimental results are encouraging and show that ARPOP-CRI(S) is a viable modeling technique for time-variant problem domains.
    IEEE transactions on neural networks and learning systems 01/2012; 23:317-329. · 4.37 Impact Factor
  • Source
    ABSTRACT: We present a system called AssocExplorer to support exploratory data analysis via association rule visualization and exploration. AssocExplorer is designed by following the visual information-seeking mantra: overview first, zoom and filter, then details on demand. It effectively uses coloring to deliver information so that users can easily detect things that are interesting to them. If users find a rule interesting, they can explore related rules for further analysis, which allows users to find interesting phenomena that are difficult to detect when rules are examined separately. Our system also allows users to compare rules and inspect rules with similar item composition but different statistics so that the key factors that contribute to the difference can be isolated.
    01/2012;
  • Willy Hugo, See-Kiong Ng, Wing-Kin Sung
    ABSTRACT: Many biologically important protein-protein interactions (PPIs) have been found to be mediated by short linear motifs (SLiMs). These interactions are mediated by the binding of a protein domain, often with a nonlinear interaction interface, to a SLiM. We propose a method called D-SLIMMER to mine for SLiMs in PPI data on the basis of the interaction density between a nonlinear motif (i.e., a protein domain) in one protein and a SLiM in the other protein. Our results on a benchmark of 113 experimentally verified reference SLiMs showed that D-SLIMMER notably outperformed existing methods for discovering domain-SLiM interaction motifs. To illustrate the significance of the SLiMs detected, we highlighted two SLiMs discovered from the PPI data by D-SLIMMER that are variants of the known ELM SLiM, as well as a literature-backed SLiM that is yet to be listed in the reference databases. We also presented a novel SLiM predicted by D-SLIMMER that was strongly supported by the existing biological literature. These examples showed that D-SLIMMER is able to find SLiMs that are biologically relevant.
    Journal of Proteome Research 12/2011; 10(12):5285-95. · 5.06 Impact Factor
  • Source
    ABSTRACT: Many cellular functions involve protein complexes that are formed by multiple interacting proteins. Tandem Affinity Purification (TAP) is a popular experimental method for detecting such multi-protein interactions. However, current computational methods that predict protein complexes from TAP data require converting the co-complex relationships in TAP data into binary interactions. The resulting pairwise protein-protein interaction (PPI) network is then mined for densely connected regions that are identified as putative protein complexes. Converting the TAP data into PPI data not only introduces errors but also loses useful information about the underlying multi-protein relationships that can be exploited to detect the internal organization (i.e., core-attachment structures) of protein complexes. In this article, we propose a method called CACHET that detects protein complexes with Core-AttaCHment structures directly from bipartitE TAP data. CACHET models the TAP data as a bipartite graph in which the two vertex sets are the baits and the preys, respectively. The edges between the two vertex sets represent bait-prey relationships. CACHET first focuses on detecting high-quality protein-complex cores from the bipartite graph. To minimize the effects of false positive interactions, the bait-prey relationships are indexed with reliability scores. Only non-redundant, reliable bicliques computed from the TAP bipartite graph are regarded as protein-complex cores. CACHET constructs protein complexes by including attachment proteins into the cores. We applied CACHET on large-scale TAP datasets and found that CACHET outperformed existing methods in terms of prediction accuracy (i.e., F-measure and functional homogeneity of predicted complexes). In addition, the protein complexes predicted by CACHET are equipped with core-attachment structures that provide useful biological insights into the inherent functional organization of protein complexes. Our supplementary material can be found at http://www1.i2r.a-star.edu.sg/∼xlli/CACHET/CACHET.htm ; binary executables can also be found there. Supplementary Material is also available at www.liebertonline.com/cmb .
    Journal of computational biology: a journal of computational molecular cell biology 07/2011; 19(9):1027-42. · 1.69 Impact Factor
  • Source
    ABSTRACT: Hypothesis testing is a well-established tool for scientific discovery. Conventional hypothesis testing is carried out in a hypothesis-driven manner. A scientist must first formulate a hypothesis based on his/her knowledge and experience, and then devise a variety of experiments to test it. Given the rapid growth of data, it has become virtually impossible for a person to manually inspect all the data to find all the interesting hypotheses for testing. In this paper, we propose and develop a data-driven system for automatic hypothesis testing and analysis. We define a hypothesis as a comparison between two or more sub-populations. We find sub-populations for comparison using frequent pattern mining techniques and then pair them up for statistical testing. We also generate additional information for further analysis of the hypotheses that are deemed significant. We conducted a set of experiments to show the efficiency of the proposed algorithms, and the usefulness of the generated hypotheses. The results show that our system can help users (1) identify significant hypotheses; (2) isolate the reasons behind significant hypotheses; and (3) find confounding factors that form Simpson's Paradoxes with discovered significant hypotheses.
    Data Engineering (ICDE), 2011 IEEE 27th International Conference on; 05/2011
  • Source
    ABSTRACT: People regularly attend various social events to interact with other community members. For example, researchers attend conferences to present their work and to network with other researchers. In this paper, we propose an Event-based COmmunity DEtection algorithm ECODE to mine the underlying community substructures of social networks from event information. Unlike conventional approaches, ECODE makes use of content similarity-based virtual links which are found to be more useful for community detection than the physical links. By performing partial computation between an event and its candidate relevant set instead of computing pair-wise similarities between all the events, ECODE is able to achieve significant computational speedup. Extensive experimental results and comparisons with other existing methods showed that our ECODE algorithm is both efficient and effective in detecting communities from social networks. Keywords: social network mining, community detection, virtual links
    04/2011: pages 22-37;
  • Source
    Wei Wei, Gao Cong, Xiaoli Li, See-Kiong Ng, Guohui Li
    Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2011, San Francisco, California, USA, August 7-11, 2011; 01/2011
  • Source
    ABSTRACT: This paper presents a novel structure preserving over sampling (SPO) technique for classifying imbalanced time series data. SPO generates synthetic minority samples based on multivariate Gaussian distribution by estimating the covariance structure of the minority class and regularizing the unreliable eigen spectrum. By preserving the main covariance structure and intelligently creating protective variances in the trivial eigen feature dimensions, the synthetic samples expand effectively into the void area in the data space without being too closely tied with existing minority-class samples. Extensive experiments based on several public time series datasets demonstrate that our proposed SPO in conjunction with support vector machines can achieve better performances than existing over sampling methods and state-of-the-art methods in time series classification.
    11th IEEE International Conference on Data Mining, ICDM 2011, Vancouver, BC, Canada, December 11-14, 2011; 01/2011
  • Source
    ABSTRACT: People regularly attend various social events to interact with other community members. For example, researchers attend conferences to present their work and to network with other researchers. In this paper, we propose an Event-based COmmunity DEtection algorithm ECODE to mine the underlying community substructures of social networks from event information. Unlike conventional approaches, ECODE makes use of content similarity-based virtual links which are found to be more useful for community detection than the physical links. By performing partial computation between an event and its candidate relevant set instead of computing pair-wise similarities between all the events, ECODE is able to achieve significant computational speedup. Extensive experimental results and comparisons with other existing methods showed that our ECODE algorithm is both efficient and effective in detecting communities from social networks.
    Database Systems for Advanced Applications - 16th International Conference, DASFAA 2011, Hong Kong, China, April 22-25, 2011, Proceedings, Part I; 01/2011
  • Source
    ABSTRACT: Phenotypically similar diseases have been found to be caused by functionally related genes, suggesting a modular organization of the genetic landscape of human diseases that mirrors the modularity observed in biological interaction networks. Protein complexes, as molecular machines that integrate multiple gene products to perform biological functions, express the underlying modular organization of protein-protein interaction networks. As such, protein complexes can be useful for interrogating the networks of phenome and interactome to elucidate gene-phenotype associations of diseases. We proposed a technique called RWPCN (Random Walker on Protein Complex Network) for predicting and prioritizing disease genes. The basis of RWPCN is a protein complex network constructed using existing human protein complexes and protein interaction network. To prioritize candidate disease genes for the query disease phenotypes, we compute the associations between the protein complexes and the query phenotypes in their respective protein complex and phenotype networks. We tested RWPCN on predicting gene-phenotype associations using leave-one-out cross-validation; our method was observed to outperform existing approaches. We also applied RWPCN to predict novel disease genes for two representative diseases, namely, Breast Cancer and Diabetes. Guilt-by-association prediction and prioritization of disease genes can be enhanced by fully exploiting the underlying modular organizations of both the disease phenome and the protein interactome. Our RWPCN uses a novel protein complex network as a basis for interrogating the human phenome-interactome network. As the protein complex network can capture the underlying modularity in the biological interaction networks better than simple protein interaction networks, RWPCN was found to be able to detect and prioritize disease genes better than traditional approaches that used only protein-phenotype associations.
    PLoS ONE 01/2011; 6(7):e21502. · 3.53 Impact Factor
  • Minh Nhut Nguyen, Xiaoli Li, See-Kiong Ng
    ABSTRACT: In many real-world applications of the time series classification problem, not only could the negative training instances be missing, the number of positive instances available for learning may also be rather limited. This has motivated the development of new classification algorithms that can learn from a small set P of labeled seed positive instances augmented with a set U of unlabeled instances (i.e. PU learning algorithms). However, existing PU learning algorithms for time series classification have less than satisfactory performance as they are unable to identify the class boundary between positive and negative instances accurately. In this paper, we propose a novel PU learning algorithm LCLC (Learning from Common Local Clusters) for time series classification. LCLC is designed to effectively identify the ground truths' positive and negative boundaries, resulting in more accurate classifiers than those constructed using existing methods. We have applied LCLC to classify time series data from different application domains; the experimental results demonstrate that LCLC out-performs existing methods significantly.
    IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011; 01/2011
  • ABSTRACT: Publishing individual specific microdata has serious privacy implications. The k-anonymity model has been proposed to prevent identity disclosure from microdata, and the work on l-diversity and t-closeness attempts to address attribute disclosure. However, most current work deals only with publishing microdata with a single sensitive attribute (SA), whereas real life scenarios often involve microdata with multiple SAs that may be multi-valued. This paper explores the issue of attribute disclosure in such scenarios. We propose a method called CODIP (Complete Disjoint Projections) that outlines a general solution to deal with the shortcomings in a naïve approach. We also introduce two measures, Association Loss Ratio and Information Exposure Ratio, to quantify data quality and privacy, respectively. We further propose a heuristic CODIP* for CODIP, which obtains a good trade-off in data quality and privacy. Finally, initial experiments show that CODIP* is practically useful on varying numbers of SAs.
    Database and Expert Systems Applications - 22nd International Conference, DEXA 2011, Toulouse, France, August 29 - September 2, 2011. Proceedings, Part I; 01/2011
  • ABSTRACT: MOTIVATION: An important class of protein interactions involves the binding of a protein's domain to a short linear motif (SLiM) on its interacting partner. Extracting such motifs, either experimentally or computationally, is challenging because of their weak binding and high degree of degeneracy. The recent rapid increase in available protein structures provides an excellent opportunity to study SLiMs directly from their 3D structures. RESULTS: Using domain interface extraction (Diet), we characterized 452 distinct SLiMs from the Protein Data Bank (PDB), of which 155 are validated to varying degrees: 40 have literature validation, 54 are supported by at least one domain-peptide structural instance, and another 61 have overrepresentation in high-throughput PPI data. We further observed that the lacklustre coverage of existing computational SLiM detection methods could be due to the common assumption that most SLiMs occur outside globular domain regions. 198 of the 452 SLiMs that we reported are actually found on domain-domain interfaces; some of them are implicated in autoimmune and neurodegenerative diseases. We suggest that these SLiMs would be useful for designing inhibitors against the pathogenic protein complexes underlying these diseases. Our findings show that 3D structure-based SLiM detection algorithms can provide a more complete coverage of SLiM-mediated protein interactions than current sequence-based approaches.
    Bioinformatics 02/2010; 26(8):1036-42. · 5.47 Impact Factor
  • Eng-Yeow Cheu, Chai Quek, See-Kiong Ng
    ABSTRACT: Appetitive operant conditioning in Aplysia for feeding behavior, via electrical stimulation of the esophageal nerve contingently reinforced upon each spontaneous bite, resulted in contingently reinforced animals acquiring operant memory. Analysis of the cellular and molecular mechanisms of the feeding motor circuitry revealed that activity-dependent neuronal modulation occurs at interneurons that mediate the feeding behaviors, providing evidence that interneurons are possible loci of plasticity and contribute a mechanism to memory storage in addition to memory storage contributed by activity-dependent synaptic plasticity. In this paper, an associative ambiguity correction-based neuro-fuzzy network, called ARPOP-CRI(S), is trained based on an appetitive reward learning algorithm that is biologically inspired by the appetitive operant conditioning of feeding behavior in Aplysia. ARPOP-CRI(S) is evaluated and compared with other modelling techniques by employing benchmark time series data sets. Experimental results are encouraging and show that ARPOP-CRI(S) is a viable modelling technique for time series forecasting.
    International Joint Conference on Neural Networks, IJCNN 2010, Barcelona, Spain, 18-23 July, 2010; 01/2010

Publication Stats

1k Citations
119.68 Total Impact Points

Institutions

  • 2005–2013
    • National University of Singapore
      • School of Computing
      Singapore, Singapore
  • 2003–2013
    • Institute for Infocomm Research
      Tumasik, Singapore
  • 2008–2012
    • Nanyang Technological University
      • School of Computer Engineering
      Singapore, Singapore
  • 2011
    • Singapore University of Technology and Design
      Singapore
  • 2006
    • Korea Advanced Institute of Science and Technology
      • Department of Computer Science
      Seoul, Seoul, South Korea
  • 2003–2006
    • Genome Institute of Singapore
      Tumasik, Singapore