Haesun Park

Georgia Institute of Technology, Atlanta, GA, USA

Publications (73) · 42.17 Total impact

  • Conference Proceeding: iVisClassifier: An interactive visual analytics system for classification based on supervised dimension reduction
    ABSTRACT: We present an interactive visual analytics system for classification, iVisClassifier, based on a supervised dimension reduction method, linear discriminant analysis (LDA). Given high-dimensional data and associated cluster labels, LDA gives their reduced dimensional representation, which provides a good overview of the cluster structure. Instead of a single two- or three-dimensional scatter plot, iVisClassifier lets users interact with all the reduced dimensions obtained by LDA through parallel coordinates and a scatter plot. Furthermore, it significantly improves the interactivity and interpretability of LDA: it enables users to understand each of the reduced dimensions and how they influence the data by reconstructing the basis vectors in the original data domain. Using heat maps, iVisClassifier gives an overview of the cluster relationships in terms of pairwise distances between cluster centroids, both in the original space and in the reduced dimensional space. Equipped with these functionalities, iVisClassifier supports users' classification tasks efficiently. Using several facial image data sets, we show how the above analysis is performed.
    Visual Analytics Science and Technology (VAST), 2010 IEEE Symposium on; 11/2010
  • Conference Proceeding: p-ISOMAP: An Efficient Parametric Update for ISOMAP for Visual Analytics.
    Proceedings of the SIAM International Conference on Data Mining, SDM 2010, April 29 - May 1, 2010, Columbus, Ohio, USA; 01/2010
  • Conference Proceeding: Two-stage framework for visualization of clustered high dimensional data
    Jaegul Choo, S. Bohn, Haesun Park
    ABSTRACT: In this paper, we discuss dimension reduction methods for 2D visualization of high-dimensional clustered data. We propose a two-stage framework for visualizing such data based on dimension reduction methods. In the first stage, we obtain the reduced dimensional data by applying a supervised dimension reduction method such as linear discriminant analysis, which preserves the original cluster structure in terms of its criteria. The resulting optimal reduced dimension depends on the optimization criteria and is often larger than 2. In the second stage, the dimension is further reduced to 2 for visualization purposes by another dimension reduction method such as principal component analysis. The role of the second stage is to minimize the loss of information due to reducing the dimension all the way to 2. Using this framework, we propose several two-stage methods and present their theoretical characteristics as well as experimental comparisons on both artificial and real-world text data sets.
    Visual Analytics Science and Technology, 2009. VAST 2009. IEEE Symposium on; 11/2009
  • Conference Proceeding: Linear discriminant analysis for data with subcluster structure
    ABSTRACT: Linear discriminant analysis (LDA) is a widely used feature extraction method in classification. However, the original LDA has limitations due to its assumption of a unimodal structure for each cluster, which is not satisfied in many applications, such as facial image data, where variations such as angle and illumination can significantly influence the images of the same person. In this paper, we propose a novel method, hierarchical LDA (h-LDA), which takes into account hierarchical subcluster structures in the data sets. Our experiments show that regularized h-LDA produces better accuracy than LDA, PCA, and TensorFaces.
    Pattern Recognition, 2008. ICPR 2008. 19th International Conference on; 01/2009
  • Conference Proceeding: Hierarchical Linear Discriminant Analysis for Beamforming.
    Jaegul Choo, Barry L. Drake, Haesun Park
    Proceedings of the SIAM International Conference on Data Mining, SDM 2009, April 30 - May 2, 2009, Sparks, Nevada, USA; 01/2009
  • Article: Linear Discriminant Analysis for Subclustered Data
    Jaegul Choo, Barry L Drake, Haesun Park
    ABSTRACT: Linear discriminant analysis (LDA) is a widely used feature extraction method in classification. However, the original LDA has limitations due to its assumption of a unimodal structure for each cluster, which is not satisfied in many applications, such as facial image data, where variations such as angle and illumination can significantly influence the images. In this paper, we propose a novel method called hierarchical LDA (h-LDA), which takes into account hierarchical subcluster structures of the data in the LDA formulation and algorithm. We develop a theoretical basis for hierarchical LDA by identifying its relation to two-way multivariate analysis of variance (MANOVA) based on the data model and variance decomposition. Furthermore, an efficient algorithm for a regularized version of h-LDA (h-RLDA) is presented using the QR decomposition and the generalized SVD. To validate the effectiveness of the proposed method, we compare face recognition performance among h-RLDA, LDA, PCA, and TensorFaces. Our experiments show that h-RLDA produces better prediction accuracy than the other methods. When only a small subset of features is used (reduced dimensionality), the superiority of h-RLDA over the other methods becomes more significant. It is also shown that h-RLDA is a computationally much more efficient alternative to TensorFaces.
    08/2008
  • Article: Extracting unrecognized gene relationships from the biomedical literature via matrix factorizations
    Hyunsoo Kim, Haesun Park, Barry Drake
    BMC Bioinformatics. 01/2008;
  • Conference Proceeding: Linear discriminant analysis for data with subcluster structure.
    19th International Conference on Pattern Recognition (ICPR 2008), December 8-11, 2008, Tampa, Florida, USA; 01/2008
  • Article: Nonnegative Matrix Factorization Based on Alternating Nonnegativity Constrained Least Squares and Active Set Method.
    Hyunsoo Kim, Haesun Park
    SIAM J. Matrix Analysis Applications. 01/2008; 30:713-730.
  • Chapter: Cluster-Preserving Dimension Reduction Methods for Document Classification
    Peg Howland, Haesun Park
    ABSTRACT: In today's vector space information retrieval systems, dimension reduction is imperative for efficiently manipulating the massive quantity of data. To be useful, this lower dimensional representation must be a good approximation of the original document set given in its full space. Toward that end, we present mathematical models, based on optimization and a general matrix rank reduction formula, which incorporate a priori knowledge of the existing structure. From these models, we develop new methods for dimension reduction that can be applied regardless of the relative dimensions of the term-document matrix. We illustrate the effectiveness of each method with document classification results from the reduced representation. After establishing relationships among the solutions obtained by the various methods, we conclude with a discussion of their relative accuracy and complexity.
    12/2007; pages 3-23
  • Conference Proceeding: A Comparison of Unsupervised Dimension Reduction Algorithms for Classification
    ABSTRACT: Distance preserving dimension reduction (DPDR) using the singular value decomposition has recently been introduced. In this paper, for disease diagnosis using gene or protein expression data, we present empirical comparisons between DPDR and various other dimension reduction (DR) methods (PCA, MDS, Isomap, and LLE) when using support vector machines with a radial basis function kernel. Our results show that DPDR generally outperforms the other DR methods in terms of classification accuracy and, at the same time, is significantly more efficient, since it has no parameters to optimize. Based on these empirical results, we conclude that DPDR is one of the best DR methods available for building an efficient, distortion-free classifier for gene or protein expression data.
    Bioinformatics and Biomedicine, 2007. BIBM 2007. IEEE International Conference on; 12/2007
  • Conference Proceeding: Non-negative Tensor Factorization Based on Alternating Large-scale Non-negativity-constrained Least Squares
    Hyunsoo Kim, Haesun Park, L. Elden
    ABSTRACT: Non-negative matrix factorization (NMF) and non-negative tensor factorization (NTF) have attracted much attention and have been successfully applied to numerous data analysis problems where the elements of the data are necessarily non-negative, such as chemical concentrations, spectrometry signal intensities, and digital image pixels. In particular, Andersson and Bro's PARAFAC algorithm with non-negativity constraints (AB-PARAFAC-NC) provided the state-of-the-art NTF algorithm, which uses Bro and de Jong's non-negativity-constrained least squares with a single right-hand side (NLS/S-RHS). However, solving an NLS problem with multiple right-hand sides (NLS/M-RHS) as multiple independent NLS/S-RHS problems is not recommended because of hidden redundant computation. In this paper, we propose an NTF algorithm based on alternating large-scale non-negativity-constrained least squares (NTF/ANLS) using NLS/M-RHS. In addition, we introduce an algorithm for the regularized NTF based on ANLS (RNTF/ANLS). Our experiments illustrate that our NTF algorithms outperform AB-PARAFAC-NC in terms of computing speed on several data sets we tested.
    Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on; 11/2007
  • Conference Proceeding: Distance Preserving Dimension Reduction Using the QR Factorization or the Cholesky Factorization
    Hyunsoo Kim, Haesun Park, Hongyuan Zha
    ABSTRACT: Dimension reduction plays an important role in handling massive quantities of high-dimensional data such as biomedical text data, gene expression data, and mass spectrometry data. In this paper, we introduce distance preserving dimension reduction (DPDR) based on the QR factorization (DPDR/QR) or the Cholesky factorization (DPDR/C). DPDR generates lower dimensional representations of high-dimensional data that exactly preserve Euclidean distances and cosine similarities between any pair of data points in the original space. After projecting the data points to the lower dimensional space obtained by DPDR, one can run other data analysis algorithms. DPDR can substantially reduce the computing time and/or memory requirements of a given data analysis algorithm, especially when the algorithm must be run many times to estimate parameters or to search for a better solution.
    Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on; 11/2007
  • Article: Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis.
    Hyunsoo Kim, Haesun Park
    ABSTRACT: Many practical pattern recognition problems require non-negativity constraints. For example, pixels in digital images and chemical concentrations in bioinformatics are non-negative. Sparse non-negative matrix factorizations (NMFs) are useful when the degree of sparseness in the non-negative basis matrix or the non-negative coefficient matrix in an NMF needs to be controlled in approximating high-dimensional data in a lower dimensional space. In this article, we introduce a novel formulation of sparse NMF and show how the new formulation leads to a convergent sparse NMF algorithm via alternating non-negativity-constrained least squares. We apply our sparse NMF algorithm to cancer-class discovery and gene expression data analysis and offer biological analysis of the results obtained. Our experimental results illustrate that the proposed sparse NMF algorithm often achieves better clustering performance with shorter computing time compared to other existing NMF algorithms. The software is available as supplementary material.
    Bioinformatics 07/2007; 23(12):1495-502. · 5.47 Impact Factor
  • Article: Extracting unrecognized gene relationships from the biomedical literature via matrix factorizations.
    Hyunsoo Kim, Haesun Park, Barry L Drake
    ABSTRACT: The construction of literature-based networks of gene-gene interactions is one of the most important applications of text mining in bioinformatics. Extracting potential gene relationships from the biomedical literature may be helpful in building biological hypotheses that can be explored further experimentally. Recently, latent semantic indexing based on the singular value decomposition (LSI/SVD) has been applied to gene retrieval. However, determining the number of factors k used in the reduced rank matrix is still an open problem. In this paper, we introduce a way to incorporate a priori knowledge of gene relationships into LSI/SVD to determine the number of factors. We also explore the utility of non-negative matrix factorization (NMF) for extracting unrecognized gene relationships from the biomedical literature by taking advantage of known gene relationships. A gene retrieval method based on NMF (GR/NMF) showed performance comparable to LSI/SVD. Using the known gene relationships of a given gene, we can determine the number of factors used in the reduced rank matrix and retrieve unrecognized genes related to the given gene by LSI/SVD or GR/NMF.
    BMC Bioinformatics 02/2007; 8 Suppl 9:S6. · 2.75 Impact Factor
  • Conference Proceeding: Distance Preserving Dimension Reduction for Manifold Learning.
    Hyunsoo Kim, Haesun Park, Hongyuan Zha
    Proceedings of the Seventh SIAM International Conference on Data Mining, April 26-28, 2007, Minneapolis, Minnesota, USA; 01/2007
  • Conference Proceeding: Cancer Class Discovery Using Non-negative Matrix Factorization Based on Alternating Non-negativity-Constrained Least Squares.
    Hyunsoo Kim, Haesun Park
    Bioinformatics Research and Applications, Third International Symposium, ISBRA 2007, Atlanta, GA, USA, May 7-10, 2007, Proceedings; 01/2007
  • Article: Feature Reduction via Generalized Uncorrelated Linear Discriminant Analysis
    ABSTRACT: High-dimensional data appear in many applications of data mining, machine learning, and bioinformatics. Feature reduction is commonly applied as a preprocessing step to overcome the curse of dimensionality. Uncorrelated linear discriminant analysis (ULDA) was recently proposed for feature reduction. The features extracted via ULDA were shown to be statistically uncorrelated, which is desirable for many applications. In this paper, an algorithm called ULDA/QR is proposed to simplify the previous implementation of ULDA. Then, the ULDA/GSVD algorithm is proposed, based on a novel optimization criterion, to address the singularity problem that occurs in undersampled problems, where the data dimension is larger than the sample size. The criterion used is the regularized version of the one in ULDA/QR. Surprisingly, our theoretical result shows that the solution to ULDA/GSVD is independent of the value of the regularization parameter. Experimental results on various types of data sets are reported to show the effectiveness of the proposed algorithm and to compare it with other commonly used feature reduction algorithms.
    IEEE Transactions on Knowledge and Data Engineering 11/2006; · 1.66 Impact Factor
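
The two-stage framework summarized in "Two-stage framework for visualization of clustered high dimensional data" (a supervised reduction such as LDA, whose output has at most c - 1 dimensions for c clusters, followed by PCA down to 2) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes NumPy and SciPy, treats rows of X as samples, and adds a small ridge term reg (a hypothetical parameter) to the within-cluster scatter so that the generalized eigenproblem is well posed. The paper's own handling of singular scatter matrices may differ.

```python
import numpy as np
from scipy.linalg import eigh


def lda_transform(X, y, reg=1e-3):
    """Regularized LDA: project the rows of X onto c - 1 discriminant directions.

    Solves S_b g = lam (S_w + reg * I) g and keeps the eigenvectors with the
    largest eigenvalues. `reg` is an illustrative ridge term, not from the paper.
    """
    classes = np.unique(y)
    d = X.shape[1]
    mu = X.mean(axis=0)
    S_w = np.zeros((d, d))  # within-cluster scatter
    S_b = np.zeros((d, d))  # between-cluster scatter
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_w += (Xc - mu_c).T @ (Xc - mu_c)
        diff = (mu_c - mu)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    _, evecs = eigh(S_b, S_w + reg * np.eye(d))  # eigenvalues in ascending order
    G = evecs[:, ::-1][:, : len(classes) - 1]  # largest eigenvalues first
    return X @ G


def pca_transform(Z, k=2):
    """Second stage: project centered rows of Z onto their top-k principal axes."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:k].T


def two_stage_lda_pca(X, y):
    """LDA to c - 1 dimensions, then PCA to 2 dimensions for a scatter plot."""
    return pca_transform(lda_transform(X, y), k=2)


# Synthetic example: 5 clusters of 40 points each in 50 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(5, 50))
y = np.repeat(np.arange(5), 40)
X = centers[y] + rng.normal(size=(200, 50))

Z = lda_transform(X, y)       # stage 1: 4 dimensions (c - 1 with c = 5)
Y2 = two_stage_lda_pca(X, y)  # stage 2: 2 dimensions
print(Z.shape, Y2.shape)  # (200, 4) (200, 2)
```

PCA is only one choice for the second stage; the abstract names it as an example, and the paper compares several two-stage combinations.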
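
The alternating nonnegativity-constrained least squares (ANLS) scheme named in "Nonnegative Matrix Factorization Based on Alternating Nonnegativity Constrained Least Squares and Active Set Method" alternates between two NNLS subproblems, one for each factor. What follows is a minimal sketch of that alternation, assuming NumPy and SciPy. It swaps in SciPy's single-right-hand-side scipy.optimize.nnls for the paper's active-set solver, so it solves each column independently: the column-by-column approach that the NTF abstract above describes as carrying hidden redundant computation. The helper names, random initialization, and fixed iteration count are illustrative choices.

```python
import numpy as np
from scipy.optimize import nnls


def nnls_columns(C, B):
    """Solve min ||C X - B||_F subject to X >= 0, one column of B at a time.

    Each column is an independent single-right-hand-side NNLS problem; an
    active-set solver for multiple right-hand sides could share work here.
    """
    X = np.zeros((C.shape[1], B.shape[1]))
    for j in range(B.shape[1]):
        X[:, j], _ = nnls(C, B[:, j])
    return X


def nmf_anls(A, k, n_iter=50, seed=0):
    """Approximate a nonnegative m x n matrix A by W @ H with W, H >= 0.

    Alternates the two NNLS subproblems
        H <- argmin_{H >= 0} ||W H - A||_F
        W <- argmin_{W >= 0} ||H^T W^T - A^T||_F
    for a fixed number of sweeps (no convergence test, for brevity).
    """
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k))  # nonnegative random initialization
    H = np.zeros((k, A.shape[1]))
    for _ in range(n_iter):
        H = nnls_columns(W, A)
        W = nnls_columns(H.T, A.T).T
    return W, H


# Toy data: an exactly rank-4 nonnegative 30 x 25 matrix.
rng = np.random.default_rng(1)
A = rng.random((30, 4)) @ rng.random((4, 25))
W, H = nmf_anls(A, k=4)
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
print(f"relative error: {rel_err:.2e}")
```

Because each subproblem is solved exactly, the objective cannot increase from one sweep to the next. The paper's title indicates that its contribution lies in solving these subproblems with an active set method, so this sketch reproduces the alternating structure but not that solver's efficiency.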
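
The exact distance preservation claimed in the DPDR abstract ("Distance Preserving Dimension Reduction Using the QR Factorization or the Cholesky Factorization") follows from the factorizations themselves. If A = QR and Q has orthonormal columns, then each column of R equals Q^T times the corresponding column of A, so inner products between columns are unchanged, and with them Euclidean distances and cosine similarities. The Cholesky variant works because A^T A = R^T R. The sketch below checks this numerically with NumPy. It assumes the data points are the columns of a tall matrix A with full column rank, and it omits whatever additional steps the paper's DPDR/QR and DPDR/C algorithms may include.

```python
import numpy as np


def dpdr_qr(A):
    """Reduce an m x n matrix A (columns are data points, m > n) to n x n
    via the thin QR factorization A = QR; column i of R equals Q^T a_i."""
    _, R = np.linalg.qr(A, mode="reduced")
    return R


def dpdr_cholesky(A):
    """Same reduction via the Cholesky factor of the Gram matrix A^T A.

    NumPy returns the lower factor L with A^T A = L L^T, so R = L^T satisfies
    R^T R = A^T A. Requires A to have full column rank.
    """
    L = np.linalg.cholesky(A.T @ A)
    return L.T


def pairwise_distances(X):
    """Euclidean distances between all pairs of columns of X."""
    diff = X[:, :, None] - X[:, None, :]
    return np.linalg.norm(diff, axis=0)


def cosine_similarities(X):
    """Cosine similarities between all pairs of columns of X."""
    norms = np.linalg.norm(X, axis=0)
    return (X.T @ X) / np.outer(norms, norms)


rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 20))  # 20 data points in 1000 dimensions

for reduce in (dpdr_qr, dpdr_cholesky):
    R = reduce(A)  # 20 x 20 representation of the same 20 points
    assert R.shape == (20, 20)
    assert np.allclose(pairwise_distances(A), pairwise_distances(R))
    assert np.allclose(cosine_similarities(A), cosine_similarities(R))
print("distances and cosine similarities preserved")
```

In this construction the reduced dimension equals the number of data points n. The saving is therefore largest when the original dimension m is much greater than n, as with expression data that record many more features than samples.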
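
The LSI/SVD retrieval step in "Extracting unrecognized gene relationships from the biomedical literature via matrix factorizations" represents each gene in a rank-k SVD space and ranks other genes by similarity to a query gene. The following is a minimal NumPy sketch, assuming a term-by-gene matrix A whose columns are genes and using cosine similarity as the ranking score. The function choose_k is a hypothetical rule for using known related genes to pick k: it counts how many of them appear in the top-n list. It is not the paper's selection criterion, which the abstract does not spell out.

```python
import numpy as np


def lsi_gene_vectors(A, k):
    """Rank-k LSI coordinates of the columns (genes) of a term-by-gene matrix A.

    Column j of Sigma_k V_k^T has the same inner products with the other
    columns as column j of the rank-k approximation A_k = U_k Sigma_k V_k^T,
    because U_k has orthonormal columns.
    """
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    return s[:k, None] * Vt[:k]


def rank_genes(A, query, k):
    """Gene indices sorted by cosine similarity to gene `query`, query excluded."""
    E = lsi_gene_vectors(A, k)
    E = E / np.linalg.norm(E, axis=0, keepdims=True)
    sims = E.T @ E[:, query]
    order = np.argsort(-sims, kind="stable")
    return order[order != query]


def choose_k(A, query, known_related, top_n, k_values):
    """Hypothetical rule: pick the k whose top-n list contains the most known
    related genes (ties go to the earliest k in k_values)."""
    known = set(known_related)

    def hits(k):
        return len(known & set(rank_genes(A, query, k)[:top_n].tolist()))

    return max(k_values, key=hits)


# Toy data: 200 terms x 30 genes of nonnegative term weights.
rng = np.random.default_rng(0)
A = rng.random((200, 30))
k = choose_k(A, query=0, known_related=[3, 7, 12], top_n=5, k_values=range(2, 11))
print(k, rank_genes(A, query=0, k=k)[:5])
```

On real data the interesting output is the set of highly ranked genes that are not already in known_related: these are the candidate unrecognized relationships.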

Institutions

  • 2006–2010
    • Georgia Institute of Technology
      • School of Computational Science & Engineering
      • College of Computing
      Atlanta, GA, USA
    • Arizona State University
      Tempe, AZ, USA
  • 1996–2006
    • University of Minnesota Duluth
      • Department of Computer Science
      Duluth, MN, USA
  • 1993–2005
    • University of Minnesota Twin Cities
      • Department of Computer Science and Engineering
      Minneapolis, MN, USA
  • 2003
    • University of California, Santa Barbara
      Santa Barbara, CA, USA
  • 2002
    • Korea Institute for Advanced Study
      Seoul, South Korea
  • 1994
    • KU Leuven
      • Department of Electrical Engineering (ESAT)
      Leuven, VLG, Belgium