Seoung Bum Kim

Korea University, Sŏul, Seoul, South Korea

Publications (83) · 91.99 Total impact

  • Sung Won Han · Kyu Jong Lee · Hua Zhong · Seoung Bum Kim ·
    ABSTRACT: Spatiotemporal surveillance, especially for the detection of emerging outbreaks, is of particular importance. When an outbreak spreads across some areas, the incidence rate at the center of the outbreak area might be expected to be much higher than the rate at its edge. However, to the best of our knowledge, all existing methods assume a uniformly increasing rate across the entire area of the outbreak. The purpose of this study is to compare the performance of spatiotemporal surveillance methods such as multivariate cumulative sum (MCUSUM) or multivariate exponentially weighted moving average (MEWMA) when the change sizes are nonhomogeneous. Monte Carlo simulations were conducted to examine the properties of these spatiotemporal surveillance methods and to compare them in terms of detection speed and identification rate under various scenarios. The results showed that when nonhomogeneous change sizes are involved, the MCUSUM method that takes into account the spatial nonhomogeneity of increase rates yields better identification than the method that ignores such a change size pattern, although the detection speeds are similar. Further, a case study on male thyroid cancer data in New Mexico in the United States was performed to demonstrate the applicability of these methods.
    Communications in Statistics - Simulation and Computation 11/2015; 44(10). DOI:10.1080/03610918.2013.844837 · 0.33 Impact Factor
  • Ji Hoon Kang · Seoung Bum Kim ·
    ABSTRACT: Control charts have been widely used to improve manufacturing processes by reducing variations and defects. In particular, multivariate control charts have been effectively applied to monitoring processes that contain many correlated variables. Most existing multivariate control charts are vulnerable to misclassification errors that originate from the underlying hypothesis tests. These errors often generate a large number of false alarms. In this paper, we propose a procedure to reduce false alarms by combining a multivariate control chart and data mining algorithms. Simulation and real case studies demonstrate that the proposed method effectively reduces the false alarm rate.
    Journal of Process Control 11/2015; 35:21-29. DOI:10.1016/j.jprocont.2015.08.009 · 2.65 Impact Factor

  • Intelligent Systems, IEEE 11/2015; 30(6):12-29. DOI:10.1109/MIS.2015.111 · 2.34 Impact Factor
  • Su Gon Cho · Jaehee Cho · Seoung Bum Kim ·

    10/2015; 41(5):453-460. DOI:10.7232/JKIIE.2015.41.5.453
  • Woo-Sik Choi · Seoung Bum Kim ·

    08/2015; 41(4):381-388. DOI:10.7232/JKIIE.2015.41.4.381
  • Jin Soo Park · Seoung Bum Kim ·

    08/2015; 41(4):408-413. DOI:10.7232/JKIIE.2015.41.4.408
  • Tae Woo Joo · Seoung Bum Kim ·
    ABSTRACT: Forecasting time series data is one of the most important issues in numerous real-life applications. Time series data have been analyzed in either the time or frequency domains. The objective of this study is to propose a forecasting method based on wavelet filtering. The proposed method decomposes the original time series into the trend and variation parts and constructs a separate model for each part. Simulation and real case studies were conducted to examine the properties of the proposed method under various scenarios and to compare its performance with time series forecasting models without wavelet filtering. The results from both simulated and real data showed that the proposed method based on wavelet filtering yielded more accurate results than the models without wavelet filtering in terms of the mean absolute percentage error criterion.
    Expert Systems with Applications 05/2015; 42(8). DOI:10.1016/j.eswa.2015.01.026 · 2.24 Impact Factor
  • Younghoon Kim · Kevin A. Schug · Seoung Bum Kim ·
    ABSTRACT: Successful identification of the significant features in complex mass spectral fingerprints is a crucial task in discriminating states or differences in natural systems (e.g., diseased vs. healthy, treated vs. untreated, and male vs. female) that are visualized using mass spectrometry technology. In this study, we present an ensemble regularization method that combines three regularization regression models to generate more robust results. Specifically, the coefficients from each of three regularization models were bootstrapped and the means and standard deviations of these coefficients were calculated. After obtaining these estimated statistics of the coefficients, we performed a hypothesis test for each feature. Finally, we determined the significant features that were simultaneously selected by the three hypothesis tests. Mass spectral data from six different extracts of mosquito cuticles were used to evaluate the performance of the proposed method. The purpose of this spectral analysis was to determine the major features needed to differentiate married-female mosquitoes having the potential to cause malaria infection from others. In addition, we compared the proposed ensemble feature selection method with random forest, a widely used feature selection algorithm. We found that the proposed method outperformed random forest in terms of feature selection efficiency.
    Chemometrics and Intelligent Laboratory Systems 05/2015; 146. DOI:10.1016/j.chemolab.2015.05.009 · 2.32 Impact Factor
  • Jieun Son · Seoung Bum Kim · Hyunjoong Kim · Sungzoon Cho ·

    04/2015; 41(2):185-208. DOI:10.7232/JKIIE.2015.41.2.185
  • Kyu Jong Lee · Ji Hoon Kang · Jae Hong Yu · Seoung Bum Kim ·
    ABSTRACT: Multivariate control charts have been widely recognised as efficient tools for detection of abnormal behaviour in multivariate processes. However, these charts provide only limited information about the contribution of any specific variable to an out-of-control signal. To address this limitation, some fault identification methods have been developed to identify contributors to an abnormality. In real situations, however, a couple of tasks should be further considered with these contributors to improve their applicability and to facilitate interpretation of faults. This study presents a rank sum-based summarisation technique and a decision tree algorithm to facilitate the interpretation of fault identification results. Experimental results with real data from the manufacturing process for a thin-film transistor-liquid crystal display (TFT-LCD) demonstrate the applicability and effectiveness of the proposed methods.
    European Journal of Industrial Engineering 01/2015; 9(3):395. DOI:10.1504/EJIE.2015.069346 · 0.74 Impact Factor
  • Younghoon Kim · Seoung Bum Kim · Sangho Shim ·
    ABSTRACT: Multicollinearity is a challenging problem caused by the tendency of independent variables in regression analysis to be highly correlated. Multicollinearity reduces the reliability of estimated regression coefficients. In this study, we introduce a way of deciding the threshold of correlation that indicates the severity of multicollinearity. The approach is to draw a conflict graph, which is the minimum vertex cover of multicollinear variables. The simulation results demonstrate that our proposed algorithm can provide an appropriate threshold for reducing large amounts of uncertainty in estimated regression coefficients.
    Submitted to INOC 2015 7th International Conference on Network Optimization; 12/2014
  • Chan Hee Park · Seoung Bum Kim ·
    ABSTRACT: Feature selection based on an ensemble classifier has been recognized as a crucial technique for modeling high-dimensional data. Feature selection based on the random forests model, which is constructed by aggregating multiple decision tree classifiers, has been widely used. However, a lack of stability and balance in decision trees decreases the robustness of random forests. This limitation motivated us to propose a feature selection method based on newly designed nearest-neighbor ensemble classifiers. The proposed method finds significant features by using an iterative procedure. We performed experiments with 20 microarray gene expression datasets to examine the properties of the proposed method and compared it with random forests. The results demonstrated the effectiveness and robustness of the proposed method, especially when the number of features exceeds the number of observations.
    Expert Systems with Applications 11/2014; 42(5). DOI:10.1016/j.eswa.2014.10.044 · 2.24 Impact Factor
  • Gulanbaier Tuerhong · Seoung Bum Kim ·
    ABSTRACT: Control charts are widely used in various industries to improve product quality. One recent trend in developing control charts is based on novelty score algorithms that can effectively describe reality and reflect the unique characteristics of the data being monitored. In this study, we compared eight novelty score algorithms in terms of their average run length performance: the T2, Local T2, Dmax, Dmean, K2, the k-nearest neighbor data description, the local density outlier factor, and the hybrid novelty score (HNS). A rigorous simulation was conducted to compare the novelty score-based multivariate control charts under both normal and non-normal scenarios. The simulation showed that in both normal and lognormal scenarios, Dmax-based control charts produced the most promising results. In non-normal scenarios with skewed, high-kurtosis distributions, HNS- and K2-based control charts performed best. In non-normal scenarios with symmetric distributions with kurtosis, local T2-based control charts outperformed the others.
    Communications in Statistics - Simulation and Computation 10/2014; 44(5):1126-1143. DOI:10.1080/03610918.2013.809098 · 0.33 Impact Factor
  • Sungho Park · Seoung Bum Kim ·

    10/2014; 40(5):492-500. DOI:10.7232/JKIIE.2014.40.5.492
  • Poovich Phaladiganon · Seoung Bum Kim · Victoria C. P. Chen ·
    ABSTRACT: In novelty detection, support vector data description (SVDD) is a one-class classification technique that constructs a boundary to differentiate novel from normal patterns. However, boundaries constructed by SVDD do not consider the density of the data. Data points located in low-density regions are more likely to be novel patterns because they are remote from their neighbors. This study presents a density-focused SVDD (DFSVDD), whose boundary considers both the shape and the dense regions of the data. Two distance measures, the kernel distance and the density distance, are combined to construct the DFSVDD boundary. The kernel distance can be obtained by solving a quadratic optimization problem, while support vectors are used to obtain the density distance. A simulation study was conducted to evaluate the performance of the proposed DFSVDD and to compare it with the traditional SVDD. The proposed method performed better than SVDD in terms of the area under the receiver operating characteristic curve.
    Quality and Reliability Engineering International 10/2014; 30(6). DOI:10.1002/qre.1688 · 1.19 Impact Factor
  • Ji Hoon Kang · Chan Hee Park · Seoung Bum Kim ·
    ABSTRACT: Clustering analysis elicits the natural groupings of a dataset without requiring information about the sample class and has been widely used in various fields. Although numerous clustering algorithms have been proposed and proven to perform reasonably well, no consensus exists about which one performs best in real situations. In this study, we propose a nonparametric clustering method based on recursive binary partitioning as implemented in a classification and regression tree model. The proposed clustering algorithm has two key advantages: (1) users do not have to specify any parameters before running it; (2) the final clustering result is represented by a set of if–then rules, thereby facilitating analysis of the clustering results. Experiments with simulated and real datasets demonstrate the effectiveness and usefulness of the proposed algorithm.
    Formal Pattern Analysis & Applications 08/2014; DOI:10.1007/s10044-014-0399-1 · 0.65 Impact Factor
  • Seulki Lee · Ji Hoon Kang · Hankyu Lee · Tae Woo Joo · Shawn Oh · Sungwook Park · Seoung Bum Kim ·

    06/2014; 40(3):291-298. DOI:10.7232/JKIIE.2014.40.3.291
  • Young Joon Park · Seoung Bum Kim ·

    06/2014; 40(3):275-282. DOI:10.7232/JKIIE.2014.40.3.275
  • ABSTRACT: Variable selection has been widely used in regression data mining not only to select informative variables, but also to simplify the statistical model. A computer experiment based optimization approach employs design of experiments and statistical modeling to represent a complex objective function that can only be evaluated pointwise by solving an optimization subproblem. In large-scale applications, the number of variables is huge, and direct use of computer experiments would require an exceedingly large experimental design and, consequently, significant computational effort. Typically, a large portion of the variables have little impact on the objective; thus, there is a need to eliminate these before performing the complete set of optimization subproblem computer experiments. Ideally, variable selection would be conducted after a small number of computer experiment runs, likely fewer runs (n) than the number of variables (p). Conventional variable selection techniques cannot be applied in this "large p and small n" problem. We explore the use of regression trees and a multiple testing procedure based on false discovery rate. Performance of the selected variables is measured using the coefficient of determination (R2) and relative errors. Two real world
    Annals of Operations Research 05/2014; 216(1). DOI:10.1007/s10479-012-1129-y · 1.22 Impact Factor
  • Victoria C. P. Chen · Seoung Bum Kim · Theodore Trafalis ·

    Annals of Operations Research 05/2014; 216(1). DOI:10.1007/s10479-014-1549-y · 1.22 Impact Factor

Publication Stats

495 Citations
91.99 Total Impact Points


  • 2006-2015
    • Korea University
      • Department of Information Management Engineering
      Sŏul, Seoul, South Korea
  • 2006-2011
    • University of Texas at Arlington
      • Department of Chemistry and Biochemistry
      • Department of Industrial and Manufacturing Systems Engineering
      Arlington, Texas, United States
  • 2009
    • Emory University
      • School of Medicine
      Atlanta, GA, United States
  • 2003-2006
    • Georgia Institute of Technology
      • School of Industrial and Systems Engineering
      Atlanta, Georgia, United States