Seoung Bum Kim

Korea University, Sŏul, Seoul, South Korea

Publications (78) · 85.04 Total impact

  • Sung Won Han · Kyu Jong Lee · Hua Zhong · Seoung Bum Kim
    ABSTRACT: Spatiotemporal surveillance, especially for the detection of emerging outbreaks, is of particular importance. When an outbreak spreads across some areas, the incidence rate at the center of the outbreak area might be expected to be much higher than the rate at its edge. However, to the best of our knowledge, all existing methods assume a uniformly increasing rate across the entire area of the outbreak. The purpose of this study is to compare the performance of spatiotemporal surveillance methods such as the multivariate cumulative sum (MCUSUM) and the multivariate exponentially weighted moving average (MEWMA) when the changes in size are nonhomogeneous. Monte Carlo simulations were conducted to examine the properties of these spatiotemporal surveillance methods and to compare them in terms of detection speed and identification rate under various scenarios. The results showed that when nonhomogeneous change sizes are involved, the MCUSUM method that takes into account the spatial nonhomogeneity of increase rates yields better identification than the method that ignores this change size pattern, although the detection speeds are similar. Further, a case study on the detection of male thyroid cancer in New Mexico in the United States was performed to demonstrate the applicability of these methods.
    Communications in Statistics - Simulation and Computation 11/2015; 44(10). DOI:10.1080/03610918.2013.844837 · 0.29 Impact Factor
  • Tae Woo Joo · Seoung Bum Kim
    ABSTRACT: Forecasting time series data is one of the most important issues involved in numerous applications in real life. Time series data have been analyzed in either the time or frequency domains. The objective of this study is to propose a forecasting method based on wavelet filtering. The proposed method decomposes the original time series into the trend and variation parts and constructs a separate model for each part. Simulation and real case studies were conducted to examine the properties of the proposed method under various scenarios and compare its performance with time series forecasting models without wavelet filtering. The results from both simulated and real data showed that the proposed method based on wavelet filtering yielded more accurate results than the models without wavelet filtering in terms of mean absolute percentage error criterion.
    Expert Systems with Applications 05/2015; 42(8). DOI:10.1016/j.eswa.2015.01.026 · 1.97 Impact Factor
  • Younghoon Kim · Kevin A. Schug · Seoung Bum Kim
    ABSTRACT: Successful identification of the significant features in complex mass spectral fingerprints is a crucial task in discriminating states or differences in natural systems (e.g., diseased vs. healthy, treated vs. untreated, and male vs. female) that are visualized using mass spectrometry technology. In this study, we present an ensemble regularization method that combines three regularization regression models to generate more robust results. Specifically, the coefficients from each of three regularization models were bootstrapped and the means and standard deviations of these coefficients were calculated. After obtaining these estimated statistics of the coefficients, we performed a hypothesis test for each feature. Finally, we determined the significant features that were simultaneously selected by the three hypothesis tests. Mass spectral data from six different extracts of mosquito cuticles were used to evaluate the performance of the proposed method. The purpose of this spectral analysis was to determine the major features needed to differentiate married-female mosquitoes having the potential to cause malaria infection from others. In addition, we compared the proposed ensemble feature selection method with random forest, a widely used feature selection algorithm. We found that the proposed method outperformed random forest in terms of feature selection efficiency.
    Chemometrics and Intelligent Laboratory Systems 05/2015; 146. DOI:10.1016/j.chemolab.2015.05.009 · 2.32 Impact Factor
  • Jieun Son · Seoung Bum Kim · Hyunjoong Kim · Sungzoon Cho
    04/2015; 41(2):185-208. DOI:10.7232/JKIIE.2015.41.2.185
  • Kyu Jong Lee · Ji Hoon Kang · Jae Hong Yu · Seoung Bum Kim
    European J of Industrial Engineering 01/2015; 9(3):395. DOI:10.1504/EJIE.2015.069346 · 1.50 Impact Factor
  • Younghoon Kim · Seoung Bum Kim · Sangho Shim
    ABSTRACT: Multicollinearity is the most challenging problem caused by tendency that independent variables in regression analysis are highly correlated. The multicollinearity reduces the reliability of estimated regression coefficients. In this study, we introduce a way of deciding the threshold of correlation which indicates the severity of multicollinearity. The way is to draw a conflict graph, which is the minimum vertex cover of multicollinear variables. The simulation results demonstrate that our proposed algorithm can provide an appropriate threshold for reducing large amounts of uncertainty of estimated regression coefficients.
    Submitted to INOC 2015 7th International Conference on Network Optimization; 12/2014
  • Chan Hee Park · Seoung Bum Kim
    ABSTRACT: Feature selection based on an ensemble classifier has been recognized as a crucial technique for modeling high-dimensional data. Feature selection based on the random forests model, which is constructed by aggregating multiple decision tree classifiers, has been widely used. However, a lack of stability and balance in decision trees decreases the robustness of random forests. This limitation motivated us to propose a feature selection method based on newly designed nearest-neighbor ensemble classifiers. The proposed method finds significant features by using an iterative procedure. We performed experiments with 20 datasets of microarray gene expressions to examine the property of the proposed method and compared it with random forests. The results demonstrated the effectiveness and robustness of the proposed method, especially when the number of features exceeds the number of observations.
    Expert Systems with Applications 11/2014; 42(5). DOI:10.1016/j.eswa.2014.10.044 · 1.97 Impact Factor
  • Gulanbaier Tuerhong · Seoung Bum Kim
    ABSTRACT: Control charts are widely used in various industries to improve product quality. One recent trend in developing control charts is based on novelty score algorithms that can effectively describe reality and reflect the unique characteristics of the data being monitored. In this study, we compared eight novelty score algorithms—the T2, Local T2, Dmax, Dmean, K2, the k-nearest neighbor data description, the local density outlier factor, and the hybrid novelty score (HNS)—in terms of their average run length performance. A rigorous simulation was conducted to compare the novelty score-based multivariate control charts under both normal and non-normal scenarios. The simulation showed that in both normal and lognormal scenarios, Dmax-based control charts produced the most promising results. In skewed distribution with high kurtosis non-normal scenarios, HNS- and K2-based control charts performed best. In symmetric with kurtosis non-normal scenarios, local T2-based control charts outperformed the others.
    Communications in Statistics - Simulation and Computation 10/2014; 44(5):1126-1143. DOI:10.1080/03610918.2013.809098 · 0.29 Impact Factor
  • Sungho Park · Seoung Bum Kim
    10/2014; 40(5):492-500. DOI:10.7232/JKIIE.2014.40.5.492
  • ABSTRACT: In novelty detection, support vector data description (SVDD) is a one-class classification technique that constructs a boundary to differentiate novel from normal patterns. However, boundaries constructed by SVDD do not consider the density of the data. Data points located in low density regions are more likely to be novel patterns because they are remote from their neighbors. This study presents a density-focused SVDD (DFSVDD), for which its boundary considers both shape and the dense region of the data. Two distance measures, the kernel distance and the density distance, are combined to construct the DFSVDD boundary. The kernel distance can be obtained by solving a quadratic optimization, while support vectors are used to obtain the density distance. A simulation study was conducted to evaluate the performance of the proposed DFSVDD and was then compared with the traditional SVDD. The proposed method performed better than SVDD in terms of the area under the receiver operating characteristic curve. Copyright © 2014 John Wiley & Sons, Ltd.
    Quality and Reliability Engineering International 10/2014; 30(6). DOI:10.1002/qre.1688 · 0.99 Impact Factor
  • Ji Hoon Kang · Chan Hee Park · Seoung Bum Kim
    ABSTRACT: Clustering analysis elicits the natural groupings of a dataset without requiring information about the sample class and has been widely used in various fields. Although numerous clustering algorithms have been proposed and proven to perform reasonably well, no consensus exists about which one performs best in real situations. In this study, we propose a nonparametric clustering method based on recursive binary partitioning that was implemented in a classification and regression tree model. The proposed clustering algorithm has two key advantages: (1) users do not have to specify any parameters before running it; (2) the final clustering result is represented by a set of if–then rules, thereby facilitating analysis of the clustering results. Experiments with the simulations and real datasets demonstrate the effectiveness and usefulness of the proposed algorithm.
    Formal Pattern Analysis & Applications 08/2014; DOI:10.1007/s10044-014-0399-1 · 0.74 Impact Factor
  • 06/2014; 40(3):291-298. DOI:10.7232/JKIIE.2014.40.3.291
  • Young Joon Park · Seoung Bum Kim
    06/2014; 40(3):275-282. DOI:10.7232/JKIIE.2014.40.3.275
  • ABSTRACT: Variable selection has been widely used in regression data mining not only to select informative variables, but also to simplify the statistical model. A computer experiment based optimization approach employs design of experiments and statistical modeling to represent a complex objective function that can only be evaluated pointwise by solving an optimization subproblem. In large-scale applications, the number of variables is huge, and direct use of computer experiments would require an exceedingly large experimental design and, consequently, significant computational effort. Typically, a large portion of the variables have little impact on the objective; thus, there is a need to eliminate these before performing the complete set of optimization subproblem computer experiments. Ideally, variable selection would be conducted after a small number of computer experiment runs, likely fewer runs (n) than the number of variables (p). Conventional variable selection techniques cannot be applied in this "large p and small n" problem. We explore the use of regression trees and a multiple testing procedure based on false discovery rate. Performance of the selected variables is measured using the coefficient of determination (R²) and relative errors. Two real world
    Annals of Operations Research 05/2014; 216(1). DOI:10.1007/s10479-012-1129-y · 1.10 Impact Factor
  • Victoria C. P. Chen · Seoung Bum Kim · Theodore Trafalis
    Annals of Operations Research 05/2014; 216(1). DOI:10.1007/s10479-014-1549-y · 1.10 Impact Factor
  • Gulanbaier Tuerhong · Seoung Bum Kim
    ABSTRACT: Processes characterized by high dimensional and mixture data challenge traditional statistical process control charts. In this study, we propose a multivariate control chart based on the Gower distance that can handle a mixture of continuous and categorical data. An extensive simulation study was conducted to examine the properties of the proposed control chart under various scenarios and compared it with some existing multivariate control charts. The simulation results revealed that the proposed control chart outperformed the existing charts when the number of categorical variables increases. Furthermore, we demonstrated the applicability and effectiveness of the proposed control charts through a real case study.
    Expert Systems with Applications 03/2014; 41(4):1701–1707. DOI:10.1016/j.eswa.2013.08.068 · 1.97 Impact Factor
  • Pilsung Kang · Youngjoon Park · Sugon Cho · Seoung Bum Kim
    ABSTRACT: This paper aims to propose a research framework of analyzing voting activities of a national assembly on the basis of member-level voting similarity and provides a case study in the national assembly in South Korea. First, we propose a bill contentiousness measure that gives a higher score to bills for which ayes and noes are more diversified in both conservative and progressive parties. Based on the bill contentiousness measure, the top 5%, 10%, and 20% bills were identified and used for further analyses. Moreover, we propose a member-level voting similarity measure that compensates for the lower frequency of noes, and evaluate the pair-wise voting similarities for all lawmakers. Then, voting similarity differences to the affiliated/non-affiliated parties were analyzed for the members in the two major parties according to some internal/external key factors. Finally, similar voting groups were identified and their affiliations were investigated based on the multi-dimensional scaling (MDS) and network analysis techniques. A case study on the national assembly of South Korea showed that the cohesion of the members in the 'Hanara' party becomes higher than that of the 'Minju' party as the bill contentiousness increases, whereas the number of elected, local constituency versus proportional representation, and the competition intensity in a local constituency were found to be partially influential to the voting activities of lawmakers. In addition, MDS and network analysis showed that there is a distinctive difference between two parties when all bills are analyzed, whereas the diversity of parties increases in the same group as the bill contentiousness increases.
    02/2014; 40(1). DOI:10.7232/JKIIE.2014.40.1.060
  • ABSTRACT: We propose a new nonparametric multivariate control chart that integrates a novelty score. The proposed control chart uses as its monitoring statistic a hybrid novelty score, calculated based on the distance to local observations as well as on the distance to the convex hull constructed by its neighbors. The control limits of the proposed control chart were established based on a bootstrap method. A rigorous simulation study was conducted to examine the properties of the proposed control chart under various scenarios and compare it with existing multivariate control charts in terms of average run length (ARL) performance. The simulation results showed that the proposed control chart outperformed both the parametric and nonparametric Hotelling's T² control charts, especially in nonnormal situations. Moreover, experimental results with real semiconductor data demonstrated the applicability and effectiveness of the proposed control chart. To increase the capability to detect small mean shift, we propose an exponentially weighted hybrid novelty score control chart. Simulation results indicated that exponentially weighted hybrid score charts outperformed the hybrid novelty score based control charts.
    Communications in Statistics - Simulation and Computation 01/2014; 43(1). DOI:10.1080/03610918.2012.698775 · 0.29 Impact Factor
  • Ji Hoon Kang · Seoung Bum Kim
    ABSTRACT: Statistical process control techniques have been widely used to improve processes by reducing variations and defects. In the present paper, we propose a multivariate control chart technique based on a clustering algorithm that can effectively handle a situation in which the distribution of in-control observations is inhomogeneous. A simulation study was conducted to examine the characteristics of the proposed control chart and to compare them with Hotelling’s T² multivariate control charts that are widely used in real-world processes. Moreover, an experiment with real data from the thin film transistor liquid crystal display (TFT-LCD) manufacturing process demonstrated the effectiveness and accuracy of the proposed control chart.
    International Journal of Production Research 09/2013; 51(18). DOI:10.1080/00207543.2013.793427 · 1.32 Impact Factor
  • Seoung Bum Kim · Jung Woo Lee · Sin Young Kim · Deok Won Lee
    ABSTRACT: Relevant statistical modeling and analysis of dental data can improve diagnostic and treatment procedures. The purpose of this study is to demonstrate the use of various data mining algorithms to characterize patients with dentofacial deformities. A total of 72 patients with skeletal malocclusions who had completed orthodontic and orthognathic surgical treatments were examined. Each patient was characterized by 22 measurements related to dentofacial deformities. Clustering analysis and visualization grouped the patients into three different patterns of dentofacial deformities. A feature selection approach based on a false discovery rate was used to identify a subset of 22 measurements important in categorizing these three clusters. Finally, classification was performed to evaluate the quality of the measurements selected by the feature selection approach. The results showed that feature selection improved classification accuracy while simultaneously determining which measurements were relevant.
    PLoS ONE 08/2013; 8(8):e67862. DOI:10.1371/journal.pone.0067862 · 3.53 Impact Factor

Publication Stats

360 Citations
85.04 Total Impact Points

Institutions

  • 2006–2015
    • Korea University
      • Department of Information Management Engineering
      Sŏul, Seoul, South Korea
  • 2006–2011
    • University of Texas at Arlington
      • Department of Chemistry and Biochemistry
      • Department of Industrial and Manufacturing Systems Engineering
      Arlington, Texas, United States
  • 2009
    • Emory University
      • School of Medicine
      Atlanta, GA, United States
  • 2003–2006
    • Georgia Institute of Technology
      • School of Industrial and Systems Engineering
      Atlanta, Georgia, United States