Seoung Bum Kim

Korea University, Sŏul, Seoul, South Korea

Publications (72) · 79.75 Total impact

  • Tae Woo Joo, Seoung Bum Kim
    ABSTRACT: Forecasting time series data is one of the most important issues involved in numerous applications in real life. Time series data have been analyzed in either the time or frequency domains. The objective of this study is to propose a forecasting method based on wavelet filtering. The proposed method decomposes the original time series into the trend and variation parts and constructs a separate model for each part. Simulation and real case studies were conducted to examine the properties of the proposed method under various scenarios and compare its performance with time series forecasting models without wavelet filtering. The results from both simulated and real data showed that the proposed method based on wavelet filtering yielded more accurate results than the models without wavelet filtering in terms of mean absolute percentage error criterion.
    Expert Systems with Applications 05/2015; 42(8). DOI:10.1016/j.eswa.2015.01.026 · 1.97 Impact Factor
  • Younghoon Kim, Seoung Bum Kim, Sangho Shim
    ABSTRACT: Multicollinearity is the most challenging problem caused by the tendency of independent variables in regression analysis to be highly correlated. Multicollinearity reduces the reliability of estimated regression coefficients. In this study, we introduce a way of deciding the threshold of correlation that indicates the severity of multicollinearity. The approach is to draw a conflict graph, which is the minimum vertex cover of multicollinear variables. The simulation results demonstrate that our proposed algorithm can provide an appropriate threshold for reducing large amounts of uncertainty in estimated regression coefficients.
    Submitted to INOC 2015 7th International Conference on Network Optimization; 12/2014
  • Chan Hee Park, Seoung Bum Kim
    ABSTRACT: Feature selection based on an ensemble classifier has been recognized as a crucial technique for modeling high-dimensional data. Feature selection based on the random forests model, which is constructed by aggregating multiple decision tree classifiers, has been widely used. However, a lack of stability and balance in decision trees decreases the robustness of random forests. This limitation motivated us to propose a feature selection method based on newly designed nearest-neighbor ensemble classifiers. The proposed method finds significant features by using an iterative procedure. We performed experiments with 20 datasets of microarray gene expressions to examine the property of the proposed method and compared it with random forests. The results demonstrated the effectiveness and robustness of the proposed method, especially when the number of features exceeds the number of observations.
    Expert Systems with Applications 11/2014; 42(5). DOI:10.1016/j.eswa.2014.10.044 · 1.97 Impact Factor
  • Gulanbaier Tuerhong, Seoung Bum Kim
    ABSTRACT: Control charts are widely used in various industries to improve product quality. One recent trend in developing control charts is based on novelty score algorithms that can effectively describe reality and reflect the unique characteristics of the data being monitored. In this study, we compared eight novelty score algorithms—the T2, Local T2, Dmax, Dmean, K2, the k-nearest neighbor data description, the local density outlier factor, and the hybrid novelty score (HNS)—in terms of their average run length performance. A rigorous simulation was conducted to compare the novelty score-based multivariate control charts under both normal and non-normal scenarios. The simulation showed that in both normal and lognormal scenarios, Dmax-based control charts produced the most promising results. In skewed distribution with high kurtosis non-normal scenarios, HNS- and K2-based control charts performed best. In symmetric with kurtosis non-normal scenarios, local T2-based control charts outperformed the others.
    Communications in Statistics - Simulation and Computation 10/2014; 44(5):1126-1143. DOI:10.1080/03610918.2013.809098 · 0.29 Impact Factor
  • ABSTRACT: In novelty detection, support vector data description (SVDD) is a one-class classification technique that constructs a boundary to differentiate novel from normal patterns. However, boundaries constructed by SVDD do not consider the density of the data. Data points located in low density regions are more likely to be novel patterns because they are remote from their neighbors. This study presents a density-focused SVDD (DFSVDD), whose boundary considers both the shape and the dense region of the data. Two distance measures, the kernel distance and the density distance, are combined to construct the DFSVDD boundary. The kernel distance can be obtained by solving a quadratic optimization, while support vectors are used to obtain the density distance. A simulation study was conducted to evaluate the performance of the proposed DFSVDD and compare it with the traditional SVDD. The proposed method performed better than SVDD in terms of the area under the receiver operating characteristic curve. Copyright © 2014 John Wiley & Sons, Ltd.
    Quality and Reliability Engineering International 10/2014; 30(6). DOI:10.1002/qre.1688 · 0.99 Impact Factor
  • 06/2014; 40(3):291-298. DOI:10.7232/JKIIE.2014.40.3.291
  • Young Joon Park, Seoung Bum Kim
    06/2014; 40(3):275-282. DOI:10.7232/JKIIE.2014.40.3.275
  • ABSTRACT: Variable selection has been widely used in regression data mining not only to select informative variables, but also to simplify the statistical model. A computer experiment based optimization approach employs design of experiments and statistical modeling to represent a complex objective function that can only be evaluated pointwise by solving an optimization subproblem. In large-scale applications, the number of variables is huge, and direct use of computer experiments would require an exceedingly large experimental design and, consequently, significant computational effort. Typically, a large portion of the variables have little impact on the objective; thus, there is a need to eliminate these before performing the complete set of optimization subproblem computer experiments. Ideally, variable selection would be conducted after a small number of computer experiment runs, likely fewer runs (n) than the number of variables (p). Conventional variable selection techniques cannot be applied in this "large p and small n" problem. We explore the use of regression trees and a multiple testing procedure based on false discovery rate. Performance of the selected variables is measured using the coefficient of determination (R2) and relative errors. Two real world
    Annals of Operations Research 05/2014; 216(1). DOI:10.1007/s10479-012-1129-y · 1.10 Impact Factor
  • ABSTRACT: From the text: This special volume evolved out of the Institute for Operations Research and Management Sciences (INFORMS) 2009 Workshop on data mining and system informatics, held at the 2009 INFORMS annual meeting.
    Annals of Operations Research 05/2014; 216(1). DOI:10.1007/s10479-014-1549-y · 1.10 Impact Factor
  • Gulanbaier Tuerhong, Seoung Bum Kim
    ABSTRACT: Processes characterized by high dimensional and mixture data challenge traditional statistical process control charts. In this study, we propose a multivariate control chart based on the Gower distance that can handle a mixture of continuous and categorical data. An extensive simulation study was conducted to examine the properties of the proposed control chart under various scenarios and compare it with some existing multivariate control charts. The simulation results revealed that the proposed control chart outperformed the existing charts as the number of categorical variables increased. Furthermore, we demonstrated the applicability and effectiveness of the proposed control chart through a real case study.
    Expert Systems with Applications 03/2014; 41(4):1701–1707. DOI:10.1016/j.eswa.2013.08.068 · 1.97 Impact Factor
  • ABSTRACT: This paper proposes a research framework for analyzing voting activities of a national assembly on the basis of member-level voting similarity and provides a case study of the national assembly in South Korea. First, we propose a bill contentiousness measure that gives a higher score to bills for which ayes and noes are more diversified in both conservative and progressive parties. Based on the bill contentiousness measure, the top 5%, 10%, and 20% bills were identified and used for further analyses. Moreover, we propose a member-level voting similarity measure that compensates for the lower frequency of noes, and evaluate the pair-wise voting similarities for all lawmakers. Then, voting similarity differences to the affiliated/non-affiliated parties were analyzed for the members in the two major parties according to some internal/external key factors. Finally, similar voting groups were identified and their affiliations were investigated based on the multi-dimensional scaling (MDS) and network analysis techniques. A case study on the national assembly of South Korea showed that the cohesion of the members in the 'Hanara' party becomes higher than that of the 'Minju' party as the bill contentiousness increases, whereas the number of elected, local constituency versus proportional representation, and the competition intensity in a local constituency were found to be partially influential on the voting activities of lawmakers. In addition, MDS and network analysis showed that there is a distinctive difference between two parties when all bills are analyzed, whereas the diversity of parties increases in the same group as the bill contentiousness increases.
    02/2014; 40(1). DOI:10.7232/JKIIE.2014.40.1.060
  • ABSTRACT: We propose a new nonparametric multivariate control chart that integrates a novelty score. The proposed control chart uses as its monitoring statistic a hybrid novelty score, calculated based on the distance to local observations as well as on the distance to the convex hull constructed by its neighbors. The control limits of the proposed control chart were established based on a bootstrap method. A rigorous simulation study was conducted to examine the properties of the proposed control chart under various scenarios and compare it with existing multivariate control charts in terms of average run length (ARL) performance. The simulation results showed that the proposed control chart outperformed both the parametric and nonparametric Hotelling's T2 control charts, especially in nonnormal situations. Moreover, experimental results with real semiconductor data demonstrated the applicability and effectiveness of the proposed control chart. To increase the capability to detect small mean shifts, we propose an exponentially weighted hybrid novelty score control chart. Simulation results indicated that exponentially weighted hybrid novelty score charts outperformed the hybrid novelty score based control charts.
    Communications in Statistics - Simulation and Computation 01/2014; 43(1). DOI:10.1080/03610918.2012.698775 · 0.29 Impact Factor
  • Ji Hoon Kang, Chan Hee Park, Seoung Bum Kim
    Formal Pattern Analysis & Applications 01/2014; DOI:10.1007/s10044-014-0399-1 · 0.74 Impact Factor
  • Ji Hoon Kang, Seoung Bum Kim
    ABSTRACT: Statistical process control techniques have been widely used to improve processes by reducing variations and defects. In the present paper, we propose a multivariate control chart technique based on a clustering algorithm that can effectively handle a situation in which the distribution of in-control observations is inhomogeneous. A simulation study was conducted to examine the characteristics of the proposed control chart and to compare them with Hotelling’s T2 multivariate control charts that are widely used in real-world processes. Moreover, an experiment with real data from the thin film transistor liquid crystal display (TFT-LCD) manufacturing process demonstrated the effectiveness and accuracy of the proposed control chart.
    International Journal of Production Research 09/2013; 51(18). DOI:10.1080/00207543.2013.793427 · 1.32 Impact Factor
  • ABSTRACT: Relevant statistical modeling and analysis of dental data can improve diagnostic and treatment procedures. The purpose of this study is to demonstrate the use of various data mining algorithms to characterize patients with dentofacial deformities. A total of 72 patients with skeletal malocclusions who had completed orthodontic and orthognathic surgical treatments were examined. Each patient was characterized by 22 measurements related to dentofacial deformities. Clustering analysis and visualization grouped the patients into three different patterns of dentofacial deformities. A feature selection approach based on a false discovery rate was used to identify a subset of 22 measurements important in categorizing these three clusters. Finally, classification was performed to evaluate the quality of the measurements selected by the feature selection approach. The results showed that feature selection improved classification accuracy while simultaneously determining which measurements were relevant.
    PLoS ONE 08/2013; 8(8):e67862. DOI:10.1371/journal.pone.0067862 · 3.53 Impact Factor
  • ABSTRACT: Multivariate control charts have been widely used in many industries to monitor and diagnose processes characterized by a large number of quality characteristics. Usually, these characteristics are highly correlated with each other. The direct use of conventional multivariate control charts for situations with highly correlated characteristics may lead to increased rates of false alarms. Principal component analysis (PCA) control charts have been widely used to address problems posed by such high correlations by transforming the set of correlated variables to an uncorrelated set of variables and then identifying the PCs with the highest contribution, which allows one to reduce dimensionality. However, an assumption that the data are normally distributed underlies the construction of the control limits of traditional PCA control charts. This assumption has limited the use of PCA control charts in nonnormal situations found in many modern systems. This study presents the development of nonparametric PCA control charts that do not require any distributional assumptions for their construction. We propose to use nonparametric techniques, kernel density estimation, and bootstrapping to establish the control limits of these charts. A simulation study was conducted to evaluate the performance of the proposed charts and compare them with traditional PCA control charts. The comparative performance in terms of average run length showed that the proposed nonparametric PCA control charts performed better than the parametric PCA control charts in nonnormal situations.
    Expert Systems with Applications 06/2013; 40(8):3044–3054. DOI:10.1016/j.eswa.2012.12.020 · 1.97 Impact Factor
  • ABSTRACT: Extracting useful and meaningful patterns from large volumes of text data is of growing importance. In the present study we analyze vast amounts of prescription data generated from a book of oriental medicine to identify the relationships between the symptoms and the associated medicines used to treat these symptoms. The oriental medicine book used in this study (called Bangyakhappyeon) contains a large number of prescriptions to treat about 54 categorized symptoms and lists the corresponding herbal materials. We used an association rule algorithm combined with network analysis and found useful and informative relationships between the symptoms and medicines.
    PLoS ONE 03/2013; 8(3):e59241. DOI:10.1371/journal.pone.0059241 · 3.53 Impact Factor
  • ABSTRACT: Control charts have been widely recognised as important tools in system monitoring of abnormal behaviour and quality improvement. Traditional control charts have a major assumption that successive observations are uncorrelated and normally distributed. When this assumption is violated, the traditional control charts do not perform well, but instead show increased false alarm rates. In this study, we propose a data mining model adjustment control chart to address autocorrelation problems for cascade processes. The basic idea of the proposed control chart is to monitor the residuals obtained by data mining models. The data mining models used in this study include support vector regression and artificial neural networks. A simulation study was conducted to evaluate the performance of the proposed control chart and compare it with the standard regression adjustment control chart and the observations-based control chart in terms of average run length performance. The results showed that the proposed data mining model adjustment control charts yielded better performance than the two other methods considered in this study. [Received 8 December 2010; Revised 19 June 2011; Revised 9 September 2011; Accepted 29 November 2011]
    European Journal of Industrial Engineering 01/2013; 7(4):442-455. DOI:10.1504/EJIE.2013.055017 · 1.50 Impact Factor
  • ABSTRACT: Process monitoring and diagnosis have been widely recognized as important and critical tools in system monitoring for detection of abnormal behavior and quality improvement. In this study, we present bootstrap-based multivariate control charts to efficiently monitor TFT-LCD (Thin Film Transistor Liquid Crystal Display) manufacturing processes containing a number of quality characteristics that are correlated with each other. Experimental results with real data from the TFT-LCD manufacturing process demonstrate the effectiveness and robustness of the bootstrap-based multivariate control charts.
    Journal of Computational and Theoretical Nanoscience 06/2012; 13(1):579-583. DOI:10.1166/asl.2012.3858 · 1.25 Impact Factor
  • ABSTRACT: Process monitoring and diagnosis have been widely recognized as important and critical tools in system monitoring for detection of abnormal behavior and quality improvement. Although traditional statistical process control (SPC) tools are effective in simple manufacturing processes that generate a small volume of independent data, these tools are not capable of handling the large streams of multivariate and autocorrelated data found in modern systems. As the limitations of SPC methodology become increasingly obvious in the face of ever more complex processes, data mining algorithms, because of their proven capabilities to effectively analyze and manage large amounts of data, have the potential to resolve the challenging problems that are stretching SPC to its limits. In the present study we attempted to integrate state-of-the-art data mining algorithms with SPC techniques to achieve efficient monitoring in multivariate and autocorrelated processes. The data mining algorithms include artificial neural networks, support vector regression, and multivariate adaptive regression splines. The residuals of data mining models were utilized to construct multivariate cumulative sum control charts to monitor the process mean. Simulation results from various scenarios indicated that data mining model-based control charts perform better than traditional time-series model-based control charts.
    Expert Systems with Applications 02/2012; 39(2):2073-2081. DOI:10.1016/j.eswa.2011.08.010 · 1.97 Impact Factor

Publication Stats

325 Citations
79.75 Total Impact Points

Institutions

  • 2009–2015
    • Korea University
      • Department of Information Management Engineering
      Sŏul, Seoul, South Korea
    • Emory University
      • School of Medicine
      Atlanta, GA, United States
  • 2006–2011
    • University of Texas at Arlington
      • Department of Chemistry and Biochemistry
      • Department of Industrial and Manufacturing Systems Engineering
      Arlington, Texas, United States
  • 2003–2006
    • Georgia Institute of Technology
      • School of Industrial and Systems Engineering
      Atlanta, Georgia, United States