Seongho Kim

Wayne State University, Detroit, Michigan, United States

Publications (42), 132.32 Total Impact

  • ABSTRACT: We develop a novel peak detection algorithm for the analysis of comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry (GC×GC-TOF MS) data using normal-exponential-Bernoulli (NEB) and mixture probability models. The algorithm first performs baseline correction and denoising simultaneously using the NEB model, which also defines peak regions. Peaks are then picked using a mixture of probability distributions to deal with co-eluting peaks. Peak merging is further carried out based on the mass spectral similarities among the peaks within the same peak group. The algorithm is evaluated using experimental data to study the effect of different cutoffs of the conditional Bayes factors and the effect of different mixture models, including Poisson, truncated Gaussian, Gaussian, Gamma, and exponentially modified Gaussian (EMG) distributions, and the optimal version is identified using a trial-and-error approach. We then compare the new algorithm with two existing algorithms in terms of compound identification. Data analysis shows that the developed algorithm detects peaks with lower false discovery rates than the existing algorithms, and that a less complicated peak picking model is a promising alternative to the more complicated and widely used EMG mixture models.
    08/2014;
  • Imhoi Koo, Sen Yao, Xiang Zhang, Seongho Kim
    ABSTRACT: The Gaussian graphical model (GGM)-based method, a key approach to reverse engineering biological networks, uses partial correlation to measure the conditional dependence between two variables while controlling for the contribution of the other variables. After partial correlation coefficients are estimated, one of the most critical steps in network construction is controlling the false discovery rate (FDR) when assessing significant associations among variables. Various FDR methods have been proposed, mainly for biomarker discovery, but it remains unclear which FDR method performs best for network construction. Furthermore, no study has examined the effect of network structure on network construction. We selected six FDR methods, the linear step-up procedure (BH95), the adaptive linear step-up procedure (BH00), Efron's local FDR (LFDR), Benjamini-Yekutieli's step-up procedure (BY01), Storey's q-value procedure (Storey01), and Storey-Taylor-Siegmund's adaptive step-up procedure (STS04), and evaluated their performance on network construction. We further considered two network structures, random and scale-free networks, to investigate their influence on network construction. Both simulated and real experimental data suggest that STS04 provides the highest true positive rate (TPR) or F1 score, while BY01 has the highest positive predictive value (PPV) in network construction. In addition, network structure was found to have no significant effect on the FDR methods.
    Journal of Bioinformatics and Computational Biology 08/2014; 12(4):1450018.
  • Seongho Kim, Xiang Zhang
    ABSTRACT: Compound identification is a critical process in metabolomics. The widely used approach for compound identification in gas chromatography–mass spectrometry-based metabolomics is spectrum matching, in which the mass spectral similarity between an experimental mass spectrum and each mass spectrum in a reference library is calculated. While various similarity measures have been developed to improve the overall accuracy of compound identification, little attention has been paid to reducing the false discovery rate. We therefore develop an approach for controlling the false identification rate using the distribution of the difference between the first and second highest spectral similarity scores. We further propose a model-based approach to achieving a desired true positive rate. The developed method is applied to the National Institute of Standards and Technology mass spectral library, and its performance is compared with that of the conventional approach that uses only the maximum spectral similarity score. The results show that the developed method achieves a significantly higher F1 score and positive predictive value than the conventional approach. Copyright © 2014 John Wiley & Sons, Ltd.
    Journal of Chemometrics 08/2014; · 1.94 Impact Factor
  • ABSTRACT: Childhood obesity has become a national public health crisis in America. Physical inactivity and unhealthy eating behaviors may contribute to the childhood obesity epidemic. School-based healthy lifestyle interventions play a promising role in preventing and controlling childhood obesity. A comprehensive school-based healthy lifestyle intervention was implemented in 4 rural elementary schools in Kentucky. The intervention included 4 goals: improving physical education, health education, family/community involvement, and school wellness policies. Children's physical activity was assessed by pedometer, and nutrition was assessed by a previous-day recall survey in January (baseline), February (t1), March (t2), April (t3), and May (t4) of 2011. The intervention significantly increased the percentages of children meeting physical activity (1% vs 5%, p < .01) and nutrition (15% vs 26%, p < .01) recommendations. The effects of the intervention on physical activity and nutrition depended on the children's school, grade, and age. There was an increasing linear trend in physical activity and an increasing quadratic trend in nutrition over time among children. The intervention had beneficial effects on healthy behaviors among children. Further studies are needed to assess its long-term effects and cost-effectiveness.
    Journal of School Health 04/2014; 84(4):247-55. · 1.50 Impact Factor
  • ABSTRACT: We report a compound identification method (SimMR), which simultaneously evaluates the mass spectrum similarity and the retention index distance using an empirical mixture score function, for the analysis of GC-MS data. The performance of the developed SimMR method was compared to that of two existing compound identification strategies. One is the mass spectrum matching method without incorporation of retention index information (SM). The other is the method that sequentially evaluates the mass spectrum similarity and retention index distance (SeqMR). For comparison purposes, we used the NIST/EPA/NIH Mass Spectral Library 2005. Our study demonstrates that SimMR performs best among the three compound identification methods, improving the overall identification accuracy by up to 1.53% and 4.81% compared to SeqMR and SM, respectively.
    The Analyst 03/2014; · 4.23 Impact Factor
  • ABSTRACT: A data-dependent peak model (DDPM)-based spectrum deconvolution method was developed for the analysis of high-resolution LC-MS data. To construct the selected ion chromatograms (XICs), a clustering method, density-based spatial clustering of applications with noise (DBSCAN), is applied to all m/z values of an LC-MS data set to group the m/z values into XICs. DBSCAN constructs XICs without the need for a user-defined m/z variation window. After XIC construction, the peaks of molecular ions in each XIC are detected using both the first and the second derivative tests, followed by an optimized chromatographic peak model selection method for peak deconvolution. A total of six chromatographic peak models are considered, including Gaussian, log-normal, Poisson, gamma, exponentially modified Gaussian, and a hybrid of exponential and Gaussian models. Abundant, nonoverlapping peaks are used to find the optimal peak models, which are both data- and retention-time-dependent. Analysis of 18 spiked-in LC-MS data sets demonstrates that the proposed DDPM spectrum deconvolution method outperforms the traditional method. On average, the DDPM approach not only detected 58 more chromatographic peaks in each test LC-MS data set but also improved retention time and peak area by 3% and 6%, respectively.
    Analytical Chemistry 02/2014; 86(4):2156-65. · 5.70 Impact Factor
  • Imhoi Koo, Xue Shi, Seongho Kim, Xiang Zhang
    ABSTRACT: We developed a method, iMatch2, for compound identification using retention indices (RI) in the NIST11 library. A three-way ANOVA test and a Kruskal-Wallis test, respectively, demonstrate that column class and temperature program type, as defined by the NIST library, are the most dominant factors affecting the magnitude of the retention index, while the retention index data type does not cause a significant difference. The developed linear regression transformation for merging retention indices with different data types but the same column class and temperature program type reduces the standard deviation of the retention index by up to 8%, compared to the simple union approach used in the original iMatch. Among outlier detection methods for removing retention indices that differ greatly from the remaining data for the same compound, the Tietjen-Moore test and the generalized extreme studentized deviate test are the strictest, while methods such as Dixon's test, the Thompson tau approach, and Grubbs' test are more conservative. To improve the accuracy of the retention index window, the concept of a compound-specific retention index window is introduced for compounds with a large number of retention indices in the NIST11 library, while the retention index window is calculated from empirical distributions for compounds with a small number of retention indices. Analysis of experimental data from a mixture of compound standards and a metabolite extract from mouse liver shows significant improvement of retention index quality in the NIST11 library and the new data analysis methods.
    Journal of Chromatography A 01/2014; · 4.61 Impact Factor
  • Seongho Kim, Lang Li
    ABSTRACT: The statistical identifiability of nonlinear pharmacokinetic (PK) models with the Michaelis-Menten (MM) kinetic equation is considered using a global optimization approach, particle swarm optimization (PSO). If a model is statistically non-identifiable, conventional derivative-based estimation often terminates early without converging because of singularity. To circumvent this difficulty, we develop a derivative-free global optimization algorithm that combines PSO with a derivative-free local optimization algorithm to improve the rate of convergence of PSO. We further propose an efficient approach for both checking the convergence of estimation and detecting the identifiability of nonlinear PK models. PK simulation studies demonstrate that the convergence and identifiability of the PK model can be detected efficiently through the proposed approach. The proposed approach is then applied to clinical PK data with a two-compartment model.
    Computer Methods and Programs in Biomedicine 10/2013; · 1.56 Impact Factor
  • Hyejeong Jang, Seongho Kim, Dongfeng Wu
    ABSTRACT: Lung cancer screening using X-rays has been controversial for many years. A major concern is whether lung cancer screening really brings any survival benefit, which depends on effective treatment after early detection. The problem was analyzed from a different point of view, and estimates of the projected lead time for participants in a lung cancer screening program are presented using the Johns Hopkins Lung Project (JHLP) data. The newly developed method of lead time estimation was applied, in which the lifetime T is treated as a random variable rather than a fixed value, so that the number of future screenings for a given individual is also a random variable. Using the actuarial life table available from the United States Social Security Administration, the lifetime distribution was first obtained, and the lead time distribution was then projected using the JHLP data. The analysis of the JHLP data shows that, for a male heavy smoker with initial screening ages of 50, 60, and 70, the probability of no early detection with semiannual screens will be 32.16%, 32.45%, and 33.17%, respectively, while the mean lead time is 1.36, 1.33, and 1.23 years, respectively. The probability of no early detection increases monotonically as the screening interval increases, and it increases slightly as the initial age increases for the same screening interval. The mean lead time and its standard error decrease as the screening interval increases for all age groups, and both decrease as the initial age increases for the same screening interval. The overall mean lead time estimated with a random lifetime T is slightly less than that estimated with a fixed value of T. These results may help improve current screening programs.
    Journal of Epidemiology and Global Health 09/2013; 3(3):157-63.
  • Imhoi Koo, Seongho Kim, Xiang Zhang
    ABSTRACT: Compound identification in gas chromatography-mass spectrometry (GC-MS) is usually achieved by matching query spectra to spectra in a reference library. Although several spectral similarity measures have been developed and compared using a small reference library, it remains unknown how the relationship between the spectral similarity measure and the size of the reference library affects identification accuracy and the optimal weight factor. We used three reference libraries to investigate the dependency among the optimal weight factor, the spectral similarity measure, and the size of the reference library. Our study demonstrated that the optimal weight factor depends not only on the spectral similarity measure but also on the size of the reference library. The mixture semi-partial correlation measure outperforms all existing spectral similarity measures in all tested reference libraries, despite its computational expense. Furthermore, the accuracy of compound identification with a larger future reference library is estimated by varying the size of the reference library. A simulation study indicates that the mixture semi-partial correlation measure will have the best performance as reference libraries grow.
    Journal of Chromatography A 05/2013; · 4.61 Impact Factor
  • ABSTRACT: MOTIVATION: Due to the high complexity of the metabolome, comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry (GC×GC-TOF MS) is considered a powerful analytical platform for metabolomics studies. However, GC×GC-TOF MS is not widely used in metabolomics because of the lack of bioinformatics systems for data analysis. RESULTS: We developed a computational platform, MetPP, for the analysis of metabolomics data acquired on a GC×GC-TOF MS system. MetPP performs peak filtering and merging, retention index matching, peak list alignment, normalization, statistical significance tests, and pattern recognition, using the peak lists deconvoluted from the instrument data as its input. The performance of MetPP was tested with two sets of experimental data acquired in a spike-in experiment and a biomarker discovery experiment, respectively. MetPP not only correctly aligned the spiked-in metabolite standards in the experimental data, but also correctly recognized their concentration differences between sample groups. In the analysis of the biomarker discovery data, a total of 15 metabolites were recognized as having significant concentration differences between the sample groups, and these results agree with the literature results of histological analysis, demonstrating the effectiveness of MetPP for disease biomarker discovery. AVAILABILITY: The source code of MetPP is available at http://metaopen.sourceforge.net CONTACT: xiang.zhang@louisville.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Bioinformatics 05/2013; · 5.47 Impact Factor
  • ABSTRACT: BACKGROUND: Since peak alignment in metabolomics has a huge effect on the subsequent statistical analysis, it is considered a key preprocessing step, and many peak alignment methods have been developed. However, existing peak alignment methods do not produce satisfactory results. Indeed, the lack of accuracy results from the fact that peak alignment is done separately from other preprocessing steps such as identification. Therefore, a post-hoc approach that integrates both identification and alignment results is urgently needed to increase the accuracy of peak alignment. RESULTS: The proposed post-hoc method was validated with three datasets: a mixture of compound standards, a metabolite extract from mouse liver, and a metabolite extract from wheat. Compared to the existing methods, the proposed approach improved peak alignment in terms of various performance measures. Manual inspection also verified that the post-hoc approach improves peak alignment. CONCLUSIONS: The proposed approach, which combines the information of metabolite identification and alignment, clearly improves the accuracy of peak alignment in terms of several performance measures. An R package and examples using a dataset are available at http://mrr.sourceforge.net/download.html.
    BMC Bioinformatics 04/2013; 14(1):123. · 3.02 Impact Factor
  • ABSTRACT: A method employing high-resolution mass spectrometry in combination with in vivo metabolite deuterium labeling was developed in this study to investigate the effects of alcohol exposure on lipid homeostasis at the white adipose tissue (WAT)-liver axis in a mouse model of alcoholic fatty liver. To differentiate the liver lipids synthesized from fatty acids transported back from adipose tissue from the lipids synthesized from other sources of fatty acids, a two-stage mouse feeding experiment was performed to incorporate deuterium into metabolites. Lipids extracted from mouse liver, epididymal white adipose tissue (eWAT), and subcutaneous white adipose tissue (sWAT) were analyzed. It was found that 13 and 10 triacylglycerols (TGs) incorporating a certain number of deuterium atoms were significantly increased in alcohol-induced fatty liver after two and four weeks of alcohol feeding, respectively. The concentration changes of these TGs ranged from a 1.7- to a 6.3-fold increase. A total of 14 deuterated TGs were significantly decreased in both eWAT and sWAT at two and four weeks, with fold changes ranging from 0.19 to 0.77. The increase of deuterium-incorporated TGs in alcohol-induced fatty liver and their decrease in both eWAT and sWAT indicate that alcohol exposure induces a hepatic influx of fatty acids released from WATs. The results of time course analysis further indicate a mechanistic link between adipose fat loss and hepatic fat gain in alcoholic fatty liver.
    PLoS ONE 01/2013; 8(2):e55382. · 3.73 Impact Factor
  • Seongho Kim, Xiang Zhang
    ABSTRACT: Peak alignment is a critical procedure in mass spectrometry-based biomarker discovery in metabolomics. One approach to the alignment of comprehensive two-dimensional gas chromatography mass spectrometry (GC×GC-MS) data is peak matching-based alignment, the key to which is the calculation of mass spectral similarity scores. Various mass spectral similarity measures have been developed, mainly for compound identification, but the effect of these measures on the performance of peak matching-based alignment remains unknown. We therefore selected five mass spectral similarity measures (cosine correlation, Pearson's correlation, Spearman's correlation, partial correlation, and part correlation) and examined their effects on peak alignment using two sets of experimental GC×GC-MS data. The results show that the choice of spectral similarity measure does not significantly affect alignment accuracy for data from less complex samples, while partial correlation performs much better than the other measures on experimental data acquired from complex biological samples.
    Computational and Mathematical Methods in Medicine 01/2013; 2013:509761. · 0.79 Impact Factor
  • Seongho Kim, Dongfeng Wu
    ABSTRACT: The probability model for periodic screening was extended to provide statistical inference for sensitivity depending on sojourn time, in which the sensitivity was modeled as a function of time spent in the preclinical state and the sojourn time. The likelihood function with the proposed sensitivity model was then evaluated with simulated data to check its reliability in terms of the mean estimation and the standard error. Simulation results showed that the maximum likelihood estimates of the proposed model have little bias and small standard errors. The extended probability model was further applied to the Johns Hopkins Lung Project data using both maximum likelihood estimation and Bayesian Markov chain Monte Carlo.
    Statistical Methods in Medical Research 11/2012; · 2.36 Impact Factor
  • ABSTRACT: A set of data preprocessing algorithms for peak detection and peak list alignment is reported for the analysis of liquid chromatography-mass spectrometry (LC-MS)-based metabolomics data. For spectrum deconvolution, peak picking is achieved at the selected ion chromatogram (XIC) level. To estimate and remove the noise in XICs, each XIC is first segmented into several peak groups based on the continuity of scan number, and the noise level is estimated from all XIC signals except regions that potentially contain metabolite ion peaks. After noise removal, the peaks of molecular ions are detected using both the first and the second derivatives, followed by an efficient exponentially modified Gaussian-based peak deconvolution method for peak fitting. A two-stage alignment algorithm is also developed, in which the retention times of all peaks are first transformed into the z-score domain, and the peaks are then aligned based on their mixture scores after retention time correction using partial linear regression. Analysis of a set of spike-in LC-MS data from three groups of samples containing 16 metabolite standards mixed with metabolite extract from mouse livers demonstrates that the developed data preprocessing method performs better than two popular existing data analysis packages, MZmine2.6 and XCMS(2), for peak picking, peak list alignment, and quantification.
    Analytical Chemistry 08/2012; 84(18):7963-71. · 5.70 Impact Factor
  • ABSTRACT: A method of calculating the second dimension hold-up time for comprehensive two-dimensional gas chromatographic (GC×GC) data was developed by incorporating the temperature of the second dimension column into the calculation model. The model was developed by investigating the relationships among the coefficients in each of six nonlinear models reported in the literature and the relationship between each coefficient and the second dimension column temperature. The most robust nonlinear function was selected and further used to construct the new model for calculating the second dimension retention time, in which the coefficients that correlate significantly with column temperature are replaced with expressions of column temperature. An advantage of the proposed equation is that eight parameters can explain the second dimension hold-up time as well as the retention times corresponding to n-alkanes and column temperature over the entire chromatographic region, including the region not bounded by the retention times of n-alkanes. To optimize the experimental design for collecting the isothermal n-alkane data used to build the second dimension hold-up time model, the column temperature difference and the number of isothermal experiments should be considered simultaneously. It was concluded that a total of 5 or 6 isothermal experiments with a temperature difference of 40 or 50 °C are sufficient to generate an accurate model. The test mean squared error (MSE) under those conditions ranges from 0.0428 to 0.0532 for the calculation of the second dimension hold-up time for GC×GC data.
    Journal of Chromatography A 08/2012; 1260:193-9. · 4.61 Impact Factor
  • ABSTRACT: Compound identification is a key component of data analysis in applications of gas chromatography-mass spectrometry (GC-MS). Currently, the most widely used compound identification method is mass spectrum matching, in which the dot product and its composite version are employed as spectral similarity measures. Several forms of transformation of fragment ion intensities have also been proposed to increase the accuracy of compound identification. In this study, we introduced partial and semipartial correlations as mass spectral similarity measures and applied them to identify compounds along with different transformations of peak intensity. Mixture versions of the proposed measures were also developed to further improve the accuracy of compound identification. To demonstrate the performance of the proposed spectral similarity measures, the National Institute of Standards and Technology (NIST) mass spectral library and replicate spectral library were used as the reference library and the query spectra, respectively. Identification results showed that the mixture partial and semipartial correlations always outperform both the dot product and its composite measure. The mixture similarity with semipartial correlation has the highest compound identification accuracy, 84.6%, with a transformation of (0.53, 1.3) for fragment ion intensity and m/z value, respectively.
    Analytical Chemistry 07/2012; 84(15):6477-87. · 5.70 Impact Factor
  • ABSTRACT: Alcohol consumption induces liver steatosis; this study therefore investigated the possible role of adipose tissue dysfunction in the pathogenesis of alcoholic steatosis. Mice were pair-fed an alcohol or control liquid diet for 8 weeks to evaluate the effects of alcohol on lipid metabolism at the adipose tissue-liver axis. Chronic alcohol exposure reduced adipose tissue mass and adipocyte size. Fatty acid release from adipose tissue explants was significantly increased in alcohol-fed mice in association with the activation of adipose triglyceride lipase and hormone-sensitive lipase. Alcohol exposure induced insulin intolerance and inactivated adipose protein phosphatase 1 in association with the up-regulation of phosphatase and tensin homolog (PTEN) and suppressor of cytokine signaling 3 (SOCS3). Alcohol exposure also up-regulated fatty acid transport proteins and caused lipid accumulation in the liver. To define the mechanistic link between adipose triglyceride loss and hepatic triglyceride gain, mice were first administered heavy water for 5 weeks to label adipose triglycerides with deuterium and were then pair-fed an alcohol or control diet for 2 weeks. Deposition of deuterium-labeled adipose triglycerides in the liver was analyzed using Fourier transform ion cyclotron resonance mass spectrometry. Alcohol exposure increased the hepatic levels of more than a dozen deuterium-labeled triglyceride molecules by up to 6.3-fold. These data demonstrate for the first time that adipose triglycerides released by alcohol-induced hyperlipolysis are reverse transported and deposited in the liver.
    American Journal of Pathology 03/2012; 180(3):998-1007. · 4.60 Impact Factor
  • ABSTRACT: Compound identification in gas chromatography-mass spectrometry (GC-MS) is achieved by matching the experimental mass spectrum to the mass spectra in a spectral library. It is known that the intensities at higher m/z values in a GC-MS mass spectrum are the most diagnostic. Therefore, to increase the relative significance of peak intensities at higher m/z values, the intensities and m/z values are usually transformed with a set of weight factors. Poor-quality weight factors can significantly decrease the accuracy of compound identification. With the significant enrichment of mass spectral databases and the broad application of GC-MS, it is important to revisit the methods of discovering the optimal weight factors for high-confidence compound identification. We developed a novel approach to finding the optimal weight factors using only a reference library, for high-accuracy compound identification. The developed approach first calculates the ratio of skewness to kurtosis of the mass spectral similarity scores among spectra (compounds) in a reference library and then takes the weight factor with the maximum ratio as the optimal weight factor. We examined our approach by comparing the accuracy of compound identification using the mass spectral library maintained by the National Institute of Standards and Technology. The results demonstrate that the optimal weight factors for fragment ion peak intensity and m/z value found by the developed approach outperform the current weight factors for compound identification. The results and R package are available at http://stage.louisville.edu/faculty/x0zhan17/software/software-development.
    Bioinformatics 02/2012; 28(8):1158-63. · 5.47 Impact Factor
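
Several abstracts above (the weight-factor, partial/semipartial correlation, and false identification rate papers) build on spectrum matching with weight-transformed peaks. The following is a minimal Python sketch, not the authors' code: it assumes spectra are dicts mapping integer m/z to intensity, transforms each peak as (m/z)^b · intensity^a, scores a query against a toy library with the cosine (dot-product) similarity, and reports the gap between the first and second highest scores, the quantity the false identification rate abstract builds its distribution on. The helper names and the toy library are hypothetical; a = 0.53 and b = 1.3 are borrowed from the semipartial correlation abstract purely as example values.

```python
import math

def weighted_vector(spectrum, a=0.53, b=1.3):
    """Hypothetical helper: transform each peak as (m/z)**b * intensity**a."""
    return {mz: (mz ** b) * (inten ** a) for mz, inten in spectrum.items() if inten > 0}

def cosine_similarity(s1, s2, a=0.53, b=1.3):
    """Dot-product (cosine) similarity of two weight-transformed spectra."""
    v1, v2 = weighted_vector(s1, a, b), weighted_vector(s2, a, b)
    dot = sum(v1[mz] * v2[mz] for mz in v1.keys() & v2.keys())
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    if n1 == 0 or n2 == 0:
        return 0.0
    return dot / (n1 * n2)

def identify(query, library, a=0.53, b=1.3):
    """Return (best match, best score, best score minus second-best score)."""
    scored = sorted(((cosine_similarity(query, spec, a, b), name)
                     for name, spec in library.items()), reverse=True)
    best_score, best_name = scored[0]
    second = scored[1][0] if len(scored) > 1 else 0.0
    return best_name, best_score, best_score - second

# Toy reference library and query (made-up spectra).
library = {
    "compound_A": {41: 100, 43: 80, 57: 999, 71: 300},
    "compound_B": {41: 999, 55: 600, 69: 250},
    "compound_C": {43: 999, 58: 400, 71: 50},
}
query = {41: 110, 43: 75, 57: 950, 71: 280}
name, score, gap = identify(query, library)
print(name, round(score, 3), round(gap, 3))
```

A small gap means the top hit is barely separated from the runner-up, which is the intuition behind using the score difference rather than the maximum score alone.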
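
The GGM network abstract compares six FDR procedures; the simplest of them, the Benjamini-Hochberg linear step-up procedure (BH95), fits in a few lines. This is a generic textbook sketch in Python, not the code used in that study, and the p-values in the usage example are made up.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return one boolean per hypothesis: True if BH95 rejects it at FDR level q.

    Sort p-values ascending, find the largest rank k with p_(k) <= k * q / m,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected

# Made-up p-values, e.g. from tests of partial correlation coefficients.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, q=0.05))
```

Here only the two smallest p-values fall under their step-up thresholds (0.005 and 0.010), so two associations would be kept as network edges.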
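
Partial correlation, used as the GGM edge measure and as a spectral similarity measure in the abstracts above, can be computed from the inverse covariance (precision) matrix P as rho_ij = -P_ij / sqrt(P_ii * P_jj). A minimal NumPy sketch on simulated data follows; it assumes more samples than variables so the covariance matrix is invertible, and the chain x to y to z used to generate the data is hypothetical.

```python
import numpy as np

def partial_correlation(data):
    """Partial correlation matrix of the columns of `data` (n_samples x n_vars)."""
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = x + rng.normal(size=n)    # y depends on x
z = y + rng.normal(size=n)    # z depends on x only through y
data = np.column_stack([x, y, z])

print(np.round(np.corrcoef(data, rowvar=False)[0, 2], 2))  # marginal x-z correlation
print(np.round(partial_correlation(data)[0, 2], 2))         # x-z correlation given y
```

The marginal x-z correlation is large (about 1/sqrt(3) in theory), while the partial correlation is near zero, which is why a GGM would not draw a direct x-z edge.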
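
The DDPM abstract groups m/z values into XICs with DBSCAN. For one-dimensional data such as m/z values, plain DBSCAN can be sketched as below. This is a generic O(n^2) illustration, not the published implementation: `eps`, `min_pts`, and the m/z list are hypothetical, and the abstract does not say how that method sets DBSCAN's parameters.

```python
def dbscan_1d(values, eps, min_pts):
    """Cluster scalar values with DBSCAN; returns one label per value (-1 = noise)."""
    n = len(values)
    labels = [None] * n  # None = not yet visited

    def neighbors(i):
        return [j for j in range(n) if abs(values[j] - values[i]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels

# Hypothetical centroided m/z values from consecutive scans.
mz = [100.0010, 100.0020, 100.0030, 200.0500, 200.0510, 200.0520, 350.2000]
print(dbscan_1d(mz, eps=0.0015, min_pts=2))
```

Each resulting cluster would correspond to one candidate XIC; the isolated value at 350.2 is labeled noise because it has no neighbor within `eps`.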
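
Several abstracts above fit chromatographic peaks with an exponentially modified Gaussian (EMG). One common parameterization of the EMG density, with Gaussian mean mu, standard deviation sigma, and exponential rate lam = 1/tau, is f(x) = (lam/2) * exp((lam/2) * (2*mu + lam*sigma^2 - 2*x)) * erfc((mu + lam*sigma^2 - x) / (sqrt(2)*sigma)). The Python sketch below evaluates it with the standard library; the parameterization and the `emg_peak` area scaling are assumptions, since the abstracts do not give a formula. This direct form can overflow when lam*sigma is large, so production code would use a numerically stable variant.

```python
import math

def emg_pdf(x, mu, sigma, lam):
    """EMG density: a Gaussian (mu, sigma) convolved with an exponential of rate lam."""
    arg = (mu + lam * sigma ** 2 - x) / (math.sqrt(2.0) * sigma)
    return 0.5 * lam * math.exp(0.5 * lam * (2.0 * mu + lam * sigma ** 2 - 2.0 * x)) * math.erfc(arg)

def emg_peak(x, area, mu, sigma, tau):
    """Hypothetical peak model: area times the EMG density with tail time constant tau."""
    return area * emg_pdf(x, mu, sigma, 1.0 / tau)

# Sanity checks: the density integrates to about 1 and its apex sits right of mu.
dx = 0.01
grid = [-15.0 + i * dx for i in range(7501)]  # covers -15 to 60
total = sum(emg_pdf(x, mu=5.0, sigma=1.0, lam=0.5) for x in grid) * dx
apex = max(grid, key=lambda x: emg_pdf(x, 5.0, 1.0, 0.5))
print(round(total, 4), round(apex, 2))
```

The exponential tail shifts the apex to the right of mu and stretches the trailing edge, which is the peak tailing that motivates EMG models over a plain Gaussian.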
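
The PK identifiability abstract builds on particle swarm optimization. Below is a minimal global-best PSO in Python with commonly used constriction-style coefficients, fitting the Michaelis-Menten rate equation v = Vmax * S / (Km + S) to hypothetical noiseless data. It is only a sketch: the published algorithm additionally couples PSO with a derivative-free local optimizer and fits full PK models, neither of which is reproduced here, and the data, bounds, and swarm settings are assumptions.

```python
import random

def pso_minimize(objective, bounds, n_particles=30, n_iter=300, seed=1,
                 w=0.7298, c1=1.49618, c2=1.49618):
    """Global-best particle swarm optimization over the box given by `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to the box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Hypothetical noiseless data generated with Vmax = 10 and Km = 2.
S = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
v = [10.0 * s / (2.0 + s) for s in S]

def sse(params):
    """Sum of squared errors of the Michaelis-Menten curve (no derivatives needed)."""
    vmax, km = params
    return sum((vi - vmax * s / (km + s)) ** 2 for s, vi in zip(S, v))

best, best_val = pso_minimize(sse, bounds=[(0.0, 50.0), (0.0, 20.0)])
print([round(b, 3) for b in best], best_val)
```

Because PSO only evaluates the objective, it keeps moving even where a derivative-based fit would hit a singular Hessian, which is the property the abstract exploits.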

Publication Stats

118 Citations
132.32 Total Impact Points

Institutions

  • 2014
    • Wayne State University
      Detroit, Michigan, United States
  • 2011–2014
    • University of Louisville
      • Department of Pharmacology and Toxicology
      • Department of Bioinformatics and Biostatistics
      Louisville, Kentucky, United States
  • 2013
    • Karmanos Cancer Institute
      Detroit, Michigan, United States
  • 2008–2012
    • Indiana University-Purdue University Indianapolis
      • Department of Biostatistics
      Indianapolis, IN, United States
  • 2009
    • University of Michigan
      • Department of Biostatistics
      Ann Arbor, MI, United States