Xiaohui Lin

Dalian University of Technology, Lü-ta-shih, Liaoning, China

Publications (45) · 61.7 Total impact

  • ABSTRACT: Liquid chromatography-mass spectrometry (LC-MS) is now a mainstream technique for large-scale metabolic phenotyping to obtain a better understanding of genomic functions. However, repeatability is still an essential issue for LC-MS-based methods, and convincing strategies for long-term analysis are urgently required. Our previously reported pseudotargeted method, which combines nontargeted and targeted analyses, has proved to be a practical approach that yields high-quality, information-rich data. In this study, we developed a comprehensive strategy based on pseudotargeted analysis by integrating blank-wash, pooled quality control (QC) samples, and post-calibration for large-scale metabolomics studies. The performance of the strategy was optimized in both the pre- and post-acquisition stages, including the selection of QC samples, the insertion frequency of QC samples, and the post-calibration methods. These results imply that the pseudotargeted method is rather stable and suitable for large-scale metabolic profiling studies. As a proof of concept, the proposed strategy was applied to the combination of 3 independent batches within a time span of 5 weeks, and about 54% of the features had a coefficient of variation (CV) below 15%. Moreover, a single analytical batch could be extended to at least 282 injections (about 110 hr) while still providing excellent stability: the CV of 63% of the metabolic features was less than 15%. Taken together, the improved repeatability of our strategy provides a reliable protocol for large-scale metabolomics studies.
    No preview · Article · Feb 2016
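The repeatability figures in this abstract are fractions of features whose CV falls below 15%. A minimal sketch of that calculation follows, assuming the CV is computed per feature across repeated QC injections as sample standard deviation divided by mean; the function names and the toy intensities are hypothetical and not taken from the paper.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation / mean * 100."""
    return stdev(values) / mean(values) * 100.0


def fraction_below(qc_features, threshold=15.0):
    """Fraction of features whose CV across QC injections is below threshold."""
    cvs = [coefficient_of_variation(v) for v in qc_features.values()]
    return sum(cv < threshold for cv in cvs) / len(cvs)


# Hypothetical intensities of two features over four QC injections.
qc_features = {
    "feature_a": [100.0, 102.0, 98.0, 101.0],  # stable feature
    "feature_b": [50.0, 80.0, 30.0, 60.0],     # unstable feature
}

print(fraction_below(qc_features))
```

With these toy values only the first feature passes the 15% threshold, so half of the features count as repeatable.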
  • ABSTRACT: Metabolomics is increasingly applied to discover and validate metabolite biomarkers and to illuminate biological variations. Multiple analytical batches are commonly combined in large-scale and long-term metabolomics to generate robust data, but gross and systematic errors are often observed, so appropriate calibration methods are required before statistical analysis. Here, we develop a novel correction strategy for large-scale and long-term metabolomics studies that integrates data from multiple batches and different instruments by calibrating gross and systematic errors. The gross error calibration method applies various statistical and fitting models to the feature ratios between two adjacent quality control (QC) samples to screen and calibrate outlier variables. A virtual QC for each sample is produced by a linear fitting model of the feature intensities between two neighboring QCs to obtain a correction factor and remove the systematic bias. The suggested method was applied to metabolic profiling data of 1197 plant samples in nine batches analyzed by two gas chromatography-mass spectrometry instruments. The method was evaluated by the relative standard deviations of all the detected peaks and by the average Pearson correlation coefficients and Euclidean distances of QCs and non-QC replicates. The results showed that the established approach outperforms the commonly used internal standard correction and total intensity signal correction methods. It can be used to integrate metabolomics data from multiple analytical batches and instruments, and it allows a QC frequency of one injection per 20 real samples. The suggested method makes large-scale metabolomics analysis practicable.
    No preview · Article · Jan 2016 · Analytical Chemistry
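The virtual-QC idea in this abstract can be sketched for a single feature as follows. This is only an illustration under stated assumptions: the virtual QC is obtained by linear interpolation between the two bracketing QC injections, and the correction factor is a reference intensity divided by the virtual QC. The reference value, the function names, and the numbers are hypothetical; the paper's actual fitting model and gross-error step are not reproduced here.

```python
def virtual_qc(pos, pos_a, qc_a, pos_b, qc_b):
    """Linearly interpolate the QC intensity expected at injection position pos."""
    t = (pos - pos_a) / (pos_b - pos_a)
    return qc_a + t * (qc_b - qc_a)


def correct_sample(intensity, pos, pos_a, qc_a, pos_b, qc_b, reference):
    """Scale a sample intensity by reference / virtual QC to remove drift."""
    factor = reference / virtual_qc(pos, pos_a, qc_a, pos_b, qc_b)
    return intensity * factor


# Hypothetical drift: the QC signal falls from 1000 to 800 over ten injections.
corrected = correct_sample(
    intensity=450.0, pos=5,
    pos_a=0, qc_a=1000.0,
    pos_b=10, qc_b=800.0,
    reference=1000.0,  # assumed reference level, e.g. the first QC
)
print(corrected)
```

In this toy case the virtual QC at position 5 is 900, so the sample is scaled by 1000/900 to compensate for the signal loss.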
  • ABSTRACT: Early diagnosis of hepatocellular carcinoma (HCC) remains challenging to date. Characteristic metabolic deregulations of HCC may enable novel biomarker discovery for early diagnosis. A capillary electrophoresis-time of flight mass spectrometry (CE-TOF/MS)-based metabolomics approach was performed to discover and validate potential biomarkers for HCC, from the diethylnitrosamine-induced rat hepatocarcinogenesis model to human subjects. Time series sera from the animal model were evaluated using multivariate and univariate analyses to reveal dynamic metabolic changes. Two independent human cohorts (populations I and II) containing 122 human serum specimens were enrolled for validation. A novel biomarker pattern, the creatine/betaine ratio, which reflects the balance of methylation, was identified. This biomarker pattern achieved effective classification of pre-HCC and HCC stages in the animal model. It was still effective in the diagnosis of HCC from high-risk patients with cirrhotic nodules, achieving AUC values of 0.865 and 0.905 for the two validation cohorts, respectively. The diagnosis of small HCC from cirrhosis with an AUC of 0.928 highlighted the potential for early diagnosis. This ratio biomarker can also improve the diagnostic performance of α-fetoprotein (AFP). This study demonstrates the efficacy of the present strategy for biomarker discovery and the potential of the metabolomics approach to provide novel insights for disease study.
    Preview · Article · Nov 2015 · Scientific Reports
  • ABSTRACT: Dynamic metabolomics studies can provide a systematic view of the metabolic trajectory during disease development and drug treatment and reveal the nature of biological processes at the metabolic level. To extract important information along a systematic time dimension rather than at isolated time points, a weighted method based on the means and variations along the time points was proposed and first applied to previously published rat model data. The method was subsequently extended and applied to prospective metabolomics data analysis of hepatocellular carcinoma (HCC). Permutation was employed for noise filtering, and the false discovery rate (FDR) was used for parameter optimization during feature selection. Long-term elevated serum bile acids were identified as risk factors for HCC development.
    Full-text · Article · Mar 2015 · Scientific Reports
  • Jun Yang · Xinjie Zhao · Xin Lu · Xiaohui Lin · Guowang Xu
    ABSTRACT: Highlights: Developed a data preprocessing strategy to cope with missing values and mask effects in data analysis caused by the high variation of abundant metabolites. A new method, ‘x-VAST’, was developed to amend the measurement deviation enlargement. Applying the above strategy, several low abundant masked differential metabolites were rescued. Metabolomics is a booming research field. Its success relies heavily on the discovery of differential metabolites by comparing different data sets (for example, patients vs. controls). One of the challenges is that differences in low abundant metabolites between groups are often masked by the high variation of abundant metabolites. To address this challenge, a novel data preprocessing strategy consisting of three steps was proposed in this study. In step 1, a ‘modified 80%’ rule was used to reduce the effect of missing values. In step 2, unit-variance and Pareto scaling methods were used to reduce the mask effect from the abundant metabolites. In step 3, in order to fix the adverse effect of scaling, stability information of the variables, deduced from intensity information and class information, was used to assign suitable weights to the variables. When applied to an LC/MS-based metabolomics dataset from a chronic hepatitis B patient study and to two simulated datasets, the mask effect was found to be partially eliminated and several new low abundant differential metabolites were rescued.
    Full-text · Article · Feb 2015
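Step 2 of this strategy names unit-variance and Pareto scaling. The sketch below shows the textbook definitions of both for a single variable: mean-centre, then divide by the standard deviation (unit variance) or by its square root (Pareto). It does not implement the ‘modified 80%’ rule or the x-VAST weighting, whose details are not given in the abstract; the function names and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def unit_variance_scale(values):
    """Mean-centre and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pareto_scale(values):
    """Mean-centre and divide by the square root of the standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / sqrt(s) for v in values]


# Hypothetical intensities of one metabolite across three samples.
intensities = [2.0, 4.0, 6.0]
print(unit_variance_scale(intensities))
print(pareto_scale(intensities))
```

Pareto scaling shrinks abundant, high-variance variables less aggressively than unit-variance scaling, which is why the two are often compared when low abundant signals are being masked.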
  • ABSTRACT: Classification is one of the most important tasks in machine learning. The ensemble classifier, which consists of a number of basic classifiers, is an efficient classification technique and has shown its effectiveness in many applications. The diversity and strength of the basic classifiers are the two main elements that influence the performance of the ensemble. Since different classification methods capture different discriminative information in the data through different classification criteria, using different classification techniques to build the basic classifiers can increase their diversity and strength. This paper proposes a new ensemble learning method that combines three different learning techniques to build the basic learners and adopts a double-layer voting method to enhance their strength and diversity simultaneously. The new method is tested on six benchmark datasets from the UCI Machine Learning Repository. The experimental results show that the proposed method outperforms the other ensemble techniques and single classifiers in classification accuracy in most cases.
    No preview · Conference Paper · Aug 2014
  • Xiaohui Lin · Jiuchong Gao · Lina Zhou · Peiyuan Yin · Guowang Xu
    ABSTRACT: In systems biology, the ability to discern meaningful information that reflects the nature of related problems from large amounts of data has become a key issue. The classification method using top scoring pairs (TSP), which measures the features of a data set in pairs and selects the top ranked feature pairs to construct the classifier, has been a powerful tool in genomics data analysis because of its simplicity and interpretability. This study examined the relationship between two features, modified the ranking criteria of the k-TSP method to measure the discriminative ability of each feature pair more accurately, and correspondingly, provided an improved classification procedure. Tests on eight public data sets showed the validity of the modified method. This modified k-TSP method was applied to our serum metabolomics data derived from liquid chromatography-mass spectrometry analysis of hepatocellular carcinoma and chronic liver diseases. Based on the 27 selected feature pairs, HCC and chronic liver diseases were accurately distinguished using the principal component analysis, and certain profound metabolic disturbances related to liver disease development were revealed by the feature pairs.
    No preview · Article · Jun 2014 · Journal of chromatography. B, Analytical technologies in the biomedical and life sciences
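For orientation, the classic top scoring pair criterion that k-TSP builds on can be sketched as below: a feature pair (i, j) is scored by how differently the ordering "feature i < feature j" occurs in the two classes. This is the original TSP score, not the modified ranking criterion proposed in the paper, and the data and function names are hypothetical.

```python
from itertools import combinations


def tsp_score(samples_a, samples_b, i, j):
    """|P(x_i < x_j | class A) - P(x_i < x_j | class B)| for feature pair (i, j)."""
    p_a = sum(s[i] < s[j] for s in samples_a) / len(samples_a)
    p_b = sum(s[i] < s[j] for s in samples_b) / len(samples_b)
    return abs(p_a - p_b)


def top_scoring_pairs(samples_a, samples_b, k=1):
    """Return the k feature pairs with the highest TSP score."""
    n_features = len(samples_a[0])
    scored = [
        (tsp_score(samples_a, samples_b, i, j), (i, j))
        for i, j in combinations(range(n_features), 2)
    ]
    scored.sort(key=lambda item: -item[0])  # stable sort keeps pair order on ties
    return [pair for _, pair in scored[:k]]


# Hypothetical data: three samples per class, three features per sample.
class_a = [[1, 5, 3], [2, 6, 7], [1, 4, 0]]
class_b = [[5, 1, 3], [6, 2, 7], [4, 1, 0]]
print(top_scoring_pairs(class_a, class_b, k=1))
```

Here features 0 and 1 swap their order between the classes in every sample, so that pair receives the maximum score of 1.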
  • ABSTRACT: A systematic approach for the fusion of associated ions from a common molecule was developed to generate 'one feature for one peak' metabolomics data. This approach guarantees that each molecule is equally selected as a potential biomarker, and may largely enhance the chance of obtaining reliable findings without employing redundant ion information. The ion fusion is based on a low deviation of the mass measured by a high-resolution mass spectrometer, such as an LTQ Orbitrap, from the theoretical value, and on a high correlation of ion pairs from the same molecule. The mass characteristics of isotopic distribution, neutral loss and adduct ions were simultaneously applied to inspect each extracted ion within a pre-defined retention time window. The correlation coefficient was computed from the corresponding intensities of each ion pair across all experimental samples. Serum metabolomics data from an investigation of hepatocellular carcinoma (HCC) and healthy controls were used as an example to demonstrate this strategy. In total, 609 and 1084 ion pairs were found to meet one or more criteria for fusion and were therefore fused to 106 and 169 metabolite features in the datasets of the positive and negative modes, respectively. The important metabolite features for distinguishing HCC from the healthy controls were discovered separately using the two datasets with and without ion fusion, and then compared. The results show that the developed method can be an effective tool for processing high-resolution mass spectrometry data in 'omics' studies.
    No preview · Article · Mar 2014 · Analytical Chemistry
  • Xirong Xu · Nan Cao · Yong Zhang · Liqing Gao · Xulu Peng · Xiaohui Lin
    ABSTRACT: The Laplacian spectrum of a graph G consists of all eigenvalues of its Laplacian matrix L(G). The Laplacian spectra of the folded hypercube Qfn, an important interconnection network topology, are studied. The n-dimensional folded hypercube is an undirected graph obtained from the n-dimensional hypercube by adding all complementary edges. By means of the Laplacian matrix An of Qn, the dual matrix of the Laplacian matrix Bn of Qfn is constructed as Cn=An-In* + In, the relationship between Bn and Cn is shown to be |Bn+1|=|Bn||Cn-4In|, and the spectra of the Laplacian matrix Bn of Qfn are obtained.
    No preview · Article · Sep 2013 · Dalian Ligong Daxue Xuebao/Journal of Dalian University of Technology
  • ABSTRACT: An L(2,1)-labeling of a graph G is an assignment of nonnegative integers to the vertices of G such that adjacent vertices get numbers at least two apart, and vertices at distance two get distinct numbers. The L(2,1)-labeling number of G, λ(G), is the minimum range of labels over all such labelings. In this paper, we determine the λ-numbers of the flower snark and its related graphs for all n≥3.
    No preview · Article · Jul 2013 · Ars Combinatoria -Waterloo then Winnipeg-
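The L(2,1)-labeling definition translates directly into a checker. The sketch below verifies the two conditions on a graph given as an adjacency dictionary; it only validates a candidate labeling and does not compute λ(G). The function names and the small path graph are hypothetical examples, not the flower snark from the paper.

```python
def is_l21_labeling(adj, label):
    """Check both L(2,1) conditions for a graph given as an adjacency dict."""
    for u in adj:
        for v in adj[u]:
            # Adjacent vertices must receive labels at least two apart.
            if abs(label[u] - label[v]) < 2:
                return False
            for w in adj[v]:
                # w is at distance two from u if it is not u and not adjacent to u.
                if w != u and w not in adj[u] and label[u] == label[w]:
                    return False
    return True


def label_range(label):
    """Range of labels used by a labeling (largest minus smallest)."""
    return max(label.values()) - min(label.values())


# Hypothetical example: the path a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
good = {"a": 0, "b": 3, "c": 1}
print(is_l21_labeling(path, good), label_range(good))
```

A labeling such as a=0, b=2, c=0 fails the second condition because a and c are at distance two yet share a label.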
  • ABSTRACT: A graph G with vertex set V is said to have a prime labeling if its vertices can be labeled with distinct integers 1,2,⋯,|V| such that for every edge xy in E, the labels assigned to x and y are relatively prime. In this paper we show that the Knödel graph $W_{3,n}$ is prime for n≤130.
    No preview · Article · Apr 2013 · Ars Combinatoria -Waterloo then Winnipeg-
  • ABSTRACT: Investigations of complex metabolic mechanisms and networks have become a focus of research in the post-genomic era, thereby creating an increasing demand for sophisticated analytical approaches. One such tool is lipidomics analysis, which provides a detailed picture of the lipid composition of a system at a given time. Introducing stable isotopes into the studied system can additionally provide information on the synthesis, transformation and degradation of individual lipid species. Capturing the entire dynamics of lipid networks, however, is still a challenge. We developed and evaluated a novel strategy for the in-depth analysis of the dynamics of lipid metabolism with the capacity for high molecular specificity and network coverage. The general workflow consists of stable isotope-labeling experiments, ultra high-performance liquid chromatography (UHPLC)-high resolution Orbitrap-MS lipid profiling, and data processing by a software tool for global isotopomer filtering and matching. As a proof of concept, this approach was applied to the network-wide mapping of dynamic lipid metabolism in primary human skeletal muscle cells cultured for 4, 12 and 24 h with [U-13C]-palmitate. In the myocellular lipid extracts, 692 isotopomers were detected that could be assigned to 203 labeled lipid species spanning 12 lipid (sub-)classes. Interestingly, some lipid classes showed high turnover rates but stable total amounts, while the amounts of others increased in the course of palmitate treatment. The novel strategy presented here has the potential to open new detailed insights into the dynamics of lipid metabolism that may lead to a better understanding of physiological mechanisms and metabolic perturbations.
    Full-text · Article · Mar 2013 · Analytical Chemistry
  • Yuansheng Yang · Baigong Zheng · Xirong Xu · Xiaohui Lin
    ABSTRACT: The crossing number of a graph $G$ is the minimum number of pairwise intersections of edges in a drawing of $G$. In this paper, we study the crossing numbers of $K_{m}\times P_n$ and $K_{m}\times C_n$.
    Preview · Article · Nov 2012
  • Yuansheng Yang · Baigong Zheng · Xiaohui Lin · Xirong Xu
    ABSTRACT: The crossing number of a graph $G$ is the minimum number of pairwise intersections of edges among all drawings of $G$. In this paper, we study the crossing number of $K_{n,n}-nK_2$, $K_n\times P_2$, $K_n\times P_3$ and $K_n\times C_4$.
    Preview · Article · Nov 2012
  • ABSTRACT: A graph with vertex set V is said to have a prime cordial labeling if there is a bijection f from V to {1,2,⋯,|V|} such that, when each edge uv is assigned the label 1 if gcd(f(u),f(v))=1 and the label 0 if gcd(f(u),f(v))>1, the number of edges labeled 0 and the number of edges labeled 1 differ by at most 1. In this paper we show that the flower snark and its related graphs are prime cordial for all n≥3.
    No preview · Article · Jul 2012 · Ars Combinatoria -Waterloo then Winnipeg-
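The prime cordial condition can be checked mechanically. The sketch below tests whether a given vertex labeling is a bijection onto {1, ..., |V|} and whether the counts of edges with gcd equal to 1 and gcd greater than 1 differ by at most one. The function name and the four-vertex path are hypothetical; the flower snark construction itself is not reproduced.

```python
from math import gcd


def is_prime_cordial(edges, f):
    """Check whether vertex labeling f is a prime cordial labeling of the edge list."""
    n = len(f)
    # f must be a bijection onto {1, 2, ..., |V|}.
    if sorted(f.values()) != list(range(1, n + 1)):
        return False
    ones = sum(gcd(f[u], f[v]) == 1 for u, v in edges)  # edges labeled 1
    zeros = len(edges) - ones                           # edges labeled 0
    return abs(ones - zeros) <= 1


# Hypothetical example: the path a - b - c - d.
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(is_prime_cordial(path_edges, {"a": 2, "b": 4, "c": 1, "d": 3}))
```

Labeling the same path 1, 2, 3, 4 in order fails, because consecutive integers are always coprime and every edge would receive the label 1.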
  • ABSTRACT: Filtering the discriminative metabolites from high-dimensional metabolome data is very important in metabolomics studies. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique and has shown promising applications in the analysis of metabolome data. SVM-RFE measures the weights of the features according to the support vectors, so noise and non-informative variables in high-dimensional data may affect the hyperplane of the SVM learning model. Hence we proposed a mutual information (MI)-SVM-RFE method, which filters out noise and non-informative variables by means of artificial variables and MI and then conducts SVM-RFE to select the most discriminative features. A serum metabolomics data set from patients with chronic hepatitis B, cirrhosis and hepatocellular carcinoma, analyzed by liquid chromatography-mass spectrometry (LC-MS), was used to validate our method. An accuracy of 74.33±2.98% in distinguishing among the three liver diseases was obtained, better than the 72.00±4.15% from the original SVM-RFE. Thirty-four ion features were defined to distinguish among the control and the 3 liver diseases, and 17 of them were identified.
    No preview · Article · May 2012 · Journal of chromatography. B, Analytical technologies in the biomedical and life sciences
  • Xiaohui Lin · Xiaolan Li · Niyi Xiao · Ping Ma · Jiaxue Jiang · Fufang Yang
    ABSTRACT: Bayesian networks are efficient classification techniques and are widely applied in many fields; however, their structure learning is NP-hard. In this paper, a Bayesian network structure learning method called Tree-like Bayesian network (BN-TL) was proposed, which constructs the network by estimating the correlation between the features and the correlation between the class label and the features. Two metabolomics datasets on liver disease and five public datasets from the University of California at Irvine repository (UCI) were used to demonstrate the performance of BN-TL. The results show that, in most cases, BN-TL outperforms the other three classifiers: the Naïve Bayesian classifier (NB), a Bayesian network classifier whose structure is learned using the K2 greedy search strategy (BN-K2), and a method proposed by Kuschner in 2010 (BN-BMC).
    No preview · Article · May 2012
  • ABSTRACT: Patients with chronic liver diseases (CLD), including chronic hepatitis B and hepatic cirrhosis (CIR), are the major high-risk population for hepatocellular carcinoma (HCC). The differential diagnosis between CLD and HCC is a challenge. This work aims to study the related metabolic deregulations in HCC and CLD to promote the discovery of differential metabolites for distinguishing the different liver diseases. Serum metabolic profiling of patients with CLD and HCC was performed using a liquid chromatography-mass spectrometry system. The large amount of metabolic information acquired was processed with the random forest-recursive feature elimination method to discover important metabolic changes. It was found that long-chain acylcarnitines accumulated, whereas free carnitine and medium- and short-chain acylcarnitines decreased with the severity of the non-malignant liver diseases, accompanied by corresponding alterations of enzyme activities. However, the general extent of change was smaller in HCC than in CIR, possibly due to the special energy-consumption mechanism of tumor cells. These observations may help to understand the mechanism of HCC occurrence and progression at the metabolic level and provide information for the identification of early and differential metabolic markers for HCC.
    No preview · Article · Feb 2012 · Analytical and Bioanalytical Chemistry
  • Yong Li · Qiang Ruan · Yanli Li · Guozhu Ye · Xin Lu · Xiaohui Lin · Guowang Xu
    ABSTRACT: Non-targeted metabolic profiling is the most widely used method for metabolomics. In this paper, a novel approach was established to transform a non-targeted metabolic profiling method into a pseudo-targeted method using retention time locking gas chromatography/mass spectrometry-selected ion monitoring (RTL-GC/MS-SIM). To achieve this transformation, an algorithm based on the automated mass spectral deconvolution and identification system (AMDIS), GC/MS raw data and a bi-Gaussian chromatographic peak model was developed. The established GC/MS-SIM method was compared with GC/MS-full scan (the total ion current and extracted ion current, TIC and EIC) methods. For a typical tobacco leaf extract, 93% of the components had relative standard deviations (RSDs) of relative peak areas below 20% by the SIM method, compared with 88% by the EIC method and 81% by the TIC method. 47.3% of the components had a linear correlation coefficient higher than 0.99, compared with 5.0% by the EIC and 6.2% by the TIC methods. Multivariate analysis showed that the pooled quality control samples clustered more tightly using the developed method than using the GC/MS-full scan methods, indicating better data quality. In an analysis of variance of tobacco samples from three different planting regions, 167 differential components (p<0.05) were screened out using the RTL-GC/MS-SIM method, compared with 151 and 131 by the EIC and TIC methods, respectively. The results show that the developed method not only has higher sensitivity, better linearity and better data quality, but also does not need complicated peak alignment among different samples. It is especially suitable for the screening of differential components in metabolic profiling investigations.
    No preview · Article · Feb 2012 · Journal of Chromatography A
  • ABSTRACT: Metabolic markers are the core of metabonomic surveys. Hence the selection of differential metabolites is of great importance for both biological and clinical purposes. Here, a feature selection method was developed for complex metabonomic data sets. As an effective tool for metabonomics data analysis, the support vector machine (SVM) was employed as the basic classifier. To find meaningful features effectively, support vector machine recursive feature elimination (SVM-RFE) was applied first. Then, a genetic algorithm (GA) and random forest (RF), which consider the interaction among the metabolites and the independent performance of each metabolite in all samples, respectively, were used to obtain more informative metabolic differences and to reduce the risk of false positives. A data set from a plasma metabonomics study of rat liver diseases developing from hepatitis and cirrhosis to hepatocellular carcinoma was used to validate the method. Besides the good classification results for the 3 kinds of liver diseases, 31 important metabolites, including lysophosphatidylethanolamine (LPE) C16:0, palmitoylcarnitine and lysophosphatidylcholine (LPC) C18:0, were also selected for further studies. The current results show a good complementary effect among the three feature selection methods. The combined method also revealed more differential metabolites and provided more metabolic information for a “global” understanding of diseases than any single method. Furthermore, this method is also suitable for other complex biological data sets. Keywords: Support vector machine, Genetic algorithm, Random forest, Liver diseases, Metabonomics, Metabolomics
    Full-text · Article · Dec 2011 · Metabolomics