Xiaohui Lin

Dalian University of Technology, Lü-ta-shih, Liaoning, China

Publications (41) · 49.5 Total impact

  • ABSTRACT: Dynamic metabolomics studies can provide a systematic view of the metabolic trajectory during disease development and drug treatment and reveal the nature of biological processes at the metabolic level. To extract important information along the time dimension rather than at isolated time points, a weighted method based on the means and variations along the time points was proposed and first applied to previously published rat model data. The method was subsequently extended and applied to prospective metabolomics data analysis of hepatocellular carcinoma (HCC). Permutation was employed for noise filtering, and the false discovery rate (FDR) was used for parameter optimization during feature selection. Long-term elevated serum bile acids were identified as risk factors for HCC development.
    Scientific Reports 03/2015; 5:8984. DOI:10.1038/srep08984 · 5.08 Impact Factor
  • 02/2015; 2. DOI:10.3389/fmolb.2015.00004
  • ABSTRACT: In systems biology, the ability to discern meaningful information that reflects the nature of related problems from large amounts of data has become a key issue. The classification method using top scoring pairs (TSP), which measures the features of a data set in pairs and selects the top-ranked feature pairs to construct the classifier, has been a powerful tool in genomics data analysis because of its simplicity and interpretability. This study examined the relationship between two features, modified the ranking criteria of the k-TSP method to measure the discriminative ability of each feature pair more accurately, and correspondingly provided an improved classification procedure. Tests on eight public data sets showed the validity of the modified method. This modified k-TSP method was applied to our serum metabolomics data derived from liquid chromatography-mass spectrometry analysis of hepatocellular carcinoma (HCC) and chronic liver diseases. Based on the 27 selected feature pairs, HCC and chronic liver diseases were accurately distinguished using principal component analysis, and certain profound metabolic disturbances related to liver disease development were revealed by the feature pairs.
    Journal of chromatography. B, Analytical technologies in the biomedical and life sciences 06/2014; DOI:10.1016/j.jchromb.2014.05.044 · 2.69 Impact Factor
  • ABSTRACT: A systematic approach for the fusion of associated ions from a common molecule was developed to generate 'one feature for one peak' metabolomics data. This approach guarantees that each molecule is treated equally when selecting potential biomarkers, and may substantially improve the chance of obtaining reliable findings without employing redundant ion information. The ion fusion is based on a low deviation between the mass measured by a high-resolution mass spectrometer, such as the LTQ Orbitrap, and the theoretical value, together with a high correlation of ion pairs from the same molecule. The mass characteristics of isotopic distribution, neutral loss and adduct ions were simultaneously applied to inspect each extracted ion within a pre-defined retention time window. The correlation coefficient was computed from the corresponding intensities of each ion pair across all experimental samples. Serum metabolomics data for the investigation of hepatocellular carcinoma (HCC) and healthy controls were used as an example to demonstrate this strategy. In total, 609 and 1084 ion pairs were found to meet one or more fusion criteria and were fused into 106 and 169 metabolite features in the positive- and negative-mode datasets, respectively. The important metabolite features were separately discovered and compared to distinguish HCC from the healthy controls using the two datasets with and without ion fusion. The results show that the developed method can be an effective tool to process high-resolution mass spectrometry data in 'omics' studies.
    Analytical Chemistry 03/2014; 86(8). DOI:10.1021/ac500878x · 5.83 Impact Factor
  • ABSTRACT: Investigations of complex metabolic mechanisms and networks have become a focus of research in the post-genomic era, thereby creating an increasing demand for sophisticated analytical approaches. One such tool is lipidomics analysis, which provides a detailed picture of the lipid composition of a system at a given time. Introducing stable isotopes into the studied system can additionally provide information on the synthesis, transformation and degradation of individual lipid species. Capturing the entire dynamics of lipid networks, however, is still a challenge. We developed and evaluated a novel strategy for the in-depth analysis of the dynamics of lipid metabolism with the capacity for high molecular specificity and network coverage. The general workflow consists of stable isotope-labeling experiments, ultra high-performance liquid chromatography (UHPLC)-high resolution Orbitrap-MS lipid profiling and data processing by a software tool for global isotopomer filtering and matching. As a proof of concept, this approach was applied to the network-wide mapping of dynamic lipid metabolism in primary human skeletal muscle cells cultured for 4, 12 and 24 h with [U-13C]-palmitate. In the myocellular lipid extracts, 692 isotopomers were detected that could be assigned to 203 labeled lipid species spanning 12 lipid (sub-) classes. Interestingly, some lipid classes showed high turnover rates but stable total amounts, while the amount of others increased in the course of palmitate treatment. The novel strategy presented here has the potential to open new detailed insights into the dynamics of lipid metabolism that may lead to a better understanding of physiological mechanisms and metabolic perturbations.
    Analytical Chemistry 03/2013; 85(9). DOI:10.1021/ac400293y · 5.83 Impact Factor
  • ABSTRACT: An L(2,1)-labeling of a graph G is an assignment of nonnegative integers to the vertices of G such that adjacent vertices get numbers at least two apart, and vertices at distance two get distinct numbers. The L(2,1)-labeling number of G, λ(G), is the minimum range of labels over all such labelings. In this paper, we determine the λ-numbers of the flower snark and its related graphs for all n≥3.
    Ars Combinatoria -Waterloo then Winnipeg- 01/2013; 110. · 0.20 Impact Factor
  • ABSTRACT: The Laplacian spectrum of a graph $G$ consists of all eigenvalues of its Laplacian matrix $L(G)$. The Laplacian spectra of the folded hypercubes $Q_{fn}$, an important interconnection network topology, are studied. The $n$-dimensional folded hypercube is an undirected graph obtained from the $n$-dimensional hypercube $Q_n$ by adding all complementary edges. Using the Laplacian matrix $A_n$ of $Q_n$, the dual matrix of the Laplacian matrix $B_n$ of $Q_{fn}$ is constructed as $C_n = A_n - I_n^* + I_n$; the relationship between $B_n$ and $C_n$ is $|B_{n+1}| = |B_n|\,|C_n - 4I_n|$, and the spectra of the Laplacian matrix $B_n$ of $Q_{fn}$ are obtained.
    Dalian Ligong Daxue Xuebao/Journal of Dalian University of Technology 01/2013; 53(5).
  • ABSTRACT: A graph G with vertex set V and edge set E is said to have a prime labeling if its vertices can be labeled with the distinct integers 1,2,⋯,|V| such that for every edge xy in E, the labels assigned to x and y are relatively prime. In this paper we show that the Knödel graph $W_{3,n}$ is prime for n≤130.
    Ars Combinatoria -Waterloo then Winnipeg- 01/2013; 109. · 0.20 Impact Factor
  • ABSTRACT: The crossing number of a graph $G$ is the minimum number of pairwise intersections of edges among all drawings of $G$. In this paper, we study the crossing number of $K_{n,n}-nK_2$, $K_n\times P_2$, $K_n\times P_3$ and $K_n\times C_4$.
  • ABSTRACT: The crossing number of a graph $G$ is the minimum number of pairwise intersections of edges in a drawing of $G$. In this paper, we study the crossing numbers of $K_{m}\times P_n$ and $K_{m}\times C_n$.
  • ABSTRACT: Filtering the discriminative metabolites from high-dimensional metabolome data is very important in metabolomics studies. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique and has shown promising applications in the analysis of metabolome data. SVM-RFE measures the weights of the features according to the support vectors, so noise and non-informative variables in high-dimensional data may affect the hyperplane of the SVM learning model. Hence we proposed a mutual information (MI)-SVM-RFE method, which filters out noise and non-informative variables by means of artificial variables and MI, then conducts SVM-RFE to select the most discriminative features. A serum metabolomics data set from patients with chronic hepatitis B, cirrhosis and hepatocellular carcinoma, analyzed by liquid chromatography-mass spectrometry (LC-MS), was used to validate our method. An accuracy of 74.33±2.98% in distinguishing among the three liver diseases was obtained, better than the 72.00±4.15% of the original SVM-RFE. Thirty-four ion features were defined to distinguish among the control and the 3 liver diseases; 17 of them were identified.
    Journal of chromatography. B, Analytical technologies in the biomedical and life sciences 05/2012; 910. DOI:10.1016/j.jchromb.2012.05.020 · 2.69 Impact Factor
  • ABSTRACT: Patients with chronic liver diseases (CLD), including chronic hepatitis B and hepatic cirrhosis (CIR), are the major high-risk population for hepatocellular carcinoma (HCC). The differential diagnosis between CLD and HCC is a challenge. This work aims to study the related metabolic deregulations in HCC and CLD to promote the discovery of differential metabolites for distinguishing the different liver diseases. Serum metabolic profiling analysis of patients with CLD and HCC was performed using a liquid chromatography-mass spectrometry system. The large amount of metabolic information acquired was processed with the random forest-recursive feature elimination method to discover important metabolic changes. It was found that long-chain acylcarnitines accumulated, whereas free carnitine and medium- and short-chain acylcarnitines decreased with the severity of the non-malignant liver diseases, accompanied by corresponding alterations of enzyme activities. However, the general extent of change was smaller in HCC than in CIR, possibly due to the special energy-consumption mechanism of tumor cells. These observations may help to understand the mechanism of HCC occurrence and progression at the metabolic level and provide information for the identification of early and differential metabolic markers for HCC.
    Analytical and Bioanalytical Chemistry 02/2012; 403(1):203-13. DOI:10.1007/s00216-012-5782-4 · 3.58 Impact Factor
  • ABSTRACT: Non-targeted metabolic profiling is the most widely used method for metabolomics. In this paper, a novel approach was established to transform a non-targeted metabolic profiling method into a pseudo-targeted method using retention time locking gas chromatography/mass spectrometry-selected ion monitoring (RTL-GC/MS-SIM). To achieve this transformation, an algorithm based on the automated mass spectral deconvolution and identification system (AMDIS), GC/MS raw data and a bi-Gaussian chromatographic peak model was developed. The established GC/MS-SIM method was compared with GC/MS-full scan methods (the total ion current and extracted ion current, TIC and EIC). For a typical tobacco leaf extract, 93% of components had relative standard deviations (RSDs) of relative peak areas below 20% by the SIM method, compared with 88% by the EIC method and 81% by the TIC method. In addition, 47.3% of components had a linear correlation coefficient higher than 0.99, compared with 5.0% by the EIC and 6.2% by the TIC methods. Multivariate analysis showed that the pooled quality control samples clustered more tightly using the developed method than using the GC/MS-full scan methods, indicating better data quality. By analysis of variance of the tobacco samples from three different planting regions, 167 differential components (p<0.05) were screened out using the RTL-GC/MS-SIM method, compared with 151 and 131 by the EIC and TIC methods, respectively. The results show that the developed method not only has higher sensitivity, better linearity and better data quality, but also does not need complicated peak alignment among different samples. It is especially suitable for the screening of differential components in metabolic profiling investigations.
    Journal of Chromatography A 02/2012; 1255:228-36. DOI:10.1016/j.chroma.2012.01.076 · 4.26 Impact Factor
  • ABSTRACT: A graph with vertex set V is said to have a prime cordial labeling if there is a bijection f from V to {1,2,⋯,|V|} such that, if each edge uv is assigned the label 1 when gcd(f(u),f(v))=1 and the label 0 when gcd(f(u),f(v))>1, then the number of edges labeled with 0 and the number of edges labeled with 1 differ by at most 1. In this paper we show that the flower snark and its related graphs are prime cordial for all n≥3.
    Ars Combinatoria -Waterloo then Winnipeg- 01/2012; 105. · 0.20 Impact Factor
  • ABSTRACT: Bayesian networks are efficient classification techniques that are widely applied in many fields; however, their structure learning is NP-hard. In this paper, a Bayesian network structure learning method called Tree-like Bayesian network (BN-TL) was proposed, which constructs the network by estimating the correlation between the features and the correlation between the class label and the features. Two metabolomics datasets about liver disease and five public datasets from the University of California at Irvine repository (UCI) were used to demonstrate the performance of BN-TL. The results show that, in most cases, BN-TL outperforms the other three classifiers: the Naïve Bayesian classifier (NB), a Bayesian network classifier whose structure is learned using the K2 greedy search strategy (BN-K2), and a method proposed by Kuschner in 2010 (BN-BMC).
  • ABSTRACT: Metabolic markers are the core of metabonomic surveys. Hence selection of differential metabolites is of great importance for both biological and clinical purposes. Here, a feature selection method was developed for complex metabonomic data sets. As an effective tool for metabonomics data analysis, support vector machine (SVM) was employed as the basic classifier. To find meaningful features effectively, support vector machine recursive feature elimination (SVM-RFE) was first applied. Then, genetic algorithm (GA) and random forest (RF), which consider the interaction among the metabolites and the independent performance of each metabolite in all samples, respectively, were used to obtain more informative metabolic differences and avoid the risk of false positives. A data set from a plasma metabonomics study of rat liver diseases progressing from hepatitis and cirrhosis to hepatocellular carcinoma was used to validate the method. Besides the good classification results for the 3 kinds of liver diseases, 31 important metabolites including lysophosphatidylethanolamine (LPE) C16:0, palmitoylcarnitine, lysophosphatidylethanolamine (LPC) C18:0 were also selected for further studies. A better complementary effect of the three feature selection methods could be seen from the current results. The combinational method also revealed more differential metabolites and provided more metabolic information for a “global” understanding of diseases than any single method. Furthermore, this method is also suitable for other complex biological data sets.
    Keywords: Support vector machine; Genetic algorithm; Random forest; Liver diseases; Metabonomics; Metabolomics
    Metabolomics 12/2011; 7(4):549-558. DOI:10.1007/s11306-011-0274-7 · 3.97 Impact Factor
  • ABSTRACT: A solution capacity limited estimation of distribution algorithm (L-EDA) is proposed and applied to ovarian cancer prognosis biomarker discovery to demonstrate its potential in metabonomics studies. Sera from healthy women and from epithelial ovarian cancer (EOC), recurrent EOC and non-recurrent EOC patients were analyzed by liquid chromatography-mass spectrometry. The metabolite data were processed by L-EDA to discover potential EOC prognosis biomarkers. After L-EDA filtration, 78 out of 714 variables were selected, and the relationships among the four groups were visualized by principal component analysis. It was observed that, with the L-EDA filtered variables, the non-recurrent EOC and recurrent EOC groups could be separated, which was not possible with the initial data. Five metabolites (six variables) with P
    Metabolomics 12/2011; DOI:10.1007/s11306-011-0286-3 · 3.97 Impact Factor
  • ABSTRACT: Discovery of differential metabolites is the focus of metabonomics studies. It has very important applications in pathogenesis and disease classification. The aim of this work is to identify differential metabolites for classifying patients with hepatocellular carcinoma, cirrhosis and hepatitis based on metabolic profiling data acquired by gas chromatography-time of flight mass spectrometry. A two-stage feature selection algorithm, F-SVM, combining the F-score from analysis of variance with support vector machine (SVM), was applied to discover discriminative metabolites for the three different types of liver diseases. The results show that the accuracy rate of the double cross-validation was 73.68±2.98%. Twenty-two important differential metabolites selected by F-SVM were identified, and the related pathophysiological processes of liver diseases were discussed. We conclude that F-SVM is feasible for selecting biologically relevant features in metabonomics.
    Journal of Separation Science 11/2011; 34(21):3029-36. DOI:10.1002/jssc.201100408 · 2.59 Impact Factor
  • ABSTRACT: Minimum redundancy maximum relevancy (mRMR) is one of the successful criteria used by many feature selection techniques to evaluate the discriminating abilities of features. We combined a dynamic sample space with mRMR and proposed a new feature selection method. In each iteration, the weighted mRMR values are calculated on the dynamic sample space consisting of the current unlabelled samples. Among the features that can improve the classification performance, the one with the largest weighted mRMR value is selected. Five public data sets were used to demonstrate the superiority of our method.
    Parallel Architectures, Algorithms and Programming (PAAP), 2010 Third International Symposium on; 01/2011
  • ABSTRACT: Assume we have a set of k colors and we assign an arbitrary subset of these colors to each vertex of a graph G. If we require that each vertex to which an empty set is assigned has all k colors in its neighborhood, then this assignment is called a k-rainbow dominating function of the graph G. The minimum sum of the numbers of assigned colors over all vertices of G, denoted by $\gamma_{rk}(G)$, is called the k-rainbow domination number of G. In this paper, we prove that $\gamma_{r2}(P(n,3)) \ge \lceil 7n/8 \rceil$.
    Ars Combinatoria -Waterloo then Winnipeg- 01/2011; 102. · 0.20 Impact Factor
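
Several of the graph-theory abstracts above define labelings precisely enough to be checked mechanically on small graphs. The following is a minimal Python sketch of three such checks (prime labeling, L(2,1)-labeling and k-rainbow dominating function), written directly from the definitions quoted in the abstracts. It is not taken from any of the listed papers: the function names, the edge-list representation and the small example graphs are assumptions made only for illustration.

```python
from itertools import combinations
from math import gcd


def _adjacency(vertices, edges):
    """Build a neighbor-set map for an undirected simple graph."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_prime_labeling(edges, label):
    """label maps each vertex to a distinct integer in 1..|V|;
    the labels on every edge must be relatively prime."""
    n = len(label)
    if sorted(label.values()) != list(range(1, n + 1)):
        return False
    return all(gcd(label[u], label[v]) == 1 for u, v in edges)


def is_l21_labeling(edges, label):
    """Adjacent vertices differ by at least 2; vertices at
    distance two receive distinct nonnegative labels."""
    if any(x < 0 for x in label.values()):
        return False
    adj = _adjacency(label, edges)
    for u, v in combinations(label, 2):
        diff = abs(label[u] - label[v])
        if v in adj[u]:
            if diff < 2:
                return False
        elif adj[u] & adj[v]:  # a common neighbor means distance two
            if diff < 1:
                return False
    return True


def is_k_rainbow_dominating(edges, colors, k):
    """colors maps each vertex to a subset of {1..k}; every vertex
    with an empty set must see all k colors among its neighbors."""
    palette = set(range(1, k + 1))
    adj = _adjacency(colors, edges)
    for v, assigned in colors.items():
        if not assigned <= palette:
            return False
        if not assigned:
            seen = set().union(*(colors[u] for u in adj[v]))
            if seen != palette:
                return False
    return True


def rainbow_weight(colors):
    """Total number of assigned colors; the k-rainbow domination
    number is the minimum of this over all valid functions."""
    return sum(len(c) for c in colors.values())


# Hypothetical example graphs: the 4-cycle and the 3-vertex path.
C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
P3 = [(0, 1), (1, 2)]
```

These functions only verify a given labeling; finding an optimal one (for example, the λ-number or the k-rainbow domination number) would require a search over all labelings, which is what the cited papers resolve analytically for specific graph families.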