Xiaohui Lin

Dalian University of Technology, Lü-ta-shih, Liaoning, China

Publications (38) · 46.09 Total impact

  • ABSTRACT: In systems biology, the ability to discern meaningful information that reflects the nature of related problems from large amounts of data has become a key issue. The classification method using top scoring pairs (TSP), which measures the features of a data set in pairs and selects the top-ranked feature pairs to construct the classifier, has been a powerful tool in genomics data analysis because of its simplicity and interpretability. This study examined the relationship between two features, modified the ranking criteria of the k-TSP method to measure the discriminative ability of each feature pair more accurately, and, correspondingly, provided an improved classification procedure. Tests on eight public data sets showed the validity of the modified method. The modified k-TSP method was applied to our serum metabolomics data derived from liquid chromatography-mass spectrometry analysis of hepatocellular carcinoma (HCC) and chronic liver diseases. Based on the 27 selected feature pairs, HCC and chronic liver diseases were accurately distinguished using principal component analysis, and certain profound metabolic disturbances related to liver disease development were revealed by the feature pairs.
    Journal of chromatography. B, Analytical technologies in the biomedical and life sciences 06/2014; · 2.78 Impact Factor
  • ABSTRACT: A systematic approach for the fusion of associated ions from a common molecule was developed to generate 'one feature for one peak' metabolomics data. This approach guarantees that each molecule is equally selected as a potential biomarker, and may largely enhance the chance of obtaining reliable findings without employing redundant ion information. The ion fusion is based on two criteria: a low deviation of the mass measured by a high-resolution mass spectrometer, such as an LTQ Orbitrap, from the theoretical value, and a high correlation of ion pairs from the same molecule. The mass characteristics of isotopic distribution, neutral loss and adduct ions were simultaneously applied to inspect each extracted ion within a pre-defined retention time window. The correlation coefficient was computed from the corresponding intensities of each ion pair across all experimental samples. Serum metabolomics data from an investigation of hepatocellular carcinoma (HCC) and healthy controls were used as an example to demonstrate this strategy. In total, 609 and 1084 ion pairs were found to meet one or more criteria for fusion, and were therefore fused to 106 and 169 metabolite features of the datasets in the positive and negative modes, respectively. The important metabolite features were separately discovered and compared to distinguish HCC from the healthy controls using the two datasets with and without ion fusion. The results show that the developed method can be an effective tool to process high-resolution mass spectrometry data in 'omics' studies.
    Analytical Chemistry 03/2014; · 5.70 Impact Factor
  • ABSTRACT: Investigations of complex metabolic mechanisms and networks have become a focus of research in the post-genomic era, thereby creating an increasing demand for sophisticated analytical approaches. One such tool is lipidomics analysis, which provides a detailed picture of the lipid composition of a system at a given time. Introducing stable isotopes into the studied system can additionally provide information on the synthesis, transformation and degradation of individual lipid species. Capturing the entire dynamics of lipid networks, however, is still a challenge. We developed and evaluated a novel strategy for the in-depth analysis of the dynamics of lipid metabolism with the capacity for high molecular specificity and network coverage. The general workflow consists of stable isotope-labeling experiments, ultra high-performance liquid chromatography (UHPLC)-high resolution Orbitrap-MS lipid profiling and data processing by a software tool for global isotopomer filtering and matching. As a proof of concept, this approach was applied to the network-wide mapping of dynamic lipid metabolism in primary human skeletal muscle cells cultured for 4, 12 and 24 h with [U-13C]-palmitate. In the myocellular lipid extracts, 692 isotopomers were detected that could be assigned to 203 labeled lipid species spanning 12 lipid (sub-)classes. Interestingly, some lipid classes showed high turnover rates but stable total amounts, while the amount of others increased in the course of palmitate treatment. The novel strategy presented here has the potential to open new detailed insights into the dynamics of lipid metabolism that may lead to a better understanding of physiological mechanisms and metabolic perturbations.
    Analytical Chemistry 03/2013; · 5.70 Impact Factor
  • ABSTRACT: An L(2,1)-labeling of a graph G is an assignment of nonnegative integers to the vertices of G such that adjacent vertices get numbers at least two apart, and vertices at distance two get distinct numbers. The L(2,1)-labeling number of G, λ(G), is the minimum range of labels over all such labelings. In this paper, we determine the λ-numbers of the flower snark and its related graphs for all n≥3.
    Ars Combinatoria -Waterloo then Winnipeg- 01/2013; 110. · 0.28 Impact Factor
  • ABSTRACT: The spectrum of the Laplacian matrix L(G) of a graph G consists of all eigenvalues of L(G). The Laplacian spectra of folded hypercubes $Q_{fn}$, an important interconnection network topology, are studied. The n-dimensional folded hypercube is an undirected graph obtained from the n-dimensional hypercube by adding all complementary edges. By means of the Laplacian matrix $A_n$ of $Q_n$, the dual matrix of the Laplacian matrix $B_n$ of $Q_{fn}$ is constructed as $C_n = A_n - I_n^* + I_n$; the relationship between $B_n$ and $C_n$ is $|B_{n+1}| = |B_n|\,|C_n - 4I_n|$, and the spectra of the Laplacian matrix $B_n$ of $Q_{fn}$ are obtained.
    Dalian Ligong Daxue Xuebao/Journal of Dalian University of Technology 01/2013; 53(5).
  • ABSTRACT: The crossing number of a graph $G$ is the minimum number of pairwise intersections of edges among all drawings of $G$. In this paper, we study the crossing number of $K_{n,n}-nK_2$, $K_n\times P_2$, $K_n\times P_3$ and $K_n\times C_4$.
    11/2012;
  • ABSTRACT: The {\it crossing number} of a graph $G$ is the minimum number of pairwise intersections of edges in a drawing of $G$. In this paper, we study the crossing numbers of $K_{m}\times P_n$ and $K_{m}\times C_n$.
    11/2012;
  • ABSTRACT: Filtering the discriminative metabolites from high-dimensional metabolome data is very important in metabolomics studies. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique and has shown promising applications in the analysis of metabolome data. SVM-RFE measures the weights of the features according to the support vectors, so noise and non-informative variables in high-dimensional data may affect the hyperplane of the SVM learning model. Hence we proposed a mutual information (MI)-SVM-RFE method, which filters out noise and non-informative variables by means of artificial variables and MI, then conducts SVM-RFE to select the most discriminative features. A serum metabolomics data set from patients with chronic hepatitis B, cirrhosis and hepatocellular carcinoma, analyzed by liquid chromatography-mass spectrometry (LC-MS), was used to demonstrate the validity of our method. An accuracy of 74.33±2.98% in distinguishing among the three liver diseases was obtained, better than the 72.00±4.15% from the original SVM-RFE. Thirty-four ion features were defined to distinguish among the control and the 3 liver diseases; 17 of them were identified.
    Journal of chromatography. B, Analytical technologies in the biomedical and life sciences 05/2012; · 2.78 Impact Factor
  • ABSTRACT: Patients with chronic liver diseases (CLD), including chronic hepatitis B and hepatic cirrhosis (CIR), are the major high-risk population for hepatocellular carcinoma (HCC). The differential diagnosis between CLD and HCC is a challenge. This work aims to study the related metabolic deregulations in HCC and CLD to promote the discovery of differential metabolites for distinguishing the different liver diseases. Serum metabolic profiling analysis from patients with CLD and HCC was performed using a liquid chromatography-mass spectrometry system. The large amount of metabolic information acquired was processed with the random forest-recursive feature elimination method to discover important metabolic changes. It was found that long-chain acylcarnitines accumulated, whereas free carnitine and medium- and short-chain acylcarnitines decreased with the severity of the non-malignant liver diseases, accompanied by corresponding alterations of enzyme activities. However, the general extent of change was smaller in HCC than in CIR, possibly due to the special energy-consumption mechanism of tumor cells. These observations may help to understand the mechanism of HCC occurrence and progression on the metabolic level and provide information for the identification of early and differential metabolic markers for HCC.
    Analytical and Bioanalytical Chemistry 02/2012; 403(1):203-13. · 3.66 Impact Factor
  • ABSTRACT: Non-targeted metabolic profiling is the most widely used method for metabolomics. In this paper, a novel approach was established to transform a non-targeted metabolic profiling method into a pseudo-targeted method using retention time locking gas chromatography/mass spectrometry-selected ion monitoring (RTL-GC/MS-SIM). To achieve this transformation, an algorithm based on the automated mass spectral deconvolution and identification system (AMDIS), GC/MS raw data and a bi-Gaussian chromatographic peak model was developed. The established GC/MS-SIM method was compared with GC/MS-full scan methods (the total ion current and extracted ion current, TIC and EIC). For a typical tobacco leaf extract, 93% of components had relative standard deviations (RSDs) of relative peak areas below 20% by the SIM method, compared with 88% by the EIC method and 81% by the TIC method. In addition, 47.3% of components had a linear correlation coefficient higher than 0.99, compared with 5.0% by the EIC and 6.2% by the TIC methods. Multivariate analysis showed that the pooled quality control samples clustered more tightly using the developed method than using the GC/MS-full scan methods, indicating better data quality. In an analysis of variance of tobacco samples from three different planting regions, 167 differential components (p<0.05) were screened out using the RTL-GC/MS-SIM method, versus 151 and 131 by the EIC and TIC methods, respectively. The results show that the developed method not only has higher sensitivity, better linearity and better data quality, but also does not need complicated peak alignment among different samples. It is especially suitable for the screening of differential components in metabolic profiling investigations.
    Journal of Chromatography A 02/2012; 1255:228-36. · 4.61 Impact Factor
  • ABSTRACT: Bayesian networks are efficient classification techniques and are widely applied in many fields; however, learning their structure is NP-hard. In this paper, a Bayesian network structure learning method called Tree-like Bayesian network (BN-TL) was proposed, which constructs the network by estimating the correlation between the features and the correlation between the class label and the features. Two metabolomics datasets about liver disease and five public datasets from the University of California at Irvine repository (UCI) were used to demonstrate the performance of BN-TL. The results show that, in most cases, BN-TL outperforms the other three classifiers: the Naïve Bayesian classifier (NB), a Bayesian network classifier whose structure is learned using the K2 greedy search strategy (BN-K2), and a method proposed by Kuschner in 2010 (BN-BMC).
    01/2012;
  • ABSTRACT: A graph with vertex set V is said to have a prime cordial labeling if there is a bijection f from V to {1,2,⋯,|V|} such that if each edge uv is assigned the label 1 for the greatest common divisor gcd(f(u),f(v))=1 and 0 for gcd(f(u),f(v))>1 then the number of edges labeled with 0 and the number of edges labeled with 1 differ by at most 1. In this paper we show that the flower snark and its related graphs are prime cordial for all n≥3.
    Ars Combinatoria -Waterloo then Winnipeg- 01/2012; 105. · 0.28 Impact Factor
  • ABSTRACT: Discovery of differential metabolites is the focus of metabonomics studies. It has very important applications in pathogenesis and disease classification. The aim of this work is to identify differential metabolites for classifying patients with hepatocellular carcinoma, cirrhosis and hepatitis based on metabolic profiling data analyzed by gas chromatography-time of flight mass spectrometry. A two-stage feature selection algorithm, F-SVM, combining the F-score from analysis of variance with a support vector machine (SVM), was applied to discover discriminative metabolites for the three types of liver disease. The results show that the accuracy rate of the double cross-validation was 73.68±2.98%. Twenty-two important differential metabolites selected by F-SVM were identified, and the related pathophysiological processes of the liver diseases were set forth. We conclude that F-SVM is feasible for the selection of biologically relevant features in metabonomics.
    Journal of Separation Science 09/2011; 34(21):3029-36. · 2.59 Impact Factor
  • ABSTRACT: Minimum redundancy maximum relevancy (mRMR) is one of the successful criteria used by many feature selection techniques to evaluate the discriminating abilities of the features. We combined a dynamic sample space with mRMR and proposed a new feature selection method. In each iteration, the weighted mRMR values are calculated on the dynamic sample space consisting of the current unlabelled samples. Among the features that can improve the classification performance, the one with the largest weighted mRMR value is selected. Five public data sets were used to demonstrate the superiority of our method.
    Parallel Architectures, Algorithms and Programming (PAAP), 2010 Third International Symposium on; 01/2011
  • ABSTRACT: A solution capacity limited estimation of distribution algorithm (L-EDA) is proposed and applied to ovarian cancer prognosis biomarker discovery to demonstrate its potential in metabonomics studies. Sera from healthy women and from epithelial ovarian cancer (EOC), recurrent EOC and non-recurrent EOC patients were analyzed by liquid chromatography-mass spectrometry. The metabolite data were processed by L-EDA to discover potential EOC prognosis biomarkers. After L-EDA filtration, 78 out of 714 variables were selected, and the relationships among the four groups were visualized by principal component analysis. With the L-EDA filtered variables, the non-recurrent EOC and recurrent EOC groups could be separated, which was not possible with the initial data. Five metabolites (six variables) with P
    Metabolomics 01/2011; · 4.43 Impact Factor
  • ABSTRACT: Assume we have a set of k colors and we assign an arbitrary subset of these colors to each vertex of a graph G. If we require that each vertex to which an empty set is assigned has in its neighborhood all k colors, then this assignment is called a k-rainbow dominating function of the graph G. The minimum sum of numbers of assigned colors over all vertices of G, denoted as $\gamma_{rk}(G)$, is called the k-rainbow domination number of G. In this paper, we prove that $\gamma_{r2}(P(n,3)) \ge \lceil 7n/8 \rceil$.
    Ars Combinatoria -Waterloo then Winnipeg- 01/2011; 102. · 0.28 Impact Factor
  • ABSTRACT: Metabolic markers are the core of metabonomic surveys. Hence the selection of differential metabolites is of great importance for both biological and clinical purposes. Here, a feature selection method was developed for complex metabonomic data sets. As an effective tool for metabonomics data analysis, the support vector machine (SVM) was employed as the basic classifier. To find meaningful features effectively, support vector machine recursive feature elimination (SVM-RFE) was applied first. Then, a genetic algorithm (GA) and random forest (RF), which consider the interaction among the metabolites and the independent performance of each metabolite in all samples, respectively, were used to obtain more informative metabolic differences and reduce the risk of false positives. A data set from a plasma metabonomics study of rat liver diseases progressing from hepatitis and cirrhosis to hepatocellular carcinoma was used to validate the method. Besides the good classification results for the 3 kinds of liver disease, 31 important metabolites, including lysophosphatidylethanolamine (LPE) C16:0, palmitoylcarnitine and lysophosphatidylcholine (LPC) C18:0, were also selected for further studies. The current results show a good complementary effect among the three feature selection methods. The combined method also revealed more differential metabolites and provided more metabolic information for a "global" understanding of diseases than any single method. Furthermore, this method is also suitable for other complex biological data sets. Keywords: Support vector machine; Genetic algorithm; Random forest; Liver diseases; Metabonomics; Metabolomics
    Metabolomics 01/2011; 7(4):549-558. · 4.43 Impact Factor
  • ABSTRACT: We applied the random forest method to discriminate among different kinds of cut tobacco. To overcome the influence of the declining resolution caused by column pollution and the subsequent deterioration of column efficacy at different testing times, we constructed combined peaks by summing the peaks over a specific elution time interval Δt. In constructing the tree classifiers, both the original peaks and the combined peaks were considered. A data set of 75 samples from three grades of the same tobacco brand was used to evaluate our method. Two parameters of the random forest were optimized using the out-of-bag error, and the relationship between Δt and the classification rate was investigated. Experiments show that partial least squares discriminant analysis was not suitable because of overfitting, and that the random forest with the combined features performed more accurately than Naïve Bayes, support vector machines, bootstrap aggregating and the random forest using only the original features.
    Talanta 09/2010; 82(4):1571-5. · 3.50 Impact Factor
  • ABSTRACT: Data clustering is an effective method for data analysis and pattern recognition which has been applied in many fields such as image segmentation, machine learning and data mining. It is the process of splitting multidimensional data into several groupings or clusters based on some similarity measure. A cluster is usually defined by a cluster center. Generally, the features carry different amounts of information, so their contributions to the clustering differ. The most meaningful features play an important role in explaining the differences among the samples, and thus should be paid more attention in the clustering process to obtain an exact grouping. In order to reflect the particular contributions of the features, this paper proposed a new feature-weighted affinity propagation (AP) clustering algorithm. In this method, all the features are evaluated by a feature analysis method. The training samples are those which are near the centers in each group according to the affinity propagation clustering result. The similarity matrix of AP is updated on the weighted features and the new clustering result is obtained. The radius by which the training samples are determined is a very important parameter in our method. We study it by means of the sum of weighted distances between any two samples clustered in the same group according to the clustering result. In order to demonstrate our method, three public data sets from UCI were used. The experimental results on the three data sets showed the superiority of the feature-weighted AP method.
    01/2010;
  • ABSTRACT: A (d,1)-total labelling of a graph G is an assignment of integers to V(G)∪E(G) such that: (i) any two adjacent vertices of G receive distinct integers, (ii) any two adjacent edges of G receive distinct integers, and (iii) a vertex and its incident edge receive integers that differ by at least d in absolute value. The span of a (d,1)-total labelling is the maximum difference between two labels. The minimum span of labels required for such a (d,1)-total labelling of G is called the (d,1)-total number and is denoted by $\lambda_d^T(G)$. In this paper, we prove that $\lambda_d^T(G) \ge d+r+1$ for r-regular nonbipartite graphs with d≥r≥3 and determine the (d,1)-total numbers of flower snarks and of quasi flower snarks.
    Ars Combinatoria -Waterloo then Winnipeg- 01/2010; 96. · 0.28 Impact Factor
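
The L(2,1)-labeling definition quoted in the list above (adjacent vertices at least two apart, vertices at distance two distinct) can be checked mechanically. The following is a minimal sketch, not the method of the paper: it brute-forces λ(G) for a tiny graph, taking the smallest label to be 0 so that the "range" equals the largest label. The function names and the 5-cycle used as a toy graph are illustrative assumptions; flower snarks are not constructed here.

```python
from collections import deque
from itertools import product


def distances_from(adj, source):
    """Breadth-first search distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_l21_labeling(adj, labels):
    """Check the two L(2,1) conditions for every pair of vertices."""
    for u in adj:
        for v, d in distances_from(adj, u).items():
            gap = abs(labels[u] - labels[v])
            if d == 1 and gap < 2:   # adjacent: labels at least two apart
                return False
            if d == 2 and gap < 1:   # distance two: labels distinct
                return False
    return True


def l21_number(adj):
    """Smallest largest-label over all L(2,1)-labelings (exhaustive search)."""
    vertices = sorted(adj)
    span = 0
    while True:
        for assignment in product(range(span + 1), repeat=len(vertices)):
            if is_l21_labeling(adj, dict(zip(vertices, assignment))):
                return span
        span += 1


def cycle(n):
    """Adjacency lists of the cycle on n vertices (hypothetical toy input)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


if __name__ == "__main__":
    print(l21_number(cycle(5)))
```

The search is exponential in the number of vertices, so it is only useful for sanity-checking small cases.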

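The top scoring pairs idea from the first abstract above (score features in pairs, keep the top-ranked pairs) can be illustrated with a short sketch. It assumes the commonly described TSP score, the absolute difference between the two classes in how often feature i is smaller than feature j; it is not the modified ranking criterion of the study, which the abstract does not spell out. The function names and toy data are hypothetical, and the sketch does not force the selected pairs to be disjoint.

```python
from itertools import combinations


def pair_score(samples, labels, i, j):
    """|P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|, estimated by frequency."""
    freq = {}
    for cls in (0, 1):
        rows = [x for x, y in zip(samples, labels) if y == cls]
        freq[cls] = sum(x[i] < x[j] for x in rows) / len(rows)
    return abs(freq[0] - freq[1])


def top_scoring_pairs(samples, labels, k=1):
    """Return the k highest-scoring feature pairs as (score, (i, j)) tuples."""
    n_features = len(samples[0])
    scored = [
        (pair_score(samples, labels, i, j), (i, j))
        for i, j in combinations(range(n_features), 2)
    ]
    # Highest score first; ties broken by feature indices for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:k]


# Hypothetical toy data: 4 samples, 3 features, two classes (0 and 1).
samples = [
    [1.0, 2.0, 5.0],
    [1.5, 3.0, 4.0],
    [3.0, 1.0, 4.5],
    [2.5, 0.5, 5.5],
]
labels = [0, 0, 1, 1]

if __name__ == "__main__":
    print(top_scoring_pairs(samples, labels, k=1))
```

In the toy data, the order of features 0 and 1 flips between the two classes, so that pair receives the maximum score, while pairs whose order is the same in both classes score zero.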
Publication Stats

62 Citations
46.09 Total Impact Points

Institutions

  • 2005–2014
    • Dalian University of Technology
      Lü-ta-shih, Liaoning, China
  • 2012
    • Northeast Institute of Geography and Agroecology
      • Laboratory of Analytical Chemistry for Life Science
      Beijing, Beijing Shi, China