Qingping Tao

Università degli Studi di Torino, Torino, Piedmont, Italy

Publications (29) · 46.69 Total impact

  •
    ABSTRACT: Comprehensive two-dimensional chromatography is a powerful technology for analyzing the patterns of constituent compounds in complex samples, but matching chromatographic features for comparative analysis across large sample sets is difficult. Various methods have been described for pairwise peak matching between two chromatograms, but the peaks indicated by these pairwise matches commonly are incomplete or inconsistent across many chromatograms. This paper describes a new, automated method for post-processing the results of pairwise peak matching to address incomplete and inconsistent peak matches and thereby select chromatographic peaks that reliably correspond across many chromatograms. Reliably corresponding peaks can be used both for directly comparing relative compositions across large numbers of samples and for aligning chromatographic data for comprehensive comparative analyses. To select reliable features for a set of chromatograms, the Consistent Cliques Method (CCM) represents all peaks from all chromatograms and all pairwise peak matches in a graph, finds the maximal cliques, and then combines cliques with shared peaks to extract reliable features. The parameters of CCM are the minimum number of chromatograms with complete pairwise peak matches and the desired number of reliable peaks. A particular threshold for the minimum number of chromatograms with complete pairwise matches ensures that there are no conflicts among the pairwise matches for reliable peaks. Experimental results with samples of complex bio-oils analyzed by comprehensive two-dimensional gas chromatography (GCxGC) coupled with mass spectrometry (GCxGC-MS) indicate that CCM provides a good foundation for comparative analysis of complex chemical mixtures.
    Analytical Chemistry 04/2013; · 5.70 Impact Factor
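
The graph formulation described in the abstract above can be illustrated with a short sketch. This is not the authors' implementation: peaks are represented as hypothetical `(chromatogram, peak_id)` tuples, pairwise matches are undirected edges, and maximal cliques are enumerated with the standard Bron-Kerbosch recursion (the abstract does not say which clique algorithm CCM uses). The `min_size` parameter is an assumed stand-in for the minimum number of chromatograms with complete pairwise matches; the second CCM parameter, the desired number of reliable peaks, is not modeled.

```python
from collections import defaultdict


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v is done: remove it from the candidates
            x = x | {v}  # and exclude it from later cliques at this level

    expand(set(), set(adj), set())
    return cliques


def reliable_features(matches, min_size):
    """Merge maximal cliques of at least min_size peaks that share a peak."""
    adj = defaultdict(set)
    for a, b in matches:
        adj[a].add(b)
        adj[b].add(a)
    big = [set(c) for c in maximal_cliques(adj) if len(c) >= min_size]
    merged = []
    for clique in big:
        # Absorb every previously merged group that overlaps this clique.
        for group in [m for m in merged if m & clique]:
            clique |= group
            merged.remove(group)
        merged.append(clique)
    return merged


# Hypothetical matches among three chromatograms A, B, and C.
matches = [
    (("A", 1), ("B", 1)), (("A", 1), ("C", 1)), (("B", 1), ("C", 1)),
    (("A", 2), ("B", 2)), (("B", 2), ("C", 2)),  # A2-C2 match is missing
]
features = reliable_features(matches, min_size=3)
```

With `min_size=3`, only peak 1 survives, because peak 2 lacks a complete set of pairwise matches across the three chromatograms.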
  • Source
    ABSTRACT: This review surveys different approaches for generating features from comprehensive two-dimensional chromatography for non-targeted cross-sample analysis. The goal of non-targeted cross-sample analysis is to discover relevant chemical characteristics (such as compositional similarities or differences) from multiple samples. In non-targeted analysis, the relevant characteristics are unknown, so individual features for all chemical constituents should be analyzed, not just those for targeted or selected analytes. Cross-sample analysis requires matching the corresponding features that characterize each constituent across multiple samples so that relevant characteristics or patterns can be recognized. Non-targeted, cross-sample analysis requires generating and matching all features across all samples. Applications of non-targeted cross-sample analysis include sample classification, chemical fingerprinting, monitoring, sample clustering, and chemical marker discovery. Comprehensive two-dimensional chromatography is a powerful technology for separating complex samples and so is well suited for non-targeted cross-sample analysis. However, two-dimensional chromatographic data is typically large and complex, so the computational tasks of extracting and matching features for pattern recognition are challenging. This review examines five general approaches that researchers have applied to these difficult problems: visual image comparisons, datapoint feature analysis, peak feature analysis, region feature analysis, and peak-region feature analysis.
    Journal of Chromatography A 07/2011; 1226:140-8. · 4.61 Impact Factor
  • Source
    ABSTRACT: Comprehensive two-dimensional gas chromatography (GC×GC) is a powerful technology for separating complex samples. The typical goal of GC×GC peak detection is to aggregate data points of analyte peaks based on their retention times and intensities. Two techniques commonly used for two-dimensional peak detection are the two-step algorithm and the watershed algorithm. A recent study [4] compared the performance of the two-step and watershed algorithms for GC×GC data with retention-time shifts in the second-column separations. In that analysis, the peak retention-time shifts were corrected while applying the two-step algorithm but the watershed algorithm was applied without shift correction. The results indicated that the watershed algorithm has a higher probability of erroneously splitting a single two-dimensional peak than the two-step approach. This paper reconsiders the analysis by comparing peak-detection performance for resolved peaks after correcting retention-time shifts for both the two-step and watershed algorithms. Simulations with wide-ranging conditions indicate that when shift correction is employed with both algorithms, the watershed algorithm detects resolved peaks with greater accuracy than the two-step method.
    Journal of Chromatography A 07/2011; 1218(38):6792-8. · 4.61 Impact Factor
  •
    ABSTRACT: We successfully detected halogenated compounds from several kinds of environmental samples by using a comprehensive two-dimensional gas chromatograph coupled with a tandem mass spectrometer (GC×GC-MS/MS). For the global detection of organohalogens, fly ash sample extracts were directly measured without any cleanup process. The global and selective detection of halogenated compounds was achieved by neutral loss scans of chlorine, bromine and/or fluorine using an MS/MS. It was also possible to search for and identify compounds using two-dimensional mass chromatograms and mass profiles obtained from measurements of the same sample with a GC×GC-high resolution time-of-flight mass spectrometer (HRTofMS) under the same conditions as those used for the GC×GC-MS/MS. In this study, novel software tools were also developed to help find target (halogenated) compounds in the data provided by a GC×GC-HRTofMS. As a result, many dioxin and polychlorinated biphenyl congeners and many other halogenated compounds were found in fly ash extract and sediment samples. By extracting the desired information, which concerned organohalogens in this study, from huge quantities of data with the GC×GC-HRTofMS, we reveal the possibility of realizing the total global detection of compounds with one GC measurement of a sample without any pre-treatment.
    Journal of Chromatography A 06/2011; 1218(24):3799-810. · 4.61 Impact Factor
  • Source
    ABSTRACT: This paper describes informatics for cross-sample analysis with comprehensive two-dimensional gas chromatography (GCxGC) and high-resolution mass spectrometry (HRMS). GCxGC-HRMS analysis produces large data sets that are rich with information, but highly complex. The size of the data and volume of information require automated processing for comprehensive cross-sample analysis, but the complexity poses a challenge for developing robust methods. The approach developed here analyzes GCxGC-HRMS data from multiple samples to extract a feature template that comprehensively captures the pattern of peaks detected in the retention-times plane. Then, for each sample chromatogram, the template is geometrically transformed to align with the detected peak pattern and generate a set of feature measurements for cross-sample analyses such as sample classification and biomarker discovery. The approach avoids the intractable problem of comprehensive peak matching by using a few reliable peaks for alignment and peak-based retention-plane windows to define comprehensive features that can be reliably matched for cross-sample analysis. The informatics are demonstrated with a set of 18 samples from breast-cancer tumors, each from a different individual, six each for Grades 1-3. The features allow classification that matches grading by a cancer pathologist with 78% success in leave-one-out cross-validation experiments. The HRMS signatures of the features of interest can be examined for determining elemental compositions and identifying compounds.
    Talanta 01/2011; 83(4):1279-88. · 3.50 Impact Factor
  • Source
    ABSTRACT: This study examined how advanced fingerprinting methods (i.e., non-targeted methods) provide reliable and specific information about groups of samples based on their component distribution on the GC x GC chromatographic plane. The volatile fractions of roasted hazelnuts (Corylus avellana L.) from nine different geographical origins, comparably roasted for desirable flavor and texture, were sampled by headspace-solid phase microextraction (HS-SPME) and then analyzed by GC x GC-qMS. The resulting patterns were processed by: (a) "chromatographic fingerprinting", i.e., a pattern recognition procedure based on retention-time criteria, where peak correspondences were established through a comprehensive peak pattern covering the chromatographic plane; and (b) "comprehensive template matching" with reliable peak matching, where peak correspondences were constrained by retention time and MS fragmentation pattern similarity criteria. Fingerprinting results showed how the discrimination potential of GC x GC can be increased by including in sample comparisons and correlations all the detected components and, in addition, provide reliable results in a comparative analysis by locating compounds with a significant role. Results were completed by a chemical speciation of volatiles and sample profiling was extended to known markers whose distribution can be correlated to sensory properties, geographical origin, or the effect of thermal treatment on different classes of compounds. The comprehensive approach for data interpretation proposed here may be useful to assess product specificity and quality, through measurable parameters strictly and consistently correlated to sensory properties and origin.
    Journal of Chromatography A 09/2010; 1217(37):5848-58. · 4.61 Impact Factor
  • Source
    ABSTRACT: Comprehensive two-dimensional LC (LC x LC) is a powerful tool for analysis of complex biological samples. With its multidimensional separation power and increased peak capacity, LC x LC generates information-rich, but complex, chromatograms, which require advanced data analysis to produce useful information. An important analytical challenge is to classify samples on the basis of chromatographic features, e.g., to extract and utilize biomarkers indicative of health conditions, such as disease or response to therapy. This study presents a new approach to extract comprehensive non-target chromatographic features from a set of LC x LC chromatograms for sample classification. Experimental results with urine samples indicate that the chromatographic features generated by this approach can be used to effectively classify samples. Based on the extracted features, a support vector machine successfully classified urine samples by individual, before/after procedure, and concentration with leave-one-out and replicate K-fold cross-validation. The new method for comprehensive chromatographic feature analysis of LC x LC separations provides a potentially powerful tool for classifying complex biological samples.
    Journal of Separation Science 06/2010; 33(10):1365-74. · 2.59 Impact Factor
  •
    ABSTRACT: The present study examines the ability of targeted and non-targeted methods to provide specific and complementary information on groups of samples on the basis of their component distribution on the two-dimensional gas chromatography (GCxGC) plane. The volatile fraction of Arabica green and roasted coffee samples differing in geographical origins and roasting treatments and the volatile fraction from juniper needles, sampled by headspace-solid phase microextraction, were analyzed by GCxGC-qMS and sample profiles processed by different approaches. In the target analysis profiling, samples submitted to different roasting cycles and/or differing in origin and post-harvest treatment are characterized on the basis of known constituents (botanical, technological, and/or aromatic markers). This approach provides highly reliable results on quali-quantitative compositional differences because of the authentic standard confirmation, extending and improving the specificity of the comparative procedure to trace and minor components. On the other hand, non-targeted data-processing methods (e.g., direct image comparison and template-based fingerprinting) include in the sample comparisons and correlations all detected sample components, offering an increased discrimination potential by identifying compounds that are comparatively significant but not known targets. Results demonstrate the ability of GCxGC to explore in depth the complexity of samples and emphasize the advantages of a comprehensive and multidisciplinary approach to improve the level of information provided by GCxGC separation.
    Journal of Chromatographic Science 04/2010; 48(4):251-61. · 0.79 Impact Factor
  • Source
    ABSTRACT: Emerging technologies for chemical imaging provide high-resolution three-dimensional (3D) surveys with high-precision mass spectrometry (MS), promising to open unprecedented vistas for understanding complex phenomena such as cellular metabolism. However, there are critical challenges in transforming the large, complex, multidimensional, multispectral data sets into useful chemical information for biological research and other applications. This paper describes new informatics for advanced interactive spatio-spectral analysis of three-dimensional mass-spectral (3DxMS) chemical images. The technical challenges for interactive informatics are rapid access to large datasets, visualization of 3D hyperspectral images, and pattern recognition for spatio-spectral mapping. This paper describes an effective compression method for time-of-flight secondary ion mass spectrometry (ToF-SIMS) data that provides rapid spatial-spectral access; a framework for 3DxMS visualization that supports multiple views with multiple layers of information; and a suite of pattern recognition tools for spatio-spectral drawing, clustering, and classification. Copyright © 2010 John Wiley & Sons, Ltd.
    Surface and Interface Analysis 01/2010; · 1.22 Impact Factor
  • Source
    ABSTRACT: Interactive visualization of data from a new generation of chemical imaging systems requires coding that is efficient and accessible. New technologies for secondary ion mass spectrometry (SIMS) generate large three-dimensional, hyperspectral datasets with high spatial and spectral resolution. Interactive visualization is important for chemical analysis, but the raw dataset size exceeds the memory capacities of typical current computer systems and is a significant obstacle. This paper reports the development of a lossless coding method that is memory efficient, enabling large SIMS datasets to be held in fast memory, and supports quick access for interactive visualization. The approach provides pixel indexing, as required for chemical imaging applications, and is based on the statistical characteristics of the data. The method uses differential time-of-flight to effect mass-spectral run-length-encoding and uses a scheme for variable-length, byte-unit representations for both mass-spectral time-of-flight and intensity values. Experiments demonstrate high compression rates and fast access.
    Rapid Communications in Mass Spectrometry 04/2009; 23(9):1229-33. · 2.51 Impact Factor
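
The combination of differential time-of-flight values and variable-length, byte-unit representations mentioned in the abstract above can be sketched as follows. The byte layout here is an assumption, not the paper's format: each value is written in 7-bit groups with a continuation bit, and each spectrum is a hypothetical list of `(tof, intensity)` pairs sorted by time-of-flight. Pixel indexing, which the abstract also describes, is not covered.

```python
def encode_varint(n):
    """Encode a non-negative integer in 7-bit groups, low bits first."""
    if n < 0:
        raise ValueError("varint requires a non-negative integer")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_spectrum(spectrum):
    """Encode sorted (tof, intensity) pairs as varint (delta_tof, intensity)."""
    out = bytearray()
    previous = 0
    for tof, intensity in spectrum:
        out += encode_varint(tof - previous)  # small deltas need fewer bytes
        out += encode_varint(intensity)
        previous = tof
    return bytes(out)


def decode_spectrum(data):
    """Invert encode_spectrum, recovering the original pairs exactly."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    spectrum, tof = [], 0
    for delta, intensity in zip(values[0::2], values[1::2]):
        tof += delta
        spectrum.append((tof, intensity))
    return spectrum


# Hypothetical spectrum for one pixel.
spectrum = [(1000, 3), (1001, 1), (1300, 200), (70000, 5)]
encoded = encode_spectrum(spectrum)
```

Because the coding is lossless, `decode_spectrum(encoded)` returns the original list, and closely spaced time-of-flight values cost a single byte each.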
  • Source
    ABSTRACT: The multiple-instance learning (MIL) model has been successful in numerous application areas. Recently, a generalization of this model and an algorithm for it were introduced, showing significant advantages over the conventional MIL model in certain application areas. Unfortunately, that algorithm is not scalable to high dimensions. We adapt that algorithm to one using a support vector machine with our new kernel $k_{\wedge}$. This reduces the time complexity from exponential in the dimension to polynomial. Computing our new kernel is equivalent to counting the number of boxes in a discrete, bounded space that contain at least one point from each of two multisets. We show that this problem is #P-complete, but then give a fully polynomial randomized approximation scheme (FPRAS) for it. We then extend $k_{\wedge}$ by enriching its representation into a new kernel $k_{\min}$, and also consider a normalized version of $k_{\wedge}$ that we call $k_{\wedge/\vee}$ (which may or may not be a kernel, but whose approximation yielded positive semidefinite Gram matrices in practice). We then empirically evaluate all three measures on data from content-based image retrieval, biological sequence analysis, and the musk data sets. We found that our kernels performed well on all data sets relative to algorithms in the conventional MIL model.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 01/2009; 30(12):2084-98. · 4.80 Impact Factor
  • Source
    ABSTRACT: Comprehensive two-dimensional liquid chromatography (LC × LC) generates information-rich but complex peak patterns that require automated processing for rapid chemical identification and classification. This paper describes a powerful approach and specific methods for peak pattern matching to identify and classify constituent peaks in data from LC × LC and other multidimensional chemical separations. The approach records a prototypical pattern of peaks with retention times and associated metadata, such as chemical identities and classes, in a template. Then, the template pattern is matched to the detected peaks in subsequent data and the metadata are copied from the template to identify and classify the matched peaks. Smart Templates employ rule-based constraints (e.g., multispectral matching) to increase matching accuracy. Experimental results demonstrate that Smart Templates, with the combination of retention-time pattern matching and multispectral constraints, are accurate and robust with respect to changes in peak patterns associated with variable chromatographic conditions.
    Journal of Chromatography A 01/2009;
  • Source
    ABSTRACT: New technologies for Secondary Ion Mass Spectrometry (SIMS) produce three-dimensional hyperspectral chemical images with high spatial resolution and fine mass-spectral precision. SIMS imaging of biological tissues and cells promises to provide an informational basis for important advances in a wide variety of applications, including cancer treatments. However, the volume and complexity of data pose significant challenges for interactive visualization and analysis. This paper describes new methods and tools for computer-based visualization and analysis of SIMS data, including a coding scheme for efficient storage and fast access, interactive interfaces for visualizing and operating on three-dimensional hyperspectral images, and spatio-spectral clustering and classification.
    Visual Information Processing XVIII, 14 April 2009, Orlando, Florida, USA; 01/2009
  • Source
    Qingping Tao, Stephen D. Scott
    ABSTRACT: A Markov chain Monte Carlo method has previously been introduced to estimate weighted sums in multiplicative weight update algorithms when the number of inputs is exponential. However, the original algorithm still required extensive simulation of the Markov chain in order to get accurate estimates of the weighted sums. We propose an optimized version of the original algorithm that produces exactly the same classifications while often using fewer Markov chain simulations. We also apply three other sampling techniques and empirically compare them with the original Metropolis sampler to determine how effective each is in drawing good samples in the least amount of time, in terms of accuracy of weighted sum estimates and in terms of Winnow’s prediction accuracy. We found that two other samplers (Gibbs and Metropolized Gibbs) were slightly better than Metropolis in their estimates of the weighted sums. For prediction errors, there is little difference between any pair of MCMC techniques we tested. Also, on the data sets we tested, we discovered that all approximations of Winnow have no disadvantage when compared to brute force Winnow (where weighted sums are exactly computed), so generalization accuracy is not compromised by our approximation. This is true even when very small sample sizes and mixing times are used.
    Machine Learning 01/2008; 73:107-132. · 1.47 Impact Factor
  • Source
    ABSTRACT: We develop a method for automatic colorization of images (or two-dimensional fields) in order to visualize pixel values and their local differences. In many applications, local differences in pixel values are as important as their values. For example, in topography, both elevation and slope often must be considered. Gradient-based value mapping (GBVM) is a technique for colorizing pixels based on value (e.g., intensity or elevation) and gradient (e.g., local differences or slope). The method maps pixel values to a color scale (either gray-scale or pseudocolor) in a manner that emphasizes gradients in the image while maintaining ordinal relationships of values. GBVM is especially useful for high-precision data, in which the number of possible values is large. Colorization with GBVM is demonstrated with data from comprehensive two-dimensional gas chromatography (GCxGC), using both gray-scale and pseudocolor to visualize both small and large peaks, and with data from the Global Land One-Kilometer Base Elevation (GLOBE) Project, using gray-scale to visualize features that are not visible in images produced with popular value-mapping algorithms.
    Journal of Electronic Imaging 01/2007; 16:033004. · 1.06 Impact Factor
  •
    ABSTRACT: This paper develops a method for automatic colorization of two-dimensional fields presented as images, in order to visualize local changes in values. In many applications, local changes in values are as important as magnitudes of values. For example, in topography, both elevation and slope often must be considered. Gradient-based value mapping for colorization is a technique to visualize both value (e.g., intensity or elevation) and gradient (e.g., local differences or slope). The method maps pixel values to a color scale in a manner that emphasizes gradients in the image. The value mapping function is monotonically non-decreasing, to maintain ordinal relationships of values on the color scale. The color scale can be a grayscale or pseudocolor scale. The first step of the method is to compute the gradient at each pixel. Then, the pixels (with computed gradients) are sorted by value. The value mapping function is the inverse of the relative cumulative gradient magnitude function computed from the sorted array. The value mapping method is demonstrated with data from comprehensive two-dimensional gas chromatography (GCxGC), using both grayscale and a pseudocolor scale to visualize local changes related to both small and large peaks in the GCxGC data.
    Proc SPIE 06/2006;
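
The three steps listed in the abstract above (compute gradients, sort pixels by value, accumulate gradient magnitude) can be sketched as follows. This is one plausible reading rather than the published method: the abstract describes the mapping as the inverse of the relative cumulative gradient magnitude function, whereas this sketch uses the cumulative share directly as the position on a color scale in [0, 1]. The forward-difference gradient and the function names are assumptions.

```python
def gradient_magnitude(image):
    """Forward-difference gradient magnitude for a 2-D list of numbers."""
    rows, cols = len(image), len(image[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Edge pixels reuse themselves as neighbor, giving a zero step.
            dx = image[r][min(c + 1, cols - 1)] - image[r][c]
            dy = image[min(r + 1, rows - 1)][c] - image[r][c]
            grad[r][c] = (dx * dx + dy * dy) ** 0.5
    return grad


def gradient_based_mapping(image):
    """Map each distinct pixel value to [0, 1] by cumulative gradient share."""
    grad = gradient_magnitude(image)
    pairs = sorted(
        (image[r][c], grad[r][c])
        for r in range(len(image))
        for c in range(len(image[0]))
    )
    total = sum(g for _, g in pairs)
    mapping, running = {}, 0.0
    for value, g in pairs:
        running += g
        # Later pixels with the same value overwrite earlier ones, so each
        # value maps to the cumulative share of all pixels at or below it.
        mapping[value] = running / total if total else 0.0
    return mapping


# Hypothetical 3x3 field with one small step and one large step.
image = [[0, 0, 10], [0, 1, 10], [0, 1, 100]]
mapping = gradient_based_mapping(image)
```

Because the running sum never decreases over the sorted values, the mapping is monotonically non-decreasing, which preserves the ordinal relationships the abstract calls for.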
  • Source
    ABSTRACT: This paper investigates methods for comparing datasets produced by comprehensive two-dimensional gas chromatography (GC x GC). Chemical comparisons are useful for process monitoring, sample classification or identification, correlative determinations, and other important tasks. GC x GC is a powerful new technology for chemical analysis, but methods for comparative visualization must address challenges posed by GC x GC data: inconsistency and complexity. The approach extends conventional techniques for image comparison by utilizing specific characteristics of GC x GC data and developing new methods for comparative visualization and analysis. The paper describes techniques that register (or align) GC x GC datasets to remove retention-time variations; normalize intensities to remove sample amount variations; compute differences in local regions to remove slight misregistrations and differences in peak shapes; employ color (hue), intensity, and saturation to simultaneously visualize differences and values; and use tools for masking, three-dimensional visualization, and tabular presentation with controls for graphical highlights to significantly improve comparative analysis of GC x GC datasets. Experimental results indicate that the comparative methods preserve chemical information and support qualitative and quantitative analyses.
    Journal of Chromatography A 03/2006; 1105(1-2):51-8. · 4.61 Impact Factor
  • Source
    ABSTRACT: The multiple-instance learning (MIL) model has been successful in areas such as drug discovery and content-based image-retrieval. Recently, this model was generalized and a corresponding kernel was introduced to learn generalized MIL concepts with a support vector machine. While this kernel enjoyed empirical success, it has limitations in its representation. We extend this kernel by enriching its representation and empirically evaluate our new kernel on data from content-based image retrieval, biological sequence analysis, and drug discovery. We found that our new kernel generalized noticeably better than the old one in content-based image retrieval and biological sequence analysis and was slightly better or even with the old kernel in the other applications, showing that an SVM using this kernel does not overfit despite its richer representation.
    Tools with Artificial Intelligence, 2004. ICTAI 2004. 16th IEEE International Conference on; 12/2004
  • Source
    ABSTRACT: The multiple-instance learning (MIL) model has been very successful in application areas such as drug discovery and content-based image retrieval.
    09/2004;
  •
    ABSTRACT: We introduce new distance measures for the construction and analysis of phylogenies, focusing on thioredoxin-fold proteins. Our distance measures for tree construction are based on several criteria, including pairwise alignment of only the thioredoxin fold region of each sequence, Hausdorff distance between sequences represented by sets of real vectors derived from per-residue features of the sequences, and properties of each sequence such as protein function and organism type. We also analyze and compare our trees in several ways. To corroborate the trees, we first compute the distance between the evolutionary trees, and then evaluate the trees based on conditional entropy. We also analyze the trees by finding common subtrees within and between our trees. Finally, biological analysis shows that trees based on our measures yield new information on proteins within the thioredoxin superfamily.
    07/2004;
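
The Hausdorff distance mentioned in the last abstract above has a compact definition that may help readers unfamiliar with it. The sketch below assumes each sequence has already been converted to a non-empty set of real-valued feature vectors (how the per-residue features are derived is not described in the abstract) and uses Euclidean distance between vectors, which is also an assumption.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical two-dimensional feature vectors for two sequences.
seq_a = [(0.0, 0.0), (1.0, 0.0)]
seq_b = [(0.0, 0.0), (4.0, 0.0)]
distance = hausdorff(seq_a, seq_b)
```

The directed distances differ here (1.0 from `seq_a` to `seq_b`, 3.0 in the other direction), which is why the symmetric form takes the larger of the two.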

Publication Stats

166 Citations
46.69 Total Impact Points

Institutions

  • 2010
    • Università degli Studi di Torino
      • Dipartimento di Scienza e Tecnologia del Farmaco
      Torino, Piedmont, Italy
    • Lincoln College USA
      Lincoln, Illinois, United States
  • 2004–2009
    • University of Nebraska at Lincoln
      • Department of Computer Science and Engineering
      Lincoln, NE, United States
    • University of Nebraska at Omaha
      Omaha, Nebraska, United States