-
ABSTRACT: Three methods for variable selection are described, namely the t-statistic, Partial Least Squares Discriminant Analysis (PLS-DA) weights and PLS-DA regression coefficients, with the aim of determining which variables are the most significant markers for discriminating between two groups: a variable's level of significance is related to the magnitude of its statistic. Monte-Carlo methods are employed to determine the empirical significance of variables, by randomly permuting the class membership 5000 times to obtain null distributions, and comparing the observed statistic for each variable with the null distribution. Seven simulations, each consisting of 200 samples divided equally between two classes and 300 variables, are constructed; in one dataset there are no induced correlations between variables, in two datasets correlations are induced but there is no induced separation between the classes, and in four datasets separation is induced by selecting 20 of the variables to be discriminators. In addition, two metabolomic datasets were analysed, consisting of the GC-MS of urinary extracts from mice, both to determine the effect of stress and to determine the effect of diet on the urinary chemosignal. It is shown that the t-statistic combined with Monte-Carlo permutations provides similar results to PLS weights. PLS regression coefficients find the fewest markers but, for the simulations, give the lowest false positive rates.
Metabolomics 04/2012; 5(4):387-406. · 4.51 Impact Factor
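The permutation procedure summarised in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the pooled-variance form of the t-statistic, the two-sided comparison on absolute values, and the `(exceed + 1) / (n_perm + 1)` estimate of the empirical p-value are all assumptions made for illustration. Only the idea of permuting class membership (5000 times in the paper) and comparing the observed statistic with the resulting null distribution comes from the abstract.

```python
import math
import random
import statistics


def t_statistic(a, b):
    """Two-sample t-statistic using a pooled variance (assumed form)."""
    na, nb = len(a), len(b)
    diff = statistics.mean(a) - statistics.mean(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    if pooled == 0:
        # Degenerate case: no within-class spread at all.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(pooled * (1 / na + 1 / nb))


def permutation_p_value(values, labels, n_perm=5000, seed=0):
    """Empirical p-value for one variable; labels are 0 or 1."""
    rng = random.Random(seed)

    def abs_t(lab):
        a = [v for v, l in zip(values, lab) if l == 0]
        b = [v for v, l in zip(values, lab) if l == 1]
        return abs(t_statistic(a, b))

    observed = abs_t(labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random reassignment of class membership
        if abs_t(shuffled) >= observed:
            exceed += 1
    # Counting the observed arrangement keeps the estimate above zero.
    return (exceed + 1) / (n_perm + 1)
```

In a dataset with many variables, the function would be called once per variable, and the resulting p-values would then be ranked or thresholded.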
-
ABSTRACT: Body fluids such as urine potentially contain a wealth of information pertaining to age, sex, social and reproductive status, physiologic state, and genotype of the donor. To explore whether urine could encode information regarding environment, physiology, and development, we compared the volatile compositions of mouse urine using solid-phase microextraction and gas chromatography-mass spectrometry (SPME-GC/MS). Specifically, we identified volatile organic compounds (VOCs) in individual urine samples taken from inbred C57BL/6J-H-2(b) mice under several experimental conditions (maturation state, diet, stress, and diurnal rhythms) designed to mimic natural variations. Approximately 1000 peaks (i.e., variables) were identified per comparison, and of these many were identified as potential differential biomarkers. Consistent with previous findings, we found that groups of compounds that vary significantly and consistently, rather than a single unique compound, provide a robust signature. We identified over 49 new predictive compounds, in addition to identifying several published compounds, for maturation state, diet, stress, and time-of-day. We found a considerable degree of overlap in the chemicals identified as (potential) biomarkers for each comparison. Chemometric methods indicate that the strong group-related patterns in VOCs provide sufficient information to identify several parameters of natural variations in this strain of mice including their maturation state, stress level, and diet.
Chemical Senses 04/2010; 35(6):459-71. · 2.60 Impact Factor
-
ABSTRACT: The paper discusses variable selection as used in large metabolomic studies, exemplified by mouse urinary gas chromatography of 441 mice in three experiments to detect the influence of age, diet, and stress on their chemosignal. Partial least squares discriminant analysis (PLS-DA) was applied to obtain class models, using a procedure of 20,000 iterations including the bootstrap for model optimization and random splits into test and training sets for validation. Variables are selected using PLS regression coefficients on the training set using an optimized number of components obtained from the bootstrap. The variables are ranked in order of significance, and the overall optimal variables are selected as those that appear as highly significant over 100 different test and training set splits. Cost/benefit analysis of performing the model on a reduced number of variables is also illustrated. This paper provides a strategy for properly validated methods for determining which variables are most significant for discriminating between two groups in large metabolomic data sets. The strategy avoids the common pitfall of overfitting that arises if variables are selected on a combined training and test set, and it takes into account that different variables may be selected each time the samples are split into training and test sets using iterative procedures.
Analytical Chemistry 07/2009; 81(13):5204-17. · 5.86 Impact Factor
-
ABSTRACT: To quantify class separation, four indices are compared, namely the Davies Bouldin index, the silhouette width and two new approaches described in this paper: the modified silhouette width index, based on the proportion of objects with a positive silhouette width, and the Overlap Coefficient. Four sets of simulated datasets are described, each in turn consisting of 15 sets of data of varying degrees of overlap, and differing in the nature of outliers. Three experimental datasets consisting of the gas chromatography mass spectrometry of extracts from mouse urine, obtained to study the effect of different environmental (stress), physiological (diet) and developmental (age) factors on their metabolic profiles, are also described. The paper discusses the robustness of each approach to outliers and assesses class separation using each index. The two modifications protect against outliers. Copyright © 2008 John Wiley & Sons, Ltd.
Journal of Chemometrics 09/2008; 23(1):19-31. · 1.95 Impact Factor
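As a rough illustration of the modified silhouette width mentioned in this abstract, the sketch below computes the standard silhouette width of each object and then the proportion of objects whose width is positive. The use of Euclidean distance, the function names, and the assumption that every class contains at least two objects are choices made for this example and are not taken from the paper; the Overlap Coefficient and the Davies Bouldin index are not shown.

```python
import math


def silhouette_widths(points, labels):
    """Silhouette width s = (b - a) / max(a, b) for each object.

    a: mean distance to the other objects of the same class.
    b: smallest mean distance to the objects of any other class.
    Assumes Euclidean distance and at least two objects per class.
    """
    classes = set(labels)
    widths = []
    for i, (p, own) in enumerate(zip(points, labels)):
        mean_dist = {}
        for c in classes:
            d = [math.dist(p, q)
                 for j, (q, lab) in enumerate(zip(points, labels))
                 if lab == c and j != i]
            mean_dist[c] = sum(d) / len(d) if d else 0.0
        a = mean_dist[own]
        b = min(v for c, v in mean_dist.items() if c != own)
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def proportion_positive(widths):
    """Modified index: fraction of objects with a positive silhouette width."""
    return sum(w > 0 for w in widths) / len(widths)
```

Because the modified index only counts the sign of each width, a single extreme object changes it by at most one part in the number of objects, which is one way to read the abstract's remark that the modifications protect against outliers.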
-
ABSTRACT: In rodents, the nasal cavity contains two separate chemosensory epithelia, the main olfactory epithelium, located in the posterior dorsal aspect of the nasal cavity, and the vomeronasal/accessory olfactory epithelium, located in a capsule in the anterior aspect of the ventral floor of the nasal cavity. Both the main and accessory olfactory systems play a role in detection of biologically relevant odors. The accessory olfactory system has been implicated in response to pheromones, while the main olfactory system is thought to be a general molecular analyzer capable of detecting subtle differences in molecular structure of volatile odorants. However, the role of the two systems in detection of biologically relevant chemical signals appears to be partially overlapping. Thus, while it is clear that the accessory olfactory system is responsive to putative pheromones, the main olfactory system can also respond to some pheromones. Conversely, while the main olfactory system can mediate recognition of differences in genetic makeup by smell, the vomeronasal organ (VNO) also appears to participate in recognition of chemosensory differences between genetically distinct individuals. The most salient feature of our review of the literature is that there are no general rules that allow classification of the accessory olfactory system as a pheromone detector and the main olfactory system as a detector of general odorants. Instead, each behavior must be considered within a specific behavioral context to determine the role of these two chemosensory systems. In each case, one system or the other (or both) participates in a specific behavioral or hormonal response.
Hormones and Behavior 10/2004; 46(3):247-56. · 3.87 Impact Factor
-
ABSTRACT: The olfactory system detects small differences in the composition of natural odorants, made up of hundreds of molecules. Odorous quality is hypothetically represented by a combinatorial code: activation of distinct but overlapping subsets of olfactory receptors resulting in activation of a distinct subset of glomeruli in the main olfactory bulb (MOB). Here we show that modification of a single gene (the K gene of the major histocompatibility locus), which results in a subtle change in the odiferous quality of urine, causes a small but significant change in the composition of urine volatiles and consequently the evoked glomerular activation pattern in the MOB. The magnitude of disparity between urine-evoked glomerular activation patterns is predictive of the extent of (1) the genetic difference among the urine donors, (2) the difference in the chemical composition of urine, and (3) the odor detector's ability to discriminate. These data on natural odors are consistent with the combinatorial code hypothesis and identify subsets of glomeruli that are apt to play a significant role in mediating individual recognition.
Journal of Neuroscience 12/2002; 22(21):9513-21. · 7.11 Impact Factor
-
ABSTRACT: The nasal epithelium is richly invested with peptidergic (substance P and calcitonin gene-related peptide [CGRP]) trigeminal polymodal nociceptors, which respond to numerous odorants as well as irritants. Peptidergic trigeminal sensory fibers also enter the glomerular layer of the olfactory bulb. To test whether the trigeminal fibers in the olfactory bulb are collaterals of the epithelial trigeminal fibers, we utilized dual retrograde labeling techniques in rats to identify the trigeminal ganglion cells innervating each of these territories. Nuclear Yellow was injected into the dorsal nasal epithelium, and True Blue was injected into the olfactory bulb of the same side. Following a survival period of 3-7 days, the trigeminal ganglion contained double-labeled, small (11.8 × 8.0 µm), ellipsoid ganglion cells within the ethmoid nerve region of the ganglion. Tracer injections into the spinal trigeminal complex established that these branched trigeminal ganglion cells also extended an axon into the brainstem. These results indicate that some trigeminal ganglion cells with sensory endings in the nasal epithelium also have branches reaching directly into both the olfactory bulb and the spinal trigeminal complex. These trigeminal ganglion cells are unique among primary sensory neurons in having two branches entering the central nervous system at widely distant points. Furthermore, the collateral innervation of the epithelium and bulb may provide an avenue whereby nasal irritants could affect processing of coincident olfactory stimuli.
The Journal of Comparative Neurology 04/2002; 444(3):221-6. · 3.81 Impact Factor
-
ABSTRACT: Four common classification methods are described: Euclidean Distance to Centroids (EDC), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA) and Support Vector Machines (SVM). In many applications of chemometrics, e.g. in medicine and biology, it is common for there to be unequal sample sizes in different groups. When class sizes are unequal, the performance of some of these methods may be biased according to class size. This paper describes approaches for incorporating prior probabilities of class membership, using Bayesian approaches, into three of the methods (LDA, QDA and SVM), either assuming equal probability or assuming that the relative sample sizes relate to the relative probabilities. EDC is used as a benchmark to determine model stabilities. The methods are illustrated by four simulated datasets of different structures and one real dataset consisting of the gas chromatographic profile of mouse urine comparing controls to those on a diet.
Chemometrics and Intelligent Laboratory Systems 99(2):111-120. · 1.92 Impact Factor
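The effect of the two prior choices named in this abstract (equal priors versus priors proportional to class size) can be sketched for the simplest case, a one-variable LDA. This is an illustrative example only and not the paper's method: the function names, the restriction to one variable, and the textbook discriminant score x·μ_k/σ² − μ_k²/(2σ²) + ln(π_k) with a pooled variance σ² are assumptions made here.

```python
import math


def fit_lda_1d(x, y, priors="equal"):
    """Fit a one-variable LDA; priors is "equal" or "proportional"."""
    classes = sorted(set(y))
    n = len(x)
    means, counts = {}, {}
    within_ss = 0.0
    for c in classes:
        xc = [v for v, lab in zip(x, y) if lab == c]
        counts[c] = len(xc)
        means[c] = sum(xc) / len(xc)
        within_ss += sum((v - means[c]) ** 2 for v in xc)
    pooled_var = within_ss / (n - len(classes))  # shared by all classes
    if priors == "equal":
        prior = {c: 1 / len(classes) for c in classes}
    else:
        # Relative sample sizes taken as relative prior probabilities.
        prior = {c: counts[c] / n for c in classes}
    return means, pooled_var, prior


def predict_lda_1d(model, value):
    """Assign value to the class with the highest discriminant score."""
    means, pooled_var, prior = model

    def score(c):
        return (value * means[c] / pooled_var
                - means[c] ** 2 / (2 * pooled_var)
                + math.log(prior[c]))

    return max(means, key=score)
```

With unequal class sizes, a sample lying just on the smaller class's side of the midpoint between the two class means is assigned to the smaller class under equal priors, but may be assigned to the larger class under proportional priors, because the log-prior term shifts the decision boundary towards the smaller class.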