
Cliff Spiegelman
Texas A&M University | TAMU · Department of Statistics
104 publications · 13,577 reads
3,336 citations
Publications (104)
It is well known that El Paso is the only border area in Texas that has violated national air quality standards. Mobile source emissions (including vehicle exhaust) contribute significantly to air pollution, along with other sources including industrial, residential, and cross-border. This study aims at separating unobserved vehicle emissions from...
Randomly acquired characteristics (RACs), also known as accidental marks, are random markings on a shoe sole, such as scratches or holes, that are used by forensic experts to compare a suspect's shoe with a print found at the crime scene. This article investigates the relationships among three features of a RAC: its location, shape type and orienta...
Comparative Bullet Lead Analysis (CBLA) was discredited as a forensic discipline largely due to the absence of cross-discipline input, primarily metallurgical and statistical, during development and forensic/judicial application of the practice. Of particular significance to the eventual demise of CBLA practice was ignorance of the role of statisti...
For the development of effective air pollution control strategies, it is crucial to identify the sources that are the principal contributors to air pollution and estimate how much each source contributes. Multivariate receptor modeling aims to address these problems by decomposing ambient concentrations of multiple air pollutants into components as...
Several forensic sciences, especially of the pattern-matching kind, are increasingly seen to lack the scientific foundation needed to justify continuing admission as trial evidence. Indeed, several have been abolished in the recent past. A likely next candidate for elimination is bitemark identification. A number of DNA exonerations have occurred i...
The landfill gas (LFG) model is a tool for measuring methane (CH4) generation rates and total CH4 emissions from a particular landfill. These models also have various applications including the sizing of the LFG collection system, evaluating the benefits of gas recovery projects, and measuring and controlling gaseous emissions. This research paper...
This paper presents the findings from a proof-of-concept study that was conducted to examine whether engines and vehicles equipped with onboard diagnostic systems could provide data for optimizing fleet preventive maintenance practices. The study investigated the development of a statistical approach for recommending oil changes in the Texas Depart...
A major difficulty with assessing source-specific health effects is that source-specific exposures cannot be measured directly; rather, they need to be estimated by a source-apportionment method such as multivariate receptor modeling. The uncertainty in source apportionment (uncertainty in source-specific exposure estimates and model uncertainty du...
Chemometrics is a chemical discipline that uses mathematics, statistics, and formal logic (i) to design or select optimal experimental procedures; (ii) to provide maximum relevant chemical information by analyzing chemical data; and (iii) to obtain knowledge about chemical systems. In applications in industry, (iii) can be changed as follows: to ob...
There has been increasing interest in assessing health effects associated with multiple air pollutants emitted by specific sources. A major difficulty with achieving this goal is that the pollution source profiles are unknown and source-specific exposures cannot be measured directly; rather, they need to be estimated by decomposing ambient measurem...
Methods are presented for calibrating when there are systematic departures from an exact linear model. The models proposed are much less structured; nonetheless, they admit uncertainty statements about calibration intervals that are usable. Application is made to pressure-volume calibration of a nuclear accountability tank.
We suggest a new approach for classification based on nonparametrically estimated likelihoods. Due to the scarcity of data in high dimensions, full nonparametric estimation of the likelihood functions for each population is impractical. Instead, we propose to build a class of estimated nonparametric candidate likelihood models based on a Markov prope...
In the EU, congestion costs have been estimated at 2% of GDP, and the cost of air pollution and noise at more than 0.6% of GDP, with about 90% of these costs caused by land transport. Given this fact and the continued growth in demand for private rather than public transport for trips...
Verification of candidate biomarkers relies upon specific, quantitative assays optimized for selective detection of target proteins, and is increasingly viewed as a critical step in the discovery pipeline that bridges unbiased biomarker discovery to preclinical validation. Although individual laboratories have demonstrated that multiple reaction mo...
Following the futile efforts of generations to reach the high standard of excellence achieved by the luthiers in Cremona, Italy, by variations of design and plate tuning, current interest is being focused on differences in material properties. The long-standing question of whether the wood of Stradivari and Guarneri was treated with wood preservative...
Data for multivariate discriminant analysis. The full set of data included 95×12 values for the Stradivarius violin, 75×12 for the early Guarneri and 30×12 for the rest, with the exception of the German maple which had only 15×12 data points. The data set from the pellets from each musical instrument was analyzed in its entirety as one group, while...
Current approaches and recent developments in methods and software associated with multivariate factor analysis and related methods in the analysis of environmental data for the identification, resolution and apportionment of contamination sources are discussed and compared. The chapter first focuses on techniques to be applied in the analysis of th...
We present a method for predicting future pavement distresses such as longitudinal cracking. These predicted distress values are used to plan road repairs. Large inherent variability in measured cracking and an extremely small number of observations are the nature of the pavement cracking data, which calls for a parametric Bayesian approach. We mod...
A method for estimating the Origin Destination (OD) split proportion matrix based on the observed traffic volume count data such as those from Intelligent Transportation System (ITS) is presented in this article. The nature of the ITS data, which frequently contains erroneous observations or missing values, requires that the procedure (1) is resist...
This paper has attracted interest around the world from the media (both TV and newspapers). In addition, we have received letters, emails and telephone calls. One of our favorites was a voicemail message asking us to return a call to Australia at which point we would learn who really killed JFK. We welcome the opportunity to respond to the letter t...
The use of receptor modeling is now a widely accepted approach to model air pollution data. The resulting estimates of pollution source profiles have error, and frequently the uncertainties are obtained under an assumption of independence. In addition, traditional bootstrap approaches are very computationally intensive. We present an intuitive Jackkn...
A definition of chemometrics by Massart et al. is as follows: ‘Chemometrics is a chemical discipline that uses mathematics, statistics and formal logic (a) to design or select optimal experimental procedures; (b) to provide maximum relevant chemical information by analyzing chemical data; and (c) to obtain knowledge about chemical systems.’ In an a...
Current approaches, recent developments, and software related to multivariate factor analysis and related methods in the analysis of environmental data for the identification, resolution and apportionment of contamination sources are discussed and compared.
There has been some debate whether inverse or classical calibration methods are superior when there are multivariate predictors and some of them are missing. In this paper, we compare these two methods in the case where the design is not completely known. We develop some general results in the multivariate case and carry out extensive simulations i...
The majority of U.S. departments of transportation (DOT) maintain quality assurance (QA) programs that require asphalt binder testing to verify grade compliance according to Superpave performance grade (PG) specifications. In Texas the binder is tested immediately after binder production, although the goal of QA is to ensure that the binder specifi...
This article considers the estimation of source profiles from pollution data collected at one receptor site. At this receptor site, varying meteorological conditions can cause errors that are possibly a mixture of distributions. A standard estimator utilizes a least squares approach because of its optimal properties under normally distributed errors...
The Mueller matrix completely describes the optical polarization properties of a material. These properties are known to change with size of scatterers, birefringence, and number of scatterers. In cancerous and normal cells and tissues all of the above properties can change and therefore this approach can potentially be used to distinguish between...
The Mueller matrix describes all the polarizing properties of a sample, and therefore the optical differences between cancerous and non-cancerous tissue should be present within the matrix elements. We present in this paper the Mueller matrices of three types of tissue (normal, benign mole, and malignant melanoma) on a Sinclair swine model. Feature...
In this article we consider a Bayesian approach to inference in which there is a calibration relationship between measured and true quantities of interest. One situation in which this approach is useful is for unknowns in which calibration intervals are obtained. The other situation is when inference about a population is desired in which tolerance...
When data analysts have multivariate data, often they have partial knowledge about the form of the marginal densities, but frequently they have little information about the bivariate and higher dimensional densities. This article provides nonparametric estimators that nearly equal the MLE estimates for the marginal densities while being close to th...
Novel methods for implementation of detector-level multivariate screening methods are presented. The methods use present data and classify data as outliers on the basis of comparisons with empirical cutoff points derived from extensive archived data rather than from standard statistical tables. In addition, while many of the ideas of the classical...
Multivariate receptor models aim to identify the pollution sources based on multivariate air pollution data. This article is concerned with estimation of the source profiles (pollution recipes) and their contributions (amounts of pollution). The estimation procedures are based on constrained nonlinear least squares methods with the constraints give...
The relationship of the concentration of air pollutants to wind direction has been determined by nonparametric regression using a Gaussian kernel. The results are smooth curves with error bars that allow for the accurate determination of the wind direction where the concentration peaks, and thus, the location of nearby sources. Equations for this m...
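As a rough illustration of the idea in the preceding abstract, the following is a minimal sketch of Gaussian-kernel (Nadaraya-Watson) regression of concentration on wind direction. It is not the authors' code: the function names, the 15-degree default bandwidth, and the 1-degree search grid are assumptions chosen for illustration, and the error bars mentioned in the abstract are not computed here.

```python
import math


def angular_difference(a_deg, b_deg):
    """Signed smallest difference a - b between two directions, in degrees."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def kernel_smooth(directions, concentrations, theta, bandwidth=15.0):
    """Kernel-weighted mean concentration at wind direction theta (degrees).

    Weights come from a Gaussian kernel applied to the angular distance,
    so 359 degrees and 1 degree are treated as neighbours.
    The bandwidth (hypothetical default) is in degrees.
    """
    weights = [
        math.exp(-0.5 * (angular_difference(d, theta) / bandwidth) ** 2)
        for d in directions
    ]
    total = sum(weights)
    return sum(w * c for w, c in zip(weights, concentrations)) / total


def peak_direction(directions, concentrations, bandwidth=15.0, step=1.0):
    """Grid direction (degrees) at which the smoothed curve is largest."""
    grid = [i * step for i in range(int(360 / step))]
    return max(
        grid,
        key=lambda t: kernel_smooth(directions, concentrations, t, bandwidth),
    )
```

Calling `peak_direction(directions, concentrations)` on paired wind-direction and concentration readings returns the bearing where the smoothed concentration is highest, which is the quantity the abstract uses to point toward a nearby source. The bandwidth controls how smooth the curve is and would need to be chosen for real data.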
We show that a nonlinear approach to single use calibration curves gives reasonable intervals. The nonlinear approach produces intervals even when the classical approach fails to do so. The intervals obtained with the nonlinear approach are always slightly shorter, but the coverage rate remains reasonable, as demonstrated in our simulations. The ad...
Experiment 16. The primary objective of this research is to provide statistical guidance on the performance of the various maintenance treatments placed as part of the Supplemental Maintenance Effectiveness Research Program (SMERP). A key part of this will be the choice of the statistical model. This report documents the choice of the form...
Intelligent transportation systems (ITS) technologies and infrastructure are a potentially rich travel-time data source for travel-time mean and variance estimates. ITS data traditionally have been deployed and used in real time for passenger cars. How ITS data can be used for multimodal analyses and system monitoring is examined. The methodology u...
A new easy-to-understand calibration method for the analysis of spectral data is developed. The "parallel calibration" method is logically simple and intuitive yet often provides an improvement over more complex standard calibration methods. A description of the algorithm with a technical justification for the parallel algorithm is presented, under...
Although most traffic management centers collect intelligent transportation system (ITS) traffic monitoring data from local controllers in 20-s to 30-s intervals, the time intervals for archiving data vary considerably from 1 to 5, 15, or even 60 min. Presented are two statistical techniques that can be used to determine optimal aggregation levels...
We present two new statistics for estimating the number of factors underlying a multivariate system. One of the two new methods, the original NUMFACT, has been used in high profile environmental studies. The two new methods are first explained from a geometrical viewpoint. We then present an algebraic development and asymptotic cutoff points. Ne...
Near-infrared spectroscopy is being considered as a tool for the noninvasive determination of important cell culture media constituents, which would allow frequent, harmless sampling and computer interfacing for closed-loop control. Partial least-squares calibration models for glucose and lactate are constructed for cell culture media and aqueous m...
This paper presents and compares a new algorithm for finding the number of factors in a data analytic model. After we describe the new method, called NUMFACT, we compare it with standard methods for finding the number of factors to use in a model. The standard methods that we compare NUMFACT with are Malinowski's indicator function, Wold's cross-va...
Wavelength selection is an important preprocessing step for improving and simplifying calibrations in both quantitative and qualitative problems. An improved variable selection algorithm has been developed to improve upon existing methods in terms of speed and prediction error. The new technique uses a novel peak-hopping strategy to move quickly be...
A new stepwise approach to variable selection for spectroscopy that includes chemical information and attempts to test several spectral regions producing high ranking coefficients has been developed to improve on currently available methods. Existing selection techniques can, in general, be placed into two groups: the first, time-consuming optimiza...
Complex near-infrared (near-IR) spectra of aqueous solutions containing five independently varying absorbing species were collected to assess the ability of partial least-squares (PLS) regression and wavelength selection for calibration and prediction of these species in the presence of each other. It was confirmed that PLS calibration models can s...
The primary objective of this study was to develop a “Measurement Strategy” for evaluating the relative degree of success of new hot mix asphalt (HMA) pavement construction specifications. The specific reason for developing a Measurement Strategy was for use in comparing relative performance as a function of time of HMA pavements constructed under...
In many areas of science, novel curve-fitting algorithms are recommended and employed. Users often are left with little means of discerning whether or not the algorithms work as advertised. We propose an ad-hoc method for assessing the behavior of these estimators. By modifying a chemical technique called standard addition, we can assess 1) whether...
The mathematical basis of improved calibration through selection of informative variables for partial least-squares calibration has been identified. A theoretical investigation of calibration slopes indicates that including uninformative wavelengths negatively affects calibrations by producing both large relative bias toward zero and small additive...
A variable selection method that reduces prediction bias in partial least-squares regression models was developed and applied to near-infrared absorbance spectra of glucose in pH buffer and cell culture medium. Comparisons between calibration and prediction capability for full spectra and reduced sets were completed. Variable selection resulted in s...
Regulatory agencies and photochemical models of ozone rely on self-reported industrial emission rates of organic gases. Incorrect self-reported emissions can severely affect air quality models and regulatory decisions. We compared self-reported emissions of organic gases in Houston, Texas, to measurements at a receptor site near the Houston ship...
Near-IR spectroscopy has been used in combination with multivariate calibration techniques such as partial least-squares regression (PLSR) to quantify glucose concentration in various media. However, for reasonable prediction capability in measuring glucose, many calibration samples are needed. In addition, spectroscopic data often contain over 1000 d...
This paper gives methods that use measurements from calibrated instruments in an effective and understandable manner. While some chemometric methods such as partial least squares might be considered, the procedures that we use are more transparent. In this paper two simple methods are proposed that use standard and saddlepoint approximations to com...
Improved standard deviation estimates from possibly biased duplicate measurements can be derived from appropriately trimmed plots of standard deviation estimates using pairs of replicates vs the quantiles of a half-normal distribution. Simulated studies show that these estimates exhibit generally lower mean-squared errors and biases than do more st...
One important issue in chemometrics is to detect interactions among several factors. In this paper, we propose methods that detect interactions using low dimensional smoothers. Two methods are investigated and compared with the usual least squares methods via Monte Carlo simulations. In addition, we show, using real data, how the methods affect our dec...
This paper reports ongoing receptor modeling research on airborne species in El Paso, Texas. It represents a six-month collaboration between the authors. It extends the case study reported by Spiegelman and Dattner in 1992. For completeness, the background material is reviewed.
Spiegelman, C.H., 1992. Plotting aids for multivariate calibration and chemostatistics. Chemometrics and Intelligent Laboratory Systems, 15: 29–38. There are few published procedures for plotting multivariate calibration data. In this paper I give some new plotting techniques that have been useful in my research and consulting. There are plots for d...
Spiegelman, C.H., Watters, R.L. and Hungwu, L., 1991. A statistical method for calibrating flame emission spectrometry which takes account of errors in the calibration standards. Chemometrics and Intelligent Laboratory Systems, 11: 121–130. The determination of potassium in sample solutions using flame emission spectrometry (FES) requires that the c...
Cline, D.B.H. and Spiegelman, C.H., 1991. Bias correcting confidence intervals for a nearly common property. Chemometrics and Intelligent Laboratory Systems, 11: 131–136. Confidence intervals are an important tool. Realistic confidence intervals account for both random errors and systematic errors (bias). We improve the usual method for combining ran...
The European Economic Common Market has awarded several projects to European chemometricians in its COMETT program. The objective of the COMETT program is to organize industry-oriented training on a transnational level in advanced technological subjects. Four types of projects were awarded, namely: creation of a network for analyzing training needs, o...
This article investigates the use of nonparametric regression methodology to test the adequacy of a parametric linear model. The large-sample properties of parametric goodness-of-fit tests for linearity are considered. The inadequacies of such tests lead to the proposal of new tests that are constructed from nonparametric regression fits to the res...
Eberhardt, K.R., Reeve, C.P. and Spiegelman, C.H., 1989. A minimax approach to combining means, with practical examples. Chemometrics and Intelligent Laboratory Systems, 5: 129–148. This paper describes a method for combining sample means that accounts for bias in those means. It compares the unweighted mean, the weighted mean using reciprocal estim...
Suppose that an approximate linear model, or nonparametric regression, relates instrument readings $y$ to standards $x$. A method is derived for constructing interval estimates of displacements $x_1 - x_2$ between standards based on corresponding instrument readings $y_1, y_2$, and the results of a calibration experiment.
Ordinary least squares (OLS) is one of the most commonly used criteria for fitting data to models and for estimating parameters. Orthogonal distance regression (ODR) extends least squares data fitting to problems with independent variables that are not known exactly. In this paper, we present the results of an empirical study designed to compare OL...
One method of fitting a parametric density function $f(x, \theta)$ is first to estimate $\theta$ by maximum likelihood, say, and then to estimate $f(x, \theta)$ by $f(x, \hat{\theta})$. On the other hand, when the parametric model does not hold, the true density $f(x)$ may be estimated nonparametrically, as in the case of a kernel estimate $\hat{f}(x)$. The key idea proposed is to fit a combination of...
A simple linear calibration function can be used over a wide concentration range for the Inductively Coupled Plasma (ICP) spectrometer due to its linear responses. The random errors over wide concentration ranges are not constant, and constant variance regression should not be used to estimate the calibration function. Weighted regression technique...
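To make the weighted-regression point concrete, here is a minimal sketch of a straight-line calibration fit by weighted least squares, with each point weighted by the inverse of its error variance. The function names and the assumption that a standard deviation is known for each reading are illustrative choices, not details taken from the paper.

```python
def weighted_line_fit(x, y, sd):
    """Fit y = intercept + slope * x by weighted least squares.

    x  : known concentrations of the calibration standards
    y  : instrument readings
    sd : assumed standard deviation of each reading (hypothetical input);
         each point gets weight 1 / sd**2, so noisy points count less.
    """
    w = [1.0 / s ** 2 for s in sd]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


def predict_concentration(reading, intercept, slope):
    """Invert the fitted calibration line to estimate an unknown concentration."""
    return (reading - intercept) / slope
```

With equal `sd` values this reduces to ordinary least squares. When the error grows with concentration, the weights keep the high-concentration standards from dominating the fit near the low end of the range.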
Modern exploratory data analysis produces models that are not based on physical theory but that are consistent with pictures of the data. When both X and Y have error this can be risky, because important features are hidden. Two examples are given that show that systematic model departures and heteroscedasticity may not be detectable with standard...
An ordinary q-q plot has at least two faults: it has a wavy appearance and its degree of linearity is hard to quantify. We propose a remedy, called a conditional q-q plot that reduces the waviness by conditioning on functions of adjacent random variables. It provides an easy and useful plot for assessing the validity of distributional assumptions....
Certified values of working standards used for calibrations are rarely exact, and calibration curve procedures should take into account all sources of error, including errors in the working standards. When the errors in working standards have a known finite bound, we give an easily implementable accurate calibration curve procedure. It produces con...
The probability of failure of a structure or structural element subjected to wind forces depends, in large part, on the distribution of extreme wind speeds acting on the structure. In the past, distributions of extreme wind speeds were based on extreme wind data without regard to wind direction, and probabilities of failure were computed accordingl...
A wide class of location parameters is shown to satisfy Jensen's inequality. When the expectation EX exists and l is a convex function, Jensen's inequality states that El(X) ≥ l(EX). It is shown that for μ, a properly defined location parameter, μ(l(X)) ≥ l(μ(X)).
A statistic for identifying influential observations in calibration is given. The statistic is easy to interpret, and provides a useful measure of influence for Scheffé type calibration curves.
We consider binary regression models when some of the predictors are measured with error. For normal measurement errors, structural maximum likelihood estimates are considered. We show that if the measurement error is large, the usual estimate of the probability of the event in question can be substantially in error, especially for high risk groups...
Calibration curves are an important part of many measurement processes. The user of a fitted calibration curve must know its precision and accuracy. These are determined in a timely fashion using the data iteratively. This paper gives a method that divides the data into training and test groups. The test group is iteratively checked to see that a p...
This article presents an implementation of a Scheffé-type calibration procedure to the pressure-volume relationship for nuclear-materials processing tanks. We use splines (piecewise polynomials) to fit this relationship and present a comparison between the propagation of error and the Scheffé formulation of statistical uncertainties. An ASCII FORT...
For the errors in variables model $X = U + V$, $Y = \beta f(U) + W$, sufficient conditions are given for the L.S. limiting estimate of $\beta$ to satisfy $P(\hat{\beta}/\beta < 1) = 1$ or $P(\hat{\beta}/\beta > 1) = 1$ as the sample size tends to infinity.
The result in this paper explains some of the qualitative nature of Jensen's inequality. It is shown that the more dispersed the distribution of a random variable is, the smaller is the expectation of any concave function of it. This result can be used to show the inadequacy of some current methods of reporting environmental data by using geometric...
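A small numerical sketch can illustrate the dispersion effect described above. It uses the logarithm as the concave function, so the quantity compared is the geometric mean (the exponential of the mean log) of positive values. The data values and helper names are hypothetical, and "more dispersed" is taken here to mean that deviations from the mean are stretched while the mean is held fixed, which is only one simple way to formalise the idea.

```python
import math


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def geometric_mean(values):
    """Exponential of the mean log; requires strictly positive values."""
    return math.exp(mean([math.log(v) for v in values]))


def spread_about_mean(values, factor):
    """Scale deviations from the mean by `factor`, leaving the mean unchanged."""
    m = mean(values)
    return [m + factor * (v - m) for v in values]


# Two hypothetical samples with the same arithmetic mean of 10.
narrow = [9.0, 10.0, 11.0]
wide = spread_about_mean(narrow, 9.0)  # [1.0, 10.0, 19.0]

gm_narrow = geometric_mean(narrow)
gm_wide = geometric_mean(wide)
```

Both geometric means fall below the shared arithmetic mean, and the wider sample gives the smaller value, so a summary based on the mean log drops as the spread grows even though the average level is the same.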
The measurement process uncertainty is propagated through the use of a calibration curve. The magnitude and direction of this uncertainty depend on the choice of the controllable variable in producing the calibration curve; in other words, the design of the calibration experiment. In this study this design is discussed in the context of Scheffe's...
Let $X_i$ and $Y_i$ be random variables related to other random variables $U_i, V_i$, and $W_i$ as follows: $X_i = U_i + W_i, Y_i = \alpha + \beta U_i + V_i, i = 1, \cdots, n$, where $\alpha$ and $\beta$ are finite constants. Here $X_i$ and $Y_i$ are observable while $U_i, V_i$ and $W_i$ are not. This model is customarily referred to as the regress...
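The model above can be explored with a short simulation. The sketch below generates data from $X_i = U_i + W_i$, $Y_i = \alpha + \beta U_i + V_i$ and computes the ordinary least-squares slope of $Y$ on $X$. The normal distributions, the independence of $U_i$, $V_i$, $W_i$, and all parameter names are assumptions made for illustration. Under those assumptions the slope is known to approach $\beta \, \sigma_U^2 / (\sigma_U^2 + \sigma_W^2)$ rather than $\beta$.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


def simulate(alpha, beta, sd_u, sd_w, sd_v, n, seed=0):
    """Draw n observable pairs (X_i, Y_i); U_i, V_i, W_i stay hidden.

    Independent zero-mean normal U, W, V are an assumption of this sketch.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0.0, sd_u)
        xs.append(u + rng.gauss(0.0, sd_w))
        ys.append(alpha + beta * u + rng.gauss(0.0, sd_v))
    return xs, ys


def limiting_slope(beta, sd_u, sd_w):
    """Large-sample value of the OLS slope under the assumptions above."""
    return beta * sd_u ** 2 / (sd_u ** 2 + sd_w ** 2)
```

For example, with `beta=2.0` and equal standard deviations for `U` and `W`, `limiting_slope` gives half the true slope, and `ols_slope(*simulate(...))` on a large sample should land close to that value. This is why regressing on an error-contaminated predictor without correction tends to understate the relationship.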
Ph. D. (Applied Mathematics)--Northwestern University, 1976.
When placing hot-mix asphalt concrete (HMAC), paving the full width of the pavement in a single pass is usually impossible; therefore, most bituminous pavements contain longitudinal construction joints. These construction joints can often be inferior to the rest of the pavement and can eventually cause an otherwise sound pavement to deteriorate. Th...