
David W. Scott, PhD
- Professor at Rice University
About
- Publications: 179
- Reads: 72,975
- Citations: 19,468
Publications (179)
Over the past few years, we have seen an increased need to analyze the dynamically changing behaviors of economic and financial time series. These needs have led to significant demand for methods that denoise non-stationary time series across time and for specific investment horizons (scales) and localized windows (blocks) of time. Wavelets have lo...
Osteosarcoma (OS) is a genetically diverse bone cancer that lacks a consistent targetable mutation. Recent studies suggest the IGF/PI3K/mTOR pathway and YAP/TAZ paralogs regulate cell fate and proliferation in response to biomechanical cues within the tumor microenvironment. How this occurs, and its implications for osteosarcoma survival, remains...
Growth factors such as bone morphogenetic protein-2 (BMP-2) are potent tools for tissue engineering. Three-dimensional (3D) printing offers a potential strategy for delivery of BMP-2 from polymeric constructs; however, these biomolecules are sensitive to inactivation by the elevated temperatures commonly employed during extrusion-based 3D printing....
The present study sought to demonstrate the swelling behavior of hydrogel-microcarrier composite constructs to inform their use in controlled release and tissue engineering applications. In this study, gelatin methacrylate (GelMA) and GelMA-gelatin microparticle (GMP) composite constructs were three-dimensionally printed, and their swelling and deg...
As modern data analysis pushes the boundaries of classical statistics, it is timely to reexamine alternate approaches to dealing with outliers in multiple regression. As sample sizes and the number of predictors increase, interactive methodology becomes less effective. Likewise, with limited understanding of the underlying contamination process, di...
This work investigated the effect of poly(L-lysine) (PLL) molecular weight and concentration on chondrogenesis of cocultures of mesenchymal stem cells (MSCs) and articular chondrocytes (ACs) in PLL-loaded hydrogels. An injectable dual-network hydrogel composed of a poly(N-isopropylacrylamide)-based synthetic thermogelling macromer and a chondroitin...
There are, in fact, an infinite number of possible probability distributions. This chapter focuses on some of the most important and common examples. These distributions all share the feature that they follow from simple rules and intuitive motivation. In addition, the chapter explores how the results of experiments are measured and recorded. To ma...
This chapter explores the ways in which bivariate data interact, and how that interaction can be used for purposes of prediction. The graphical exploration of the Pearson father–son data hints at what the bivariate models will capture analytically. As in the univariate case, the bivariate normal is the exponential of a quadratic form. For simplicity,...
This chapter provides a case study of a research project, namely, how to choose the bin width of a histogram in an optimal fashion. It shows how to generalize the variance criterion to the mean squared error (MSE) and illustrates its use on the problem of constructing an optimal histogram. Researchers showed that probability histograms are consiste...
The field of statistics has a rich history that has become tightly integrated into the emerging field of data sciences. Collaboration with computer scientists, numerical analysts, and decision makers characterizes the field. The mathematical backbone of all of the statistical methods is probability theory. This chapter discusses the basics of proba...
This chapter focuses on parametric probability density estimation, assuming the statistician or subject matter expert has correctly selected the appropriate choice for the probability density function (PDF). An estimator of a parameter is a formula using the random sample. Since the moments of a PDF will generally be a function of the parameters, t...
Statistical analyses are based on a model of random data. Statistics relies on the ability to replicate experiments many times independently in order to arrive at dependable estimates and decisions. This is captured in the idea of a random sample. To understand the properties of estimators and statistics as a function of the number of samples, n, a...
This chapter explores the ways in which statisticians plan experiments, analyze data, and report findings. It describes the structure of common hypotheses tests and introduces the concept of a type I error. The chapter discusses the setting up of a hypotheses test, and deals with the concept of best critical region for simple hypotheses and composi...
This chapter introduces the confidence interval, which provides an additional way of reporting the result of a hypothesis test about a parameter. It shows how to apply these ideas to tests and models for Pearson's goodness-of-fit chi-squared tests, the correlation coefficient, linear regression curve fitting, and, finally, for the an...
Early developments in probability may be attributed to gamblers who reached out to scientists to better understand games of chance. The Russian mathematician Kolmogorov successfully developed an axiomatic model, which relied heavily on the use of set theory. The axioms provided rules to which probabilities must conform, but no guidance on assigning...
Here, we demonstrate the in vivo efficacy of glucose microparticles (GMPs) to serve as porogens within calcium phosphate cements (CPCs) to obtain a fast degrading bone substitute material. Composites were fabricated incorporating 20 wt% GMPs at two different GMP size ranges (100-150 µm (GMP-S) and 150-300 µm (GMP-L)), while CPC containing 20 wt% pol...
The tumor microenvironment harbors essential components required for cancer progression including biochemical signals and mechanical cues. To study the effects of microenvironmental elements on Ewing’s sarcoma (ES) pathogenesis, we tissue-engineered an acellular three-dimensional (3D) bone tumor niche from electrospun poly(ε-caprolactone) (PCL) sca...
In this study of 3D printed composite β-tricalcium phosphate (β-TCP)/hydroxyapatite (HA)/poly(ε-caprolactone) (PCL)-based constructs, the effects of vertical compositional ceramic gradients and architectural porosity gradients on the osteogenic differentiation of rabbit bone marrow derived mesenchymal stem cells (MSCs) were investigated. Specifical...
Modern data science employs many advanced algorithms, but always begins with an exploratory data analysis phase. The data summaries typically used in EDA involve point and frequency diagrams, such as a scatter diagram, a box‐and‐whiskers plot, and a stem‐and‐leaf plot. When massive datasets are encountered, alternatives that produce interpretable f...
Current in vitro methods for assessing cancer biology and therapeutic response rely heavily on monolayer cell culture on hard, plastic surfaces that do not recapitulate essential elements of the tumor microenvironment. While a host of tumor models exist, most are not engineered to control the physical properties of the microenvironment and thus may...
Density estimation is one of the central areas of statistics whose purpose is to estimate the probability density function underlying the observed data. It serves as a building block for many tasks in statistical inference, visualization, and machine learning. Density estimation is widely adopted in the domain of unsupervised learning especially fo...
Recent developments in 3D printing (3DP) research have led to a variety of scaffold designs and techniques for osteochondral tissue engineering; however, the simultaneous incorporation of multiple types of gradients within the same construct remains a challenge. Herein, we describe the fabrication and mechanical characterization of porous poly(ε-ca...
This study investigated the effects of incorporating glucose microparticles (GMPs) and poly(lactic-co-glycolic acid) microparticles (PLGA MPs) within a calcium phosphate cement on the cement's handling, physicochemical properties, and the respective pore formation. Composites were fabricated with two different weight fractions of GMPs (10 and 20 wt...
University ranking is a popular yet controversial endeavor. Most rankings are based on both public data, such as student test scores and retention rates, and proprietary data, such as school reputation as perceived by high school counselors and academic peers. The weights applied to these characteristics to compute the rankings are often determined...
The “density estimation” article highlights the dual roles of the graphical exploration of data as well as the nonparametric estimation of the distribution of continuous probabilities. Estimators described in detail include the histogram, frequency polygon, averaged shifted histogram, and the classic kernel method. The basic theory of estimation is...
The probability density function is a fundamental concept in statistics. Density estimation is the reconstruction of the density function from a set of observed data. A well‐constructed density estimate can give valuable indication of such features as skewness and multimodality in the underlying density function.
Regularization is an important component of predictive model building. The hybrid bootstrap is a regularization technique that functions similarly to dropout except that features are resampled from other training points rather than replaced with zeros. We show that the hybrid bootstrap offers superior performance to dropout. We also present a sampl...
The Good Judgment Team led by psychologists P. Tetlock and B. Mellers of the University of Pennsylvania was the most successful of five research projects sponsored through 2015 by the Intelligence Advanced Research Projects Activity to develop improved group forecast aggregation algorithms. Each team had at least 10 algorithms under continuous deve...
In this work, we describe the synthesis and characterization of variants of poly(diol fumarate) and poly(diol fumarate-co-succinate). Through a Fischer esterification, α,ω-diols and dicarboxylic acids were polymerized to form aliphatic polyester comacromers. Due to the carbon-carbon double bond of fumaric acid, incorporating it into the macromer ba...
Antibiotic-releasing porous polymethylmethacrylate (PMMA) space maintainers, comprising PMMA with an aqueous porogen and poly(DL-lactic-co-glycolic acid) (PLGA) antibiotic carrier, have been developed to facilitate local delivery of antibiotics and tissue integration. In this study, clindamycin-loaded space maintainers were used to investigate the...
3D printing has emerged as an important technique for fabricating tissue engineered scaffolds. However, systematic evaluations of biomaterials for 3D printing have not been widely investigated. We evaluated poly(propylene fumarate) (PPF) as a model material for extrusion-based printing applications. A full-factorial design evaluating the effects of...
Many statistical procedures contain meta-parameters or design parameters that must be specified in order to run. Cross-validation, simulation, or other advanced algorithms often are invoked to choose those parameters with real data. However, for a preliminary or informal look at data, rules of thumb are often employed in order to start an analysis...
Injectable, biodegradable, dual-gelling macromer solutions were used to encapsulate mesenchymal stem cells (MSCs) within stable hydrogels when elevated to physiologic temperature. Pendant phosphate groups were incorporated in the N-isopropyl acrylamide-based macromers to improve biointegration and facilitate hydrogel degradation. The MSCs were show...
The present work investigated correlations between cartilage and subchondral bone repair, facilitated by a growth factor-delivering scaffold, in a rabbit osteochondral defect model. Histological scoring indices and micro-computed tomography morphological parameters were used to evaluate cartilage and bone repair, respectively, at 6 and 12 weeks. Co...
Clarifies modern data analysis through nonparametric density estimation for a complete working knowledge of the theory and methods. Featuring a thoroughly revised presentation, Multivariate Density Estimation: Theory, Practice, and Visualization, Second Edition maintains an intuitive approach to the underlying methodology and supporting theory of d...
A simple device has been proposed for eliminating the bin edge problem of the frequency polygon while retaining many of the computational advantages of a density estimate based on bin counts. A nearly equivalent device is to average several shifted histograms, which is just as general but simpler to describe and analyze. The result is the “averaged...
Nonparametric discrimination simply involves estimation of the unknown densities in the likelihood ratio using averaged shifted histogram (ASH) estimates. Bump hunting falls precisely into the classification category. Given a density estimate, the modes and bumps are easily located. Some of those modes and bumps may be spurious and should be smooth...
This chapter explores the relationship of the underlying density estimator to the regression estimator. Many regression estimators discussed in the chapter are linear smoothers, that is, linear combinations of the observed responses. Issues of robustness are relevant, particularly in higher dimensions. The chapter describes nonlinear nonparametric...
A complete analysis of multidimensional data requires the application of an array of statistical tools such as parametric analysis, nonparametric analysis, and graphical analysis. Parametric analysis is the most powerful. Nonparametric analysis is the most flexible. And graphical analysis provides the vehicle for discovering the unexpected. This ch...
The curse of dimensionality describes the apparent paradox of “neighborhoods” in higher dimensions: if the neighborhoods are “local,” then they are almost surely “empty,” whereas if a neighborhood is not “empty,” then it is not “local.” If the bandwidth is large enough to include enough data to hold down the variance, the bias is intolerable due to...
The framework of the classic histogram is useful for conveying the general flavor of nonparametric theory and practice. A histogram conveys visual information of both the frequency and relative frequencies of observations; that is, the essence of a density function. The classical frequency histogram is formed by constructing a complete set of nonov...
In this chapter, an introduction to nonparametric estimation criteria is given, using only tools familiar from parametric analysis. R. A. Fisher and Karl Pearson engaged in a lively debate on aspects of parametric estimation, with Pearson arguing in favor of nonparametric curves. Nonparametric curves are driven by structure in the data and are broa...
The kernel estimator can be motivated not only as the limiting case of averaged shifted histogram but also by other techniques. The kernel density estimate inherits all the properties of its kernel. It is easy to check that the ratio of asymptotic integrated variance (AIV) to asymptotic integrated squared bias (AISB) in the asymptotic mean integrat...
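As an illustration of the kernel method discussed above, the following is a minimal sketch of a Gaussian kernel density estimate evaluated at a single point; the sample data and bandwidth below are arbitrary choices for demonstration, not values from the text.

```python
import math

def gaussian_kernel(u):
    """Standard normal kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(x, data, h):
    """Kernel density estimate f_hat(x) = (1/(n*h)) * sum_i K((x - x_i)/h)."""
    n = len(data)
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (n * h)

# Hypothetical sample; evaluate the estimate at one point.
sample = [1.2, 1.9, 2.1, 2.8, 3.3, 3.9, 4.4]
print(kde(2.5, sample, h=0.8))
```

Because the Gaussian kernel is itself a density, the resulting estimate integrates to one and inherits the kernel's smoothness, as the abstract notes.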
The frequency polygon (FP) is a continuous density estimator based on the histogram, with some form of linear interpolation. In one dimension, the frequency polygon is the linear interpolant of the mid-points of an equally spaced histogram. As such, the frequency polygon extends beyond the histogram into an empty bin on each extreme. There are othe...
Leveraged Exchange Traded Funds (LETFs) are constructed to provide the indicated leverage multiple of the daily total return on an underlying index. LETFs may perform as expected on a daily basis; however, fund issuers state that there is no guarantee of achieving the multiple of the index return over longer time horizons. LETF returns are extremel...
In this study, a full factorial approach was used to investigate the effects of poly(ethylene glycol) (PEG) molecular weight (MW; 10,000 vs. 35,000 nominal MW), crosslinker-to-macromer carbon-carbon double bond ratio (DBR; 40 vs. 60), crosslinker type (PEG-diacrylate (PEGDA) vs. N,N'-methylene bisacrylamide (MB)), crosslinking extent of incorporate...
The “density estimation” article highlights the dual roles of the graphical exploration of data as well as the nonparametric estimation of probabilities. Estimators described in detail include the histogram, frequency polygon, averaged shifted histogram, and kernel method. The basic theory of estimation is outlined. Data-based calibration by cross-...
The History of COPSS
- A brief history of the Committee of Presidents of Statistical Societies (COPSS), by Ingram Olkin
Reminiscences and Personal Reflections on Career Paths
- Reminiscences of the Columbia University Department of Mathematical Statistics in the late 1940s, by Ingram Olkin
- A career in statistics, by Herman Chernoff
- "... how wonderful the field of...
We investigate a robust penalized logistic regression algorithm based on a minimum distance criterion. Influential outliers are often associated with the explosion of parameter vector estimates, but in the context of standard logistic regression, the bias due to outliers always causes the parameter vector to implode, that is, shrink towards the zero...
The Weibull family is widely used to model failure data, or lifetime data, although the classical two-parameter Weibull distribution is limited to positive data and monotone failure rate. The parameters of the Weibull model are commonly obtained by maximum likelihood estimation; however, it is well-known that this estimator is not robust when deali...
Porous polymethylmethacrylate (PMMA) has been used as an alloplastic bone substitute in the craniofacial complex, showing integration with the surrounding soft and hard tissue. This study investigated the physicochemical properties of curing and cured mixtures of a PMMA-based bone cement and a carboxymethylcellulose (CMC) gel porogen. Four formulat...
This was a book review that appeared in Technometrics. Write me for a copy of the review if you want to see it.
Scott, D.W., "Review of 'The New S Language,' R.A. Becker, J.M. Chambers, and A.R. Wilks, authors," Technometrics, 32:103-104, February 1990.
Mixture models, which include kernel estimators, are used widely to model complex densities; however, one is faced with the challenge of determining an appropriate number of components. This task often involves identifying those components that are close enough to be combined. This article introduces a new easily calculated measure of similarity be...
The generation of pseudo‐random numbers is critical for modern statistical computing. Given any of the well‐tested pseudo‐random generators for the uniform distribution, the probability integral transform may be employed to provide an exact algorithm for transformation to any desired probability distribution. However, if the cumulative distribution...
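As a sketch of the probability integral transform described above: given U ~ Uniform(0, 1), X = F⁻¹(U) has CDF F. The exponential distribution, whose inverse CDF has a closed form, makes a minimal example; the rate parameter and seed below are arbitrary choices for illustration.

```python
import math
import random

def exponential_via_pit(rate, rng):
    """Draw X ~ Exponential(rate) by inverting its CDF F(x) = 1 - exp(-rate*x):
    X = -log(1 - U) / rate, with U ~ Uniform(0, 1)."""
    u = rng.random()
    return -math.log(1.0 - u) / rate

rng = random.Random(42)
draws = [exponential_via_pit(2.0, rng) for _ in range(100_000)]
print(sum(draws) / len(draws))  # sample mean, near 1/rate = 0.5
```

When the cumulative distribution function cannot be inverted in closed form, as the abstract goes on to discuss, numerical inversion or other methods are required.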
Data-driven research is often hampered by privacy restrictions in the form of limited data sets or graphical representations without the benefit of raw data. Nonparametric techniques that circumvent these issues by using local moment information, thereby extending the piecewise-polynomial histograms, are developed. These methods utilize not only bi...
The optimal construction of a histogram is a fundamental task in data analysis. Many rules of thumb are available to get started. Some use the normal density as a reference distribution. Scott's rule is of that class, using as the measure of discrepancy the mean integrated squared error. This article discusses the origin and formulation of this for...
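Scott's rule mentioned above chooses the bin width h = 3.49 σ n^(-1/3), the value minimizing asymptotic mean integrated squared error under a normal reference. A minimal sketch, with hypothetical data:

```python
import math
import statistics

def scotts_bin_width(data):
    """Scott's rule: h = 3.49 * sigma * n^(-1/3), where sigma is the sample
    standard deviation and n the sample size."""
    n = len(data)
    sigma = statistics.stdev(data)
    return 3.49 * sigma * n ** (-1 / 3)

# Hypothetical sample for illustration.
data = [0.5, 1.1, 1.7, 2.0, 2.4, 2.9, 3.3, 3.8, 4.2, 5.0]
h = scotts_bin_width(data)
num_bins = math.ceil((max(data) - min(data)) / h)
print(h, num_bins)
```

Like other normal-reference rules, this gives a reasonable starting bandwidth that can be refined by cross-validation when the data are skewed or multimodal.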
For further resources related to this article, please visit the WIREs website.
The histogram is one of the most important graphical objects in statistical practice. In addition, the histogram provides a consistent estimate of any density function with very few assumptions. Construction of a density histogram with arbitrary mesh is described. Asymptotic theory of optimal histograms is used to provide practical rules for choosing a...
The averaged shifted histogram or ASH is a nonparametric probability density estimator derived from a collection of histograms. The ASH enjoys several advantages compared with a single histogram: better visual interpretation, better approximation, and nearly the same computational efficiency. The ASH provides a bridge between the histogram and adva...
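A minimal sketch of the ASH idea described above: average m ordinary histograms with common bin width h whose origins are shifted by h/m relative to one another. The sample and parameters below are arbitrary illustrations, not values from the text.

```python
import math

def ash_estimate(x, data, h, m, origin=0.0):
    """Averaged shifted histogram (ASH) estimate at x: the average of m
    histogram density estimates with bin width h and origins offset by h/m."""
    n = len(data)
    total = 0.0
    for k in range(m):
        o = origin + k * h / m                 # origin of the k-th shifted histogram
        j = math.floor((x - o) / h)            # index of the bin containing x
        count = sum(1 for xi in data if math.floor((xi - o) / h) == j)
        total += count / (n * h)               # k-th histogram density at x
    return total / m

sample = [0.9, 1.4, 1.8, 2.2, 2.6, 3.1, 3.5, 4.0]
print(ash_estimate(2.0, sample, h=1.0, m=8))
```

As m grows, the ASH approaches a kernel estimate with a triangular kernel, which is the bridge to advanced kernel methods that the abstract mentions.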
Histograms are among the most important graphical objects in statistical practice, providing a consistent estimate of any continuous density function with very few assumptions. Restricting attention to bins of equal width, the histogram is sometimes presented as a frequency chart or normalized to be a true density. The construction of a histogram m...
Estimation theory and practice is generally focused on maximum likelihood methodology, which boasts of claims of efficiency and widespread availability in software. Likelihood methods occasionally encounter problems with small sample sizes, or if the data are contaminated with outliers. The first problem can be addressed by regularization methods s...
- Univariate Frequency Polygons
- Multivariate Frequency Polygons
- Bin Edge Problems
- Estimation of the Cumulative Distribution Function
- Direct Nonparametric Estimation of the Density
- Error Criteria for Density Estimates
- Nonparametric Families of Distributions
- Nonparametric Kernel Regression
- General Linear Nonparametric Estimation
- Robustness
- Regression in Several Dimensions
- Summary
- Sturges' Rule for Histogram Bin Width Selection
- The L2 Theory of Univariate Histograms
- Practical Data-Based Bin Width Rules
- L2 Theory for Multivariate Histograms
- Modes and Bumps in a Histogram
- Other Error Criteria: L1, L4, L6, L8, and L∞
- Problems
- Construction
- Asymptotic Properties
- The Limiting ASH as a Kernel Estimator
Frequency tables are often constructed on intervals of irregular width. When plotted as bar charts, the underlying true density information may be quite distorted. The majority of introductory statistics texts recommend tabulating data into intervals of equal width, but seldom caution the consequences of failing to do so. An occasional introductory...
This project concerns the communication of statistical summaries to the public. This year the focus has been on completing joint projects with agency collaborators and on follow-up research funded through supplement or contract by agency sponsors. Here, we highlight three developments.
Modern data analysis requires a number of tools to uncover hidden structure. For initial exploration of data, animated scatter diagrams and nonparametric density estimation in many forms and varieties are the techniques of choice. This article focuses on the application of histograms and nonparametric kernel methods to explore data. The detail...
The specific objective of our dgQG research is to develop and assess quality graphics for federal statistical summaries. The goal has been to develop methods for generating quality graphics that facilitate exploration by agency users evaluating data quality and looking for emergent trends, decision making by public policy makers, and communication...
Clustering algorithms based upon nonparametric or semiparametric density estimation are of more theoretical interest than some of the distance-based hierarchical or ad hoc algorithmic procedures. However, density estimation is subject to the curse of dimensionality, so care must be exercised. Clustering algorithms are sometimes described as bi...
The covariance matrix is a key component of many multivariate robust procedures, whether or not the data are assumed to be Gaussian. We examine the idea of robustly fitting a mixture of multivariate Gaussian densities in the situation when the number of components estimated is intentionally too few. Using a minimum distance criterion, we show how...
This chapter examines the use of flexible methods to approximate an unknown density function, and techniques appropriate for visualization of densities in up to four dimensions. The statistical analysis of data is a multilayered endeavor. Data must be carefully examined and cleaned to avoid spurious findings. A preliminary examination of data by gr...
Statistical models fit to data often require extensive and challenging re-estimation before achieving final form. For example, outliers can adversely affect fits. In other cases involving spatial data, a cluster may exist for which the model is incorrect, also adversely affecting the fit to the "good" data. In both cases, estimate residuals must be...
This article demonstrates the visual power of continuous smoothing and slicing of multivariate maps, through an example relating the rates of screening tests to colon cancer rates. A new theoretical result provides legitimacy and understanding by demonstrating how the conditional maps are related to the usual single smoothed choropleth map of col...
In an accompanying paper, we describe an implementation of a smoothed conditional mapping system, which we have implemented in arcview. Due to space limitations in that paper, we included only a small fraction of the possible cancer-related graphs based upon the SEER collection. In this demonstration, we will make available our arcview software a...
The likelihood function plays a central role in parametric and Bayesian estimation, as well as in nonparametric function estimation via local polynomial modeling. However, integrated square error has enjoyed a long tradition as the goodness-of-fit criterion of choice in nonparametric density estimation. In this article, I investigate the use of int...
Maps can be used to display the relationship between location and a response variable. In this basic form, there is one map to be displayed. Often times there are additional factors that could affect the response variable. In this case, it is of interest to simultaneously view the spatial relationships and the dependence of the response variable on...
The Rice Virtual Laboratory in Statistics is an integrated combination of simulations/demonstrations, case studies, statistical analysis capabilities, and an electronic textbook. The simulations and demonstrations help make abstract concepts concrete and allow students to investigate various aspects of statistical tests and distributions. Case stud...