# Jerome Sacks's research while affiliated with National Institute of Statistical Sciences and other places

**What is this page?**

This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.

It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.

If you're a ResearchGate member, you can follow this page to keep up with this author's work.

If you are this author, and you don't want us to display this page anymore, please let us know.

## Publications (48)

A sidelining of statistical evidence in civil rights cases has been 30 years in the making. By Daniel Kiefer, Jerome Sacks and Donald Ylvisaker

Statistical methods based on a regression model plus a zero-mean Gaussian process (GP) have been widely used for predicting the output of a deterministic computer code. There are many suggestions in the literature for how to choose the regression component and how to model the correlation structure of the GP. This article argues that comprehensive,...

This is an exchange between Jerome Sacks and Donald Ylvisaker covering their career paths along with some related history and philosophy of Statistics.

Methods are presented for calibrating when there are systematic departures from an exact linear model. The models proposed are much less structured; nonetheless, they admit uncertainty statements about calibration intervals that are usable. Application is made to pressure-volume calibration of a nuclear accountability tank.

We produce reasons and evidence supporting the informal rule that the number of runs for an effective initial computer experiment should be about 10 times the input dimension. Our arguments quantify two key characteristics of computer codes that affect the sample size required for a desired level of accuracy when approximating the code via a Gaussi...

To build a predictor, the output of a deterministic computer model or “code” is often treated as a realization of a stochastic process indexed by the code's input variables. The authors consider an asymptotic form of the Gaussian correlation function for the stochastic process where the correlation tends to unity. They show that the limiting best l...

Effective and feasible procedures for validating microscopic, stochastic traffic simulation models are in short supply. Exercising such microsimulators many times may lead to the occurrence of traffic gridlock (or simulation failures) on some or all replications. Whereas lack of failures does not ensure validity of the simulator for predicting perf...

Calibration and validation of traffic models are processes that depend on field data that are often limited but are essential for determination of inputs to the model and assessment of its reliability. Quantification and systematization of the calibration and validation process expose statistical issues inherent in the use of such data. Formalizati...

Calibrating and validating a traffic simulation model for use on a transportation network depend on field data that are often limited but essential for determining inputs to the model and for assessing its reliability. Quantification and systemization of the calibration/validation process expose statistical issues inherent in the use of such data....

The focus of this work is the development of a statistical model for a bioinformatics database whose distinctive structure makes model assessment an interesting and challenging problem. The key components of the statistical methodology, including a fast approximation to the singular value decomposition and the use of adaptive spline modeling and tr...

As decision- and policy-makers come to rely increasingly on estimates and simulations produced by computerized models of the world, in areas as diverse as climate prediction, transportation planning, economic policy and civil engineering, the need for objective evaluation of the accuracy and utility of such models likewise becomes more urgent. This...

As decision- and policy-makers come to rely increasingly on estimates and simulations produced by computerized models of the world, in areas as diverse as climate prediction, transportation planning, economic policy and civil engineering, the need for objective evaluation of the accuracy and utility of such models likewise becomes more urgent. T...

This paper investigates the reliability of TRANSYT-7F optimal solutions from two perspectives: (1) the effect of the selected optimization criterion on the values of various system performance measures; and (2) the extent to which TRANSYT-7F signal plans remain optimal when evaluated in a microscopic and stochastic traffic environment. The well-est...

At the early stage of drug discovery, many thousands of chemical compounds can be synthesized and tested (assayed) for potency (activity) with high throughput screening (HTS). With ever-increasing numbers of compounds to be tested (now often in the neighborhood of 500,000) it remains a challenge to find strategies via sequential design that reduce...

The tutorial will provide an overview of the research conducted at the MIT Total Data Quality Management (TDQM) program over the last decade, emphasizing the management of information as a product. We will discuss the approach of the MIT TDQM program advocating the institutionalizing of TDQM programs for long-term benefits. Concepts such as multi-d...

Discovery and development of a new drug can cost hundreds of millions of dollars. Pharmaceutical companies have used group testing methodology routinely as one of the efficient high throughput screening techniques to search for "lead" compounds among collections of hundreds of thousands of chemical compounds. The lead compounds can be modified to p...

A stochastic signal optimization method based on a genetic algorithm (GA-SOM) that interfaces with the microscopic simulation program (CORSIM) is assessed. As an evaluation testbed we use a network in Chicago consisting of nine signalized intersections. Taking CORSIM as the best representation of reality, the performance of the GA-SOM plan sets a c...

As biological drug targets multiply through the human genome project and as the number of chemical compounds available for screening becomes very large, the expense of screening every compound against every target becomes prohibitive. We need to improve the efficiency of the drug screening process so that active compounds can be found for more biol...

The effects of certain chemical additives in maintaining a high level of activity in protein constructs during storage are investigated. We use a semiparametric regression technique to model the effects of the additives on protein activity. The model is extended to handle categorical explanatory variables. On the basis of the available data, the imp...

Sampling and prediction strategies relevant at the planning stage of the cleanup of environmental hazards are discussed. Sampling designs and models are compared using an extensive set of data on dioxin contamination at Piazza Road, Missouri. To meet the assumptions of the statistical model, such data are often transformed by taking logarithms. Pre...

In electrical engineering, circuit designs are now often optimized via circuit simulation computer models. Typically, many response variables characterize the circuit’s performance. Each response is a function of many input variables, including factors that can be set in the engineering design and noise factors representing manufacturing conditions...

Ozone in the planetary boundary layer of the troposphere is considered harmful to plants and human health. Surface ozone levels are determined by the strengths of sources and precursor emissions and by meteorological conditions. Therefore, assessing ozone trends is complicated by meteorological variability. Ozone data in the Chicago area over the p...

To investigate the possible relationship between airborne particulate matter and mortality, we developed regression models of daily mortality counts using meteorological covariates and measures of outdoor PM10. Our analyses included data from Cook County, Illinois, and Salt Lake County, Utah. We found no evidence that particulate matter ≤ 10 m...

A dynamic-thermodynamic sea ice model is used to illustrate a sensitivity evaluation strategy in which a statistical model is fit to the output of the ice model. The statistical model response, evaluated in terms of certain metrics or integrated features of the ice model output, is a function of a selected set of d (= 13) prescribed parameters of the...

Methods for the design and analysis of numerical experiments that are especially useful and efficient in multidimensional parameter spaces are presented. The analysis method, which is similar to kriging in the spatial analysis literature, fits a statistical model to the output of the numerical model. The method is applied to a fully nonlinear, glob...

It is more than a decade since Genichi Taguchi's ideas on quality improvement were introduced in the United States. His parameter-design approach for reducing variation in products and processes has generated a great deal of interest among both quality practitioners and statisticians. The statistical techniques used by Taguchi to implement paramete...

The authors describe a sequential strategy for designing manufacturable integrated circuits using available CAD tools. Optimizing the performance of complex designs in the presence of unwanted parameter variations can take a prohibitively large number of computer runs. These methods overcome this complexity by combining sequential experimentation w...

Many scientific phenomena are now investigated by complex computer models or codes. Given the input values, the code produces one or more outputs via a complex mathematical model. Often the code is expensive to run, and it may be necessary to build a computationally cheaper predictor to enable, for example, optimization of the inputs. If there are...

A major bottleneck in the design and parametric yield optimization of CMOS integrated circuits lies in the high cost of the circuit simulations. One method that significantly reduces the simulation cost is to approximate the circuit performances by fitted quadratic models and then use these computationally inexpensive models to optimize the paramet...

Many products are now routinely designed with the aid of computer models. Given the inputs (designable engineering parameters and parameters representing manufacturing-process conditions), the model generates the product's quality characteristics. The quality improvement problem is to choose the designable engineering parameters such that the quality...

Taguchi's off-line quality control methods for product and process improvement emphasize experiments to design quality “into” products and processes. In Very Large Scale Integrated (VLSI) circuit design, the application of interest here, computer modeling is invariably quicker and cheaper than physical experimentation. Our approach models quality c...

A method for parametric yield optimization which significantly reduces the simulation cost is proposed. The method assumes that the circuit performances ultimately determining yield can be approximated by computationally inexpensive functions of the inputs to the circuit simulator. These inputs are the designable parameters, the uncontrollable stat...

A computer experiment generates observations by running a computer model at inputs x and recording the output (response) Y. Prediction of the response Y to an untried input is treated by modeling the systematic departure of Y from a linear model as a realization of a stochastic process. For given data (selected inputs and the computed responses), b...

Many scientific phenomena are now investigated by complex computer models or codes. A computer experiment is a number of runs of the code with various inputs. A feature of many computer experiments is that the output is deterministic—rerunning the code with the same inputs gives identical observations. Often, the codes are computationally expensive...

## Citations

... The reliable use of traffic simulations, a key tool in transportation engineering analyses, is often limited by a lack of methods for assessing their validity (1). Although strategies now exist (2) to treat comparatively simple deterministic models, such as those found in the Highway Capacity Manual (3), an adequately implemented methodology is lacking for treatment of microsimulators in current practice. ...

... Frequently employed correlation models can be found in Table 2, where $\Gamma$ is the Gamma function, $K_{\nu_j}$ the modified Bessel function of order $\nu_j$, and the parameter $\nu_j > 0$ controls the differentiability of the Matérn correlation model. Chen et al. (2016) compared several of the mentioned correlation models, showing a worse performance of the squared exponential correlation model in comparison to the exponential correlation one. On the other hand, an important note is that the generalized exponential correlation model needs twice as many hyper-parameters ($2d$) when compared to the squared exponential correlation one. ...

... Analytical approaches including the Highway Capacity Manual (HCM) and the Australian Research Report (ARR 123) attracted many researchers to conduct studies on capacity analysis (McGhee and Arnold, 1997; Akçelik, 2003; Buckholz, 2009); in contrast, only a few researchers used a microsimulation approach to study capacity analysis. The emphasis of these studies was mainly on the validation process of the micro-simulation model (Bayarri, et al., 2004; Wan, et al., 2005; Gagnon, et al., 2008). These studies on micro-simulation, though limited in scope and few in number, have reported underrepresentation of actual capacity by the model. ...

... Based on the works of Hellinga (1998) and Sacks et al. (2002), Park and Schneeberger (2003) proposed a nine-step procedure for the calibration and validation of a microscopic simulation model and presented a case study which is widely used among transport researchers. The nine-step procedure is shown below. ...

... $w_n$, with $N$ the exact design in which $w_i = n_i/N$. The theory was developed by Kiefer whose work on optimum design is collected in Brown et al. (1985). Use of the continuous design, to which the equivalence theory applies, removes dependence of the design on $N$. Any practical design for $N$ trials will be exact with integer replication $n_i$ at each design point. ...

... Many popular design criteria, such as the A-, D-, and E-criteria (Kiefer et al., 1985), are derived using a linear model as the underlying model. A linear model is a regression model where a response variable is modeled as a linear function of features that are functions of the explanatory variables plus some residual error: ...

... This changes the computation of $R$ and $r^T(x)$ in the conditional prediction (2.4), which no longer interpolates the training data. For data from physical experimentation or observation, augmenting a GaSP model in this way is natural to reflect random errors (e.g., Gao, Sacks and Welch, 1996; McMillan et al., 1999; Styer et al., 1995). ...

... Nevertheless, insights from the broad body of experimental design literature (Box and Draper, 1987; Box et al., 2005; Montgomery, 2008; Kleijnen, 2008; Anderson-Cook et al., 2009; Morris, 2009) can definitely be of use, especially in cases where low-order polynomial response surface models are used as metamodels. However, one has to be aware of the fact that, due to the absence of random error in any deterministic model simulation run, important DOE notions such as blocking, replication, and randomization are not relevant here (Lucas et al., 1996). The literature on 'design for computer experiments' illustrates that the most common strategy in designing computer experiments is to use space-filling designs to ensure that there will be design points close to x for any x at which one wants to predict f(x); see the textbooks by Fang et al. (2006, chapters 5 and 6), Santner et al. (2003, chapters 2, 3, 4), and the brief review paper by Levy and Steinberg (2010). ...

... Three cross-examination studies were conducted to evaluate the quality of optimized signal timings from various tools. First, Park et al. (20) compared optimal signal timings developed by TRANSYT-7F and the GA in CORSIM. Second, Ratrout and Olba (14) evaluated signal timings developed by Synchro and TRANSYT-7F under local traffic conditions in Saudi Arabia. ...

... (3) and (4) are obtained through maximum likelihood estimation (MLE) [45]. More details about Kriging theory can be found in Ref. [46]. In this study, the Kriging model is constructed based on the well-known DACE Kriging toolbox, which is implemented using MATLAB [47]. ...