About
153 Publications · 61,216 Reads · 2,807 Citations
Publications (153)
Degradation data are essential for determining the reliability of high-end products and systems, especially when covering multiple degradation characteristics (DCs). Modern degradation studies not only measure these characteristics but also record dynamic system usage and environmental factors, such as temperature, humidity, and ultraviolet exposur...
In a Cox model, the partial likelihood, as the product of a series of conditional probabilities, is used to estimate the regression coefficients. In practice, those conditional probabilities are approximated by risk score ratios based on a continuous time model, and thus result in parameter estimates from only an approximate partial likelihood. Thr...
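For reference, the partial likelihood mentioned above is commonly written (in generic textbook notation, which may differ from the paper's) as
$$ L(\beta) = \prod_{i=1}^{n} \left[ \frac{\exp(x_i^\top \beta)}{\sum_{j \in R(t_i)} \exp(x_j^\top \beta)} \right]^{\delta_i}, $$
where $t_i$ is the observed time for subject $i$, $\delta_i$ is the event indicator, $x_i$ is the covariate vector, and $R(t_i)$ is the risk set at $t_i$; each factor is the conditional probability that subject $i$ fails at $t_i$, approximated by a ratio of risk scores $\exp(x^\top \beta)$.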
The coding capabilities of large language models (LLMs) have opened up new opportunities for automatic statistical analysis in machine learning and data science. However, before their widespread adoption, it is crucial to assess the accuracy of code generated by LLMs. A major challenge in this evaluation lies in the absence of a benchmark dataset f...
The programming capabilities of large language models (LLMs) have revolutionized automatic code generation and opened new avenues for automatic statistical analysis. However, the validity and quality of these generated codes need to be systematically evaluated before they can be widely adopted. Despite their growing prominence, a comprehensive eval...
Artificial intelligence (AI) technology and systems have been advancing rapidly. However, ensuring the reliability of these systems is crucial for fostering public confidence in their use. This necessitates the modeling and analysis of reliability data specific to AI systems. A major challenge in AI reliability research, particularly for those in a...
Fatigue data arise in many research and applied areas, and there have been statistical methods developed to model and analyze such data. The distributions of fatigue life and fatigue strength are often of interest to engineers designing products that might fail due to fatigue from cyclic‐stress loading. Based on a specified statistical model and th...
The advent of artificial intelligence (AI) technologies has significantly changed many domains, including applied statistics. This review and vision paper explores the evolving role of applied statistics in the AI era, drawing from our experiences in engineering statistics. We begin by outlining the fundamental concepts and historical developments...
Photovoltaics (PV) are widely used to harvest solar energy, an important form of renewable energy. Photovoltaic arrays consist of multiple solar panels constructed from solar cells. Solar cells in the field are vulnerable to various defects, and electroluminescence (EL) imaging provides effective and non-destructive diagnostics to detect those defe...
The Standard Performance Evaluation Corporation (SPEC) CPU benchmark has been widely used as a measure of computing performance for decades. The SPEC is an industry-standardized, CPU-intensive benchmark suite and the collective data provide a proxy for the history of worldwide CPU and system performance. Past efforts have not provided or enabled an...
Engineers and scientists have been collecting and analyzing fatigue data since the 1800s to ensure the reliability of life-critical structures. Applications include (but are not limited to) bridges, building structures, aircraft and spacecraft components, ships, ground-based vehicles, and medical devices. Engineers need to estimate S-N relationship...
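A generic statistical form of the S-N relationship (a textbook sketch, not necessarily the model developed in this work) regresses log fatigue life on log stress:
$$ \log N = \beta_0 + \beta_1 \log S + \epsilon, $$
where $N$ is the number of cycles to failure at stress amplitude $S$ and $\epsilon$ is a random error term, so that fatigue life at a given stress level follows a specified lifetime distribution such as the lognormal or Weibull.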
Multisensor data that track system operating behaviors are widely available nowadays from various engineering systems. Measurements from each sensor over time form a curve and can be viewed as functional data. Clustering of these multivariate functional curves is important for studying the operating patterns of systems. One complication in such app...
Although high-performance computing (HPC) systems have been scaled to meet the exponentially growing demand for scientific computing, HPC performance variability remains a major challenge in computer science. Statistically, performance variability can be characterized by a distribution. Predicting performance variability is a critical step in HPC p...
As is true of many complex tasks, the work of discovering, describing, and understanding the diversity of life on Earth (viz., biological systematics and taxonomy) requires many tools. Some of this work can be accomplished as it has been done in the past, but some aspects present us with challenges which traditional knowledge and tools cannot adequ...
Graphics processing units (GPUs) are widely used in many high-performance computing (HPC) applications such as imaging/video processing and training deep-learning models in artificial intelligence. GPUs installed in HPC systems are often heavily used, and GPU failures occur during HPC system operations. Thus, the reliability of GPUs is of interest...
Renewable energy is critical for combating climate change, and a first step in its use is the storage of electricity generated from renewable energy sources. Li-ion batteries are a popular kind of storage unit. Their continuous usage through charge-discharge cycles eventually leads to degradation. This can be visualized by plotting voltage discharge curves (...
The Poisson multinomial distribution (PMD) describes the distribution of the sum of n independent but non-identically distributed random vectors, in which each random vector is of length m with 0/1 valued elements and only one of its elements can take value 1 with a certain probability. Those probabilities are different for the m elements across th...
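As an illustration of this setup, the following is a minimal Monte Carlo sketch of a PMD sum; the function name and probability matrix are hypothetical and are not taken from the paper:

```python
import numpy as np

def sample_pmd(prob_matrix, n_draws, rng=None):
    """Simulate draws from a Poisson multinomial distribution (hypothetical helper).

    prob_matrix: (n, m) array; row i gives the category probabilities of the
    i-th independent indicator vector (each row sums to 1).
    Returns an (n_draws, m) array of category-count vectors.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, m = prob_matrix.shape
    out = np.zeros((n_draws, m), dtype=int)
    for d in range(n_draws):
        # each of the n indicator vectors selects exactly one of the m categories
        cats = [rng.choice(m, p=prob_matrix[i]) for i in range(n)]
        out[d] = np.bincount(cats, minlength=m)
    return out

# example: n = 3 non-identically distributed indicator vectors, m = 2 categories
P = np.array([[0.2, 0.8],
              [0.5, 0.5],
              [0.9, 0.1]])
print(sample_pmd(P, n_draws=5))
```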
The modeling and analysis of degradation data have been an active research area in reliability engineering for reliability assessment and system health management. As the sensor technology advances, multivariate sensory data are commonly collected for the underlying degradation process. However, most existing research on degradation modeling requir...
Artificial intelligence (AI) systems are increasingly popular in many applications. Nevertheless, AI technologies are still developing, and many issues need to be addressed. Among those, the reliability of AI systems needs to be demonstrated so that AI systems can be used with confidence by the general public. In this paper, we provide statistical...
Although high-performance computing (HPC) systems have been scaled to meet the exponentially-growing demand for scientific computing, HPC performance variability remains a major challenge and has become a critical research topic in computer science. Statistically, performance variability can be characterized by a distribution. Predicting performanc...
Artificial intelligence (AI) systems have become increasingly common and the trend will continue. Examples of AI systems include autonomous vehicles (AV), computer vision, natural language processing and AI medical experts. To allow for safe and effective deployment of AI systems, the reliability of such systems needs to be assessed. Traditionally,...
Performance variability management is an active research area in high-performance computing (HPC). In this article, we focus on input/output (I/O) variability, which is a complicated function that is affected by many system factors. To study the performance variability, computer scientists often use grid-based designs (GBDs) which are equivalent to...
Performance variability management is an active research area in high-performance computing (HPC). We focus on input/output (I/O) variability. To study the performance variability, computer scientists often use grid-based designs (GBDs) to collect I/O variability data, and use mathematical approximation methods to build a prediction model. Mathemat...
The Poisson multinomial distribution (PMD) describes the distribution of the sum of $n$ independent but non-identically distributed random vectors, in which each random vector is of length $m$ with 0/1 valued elements and only one of its elements can take value 1 with a certain probability. Those probabilities are different for the $m$ elements acr...
Artificial intelligence (AI) systems have become increasingly popular in many areas. Nevertheless, AI technologies are still in their developing stages, and many issues need to be addressed. Among those, the reliability of AI systems needs to be demonstrated so that the AI systems can be used with confidence by the general public. In this paper, we...
The modeling and analysis of degradation data have been an active research area in reliability and system health management. As sensor technology advances, multivariate sensory data are commonly collected for the underlying degradation process. However, most existing research on degradation modeling requires a univariate degradation index to be...
Increases in the quantity of available data have allowed all fields of science to generate more accurate models of multivariate phenomena. Regression and interpolation become challenging when the dimension of data is large, especially while maintaining tractable computational complexity. Regression is a popular approach to solving approximation pro...
Artificial intelligence (AI) algorithms, such as deep learning and XGBoost, are used in numerous applications including autonomous driving, manufacturing process optimization, and medical diagnostics. The robustness of AI algorithms is of great interest, as inaccurate predictions could result in safety concerns and limit the adoption of AI systems. In...
Multi-type recurrent events are often encountered in medical applications when two or more different event types could repeatedly occur over an observation period. For example, patients may experience recurrences of multi-type nonmelanoma skin cancers in a clinical trial for skin cancer prevention. The aims in those applications are to characterize...
Multi-type recurrent events are often encountered in medical applications when two or more different event types could repeatedly occur over an observation period. For example, patients may experience recurrences of multi-type nonmelanoma skin cancers in a clinical trial for skin cancer prevention. The aims in those applications are to characterize...
Geyser eruptions are among the most popular signature attractions at Yellowstone National Park. The interdependence of geyser eruptions and the impacts of covariates are of interest to researchers in geyser studies. In this paper, we propose a parametric covariate-adjusted recurrent event model for estimating the eruption gap time. We describe a gen...
Geyser eruptions are among the most popular signature attractions at Yellowstone National Park. The interdependence of geyser eruptions and the impacts of covariates are of interest to researchers in geyser studies. In this paper, we propose a parametric covariate-adjusted recurrent event model for estimating the eruption gap time. We describe a gen...
Performance variability is an important measure for a reliable high performance computing (HPC) system. Performance variability is affected by complicated interactions between numerous factors, such as CPU frequency, the number of input/output (IO) threads, and the IO scheduler. In this paper, we focus on HPC IO variability. The prediction of HPC v...
Artificial intelligence (AI) systems have become increasingly common and the trend will continue. Examples of AI systems include autonomous vehicles (AV), computer vision, natural language processing, and AI medical experts. To allow for safe and effective deployment of AI systems, the reliability of such systems needs to be assessed. Traditionally...
Computer experiments with both qualitative and quantitative factors are widely used in many applications. Motivated by the emerging need of optimal configuration in the high-performance computing (HPC) system, this work proposes a sequential design, denoted as adaptive composite exploitation and exploration (CEE), for optimization of computer exper...
Performance variability is an important measure for a reliable high performance computing (HPC) system. Performance variability is affected by complicated interactions between numerous factors, such as CPU frequency, the number of input/output (IO) threads, and the IO scheduler. In this paper, we focus on HPC IO variability. The prediction of HPC v...
Accelerated life tests (ALTs) are widely used in industry, and many optimal ALT plans have been developed in the literature. To achieve an efficient optimal test plan, one needs to make sure that the planning values used in test planning are not far from the true parameters, which is challenging because lifetime data have not yet been collected in the...
DELAUNAYSPARSE contains both serial and parallel codes written in Fortran 2003 (with OpenMP) for performing medium- to high-dimensional interpolation via the Delaunay triangulation. To accommodate the exponential growth in the size of the Delaunay triangulation in high dimensions, DELAUNAYSPARSE computes only a sparse subset of the complete Delauna...
This paper focuses on investigating the Gamma degradation model with random effects. A generalized p-value procedure is proposed to test whether there exist some heterogeneities among the degradation processes of different units. Using the Cornish-Fisher expansion, an approximate confidence interval (CI) is obtained for the shape parameter. The gen...
Many complex engineering devices experience multiple dependent degradation processes. For each degradation process, there may exist substantial unit-to-unit heterogeneity. In this paper, we describe the dependence structure among multiple dependent degradation processes using copulas and model unit-level heterogeneity as random effects. A two-stage...
Artificial intelligence (AI) algorithms, such as deep learning and XGBoost, are used in numerous applications including computer vision, autonomous driving, and medical diagnostics. The robustness of these AI algorithms is of great interest, as inaccurate predictions could result in safety concerns and limit the adoption of AI systems. In this paper,...
Degradation data have been broadly used for assessing product and system reliability. Most existing work focuses on modeling and analysis of degradation data with a single characteristic. In some degradation tests, interest lies in measuring multiple characteristics of the product degradation process to understand different aspects of the reliabili...
The accelerated degradation test (ADT) is an efficient tool for assessing the lifetime information of highly reliable products. However, conducting an ADT is very expensive. Therefore, how to construct a cost-constrained ADT plan is a challenging issue for reliability analysts. By taking the experimental cost into consideration, this paper prop...
Computer models with both quantitative and qualitative inputs frequently arise in science, engineering and business. Mixed-input Gaussian process models have been used for emulating such models. The key in building this emulator is to accurately estimate the covariance between different categorical levels of the qualitative inputs. This problem is...
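One common construction for such a covariance (a generic mixed-input Gaussian process formulation, not necessarily the estimator proposed here) is
$$ \operatorname{Cov}\{y(x, z),\, y(x', z')\} = \sigma^2\, \tau_{z, z'}\, k(x, x'), $$
where $k(\cdot,\cdot)$ is a kernel over the quantitative inputs $x$ and $\tau_{z,z'}$ is an entry of a positive-definite correlation matrix over the levels of the qualitative input; estimating the $\tau_{z,z'}$ accurately is the difficulty referred to above.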
The accelerated degradation test (ADT) is an efficient tool for assessing the lifetime information of highly reliable products. However, conducting an ADT is very expensive. Therefore, how to construct a cost-constrained ADT plan is a challenging issue for reliability analysts. By taking the experimental cost into consideration, this paper prop...
This paper investigates degradation modeling under dynamic conditions and its applications. Both univariate and multiple competing degradation processes are considered with individual degradation paths being described by Wiener processes. Parametric and non-parametric approaches are used to capture the effect of environmental conditions on process...
On the basis of the principle of degradation mechanism invariance, a Wiener degradation process with a random drift parameter is used to model the data collected from a constant-stress accelerated degradation test. A small-sample statistical inference method for this model is proposed. On the basis of Fisher's method, a test statistic is proposed to...
Background: Recent limited evidence suggests that the use of a processed electroencephalographic (EEG) monitor to guide anesthetic management may influence postoperative cognitive outcomes; however, the mechanism is unclear.
Methods: This exploratory, single-center, randomized clinical trial included patients who were ≥65 years of age undergoing...
For several decades, the resampling-based bootstrap has been widely used for computing confidence intervals (CIs) in applications where no exact method is available. However, there are many applications where the resampling bootstrap method cannot be used. These include situations where the data are heavily censored due to the success response be...
Performance variability is an important factor of high-performance computing (HPC) systems. HPC performance variability is often complex because its sources interact and are distributed throughout the system stack. For example, the performance variability of I/O throughput can be affected by factors such as CPU frequency, the number of I/O threads,...
In recent years, accelerated destructive degradation testing (ADDT) has been applied to obtain the reliability information of an asset (component) at use conditions when the component is highly reliable. In ADDT, degradation data are measured under stress levels more severe than usual so that more component failures can be observed in a short perio...
Purpose: This paper aims to propose a method to diagnose fused deposition modeling (FDM) printing faults caused by variation of the temperature field and to establish a fault knowledge base, which helps to study the generation mechanism of FDM printing faults.
Design/methodology/approach: Based on Spearman rank correlation analysis, four relative t...
In degradation tests, the test units are usually divided into several groups, with each group tested simultaneously in a test rig. Each rig constitutes a rig-layer block from the perspective of design of experiments. Within each rig, the test units measured at the same time further form a gauge-layer block. Due to the uncontrollable factors among t...
The Wiener process is often used to fit degradation data in reliability modeling. Though there is an abundant literature covering the inference procedures for the Wiener model, their performance may not be satisfactory when the sample size is small, which is often the case in degradation data analysis. In this paper, we focus on the accurate reliab...
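A standard form of the Wiener degradation model (a generic formulation; the paper's exact parameterization may differ) is
$$ X(t) = \mu \Lambda(t) + \sigma B(\Lambda(t)), $$
where $\mu$ is the drift, $\sigma$ is the diffusion parameter, $\Lambda(t)$ is a monotone time-scale transformation, and $B(\cdot)$ is standard Brownian motion; the first-passage time of $X(t)$ to a failure threshold then follows an inverse Gaussian distribution, which underlies reliability inference for this model.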
Considering a coherent system in which the degradation processes of its performance characteristics are positively correlated, this paper systematically investigates a bivariate degradation model of such a system. To analyze the accelerated degradation data, a flexible class of bivariate stochastic processes is proposed to incorporate the effects of...
Traditional reliability analysis has relied on time-to-event data, degradation data, and recurrent event data, while the associated covariates tend to be simple and constant over time. Over the past years, we have witnessed the rapid development of sensor and wireless technology, which enables us to track how a product has been used and under wh...
Traditional accelerated life test plans are typically based on optimizing the C-optimality criterion to minimize the variance of a quantile of interest of the lifetime distribution. These methods often rely on specified planning values for the model parameters, which are usually unknown prior to the actual tests. The ambiguity of the specified parame...
Lyme disease is the most significant vector-borne disease in the United States, and its southward advance over several decades has been quantified. Previous research has examined the potential role of climate change on the disease’s expansion, but no studies have considered the role of future land cover upon its distribution. This research examines...
To monitor part quality in printing, a methodology for monitoring the geometric quality of printed parts in the fused deposition modeling process is studied. A non-contact measurement method based on machine vision technology is adopted to obtain precise and complete geometric information. An image acquisition system is establishe...
Exponential increases in complexity and scale make variability a growing threat to sustaining HPC performance at exascale. Performance variability in HPC I/O is common, acute, and formidable. We take the first step towards comprehensively studying linear and nonlinear approaches to modeling HPC I/O system variability in an effort to demonstrate tha...
Lyme disease is an infectious disease that is caused by a bacterium called Borrelia burgdorferi sensu stricto. In the United States, Lyme disease is one of the most common infectious diseases. The major endemic areas of the disease are New England, Mid-Atlantic, East-North Central, South Atlantic, and West North-Central. Virginia is on the front-li...
Measuring the corporate default risk is broadly important in economics and finance. Quantitative methods have been developed to predictively assess future corporate default probabilities. However, as a more difficult yet crucial problem, evaluating the uncertainties associated with the default predictions remains little explored. In this paper, we...
Traditional accelerated life test plans are typically based on optimizing the C-optimality criterion to minimize the variance of a quantile of interest of the lifetime distribution. The traditional methods rely on specified planning values for the model parameters, which are usually unknown prior to the actual tests. The ambiguity of the specified pa...
Lyme disease is an infectious disease that is caused by a bacterium called Borrelia burgdorferi sensu stricto. In the United States, Lyme disease is one of the most common infectious diseases. The major endemic areas of the disease are New England, Mid-Atlantic, East-North Central, South Atlantic, and West North-Central. Virginia is on the front-li...
The resampling-based bootstrap has, for several decades, been a widely used method for computing confidence intervals in applications where no exact method is available and when sample sizes are not large enough to rely on easy-to-compute large-sample approximate methods, such as Wald (normal-approximation) confidence intervals. Sim...
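A minimal sketch of the basic resampling (percentile) bootstrap interval discussed above; the data and the choice of statistic are hypothetical:

```python
import numpy as np

def percentile_bootstrap_ci(data, stat=np.mean, n_boot=2000, alpha=0.05, rng=None):
    """Percentile bootstrap confidence interval for a statistic of a 1-D sample."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data)
    boot_stats = np.empty(n_boot)
    for b in range(n_boot):
        # resample with replacement and recompute the statistic
        resample = rng.choice(data, size=data.size, replace=True)
        boot_stats[b] = stat(resample)
    lower, upper = np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# hypothetical example: 95% CI for the mean of a small sample
x = np.array([3.1, 4.7, 2.8, 5.9, 4.2, 3.6, 6.1, 2.9])
print(percentile_bootstrap_ci(x))
```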
Polymeric materials are widely used in many applications and are especially useful when combined with other polymers to make polymer composites. The appealing features of these materials come from their comparable levels of strength and endurance to what one would find in metal alloys while being more lightweight and economical. However, these mate...
Measuring the corporate default risk is broadly important in economics and finance. Quantitative methods have been developed to predictively assess future corporate default probabilities. However, as a more difficult yet crucial problem, evaluating the uncertainties associated with the default predictions remains little explored. In this paper, we...
A rapid increase in the quantity of data available is allowing all fields of science to generate more accurate models of multivariate phenomena. Regression and interpolation become challenging when the dimension of data is large, especially while maintaining tractable computational complexity. This paper proposes three novel techniques for multivar...
The Delaunay triangulation is a fundamental construct from computational geometry, which finds wide use as a model for multivariate piecewise linear interpolation in fields such as geographic information systems, civil engineering, physics, and computer graphics. Though efficient solutions exist for computation of two- and three-dimensional Delauna...
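For illustration, a minimal example of Delaunay-based piecewise linear interpolation using SciPy (a generic sketch, not the algorithm developed in this work):

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

rng = np.random.default_rng(0)

# scattered 2-D sample points and responses (hypothetical data)
pts = rng.uniform(size=(50, 2))
vals = np.sin(pts[:, 0]) + np.cos(pts[:, 1])

# LinearNDInterpolator triangulates the points (Delaunay) and interpolates
# linearly within each simplex
interp = LinearNDInterpolator(pts, vals)
query = np.array([[0.3, 0.6], [0.7, 0.2]])
print(interp(query))
```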
Big data features not only large volumes of data but also data with complicated structures. Complexity imposes unique challenges in big data analytics. Meeker and Hong (2014, Quality Engineering, pp. 102-116) provided an extensive discussion of the opportunities and challenges in big data and reliability, and described engineering systems that can...
Big data features not only large volumes of data but also data with complicated structures. Complexity imposes unique challenges in big data analytics. Meeker and Hong (2014, Quality Engineering, pp. 102-116) provided an extensive discussion of the opportunities and challenges in big data and reliability, and described engineering systems that can...
The Poisson-binomial distribution is useful in many applied problems in engineering, actuarial science and data mining. The Poisson-binomial distribution models the distribution of the sum of independent but non-identically distributed random indicators whose success probabilities vary. In this paper, we extend the Poisson-binomial distribution to...
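For context, the Poisson-binomial probability mass function can be computed exactly by convolving the two-point distributions of the individual indicators; a minimal sketch (generic, not the extension developed in the paper):

```python
import numpy as np

def poisson_binomial_pmf(p):
    """Exact PMF of the sum of independent Bernoulli(p_i) indicators.

    p: sequence of success probabilities; returns pmf where pmf[k] = P(sum = k).
    """
    pmf = np.array([1.0])  # PMF of an empty sum (point mass at 0)
    for pi in p:
        # convolve with the two-point distribution {0: 1 - pi, 1: pi}
        pmf = np.convolve(pmf, [1.0 - pi, pi])
    return pmf

# example: three indicators with different success probabilities
print(poisson_binomial_pmf([0.2, 0.5, 0.9]))
```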
Polymeric materials are widely used in many applications and are especially useful when combined with other polymers to make polymer composites. The appealing features of these materials come from their having comparable levels of strength and endurance to what one would find in metal alloys while being more lightweight and economical. However, the...
Most of the recently developed methods on optimum planning for accelerated life tests (ALT) involve “guessing” values of parameters to be estimated, and substituting such guesses in the proposed solution to obtain the final testing plan. In reality, such guesses may be very different from true values of the parameters, leading to inefficient test p...
Objective: To determine whether self-reports of disaster-related psychological distress predict older adults’ health care utilization during the year after Hurricane Sandy, which hit New Jersey on October 29, 2012.
Methods: Respondents were from the ORANJ BOWL Study, a random-digit dialed sample from New Jersey recruited from 2006 to 2008. Medicare...
Accelerated destructive degradation testing (ADDT) is a widely used technique for long-term material property evaluation. One area of application is in determining the thermal index (TI) of polymeric materials including thermoplastic, thermosetting, and elastomeric materials. There are two approaches to estimating a TI based on data collected from...
When interpolating computing system performance data, there are many input parameters that must be considered. Therefore, the chosen multivariate interpolation model must be capable of scaling to many dimensions. The Delaunay triangulation is a foundational technique, commonly used to perform piecewise linear interpolation in computer graphics, phy...
Each of high performance computing, cloud computing, and computer security have their own interests in modeling and predicting the performance of computers with respect to how they are configured. An effective model might infer internal mechanics, minimize power consumption, or maximize computational throughput of a given system. This paper analyze...
Photodegradation, driven primarily by ultraviolet (UV) radiation, is the primary cause of failure for organic paints and coatings, as well as many other products made from polymeric materials exposed to sunlight. Traditional methods of service life prediction involve the use of outdoor exposure in harsh UV environments (e.g., Florida and Arizona)....
Accelerated destructive degradation tests (ADDTs) are often used to collect necessary data for assessing the long-term properties of polymeric materials. Based on the data, a thermal index (TI) is estimated. The TI can be useful for material rating and comparisons. The R package ADDT provides the functionalities of performing the traditional method...
The accelerated destructive degradation test (ADDT) is a technique that is commonly used by industries to assess a material's long-term properties. In many applications, the accelerating variable is temperature. In such cases, a thermal index (TI) is used to indicate the strength of the material. For example, a TI of 200°C may be interpreted as the mate...
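The TI is typically derived from an Arrhenius-type relationship between temperature and the time for a material property to reach a failure threshold; a generic sketch (notation is illustrative, not taken from this paper) is
$$ \log_{10}(t_f) = \beta_0 + \beta_1 \frac{1}{T + 273.15}, $$
where $t_f$ is the time for the property to fall to a specified threshold and $T$ is temperature in °C; the TI is then the temperature at which the fitted $t_f$ equals a reference design life (for example, 100,000 hours).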
Over two million carloads of hazardous materials are transported by rail annually in the United States. American railroads use large blocks of tank cars to transport petroleum crude oil and other flammable liquids from production to consumption sites. Being different from roadway transport of hazardous materials, a train accident can pote...
Medicaid 1915(c) waivers allow nursing home eligible older adults to avoid institutionalization by providing personal care services (PCS) in their homes. Prior studies have not determined how 1915(c) waiver recipients fare after a hurricane. We identified 26,015 New York (NY) Medicaid 1915(c) beneficiaries age 65 and older who received waiver servi...
Reliability analysis of multicomponent repairable systems with dependent component failures is challenging for two reasons. First, the failure mechanism of one component may depend on other components when considering component failure dependence. Second, imperfect repair actions can have accumulated effects on the repaired components and these acc...