Yili Hong
  • PhD in Statistics
  • Professor at Virginia Tech

About

  • 153 Publications
  • 61,216 Reads
  • 2,807 Citations
  • Current institution: Virginia Tech
  • Current position: Professor

Publications (153)
Preprint
Degradation data are essential for determining the reliability of high-end products and systems, especially when covering multiple degradation characteristics (DCs). Modern degradation studies not only measure these characteristics but also record dynamic system usage and environmental factors, such as temperature, humidity, and ultraviolet exposur...
Preprint
Full-text available
In a Cox model, the partial likelihood, as the product of a series of conditional probabilities, is used to estimate the regression coefficients. In practice, those conditional probabilities are approximated by risk score ratios based on a continuous time model, and thus result in parameter estimates from only an approximate partial likelihood. Thr...
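For background, the continuous-time partial likelihood referred to above has the familiar risk-score-ratio form (shown here only for orientation; the paper's refinement of this approximation is not reproduced):

$$
L(\beta) \;=\; \prod_{i:\,\delta_i=1} \frac{\exp(x_i^\top \beta)}{\sum_{j \in R(t_i)} \exp(x_j^\top \beta)},
$$

where $\delta_i$ indicates an observed event, $x_i$ is the covariate vector of subject $i$, and $R(t_i)$ is the risk set at event time $t_i$.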
Preprint
Full-text available
The coding capabilities of large language models (LLMs) have opened up new opportunities for automatic statistical analysis in machine learning and data science. However, before their widespread adoption, it is crucial to assess the accuracy of code generated by LLMs. A major challenge in this evaluation lies in the absence of a benchmark dataset f...
Preprint
Full-text available
The programming capabilities of large language models (LLMs) have revolutionized automatic code generation and opened new avenues for automatic statistical analysis. However, the validity and quality of these generated codes need to be systematically evaluated before they can be widely adopted. Despite their growing prominence, a comprehensive eval...
Preprint
Full-text available
Artificial intelligence (AI) technology and systems have been advancing rapidly. However, ensuring the reliability of these systems is crucial for fostering public confidence in their use. This necessitates the modeling and analysis of reliability data specific to AI systems. A major challenge in AI reliability research, particularly for those in a...
Article
Full-text available
Fatigue data arise in many research and applied areas, and there have been statistical methods developed to model and analyze such data. The distributions of fatigue life and fatigue strength are often of interest to engineers designing products that might fail due to fatigue from cyclic‐stress loading. Based on a specified statistical model and th...
Preprint
Full-text available
The advent of artificial intelligence (AI) technologies has significantly changed many domains, including applied statistics. This review and vision paper explores the evolving role of applied statistics in the AI era, drawing from our experiences in engineering statistics. We begin by outlining the fundamental concepts and historical developments...
Preprint
Full-text available
Photovoltaics (PV) are widely used to harvest solar energy, an important form of renewable energy. Photovoltaic arrays consist of multiple solar panels constructed from solar cells. Solar cells in the field are vulnerable to various defects, and electroluminescence (EL) imaging provides effective and non-destructive diagnostics to detect those defe...
Article
Full-text available
The Standard Performance Evaluation Corporation (SPEC) CPU benchmark has been widely used as a measure of computing performance for decades. The SPEC is an industry-standardized, CPU-intensive benchmark suite and the collective data provide a proxy for the history of worldwide CPU and system performance. Past efforts have not provided or enabled an...
Preprint
Full-text available
Engineers and scientists have been collecting and analyzing fatigue data since the 1800s to ensure the reliability of life-critical structures. Applications include (but are not limited to) bridges, building structures, aircraft and spacecraft components, ships, ground-based vehicles, and medical devices. Engineers need to estimate S-N relationship...
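For orientation, a common log-linear (Basquin-type) form of the S-N relationship mentioned above relates cycles to failure $N$ to stress amplitude $S$; this generic form is illustrative only and is not necessarily the exact model developed in the paper:

$$
\log N \;=\; \beta_0 + \beta_1 \log S + \varepsilon,
$$

where $\varepsilon$ captures unit-to-unit scatter in fatigue life.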
Article
Full-text available
Multisensor data that track system operating behaviors are widely available nowadays from various engineering systems. Measurements from each sensor over time form a curve and can be viewed as functional data. Clustering of these multivariate functional curves is important for studying the operating patterns of systems. One complication in such app...
Article
Full-text available
Although high-performance computing (HPC) systems have been scaled to meet the exponentially growing demand for scientific computing, HPC performance variability remains a major challenge in computer science. Statistically, performance variability can be characterized by a distribution. Predicting performance variability is a critical step in HPC p...
Article
Full-text available
As is true of many complex tasks, the work of discovering, describing, and understanding the diversity of life on Earth (viz., biological systematics and taxonomy) requires many tools. Some of this work can be accomplished as it has been done in the past, but some aspects present us with challenges which traditional knowledge and tools cannot adequ...
Preprint
Full-text available
Graphics processing units (GPUs) are widely used in many high-performance computing (HPC) applications such as imaging/video processing and training deep-learning models in artificial intelligence. GPUs installed in HPC systems are often heavily used, and GPU failures occur during HPC system operations. Thus, the reliability of GPUs is of interest...
Preprint
Full-text available
Renewable energy is critical for combating climate change, and a key first step is the storage of electricity generated from renewable energy sources. Li-ion batteries are a popular kind of storage unit. Their continuous usage through charge-discharge cycles eventually leads to degradation. This can be visualized by plotting voltage discharge curves (...
Article
Full-text available
The Poisson multinomial distribution (PMD) describes the distribution of the sum of n independent but non-identically distributed random vectors, in which each random vector is of length m with 0/1 valued elements and only one of its elements can take value 1 with a certain probability. Those probabilities are different for the m elements across th...
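A minimal Monte Carlo sketch of how a PMD random vector arises as the sum of independent, non-identical one-hot categorical draws (a Python illustration of the definition, not the authors' software or the exact-computation algorithms in the paper; the probability matrix below is hypothetical):

```python
import numpy as np

def sample_pmd(prob_matrix, size=10000, seed=None):
    """Monte Carlo draws from a Poisson multinomial distribution: the sum of
    n independent one-hot categorical vectors whose category probabilities
    differ across the n trials (each row of prob_matrix sums to one)."""
    rng = np.random.default_rng(seed)
    n, m = prob_matrix.shape
    counts = np.zeros((size, m), dtype=int)
    for i in range(n):
        draws = rng.choice(m, size=size, p=prob_matrix[i])
        counts[np.arange(size), draws] += 1  # add the one-hot vector for trial i
    return counts

# three non-identical trials over m = 3 categories
p = np.array([[0.2, 0.3, 0.5],
              [0.6, 0.1, 0.3],
              [0.4, 0.4, 0.2]])
print(sample_pmd(p, size=5, seed=1))
```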
Article
Full-text available
The modeling and analysis of degradation data have been an active research area in reliability engineering for reliability assessment and system health management. As the sensor technology advances, multivariate sensory data are commonly collected for the underlying degradation process. However, most existing research on degradation modeling requir...
Article
Full-text available
Artificial intelligence (AI) systems are increasingly popular in many applications. Nevertheless, AI technologies are still developing, and many issues need to be addressed. Among those, the reliability of AI systems needs to be demonstrated so that AI systems can be used with confidence by the general public. In this paper, we provide statistical...
Preprint
Full-text available
Although high-performance computing (HPC) systems have been scaled to meet the exponentially-growing demand for scientific computing, HPC performance variability remains a major challenge and has become a critical research topic in computer science. Statistically, performance variability can be characterized by a distribution. Predicting performanc...
Article
Full-text available
Artificial intelligence (AI) systems have become increasingly common and the trend will continue. Examples of AI systems include autonomous vehicles (AV), computer vision, natural language processing and AI medical experts. To allow for safe and effective deployment of AI systems, the reliability of such systems needs to be assessed. Traditionally,...
Article
Full-text available
Performance variability management is an active research area in high-performance computing (HPC). In this article, we focus on input/output (I/O) variability, which is a complicated function that is affected by many system factors. To study the performance variability, computer scientists often use grid-based designs (GBDs) which are equivalent to...
Preprint
Performance variability management is an active research area in high-performance computing (HPC). We focus on input/output (I/O) variability. To study the performance variability, computer scientists often use grid-based designs (GBDs) to collect I/O variability data, and use mathematical approximation methods to build a prediction model. Mathemat...
Preprint
Full-text available
The Poisson multinomial distribution (PMD) describes the distribution of the sum of $n$ independent but non-identically distributed random vectors, in which each random vector is of length $m$ with 0/1 valued elements and only one of its elements can take value 1 with a certain probability. Those probabilities are different for the $m$ elements acr...
Preprint
Full-text available
Artificial intelligence (AI) systems have become increasingly popular in many areas. Nevertheless, AI technologies are still in their developing stages, and many issues need to be addressed. Among those, the reliability of AI systems needs to be demonstrated so that the AI systems can be used with confidence by the general public. In this paper, we...
Preprint
Full-text available
The modeling and analysis of degradation data have been an active research area in reliability and system health management. As the sensor technology advances, multivariate sensory data are commonly collected for the underlying degradation process. However, most existing research on degradation modeling requires a univariate degradation index to be...
Article
Full-text available
Increases in the quantity of available data have allowed all fields of science to generate more accurate models of multivariate phenomena. Regression and interpolation become challenging when the dimension of data is large, especially while maintaining tractable computational complexity. Regression is a popular approach to solving approximation pro...
Article
Full-text available
Artificial intelligence (AI) algorithms, such as deep learning and XGboost, are used in numerous applications including autonomous driving, manufacturing process optimization and medical diagnostics. The robustness of AI algorithms is of great interest as inaccurate prediction could result in safety concerns and limit the adoption of AI systems. In...
Article
Full-text available
Multi-type recurrent events are often encountered in medical applications when two or more different event types could repeatedly occur over an observation period. For example, patients may experience recurrences of multi-type nonmelanoma skin cancers in a clinical trial for skin cancer prevention. The aims in those applications are to characterize...
Preprint
Full-text available
Multi-type recurrent events are often encountered in medical applications when two or more different event types could repeatedly occur over an observation period. For example, patients may experience recurrences of multi-type nonmelanoma skin cancers in a clinical trial for skin cancer prevention. The aims in those applications are to characterize...
Article
Full-text available
Geyser eruptions are among the most popular signature attractions at Yellowstone National Park. The interdependence of geyser eruptions and the impacts of covariates are of interest to researchers in geyser studies. In this paper, we propose a parametric covariate-adjusted recurrent event model for estimating the eruption gap time. We describe a gen...
Preprint
Full-text available
Geyser eruptions are among the most popular signature attractions at Yellowstone National Park. The interdependence of geyser eruptions and the impacts of covariates are of interest to researchers in geyser studies. In this paper, we propose a parametric covariate-adjusted recurrent event model for estimating the eruption gap time. We describe a gen...
Article
Full-text available
Performance variability is an important measure for a reliable high performance computing (HPC) system. Performance variability is affected by complicated interactions between numerous factors, such as CPU frequency, the number of input/output (IO) threads, and the IO scheduler. In this paper, we focus on HPC IO variability. The prediction of HPC v...
Preprint
Full-text available
Artificial intelligence (AI) systems have become increasingly common and the trend will continue. Examples of AI systems include autonomous vehicles (AV), computer vision, natural language processing, and AI medical experts. To allow for safe and effective deployment of AI systems, the reliability of such systems needs to be assessed. Traditionally...
Preprint
Full-text available
Computer experiments with both qualitative and quantitative factors are widely used in many applications. Motivated by the emerging need of optimal configuration in the high-performance computing (HPC) system, this work proposes a sequential design, denoted as adaptive composite exploitation and exploration (CEE), for optimization of computer exper...
Preprint
Full-text available
Performance variability is an important measure for a reliable high performance computing (HPC) system. Performance variability is affected by complicated interactions between numerous factors, such as CPU frequency, the number of input/output (IO) threads, and the IO scheduler. In this paper, we focus on HPC IO variability. The prediction of HPC v...
Article
Full-text available
Accelerated life tests (ALTs) are widely used in industry and many optimal ALT test plans are developed in literature. To achieve an efficient optimal test plan, one needs to make sure that the planning values used in the test planning are not far away from the true parameters, which is challenging because lifetime data are not collected yet in the...
Article
DELAUNAYSPARSE contains both serial and parallel codes written in Fortran 2003 (with OpenMP) for performing medium- to high-dimensional interpolation via the Delaunay triangulation. To accommodate the exponential growth in the size of the Delaunay triangulation in high dimensions, DELAUNAYSPARSE computes only a sparse subset of the complete Delauna...
Article
This paper focuses on investigating the Gamma degradation model with random effects. A generalized p-value procedure is proposed to test whether there exist some heterogeneities among the degradation processes of different units. Using the Cornish-Fisher expansion, an approximate confidence interval (CI) is obtained for the shape parameter. The gen...
Article
Many complex engineering devices experience multiple dependent degradation processes. For each degradation process, there may exist substantial unit-to-unit heterogeneity. In this paper, we describe the dependence structure among multiple dependent degradation processes using copulas and model unit-level heterogeneity with random effects. A two-stage...
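As a generic illustration of the copula idea (a Gaussian copula with assumed parameters, not the paper's two-stage estimation procedure), the following Python sketch generates two positively dependent degradation increments with gamma margins:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rho = 0.6                                            # assumed copula correlation
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=1000)
u = stats.norm.cdf(z)                                # uniform margins linked by the copula
inc1 = stats.gamma(a=2.0, scale=0.5).ppf(u[:, 0])    # marginal for process 1 (toy values)
inc2 = stats.gamma(a=3.0, scale=0.3).ppf(u[:, 1])    # marginal for process 2 (toy values)
print(np.corrcoef(inc1, inc2)[0, 1])                 # induced dependence between increments
```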
Preprint
Full-text available
Artificial intelligence (AI) algorithms, such as deep learning and XGboost, are used in numerous applications including computer vision, autonomous driving, and medical diagnostics. The robustness of these AI algorithms is of great interest as inaccurate prediction could result in safety concerns and limit the adoption of AI systems. In this paper,...
Article
Full-text available
Degradation data have been broadly used for assessing product and system reliability. Most existing work focuses on modeling and analysis of degradation data with a single characteristic. In some degradation tests, interest lies in measuring multiple characteristics of the product degradation process to understand different aspects of the reliabili...
Article
Full-text available
The accelerated degradation test (ADT) is an efficient tool for assessing the lifetime information of highly reliable products. However, conducting an ADT is very expensive. Therefore, how to construct a cost-constrained ADT plan is a major challenge for reliability analysts. By taking the experimental cost into consideration, this paper prop...
Article
Full-text available
Computer models with both quantitative and qualitative inputs frequently arise in science, engineering and business. Mixed-input Gaussian process models have been used for emulating such models. The key in building this emulator is to accurately estimate the covariance between different categorical levels of the qualitative inputs. This problem is...
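To illustrate the covariance structure being estimated (a simple product kernel with made-up names and toy values, not the estimation method proposed in the paper), a mixed-input Gaussian process can combine a squared-exponential term in the quantitative input with a cross-level correlation for the qualitative input:

```python
import numpy as np

def mixed_input_kernel(x1, x2, z1, z2, lengthscale=1.0, cross_corr=0.5):
    """Toy product kernel: squared-exponential in the quantitative input x,
    times a correlation of 1.0 when the categorical levels match and
    cross_corr otherwise."""
    quant = np.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)
    qual = 1.0 if z1 == z2 else cross_corr
    return quant * qual

# covariance between two runs at x = 0.2 and x = 0.7 with different categorical levels
print(mixed_input_kernel(0.2, 0.7, "level_A", "level_B"))
```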
Preprint
The accelerated degradation test (ADT) is an efficient tool for assessing the lifetime information of highly reliable products. However, conducting an ADT is very expensive. Therefore, how to construct a cost-constrained ADT plan is a major challenge for reliability analysts. By taking the experimental cost into consideration, this paper prop...
Article
Full-text available
This paper investigates degradation modeling under dynamic conditions and its applications. Both univariate and multiple competing degradation processes are considered with individual degradation paths being described by Wiener processes. Parametric and non-parametric approaches are used to capture the effect of environmental conditions on process...
Article
On the basis of the principle of degradation mechanism invariance, a Wiener degradation process with a random drift parameter is used to model the data collected from the constant-stress accelerated degradation test. A small-sample statistical inference method for this model is proposed. On the basis of Fisher's method, a test statistic is proposed to...
Article
Background: Recent limited evidence suggests that the use of a processed electroencephalographic (EEG) monitor to guide anesthetic management may influence postoperative cognitive outcomes; however, the mechanism is unclear. Methods: This exploratory, single-center, randomized clinical trial included patients who were ≥65 years of age undergoing...
Article
Full-text available
For several decades, the resampling-based bootstrap has been widely used for computing confidence intervals (CIs) for applications where no exact method is available. However, there are many applications where the resampling bootstrap method cannot be used. These include situations where the data are heavily censored due to the success response be...
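For readers unfamiliar with the resampling bootstrap itself, a minimal percentile-bootstrap sketch in Python is shown below (a generic illustration with a toy sample, not the new procedures developed in the paper):

```python
import numpy as np

def percentile_bootstrap_ci(data, stat, level=0.95, n_boot=5000, seed=None):
    """Generic percentile-bootstrap confidence interval for a statistic."""
    rng = np.random.default_rng(seed)
    boot = np.array([stat(rng.choice(data, size=len(data), replace=True))
                     for _ in range(n_boot)])
    alpha = 1.0 - level
    return np.quantile(boot, [alpha / 2.0, 1.0 - alpha / 2.0])

x = np.random.default_rng(1).weibull(1.5, size=30)   # toy lifetime-like sample
print(percentile_bootstrap_ci(x, np.median, seed=2))
```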
Article
Performance variability is an important factor of high-performance computing (HPC) systems. HPC performance variability is often complex because its sources interact and are distributed throughout the system stack. For example, the performance variability of I/O throughput can be affected by factors such as CPU frequency, the number of I/O threads,...
Article
Full-text available
In recent years, accelerated destructive degradation testing (ADDT) has been applied to obtain the reliability information of an asset (component) at use conditions when the component is highly reliable. In ADDT, degradation data are measured under stress levels more severe than usual so that more component failures can be observed in a short perio...
Article
Purpose: This paper aims to propose a method to diagnose fused deposition modeling (FDM) printing faults caused by the variation of the temperature field and to establish a fault knowledge base, which helps to study the generation mechanism of FDM printing faults. Design/methodology/approach: Based on the Spearman rank correlation analysis, four relative t...
Article
Full-text available
In degradation tests, the test units are usually divided into several groups, with each group tested simultaneously in a test rig. Each rig constitutes a rig-layer block from the perspective of design of experiments. Within each rig, the test units measured at the same time further form a gauge-layer block. Due to the uncontrollable factors among t...
Article
Full-text available
The Wiener process is often used to fit degradation data in reliability modeling. Though there is an abundant literature covering the inference procedures for the Wiener model, their performance may not be satisfactory when the sample size is small, which is often the case in degradation data analysis. In this paper, we focus on the accurate reliab...
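As background on the Wiener degradation model (a simulation sketch with toy parameter values, not the small-sample inference procedures proposed in the paper), a linear-drift Wiener path and its first passage of a failure threshold can be simulated as follows:

```python
import numpy as np

def wiener_first_passage(mu, sigma, threshold, dt=0.01, t_max=100.0, seed=None):
    """Simulate one Wiener degradation path X(t) = mu*t + sigma*B(t) on a grid
    and return the first time it reaches the failure threshold (np.inf if never)."""
    rng = np.random.default_rng(seed)
    n = int(t_max / dt)
    increments = mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
    path = np.cumsum(increments)
    hit = np.nonzero(path >= threshold)[0]
    return (hit[0] + 1) * dt if hit.size else np.inf

# toy drift, diffusion, and threshold values (not estimates from any data set)
print(wiener_first_passage(mu=0.5, sigma=0.2, threshold=10.0, seed=3))
```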
Article
Considering a coherent system in which the degradation processes of its performance characteristics are positively correlated, this paper systematically investigates a bivariate degradation model of such a system. To analyze the accelerated degradation data, a flexible class of bivariate stochastic processes is proposed to incorporate the effects of...
Preprint
Full-text available
Traditional reliability analysis has been using time to event data, degradation data, and recurrent event data, while the associated covariates tend to be simple and constant over time. Over the past years, we have witnessed the rapid development of sensor and wireless technology, which enables us to track how the product has been used and under wh...
Chapter
Full-text available
Traditional accelerated life test plans are typically based on the C-optimality criterion, which minimizes the variance of a quantile of interest of the lifetime distribution. These methods often rely on specified planning values for the model parameters, which are usually unknown prior to the actual tests. The ambiguity of the specified parame...
Article
Full-text available
Lyme disease is the most significant vector-borne disease in the United States, and its southward advance over several decades has been quantified. Previous research has examined the potential role of climate change on the disease’s expansion, but no studies have considered the role of future land cover upon its distribution. This research examines...
Article
Full-text available
In order to monitor the quality of parts in printing, the methodology to monitor the geometric quality of the printed parts in fused deposition modeling process is researched. A non-contact measurement method based on machine vision technology is adopted to obtain the precise complete geometric information. An image acquisition system is establishe...
Article
Exponential increases in complexity and scale make variability a growing threat to sustaining HPC performance at exascale. Performance variability in HPC I/O is common, acute, and formidable. We take the first step towards comprehensively studying linear and nonlinear approaches to modeling HPC I/O system variability in an effort to demonstrate tha...
Article
Full-text available
Lyme disease is an infectious disease that is caused by a bacterium called Borrelia burgdorferi sensu stricto. In the United States, Lyme disease is one of the most common infectious diseases. The major endemic areas of the disease are New England, Mid-Atlantic, East-North Central, South Atlantic, and West North-Central. Virginia is on the front-li...
Article
Full-text available
Measuring the corporate default risk is broadly important in economics and finance. Quantitative methods have been developed to predictively assess future corporate default probabilities. However, as a more difficult yet crucial problem, evaluating the uncertainties associated with the default predictions remains little explored. In this paper, we...
Preprint
Full-text available
Traditional accelerated life test plans are typically based on the C-optimality criterion, which minimizes the variance of a quantile of interest of the lifetime distribution. The traditional methods rely on specified planning values for the model parameters, which are usually unknown prior to the actual tests. The ambiguity of the specified pa...
Preprint
Full-text available
Lyme disease is an infectious disease that is caused by a bacterium called Borrelia burgdorferi sensu stricto. In the United States, Lyme disease is one of the most common infectious diseases. The major endemic areas of the disease are New England, Mid-Atlantic, East-North Central, South Atlantic, and West North-Central. Virginia is on the front-li...
Preprint
Full-text available
The bootstrap, based on resampling, has, for several decades, been a widely used method for computing confidence intervals for applications where no exact method is available and when sample sizes are not large enough to rely on easy-to-compute large-sample approximate methods, such as Wald (normal-approximation) confidence intervals. Sim...
Chapter
Full-text available
Polymeric materials are widely used in many applications and are especially useful when combined with other polymers to make polymer composites. The appealing features of these materials come from their comparable levels of strength and endurance to what one would find in metal alloys while being more lightweight and economical. However, these mate...
Preprint
Full-text available
Measuring the corporate default risk is broadly important in economics and finance. Quantitative methods have been developed to predictively assess future corporate default probabilities. However, as a more difficult yet crucial problem, evaluating the uncertainties associated with the default predictions remains little explored. In this paper, we...
Conference Paper
A rapid increase in the quantity of data available is allowing all fields of science to generate more accurate models of multivariate phenomena. Regression and interpolation become challenging when the dimension of data is large, especially while maintaining tractable computational complexity. This paper proposes three novel techniques for multivar...
Conference Paper
The Delaunay triangulation is a fundamental construct from computational geometry, which finds wide use as a model for multivariate piecewise linear interpolation in fields such as geographic information systems, civil engineering, physics, and computer graphics. Though efficient solutions exist for computation of two- and three-dimensional Delauna...
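For orientation, piecewise linear interpolation on a Delaunay triangulation can be sketched in low dimensions with SciPy (a toy 2-D illustration; the paper targets higher-dimensional settings where specialized algorithms are needed):

```python
import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator

rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 1.0, size=(50, 2))      # scattered 2-D input sites
vals = np.sin(pts[:, 0]) + pts[:, 1] ** 2      # smooth toy response
tri = Delaunay(pts)                            # triangulate the inputs
interp = LinearNDInterpolator(tri, vals)       # linear on each simplex
print(interp([[0.5, 0.5]]))                    # interpolated value at a query point
```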
Article
Full-text available
Big data features not only large volumes of data but also data with complicated structures. Complexity imposes unique challenges in big data analytics. Meeker and Hong (2014, Quality Engineering, pp. 102-116) provided an extensive discussion of the opportunities and challenges in big data and reliability, and described engineering systems that can...
Preprint
Full-text available
Big data features not only large volumes of data but also data with complicated structures. Complexity imposes unique challenges in big data analytics. Meeker and Hong (2014, Quality Engineering, pp. 102-116) provided an extensive discussion of the opportunities and challenges in big data and reliability, and described engineering systems that can...
Article
Full-text available
The Poisson-binomial distribution is useful in many applied problems in engineering, actuarial science and data mining. The Poisson-binomial distribution models the distribution of the sum of independent but non-identically distributed random indicators whose success probabilities vary. In this paper, we extend the Poisson-binomial distribution to...
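For reference, the pmf of the ordinary Poisson-binomial distribution (the univariate case being generalized here) can be computed exactly by sequential convolution; the probabilities below are arbitrary:

```python
import numpy as np

def poisson_binomial_pmf(p):
    """Exact pmf of the sum of independent Bernoulli(p_i) indicators,
    computed by sequential convolution; pmf[k] = P(S = k)."""
    pmf = np.array([1.0])
    for pi in p:
        pmf = np.convolve(pmf, [1.0 - pi, pi])
    return pmf

print(poisson_binomial_pmf([0.1, 0.4, 0.7, 0.9]))   # probabilities for S = 0, ..., 4
```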
Article
Full-text available
Polymeric materials are widely used in many applications and are especially useful when combined with other polymers to make polymer composites. The appealing features of these materials come from their having comparable levels of strength and endurance to what one would find in metal alloys while being more lightweight and economical. However, the...
Article
Full-text available
Most of the recently developed methods on optimum planning for accelerated life tests (ALT) involve “guessing” values of parameters to be estimated, and substituting such guesses in the proposed solution to obtain the final testing plan. In reality, such guesses may be very different from true values of the parameters, leading to inefficient test p...
Article
Objective: To determine whether self-reports of disaster-related psychological distress predict older adults’ health care utilization during the year after Hurricane Sandy, which hit New Jersey on October 29, 2012. Methods: Respondents were from the ORANJ BOWL Study, a random-digit dialed sample from New Jersey recruited from 2006 to 2008. Medicare...
Article
Full-text available
Accelerated destructive degradation testing (ADDT) is a widely used technique for long-term material property evaluation. One area of application is in determining the thermal index (TI) of polymeric materials including thermoplastic, thermosetting, and elastomeric materials. There are two approaches to estimating a TI based on data collected from...
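For orientation, one widely used Arrhenius-based formulation defines the TI through a fitted log-linear life-temperature relationship; this generic form is shown only as background and may differ in detail from the two approaches compared in the paper:

$$
\log_{10}(t) \;=\; \beta_0 + \frac{\beta_1}{\mathrm{temp} + 273.16},
\qquad
\mathrm{TI} \;=\; \frac{\beta_1}{\log_{10}(t_d) - \beta_0} - 273.16,
$$

where temp is in degrees Celsius, $t$ is the time to reach a failure threshold, and $t_d$ is a target design life (for example, 100,000 hours).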
Article
When interpolating computing system performance data, there are many input parameters that must be considered. Therefore, the chosen multivariate interpolation model must be capable of scaling to many dimensions. The Delaunay triangulation is a foundational technique, commonly used to perform piecewise linear interpolation in computer graphics, phy...
Conference Paper
Full-text available
Each of high performance computing, cloud computing, and computer security have their own interests in modeling and predicting the performance of computers with respect to how they are configured. An effective model might infer internal mechanics, minimize power consumption, or maximize computational throughput of a given system. This paper analyze...
Article
Full-text available
Photodegradation, driven primarily by ultraviolet (UV) radiation, is the primary cause of failure for organic paints and coatings, as well as many other products made from polymeric materials exposed to sunlight. Traditional methods of service life prediction involve the use of outdoor exposure in harsh UV environments (e.g., Florida and Arizona)....
Chapter
Full-text available
Accelerated destructive degradation tests (ADDTs) are often used to collect necessary data for assessing the long-term properties of polymeric materials. Based on the data, a thermal index (TI) is estimated. The TI can be useful for material rating and comparisons. The R package ADDT provides the functionalities of performing the traditional method...
Chapter
Accelerated destructive degradation testing (ADDT) is a technique that is commonly used by industry to assess a material's long-term properties. In many applications, the accelerating variable is temperature. In such cases, a thermal index (TI) is used to indicate the strength of the material. For example, a TI of 200°C may be interpreted as the mate...
Article
There are annually over two million carloads of hazardous materials transported by rail in the United States. The American railroads use large blocks of tank cars to transport petroleum crude oil and other flammable liquids from production to consumption sites. Being different from roadway transport of hazardous materials, a train accident can pote...
Article
Full-text available
Medicaid 1915(c) waivers allow nursing home eligible older adults to avoid institutionalization by providing personal care services (PCS) in their homes. Prior studies have not determined how 1915(c) waiver recipients fare after a hurricane. We identified 26,015 New York (NY) Medicaid 1915(c) beneficiaries age 65 and older who received waiver servi...
Article
Reliability analysis of multicomponent repairable systems with dependent component failures is challenging for two reasons. First, the failure mechanism of one component may depend on other components when considering component failure dependence. Second, imperfect repair actions can have accumulated effects on the repaired components and these acc...
