Article

Bootstrap Methods: Another Look at the Jackknife

Authors:
B. Efron

Abstract

We discuss the following problem: given a random sample \(X = (X_1, X_2, \ldots, X_n)\) from an unknown probability distribution F, estimate the sampling distribution of some prespecified random variable R(X, F), on the basis of the observed data x. (Standard jackknife theory gives an approximate mean and variance in the case \(R(X, F) = \theta(\hat F) - \theta(F)\), θ some parameter of interest.) A general method, called the “bootstrap”, is introduced, and shown to work satisfactorily on a variety of estimation problems. The jackknife is shown to be a linear approximation method for the bootstrap. The exposition proceeds by a series of examples: variance of the sample median, error rates in a linear discriminant analysis, ratio estimation, estimating regression parameters, etc.
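The general recipe described in the abstract (resample from the empirical distribution, recompute the statistic, and study its variability) can be sketched in a few lines. The following Python snippet is a minimal illustration, not code from the paper, using the paper's own example of the variance of the sample median; the toy sample `x`, the number of replications `B`, and the random seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(25)        # observed sample of size n = 25 (toy data)

B = 2000                           # number of bootstrap replications
n = len(x)
medians = np.empty(B)
for b in range(B):
    # draw n observations with replacement from the empirical distribution F_hat
    resample = rng.choice(x, size=n, replace=True)
    medians[b] = np.median(resample)

# bootstrap estimates of the standard error and bias of the sample median
se_boot = medians.std(ddof=1)
bias_boot = medians.mean() - np.median(x)
print(f"bootstrap SE of the median: {se_boot:.3f}, bootstrap bias: {bias_boot:.3f}")
```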


... Our semantic preprocessing (SP) methodology considers class-based image features, derives a simple metric from these features (typically the mean brightness of each class area in this case), and then adjusts the image according to this feature-aware or semantic metric. By repeating this process iteratively for unknown, unlabeled data using the bootstrapping technique [35], an ADL snapshot ensemble can normalize unknown samples to the training set distribution, increasing prediction quality, mitigating overfitting issues, and improving the likelihood that evaluated samples will be accepted by the human expert. We demonstrate these gains on full-resolution whole-slide images with no downsizing or loss of detail with the use of patch-wise interpolation. ...
... Bootstrapped semantic preprocessing (BSP) takes advantage of the statistical phenomenon of bootstrapping [35]: When we take an SP-adjusted image and re-sample it with a trained model, then repeat for some number of iterations, we asymptotically converge toward a stable adjusted image and label that is almost always much higher quality than the Otsu approximation or initial model evaluations. Thus, we apply SP in 5 iterations: First, on the Otsu approximated segmentation. ...
Article
Full-text available
The progress of incorporating deep learning in the field of medical image interpretation has been greatly hindered due to the tremendous cost and time associated with generating ground truth for supervised machine learning, alongside concerns about the inconsistent quality of images acquired. Active learning offers a potential solution to these problems of expanding dataset ground truth by algorithmically choosing the most informative samples for ground truth labeling. Still, this effort incurs the costs of human labeling, which needs minimization. Furthermore, automatic labeling approaches employing active learning often exhibit overfitting tendencies while selecting samples closely aligned with the training set distribution and excluding out-of-distribution samples, which could potentially improve the model’s effectiveness. We propose that the majority of out-of-distribution instances can be attributed to inconsistent cross images. Since the FDA approved the first whole-slide image system for medical diagnosis in 2017, whole-slide images have provided enriched critical information to advance the field of automated histopathology. Here, we exemplify the benefits of a novel deep learning strategy that utilizes high-resolution whole-slide microscopic images. We quantitatively assess and visually highlight the inconsistencies within the whole-slide image dataset employed in this study. Accordingly, we introduce a deep learning-based preprocessing algorithm designed to normalize unknown samples to the training set distribution, effectively mitigating the overfitting issue. Consequently, our approach significantly increases the amount of automatic region-of-interest ground truth labeling on high-resolution whole-slide images using active deep learning. We accept 92% of the automatic labels generated for our unlabeled data cohort, expanding the labeled dataset by 845%. Additionally, we demonstrate expert time savings of 96% relative to manual expert ground-truth labeling.
... However, it requires additional measurement time to repeat the measurement multiple times. Therefore, this study applies a resampling technique called the bootstrap method [12] in order to estimate the statistical error from only a single measurement of reactor noise. ...
... For each trial number, count the number of successful trials in which the reference value lies within the range of the 95% bootstrap confidence interval. (10) Based on the result of step (9), estimate the coverage probability (i.e., the ratio of the "number of successful trials" to the "total number of trials"), where the statistical error is estimated by the bootstrap method [12]. (11) Finally, validate the calculated coverage probability against the reference value of 95%. ...
Conference Paper
The autocorrelation method is a subcriticality measurement technique based on the reactor noise analysis method. In this method, the prompt neutron decay constant can be obtained from the exponential decay of the autocorrelation of successively detected neutron counts in a target subcritical system. To simply estimate the statistical error of the prompt neutron decay constant, several measurements of the reactor noise are required, although the total measurement time is inevitably long for typical systems where the reactor noise analysis method is applied. Therefore, the purpose of this study is to investigate the applicability of the circular block bootstrap method in order to estimate the statistical error of the prompt neutron decay constant obtained by the autocorrelation method using a single reactor noise measurement. The bootstrap-based statistical error estimation method is validated using the time series data of reactor noise measurements at UTR-KINKI in a shutdown state with the inherent neutron source in the uranium-aluminum fuel. Consequently, this study demonstrates that the circular block bootstrap method can be also utilized for the autocorrelation method, to estimate the confidence interval of the prompt neutron decay constant as the statistical error. Namely, a single reactor noise measurement can be effectively reused for error estimation instead of multiple measurements.
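The circular block bootstrap mentioned above resamples whole blocks of the time series (wrapped circularly) rather than individual points, so that short-range correlation is preserved in each pseudo-sample. The sketch below is a hedged illustration of that resampling scheme only; the Poisson counts, block length, and variance-to-mean statistic are placeholders, not the paper's autocorrelation fit of the prompt neutron decay constant.

```python
import numpy as np

def circular_block_bootstrap(series, block_len, rng):
    """Resample a time series by concatenating fixed-length blocks drawn
    from the circularly wrapped series, preserving short-range correlation."""
    n = len(series)
    wrapped = np.concatenate([series, series[:block_len - 1]])
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n, size=n_blocks)
    blocks = [wrapped[s:s + block_len] for s in starts]
    return np.concatenate(blocks)[:n]

rng = np.random.default_rng(1)
counts = rng.poisson(5.0, size=4096)   # stand-in for successively detected neutron counts

B = 1000
stats = np.empty(B)
for b in range(B):
    resampled = circular_block_bootstrap(counts, block_len=64, rng=rng)
    # placeholder statistic; the paper instead fits a decay constant to the autocorrelation
    stats[b] = resampled.var() / resampled.mean()

lo, hi = np.percentile(stats, [2.5, 97.5])
print(f"95% bootstrap confidence interval of the statistic: [{lo:.3f}, {hi:.3f}]")
```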
... For all dependent measures, i.e., digit memory span forward and backward, arithmetic, and visuospatial reasoning accuracy (p), a residual bootstrap [45] ANOVA was run for hypothesis testing with CONDITION (2 levels: PC and HDMS) and SEX (2 levels: M and F) as between-subjects predictors. The same analysis was run also with the "immersivity questionnaire score" as the dependent measure. ...
Article
Full-text available
Virtual reality (VR) can be a promising tool to simulate reality in various settings, but the real impact of this technology on the human mental system is still unclear, as is how VR might (if at all) interfere with cognitive functioning. Using a computer, we can concentrate, enter a state of flow, and still maintain control over our surrounding world. In contrast, VR is a very immersive experience, which could challenge our ability to allocate divided attention to the environment to perform executive functioning tasks. This may also have a different impact on women and men, since gender differences in both executive functioning and the immersivity experience have been reported in the literature. The present study aims to investigate cognitive multitasking performance as a function of (1) virtual reality and computer administration and (2) gender differences. To explore this issue, subjects were asked to perform simultaneous tasks (span forward and backward, logical–arithmetic reasoning, and visuospatial reasoning) in virtual reality via a head-mounted display system (HDMS) and on a personal computer (PC). Our results showed an overall impairment of executive functioning in virtual reality but a better performance of women, compared to men, in visuospatial reasoning. These findings are consistent with previous studies showing a detrimental effect of virtual reality on cognitive functioning.
... The bootstrap concept was introduced by [32], and [33,34] then introduced descriptive methods to identify data influence for non-parametric computation. These methods allow the use of statistical inference without compromising the non-parametric nature of the problem; however, they require manual work, which often makes them impractical due to the amount and diversity of data handled, as in the case of this research (5563 observations, 5 inputs, and 4 outputs). ...
... The concept of the bootstrap was introduced by [32]. Still, its application with the DEA model was only presented by [37], where the bootstrap simulates a sample with the application of the original estimator, making the simulation results replicate the original sample through a data generation process (DGP), in a resampling process repeated several times. ...
Article
Full-text available
With the economic growth of the Brazilian agroindustry, it is necessary to evaluate the efficiency of this activity in relation to environmental demands for the country’s economic, social, and sustainable development. Within this perspective, the present research aims to examine the eco-efficiency of agricultural production in Brazilian regions, covering 5563 municipalities in the north, northeast, center-west, southeast, and south regions, using data from 2016–2017. In this sense, this study uses the DEA methods (classical and stochastic) and the computational bootstrap method to remove outliers and measure eco-efficiency. The findings lead to two fundamental conclusions: first, by emulating the benchmarks, it is feasible to increase annual revenue and preserved areas to an aggregated regional level by 20.84% while maintaining the same inputs. Given that no municipality has reached an eco-efficiency value equal to 1, there is room for optimization and improvement of production and greater sustainable development of the municipalities. Secondly, climatic factors notably influence eco-efficiency scores, suggesting that increasing temperatures and decreasing precipitation can positively impact eco-efficiency in the region. These conclusions, dependent on regional characteristics, offer valuable information for policymakers to design strategies that balance economic growth and environmental preservation. Furthermore, adaptive policies and measures can be implemented to increase the resilience of local producers and reduce vulnerability to changing climate conditions.
... 2.2.6 | Hierarchical nonparametric bootstrapping Nonparametric bootstrapping (Efron, 1979; Hesterberg, 2011) is a method consisting of resampling (with replacement) ...
... This single hierarchical resampling operation was replicated 10,000 times, the mean (or weighted mean, see Section 2.2.3) effect size being calculated on each bootstrap sample. As originally intended by Efron (1979), the median and 95th percentile confidence interval were computed from the hierarchically bootstrapped posterior distribution of the means (Adams et al., 1997). Hierarchical nonparametric bootstrapping was performed in R Studio 2022.07.2, based on the 'resample' function written by Shotwell (2012). ...
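The two-stage resampling described in this excerpt can be illustrated with a small sketch: studies are resampled with replacement, then trials within each selected study, and the mean effect size is recorded for every replicate. The study names, effect sizes, and unweighted mean below are hypothetical stand-ins for the paper's data and weighting scheme.

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical effect sizes grouped by study
studies = {
    "A": np.array([-0.62, -0.55, -0.71]),
    "B": np.array([-0.40, -0.48]),
    "C": np.array([-0.80, -0.75, -0.66, -0.70]),
}

def hierarchical_resample(groups, rng):
    """Resample studies with replacement, then trials within each chosen study."""
    names = list(groups)
    chosen = rng.choice(names, size=len(names), replace=True)
    values = [rng.choice(groups[g], size=len(groups[g]), replace=True) for g in chosen]
    return np.concatenate(values)

B = 10_000
means = np.array([hierarchical_resample(studies, rng).mean() for _ in range(B)])

median_effect = np.median(means)
ci_low, ci_high = np.percentile(means, [2.5, 97.5])
print(f"median effect {median_effect:.2f}, 95% percentile CI [{ci_low:.2f}, {ci_high:.2f}]")
```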
Article
Conservation farming practices are known for their capacity to mitigate runoff and erosion, but the magnitude of their effectiveness is highly variable across studies. In order to better understand the contribution of environmental and management factors to their effectiveness, up to 37 studies reporting 271 individual trials were collated for a quantitative review regarding 3 common conservation agriculture-related practices, at the plot scale and in a Western European context. Two different methods suitable for hierarchically structured data sets were used for the meta-analyses (hierarchical nonparametric bootstrapping and linear random effects models), yielding nearly identical average outcomes but differing in terms of confidence intervals. We found that, on average, winter cover crops reduce cumulative seasonal (autumn-winter) runoff by 68% and soil losses by 72% compared with a bare soil. The occurrence and intensity of stubble tillage on the control plot is a key explanatory variable for the mitigation effect of winter cover crops. In potato crops, tied ridging reduces cumulative seasonal (spring–summer) runoff by a mean of 70% and soil erosion by 92%. Conservation (non-inversion) tillage techniques alleviate cumulative seasonal overland flow by 27% and associated sediment losses by 66%, but strong evidence of publication bias was detected for this farming practice, probably leading to an overestimation of its effectiveness. These mitigation effects are shown to be much greater for spring crops than for winter crops, and to increase with time since ploughing was stopped. The type of conservation tillage scheme strongly affects the ability to attenuate surface flows. Intensive non-inversion tillage systems relying on repeated use of (powered) tillage operations appear to be the least effective for reducing both water and sediment losses. The best performing scheme against runoff would be a deep (non-inversion) tillage (−61%), while against erosion it would be a no-till system (−82%). Although several explanatory factors were identified, there remains a high (unexplained) variability between trials' effect sizes, thus not attributable to pure sampling variability. Meanwhile, this review provides farm advisors or policy makers with guidance on the contexts in which implementation of such conservation practices should be supported so as to maximize expected benefits.
... The unfolding was then applied to each of these spectra and the resulting statistical uncertainty in each bin was obtained from a covariance matrix corresponding to the ensemble. To cross-check the correlation of the statistical uncertainties of the bin contents of the jet spectrum, the statistical uncertainties were also evaluated using the Bootstrap method [65,66] and found to be consistent with the pseudo-random experiments. When calculating the ratio of jet cross sections, the spectra which appear in the numerator and denominator are from the same input data. ...
Article
Full-text available
Measurements of inclusive charged-particle jet production in pp and p-Pb collisions at center-of-mass energy per nucleon-nucleon collision \( \sqrt{s_{\textrm{NN}}} \) = 5.02 TeV and the corresponding nuclear modification factor \( {R}_{\textrm{pPb}}^{\textrm{ch}\ \textrm{jet}} \) are presented, using data collected with the ALICE detector at the LHC. Jets are reconstructed in the central rapidity region |ηjet| < 0.5 from charged particles using the anti-kT algorithm with resolution parameters R = 0.2, 0.3, and 0.4. The pT-differential inclusive production cross section of charged-particle jets, as well as the corresponding cross section ratios, are reported for pp and p-Pb collisions in the transverse momentum range 10 < \( {p}_{\textrm{T},\textrm{jet}}^{\textrm{ch}} \) < 140 GeV/c and 10 < \( {p}_{\textrm{T},\textrm{jet}}^{\textrm{ch}} \) < 160 GeV/c, respectively, together with the nuclear modification factor \( {R}_{\textrm{pPb}}^{\textrm{ch}\ \textrm{jet}} \) in the range 10 < \( {p}_{\textrm{T},\textrm{jet}}^{\textrm{ch}} \) < 140 GeV/c. The analysis extends the pT range of the previously-reported charged-particle jet measurements by the ALICE Collaboration. The nuclear modification factor is found to be consistent with one and independent of the jet resolution parameter with the improved precision of this study, indicating that the possible influence of cold nuclear matter effects on the production cross section of charged-particle jets in p-Pb collisions at \( \sqrt{s_{\textrm{NN}}} \) = 5.02 TeV is smaller than the current precision. The obtained results are in agreement with other minimum bias jet measurements available for RHIC and LHC energies, and are well reproduced by the NLO perturbative QCD Powheg calculations with parton shower provided by Pythia8 as well as by Jetscape simulations.
... The idea of bootstrapping lies in repeatedly resampling the sample data. This approach was first pioneered by Efron (1979), and since then bootstrap resampling has been widely used in many social sciences. ...
Article
Full-text available
OLS regression relies on a set of assumptions for its point and interval estimates to be unbiased and efficient. Data missing not at random (MNAR) can pose serious estimation issues in linear regression. In this study we evaluate the performance of OLS confidence interval estimates with MNAR data. We also suggest bootstrapping as a remedy for such data cases and compare the traditional confidence intervals against bootstrap ones. As we need to know the true parameters, we carry out a simulation study. Research results indicate that both approaches show similar results, with similar interval sizes. Given that the bootstrap requires a lot of computation, traditional methods are still recommended even in the case of MNAR.
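A simple way to picture the comparison made in this abstract is to place a normal-theory confidence interval for a regression slope next to a pairs-bootstrap percentile interval computed on the same data. The snippet below is a generic sketch with complete simulated data; it does not reproduce the study's MNAR mechanism or its simulation design.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=n)    # true slope = 2

# classical normal-theory 95% CI for the slope
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - 2)
se_slope = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
ci_classic = (beta[1] - 1.96 * se_slope, beta[1] + 1.96 * se_slope)

# pairs-bootstrap percentile 95% CI for the slope
B = 2000
slopes = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)                 # resample (x, y) pairs with replacement
    slopes[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0][1]
ci_boot = tuple(np.percentile(slopes, [2.5, 97.5]))

print("classical 95% CI:", ci_classic)
print("bootstrap 95% CI:", ci_boot)
```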
... To estimate the error of the computed critical values, we recommend using resampling methods such as the nonparametric bootstrap [4] or jackknife [5] instead of simple methods like binomial errors, in order to capture not only the statistical fluctuations in the number of pseudoexperiments that fall into a range of Δχ² values, but also the statistical fluctuations in their weights. [Figure caption: The red line indicates the error contribution from pseudoexperiments with y ≤ Y(x) < Δχ²_max (first term with A(y)), which is responsible for the exponential reduction of total uncertainty until the contribution from pseudoexperiments with y ≥ Δχ²_max (second term with B(y), shown by the green line) takes over for very high-CL critical values.] ...
Article
Full-text available
In various high-energy physics contexts, such as neutrino-oscillation experiments, several assumptions underlying the typical asymptotic confidence interval construction are violated, such that one has to resort to computationally expensive methods like the Feldman-Cousins method for obtaining confidence intervals with proper statistical coverage. By construction, the computation of intervals at high confidence levels requires fitting millions or billions of pseudoexperiments, while wasting most of the computational cost on overly precise intervals at low confidence levels. In this work, a simple importance sampling method is introduced that reuses pseudoexperiments produced for all tested parameter values in a single mixture distribution. This results in a significant error reduction on the estimated critical values, especially at high confidence levels, and simultaneously yields a correct interpolation of these critical values between the parameter values at which the pseudoexperiments were produced. The theoretically calculated performance is demonstrated numerically using a simple example from the analysis of neutrino oscillations. The relationship to similar techniques applied in statistical mechanics and p -value computations is discussed. Published by the American Physical Society 2024
... To validate the model, a bootstrap approach with 10,000 iterations was applied. 17 To visualise the performance of the multiple logistic regression model, a Receiver Operating Characteristic (ROC) curve was computed. Additionally, the area under the curve (AUC) was calculated to provide a single measure of the model's overall accuracy. ...
... (The VIM parameter takes non-negative values by construction; increasing the number of available predictors cannot reduce model performance on a population level.) Variance estimates for predictiveness were computed separately on the two test folds using the nonparametric bootstrap [35] with 500 bootstrap replicates, resampled at the patient level. The variance of the VIM estimator was computed as the sum of variance estimates constructed from the two independent test folds [23]. ...
Preprint
Full-text available
Objective: Self-harm risk prediction models developed using health system data (electronic health records and insurance claims information) often use patient information from up to several years prior to the index visit when the prediction is made. Measurements from some time periods may not be available for all patients. We study the predictive potential of variables corresponding to different time horizons prior to the index visit. Materials and Methods: We use variable importance to quantify the potential of recent (up to three months before the index visit) and distant (more than one year before the index visit) patient mental health information for predicting self-harm risk using data from seven health systems. We quantify importance as the decrease in predictiveness when the variable set of interest is excluded from the prediction task. We define predictiveness using discriminative metrics: area under the receiver operating characteristic curve (AUC), sensitivity, and positive predictive value. Results: Mental health predictors corresponding to the three months prior to the index visit show strong signal of importance; in one setting, excluding these variables decreased AUC from 0.85 to 0.77. Predictors corresponding to more distant information were less important. Discussion: Predictors from the months immediately preceding the index visit are highly important. Implementation of self-harm prediction models may be challenging in settings where recent data are not completely available (e.g., due to lags in insurance claims processing) at the time a prediction is made. Conclusion: Clinically derived variables from different time frames exhibit varying levels of importance for predicting self-harm.
... We estimate the H I score PDF uncertainties using bootstrapping (Efron 1979). Because all score metrics are evaluated based on the H I data from N08, we apply bootstrapping by randomly discarding 20% of all the H I data, then drawing a random sample with replacement until the number of draws in each bootstrap realization equals the total number of H I elements in the original data set. ...
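The resampling recipe quoted above (randomly discard 20% of the data, then redraw with replacement back to the original sample size) is easy to sketch. The scores, the summary statistic (a simple mean), and the number of realizations below are placeholders rather than the authors' H I score PDF machinery.

```python
import numpy as np

rng = np.random.default_rng(4)
scores = rng.normal(loc=1.0, scale=0.3, size=500)   # stand-in for per-element H I scores

B = 1000
n = len(scores)
boot_means = np.empty(B)
for b in range(B):
    keep = rng.choice(n, size=int(0.8 * n), replace=False)  # randomly discard 20% of the data
    kept = scores[keep]
    resampled = rng.choice(kept, size=n, replace=True)      # redraw with replacement to the original size
    boot_means[b] = resampled.mean()                         # placeholder summary statistic

print("bootstrap uncertainty of the mean score:", boot_means.std(ddof=1))
```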
Article
Full-text available
We present a new technique to identify associations of H i emission in the Magellanic Stream (MS) and ultraviolet (UV) absorbers from 92 QSO sight lines near the MS. We quantify the level of associations of individual H i elements to the main H i body of the Stream using Wasserstein distance-based models, and derive characteristic spatial and kinematic distances of the H i emission in the MS. With the emission-based model, we further develop a comparison metric, which identifies the dominant associations of individual UV absorbers with respect to the MS and nearby galaxies. For ionized gas associated with the MS probed by C ii , C iv , Si ii , Si iii , Si iv , we find that the ion column densities are generally ∼0.5 dex higher than those that are not associated, and that the gas is more ionized toward the tail of the MS as indicated by the spatial trend of the C ii /C iv ratios. For nearby galaxies, we identify potential new absorbers associated with the circumgalactic medium of M33 and NGC 300, and affirm the associations of absorbers with IC 1613 and the Wolf–Lundmark–Mellote galaxy. For M31, we find the previously identified gradient in column densities as a function of the impact parameter, and that absorbers with higher column densities beyond M31's virial radius are more likely to be associated with the MS. Our analysis of absorbers associated with the Magellanic Clouds reveals the presence of continuous and blended diffuse ionized gas between the Stream and the Clouds. Our technique can be applied to future applications of identifying associations within physically complex gaseous structures.
... Also, to mitigate the effect of multicollinearity in different situations many researchers have employed RR in Beta regression [22], Gaussian linear model [23], Logistic regression [24], Poisson regression [25], Tobit regression [26], to mention but a few. The bootstrap, introduced by [27], is a generic statistical method for assessing the accuracy of an estimator. The core mechanism is based on the concept of random sampling with replacement. ...
Article
Full-text available
Bootstrap is a simple, yet powerful method of estimation based on the concept of random sampling with replacement. The ridge regression using a biasing parameter has become a viable alternative to the ordinary least squares regression model for the analysis of data where predictors are collinear. This paper develops a nonparametric bootstrap-quantile approach for the estimation of the ridge parameter in the linear regression model. The proposed method is illustrated using some popular and widely used ridge estimators, but this idea can be extended to any ridge estimator. Monte Carlo simulations are carried out to compare the performance of the proposed estimators with their baseline counterparts. It is demonstrated empirically that the MSEs obtained from our suggested bootstrap-quantile approach are substantially smaller than those of the baseline estimators, especially when collinearity is high. Application to real data sets reveals the suitability of the idea.
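One way to read the bootstrap-quantile idea is: recompute a baseline ridge-parameter estimator on many bootstrap resamples and take a chosen quantile of the resulting values as the working ridge parameter. The sketch below assumes a Hoerl-Kennard-Baldwin-type baseline estimator and the median as the quantile; the paper's specific estimators and quantile levels are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 100, 4
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)        # induce strong collinearity
beta_true = np.array([1.0, 1.0, 0.5, -0.5])
y = X @ beta_true + rng.normal(size=n)

def hkb_ridge_k(X, y):
    """Hoerl-Kennard-Baldwin-type ridge parameter k = p * sigma^2 / (b'b)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return X.shape[1] * sigma2 / (b @ b)

B = 1000
ks = np.empty(B)
for i in range(B):
    idx = rng.integers(0, n, size=n)                  # resample cases with replacement
    ks[i] = hkb_ridge_k(X[idx], y[idx])

k_boot = np.quantile(ks, 0.5)       # the quantile level is a tuning choice, not taken from the paper
beta_ridge = np.linalg.solve(X.T @ X + k_boot * np.eye(p), X.T @ y)
print("bootstrap-quantile ridge parameter:", k_boot)
print("ridge coefficients:", beta_ridge)
```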
... 20µPa RMS), based on the echo detection threshold of vespertilionid bats [48]. We estimated the error of the Lombard response magnitude by bootstrapping [49] the calculation per light conditions (1000 reps × 100 pseudo samples) to calculate 95% quantiles. Additionally, we estimated the magnitude of SL compensation at different stages in the approach by binning range data logarithmically against SL and finding the magnitude of SL compensation to range (magnitude × log 10 (range)) per bin using linear regression (magnitude × log 10 (range) + intercept). ...
Article
Full-text available
Most bats hunt insects on the wing at night using echolocation as their primary sensory modality, but nevertheless maintain complex eye anatomy and functional vision. This raises the question of how and when insectivorous bats use vision during their largely nocturnal lifestyle. Here, we test the hypothesis that the small insectivorous bat, Myotis daubentonii, relies less on echolocation, or dispenses with it entirely, as visual cues become available during challenging acoustic noise conditions. We trained five wild-caught bats to land on a spherical target in both silence and when exposed to broad-band noise to decrease echo detectability, while light conditions were manipulated in both spectrum and intensity. We show that during noise exposure, the bats were almost three times more likely to use multiple attempts to solve the task compared to in silent controls. Furthermore, the bats exhibited a Lombard response of 0.18 dB/dBnoise and decreased call intervals earlier in their flight during masking noise exposures compared to in silent controls. Importantly, however, these adjustments in movement and echolocation behaviour did not differ between light and dark control treatments showing that small insectivorous bats maintain the same echolocation behaviour when provided with visual cues under challenging conditions for echolocation. We therefore conclude that bat echolocation is a hard-wired sensory system with stereotyped compensation strategies to both target range and masking noise (i.e. Lombard response) irrespective of light conditions. In contrast, the adjustments of call intervals and movement strategies during noise exposure varied substantially between individuals indicating a degree of flexibility that likely requires higher order processing and perhaps vocal learning.
... This study used bootstrapping to examine if self-efficacy is a mediator in the association between maternal stress and quality of life. Efron (1992) developed bootstrapping, a nonparametric resampling procedure. A nonparametric approach is highly recommended for small sample sizes due to its lack of dependency on assumptions of normality. ...
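Bootstrapping a mediation effect typically means resampling cases with replacement, re-estimating the indirect (product-of-coefficients) effect on each resample, and reading a percentile confidence interval off the resulting distribution. The following sketch uses simulated stress, self-efficacy, and quality-of-life scores and plain OLS paths; it illustrates the generic procedure, not the authors' CB-SEM analysis.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 181
stress = rng.normal(size=n)
self_eff = -0.4 * stress + rng.normal(scale=0.9, size=n)                 # path a (toy)
qol = 0.5 * self_eff - 0.3 * stress + rng.normal(scale=0.9, size=n)      # paths b and c' (toy)

def indirect_effect(x, m, y):
    """Product-of-coefficients estimate a*b from two OLS fits."""
    a = np.polyfit(x, m, 1)[0]                        # slope of m on x
    Xb = np.column_stack([np.ones_like(x), m, x])
    b = np.linalg.lstsq(Xb, y, rcond=None)[0][1]      # slope of y on m, controlling for x
    return a * b

B = 5000
ab = np.empty(B)
for i in range(B):
    idx = rng.integers(0, n, size=n)                  # resample cases with replacement
    ab[i] = indirect_effect(stress[idx], self_eff[idx], qol[idx])

lo, hi = np.percentile(ab, [2.5, 97.5])
print(f"indirect effect, 95% bootstrap CI: [{lo:.3f}, {hi:.3f}]")
```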
Article
Full-text available
Background: Mothers who have children with autism encounter significant difficulties in caring for their autistic youngsters, leading to higher stress levels and a reduced overall quality of life. External or internal factors can cause and respond to stress, affecting an individual’s physical, psychological, and emotional health. Thus, it is crucial to examine the quality of life of mothers with autistic children. Objectives: This study aimed to investigate the relationships between stress, self-efficacy, and quality of life (QoL) in mothers of children with autism. Methods: A cross-sectional study design was used. Self-administered questionnaires were distributed from October to November 2019 to mothers with autistic children using cluster sampling techniques to capture their demographics and perceptions of stress, self-efficacy, and QoL. The data analysis was performed using covariance-based structural equation modeling (CB-SEM). Results: Of the 290 questionnaires distributed, 238 (response rate of 82%) sets were returned, but only 181 questionnaires were usable for further analysis. The findings demonstrated a notable impact of stress and self-efficacy on quality of life and an adverse effect of stress on self-efficacy. Self-efficacy serves as an intermediary in the relationship between stress and quality of life. Conclusion: In general, mothers of autistic children typically face moderate stress levels, but they have low levels of self-efficacy and quality of life. Mothers of children with autism need assistance and support from healthcare professionals, such as doctors, nurses, and psychiatrists, so that they can bear the challenges of raising children with special needs and enjoy a higher standard of living with less emotional and physical strain.
... We quantified the intrinsic OPm (OP per unit aerosol mass; DTTm, AAm, and DCFHm) of the sources of OA and elements using a multiple linear regression model (Supplementary Table 3) and evaluated the OPm uncertainty using a bootstrapping technique 33 where input matrices were obtained by randomly resampling the rows of the original data to create new matrices containing multiple entries of some rows and omitting other rows. Supplementary Fig. 2 shows the seasonal variations of both OPm and OPv for all 3 assays (DCFH, DTT and AA) used in this study. ...
Article
Full-text available
The oxidative potential (OP) of particulate matter (PM) is a major driver of PM-associated health effects. In India, the emission sources defining PM-OP, and their local/regional nature, are yet to be established. Here, to address this gap we determine the geographical origin, sources of PM, and its OP at five Indo-Gangetic Plain sites inside and outside Delhi. Our findings reveal that although uniformly high PM concentrations are recorded across the entire region, local emission sources and formation processes dominate PM pollution. Specifically, ammonium chloride, and organic aerosols (OA) from traffic exhaust, residential heating, and oxidation of unsaturated vapors from fossil fuels are the dominant PM sources inside Delhi. Ammonium sulfate and nitrate, and secondary OA from biomass burning vapors, are produced outside Delhi. Nevertheless, PM-OP is overwhelmingly driven by OA from incomplete combustion of biomass and fossil fuels, including traffic. These findings suggest that addressing local inefficient combustion processes can effectively mitigate PM health exposure in northern India.
... Note that the subsampling technique is closely related to the idea of the bootstrap (Efron, 1979; Bickel and Freedman, 1981). However, the classical full-size bootstrap is often computationally too expensive for massive data analysis. ...
... Many books (e.g., Davison and Hinkley 2009; Efron 1982; Efron and Tibshirani 1993; Hall 1997; Mammen 1992; Shao and Tu 1995) and innumerable papers have been written on bootstrap since the pioneering work of Efron (1979). Various modifications of the idea of Efron's bootstrap have been proposed in the literature, and their applicability boundaries delineated in the form of conditions. ...
Article
The tail conditional allocation plays an important role in a number of areas, including economics, finance, insurance, and management. Fixed-margin confidence intervals and the assessment of their coverage probabilities are of much interest. In this paper, we offer a convenient way to achieve these goals via resampling. The theoretical part of the paper, which is technically demanding, is rigorously established under minimal conditions to facilitate the widest practical use. A simulation-based study and an analysis of real data illustrate the performance of the developed methodology.
... (1) to the N1m-peak amplitudes may raise concerns about the quality of the estimates of the fitting parameters. Hence, to gauge the robustness of our results, we generated surrogate waveforms at each SOI by applying the non-parametric bootstrap technique (Bezanson et al., 2017; Efron, 1979; Sieluzycki et al., 2021). This approach yields information about statistical inferences such as median or mean and confidence intervals (CIs) without the restriction that the data are normally distributed or homoscedastic. ...
Preprint
Adaptation is the attenuation of a neuronal response when a stimulus is repeatedly presented. The phenomenon has been linked to sensory memory, but its exact neuronal mechanisms are under debate. One defining feature of adaptation is its lifetime, that is, the timespan over which the attenuating effect of previous stimulation persists. This can be revealed by varying the stimulus-onset interval (SOI) of the repeated stimulus. As SOI is increased, the peak amplitude of the response grows before saturating at large SOIs. The rate of this growth can be quantified and used as an estimate of adaptation lifetime. Here, we studied whether adaptation lifetime varies across the left and the right auditory cortex of the human brain. Event-related fields of whole-head magnetoencephalograms (MEG) were measured in 14 subjects during binaural presentation of pure tone stimuli. To make statistical inferences on the single-subject level, additional event-related fields were generated by resampling the original single-trial data via bootstrapping. For each hemisphere and SOI, the peak amplitude of the N1m response was then derived from both original and bootstrap-based data sets. Finally, the N1m peak amplitudes were used for deriving subject- and hemisphere-specific estimates of adaptation lifetime. Comparing subject-specific adaptation lifetime across hemispheres, we found a significant difference, with longer adaptation lifetimes in the left than in the right auditory cortex (p = 0.004). This difference might have a functional relevance in the context of temporal binding of auditory stimuli, leading to larger integration time windows in the left than in the right hemisphere.
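The trial-level resampling described here can be sketched as follows: single trials are drawn with replacement, averaged into a surrogate event-related waveform, and the peak amplitude is extracted from each surrogate; percentiles of those peaks then give confidence intervals without normality assumptions. The toy waveform, noise level, and trial count below are arbitrary stand-ins for the MEG data.

```python
import numpy as np

rng = np.random.default_rng(7)
n_trials, n_samples = 120, 300
t = np.linspace(-0.1, 0.5, n_samples)
template = -np.exp(-((t - 0.1) ** 2) / (2 * 0.02 ** 2))     # toy N1m-like deflection
trials = template + rng.normal(scale=0.8, size=(n_trials, n_samples))

B = 1000
peaks = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n_trials, size=n_trials)   # resample single trials with replacement
    surrogate = trials[idx].mean(axis=0)             # surrogate event-related waveform
    peaks[b] = surrogate.min()                       # peak (most negative) amplitude

lo, hi = np.percentile(peaks, [2.5, 97.5])
print(f"bootstrap 95% CI of the peak amplitude: [{lo:.2f}, {hi:.2f}]")
```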
... In our case, for a concrete cell of the grid, we have the sample of magnitudes with a size k, consisting of the k nearest neighbors (y_1, …, y_k), and the maximum likelihood estimate (6) obtained from this sample. As an average, we take this maximum likelihood estimate; to find its standard deviation, an original method based on the idea of the statistical bootstrap (Efron, 1979) is proposed. The sample (y_1, …, y_k) undergoes a random permutation; then, the sample obtained is divided into 2 equal parts (y_1, …, y_{k/2}) and (y_{k/2+1}, …, y_k). ...
Article
This study is devoted to the application of some new statistical methods to analysis of the spatial structure of the seismic field in the seismically active region of Japan bounded by the following coordinates: 28°–50° north latitude, 130°–150° east longitude. The seismic flux is estimated by using the k nearest neighbors method for the magnitude interval m ≥ 5.2. The highest values of intensity, about 10⁻⁴ 1/(year·km²), are located at depths of down to 100 km and manifest themselves in the neighborhood of the Tohoku megathrust earthquake. The spatial resolution of the intensity estimates ranges from 33–50 km in the regions with a high intensity to 100 km and larger in the zones of weak intensity. It has been shown that the seismic field parameters – intensity λ, slope of the graph of repetition β, maximum possible magnitude m1 – have different scales of their spatial variability and, thus, it is necessary to apply different scales of spatial averaging to them. Based on the Gutenberg–Richter truncated distribution model, estimates are obtained for the slope of the graph of repetition (b-value) and the upper boundary of the distribution m1. An original method is proposed for determining the optimal averaging radius for an arbitrary cell of the space grid. The method is based on the use of the statistical coefficient of variation of the corresponding parameter. For the considered region, the estimate of the maximum possible magnitude Mmax = 9.60 ± 0.41 was obtained with consideration of the correction for bias.
... Resampling methodologies like the jackknife (Quenouille, 1956;Tukey, 1958), the generalized jackknife (GJ) (Gray & Schucany, 1972) and the bootstrap (Efron, 1979) have been used in statistical EVT for the estimation of parameters of extreme events, among which I mention the EVI, in (2.2), and the EI, in (3.5). These methodologies have frequently answered positively to the question whether the combination of information can improve the quality of estimators of parameters/functionals, a discussion that can be seen in Gomes (1995a), a paper associated with a presentation at the "II Congresso Anual da SPE" in 1994, and that initiated the study of reduced-bias (RB) estimators of parameters of extreme events and of threshold selection in statistical EVT through resampling techniques. ...
Preprint
Full-text available
The Portuguese School of Extremes and Applications (PORTSEA) is nowadays well recognized by the international scientific community, and in my opinion, the organization of a NATO Advanced Study Institute on Statistical Extremes and Applications, which took place at Vimeiro in the summer of 1983, was a landmark for the international recognition of the group and the launching of the PORTSEA. The dynamic of publication has been very high and the topics under investigation in the area of Extremes have been quite diverse. In this article, attention will be paid essentially to some of the scientific achievements of the author in this field, but apart from a large group, where the author is included, working in the area of Parametric, Semi-parametric and Non-parametric Estimation of Parameters of Rare Events, the PORTSEA has strong groups in Univariate, Multivariate, Multidimensional, Spatial Extremes and Applications to Dynamical Systems, Environment, Finance and Insurance, among others. We thus think that the dynamism of the Group will provide a healthy growth of the field, with a high international recognition of Extremes in Portugal, a country of 'good extremists' in an extreme of Europe.
... Further, linear regression analysis was performed to explore whether select demographic and clinical covariates were significantly correlated with the annual Cobb angle progression rate. Standard errors and CIs of the coefficients were computed using the bootstrap estimation with 1000 replications [23][24][25]. The threshold value for statistical significance was set at p < 0.05. ...
... We acknowledge that such a development ignores the error of the spline approximation but believe that it empirically provides a sufficiently good approximation for estimation and inference (Huang and Liu 2006). Because the analytic form of the standard error (SE) is intractable to compute directly, we use a bootstrap resampling method (Efron 1992) to construct 95% confidence intervals (CI) of the parameters in our simulation study and data application. ...
Article
Full-text available
In studies with time-to-event outcomes, multiple, inter-correlated, and time-varying covariates are commonly observed. It is of great interest to model their joint effects by allowing a flexible functional form and to delineate their relative contributions to survival risk. A class of semiparametric transformation (ST) models offers flexible specifications of the intensity function and can be a general framework to accommodate nonlinear covariate effects. In this paper, we propose a partial-linear single-index (PLSI) transformation model that reduces the dimensionality of multiple covariates into a single index and provides interpretable estimates of the covariate effects. We develop an iterative algorithm using the regression spline technique to model the nonparametric single-index function for possibly nonlinear joint effects, followed by nonparametric maximum likelihood estimation. We also propose a nonparametric testing procedure to formally examine the linearity of covariate effects. We conduct Monte Carlo simulation studies to compare the PLSI transformation model with the standard ST model and apply it to NYU Langone Health de-identified electronic health record data on COVID-19 hospitalized patients’ mortality and a Veteran’s Administration lung cancer trial.
... A list of the identified braconid genera was compiled. A comparative analysis of the richness and diversity of the captured wasp community was performed, which included: 1) comparing the richness and composition of braconids captured in the pan traps of different colors; 2) estimating the expected richness for each color and for the four colors combined using the nonparametric Chao 2 estimator and the Jackknife 1 resampling technique (Magurran 2004); the estimates were improved with a bootstrap resampling procedure with 1,000 permutations to decrease bias and reduce the variation due to the small number of observations (Efron 1979); 3) estimating the diversity captured per color with the nonparametric Shannon index, whose results were compared with a t-test modified by Hutchenson (Zar 2005). To establish whether the observed differences in capture are an effect of attraction or of sample size, the rarefaction method was applied to the captures per trap color. ...
Article
Full-text available
Introduction: In entomology, the use of passive and active traps to collect insects of economic or scientific interest has become widespread. Passive traps, namely Malaise and pitfall traps, capture insects at random, whereas active traps such as color, light, and baited traps do so more selectively through the use of specific attractants such as odor, color, and design shape (Mazón and Bordera 2008). Trap designs vary depending on the type of insect to be captured as well as on the purpose of the trapping system. For this reason, plastic devices of various designs (delta-type, cylindrical, flat, invaginated-bottom, and cone-type) have been used as traps (Barrera et al. 2006). These traps are more attractive when combined with different colors because insects can detect different wavelengths of light (Li et al. 2012). Hoback et al. (1999) published a list of 67 families of insects that ... Influencia del color y altura de platos-trampa en la captura de bracónidos (Hymenoptera: Braconidae) / Influence of color and height of pan traps to capture braconids (Hymenoptera: Braconidae). Abstract: The aim of this study was to determine the effectiveness of capturing braconids (Hymenoptera: Braconidae) on yellow, blue, cream, and green pan traps at two heights (0 and 90 cm) for a period of four weeks in the Natural Protected Area "Cerro Punhuato" (Morelia, Michoacan, Mexico). The reflectance of each color was measured with a field analytical spectrophotometer. One hundred four (104) specimens, belonging to 14 subfamilies and 28 genera, were collected. Yellow and green pan traps caught the largest number of specimens and the greatest diversity of genera, showing 56.5 % similarity, as estimated with the Bray-Curtis index. Yellow pan traps installed at ground level significantly exceeded the capture of green pan traps in one week. The level of reflectance of the green and yellow pan traps was very similar in the range from 360 to 530 nm, possibly indicating that this is the wavelength range in which braconids are attracted to both colors. Keywords: Wasps. Reflectance. Diversity. Capture efficiency.
... However, uncertainty estimates are not immediately available as regular statistical inference neglects any variance regarding class assignments, yielding biased results (Grün and Leisch, 2008). This issue has been addressed for ordinary mixture models in Basford et al. (1997) or O'Hagan et al. (2019) by employing resampling techniques like various bootstrapping routines (Efron, 1979) or the jackknife (Quenouille, 1956). In the case of mixture regression models, Grün and Leisch (2004) and Hennig (2000) already used bootstrapping to detect identifiability issues of fitted mixture regression models, and Turner (2000) applied resampling to the choice of the number of components in a mixture of regressions. ...
Preprint
Full-text available
Finite Mixture Regression is a popular approach for regression settings with present but unobserved sub-populations. Over the past decades an extensive toolbox has been developed covering various kinds of distributions and effect types. As for any other thorough statistical analysis, reporting of, e.g., confidence intervals for the parameters of the latent models is of high practical relevance. However, standard theory neglects the additional variability arising from also estimating class assignments, which consequently leads to the corresponding uncertainty estimates usually being too optimistic. In this work we propose a resampling technique for finite mixture regression models to construct confidence intervals for the regression coefficients in order to hold the type-I error threshold. The mechanism relies on bootstrapping, which already proved useful for different problems arising for mixture models in the past, and is evaluated via various simulation studies and two real world applications. Overall, the routine successfully holds the type-I error threshold and is less computationally expensive than alternative approaches.
Article
The objective of this work is to estimate bootstrap confidence intervals for the proportions of factors associated with burials due to COVID-19 in the general cemetery of Riobamba, Ecuador, over the period March 2020 – April 2021. The bootstrap method consists of resampling with replacement, that is, obtaining samples through some random procedure that uses the original sample. In this way, the groups vulnerable to burial due to COVID-19 were identified by sex, age, and age according to sex. Regarding the vulnerability analysis of the groups, with respect to sex, the most vulnerable group is males. In turn, within the age intervals determined by life cycles, the most vulnerable group was found to be the one known as older adults, that is, people over 60 years of age, for both males and females.
Article
Parameters in the Ricker models of stock and recruitment of yellowtail Seriola quinqueradiata considering the effects of capturing juveniles were estimated using time series of number of recruits over 1994–2020, spawning stock biomass (SSB) and commercial catch number for juveniles under special permit for aquaculture. Equations expressing the relation between stock and recruitment are derived from the differential equation describing the density-dependent mortality process of juveniles with capture, and the catch equation for juveniles is derived using the equation. The unknown parameters are estimated by maximizing the log-likelihood with the above data. The estimations showed that the estimate of natural mortality coefficient averaged over 1994–1996 was 0.119 (per month), that over 2018–2020 was 0.310, and the estimate gradually increased. The annual estimate of fishing mortality coefficient for juveniles gradually decreased, whereas that of the natural coefficient rapidly increased. The current reduction of fishing coefficient restored the level of the expected number of recruits up to 90% of that without capture. Use of the model and its modification are discussed.
Article
The amount of social support partners provide and receive in romantic relationships is important for psychological well-being. But in what sense exactly? Divergent and highly nuanced hypotheses exist in the literature. We explicitly spelled out these hypotheses, specified a statistical model for each using response surface analyses, and simultaneously tested which model had the most empirical support. We analyzed data from more than 16,000 participants and investigated how the amount of social support relates to relationship satisfaction (of participants themselves and partners) and self-esteem (of participants themselves). For participants’ own relationship satisfaction, models postulating that more provided and received social support is linked to higher satisfaction had the most empirical support. For partners’ relationship satisfaction and participants’ self-esteem, models that also take partners’ (dis)similarity in supportiveness into account received support. In total, the absolute amount of support seems to generally matter and, in some cases, partners’ (dis)similarity seems relevant.
Chapter
The success of Machine Learning benefits from the availability of various numerical techniques and specialized technical concepts that help to make the training process efficient and effective. In many cases, it is even those techniques and technical aspects that make a problem solvable in the first place. This chapter introduces some of these techniques and covers topics ranging from feature engineering and feature importance, training, testing, and cross-validation, to the concepts of baseline models.
Article
Full-text available
Alchemical binding free energy calculations are one of the most accurate methods for estimating ligand-binding affinity. Assessing the accuracy of the approach across protein targets is one of the most interesting issues. The free energy difference of binding between a protein and a ligand was calculated via the alchemical approach. The alchemical approach exhibits satisfactory accuracy over four targets, including AmpC beta-lactamase (AmpC); glutamate receptor, ionotropic kainate 1 (GluK1); heat shock protein 90 (Hsp90); and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease (Mpro). In particular, the correlation coefficients between calculated binding free energies and the respective experiments over the four targets range from 0.56 to 0.86. The affinity computed via free energy perturbation (FEP) simulations is overestimated relative to the experimental value. Particularly, the electrostatic interaction free energy rules the binding process of ligands to AmpC and GluK1. However, the van der Waals (vdW) interaction free energy plays an important role in the ligand-binding processes of HSP90 and SARS-CoV-2 Mpro. The obtained results are associated with the hydrophilic or hydrophobic properties of the ligands. This observation may enhance computer-aided drug design.
Article
DNA barcoding has largely established itself as a mainstay for rapid molecular taxonomic identification in both academic and applied research. The use of DNA barcoding as a molecular identification method depends on a “DNA barcode gap”—the separation between the maximum within-species difference and the minimum between-species difference. Previous work indicates the presence of a gap hinges on sampling effort for focal taxa and their close relatives. Furthermore, both theory and empirical work indicate a gap may not occur for related pairs of biological species. Here, we present a novel evaluation approach in the form of an easily calculated set of nonparametric metrics to quantify the extent of proportional overlap in inter- and intraspecific distributions of pairwise differences among target species and their conspecifics. The metrics are based on a simple count of the number of overlapping records for a species falling within the bounds of maximum intraspecific distance and minimum interspecific distance. Our approach takes advantage of the asymmetric directionality inherent in pairwise genetic distance distributions, which has not been previously done in the DNA barcoding literature. We apply the metrics to the predatory diving beetle genus Agabus as a case study because this group poses significant identification challenges due to its morphological uniformity despite both relative sampling ease and well-established taxonomy. Results herein show that target species and their nearest neighbor species were found to be tightly clustered and therefore difficult to distinguish. Such findings demonstrate that DNA barcoding can fail to fully resolve species in certain cases. Moving forward, we suggest the implementation of the proposed metrics be integrated into a common framework to be reported in any study that uses DNA barcoding for identification. In so doing, the importance of the DNA barcode gap and its components for the success of DNA-based identification using DNA barcodes can be better appreciated.
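The proposed overlap counts can be computed directly from the pairwise distance distributions of a target species: how many intraspecific distances reach or exceed the minimum interspecific (nearest-neighbour) distance, and how many interspecific distances fall at or below the maximum intraspecific distance. The helper below is a hedged sketch of that counting step with made-up distances; it is not the authors' released code, and the full set of metrics from the paper is not reproduced.

```python
import numpy as np

def overlap_counts(intra, inter):
    """Count records that blur the barcode gap for one target species:
    intraspecific distances at or above the minimum interspecific distance,
    and interspecific distances at or below the maximum intraspecific distance."""
    intra = np.asarray(intra)
    inter = np.asarray(inter)
    max_intra = intra.max()
    min_inter = inter.min()
    n_intra_overlap = int((intra >= min_inter).sum())
    n_inter_overlap = int((inter <= max_intra).sum())
    return n_intra_overlap, n_inter_overlap, max_intra, min_inter

# hypothetical pairwise distances for one species and its nearest neighbours
intra = [0.002, 0.004, 0.011, 0.015]
inter = [0.012, 0.018, 0.025]
print(overlap_counts(intra, inter))
```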
Article
Full-text available
Assessing the individual risk of Major Adverse Cardiac Events (MACE) is of major importance as cardiovascular diseases remain the leading cause of death worldwide. Quantitative Myocardial Perfusion Imaging (MPI) parameters such as stress Myocardial Blood Flow (sMBF) or Myocardial Flow Reserve (MFR) constitute the gold standard for prognosis assessment. We propose a systematic investigation of the value of Artificial Intelligence (AI) to leverage [⁸²Rb] Silicon PhotoMultiplier (SiPM) PET MPI for MACE prediction. We establish a general pipeline for AI model validation to assess and compare the performance of global (i.e. average of the entire MPI signal), regional (17 segments), radiomics and Convolutional Neural Network (CNN) models leveraging various MPI signals on a dataset of 234 patients. Results showed that all regional AI models significantly outperformed the global model (p < 0.001), where the best AUC of 73.9% (CI 72.5–75.3) was obtained with a CNN model. A regional AI model based on MBF averages from 17 segments fed to a Logistic Regression (LR) constituted an excellent trade-off between model simplicity and performance, achieving an AUC of 73.4% (CI 72.3–74.7). A radiomics model based on intensity features revealed that the global average was the least important feature when compared to other aggregations of the MPI signal over the myocardium. We conclude that AI models can allow better personalized prognosis assessment for MACE.
Article
Based on the tenets of self-determination theory, a dual-process model of motivational processes was tested to predict accelerometer-assessed estimates of adolescents’ light physical activity (LPA), moderate to vigorous physical activity (MVPA), and sedentary time. Here, we hypothesized that (a) perceptions of psychological need support for exercise would be positively associated with LPA and MVPA and negatively associated with sedentary time via exercise-related psychological need satisfaction and autonomous exercise motivation and (b) perceptions of psychological need thwarting for exercise would be negatively associated with LPA and MVPA and positively associated with sedentary time via exercise-related psychological need frustration and controlled exercise motivation. Adolescents (N = 338; 234 female) aged 11–15 years (M = 12.75, SD = 0.90) wore an ActiGraph accelerometer for 8 days and completed questionnaires pertaining to the self-determination-theory variables. Results showed psychological need support to indirectly and positively predict LPA and MVPA via psychological need satisfaction and autonomous exercise motivation. Although directly predictive of need frustration and indirectly predictive of controlled motivation and amotivation, the hypothesized effects from psychological need thwarting to the behavioral outcomes were nonsignificant. The current findings highlight the important role that need-supportive environments play in facilitating autonomous exercise motivation and behavior by being conducive to exercise-related psychological need satisfaction.
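Indirect effects of this kind are commonly tested with percentile-bootstrap confidence intervals; the following minimal Python sketch shows that generic procedure on synthetic data. It uses simple (not partial) regression slopes and invented variable names, so it should be read as an illustration of the bootstrap step only, not as the structural model actually fitted in the study.

import numpy as np

rng = np.random.default_rng(1)
n = 338                                                   # sample size from the abstract
need_support = rng.normal(size=n)                         # X: perceived need support
need_satisfaction = 0.5 * need_support + rng.normal(size=n)   # M: mediator
mvpa = 0.4 * need_satisfaction + rng.normal(size=n)           # Y: outcome

def indirect_effect(x, m, y):
    a = np.polyfit(x, m, 1)[0]                            # slope of M on X
    b = np.polyfit(m, y, 1)[0]                            # slope of Y on M (simple, not partial)
    return a * b

boot = []
for _ in range(2000):
    i = rng.integers(0, n, n)                             # resample cases with replacement
    boot.append(indirect_effect(need_support[i], need_satisfaction[i], mvpa[i]))
print("indirect effect %.3f, 95%% CI [%.3f, %.3f]" %
      (indirect_effect(need_support, need_satisfaction, mvpa),
       np.percentile(boot, 2.5), np.percentile(boot, 97.5)))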
Article
In reliability engineering, different types of accelerated degradation tests have been used to obtain reliability information for evaluating highly reliable or expensive products. The step‐stress accelerated degradation test (SSADT) is one of the useful experimental schemes that can be used to save the resources of an experiment. Motivated by SSADT data for operational amplifiers collected at the Xi'an Microelectronic Technology Institute, in which the underlying degradation mechanism of the operational amplifiers is unknown, we propose a semiparametric approach for SSADT data analysis that does not require strict distributional assumptions. Specifically, the empirical saddlepoint approximation method is proposed to estimate the items' lifetime (first‐passage time) distribution at stress levels both included in and not included in the SSADT experiment. Monte Carlo simulation studies are used to evaluate the performance and illustrate the advantages of the proposed approach. Finally, the proposed semiparametric approach is applied to analyze the motivating data set.
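For orientation, the generic saddlepoint density approximation for a sample mean, with the empirical cumulant generating function plugged in, is

$$\hat f_{\bar X}(x) \approx \left( \frac{n}{2\pi \hat K''(\hat s)} \right)^{1/2} \exp\bigl\{ n \bigl[ \hat K(\hat s) - \hat s x \bigr] \bigr\}, \qquad \hat K'(\hat s) = x, \quad \hat K(s) = \log\Bigl( \tfrac{1}{n} \sum_{i=1}^{n} e^{s X_i} \Bigr).$$

This is only the standard starting point; how the article adapts the empirical saddlepoint machinery to first-passage-time distributions under step-stress levels is not reproduced here.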
Chapter
Credit risk modeling techniques have matured over more than half a century of development. While modeling for credit risk can be traced back much earlier, theoretical affirmation of statistical models, for example, the multinomial logit model as a special case of the more general conditional logit model, was first provided about half a century ago (McFadden, 1974) using the random utility maximization paradigm. Since then, statistical models such as generalized linear models (GLMs) have become the most popular choice for modeling credit risk, though machine learning models have begun to challenge that dominance in some areas in recent years. Figure 3.1 outlines the structure of the various models discussed in this chapter.
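As a minimal sketch of the traditional GLM approach the chapter refers to, the following Python example fits a binary-default logistic regression with statsmodels on synthetic data; the feature names and coefficients are invented for illustration.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 5000
income = rng.lognormal(10, 0.5, n)                    # fake borrower income
utilization = rng.uniform(0, 1, n)                    # fake credit utilization
linpred = -2.0 + 3.0 * utilization - 0.2 * (np.log(income) - 10)
default = rng.binomial(1, 1 / (1 + np.exp(-linpred))) # simulated default flag

X = sm.add_constant(np.column_stack([np.log(income), utilization]))
glm = sm.GLM(default, X, family=sm.families.Binomial()).fit()   # logistic GLM
print(glm.summary())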
Conference Paper
Full-text available
The health sciences often involve survival data that may be censored and can contain correlated covariates. While there has been some research on the impact of correlated variables on survival models, there is a need for further investigation of how bootstrap methods can be used to handle correlation in survival analysis. In fact, if the variables are strongly correlated, they may mask each other's effects in the bootstrap samples drawn from the prior, making it difficult to discern the true relationship between the variables and the response, which can ultimately lead to unrealistic estimates. This article aims to extend the Proper Bayesian bootstrap ensemble tree model to the analysis of survival data with highly correlated covariates. The model's performance was assessed through a simulation study, demonstrating better results than traditional survival models, such as the Cox model and survival random forests, with greater stability in terms of the integrated Brier score, particularly for smaller sample sizes.
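The core resampling idea behind such ensembles can be sketched with the classic Bayesian bootstrap: draw Dirichlet(1, ..., 1) weights over the observations and refit a weighted survival model for each draw. The Python sketch below does this with a weighted Cox model from the lifelines package on synthetic correlated covariates; it illustrates only the Bayesian-bootstrap step, not the Proper Bayesian bootstrap ensemble tree model proposed in the article.

import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)              # deliberately correlated covariate
time = rng.exponential(np.exp(-0.5 * x1))             # synthetic survival times
event = rng.binomial(1, 0.8, n)                       # synthetic event indicator
df = pd.DataFrame({"T": time, "E": event, "x1": x1, "x2": x2})

betas = []
for _ in range(200):
    df["w"] = rng.dirichlet(np.ones(n)) * n           # Bayesian-bootstrap weights
    cph = CoxPHFitter().fit(df, duration_col="T", event_col="E",
                            weights_col="w", robust=True)
    betas.append(cph.params_["x1"])
print("posterior-like summary for beta_x1: mean %.3f, 95%% interval %s"
      % (np.mean(betas), np.round(np.percentile(betas, [2.5, 97.5]), 3)))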
Article
In order to assess prognostic risk for individuals in precision health research, risk prediction models are increasingly used, in which statistical models estimate the risk of future outcomes based on clinical and nonclinical characteristics. The predictive accuracy of a risk score must be assessed before it can be used in routine clinical decision making; receiver operating characteristic curves, precision–recall curves, and their corresponding areas under the curve are commonly used metrics to evaluate the discriminatory ability of a continuous risk score. Among these, precision–recall curves have been shown to be more informative when dealing with unbalanced biomarker distributions between classes, which is common in rare-event settings. However, with one exception, existing methods have been proposed only for classic uncensored data. This paper therefore proposes a novel nonparametric estimation approach for the time‐dependent precision–recall curve and its associated area under the curve for right‐censored data. A simulation study shows the better finite-sample properties of the proposed estimator over the existing method, and real‐world data from a primary biliary cirrhosis trial are used to demonstrate the practical applicability of the proposed estimator.
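For reference, the uncensored building block (a precision-recall curve and its area for a continuous risk score) can be computed as in the short Python sketch below on synthetic, class-imbalanced data; the article's actual contribution, the time-dependent and right-censoring-aware version, is not reproduced here.

import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

rng = np.random.default_rng(4)
n = 500
risk = rng.normal(size=n)                                     # continuous risk score
label = rng.binomial(1, 1 / (1 + np.exp(-(2 * risk - 2))))    # unbalanced outcome classes

precision, recall, thresholds = precision_recall_curve(label, risk)
print("area under PR curve (average precision): %.3f"
      % average_precision_score(label, risk))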
Article
Both a bootstrap analysis method and a method of data presentation are suggested for describing preference and/or multiple choices in behavioral tests, with a study of Egyptian fruit bat pup (Rousettus aegyptiacus) preferences serving as an example. The bootstrap method allows the reliability of a result obtained from sparse data to be evaluated, and the proposed visualization method allows data from a complex choice to be presented clearly.
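A minimal Python sketch of the underlying idea, bootstrapping a preference proportion from a small number of trials to judge how reliable it is, is given below; the trial counts are invented and do not come from the bat-pup data.

import numpy as np

rng = np.random.default_rng(5)
choices = np.array([1] * 9 + [0] * 5)                 # e.g. 9 of 14 trials chose option A
boot_props = [rng.choice(choices, size=choices.size, replace=True).mean()
              for _ in range(10000)]                  # resample trials with replacement
print("observed preference %.2f, 95%% bootstrap CI [%.2f, %.2f]"
      % (choices.mean(), np.percentile(boot_props, 2.5),
         np.percentile(boot_props, 97.5)))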
Article
Inhaled corticosteroid (ICS) is a mainstay treatment for controlling asthma and preventing exacerbations in patients with persistent asthma. Many types of ICS drugs are used, either alone or in combination with other controller medications. Despite the widespread use of ICSs, asthma control remains suboptimal in many people with asthma. Suboptimal control leads to recurrent exacerbations, causes frequent ER visits and inpatient stays, and is due to multiple factors. One such factor is the inappropriate ICS choice for the patient. While many interventions targeting other factors exist, less attention is given to inappropriate ICS choice. Asthma is a heterogeneous disease with variable underlying inflammation and biomarkers. Up to 50% of people with asthma exhibit some degree of resistance or insensitivity to certain ICSs due to genetic variations in ICS metabolizing enzymes, leading to variable responses to ICSs. Yet, ICS choice, especially in the primary care setting, is often not tailored to the patient’s characteristics. Instead, ICS choice is largely by trial and error and often dictated by insurance reimbursement, organizational prescribing policies, or cost, leading to a one-size-fits-all approach with many patients not achieving optimal control. There is a pressing need for a decision support tool that can predict an effective ICS at the point of care and guide providers to select the ICS that will most likely and quickly ease patient symptoms and improve asthma control. To date, no such tool exists. Predicting which patient will respond well to which ICS is the first step toward developing such a tool. However, no study has predicted ICS response, leaving a gap. While the biologic heterogeneity of asthma is vast, few, if any, biomarkers and genotypes can be used to systematically profile all patients with asthma and predict ICS response. As endotyping or genotyping all patients is infeasible, readily available electronic health record data collected during clinical care offer a low-cost, reliable, and more holistic way to profile all patients. In this paper, we point out the need for developing a decision support tool to guide ICS selection and the gap in fulfilling that need. We then outline an approach to close this gap by creating a machine learning model and applying causal inference to predict a patient’s ICS response in the next year based on the patient’s characteristics. The model uses electronic health record data to characterize all patients and extract patterns that could mirror endotype or genotype. This paper supplies a roadmap for future research, with the eventual goal of shifting asthma care from one-size-fits-all to personalized care, improving outcomes, and saving health care resources.
Article
Predicting the motion of near-Earth asteroids (NEAs) is a complex task that requires sophisticated techniques, a variety of methods, and substantial computational resources. Considerable progress has been made in this area over recent decades, but many problems still await solution. This paper reviews the main methods for predicting NEA motion used at the different stages of the problem, from carrying out observations to studying such features of the motion as close approaches to and collisions with planets, orbital and secular resonances, and the chaoticity and predictability of the motion. The article is based on a report presented at the scientific and practical conference with international participation “Near-Earth Astronomy 2022” (April 18–21, 2022, Moscow).
Article
Due to Covid-19 restrictions, surveys often could not be conducted in the originally planned face-to-face mode and switched to online modes or used different mixed-mode designs. A combination of CATI and CAPI was used for the Austrian ISSP survey on Environment 2020/2021 (N = 1,261), which in the past had always been conducted face-to-face. Mixed-mode surveys facilitate field access in pandemic times and show potential to reduce non-response and coverage errors (desired selection effect). However, the combination of different modes comes with a series of risks, such as mode effects causing bias due to measurement effects. From an analytical perspective, the challenge is to disentangle selection and measurement effects. Thus, we analyse differences in the factorial structure and response distributions of two social constructs, institutional trust and the willingness to make sacrifices for environmental protection, using Bayesian multigroup confirmatory factor analysis and linear regression. The findings show support for scalar invariance and therefore the absence of CAPI vs. CATI mode effects on the factorial structure for both constructs. However, despite adjusting for differences in sample composition, we observe a higher average willingness within the CATI sample. Based on these results, we discuss implications for the interpretation of mode effects in mixed-mode surveys.
Conference Paper
Full-text available
Rain, melt, and irrigation waters formed on the territories of industrial enterprises, car washes, and urban developments contain various kinds of pollutants that must be removed before discharge into the central sewer system or into water bodies. These wastewaters belong to the category of surface runoff, and Ukrainian legislation establishes the standards to which surface runoff must be treated. The installation developed here makes it possible to treat surface storm wastewater in accordance with the current standards and will help prevent the clogging and disruption of municipal sewer networks and the pollution of water bodies.
Article
Full-text available
Both the standard jackknife and a weighted jackknife are investigated in the general linear model situation. Properties of bias reduction and standard error estimation are derived, and the weighted jackknife is shown to be superior for unbalanced data. There is a preliminary discussion of robust regression fitting using jackknife pseudo-values.
Article
Research on the jackknife technique since its introduction by Quenouille and Tukey is reviewed. Both its role in bias reduction and in robust interval estimation are treated. Some speculations and suggestions about future research are made. The bibliography attempts to include all published work on jackknife methodology.
Article
For normal observations $\{X_j,\ j = 1, \dots, N\}$, certain weighted sums $Y_i$ are used to draw inferences about the sum $Y$ of a further $N$ observations. The same family of weights $\{W_{ij}\}$ is applied in the error analysis of general computations $t$ on general samples $x_1, \dots, x_N$, with approximate validity in a wide range of situations.
Article
SUMMARY The flexibility of the definition of the first-order generalized jackknife is exploited so that its relation to the method of statistical differentials can be seen. The estimators presented have the same bias reduction and asymptotic distributional properties as the usual generalized jackknife. A limiting case produces an infinitesimal jackknife which represents a generalization of statistical differentials.
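For orientation, one standard formulation of these quantities (not taken verbatim from the article) is as follows. With $\hat{\theta}_{-i}$ the estimate computed after deleting the $i$th observation and $\hat{\theta}_{(\cdot)} = n^{-1} \sum_i \hat{\theta}_{-i}$, the first-order jackknife estimate and its variance estimate are

$$\tilde{\theta} = n\hat{\theta} - (n-1)\hat{\theta}_{(\cdot)}, \qquad \widehat{\operatorname{var}}_J = \frac{n-1}{n} \sum_{i=1}^{n} \bigl( \hat{\theta}_{-i} - \hat{\theta}_{(\cdot)} \bigr)^2 .$$

The infinitesimal jackknife replaces deletion by an infinitesimal reweighting: with empirical influence values

$$U_i = \lim_{\epsilon \to 0} \frac{ \theta\bigl( (1-\epsilon)\hat{F} + \epsilon\, \delta_{x_i} \bigr) - \theta(\hat{F}) }{ \epsilon },$$

its variance estimate is $\widehat{\operatorname{var}}_{IJ} = n^{-2} \sum_{i=1}^{n} U_i^{2}$.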
Article
It is proved that the jackknife estimate $\tilde{\theta} = n\hat{\theta} - (n - 1)(\sum \hat{\theta}_{-i}/n)$ of a function $\theta = f(\beta)$ of the regression parameters in a general linear model $\mathbf{Y} = \mathbf{X\beta} + \mathbf{e}$ is asymptotically normally distributed under conditions that do not require $\mathbf{e}$ to be normally distributed. The jackknife is applied by deleting in succession each row of the $\mathbf{X}$ matrix and $\mathbf{Y}$ vector in order to compute $\hat{\mathbf{\beta}}_{-i}$, which is the least squares estimate with the $i$th row deleted, and $\hat{\theta}_{-i} = f(\hat{\mathbf{\beta}}_{-i})$. The standard error of the pseudo-values $\tilde{\theta}_i = n\hat{\theta} - (n - 1)\hat{\theta}_{-i}$ is a consistent estimate of the asymptotic standard deviation of $\tilde{\theta}$ under similar conditions. Generalizations and applications are discussed.
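A short Python sketch of the pseudo-value computation described in this abstract is given below, applied to an illustrative choice of $\theta = f(\beta)$ (here a ratio of two coefficients) on synthetic data; the data-generating model and the function f are placeholders, not part of the original paper.

import numpy as np

rng = np.random.default_rng(7)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

def theta(b):                                             # theta = f(beta), illustrative choice
    return b[1] / b[2]

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
theta_hat = theta(beta_hat)

theta_minus_i = np.empty(n)
for i in range(n):
    keep = np.delete(np.arange(n), i)                     # delete row i of X and y
    b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    theta_minus_i[i] = theta(b_i)

pseudo = n * theta_hat - (n - 1) * theta_minus_i          # pseudo-values
theta_jack = pseudo.mean()                                # jackknife estimate of theta
se_jack = pseudo.std(ddof=1) / np.sqrt(n)                 # SE from the pseudo-values
print("theta_hat=%.3f  jackknife=%.3f  SE=%.3f" % (theta_hat, theta_jack, se_jack))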
Article
Thesis (Ph.D.), University of California, Los Angeles, Biostatistics. Includes vita and abstract.
Article
SUMMARY One estimate of a symmetric location cumulative distribution function F(x − θ) is obtained by symmetrizing the sample cumulative distribution function with respect to the estimated centre of symmetry θ. We prove an earlier conjecture that if θ is asymptotically efficient, then the symmetrized estimated cumulative distribution function is uniformly superior to the sample distribution function.
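A minimal Python sketch of the symmetrization step is given below: the empirical distribution function is averaged with its reflection about an estimated centre (here simply the sample median, used only for illustration; the conjecture itself concerns asymptotically efficient centre estimates).

import numpy as np

def symmetrized_ecdf(x_eval, data, center):
    """Average the empirical CDF with its reflection about `center`."""
    data = np.sort(np.asarray(data, dtype=float))
    n = data.size
    F = np.searchsorted(data, x_eval, side="right") / n                      # P(X <= x)
    F_minus = np.searchsorted(data, 2 * center - x_eval, side="left") / n    # P(X < 2c - x)
    return 0.5 * (F + 1.0 - F_minus)

rng = np.random.default_rng(6)
sample = rng.standard_t(df=5, size=50)        # symmetric about 0
centre = np.median(sample)                    # crude centre estimate (illustrative only)
grid = np.linspace(-3, 3, 7)
print(symmetrized_ecdf(grid, sample, centre))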
Article
Articles, books, and technical reports on the theoretical and experimental estimation of probability of misclassification are listed for the case of correctly labeled or preclassified training data. By way of introduction, the problem of estimating the probability of misclassification is discussed in order to characterize the contributions of the literature.
The Advanced Theory of Statistics
  • Kendall M, Stuart A