Article

Probability Inequalities for Sums of Bounded Random Variables

Taylor & Francis
Journal of the American Statistical Association
Authors: Wassily Hoeffding

Abstract

Upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt. It is assumed that the range of each summand of S is bounded or bounded above. The bounds for Pr {S – ES ≥ nt} depend only on the endpoints of the ranges of the summands and the mean, or the mean and the variance of S. These results are then used to obtain analogous inequalities for certain sums of dependent random variables such as U statistics and the sum of a random sample without replacement from a finite population.
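
For reference, the best-known of these bounds (Theorem 2 of the paper) can be stated as follows, with S = X_1 + ... + X_n and each independent summand satisfying a_i ≤ X_i ≤ b_i:

\[
\Pr\{S - ES \ge nt\} \;\le\; \exp\!\left(-\frac{2n^2t^2}{\sum_{i=1}^{n}(b_i-a_i)^2}\right), \qquad t > 0,
\]

which reduces to \(\exp(-2nt^2)\) when every summand takes values in [0, 1].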


... Hoeffding's inequality [64] ensures that |p̂_i − p_i| ≤ ε, with a failure probability of at most δ/M, as long as ...
... Finally, in Step 5, the overall output of our method is twice the mean µ of these random variables. We find T for which the sample mean µ is an ε-accurate estimate of the actual expectation value (Erg_obs(ρ)) via Hoeffding's inequality [64], which implies for any ε > 0, ...
... For such i.i.d. random variables, for all ε > 0, Hoeffding's inequality [64,85] states that ...
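
The excerpts above all invoke the two-sided form of the inequality to size a sample. Assuming each of T independent samples lies in an interval [a, b] (the interval and symbols here are generic, not taken from the cited works) and \(\hat\mu\) denotes their sample mean, the standard calculation reads

\[
\Pr\{|\hat\mu - E\hat\mu| \ge \epsilon\} \;\le\; 2\exp\!\left(-\frac{2T\epsilon^2}{(b-a)^2}\right),
\]

so \(T \ge (b-a)^2 \ln(2/\delta)/(2\epsilon^2)\) samples suffice for an ε-accurate estimate with failure probability at most δ.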
Preprint
Full-text available
Extracting work from a physical system is one of the cornerstones of quantum thermodynamics. The extractable work, as quantified by ergotropy, necessitates a complete description of the quantum system. This is significantly more challenging when the state of the underlying system is unknown, as quantum tomography is extremely inefficient. In this article, we analyze the number of samples of the unknown state required to extract work. With only a single copy of an unknown state, we prove that extracting any work is nearly impossible. In contrast, when multiple copies are available, we quantify the sample complexity required to estimate extractable work, establishing a scaling relationship that balances the desired accuracy with success probability. Our work develops a sample-efficient protocol to assess the utility of unknown states as quantum batteries and opens avenues for estimating thermodynamic quantities using near-term quantum computers.
... where [88,89]: ...
... For confidence level 1 − ϵ, the required number of measurements satisfies [88]: ...
... Theorem 29 (Preparation Error Bounds). For target fidelity F , the error probability satisfies [88]: ...
Preprint
Full-text available
This work proposes quantum circuit complexity-the minimal number of elementary operations needed to implement a quantum transformation-be established as a legitimate physical observable. We prove that circuit complexity satisfies all requirements for physical observables, including self-adjointness, gauge invariance, and a consistent measurement theory with well-defined uncertainty relations. We develop complete protocols for measuring complexity in quantum systems and demonstrate its connections to gauge theory and quantum gravity. Our results suggest that computational requirements may constitute physical laws as fundamental as energy conservation. This framework grants insights into the relationship between quantum information, gravity, and the emergence of spacetime geometry while offering practical methods for experimental verification. Our results indicate that the physical universe may be governed by both energetic and computational constraints, with profound implications for our understanding of fundamental physics.
... The key issue in using a PAC-Bayesian analysis is to account for the dependence across tuples (via negative samples). Our main technical contribution is to show that current PAC-Bayes bounds can be improved when Hoeffding's and McDiarmid's inequalities are applied carefully [20,30]. We further build on recent advances in risk certificates for neural networks [35,34], and, hence, arrive at certificates that are non-vacuous and significantly tighter than previous ones [32]. ...
... terms, allowing us to directly apply the PAC-Bayes bound. The overall bound is summarized through the following theorem that provides a novel PAC-Bayes bound for the SimCLR population loss, extending the PAC-Bayes-kl bound using Hoeffding's inequality [20]. ...
Preprint
Full-text available
Contrastive representation learning is a modern paradigm for learning representations of unlabeled data via augmentations -- precisely, contrastive models learn to embed semantically similar pairs of samples (positive pairs) closer than independently drawn samples (negative samples). In spite of its empirical success and widespread use in foundation models, statistical theory for contrastive learning remains less explored. Recent works have developed generalization error bounds for contrastive losses, but the resulting risk certificates are either vacuous (certificates based on Rademacher complexity or f-divergence) or require strong assumptions about samples that are unreasonable in practice. The present paper develops non-vacuous PAC-Bayesian risk certificates for contrastive representation learning, considering the practical considerations of the popular SimCLR framework. Notably, we take into account that SimCLR reuses positive pairs of augmented data as negative samples for other data, thereby inducing strong dependence and making classical PAC or PAC-Bayesian bounds inapplicable. We further refine existing bounds on the downstream classification loss by incorporating SimCLR-specific factors, including data augmentation and temperature scaling, and derive risk certificates for the contrastive zero-one risk. The resulting bounds for contrastive loss and downstream prediction are much tighter than those of previous risk certificates, as demonstrated by experiments on CIFAR-10.
... This is the author's version which has not been fully edited and content may change prior to final publication. Lemma 4 is derived from [11], which results in the concentration inequality applicable when 0 < ϵ ≤ 1. ...
... By applying Hölder's inequality [11] and recalling the estimate in (8) ...
Article
Full-text available
In this paper, we investigate the performance of Gaussian Empirical Gain Maximization (EGM) in a regression setting and conduct a detailed theoretical analysis, particularly in the presence of heavy-tailed noise, where we establish improved convergence rates. To achieve this, we introduce a new moment condition, from which we derive several enhanced theoretical results for the Gaussian model. First, we propose a new comparison theorem and show that it plays a crucial role in improving the estimation of the approximation error and the variance. This theorem not only characterizes the regression properties of Gaussian EGM but also plays a key role in enhancing the convergence rate. Second, we derive improved error bounds for the Gaussian model, providing theoretical support for the application of Gaussian EGM under different noise conditions and broadening our theoretical understanding of Gaussian EGM.
... 19 Query Procedure: Input: parameters 0 < ε < 1 and µ ≥ 1. Output: a clustering result with respect to ε and µ. 20 V_core ← ∅, E_cr ← ∅; 21 V_core ← CoreFindStr.find-core(ε, µ); 22 for each u ∈ V_core do 23 for each v ∈ N(u) do 24 if σ(u, v) ≥ ε, add (u, v) to E_cr; otherwise, break; • update((u, v), op): maintain the counters d_u and I(u, x) for each x ∈ N(u) according to the given update. Perform the same maintenance symmetrically for the end-vertex v. Therefore, ...
... Dice similarity, we have Dice(u, v) = E[X] and by Line 17, σ̂(u, v) = X̄. Therefore, Pr[|σ̂(u, v) − Dice(u, v)| > ρ/2] = Pr[|X̄ − E[X]| > ρ/2]. According to the Hoeffding bound [23], by setting L = (1/(2r²)) ln(2/δ), we have Pr[|X̄ − E[X]| > r] ≤ δ. As a result, by setting δ = 1/(2n⁴), and r_j = ρ/4, r_c = ρ²/4, and r_d = ρ/2, respectively, for the Jaccard, Cosine and Dice similarities, we can get the corresponding number of samples L to achieve σ̂(u, v) being a correct (ρ/2)-absolute approximation to σ(u, v) with high probability at least 1 − 1/(2n⁴). Proof of Lemma 2. Recall that for each edge (u, v), right after σ̂(u, v) is computed, (u, v) allocates an affordability quota q(u, v) = ¼⌊τ(u, v)⌋² to an entry in a bucket B_i with index i = log₂ q(u, v) in both the sorted linked bucket lists B(u) ...
Preprint
We study structural clustering on graphs in dynamic scenarios, where the graphs can be updated by arbitrary insertions or deletions of edges/vertices. The goal is to efficiently compute structural clustering results for any clustering parameters ϵ and μ given on the fly, for arbitrary graph update patterns, and for all typical similarity measurements. Specifically, we adopt the idea of update affordability and propose an a-lot-simpler yet more efficient (both theoretically and practically) algorithm (than state of the art), named VD-STAR, to handle graph updates. First, with a theoretical clustering result quality guarantee, VD-STAR can output high-quality clustering results with up to 99.9% accuracy. Second, our VD-STAR is easy to implement as it just needs to maintain certain sorted linked lists and hash tables, and hence, effectively enhances its deployment in practice. Third and most importantly, by careful analysis, VD-STAR improves the per-update time bound of the state-of-the-art from O(log² n) expected with certain update pattern assumption to O(log n) amortized in expectation without any update pattern assumption. We further design two variants of VD-STAR to enhance its empirical performance. Experimental results show that our algorithms consistently outperform the state-of-the-art competitors by up to 9,315 times in update time across nine real datasets.
... The efficiency of fidelity estimation protocols between a measured state and a target state depends on the number of measurements performed. The fidelity can be written as a classical expectation value of estimator functions over these measurement outcomes, and with a finite number of samples N, we use Hoeffding's inequality [57] to estimate the closeness of the average estimator values calculated from these samples to the expectation value of the estimators. ...
... We direct the reader to section C of the appendix for the proof, which combines Eq. (19) with Hoeffding's inequality [57]. ...
Preprint
Modern quantum devices are highly susceptible to errors, making the verification of their correct operation a critical problem. Usual tomographic methods rapidly become intractable as these devices are scaled up. In this paper, we introduce a general framework for the efficient verification of large quantum systems. Our framework combines robust fidelity witnesses with efficient classical post-processing to implement measurement back-propagation. We demonstrate its usefulness by focusing on the verification of bosonic quantum systems, and developing efficient verification protocols for large classes of target states using the two most common types of Gaussian measurements: homodyne and heterodyne detection. Our protocols are semi-device independent, designed to function with minimal assumptions about the quantum device being tested, and offer practical improvements over previous existing approaches. Overall, our work introduces efficient methods for verifying the correct preparation of complex quantum states, and has consequences for calibrating large quantum devices, witnessing quantum properties, supporting demonstrations of quantum computational speedups and enhancing trust in quantum computations.
... See Theorem 1 and Example 3 in Chernoff [1952], and Theorem 1 in Hoeffding [1963]. ...
... See Theorem 2 in Hoeffding [1963] and Azuma [1967]. ...
Preprint
Full-text available
We provide bounds on the tail probabilities for simple procedures that generate random samples _without replacement_, when the probabilities of being selected need not be equal.
... Lemma 2.2 (Hoeffding's inequality [13]). For X_1, X_2, . . . ...
... This matches up with the distribution of S. Now, also observe that this distribution is identical to the one where we have a bit string with 2p 1s and 2p 0s, and we sum up the elements at 2p uniformly random distinct positions (without replacement). In [13], Hoeffding showed that the bound of Lemma 2.2 also holds in this setting. So, using t = nε in Lemma 2.2, we get that ...
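
For context, the statement being used here is the without-replacement extension from Section 6 of Hoeffding's paper: if a sample of size n is drawn without replacement from a finite population whose values lie in [a, b], and X̄ is the sample mean with μ = E X̄, then

\[
\Pr\{\bar X - \mu \ge t\} \;\le\; \exp\!\left(-\frac{2nt^2}{(b-a)^2}\right),
\]

i.e. the same exponential bound as in the independent case.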
Preprint
Full-text available
We describe a new shadow tomography algorithm that uses n = Θ(√m · log m / ϵ²) samples, for m measurements and additive error ϵ, which is independent of the dimension of the quantum state being learned. This stands in contrast to all previously known algorithms that improve upon the naive approach. The sample complexity also has optimal dependence on ϵ. Additionally, this algorithm is efficient in various aspects, including quantum memory usage (possibly even O(1)), gate complexity, classical computation, and robustness to qubit measurement noise. It can also be implemented as a read-once quantum circuit with low quantum memory usage, i.e., it will hold only one copy of ρ in memory, and discard it before asking for a new one, with the additional memory needed being O(m log n). Our approach builds on the idea of using noisy measurements, but instead of focusing on gentleness in trace distance, we focus on the _gentleness in shadows_, i.e., we show that the noisy measurements do not significantly perturb the expected values.
... In order to identify the best attribute to test at a given node, only a small subset of the streaming data is considered. The number of instances to be considered at each node is identified by using a Hoeffding bound (Hoeffding, 1963), which ensures that with high probability, the same attribute would be chosen using infinite examples. ...
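
The test this excerpt alludes to is usually written as ε = √(R² ln(1/δ) / (2n)), where R is the range of the split-evaluation measure, n is the number of examples seen at the node, and 1 − δ is the desired confidence. A minimal sketch under that assumption follows; the function name and the numbers are illustrative, not taken from the cited paper.

```python
import math

def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """One-sided Hoeffding radius: with probability at least 1 - delta, the true
    mean of a quantity with range `value_range` lies within epsilon of the mean
    observed over n independent examples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# A node splits on the currently best attribute once the observed gain gap to
# the runner-up exceeds epsilon (all numbers below are hypothetical).
observed_gap = 0.08
epsilon = hoeffding_epsilon(value_range=1.0, delta=1e-7, n=2000)
ready_to_split = observed_gap > epsilon
```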
Article
Full-text available
Aim/Purpose: This paper describes how to use a multilayer perceptron to improve concept drift recovery in streaming environments. Background: Classifying instances in a data stream environment with concept drift is a challenging topic. The base learner must be adapted online to the current data. Several data mining algorithms have been adapted/used to this type of environment. In this study, two techniques are used to speed up the adaptation of an artificial neural network to the current data, increasing its predictive accuracy while detecting the concept drift sooner. Methodology: Experiments were performed to analyze how some techniques behave in different scenarios and compare them with other classifiers built to deal with data streams and concept drifts. Contribution: This study suggests two techniques to improve the classification results: an embedded concept drift detection method to identify when a change has occurred and setting the learning rate to a higher level whenever a new concept is being learned to give more weight to recent instances, with its value decreased over time. Findings: Results indicate that gradually reducing the learning rate with an embedded concept drift detector has better statistical results than other single classifiers built to deal with data streams and concept drifts. Recommendations for Practitioners: Based on the empirical results, this study provides recommendations on how to improve the multilayer perceptron in data stream environments suffering from concept drifts. Recommendation for Researchers: Researchers should conduct investigations to increase the number of base classifiers used in data stream environments and in situations where concept drifts occur. Impact on Society: The objective of this study is to increase the use of multilayer perceptrons in data stream environments suffering from concept drifts, as nowadays, Hoeffding Trees and Naive Bayes are the base classifiers mostly used. Future Research: Additional research includes adapting the online learning rate by increasing/decreasing it based on the performance of the Multilayer Perceptron. This scheme would allow the removal of parameters that must be set by the user, like learning rate upper bound and number of instances to return to the stable value.
... If we regard the { } as random variables with the following assumptions: 1) { } are independent; 2) has symmetric distribution regarding zero (so ( ) = 0). Since | | ≤ , according to [46,47], { } are sub-Gaussian, and the following concentration inequality holds for each > 0 [16,46,47]: and cannot guarantee a deterministic bound of the multivariate QoI error, the QoI validation module in the QPET will post-process the outlier data points so that the QoI error will still be strictly bounded in any case. Moreover, if we set a high confidence level (such as = 0.999), the increase of benefits the compression ratio much more than the overhead brought by correcting very few data points. ...
Preprint
Error-bounded lossy compression has been widely adopted in many scientific domains because it can address the challenges in storing, transferring, and analyzing the unprecedented amount of scientific data. Although error-bounded lossy compression offers general data distortion control by enforcing strict error bounds on raw data, it may fail to meet the quality requirements on the results of downstream analysis derived from raw data, a.k.a. Quantities of Interest (QoIs). This may lead to uncertainties and even misinterpretations in scientific discoveries, significantly limiting the use of lossy compression in practice. In this paper, we propose QPET, a novel, versatile, and portable framework for QoI-preserving error-bounded lossy compression, which overcomes the challenges of modeling diverse QoIs by leveraging numerical strategies. QPET features (1) high portability to multiple existing lossy compressors, (2) versatile preservation of most differentiable univariate and multivariate QoIs, and (3) significant compression improvements in QoI-preservation tasks. Experiments with six real-world datasets demonstrate that QPET outperformed existing QoI-preserving compression frameworks in terms of speed, and that integrating QPET into state-of-the-art error-bounded lossy compressors can yield up to 250% compression ratio improvements over the original compressors and up to 75% compression ratio improvements over existing QoI-integrated scientific compressors. Under the same level of peak signal-to-noise ratio in the QoIs, QPET can improve the compression ratio by up to 102%.
... Lemma 3.1 (Hoeffding's inequality (Hoeffding 1963)). Let X_1, . . . , X_n be independent random variables bounded in the interval [0, 1] and let X̄ denote their empirical mean. ...
Article
Full-text available
In this paper, we theoretically study the Combinatorial Multi-Armed Bandit problem with a stochastic monotone k-submodular reward function under full-bandit feedback. In this setting, the decision-maker is allowed to select a super arm composed of multiple base arms in each round and then receives its k-submodular reward. The k-submodularity enriches the application scenarios of the problem we consider in contexts characterized by diverse options. We present two simple greedy algorithms for two budget constraints (total size and individual size) and provide the theoretical analysis for the upper bound of the regret value. For the total size budget, the proposed algorithm achieves a 1/2-regret upper bound of Õ(T^{2/3}(kn)^{1/3}B), where T is the time horizon, n is the number of base arms and B denotes the budget. For the individual size budget, the proposed algorithm achieves a 1/3-regret with the same upper bound. Moreover, we conduct numerical experiments on these two algorithms to empirically demonstrate their effectiveness.
... In 1963, Hoeffding systematically derived O(e^{−a²})-decay tail probabilities for the sum of independent r.v.s. Lemma 5 (Hoeffding's inequality for bounded r.v.s, Theorem 2 in [38]). Let {X_i}_{i=1}^n be independent r.v.s satisfying the bounded condition a_i ≤ X_i ≤ b_i. ...
Preprint
Full-text available
Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents decision-making through interactions with their environment. A key subset includes stochastic multi-armed bandit (MAB) and continuum-armed bandit (SCAB) problems, which model sequential decision-making under uncertainty. This review outlines the foundational models and assumptions of bandit problems, explores non-asymptotic theoretical tools like concentration inequalities and minimax regret bounds, and compares frequentist and Bayesian algorithms for managing exploration-exploitation trade-offs. We also extend the discussion to K-armed contextual bandits and SCAB, examining their methodologies, regret analyses, and discussing the relation between the SCAB problems and the functional data analysis. Finally, we highlight recent advances and ongoing challenges in the field.
... Lemma 2.33 (Hoeffding's Inequality for sampling without replacement [Hoe94]). Let n and m be two integers such that 1 ≤ n ≤ m, and x_1, . . . ...
Preprint
Full-text available
The Huge Object model of property testing [Goldreich and Ron, TheoretiCS 23] concerns properties of distributions supported on {0,1}^n, where n is so large that even reading a single sampled string is unrealistic. Instead, query access is provided to the samples, and the efficiency of the algorithm is measured by the total number of queries that were made to them. Index-invariant properties under this model were defined in [Chakraborty et al., COLT 23], as a compromise between enduring the full intricacies of string testing when considering unconstrained properties, and giving up completely on the string structure when considering label-invariant properties. Index-invariant properties are those that are invariant through a consistent reordering of the bits of the involved strings. Here we provide an adaptation of Szemerédi's regularity method for this setting, and in particular show that if an index-invariant property admits an ϵ-test with a number of queries depending only on the proximity parameter ϵ, then it also admits a distance estimation algorithm whose number of queries depends only on the approximation parameter.
... Theorem 5 (Theorem 2 in [4]). Let X_1, . . . ...
Preprint
We provide space complexity lower bounds for data structures that approximate logistic loss up to ϵ-relative error on a logistic regression problem with data X ∈ R^{n×d} and labels y ∈ {−1,1}^d. The space complexity of existing coreset constructions depends on a natural complexity measure μ_y(X), first defined in (Munteanu, 2018). We give an Ω̃(d/ϵ²) space complexity lower bound in the regime μ_y(X) = O(1) that shows existing coresets are optimal in this regime up to lower order factors. We also prove a general Ω̃(d·μ_y(X)) space lower bound when ϵ is constant, showing that the dependency on μ_y(X) is not an artifact of mergeable coresets. Finally, we refute a prior conjecture that μ_y(X) is hard to compute by providing an efficient linear programming formulation, and we empirically compare our algorithm to prior approximate methods.
... It is typically derived using statistical learning theory and depends on factors such as the complexity of the model (hypothesis class), the number of examples in the training data, and the randomness in the data generation process. Hoeffding's inequality (Hoeffding 1994) gives a common form of the generalization bound in classical machine learning. Hoeffding's inequality provides an error bound for f based on D (Duchi n.d.) by giving the deviation probability of Ê(g) from E(g) as a function of a positive tolerance and the total number of training samples N, and is given by ...
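
The excerpt stops just before the bound itself. For a single fixed hypothesis g with a loss bounded in [0, 1] and N i.i.d. training samples, the textbook form implied by Hoeffding's inequality is

\[
\Pr\!\left(\bigl|\hat E(g) - E(g)\bigr| > \epsilon\right) \;\le\; 2\exp\!\left(-2N\epsilon^2\right),
\]

with Ê(g) the empirical risk and E(g) the population risk; a union bound over a finite hypothesis class multiplies the right-hand side by its size. The exact expression in the cited work may differ in constants.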
Article
Full-text available
Despite the mounting anticipation for the quantum revolution, the success of quantum machine learning (QML) in the noisy intermediate-scale quantum (NISQ) era hinges on a largely unexplored factor: the generalization error bound, a cornerstone of robust and reliable machine learning models. Current QML research, while exploring novel algorithms and applications extensively, is predominantly situated in the context of noise-free, ideal quantum computers. However, quantum circuit (QC) operations in NISQ-era devices are susceptible to various noise sources and errors. In this article, we conducted a systematic mapping study (SMS) to explore the state-of-the-art generalization error bound for QML in NISQ-era devices and analyze the latest practices in the field. Our study systematically summarizes the existing computational platforms with quantum hardware, datasets, optimization techniques, and the proposed error bounds detailed in the literature. It also highlights the limitations and challenges in QML in the NISQ era and discusses future research directions to advance the field. Using a detailed Boolean operators query in five reliable indexers, we collected 544 papers and filtered them to a small set of 37 relevant articles. This filtration was done following the best practice of SMS with well-defined research questions and inclusion and exclusion criteria.
... Lemma 2 (Hoeffding's inequality [94]). Given ϵ, δ ∈ (0, 1), c > 0, let N_s = log(2/δ)c²/(2ϵ²). Suppose X_1, X_2, . . . ...
Preprint
Understanding the capabilities of classical simulation methods is key to identifying where quantum computers are advantageous. Not only does this ensure that quantum computers are used only where necessary, but also one can potentially identify subroutines that can be offloaded onto a classical device. In this work, we show that it is always possible to generate a classical surrogate of a sub-region (dubbed a "patch") of an expectation landscape produced by a parameterized quantum circuit. That is, we provide a quantum-enhanced classical algorithm which, after simple measurements on a quantum device, allows one to classically simulate approximate expectation values of a subregion of a landscape. We provide time and sample complexity guarantees for a range of families of circuits of interest, and further numerically demonstrate our simulation algorithms on an exactly verifiable simulation of a Hamiltonian variational ansatz and long-time dynamics simulation on a 127-qubit heavy-hex topology.
... As for using RS to approximate randomisation probabilities, we saw in the previous section that OCs could only be estimated using simulation. For EPASA and VPASA, we cannot directly use a Chernoff bound to guarantee the accuracy of such estimates as we do not know the distribution of the simulated outcomes and bounds on them are large, so that Hoeffding's inequality (Hoeffding, 1963) gives very weak guarantees. Nevertheless, we expect an even smaller impact on these metrics because of the greater accuracy of RS. ...
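
To make the excerpt's point concrete, below is a small sketch of the number of independent simulations that the two-sided Hoeffding bound demands for an outcome known only to lie in an interval of width b − a; the function name and all numbers are illustrative, not taken from the cited trial.

```python
import math

def hoeffding_sample_size(width: float, eps: float, delta: float) -> int:
    """Number of i.i.d. simulations N such that the sample mean of an outcome
    bounded in an interval of width `width` is within eps of its expectation
    with probability at least 1 - delta (two-sided Hoeffding bound)."""
    return math.ceil(width ** 2 * math.log(2.0 / delta) / (2.0 * eps ** 2))

# The guarantee weakens quadratically in the width of the bound on the outcome.
print(hoeffding_sample_size(width=1.0, eps=0.01, delta=0.05))   # 18445
print(hoeffding_sample_size(width=10.0, eps=0.01, delta=0.05))  # 1844440
```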
Preprint
Full-text available
To implement a Bayesian response-adaptive trial it is necessary to evaluate a sequence of posterior probabilities. This sequence is often approximated by simulation due to the unavailability of closed-form formulae to compute it exactly. Approximating these probabilities by simulation can be computationally expensive and impact the accuracy or the range of scenarios that may be explored. An alternative approximation method based on Gaussian distributions can be faster but its accuracy is not guaranteed. The literature lacks practical recommendations for selecting approximation methods and comparing their properties, particularly considering trade-offs between computational speed and accuracy. In this paper, we focus on the case where the trial has a binary endpoint with Beta priors. We first outline an efficient way to compute the posterior probabilities exactly for any number of treatment arms. Then, using exact probability computations, we show how to benchmark calculation methods based on considerations of computational speed, patient benefit, and inferential accuracy. This is done through a range of simulations in the two-armed case, as well as an analysis of the three-armed Established Status Epilepticus Treatment Trial. Finally, we provide practical guidance for which calculation method is most appropriate in different settings, and how to choose the number of simulations if the simulation-based approximation method is used.
... where the final inequality holds because Cm < εn/2. By Hoeffding's inequality [Hoe63], it follows that ...
Preprint
Full-text available
Finding near-rainbow Hamilton cycles in properly edge-coloured graphs was first studied by Andersen, who proved in 1989 that every proper edge colouring of the complete graph on n vertices contains a Hamilton cycle with at least n − √(2n) distinct colours. This result was improved to n − O(log² n) by Balogh and Molla in 2019. In this paper, we consider Andersen's problem for general graphs with a given minimum degree. We prove that every globally n/8-bounded (i.e. every colour is assigned to at most n/8 edges) properly edge-coloured graph G with δ(G) ≥ (1/2 + ε)n contains a Hamilton cycle with n − o(n) distinct colours. Moreover, we show that the constant 1/8 is best possible.
... Hoeffding's inequality [10] for bounded random variables X_1, . . . , X_T implies: ...
Preprint
Full-text available
Point cloud representation has gained traction due to its efficient memory usage and simplicity in acquisition, manipulation, and storage. However, as point cloud sizes increase, effective down-sampling becomes essential to address the computational requirements of downstream tasks. Classical approaches, such as furthest point sampling (FPS), perform well on benchmarks but rely on heuristics and overlook geometric features, like curvature, during down-sampling. In this paper, we introduce a reinforcement learning-based sampling algorithm that enhances FPS by integrating curvature information. Our approach ranks points by combining FPS-derived soft ranks with curvature scores computed by a deep neural network, allowing us to replace a proportion of low-curvature points in the FPS set with high-curvature points from the unselected set. Existing differentiable sampling techniques often suffer from training instability, hindering their integration into end-to-end learning frameworks. By contrast, our method achieves stable end-to-end learning, consistently outperforming baseline models across multiple downstream geometry processing tasks. We provide comprehensive ablation studies, with both qualitative and quantitative insights into the effect of each feature on performance. Our algorithm establishes state-of-the-art results for classification, segmentation and shape completion, showcasing its robustness and adaptability.
... After computing the different perturbed noisy circuits, the output state of each noisy circuit P_j(U) is measured N times, and the empirical mean, denoted as ϕ(P_j(U)), is computed. This value is an estimate of Tr[O P_j(U)(|0⟩⟨0|)^{⊗n}], as shown by Hoeffding's inequality [17], i.e., ...
Preprint
To address the challenge posed by noise in real quantum devices, quantum error mitigation techniques play a crucial role. These techniques are resource-efficient, making them suitable for implementation in noisy intermediate-scale quantum devices, unlike the more resource-intensive quantum error correction codes. A notable example of such a technique is Clifford Data Regression, which employs a supervised learning approach. This work investigates two variants of this technique, both of which introduce a non-trivial set of gates into the original circuit. The first variant uses multiple copies of the original circuit, while the second adds a layer of single-qubit rotations. Different characteristics of these methods are analyzed theoretically, such as their complexity, or the scaling of the error with various parameters. Additionally, the performance of these methods is evaluated through numerical experiments, demonstrating a reduction in root mean square error.
... Proposition 4 (Hoeffding (1963)). Let Z_1, . . . ...
Preprint
Distributional regression aims at estimating the conditional distribution of a target variable given explanatory co-variates. It is a crucial tool for forecasting when a precise uncertainty quantification is required. A popular methodology consists in fitting a parametric model via empirical risk minimization where the risk is measured by the Continuous Rank Probability Score (CRPS). For independent and identically distributed observations, we provide a concentration result for the estimation error and an upper bound for its expectation. Furthermore, we consider model selection performed by minimization of the validation error and provide a concentration bound for the regret. A similar result is proved for convex aggregation of models. Finally, we show that our results may be applied to various models such as Ensemble Model Output Statistics (EMOS), distributional regression networks, distributional nearest neighbors or distributional random forests, and we illustrate our findings on two data sets (QSAR aquatic toxicity and Airfoil self-noise).
... Given a fixed ratio t ′ n and a small ϵ > 0, the probability P BDD (n, t ′ , p ′ ) as a function of p ′ exhibits a sharp decline at p ′ = t ′ n − ϵ, with exponential decay in n, by a bound from Hoeffding's inequality [62] under the condition t ′ ≥ np ′ : ...
Preprint
Erasures are the primary type of errors in physical systems dominated by leakage errors. While quantum error correction (QEC) using stabilizer codes can combat these errors, the question of achieving near-capacity performance with explicit codes and efficient decoders remains a challenge. Quantum decoding is a classical computational problem that decides what the recovery operation should be based on the measured syndromes. For QEC, using an accurate decoder with the shortest possible runtime will minimize the degradation of quantum information while awaiting the decoder's decision. We examine the quantum erasure decoding problem for general stabilizer codes and present decoders that not only run in linear time but are also accurate. We achieve this by exploiting the symmetry of degenerate errors. Numerical evaluations show near maximum-likelihood decoding for various codes, achieving capacity performance with topological codes and near-capacity performance with non-topological codes. We furthermore explore the potential of our decoders to handle other error models, such as mixed erasure and depolarizing errors, and also local deletion errors via concatenation with permutation invariant codes.
... Finally, the third inequality follows from Hoeffding's inequality [16,Theorem 1], and thus we have established (8). Further, using the Taylor expansion 1/(1 + x) = 1 + O(x) in (13), we obtain ...
... When following a path through a learned MDP, these errors may accumulate, leading to potentially significant differences in values (and possibly optimal policies) between the learned model and the true underlying MDP [47,88]. To account for these errors, confidence intervals around the probabilities or distributions may be computed via, e.g., Hoeffding's inequality [61] or the Weissman bound [116], and included in the learned model, yielding an RMDP. Resulting policies and values can then be given a probably approximately correct (PAC) guarantee. ...
Preprint
Full-text available
Markov decision processes (MDPs) are a standard model for sequential decision-making problems and are widely used across many scientific areas, including formal methods and artificial intelligence (AI). MDPs do, however, come with the restrictive assumption that the transition probabilities need to be precisely known. Robust MDPs (RMDPs) overcome this assumption by instead defining the transition probabilities to belong to some uncertainty set. We present a gentle survey on RMDPs, providing a tutorial covering their fundamentals. In particular, we discuss RMDP semantics and how to solve them by extending standard MDP methods such as value iteration and policy iteration. We also discuss how RMDPs relate to other models and how they are used in several contexts, including reinforcement learning and abstraction techniques. We conclude with some challenges for future work on RMDPs.
... To go one step further, we now use Hoeffding's inequality [26]. As the heads of items are selected uniformly at random, X_v is independent from other items' random variables. ...
Preprint
Full-text available
Distributed systems often serve dynamic workloads and resource demands evolve over time. Such a temporal behavior stands in contrast to the static and demand-oblivious nature of most data structures used by these systems. In this paper, we are particularly interested in consistent hashing, a fundamental building block in many large distributed systems. Our work is motivated by the hypothesis that a more adaptive approach to consistent hashing can leverage structure in the demand, and hence improve storage utilization and reduce access time. We initiate the study of demand-aware consistent hashing. Our main contribution is H&A, a constant-competitive online algorithm (i.e., it comes with provable performance guarantees over time). H&A is demand-aware and optimizes its internal structure to enable faster access times, while offering a high utilization of storage. We further evaluate H&A empirically.
... We can bound the right-hand side by considering Hoeffding's inequality [64]. Let X_i be a set of independent random variables, and let X_i ∈ [a_i, b_i] almost surely; then ...
Article
Full-text available
Quantum re-uploading models have been extensively investigated as a form of machine learning within the context of variational quantum algorithms. Their trainability and expressivity are not yet fully understood and are critical to their performance. In this work, we address trainability through the lens of the magnitude of the gradients of the cost function. We prove bounds for the differences between gradients of the better-studied data-less parameterized quantum circuits and re-uploading models. We coin the concept of absorption witness to quantify such difference. For the expressivity, we prove that quantum re-uploading models output functions with vanishing high-frequency components and upper-bounded derivatives with respect to data. As a consequence, such functions present limited sensitivity to fine details, which protects against overfitting. We performed numerical experiments extending the theoretical results to more relaxed and realistic conditions. Overall, future designs of quantum re-uploading models will benefit from the strengthened knowledge delivered by the uncovering of absorption witnesses and vanishing high frequencies.
... where for the last inequality we used Fact A.3 and the fact that Fact A.3 holds for draw-without-replacement experiments as well (see e.g., [Hoe94, BLM13]). Similarly, for any n − n/2^{d+2} ≤ s ≤ n and Z′ = (Z′_1, . . . , Z′_n) ∼ D_s, we have ...
Preprint
Full-text available
We characterize the power of constant-depth Boolean circuits in generating uniform symmetric distributions. Let f: {0,1}^m → {0,1}^n be a Boolean function where each output bit of f depends only on O(1) input bits. Assume the output distribution of f on uniform input bits is close to a uniform distribution D with a symmetric support. We show that D is essentially one of the following six possibilities: (1) point distribution on 0^n, (2) point distribution on 1^n, (3) uniform over {0^n, 1^n}, (4) uniform over strings with even Hamming weights, (5) uniform over strings with odd Hamming weights, and (6) uniform over all strings. This confirms a conjecture of Filmus, Leigh, Riazanov, and Sokolov (RANDOM 2023).
... where the last inequality follows from standard concentration bounds for sampling without replacement [Hoe63]. Now it remains to upper bound Cov[Z_u, Z_v] for a pair (u, v) ∈ {0, 1}^k_{k/2} × {0, 1}^k_{k/2} such that ∆(u, v) ∈ I. Fix such a pair (u, v) and note that ∆(u, v) = 2d for some integer d such that |d − (k/4)| ≤ (C/2)·√(k log k). ...
Preprint
Full-text available
In this work, we show that the class of multivariate degree-d polynomials mapping {0,1}^n to any Abelian group G is locally correctable with Õ_d((log n)^d) queries for up to a fraction of errors approaching half the minimum distance of the underlying code. In particular, this result holds even for polynomials over the reals or the rationals, special cases that were previously not known. Further, we show that they are locally list correctable up to a fraction of errors approaching the minimum distance of the code. These results build on and extend the prior work of the authors [ABPSS24] (STOC 2024), who considered the case of linear polynomials and gave analogous results. Low-degree polynomials over the Boolean cube {0,1}^n arise naturally in Boolean circuit complexity and learning theory, and our work furthers the study of their coding-theoretic properties. Extending the results of [ABPSS24] from linear to higher-degree polynomials involves several new challenges and handling them gives us further insights into properties of low-degree polynomials over the Boolean cube. For local correction, we construct a set of points in the Boolean cube that lie between two exponentially close parallel hyperplanes and is moreover an interpolating set for degree-d polynomials. To show that the class of degree-d polynomials is list decodable up to the minimum distance, we stitch together results on anti-concentration of low-degree polynomials, the Sunflower lemma, and the Footprint bound for counting common zeroes of polynomials. Analyzing the local list corrector of [ABPSS24] for higher degree polynomials involves understanding random restrictions of non-zero degree-d polynomials on a Hamming slice. In particular, we show that a simple random restriction process for reducing the dimension of the Boolean cube is a suitably good sampler for Hamming slices.
... We first list a series of lemmas that will be useful for proving Theorem 4.1 and Theorem 4.2. The following is a restatement of Theorem 2 of Hoeffding (1963). ...
Preprint
In this paper, we study large losses arising from defaults of a credit portfolio. We assume that the portfolio dependence structure is modelled by the Archimedean copula family as opposed to the widely used Gaussian copula. The resulting model is new, and it has the capability of capturing extremal dependence among obligors. We first derive sharp asymptotics for the tail probability of portfolio losses and the expected shortfall. Then we demonstrate how to utilize these asymptotic results to produce two variance reduction algorithms that significantly enhance the classical Monte Carlo methods. Moreover, we show that the estimator based on the proposed two-step importance sampling method is logarithmically efficient while the estimator based on the conditional Monte Carlo method has bounded relative error as the number of obligors tends to infinity. Extensive simulation studies are conducted to highlight the efficiency of our proposed algorithms for estimating portfolio credit risk. In particular, the variance reduction achieved by the proposed conditional Monte Carlo method, relative to the crude Monte Carlo method, is in the order of millions.
... Lemma 3.1 (Hypergeometric tail bounds [Hoe94]). Let X ∼ Hyp(N, K, n) and p = K/N. ...
Preprint
Pseudorandom codes are error-correcting codes with the property that no efficient adversary can distinguish encodings from uniformly random strings. They were recently introduced by Christ and Gunn [CRYPTO 2024] for the purpose of watermarking the outputs of randomized algorithms, such as generative AI models. Several constructions of pseudorandom codes have since been proposed, but none of them are robust to error channels that depend on previously seen codewords. This stronger kind of robustness is referred to as adaptive robustness, and it is important for meaningful applications to watermarking. In this work, we show the following. - Adaptive robustness: We show that the pseudorandom codes of Christ and Gunn are adaptively robust, resolving a conjecture posed by Cohen, Hoover, and Schoenbach [S&P 2025]. - Ideal security: We define an ideal pseudorandom code as one which is indistinguishable from the ideal functionality, capturing both the pseudorandomness and robustness properties in one simple definition. We show that any adaptively robust pseudorandom code for single-bit messages can be bootstrapped to build an ideal pseudorandom code with linear information rate, under no additional assumptions. - CCA security: In the setting where the encoding key is made public, we define a CCA-secure pseudorandom code in analogy with CCA-secure encryption. We show that any adaptively robust public-key pseudorandom code for single-bit messages can be used to build a CCA-secure pseudorandom code with linear information rate, in the random oracle model. These results immediately imply stronger robustness guarantees for generative AI watermarking schemes, such as the practical quality-preserving image watermarks of Gunn, Zhao, and Song (2024).
... a_{m+r}(n) for some 0 ≤ n < 3^r coincides with {0, 1, 2}^r. As such we can apply a well known large deviation inequality due to Hoeffding [21] to assert that ...
Article
Full-text available
Let C be the middle third Cantor set and μ be the (log 2/log 3)-dimensional Hausdorff measure restricted to C. In this paper we study approximations of elements of C by dyadic rationals. Our main result implies that for μ almost every x ∈ C we have This improves upon a recent result of Allen, Chow, and Yu which gives a sub-logarithmic improvement over the trivial approximation rate.
... We require the following two probabilistic tools. Theorem 3.4 (Hoeffding's inequality [14]). Let X_1, . . . ...
Preprint
Full-text available
We study a generalisation of Vizing's theorem, where the goal is to simultaneously colour the edges of graphs G_1, …, G_k with few colours. We obtain asymptotically optimal bounds for the required number of colours in terms of the maximum degree Δ, for small values of k and for an infinite sequence of values of k. This asymptotically settles a conjecture of Cabello for k = 2. Moreover, we show that √k·Δ + o(Δ) colours always suffice, which tends to the optimal value as k grows. We also show that ℓΔ + o(Δ) colours are enough when every edge appears in at most ℓ of the graphs, which asymptotically confirms a conjecture of Cambie. Finally, our results extend to the list setting. We also find a close connection to a conjecture of Füredi, Kahn, and Seymour from the 1990s and an old problem about fractional matchings.
Article
An essential cover of the vertices of the n-cube {0, 1}^n by hyperplanes is a minimal covering where no hyperplane is redundant and every variable appears in the equation of at least one hyperplane. Linial and Radhakrishnan gave a construction of an essential cover with ⌈n/2⌉ + 1 hyperplanes and showed that Ω(√n) hyperplanes are required. Recently, Yehuda and Yehudayoff improved the lower bound by showing that any essential cover of the n-cube contains at least Ω(n^0.52) hyperplanes. In this paper, building on the method of Yehuda and Yehudayoff, we prove that Ω(n^{5/9}/(log n)^{4/9}) hyperplanes are needed.
Article
In this paper, we examine quantum contract signing protocol of Paunkovic, Bouda and Matheus. We show that there is a limit function for modified probability of cheating, substantially improving results previously obtained by the authors.
Article
Full-text available
We study the problem of identifying a small number k ∼ n^θ, 0 < θ < 1, of infected individuals within a large population of size n by testing groups of individuals simultaneously. All tests are conducted concurrently. The goal is to minimise the total number of tests required. In this paper, we make the (realistic) assumption that tests are noisy, that is, that a group that contains an infected individual may return a negative test result or one that does not contain an infected individual may return a positive test result with a certain probability. The noise need not be symmetric. We develop an algorithm called SPARC that correctly identifies the set of infected individuals up to o(k) errors with high probability with the asymptotically minimum number of tests. Additionally, we develop an algorithm called SPEX that exactly identifies the set of infected individuals w.h.p. with a number of tests that matches the information-theoretic lower bound for the constant column design, a powerful and well-studied test design.
Article
Edge closeness and betweenness centralities are widely used path-based metrics for characterizing the importance of edges in networks. In general graphs, edge closeness centrality indicates the importance of edges by the shortest distances from the edge to all the other vertices. Edge betweenness centrality ranks which edges are significant based on the fraction of all-pairs shortest paths that pass through the edge. Nowadays, extensive research efforts go into centrality computation over general graphs that omit time dimension. However, numerous real-world networks are modeled as temporal graphs, where the nodes are related to each other at different time instances. The temporal property is important and should not be neglected because it guides the flow of information in the network. This state of affairs motivates the paper’s study of edge centrality computation methods on temporal graphs. We introduce the concepts of the label, and label dominance relation, and then propose multi-thread parallel labeling-based methods on OpenMP to efficiently compute edge closeness and betweenness centralities w.r.t. three types of optimal temporal paths. For edge closeness centrality computation, a time segmentation strategy and two observations are presented to aggregate some related temporal edges for uniform processing. For edge betweenness centrality computation, to improve efficiency, temporal edge dependency formulas, a labeling-based forward-backward scanning strategy, and a compression-based optimization method are further proposed to iteratively accumulate centrality values. Extensive experiments using 13 real temporal graphs are conducted to provide detailed insights into the efficiency and effectiveness of the proposed methods. Compared with state-of-the-art methods, labeling-based methods are capable of up to two orders of magnitude speedup.
Chapter
Markov decision processes (MDPs) are a standard model for sequential decision-making problems and are widely used across many scientific areas, including formal methods and artificial intelligence (AI). MDPs do, however, come with the restrictive assumption that the transition probabilities need to be precisely known. Robust MDPs (RMDPs) overcome this assumption by instead defining the transition probabilities to belong to some uncertainty set. We present a gentle survey on RMDPs, providing a tutorial covering their fundamentals. In particular, we discuss RMDP semantics and how to solve them by extending standard MDP methods such as value iteration and policy iteration. We also discuss how RMDPs relate to other models and how they are used in several contexts, including reinforcement learning and abstraction techniques. We conclude with some challenges for future work on RMDPs.
Preprint
We present two sharp empirical Bernstein inequalities for symmetric random matrices with bounded eigenvalues. By sharp, we mean that both inequalities adapt to the unknown variance in a tight manner: the deviation captured by the first-order 1/n1/\sqrt{n} term asymptotically matches the matrix Bernstein inequality exactly, including constants, the latter requiring knowledge of the variance. Our first inequality holds for the sample mean of independent matrices, and our second inequality holds for a mean estimator under martingale dependence at stopping times.
Article
In this article, we show the mean square consistency for a generalized subsampling estimator based on the aggregation of the mean, median, and trimmed mean of some subsampling estimators for general non‐stationary time series. Consistency requires standard assumptions, including the existence of moments and ‐mixing conditions. We apply our results to the Fourier coefficients of the autocovariance function of periodically correlated time series. Furthermore, as in the i.i.d. case, we show that the generalized subsampling estimator satisfies Bernstein inequality and concentrates at an improved rate (under the condition of no or small bias) compared with the original estimator. Finally, we illustrate our results with some simulation data examples.
Article
Estimating the state preparation fidelity of highly entangled states on noisy intermediate-scale quantum (NISQ) devices is important for benchmarking and application considerations. Unfortunately, exact fidelity measurements quickly become prohibitively expensive, as they scale exponentially as O(3^N) for N-qubit states, using full state tomography with measurements in all Pauli bases combinations. However, Somma et al. established that the complexity could be drastically reduced when looking at fidelity lower bounds for states that exhibit symmetries, such as Dicke States and GHZ States. These bounds must still be tight enough for larger states to provide reasonable estimations on NISQ devices. For the first time and more than 15 years after the theoretical introduction, we report meaningful lower bounds for the state preparation fidelity of all Dicke States up to N = 10 and all GHZ states up to N = 20 on Quantinuum H1 ion-trap systems using efficient implementations of recently proposed scalable circuits for these states. Our achieved lower bounds match or exceed previously reported exact fidelities on superconducting systems for much smaller states. Furthermore, we provide evidence that for large Dicke States |D_N^{N/2}⟩, we may resort to a GHZ-based approximate state preparation to achieve better fidelity. This work provides a path forward to benchmarking entanglement as NISQ devices improve in size and quality.
Article
In this paper, we establish three upper bounds of Cantelli’s inequality type, out of which two are optimal, improving upon previous results, including those derived from Cantelli’s and Hoeffding’s inequalities. We demonstrate the practical relevance of our research by analyzing exchange-rate risk in the real estate market. Specifically, we develop an effective hedging strategy that enables firms to safeguard their profits in the case of currency mismatches between revenue and costs. By effectively modeling the dependence structure of revenue and costs, we enhance the precision of our bounds, as demonstrated through simulation data.
Article
This paper proves a number of inequalities which improve on existing upper limits to the probability distribution of the sum of independent random variables. The inequalities presented require knowledge only of the variance of the sum and the means and bounds of the component random variables. They are applicable when the number of component random variables is small and/or have different distributions. Figures show the improvement on existing inequalities.
Article
The comparison of two treatments generally falls into one of the following two categories: (a) we may have a number of replications for each of the two treatments, which are unpaired, or (b) we may have a number of paired comparisons leading to a series of differences, some of which may be positive and some negative. The appropriate methods for testing the significance of the differences of the means in these two cases are described in most of the textbooks on statistical methods.
Article
For continuous random variables, we study a problem similar to that considered earlier by one of the authors for discrete random variables. Let numbers N > 0, E > 0, 0 ≤ λ_1 ≤ λ_2 ≤ ⋯ ≤ λ_s be given. Consider a random vector x = (x_1, …, x_s), uniformly distributed on the set x_j ≥ 0, j = 1, …, s; ∑_{j=1}^s x_j = N, ∑_{j=1}^s λ_j x_j ≤ E. We study the weak limit of x as s → ∞.
Article
Let x and y be two random variables with continuous cumulative distribution functions f and g. A statistic U depending on the relative ranks of the x's and y's is proposed for testing the hypothesis f = g. Wilcoxon proposed an equivalent test in the Biometrics Bulletin, December, 1945, but gave only a few points of the distribution of his statistic. Under the hypothesis f = g the probability of obtaining a given U in a sample of n x's and m y's is the solution of a certain recurrence relation involving n and m. Using this recurrence relation tables have been computed giving the probability of U for samples up to n = m = 8. At this point the distribution is almost normal. From the recurrence relation explicit expressions for the mean, variance, and fourth moment are obtained. The 2r-th moment is shown to have a certain form which enabled us to prove that the limit distribution is normal if m, n go to infinity in any arbitrary manner. The test is shown to be consistent with respect to the class of alternatives f(x) > g(x) for every x.
Article
It is shown that there exist strictly unbiased and consistent tests for the univariate and multivariate two- and k-sample problem, for the hypothesis of independence, and for the hypothesis of symmetry with respect to a given point. Certain new tests for the univariate two-sample problem are discussed. The large sample power of these tests and of the Mann-Whitney test are obtained by means of a theorem of Hoeffding. There is a discussion of the problem of tied observations.
Article
In many cases an optimum or computationally convenient test of a simple hypothesis H_0 against a simple alternative H_1 may be given in the following form. Reject H_0 if S_n = ∑_{j=1}^n X_j ≤ k, where X_1, X_2, …, X_n are n independent observations of a chance variable X whose distribution depends on the true hypothesis and where k is some appropriate number. In particular the likelihood ratio test for fixed sample size can be reduced to this form. It is shown that with each test of the above form there is associated an index ρ. If ρ_1 and ρ_2 are the indices corresponding to two alternative tests, e = log ρ_1 / log ρ_2 measures the relative efficiency of these tests in the following sense. For large samples, a sample of size n with the first test will give about the same probabilities of error as a sample of size en with the second test. To obtain the above result, use is made of the fact that P(S_n ≤ na) behaves roughly like m^n where m is the minimum value assumed by the moment generating function of X − a. It is shown that if H_0 and H_1 specify probability distributions of X which are very close to each other, one may approximate ρ by assuming that X is normally distributed.
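
The "behaves roughly like m^n" statement in this abstract corresponds to the upper bound now usually called the Chernoff bound,

\[
\Pr\{S_n \le na\} \;\le\; m^{n}, \qquad m = \inf_{t \le 0} E\!\left[e^{t(X-a)}\right],
\]

together with a matching exponential lower bound showing that this rate is asymptotically tight.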