Chapter

Test-and-Decode: A Partial Recovery Scheme for Verifiable Coded Computing

Abstract

Coded computing has proven its efficiency in tolerating stragglers in distributed computing. Workers return their sub-computation results to the master after computing, and the master recovers the final computation result by decoding. However, workers may return incorrect results, which leads to a wrong final result. It is therefore meaningful to improve the resilience of coded computing against errors. Most existing verification schemes use only the workers’ fully correct computations to recover the final result; defective computations are not considered for decoding. In this paper, we focus on matrix multiplication and design a general Test-and-Decode (TD) scheme to recover the final result efficiently. Furthermore, we divide each sub-computation result into multiple parts and fully use the correct parts for partial recovery, which improves the tolerance for errors in computations. Decoding is performed only when the verification outcome permits it, which avoids repeated decoding. We conduct extensive simulation experiments to evaluate the probability of successfully recovering the result and the computation time of the TD scheme. We also compare the TD scheme with other verification schemes, and the results show that it outperforms current schemes in the efficiency of verifying and recovering computational results.
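
The abstract does not spell out the construction, so the sketch below is only a loose Python illustration of the test-then-decode flow, not the TD design itself (the random encoding matrix G and the random-projection test are assumptions): each returned sub-result is probed cheaply before decoding, and decoding runs exactly once, over verified results only.

import numpy as np
rng = np.random.default_rng(0)

k, n = 4, 8                          # k data blocks, n workers (n - k redundant)
A = rng.standard_normal((k * 16, 32))
B = rng.standard_normal((32, 24))
A_blk = np.split(A, k)               # row blocks A_0 .. A_{k-1}

G = rng.standard_normal((n, k))      # random encoding matrix (invertible w.h.p.)
res = [sum(G[i, j] * (A_blk[j] @ B) for j in range(k)) for i in range(n)]
res[2] += 1.0                        # one worker returns a corrupted sub-result

# Test: a cheap random-projection probe per worker (the master knows G and B).
r = rng.standard_normal(24)
Br = B @ r
ok = [i for i in range(n)
      if np.allclose(res[i] @ r, sum(G[i, j] * (A_blk[j] @ Br) for j in range(k)))]
assert len(ok) >= k                  # enough verified results to permit decoding

# Decode only from verified workers (worker 2 is excluded by the test).
S = ok[:k]
R = np.stack([res[i].reshape(-1) for i in S])
C = np.linalg.solve(G[S], R).reshape(k, 16, 24)
assert np.allclose(np.concatenate(list(C)), A @ B)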

References
Article
Accurate prediction of the cloud workload is essential in cloud computing, especially for cloud resource demand with sudden changes. Existing workload prediction approaches for cloud datacentres share one disadvantage: their results lack generalization and universality. Hence, to improve the accuracy and generalization of cloud workload prediction, we propose a novel approach, the re-prediction for error correction model based on the change-based three-way decision model (RPEM-3WC), for workload prediction problems in cloud datacentres. The core idea of our approach is to design a re-prediction system for the predicted error sequence, where we explore the re-prediction criteria of the error interval based on the trisecting-acting-outcome (TAO) model of C-3WD. First, we use a long short-term memory (LSTM) model to predict the workload; the error series generated by the LSTM model is then represented as error intervals by the interval set and divided into three regions by the C-3WD model. Finally, the error value of every interval region is corrected by the corresponding prediction model. The experimental results show that the RPEM-3WC model improves the prediction accuracy by up to 15.6%, 11.35%, 10.33%, 16.37%, 12.83%, 12.93%, and 8.68% over the EMA, HWES, ARIMA, SES, LSTM-RNN, NN, and TWD-RCPM models, respectively.
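
The following toy is a minimal sketch, not the paper's LSTM/C-3WD pipeline (the persistence predictor and quantile trisection are stand-ins), and only illustrates the shape of the idea: trisect the training errors into three regions and apply a per-region correction to later predictions. Whether the correction helps depends entirely on how autocorrelated the errors are.

import numpy as np
rng = np.random.default_rng(1)

# Synthetic workload and a deliberately naive stand-in predictor (persistence).
load = 100 + 10 * np.sin(np.arange(300) / 10) + rng.normal(0, 2, 300)
pred = np.roll(load, 1)                    # predict "same as the previous step"
err = pred[1:200] - load[1:200]            # training-period error series

lo_q, hi_q = np.quantile(err, [1 / 3, 2 / 3])   # trisect the error range
region = lambda e: 0 if e < lo_q else (1 if e < hi_q else 2)
regs = np.array([region(e) for e in err])
corr = [err[regs == g].mean() for g in range(3)]   # per-region additive fix

# At test time, pick the region from the last observed error.
idx = np.arange(200, 300)
prev_err = pred[idx - 1] - load[idx - 1]
fixed = pred[idx] - np.array([corr[region(e)] for e in prev_err])

mae = lambda y: np.abs(y - load[idx]).mean()
print(f"MAE: raw {mae(pred[idx]):.2f} -> corrected {mae(fixed):.2f}")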
Article
We consider the problem of designing codes with flexible rate (referred to as rateless codes) for private distributed matrix-matrix multiplication. A master server owns two private matrices A and B and hires worker nodes to help compute their multiplication. The matrices should remain information-theoretically private from the workers. Codes with fixed rate require the master to assign tasks to the workers and then wait for a predetermined number of workers to finish their assigned tasks. The size of the tasks, and hence the rate of the scheme, depends on the number of workers that the master waits for. We design a rateless private matrix-matrix multiplication scheme, called RPM3. In contrast to fixed-rate schemes, our scheme fixes the size of the tasks and allows the master to send multiple tasks to the workers. The master keeps sending tasks and receiving results until it can decode the multiplication, rendering the scheme flexible and adaptive to heterogeneous environments. Despite resulting in a smaller rate than known straggler-tolerant schemes, RPM3 provides a smaller mean waiting time at the master by leveraging the heterogeneity of the workers. The waiting time is studied under two different models for the workers’ service time. We provide upper and lower bounds on the mean waiting time under both models. In addition, we provide lower bounds on the mean waiting time under the worker-dependent fixed service time model.
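
RPM3 itself adds privacy through secret sharing, which is omitted here; this sketch (hypothetical names, random linear coding as a stand-in) shows only the rateless loop: the master keeps handing out small fixed-size coded tasks and stops as soon as the collected coefficients have full rank.

import numpy as np
rng = np.random.default_rng(2)

k = 4                                   # sub-products the master must recover
A = rng.standard_normal((k * 8, 16))
B = rng.standard_normal((16, 8))
A_blk = np.split(A, k)

coeffs, results = [], []
while True:
    g = rng.standard_normal(k)          # one more small, fixed-size coded task
    # A worker computes (sum_i g_i A_i) @ B; stragglers simply never add a row.
    results.append(sum(g[i] * A_blk[i] for i in range(k)) @ B)
    coeffs.append(g)
    if np.linalg.matrix_rank(np.array(coeffs)) == k:
        break                           # decodable: stop handing out tasks

G = np.array(coeffs)
R = np.stack([r.reshape(-1) for r in results])
C = np.linalg.lstsq(G, R, rcond=None)[0].reshape(k, 8, 8)
assert np.allclose(np.concatenate(list(C)), A @ B)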
Article
Cloud computing platforms have created the possibility for computationally limited users to delegate demanding tasks to strong but untrusted servers. Verifiable computing algorithms help build trust in such interactions by enabling the server to provide a proof of correctness of its results which the user can check very efficiently. In this paper, we present a doubly-efficient interactive algorithm for verifiable polynomial evaluation. Unlike the mainstream literature on verifiable computing, the soundness of our algorithm is information-theoretic and cannot be broken by a computationally unbounded server. By relying on basic properties of error-correcting codes, our algorithm forces a dishonest server to provide false results to problems which become progressively easier to verify. After roughly log d rounds, the user can verify the response of the server against a look-up table that has been pre-computed during an initialization phase. For a polynomial of degree d, we achieve a user complexity of O(d^ε), a server complexity of O(d^(1+ε)), a round complexity of O(log d) and an initialization complexity of O(d^(1+ε)).
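
The paper's exact protocol is not given in the abstract; the sketch below is a generic degree-halving interactive check in a similar spirit (an assumption-laden toy, not the authors' algorithm): each round the verifier enforces p(a) = p_e(a²) + a·p_o(a²) on the prover's answers for the even/odd coefficient halves, then a random challenge folds the problem to half the degree, giving about log d rounds.

import random

P = (1 << 61) - 1                       # Mersenne prime; all arithmetic mod P

def poly_eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):          # Horner, low-degree-first coefficients
        acc = (acc * x + c) % P
    return acc

def verify(coeffs, a, claim):
    """Interactive check of claim == p(a); len(coeffs) a power of two here."""
    while len(coeffs) > 1:
        even, odd = coeffs[0::2], coeffs[1::2]
        e = poly_eval(even, a * a % P)  # prover's answers for the two halves
        o = poly_eval(odd, a * a % P)
        if claim != (e + a * o) % P:    # verifier: p(a) = p_e(a^2) + a*p_o(a^2)
            return False
        r = random.randrange(P)         # random challenge folds the problem
        coeffs = [(ce + r * co) % P for ce, co in zip(even, odd)]
        claim, a = (e + r * o) % P, a * a % P
    # Constant polynomial left: the real protocol checks this value against a
    # look-up table pre-computed in the initialization phase.
    return claim == coeffs[0]

coeffs = [random.randrange(P) for _ in range(1024)]   # degree 1023
a = random.randrange(P)
y = poly_eval(coeffs, a)
assert verify(coeffs, a, y)                 # honest claim: ~log d rounds pass
assert not verify(coeffs, a, (y + 1) % P)   # false claim fails the first check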
Article
We consider a large-scale matrix multiplication problem where the computation is carried out using a distributed system with a master node and multiple worker nodes, where each worker can store parts of the input matrices. We propose a computation strategy that leverages ideas from coding theory to design intermediate computations at the worker nodes, in order to efficiently deal with straggling workers. The proposed strategy, named polynomial codes, achieves the optimum recovery threshold, defined as the minimum number of workers that the master needs to wait for in order to compute the output. Furthermore, by leveraging the algebraic structure of polynomial codes, we can map the reconstruction problem of the final output to a polynomial interpolation problem, which can be solved efficiently. Polynomial codes provide order-wise improvement over the state of the art in terms of recovery threshold, and are also optimal in terms of several other metrics. Furthermore, we extend this code to distributed convolution and show its order-wise optimality.
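
The construction is concrete enough to sketch directly (a small numpy toy over the reals rather than a finite field): worker i receives the evaluations Ã_i = Σ_j A_j x_i^j and B̃_i = Σ_k B_k x_i^{km}, so Ã_i B̃_i is an evaluation of a matrix polynomial of degree mn−1 whose coefficients are the blocks of AB, and any mn results interpolate it.

import numpy as np
rng = np.random.default_rng(3)

m, n = 2, 3                           # A split into m row blocks, B into n column blocks
A = rng.standard_normal((m * 4, 6))
B = rng.standard_normal((6, n * 5))
A_blk = np.split(A, m)
B_blk = np.split(B, n, axis=1)

workers = 9                           # anything >= m*n = 6 suffices to recover
xs = np.linspace(-1, 1, workers)
prods = []
for x in xs:
    At = sum(A_blk[j] * x**j for j in range(m))          # encoded A at x
    Bt = sum(B_blk[k] * x**(k * m) for k in range(n))    # encoded B at x
    prods.append(At @ Bt)             # evaluation of C(x) = sum A_j B_k x^{j+km}

# Master waits for any m*n workers, then interpolates the matrix polynomial.
S = [0, 2, 3, 5, 6, 8]                # indices of the first m*n finishers
V = np.vander(xs[S], m * n, increasing=True)
R = np.stack([prods[i].reshape(-1) for i in S])
coeffs = np.linalg.solve(V, R).reshape(m * n, 4, 5)

C = np.block([[coeffs[j + k * m] for k in range(n)] for j in range(m)])
assert np.allclose(C, A @ B)          # coefficient of x^{j+km} is A_j B_k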
Article
We consider the problem of computing the convolution of two long vectors using parallel processing units in the presence of “stragglers”. Stragglers refer to the small fraction of faulty or slow processors that delays the entire computation in time-critical distributed systems. We first show that splitting the vectors into smaller pieces and using a linear code to encode these pieces provides better resilience against stragglers than replication-based schemes under a simple, worst-case straggler analysis. We then demonstrate that under commonly used models of computation time, coding can dramatically improve the probability of finishing the computation within a target “deadline” time. As opposed to the more commonly used technique of expected computation time analysis, we quantify the exponents of the probability of failure in the limit of large deadlines. Our exponent metric captures the probability of failing to finish before a specified deadline time, i.e., the behavior of the “tail”. Moreover, our technique also allows for simple closed-form expressions for more general models of computation time, e.g., shifted Weibull models instead of only shifted exponentials. Thus, through this problem of coded convolution, we establish the utility of a novel asymptotic failure exponent analysis for distributed systems.
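
Since convolution is linear in each argument, the splitting idea can be sketched quickly (an illustrative numpy version, not the paper's code): encode the pieces of one vector with a random linear code, let each worker convolve one coded piece with the other vector, and decode the per-piece convolutions from any k results before stitching them together with the appropriate shifts.

import numpy as np
rng = np.random.default_rng(4)

k, n, L = 3, 5, 12                      # k pieces, n workers, piece length L
a = rng.standard_normal(k * L)
b = rng.standard_normal(20)
pieces = np.split(a, k)

G = rng.standard_normal((n, k))         # random linear code over the pieces
tasks = [sum(G[i, j] * pieces[j] for j in range(k)) for i in range(n)]
results = [np.convolve(t, b) for t in tasks]   # worker i's sub-convolution

# Any k results decode the per-piece convolutions (stragglers are ignored).
S = [0, 2, 4]
R = np.stack([results[i] for i in S])
piece_convs = np.linalg.solve(G[S], R)  # conv is linear: R = G[S] @ piece_convs

full = np.zeros(len(a) + len(b) - 1)
for j in range(k):                      # stitch with shift j*L
    full[j * L: j * L + L + len(b) - 1] += piece_convs[j]
assert np.allclose(full, np.convolve(a, b))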
Article
We consider a generalization of the gradient coding framework where a dataset is divided across n workers and each worker transmits to a master node one or more linear combinations of the gradients over its assigned data subsets. Unlike the conventional framework, which requires the master node to recover the sum of the gradients over all the data subsets in the presence of straggler workers, we relax the goal to computing the sum of at least some α fraction of the gradients. We begin by deriving a lower bound on the computation load of any scheme and also propose two strategies which achieve this lower bound, albeit at the cost of high communication load and a number of data partitions which can be polynomial in n. We then propose schemes based on cyclic assignment which utilize n data partitions and have a lower communication load. When each worker transmits a single linear combination, we prove lower bounds on the computation load of any scheme using n data partitions. Finally, we describe a class of schemes which achieve different intermediate operating points for the computation and communication load and provide simulation results to demonstrate the empirical performance of our schemes.
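
As a minimal concrete instance of α-fraction recovery (the fractional-repetition flavor only; the paper's cyclic-assignment schemes are more communication-efficient), replicate each data group on s+1 workers and let each send one sum; the master recovers the gradient sum over every group with at least one responder.

import numpy as np
rng = np.random.default_rng(5)

n, s = 12, 2                         # n workers; each group replicated s+1 times
groups = n // (s + 1)                # data partitioned into 4 groups
grads = rng.standard_normal((groups, 10))   # true per-group gradient sums

# Worker w holds group w // (s+1) and transmits that group's gradient sum.
alive = set(rng.choice(n, size=7, replace=False).tolist())

covered = [g for g in range(groups)
           if any(w in alive for w in range((s + 1) * g, (s + 1) * (g + 1)))]
partial = grads[covered].sum(axis=0)
print(f"recovered alpha = {len(covered) / groups:.0%} of the gradient sum")
# With at most s stragglers overall, every group keeps a live replica and the
# full sum is recovered; beyond that, recovery degrades gracefully to a fraction.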
Article
Recently, edge computing has demonstrated increasing potential to provide low-latency computing services. Coded edge computing can not only make full use of the resources of heterogeneous edge computing servers, but also significantly reduce the negative effects of slow computing devices on computing time. Nevertheless, since edge servers may be unreliable or untrustworthy, the user will decode and obtain an incorrect computation result even if it uses a single incorrect sub-computation result returned by a faulty edge server. In this paper, for the existing coded edge computing schemes, we focus on distributed matrix-matrix multiplication and design a general and efficient Decode-and-Compare Verification (DCV) scheme to verify the correctness of computation results and identify faulty edge servers by utilizing the properties of coded computing itself. The DCV scheme contains two components: (1) computation result verification, i.e., obtaining the computation result and verifying its correctness, and (2) faulty edge server identification, i.e., identifying the faulty edge servers by verifying the correctness of the returned sub-computation results. For both the independent and colluding faulty edge server models, we conduct solid theoretical analyses of the required decoding rounds, the coding redundancy and the successful verification probability to demonstrate that the correct computation result can be verified efficiently. We also conduct extensive experiments on the DCV scheme from different aspects, and the results show that it achieves much less computation time to obtain the correct computation result compared with other potential schemes, including homomorphic encryption and local computation.
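
The decode-and-compare idea can be sketched generically (a hypothetical toy, not the DCV pseudocode): decode the same product from several subsets of workers, certify a decoding that two subsets agree on, then re-encode it to flag the servers whose returned results do not match.

import numpy as np
rng = np.random.default_rng(6)

k, n = 3, 9
A = rng.standard_normal((k * 4, 8))
B = rng.standard_normal((8, 5))
A_blk = np.split(A, k)
G = rng.standard_normal((n, k))      # MDS-like random code, known to the master
res = [sum(G[i, j] * (A_blk[j] @ B) for j in range(k)) for i in range(n)]
res[4] += 0.5                        # one faulty server corrupts its result

def decode(S):
    R = np.stack([res[i].reshape(-1) for i in S])
    return np.linalg.solve(G[S], R)  # vec'd blocks of A @ B, if S is clean

# Decode from different subsets and compare: agreement certifies the result.
S1, S2, S3 = [0, 1, 2], [3, 4, 5], [6, 7, 8]
cands = [decode(S1), decode(S2), decode(S3)]
C_vec = cands[0] if (np.allclose(cands[0], cands[1])
                     or np.allclose(cands[0], cands[2])) else cands[1]

# Identify faulty servers: re-encode the certified result, compare per worker.
bad = [i for i in range(n) if not np.allclose(G[i] @ C_vec, res[i].reshape(-1))]
print("faulty servers:", bad)        # -> [4]
assert np.allclose(np.concatenate(list(C_vec.reshape(k, 4, 5))), A @ B)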
Article
We consider the problem of designing secure and private codes for distributed matrix-matrix multiplication. A master server owns two private matrices and hires worker nodes to help compute their product. The matrices should remain information-theoretically private from the workers. Some of the workers are malicious and return corrupted results to the master. We design a framework for security against malicious workers in private matrix-matrix multiplication. The main idea is a careful use of Freivalds’ algorithm to detect erroneous matrix multiplications. Our main goal is to apply this security framework to schemes with adaptive rates. Adaptive schemes divide the workers into clusters and thus provide flexibility in trading decoding complexity for efficiency. Our new scheme, SRPM3, provides a computationally efficient security check per cluster that detects the presence of one or more malicious workers with high probability. An additional per-worker check is used to identify the malicious nodes. SRPM3 can tolerate the presence of an arbitrary number of malicious workers. We provide theoretical guarantees on the complexity of the security checks, as well as simulation results on both the missed detection rate and the time needed for the integrity check.
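
Freivalds’ algorithm itself is standard and easy to state: to test a claimed product C = AB, draw a random vector r and compare A(Br) with Cr, which costs O(n²) per trial instead of a full re-multiplication and misses a wrong C with probability at most 1/2 per independent trial.

import numpy as np
rng = np.random.default_rng(7)

def freivalds(A, B, C, trials=20):
    """Probabilistic check that C == A @ B without re-multiplying."""
    n = C.shape[1]
    for _ in range(trials):
        r = rng.integers(0, 2, size=n).astype(float)   # random 0/1 vector
        if not np.allclose(A @ (B @ r), C @ r):
            return False                               # certainly wrong
    return True                                        # wrong w.p. <= 2^-trials

A = rng.standard_normal((50, 40))
B = rng.standard_normal((40, 30))
C = A @ B
assert freivalds(A, B, C)
C[3, 7] += 1.0                       # a single corrupted entry is caught
assert not freivalds(A, B, C)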
Article
In this paper, we study the problem of private and secure distributed matrix multiplication (PSDMM), where a user holds a private matrix A and N non-colluding servers share a library of L (L > 1) matrices B^(0), B^(1), ..., B^(L-1). The user wishes to compute AB^(θ) for some θ ∈ [0, L) without revealing any information about the matrix A to the servers, while also keeping the index θ private from them. Previous work is limited to the case where the shared library (i.e., the matrices B^(0), B^(1), ..., B^(L-1)) is stored across the servers in replicated form; moreover, schemes are very scarce in the literature, so there is still much room for improvement. In this paper, we propose two PSDMM schemes. The first is limited to the case where the shared library is stored across the servers in replicated form, but it outperforms state-of-the-art schemes by achieving a smaller recovery threshold and download cost. The second addresses the case where the shared library is stored across the servers in MDS-coded form, which requires less storage at the servers. The second PSDMM code does not subsume the first even when the underlying MDS code degenerates to a repetition code, as the two are entirely different schemes.
Article
Most of our lives are conducted in cyberspace. The human notion of privacy translates into a cyber notion of privacy for many functions that take place in cyberspace. This article focuses on three such functions: how to privately retrieve information from cyberspace (privacy in information retrieval), how to privately leverage large-scale distributed/parallel processing (privacy in distributed computing), and how to learn/train machine learning models from private data spread across multiple users (privacy in distributed (federated) learning). The article motivates each privacy setting, describes the problem formulation, summarizes breakthrough results in the history of each problem, gives recent results, and discusses some of the major ideas that emerged in each field. In addition, the cross-cutting techniques and interconnections between the three topics are discussed along with a set of open problems and challenges.
Article
Coded computing has proved its efficiency in handling the straggler issue in distributed computing frameworks. It uses error-correcting codes to mitigate the effect of the stragglers. However, in a coded distributed computing framework, there may exist Byzantine workers who send wrong computation results to the master in order to contaminate the overall computation output. Therefore, it is essential to identify Byzantine workers from their computation results in coded computing. In this paper, we consider the Byzantine attack identification problem in coded computing for distributed matrix multiplication tasks. We propose a new coding scheme which facilitates efficient Byzantine attack identification, namely locally testable codes. We also suggest a hierarchical group testing method for Byzantine attack identification. We characterize the required number of tests for group testing in our scheme, and show that it requires fewer tests than the conventional group testing method for existing coded computing schemes.
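
The hierarchical idea can be sketched with an abstract subset test standing in for the coded verification the paper constructs: if one test reports whether a subset contains any faulty worker, binary splitting locates d Byzantine workers in O(d log n) tests rather than n individual checks.

def find_faulty(workers, is_clean):
    """Hierarchical (binary-splitting) group testing.
    is_clean(subset) -> True iff the subset contains no faulty worker;
    in the paper, this test comes from verifying coded sub-results."""
    if is_clean(workers):
        return []
    if len(workers) == 1:
        return list(workers)
    mid = len(workers) // 2
    return find_faulty(workers[:mid], is_clean) + find_faulty(workers[mid:], is_clean)

calls = []
FAULTY = {3, 11}                      # ground truth, hidden from the algorithm
def oracle(subset):
    calls.append(1)
    return not (set(subset) & FAULTY)

print(find_faulty(list(range(16)), oracle))   # -> [3, 11]
print(f"{len(calls)} group tests vs 16 individual checks")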
Article
Coded computation techniques provide robustness against straggling workers in distributed computing. However, most of the existing schemes require exact provisioning of the straggling behavior and ignore the computations carried out by straggling workers. Moreover, these schemes are typically designed to recover the desired computation results accurately, while in many machine learning and iterative optimization algorithms, faster approximate solutions are known to result in an improvement in the overall convergence time. In this paper, we first introduce a novel coded matrix-vector multiplication scheme, called coded computation with partial recovery (CCPR), which benefits from the advantages of both coded and uncoded computation schemes, and reduces both the computation time and the decoding complexity by allowing a trade-off between the accuracy and the speed of computation. We then extend this approach to the distributed implementation of more general computation tasks by proposing a coded communication scheme with partial recovery, where the results of subtasks computed by the workers are coded before being communicated. Numerical simulations on a large linear regression task confirm the benefits of the proposed scheme in terms of the trade-off between computation accuracy and latency.
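
A stripped-down flavor of partial recovery for matrix-vector products (a sketch under assumptions, not the CCPR encoder): workers finish their assigned row blocks at random times, and at a deadline the master keeps every block that arrived and zero-fills the rest, trading accuracy for latency.

import numpy as np
rng = np.random.default_rng(8)

blocks, rows = 8, 16
A = rng.standard_normal((blocks * rows, 64))
x = rng.standard_normal(64)
y_true = A @ x

# Each block is one subtask; subtasks finish at random times.
finish_time = rng.exponential(1.0, size=blocks)
deadline = np.quantile(finish_time, 0.6)      # stop after ~60% of subtasks

y_hat = np.zeros_like(y_true)
done = finish_time <= deadline
for b in np.flatnonzero(done):                # keep whatever finished in time
    y_hat[b * rows:(b + 1) * rows] = A[b * rows:(b + 1) * rows] @ x

err = np.linalg.norm(y_hat - y_true) / np.linalg.norm(y_true)
print(f"recovered {done.mean():.0%} of blocks, relative error {err:.2f}")
# In CCPR the missing blocks would be (partly) recoverable from coded subtasks;
# in iterative optimization even this approximate y_hat can be good enough.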
Article
Coded computing is a new framework to address fundamental issues in large-scale distributed computing by injecting structured randomness and redundancy. We first provide an overview of coded computing and summarize some recent advances. Then we focus on distributed matrix multiplication and consider a common scenario where each worker is assigned a fraction of the multiplication task. In particular, by partitioning the two input matrices into m-by-p and p-by-n subblocks, a single multiplication task can be viewed as computing linear combinations of pmn submatrix products, which can be assigned to pmn workers. Such block-partitioning-based designs have been widely studied under the topics of secure, private, and batch computation, where the state-of-the-art schemes all require computing at least a “cubic” (pmn) number of submatrix multiplications. Entangled polynomial codes, first presented for straggler mitigation, provide a powerful method for breaking the cubic barrier. They achieve a subcubic recovery threshold, i.e., recovering the final product from any subset of multiplication results with a size order-wise smaller than pmn. We show that entangled polynomial codes can be further extended to also cover these three important settings, providing unified frameworks that order-wise reduce the total computational costs by achieving subcubic recovery thresholds.
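
The block-partitioning bookkeeping is easy to make concrete (plain numpy, uncoded): with A in m-by-p blocks and B in p-by-n blocks, each output block is a sum along the shared p dimension, so the uncoded assignment costs exactly pmn submatrix products; the subcubic thresholds discussed above are measured against this count.

import numpy as np
rng = np.random.default_rng(9)

m, p, n = 2, 3, 2
A = rng.standard_normal((m * 4, p * 5))
B = rng.standard_normal((p * 5, n * 6))
A_blk = [np.split(r, p, axis=1) for r in np.split(A, m)]      # m x p grid
B_blk = [np.split(r, n, axis=1) for r in np.split(B, p)]      # p x n grid

# Uncoded: the pmn submatrix products, one per triple (i, k, j).
prods = {(i, k, j): A_blk[i][k] @ B_blk[k][j]
         for i in range(m) for k in range(p) for j in range(n)}
print(f"{len(prods)} = p*m*n submatrix products")             # -> 12

C = np.block([[sum(prods[i, k, j] for k in range(p)) for j in range(n)]
              for i in range(m)])
assert np.allclose(C, A @ B)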
Conference Paper
We consider a scenario involving computations over a massive dataset stored distributedly across multiple workers, which is at the core of distributed learning algorithms. We propose Lagrange Coded Computing (LCC), a new framework to simultaneously provide (1) resiliency against stragglers that may prolong computations; (2) security against Byzantine (or malicious) workers that deliberately modify the computation for their benefit; and (3) (information-theoretic) privacy of the dataset amidst possible collusion of workers. LCC, which leverages the well-known Lagrange polynomial to create computation redundancy in a novel coded form across workers, can be applied to any computation scenario in which the function of interest is an arbitrary multivariate polynomial of the input dataset, hence covering many computations of interest in machine learning. LCC significantly generalizes prior works to go beyond linear computations. It also enables secure and private computing in distributed settings, improving the computation and communication efficiency of the state-of-the-art. Furthermore, we prove the optimality of LCC by showing that it achieves the optimal tradeoff between resiliency, security, and privacy, i.e., in terms of tolerating the maximum number of stragglers and adversaries, and providing data privacy against the maximum number of colluding workers. Finally, we show via experiments on Amazon EC2 that LCC speeds up the conventional uncoded implementation of distributed least-squares linear regression by up to 13.43×, and also achieves a 2.36×-12.65× speedup over the state-of-the-art straggler mitigation strategies.
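
The Lagrange encoding is mechanical enough to sketch (a real-valued toy; LCC proper works over a finite field and adds random evaluation points for privacy plus extra workers for resiliency and security): interpolate u with u(α_i) = X_i, ship evaluations u(β_j) to the workers, and since f∘u has degree deg(f)·(k−1), that many plus one worker outputs recover every f(X_i).

import numpy as np
rng = np.random.default_rng(10)

k = 3                                    # data blocks X_0, X_1, X_2
X = rng.standard_normal((k, 4, 4))
f = lambda M: M @ M.T                    # the computation: a degree-2 polynomial map

alphas = np.array([0.0, 1.0, 2.0])       # encoding: u(alpha_i) = X_i
betas = np.linspace(-1.5, 2.5, 2 * (k - 1) + 1)   # deg(f)*(k-1)+1 = 5 workers

def lagrange_eval(xs, ys, t):
    """Evaluate the unique interpolating polynomial through (xs, ys) at t."""
    out = np.zeros_like(ys[0])
    for i, xi in enumerate(xs):
        w = np.prod([(t - xj) / (xi - xj) for j, xj in enumerate(xs) if j != i])
        out = out + w * ys[i]
    return out

shares = [lagrange_eval(alphas, X, b) for b in betas]   # u(beta_j) to worker j
outputs = [f(s) for s in shares]                        # workers just apply f

# f(u(z)) has degree deg(f)*(k-1) = 4, so 5 outputs pin it down; evaluating the
# interpolant back at alpha_i recovers f(X_i) for every data block.
for i, a in enumerate(alphas):
    assert np.allclose(lagrange_eval(betas, outputs, a), f(X[i]))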
Article
We provide novel coded computation strategies for distributed matrix–matrix products that outperform the recent “Polynomial code” constructions in recovery threshold, i.e., the required number of successful workers. When a fixed 1/m fraction of each matrix can be stored at each worker node, Polynomial codes require m^2 successful workers, while our MatDot codes only require 2m−1 successful workers. However, MatDot codes have higher computation cost per worker and higher communication cost from each worker to the fusion node. We also provide a systematic construction of MatDot codes. Furthermore, we propose “PolyDot” coding that interpolates between Polynomial codes and MatDot codes to trade off computation/communication costs and recovery thresholds. Finally, we demonstrate a novel coding technique for multiplying n matrices (n ≥ 3) using ideas from MatDot and PolyDot codes.
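
MatDot's encoding is a few lines (numpy toy over the reals): split A column-wise and B row-wise so that AB = Σ_i A_i B_i, encode p_A(x) = Σ_i A_i x^i and p_B(x) = Σ_j B_j x^{m−1−j}, and the product polynomial then carries AB as its coefficient of x^{m−1}, recoverable from any 2m−1 evaluations.

import numpy as np
rng = np.random.default_rng(11)

m = 3
A = rng.standard_normal((5, m * 4))
B = rng.standard_normal((m * 4, 6))
A_blk = np.split(A, m, axis=1)          # column blocks: AB = sum_i A_i B_i
B_blk = np.split(B, m, axis=0)

workers = 2 * m - 1                     # the MatDot recovery threshold
xs = np.linspace(-1, 1, workers)
evals = []
for x in xs:
    pA = sum(A_blk[i] * x**i for i in range(m))
    pB = sum(B_blk[j] * x**(m - 1 - j) for j in range(m))
    evals.append(pA @ pB)               # degree-(2m-2) matrix polynomial at x

V = np.vander(xs, 2 * m - 1, increasing=True)
coeffs = np.linalg.solve(V, np.stack([e.reshape(-1) for e in evals]))
C = coeffs[m - 1].reshape(5, 6)         # coefficient of x^{m-1} is exactly AB
assert np.allclose(C, A @ B)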
Article
We consider the problem of massive matrix multiplication, which underlies many data analytic applications, in a large-scale distributed system comprising a group of worker nodes. We target the stragglers' delay performance bottleneck, which is due to the unpredictable latency in waiting for the slowest nodes (or stragglers) to finish their tasks. We propose a novel coding strategy, named entangled polynomial code, for designing the intermediate computations at the worker nodes in order to minimize the recovery threshold (i.e., the number of workers that we need to wait for in order to compute the final output). We demonstrate the optimality of entangled polynomial code in several cases, and show that it provides orderwise improvement over the conventional schemes for straggler mitigation. Furthermore, using bilinear complexity, we characterize the optimal recovery threshold among all linear coding strategies within a factor of 2. In particular, while evaluating bilinear complexity is a well-known challenging problem, we show that the optimal recovery threshold for linear coding strategies can be approximated within a factor of 2 of this fundamental quantity. Finally, we show that the techniques developed in this paper can also be extended to several other problems such as coded convolution and fault-tolerant computing, leading to tight characterizations.
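
A small instance of the entangled construction can be sketched for C = AᵀB, with A and B split into a p-by-m and a p-by-n grid of blocks respectively (illustrative real arithmetic): encode Ã(x) = Σ A_{j,k} x^{j+kp} and B̃(x) = Σ B_{j,l} x^{(p−1−j)+l·pm}; in Ãᵀ(x)B̃(x) the coefficient of x^{p−1+kp+l·pm} is exactly C_{k,l}, and the product degree is pmn+p−2, giving the recovery threshold pmn+p−1.

import numpy as np
rng = np.random.default_rng(12)

p, m, n = 2, 2, 2
s0, r0, t0 = 3, 4, 5
A = rng.standard_normal((p * s0, m * r0))       # target: C = A.T @ B
B = rng.standard_normal((p * s0, n * t0))
A_blk = [np.split(r, m, axis=1) for r in np.split(A, p)]   # p x m grid
B_blk = [np.split(r, n, axis=1) for r in np.split(B, p)]   # p x n grid

threshold = p * m * n + p - 1                   # = 9 here, subcubic vs p*m*n
xs = np.linspace(-1, 1, threshold)
evals = []
for x in xs:
    At = sum(A_blk[j][k] * x**(j + k * p) for j in range(p) for k in range(m))
    Bt = sum(B_blk[j][l] * x**((p - 1 - j) + l * p * m)
             for j in range(p) for l in range(n))
    evals.append(At.T @ Bt)                     # one worker's single product

V = np.vander(xs, threshold, increasing=True)
coeffs = np.linalg.solve(V, np.stack([e.reshape(-1) for e in evals]))
C_blk = lambda k, l: coeffs[p - 1 + k * p + l * p * m].reshape(r0, t0)
C = np.block([[C_blk(k, l) for l in range(n)] for k in range(m)])
assert np.allclose(C, A.T @ B)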
Article
We obtain randomized algorithms for factoring degree n univariate polynomials over F_q requiring O(n^(1.5+o(1)) log^(1+o(1))q + n^(1+o(1)) log^(2+o(1))q) bit operations. When log q < n, this is asymptotically faster than the best previous algorithms [J. von zur Gathen and V. Shoup, Comput. Complexity, 2 (1992), pp. 187–224; E. Kaltofen and V. Shoup, Math. Comp., 67 (1998), pp. 1179–1197]; for log q ≥ n, it matches the asymptotic running time of the best known algorithms. The improvements come from new algorithms for modular composition of degree n univariate polynomials, which is the asymptotic bottleneck in fast algorithms for factoring polynomials over finite fields. The best previous algorithms for modular composition use O(n^((ω+1)/2)) field operations, where ω is the exponent of matrix multiplication [R. P. Brent and H. T. Kung, J. Assoc. Comput. Mach., 25 (1978), pp. 581–595], with a slight improvement in the exponent achieved by employing fast rectangular matrix multiplication [X. Huang and V. Y. Pan, J. Complexity, 14 (1998), pp. 257–299]. We show that modular composition and multipoint evaluation of multivariate polynomials are essentially equivalent, in the sense that an algorithm for one achieving exponent α implies an algorithm for the other with exponent α+o(1), and vice versa. We then give two new algorithms that solve the problem near-optimally: an algebraic algorithm for fields of characteristic at most n^(o(1)), and a nonalgebraic algorithm that works in arbitrary characteristic. The latter algorithm works by lifting to characteristic 0, applying a small number of rounds of multimodular reduction, and finishing with a small number of multidimensional FFTs. The final evaluations are reconstructed using the Chinese remainder theorem. As a bonus, this algorithm produces a very efficient data structure supporting polynomial evaluation queries, which is of independent interest. Our algorithms use techniques that are commonly employed in practice, in contrast to all previous subquadratic algorithms for these problems, which relied on fast matrix multiplication.
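
Modular composition, the operation at the center of these results, is simple to state even though making it fast is hard; the naive Horner version below (Python, low-degree-first coefficient lists) computes f(g) mod (h, q) directly, at the quadratic-plus cost that the paper's algorithms improve upon.

def poly_mulmod(a, b, h, q):
    """(a * b) mod h over F_q; polynomials as coefficient lists, low degree first."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % q
    # Reduce modulo the monic h by cancelling leading terms from the top down.
    for d in range(len(prod) - 1, len(h) - 2, -1):
        c = prod[d]
        if c:
            for t, ht in enumerate(h):
                prod[d - (len(h) - 1) + t] = (prod[d - (len(h) - 1) + t] - c * ht) % q
    return prod[:len(h) - 1]

def modcomp(f, g, h, q):
    """Naive modular composition f(g) mod (h, q) via Horner's rule."""
    acc = [0] * (len(h) - 1)
    for coeff in reversed(f):
        acc = poly_mulmod(acc, g, h, q)
        acc[0] = (acc[0] + coeff) % q
    return acc

q = 101
f, g = [3, 0, 1], [2, 5]                 # f = x^2 + 3, g = 5x + 2
h = [1, 0, 0, 1]                         # h = x^3 + 1 (monic)
print(modcomp(f, g, h, q))               # -> [7, 20, 25], i.e. 25x^2 + 20x + 7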