Conference Paper

Parallelizing security checks on commodity hardware.

DOI: 10.1145/1346281.1346321 In proceeding of: Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2008, Seattle, WA, USA, March 1-5, 2008
Source: DBLP

ABSTRACT Speck1 is a system that accelerates powerful security checks on commodity hardware by executing them in parallel on multiple cores. Speck provides an infrastructure that allows sequential in- vocations of a particular security check to run in parallel without sacrificing the safety of the system. Speck creates parallelism in two ways. First, Speck decouples a security check from an appli- cation by continuing the application, using speculative execution, while the security check executes in parallel on another core. Sec- ond, Speck creates parallelism between sequential invocations of a security check by running later checks in parallel with earlier ones. Speck provides a process-level replay system to deterministically and efficiently synchronize state between a security check and the original process. We use Speck to parallelize three security checks: sensitive data analysis, on-access virus scanning, and taint propaga- tion. Running on a 4-core and an 8-core computer, Speck improves performance 4x and 7.5x for the sensitive data analysis check, 3.3x and 2.8x for the on-access virus scanning check, and 1.6x and 2x for the taint propagation check.

  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, a one order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, as any order-of-magnitude drop in the cost per unit of performance for a class of system components, triggers the opportunity to redesign systems and to explore new ways to engineer them to recalibrate the cost-to-performance relation. This project explores the feasibility of harnessing GPUs' computational power to improve the performance, reliability, or security of distributed storage systems. In this context, we present the design of a storage system prototype that uses GPU offloading to accelerate a number of computationally intensive primitives based on hashing, and introduce techniques to efficiently leverage the processing power of GPUs. We evaluate the performance of this prototype under two configurations: as a content addressable storage system that facilitates online similarity detection between successive versions of the same file and as a traditional system that uses hashing to preserve data integrity. Further, we evaluate the impact of offloading to the GPU on competing applications' performance. Our results show that this technique can bring tangible performance gains without negatively impacting the performance of concurrently running applications.
    IEEE Transactions on Parallel and Distributed Systems 01/2012; abs/1202.3669. · 1.80 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: TxBox is a new system for sand boxing untrusted applications. It speculatively executes the application in a system transaction, allowing security checks to be parallelized and yielding significant performance gains for techniques such as on-access anti-virus scanning. TxBox is not vulnerable to TOCTTOU attacks and incorrect mirroring of kernel state. Furthermore, TxBox supports automatic recovery: if a violation is detected, the sand boxed program is terminated and all of its effects on the host are rolled back. This enables effective enforcement of security policies that span multiple system calls.
    Security and Privacy (SP), 2011 IEEE Symposium on; 06/2011
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Deterministic replay systems record and reproduce the execution of a hardware or software system. In contrast to replaying execution on uniprocessors, deterministic replay on multiprocessors is very challenging to implement efficiently because of the need to reproduce the order of or the values read by shared memory operations performed by multiple threads. In this paper, we present DoublePlay, a new way to efficiently guarantee replay on commodity multiprocessors. Our key insight is that one can use the simpler and faster mechanisms of single-processor record and replay, yet still achieve the scalability offered by multiple cores, by using an additional execution to parallelize the record and replay of an application. DoublePlay timeslices multiple threads on a single processor, then runs multiple time intervals (epochs) of the program concurrently on separate processors. This strategy, which we call uniparallelism, makes logging much easier because each epoch runs on a single processor (so threads in an epoch never simultaneously access the same memory) and different epochs operate on different copies of the memory. Thus, rather than logging the order of shared-memory accesses, we need only log the order in which threads in an epoch are timesliced on the processor. DoublePlay runs an additional execution of the program on multiple processors to generate checkpoints so that epochs run in parallel. We evaluate DoublePlay on a variety of client, server, and scientific parallel benchmarks; with spare cores, DoublePlay reduces logging overhead to an average of 15% with two worker threads and 28% with four threads.
    ACM Transactions on Computer Systems 02/2012; 30(1):3:1-3:24. · 0.80 Impact Factor

Full-text (2 Sources)

Available from
Jun 5, 2014