Conference Paper

Advances and Throwbacks in Hardware-Assisted Security: Special Session


Abstract

Hardware security architectures and primitives are becoming increasingly important in practice, providing trust anchors and trusted execution environments to protect modern software systems. Over the past two decades we have witnessed various hardware security solutions and trends, from Trusted Platform Modules (TPM), performance counters for security, ARM's TrustZone, and Physically Unclonable Functions (PUFs), to very recent advances such as Intel's Software Guard Extensions (SGX). Unfortunately, these solutions are rarely used by third-party developers, make strong trust assumptions (including in manufacturers), are too expensive for small constrained devices, do not easily scale, or suffer from information leakage. Although academic research has proposed a variety of hardware security architectures, these advances are rarely deployed in practice.


... With respect to related works [57][58][59][60][61][62][63], our classification method does not require any disassembly or execution of the actual malware code. Moreover, the image textures used for classification provide features that are more resilient to obfuscation techniques, in particular encryption. ...
Preprint
Machine learning based malware detection techniques rely on grayscale images of malware and tend to classify malware based on the distribution of textures in grayscale images. Despite the advancements and promising results shown by machine learning techniques, attackers can exploit these vulnerabilities by generating adversarial samples, which are crafted by intelligently adding perturbations to the input samples. Most existing adversarial attacks and defenses are software-based. To defend against adversaries, existing malware detection based on machine learning and grayscale images requires preprocessing of the adversarial data, which adds overhead and can prolong real-time malware detection. As an alternative, we explore RRAM (Resistive Random Access Memory) based defenses against adversaries. The aim of this thesis is therefore to address the above-mentioned critical system security issues. These challenges are addressed by demonstrating the proposed techniques for designing a secure and robust cognitive system. First, a novel technique to detect stealthy malware is proposed. The technique uses malware binary images, extracts different features from them, and then employs different ML classifiers on the resulting dataset. Results demonstrate that this technique successfully differentiates classes of malware based on the extracted features. Second, I demonstrate the effects of adversarial attacks on a reconfigurable RRAM-neuromorphic architecture with different learning algorithms and device characteristics. I also propose an integrated solution for mitigating the effects of the adversarial attack using the reconfigurable RRAM architecture.
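As a rough illustration of the grayscale-image pipeline sketched in this abstract, the snippet below converts raw binaries into fixed-size images, derives simple texture-style features, and trains a standard classifier. It is a minimal sketch on synthetic byte blobs; the image size, the histogram features, and the RandomForest model are assumptions made for illustration, not the thesis's actual configuration.

```python
# Minimal sketch of the grayscale-image malware pipeline (toy data, assumed design choices).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

IMG_SIDE = 64  # fixed image size; an assumption, not the thesis's exact parameter

def bytes_to_image(data, side=IMG_SIDE):
    """Resample a raw byte stream into a side x side grayscale image."""
    idx = np.linspace(0, data.size - 1, side * side).astype(int)
    return data[idx].reshape(side, side)

def texture_features(img):
    """Very simple texture-style features: normalized pixels plus a byte histogram."""
    hist, _ = np.histogram(img, bins=16, range=(0, 255), density=True)
    return np.concatenate([img.ravel() / 255.0, hist])

# Stand-ins for real malware/benign binaries (random bytes, toy labels).
rng = np.random.default_rng(0)
blobs = [rng.integers(0, 256, size=50_000, dtype=np.uint8) for _ in range(40)]
y = np.array([i % 2 for i in range(40)])  # 0 = benign, 1 = malware

X = np.array([texture_features(bytes_to_image(b)) for b in blobs])
clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=5, scoring="f1").mean())  # meaningless on random data
```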
... As described in the previous sections, I have been working on and publishing papers on Malware Detection [6], Side-Channel Analysis [20,21], Hardware-based Trojan Attack and Detection [10], and surveys [43,44] in conferences and journals. I intend to dive deeper into SCA-based attacks and defenses in the future and contribute my work to top-tier conferences and journals. ...
Preprint
Full-text available
Evolving attacks on the vulnerabilities of computing systems demand novel defense strategies to keep pace with newer attacks. This report discusses previous work on side-channel attacks (SCAs) and defenses for cache-targeted and physical-proximity attacks. We then discuss the proposed Entropy-Shield as a defense against timing SCAs and explain how we can extend it to hardware-based implementations of crypto applications as "Entropy-Shield for FPGA". Finally, we discuss why we want to build newer attacks in the hope of coming up with better defense strategies.
... Employing third-party Intellectual Property modules (3PIP) benefits the Intellectual property (IP) holder by reducing the time-to-market while cutting down the design flow efforts. Despite the economic benefits, this trend poses significant challenges to hardware security in numerous forms [2]- [5]. ...
... Traditional software-based malware detection techniques such as signature-based and semantics-based anomaly detection have existed for more than two decades [1]; though effective, they induce considerable computational and processing overheads and are inefficient at detecting unseen threats [2]. To overcome the limitations of software-based malware detection approaches, the work in [3,6] proposed feeding microarchitectural event traces captured through on-chip hardware performance counters (HPCs) to machine learning (ML) classifiers for classifying benign and malware applications. ...
Conference Paper
To thwart detection through traditional and emerging approaches, malware development has shifted toward embedding malware into benign applications. This calls for a localized feature-extraction scheme that detects stealthy malware with more robustness. To address this challenge, we introduce a hybrid approach which utilizes the microarchitectural traces obtained through on-chip embedded hardware performance counters (HPCs) and the application binary for malware detection. The obtained HPCs are fed to a multi-stage machine learning (ML) classifier for detecting and classifying the malware. To overcome the challenge of detecting stealthy malware, an image-processing-based approach is applied in parallel. In this approach, the malware binaries are converted into images, which are further converted into sequences and fed to recurrent neural networks to recognize patterns of stealthy malware. Based on the localized patterns, sequence classification is further applied to perform binary classification and to discover the variation of the identified malware family. Our proposed framework exhibits high resilience to popular obfuscation techniques such as code relocation.
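The sequence-classification half of this hybrid approach can be sketched as follows: byte sequences derived from the image representation of a binary are fed to a small recurrent network. This is a hedged, toy illustration; the network architecture, sequence encoding, and training setup are assumptions, and the paper's HPC-based multi-stage classifier stage is not reproduced here.

```python
# Sketch of sequence classification over binary-derived byte sequences with an LSTM
# (toy data; the paper's actual network and sequence encoding may differ).
import torch
import torch.nn as nn

class SeqClassifier(nn.Module):
    def __init__(self, vocab=256, embed=16, hidden=64, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, embed)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True)
        self.head = nn.Linear(hidden, classes)

    def forward(self, x):                 # x: (batch, seq_len) of byte values
        _, (h, _) = self.lstm(self.embed(x))
        return self.head(h[-1])           # classify from the final hidden state

# Toy byte sequences standing in for image-derived sequences of real binaries.
torch.manual_seed(0)
seqs = torch.randint(0, 256, (32, 128))   # 32 samples, 128 bytes each
labels = torch.randint(0, 2, (32,))       # 0 = benign, 1 = malware

model = SeqClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(5):                        # tiny training loop
    opt.zero_grad()
    loss = loss_fn(model(seqs), labels)
    loss.backward()
    opt.step()
```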
... Security threats utilize side channels or covert channels to obtain secret information from the system and are passive in nature. Side-channel attacks are a class of attacks that primarily compromise the security of computing systems through side-channel information leaked as a result of design vulnerabilities rather than exploits in the application [1]. Side channels are inherent in any computing system, and the foremost challenge in defending against side-channel attacks is that they cannot be completely eliminated. ...
Conference Paper
The hardware security domain in recent years has experienced a plethora of side-channel attacks (SCAs), with cache-based SCAs being one of the dominant threats. These SCAs exploit side channels that invariably leak important data during an application's execution. Shutting down the side channels is not feasible due to the restrictions it would pose on system performance. To overcome such concerns and protect data integrity, we introduce Sequence-Crafter (SC) in this work. The proposed Sequence-Crafter aims to minimize the entropy of the information leaked through the side channel rather than attempting to close the side channels. To achieve this, we introduce carefully crafted perturbations into the victim application which are randomly activated, resulting in misleading yet legitimate-looking information being observed by the attacker. This methodology has been successfully tested against the Flush+Reload attack, and the key information observed by the attacker is rendered completely futile, indicating the success of the proposed method.
Article
Inspired by the idiom “Mitigation (prevention) is better than cure!”, this work presents a random yet cognitive side-channel mitigation technique that is independent of the underlying architecture and/or operating system. Unlike malware and other cyber-attacks, side-channel attacks (SCAs) exploit architectural and design vulnerabilities to obtain sensitive information through side channels. In contrast to existing randomization-based side-channel defenses, we introduce a cognitive perturbation-based defense, Covert-Enigma, where the introduced perturbations look legitimate but lead to an incorrect observation when interpreted by the attacker. To achieve this, the perturbations are injected at appropriate time instances to introduce additional operations, thereby misleading the attacker and making the extracted data futile. To further complicate the attack, the proposed Covert-Enigma offers two modes of operation, chosen by the user, that determine the kind of induced cognitive perturbations: arbitrary and cyclic. Arbitrary mode selects a group of key bits and flips them during every execution of the victim. Cyclic mode exhibits similar behavior, except that it selects a new set of bits to flip after “N” cycles, as chosen by the user. The cognitive perturbations are introduced in the form of a wrapper application around the victim, requiring neither architecture-level modifications nor software updates/edits to the operating system. We report a rigorous evaluation of the proposed Covert-Enigma protecting the RSA cryptosystem against the Flush+Reload crypto SCA, along with the bit(s) recovered by observing RSA under attack. Compared to traditional randomization-based defenses, the proposed cognitive Covert-Enigma incurs 50% less overhead.
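One plausible reading of the two perturbation modes is sketched below: a wrapper picks a group of decoy key-bit positions (a fresh group every run in arbitrary mode, a new group every N runs in cyclic mode) and performs a decoy square-and-multiply over the flipped exponent alongside the real computation, so that side-channel observations encode useless bits. The wrapper class, group size, and decoy exponentiation are illustrative assumptions, not Covert-Enigma's actual mechanism.

```python
# Illustrative-only sketch of the arbitrary vs. cyclic perturbation modes described above.
import random

class PerturbationWrapper:
    def __init__(self, key_bits, mode="arbitrary", group_size=8, cycle_n=10):
        self.key_bits = key_bits          # real exponent bits (list of 0/1)
        self.mode = mode                  # "arbitrary" or "cyclic"
        self.group_size = group_size
        self.cycle_n = cycle_n
        self.runs = 0
        self.group = self._new_group()

    def _new_group(self):
        return random.sample(range(len(self.key_bits)), self.group_size)

    def _decoy_exponent(self):
        bits = list(self.key_bits)
        for i in self.group:
            bits[i] ^= 1                  # flip the selected bit positions
        return bits

    def run(self, victim, base, modulus):
        # arbitrary mode: fresh group every execution;
        # cyclic mode: keep the group for cycle_n executions, then refresh.
        if self.mode == "arbitrary" or (self.mode == "cyclic" and
                                        self.runs % self.cycle_n == 0):
            self.group = self._new_group()
        self.runs += 1
        acc = 1
        for b in self._decoy_exponent():  # decoy square-and-multiply: the access
            acc = (acc * acc) % modulus   # pattern an eavesdropper observes now
            if b:                         # encodes the flipped, useless bits
                acc = (acc * base) % modulus
        return victim(base, modulus)      # real computation proceeds unchanged

# Toy usage with a stand-in victim and a 32-bit toy exponent.
victim = lambda base, mod: pow(base, 0xDEADBEEF, mod)
wrapper = PerturbationWrapper([int(b) for b in bin(0xDEADBEEF)[2:]], mode="cyclic")
print(wrapper.run(victim, 7, 1_000_003))
```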
Conference Paper
To overcome the performance overheads incurred by traditional software-based malware detection techniques, Hardware-assisted Malware Detection (HMD) using machine learning (ML) classifiers has emerged as a panacea for detecting malicious applications and securing systems. To classify benign and malicious applications, HMD primarily relies on low-level microarchitectural events captured through Hardware Performance Counters (HPCs). This work crafts an adversarial attack on HMD systems that tampers with their security by introducing perturbations in the HPC traces with the aid of an adversarial sample generator application. To craft the attack, we first deploy an adversarial sample predictor to predict the adversarial HPC pattern for a given application to be misclassified by the ML classifier deployed in the HMD. Further, as the attacker has no direct access to manipulate the HPCs generated at runtime, we devise, based on the output of the adversarial sample predictor, an adversarial sample generator wrapped around a normal application to produce HPC patterns similar to the predicted adversarial HPC trace. As the crafted adversarial sample generator application does not perform any malicious operations, it is not detectable with traditional signature-based malware detection solutions. With the proposed attack, malware detection accuracy is reduced from 82.76% to 18.04%.
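The "adversarial sample predictor" step can be illustrated on a toy linear HMD: the surrogate model's weights give a direction in HPC space along which a malware trace can be nudged until the detector labels it benign. Everything below (the synthetic HPC features, the logistic-regression surrogate, the step size) is an assumption made for illustration; the paper's predictor and generator pipeline is more elaborate.

```python
# Sketch of predicting an adversarial HPC pattern against a toy linear detector.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Toy HPC feature vectors (8 counters) for benign (0) and malware (1) applications.
X = np.vstack([rng.normal(0.0, 1.0, (200, 8)), rng.normal(1.5, 1.0, (200, 8))])
y = np.array([0] * 200 + [1] * 200)
hmd = LogisticRegression(max_iter=1000).fit(X, y)

def adversarial_hpc(trace, model, step=0.1, max_iter=100):
    """Nudge an HPC trace along -w until the detector flags it as benign."""
    w = model.coef_[0]
    direction = -w / np.linalg.norm(w)
    adv = trace.copy()
    for _ in range(max_iter):
        if model.predict(adv.reshape(1, -1))[0] == 0:
            break
        adv = np.maximum(adv + step * direction, 0)  # counters cannot go negative
    return adv

malware_trace = X[-1]
target = adversarial_hpc(malware_trace, hmd)
print(hmd.predict(malware_trace.reshape(1, -1))[0], "->",
      hmd.predict(target.reshape(1, -1))[0])
```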
Conference Paper
Full-text available
The ever-increasing prevalence of malware has led to the exploration of various detection mechanisms. Several recent works propose to use Hardware Performance Counter (HPC) values with machine learning classification models for malware detection. HPCs are hardware units that record low-level micro-architectural behavior, such as cache hits/misses, branch (mis)prediction, and load/store operations. However, this information does not reliably capture the nature of the application, i.e. whether it is benign or malicious. In this paper, we claim and experimentally support that the micro-architectural level information obtained from HPCs cannot distinguish between benignware and malware. We evaluate the fidelity of malware detection using HPCs. We perform quantitative analysis using Principal Component Analysis (PCA) to systematically select the micro-architectural events that have the most predictive power. We then run 1,924 programs, 962 benignware and 962 malware, on our experimental setups. We achieve F1-scores (a metric of detection rates) of 83.39%, 84.84%, 83.59%, 75.01%, 78.75%, and 14.32% for Decision Tree (DT), Random Forest (RF), K Nearest Neighbors (KNN), Adaboost, Neural Net (NN), and Naive Bayes, respectively. We cross-validate our models 1,000 times to show the distributions of detection rates across various models. Our cross-validation analysis shows that many of the experiments produce low F1-scores. The F1-scores of the DT, RF, KNN, Adaboost, NN, and Naive Bayes models are 80.22%, 81.29%, 80.22%, 70.32%, 35.66%, and 9.903%, respectively. To further highlight the incapability of malware detection using HPCs, we show that one benignware (Notepad++) infused with malware (ransomware) cannot be detected by HPC-based malware detection.
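A condensed sketch of this evaluation methodology is given below: PCA ranks the candidate micro-architectural events, the top-ranked events are kept, and several standard classifiers are scored with cross-validated F1. The synthetic data, the loading-based ranking rule, and the particular model set are illustrative assumptions rather than the study's exact procedure.

```python
# Sketch: PCA-guided event selection followed by cross-validated F1 of several classifiers.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 30))       # 30 candidate micro-architectural events (toy)
y = rng.integers(0, 2, size=400)     # benign / malware labels (toy)

# Rank events by their absolute loading on the leading principal components,
# then keep the top-k as the "most predictive" counters.
pca = PCA(n_components=5).fit(X)
scores = np.abs(pca.components_).sum(axis=0)
top_k = np.argsort(scores)[::-1][:8]
X_sel = X[:, top_k]

models = {
    "DT": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(n_estimators=100),
    "KNN": KNeighborsClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "NB": GaussianNB(),
}
for name, model in models.items():
    f1 = cross_val_score(model, X_sel, y, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {f1.mean():.3f}")
```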
Conference Paper
Full-text available
Malware detection at the hardware level has emerged recently as a promising solution to improve the security of computing systems. Hardware-based malware detectors take advantage of Machine Learning (ML) classifiers to detect patterns of malicious applications at run-time. These ML classifiers are trained using low-level features such as processor Hardware Performance Counters (HPCs) data, which are captured at run-time to appropriately represent the application behaviour. Recent studies show the potential of standard ML-based classifiers for detecting malware using analysis of a large number of microarchitectural events, far more than the very limited number of HPC registers available in today's microprocessors, which varies from 2 to 8. This requires executing the application more than once to collect the required data, which in turn makes the solution less practical for effective run-time malware detection. Our results show a clear trade-off between the performance of standard ML classifiers and the number and diversity of HPCs available in modern microprocessors. This paper proposes a machine learning-based solution to break this trade-off and realize effective run-time detection of malware. We propose ensemble learning techniques to improve the performance of hardware-based malware detectors despite using a very small number of microarchitectural events captured at run-time by existing HPCs, eliminating the need to run an application several times. For this purpose, eight robust machine learning models and two well-known ensemble learning classifiers applied to all studied ML models (sixteen in total) are implemented for malware detection and precisely compared and characterized in terms of detection accuracy, robustness, performance (accuracy×robustness), and hardware overheads. The experimental results show that the proposed ensemble learning-based malware detection with just 2 HPCs outperforms standard classifiers with 8 HPCs by up to 17%. In addition, it can match the robustness and performance of standard ML-based detectors with 16 HPCs while using only 4 HPCs, allowing effective run-time detection of malware.
Conference Paper
Full-text available
Recent studies have demonstrated the effectiveness of Hardware Performance Counters (HPCs) for detecting patterns of malicious applications. Hardware-supported detectors utilize Machine Learning (ML) classifiers for malware detection by analyzing a large number of HPC features, more than the very limited number of HPC registers available in modern microprocessors. Obtaining more HPCs requires running the application (malware or benign) more than once to collect the required data, which in turn makes the solution less practical for run-time detection of malware. In response to this challenge, in this work we first identify the critical HPC features required for malware detection. Next, we explore the use of various ML techniques to classify benign and malware applications using the selected HPCs at run-time. Further, we investigate the effectiveness of ensemble learning in improving the performance of ML classifiers. For this purpose, we apply AdaBoost to all general ML classifiers. We thoroughly compare the general and ensemble ML classifiers in terms of accuracy, robustness, performance, and hardware overhead. The experimental results indicate that ensemble learning enhances the performance of malware detection for rule-based and tree-based algorithms by up to 13%. However, it diminishes the performance of neural network and Bayesian network-based detectors by 6% and 4%, respectively.
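The ensemble step described above amounts to boosting a weak base learner over the selected HPC features. The sketch below contrasts a single shallow tree with an AdaBoost ensemble on toy data; the feature count and model parameters are assumptions for illustration only.

```python
# Sketch: AdaBoost vs. a single weak learner on a small set of HPC features (toy data).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))        # only 4 HPC features captured in a single run
y = rng.integers(0, 2, size=500)

single = DecisionTreeClassifier(max_depth=1)   # a weak, rule-like learner
boosted = AdaBoostClassifier(n_estimators=50)  # boosts depth-1 trees by default

for name, model in [("single tree", single), ("AdaBoost ensemble", boosted)]:
    f1 = cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    print(f"{name}: F1 = {f1:.3f}")
```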
Article
Full-text available
Hardware-based malware detectors (HMDs) are a promising new approach to defend against malware. HMDs collect low-level architectural features and use them to classify malware from normal programs. With simple hardware support, HMDs can be always on, operating as a first line of defense that prioritizes the application of more expensive and more accurate software detectors. In this paper, we explore improving the accuracy of HMDs to improve detection and reduce overhead. First, we use specialized detectors targeted towards a specific type of malware. Next, we use ensemble learning techniques to improve accuracy by combining detectors. We explore detectors based on logistic regression (LR) and neural networks (NN). The proposed detectors reduce the false-positive rate by more than half compared to using a single detector, while increasing their sensitivity. We develop metrics to estimate detection overhead; the proposed detectors achieve more than 16.6x overhead reduction during online detection compared to a software-only detector, with an 8x improvement in relative detection time. NN detectors outperform LR detectors in accuracy, overhead (by 40%), and time-to-detection of the hardware component (by 5x). Finally, we characterize the hardware complexity by extending an open core and synthesizing it on an FPGA platform, showing that the overhead is minimal.
Article
Full-text available
Modern processors use branch prediction and speculative execution to maximize performance. For example, if the destination of a branch depends on a memory value that is in the process of being read, CPUs will try to guess the destination and attempt to execute ahead. When the memory value finally arrives, the CPU either discards or commits the speculative computation. Speculative logic is unfaithful in how it executes, can access the victim's memory and registers, and can perform operations with measurable side effects. Spectre attacks involve inducing a victim to speculatively perform operations that would not occur during correct program execution and which leak the victim's confidential information via a side channel to the adversary. This paper describes practical attacks that combine methodology from side-channel attacks, fault attacks, and return-oriented programming that can read arbitrary memory from the victim's process. More broadly, the paper shows that speculative execution implementations violate the security assumptions underpinning numerous software security mechanisms, such as operating system process separation, containerization, just-in-time (JIT) compilation, and countermeasures to cache timing and side-channel attacks. These attacks represent a serious threat to actual systems because vulnerable speculative execution capabilities are found in microprocessors from Intel, AMD, and ARM that are used in billions of devices. Although makeshift processor-specific countermeasures are possible in some cases, sound solutions will require fixes to processor designs as well as updates to instruction set architectures (ISAs) to give hardware architects and software developers a common understanding as to what computation state CPU implementations are (and are not) permitted to leak.
Article
Full-text available
Cache attacks pose a threat to any code whose execution flow or memory accesses depend on sensitive information. Especially in public clouds, where caches are shared across several tenants, cache attacks remain an unsolved problem. Cache attacks rely on evictions by the spy process, which alter the execution behavior of the victim process. We show that hardware performance events of cryptographic routines reveal the presence of cache attacks. Based on this observation, we propose CacheShield, a tool to protect legacy code by monitoring its execution and detecting the presence of cache attacks, thus providing the opportunity to take preventative measures. CacheShield can be run by users and does not require alteration of the OS or hypervisor, while previously proposed software-based countermeasures require cooperation from the hypervisor. Unlike methods that try to detect malicious processes, our approach is lean, as only a fraction of the system needs to be monitored. It also integrates well into today's cloud infrastructure, as concerned users can opt to use CacheShield without support from the cloud service provider. Our results show that CacheShield detects cache attacks fast, with high reliability, and with few false positives, even in the presence of strong noise.
Conference Paper
Full-text available
For the first time, we practically demonstrate that Intel SGX enclaves are vulnerable against cache-timing attacks. As a case study, we present an access-driven cache-timing attack on AES when running inside an Intel SGX enclave. Using Neve and Seifert's elimination method, as well as a cache probing mechanism relying on Intel PMC, we are able to extract the AES secret key in less than 10 seconds by investigating 480 encrypted blocks on average. The AES implementation we attack is based on a Gladman AES implementation taken from an older version of OpenSSL, which is known to be vulnerable to cache-timing attacks. In contrast to previous works on cache-timing attacks, our attack is executed with root privileges running on the same host as the vulnerable enclave. Intel SGX, however, was designed to precisely protect applications against such root-level attacks. As a consequence, we show that SGX cannot withstand its designated attacker model when it comes to side-channel vulnerabilities. To the contrary, the attack surface for side-channels increases dramatically in the scenario of SGX due to the power of root-level attackers, for example, by exploiting the accuracy of PMC, which is restricted to kernel code.
Article
Full-text available
The scatter–gather technique is a commonly implemented approach to prevent cache-based timing attacks. In this paper, we show that scatter–gather is not constant time. We implement a cache timing attack against the scatter–gather implementation used in the modular exponentiation routine in OpenSSL version 1.0.2f. Our attack exploits cache-bank conflicts on the Sandy Bridge microarchitecture. We have tested the attack on an Intel Xeon E5-2430 processor. For 4096-bit RSA, our attack can fully recover the private key after observing 16,000 decryptions.
Conference Paper
Full-text available
The theoretical construct of a Trusted Third Party (TTP) has the potential to solve many security and privacy challenges. In particular, a TTP is an ideal way to achieve secure multiparty computation---a privacy-enhancing technique in which mutually distrusting participants jointly compute a function over their private inputs without revealing these inputs. Although there exist cryptographic protocols to achieve this, their performance often limits them to the two-party case, or to a small number of participants. However, many real-world applications involve thousands or tens of thousands of participants. Examples of this type of many-party application include privacy-preserving energy metering, location-based services, and mobile network roaming. Challenging the notion that a trustworthy TTP does not exist, recent research has shown how trusted hardware and remote attestation can be used to establish a sufficient level of assurance in a real system such that it can serve as a trustworthy remote entity (TRE). We explore the use of Intel SGX, the most recent and arguably most promising trusted hardware technology, as the basis for a TRE for many-party applications. Using privacy-preserving energy metering as a case study, we design and implement a prototype TRE using SGX, and compare its performance to a previous system based on the Trusted Platform Module (TPM). Our results show that even without specialized optimizations, SGX provides comparable performance to the optimized TPM system, and therefore has significant potential for large-scale many-party applications.
Conference Paper
Full-text available
Although numerous attacks revealed the vulnerability of different PUF families to non-invasive Machine Learning (ML) attacks, the question is still open whether all PUFs might be learnable. Until now, virtually all ML attacks rely on the assumption that a mathematical model of the PUF functionality is known a priori. However, this is not always the case, and attention should be paid to this important aspect of ML attacks. This paper aims to address this issue by providing a provable framework for ML attacks against a PUF family, whose underlying mathematical model is unknown. We prove that this PUF family is inherently vulnerable to our novel PAC (Probably Approximately Correct) learning framework. We apply our ML algorithm on the Bistable Ring PUF (BR-PUF) family, which is one of the most interesting and prime examples of a PUF with an unknown mathematical model. We practically evaluate our ML algorithm through extensive experiments on BR-PUFs implemented on Field-Programmable Gate Arrays (FPGA). In line with our theoretical findings, our experimental results strongly confirm the effectiveness and applicability of our attack. This is also interesting since our complex proof heavily relies on the spectral properties of Boolean functions, which are known to hold only asymptotically. Along with this proof, we further provide the theorem that all PUFs must have some challenge bit positions, which have larger influences on the responses than other challenge bits.
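For context, the generic CRP-learning setup that such modeling attacks share can be sketched on a simulated additive-delay (arbiter-style) PUF, where logistic regression over parity-transformed challenges recovers a predictive model. This is only an illustration of the attack family; it is not the paper's PAC framework and does not model the BR-PUF, whose underlying mathematical model is unknown.

```python
# Generic illustration of an ML modeling attack on a *simulated* arbiter-style PUF.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_stages, n_crps = 64, 20_000

def parity_features(challenges):
    """Standard arbiter-PUF transform: cumulative parity of the challenge bits."""
    phi = np.cumprod(1 - 2 * challenges[:, ::-1], axis=1)[:, ::-1]
    return np.hstack([phi, np.ones((challenges.shape[0], 1))])

weights = rng.normal(size=n_stages + 1)                       # secret delay parameters
challenges = rng.integers(0, 2, size=(n_crps, n_stages))
responses = (parity_features(challenges) @ weights > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(parity_features(challenges), responses,
                                          test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
print("prediction accuracy on unseen CRPs:", model.score(X_te, y_te))
```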
Article
Full-text available
Critical infrastructure components nowadays use microprocessor-based embedded control systems. It is often infeasible, however, to employ the same level of security measures used in general purpose computing systems, due to the stringent performance and resource constraints of embedded control systems. Furthermore, as software sits atop and relies on the firmware for proper operation, software-level techniques cannot detect malicious behavior of the firmware. In this work, we propose ConFirm, a low-cost technique to detect malicious modifications in the firmware of embedded control systems by measuring the number of low-level hardware events that occur during the execution of the firmware. In order to count these events, ConFirm leverages the Hardware Performance Counters (HPCs), which readily exist in many embedded processors. We propose a comparison-based technique to detect malicious modifications in firmwares with simple control-flows. For firmwares with more complex control-flows, we use machine learning techniques to automatically extract the relations among different hardware events. This method significantly reduces the number of pre-stored valid HPC signatures without compromising the detection accuracy. Finally, we reduce the consumption of local resources by implementing a remote-based detection mechanism. We evaluate the detection capability and performance overhead of the proposed technique on various types of firmware running on ARM- and PowerPC-based embedded processors. Experimental results demonstrate its practicality and effectiveness.
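The comparison-based variant of this check can be sketched very simply: observed HPC counts for a monitored firmware code path are compared against pre-stored golden signatures within a tolerance. The counter names, signature values, and threshold below are hypothetical placeholders, not ConFirm's actual configuration.

```python
# Sketch of a comparison-based HPC signature check for firmware integrity.
GOLDEN_SIGNATURES = {
    # code-path id -> expected (instructions, branches, loads) per check window
    "boot_sequence": (120_000, 18_000, 42_000),
    "sensor_poll":   (8_500,   1_200,  3_100),
}
TOLERANCE = 0.05   # 5% deviation allowed for run-to-run noise (illustrative)

def check_firmware(path_id, observed_counts):
    """Return True if the observed HPC counts match the stored signature."""
    expected = GOLDEN_SIGNATURES[path_id]
    for obs, exp in zip(observed_counts, expected):
        if abs(obs - exp) > TOLERANCE * exp:
            return False          # possible malicious modification of the firmware
    return True

print(check_firmware("sensor_poll", (8_600, 1_210, 3_090)))   # within tolerance
print(check_firmware("sensor_poll", (15_000, 2_900, 7_800)))  # flagged
```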
Conference Paper
Full-text available
Recent work demonstrated hardware-based online malware detection using only low-level features. This detector is envisioned as a first line of defense that prioritizes the application of more expensive and more accurate software detectors. Critical to such a framework is the detection performance of the hardware detector. In this paper, we explore the use of both specialized detectors and ensemble learning techniques to improve performance of the hardware detector. The proposed detectors reduce the false positive rate by more than half compared to a single detector, while increasing the detection rate. We also contribute approximate metrics to quantify the detection overhead, and show that the proposed detectors achieve more than 11x reduction in overhead compared to a software only detector (1.87x compared to prior work), while improving detection time. Finally, we characterize the hardware complexity by extending an open core and synthesizing it on an FPGA platform, showing that the overhead is minimal.
Article
Full-text available
Security exploits and ensuant malware pose an increasing challenge to computing systems as the variety and complexity of attacks continue to increase. In response, software-based malware detection tools have grown in complexity, thus making it computationally difficult to use them to protect systems in real-time. Therefore, software detectors are applied selectively and at a low frequency, creating opportunities for malware to remain undetected. In this paper, we propose Malware-Aware Processors (MAP)-processors augmented with a hardware-based online malware detector to serve as the first line of defense to differentiate malware from legitimate programs. The output of this detector helps the system prioritize how to apply more expensive software-based solutions. The always-on nature of MAP detector helps protect against intermittently operating malware. We explore the use of different features for classification and study both logistic regression and neural networks. We show that the detectors can achieve excellent performance, with little hardware overhead. We integrate the MAP implementation with an open-source x86-compatible core, synthesizing the resulting design to run on an FPGA.
Conference Paper
Full-text available
In this paper we demonstrate the first real-world cloning attack on a commercial PUF-based RFID tag. The examined commercial PUFs can be attacked by measuring only 4 protocol executions, which takes less than 200 ms. Using a RFID smartcard emulator, it is then possible to impersonate, i.e., “clone” the PUF. While attacking the 4-way PUF used by these tags can be done using traditional machine learning attacks, we show that the tags can still be attacked if they are configured as presumably secure XOR PUFs. We achieved this by using a new reliability-based machine learning attack that uses a divide-and-conquer approach for attacking the XOR PUFs. This new divide-and-conquer approach results in only a linear increase in needed number of challenge and responses for increasing numbers of XORs. This is in stark contrast to the state-of-the-art machine learning attacks on XOR PUFs that are shown to have an exponential increase in challenge and responses. Hence, it is now possible to attack XOR PUF constructs that were previously believed to be secure against machine learning attacks. Since XOR Arbiter PUFs are one of the most popular and promising electrical strong PUF designs, our reliability-based machine learning attack raises doubts that secure and lightweight electrical strong PUFs can be realized in practice.
Conference Paper
Full-text available
Code reuse attacks such as return-oriented programming (ROP) have become prevalent techniques to exploit memory corruption vulnerabilities in software programs. A variety of corresponding defenses have been proposed, of which some have already been successfully bypassed—and the arms race continues. In this paper, we perform a systematic assessment of recently proposed CFI solutions and other defenses against code reuse attacks in the context of C++. We demonstrate that many of these defenses that do not precisely consider object-oriented C++ semantics can be generically bypassed in practice. Our novel attack technique, denoted as counterfeit object-oriented programming (COOP), induces malicious program behavior by only invoking chains of existing C++ virtual functions in a program through corresponding existing call sites. COOP is Turing complete in realistic attack scenarios and we show its viability by developing sophisticated, real-world exploits for Internet Explorer 10 on Windows and Firefox 36 on Linux. Moreover, we show that even recently proposed defenses (CPS, T-VIP, vfGuard, and VTint) that specifically target C++ are vulnerable to COOP. We observe that constructing defenses resilient to COOP that do not require access to source code seems to be challenging. We believe that our investigation and results are helpful contributions to the design and implementation of future defenses against control-flow hijacking attacks.
Conference Paper
Full-text available
Adversaries exploit memory corruption vulnerabilities to hijack a program's control flow and gain arbitrary code execution. One promising mitigation, control-flow integrity (CFI), has been the subject of extensive research in the past decade. One of the core findings is that adversaries can construct Turing-complete code-reuse attacks against coarse-grained CFI policies because they admit control flows that are not part of the original program. This insight led the research community to focus on fine-grained CFI implementations. In this paper we show how to exploit heap-based vulnerabilities to control the stack contents including security-critical values used to validate control-flow transfers. Our investigation shows that although program analysis and compiler-based mitigations reduce stack-based vulnerabilities, stack-based memory corruption remains an open problem. Using the Chromium web browser we demonstrate real-world attacks against various CFI implementations: 1) against CFI implementations under Windows 32-bit by exploiting unprotected context switches, and 2) against state-of-the-art fine-grained CFI implementations (IFCC and VTV) in the two premier open-source compilers under Unix-like operating systems. Both 32 and 64-bit x86 CFI checks are vulnerable to stack manipulation. Finally, we provide an exploit technique against the latest shadow stack implementation.
Conference Paper
Full-text available
Malicious programs, also known as malware, often use code obfuscation techniques to make static analysis more difficult and to evade signature-based detection. To resolve this problem, various behavioral detection techniques have been proposed that focus on the run-time behaviors of programs in order to dynamically detect malicious ones. Most of these techniques describe the run-time behavior of a program on the basis of its data flow and/or its system call traces. Recent work in behavioral malware detection has shown promise in using hardware performance counters (HPCs), which are a set of special-purpose registers built into modern processors providing detailed information about hardware and software events. In this paper, we pursue this line of research by presenting HPCMalHunter, a novel approach for real-time behavioral malware detection. HPCMalHunter uses HPCs to collect a set of event vectors from the beginning of a program's execution. It also uses the singular value decomposition (SVD) to reduce these event vectors and generate a behavioral vector for the program. By applying support vector machines (SVMs) to the feature vectors of different programs, it is able to identify malicious programs in real-time. Our experimental results show that HPCMalHunter can detect malicious programs at the beginning of their execution with a high detection rate and a low false alarm rate.
Conference Paper
Full-text available
Physically Unclonable Functions (PUFs) are introduced to remedy the shortcomings of traditional methods of secure key storage and random key generation on Integrated Circuits (ICs). Due to their effective and low-cost implementations, intrinsic PUFs are popular PUF instances employed to improve the security of different applications on reconfigurable hardware. In this work we introduce a novel laser fault injection attack on intrinsic PUFs by manipulating the configuration of logic cells in a programmable logic device. We present two fault attack scenarios, where not only the effectiveness of modeling attacks can be dramatically increased, but also the entropy of the targeted PUF responses is drastically decreased. In both cases, we conduct detailed theoretical analyses by considering XOR arbiter PUFs and RO PUFs as the examples of PUF-based authenticators and PUF-based random key generators, respectively. Finally we present our experimental results based on conducting laser fault injection on real PUFs, implemented on a common complex programmable logic device manufactured in 180 nm technology.
Article
Major cloud operators offer machine learning (ML) as a service, enabling customers who have the data but not ML expertise or infrastructure to train predictive models on this data. Existing ML-as-a-service platforms require users to reveal all training data to the service operator. We design, implement, and evaluate Chiron, a system for privacy-preserving machine learning as a service. First, Chiron conceals the training data from the service operator. Second, in keeping with how many existing ML-as-a-service platforms work, Chiron reveals neither the training algorithm nor the model structure to the user, providing only black-box access to the trained model. Chiron is implemented using SGX enclaves, but SGX alone does not achieve the dual goals of data privacy and model confidentiality. Chiron runs the standard ML training toolchain (including the popular Theano framework and C compiler) in an enclave, but the untrusted model-creation code from the service operator is further confined in a Ryoan sandbox to prevent it from leaking the training data outside the enclave. To support distributed training, Chiron executes multiple concurrent enclaves that exchange model parameters via a parameter server. We evaluate Chiron on popular deep learning models, focusing on benchmark image classification tasks such as CIFAR and ImageNet, and show that its training performance and accuracy of the resulting models are practical for common uses of ML-as-a-service.
Preprint
Recent research has demonstrated that Intel's SGX is vulnerable to various software-based side-channel attacks. In particular, attacks that monitor CPU caches shared between the victim enclave and untrusted software enable accurate leakage of secret enclave data. Known defenses assume developer assistance, require hardware changes, impose high overhead, or prevent only some of the known attacks. In this paper we propose data location randomization as a novel defensive approach to address the threat of side-channel attacks. Our main goal is to break the link between the cache observations by the privileged adversary and the actual data accesses by the victim. We design and implement a compiler-based tool called DR.SGX that instruments enclave code such that data locations are permuted at the granularity of cache lines. We realize the permutation with the CPU's cryptographic hardware-acceleration units providing secure randomization. To prevent correlation of repeated memory accesses we continuously re-randomize all enclave data during execution. Our solution effectively protects many (but not all) enclaves from cache attacks and provides a complementary enclave hardening technique that is especially useful against unpredictable information leakage.
Conference Paper
Protection of data privacy and prevention of unwarranted information disclosure is an enduring challenge in cloud computing when data analytics is performed on an untrusted third-party resource. Recent advances in trusted processor technology, such as Intel SGX, have rejuvenated the efforts of performing data analytics on a shared platform where data security and trustworthiness of computations are ensured by the hardware. However, a powerful adversary may still be able to infer private information in this setting from side channels such as cache access, CPU usage and other timing channels, thereby threatening data and user privacy. Though studies have proposed techniques to hide such information leaks through carefully designed data-independent access paths, such techniques can be prohibitively slow on models with large number of parameters, especially when employed in a real-time analytics application. In this paper, we introduce a defense strategy that can achieve higher computational efficiency with a small trade-off in privacy protection. In particular, we study a strategy that adds noise to traces of memory access observed by an adversary, with the use of dummy data instances. We quantitatively measure privacy guarantee, and empirically demonstrate the effectiveness and limitation of this randomization strategy, using classification and clustering algorithms. Our results show significant reduction in execution time overhead on real-world data sets, when compared to a defense strategy using only data-oblivious mechanisms.
Conference Paper
Software-based approaches for search over encrypted data are still either challenged by lack of proper, low-leakage encryption or slow performance. Existing hardware-based approaches do not scale well due to hardware limitations and software designs that are not specifically tailored to the hardware architecture, and are rarely well analyzed for their security (e.g., the impact of side channels). Additionally, existing hardware-based solutions often have a large code footprint in the trusted environment susceptible to software compromises. In this paper we present HardIDX: a hardware-based approach, leveraging Intel’s SGX, for search over encrypted data. It implements only the security critical core, i.e., the search functionality, in the trusted environment and resorts to untrusted software for the remainder. HardIDX is deployable as a highly performant encrypted database index: it is logarithmic in the size of the index and searches are performed within a few milliseconds. We formally model and prove the security of our scheme showing that its leakage is equivalent to the best known searchable encryption schemes.
Conference Paper
In modern computer systems, user processes are isolated from each other by the operating system and the hardware. Additionally, in a cloud scenario it is crucial that the hypervisor isolates tenants from other tenants that are co-located on the same physical machine. However, the hypervisor does not protect tenants against the cloud provider and thus the supplied operating system and hardware. Intel SGX provides a mechanism that addresses this scenario. It aims at protecting user-level software from attacks from other processes, the operating system, and even physical attackers. In this paper, we demonstrate fine-grained software-based side-channel attacks from a malicious SGX enclave targeting co-located enclaves. Our attack is the first malware running on real SGX hardware, abusing SGX protection features to conceal itself. Furthermore, we demonstrate our attack both in a native environment and across multiple Docker containers. We perform a Prime+Probe cache side-channel attack on a co-located SGX enclave running an up-to-date RSA implementation that uses a constant-time multiplication primitive. The attack works although in SGX enclaves there are no timers, no large pages, no physical addresses, and no shared memory. In a semi-synchronous attack, we extract 96% of an RSA private key from a single trace. We extract the full RSA private key in an automated attack from 11 traces within 5 minutes.
Conference Paper
Shielded execution based on Intel SGX provides strong security guarantees for legacy applications running on untrusted platforms. However, memory safety attacks such as Heartbleed can render the confidentiality and integrity properties of shielded execution completely ineffective. To prevent these attacks, the state-of-the-art memory-safety approaches can be used in the context of shielded execution. In this work, we first showcase that two prominent software- and hardware-based defenses, AddressSanitizer and Intel MPX respectively, are impractical for shielded execution due to high performance and memory overheads. This motivated our design of SGXBounds---an efficient memory-safety approach for shielded execution exploiting the architectural features of Intel SGX. Our design is based on a simple combination of tagged pointers and compact memory layout. We implemented SGXBounds based on the LLVM compiler framework targeting unmodified multithreaded applications. Our evaluation using Phoenix, PARSEC, and RIPE benchmark suites shows that SGXBounds has performance and memory overheads of 17% and 0.1% respectively, while providing security guarantees similar to AddressSanitizer and Intel MPX. We have obtained similar results with SPEC CPU2006 and four real-world case studies: SQLite, Memcached, Apache, and Nginx.
Conference Paper
Intel Software Guard Extension (SGX) protects the confidentiality and integrity of an unprivileged program running inside a secure enclave from a privileged attacker who has full control of the entire operating system (OS). Program execution inside this enclave is therefore referred to as shielded. Unfortunately, shielded execution does not protect programs from side-channel attacks by a privileged attacker. For instance, it has been shown that by changing page table entries of memory pages used by shielded execution, a malicious OS kernel could observe memory page accesses from the execution and hence infer a wide range of sensitive information about it. In fact, this page-fault side channel is only an instance of a category of side-channel attacks, here called privileged side-channel attacks, in which privileged attackers frequently preempt the shielded execution to obtain fine-grained side-channel observations. In this paper, we present Deja Vu, a software framework that enables a shielded execution to detect such privileged side-channel attacks. Specifically, we build into shielded execution the ability to check program execution time at the granularity of paths in its control-flow graph. To provide a trustworthy source of time measurement, Deja Vu implements a novel software reference clock that is protected by Intel Transactional Synchronization Extensions (TSX), a hardware implementation of transactional memory. Evaluations show that Deja Vu effectively detects side-channel attacks against shielded execution and against the reference clock itself.
Conference Paper
Recent work has investigated the use of hardware performance counters (HPCs) for the detection of malware running on a system. These works gather traces of HPCs for a variety of applications (both malicious and non-malicious) and then apply machine learning to train a detector to distinguish between benign applications and malware. In this work, we provide a more comprehensive analysis of the applicability of using machine learning and HPCs for a specific subset of malware: kernel rootkits. We design five synthetic rootkits, each providing a single piece of rootkit functionality, and execute each while collecting HPC traces of its impact on a specific benchmark application. We then apply machine learning feature selection techniques in order to determine the most relevant HPCs for the detection of these rootkits. We identify 16 HPCs that are useful for the detection of hooking-based rootkits, and also find that rootkits employing direct kernel object manipulation (DKOM) do not significantly impact HPCs. We then use these synthetic rootkit traces to train a detection system capable of detecting new rootkits it has not seen previously with an accuracy of over 99%. Our results indicate that HPCs have the potential to be an effective tool for rootkit detection, even against new rootkits not previously seen by the detector.
Conference Paper
A key requirement for most security solutions is to provide secure cryptographic key storage in a way that will easily scale in the age of the Internet of Things. In this paper, we focus on providing such a solution based on Physical Unclonable Functions (PUFs). To this end, we focus on microelectromechanical systems (MEMS)-based gyroscopes and show via wafer-level measurements and simulations, that it is feasible to use the physical and electrical properties of these sensors for cryptographic key generation. After identifying the most promising features, we propose a novel quantization scheme to extract bit strings from the MEMS analog measurements. We provide upper and lower bounds for the minimum entropy of the derived bit strings and fully analyze the intra- and inter-class distributions across the operation range of the MEMS device. We complement these measurements via Monte-Carlo simulations based on the distributions of the parameters measured on actual devices. We also propose and evaluate a complete cryptographic key generation chain based on fuzzy extractors. We derive a full entropy 128-bit key using the obtained min-entropy estimates, requiring 1219 bits of helper data with an (authentication) failure probability of 4 × 10⁻⁷. In addition, we propose a dedicated MEMS-PUF design, which is superior to our measured sensor, in terms of chip area, quality and quantity of key seed features.
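A toy version of the quantization idea is sketched below: each analog feature is thresholded at its population median to yield one key bit, and re-measurement noise produces a small number of intra-device bit errors that a fuzzy extractor would later correct. The feature count, noise level, and single-bit quantizer are assumptions; the paper's scheme (including helper data and the full entropy analysis) is considerably richer.

```python
# Toy quantization sketch: analog MEMS-style feature values -> key bits.
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(size=(1000, 128))      # 128 features measured on 1000 devices
thresholds = np.median(population, axis=0)     # enrolment statistics across devices

def quantize(measurement, thresholds):
    """One bit per feature: 1 if the measured value exceeds the population median."""
    return (measurement > thresholds).astype(np.uint8)

device = population[0]
remeasured = device + rng.normal(scale=0.05, size=device.shape)  # re-measurement noise
key_bits = quantize(device, thresholds)
rebits = quantize(remeasured, thresholds)
print("intra-device bit errors:", int(np.sum(key_bits != rebits)), "of", key_bits.size)
```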
Article
In this paper we analyze three methods to detect cache-based side-channel attacks in real time, preventing or limiting the amount of leaked information. Two of the three methods are based on machine learning techniques, and all three can successfully detect an attack in about one fifth of the time required to complete it. We did not observe any false positives in our test environment, and the overhead caused by the detection systems is negligible. We also analyze how the detection systems behave with a modified version of one of the spy processes. With some optimization we are confident these systems can be used in real-world scenarios.
Conference Paper
We present CloudRadar, a system to detect, and hence mitigate, cache-based side-channel attacks in multi-tenant cloud systems. CloudRadar operates by correlating two events: first, it exploits signature-based detection to identify when the protected virtual machine (VM) executes a cryptographic application; at the same time, it uses anomaly-based detection techniques to monitor the co-located VMs to identify abnormal cache behaviors that are typical during cache-based side-channel attacks. We show that correlation in the occurrence of these two events offer strong evidence of side-channel attacks. Compared to other work on side-channel defenses, CloudRadar has the following advantages: first, CloudRadar focuses on the root causes of cache-based side-channel attacks and hence is hard to evade using metamorphic attack code, while maintaining a low false positive rate. Second, CloudRadar is designed as a lightweight patch to existing cloud systems, which does not require new hardware support, or any hypervisor, operating system, application modifications. Third, CloudRadar provides real-time protection and can detect side-channel attacks within the order of milliseconds. We demonstrate a prototype implementation of CloudRadar in the OpenStack cloud framework. Our evaluation suggests CloudRadar achieves negligible performance overhead with high detection accuracy.
Conference Paper
The use of Physically Unclonable Functions (PUFs) in cryptographic protocols has attracted increased interest over recent years. Since sound security analysis requires a concise specification of the alleged properties of the PUF, there have been numerous attempts to provide formal security models for PUFs. However, all these approaches have been tailored to specific types of applications or specific PUF instantiations. For the sake of applicability, composability, and comparability, however, there is a strong need for a unified security model for PUFs (for example, to answer whether the requirements of a future protocol match the properties of a newly proposed PUF realization). In this work, we propose a PUF model which generalizes various existing PUF models and includes security properties that have not been modeled so far. We prove the relation between some of the properties, and also discuss the relation of our model to existing ones.
Conference Paper
Smart personal devices equipped with a wide range of sensors and peripherals can potentially be misused in various environments. They can be used to exfiltrate sensitive information from enterprises and federal offices or be used to smuggle unauthorized information into classrooms and examination halls. One way to prevent these situations is to regulate how smart devices are used in such restricted spaces. In this paper, we present an approach that robustly achieves this goal for ARM TrustZone-based personal devices. In our approach, restricted space hosts use remote memory operations to analyze and regulate guest devices within the restricted space. We show that the ARM TrustZone allows our approach to obtain strong security guarantees while only requiring a small trusted computing base to execute on guest devices.
Conference Paper
Sharing of functional units inside a processor by two applications can lead to information leaks and micro-architectural side-channel attacks. Meanwhile, processors now commonly come with hardware performance counters which can count a variety of micro-architectural events, ranging from cache behavior to floating point unit usage. In this paper we propose that the hardware performance counters can be leveraged by the operating system's scheduler to predict the upcoming program phases of the applications running on the system. By detecting and predicting program phases, the scheduler can make sure that programs in the same program phase, i.e. using the same type of functional unit, are not scheduled on the same processor core, thus helping to mitigate potential side-channel attacks.
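A minimal sketch of the scheduling idea, under the assumption that program phases can be approximated by clustering per-interval HPC vectors: two applications currently in the same phase (i.e., likely contending for the same functional units) are not paired on a core. The clustering choice, counter set, and pairing rule are illustrative, not the paper's prediction mechanism.

```python
# Sketch: cluster per-interval HPC vectors into phases and use them as a scheduler hint.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Per-interval HPC samples (e.g., FP-unit usage, cache misses, branches) for each app.
hpc_samples = {app: rng.normal(size=(200, 3)) for app in ("appA", "appB", "appC")}

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
kmeans.fit(np.vstack(list(hpc_samples.values())))

def current_phase(app):
    """Predict the phase of an app from its most recent HPC sample."""
    return int(kmeans.predict(hpc_samples[app][-1].reshape(1, -1))[0])

def safe_to_corun(app1, app2):
    """Scheduler hint: avoid pairing apps that are currently in the same phase."""
    return current_phase(app1) != current_phase(app2)

print({pair: safe_to_corun(*pair)
       for pair in [("appA", "appB"), ("appA", "appC"), ("appB", "appC")]})
```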
Article
Hardware Performance Counter-based (HPC) runtime checking is an effective way to identify malicious behaviors of malware and detect malicious modifications to a legitimate program's control flow. To reduce the overhead in the monitored system which has limited storage and computing resources, we present a "sample-locally-analyze-remotely" technique. The sampled HPC data are sent to a remote server for further analysis. To minimize the I/O bandwidth required for transmission, the fine-grained HPC profiles are compressed into much smaller vectors with Compressive Sensing. The experimental results demonstrate an 80% I/O bandwidth reduction after applying Compressive Sensing, without compromising the detection and identification capabilities.
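The compression step can be sketched directly from the description above: a sparse fine-grained HPC profile is multiplied by a random sensing matrix on the monitored device, and the server reconstructs it under a sparsity assumption. The dimensions, sparsity level, and the use of Orthogonal Matching Pursuit for recovery are assumptions made for illustration.

```python
# Sketch of the "sample locally, analyze remotely" compression with Compressive Sensing.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n, k, m = 512, 10, 128            # profile length, sparsity, compressed length

# Sparse HPC profile: only a few sampling windows carry significant counts.
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.uniform(1.0, 5.0, size=k)

phi = rng.normal(size=(m, n)) / np.sqrt(m)   # random sensing matrix (shared seed)
y = phi @ x                                  # transmitted vector: 128 values, not 512

# Server-side recovery, assuming the profile is k-sparse.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False).fit(phi, y)
print("relative reconstruction error:",
      np.linalg.norm(omp.coef_ - x) / np.linalg.norm(x))
```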
Article
We present a side-channel attack based on remanence decay in volatile memory and show how it can be exploited effectively to launch a noninvasive cloning attack against SRAM physically unclonable functions (PUFs)-an important class of PUFs typically proposed as lightweight security primitives, which use existing memory on the underlying device. We validate our approach using SRAM PUFs instantiated on two 65-nm CMOS devices. We discuss countermeasures against our attack and propose the constructive use of remanence decay to improve the cloning resistance of SRAM PUFs. Moreover, as a further contribution of independent interest, we show how to use our evaluation results to significantly improve the performance of the recently proposed TARDIS scheme, which is based on remanence decay in SRAM memory and used as a time-keeping mechanism for low-power clockless devices.
Article
Kernel rootkits are formidable threats to computer systems. They are stealthy and can have unrestricted access to system resources. This paper presents NumChecker, a new virtual machine (VM) monitor based framework to detect and identify control-flow modifying kernel rootkits in a guest VM. NumChecker detects and identifies malicious modifications to a system call in the guest VM by measuring the number of certain hardware events that occur during the system call's execution. To automatically count these events, NumChecker leverages the hardware performance counters (HPCs), which exist in modern processors. By using HPCs, the checking cost is significantly reduced and the tamper-resistance is enhanced. We implement a prototype of NumChecker on Linux with the kernel-based VM. An HPC-based two-phase kernel rootkit detection and identification technique is presented and evaluated on a number of real-world kernel rootkits. The results demonstrate its practicality and effectiveness.