PreprintPDF Available

Relaxed virtual memory in Armv8-A (extended version)

Authors:
Preprints and early-stage research may not have been peer reviewed yet.

Abstract and Figures

Virtual memory is an essential mechanism for enforcing security boundaries, but its relaxed-memory concurrency semantics has not previously been investigated in detail. The concurrent systems code managing virtual memory has been left on an entirely informal basis, and OS and hypervisor verification has had to make major simplifying assumptions. We explore the design space for relaxed virtual memory semantics in the Armv8-A architecture, to support future system-software verification. We identify many design questions, in discussion with Arm; develop a test suite, including use cases from the pKVM production hypervisor under development by Google; delimit the design space with axiomatic-style concurrency models; prove that under simple stable configurations our architectural model collapses to previous "user" models; develop tooling to compute allowed behaviours in the model integrated with the full Armv8-A ISA semantics; and develop a hardware test harness. This lays out some of the main issues in relaxed virtual memory bringing these security-critical systems phenomena into the domain of programming-language semantics and verification with foundational architecture semantics. This document is an extended version of a paper in ESOP 2022, with additional explanation and examples in the main body, and appendices detailing our litmus tests, models, proofs, and test results.
Content may be subject to copyright.
A preview of the PDF is not available
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Operating system (OS) kernels achieve isolation between user-level processes using hardware features such as multi-level page tables and translation lookaside buffers (TLBs). The TLB caches address translation, and therefore correctly controlling the TLB is a fundamental security property of OS kernels—yet all large-scale formal OS verification projects we are aware of leave the correct functionality of TLB as an assumption. In this paper, we present a verified sound abstraction of a detailed concrete model of the memory management unit (MMU) of the ARMv7-A architecture. This MMU abstraction revamps our previous address space specific MMU abstraction to include new software-visible TLB features such as caching of globally-mapped and partial translation entries in a two-stage TLB. We use this abstraction as the underlying model to develop a logic for reasoning about low-level programs in the presence of cached address translation. We extract invariants and necessary conditions for correct TLB operation that mirrors the informal reasoning of OS engineers. We systematically show how these invariants adapt to global and partial translation entries. We show that our program logic reduces to a standard logic for user-level reasoning, reduces to side-condition checks for kernel-level reasoning, and can handle typical OS kernel tasks such as context switching.
Thesis
Full-text available
Operating system (OS) kernels achieve isolation between user-level processes using multi-level page tables. The hardware-implemented translation lookaside buffer (TLB) caches page table walks, and therefore the TLB and its consistency with memory are security critical for OS kernels, including formally verified kernels such as seL4. If performance is paramount, this consistency can be subtle to achieve; yet, all major formally verified kernels currently leave the TLB as an assumption. They assume correct TLB management because faithfully modeling the hardware details of a TLB would significantly complicate the program logic used to verify the OS code. For instance, a simple memory read operation would now change the state of the program. In this thesis, we present a formal model of the memory management unit (MMU) in the interactive proof assistant Isabelle/HOL for the ARMv7-A architecture which includes the TLB, its maintenance operations, and its derived properties. We integrate this specification into the Cambridge ARM model. We derive sufficient conditions for TLB consistency, and we abstract away the functional details of the MMU using data refinement for simpler reasoning about executions in the presence of cached address translation, including complete and partial walks. Based on the verified abstraction of the MMU model of the ARMv7-A architecture, we present a logic in Isabelle/HOL for reasoning about low-level programs in the presence of cached address translation. We extract invariants and necessary conditions for correct TLB operation that mirror the informal reasoning of OS engineers. We show that our program logic reduces to a standard logic for user-level reasoning, reduces to side-condition checks for kernel-level reasoning, and can handle typical OS kernel tasks such as context switching and page table manipulations. This research removes the unnecessary TLB complexities from program reasoning, and provides a reasoning framework for validating TLB management in OS kernel verification.
Article
Full-text available
Architecture specifications notionally define the fundamental interface between hardware and software: the envelope of allowed behaviour for processor implementations, and the basic assumptions for software development and verification. But in practice, they are typically prose and pseudocode documents, not rigorous or executable artifacts, leaving software and verification on shaky ground. In this paper, we present rigorous semantic models for the sequential behaviour of large parts of the mainstream ARMv8-A, RISC-V, and MIPS architectures, and the research CHERI-MIPS architecture, that are complete enough to boot operating systems, variously Linux, FreeBSD, or seL4. Our ARMv8-A models are automatically translated from authoritative ARM-internal definitions, and (in one variant) tested against the ARM Architecture Validation Suite. We do this using a custom language for ISA semantics, Sail, with a lightweight dependent type system, that supports automatic generation of emulator code in C and OCaml, and automatic generation of proof-assistant definitions for Isabelle, HOL4, and (currently only for MIPS) Coq. We use the former for validation, and to assess specification coverage. To demonstrate the usability of the latter, we prove (in Isabelle) correctness of a purely functional characterisation of ARMv8-A address translation. We moreover integrate the RISC-V model into the RMEM tool for (user-mode) relaxed-memory concurrency exploration. We prove (on paper) the soundness of the core Sail type system. We thereby take a big step towards making the architectural abstraction actually well-defined, establishing foundations for verification and reasoning.
Article
Full-text available
Emerging non-volatile memory (NVM) technologies promise the durability of disks with the performance of volatile memory (RAM). To describe the persistency guarantees of NVM, several memory persistency models have been proposed in the literature. However, the formal semantics of such persistency models in the context of existing mainstream hardware has been unexplored to date. To close this gap, we integrate the buffered epoch persistency model with the 'total-store-order' (TSO) memory model of the x86 and SPARC architectures. We thus develop the PTSO ('persistent' TSO) model and formalise its semantics both operationally and declaratively. We demonstrate that the two characterisations of PTSO are equivalent. We then formulate the notion of persistent linearisability to establish the correctness of library implementations in the context of persistent memory. To showcase our formalism, we develop two persistent implementations of a queue library, and apply persistent linearisability to show their correctness.
Conference Paper
Full-text available
The main security mechanism for enforcing memory isolation in operating systems is provided by page tables. The hardware-implemented Translation Lookaside Buffer (TLB) caches these, and therefore the TLB and its consistency with memory are security crit- ical for OS kernels, including formally verified kernels such as seL4. If performance is paramount, this consistency can be subtle to achieve; yet, all major formally verified ker- nels currently leave the TLB as an assumption. In this paper, we present a formal model of the Memory Management Unit (MMU) for the ARM architecture which includes the TLB, its maintenance operations, and its derived properties. We integrate this specification into the Cambridge ARM model. We derive sufficient conditions for TLB consistency, and we abstract away the functional details of the MMU for simpler reasoning about executions in the presence of cached address translation, including complete and partial walks.
Thesis
Previous work has established precise operational concurrency models for Power and ARMv8, in an abstract micro-architectural style based on detailed discussion with IBM and ARM staff and extensive hardware testing. To account for the precise architectural behaviour these models are complex. This thesis aims to provide a better understanding for the relaxed memory concurrency models of the architectures ARMv8, RISC-V, and (to a lesser degree) Power. Power and early versions of ARMv8 have non-multicopy-atomic (non-MCA) concurrency models. This thesis provides abstraction results for these, including a more abstract non-MCA ARMv8 storage subsystem model, and characterisations of the behaviour of mixed-size Power and non-MCA ARMv8 programs when using barriers or release/acquire instructions for all memory accesses, with respect to notions of Sequential Consistency for mixed-size programs. During the course of this PhD project, and partly due to our extended collaboration with ARM, ARM have shifted to a much simplified multicopy-atomic concurrency architecture that also includes a formal axiomatic concurrency model. We develop a correspondingly simplified operational model based on the previous non-MCA models, and, as the main result of this thesis, prove equivalence between the simplified operational and the reference axiomatic model. We have also been actively involved in the RISC-V Memory Model Task Group. RISC-V has adopted a multicopy atomic model closely following that of ARMv8, but which incorporates some changes motivated by issues raised in our operational modelling of ARMv8. We develop an adapted RISC-V operational concurrency model that is now part of the official architecture documentation. Finally, in order to give a simpler explanation of the MCA ARMv8 and RISC-V concurrency models for programmers, we develop an equivalent operational concurrency model in a different style. The \promisingarmriscv model, based on the C11 Promising model, gives up the micro-architectural intuition the other operational models offer in favour of providing a more abstract model. We prove it equivalent to the MCA ARMv8 and RISC-V axiomatic models in Coq.