Meltdown and Spectre
Published on January 6, 2018 by Paul Ciano
Over the past few days, there has been a flurry of news reporting on several, major CPU vulnerabilities. Here are some of the better stories I could find (in chronological order), as well as links to the official researchers and their papers:
The flaws have been named Meltdown and Spectre. Meltdown was independently discovered by three groups—researchers from the Technical University of Graz in Austria, German security firm Cerberus Security, and Google’s Project Zero. Spectre was discovered independently by Project Zero and independent researcher Paul Kocher.
At their heart, both attacks take advantage of the fact that processors execute instructions speculatively. All modern processors perform speculative execution to a greater or lesser extent; they’ll assume that, for example, a given condition will be true and execute instructions accordingly. If it later turns out that the condition was false, the speculatively executed instructions are discarded as if they had no effect.
However, while the discarded effects of this speculative execution don’t alter the outcome of a program, they do make changes to the lowest level architectural features of the processors. For example, speculative execution can load data into cache even if it turns out that the data should never have been loaded in the first place. The presence of the data in the cache can then be detected, because accessing it will be a little bit quicker than if it weren’t cached. Other data structures in the processor, such as the branch predictor, can also be probed and have their performance measured, which can similarly be used to reveal sensitive information.
For systems with Intel chips, the impact is quite severe, as potentially any kernel memory can be read by user programs. It’s this attack that the operating system patches are designed to fix. It works by removing the shared kernel mapping, an operating system design that has been a mainstay since the early 1990s due to the efficiency it provides. Without that shared mapping, there’s no way for user programs to provoke the speculative reads of kernel memory, and hence no way to leak kernel information. But it comes at a cost: it makes every single call into the kernel a bit slower, because each switch to the kernel now requires the kernel page to be reloaded.
The impact of this change will vary wildly depending on workload. Applications that are heavily dependent on user programs and which don’t call into the kernel often will see very little impact; games, for example, should see very little change. But applications that call into the operating system extensively, typically to perform disk or network operations, can see a much more substantial impact. In synthetic benchmarks that do nothing but make kernel calls, the difference can be substantial, dropping from five million kernel calls per second to two-to-three million.
Owners of AMD and ARM systems shouldn’t rest easy, though, and that’s thanks to Spectre. Spectre is a more general attack, based on a wider range of speculative execution features. The paper describes using speculation around, for example, array bounds checks and branches instructions to leak information, with proof-of-concept attacks being successful on AMD, ARM, and Intel systems. Spectre attacks can be used both to leak information from the kernel to user programs, but also from virtualization hypervisors to guest systems.
Moreover, Spectre doesn’t offer any straightforward solution. Speculation is essential to high-performance processors, and while there may be limited ways to block certain kinds of speculative execution, general techniques that will defend against any information leakage due to speculative execution aren’t known.
In the immediate term, it looks like most systems will shortly have patches for Meltdown. At least for Linux and Windows, these patches allow end-users to opt out if they would prefer. The most vulnerable users are probably cloud service providers; Meltdown and Spectre can both in principle be used to further attacks against hypervisors, making it easier for malicious users to break out of their virtual machines.
Longer term, we’d expect a future Intel architecture to offer some kind of a fix, either by avoiding speculation around this kind of problematic memory access or making the memory access permission checks faster so that this time interval between reading kernel memory, and checking that the process has permission to read kernel memory, is eliminated.
Rumors of an undisclosed CPU security issue have been circulating since before LWN first covered the kernel page-table isolation patch set in November 2017. Now, finally, the information is out — and the problem is even worse than had been expected.
All three disclosed vulnerabilities take advantage of the CPU’s speculative execution mechanism. In a simple view, a CPU is a deterministic machine executing a set of instructions in sequence in a predictable manner. Real-world CPUs are more complex, and that complexity has opened the door to some unpleasant attacks.
A CPU is typically working on the execution of multiple instructions at once, for performance reasons. Executing instructions in parallel allows the processor to keep more of its subunits busy at once, which speeds things up. But parallel execution is also driven by the slowness of access to main memory. A cache miss requiring a fetch from RAM can stall the execution of an instruction for hundreds of processor cycles, with a clear impact on performance. To minimize the amount of time it spends waiting for data, the CPU will, to the extent it can, execute instructions after the stalled one, essentially reordering the code in the program. That reordering is often invisible, but it occasionally leads to the sort of fun that caused Documentation/memory-barriers.txt to be written.
Out-of-order execution runs into a challenge whenever the code branches, though. The processor may not yet be able to tell which branch will be taken, so it doesn’t know where to go to execute ahead of the stalled instruction(s). The answer here is “branch prediction”. The processor will make a guess based on past experience with the branch in question and, possibly, explicit guidance from the code (the unlikely() directive used in kernel code, for example). Once the actual branch condition can be evaluated, the processor will determine whether it guessed right. If not, the “speculatively” executed instructions after the branch will be unwound, and everything will proceed as if they had never been run.
A branch-prediction failure should really only lead to slower execution, with no visible side effects. That turns out to not be the case, though, leading to a set of severe information-disclosure vulnerabilities. In particular, speculative instruction execution can cause data to be loaded into the CPU memory cache; timing attacks can then be used to learn which instructions were executed. If speculative execution of kernel code can be controlled by an attacker, the contents of the cache can be used as a covert channel to get data out of the kernel.
What emerges is a picture of unintended processor functionality that can be exploited to leak arbitrary information from the kernel, and perhaps from other guests in a virtualized setting. If these vulnerabilities are already known to some attackers, they could have been using them to attack cloud providers for some time now. It seems fair to say that this is one of the most severe vulnerabilities to surface in some time.
The fact that it is based in hardware makes things significantly worse. We will all be paying the performance penalties associated with working around these problems for the indefinite future. For the owners of vast numbers of systems that cannot be updated, the consequences will be worse: they will remain vulnerable to a set of vulnerabilities with known exploits. This is not a happy time for the computing industry.
For protection against Spectre:
In response to the vulnerabilities that were discovered we developed a novel mitigation called “Retpoline” – a binary modification technique that protects against “branch target injection” attacks. We shared Retpoline with our industry partners and have deployed it on Google’s systems, where we have observed negligible impact on performance.
Red Hat ran extensive Meltdown/Spectre performance benchmarks and found the following performance issues:
- Measureable: 8 percent to 19 percent – Highly cached random memory with buffered I/O, OLTP database workloads, and benchmarks with high kernel-to-user space transitions are impacted between 8 percent to 19 percent. Examples include OLTP Workloads (tpc), sysbench, pgbench, netperf (< 256 byte), and fio (random I/O to NvME).
- Modest: 3 percent to 7 percent – Database analytics, Decision Support System (DSS), and Java VMs are impacted less than the “Measurable” category. These applications may have significant sequential disk or network traffic, but kernel/device drivers are able to aggregate requests to moderate level of kernel-to-user transitions. Examples include SPECjbb2005, Queries/Hour, and overall analytic timing (sec).
- Small: 2 percent to 5 percent – HPC (High Performance Computing) CPU-intensive workloads are affected the least, with only 2 percent to 5 percent performance impact, because jobs run mostly in user space and are scheduled using cpu-pinning or numa-control. Examples include Linpack NxN on x86 and SPECcpu2006.
- Minimal: Linux accelerator technologies that generally bypass the kernel in favor of user direct access are the least affected, with less than 2 percent overhead measured. Examples tested include DPDK (VsPERF at 64 byte) and OpenOnload (STAC-N). Userspace accesses to VDSO like get-time-of-day are not impacted. We expect similar minimal impact for other offloads.
All these patches are stop-gap measures. As the Spectre white paper states: “While makeshift processor-specific countermeasures are possible in some cases, sound solutions will require fixes to processor designs as well as updates to instruction set architectures (ISAs) to give hardware architects and software developers a common understanding as to what computation state CPU implementations are (and are not) permitted to leak.”
Or, as CERT put it, “The underlying vulnerability is primarily caused by CPU architecture design choices. Fully removing the vulnerability requires replacing vulnerable CPU hardware.” In other words, to be fully secure, you must replace every computing device you own.
Brace yourself, 2018 is going to be a really hard – and expensive – year for IT.
Originally, the CERT recommendation stated the following (thanks to the Internet Archive’s Wayback Machine for page snapshots):
Now, it only states:
Operating system, CPU microcode updates, and some application updates mitigate these attacks.
“Throw it away and buy a new one” is ridiculous security advice, but it’s what US-CERT recommends. It is also unworkable. The problem is that there isn’t anything to buy that isn’t vulnerable. Pretty much every major processor made in the past 20 years is vulnerable to some flavor of these vulnerabilities. Patching against Meltdown can degrade performance by almost a third. And there’s no patch for Spectre; the microprocessors have to be redesigned to prevent the attack, and that will take years.
These aren’t normal software vulnerabilities, where a patch fixes the problem and everyone can move on. These vulnerabilities are in the fundamentals of how the microprocessor operates.
In their rush to make computers faster, they weren’t thinking about security. They didn’t have the expertise to find these vulnerabilities. And those who did were too busy finding normal software vulnerabilities to examine microprocessors. Security researchers are starting to look more closely at these systems, so expect to hear about more vulnerabilities along these lines.
Spectre and Meltdown are pretty catastrophic vulnerabilities, but they only affect the confidentiality of data. Now that they – and the research into the Intel ME vulnerability – have shown researchers where to look, more is coming – and what they’ll find will be worse than either Spectre or Meltdown. There will be vulnerabilities that will allow attackers to manipulate or delete data across processes, potentially fatal in the computers controlling our cars or implanted medical devices. These will be similarly impossible to fix, and the only strategy will be to throw our devices away and buy new ones.
Intel, AMD, and ARM seem to suffer from the same issues that proprietary software suffers from: a lack of transparency that results in an unethical design which shifts us further away from an ethical society. RISC-V is something we are closely following in the hopes that it can create a future where processor hardware can be as ethical as Free Software—meaning that the user is in control of their own hardware and software, not the developer.
Purism, as a Social Purposes Corporation, will continue to advance along the best paths possible to offer high-end hardware that is as secure as possible, in alignment with our strict philosophy of ethical computing.
Meltdown breaks the most fundamental isolation between user applications and the operating system. This attack allows a program to access the memory, and thus also the secrets, of other programs and the operating system.
If your computer has a vulnerable processor and runs an unpatched operating system, it is not safe to work with sensitive information without the chance of leaking the information. This applies both to personal computers as well as cloud infrastructure. Luckily, there are software patches against Meltdown.
Spectre breaks the isolation between different applications. It allows an attacker to trick error-free programs, which follow best practices, into leaking their secrets. In fact, the safety checks of said best practices actually increase the attack surface and may make applications more susceptible to Spectre
Spectre is harder to exploit than Meltdown, but it is also harder to mitigate. However, it is possible to prevent specific known exploits based on Spectre through software patches.
It’s going to be an interesting year.