We have been studying the issue of malicious hypervisors for quite some time at McAfee Avert Labs and have come up with several techniques to detect whether the system runs on top of a hypervisor or whether there is a piece of code that is trying to initiate a hypervisor. Our work included, of course, analyzing things like Blue Pill and other similar malicious hypervisors.

Last week I was at BlackHat, and it was a very exciting week in terms of Blue Pill and the virtualization rootkits issue in general. During the BlackHat 2007 Briefings in Las Vegas there were three interesting sessions that relate to virtualization system security and rootkits. I attended those three sessions and had a chance to chat some with three presenters. The main points I would emphasize are the following:

  1. Providing a system virtualization facility at the processor level without applying any sound security policy is a serious design flaw.
  2. A malware authors’ job is to leverage system design flaws and hence the virtualization rootkits were very expected, including Blue Pill.
  3. There is no rootkit that is undetectable even if it installs itself as a hypervisor. The challenge is always in how to repair rootkits once they control some layer in the system architecture
  4. There needs to be a more organized effort between hardware virtualization vendors, software hypervisor providers and security companies to ensure the secure deployment of virtualization solutions

Now before I go into what happened during the three sessions at BlackHat, I would like to provide our readers with some background and personal thoughts about this topic. Less than two years ago, both Intel and AMD started to provide virtualization support at the processor level. This support is essentially comprised of a set of processor enhancements that improve traditional software-based virtualization solutions. These integrated features give virtualization software, namely Virtual Machine Monitors (VMMs) and Hypervisors, the ability to take advantage of offloading workloads to the system hardware, enabling more streamlined virtualization software stacks and “near native” performance characteristics. For instance, virtualization-enabled processors allow VMMs to rely on the hardware for isolating and mapping memory between virtual machines. This is achieved by adding another level of indirection for mapping VM-based physical address to host-based physical addresses. Both Intel and AMD also provide an additional level of indirection for mapping VM I/O addresses to host I/O physical address. Virtualizing memory addresses and I/O addresses at the processor level is a great extension that would minimize the work done by today’s software hypervisors. However, in doing that neither Intel nor AMD considered the security risk by providing such a powerful facility in the hardware with no restriction to which software piece could take advantage of it. In theory there have been lots of publications about safer computing initiative and how to use TPM technology to authenticate the piece of software that is initializing the processor into the virtualization mode. But in reality, this was not provided in the first release of the virtualization-aware processors as the hypervisors authentication was not provided at the firmware or BIOS level.

Now think of that with me for a moment – we have now a very powerful un-locked facility in the processor that allows any piece of software running in ring zero (like a device driver) to initialize a processor-supported hypervisor and hence take control of the whole computing environment, including the operating system. Yes, this is true, and it was a serious design flaw. Of course both Intel and AMD designers assumed that operating system kernel developers are the only ones who would care about virtualization and would use that facility provided by their processors, which turned out to be untrue. Joanna Rutkowska (the Blue Pill author) and other people have demonstrated some sample code that would initiate a hypervisor, and since it runs outside the operating a system then it can be considered a rootkit. But as the reader may understand now, there are no secrets there. No undocumented stuff; it is all about a powerful hardware feature that was not protected by any security policy.

Now to make the situation worse, both Intel and AMD are competing in that space and I guess both are trying to get software virtualization vendors to rely on their processor native virtualization support. But software-based hypervisors do more than memory and I/O virtualization. They do binary translation for instance which allows them to control programs execution at the instruction level and control programs response to system interrupts. To accommodate that need, both Intel and AMD provide the ability to exit from the VM to the VMM when a certain instruction is executed or a certain condition takes place inside the VM. For hackers this is a very lucrative feature, so not only can they install a thin hypervisor but they can also control the execution of certain instructions and fake many things from below the operating system, like timestamp counters which used to be a very reliable method for measuring elapsed time. When looking at the Intel and AMD virtualization specification, it does not look like they require many things from the hypervisor. In other words, it is up to the hypervisor to decide on what it wants and what it does not want to virtualize. This by itself lowers the cost of making a malicious hypervisor. Let me conclude this introduction by making the following statements:

  • Providing a hardware based virtualization support without protecting it with sound security policy is a major flaw in the system design!!!;
  • Hardware assisted hypervisors have the freedom to choose which software execution facility to virtualize and control;
  • Blue Pill and other types of malicious hypervisors were anticipated by security experts who are well acquainted with the processor architecture.

I think I have provided quite enough background as well as some personal thoughts on the subject, so let’s move on to talk about what happened at Las Vegas last week. As I said there were three sessions that related to virtualization based malware and Blue Pill:

  1. Don’t Tell Joanna, The Virtualized Rootkit Is Dead,” by Thomas Ptacek, Nate Lawson and Peter Ferrie;
  2. IsGameOver(), anyone?,” by Joanna Rutkowska and Alexander Tereshkin; and
  3. Kick Ass Hypervisoring: Windows Server Virtualization,” by Brandon Baker.

The first session was the “Don’t tell Joanna” on Wednesday morning. The main point we got from that session is that it is very easy to detect virtualization rootkits. Speaking from my experience in the anti-rootkit space over twelve years, including my last project/product offered by McAfee “The McAfee Rootkit Detective”, I totally believe that “there is no rootkit that is undetectable”. I also tried to emphasize that fact in a McAfee podcast recorded before Black Hat. In their session Peter, Thomas and Nate focused more on time-based detection methods by calling an instruction that would cause the system to exit from the VM to the VMM, then measure the time elapsed until the execution is back to the VM and compare that with the regular time taken when running without the hypervisor. I have always liked that time-based approach and it was heavily discussed in Avert Labs some time ago, but we thought of using some other non-time based methods that rely on observing changes made to some processor status and cache fields like TLB (Translation Lookaside Buffers). Anyhow, after the session ended I talked for about an hour or more with Peter Ferrie – I told Peter that it was a very nice presentation and that my personal research findings support their conclusions although I use some different non-time based detection methods. Peter and I were wondering how Joanna would respond in her presentation in the afternoon.

Then came the afternoon and I was sitting there in the second row in front of Joanna. Joanna seemed a little bit nervous when she started her presentation. Initially Joanna picked again on Windows Vista by showing some Visa-signed drivers that allow anyone to write to any kernel memory or modify the MSR (Model Specific Register). That was nice but it is something we see every week at Avert Labs so nothing new in it to me at least. Then came the second part of Joanna’s presentation and she started to say how her Blue Pill rootkit can adjust the time stamp counters in such a way that would not allow any code to detect the overhead of running on top of a hypervisor. I made a comment in the form of a question during the presentation but Joanna said questions would be answered only after she finished the presentation. The point I wanted to make and maybe Joanna is reading this now, is that her argument of being able to fix the time stamp counters is not a strong technical argument for the following reasons:

  1. This would require Blue Pill to emulate all the processor instructions that cause a VM exit and adjust the time stamp counter. Therefore we are no longer talking about a thin hypervisor that intercepts only specific instruction, interrupt, etc. but rather about a heavy hypervisor that would require significant amount of work from Joanna and her team.
  2. The detection code can still issue arbitrary I/O requests to any I/O device that may be doing nothing but causing a VM exit and would then calculate the execution time. This would require Blue Pill to handle requests to I/O devices.
  3. Manipulating time stamp counters does not seem to be a wise thing to do and there might be some device drivers that rely on the validity of those time stamp counters to perform correctly.

During the session I started questioning the value in spending all that time trying to build a Blue Pill that cannot be detected. There are many factors to consider like:

  1. One day soon either hardware systems or operating systems will ship by default with a hypervisor. That hypervisor would have to be the first hypervisor and would not allow nested hypervisors. Intel has already produced the Intel AMT/vPro systems that ship with a hypervisor. Microsoft is soon to release the next version of its server platform that has a built-in hypervisor.
  2. There are only a few commercial hypervisors and most provide some interface to the VM to communicate with the hypervisor if it exists. This interface can be used to authenticate the hypervisor. Security software can decide to halt the system if the system is not running on a hypervisor that is trusted by the company security policy. McAfee as a security company certainly encourages hypervisor vendors to pay more attention to those interfaces and make them solid enough to be used by security software running inside the VM.
  3. Maybe Joanna can still claim that Blue Pill will emulate that commercial hypervisor interface, which is another layer in the system that would be emulated to hide its presence. Still we have a valid question: “what is this all about”. Eventually and very soon there will be only certain hypervisors that are trusted by the firmware and that’s it.

Anyhow, I felt kind of bored in the middle of the presentation and started to write a simple detection method that is not time-based and would definitely detect if the system is running on top of a hypervisor or not. This technique is based on some research I was doing less than a year ago at Avert Labs. Here is a scanned image of my hand writing of that approach made during Joana’s presentation.

Link to my Blue Pill notes here.

This detection method relies also on another major design flaw in the existing processor architecture. Here is some technical background: processors use TLBs (Translation Lookaside buffers) to cache the mapping from virtual (more accurately linear) addresses to physical addresses. But in doing that processors need to know where to get the address translation or mapping from. Well the mapping is stored inside the PTE (Page Table Entries). But the question is who would fill those entries inside the PTE? Well presumably (at least by the system designer) it’s the operating system of course. But guess what? PTEs themselves are writable and any code running in ring zero (like a device driver) can modify PTEs and hence change the mapping of linear addresses to physical addresses. Hah, this is the trick, and here is how the detection code works:

  1. Allocate large contiguous block of non-paged memory;
  2. Fill that allocated memory with character ‘A’;
  3. Allocate another contiguous block of non-paged memory of the same size like block ‘A’;
  4. Fill that second allocated memory with character ‘B’;
  5. Freeze the execution of the operating system (do not ask how but we can do it);
  6. Invalidate all TLB entries. There are processor instructions for that which could be as simple as moving execution “cr3, system_page_directory_table_address”;
  7. Read the first byte of each page in the allocated ‘A’. This would cause those entries to be added to the processor TLB cache;
  8. Change the mapping of the allocated ‘A’ pages to point to physical memory holding pages ‘B’. This means that what the processor uses inside the TLBs is not what is there in the PTE;
  9. Call any instruction that would cause an exit to the hypervisor if it exists like CPUID. Exiting from the VM to the VMM causes the TLBs to be invalidated or cleared; and
  10. Try to read the virtual memory of the first allocated block. If you see character ‘A’ then it means that the processor found entries in the TLBs and hence those entries were not cleared among an exit from the VM to the VMM. If it reads B, then it means that the TLB entries were invalidated due to the existence of the hypervisor and the processor has to use PTEs again to get the mapping from virtual to physical.

I wrote those steps briefly in my BlackHat conference block note and waited for the session to end. Then to my surprise just before the end of the presentation Joanna had a slide that mentioned a detection method similar to mine but without the step that freezes the system. I kind of felt proud of myself, of course, and showed the person next to me that I had it written in my block note. Anyhow, after briefly embracing that detection method Joanna said that it does not work and the people who came up with it did not try it. Well, that was too much! I have been researching that space for quite some time and I know it works!

After Joanna finished her presentation, off course, with no room for asking questions or making comments I felt that maybe I needed to talk with her. I waited until the crowd around Joanna was reduced to few people that included my friend Peter Ferrie, and I went to talk to Joanna. I told her “Joanna, this detection method that you mentioned at the end of your presentation should work and we have tried similar things.” Joanna looked at me and said no it does not. I said well I know it works. She then grabbed my conference ID and looked at my name while asking me who I am. I said Ahmed Sallam from McAfee Avert Labs. Joanna said she did not know that McAfee is working on that and I told her that we have been researching that area for some time. She then asked how it worked, I said that this is not a subject to be discussed in front in a crowd. But in all cases, Joanna, we can detect the Blue Pill so you may stop claiming that it is undetectable.

That was the end of the first day at Black Hat and I started to feel that we have been putting too much energy into something that may not deserve all the time and effort that we have been putting into it.

Now let’s get to the third session which was the “Kick Ass Hypervisoring: Windows Server Virtualization” by Brandon Baker from Microsoft, the following day. I went very excited to the session waiting for Microsoft to outline their plan for how to secure the hypervisor or to leverage the hypervisor for having better security. I heard none of that. As a matter of fact, Microsoft said that they are not utilizing the processor-based DMA remapping feature which allows true isolation of physical memory and hence protect against DMA-based physical memory attacks. We certainly understand that Microsoft is working hard to build its new hypervisor but we need to hear some good answers on Microsoft plans to make its hypervisor truly secured.

I hope that our blog readers now have a better understanding of this serious topic and would like to conclude this post by re-emphasizing on the importance of having an organized effort between hardware virtualization vendors, software hypervisor providers and security companies to ensure the secure deployment of virtualization solutions.