Temporal isolation among virtual machines


Temporal isolation or performance isolation among virtual machines (VMs) refers to the capability of isolating the temporal behavior of multiple VMs from each other, even though they run on the same physical host and share a set of physical resources such as processors, memory, and disks.

Introduction to the problem

One of the key advantages of using virtualization in server consolidation is the possibility to seamlessly "pack" multiple under-utilized systems into a single physical host, thus achieving a better overall utilization of the available hardware resources. In fact, an entire operating system, along with the applications running within it, can be run in a virtual machine.
However, when multiple VMs concurrently run on the same physical host, they share the available physical resources, including CPU, network adapter, disk and memory. This adds a level of unpredictability in the performance that may be exhibited by each individual VM, as compared to what is expected. For example, a VM with a temporary compute-intensive peak might disturb the other running VMs, causing a significant and undesirable temporary drop in their performance. In a world of computing that is shifting towards cloud computing paradigms where resources may be remotely rented in virtualized form under precise service-level agreements, it would be highly desirable that the performance of the virtualized resources be as stable and predictable as possible.

Possible solutions

Multiple techniques may be used to address the aforementioned problem. They aim to achieve some degree of temporal isolation across the concurrently running VMs at the various critical levels of scheduling: CPU scheduling, network scheduling and disk scheduling.
For the CPU, it is possible to use proper scheduling techniques at the hypervisor level to limit the amount of computing each VM may impose on a shared physical CPU or core. For example, on the Xen hypervisor, the BVT, Credit-based and SEDF schedulers have been proposed for controlling how the computing power is distributed among competing VMs.
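To get stable performance in virtualized applications, it is necessary to use scheduler configurations that are not work-conserving. A work-conserving scheduler hands any idle CPU time to whichever VM can use it, so the performance of each VM varies with the load of its neighbours.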
Also, on the KVM hypervisor, some have proposed using scheduling strategies based on earliest deadline first (EDF) to maintain stable and predictable performance of virtualized applications. Finally, with a multi-core or multi-processor physical host, it is possible to deploy each VM on a separate processor or core to temporally isolate the performance of the various VMs.
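The effect of a non-work-conserving configuration can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the algorithm of any actual hypervisor scheduler: the names VM, budget, demand and run_period are hypothetical, and time is modelled as a number of abstract slots per scheduling period.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    budget: int  # slots reserved for the VM in each period (assumed parameter)
    demand: int  # slots the VM would like to use in each period (toy workload)


def run_period(vms, period, work_conserving):
    """Return {name: slots received} for one period of `period` slots.

    Assumes the sum of all budgets does not exceed `period`.
    """
    # Every VM first receives at most its reserved budget.
    received = {vm.name: min(vm.budget, vm.demand) for vm in vms}
    if work_conserving:
        # Leftover slots are handed, in list order, to VMs that still have demand.
        spare = period - sum(received.values())
        for vm in vms:
            extra = min(spare, vm.demand - received[vm.name])
            received[vm.name] += extra
            spare -= extra
    return received


if __name__ == "__main__":
    neighbour_busy = [VM("a", 4, 10), VM("b", 4, 10)]
    neighbour_idle = [VM("a", 4, 10), VM("b", 4, 0)]
    for capped in (True, False):
        for vms in (neighbour_busy, neighbour_idle):
            print(capped, run_period(vms, 10, work_conserving=not capped))
```

In this toy model, VM "a" receives the same number of slots whether or not its neighbour is busy when the budget acts as a hard cap, whereas in the work-conserving case its share changes with the neighbour's load.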
For the network, it is possible to use traffic shaping techniques to limit the amount of traffic that each VM can impose on the host. Also, it is possible to install multiple network adapters on the same physical host, and configure the virtualization layer so that each VM is granted exclusive access to one of them. For example, this is possible with the driver domains of the Xen hypervisor. There are also multi-queue network adapters that support multiple VMs at the hardware level by associating separate packet queues with the different hosted VMs, such as devices supporting Virtual Machine Device Queues (VMDq) by Intel. Finally, real-time scheduling of the CPU may also be used for enhancing temporal isolation of network traffic from multiple VMs deployed on the same CPU.
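One common way to implement traffic shaping is a token bucket, which bounds both the sustained rate and the burst size of a traffic source. The sketch below is an illustration only and is not taken from any particular hypervisor: the class TokenBucket, its parameters, and the choice of passing the current time explicitly are all assumptions made to keep the example self-contained and deterministic.

```python
class TokenBucket:
    """Per-VM traffic limiter: tokens are bytes the VM is allowed to send."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s   # sustained rate granted to the VM
        self.capacity = burst_bytes    # largest burst the VM may send at once
        self.tokens = burst_bytes      # the bucket starts full
        self.last = 0.0                # time of the previous update, in seconds

    def allow(self, packet_bytes, now):
        """Return True if a packet of `packet_bytes` may be sent at time `now`."""
        # Refill according to the elapsed time, never beyond the capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False  # the caller may drop or queue the packet


if __name__ == "__main__":
    bucket = TokenBucket(rate_bytes_per_s=1000, burst_bytes=1500)
    for t in (0.0, 0.5, 1.5):
        print(t, bucket.allow(1500, now=t))
```

Giving each VM its own bucket means that a burst from one VM consumes only that VM's tokens, which is the isolation property the paragraph above describes.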
When using real-time scheduling to control the amount of CPU resources reserved for each VM, one challenging problem is properly accounting for the CPU time consumed by system-wide activities. For example, in the case of Xen, the services of Dom0 and of the driver domains might be shared across multiple VMs accessing them. Similarly, in the case of the KVM hypervisor, the workload imposed on the host OS by serving network traffic for each individual guest OS might not be easily distinguishable, because it mainly involves kernel-level device drivers and the networking infrastructure. Some techniques for mitigating such problems have been proposed for the Xen case.
Along the lines of adaptive reservations, it is possible to apply feedback-control strategies that dynamically adapt the amount of resources reserved for each virtual machine, so as to maintain stable performance for the virtualized application.
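A feedback loop of this kind can be sketched as a simple proportional adjustment of the CPU share reserved for a VM, driven by the gap between a measured and a target response time. This is a minimal illustration rather than any published controller: the function adapt_share, its gain, and the lower and upper bounds are hypothetical values chosen for the example.

```python
def adapt_share(share, measured, target, gain=0.5, lo=0.05, hi=1.0):
    """Return the CPU share to reserve for the next control interval.

    share    -- fraction of a CPU currently reserved for the VM
    measured -- response time observed in the last interval
    target   -- response time the application should maintain
    """
    # Positive error: the application is too slow, so the share must grow.
    error = (measured - target) / target
    new_share = share * (1.0 + gain * error)
    # Keep the reservation within the bounds the host can actually grant.
    return max(lo, min(hi, new_share))


if __name__ == "__main__":
    share = 0.4
    for measured in (200.0, 150.0, 100.0, 80.0):
        share = adapt_share(share, measured, target=100.0)
        print(round(share, 3))
```

The reservation grows while the application misses its target and shrinks once it overshoots, releasing capacity for other VMs on the same host.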
In the same adaptive spirit, when a virtualized system is not meeting the expected performance levels, it is possible to live-migrate virtual machines while they are running, so as to host them on a more capable physical host.