The Importance of I/O Efficiency

IOcm is a major technology of the sKVM hypervisor developed in MIKELANGELO, providing virtual I/O performance superior to that of the vanilla KVM hypervisor. We’ve explained the importance of efficient I/O for virtual machines in two earlier blog posts.

New developments in IoT, Big Data and HPC further increase the importance of I/O, even within the HPC world, where compute power has historically been perceived as the dominant and almost the only factor. New scenarios are emerging in which millions of devices send metrics to computation centers, generating Big Data on which HPC applications work to provide insights. In such scenarios, the ability to consume and process this vast number of messages is highly dependent on efficient I/O.

In this document we describe some of the technical aspects of the IOcm technology.

The Basics

IOcm introduces efficiency improvements to the I/O subsystem of the hypervisor. By using a shared I/O processing thread, the hypervisor takes control of the scheduling policy for I/O processing. Moving this responsibility from the general Linux thread scheduler to the hypervisor gives the system finer control over I/O traffic and reduces the overhead of thread context switching.

[Figure: scheduler comparison]

The shared I/O processing thread lets the hypervisor control processing per virtual device very precisely and make rapid scheduling decisions in response to a changing environment. This alleviates thread starvation and avoids threads that hold the CPU despite having no active I/O requests.
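To make the idea concrete, here is a minimal sketch of one worker serving several virtual device queues in turn. It is a toy model, not the vhost implementation: the function name, the device names and the per-device `budget` are assumptions made for illustration only.

```python
from collections import deque


def serve_round_robin(queues, budget):
    """Serve pending requests from several device queues on one worker.

    queues: dict mapping a device name to a deque of pending requests.
    budget: maximum number of requests taken from one device per pass
            (a hypothetical knob, not an IOcm parameter).
    Returns the order in which requests were handled.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    handled = []
    while any(queues.values()):
        for device, pending in queues.items():
            # An idle device costs nothing here, so it never holds the worker.
            for _ in range(min(budget, len(pending))):
                handled.append((device, pending.popleft()))
    return handled


if __name__ == "__main__":
    queues = {
        "vm1-net": deque(["a1", "a2", "a3"]),
        "vm2-net": deque(),              # no active I/O
        "vm3-blk": deque(["c1"]),
    }
    for device, request in serve_round_robin(queues, budget=2):
        print(device, request)
```

Because a single loop decides whose request runs next, a busy device cannot starve the others for longer than its budget, and a device with an empty queue is skipped at once.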

The Dynamicity

IOcm provides a mechanism to control the aforementioned shared I/O threads. It enables low-level functionality such as creating and destroying I/O threads (vhost [3]), and migrating devices between I/O threads.
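The three low-level operations can be pictured with a small bookkeeping model. This is only a sketch of the concept: the class and method names are hypothetical and do not correspond to the vhost code or its interface.

```python
class IoThreadRegistry:
    """Toy model of which virtual device is served by which I/O thread."""

    def __init__(self):
        self._devices_by_thread = {}
        self._next_id = 0

    def create_thread(self):
        """Register a new, empty I/O thread and return its id."""
        thread_id = self._next_id
        self._next_id += 1
        self._devices_by_thread[thread_id] = set()
        return thread_id

    def attach(self, device, thread_id):
        """Assign a device to an existing I/O thread."""
        self._devices_by_thread[thread_id].add(device)

    def migrate(self, device, source, target):
        """Move a device from one I/O thread to another."""
        target_devices = self._devices_by_thread[target]  # fails early if absent
        if device not in self._devices_by_thread[source]:
            raise KeyError(f"{device} is not served by thread {source}")
        self._devices_by_thread[source].remove(device)
        target_devices.add(device)

    def destroy_thread(self, thread_id):
        """Remove an I/O thread; it must not serve any device."""
        if self._devices_by_thread[thread_id]:
            raise RuntimeError("migrate devices away before destroying")
        del self._devices_by_thread[thread_id]

    def layout(self):
        """Return a snapshot: thread id -> sorted list of device names."""
        return {t: sorted(d) for t, d in self._devices_by_thread.items()}
```

In this model, shrinking the number of I/O threads is a migrate of every device followed by a destroy, and growing it is a create followed by one or more migrates.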

At a higher level, the main component of IOcm is the monitor, which makes decisions about resource allocation to ensure maximum I/O performance.

The monitor takes a system-wide view to balance I/O and computation requirements. It periodically reads statistics such as the throughput, and uses them to determine the optimum configuration of the I/O subsystem of the hypervisor (the vhost threads).
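One way to picture such a decision step is simple hill climbing on measured throughput. The sketch below is an assumption for illustration, not the monitor's actual policy: the function name, the tolerance and the core limits are all hypothetical.

```python
def next_io_core_count(current, previous, throughput_now, throughput_before,
                       min_cores=1, max_cores=4, tolerance=0.05):
    """Pick the I/O core count for the next interval by hill climbing.

    current / previous: I/O core counts in this and the preceding interval.
    throughput_now / throughput_before: throughput measured in those intervals.
    tolerance: relative change below which throughput counts as unchanged.
    """
    step = 1 if current >= previous else -1   # direction of the last change
    gain = throughput_now - throughput_before
    if abs(gain) <= tolerance * max(throughput_before, 1e-9):
        candidate = current                   # no meaningful change: hold
    elif gain > 0:
        candidate = current + step            # last move helped: keep going
    else:
        candidate = current - step            # last move hurt: undo it
    return max(min_cores, min(max_cores, candidate))
```

A monitor built this way would call the function once per sampling interval, apply the returned core count, and feed the next measurement back in. A static assignment corresponds to never calling it at all, which is why it cannot follow a workload whose needs change over time.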

As the experiment results for ApacheBench and a more dynamic workload show, dynamic management of the side cores (the cores dedicated to I/O) matters. In the lower graph there is even a point where a static core assignment falls below the baseline (vanilla KVM), because it does not adjust to the workload’s changing resource requirements.

[Figure: experiment results]

The Interface

IOcm is implemented within a kernel module called vhost. The vhost module is controlled from user space through an API implemented using sysfs [4]. Sysfs is a common mechanism in the Linux kernel for providing information and control points to user space. Specific data is exposed through a set of virtual files which can be used to understand the state of vhost and modify its behaviour accordingly. The files exposed through sysfs appear in the file system as regular files owned by ’root’, and are subject to the same treatment as other files (ownership, access controls, etc). An application in user space can communicate with the vhost module by reading and writing these sysfs files. Together, these files constitute the vhost IOcm interface (API).
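Since sysfs attributes behave like small text files, a user-space tool can talk to them with ordinary file I/O. The sketch below shows the pattern only. The actual directory and file names of the vhost IOcm interface are not given in this post, so the attribute name `example_attr` is hypothetical and a temporary directory stands in for sysfs.

```python
import tempfile
from pathlib import Path


def read_attr(base_dir, name):
    """Return the text content of one attribute file, stripped."""
    return (Path(base_dir) / name).read_text().strip()


def write_attr(base_dir, name, value):
    """Write a value to one attribute file, as `echo value > file` would."""
    (Path(base_dir) / name).write_text(f"{value}\n")


if __name__ == "__main__":
    # Stand-in for the real sysfs directory, which would require root access.
    with tempfile.TemporaryDirectory() as fake_sysfs:
        write_attr(fake_sysfs, "example_attr", 2)   # hypothetical attribute
        print(read_attr(fake_sysfs, "example_attr"))
```

On a real system the same two helpers would be pointed at the directory the vhost module exposes, and the process would need sufficient permissions, since the files are owned by root.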

Evaluation with IBM MessageSight

IBM MessageSight is the bridge between the IoT devices that generate data and the consumers within the enterprise, up to the HPC/analytics applications that consume the data. MessageSight is planned to be offered in several deployment models, including as-a-Service, where a tenant receives a dedicated, hosted MessageSight cluster running on VMs over the KVM hypervisor. Due to the expected high throughput of IoT messages, incoming I/O performance is the major constraint on the number of MessageSight VMs per physical server. Thus, improving incoming I/O performance in sKVM will enable more MessageSight VMs per physical server and, as a result, will improve the efficiency of the MessageSight service for a given number of physical servers.

Evaluation testbed

The experiments were done with one (or in some cases two) servers running benchmarking tools, connected back-to-back (i.e. with no network switch) over a 10GbE link to a third server which hosted the MessageSight VMs (Figure 8). The experiments were conducted with MqttBench, a proprietary IBM tool used for benchmarking MessageSight by simulating a large number of IoT devices sending messages over the MQTT protocol.

The MessageSight host was an IBM System x3650 M4 with 2 sockets (8-core Intel(R) Xeon(R) CPU E5-2680, 2.70GHz), HyperThreading enabled, 128 GB of memory, and an Intel 82599ES 10-Gigabit Ethernet network card. The host ran Linux with a 3.18 kernel, sKVM (with IOcm support and a vhost backend) and QEMU version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.21).

We ran a series of experiments on MessageSight running in VMs over sKVM to simulate the as-a-Service deployment model with the IOcm technology, and compared it to the standard KVM (vanilla). These experiments show a significant improvement in the maximum total message throughput obtained by a cluster of MessageSight VMs running on a single physical server.

Evaluation

The experiments were conducted to determine maximum message rates for different numbers of connections in various cluster sizes (numbers of MessageSight VMs). The results showed that sKVM with the IOcm technology increased the message rate by 20-50%. The higher throughput comes from both the number of connections (up to 100K) and an increased message rate per VM.

In the figures below, the X axis shows the requested message rate in each experiment, and the Y axis shows the actual message rate attained by MessageSight. The blue bars represent the ideal state where the full requested rate is achieved. Starting at 220K msg/sec, MessageSight running on vanilla KVM begins to struggle, while IOcm still enables reasonable operation.

Following is the result of simulating 100K (separate) TCP connections, sending at a requested message rate overall (across these 100K connections), against one MessageSight VM consisting of 8 vCPUs, 32GB of memory, and a multiqueue virtio NIC with 8 queues.

MessageSight experiment set against 1 big VM

Following is the result of simulating 100K IoT devices, each with its own persistent TCP connection (100K TCP connections), against 6 MessageSight VMs, each consisting of 2 vCPUs, 32GB of memory, and a virtio NIC (with 1 queue).

MessageSight experiment set against 6 small VMs

Conclusion

IOcm is a technology that increases the efficiency of I/O-intensive workloads running inside virtual machines. It relieves part of the overhead in I/O processing that results from mixing I/O and compute threads by assigning dedicated cores for I/O, and it dynamically adjusts the number of I/O cores according to the workload’s changing resource requirements. Experimental results show promising improvements, both for standard benchmarks such as Netperf and Apache HTTP server, and for the real-life application IBM MessageSight. For IBM MessageSight, the results show, as expected, the importance of IOcm for heavy I/O applications with throughput-intensive streams of small packets.

We conclude that, despite impressive results with small packet sizes, we must continue to explore other directions to improve I/O-intensive applications that use larger packet sizes.

Links

[1] MIKELANGELO Report D2.13 The first sKVM hypervisor architecture

[2] MIKELANGELO Report D3.1 The First Super KVM – Fast virtual I/O hypervisor

[3] vhost