Paravirtualization is commonly used in virtualized environments to improve system efficiency and to optimize management workloads. In the era of High Performance Cloud Computing and Big Data use cases, cloud providers and data centers focus on developing paravirtualization solutions that provide fast and efficient I/O. Thanks to their high bandwidth, low latency and kernel bypass, Remote Direct Memory Access (RDMA) interconnects are now widely adopted in HPC and Cloud centers as an I/O performance booster. To benefit from the advantages that RDMA offers, the underlying virtualized devices must support network communication over InfiniBand and RDMA over Converged Ethernet (RoCE).
To enable such communication for virtualized devices, experts at Huawei’s European Research Center (ERC) in Munich are developing new paravirtual device drivers for RDMA-capable fabrics, called virtio RDMA (vRDMA). Our vRDMA solution aims to break the overhead barrier that prevents HPC Cloud adoption, enabling HPC applications to run with performance comparable to bare metal while enjoying the benefits of the Cloud: agility, cost-efficiency, flexibility, and high-bandwidth, low-latency I/O.
This work develops techniques and mechanisms that accelerate virtual I/O and improve the scalability of multiple virtual machines running on a multi-core host (see Figure 1). Within the MIKELANGELO project, we proposed three prototypes of the vRDMA solution, addressing different hardware requirements and needs. Prototype I develops vRDMA solutions that support a socket-based API. Prototype II aims to support guest applications that use RDMA verbs directly. Future work on Prototype III will focus on combining the two prototypes to support both the socket and RDMA APIs with automatic selection of the communication protocol, i.e. RDMA for inter-host communication or shared memory for intra-host communication, while the guest application remains unaware of which protocol is being used.
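The automatic protocol selection planned for Prototype III can be illustrated with a minimal Python sketch. It is not the vRDMA implementation: the `Endpoint` type, the `select_transport` and `send` functions, and the idea of comparing host names are all assumptions made for illustration. The sketch only shows the decision rule described above, where guests on the same host use shared memory and guests on different hosts use RDMA, with the choice hidden from the caller.

```python
from dataclasses import dataclass
from enum import Enum


class Transport(Enum):
    SHARED_MEMORY = "shared memory"  # intra-host communication
    RDMA = "RDMA"                    # inter-host communication


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical description of a guest and the physical host it runs on."""
    guest: str
    host: str


def select_transport(local: Endpoint, remote: Endpoint) -> Transport:
    """Pick shared memory when both guests share a host, RDMA otherwise."""
    if local.host == remote.host:
        return Transport.SHARED_MEMORY
    return Transport.RDMA


def send(local: Endpoint, remote: Endpoint, payload: bytes) -> str:
    """Hypothetical send call: the application never names a transport itself."""
    transport = select_transport(local, remote)
    # A real driver would hand the payload to the chosen backend here;
    # this sketch only reports which backend would be used.
    return f"{len(payload)} bytes via {transport.value}"


if __name__ == "__main__":
    vm_a = Endpoint(guest="vm-a", host="host-1")
    vm_b = Endpoint(guest="vm-b", host="host-1")
    vm_c = Endpoint(guest="vm-c", host="host-2")
    print(send(vm_a, vm_b, b"hello"))  # same host
    print(send(vm_a, vm_c, b"hello"))  # different hosts
```

The application-facing `send` call is the same in both cases, which mirrors the goal that the guest application stays unaware of the protocol in use.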
Our vRDMA solutions will not only minimize overhead for conventional socket-based interfaces, but also enable programming models that rely on RDMA (i.e. the InfiniBand verbs API). As a result, they reduce the performance overhead incurred by virtualization and make Cloud and High Performance Computing significantly more efficient, allowing Big Data use cases to run in virtualized HPC environments. Small and medium-sized enterprises (SMEs) and companies that are currently unable to deploy their own HPC infrastructure will benefit from our solution and be able to scale their workloads out to cloud providers, thus accelerating time to market and improving competitiveness.
By the end of year two, we have completed the first two prototypes of the proposed design. Figure 2 shows an example performance test using vRDMA over InfiniBand. With our implementation, the write bandwidth between two OSv guests is comparable to that of bare metal (blue bars, Linux to Linux). Further analysis and optimization of the prototypes will follow.
Figure 1. Final design of vRDMA prototypes.
Figure 2. Write bandwidth performance tests using virtualized RDMA over InfiniBand.
Types of interconnects we plan to support:
- RDMA over Converged Ethernet
Our vRDMA solution supports the following software configuration on the host and guest side:
Host:
- Ubuntu 14.04
- DPDK 2.1.0
- Open vSwitch 2.4.0
- QEMU 2.3.0
- libvirt 1.2.19
Guest:
- Ubuntu 14.04 / OSv