ZeCoRx – Zero Copy Receive

An optimized receive path for Virtual Machines served by virtio

In the KVM hypervisor, incoming packets from the network must pass through several layers of the Linux kernel before being delivered to the guest VM. Currently, both the hypervisor and the guest keep their own sets of buffers on the receive path, and for large packets the overall processing time is dominated by copying data from hypervisor buffers to guest buffers.

Some Linux network drivers support zero-copy on the transmit (tx) path. Zero-copy tx avoids the copy of data between VM guest kernel buffers and host kernel buffers, thus improving tx latency. Buffers in a VM guest kernel for a virtualized NIC are passed through the host device drivers and DMA'd directly to the network adapter, without an additional copy into host memory buffers. Since the tx data from the VM guest is always in hand, it is quite straightforward to map the buffer for DMA and to pass the data down the stack to the network adapter driver.
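As a rough illustration, the sketch below shows the shape of this tx-side mapping, assuming the guest buffer is visible to the host as a user-space address range (as with vhost). The function name, parameters, and error handling are illustrative only; this is not the actual vhost-net implementation.

    #include <linux/mm.h>
    #include <linux/scatterlist.h>
    #include <linux/dma-mapping.h>

    /*
     * Illustrative sketch: pin a guest tx buffer (seen by the host as a
     * user-space range) and map it for DMA, so the adapter can read it
     * directly without an intermediate copy into host kernel buffers.
     * Names are hypothetical and error handling is simplified.
     */
    static int zecorx_map_guest_tx_buf(struct device *dma_dev,
                                       unsigned long guest_uaddr, size_t len,
                                       struct page **pages,
                                       struct scatterlist *sgl)
    {
            size_t off = offset_in_page(guest_uaddr);
            int npages = DIV_ROUND_UP(off + len, PAGE_SIZE);
            int i, pinned, mapped;

            /* Pin the guest pages so they cannot go away during DMA. */
            pinned = get_user_pages_fast(guest_uaddr & PAGE_MASK, npages,
                                         0 /* read access suffices for tx */,
                                         pages);
            if (pinned < 0)
                    return pinned;
            if (pinned != npages) {
                    for (i = 0; i < pinned; i++)
                            put_page(pages[i]);
                    return -EFAULT;
            }

            sg_init_table(sgl, npages);
            for (i = 0; i < npages; i++) {
                    size_t chunk = min_t(size_t, len, PAGE_SIZE - off);

                    sg_set_page(&sgl[i], pages[i], chunk, off);
                    len -= chunk;
                    off = 0;
            }

            /* Hand the guest pages straight to the adapter for DMA reads. */
            mapped = dma_map_sg(dma_dev, sgl, npages, DMA_TO_DEVICE);
            return mapped ? 0 : -EIO;
    }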

Zero-copy for receive (rx) messages is not yet supported in Linux. A number of significant obstacles must be overcome in order to support zero-copy rx. Buffers must be prepared to receive data arriving from the network. Currently, DMA buffers are allocated by the low-level network adapter driver, and the data is then passed up the stack to be consumed. When rx data arrives, it is not necessarily clear a priori for whom the data is destined; it may eventually be copied to VM guest kernel buffers. The challenge is to allow VM guest kernel buffers to be used as DMA buffers, at least when we know that the VM guest is the sole consumer of a particular stream of data.
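For contrast, the sketch below captures the shape of today's rx path, in which the adapter driver itself allocates and maps the DMA buffer before any consumer is known. The function name is hypothetical and not taken from any specific driver.

    #include <linux/gfp.h>
    #include <linux/dma-mapping.h>

    /*
     * Illustrative sketch of the conventional rx path: the low-level adapter
     * driver allocates its own page and maps it for DMA before any packet
     * arrives and before the eventual consumer (e.g. a VM guest) is known.
     * If the packet turns out to be destined for a guest, the payload is
     * later copied from this host buffer into a guest buffer.
     */
    static int nic_post_rx_buffer(struct device *dma_dev, struct page **pagep,
                                  dma_addr_t *dma_addrp)
    {
            struct page *page = alloc_page(GFP_ATOMIC);

            if (!page)
                    return -ENOMEM;

            /* Map the host-owned page so the adapter can DMA incoming data into it. */
            *dma_addrp = dma_map_page(dma_dev, page, 0, PAGE_SIZE,
                                      DMA_FROM_DEVICE);
            if (dma_mapping_error(dma_dev, *dma_addrp)) {
                    __free_page(page);
                    return -EIO;
            }

            *pagep = page;
            /* A descriptor carrying *dma_addrp would now be placed on the rx ring. */
            return 0;
    }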

One solution that has been tried is page-flipping, in which the page containing the received data is mapped into the target's memory after the data has already been placed in the buffer. The overhead of performing the page mapping is significant, and essentially negates the benefit of avoiding the copy (Ronciak, 2004). Our solution instead introduces several interfaces that let the high-level and low-level drivers communicate, passing buffers down and up the stack as needed. We must also handle the case in which insufficient VM guest buffers have been made available to receive data from the network.
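One way to picture these interfaces is as a small set of callbacks that the virtualization layer registers with the adapter driver, allowing guest buffers to be posted for rx DMA and allowing the driver to fall back to its own buffers when none are available. The structure and names below are purely hypothetical, shown only to make the idea concrete; none of them exist in the kernel today.

    #include <linux/netdevice.h>
    #include <linux/scatterlist.h>

    /*
     * Purely hypothetical sketch of the kind of interface ZeCoRx would add
     * between the high-level (virtio/vhost) layer and the low-level adapter
     * driver.
     */
    struct zecorx_rx_ops {
            /*
             * Called by the high-level driver to hand a mapped guest buffer
             * down the stack so the adapter can DMA a future packet into it.
             */
            int (*post_guest_buf)(struct net_device *dev,
                                  struct scatterlist *sgl, int nents,
                                  void *cookie);

            /*
             * Called by the adapter driver when a packet has landed in a
             * previously posted guest buffer; 'cookie' identifies the buffer
             * so the high-level driver can complete it to the guest.
             */
            void (*guest_buf_done)(void *cookie, unsigned int len);

            /*
             * Called when no guest buffers are available: the adapter falls
             * back to its own host buffers, and the data is copied to the
             * guest later, as on the existing rx path.
             */
            void (*out_of_guest_bufs)(struct net_device *dev);
    };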

We expect the proposed solution to allow ZeCoRx to achieve general-case I/O performance similar to that of VMs served by direct device assignment, while retaining the full range of virtualization benefits available to VMs served by widespread virtualization methods.

This work is currently being implemented. The design has been accepted for publication as an extended abstract and a poster at Systor 2017.