From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
To: virtio-comment@lists.oasis-open.org
Cc: hans@linux.alibaba.com, herongguang@linux.alibaba.com,
	zmlcc@linux.alibaba.com, dust.li@linux.alibaba.com,
	tonylu@linux.alibaba.com, zhenzao@linux.alibaba.com,
	helinguo@linux.alibaba.com, gerry@linux.alibaba.com,
	xuanzhuo@linux.alibaba.com, mst@redhat.com, cohuck@redhat.com,
	jasowang@redhat.com, Jan Kiszka <jan.kiszka@siemens.com>,
	wintera@linux.ibm.com, kgraul@linux.ibm.com,
	wenjia@linux.ibm.com, jaka@linux.ibm.com, hca@linux.ibm.com,
	twinkler@linux.ibm.com, raspl@linux.ibm.com,
	virtio-dev@lists.oasis-open.org, pasic@linux.ibm.com
Subject: [virtio-comment] [PATCH v4 0/1] introduce virtio-ism: internal shared memory device
Date: Thu, 18 May 2023 16:09:18 +0800
Message-ID: <20230518080919.48797-1-xuanzhuo@linux.alibaba.com>

Hello everyone,

# Background

    Accelerating communication between VMs and containers, including
    lightweight VM-based containers, is now a common requirement. One way to
    achieve this is to colocate them on the same host. However, inter-VM
    communication through the network stack is not optimal and wastes CPU
    cycles. This scenario has been discussed many times, but there is still
    no generic solution available [1] [2] [3].

    We also have many such scenarios internally. Besides general network
    communication, many of our applications need shared memory. For various
    reasons, it is difficult to carry this business data over network
    communication. For example, in some scenarios the application needs to
    exchange a large amount of data with a physical device on the host, so
    shared memory is the most suitable solution.

    Shared memory is an efficient communication method, so we want to
    implement a cross-VM shared memory mechanism. Inspired by the IBM ism
    device [4], we use virtio-ism to share memory between VMs on the same host.

# virtio-ism

    An ISM (Internal Shared Memory) device provides access to memory shared
    between multiple devices. This allows low-overhead communication through
    that memory. For example, memory can be shared among the guests of
    multiple virtual machines running on the same host: each virtual machine
    includes an ism device, and the guests get the shared memory through
    their ism devices.

    An ism device can communicate with multiple peers simultaneously. This
    communication can be dynamically started and ended.

    The following diagram shows ism sharing between two VMs.

    |-------------------------------------------------------------------------------------------------------------|
    | |------------------------------------------------|       |------------------------------------------------| |
    | | Guest                                          |       | Guest                                          | |
    | |                                                |       |                                                | |
    | |   ----------------                             |       |   ----------------                             | |
    | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
    | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
    | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
    | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
    | |    |  |                -------------------     |       |    |  |                --------------------    | |
    | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
    | |    |  |                -------------------     |       |    |  |                --------------------    | |
    | |                                |               |       |                               |                | |
    | |                                |               |       |                               |                | |
    | | Qemu                           |               |       | Qemu                          |                | |
    | |--------------------------------+---------------|       |-------------------------------+----------------| |
    |                                  |                                                       |                  |
    |                                  |                                                       |                  |
    |                                  |------------------------------+------------------------|                  |
    |                                                                 |                                           |
    |                                                                 |                                           |
    |                                                   --------------------------                                |
    |                                                    | M1 |   | M2 |   | M3 |                                 |
    |                                                   --------------------------                                |
    |                                                                                                             |
    | HOST                                                                                                        |
    ---------------------------------------------------------------------------------------------------------------

    In addition, we found that when an existing TCP communication scenario
    is replaced with SMC + shared memory, a large performance improvement can
    be obtained, and user processes need only minor modification to use SMC:
      - latency reduced by about 50%
      - throughput increased by about 300%
      - CPU consumption reduced by about 50%

    Since no existing shared memory management solution matches the needs of
    SMC (see "Comparison with existing technology"), and virtio is the
    standard for communication in the virtualization world, we want to
    implement a virtio-ism device based on virtio that supports on-demand
    memory sharing across VMs, across containers, or between a VM and a
    container. To match the needs of SMC, the virtio-ism device needs to
    support:

    1. Dynamic provision: shared memory regions are dynamically allocated and
       provisioned.
    2. Multi-region management: the shared memory is divided into regions,
       and a peer may allocate one or more regions from the same shared memory
       device.
    3. Permission control: the permission of each region can be set separately.
    4. Dynamic connection: each ism region of a device can be shared with
       different devices, so a device may eventually share memory with
       thousands of devices.
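
    As a rough illustration of item 3, the Python sketch below models
    permissions that are tracked separately for each region. It is only a
    sketch: the spec defines the real permission model, and the names Perm
    and RegionAcl, as well as the idea of granting rights per peer, are
    assumptions made for this example.

```python
from enum import Flag, auto


class Perm(Flag):
    """Hypothetical permission bits; not taken from the virtio-ism spec."""
    READ = auto()
    WRITE = auto()


class RegionAcl:
    """Permissions for ONE region, so every region can be configured separately."""

    def __init__(self, owner):
        # Assumption: the allocating peer starts with full access.
        self.perms = {owner: Perm.READ | Perm.WRITE}

    def grant(self, peer, perm):
        # Set (or replace) the rights of one peer on this region.
        self.perms[peer] = perm

    def allows(self, peer, wanted):
        # A peer with no entry has no rights at all.
        granted = self.perms.get(peer, Perm(0))
        return (granted & wanted) == wanted
```

    With acl = RegionAcl("vm1") and acl.grant("vm2", Perm.READ), vm2 may read
    the region but not write it, and a peer that was never granted anything
    is refused.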

## Live Migration

    If two VMs are migrated from the same host to two different physical
    hosts, they can no longer share memory, so we do not consider supporting
    migration for the time being.

# Comparison with existing technology

## ivshmem or ivshmem 2.0 of QEMU

    1. ivshmem 1.0 exposes one large piece of memory that is visible to every
       VM using the device, so it is not secure enough.

    2. ivshmem 2.0 provides shared memory that belongs to one VM and is
       read-only for all other VMs that use the ivshmem 2.0 shared memory
       device, which also does not meet our security needs.

## vhost-pci and virtio-vhost-user

    1. They do not support dynamic allocation.
    2. One device can connect to only one VM.

# Usage
    These are the steps a user process follows.

    | action                                         | user process syscall                     | driver to device
 ---|------------------------------------------------|------------------------------------------|-------------------------------
  1 | get memory and token                           | ioctl(fd, VIRTIO_ISM_IOCTL_ALLOC, &ctl)  | VIRTIO_ISM_CTRL_ALLOC_REGION
 ---|------------------------------------------------|------------------------------------------|-------------------------------
  2 | send token to peer process                     |                                          |
 ---|------------------------------------------------|------------------------------------------|-------------------------------
  3 | get shared memory (both processes now share it)| ioctl(fd, VIRTIO_ISM_IOCTL_ATTACH, &ctl) | VIRTIO_ISM_CTRL_ATTACH_REGION
 ---|------------------------------------------------|------------------------------------------|-------------------------------
  4 | notify peer process                            | ioctl(fd, VIRTIO_ISM_IOCTL_KICK)         | write notify area
 ---|------------------------------------------------|------------------------------------------|-------------------------------
  5 | receive notification from peer process         | woken up by select/epoll/...             | driver receives interrupt
 ---|------------------------------------------------|------------------------------------------|-------------------------------
  6 | release the reference to the shared memory     | ioctl(fd, VIRTIO_ISM_IOCTL_DETACH, &ctl) | VIRTIO_ISM_CTRL_DETACH_REGION
 ---|------------------------------------------------|------------------------------------------|-------------------------------
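
    The six steps above can be modeled in a few lines of Python. This is a
    toy in-process model, not the driver or the device: ToyIsmHost and its
    methods are hypothetical names, the buffer is an ordinary bytearray
    rather than device memory, and freeing a region when the last reference
    is dropped is an assumption.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Region:
    """One shared region; `buf` stands in for the host memory behind it."""
    token: int
    buf: bytearray
    attached: set = field(default_factory=set)  # devices holding a reference


class ToyIsmHost:
    """Hypothetical stand-in for the host side; not the virtio-ism spec."""

    def __init__(self):
        self.regions = {}  # token -> Region
        self.pending = {}  # device name -> tokens with a pending notification

    def alloc(self, dev, size):
        # Step 1: allocate a region and hand back (token, memory).
        token = secrets.randbits(64)
        region = Region(token, bytearray(size), {dev})
        self.regions[token] = region
        return token, region.buf

    def attach(self, dev, token):
        # Step 3: a peer that learned the token (step 2) gets the same memory.
        region = self.regions[token]  # KeyError for an unknown token
        region.attached.add(dev)
        return region.buf

    def kick(self, dev, token):
        # Step 4: notify every other device attached to the region.
        for peer in self.regions[token].attached - {dev}:
            self.pending.setdefault(peer, []).append(token)

    def poll(self, dev):
        # Step 5: collect the notifications queued for this device.
        return self.pending.pop(dev, [])

    def detach(self, dev, token):
        # Step 6: drop one reference; free the region with the last one.
        region = self.regions[token]
        region.attached.discard(dev)
        if not region.attached:
            del self.regions[token]
```

    For example, after token, mem1 = host.alloc("vm1", 4096) and
    mem2 = host.attach("vm2", token), a write through mem2 is visible through
    mem1 because both names refer to the same buffer, and host.kick("vm2",
    token) queues a notification that host.poll("vm1") returns.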

# POC CODE

    The POC does not yet implement the eventq or permission functions.
    It implements the v2 version of the spec, so some details do not match
    this version.

    Qemu   (virtio ism device): https://github.com/fengidri/qemu/compare/7d66b74c4dd0d74d12c1d3d6de366242b13ed76d...ism-upstream-1216?expand=1
    Kernel (virtio ism driver): https://github.com/fengidri/linux-kernel-virtio-ism/compare/6f8101eb21bab480537027e62c4b17021fb7ea5d...ism-upstream-1223

    Start qemu with the option "--device virtio-ism-pci,disable-legacy=on,disable-modern=off".

## User Space APP

    The ism driver provides the /dev/vismX interface, which allows users to
    use the virtio-ism device directly from user space.

    Try tools/virtio/virtio-ism/virtio-ism-mmap

    Usage:
         cd tools/virtio/virtio-ism/; make
         insmod virtio-ism.ko

    case1: communicate

       vm1: ./virtio-ism-mmap alloc -> token
       vm2: ./virtio-ism-mmap attach -t <token> --write-msg AAAA --commit

       vm2 writes the message to shared memory, then notifies vm1. After vm1
       receives the notification, it reads from shared memory.

    case2: ping-pong test.

        vm1: ./virtio-ism-mmap server
        vm2: ./virtio-ism-mmap -i 192.168.122.101 pp

        1. server allocates one ism region
        2. client gets the token over tcp

        3. client commits (kicks) to server; server receives the notification
           and commits (kicks) back to client
        4. loop #3

    case3: throughput test.

        vm1: ./virtio-ism-mmap server
        vm2: ./virtio-ism-mmap -i 192.168.122.101 tp

        1. server allocates one ism region
        2. client gets the token over tcp

        3. client writes 1M of data to the ism region
        4. client commits (kicks) to server
        5. server receives the notification, copies the data, then commits
           (kicks) back to client
        6. loop #3-#5

    case4: throughput test with protocol defined by user.

        vm1: ./virtio-ism-mmap server
        vm2: ./virtio-ism-mmap -i 192.168.122.101 tp --polling --tp-chunks 15 --msg-size 64k -n 50000

        This case uses the ism region as a ring.

        In this case, the client and server both run in polling mode. On my
        machine, throughput reaches up to 12GBps.
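
        One way to use a region as a ring, sketched in Python. The layout
        used by the POC tool is not described in this letter, so the header
        format, the per-slot length prefix and the ChunkRing name are all
        assumptions. A real implementation shared between VMs would also
        need memory barriers, which this single-process sketch does not
        model.

```python
import struct

HDR = struct.Struct("<QQ")  # (head, tail) counters stored at the start of the region


class ChunkRing:
    """Single-producer/single-consumer ring over a shared buffer.

    Assumed layout: a 16-byte header, then `chunks` slots of `chunk_size`
    bytes, each slot starting with a 4-byte payload length.
    """

    def __init__(self, buf, chunks, chunk_size):
        assert len(buf) >= HDR.size + chunks * chunk_size
        self.buf, self.chunks, self.chunk_size = buf, chunks, chunk_size

    def _slot(self, index):
        # Byte offset of the slot that a running counter maps to.
        return HDR.size + (index % self.chunks) * self.chunk_size

    def push(self, payload):
        head, tail = HDR.unpack_from(self.buf, 0)
        if head - tail == self.chunks or len(payload) > self.chunk_size - 4:
            return False  # ring full or payload too large for one slot
        off = self._slot(head)
        struct.pack_into("<I", self.buf, off, len(payload))
        self.buf[off + 4:off + 4 + len(payload)] = payload
        struct.pack_into("<Q", self.buf, 0, head + 1)  # publish after the data
        return True

    def pop(self):
        head, tail = HDR.unpack_from(self.buf, 0)
        if head == tail:
            return None  # ring empty
        off = self._slot(tail)
        (length,) = struct.unpack_from("<I", self.buf, off)
        payload = bytes(self.buf[off + 4:off + 4 + length])
        struct.pack_into("<Q", self.buf, 8, tail + 1)  # release the slot
        return payload
```

        Because the producer only writes head and the consumer only writes
        tail, neither side needs a lock in polling mode; each side simply
        retries when push returns False or pop returns None.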

## About smc with virtio-ism

    At present, my colleagues are working on this area and have contacted
    IBM's developers. SMC may need some modifications, which may involve some
    complicated issues, so please give them more time.

# References

    [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
    [2] https://dl.acm.org/doi/10.1145/2847562
    [3] https://hal.archives-ouvertes.fr/hal-00368622/document
    [4] Information about IBM ism device and SMC:
            1. SMC reference: https://www.ibm.com/docs/en/zos/2.5.0?topic=system-shared-memory-communications
            2. SMC-Dv2 and ISMv2 introduction: https://www.newera.com/INFO/SMCv2_Introduction_10-15-2020.pdf
            3. ISM device: https://www.ibm.com/docs/en/linux-on-systems?topic=n-ism-device-driver-1
            4. SMC protocol (including SMC-D): https://www.ibm.com/support/pages/system/files/inline-files/IBM%20Shared%20Memory%20Communications%20Version%202_2.pdf
            5. SMC-D FAQ: https://www.ibm.com/support/pages/system/files/inline-files/2021-02-09-SMC-D-FAQ.pdf


If there are any problems, please point them out.
Hope to hear from you, thank you.

v4:
   1. reorganize the structure of the spec
   2. fix some problems

v3:
   1. support requesting memory from the vm
   2. add query operation
   3. optimize the description of spec and enrich some details
   4. use the communication domain as a term
   5. replace gid with cdid

v2:
   1. add Attach/Detach event
   2. add Events Filter
   3. allow Alloc/Attach huge region
   4. remove host/guest terms

v1:
   1. cover letter adding explanation of ism vlan
   2. spec add gid
   3. explain the source of ideas about ism
   4. POC support virtio-ism-smc.ko virtio-ism-dev.ko and support virtio-ism-mmap



Xuan Zhuo (1):
  virtio-ism: introduce new device virtio-ism

 conformance.tex                         |   2 +
 content.tex                             |   1 +
 device-types/ism/description.tex        | 591 ++++++++++++++++++++++++
 device-types/ism/device-conformance.tex |  17 +
 device-types/ism/driver-conformance.tex |  13 +
 device-types/ism/layout-pic.tex         | 112 +++++
 virtio-html.tex                         |   9 +
 virtio.tex                              |   9 +
 8 files changed, 754 insertions(+)
 create mode 100644 device-types/ism/description.tex
 create mode 100644 device-types/ism/device-conformance.tex
 create mode 100644 device-types/ism/driver-conformance.tex
 create mode 100644 device-types/ism/layout-pic.tex

-- 
2.32.0.3.g01195cf9f

