From: "LIZHAOXIN1 [李照鑫]" <LIZHAOXIN1@kingsoft.com>
To: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"quintela@redhat.com" <quintela@redhat.com>,
	"dgilbert@redhat.com" <dgilbert@redhat.com>
Cc: "LIZHAOXIN1 [李照鑫]" <LIZHAOXIN1@kingsoft.com>,
	"sunhao2 [孙昊]" <sunhao2@kingsoft.com>,
	"DENGLINWEN [邓林文]" <DENGLINWEN@kingsoft.com>,
	"YANGFENG1 [杨峰]" <YANGFENG1@kingsoft.com>
Subject: [PATCH v2] migration/rdma: Use huge pages to register VM memory
Date: Thu, 10 Jun 2021 15:39:30 +0000
Message-ID: <a67bba1280e54ed0bc65a01e6a3b0d1a@kingsoft.com>

When using libvirt for RDMA live migration, deregistering the VM's
memory on the source side takes a long time if the guest memory is
large, which results in a long downtime (for a 64 GB VM, deregistration
takes about 400 ms).

Although the VM's memory is backed by 2 MB huge pages, the MLNX driver
still pins and unpins the memory in 4 KB pages. So use huge pages when
registering the memory in order to skip the pin/unpin work and reduce
the downtime.
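
For reference, the core of the change is the set of access flags passed
to ibv_reg_mr(). The following is a minimal standalone sketch of that
registration strategy (illustrative only, not QEMU code; pd, addr,
length and hugepage_backed are placeholder inputs):

#include <infiniband/verbs.h>
#include <stddef.h>

/*
 * Minimal sketch: register a RAM block, requesting on-demand paging
 * (which avoids pinning every 4K page up front) plus hugetlb awareness
 * when the block is backed by huge pages.
 */
static struct ibv_mr *reg_block(struct ibv_pd *pd, void *addr,
                                size_t length, int hugepage_backed)
{
    int access = IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE;

    if (hugepage_backed) {
        access |= IBV_ACCESS_ON_DEMAND | IBV_ACCESS_HUGETLB;
    }

    return ibv_reg_mr(pd, addr, length, access);
}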
    
Signed-off-by: lizhaoxin <lizhaoxin1@kingsoft.com>
---
v2:
- Add page_size to struct RDMALocalBlock
- Use page_size to determine whether the VM uses huge pages

diff --git a/migration/rdma.c b/migration/rdma.c
index 1cdb4561f3..703816ebc7 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -215,6 +215,7 @@ typedef struct RDMALocalBlock {
     uint64_t       remote_host_addr; /* remote virtual address */
     uint64_t       offset;
     uint64_t       length;
+    uint64_t       page_size;
     struct         ibv_mr **pmr;    /* MRs for chunk-level registration */
     struct         ibv_mr *mr;      /* MR for non-chunk-level registration */
     uint32_t      *remote_keys;     /* rkeys for chunk-level registration */
@@ -565,7 +566,8 @@ static inline uint8_t *ram_chunk_end(const RDMALocalBlock *rdma_ram_block,
 
 static int rdma_add_block(RDMAContext *rdma, const char *block_name,
                          void *host_addr,
-                         ram_addr_t block_offset, uint64_t length)
+                         ram_addr_t block_offset, uint64_t length,
+                         uint64_t page_size)
 {
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
     RDMALocalBlock *block;
@@ -595,6 +597,7 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
     block->local_host_addr = host_addr;
     block->offset = block_offset;
     block->length = length;
+    block->page_size = page_size;
     block->index = local->nb_blocks;
     block->src_index = ~0U; /* Filled in by the receipt of the block list */
     block->nb_chunks = ram_chunk_index(host_addr, host_addr + length) + 1UL;
@@ -634,7 +637,8 @@ static int qemu_rdma_init_one_block(RAMBlock *rb, void *opaque)
     void *host_addr = qemu_ram_get_host_addr(rb);
     ram_addr_t block_offset = qemu_ram_get_offset(rb);
     ram_addr_t length = qemu_ram_get_used_length(rb);
-    return rdma_add_block(opaque, block_name, host_addr, block_offset, length);
+    ram_addr_t page_size = qemu_ram_pagesize(rb);
+    return rdma_add_block(opaque, block_name, host_addr, block_offset, length, page_size);
 }
 
 /*
@@ -1123,13 +1127,25 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
 
     for (i = 0; i < local->nb_blocks; i++) {
-        local->block[i].mr =
-            ibv_reg_mr(rdma->pd,
-                    local->block[i].local_host_addr,
-                    local->block[i].length,
-                    IBV_ACCESS_LOCAL_WRITE |
-                    IBV_ACCESS_REMOTE_WRITE
-                    );
+        if (local->block[i].page_size != qemu_real_host_page_size) {
+            local->block[i].mr =
+                ibv_reg_mr(rdma->pd,
+                        local->block[i].local_host_addr,
+                        local->block[i].length,
+                        IBV_ACCESS_LOCAL_WRITE |
+                        IBV_ACCESS_REMOTE_WRITE |
+                        IBV_ACCESS_ON_DEMAND |
+                        IBV_ACCESS_HUGETLB
+                        );
+        } else {
+            local->block[i].mr =
+                ibv_reg_mr(rdma->pd,
+                        local->block[i].local_host_addr,
+                        local->block[i].length,
+                        IBV_ACCESS_LOCAL_WRITE |
+                        IBV_ACCESS_REMOTE_WRITE
+                        );
+        }
         if (!local->block[i].mr) {
             perror("Failed to register local dest ram block!\n");
             break;
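
Note (illustrative, not part of the patch): IBV_ACCESS_ON_DEMAND only
works when the device advertises on-demand paging support, otherwise
ibv_reg_mr() with that flag will typically fail. A caller could probe
for it roughly as follows (sketch; "verbs" stands for the opened
ibv_context):

#include <infiniband/verbs.h>
#include <stdbool.h>

/* Sketch: check whether the verbs device advertises on-demand paging. */
static bool device_supports_odp(struct ibv_context *verbs)
{
    struct ibv_device_attr_ex attr;

    if (ibv_query_device_ex(verbs, NULL, &attr)) {
        return false;
    }
    return attr.odp_caps.general_caps & IBV_ODP_SUPPORT;
}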
