* inconsistent metadata of vhd file while live migration
@ 2011-02-13 15:45 alice wan
  2011-02-13 21:11 ` Daniel Stodden
  0 siblings, 1 reply; 5+ messages in thread
From: alice wan @ 2011-02-13 15:45 UTC
  To: xen-devel


Hi all,

I have a concern about live migration: it may leave the VHD metadata
inconsistent between two tapdisk2 processes.

Suppose a VM whose disk image is a VHD file migrates from host A to host B.

On host B, the toolstack first creates the devices, which includes starting
the tapdisk2 process. At this point tapdisk2 reads some of the VHD metadata.
Only then does xc_restore run.

On host A, the guest keeps running until the last iteration (the
stop-and-copy phase), so while xc_save is in progress the VHD file can still
change, metadata included. The tapdisk2 process on host B therefore has not
read the newest metadata of the VHD file.

When tapdisk2 starts, it reads the footer, header and BAT of the VHD file.
The BAT in particular will cause problems if it is inconsistent.

Maybe my concern isn't a real problem, but I hope someone can clarify it
for me. Thanks in advance.
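
The race described above can be illustrated with a toy model. This is only a
sketch: it does not parse real VHD structures or use any tapdisk2 code, and
SharedImage and CachedReader are hypothetical names invented for the
illustration. The one assumption taken from the message is that the reader
copies the BAT once at open time while the writer keeps allocating blocks.

```python
"""Toy model of a stale block allocation table (BAT) cache.

Not real VHD parsing: SharedImage and CachedReader are hypothetical
stand-ins for a VHD file on shared storage and a tapdisk2 process.
"""

UNALLOCATED = 0xFFFFFFFF  # marker for "block not allocated yet" in this model


class SharedImage:
    """The image on shared storage: a BAT mapping block index -> offset."""

    def __init__(self, blocks):
        self.bat = [UNALLOCATED] * blocks
        self.next_offset = 0

    def allocate(self, block):
        """Writer side (host A): allocate a block and update the BAT."""
        if self.bat[block] == UNALLOCATED:
            self.bat[block] = self.next_offset
            self.next_offset += 1
        return self.bat[block]


class CachedReader:
    """Reader side (host B): copies the BAT once, when the image is opened."""

    def __init__(self, image):
        self.image = image
        self.open()

    def open(self):
        self.bat = list(self.image.bat)  # private copy, as read at open time

    def is_allocated(self, block):
        return self.bat[block] != UNALLOCATED


image = SharedImage(blocks=4)
host_b = CachedReader(image)      # B opens the image before xc_restore
image.allocate(2)                 # A keeps writing during pre-copy

stale = host_b.is_allocated(2)    # B still believes block 2 is unallocated
host_b.open()                     # reopening refreshes the cached BAT
fresh = host_b.is_allocated(2)
print(stale, fresh)
```

In this model the cached copy only becomes correct again once the reader
reopens the image after the writer has stopped.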

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

* Re: inconsistent metadata of vhd file while live migration
  2011-02-13 15:45 inconsistent metadata of vhd file while live migration alice wan
@ 2011-02-13 21:11 ` Daniel Stodden
  2011-02-13 21:13   ` Daniel Stodden
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Stodden @ 2011-02-13 21:11 UTC
  To: alice wan; +Cc: xen-devel@lists.xensource.com

On Sun, 2011-02-13 at 10:45 -0500, alice wan wrote:
> hi all,
>  
> i have some doubt about live migration which may cause inconsistent
> metadata of vhd file between two tapdisk2 process. 
>  
> given that vm migrates from host A to host B, which image is vhd
> file. 
>  
> in host B, it first creates devices including starting tapdisk2
> process, at this time, tapdisk2 will read some metadata of vhd file.
> then, it xc_restore
>  
> in host A, before it start last iteration(stop-and-copy phase), while
> xc_save's going, vhd file has been changed including metadata. So, in
> hostB tapdisk2 process doesn't read the 
>  
> newest metadata of vhd file.
>  
> for tapdisk2, when it starts, it will read footer, header, bat of vhd
> file. especially bat structure, if it's inconsistent, it'll cause
> problem.
>  
> Maybe my doubt isn't a real problem, however, i hope someone to figure
> it out for me. thanks in advance.

If that's what's done right now in the toolchain, it's a real problem
and needs to be fixed.

Options:

A. Avoid VBD lifetime overlap. This is how XCP presently does it. XCP
has vdi.activate/deactivate operations in addition to attach/detach to
control storage during migration.

Attach/detach is the same as described above. It may be desired as the
preferred transfer method on non-shared storage nodes to avoid latency
in stop/copy.

The simpler way is of course activate/deactivate semantics everywhere,
which is mutually exclusive.

This is needed for any indirectly mapped disk format (vhd, qcow? etc) on
shared physical nodes. 

Note that this doesn't only matter for metadata. There are physical
layers where exclusive login is preferred/mandatory, so you won't even
get access to the device before pre-copy is done and the node could be
released on A.

Diagram:

Node            A                                 B

VM.migrate      .. pre-copy >  < stop-and-copy >  <resumed ...

VDI.attached    ..------------A--------------->
                            <-----------B-------------------..

VDI.active     -----------A---->           <----B-------..

B. Hack. 
   Let the toolstack issue a tap-ctl pause/unpause cycle before resume.
   This will reopen the image.

C. Back then, in the dark ages, blktap did this implicitly.
   Every I/O request after disk create ran an implicit close/open
   cycle on the physical image. 
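
Option B could be scripted roughly as follows. This is a minimal sketch, not
toolstack code: the function names are hypothetical, and the
`-p <pid> -m <minor>` arguments are assumed to match the tap-ctl invocations
shown in the example later in this thread.

```python
import subprocess


def reopen_commands(pid, minor):
    """Return the two tap-ctl invocations, in order, as argv lists."""
    target = ["-p", str(pid), "-m", str(minor)]
    return [["tap-ctl", "pause"] + target, ["tap-ctl", "unpause"] + target]


def reopen_image(pid, minor, run=subprocess.check_call):
    """Pause, then unpause, so that tapdisk2 closes and reopens the image.

    `run` is injectable so the sequence can be exercised without tap-ctl
    installed; by default each command is executed and a non-zero exit
    status raises subprocess.CalledProcessError.
    """
    for argv in reopen_commands(pid, minor):
        run(argv)
```

A toolstack on host B would presumably call reopen_image(pid, minor) after
the guest has been suspended on A and before it is resumed on B, so that the
reopened image reflects A's final metadata.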


Cheers,
Daniel

* Re: inconsistent metadata of vhd file while live migration
  2011-02-13 21:11 ` Daniel Stodden
@ 2011-02-13 21:13   ` Daniel Stodden
  2011-02-16 10:55     ` alice wan
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Stodden @ 2011-02-13 21:13 UTC
  To: alice wan; +Cc: xen-devel@lists.xensource.com

On Sun, 2011-02-13 at 16:11 -0500, Daniel Stodden wrote:

> B. Hack. 
>    Let the toolstack issue a tap-ctl pause/unpause cycle before resume.
>    This will reopen the image.
> 
> C. Back then, in the dark ages, blktap did this implicitly.
>    Every
 
 *first*

>  I/O request after disk create run an implicit close/open 
>    cycle.

:o)

D
a
niel

* Re: inconsistent metadata of vhd file while live migration
  2011-02-13 21:13   ` Daniel Stodden
@ 2011-02-16 10:55     ` alice wan
  2011-02-16 21:11       ` Daniel Stodden
  0 siblings, 1 reply; 5+ messages in thread
From: alice wan @ 2011-02-16 10:55 UTC
  To: Daniel Stodden; +Cc: xen-devel@lists.xensource.com


Options B and C seem simpler and need less code for my version
(xen 4.0.0 + 2.6.31.13).
I'm not familiar with the blktap code. Could you please tell me in which
function blktap runs the implicit close/open when it processes the first I/O?

Also, is pause/unpause available in the latest stable version of blktap2?
Thanks


* Re: inconsistent metadata of vhd file while live migration
  2011-02-16 10:55     ` alice wan
@ 2011-02-16 21:11       ` Daniel Stodden
  0 siblings, 0 replies; 5+ messages in thread
From: Daniel Stodden @ 2011-02-16 21:11 UTC
  To: alice wan; +Cc: xen-devel@lists.xensource.com

On Wed, 2011-02-16 at 05:55 -0500, alice wan wrote:
> option b, c seems simpler and needs less codes for my code
> version(xen4.0.0+2.6.31.13).

Example:

[1]+ tail -f /var/log/daemon.log &

root@vantst07:~# tap-ctl list
    7781  0    0        vhd /var/tmp/lenny.vhd
Feb 16 13:04:00 vantst07 tapdisk2[7779]: received 'pid' message (uuid = 0)
Feb 16 13:04:00 vantst07 tapdisk2[7779]: sending 'pid response' message (uuid = 0)
Feb 16 13:04:00 vantst07 tapdisk2[7779]: received 'list' message (uuid = 65535)
Feb 16 13:04:00 vantst07 tapdisk2[7779]: sending 'list response' message (uuid = 65535)
Feb 16 13:04:00 vantst07 tapdisk2[7779]: sending 'list response' message (uuid = 65535)

root@vantst07:~# tap-ctl pause -p 7781 -m 0
Feb 16 13:04:12 vantst07 tapdisk2[7779]: received 'pause' message (uuid = 0)
Feb 16 13:04:12 vantst07 tapdisk2[7779]: /var/tmp/lenny.vhd: b: 256, a: 256, f: 140, n: 1050624
Feb 16 13:04:12 vantst07 tapdisk2[7779]: closed image /var/tmp/lenny.vhd (0 users, state: 0x00000000, type: 4)
Feb 16 13:04:12 vantst07 tapdisk2[7779]: sending 'pause response' message (uuid = 0)

root@vantst07:~# tap-ctl unpause -p 7781 -m 0
Feb 16 13:04:20 vantst07 tapdisk2[7779]: received 'resume' message (uuid = 0)
Feb 16 13:04:20 vantst07 tapdisk2[7779]: /var/tmp/lenny.vhd version: tap 0x00010003, b: 256, a: 256, f: 140, n: 1050624
Feb 16 13:04:20 vantst07 tapdisk2[7779]: opened image /var/tmp/lenny.vhd (1 users, state: 0x00000001, type: 4)
Feb 16 13:04:20 vantst07 tapdisk2[7779]: VBD CHAIN:
Feb 16 13:04:20 vantst07 tapdisk2[7779]: /var/tmp/lenny.vhd: 4
Feb 16 13:04:20 vantst07 tapdisk2[7779]: sending 'resume response' message (uuid = 0)
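
To confirm from the log that a pause/unpause cycle really reopened the
image, the 'closed image' and 'opened image' lines above can be matched
mechanically. The helper below is a hypothetical sketch that assumes the log
format is exactly as shown in this transcript; other tapdisk2 versions may
log differently.

```python
import re

# Matches e.g. "tapdisk2[7779]: closed image /var/tmp/lenny.vhd (0 users, ..."
EVENT = re.compile(r"tapdisk2\[\d+\]: (closed|opened) image (\S+) \(")


def image_was_reopened(log_lines, path):
    """Return True if `path` was closed and later opened again."""
    closed = False
    for line in log_lines:
        match = EVENT.search(line)
        if not match or match.group(2) != path:
            continue
        if match.group(1) == "closed":
            closed = True
        elif closed:
            # An "opened" event seen after an earlier "closed" one.
            return True
    return False
```

Feeding it the lines of /var/log/daemon.log written during the cycle,
together with the image path, gives a simple yes/no check that a toolstack
test could use.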

> i'm not familiar with blktap code. would you please tell in which
> function blktap run an implicit close/open when process first io?

I think those lines never made it into tools/blktap. XCP's srpm should
still have those patches, but they're already removed post-5.6fp1, so
I'd recommend going for b. and letting c. fade out. The toolstack
should stay in control, rather than the disk trying to paper over
mistaken assumptions.

> and in latest stable version blktap2 pause/unpause is available ?

Yup.

Daniel
