LKML Archive mirror
* [PATCH REPOST] pv-grub: Fix for incorrect dom->p2m_host[] list initialization
@ 2011-04-22 21:25 Daniel Kiper
  2011-04-22 22:33 ` [Xen-devel] " Samuel Thibault
  2011-04-26 13:42 ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 5+ messages in thread
From: Daniel Kiper @ 2011-04-22 21:25 UTC
  To: konrad.wilk, stefano.stabellini, xen-devel; +Cc: linux-kernel

Added the missing Signed-off-by line.

After a lot of debugging and a long read through the Linux kernel and Xen
code, I have finally killed a deeply hidden bug in pv-grub. Details below.
I am also CC'ing this e-mail to LKML because the issue looks like a
Linux kernel problem, although it is not.

This patch applies to Xen Ver. 4.0, Xen Ver. 4.1 and unstable tree.

# HG changeset patch
# User dkiper@net-space.pl
# Date 1303474763 -7200
# Node ID b33bf24be129b7b9cd2248460beb1298088c6af5
# Parent  dbf2ddf652dc3dd927447e79ef4bc586de55d708
The introduction of Linux kernel git commit ceefccc93932b920a8ec6f35f596db05202a12fe
(x86: default CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN to 16 MB) revealed
a deeply hidden bug in pv-grub: during the kernel load stage the dom->p2m_host[]
list is initialized incorrectly.

At the beginning of the kernel load stage the dom->p2m_host[] list is populated
with the current pfn->mfn layout. Later, during memory allocation (memory is
allocated page by page in kexec_allocate()), the page order is changed to
establish a linear layout in the new domain. This is done by exchanging
successive mfns with newly allocated mfns. The dom->p2m_host[] list is indexed
by the currently requested pfn (which is incremented from 0) and by the pfn of
the newly allocated page. If the pfn of the newly allocated page is less than
the currently requested pfn, the mfn placed there by an earlier iteration is
overwritten, which leads to a domain crash later. This patch fixes that issue:
if the pfn of the newly allocated page is less than the currently requested
pfn, the correct pfn/mfn pair is looked up first and the usual exchange then
takes place.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>

diff -r dbf2ddf652dc -r b33bf24be129 stubdom/grub/kexec.c
--- a/stubdom/grub/kexec.c	Thu Apr 07 15:26:58 2011 +0100
+++ b/stubdom/grub/kexec.c	Fri Apr 22 14:19:23 2011 +0200
@@ -91,6 +91,11 @@ int kexec_allocate(struct xc_dom_image *
         new_pfn = PHYS_PFN(to_phys(pages[i]));
         pages_mfns[i] = new_mfn = pfn_to_mfn(new_pfn);
 
+	if (new_pfn < i)
+		for (new_pfn = i; new_pfn < dom->total_pages; ++new_pfn)
+			if (dom->p2m_host[new_pfn] == new_mfn)
+				break;
+
         /* Put old page at new PFN */
         dom->p2m_host[new_pfn] = old_mfn;
 
Daniel
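
The overwrite described in the changeset text, and the effect of the added
search, can be reproduced on a toy array. This is a minimal sketch and not
the pv-grub code: toy_relayout(), toy_pfn_to_mfn() and toy_count_duplicates()
are hypothetical names, the fixed mapping "mfn = 100 + pfn" is an assumption,
and the allocation order is passed in as an array instead of coming from
alloc_page().

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long xen_pfn_t;

/* Assumed toy mapping: the page at pfn k is backed by mfn 100 + k. */
static xen_pfn_t toy_pfn_to_mfn(xen_pfn_t pfn)
{
    return 100 + pfn;
}

/*
 * Replay the exchange loop on p2m[0..total).  alloc_pfn[i] stands in for the
 * pfn of the page that would be allocated in iteration i.
 * Returns 0 on success, -1 if no valid slot exists for the new mfn.
 */
static int toy_relayout(xen_pfn_t *p2m, size_t total,
                        const xen_pfn_t *alloc_pfn, size_t n_alloc,
                        int use_fix)
{
    assert(n_alloc <= total);

    for (size_t i = 0; i < n_alloc; i++) {
        xen_pfn_t old_mfn = p2m[i];
        xen_pfn_t new_pfn = alloc_pfn[i];
        xen_pfn_t new_mfn = toy_pfn_to_mfn(new_pfn);

        /* Check added by the patch: slot new_pfn was already rewritten by
         * an earlier iteration, so find where new_mfn currently sits. */
        if (use_fix && new_pfn < i)
            for (new_pfn = i; new_pfn < total; ++new_pfn)
                if (p2m[new_pfn] == new_mfn)
                    break;

        /* Guard for the toy only: never write past the end of the list. */
        if (new_pfn >= total)
            return -1;

        p2m[new_pfn] = old_mfn;   /* put old page at new PFN */
        p2m[i] = new_mfn;         /* put new page at PFN i */
    }
    return 0;
}

/* Count entries that repeat an earlier entry; 0 means no mfn was lost. */
static size_t toy_count_duplicates(const xen_pfn_t *p2m, size_t total)
{
    size_t dups = 0;

    for (size_t i = 1; i < total; i++)
        for (size_t j = 0; j < i; j++)
            if (p2m[j] == p2m[i]) {
                dups++;
                break;
            }
    return dups;
}
```

With four pages starting as {100, 101, 102, 103} and an allocation order of
pfns {2, 0}, the loop without the check ends with {101, 100, 100, 103}
(mfn 102 is lost and mfn 100 appears twice), while the loop with the check
ends with {102, 100, 101, 103}.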


* Re: [Xen-devel] [PATCH REPOST] pv-grub: Fix for incorrect dom->p2m_host[] list initialization
  2011-04-22 21:25 [PATCH REPOST] pv-grub: Fix for incorrect dom->p2m_host[] list initialization Daniel Kiper
@ 2011-04-22 22:33 ` Samuel Thibault
  2011-04-26 14:25   ` Daniel Kiper
  2011-04-26 13:42 ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 5+ messages in thread
From: Samuel Thibault @ 2011-04-22 22:33 UTC
  To: Daniel Kiper; +Cc: konrad.wilk, stefano.stabellini, xen-devel, linux-kernel

Hello,

Daniel Kiper, on Fri 22 Apr 2011 23:25:45 +0200, wrote:
> If the pfn of the newly allocated page is less than the currently
> requested pfn, the mfn placed there by an earlier iteration is
> overwritten, which leads to a domain crash later.

Oops, good catch! And unfortunately it happens only rarely... I guess it
may be the culprit behind a fair number of other issues.

> +	if (new_pfn < i)
> +		for (new_pfn = i; new_pfn < dom->total_pages; ++new_pfn)
> +			if (dom->p2m_host[new_pfn] == new_mfn)
> +				break;

Instead of searching for the page, which takes linear time for each page
and thus potentially quadratic time overall, shouldn't we rather record
which PFN the MFNs of already-handled pages (PFN < allocated) have been
moved to?

Samuel
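
One possible reading of the suggestion above is to keep a reverse map next to
the p2m list, so that the slot currently holding a given mfn is found in
constant time. The following is a minimal sketch under assumptions, not the
code of any posted patch: toy_relayout_tracked(), slot_of[] and TOY_MFN_BASE
are hypothetical names, and the toy mapping "mfn = TOY_MFN_BASE + pfn" is
assumed so that an mfn can be turned back into an index.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long xen_pfn_t;

#define TOY_MFN_BASE 100UL   /* assumed toy mapping: mfn = base + pfn */

/*
 * Exchange loop with O(1) lookup.  slot_of[k] is the p2m index that
 * currently holds the mfn which started out at index k.  alloc_pfn[i] stands
 * in for the pfn of the page that would be allocated in iteration i.
 */
static void toy_relayout_tracked(xen_pfn_t *p2m, size_t *slot_of, size_t total,
                                 const xen_pfn_t *alloc_pfn, size_t n_alloc)
{
    assert(n_alloc <= total);

    /* Initial layout: slot k holds the mfn of pfn k. */
    for (size_t k = 0; k < total; k++) {
        p2m[k] = TOY_MFN_BASE + k;
        slot_of[k] = k;
    }

    for (size_t i = 0; i < n_alloc; i++) {
        xen_pfn_t old_mfn = p2m[i];
        xen_pfn_t new_pfn = alloc_pfn[i];
        xen_pfn_t new_mfn = TOY_MFN_BASE + new_pfn;
        size_t j = slot_of[new_pfn];      /* where new_mfn sits right now */

        p2m[j] = old_mfn;                 /* put old page where new one was */
        p2m[i] = new_mfn;                 /* put new page at PFN i */

        /* Record both moves so later iterations need no search. */
        slot_of[old_mfn - TOY_MFN_BASE] = j;
        slot_of[new_pfn] = i;
    }
}
```

The extra array costs one entry per page, in exchange for replacing the
per-page scan with a single lookup. For four pages and an allocation order
of pfns {2, 0} the resulting list is {102, 100, 101, 103}.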


* Re: [PATCH REPOST] pv-grub: Fix for incorrect dom->p2m_host[] list initialization
  2011-04-22 21:25 [PATCH REPOST] pv-grub: Fix for incorrect dom->p2m_host[] list initialization Daniel Kiper
  2011-04-22 22:33 ` [Xen-devel] " Samuel Thibault
@ 2011-04-26 13:42 ` Konrad Rzeszutek Wilk
  2011-04-26 14:34   ` Daniel Kiper
  1 sibling, 1 reply; 5+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-04-26 13:42 UTC
  To: Daniel Kiper; +Cc: stefano.stabellini, xen-devel, linux-kernel

On Fri, Apr 22, 2011 at 11:25:45PM +0200, Daniel Kiper wrote:
> Added the missing Signed-off-by line.
> 
> After a lot of debugging and a long read through the Linux kernel and Xen
> code, I have finally killed a deeply hidden bug in pv-grub. Details below.
> I am also CC'ing this e-mail to LKML because the issue looks like a
> Linux kernel problem, although it is not.
> 
> This patch applies to Xen Ver. 4.0, Xen Ver. 4.1 and unstable tree.
> 
> # HG changeset patch
> # User dkiper@net-space.pl
> # Date 1303474763 -7200
> # Node ID b33bf24be129b7b9cd2248460beb1298088c6af5
> # Parent  dbf2ddf652dc3dd927447e79ef4bc586de55d708
> The introduction of Linux kernel git commit ceefccc93932b920a8ec6f35f596db05202a12fe
> (x86: default CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN to 16 MB) revealed
> a deeply hidden bug in pv-grub: during the kernel load stage the dom->p2m_host[]
> list is initialized incorrectly.
> 
> At the beginning of the kernel load stage the dom->p2m_host[] list is populated
> with the current pfn->mfn layout. Later, during memory allocation (memory is
> allocated page by page in kexec_allocate()), the page order is changed to
> establish a linear layout in the new domain. This is done by exchanging
> successive mfns with newly allocated mfns. The dom->p2m_host[] list is indexed
> by the currently requested pfn (which is incremented from 0) and by the pfn of
> the newly allocated page. If the pfn of the newly allocated page is less than
> the currently requested pfn, the mfn placed there by an earlier iteration is
> overwritten, which leads to a domain crash later. This patch fixes that issue:
> if the pfn of the newly allocated page is less than the currently requested
> pfn, the correct pfn/mfn pair is looked up first and the usual exchange then
> takes place.

Nice! I presume this fixes the issue you had with your guest at the
Xen Hack-O-Thon, right?

> 
> Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
> 
> diff -r dbf2ddf652dc -r b33bf24be129 stubdom/grub/kexec.c
> --- a/stubdom/grub/kexec.c	Thu Apr 07 15:26:58 2011 +0100
> +++ b/stubdom/grub/kexec.c	Fri Apr 22 14:19:23 2011 +0200
> @@ -91,6 +91,11 @@ int kexec_allocate(struct xc_dom_image *
>          new_pfn = PHYS_PFN(to_phys(pages[i]));
>          pages_mfns[i] = new_mfn = pfn_to_mfn(new_pfn);
>  
> +	if (new_pfn < i)
> +		for (new_pfn = i; new_pfn < dom->total_pages; ++new_pfn)
> +			if (dom->p2m_host[new_pfn] == new_mfn)
> +				break;
> +
>          /* Put old page at new PFN */
>          dom->p2m_host[new_pfn] = old_mfn;
>  
> Daniel


* Re: [Xen-devel] [PATCH REPOST] pv-grub: Fix for incorrect dom->p2m_host[] list initialization
  2011-04-22 22:33 ` [Xen-devel] " Samuel Thibault
@ 2011-04-26 14:25   ` Daniel Kiper
  0 siblings, 0 replies; 5+ messages in thread
From: Daniel Kiper @ 2011-04-26 14:25 UTC
  To: Samuel Thibault
  Cc: Daniel Kiper, konrad.wilk, stefano.stabellini, xen-devel,
	linux-kernel

On Sat, Apr 23, 2011 at 12:33:32AM +0200, Samuel Thibault wrote:
> Hello,
>
> Daniel Kiper, on Fri 22 Apr 2011 23:25:45 +0200, wrote:
> > If the pfn of the newly allocated page is less than the currently
> > requested pfn, the mfn placed there by an earlier iteration is
> > overwritten, which leads to a domain crash later.
>
> Oops, good catch! And unfortunately it happens only rarely... I guess it
> may be the culprit behind a fair number of other issues.

I discovered the issue on an i386 domU. It does not affect x86_64
in my environment. However, as you said above, in some circumstances
it could lead to mysterious system crashes or data corruption.

> > +	if (new_pfn < i)
> > +		for (new_pfn = i; new_pfn < dom->total_pages; ++new_pfn)
> > +			if (dom->p2m_host[new_pfn] == new_mfn)
> > +				break;
>
> Instead of searching for the page, which takes linear time for each page
> and thus potentially quadratic time overall, shouldn't we rather record
> which PFN the MFNs of already-handled pages (PFN < allocated) have been
> moved to?

I am going to post a new, time-optimized version
of the patch today or tomorrow.

Daniel


* Re: [PATCH REPOST] pv-grub: Fix for incorrect dom->p2m_host[] list initialization
  2011-04-26 13:42 ` Konrad Rzeszutek Wilk
@ 2011-04-26 14:34   ` Daniel Kiper
  0 siblings, 0 replies; 5+ messages in thread
From: Daniel Kiper @ 2011-04-26 14:34 UTC
  To: Konrad Rzeszutek Wilk
  Cc: Daniel Kiper, stefano.stabellini, xen-devel, linux-kernel

On Tue, Apr 26, 2011 at 09:42:42AM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Apr 22, 2011 at 11:25:45PM +0200, Daniel Kiper wrote:
> > Added the missing Signed-off-by line.
> >
> > After a lot of debugging and a long read through the Linux kernel and Xen
> > code, I have finally killed a deeply hidden bug in pv-grub. Details below.
> > I am also CC'ing this e-mail to LKML because the issue looks like a
> > Linux kernel problem, although it is not.
> >
> > This patch applies to Xen Ver. 4.0, Xen Ver. 4.1 and unstable tree.
> >
> > # HG changeset patch
> > # User dkiper@net-space.pl
> > # Date 1303474763 -7200
> > # Node ID b33bf24be129b7b9cd2248460beb1298088c6af5
> > # Parent  dbf2ddf652dc3dd927447e79ef4bc586de55d708
> > The introduction of Linux kernel git commit ceefccc93932b920a8ec6f35f596db05202a12fe
> > (x86: default CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN to 16 MB) revealed
> > a deeply hidden bug in pv-grub: during the kernel load stage the dom->p2m_host[]
> > list is initialized incorrectly.
> >
> > At the beginning of the kernel load stage the dom->p2m_host[] list is populated
> > with the current pfn->mfn layout. Later, during memory allocation (memory is
> > allocated page by page in kexec_allocate()), the page order is changed to
> > establish a linear layout in the new domain. This is done by exchanging
> > successive mfns with newly allocated mfns. The dom->p2m_host[] list is indexed
> > by the currently requested pfn (which is incremented from 0) and by the pfn of
> > the newly allocated page. If the pfn of the newly allocated page is less than
> > the currently requested pfn, the mfn placed there by an earlier iteration is
> > overwritten, which leads to a domain crash later. This patch fixes that issue:
> > if the pfn of the newly allocated page is less than the currently requested
> > pfn, the correct pfn/mfn pair is looked up first and the usual exchange then
> > takes place.
>
> Nice! I presume this fixes the issue you had with your guest at the
> Xen Hack-O-Thon, right?

Yes, it does. It was very difficult to track down because the
issue overlapped with other memory management issues that
have been surfacing recently. I am currently working on a
time-optimized version of the patch.

Daniel
