From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ian Campbell
Subject: Re: Re: Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]
Date: Wed, 24 Aug 2011 22:24:09 +0100
Message-ID: <1314221049.17978.709.camel@dagon.hellion.org.uk>
References: <1313577856.13030.17.camel@scarafaggio> <1314003611.5010.400.camel@zakaz.uk.xensource.com> <20110824202400.GA27448@dumpdata.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============2125741463=="
Return-path:
In-Reply-To: <20110824202400.GA27448@dumpdata.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Konrad Rzeszutek Wilk
Cc: Sacco , Giuseppe, xen-devel , Ben Hutchings , "638172@bugs.debian.org" <638172@bugs.debian.org>
List-Id: xen-devel@lists.xenproject.org

--===============2125741463==
Content-Type: multipart/signed; micalg="pgp-sha512"; protocol="application/pgp-signature"; boundary="=-sk6ZSoBIvMR1nXP/jF56"

--=-sk6ZSoBIvMR1nXP/jF56
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable

On Wed, 2011-08-24 at 21:24 +0100, Konrad Rzeszutek Wilk wrote:
> On Mon, Aug 22, 2011 at 10:00:11AM +0100, Ian Campbell wrote:
> > @xen-devel:
> >
> > Does this look familiar to anyone, this is (I expect, hopefully Giuseppe
> > will confirm) from Debian Squeeze which has a Xen 4.0.x with a PVops
> > dom0 kernel based on xen.git from last summer (e73f4955a821) with more
> > recent upstream longterm kernels (up to and including 2.6.32.41) merged
> > in. While it does seem to have the switch from level to edge triggered
> > interrupt the Debian kernel doesn't appear to have the switch to fasteoi
> > for pirqs (0672fb44a111 plus a few followups) -- could that be related
> > to this? (I'm not sure if that was a cleanup or a fix)
>
> It was a fix. We had some interrupts getting wedged - but I don't recall
> the stack exactly.

OK, sounds very much like those fixes are worth a try then. Thanks.

> But there are some followups - like
> e5ac0bda96c495321dbad9b57a4b1a93a5a72e7f
> 7e186bdd0098b34c69fb8067c67340ae610ea499

Below [0] is the list of changesets against drivers/xen/events.c that I
found are not in the Debian kernel. A small number are false positives
(Debian already got them via the longterm branches) but most are not.
The majority look like real fixes to me, either for this particular
issue or for other problems. I would consider them all candidates for
inclusion in a future update of the Debian kernel.

Giuseppe, are you able to reproduce the issue you are seeing at will? If
I build a test kernel would you be able to try it? You are using a -686
kernel, right (as opposed to amd64)? OOI, which hypervisor flavour do you
use?

> The interesting thing about the stack trace is that it looks similar to:
>
> http://groups.google.com/group/linux.kernel/browse_thread/thread/39a397566cafc979
>
> which has some fixes https://patchwork.kernel.org/patch/1091772/
> but they may not help.

Looks like it is an issue on native too. If it is an issue as far back
as 2.6.32 as well, I expect we'll see the fix via the longterm channels
at some point.

Ian.

[0]
652c98bac315a2253628885f05cfd5f30b553ae5 xen: Use IRQF_FORCE_RESUME
f9f09329407e3a11140827ba71d8f9d9ede42823 xen: events: do not unmask event channels on resume
ea2020837ca7dc2c9bcfc477fb4d261cf067db4f xen: do not try to allocate the callback vector again at restore time
acad13511ebe1db666aab5807117d3ac647ea58d xen: events: Remove redundant clear of l2i at end of round-robin loop
0e2ec1fb16f9ca84f91de3d9427a0964d679738a xen: events: Make round-robin scan fairer by snapshotting each l2 word
188449f889c6c30709c7e9e8710b9eff14fd963f xen: events: Clean up round-robin evtchn scan.
1acdebd2d67f71d230f5857c28843e636b7dd92e xen: events: Make last processed event channel a per-cpu variable.
2d9c33e1b47b800e43a1444a65353fcb96e27165 xen: events: Process event channels notifications in round-robin order.
2b1c9503c615f68262ae2e96ee26ee128b486287 xen/events: only unmask irq if enabled
c756a6e7f711308ce85afc7d4c79213cce58a033 xen: allocate irq descriptors on any numa node
b1a003a2aa9ee0d3d69237725c91839f4b6a8559 xen/events: use locked set|clear_bit() for cpu_evtchn_mask
cca68cf2d344eb3c4ff996e99f36cf8f8382bc2b xen/evtchn: clear secondary CPUs' cpu_evtchn_mask[] after restore
c7ff70d2824191af119091d3af8db3bb57b06f77 xen: events: do not unmask event channels on resume
d4283609c7504309b8b93d7582857ff4623105f3 xen: improvements to VIRQ_DEBUG output
7c42097171f2e0beafa16e007a06e464b3014bea xen: correct parameter type for pirq_eoi
97708051c14157e95e25d112c26902f1c6fbb462 xen: ensure that all event channels start off bound to VCPU 0
e05885b24a55db82fbdb5cbc3f31426b976d7fc1 xen: set up IRQ before binding virq to evtchn
f0d4a0552f03b52027fb2c7958a1cbbe210cf418 xen/apic: fix pirq_eoi_gmfn resume
d2ea486300ca6e207ba178a425fbd023b8621bb1 xen/pirq: use fasteoi for MSI too
158d6550716687486000a828c601706b55322ad0 xen/pirq: use eoi as enable
2390c371ecd32d9f06e22871636185382bf70ab7 xen/events: use PHYSDEVOP_pirq_eoi_gmfn to get pirq need-EOI info
cb23e8d58ca35b6f9e10e1ea5682bd61f2442ebd xen/evtchn: correction, pirq hypercall does not unmask
43d8a5030a502074f3c4aafed4d6095ebd76067c xen/evtchn: pirq_eoi does unmask
f4526f9a78ffb3d3fc9f81636c5b0357fc1beccd xen/evtchn: make pirq enable/disable unmask/mask
c6a16a778f86699b339585ba5b9197035d77c40f xen/evtchn: rename retrigger_dynirq -> irq
d0936845a856816af2af48ddf019366be68e96ba xen/evtchn: rename enable/disable_dynirq -> unmask/mask_irq
2789ef00cbe2cdb38deb30ee4085b88befadb1b0 xen: make pirq interrupts use fasteoi
0672fb44a111dfb6386022071725c5b15c9de584 xen/events: change to using fasteoi
9fa90aa72d6af5cc2c2eddf56f9a586035e13ae7 xen: use dynamic_irq_init_keep_chip_data
f55ce8740101c54016544a0d633dc1b6b21244ae Introduce CONFIG_XEN_PVHVM compile option
f61692642a2a2b83a52dd7e64619ba3bb29998af xen/pirq: do EOI properly for pirq events
47cd3eb068a8a0cea124495e525ac16876fa08f6 xen/pci: fix compile error when CONFIG_PCI_XEN disabled
29a2e2a7bd19233c62461b104c69233f15ce99ec xen/apic: use handle_edge_irq for pirq events
6dc7b8080195ed43ee6de5b1d60c65aa719208ad xen/irq: replace boot boot allocator
66fd3052fec7e7c21a9d88ba1a03bc062f5fb53d xen: handle events as edge-triggered
8401e9b96f80f9c0128e7c8fc5a01abfabbfa021 xen: use percpu interrupts for IPIs and VIRQs

-- 
Ian Campbell

A Fortran compiler is the hobgoblin of little minis.

--=-sk6ZSoBIvMR1nXP/jF56
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQIcBAABCgAGBQJOVWv5AAoJEOxjaZd5B0+otnQP/i77JnRRc9AgaVIExs0u/H3j
+WyRZJbobATgKaDJnWf1fmgcATG7wuKwepNKRrDY8jacJNmIdD/r4NHf6pPd9OMW
2AMMJh3MBLNxebV+aICkOCOOf8PmtQF+9jQnxaX9Wi/t4Dt5xvT8HAREH3usTIbM
xaSuQPZFiwJv1NLA5kcyyKJI/WImCaLF/GqkXtofcw146ncApKKI6pZSfdk+ZGQ0
aCHC/S+Tji/auUyHBFoqvvvYmXEW0JTMkyz0tLTiwDrFbEgVhZjSlwqyCYIwtv9j
9e47CLBxTqoMh35BhZqm1CskrxUyugFadLmcdxbB27hvXSmbNseqGA6gpJHvxLiA
DlsxZNTGmW8uzFblr8HZxPTydMLr93Dq56Gu/8dm5pXuTRn9O+k4/s+e7mXAzZy8
FrpBIIieSHWZNOrsOcd7IoZciT+yRoX3A8sJBSGG1ZTtFmslUibT2FEWSBKRy5Wl
eU/7DXXcn3qLlC/I1aMeQgwL17bjR5xIJHvFMCT5g92OBUJgXKtyFrR0udENYbye
j6b/3F8CzuPoJa4h1horBYYbPxyO03PbyTWl3YPD6rQINNbDs80Hpa9GfhR09EIw
6UAxMrpw3QtwFCaDkCAbwBWa0nhmY0DJD8lBWH3WiGooL8n17ELY8kenC7dPyL2Q
0RtQsFJoGcQ5VS6Tr7Zr
=ticK
-----END PGP SIGNATURE-----

--=-sk6ZSoBIvMR1nXP/jF56--

--===============2125741463==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
--===============2125741463==--