linux-smp.vger.kernel.org archive mirror
From: "Nielsen, Eric" <eric.nielsen@thomson.com>
To: "Ulstad, Jeremy (TLR Corp)" <jeremy.ulstad@thomson.com>,
	"'linux-smp@vger.kernel.org'" <linux-smp@vger.kernel.org>
Cc: "Bluhm, Mark (TLR Corp)" <mark.bluhm@thomson.com>
Subject: Re: Nmi_watchdog and x86_64 lockups
Date: Thu, 10 Mar 2005 18:20:12 -0600
Message-ID: <BD964DF2053D1A498C2B182A9B2945B8031EF9A9@eg-msgmbx-b07.int.westgroup.com>

This is fine.  No crown jewels here.  Let it fly.

--------------------------
Sent from my BlackBerry Wireless Handheld


-----Original Message-----
From: Ulstad, Jeremy (TLR Corp) <jeremy.ulstad@thomson.com>
To: linux-smp@vger.kernel.org <linux-smp@vger.kernel.org>
CC: Bluhm, Mark (TLR Corp) <mark.bluhm@thomson.com>
Sent: Thu Mar 10 18:18:18 2005
Subject: Nmi_watchdog and x86_64 lockups

Having narrowly skirted death by allowing photographers near the lab today...

I would like to enlist the open source community in debugging our Oracle
problem.  Online kernel docs recommend reporting issues with NMI (related
to our lockup/dump issue) to the linux-smp list.

I have composed the following email, but want to make sure you are
comfortable with me pursuing this.   I do not mention any application
details, but it is not possible to omit fairly detailed descriptions of the
hardware when submitting to the kernel list.   Not sure if that is kosher or
not.

Please let me know how I should proceed with this.

Domo Arigato.

Jeremy


Hypothetical email:
-------------------
I am looking for assistance with x86_64 SMP systems locking up.  Under a
heavy application workload, the system freezes and I am unable to send an
alt-sysrq-d to trigger a dump.   The systems are booting with nmi_watchdog=1
set, but the watchdog is not working.   No oops events are registered in
messages and I have observed nothing on the console (direct attached KVM -
working on setting up a term server and logging serial console).

According to nmi_watchdog.txt, I should see non-zero counters in
/proc/interrupts with this enabled or "you probably have a processor that
needs to be added to the nmi code".
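
For anyone repeating this check across several machines, the NMI row test
can be scripted. The following is only a minimal sketch: the function names
nmi_counts and watchdog_ticking are made up for illustration, and the sample
text is a shortened copy of the Config 1 output in this message. On a live
system the text would be read from /proc/interrupts instead.

```python
# Sketch only: helper names are hypothetical, not part of any kernel tool.

SAMPLE = """\
           CPU0       CPU1
  0:     383170   23276745    IO-APIC-edge  timer
NMI:          0          0
LOC:   23656684   23657709
"""


def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters found in /proc/interrupts-style text.

    Returns an empty list if there is no NMI row at all.
    """
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if sep and label.strip() == "NMI":
            counts = []
            for field in rest.split():
                # Some kernels append a text description after the numbers;
                # stop at the first field that is not a counter.
                if not field.isdigit():
                    break
                counts.append(int(field))
            return counts
    return []


def watchdog_ticking(interrupts_text):
    """True if at least one CPU shows a non-zero NMI count."""
    return any(count > 0 for count in nmi_counts(interrupts_text))


if __name__ == "__main__":
    print("NMI counters:", nmi_counts(SAMPLE))
    print("watchdog ticking:", watchdog_ticking(SAMPLE))
```

A single reading only shows whether the counters are zero. To see whether
they are still advancing, take two readings some seconds apart and compare.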

The lockups are occurring in two separate configurations (details below),
both of which are showing all zeros for NMI in /proc/interrupts.  Any advice
on whether these configurations are supported by the NMI code, or suggestions
for how to successfully get a dump, would be most appreciated.

Thanks in advance,

Jeremy Ulstad

Config 1:  2 x AMD Opteron 240 (8 GB RAM)
SLES 9
Linux number6 2.6.5-7.111.19-smp #1 SMP Fri Dec 10 15:10:58 UTC 2004 x86_64 x86_64 x86_64 GNU/Linux

number6:~ # cat /proc/interrupts 
           CPU0       CPU1       
  0:     383170   23276745    IO-APIC-edge  timer
  1:          9        227    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
  8:          0          0    IO-APIC-edge  rtc
  9:          0          0   IO-APIC-level  acpi
 12:        207          0    IO-APIC-edge  i8042
 14:       4900      57432    IO-APIC-edge  ide0
 15:         54          0    IO-APIC-edge  ide1
 19:          0          0   IO-APIC-level  ohci_hcd, ohci_hcd
 27:  327047839          0   IO-APIC-level  eth0, eth1
NMI:          0          0 
LOC:   23656684   23657709 
ERR:          0
MIS:          0

Config 2: 4 x AMD Opteron 850 (8 GB RAM)
SLES 9
Linux riddick 2.6.5-7.145-smp #1 SMP Thu Jan 27 09:19:29 UTC 2005 x86_64 x86_64 x86_64 GNU/Linux

riddick:~ # cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       
  0:   20317266   25048606   25048495   25048500    IO-APIC-edge  timer
  1:          9          0          0          0    IO-APIC-edge  i8042
  2:          0          0          0          0          XT-PIC  cascade
  4:        652         92          0          0    IO-APIC-edge  serial
  8:          0          0          0          0    IO-APIC-edge  rtc
  9:          0          0          0          0   IO-APIC-level  acpi
 12:         59          0          0          0    IO-APIC-edge  i8042
 15:         63          4          0          0    IO-APIC-edge  ide1
 19:          0          0          0          0   IO-APIC-level  ohci_hcd, ohci_hcd
 25:   93875682          0          1         81   IO-APIC-level  eth0
 27:          0     275078      99550       4603   IO-APIC-level  ioc0
NMI:          0          0          0          0 
LOC:   95441672   95441724   95441724   95441606 
ERR:          0
MIS:          0

I should also note that all the config 1 systems are being forced to 3.8 GB
of memory with "mem=3800m" to compensate for a bug with lkcd which results
in dumps (triggered manually with system up) failing with >= 4GB RAM.
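
To confirm that a given box really booted with the options mentioned in this
message (nmi_watchdog=1 and mem=3800m), the kernel command line can be
checked in the same way. Again this is just a sketch under stated
assumptions: parse_cmdline and mem_limit_bytes are hypothetical helper names,
the example command line (including root=/dev/sda2) is invented, and the
suffix handling is a simplification that only covers plain decimal values
with an optional K, M or G suffix. On a live system the string would come
from /proc/cmdline.

```python
# Sketch only: helper names and the example command line are hypothetical.

EXAMPLE_CMDLINE = "root=/dev/sda2 nmi_watchdog=1 mem=3800m"


def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Options given without '=' map to None.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def mem_limit_bytes(value):
    """Convert a value such as '3800m' to bytes (decimal, K/M/G suffix only)."""
    units = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30}
    suffix = value[-1].lower()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)


if __name__ == "__main__":
    params = parse_cmdline(EXAMPLE_CMDLINE)
    print("nmi_watchdog:", params.get("nmi_watchdog"))
    limit = mem_limit_bytes(params["mem"])
    # The lkcd problem described above shows up at 4 GB and higher.
    print("mem limit below 4 GB:", limit < 4 * (1 << 30))
```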
