[PATCH 0/2] add new notifier function ,take3

* [PATCH 0/2] add new notifier function ,take3
@ 2008-04-11  7:53 Takenori Nagano
  2008-04-12  4:07 ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Takenori Nagano @ 2008-04-11  7:53 UTC
  To: linux-kernel, Andrew Morton
  Cc: kdb, vgoyal, Eric W. Biederman, k-miyoshi, kexec, Bernhard Walle,
	Keith Owens, nickpiggin, Randy Dunlap, greg

Hi,

A big thanks to everybody who read and replied to the previous version.

Changelog take2 -> take3

- Rebased onto 2.6.25-rc8-mm1
- Updated comments
- Renamed the notifier from "tunable_notifier" to "tunable_atomic_notifier"
- Fixed typos
- Moved the control files from debugfs to /sys/kernel

These patches add a new notifier function and apply it to panic_notifier_list.
Until now the order of the notifier chain was hardcoded, which was not flexible.
With the new notifier, the user can change the order of the list through
control files.

Example)

# cd /sys/kernel/notifiers/
# ls
panic_notifier_list
# cd panic_notifier_list/
# ls
ipmi_msghandler  ipmi_wdog
# insmod notifier_test.ko
# ls
ipmi_msghandler  ipmi_wdog  notifier_test1  notifier_test2
# cat */priority
200
150
500
1000
Kernel panic - not syncing: Panic by panic_module.
__tunable_atomic_notifier_call_chain enter
notifier_test: notifier_test_panic2() is called.
notifier_test: notifier_test_panic() is called.
msg_handler:panic_event was called.
ipmi_wdog:wdog_panic_handler was called.

.....(reboot)

# cd /sys/kernel/notifiers/panic_notifier_list/
# ls
ipmi_msghandler  ipmi_wdog  notifier_test1  notifier_test2
# cat */priority
200
150
500
1000
# echo 10000 > ipmi_msghandler/priority
# echo 5000 > ipmi_wdog/priority
# echo 3000 > notifier_test1/priority
# echo 1500 > notifier_test2/priority
# cat */priority
10000
5000
3000
1500
Kernel panic - not syncing: Panic by panic_module.
__tunable_atomic_notifier_call_chain enter
msg_handler:panic_event was called.
ipmi_wdog:wdog_panic_handler was called.
notifier_test: notifier_test_panic() is called.
notifier_test: notifier_test_panic2() is called.
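
The ordering in the example can be checked before a panic happens. The
following is a minimal shell sketch, not part of the patches. It assumes the
directory layout shown above (one subdirectory per notifier, each holding a
priority file) and assumes, as the two panic logs suggest, that a larger
priority value is called earlier. The function name notifier_call_order is
made up for illustration.

```shell
#!/bin/sh
# Print notifier names in the order they would be called, assuming
# (as the panic logs in the example suggest) that a larger priority
# value runs earlier.
# $1: directory with one subdirectory per notifier, each holding a
#     "priority" file; defaults to the path used in the example.
notifier_call_order() {
    dir=${1:-/sys/kernel/notifiers/panic_notifier_list}
    for f in "$dir"/*/priority; do
        # Skip the unexpanded glob when the directory has no entries.
        [ -r "$f" ] || continue
        name=$(basename "$(dirname "$f")")
        printf '%s %s\n' "$(cat "$f")" "$name"
    done | sort -k1,1nr | awk '{ print $2 }'
}
```

Run against the second set of priorities above (10000, 5000, 3000, 1500), it
would list ipmi_msghandler, ipmi_wdog, notifier_test1 and notifier_test2, in
that order.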

--
Takenori Nagano <t-nagano@ah.jp.nec.com>

* Re: [PATCH 0/2] add new notifier function ,take3
  2008-04-11  7:53 [PATCH 0/2] add new notifier function ,take3 Takenori Nagano
@ 2008-04-12  4:07 ` Andrew Morton
  2008-04-14 13:46   ` Vivek Goyal
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2008-04-12  4:07 UTC
  To: Takenori Nagano
  Cc: linux-kernel, kdb, vgoyal, Eric W. Biederman, k-miyoshi, kexec,
	Bernhard Walle, Keith Owens, nickpiggin, Randy Dunlap, greg

On Fri, 11 Apr 2008 16:53:47 +0900 Takenori Nagano <t-nagano@ah.jp.nec.com> wrote:

> Hi,
> 
> A big thanks to everybody who read and replied to previous version.
> 
> changelog take2 -> take3
> 
> - Rebased 2.6.25-rc8-mm1
> - comment updated
> - renamed the notifier name "tunable_notifier" to "tunable_atomic_notifier"
> - fixed typo
> - move control files debugfs to /sys/kernel
> 
> These patches add new notifier function and implement it to panic_notifier_list.
> We used the hardcoded notifier chain so far, but it was not flexible. New
> notifier is very flexible, because user can change a list of order by control files.
> 
> Example)
> 
> # cd /sys/kernel/notifiers/
> # ls
> panic_notifier_list
> # cd panic_notifier_list/
> # ls
> ipmi_msghandler  ipmi_wdog
> # insmod notifier_test.ko
> # ls
> ipmi_msghandler  ipmi_wdog  notifier_test1  notifier_test2
> # cat */priority
> 200
> 150
> 500
> 1000
> Kernel panic - not syncing: Panic by panic_module.
> __tunable_atomic_notifier_call_chain enter
> notifier_test: notifier_test_panic2() is called.
> notifier_test: notifier_test_panic() is called.
> msg_handler:panic_event was called.
> ipmi_wdog:wdog_panic_handler was called.
> 
> .....(reboot)
> 
> # cd /sys/kernel/notifiers/panic_notifier_list/
> # ls
> ipmi_msghandler  ipmi_wdog  notifier_test1  notifier_test2
> # cat */priority
> 200
> 150
> 500
> 1000
> # echo 10000 > ipmi_msghandler/priority
> # echo 5000 > ipmi_wdog/priority
> # echo 3000 > notifier_test1/priority
> # echo 1500 > notifier_test2/priority
> # cat */priority
> 10000
> 5000
> 3000
> 1500
> Kernel panic - not syncing: Panic by panic_module.
> __tunable_atomic_notifier_call_chain enter
> msg_handler:panic_event was called.
> ipmi_wdog:wdog_panic_handler was called.
> notifier_test: notifier_test_panic() is called.
> notifier_test: notifier_test_panic2() is called.

OK.  But I don't see anywhere in here the most important piece of
information: why do we need this feature in Linux?

What are the use-cases?  What is the value?  etc.

Often I can guess (but I like the originator to remove the guesswork).  In
this case I'm stumped - I can't see any reason why anyone would want this.

Awaiting enlightenment ;)

Thanks.

* Re: [PATCH 0/2] add new notifier function ,take3
  2008-04-12  4:07 ` Andrew Morton
@ 2008-04-14 13:46   ` Vivek Goyal
  2008-04-14 14:42     ` Neil Horman
  0 siblings, 1 reply; 10+ messages in thread
From: Vivek Goyal @ 2008-04-14 13:46 UTC
  To: Andrew Morton
  Cc: Takenori Nagano, nickpiggin, k-miyoshi, greg, Bernhard Walle, kdb,
	kexec, linux-kernel, Randy Dunlap, Eric W. Biederman, Keith Owens

On Fri, Apr 11, 2008 at 09:07:51PM -0700, Andrew Morton wrote:

[..]
> > Kernel panic - not syncing: Panic by panic_module.
> > __tunable_atomic_notifier_call_chain enter
> > msg_handler:panic_event was called.
> > ipmi_wdog:wdog_panic_handler was called.
> > notifier_test: notifier_test_panic() is called.
> > notifier_test: notifier_test_panic2() is called.
> 
> OK.  But I don't see anywhere in here the most important piece of
> information: why do we need this feature in Linux?
> 
> What are the use-cases?  What is the value?  etc.
> 
> Often I can guess (but I like the originator to remove the guesswork).  In
> this case I'm stumped - I can't see any reason why anyone would want this.
> 

Hi Andrew,

To begin with, he wants kdb, kgdb etc. to co-exist with kdump. He wants
to put all the RAS tools (those interested in the panic event) on a list,
export it to user space, and let the user decide in what order the tools get
executed at panic time (based on priority).

This raises some reliability concerns for kdump, because notifier
code is run after the panic.

I think people want to use this infrastructure beyond RAS tools. I
remember somebody wanting to send a message to a remote node after a
panic (before kdump kicks in) so that the remote node can initiate failover
etc.

Ideally, doing any operation after a panic is not safe; one should avoid
such things, and any action required should be done in the next kernel (like
sending messages to remote nodes etc.). Having said that, it makes the
job harder, as one needs to pass all the required data to the second kernel.

So it will now be left to the user whether to execute the code after the
panic in the first kernel or to create the required bits to execute the code
in the second kernel. Things should be more reliable in the second kernel.

I am not sure how paranoid one should be about this additional bit of
notifier code being executed after a panic. Probably we can take this in
to make the user's life easier.

Thanks
Vivek

* Re: [PATCH 0/2] add new notifier function ,take3
  2008-04-14 13:46   ` Vivek Goyal
@ 2008-04-14 14:42     ` Neil Horman
  2008-04-14 14:46       ` Bernhard Walle
  2008-04-14 14:53       ` Vivek Goyal
  0 siblings, 2 replies; 10+ messages in thread
From: Neil Horman @ 2008-04-14 14:42 UTC
  To: Vivek Goyal
  Cc: Andrew Morton, nickpiggin, k-miyoshi, greg, Bernhard Walle, kdb,
	kexec, Takenori Nagano, linux-kernel, Randy Dunlap,
	Eric W. Biederman, Keith Owens

On Mon, Apr 14, 2008 at 09:46:22AM -0400, Vivek Goyal wrote:
> On Fri, Apr 11, 2008 at 09:07:51PM -0700, Andrew Morton wrote:
> 
> [..]
> > > Kernel panic - not syncing: Panic by panic_module.
> > > __tunable_atomic_notifier_call_chain enter
> > > msg_handler:panic_event was called.
> > > ipmi_wdog:wdog_panic_handler was called.
> > > notifier_test: notifier_test_panic() is called.
> > > notifier_test: notifier_test_panic2() is called.
> > 
> > OK.  But I don't see anywhere in here the most important piece of
> > information: why do we need this feature in Linux?
> > 
> > What are the use-cases?  What is the value?  etc.
> > 
> > Often I can guess (but I like the originator to remove the guesswork).  In
> > this case I'm stumped - I can't see any reason why anyone would want this.
> > 
> 
> Hi Andrew,
> 
> To begin with, he wants kdb, kgdb etc to co-exist with kdump. He wants
> to put all the RAS tools (who are interested in panic event) on a list
> and export it to user space and let user decide in what order do the tool get
> executed at panic time (based on priority).
> 
> This brings in little bit reliability concerns for kdump due to notifier
> code being run after panic.
> 
> I think people want to use this infrastructure beyond RAS tools. I
> remember somebody wanting to send a message to remote node after a
> panic (before kdump kicks in)  so that remote node can initiate failover
> etc.
> 
I know it doesn't particularly relate to this patch, but FWIW, for cases like
failover, I've inserted infrastructure in the userspace part of kdump for
Fedora/RHEL to support this sort of thing.  We can run arbitrary scripts right
before and after a capture so that notifications can be sent to remote nodes in
a much safer fashion than using the notifier chain after a panic.
Neil


-- 
/***************************************************
 *Neil Horman
 *Software Engineer
 *Red Hat, Inc.
 *nhorman@redhat.com
 *gpg keyid: 1024D / 0x92A74FA1
 *http://pgp.mit.edu
 ***************************************************/

* Re: [PATCH 0/2] add new notifier function ,take3
  2008-04-14 14:42     ` Neil Horman
@ 2008-04-14 14:46       ` Bernhard Walle
  2008-04-14 14:53       ` Vivek Goyal
  1 sibling, 0 replies; 10+ messages in thread
From: Bernhard Walle @ 2008-04-14 14:46 UTC
  To: Neil Horman
  Cc: Vivek Goyal, Andrew Morton, nickpiggin, k-miyoshi, greg, kdb,
	kexec, Takenori Nagano, linux-kernel, Randy Dunlap,
	Eric W. Biederman, Keith Owens

* Neil Horman [2008-04-14 10:42]:
>
> I know it doesn't particularly relate to this patch, but FWIW, for cases like
> failover, I've inserted infrastructure in the userspace part of kdump for
> Fedora/RHEL to support this sort of thing.  We can run arbitrary scripts right
> before and after a capture so that notifications can be sent to remote nodes in
> a much safer fashion than using the notifier chain after a panic.

But that doesn't help if you want to use a debugger (KDB, KGDB).


	Bernhard

* Re: [PATCH 0/2] add new notifier function ,take3
  2008-04-14 14:42     ` Neil Horman
  2008-04-14 14:46       ` Bernhard Walle
@ 2008-04-14 14:53       ` Vivek Goyal
  2008-04-14 16:01         ` Neil Horman
  1 sibling, 1 reply; 10+ messages in thread
From: Vivek Goyal @ 2008-04-14 14:53 UTC
  To: Neil Horman
  Cc: Andrew Morton, nickpiggin, k-miyoshi, greg, Bernhard Walle, kdb,
	kexec, Takenori Nagano, linux-kernel, Randy Dunlap,
	Eric W. Biederman, Keith Owens

On Mon, Apr 14, 2008 at 10:42:28AM -0400, Neil Horman wrote:
> On Mon, Apr 14, 2008 at 09:46:22AM -0400, Vivek Goyal wrote:
> > On Fri, Apr 11, 2008 at 09:07:51PM -0700, Andrew Morton wrote:
> > 
> > [..]
> > > > Kernel panic - not syncing: Panic by panic_module.
> > > > __tunable_atomic_notifier_call_chain enter
> > > > msg_handler:panic_event was called.
> > > > ipmi_wdog:wdog_panic_handler was called.
> > > > notifier_test: notifier_test_panic() is called.
> > > > notifier_test: notifier_test_panic2() is called.
> > > 
> > > OK.  But I don't see anywhere in here the most important piece of
> > > information: why do we need this feature in Linux?
> > > 
> > > What are the use-cases?  What is the value?  etc.
> > > 
> > > Often I can guess (but I like the originator to remove the guesswork).  In
> > > this case I'm stumped - I can't see any reason why anyone would want this.
> > > 
> > 
> > Hi Andrew,
> > 
> > To begin with, he wants kdb, kgdb etc to co-exist with kdump. He wants
> > to put all the RAS tools (who are interested in panic event) on a list
> > and export it to user space and let user decide in what order do the tool get
> > executed at panic time (based on priority).
> > 
> > This brings in little bit reliability concerns for kdump due to notifier
> > code being run after panic.
> > 
> > I think people want to use this infrastructure beyond RAS tools. I
> > remember somebody wanting to send a message to remote node after a
> > panic (before kdump kicks in)  so that remote node can initiate failover
> > etc.
> > 
> I know it doesn't particularly relate to this patch, but FWIW, for cases like
> failover, I've inserted infrastructure in the userspace part of kdump for
> Fedora/RHEL to support this sort of thing.  We can run arbitrary scripts right
> before and after a capture so that notifications can be sent to remote nodes in
> a much safer fashion than using the notifier chain after a panic.
> Neil
> 

That's great. I did not know about these. So the user can write custom
scripts/binaries which can be packed into the kdump initrd and executed either
before or after dump capture? Any idea if somebody has started using it
already?

If that's the case, then the only other serious user at this point in time
is the kernel debugger (kdb, kgdb), which needs to run before kdump in case
of panic. And Eric suggested that for those cases the debugger can just insert
a breakpoint at panic(), instead of introducing the tunable notifier list
infrastructure.

Thanks
Vivek

* Re: [PATCH 0/2] add new notifier function ,take3
  2008-04-14 14:53       ` Vivek Goyal
@ 2008-04-14 16:01         ` Neil Horman
  2008-04-14 19:33           ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Neil Horman @ 2008-04-14 16:01 UTC
  To: Vivek Goyal
  Cc: Neil Horman, Andrew Morton, nickpiggin, k-miyoshi, greg,
	Bernhard Walle, kdb, kexec, Takenori Nagano, linux-kernel,
	Randy Dunlap, Eric W. Biederman, Keith Owens

On Mon, Apr 14, 2008 at 10:53:23AM -0400, Vivek Goyal wrote:
> On Mon, Apr 14, 2008 at 10:42:28AM -0400, Neil Horman wrote:
> > On Mon, Apr 14, 2008 at 09:46:22AM -0400, Vivek Goyal wrote:
> > > On Fri, Apr 11, 2008 at 09:07:51PM -0700, Andrew Morton wrote:
> > > 
> > > [..]
> > > > > Kernel panic - not syncing: Panic by panic_module.
> > > > > __tunable_atomic_notifier_call_chain enter
> > > > > msg_handler:panic_event was called.
> > > > > ipmi_wdog:wdog_panic_handler was called.
> > > > > notifier_test: notifier_test_panic() is called.
> > > > > notifier_test: notifier_test_panic2() is called.
> > > > 
> > > > OK.  But I don't see anywhere in here the most important piece of
> > > > information: why do we need this feature in Linux?
> > > > 
> > > > What are the use-cases?  What is the value?  etc.
> > > > 
> > > > Often I can guess (but I like the originator to remove the guesswork).  In
> > > > this case I'm stumped - I can't see any reason why anyone would want this.
> > > > 
> > > 
> > > Hi Andrew,
> > > 
> > > To begin with, he wants kdb, kgdb etc to co-exist with kdump. He wants
> > > to put all the RAS tools (who are interested in panic event) on a list
> > > and export it to user space and let user decide in what order do the tool get
> > > executed at panic time (based on priority).
> > > 
> > > This brings in little bit reliability concerns for kdump due to notifier
> > > code being run after panic.
> > > 
> > > I think people want to use this infrastructure beyond RAS tools. I
> > > remember somebody wanting to send a message to remote node after a
> > > panic (before kdump kicks in)  so that remote node can initiate failover
> > > etc.
> > > 
> > I know it doesn't particularly relate to this patch, but FWIW, for cases like
> > failover, I've inserted infrastructure in the userspace part of kdump for
> > Fedora/RHEL to support this sort of thing.  We can run arbitrary scripts right
> > before and after a capture so that notifications can be sent to remote nodes in
> > a much safer fashion than using the notifier chain after a panic.
> > Neil
> > 
> 
> That's great. I did not know about these. So user can write custom
> scripts/binaries which can be packed into kdump initrd and executed either
> before or after dump capture? Any idea, if somebody has started using it
> already?
> 
That's exactly right.  I'm not sure if there is any serious use as of yet, but
I've had some inquiries about it.  Specific cases that I recall include:

1) A set of users in Japan that are using the pre-dump script to block execution
until a SCSI controller detects all its drives (it apparently takes up to three
minutes to scan its bus)

2) I think some people using clustering services were using the pre-script to
notify cluster peers of the failure to avoid power fencing while a node
completed the crash dump

3) A national lab had an interest in using the pre-script to send an email to an
administrative address to log the failure in a cluster
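
Case 1 above amounts to waiting for a device to show up before the capture
starts. A minimal sketch of such a pre-dump script in shell follows. The
function name wait_for_path, the device path in the usage note and the
180-second default (taken from the "up to three minutes" above) are
assumptions for illustration; how the script is registered with kdump is not
described in this thread and is left out.

```shell
#!/bin/sh
# Block until a path (for example a disk device node) exists, or give
# up after a timeout. Returns 0 if the path exists, 1 on timeout.
# $1: path to wait for
# $2: timeout in seconds (default 180, an assumed value)
wait_for_path() {
    path=$1
    remaining=${2:-180}
    while [ ! -e "$path" ]; do
        # Out of time: report failure to the caller.
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

A pre-dump script could then call something like
"wait_for_path /dev/sda 180 || exit 1" before the capture starts; the device
name is only an example.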

Neil

> If that's the case then only other serious user at this point of time
> is kernel debugger (kdb, kgdb), which needs to run before kdump, in case
> of panic. And Eric suggested for those cases debugger can just insert a 
> break point at panic(), instead of introducing the tunable notifier list
> infrastructure.
> 
> Thanks
> Vivek

-- 
/***************************************************
 *Neil Horman
 *Software Engineer
 *Red Hat, Inc.
 *nhorman@redhat.com
 *gpg keyid: 1024D / 0x92A74FA1
 *http://pgp.mit.edu
 ***************************************************/

* Re: [PATCH 0/2] add new notifier function ,take3
  2008-04-14 16:01         ` Neil Horman
@ 2008-04-14 19:33           ` Andrew Morton
  2008-04-17  5:31             ` Takenori Nagano
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2008-04-14 19:33 UTC
  To: Neil Horman
  Cc: vgoyal, nhorman, nickpiggin, k-miyoshi, greg, bwalle, kdb, kexec,
	t-nagano, linux-kernel, rdunlap, ebiederm, kaos

On Mon, 14 Apr 2008 12:01:46 -0400
Neil Horman <nhorman@redhat.com> wrote:

> On Mon, Apr 14, 2008 at 10:53:23AM -0400, Vivek Goyal wrote:
> > On Mon, Apr 14, 2008 at 10:42:28AM -0400, Neil Horman wrote:
> > > On Mon, Apr 14, 2008 at 09:46:22AM -0400, Vivek Goyal wrote:
> > > > On Fri, Apr 11, 2008 at 09:07:51PM -0700, Andrew Morton wrote:
> > > > 
> > > > [..]
> > > > > > Kernel panic - not syncing: Panic by panic_module.
> > > > > > __tunable_atomic_notifier_call_chain enter
> > > > > > msg_handler:panic_event was called.
> > > > > > ipmi_wdog:wdog_panic_handler was called.
> > > > > > notifier_test: notifier_test_panic() is called.
> > > > > > notifier_test: notifier_test_panic2() is called.
> > > > > 
> > > > > OK.  But I don't see anywhere in here the most important piece of
> > > > > information: why do we need this feature in Linux?
> > > > > 
> > > > > What are the use-cases?  What is the value?  etc.
> > > > > 
> > > > > Often I can guess (but I like the originator to remove the guesswork).  In
> > > > > this case I'm stumped - I can't see any reason why anyone would want this.
> > > > > 
> > > > 
> > > > Hi Andrew,
> > > > 
> > > > To begin with, he wants kdb, kgdb etc to co-exist with kdump. He wants
> > > > to put all the RAS tools (who are interested in panic event) on a list
> > > > and export it to user space and let user decide in what order do the tool get
> > > > executed at panic time (based on priority).
> > > > 
> > > > This brings in little bit reliability concerns for kdump due to notifier
> > > > code being run after panic.
> > > > 
> > > > I think people want to use this infrastructure beyond RAS tools. I
> > > > remember somebody wanting to send a message to remote node after a
> > > > panic (before kdump kicks in)  so that remote node can initiate failover
> > > > etc.
> > > > 
> > > I know it doesn't particularly relate to this patch, but FWIW, for cases like
> > > failover, I've inserted infrastructure in the userspace part of kdump for
> > > Fedora/RHEL to support this sort of thing.  We can run arbitrary scripts right
> > > before and after a capture so that notifications can be sent to remote nodes in
> > > a much safer fashion than using the notifier chain after a panic.
> > > Neil
> > > 
> > 
> > That's great. I did not know about these. So user can write custom
> > scripts/binaries which can be packed into kdump initrd and executed either
> > before or after dump capture? Any idea, if somebody has started using it
> > already?
> > 
> That's exactly right.  I'm not sure if there is any serious use as of yet, but
> I've had some inquiries about it.  Specific cases that I recall include:
> 
> 1) A set of users in Japan that are using the pre-dump script to block execution
> until a SCSI controller detects all its drives (it apparently takes up to three
> minutes to scan its bus)
> 
> 2) I think some people using clustering services were using the pre-script to
> notify cluster peers of the failure to avoid power fencing while a node
> completed the crash dump
> 
> 3) A national lab had an interest in using the pre script to send an email to an
> administrative address to log the failure in a cluster 
> 

OK, thanks.

I think I'll duck the patch for now as it seems that a little more thought
and coordination is needed.

Plus it appears that the only users of this infrastructure are provided via
presently-out-of-tree patches, so people who are already patching and
building their own kernels can easily add this other patch as well, for now.

* Re: [PATCH 0/2] add new notifier function ,take3
  2008-04-14 19:33           ` Andrew Morton
@ 2008-04-17  5:31             ` Takenori Nagano
  2008-04-23 12:31               ` Eric W. Biederman
  0 siblings, 1 reply; 10+ messages in thread
From: Takenori Nagano @ 2008-04-17  5:31 UTC
  To: Andrew Morton, vgoyal
  Cc: Neil Horman, nickpiggin, k-miyoshi, greg, bwalle, kdb, kexec,
	linux-kernel, rdunlap, ebiederm, kaos

Andrew Morton wrote:
> On Mon, 14 Apr 2008 12:01:46 -0400
> Neil Horman <nhorman@redhat.com> wrote:
> 
>> On Mon, Apr 14, 2008 at 10:53:23AM -0400, Vivek Goyal wrote:
>>> On Mon, Apr 14, 2008 at 10:42:28AM -0400, Neil Horman wrote:
>>>> On Mon, Apr 14, 2008 at 09:46:22AM -0400, Vivek Goyal wrote:
>>>>> On Fri, Apr 11, 2008 at 09:07:51PM -0700, Andrew Morton wrote:
>>>>>
>>>>> [..]
>>>>>>> Kernel panic - not syncing: Panic by panic_module.
>>>>>>> __tunable_atomic_notifier_call_chain enter
>>>>>>> msg_handler:panic_event was called.
>>>>>>> ipmi_wdog:wdog_panic_handler was called.
>>>>>>> notifier_test: notifier_test_panic() is called.
>>>>>>> notifier_test: notifier_test_panic2() is called.
>>>>>> OK.  But I don't see anywhere in here the most important piece of
>>>>>> information: why do we need this feature in Linux?
>>>>>>
>>>>>> What are the use-cases?  What is the value?  etc.
>>>>>>
>>>>>> Often I can guess (but I like the originator to remove the guesswork).  In
>>>>>> this case I'm stumped - I can't see any reason why anyone would want this.
>>>>>>
>>>>> Hi Andrew,
>>>>>
>>>>> To begin with, he wants kdb, kgdb etc to co-exist with kdump. He wants
>>>>> to put all the RAS tools (who are interested in panic event) on a list
>>>>> and export it to user space and let user decide in what order do the tool get
>>>>> executed at panic time (based on priority).
>>>>>
>>>>> This brings in little bit reliability concerns for kdump due to notifier
>>>>> code being run after panic.
>>>>>
>>>>> I think people want to use this infrastructure beyond RAS tools. I
>>>>> remember somebody wanting to send a message to remote node after a
>>>>> panic (before kdump kicks in)  so that remote node can initiate failover
>>>>> etc.
>>>>>
>>>> I know it doesn't particularly relate to this patch, but FWIW, for cases like
>>>> failover, I've inserted infrastructure in the userspace part of kdump for
>>>> Fedora/RHEL to support this sort of thing.  We can run arbitrary scripts right
>>>> before and after a capture so that notifications can be sent to remote nodes in
>>>> a much safer fashion than using the notifier chain after a panic.
>>>> Neil
>>>>
>>> That's great. I did not know about these. So user can write custom
>>> scripts/binaries which can be packed into kdump initrd and executed either
>>> before or after dump capture? Any idea, if somebody has started using it
>>> already?
>>>
>> That's exactly right.  I'm not sure if there is any serious use as of yet, but
>> I've had some inquiries about it.  Specific cases that I recall include:
>>
>> 1) A set of users in Japan that are using the pre-dump script to block execution
>> until a SCSI controller detects all its drives (it apparently takes up to three
>> minutes to scan its bus)
>>
>> 2) I think some people using clustering services were using the pre-script to
>> notify cluster peers of the failure to avoid power fencing while a node
>> completed the crash dump
>>
>> 3) A national lab had an interest in using the pre script to send an email to an
>> administrative address to log the failure in a cluster 
>>
> 
> OK, thanks.
> 
> I think I'll duck the patch for now as it seems that a little more thought
> and coordination is needed.
> 
> Plus it appears that the only users of this infrastructure are provided via
> presently-out-of-tree patches, so people who are already patching and
> building their own kernels can easily add this other patch as well, for now.
> 
> 

Hi,

One of the reasons I want this functionality is to manage RAS
tool behavior for postmortem actions, starting with kdb invocation.
(I found kdb very useful for debugging and crash analysis in the lkcd days,
but it is a "want" and not a "must" today ;-))

Another postmortem action is disabling the hardware watchdog.
The watchdog handler stops the keepalive heartbeat when the system panics,
so we must disable the hardware watchdog as soon as possible: the 2nd
kernel's startup takes some time (10 or 100? secs) and there may be a
window in which the watchdog fires. But currently we have no chance to do
anything before crash_kexec().

Then there is clustering software. If the system encounters
a panic, it must notify the standby node. But... :-(

I am interested in the pre-dump scripts Neil mentioned. I think they can
meet some of our requirements. I will try them.

As for quick invocation of kdump, I partially agree with the idea that
"kdump should be invoked as soon as the system panics, since we cannot
trust broken kernels", but we would like to have some choice about what
to do on panic (and if the notifier is controllable through my patch,
you can still call kdump first).

Anyway, a completely broken kernel cannot call kdump or any other
mechanism  ;-P  so I feel it is somewhat a matter of degree.

Thanks,
    Takenori

* Re: [PATCH 0/2] add new notifier function ,take3
  2008-04-17  5:31             ` Takenori Nagano
@ 2008-04-23 12:31               ` Eric W. Biederman
  0 siblings, 0 replies; 10+ messages in thread
From: Eric W. Biederman @ 2008-04-23 12:31 UTC
  To: Takenori Nagano
  Cc: Andrew Morton, vgoyal, Neil Horman, nickpiggin, k-miyoshi, greg,
	bwalle, kdb, kexec, linux-kernel, rdunlap, ebiederm, kaos

Takenori Nagano <t-nagano@ah.jp.nec.com> writes:

> Hi,
>
> One of the reasons why I want this functionality is managing RAS
> tool behavior for postmortem actions, initially from kdb invocation.
> (I used kdb for debugging and crash analysis very useful in lkcd days,
> but it is "want" and it is not "must" today ;-))

Ok.  I have not heard any reason here why a break point at panic
or inside of panic is not useful.

> The other postmortem action is disabling hardware watchdog.
> Watch dog handler would stop keepalive heartbeat when system panics
> and we must disable hardware watchdog as soon as possible, since 2nd
> kernel startup takes some time (10 or 100? secs) and there may be
> miss-firing window. But currently we have no chance to do anything
> before crash_kexec().

The transition time from one kernel to the next should be under 1 sec.
After that you are talking about time for the drivers to initialize,
although sha256 over the kdump kernel and its ramdisk may slow things
down a little more for lots of data.

If the concern is petting a watchdog to keep the system from
rebooting, getting the kernel to initialize the watchdogs quickly
appears to be the correct answer.

> And thinking about a clustering software. If the system encounter
> the panic, system must notify standby node. But... :-(

If the concern is to notify another system of the crash quickly,
I see no reason why we cannot do this very early in the second kernel
or perhaps even in the purgatory code in kexec.  If the
code is too hairy to do there, then the code is likely to be
too hairy to do reliably when the system panics.

> I am interested in pre-dump scripts Neil mentioned. I think it can
> resolve some of our requirements. I will try it.

> For quick invocation of kdump, I partially agree with the idea of
> "kdump should be invoked as soon as system panic, since we can not
> trust broken kernels", but we would like to have some choice what
> to do on panic (and if notifier is controllable by my patch,
> you can still call kdump first)
>
> Anyway, completely broken kernel can not call kdump or any other
> mechanism  ;-P  and I feel it is somewhat matter of degree.

Yes.  Completely broken kernels may not recognize they have a problem.

It is the design goal of the kexec on panic path to work with as much
of the kernel broken as possible.

Additionally, reviewing and testing that code is extremely difficult,
because it is the one piece of code in the kernel where debugging
tools are not available.  Putting random tunable code on that path
hugely reduces its maintainability.

Further, for all of the cases I have seen there is only one correct
action to take; things do not need to be tunable.  So a generally
tunable interface appears to be a design mistake.

Eric
