[STATUS 2.5] October 30, 2002

* [STATUS 2.5]  October 30, 2002
@ 2002-10-30 15:13 Guillaume Boissiere
  2002-10-30 15:55 ` YOSHIFUJI Hideaki / 吉藤英明
  0 siblings, 1 reply; 29+ messages in thread
From: Guillaume Boissiere @ 2002-10-30 15:13 UTC
  To: linux-kernel

Many new big items merged in the last few days:
IPsec, CryptoAPI, LVM2 (device-mapper), Digital Video Broadcasting layer, etc.
And still a long list of pending items marked as "Ready".
Oh, and Halloween is tomorrow.... :-)

http://www.kernelnewbies.org/status/  for all the details.
Enjoy!

-- Guillaume



-------------------------------------------------------
Linux Kernel 2.5 Status - October 30th, 2002
(Latest kernel release is 2.5.44)

Items in bold have changed since last week.
Items in grey are post Halloween (feature freeze).

Features:  
 
Merged  
o in 2.5.1+  Rewrite of the block IO (bio) layer  (Jens Axboe)  
o in 2.5.2  Initial support for USB 2.0  (David Brownell, Greg Kroah-Hartman, etc.)  
o in 2.5.2  Per-process namespaces, late-boot cleanups  (Al Viro, Manfred Spraul)  
o in 2.5.2+  New scheduler for improved scalability  (Ingo Molnar)  
o in 2.5.2+  New kernel device structure (kdev_t)  (Linus Torvalds, etc.)  
o in 2.5.3  IDE layer update  (Andre Hedrick)  
o in 2.5.3  Support reiserfs external journal  (Reiserfs team)  
o in 2.5.3  Generic ACL (Access Control List) support  (Nathan Scott)  
o in 2.5.3  PnP BIOS driver  (Alan Cox, Thomas Hood, Dave Jones, etc.)  
o in 2.5.3+  New driver model & unified device tree  (Patrick Mochel)  
o in 2.5.4  Add preempt kernel option  (Robert Love, MontaVista team)  
o in 2.5.4  Support for Next Generation POSIX Threading  (NGPT team)  
o in 2.5.5  Add ALSA (Advanced Linux Sound Architecture)  (ALSA team)  
o in 2.5.5  Pagetables in highmem support  (Ingo Molnar, Arjan van de Ven)  
o in 2.5.5  New architecture: AMD 64-bit (x86-64)  (Andi Kleen, x86-64 Linux team)  
o in 2.5.5  New architecture: PowerPC 64-bit (ppc64)  (Anton Blanchard, ppc64 team)  
o in 2.5.6  Add JFS (Journaling FileSystem from IBM)  (JFS team)  
o in 2.5.6  per_cpu infrastructure  (Rusty Russell)  
o in 2.5.6  HDLC (High-level Data Link Control) update  (Krzysztof Halasa)  
o in 2.5.6  smbfs Unicode and large file support  (Urban Widmark)  
o in 2.5.7  New driver API for Wireless Extensions  (Jean Tourrilhes)  
o in 2.5.7  Video for Linux (V4L) redesign  (Gerd Knorr)  
o in 2.5.7  Futexes (Fast Lightweight Userspace Semaphores)  (Rusty Russell, etc.)  
o in 2.5.7+  NAPI network interrupt mitigation  (Jamal Hadi Salim, Robert Olsson, Alexey Kuznetsov)
o in 2.5.7+  ACPI (Advanced Configuration & Power Interface)  (Andy Grover, ACPI team)  
o in 2.5.8  Syscall interface for CPU task affinity  (Robert Love)  
o in 2.5.8  Radix-tree pagecache  (Momchil Velikov, Christoph Hellwig)  
o in 2.5.9  Smarter IRQ balancing  (Ingo Molnar)  
o in 2.5.11  Replace old NTFS driver with NTFS TNG driver  (Anton Altaparmakov)  
o in 2.5.11  Fast walk dcache  (Hanna Linder)  
o in 2.5.11+  Rewrite of the framebuffer layer  (James Simmons)  
o in 2.5.12+  Rewrite of the buffer layer  (Andrew Morton)  
o in 2.5.14  Support for IDE TCQ (Tagged Command Queueing)  (Jens Axboe)  
o in 2.5.14  Bluetooth support (no longer experimental!)  (Maxim Krasnyansky, Bluetooth team)  
o in 2.5.17  New quota system supporting plugins  (Jan Kara)  
o in 2.5.17+  Move ISDN4Linux to CAPI based interface  (Kai Germaschewski, ISDN4Linux team)  
o in 2.5.18  Software suspend (to disk & RAM)  (Pavel Machek)  
o in 2.5.23  More complete IEEE 802.2 stack  (Arnaldo Carvalho de Melo, Jay Schulist, from Procom donated code)
o in 2.5.23+  Hotplug CPU support  (Rusty Russell)  
o in 2.5.25  Faster internal kernel clock frequency  (Linus Torvalds)  
o in 2.5.26  Direct pagecache <-> BIO disk I/O  (Andrew Morton)  
o in 2.5.27+  New VM with reverse mappings  (Rik van Riel)  
o in 2.5.28+  Serial driver restructure  (Russell King)  
o in 2.5.28  Remove the "Big IRQ lock"  (Ingo Molnar)  
o in 2.5.29+  Thread-Local Storage (TLS) support  (Ingo Molnar)  
o in 2.5.29+  Add Linux Security Module (LSM)  (LSM team)  
o in 2.5.29+  Strict address space accounting  (Alan Cox)  
o in 2.5.31+  Disk description cleanups  (Al Viro)  
o in 2.5.31  Support insane number of processes  (Linus Torvalds)  
o in 2.5.32  New MTRR (Memory Type Range Register) driver  (Patrick Mochel)  
o in 2.5.32+  Porting all input devices over to input API  (Vojtech Pavlik, James Simmons)  
o in 2.5.32+  Asynchronous IO (aio) support  (Ben LaHaise)
o in 2.5.32+  Improved POSIX threading support  (Ingo Molnar)  
o in 2.5.33  SCTP (Stream Control Transmission Protocol)  (lksctp team)  
o in 2.5.33  TCP segmentation offload  (Alexey Kuznetsov)  
o in 2.5.34  discontigmem support (ia32)  (Pat Gaughen, Martin Bligh, Jack Steiner, Tony Luck)
o in 2.5.34  POSIX threading support for signals  (Ingo Molnar)  
o in 2.5.35  Add User-Mode Linux (UML)  (Jeff Dike)  
o in 2.5.35  Serial ATA support  (Andre Hedrick)  
o in 2.5.36  Add XFS (A journaling filesystem from SGI)  (XFS team)  
o in 2.5.37  Remove the global tasklist  (Ingo Molnar, William Lee Irwin)  
o in 2.5.39  New IO scheduler  (Jens Axboe)  
o in 2.5.40  Add support for CPU clock/voltage scaling  (Dominik Brodowski, Erik Mouw, Dave Jones, Russell King, Arjan van de Ven)
o in 2.5.40  NUMA topology support  (Matt Dobson)  
o in 2.5.40  Parallelizing page replacement  (Andrew Morton, Momchil Velikov, Dave Hansen, William Lee Irwin)
o in 2.5.42  Improved i2o (Intelligent Input/Output) layer  (Alan Cox)
o in 2.5.42  Remove the 2TB block device limit  (Peter Chubb)  
o in 2.5.42  Add new CIFS (Common Internet File System)  (Steve French)  
o in 2.5.42  ext2/ext3 large directory support: HTree index  (Daniel Phillips, Christopher Li, Andrew Morton, Ted Ts'o)
o in 2.5.43  Add support for NFS v4  (NFS v4 team, Trond Myklebust, Neil Brown)  
o in 2.5.43  Read-Copy Update (RCU) Mutual Exclusion  (Dipankar Sarma, Rusty Russell, Andrea Arcangeli, LSE Team)
o in 2.5.43  Add OProfile, a low-overhead profiler  (John Levon)  
o in 2.5.43  Andrew File System (AFS) support  (David Howells)  
o in 2.5.44  x86 BIOS Enhanced Disk Device (EDD) polling  (Matt Domsch)  
o in 2.5.44  Plug'N Play Layer Rewrite  (Adam Belay)  
o in 2.5.45  Device mapper for Logical Volume Manager (LVM2)  (Alasdair Kergon, Patrick Caulfield, Joe Thornber)
o in 2.5.45  Digital Video Broadcasting (DVB) layer  (LinuxTV team)  
o in 2.5.45  IPsec support  (Alexey Kuznetsov, Dave Miller, USAGI team)  
o in 2.5.45  CryptoAPI  (James Morris)  

 
o in -mm  Page table sharing  (Daniel Phillips, Dave McCracken)  
o in -mm  Extended Attributes and ACLs for ext2/ext3  (Ted Ts'o)  
o in -mm  Per-cpu hot & cold page lists  (Andrew Morton, Martin Bligh)  
o in -ac  MMU-less processor support (uClinux)  (Greg Ungerer)

 
o Ready  Build option for Linux Trace Toolkit (LTT)  (Karim Yaghmour)  
o Ready  Kernel Probes (kprobes)  (Vamsi Krishna, kprobes team)  
o Ready  High resolution timers  (George Anzinger, etc.)  
o Ready  EVMS (Enterprise Volume Management System)  (EVMS team)  
o Ready  Linux Kernel Crash Dumps  (Matt Robinson, LKCD team)  
o Ready  Rewrite of the console layer  (James Simmons)  
o Ready  Zerocopy NFS  (Hirokazu Takahashi)  
o Ready  Kexec, syscall to load kernel from kernel  (Eric Biederman)  
o Ready  New Linux configuration system  (Roman Zippel)  
o Ready  In-kernel module loader  (Rusty Russell)  
o Ready  Unified boot/parameter support  (Rusty Russell)  
o Ready  Support insane number of groups  (Tim Hockin)  
o Ready  Better I/O performance with epoll  (Davide Libenzi)  
o Ready  NUMA aware scheduler extensions  (Erich Focht, Michael Hohnbaum)  
o Ready  Replace initrd by initramfs  (H. Peter Anvin, Al Viro)  
o Ready  SCSI and FibreChannel Hotswap Support  (Steven Dake)  

 
o Beta  Worldclass support for IPv6  (Alexey Kuznetsov, Dave Miller, Jun Murai, Yoshifuji Hideaki, USAGI team)
o Beta  Reiserfs v4  (Reiserfs team)  
o Beta  SCSI multipath IO (with NUMA support)  (Patrick Mansfield, Mike Anderson)  

 
o Alpha  Basic NUMA API  (Matt Dobson)  
o Alpha  Remove waitqueue heads from kernel structures  (William Lee Irwin)  
o Alpha  NUMA aware slab allocator  (Manfred Spraul, Martin Bligh)  

 
o Started  32bit dev_t  (?)  

 
o Post-freeze  Change all drivers to new driver model  (All maintainers)  
o Post-freeze  Fix device naming issues  (Patrick Mochel, Greg Kroah-Hartman)  
o Post-freeze  Better event logging for enterprise systems  (Larry Kessler, evlog team)  
o Post-freeze  Page table reclamation  (William Lee Irwin, Rik van Riel)
o Post-freeze  UMSDOS (Unix under MS-DOS) Rewrite  (Al Viro)  
o Post-freeze  USB gadget support  (Stuart Lynne, Greg Kroah-Hartman)  
o Post-freeze  Overhaul PCMCIA support  (David Woodhouse, David Hinds)  
o Post-freeze  InfiniBand support  (InfiniBand team)  
o Post-freeze  Per-mountpoint read-only, union-mounts, unionfs  (Al Viro)  
o Post-freeze  More complete NetBEUI stack  (Arnaldo Carvalho de Melo, from Procom donated code)
o Post-freeze  New mount API  (Al Viro)  
o Post-freeze  Add thrashing control  (Rik van Riel)  
o Post-freeze  Remove all hardwired drivers from kernel  (Alan Cox, etc.)  
o Post-freeze  Improved AppleTalk stack  (Arnaldo Carvalho de Melo)  
o Post-freeze  ext2/ext3 online resize support  (Andreas Dilger)  
o Post-freeze  New lightweight library (klibc)  (H. Peter Anvin)  
o Post-freeze  UDF Write support for CD-R/RW (packet writing)  (Jens Axboe, Peter Osterlund)  
o Post-freeze  Scalable Statistics Counter  (Ravikiran Thirumalai)  
o Post-freeze  Add hardware sensors drivers  (lm_sensors team)  


 
Cleanups:  
 
Merged  
o in 2.5.3  Break Configure.help into multiple files  (Linus Torvalds)  
o in 2.5.3  Untangle sched.h & fs.h include dependencies  (Dave Jones, Roman Zippel)
o in 2.5.4  Per network protocol slabcache & sock.h  (Arnaldo Carvalho de Melo)  
o in 2.5.4  Per filesystem slabcache & fs.h  (Daniel Phillips, Jeff Garzik, Al Viro)  
o in 2.5.6  Killing kdev_t for block devices  (Al Viro)  
o in 2.5.18+  ->getattr() ->setattr() ->permission() changes  (Al Viro)  
o in 2.5.21  Split up x86 setup.c into manageable pieces  (Patrick Mochel)
o in 2.5.23+  Major MD tool (RAID 5) cleanup  (Neil Brown)  
o in 2.5.30  Remove khttpd  (Christoph Hellwig)  
o in 2.5.31  Rework datalink protocols to not use cli/sti  (Arnaldo Carvalho de Melo)  
o in 2.5.31  Remove incomplete SPX network stack  (Arnaldo Carvalho de Melo)  
o in 2.5.43  Remove kiobufs  (Andrew Morton)  

 
o in -mm  Avoid dcache_lock while path walking  (Maneesh Soni, IBM team)  

 
o Ready  Switch to ->get_super() for file_system_type  (Al Viro)  

 
o Beta  file.h and INIT_TASK  (Benjamin LaHaise)  
o Beta  Proper UFS fixes, ext2 and locking cleanups  (Al Viro)  
o Beta  Lifting limitations on mount(2)  (Al Viro)  

 
o Started  Reorder x86 initialization  (Dave Jones, Randy Dunlap)  



Have some free time and want to help? Check out the Kernel Janitor
TO DO list for a list of source code cleanups you can work on.
A great place to start learning more about kernel internals!



* Re: [STATUS 2.5] October 30, 2002
  2002-10-30 15:13 Guillaume Boissiere
@ 2002-10-30 15:55 ` YOSHIFUJI Hideaki / 吉藤英明
  2002-10-30 22:36   ` David S. Miller
  0 siblings, 1 reply; 29+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2002-10-30 15:55 UTC
  To: boissiere, davem, kuznet; +Cc: linux-kernel

In article <3DBFB0D2.21734.21E3A6B@localhost> (at Wed, 30 Oct 2002 10:13:38 -0500), "Guillaume Boissiere" <boissiere@adiglobal.com> says:

> o in 2.5.45  IPsec support  (Alexey Kuznetsov, Dave Miller, USAGI team)  

What is the status of IPsec for IPv6?


> o Beta  Worldclass support for IPv6  (Alexey Kuznetsov, Dave Miller, Jun Murai, Yoshifuji 
> Hideaki, USAGI team)  

We're almost done.

One thing that I'll contribute before the feature freeze is:
  - Privacy Extensions for IPv6 addrconf

The remaining things which we DO want to see in 2.6 are:
  - check if "rmmod ipv6" is ok
  - IPv6 source address selection; which will be mandated by the 
    node requirement.
  - IPsec for IPv6
  - make IPv6 non-experimental :-)
  - several enhancements on specification conformity
    (neighbour discovery etc.)

Thanks.

-- 
Hideaki YOSHIFUJI @ USAGI Project <yoshfuji@linux-ipv6.org>
GPG FP: 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA


* Re: [STATUS 2.5]  October 30, 2002
@ 2002-10-30 16:17 Dave Jones
  2002-10-30 17:14 ` Randy.Dunlap
  2002-10-31  6:22 ` Eric W. Biederman
  0 siblings, 2 replies; 29+ messages in thread
From: Dave Jones @ 2002-10-30 16:17 UTC
  To: boissiere, Linux Kernel

> o in 2.5.35  Serial ATA support  (Andre Hedrick)

Erm, really ? 

> o Post-freeze  Add hardware sensors drivers  (lm_sensors team)

Something else I took a look at in the last few days was the ECC
drivers. These are also zero impact, and could go in after the freeze
(assuming the authors want them merged). They could do with a small
amount of cleanup, but otherwise look ok.

> o Started  Reorder x86 initialization  (Dave Jones, Randy Dunlap)

I've jiggled a bunch of this (Randy didn't have time to play here)
around as much as it's probably going to be for 2.6. It's in -dj,
has been sent for -ac, and will likely go to Linus post-freeze
as it's all cleanups and one-liners.

		Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk


* Re: [STATUS 2.5]  October 30, 2002
  2002-10-30 16:17 [STATUS 2.5] October 30, 2002 Dave Jones
@ 2002-10-30 17:14 ` Randy.Dunlap
  2002-10-31  6:22 ` Eric W. Biederman
  1 sibling, 0 replies; 29+ messages in thread
From: Randy.Dunlap @ 2002-10-30 17:14 UTC
  To: Dave Jones; +Cc: boissiere, Linux Kernel

On Wed, 30 Oct 2002, Dave Jones wrote:

| > o Started  Reorder x86 initialization  (Dave Jones, Randy Dunlap)
|
| I've jiggled a bunch of this (Randy didn't have time to play here)
| around as much as it's probably going to be for 2.6. It's in -dj,
| has been sent for -ac, and will likely go to Linus post-freeze
| as it's all cleanups and one-liners.

Right.  Please remove my name from that item.

-- 
~Randy


* Re: [STATUS 2.5] October 30, 2002
  2002-10-30 15:55 ` YOSHIFUJI Hideaki / 吉藤英明
@ 2002-10-30 22:36   ` David S. Miller
  2002-10-31  2:48     ` YOSHIFUJI Hideaki / 吉藤英明
  0 siblings, 1 reply; 29+ messages in thread
From: David S. Miller @ 2002-10-30 22:36 UTC
  To: yoshfuji; +Cc: boissiere, kuznet, linux-kernel

   From: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
   Date: Thu, 31 Oct 2002 00:55:35 +0900 (JST)

   > o in 2.5.45  IPsec support  (Alexey Kuznetsov, Dave Miller, USAGI team)  

   What is the status of IPsec for IPv6?

It will be done after ipv4 side is fully functional.

     - IPv6 source address selection; which will be mandated by the 
       node requirement.

We told you several times how this USAGI patch is not currently in an
acceptable form and needs to be reimplemented via the routing code.

     - IPsec for IPv6

Alexey and I will implement this, it is basically reading RFCs and
typing at the keyboard, no more.

     - several enhancements on specification conformity
       (neighbour discovery etc.)

Where are these patches?  I've applied everything you've submitted.


* Re: [STATUS 2.5] October 30, 2002
  2002-10-31  2:48     ` YOSHIFUJI Hideaki / 吉藤英明
@ 2002-10-31  2:44       ` David S. Miller
  2002-10-31  3:07         ` kuznet
  2002-10-31  3:16         ` YOSHIFUJI Hideaki / 吉藤英明
  0 siblings, 2 replies; 29+ messages in thread
From: David S. Miller @ 2002-10-31  2:44 UTC
  To: yoshfuji; +Cc: boissiere, kuznet, linux-kernel

   From: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
   Date: Thu, 31 Oct 2002 11:48:32 +0900 (JST)

   In article <20021030.143615.10738219.davem@redhat.com> (at Wed, 30 Oct 2002 14:36:15 -0800 (PST)), "David S. Miller" <davem@redhat.com> says:

   > We told you several times how this USAGI patch is not currently in an
   > acceptable form and needs to be reimplemented via the routing code.

   Yes, but I think
     - integrate our code to your tree
   then
     - reimplement (re-design)
   is better way to go forward.

Absolutely not, we do not put improperly architected code into the
tree first then clean it up later.

Especially because this source address selection code interferes with
many IPSEC issues.  Source address selection belongs at routing
tables, and there is no arguing about this.  If you put it somewhere
else it gets in the way and causes many problems.

Please implement source address selection properly, then resubmit.
Thank you.

   I need to check the result of the current code and
   look at the diff byte by byte before preparing
   patches for the current tree.

Ok.


* Re: [STATUS 2.5] October 30, 2002
  2002-10-30 22:36   ` David S. Miller
@ 2002-10-31  2:48     ` YOSHIFUJI Hideaki / 吉藤英明
  2002-10-31  2:44       ` David S. Miller
  0 siblings, 1 reply; 29+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2002-10-31  2:48 UTC
  To: davem; +Cc: boissiere, kuznet, linux-kernel

In article <20021030.143615.10738219.davem@redhat.com> (at Wed, 30 Oct 2002 14:36:15 -0800 (PST)), "David S. Miller" <davem@redhat.com> says:

>      - IPv6 source address selection; which will be mandated by the
>        node requirement.
> 
> We told you several times how this USAGI patch is not currently in an
> acceptable form and needs to be reimplemented via the routing code.

Yes, but I think
  - integrate our code to your tree
then
  - reimplement (re-design)
is better way to go forward.

This is because the code, which works well in O(n) as the current one
does, will tell you our needs and intentions better than our
babble when you re-design it; I believe we will achieve a better
design this way.


>      - several enhancements on specification conformity
>        (neighbour discovery etc.)
> 
> Where are these patches?  I've applied everything you've submitted.

Yes, thanks.

I need to check the result of the current code and
look at the diff byte by byte before preparing
patches for the current tree.

-- 
Hideaki YOSHIFUJI @ USAGI Project <yoshfuji@linux-ipv6.org>
GPG FP: 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA


* Re: [STATUS 2.5] October 30, 2002
  2002-10-31  2:44       ` David S. Miller
@ 2002-10-31  3:07         ` kuznet
  2002-10-31  3:16         ` YOSHIFUJI Hideaki / 吉藤英明
  1 sibling, 0 replies; 29+ messages in thread
From: kuznet @ 2002-10-31  3:07 UTC
  To: David S. Miller; +Cc: yoshfuji, boissiere, linux-kernel

Hello!

> Please implement source address selection properly, then resubmit.

Actually, I would propose... not to worry about this for a while.
The issue might happen to dissolve after cleaning the space
around ip6_route_output().

Alexey


* Re: [STATUS 2.5] October 30, 2002
  2002-10-31  3:16         ` YOSHIFUJI Hideaki / 吉藤英明
@ 2002-10-31  3:13           ` David S. Miller
  0 siblings, 0 replies; 29+ messages in thread
From: David S. Miller @ 2002-10-31  3:13 UTC
  To: yoshfuji; +Cc: boissiere, kuznet, linux-kernel

   From: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
   Date: Thu, 31 Oct 2002 12:16:09 +0900 (JST)

   In article <20021030.184443.87162307.davem@redhat.com> (at Wed, 30 Oct 2002 18:44:43 -0800 (PST)), "David S. Miller" <davem@redhat.com> says:
   
   > Absolutely not, we do not put improperly architected code into the
   > tree first then clean it up later.
   
   That patch does NOT change the current architecture, so the above is unfair.

Ok, I correct myself, this patch adds more dependencies on a badly
architected area, making it _harder_ for us to clean it up.


* Re: [STATUS 2.5] October 30, 2002
  2002-10-31  2:44       ` David S. Miller
  2002-10-31  3:07         ` kuznet
@ 2002-10-31  3:16         ` YOSHIFUJI Hideaki / 吉藤英明
  2002-10-31  3:13           ` David S. Miller
  1 sibling, 1 reply; 29+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2002-10-31  3:16 UTC
  To: davem; +Cc: boissiere, kuznet, linux-kernel

In article <20021030.184443.87162307.davem@redhat.com> (at Wed, 30 Oct 2002 18:44:43 -0800 (PST)), "David S. Miller" <davem@redhat.com> says:

>    > We told you several times how this USAGI patch is not currently in an
>    > acceptable form and needs to be reimplemented via the routing code.
>    
>    Yes, but I think
>      - integrate our code to your tree
>    then
>      - reimplement (re-design)
>    is better way to go forward.
>    
> Absolutely not, we do not put improperly architected code into the
> tree first then clean it up later.

That patch does NOT change the current architecture, so the above is unfair.

It would be ok to say "we do not put code into an improperly
architected part of the tree and then clean it up later."

-- 
Hideaki YOSHIFUJI @ USAGI Project <yoshfuji@linux-ipv6.org>
GPG FP: 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA


* Re: [STATUS 2.5]  October 30, 2002
  2002-10-30 16:17 [STATUS 2.5] October 30, 2002 Dave Jones
  2002-10-30 17:14 ` Randy.Dunlap
@ 2002-10-31  6:22 ` Eric W. Biederman
  2002-10-31 10:56   ` Alan Cox
                     ` (2 more replies)
  1 sibling, 3 replies; 29+ messages in thread
From: Eric W. Biederman @ 2002-10-31  6:22 UTC
  To: Dave Jones; +Cc: boissiere, Linux Kernel

Dave Jones <davej@codemonkey.org.uk> writes:

> Something else I took a look at in the last few days was the ECC
> drivers. These are also zero impact, and could go in after the freeze
> (assuming the authors want them merged). They could do with a small
> amount of cleanup, but otherwise look ok.

Assuming they work.  No offense to the guys who got the ball rolling, but
the architecture is lousy, and every driver I have messed with does not
work correctly, and I wind up reimplementing it before I can use it.

I actually like the idea of ECC drivers, and routinely make certain
there is a working ECC driver on the systems I ship.  It is so much
easier to catch memory errors with good ECC error reporting.  But
unless I have slept soundly through a fundamental change, the
linux-ecc project currently does not ship quality drivers.  The
infrastructure is bad, and the code is not quite correct. 

If you want I can dig up the drivers I am currently using and send
them to you.

I even have a working memory scrub routine.

Eric


* Re: [STATUS 2.5]  October 30, 2002
  2002-10-31  6:22 ` Eric W. Biederman
@ 2002-10-31 10:56   ` Alan Cox
  2002-10-31 16:30     ` Randy.Dunlap
  2002-10-31 14:40   ` Dave Jones
  2002-10-31 23:01   ` Pavel Machek
  2 siblings, 1 reply; 29+ messages in thread
From: Alan Cox @ 2002-10-31 10:56 UTC
  To: Eric W. Biederman; +Cc: Dave Jones, boissiere, Linux Kernel Mailing List

On Thu, 2002-10-31 at 06:22, Eric W. Biederman wrote:
> I actually like the idea of ECC drivers, and routinely make certain
> there is a working ECC driver on the systems I ship.  It is so much
> easier to catch memory errors with good ECC error reporting.  But
> unless I have slept soundly through a fundamental change, the
> linux-ecc project currently does not ship quality drivers.  The
> infrastructure is bad, and the code is not quite correct. 
> 
> If you want I can dig up the drivers I am currently using and send
> them to you.

That would be really cool



* Re: [STATUS 2.5]  October 30, 2002
  2002-10-31  6:22 ` Eric W. Biederman
  2002-10-31 10:56   ` Alan Cox
@ 2002-10-31 14:40   ` Dave Jones
  2002-10-31 23:01   ` Pavel Machek
  2 siblings, 0 replies; 29+ messages in thread
From: Dave Jones @ 2002-10-31 14:40 UTC
  To: Eric W. Biederman; +Cc: boissiere, Linux Kernel

On Wed, Oct 30, 2002 at 11:22:12PM -0700, Eric W. Biederman wrote:
 > I actually like the idea of ECC drivers, and routinely make certain
 > there is a working ECC driver on the systems I ship.  It is so much
 > easier to catch memory errors with good ECC error reporting.  But
 > unless I have slept soundly through a fundamental change, the
 > linux-ecc project currently does not ship quality drivers.  The
 > infrastructure is bad, and the code is not quite correct. 
 > 
 > If you want I can dig up the drivers I am currently using and send
 > them to you.

Go wild..

		Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk


* Re: [STATUS 2.5]  October 30, 2002
  2002-10-31 10:56   ` Alan Cox
@ 2002-10-31 16:30     ` Randy.Dunlap
  0 siblings, 0 replies; 29+ messages in thread
From: Randy.Dunlap @ 2002-10-31 16:30 UTC
  To: Alan Cox
  Cc: Eric W. Biederman, Dave Jones, boissiere,
	Linux Kernel Mailing List

On 31 Oct 2002, Alan Cox wrote:

| On Thu, 2002-10-31 at 06:22, Eric W. Biederman wrote:
| > I actually like the idea of ECC drivers, and routinely make certain
| > there is a working ECC driver on the systems I ship.  It is so much
| > easier to catch memory errors with good ECC error reporting.  But
| > unless I have slept soundly through a fundamental change, the
| > linux-ecc project currently does not ship quality drivers.  The
| > infrastructure is bad, and the code is not quite correct.
| >
| > If you want I can dig up the drivers I am currently using and send
| > them to you.
|
| That would be really cool
Ditto.

-- 
~Randy


* Re: [STATUS 2.5]  October 30, 2002
  2002-10-31  6:22 ` Eric W. Biederman
  2002-10-31 10:56   ` Alan Cox
  2002-10-31 14:40   ` Dave Jones
@ 2002-10-31 23:01   ` Pavel Machek
  2002-11-01 14:05     ` Eric W. Biederman
  2 siblings, 1 reply; 29+ messages in thread
From: Pavel Machek @ 2002-10-31 23:01 UTC
  To: Eric W. Biederman; +Cc: Dave Jones, boissiere, Linux Kernel

Hi!

> If you want I can dig up the drivers I am currently using and send
> them to you.
> 
> I even have a working memory scrub routine.

What is "memory scrubbing" good for?
								Pavel
-- 
When do you have heart between your knees?


* Re: [STATUS 2.5]  October 30, 2002
  2002-10-31 23:01   ` Pavel Machek
@ 2002-11-01 14:05     ` Eric W. Biederman
  2002-11-01 16:49       ` Alan Cox
  0 siblings, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2002-11-01 14:05 UTC
  To: Pavel Machek; +Cc: Dave Jones, boissiere, Linux Kernel

Pavel Machek <pavel@ucw.cz> writes:

> Hi!
> 
> > If you want I can dig up the drivers I am currently using and send
> > them to you.
> > 
> > I even have a working memory scrub routine.
> 
> What is "memory scrubbing" good for?

When you have a correctable ECC error on a page you need to rewrite the
memory to remove the error.  This prevents the correctable error from becoming
an uncorrectable error if another bit goes bad.  Also if you have a
working software memory scrub routine you can be certain multiple
errors from the same address are actually distinct.  As opposed to
multiple reports of the same error.
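
As a rough illustration of the point above (not code from this thread), here
is a toy Hamming(7,4) code with an extra overall parity bit, i.e. a
single-error-correct, double-error-detect scheme on a 4-bit value. Real memory
controllers use wider codes in hardware; the names encode, decode and scrub are
made up for this sketch. It shows that two bit flips with no rewrite in between
are uncorrectable, while a rewrite after the first flip lets the second one be
corrected as well.

```python
def encode(nibble):
    """Return an 8-bit SECDED codeword (list of bits) for a 4-bit value."""
    code = [0] * 8
    # Data bits live at positions 3, 5, 6, 7 of a Hamming(7,4) word.
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    # Each parity bit covers the positions whose index has that bit set.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # Position 0 holds overall parity so the whole word has even weight.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, data, repaired_codeword)."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    odd_weight = sum(code) % 2
    fixed = list(code)
    if syndrome == 0 and not odd_weight:
        status = "ok"
    elif odd_weight:
        # Single-bit error: the syndrome is the flipped position (0 = parity bit).
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even weight: two bits flipped.
        return "uncorrectable", None, fixed
    data = fixed[3] | fixed[5] << 1 | fixed[6] << 2 | fixed[7] << 3
    return status, data, fixed


def scrub(memory, addr):
    """Rewrite a word with its corrected value, as a scrub routine would."""
    status, _data, fixed = decode(memory[addr])
    if status == "corrected":
        memory[addr] = fixed
    return status


# Two flips with no scrub in between: the data is lost.
unscrubbed = [encode(0b1011)]
unscrubbed[0][5] ^= 1
unscrubbed[0][6] ^= 1
lost_status = decode(unscrubbed[0])[0]

# Same two flips, but the word is scrubbed after the first one.
scrubbed = [encode(0b1011)]
scrubbed[0][5] ^= 1
first_status = scrub(scrubbed, 0)
scrubbed[0][6] ^= 1
second_status, recovered, _ = decode(scrubbed[0])
```

In this model the unscrubbed word ends up "uncorrectable", while the scrubbed
word reports "corrected" twice and still yields the original value.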

Eric


* Re: [STATUS 2.5]  October 30, 2002
  2002-11-01 14:05     ` Eric W. Biederman
@ 2002-11-01 16:49       ` Alan Cox
  2002-11-01 17:00         ` Richard B. Johnson
  0 siblings, 1 reply; 29+ messages in thread
From: Alan Cox @ 2002-11-01 16:49 UTC
  To: Eric W. Biederman
  Cc: Pavel Machek, Dave Jones, boissiere, Linux Kernel Mailing List

On Fri, 2002-11-01 at 14:05, Eric W. Biederman wrote:
> When you have a correctable ECC error on a page you need to rewrite the
> memory to remove the error.  This prevents the correctable error from becoming
> an uncorrectable error if another bit goes bad.  Also if you have a
> working software memory scrub routine you can be certain multiple
> errors from the same address are actually distinct.  As opposed to
> multiple reports of the same error.

Note that this area has some extremely "interesting" properties. For one
you have to be very careful what operation you use to scrub and it's
platform specific. On x86 for example you want to do something like lock
addl $0, mem. A simple read/write isn't safe because if the memory area
is a DMA target your read then write just corrupted data and made the
problem worse not better!
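
The hazard described above can be modeled with a deterministic toy
interleaving (a sketch, not kernel code, and not from this thread). Python
cannot issue a locked instruction, so the atomic case is simply modeled as one
indivisible step; naive_scrub, atomic_scrub and run_atomic are hypothetical
names chosen for the illustration.

```python
def naive_scrub(memory, addr, dma_write=None):
    """Scrub with a separate read and write; a DMA write may land in between."""
    value = memory[addr]          # CPU reads (ECC hands back corrected data)
    if dma_write is not None:
        memory[addr] = dma_write  # device stores fresh data behind the CPU's back
    memory[addr] = value          # CPU writes the stale value back


def atomic_scrub(memory, addr):
    """Model of a locked 'add 0': read and write-back form one indivisible step."""
    memory[addr] = memory[addr] + 0


def run_atomic(memory, addr, dma_write, dma_first):
    """The DMA write can only fall entirely before or entirely after the scrub."""
    if dma_first:
        memory[addr] = dma_write
        atomic_scrub(memory, addr)
    else:
        atomic_scrub(memory, addr)
        memory[addr] = dma_write


racy = [0x11]
naive_scrub(racy, 0, dma_write=0x99)   # DMA data 0x99 is overwritten by stale 0x11

safe_a, safe_b = [0x11], [0x11]
run_atomic(safe_a, 0, 0x99, dma_first=True)
run_atomic(safe_b, 0, 0x99, dma_first=False)
```

In the racy case the word ends up holding the stale 0x11, so the device's data
is gone. With the indivisible step, both orderings leave 0x99 in memory.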


* Re: [STATUS 2.5]  October 30, 2002
  2002-11-01 16:49       ` Alan Cox
@ 2002-11-01 17:00         ` Richard B. Johnson
  2002-11-02 12:19           ` Eric W. Biederman
  0 siblings, 1 reply; 29+ messages in thread
From: Richard B. Johnson @ 2002-11-01 17:00 UTC
  To: Alan Cox
  Cc: Eric W. Biederman, Pavel Machek, Dave Jones, boissiere,
	Linux Kernel Mailing List

On 1 Nov 2002, Alan Cox wrote:

> On Fri, 2002-11-01 at 14:05, Eric W. Biederman wrote:
> > When you have a correctable ECC error on a page you need to rewrite the
> > memory to remove the error.  This prevents the correctable error from becoming
> > an uncorrectable error if another bit goes bad.  Also if you have a
> > working software memory scrub routine you can be certain multiple
> > errors from the same address are actually distinct.  As opposed to
> > multiple reports of the same error.
> 
> Note that this area has some extremely "interesting" properties. For one
> you have to be very careful what operation you use to scrub and it's
> platform specific. On x86 for example you want to do something like lock
> addl $0, mem. A simple read/write isn't safe because if the memory area
> is a DMA target your read then write just corrupted data and made the
> problem worse not better!
> 

The correctable ECC is supposed to be just that (correctable). It's
supposed to be entirely transparent to the CPU/Software. An additional
read of the affected error produces the same correction so the CPU
will never even know. The x86 CPU/Software is only notified on an
uncorrectable error. I don't know of any SDRAM controller that
generates an interrupt upon a correctable error. Some store "logging"
information internally, very difficult to get at on a running system.

Given that, "scrubbing" RAM seems to be somewhat useless on a
running system. The next write to the affected area will fix the
ECC bits, that's what is supposed to clear up the condition.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
   Bush : The Fourth Reich of America


* RE: [STATUS 2.5]  October 30, 2002
@ 2002-11-01 18:17 Ed Vance
  2002-11-01 18:46 ` Malcolm Beattie
  0 siblings, 1 reply; 29+ messages in thread
From: Ed Vance @ 2002-11-01 18:17 UTC
  To: 'Richard B. Johnson'; +Cc: Alan Cox, Linux Kernel Mailing List

On Fri, November 01, 2002 at 9:00 AM, Richard B. Johnson wrote:
> [...]
> The correctable ECC is supposed to be just that (correctable). It's
> supposed to be entirely transparent to the CPU/Software. An additional
> read of the affected error produces the same correction so the CPU
> will never even know. The x86 CPU/Software is only notified on an
> uncorrectable error. I don't know of any SDRAM controller that
> generates an interrupt upon a correctable error. Some store "logging"
> information internally, very difficult to get at on a running system.
> 
> Given that, "scrubbing" RAM seems to be somewhat useless on a
> running system. The next write to the affected area will fix the
> ECC bits, that's what is supposed to clear up the condition.
> 

Scrubbing has nothing whatever to do with reporting of correctable errors to
the CPU, even if the CPU does the scrubbing.

Scrubbing does not happen on the basis of chance detection of correctable
errors from normal activity, because that would sometimes be too late.
Remember, the hardware only finds out about an error when the word is
accessed. There is no detection of the bit cell getting its charge altered,
and the errors are cumulative between corrections. 

Scrubbing is intended to lower the probability that any given memory word
will be hit by a second error-causing event (such as an alpha particle
emitted from a ceramic case) without having been accessed and corrected. The
scrub just continuously rolls through all of physical memory (at low
priority) again and again doing whatever level of access is necessary to
cause correction. This limits the maximum time between correction of any
memory word. Some memory systems automatically correct and rewrite
(atomically) on a read of a word with a single bit error. Some mainframe
memory systems do the whole ECC scrub/correction operation in hardware,
simultaneously in each bank. 
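
As a rough worked example of "limits the maximum time between correction": the longest any word waits is about one full pass, that is, memory size divided by the rate the scrubber is allowed to run at. The sketch below is only an illustration. The function name and the figures (4 GiB of RAM, a scrubber throttled to 16 MiB/s) are assumptions, not numbers from any real system.

```python
def scrub_pass_seconds(mem_bytes: int, scrub_bytes_per_sec: float) -> float:
    """Time for one full pass over physical memory, which bounds how long
    any single word can go without being read and corrected."""
    if scrub_bytes_per_sec <= 0:
        raise ValueError("scrub rate must be positive")
    return mem_bytes / scrub_bytes_per_sec


GIB = 1 << 30
MIB = 1 << 20

# Hypothetical machine: 4 GiB of RAM, scrubber throttled to 16 MiB/s.
pass_time = scrub_pass_seconds(4 * GIB, 16 * MIB)
print(f"one scrub pass: {pass_time:.0f} s")  # one scrub pass: 256 s
```

Raising the scrub rate shortens the window in which a second hit can land on an already damaged word, at the cost of more memory bandwidth.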

The primary benefit of logging is to catch deteriorating memory cells during
periodic maintenance that either do not correct at all (single stuck bit,
single hits become uncorrectable) or that repeatedly fail over time, perhaps
due to charge leaks from long term diffusion of contaminants. 

Cheers,
Ed

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [STATUS 2.5]  October 30, 2002
  2002-11-01 18:17 Ed Vance
@ 2002-11-01 18:46 ` Malcolm Beattie
  0 siblings, 0 replies; 29+ messages in thread
From: Malcolm Beattie @ 2002-11-01 18:46 UTC (permalink / raw
  To: Ed Vance
  Cc: 'Richard B. Johnson', Alan Cox, Linux Kernel Mailing List

Ed Vance writes:
>                                                           Some mainframe
> memory systems do the whole ECC scrub/correction operation in hardware,
> simultaneously in each bank. 

For those interested in the gory details of how the z900 mainframe
does memory scrubbing, see the section on "Memory" in
"RAS design for the IBM eServer z900" by L. C. Alves et al
in the z900 issue of IBM Journal of Research and Development.
HTML version at
    http://www.research.ibm.com/journal/rd/464/alves.html
PDF version at
    http://www.research.ibm.com/journal/rd/464/alves.pdf
Web page for whole issue at
    http://www.research.ibm.com/journal/rd46-45.html

--Malcolm

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [STATUS 2.5] October 30, 2002
@ 2002-11-01 19:14 Ken Ryan
  2002-11-01 19:56 ` Richard B. Johnson
  0 siblings, 1 reply; 29+ messages in thread
From: Ken Ryan @ 2002-11-01 19:14 UTC (permalink / raw
  To: Linux Kernel Mailing List

>Given that, "scrubbing" RAM seems to be somewhat useless on a
>running system. The next write to the affected area will fix the
>ECC bits, that's what is supposed to clear up the condition.

If a region of RAM isn't written to it won't help, and may accumulate
additional errors.  Kernel code, for instance, can then rot
away.  Scrubbing guarantees that all locations in memory get rewritten
periodically, so correctable errors are removed.

I first saw this when I was brought in to help on a design for a
spacecraft.  Even rad-hard devices (these weren't) will flip a bit in a
matter of hours due to background radiation.  Non-hardened memories can
get errors within minutes.  Scrubbing assured the system would only notice
once every few years (when too many bits get flipped in a word during the
scrub interval).

		Ken Ryan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [STATUS 2.5] October 30, 2002
  2002-11-01 19:14 Ken Ryan
@ 2002-11-01 19:56 ` Richard B. Johnson
  2002-11-01 21:27   ` Ken Ryan
  0 siblings, 1 reply; 29+ messages in thread
From: Richard B. Johnson @ 2002-11-01 19:56 UTC (permalink / raw
  To: Ken Ryan; +Cc: Linux Kernel Mailing List

On Fri, 1 Nov 2002, Ken Ryan wrote:

> 
> >Given that, "scrubbing" RAM seems to be somewhat useless on a
> >running system. The next write to the affected area will fix the
> >ECC bits, that's what is supposed to clear up the condition.
> 
> If a region of RAM isn't written to it won't help, and may accumulate
> additional errors.  Kernel code, for instance, can then rot
> away.  Scrubbing guarantees that all locations in memory get rewritten
> periodically, so correctable errors are removed.
> 
> I first saw this when I was brought in to help on a design for a
> spacecraft.  Even rad-hard devices (these weren't) will flip a bit in a
> matter of hours due to background radiation.  Non-hardened memories can
> get errors within minutes.  Scrubbing assured the system would only notice
> once every few years (when too many bits get flipped in a word during the
> scrub interval).
> 
> 		Ken Ryan
> 

Hang with me a second. This gets complicated and is
not anything that naive "scrubbing" can fix on a
desktop machine.

With a conventional ix86 machine, you have uncorrectable
errors reported via NMI. Some specialized machines have
correctable errors reported by maskable interrupt. For
instance, the AMD SC520's SDRAM memory controller can
set a bit upon a correctable error and this can be mapped
to a maskable interrupt but you still have little information
about what caused the interrupt. Upon either interrupt,
the return address points to code to be continued. Nothing
points to the address of the memory causing an error.
Now, internal to the SDRAM controller, there are registers
that can be used to identify the "bank" that caused the
problem. It would require the kernel to completely understand
the memory configuration in order to isolate this to an
address. Further, to read the SDRAM controller, requires that
refresh be turned OFF, etc. Not a good thing to do on a
live system.

But, in principle, one could read all the pages addressable
from each of the segments, CS, SS, DS, ES, FS, GS, and try to
do what? Make another error, causing a double-fault?
I think not. That is the problem with handling ECC errors.
That's also the reason why VAX/VMS would map out any RAM that
caused such an error, by killing off the process and making
all the RAM accessible to the process (without a page-fault),
"owned" by a non-existent process called "Bad Pages". There
wasn't really anything else you could do. If the RAM was
owned by the kernel, you got a "Fatal machine-check" and
that's all she wrote.

Now, given this, if you read all the RAM in the machine at, say
ten-second intervals, do you think you would fix anything? What
would happen is the memory locations that got corrupt would be
read and you would have a fatal ECC error. Most of these memory
locations would have never even been accessed, and therefore
the fatal error would have never happened if you didn't force
the fatality by reading bad locations. If you turned OFF ECC
when you read all the memory, you just made good ECC check-bits
out of bad ones. The data is corrupt and will never be reported.

So, ten seconds after you have some cosmic-ray upset, you guarantee
that your machine will crash if you read everything every ten
seconds. This will never be acceptable. You need to leave the
machine alone and not try to "pick scabs". That's how you get
the best reliability. Also, at some periodic intervals, you
re-boot (restart) the whole machine, reinitializing everything
including all the RAM.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
   Bush : The Fourth Reich of America

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [STATUS 2.5] October 30, 2002
  2002-11-01 19:56 ` Richard B. Johnson
@ 2002-11-01 21:27   ` Ken Ryan
  0 siblings, 0 replies; 29+ messages in thread
From: Ken Ryan @ 2002-11-01 21:27 UTC (permalink / raw
  To: Richard B. Johnson; +Cc: Linux Kernel Mailing List

Actually, it's much simpler: with hardware ECC (correction, not just
detection) the OS never needs to know what happened.

Let's say a single bit cell gets corrupted, e.g. changes state
because of an alpha particle.  When that word is read, the ECC logic
corrects the error and presents the intended value to the bus (note it
doesn't matter if it's CPU, a DMA, or whatever).  If the read was a scrub
operation, the same value is immediately written back to the same
location.  This overwrites the bad value with a correct one, making the
error go away.  Therefore if a later event corrupts another bit in that
word, it doesn't get beyond what the ECC can handle; whereas if the word
was never rewritten it may accumulate two, three, four etc. errors until
the ECC logic can't fix it anymore.

So the mere act of reading and rewriting makes errors go away so long as
it hasn't exceeded the capability of the ECC logic.  This therefore
reduces the odds of an uncorrectable error to the chance of multiple bits
flipping within a short time, which is good enough for life-critical
systems on the Space Station.
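
The read, correct, write-back cycle can be sketched with a toy code. This uses a plain Hamming(7,4) code in Python purely as an assumption for illustration. Real ECC memory typically uses a wider SEC-DED code that would flag a double error as uncorrectable, whereas this toy code silently miscorrects it. The helper names are made up.

```python
def encode(nibble):
    """Hamming(7,4): return 7 code bits (positions 1..7) for a 4-bit value."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                          # index 0 unused
    code[3], code[5], code[6], code[7] = d  # data bit positions
    code[1] = code[3] ^ code[5] ^ code[7]   # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]   # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]   # parity over positions 4,5,6,7
    return code[1:]


def correct(word):
    """Return (value, corrected_word), fixing at most one flipped bit."""
    code = [0] + list(word)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                 # nonzero syndrome = bad position
    if syndrome:
        code[syndrome] ^= 1
    data = (code[3], code[5], code[6], code[7])
    value = sum(bit << i for i, bit in enumerate(data))
    return value, code[1:]


def flip(word, pos):
    """Return a copy of word with the bit at 1-based position pos inverted."""
    out = list(word)
    out[pos - 1] ^= 1
    return out


stored = encode(0b1011)

# With a scrub between the two hits, each hit is a single-bit error.
hit_once = flip(stored, 3)
_, scrubbed = correct(hit_once)             # scrub: read, correct, write back
value_scrubbed, _ = correct(flip(scrubbed, 6))

# Without a scrub, the two hits pile up in the same word.
value_unscrubbed, _ = correct(flip(hit_once, 6))

print(value_scrubbed == 0b1011)    # True
print(value_unscrubbed == 0b1011)  # False
```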

As you mentioned, correctable errors can optionally be reported to the
OS.  This is useful (to an extent) for predicting failures[1]; the same
correctable error showing up repeatedly in the same bit indicates a weak
cell.

Note Alan's point was if an unrelated write gets to the memory word
between the read and rewrite, that's very bad.  I don't know the x86
architecture well enough to comment on whether 'lock' is adequate to
prevent DMA from sneaking in; that's one reason why we put the scrub
operation in the DRAM controller hardware (this was a custom hardware
design [not running Linux :-( ]).

		ken

[1] Or as a thermometer.  I had a Sun workstation which would spew ECC
corrections only over weekends; it took a few weeks of consternation to
realize it was because the building air conditioning was shut off then.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [STATUS 2.5] October 30, 2002
@ 2002-11-01 22:25 Ed Vance
  2002-11-02  0:33 ` Werner Almesberger
  0 siblings, 1 reply; 29+ messages in thread
From: Ed Vance @ 2002-11-01 22:25 UTC (permalink / raw
  To: 'Richard B. Johnson'; +Cc: Ken Ryan, Linux Kernel Mailing List

On Fri, November 01, 2002 at 11:56 AM, Richard B. Johnson wrote:
> [...]
> So, ten seconds after you have some cosmic-ray upset, you guarantee
> that your machine will crash if you read everything every ten
> seconds. This will never be acceptable. You need to leave the
> machine alone and not try to "pick scabs". That's how you get
> the best reliability. Also, at some periodic intervals, you
> re-boot (restart) the whole machine, reinitializing everything
> including all the RAM.
> 
Here's a Monty Python analogy to ECC memory scrubbing:

Do you remember the battle between Arthur and the Black Knight? 

Without scrubbing, the memory bits suffer damage at a more or less constant
rate, like the Black Knight. The damage accumulates and eventually renders
the Black Knight non-functional. For the memory, this would be an
uncorrectable error from the accumulation of many separate bit error events.

With scrubbing, the memory bits and the Black Knight suffer damage at the
same rate, but this time the Black Knight is able to stick his limbs back on
(while fighting) after Arthur hacks them off. If the Black Knight's rate of
sticking his limbs back on equals Arthur's rate of hacking his limbs off,
the Black Knight will sustain the same amount of damage, but will remain
functional as long as he can keep up. For the memory, the many separate bit
error events would cause only correctable errors, as long as the scrubbing
can keep up.

cheers,
Ed

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [STATUS 2.5] October 30, 2002
  2002-11-01 22:25 Ed Vance
@ 2002-11-02  0:33 ` Werner Almesberger
  0 siblings, 0 replies; 29+ messages in thread
From: Werner Almesberger @ 2002-11-02  0:33 UTC (permalink / raw
  To: Ed Vance
  Cc: 'Richard B. Johnson', Ken Ryan, Linux Kernel Mailing List

Ed Vance wrote:
> functional as long as he can keep up. For the memory, the many separate bit
> error events would cause only correctable errors, as long as the scrubbing
> can keep up.

Don't those bit errors have a Poissonian character ? If so, it's
impossible to "keep up". All you can do is make the interval small
enough that, on average, it takes a long time until you get hit
twice (or more often) in that interval.
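
To put numbers on that: if hits on one word arrive as a Poisson process with rate r, the chance of two or more hits within one scrub interval T is 1 - e^(-rT) * (1 + rT). The Python sketch below evaluates this. The rate and the intervals are hypothetical values picked only to show the trend, and the function name is made up.

```python
import math


def p_multi_hit(rate_per_sec, interval_sec):
    """P(at least two hits on one word within one scrub interval),
    assuming Poisson arrivals with the given per-word rate."""
    mean = rate_per_sec * interval_sec
    # 1 - P(0 hits) - P(1 hit)
    return -math.expm1(-mean) - mean * math.exp(-mean)


# Hypothetical per-word hit rate; not a measured figure.
rate = 1e-6
for interval in (3600.0, 1800.0, 900.0):
    print(f"interval {interval:6.0f} s -> P(>=2 hits) = "
          f"{p_multi_hit(rate, interval):.3e}")
```

For small rT the probability is close to (rT)^2 / 2, so halving the interval cuts it by roughly a factor of four, but it never reaches zero.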

A better example would be car tires on roads with many randomly
distributed sharp objects (i.e. such that age does not significantly
change the odds of tire damage): you can keep going as long as you
can get a flat tire fixed before another tire gets punctured. But
sometimes, you may end up with two flat tires, and need a tow truck.

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [STATUS 2.5]  October 30, 2002
  2002-11-01 17:00         ` Richard B. Johnson
@ 2002-11-02 12:19           ` Eric W. Biederman
  2002-11-04 14:31             ` Richard B. Johnson
  0 siblings, 1 reply; 29+ messages in thread
From: Eric W. Biederman @ 2002-11-02 12:19 UTC (permalink / raw
  To: root
  Cc: Alan Cox, Pavel Machek, Dave Jones, boissiere,
	Linux Kernel Mailing List

"Richard B. Johnson" <root@chaos.analogic.com> writes:

> On 1 Nov 2002, Alan Cox wrote:
> 
> > On Fri, 2002-11-01 at 14:05, Eric W. Biederman wrote:
> > > When you have a correctable ECC error on a page you need to rewrite the
> > > memory to remove the error.  This prevents the correctable error from becoming
> > > an uncorrectable error if another bit goes bad.  Also if you have a
> > > working software memory scrub routine you can be certain multiple
> > > errors from the same address are actually distinct.  As opposed to
> > > multiple reports of the same error.
> > 
> > Note that this area has some extremely "interesting" properties. For one
> > > you have to be very careful what operation you use to scrub and it's
> > > platform specific. On x86 for example you want to do something like lock
> > > addl $0, mem. A simple read/write isn't safe because if the memory area
> > is a DMA target your read then write just corrupted data and made the
> > problem worse not better!

yep lock addl $0, mem  with the appropriate kmaps so it will work on any system
I use.  It isn't rocket science but since it is using kmap_atomic that function
at least should probably get in the kernel.
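
The hazard Alan describes is an ordering problem, and a toy interleaving in Python shows it. This is a simulation only, not kernel code: the dictionary standing in for memory, the function names, and the fixed interleaving are all assumptions for illustration. The "atomic" variant simply models the read and write-back as one indivisible step, which is the property the locked instruction is meant to provide.

```python
def plain_scrub_with_dma(memory, addr, dma_value):
    """Non-atomic scrub: a DMA write lands between the read and the write."""
    seen = memory[addr]          # scrubber reads the (corrected) word
    memory[addr] = dma_value     # device DMA stores fresh data
    memory[addr] = seen          # scrubber writes back the stale value
    return memory[addr]


def atomic_scrub_with_dma(memory, addr, dma_value):
    """Atomic scrub: read and write-back cannot be split, so the DMA write
    lands either before or after the whole operation (after, in this run)."""
    memory[addr] = memory[addr] + 0   # modelled as one indivisible add of zero
    memory[addr] = dma_value          # device DMA stores fresh data
    return memory[addr]


mem = {0x1000: 0xAAAA}
print(hex(plain_scrub_with_dma(dict(mem), 0x1000, 0xBBBB)))   # 0xaaaa
print(hex(atomic_scrub_with_dma(dict(mem), 0x1000, 0xBBBB)))  # 0xbbbb
```

In the first case the device's data is lost, which is the "made the problem worse" outcome. In the second the DMA data survives.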

> The correctable ECC is supposed to be just that (correctable). It's
> supposed to be entirely transparent to the CPU/Software. An additional
> read of the affected error produces the same correction so the CPU
> will never even know. The x86 CPU/Software is only notified on an
> uncorrectable error. I don't know of any SDRAM controller that
> generates an interrupt upon a correctable error. Some store "logging"
> information internally, very difficult to get at on a running system.

Polling the memory controller periodically isn't hard, and you can usually
get an interrupt as well.  Though I have not explored the whole interrupt
territory.  Finding out when you have a corrected error is extremely useful
as it gives a warning that your memory is going bad.  Just like with a disk
getting a bunch of errors means it is time to be replaced, but you still
have a little time left.

> Given that, "scrubbing" RAM seems to be somewhat useless on a
> running system. The next write to the affected area will fix the
> ECC bits, that's what is supposed to clear up the condition.

If it is your kernel text space that is getting the error there will
be no next write.

Beyond that, if you are trying to see whether the multiple correctable errors
you have are a single error or an actual problem, software scrubbing helps,
because then you know the second report came in because the problem recurred,
making it likely you have a bad bit in your DIMM.
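
A minimal sketch of that bookkeeping in Python, assuming every reported location is scrubbed right after the report so that a repeat at the same address means a new error. The log format (one address per corrected-error report), the threshold, and the function name are made up for illustration.

```python
from collections import Counter


def suspect_addresses(reports, threshold=2):
    """Return addresses that reported a corrected error at least
    `threshold` times. With a scrub after each report, repeats mean
    the error recurred rather than being reported twice."""
    counts = Counter(reports)
    return sorted(addr for addr, n in counts.items() if n >= threshold)


# Hypothetical log of corrected-error addresses.
log = [0x1F4000, 0x2A0000, 0x1F4000, 0x1F4000, 0x3C8000]
print([hex(a) for a in suspect_addresses(log)])  # ['0x1f4000']
```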

Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [STATUS 2.5]  October 30, 2002
  2002-11-02 12:19           ` Eric W. Biederman
@ 2002-11-04 14:31             ` Richard B. Johnson
  2002-11-04 15:58               ` Eric W. Biederman
  0 siblings, 1 reply; 29+ messages in thread
From: Richard B. Johnson @ 2002-11-04 14:31 UTC (permalink / raw
  To: Eric W. Biederman
  Cc: Alan Cox, Pavel Machek, Dave Jones, boissiere,
	Linux Kernel Mailing List

On 2 Nov 2002, Eric W. Biederman wrote:

> "Richard B. Johnson" <root@chaos.analogic.com> writes:
> 
> > On 1 Nov 2002, Alan Cox wrote:
> > 
> > > On Fri, 2002-11-01 at 14:05, Eric W. Biederman wrote:
> > > > When you have a correctable ECC error on a page you need to rewrite the
> > > > memory to remove the error.  This prevents the correctable error from becoming
> > > > an uncorrectable error if another bit goes bad.  Also if you have a
> > > > working software memory scrub routine you can be certain multiple
> > > > errors from the same address are actually distinct.  As opposed to
> > > > multiple reports of the same error.
> > > 
> > > Note that this area has some extremely "interesting" properties. For one
> > > > you have to be very careful what operation you use to scrub and it's
> > > > platform specific. On x86 for example you want to do something like lock
> > > > addl $0, mem. A simple read/write isn't safe because if the memory area
> > > is a DMA target your read then write just corrupted data and made the
> > > problem worse not better!
> 
> yep lock addl $0, mem  with the appropriate kmaps so it will work on any system
> I use.  It isn't rocket science but since it is using kmap_atomic that function
> at least should probably get in the kernel.
> 
> > The correctable ECC is supposed to be just that (correctable). It's
> > supposed to be entirely transparent to the CPU/Software. An additional
> > read of the affected error produces the same correction so the CPU
> > will never even know. The x86 CPU/Software is only notified on an
> > uncorrectable error. I don't know of any SDRAM controller that
> > generates an interrupt upon a correctable error. Some store "logging"
> > information internally, very difficult to get at on a running system.
> 
> Polling the memory controller periodically isn't hard, and you can usually
> get an interrupt as well.  Though I have not explored the whole interrupt
> territory.  Finding out when you have a corrected error is extremely useful
> as it gives a warning that your memory is going bad.  Just like with a disk
> getting a bunch of errors means it is time to be replaced, but you still
> have a little time left.
> 
> > Given that, "scrubbing" RAM seems to be somewhat useless on a
> > running system. The next write to the affected area will fix the
> > ECC bits, that's what is supposed to clear up the condition.
> 
> If it is your kernel text space that is getting the error there will
> be no next write.
> 
> Beyond that, if you are trying to see whether the multiple correctable errors
> you have are a single error or an actual problem, software scrubbing helps,
> because then you know the second report came in because the problem recurred,
> making it likely you have a bad bit in your DIMM.
> 
> Eric
> 

The initial premise is fundamentally flawed. That being
that the first error you get will be a single-bit error.

Memory is not a bunch of randomly spaced bits that get
coalesced into bytes/shorts/longs when accessed. Instead,
all the bits in a word are in the same general area. This
means that a nuclear event will alter several. In fact,
a nuclear event will likely put an "electronic hole" in
a physical area of memory. This area may cross several
memory "block" boundaries. These blocks are not usually
related to physical pages at all. These blocks have
different bit-densities depending upon the type and
manufacturer.  Typical bit-densities are 16, 64, 128,
and 256 megabits.  They are organized into banks so
they can be addressed in rows and columns, minimizing
the hardware. The result of a nuclear event may look
like this:

Base
         _______________________________________
0x1000   | Bank 1 |  Bank 2 |  Bank 3 | Bank 4 |
0x8000   |        |         | /       |        |
0x10000  |--------|---------/---------|--------|
0x18000  | Bank 5 |  Bank / |  Bank 7 | Bank 8 |
0x20000  |        |     /   |         |        |
0x0000   -------------/-------------------------
                    /
Particle trail---->

In this case, the event altered bits in bank 6 and
bank 3. It may have altered bits in 2 and 7 also.
The hits altered bits at many memory addresses as
the diagram shows. The bits that got altered are
in the hundreds of thousands (of bits). If you read
these areas, without disabling ECC, you will get
an NMI. If you read these areas with modern ECC
hardware, the read, just like a write, will correct
the ECC bits.  Therefore, you have "fixed" corrupt
memory data.  This is not good.

Isolating a bad bit in RAM caused by bad RAM is not
done by memory "scrubbing", it is done by having the
NMI handler disable access to the bad RAM. In an ix86
machine, that task is very difficult because the handler,
unlike a page-fault handler, has no direct knowledge of
the page being accessed when the NMI occurred. One could
"inspect" the code leading up to the fault, and guess what
memory access occurred but that access is quite likely
in the .text segment which means the code isn't even
correct to inspect.  This stuff is possible to do, and
now that gigabytes of RAM are commonplace, it would
probably be a welcome addition to the kernel because the
probability of a single-bit error in ba-zillions of bits
is quite high.

Any "memory scrubbing" routines are worthless and simply
eat CPU cycles. Further, because of the well-established
principle of locality-of-action, you can have multiple
pages of trashed data in RAM, owned by all those sleeping
processes, that won't be accessed until the next boot.
If you want a reliable system, it's better to let sleeping
dogs lie and not access that RAM. You certainly don't want
to "scrub" it. That's like picking a scab. It will bleed.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
   Bush : The Fourth Reich of America

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [STATUS 2.5]  October 30, 2002
  2002-11-04 14:31             ` Richard B. Johnson
@ 2002-11-04 15:58               ` Eric W. Biederman
  0 siblings, 0 replies; 29+ messages in thread
From: Eric W. Biederman @ 2002-11-04 15:58 UTC (permalink / raw
  To: root
  Cc: Alan Cox, Pavel Machek, Dave Jones, boissiere,
	Linux Kernel Mailing List

"Richard B. Johnson" <root@chaos.analogic.com> writes:

> The initial premise is fundamentally flawed. That being
> that the first error you get will be a single-bit error.

I did not say a single bit error I said a correctable error.  Which
can recover if a single chip on a pair of DIMMs goes bad.

What I have seen in practice is that during manufacturing it is pretty
random whether the first error from bad memory will be correctable
or uncorrectable.  Once the memory is running error free it is
quite likely the first error will be a correctable error.  Especially
when it is the RAM that is going bad.
> 
> Isolating a bad bit in RAM caused by bad RAM is not
> done by memory "scrubbing", it is done by having the
> NMI handler disable access to the bad RAM. 

Scrubbing is for making certain the correction is written back to the RAM.
Many chipsets will correct the data going to the processor, but will leave
it corrupted in RAM.  Allowing the possibility of errors to accumulate,
and making it hard to tell if multiple reports are from the same
error or a different error.

>In an ix86
> machine, that task is very difficult because the handler,
> unlike a page-fault handler, has no direct knowledge of
> the page being accessed when the NMI occurred. 

We are obviously working with quite different hardware.  Intel
chipsets routinely report an ECC error on the page level granularity.

> One could
> "inspect" the code leading up to the fault, and guess what
> memory access occurred but that access is quite likely
> in the .text segment which means the code isn't even
> correct to inspect.  

I have never seen an NMI error trigger a cpu exception that is
synchronous with the code, though that may be possible with
an Athlon, which does the ECC correction in the CPU.  In general the
errors come in asynchronously at some point after the error occurred.
So even killing the task that is using the bad RAM is unreliable.
If the error is not correctable, on a server I panic the machine.

> This stuff is possible to do, and
> now that gigabytes of RAM are commonplace, it would
> probably be a welcome addition to the kernel because the
> probability of a single-bit error in ba-zillions of bits
> is quite high.
> 
> Any "memory scrubbing" routines are worthless and simply
> eat CPU cycles. 

Functional memory in practice does not have ECC errors, so 
ECC code does not run.  I only run the scrub routine on memory
that has reported a correctable error.  And I think
1200 machines with 4GB each, running processor intensive tasks is a
reasonable sample to make this conclusion with.

>Further, because of the well-established
> principle of locality-of-action, you can have multiple
> pages of trashed data in RAM, owned by all those sleeping
> processes, that won't be accessed until the next boot.
> If you want a reliable system, it's better to let sleeping
> dogs lie and not access that RAM. You certainly don't want
> to "scrub" it. That's like picking a scab. It will bleed.

I do not randomly scrub memory, though for the hardware that does
not do that I would not be opposed to the idea of a daemon that does.
The biggest problem with doing that in the cpu is that you are likely
to trash your cache.

One of the bigger challenges to work through is that the BIOS frequently
leaves a few ECC errors after setting up RAM.  So a cpu scrubber might trigger
those.  Replacing the BIOS is a good way to be certain that doesn't
happen :)

Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [STATUS 2.5] October 30, 2002
@ 2002-11-04 17:14 Ed Vance
  0 siblings, 0 replies; 29+ messages in thread
From: Ed Vance @ 2002-11-04 17:14 UTC (permalink / raw
  To: 'Werner Almesberger'
  Cc: 'Richard B. Johnson', Ken Ryan, Linux Kernel Mailing List

On Friday, November 01, 2002 at 4:33 PM, Werner Almesberger wrote:
> Ed Vance wrote:
> > functional as long as he can keep up. For the memory, the many separate 
> > bit error events would cause only correctable errors, as long as the 
> > scrubbing can keep up.
> 
> Don't those bit errors have a Poissonian character ? If so, it's
> impossible to "keep up". All you can do is make the interval small
> enough that, on average, it takes a long time until you get hit
> twice (or more often) in that interval.

Yes.
> 
> A better example would be car tires on roads with many randomly
> distributed sharp objects (i.e. such that age does not significantly
> change the odds of tire damage): you can keep going as long as you
> can get a flat tire fixed before another tire gets punctured. But
> sometimes, you may end up with two flat tires, and need a tow truck.
> 
I was just trying to get across the reversible nature of this kind 
of externally induced error. Richard's analogy was that scrubbing memory 
is like picking scabs. Perhaps immune reaction would be closer, because 
it tends to detect and fix small problems before they become big problems.
I don't think anybody is going to be convinced here. Sounds like the 
issue is not a lack of information. I like your car analogy - I had 
a very similar road trip between Missouri and Florida. 

Best regards,
Ed

^ permalink raw reply	[flat|nested] 29+ messages in thread

Thread overview: 29+ messages
2002-10-30 16:17 [STATUS 2.5] October 30, 2002 Dave Jones
2002-10-30 17:14 ` Randy.Dunlap
2002-10-31  6:22 ` Eric W. Biederman
2002-10-31 10:56   ` Alan Cox
2002-10-31 16:30     ` Randy.Dunlap
2002-10-31 14:40   ` Dave Jones
2002-10-31 23:01   ` Pavel Machek
2002-11-01 14:05     ` Eric W. Biederman
2002-11-01 16:49       ` Alan Cox
2002-11-01 17:00         ` Richard B. Johnson
2002-11-02 12:19           ` Eric W. Biederman
2002-11-04 14:31             ` Richard B. Johnson
2002-11-04 15:58               ` Eric W. Biederman
  -- strict thread matches above, loose matches on Subject: below --
2002-11-04 17:14 Ed Vance
2002-11-01 22:25 Ed Vance
2002-11-02  0:33 ` Werner Almesberger
2002-11-01 19:14 Ken Ryan
2002-11-01 19:56 ` Richard B. Johnson
2002-11-01 21:27   ` Ken Ryan
2002-11-01 18:17 Ed Vance
2002-11-01 18:46 ` Malcolm Beattie
2002-10-30 15:13 Guillaume Boissiere
2002-10-30 15:55 ` YOSHIFUJI Hideaki / 吉藤英明
2002-10-30 22:36   ` David S. Miller
2002-10-31  2:48     ` YOSHIFUJI Hideaki / 吉藤英明
2002-10-31  2:44       ` David S. Miller
2002-10-31  3:07         ` kuznet
2002-10-31  3:16         ` YOSHIFUJI Hideaki / 吉藤英明
2002-10-31  3:13           ` David S. Miller
