All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* possible use-after-free in 2.5.44 scsi changes
@ 2002-10-25  1:39 Andrew Morton
  2002-10-25  4:06 ` Doug Ledford
                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Andrew Morton @ 2002-10-25  1:39 UTC (permalink / raw
  To: linux-scsi@vger.kernel.org
  Cc: Badari Pulavarty, Martin J. Bligh, Jens Axboe, Doug Ledford


Gents,

we have some code in the -mm patchsets which adds a per-cpu
LIFO pool which frontends the page allocator.  To return pages
which are cache-warm on the calling CPU.

That code has been stable and unchanging since 2.5.40.  But in
2.5.44, Badari's machines are crashing when those patches are
applied.  Memory corruption deep in the scsi softirq callbacks.

There were no significant memory allocator changes between 2.5.43
and 2.5.44, but there were a lot of scsi changes.

These LIFO pools have a significant sideeffect: is a CPU frees
a page and then allocates a page it will get the same page back.
So if some code is using some memory for a few microseconds after
freeing it, there's a good chance that this bug will not exhibit
with the stock kernel's allocator but _will_ exhibit when the
per-cpu LIFO queues are present.

I'm suspecting that this is what is happening.  A use-after-free
bug may have been introduced in the 2.5.44 SCSI changes.

Here are some of Badari's oops traces.  Jens has commented:

> Seems to me, that blk_rq_map_sg() is being passed q == NULL, which means
> that scsi_init_io() has a request that has req->q == NULL. Badari,
> please do something like the following in
> drivers/scsi/scsi_merge.c:scsi_init_io():
> 
>         req->buffer = NULL;
> +
> +       if (!req->q) {
> +               blk_dump_rq_flags(req, "scsi_init_io");
> +               BUG();
> +       }
> +
>         /*
>          * Next, walk the list, and fill in the addresses and sizes of
> 
> right before calling blk_rq_map_sg(). And then make blk_dump_rq_flags()
> like the one I attach, just replace the one in ll_rw_blk.c

But I don't think Badari has got onto that yet.

In a word: Help!

Thanks.

CPU:    2
EIP:    0060:[<f89d0cd7>]    Not tainted
EFLAGS: 00010002
EIP is at qla2x00_process_completed_request+0x67/0x150 [qla2200]
eax: 00000000   ebx: f663646c   ecx: 00000000   edx: 00000000
esi: f661c17c   edi: 00000000   ebp: f70f3d7b   esp: f70f3cfc
ds: 0068   es: 0068   ss: 0068
Process syslogd (pid: 542, threadinfo=f70f2000 task=f70d1060)
Stack: f661c17c 0000711a 00007100 f89d0efb f661c17c 000000cc 00008020 c013a348 
       00008020 00007100 00ccaaa4 00000000 c0139641 c03c1e89 f7fff060 f7298c90 
       00000246 c013a348 00000001 000000d0 0000005c c03c4294 f661c17c f661c17c 
Call Trace:
 [<f89d0efb>] qla2x00_isr+0x13b/0x610 [qla2200]
 [<c013a348>] kmalloc+0xb8/0x150
 [<c0139641>] kmem_flagcheck+0x21/0x60
 [<c013a348>] kmalloc+0xb8/0x150
 [<f89cc17b>] qla2x00_intr_handler+0x9b/0x250 [qla2200]
 [<c01094ea>] handle_IRQ_event+0x3a/0x60
 [<c0109802>] do_IRQ+0x122/0x200
 [<c0107f38>] common_interrupt+0x18/0x20
 [<c0130068>] unmap_page_range+0x8/0x60
 [<c01359c1>] generic_file_write_nolock+0x6a1/0xa00
 [<c01924ec>] journal_stop+0x19c/0x1b0
 [<c0354531>] sys_recvfrom+0xa1/0x100
 [<c035457c>] sys_recvfrom+0xec/0x100
 [<c013e526>] __get_free_pages+0x36/0x40
 [<c015d673>] __pollwait+0x33/0xa0



EIP:    0060:[<f89d0ac8>]    Not tainted
EFLAGS: 00010002
EIP is at qla2x00_process_completed_request+0x68/0x150 [qla2200]
eax: 00000000   ebx: f6599574   ecx: 00000000   edx: 00000000
esi: f65f017c   edi: 00000000   ebp: f65f9d0c   esp: f65f9d00
ds: 0068   es: 0068   ss: 0068
Process dd (pid: 1594, threadinfo=f65f8000 task=e26350c0)
Stack: f65f017c 0000711a 00007100 f65f9d6c f89d0ceb f65f017c 00000114 00008020 
       f65f9d60 00008020 f61e34c4 00007100 0114c160 00000000 c0134fe7 f61e34c8 
       0000c160 0000c160 f65f9d68 c014fbe4 f61e34c4 0000c160 00000000 00000008 
Call Trace:
 [<f89d0ceb>] qla2x00_isr+0x13b/0x610 [qla2200]
 [<c0134fe7>] find_get_page+0x37/0x50
 [<c014fbe4>] __find_get_block_slow+0x34/0x120
 [<f89cbffe>] qla2x00_intr_handler+0x9e/0x250 [qla2200]
 [<c01095fd>] handle_IRQ_event+0x2d/0x60
 [<c0109924>] do_IRQ+0x124/0x200




elm3b81 login: Oops: 0000
qla2200  
CPU:    2
EIP:    0060:[<c0287d88>]    Not tainted
EFLAGS: 00010292
EIP is at blk_rq_map_sg+0x18/0x1f0
eax: 00000000   ebx: df3d6800   ecx: f631c270   edx: df3d6800
esi: f631c270   edi: 00000000   ebp: c6149c80   esp: c6149c38
ds: 0068   es: 0068   ss: 0068
Process dd (pid: 1590, threadinfo=c6148000 task=dfcba0c0)
Stack: 00000000 dfcba0c0 c0119c50 c6149c44 c6149c44 c6149c60 c01521c4 f0dfeba4 
       00000000 f7edc464 00000010 c6149c84 00000000 f7edc464 00000020 df3d6800 
       f631c270 00000000 c6149ca0 c02c5f41 00000000 f631c270 f6fb85a4 df3d6800 
Call Trace:
 [<c0119c50>] autoremove_wake_function+0x0/0x40
 [<c01521c4>] bio_destructor+0x44/0x50
 [<c02c5f41>] scsi_init_io+0xb1/0x130
 [<c02c5bdd>] scsi_request_fn+0x2ad/0x480
 [<c02c526e>] scsi_queue_next_request+0x7e/0x1b0
 [<c02c549f>] __scsi_end_request+0xff/0x110
 [<c02c569d>] scsi_io_completion+0x15d/0x3a0
 [<f89cce4f>] qla2x00_done+0x2bf/0x2f0 [qla2200]
 [<c02e0d2f>] sd_rw_intr+0x16f/0x180
 [<c02bf746>] scsi_finish_command+0x96/0xa0
 [<c02bf606>] scsi_softirq+0x86/0x100
 [<c012211b>] do_softirq+0x6b/0xd0
 [<c01099e2>] do_IRQ+0x1e2/0x200
 [<c0107fd8>] common_interrupt+0x18/0x20
 [<c020e28c>] __copy_from_user+0x4c/0x70
 [<c01364e7>] generic_file_write_nolock+0x817/0xb70
 [<c02c526e>] scsi_queue_next_request+0x7e/0x1b0
 [<c0130343>] pte_alloc_map+0x133/0x140
 [<c0131352>] zeromap_page_range+0xe2/0x180
 [<c025a6b1>] read_zero+0x1c1/0x1f0
 [<c01368b6>] generic_file_write+0x56/0x70
 [<c014d2ae>] vfs_write+0xbe/0x160
 [<c01174ca>] do_schedule+0x38a/0x480
 [<c014d3ba>] sys_write+0x2a/0x40



elm3b81 login: Oops: 0002
qla2200  
CPU:    2
EIP:    0060:[<f89d0ac8>]    Not tainted
EFLAGS: 00010002
EIP is at qla2x00_process_completed_request+0x68/0x150 [qla2200]
eax: 00000000   ebx: f6791c7c   ecx: 00000000   edx: 00000000
esi: f678c17c   edi: 00000000   ebp: dfe53d0c   esp: dfe53d00
ds: 0068   es: 0068   ss: 0068
Process dd (pid: 1623, threadinfo=dfe52000 task=c390d140)
Stack: f678c17c 0000421a 00004200 dfe53d6c f89d0ceb f678c17c 000000c6 00008020 
       dfe53d60 00008020 f6265254 00004200 00c6f17c 00000000 c01347a7 f6265258 
       0000f17c 0000f17c dfe53d68 c014f164 f6265254 0000f17c 00000000 00000008 
Call Trace:
 [<f89d0ceb>] qla2x00_isr+0x13b/0x610 [qla2200]
 [<c01347a7>] find_get_page+0x37/0x50
 [<c014f164>] __find_get_block_slow+0x34/0x120
 [<f89cbffe>] qla2x00_intr_handler+0x9e/0x250 [qla2200]
 [<c01095fd>] handle_IRQ_event+0x2d/0x60
 [<c0109924>] do_IRQ+0x124/0x200
 [<c0107fd8>] common_interrupt+0x18/0x20
 [<c020bcdc>] __copy_from_user+0x4c/0x70
 [<c01364f7>] generic_file_write_nolock+0x817/0xb70
 [<c0125cb6>] update_wall_time+0x16/0x50
 [<c0130313>] pte_alloc_map+0x133/0x140
 [<c0131322>] zeromap_page_range+0xe2/0x180
 [<c0258101>] read_zero+0x1c1/0x1f0
 [<c01368c6>] generic_file_write+0x56/0x70
 [<c014d2ce>] vfs_write+0xbe/0x160
 [<c01174ca>] do_schedule+0x38a/0x480
 [<c014d3da>] sys_write+0x2a/0x40
 [<c0107693>] syscall_call+0x7/0xb


 kEerIPn:el    NU L0L06 0p:o[in<tce0r2 8d6e81r9e>f]e r e n cNeot< 4t> aaitn tveidr
tuEaFlL AadGdSr: es0s0 00100029002
b0                                0
IP  prisin taitn gb lkei_prq:_
acp0_2s8g+608x1199            m
0*xp2d10e         /
= 0e0a0x0: 00000000
000   ebx: de731800   ecx: f67290c4   edx: de731800
esi: f67290c4   edi: 00000000   ebp: cc879ebc   esp: cc879e38
ds: 0068   es: 0068   ss: 0068
Process db2sysc (pid: 2034, threadinfo=cc878000 task=dd088820)
Stack: 00000000 dd088820 c0119850 cc879e44 cc879e44 00000001 c0151603 ce514ae4 
       cc878000 00000040 00000000 00000000 c02bdf6d f7ec4344 de731800 f67290c4 
       00000000 cc879ebc c02c390f 00000000 f67290c4 d28c9de4 de731800 00000000 
Call Trace:
 [<c0119850>] autoremove_wake_function+0x0/0x40
 [<c0151603>] bio_destructor+0x43/0x50
 [<c02bdf6d>] scsi_alloc_sgtable+0xbd/0x100
 [<c02c390f>] scsi_init_io+0xaf/0x130
 [<c02c35bd>] scsi_request_fn+0x2ad/0x480
 [<c02c2c6e>] scsi_queue_next_request+0x7e/0x1b0
 [<c02c2e9d>] __scsi_end_request+0xfd/0x110
 [<c02c3099>] scsi_io_completion+0x159/0x380
 [<f89ccf9e>] qla2x00_done+0x2be/0x2f0 [qla2200]
 [<c02de38f>] sd_rw_intr+0x15f/0x170
 [<c02bd292>] scsi_finish_command+0x82/0x90
 [<c02bd166>] scsi_softirq+0x86/0x100
 [<c0121a5b>] do_softirq+0x5b/0xc0
 [<c01098c2>] do_IRQ+0x1e2/0x200
 [<c0107f38>] common_interrupt+0x18/0x20

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-25  1:39 possible use-after-free in 2.5.44 scsi changes Andrew Morton
@ 2002-10-25  4:06 ` Doug Ledford
  2002-10-25  4:40   ` Andrew Morton
  2002-10-25  4:07 ` Patrick Mansfield
  2002-10-25 14:16 ` James Bottomley
  2 siblings, 1 reply; 37+ messages in thread
From: Doug Ledford @ 2002-10-25  4:06 UTC (permalink / raw
  To: Andrew Morton
  Cc: linux-scsi@vger.kernel.org, Badari Pulavarty, Martin J. Bligh,
	Jens Axboe

On Thu, Oct 24, 2002 at 06:39:24PM -0700, Andrew Morton wrote:
> 
> Gents,
> 
> we have some code in the -mm patchsets which adds a per-cpu
> LIFO pool which frontends the page allocator.  To return pages
> which are cache-warm on the calling CPU.
> 
> That code has been stable and unchanging since 2.5.40.  But in
> 2.5.44, Badari's machines are crashing when those patches are
> applied.  Memory corruption deep in the scsi softirq callbacks.
> 
> There were no significant memory allocator changes between 2.5.43
> and 2.5.44, but there were a lot of scsi changes.

[ snip ]

Sorry I haven't been able to do anything since Tuesday and won't be able 
to again until next week around Wednesday or so (company meeting stuff the 
entire time :-(

Anyway, I've got all my current updates pushed to
linux-scsi.bkbits.net/scsi-misc-2.5 and I know James and Patrick have some
fixes in there as well.  I also know someone has been changing around the
scsi merge function and the scsi init io function recently, and it hasn't
been me ;-)  In any case, the thing appears to be leaking memory and might
be partially related to this problem.  If you touch any device on the scsi
bus such that it results in an actual merge between two requests, and a sg
table has to be realloced to a larger size in order to accomodate the
combined sg table size, then it appears that the smaller table(s) are
leaked.  Just try hitting the disk with an e2fsck or similar program and
then try unloading the complete scsi stack to see the failure on attempt
to free the sg table caches when unloading scsi_mod.  Debugging takes a
bit too since it means you have to reboot the machine to be able to load
the scsi modules again :-/ If someone could look into that specific
problem it might give a clue into this other problem (and sorry I can't do
more about it myself right now, it's what I was just starting to look into
when I ran out of time and into meetings).

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-25  1:39 possible use-after-free in 2.5.44 scsi changes Andrew Morton
  2002-10-25  4:06 ` Doug Ledford
@ 2002-10-25  4:07 ` Patrick Mansfield
  2002-10-25 14:16 ` James Bottomley
  2 siblings, 0 replies; 37+ messages in thread
From: Patrick Mansfield @ 2002-10-25  4:07 UTC (permalink / raw
  To: Andrew Morton
  Cc: linux-scsi@vger.kernel.org, Badari Pulavarty, Martin J. Bligh,
	Jens Axboe, Doug Ledford

On Thu, Oct 24, 2002 at 06:39:24PM -0700, Andrew Morton wrote:
> 
> Gents,
> 
> we have some code in the -mm patchsets which adds a per-cpu
> LIFO pool which frontends the page allocator.  To return pages
> which are cache-warm on the calling CPU.
> 
> That code has been stable and unchanging since 2.5.40.  But in
> 2.5.44, Badari's machines are crashing when those patches are
> applied.  Memory corruption deep in the scsi softirq callbacks.
> 
> There were no significant memory allocator changes between 2.5.43
> and 2.5.44, but there were a lot of scsi changes.
> 
> These LIFO pools have a significant sideeffect: is a CPU frees
> a page and then allocates a page it will get the same page back.
> So if some code is using some memory for a few microseconds after
> freeing it, there's a good chance that this bug will not exhibit
> with the stock kernel's allocator but _will_ exhibit when the
> per-cpu LIFO queues are present.
> 
> I'm suspecting that this is what is happening.  A use-after-free
> bug may have been introduced in the 2.5.44 SCSI changes.
> 

I hit these also with the qla + 2.5.44-mm3 + multi-path patch, I could
only hit it with file system access.

I've otherwise run fine - building kernels (not on disks attached via
the qla driver), and running tons of raw IO requests using the qla driver
with no problems at all (large block size 64k).

I had a horrible time trying to use the qla driver with switch attached
storage (with nearly identical drives/enclosures that Badari uses), and
finally gave up after having odd problems and crashes, but I only hit it
with multi-path patch installed. I went to direct attached storage (no FC
switch) and had no problems until Badari asked me to try some file 
system IO.

There were also recent comments on lkml about qla users (on 2.4) being 
unable to run qla + LVM, perhaps because of stack overflow.

Has it been hit it with something other than the qla driver, or without
file system access?

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-25  4:06 ` Doug Ledford
@ 2002-10-25  4:40   ` Andrew Morton
  2002-10-25 14:21     ` James Bottomley
  0 siblings, 1 reply; 37+ messages in thread
From: Andrew Morton @ 2002-10-25  4:40 UTC (permalink / raw
  To: Doug Ledford
  Cc: linux-scsi@vger.kernel.org, Badari Pulavarty, Martin J. Bligh,
	Jens Axboe

Doug Ledford wrote:
> 
> Anyway, I've got all my current updates pushed to
> linux-scsi.bkbits.net/scsi-misc-2.5 and I know James and Patrick have some
> fixes in there as well.

OK, thanks.  Could someone please extract a diff which we can try out?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-25  1:39 possible use-after-free in 2.5.44 scsi changes Andrew Morton
  2002-10-25  4:06 ` Doug Ledford
  2002-10-25  4:07 ` Patrick Mansfield
@ 2002-10-25 14:16 ` James Bottomley
  2002-10-25 18:34   ` James Bottomley
  2 siblings, 1 reply; 37+ messages in thread
From: James Bottomley @ 2002-10-25 14:16 UTC (permalink / raw
  To: Andrew Morton
  Cc: linux-scsi@vger.kernel.org, Badari Pulavarty, Martin J. Bligh,
	Jens Axboe, Doug Ledford

This has all the hallmarks of the Qlogic double done bug:  Under certain high 
stress/bad bus situations, the Qla driver will call done twice on a SCSI 
command structure.  I take it this is the 6.1.0 qla driver, which qlogic has 
assured me "really really" has this bug fixed?

Is there any way to switch adapters to see if we can confirm this hypothesis?

James



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-25  4:40   ` Andrew Morton
@ 2002-10-25 14:21     ` James Bottomley
  0 siblings, 0 replies; 37+ messages in thread
From: James Bottomley @ 2002-10-25 14:21 UTC (permalink / raw
  To: Andrew Morton
  Cc: Doug Ledford, linux-scsi@vger.kernel.org, Badari Pulavarty,
	Martin J. Bligh, Jens Axboe

[-- Attachment #1: Type: text/plain, Size: 295 bytes --]

akpm@digeo.com said:
> OK, thanks.  Could someone please extract a diff which we can try out?


I've attached it, but I'd hold out low hopes.  The problem you're seeing is in 
command handling.  We've been playing with tidying up upper level driver stuff 
(and templates).

James




[-- Attachment #2: tmp.diff --]
[-- Type: text/plain , Size: 56023 bytes --]

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	v2.5.44 -> 1.819  
#	drivers/scsi/inia100.h	1.6     -> 1.8    
#	drivers/scsi/qlogicisp.h	1.2     -> 1.3    
#	drivers/scsi/scsi_error.c	1.18    -> 1.19   
#	drivers/scsi/aic7xxx_old/aic7xxx.h	1.7     -> 1.8    
#	drivers/scsi/hosts.c	1.16    -> 1.18   
#	drivers/scsi/atp870u.h	1.4     -> 1.5    
#	 drivers/scsi/eata.h	1.10    -> 1.11   
#	  drivers/scsi/ips.h	1.12    -> 1.13   
#	drivers/scsi/3w-xxxx.h	1.12    -> 1.13   
#	   drivers/scsi/sr.c	1.56    -> 1.58   
#	 drivers/scsi/scsi.h	1.29    -> 1.30   
#	drivers/scsi/ncr53c8xx.c	1.12    -> 1.13   
#	drivers/scsi/hosts.h	1.19    -> 1.21   
#	drivers/scsi/ncr53c8xx.h	1.4     -> 1.5    
#	drivers/scsi/qlogicfc.h	1.5     -> 1.6    
#	drivers/scsi/nsp32.c	1.1     -> 1.2    
#	drivers/scsi/cpqfcTS.h	1.4     -> 1.5    
#	drivers/message/fusion/mptscsih.c	1.10    -> 1.11   
#	drivers/scsi/aic7xxx/aic7xxx_linux_host.h	1.8     -> 1.9    
#	drivers/scsi/sym53c8xx.h	1.6     -> 1.7    
#	   drivers/scsi/sg.c	1.30    -> 1.31   
#	drivers/scsi/BusLogic.h	1.7     -> 1.8    
#	drivers/scsi/scsi_lib.c	1.35    -> 1.36   
#	 drivers/scsi/scsi.c	1.48    -> 1.50   
#	   drivers/scsi/sd.h	1.7     -> 1.8    
#	drivers/scsi/ide-scsi.c	1.11    -> 1.12   
#	drivers/scsi/in2000.c	1.8     -> 1.9    
#	   drivers/scsi/sd.c	1.73    -> 1.75   
#	drivers/scsi/scsi_scan.c	1.28    -> 1.31   
#	drivers/scsi/Makefile	1.28    -> 1.29   
#	drivers/scsi/scsi_merge.c	1.24    ->         (deleted)      
#	drivers/scsi/wd7000.h	1.5     -> 1.6    
#	drivers/scsi/qla1280.h	1.7     -> 1.8    
#	drivers/message/fusion/mptscsih.h	1.7     -> 1.9    
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/10/18	torvalds@home.transmeta.com	1.808
# Linux v2.5.44
# --------------------------------------------
# 02/10/20	hch@lst.de	1.809
# [PATCH] remove dead EH methods
# 
# break at compiletime instead of runtime
# 
# 
# ===== drivers/scsi/hosts.h 1.19 vs edited =====
# --------------------------------------------
# 02/10/20	andmike@us.ibm.com	1.810
# [PATCH] fix module unload of sg
# 
# 	It looks like sg.c was missed in the update from put_device to
# 	device_unregister.
# 
# --------------------------------------------
# 02/10/20	johnf@whitsunday.net.au	1.811
# In patch-2.5.44 Mike Anderson <andmike@us.ibm.com> made a cleanup to the
# Scsi Host setup.
# 
# This caused the following errors on trying to compile.
# 
# drivers/scsi/inia100.c:98: unknown field `next' specified in initializer
# drivers/scsi/inia100.c:98: warning: missing braces around initializer
# drivers/scsi/inia100.c:98: warning: (near initialization for `driver_template.shtp_list')
# drivers/scsi/inia100.c:98: unknown field `module' specified in initializer
# drivers/scsi/inia100.c:98: unknown field `proc_name' specified in initializer
# drivers/scsi/inia100.c:98: warning: initialization from incompatible pointer type
# make[2]: *** [drivers/scsi/inia100.o] Error 1
# 
# Several of the drivers Mike modified only had the one-line change to remove
# the 'next' field.  I tried it and bingo, it works and passed my tests.
# 
# The version change is what Doug Ledford intended in patch-2.5.25 back in
# June 2002.  (See inia100.c "inia100_Version")
# 
# 
# --------------------------------------------
# 02/10/21	andmike@us.ibm.com	1.812
# [PATCH] scsi_error device offline fix
# 
# This patch corrects a problem in scsi error handling.
# 
# When a device is offlined indicated by a message like ...Device offlined
# - not ready...
# 
# the command return status was not being updated with a failure status if
# the IO was a timeout.
# 
# I tested the patch on system with ips, aic, and qlogic fc adapters, but
# was unable to generate a satisfactory device offline test case.
# 
# I did test this fix on uml with scsi_debug and generated a device
# offline condition with verified this fix was working correctly.
# 
# -andmike
# --
# Michael Anderson
# andmike@us.ibm.com
# 
#  scsi_error.c |    8 ++++++--
#  1 files changed, 6 insertions(+), 2 deletions(-)
# --------------------------------------------
# 02/10/21	andmike@us.ibm.com	1.813
# [PATCH] scsi sync caches w/ dev offline
# 
# When a scsi device is offlined and then the system is shutdown it will
# hang during the synchronizing SCSI caches task. The error handler was
# activated during this step, but post recovery the system did not
# complete the shutdown.
# 
# This patch just adds a check for online before sending the command. The
# better approach appeared to be to use scsi_block_when_processing_errors,
# but I was concerned that we might block to long in a shutdown case.
# 
# -andmike
# --
# Michael Anderson
# andmike@us.ibm.com
# 
#  sd.c |    3 +++
#  1 files changed, 3 insertions(+)
# --------------------------------------------
# 02/10/21	dledford@aladin.rdu.redhat.com	1.808.1.1
# Update for new TCQ scheme
# --------------------------------------------
# 02/10/22	hch@lst.de	1.814
# [PATCH] get rid of ->finish method for highlevel drivers
# 
# the ->finish method is a relicat from the old day were we never had
# hotplugging and allowed the driver to do fixups after all busses
# had been scanned.  Nowdays only sd and sr actually implement it,
# and both only defer actions to there that should actually happen in
# ->attach.  Change both drivers to move that code into ->attach,
# clenaup the Templates to use C99 initializers and get rid of the
# methods.
# 
# This also cleans up some very crude race-avoidable code in those
# drivers, btw..
# --------------------------------------------
# 02/10/22	jejb@mulgrave.(none)	1.815
# [SCSI] move build commandblocks to before attach so attach can send I/O
# --------------------------------------------
# 02/10/22	dledford@aladin.rdu.redhat.com	1.808.1.2
# Fix for scsi host struct change
# --------------------------------------------
# 02/10/22	dledford@aladin.rdu.redhat.com	1.816
# Merge aladin.rdu.redhat.com:/usr/local/home/dledford/bk/scsi
# into aladin.rdu.redhat.com:/usr/src/2.5
# --------------------------------------------
# 02/10/22	dledford@aladin.rdu.redhat.com	1.817
# host struct cleanups
# --------------------------------------------
# 02/10/22	dledford@aladin.rdu.redhat.com	1.818
# Compile fixes needed due to host struct change
# --------------------------------------------
# 02/10/24	hch@lst.de	1.815.1.1
# [PATCH] remove scsi_merge.c
# 
# In 2.5.44 it contains only two functions, that both have exactly
# one caller in other files and both are entirely unrelated to
# request merging..
# --------------------------------------------
# 02/10/24	jejb@mulgrave.(none)	1.819
# Merge ssh://linux-scsi@linux-scsi.bkbits.net/scsi-misc-2.5
# into mulgrave.(none):/home/jejb/BK/scsi-misc-2.5
# --------------------------------------------
#
diff -Nru a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c
--- a/drivers/message/fusion/mptscsih.c	Fri Oct 25 09:19:29 2002
+++ b/drivers/message/fusion/mptscsih.c	Fri Oct 25 09:19:29 2002
@@ -1295,10 +1295,6 @@
 #endif
 				sh->this_id = this->pfacts[portnum].PortSCSIID;
 
-				/* OS entry to allow host drivers to force
-				 * a queue depth on a per device basis.
-				 */
-				sh->select_queue_depths = mptscsih_select_queue_depths;
 				/* Required entry.
 				 */
 				sh->unique_id = this->id;
@@ -3668,37 +3664,20 @@
  *	Called once per device the bus scan. Use it to force the queue_depth
  *	member to 1 if a device does not support Q tags.
  */
-void
-mptscsih_select_queue_depths(struct Scsi_Host *sh, Scsi_Device *sdList)
+int
+mptscsih_slave_attach(Scsi_Device *device)
 {
-	struct scsi_device	*device;
 	VirtDevice		*pTarget;
-	MPT_SCSI_HOST		*hd;
-	int			 ii, max;
-
-	for (device = sdList; device != NULL; device = device->next) {
 
-		if (device->host != sh)
-			continue;
-
-		hd = (MPT_SCSI_HOST *) sh->hostdata;
-		if (hd == NULL)
-			continue;
-
-		if (hd->Targets != NULL) {
-			if (hd->is_spi)
-				max = MPT_MAX_SCSI_DEVICES;
-			else
-				max = MPT_MAX_FC_DEVICES<256 ? MPT_MAX_FC_DEVICES : 255;
-
-			for (ii=0; ii < max; ii++) {
-				pTarget = hd->Targets[ii];
-				if (pTarget && !(pTarget->tflags & MPT_TARGET_FLAGS_Q_YES)) {
-					device->queue_depth = 1;
-				}
-			}
-		}
+	pTarget = device->hostdata;
+	if (!device->tagged_supported ||
+	    !(pTarget->tflags & MPT_TARGET_FLAGS_Q_YES)) {
+		scsi_adjust_queue_depth(device, 0, 1);
+	} else {
+		scsi_adjust_queue_depth(device, MSG_SIMPLE_TAG,
+					       device->host->can_queue >> 1);
 	}
+	return 0;
 }
 
 /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
diff -Nru a/drivers/message/fusion/mptscsih.h b/drivers/message/fusion/mptscsih.h
--- a/drivers/message/fusion/mptscsih.h	Fri Oct 25 09:19:29 2002
+++ b/drivers/message/fusion/mptscsih.h	Fri Oct 25 09:19:29 2002
@@ -206,7 +206,7 @@
 #define x_scsi_dev_reset	mptscsih_dev_reset
 #define x_scsi_host_reset	mptscsih_host_reset
 #define x_scsi_bios_param	mptscsih_bios_param
-#define x_scsi_select_queue_depths	mptscsih_select_queue_depths
+#define x_scsi_slave_attach	mptscsih_slave_attach
 
 #define x_scsi_taskmgmt_bh	mptscsih_taskmgmt_bh
 #define x_scsi_old_abort	mptscsih_old_abort
@@ -234,7 +234,7 @@
 #else
 extern	int		 x_scsi_bios_param(Disk *, kdev_t, int *);
 #endif
-extern	void		 x_scsi_select_queue_depths(struct Scsi_Host *, Scsi_Device *);
+extern	int		 x_scsi_slave_attach(Scsi_Device *);
 extern	void		 x_scsi_taskmgmt_bh(void *);
 
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,3,0)
@@ -248,20 +248,18 @@
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,5,1)
 
 #define MPT_SCSIHOST {						\
-	next:				NULL,			\
 	PROC_SCSI_DECL						\
 	name:				"MPT SCSI Host",	\
 	detect:				x_scsi_detect,		\
 	release:			x_scsi_release,		\
 	info:				x_scsi_info,		\
-	command:			NULL,			\
 	queuecommand:			x_scsi_queuecommand,	\
-	eh_strategy_handler:		NULL,			\
 	eh_abort_handler:		x_scsi_abort,		\
 	eh_device_reset_handler:	x_scsi_dev_reset,	\
 	eh_bus_reset_handler:		x_scsi_bus_reset,	\
 	eh_host_reset_handler:		x_scsi_host_reset,	\
 	bios_param:			x_scsi_bios_param,	\
+	slave_attach:			x_scsi_slave_attach,	\
 	can_queue:			MPT_SCSI_CAN_QUEUE,	\
 	this_id:			-1,			\
 	sg_tablesize:			MPT_SCSI_SG_DEPTH,	\
@@ -274,19 +272,15 @@
 #else  /* LINUX_VERSION_CODE >= KERNEL_VERSION(2,5,1) */
 
 #define MPT_SCSIHOST {						\
-	next:				NULL,			\
 	PROC_SCSI_DECL						\
 	name:				"MPT SCSI Host",	\
 	detect:				x_scsi_detect,		\
 	release:			x_scsi_release,		\
 	info:				x_scsi_info,		\
-	command:			NULL,			\
 	queuecommand:			x_scsi_queuecommand,	\
-	eh_strategy_handler:		NULL,			\
 	eh_abort_handler:		x_scsi_abort,		\
 	eh_device_reset_handler:	x_scsi_dev_reset,	\
 	eh_bus_reset_handler:		x_scsi_bus_reset,	\
-	eh_host_reset_handler:		NULL,			\
 	bios_param:			x_scsi_bios_param,	\
 	can_queue:			MPT_SCSI_CAN_QUEUE,	\
 	this_id:			-1,			\
@@ -302,13 +296,11 @@
 #else /* MPT_SCSI_USE_NEW_EH */
 
 #define MPT_SCSIHOST {						\
-	next:				NULL,			\
 	PROC_SCSI_DECL						\
 	name:				"MPT SCSI Host",	\
 	detect:				x_scsi_detect,		\
 	release:			x_scsi_release,		\
 	info:				x_scsi_info,		\
-	command:			NULL,			\
 	queuecommand:			x_scsi_queuecommand,	\
 	abort:				x_scsi_old_abort,	\
 	reset:				x_scsi_old_reset,	\
diff -Nru a/drivers/scsi/3w-xxxx.h b/drivers/scsi/3w-xxxx.h
--- a/drivers/scsi/3w-xxxx.h	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/3w-xxxx.h	Fri Oct 25 09:19:29 2002
@@ -468,25 +468,14 @@
 
 /* Scsi_Host_Template Initializer */
 #define TWXXXX {					\
-	next : NULL,					\
-	module : NULL,					\
 	proc_name : "3w-xxxx",				\
 	proc_info : tw_scsi_proc_info,			\
 	name : "3ware Storage Controller",		\
 	detect : tw_scsi_detect,			\
 	release : tw_scsi_release,			\
-	info : NULL,					\
-	ioctl : NULL,                  			\
-	command : NULL,					\
 	queuecommand : tw_scsi_queue,			\
-	eh_strategy_handler : NULL,			\
 	eh_abort_handler : tw_scsi_eh_abort,		\
-	eh_device_reset_handler : NULL,			\
-	eh_bus_reset_handler : NULL,			\
 	eh_host_reset_handler : tw_scsi_eh_reset,	\
-	abort : NULL,					\
-	reset : NULL,					\
-	slave_attach : NULL,				\
 	bios_param : tw_scsi_biosparam,			\
 	can_queue : TW_Q_LENGTH-1,			\
 	this_id: -1,					\
diff -Nru a/drivers/scsi/BusLogic.h b/drivers/scsi/BusLogic.h
--- a/drivers/scsi/BusLogic.h	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/BusLogic.h	Fri Oct 25 09:19:29 2002
@@ -55,8 +55,6 @@
 extern int BusLogic_ReleaseHostAdapter(SCSI_Host_T *);
 extern int BusLogic_QueueCommand(SCSI_Command_T *,
 				 void (*CompletionRoutine)(SCSI_Command_T *));
-extern int BusLogic_AbortCommand(SCSI_Command_T *);
-extern int BusLogic_ResetCommand(SCSI_Command_T *, unsigned int);
 extern int BusLogic_BIOSDiskParameters(SCSI_Disk_T *, struct block_device *,
 				       int *);
 extern int BusLogic_ProcDirectoryInfo(char *, char **, off_t, int, int, int);
@@ -75,8 +73,6 @@
     release:        BusLogic_ReleaseHostAdapter,  /* Release Host Adapter   */ \
     info:           BusLogic_DriverInfo,	  /* Driver Info Function   */ \
     queuecommand:   BusLogic_QueueCommand,	  /* Queue Command Function */ \
-    abort:          BusLogic_AbortCommand,	  /* Abort Command Function */ \
-    reset:          BusLogic_ResetCommand,	  /* Reset Command Function */ \
     slave_attach:   BusLogic_SlaveAttach,	  /* Configure a SCSI_Device*/ \
     bios_param:     BusLogic_BIOSDiskParameters,  /* BIOS Disk Parameters   */ \
     unchecked_isa_dma: 1,			  /* Default Initial Value  */ \
diff -Nru a/drivers/scsi/Makefile b/drivers/scsi/Makefile
--- a/drivers/scsi/Makefile	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/Makefile	Fri Oct 25 09:19:29 2002
@@ -122,8 +122,8 @@
 obj-$(CONFIG_CHR_DEV_SG)	+= sg.o
 
 scsi_mod-objs	:= scsi.o hosts.o scsi_ioctl.o constants.o scsicam.o \
-			scsi_proc.o scsi_error.o scsi_lib.o scsi_merge.o \
-			scsi_scan.o scsi_syms.o
+			scsi_proc.o scsi_error.o scsi_lib.o scsi_scan.o \
+			scsi_syms.o
 			
 sd_mod-objs	:= sd.o
 sr_mod-objs	:= sr.o sr_ioctl.o sr_vendor.o
diff -Nru a/drivers/scsi/aic7xxx/aic7xxx_linux_host.h b/drivers/scsi/aic7xxx/aic7xxx_linux_host.h
--- a/drivers/scsi/aic7xxx/aic7xxx_linux_host.h	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/aic7xxx/aic7xxx_linux_host.h	Fri Oct 25 09:19:29 2002
@@ -63,22 +63,14 @@
  * to do with card config are filled in after the card is detected.
  */
 #define AIC7XXX	{						\
-	module: NULL,						\
-	proc_dir: NULL,						\
 	proc_info: ahc_linux_proc_info,				\
-	name: NULL,						\
 	detect: ahc_linux_detect,				\
 	release: ahc_linux_release,				\
 	info: ahc_linux_info,					\
-	command: NULL,						\
 	queuecommand: ahc_linux_queue,				\
-	eh_strategy_handler: NULL,				\
 	eh_abort_handler: ahc_linux_abort,			\
 	eh_device_reset_handler: ahc_linux_dev_reset,		\
 	eh_bus_reset_handler: ahc_linux_bus_reset,		\
-	eh_host_reset_handler: NULL,				\
-	abort: NULL,						\
-	reset: NULL,						\
 	slave_attach: ahc_linux_slave_attach,			\
 	bios_param: AIC7XXX_BIOSPARAM,				\
 	can_queue: 253,		/* max simultaneous cmds      */\
diff -Nru a/drivers/scsi/aic7xxx_old/aic7xxx.h b/drivers/scsi/aic7xxx_old/aic7xxx.h
--- a/drivers/scsi/aic7xxx_old/aic7xxx.h	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/aic7xxx_old/aic7xxx.h	Fri Oct 25 09:19:29 2002
@@ -35,13 +35,6 @@
 	release: aic7xxx_release,				\
 	info: aic7xxx_info,					\
 	queuecommand: aic7xxx_queue,				\
-	eh_strategy_handler: NULL,				\
-	eh_abort_handler: NULL,					\
-	eh_device_reset_handler: NULL,				\
-	eh_bus_reset_handler: NULL,				\
-	eh_host_reset_handler: NULL,				\
-	abort: aic7xxx_abort,					\
-	reset: aic7xxx_reset,					\
 	slave_attach: aic7xxx_slave_attach,			\
 	slave_detach: aic7xxx_slave_detach,			\
 	bios_param: aic7xxx_biosparam,				\
@@ -58,8 +51,6 @@
 extern int aic7xxx_biosparam(Disk *, struct block_device *, int[]);
 extern int aic7xxx_detect(Scsi_Host_Template *);
 extern int aic7xxx_command(Scsi_Cmnd *);
-extern int aic7xxx_reset(Scsi_Cmnd *, unsigned int);
-extern int aic7xxx_abort(Scsi_Cmnd *);
 extern int aic7xxx_release(struct Scsi_Host *);
 extern int aic7xxx_slave_attach(Scsi_Device *);
 extern void aic7xxx_slave_detach(Scsi_Device *);
diff -Nru a/drivers/scsi/atp870u.h b/drivers/scsi/atp870u.h
--- a/drivers/scsi/atp870u.h	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/atp870u.h	Fri Oct 25 09:19:29 2002
@@ -37,18 +37,13 @@
 extern int atp870u_proc_info(char *, char **, off_t, int, int, int);
 
 #define ATP870U {						\
-	next: NULL,						\
-	module: NULL,						\
 	proc_info: atp870u_proc_info,				\
-	name: NULL,						\
 	detect: atp870u_detect, 				\
 	release: atp870u_release,				\
 	info: atp870u_info,					\
 	command: atp870u_command,				\
 	queuecommand: atp870u_queuecommand,			\
-	eh_strategy_handler: NULL,				\
 	eh_abort_handler: atp870u_abort, 			\
-	slave_attach: NULL,					\
 	bios_param: atp870u_biosparam,				\
 	can_queue: qcnt,	 /* max simultaneous cmds      */\
 	this_id: 7,	       /* scsi id of host adapter    */\
diff -Nru a/drivers/scsi/cpqfcTS.h b/drivers/scsi/cpqfcTS.h
--- a/drivers/scsi/cpqfcTS.h	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/cpqfcTS.h	Fri Oct 25 09:19:29 2002
@@ -28,8 +28,6 @@
  queuecommand:           cpqfcTS_queuecommand,   \
  eh_device_reset_handler:   cpqfcTS_eh_device_reset,   \
  eh_abort_handler:       cpqfcTS_eh_abort,       \
- reset:                  cpqfcTS_reset,          \
- abort:                  cpqfcTS_abort,		 \
  bios_param:             cpqfcTS_biosparam,      \
  can_queue:              CPQFCTS_REQ_QUEUE_LEN,  \
  this_id:                -1,                     \
diff -Nru a/drivers/scsi/eata.h b/drivers/scsi/eata.h
--- a/drivers/scsi/eata.h	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/eata.h	Fri Oct 25 09:19:29 2002
@@ -21,11 +21,7 @@
                 detect:                  eata2x_detect,                      \
                 release:                 eata2x_release,                     \
                 queuecommand:            eata2x_queuecommand,                \
-                abort:                   NULL,                               \
-                reset:                   NULL,                               \
                 eh_abort_handler:        eata2x_abort,                       \
-                eh_device_reset_handler: NULL,                               \
-                eh_bus_reset_handler:    NULL,                               \
                 eh_host_reset_handler:   eata2x_reset,                       \
                 bios_param:              eata2x_biosparam,                   \
 		slave_attach:		 eata2x_slave_attach,		     \
diff -Nru a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
--- a/drivers/scsi/hosts.c	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/hosts.c	Fri Oct 25 09:19:29 2002
@@ -561,25 +561,18 @@
 			shost = list_entry(lh, struct Scsi_Host, sh_list);
 			for (sdev = shost->host_queue; sdev; sdev = sdev->next)
 				if (sdev->host->hostt == shost_tp) {
+					scsi_build_commandblocks(sdev);
+					if (sdev->current_queue_depth == 0)
+						goto out_of_space;
 					for (sdev_tp = scsi_devicelist;
 					     sdev_tp;
 					     sdev_tp = sdev_tp->next)
 						if (sdev_tp->attach)
 							(*sdev_tp->attach) (sdev);
-					if (sdev->attached) {
-						scsi_build_commandblocks(sdev);
-						if (sdev->current_queue_depth == 0)
-							goto out_of_space;
+					if (!sdev->attached) {
+                                                scsi_release_commandblocks(sdev);
 					}
 				}
-		}
-
-		/* This does any final handling that is required. */
-		for (sdev_tp = scsi_devicelist; sdev_tp;
-		     sdev_tp = sdev_tp->next) {
-			if (sdev_tp->finish && sdev_tp->nr_dev) {
-				(*sdev_tp->finish) ();
-			}
 		}
 	}
 
diff -Nru a/drivers/scsi/hosts.h b/drivers/scsi/hosts.h
--- a/drivers/scsi/hosts.h	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/hosts.h	Fri Oct 25 09:19:29 2002
@@ -168,40 +168,6 @@
      int (*eh_host_reset_handler)(Scsi_Cmnd *);
 
     /*
-     * Since the mid level driver handles time outs, etc, we want to
-     * be able to abort the current command.  Abort returns 0 if the
-     * abortion was successful.	 The field SCpnt->abort reason
-     * can be filled in with the appropriate reason why we wanted
-     * the abort in the first place, and this will be used
-     * in the mid-level code instead of the host_byte().
-     * If non-zero, the code passed to it
-     * will be used as the return code, otherwise
-     * DID_ABORT  should be returned.
-     *
-     * Note that the scsi driver should "clean up" after itself,
-     * resetting the bus, etc.	if necessary.
-     *
-     * NOTE - this interface is depreciated, and will go away.  Use
-     * the eh_ routines instead.
-     */
-    int (* abort)(Scsi_Cmnd *);
-
-    /*
-     * The reset function will reset the SCSI bus.  Any executing
-     * commands should fail with a DID_RESET in the host byte.
-     * The Scsi_Cmnd  is passed so that the reset routine can figure
-     * out which host adapter should be reset, and also which command
-     * within the command block was responsible for the reset in
-     * the first place.	 Some hosts do not implement a reset function,
-     * and these hosts must call scsi_request_sense(SCpnt) to keep
-     * the command alive.
-     *
-     * NOTE - this interface is depreciated, and will go away.  Use
-     * the eh_ routines instead.
-     */
-    int (* reset)(Scsi_Cmnd *, unsigned int);
-
-    /*
      * Once the device has responded to an INQUIRY and we know the device
      * is online, call into the low level driver with the Scsi_Device *
      * (so that the low level driver may save it off in a safe location
@@ -583,7 +549,6 @@
     int (*detect)(Scsi_Device *); /* Returns 1 if we can attach this device */
     int (*init)(void);		  /* Sizes arrays based upon number of devices
 		   *  detected */
-    void (*finish)(void);	  /* Perform initialization after attachment */
     int (*attach)(Scsi_Device *); /* Attach devices to arrays */
     void (*detach)(Scsi_Device *);
     int (*init_command)(Scsi_Cmnd *);     /* Used by new queueing code. 
diff -Nru a/drivers/scsi/ide-scsi.c b/drivers/scsi/ide-scsi.c
--- a/drivers/scsi/ide-scsi.c	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/ide-scsi.c	Fri Oct 25 09:19:29 2002
@@ -860,8 +860,6 @@
 	info:		idescsi_info,
 	ioctl:		idescsi_ioctl,
 	queuecommand:	idescsi_queue,
-	abort:		idescsi_abort,
-	reset:		idescsi_reset,
 	bios_param:	idescsi_bios,
 	can_queue:	10,
 	this_id:	-1,
diff -Nru a/drivers/scsi/in2000.c b/drivers/scsi/in2000.c
--- a/drivers/scsi/in2000.c	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/in2000.c	Fri Oct 25 09:19:29 2002
@@ -832,7 +832,7 @@
 
 static void in2000_intr (int irqnum, void * dev_id, struct pt_regs *ptregs)
 {
-struct Scsi_Host *instance;
+struct Scsi_Host *instance = dev_id;
 struct IN2000_hostdata *hostdata;
 Scsi_Cmnd *patch, *cmd;
 uchar asr, sr, phs, id, lun, *ucp, msg;
@@ -842,14 +842,6 @@
 unsigned short f;
 unsigned long flags;
 
-   for (instance = instance_list; instance; instance = instance->next) {
-      if (instance->irq == irqnum)
-         break;
-      }
-   if (!instance) {
-      printk("*** Hmm... interrupts are screwed up! ***\n");
-      return;
-      }
    hostdata = (struct IN2000_hostdata *)instance->hostdata;
 
 /* Get the spin_lock and disable further ints, for SMP */
@@ -2046,7 +2038,7 @@
       write1_io(0,IO_FIFO_READ);             /* start fifo out in read mode */
       write1_io(0,IO_INTR_MASK);    /* allow all ints */
       x = int_tab[(switches & (SW_INT0 | SW_INT1)) >> SW_INT_SHIFT];
-      if (request_irq(x, in2000_intr, SA_INTERRUPT, "in2000", NULL)) {
+      if (request_irq(x, in2000_intr, SA_INTERRUPT, "in2000", instance)) {
          printk("in2000_detect: Unable to allocate IRQ.\n");
          detect_count--;
          continue;
@@ -2215,10 +2207,7 @@
 int x,i;
 static int stop = 0;
 
-   for (instance=instance_list; instance; instance=instance->next) {
-      if (instance->host_no == hn)
-         break;
-      }
+   instance = scsi_host_hn_get(hn);
    if (!instance) {
       printk("*** Hmm... Can't find host #%d!\n",hn);
       return (-ESRCH);
diff -Nru a/drivers/scsi/inia100.h b/drivers/scsi/inia100.h
--- a/drivers/scsi/inia100.h	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/inia100.h	Fri Oct 25 09:19:29 2002
@@ -85,27 +85,14 @@
 
 extern int inia100_biosparam(Scsi_Disk *, struct block_device *, int *);
 
-#define inia100_REVID "Initio INI-A100U2W SCSI device driver; Revision: 1.02c"
+#define inia100_REVID "Initio INI-A100U2W SCSI device driver; Revision: 1.02d"
 
 #define INIA100	{ \
-	next:		NULL,						\
-	module:		NULL,						\
 	proc_name:	"inia100", \
-	proc_info:	NULL,				\
 	name:		inia100_REVID, \
 	detect:		inia100_detect, \
 	release:	inia100_release, \
-	info:		NULL,					\
-	command:	NULL, \
 	queuecommand:	inia100_queue, \
- 	eh_strategy_handler: NULL, \
- 	eh_abort_handler: NULL, \
- 	eh_device_reset_handler: NULL, \
- 	eh_bus_reset_handler: NULL, \
- 	eh_host_reset_handler: NULL, \
-	abort:		inia100_abort, \
-	reset:		inia100_reset, \
-	slave_attach:	NULL, \
 	bios_param:	inia100_biosparam, \
 	can_queue:	1, \
 	this_id:	1, \
diff -Nru a/drivers/scsi/ips.h b/drivers/scsi/ips.h
--- a/drivers/scsi/ips.h	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/ips.h	Fri Oct 25 09:19:29 2002
@@ -464,23 +464,13 @@
 }
 #else
  #define IPS {                            \
-    module : NULL,                        \
-    proc_info : NULL,                     \
-    name : NULL,                          \
     detect : ips_detect,                  \
     release : ips_release,                \
     info : ips_info,                      \
-    command : NULL,                       \
     queuecommand : ips_queue,             \
-    eh_strategy_handler : NULL,           \
     eh_abort_handler : ips_eh_abort,      \
-    eh_device_reset_handler : NULL,       \
-    eh_bus_reset_handler : NULL,          \
     eh_host_reset_handler : ips_eh_reset, \
-    abort : NULL,                         \
-    reset : NULL,                         \
     slave_attach : ips_slave_attach,      \
-    slave_detach : NULL,                  \
     bios_param : ips_biosparam,           \
     can_queue : 0,                        \
     this_id: -1,                          \
diff -Nru a/drivers/scsi/ncr53c8xx.c b/drivers/scsi/ncr53c8xx.c
--- a/drivers/scsi/ncr53c8xx.c	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/ncr53c8xx.c	Fri Oct 25 09:19:29 2002
@@ -9125,12 +9125,9 @@
 printk("ncr53c8xx_proc_info: hostno=%d, func=%d\n", hostno, func);
 #endif
 
-	for (host = first_host; host; host = host->next) {
-		if (host->hostt == the_template && host->host_no == hostno) {
-			host_data = (struct host_data *) host->hostdata;
-			ncb = host_data->ncb;
-			break;
-		}
+	if ((host = scsi_host_hn_get(hostno)) != NULL) {
+		host_data = (struct host_data *) host->hostdata;
+		ncb = host_data->ncb;
 	}
 
 	if (!ncb)
diff -Nru a/drivers/scsi/ncr53c8xx.h b/drivers/scsi/ncr53c8xx.h
--- a/drivers/scsi/ncr53c8xx.h	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/ncr53c8xx.h	Fri Oct 25 09:19:29 2002
@@ -76,8 +76,6 @@
 			info:           ncr53c8xx_info, 	\
 			queuecommand:   ncr53c8xx_queue_command,\
 			slave_attach:   ncr53c8xx_slave_attach, \
-			abort:          ncr53c8xx_abort,	\
-			reset:          ncr53c8xx_reset,	\
 			bios_param:     scsicam_bios_param,	\
 			can_queue:      SCSI_NCR_CAN_QUEUE,	\
 			this_id:        7,			\
diff -Nru a/drivers/scsi/nsp32.c b/drivers/scsi/nsp32.c
--- a/drivers/scsi/nsp32.c	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/nsp32.c	Fri Oct 25 09:19:29 2002
@@ -352,7 +352,6 @@
 	.eh_device_reset_handler =	NULL,
 	.eh_bus_reset_handler =		nsp32_eh_bus_reset,
 	.eh_host_reset_handler =	nsp32_eh_host_reset,
-	.reset =			nsp32_reset,
 	.release =			nsp32_release,
 #if (LINUX_VERSION_CODE < KERNEL_VERSION(2,5,2))
 	.use_new_eh_code =        	1,
@@ -1575,11 +1574,7 @@
 	}
 
 	/* search this HBA host */
-	for (host=scsi_hostlist; host; host=host->next) {
-		if (host->host_no == hostno) {
-			break;
-		}
-	}
+	host=scsi_host_hn_get(hostno);
 	if (host == NULL) {
 		return -ESRCH;
 	}
diff -Nru a/drivers/scsi/qla1280.h b/drivers/scsi/qla1280.h
--- a/drivers/scsi/qla1280.h	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/qla1280.h	Fri Oct 25 09:19:29 2002
@@ -1324,25 +1324,12 @@
  */
 
 #define QLA1280_LINUX_TEMPLATE {				\
-	next: NULL,						\
-	module: NULL,						\
-	proc_dir: NULL,						\
 	proc_info: qla1280_proc_info,				\
 	name: "Qlogic ISP 1280/12160",				\
 	detect: qla1280_detect,					\
 	release: qla1280_release,				\
 	info: qla1280_info,					\
-	ioctl: NULL,						\
-	command: NULL,						\
 	queuecommand: qla1280_queuecommand,			\
-	eh_strategy_handler: NULL,				\
-	eh_abort_handler: NULL,					\
-	eh_device_reset_handler: NULL,				\
-	eh_bus_reset_handler: NULL,				\
-	eh_host_reset_handler: NULL,				\
-/*	use_new_eh_code: 0, */					\
-	abort: qla1280_abort,					\
-	reset: qla1280_reset,					\
 	slave_attach: qla1280_slave_attach,			\
 	bios_param: qla1280_biosparam,				\
 	can_queue: 255,		/* max simultaneous cmds      */\
diff -Nru a/drivers/scsi/qlogicfc.h b/drivers/scsi/qlogicfc.h
--- a/drivers/scsi/qlogicfc.h	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/qlogicfc.h	Fri Oct 25 09:19:29 2002
@@ -87,7 +87,6 @@
         info:                   isp2x00_info,                              \
         queuecommand:           isp2x00_queuecommand,                      \
         eh_abort_handler:       isp2x00_abort,                             \
-        reset:                  isp2x00_reset,                             \
         bios_param:             isp2x00_biosparam,                         \
         can_queue:              QLOGICFC_REQ_QUEUE_LEN,                    \
         this_id:                -1,                                        \
diff -Nru a/drivers/scsi/qlogicisp.h b/drivers/scsi/qlogicisp.h
--- a/drivers/scsi/qlogicisp.h	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/qlogicisp.h	Fri Oct 25 09:19:29 2002
@@ -75,8 +75,6 @@
 	release:		isp1020_release,			   \
 	info:			isp1020_info,				   \
 	queuecommand:		isp1020_queuecommand,			   \
-	abort:			isp1020_abort,				   \
-	reset:			isp1020_reset,				   \
 	bios_param:		isp1020_biosparam,			   \
 	can_queue:		QLOGICISP_REQ_QUEUE_LEN,		   \
 	this_id:		-1,					   \
diff -Nru a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
--- a/drivers/scsi/scsi.c	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/scsi.c	Fri Oct 25 09:19:29 2002
@@ -2019,33 +2019,33 @@
 	     shpnt = scsi_host_get_next(shpnt)) {
 		for (SDpnt = shpnt->host_queue; SDpnt;
 		     SDpnt = SDpnt->next) {
+			scsi_build_commandblocks(SDpnt);
+			if (SDpnt->current_queue_depth == 0) {
+				out_of_space = 1;
+				continue;
+			}
 			if (tpnt->attach)
 				(*tpnt->attach) (SDpnt);
+
 			/*
 			 * If this driver attached to the device, and don't have any
 			 * command blocks for this device, allocate some.
 			 */
-			if (SDpnt->attached && SDpnt->current_queue_depth == 0) {
+			if (SDpnt->attached)
 				SDpnt->online = TRUE;
-				scsi_build_commandblocks(SDpnt);
-				if (SDpnt->current_queue_depth == 0)
-					out_of_space = 1;
-			}
+			else
+				scsi_release_commandblocks(SDpnt);
 		}
 	}
 
-	/*
-	 * This does any final handling that is required.
-	 */
-	if (tpnt->finish && tpnt->nr_dev)
-		(*tpnt->finish) ();
 	MOD_INC_USE_COUNT;
 
 	if (out_of_space) {
 		scsi_unregister_device(tpnt);	/* easiest way to clean up?? */
 		return 1;
-	} else
-		return 0;
+	}
+
+	return 0;
 }
 
 int scsi_unregister_device(struct Scsi_Device_Template *tpnt)
diff -Nru a/drivers/scsi/scsi.h b/drivers/scsi/scsi.h
--- a/drivers/scsi/scsi.h	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/scsi.h	Fri Oct 25 09:19:29 2002
@@ -446,20 +446,6 @@
 void scsi_free_sgtable(struct scatterlist *sgl, int index);
 
 /*
- * Prototypes for functions in scsi_dma.c
- */
-void scsi_resize_dma_pool(void);
-int scsi_init_minimal_dma_pool(void);
-void *scsi_malloc(unsigned int);
-int scsi_free(void *, unsigned int);
-
-/*
- * Prototypes for functions in scsi_merge.c
- */
-extern void scsi_initialize_merge_fn(Scsi_Device *SDpnt);
-extern int scsi_init_io(Scsi_Cmnd *SCpnt);
-
-/*
  * Prototypes for functions in scsi_lib.c
  */
 extern int scsi_maybe_unblock_host(Scsi_Device * SDpnt);
diff -Nru a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
--- a/drivers/scsi/scsi_error.c	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/scsi_error.c	Fri Oct 25 09:19:29 2002
@@ -1145,14 +1145,18 @@
 		if (!scsi_eh_eflags_chk(scmd, SCSI_EH_CMD_ERR))
 			continue;
 
-		printk(KERN_INFO "%s: Device offlined - not"
+		printk(KERN_INFO "scsi: Device offlined - not"
 				" ready or command retry failed"
 				" after error recovery: host"
 				" %d channel %d id %d lun %d\n",
-				__FUNCTION__, shost->host_no,
+				shost->host_no,
 				scmd->device->channel,
 				scmd->device->id,
 				scmd->device->lun);
+
+		if (scsi_eh_eflags_chk(scmd, SCSI_EH_CMD_TIMEOUT))
+			scmd->result |= (DRIVER_TIMEOUT << 24);
+
 		scmd->device->online = FALSE;
 		scsi_eh_finish_cmd(scmd, shost);
 	}
diff -Nru a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/scsi_lib.c	Fri Oct 25 09:19:29 2002
@@ -730,6 +730,92 @@
 }
 
 /*
+ * Function:    scsi_init_io()
+ *
+ * Purpose:     SCSI I/O initialize function.
+ *
+ * Arguments:   SCpnt   - Command descriptor we wish to initialize
+ *
+ * Returns:     1 on success.
+ */
+static int scsi_init_io(Scsi_Cmnd *SCpnt)
+{
+	struct request     *req = SCpnt->request;
+	struct scatterlist *sgpnt;
+	int count, gfp_mask;
+
+	/*
+	 * non-sg block request. FIXME: check bouncing for isa hosts!
+	 */
+	if ((req->flags & REQ_BLOCK_PC) && !req->bio) {
+		/*
+		 * FIXME: isa bouncing
+		 */
+		if (SCpnt->host->unchecked_isa_dma)
+			goto fail;
+
+		SCpnt->request_bufflen = req->data_len;
+		SCpnt->request_buffer = req->data;
+		req->buffer = req->data;
+		SCpnt->use_sg = 0;
+		return 1;
+	}
+
+	/*
+	 * we used to not use scatter-gather for single segment request,
+	 * but now we do (it makes highmem I/O easier to support without
+	 * kmapping pages)
+	 */
+	SCpnt->use_sg = req->nr_phys_segments;
+
+	gfp_mask = GFP_NOIO;
+	if (in_interrupt()) {
+		gfp_mask &= ~__GFP_WAIT;
+		gfp_mask |= __GFP_HIGH;
+	}
+
+	/*
+	 * if sg table allocation fails, requeue request later.
+	 */
+	sgpnt = scsi_alloc_sgtable(SCpnt, gfp_mask);
+	if (unlikely(!sgpnt))
+		goto out;
+
+	SCpnt->request_buffer = (char *) sgpnt;
+	SCpnt->request_bufflen = req->nr_sectors << 9;
+	req->buffer = NULL;
+
+	/* 
+	 * Next, walk the list, and fill in the addresses and sizes of
+	 * each segment.
+	 */
+	count = blk_rq_map_sg(req->q, req, SCpnt->request_buffer);
+
+	/*
+	 * mapped well, send it off
+	 */
+	if (unlikely(count > SCpnt->use_sg))
+		goto incorrect;
+	SCpnt->use_sg = count;
+	return 1;
+
+incorrect:
+	printk(KERN_ERR "Incorrect number of segments after building list\n");
+	printk(KERN_ERR "counted %d, received %d\n", count, SCpnt->use_sg);
+	printk(KERN_ERR "req nr_sec %lu, cur_nr_sec %u\n", req->nr_sectors,
+			req->current_nr_sectors);
+
+	/*
+	 * kill it. there should be no leftover blocks in this request
+	 */
+fail:
+	SCpnt = scsi_end_request(SCpnt, 0, req->nr_sectors);
+	BUG_ON(SCpnt);
+out:
+	return 0;
+}
+
+/*
  * Function:    scsi_request_fn()
  *
  * Purpose:     Generic version of request function for SCSI hosts.
diff -Nru a/drivers/scsi/scsi_merge.c b/drivers/scsi/scsi_merge.c
--- a/drivers/scsi/scsi_merge.c	Fri Oct 25 09:19:29 2002
+++ /dev/null	Wed Dec 31 16:00:00 1969
@@ -1,168 +0,0 @@
-/*
- *  scsi_merge.c Copyright (C) 1999 Eric Youngdale
- *
- *  SCSI queueing library.
- *      Initial versions: Eric Youngdale (eric@andante.org).
- *                        Based upon conversations with large numbers
- *                        of people at Linux Expo.
- *	Support for dynamic DMA mapping: Jakub Jelinek (jakub@redhat.com).
- *	Support for highmem I/O: Jens Axboe <axboe@suse.de>
- */
-
-/*
- * This file contains queue management functions that are used by SCSI.
- * We need to ensure that commands do not grow so large that they cannot
- * be handled all at once by a host adapter.
- */
-
-#include <linux/config.h>
-#include <linux/module.h>
-
-#include <linux/sched.h>
-#include <linux/timer.h>
-#include <linux/string.h>
-#include <linux/slab.h>
-#include <linux/ioport.h>
-#include <linux/kernel.h>
-#include <linux/stat.h>
-#include <linux/blk.h>
-#include <linux/interrupt.h>
-#include <linux/delay.h>
-#include <linux/smp_lock.h>
-
-
-#define __KERNEL_SYSCALLS__
-
-#include <linux/unistd.h>
-
-#include <asm/system.h>
-#include <asm/irq.h>
-#include <asm/dma.h>
-#include <asm/io.h>
-
-#include "scsi.h"
-#include "hosts.h"
-#include <scsi/scsi_ioctl.h>
-
-/*
- * Function:    scsi_init_io()
- *
- * Purpose:     SCSI I/O initialize function.
- *
- * Arguments:   SCpnt   - Command descriptor we wish to initialize
- *
- * Returns:     1 on success.
- *
- * Lock status: 
- */
-int scsi_init_io(Scsi_Cmnd *SCpnt)
-{
-	struct request     *req = SCpnt->request;
-	struct scatterlist *sgpnt;
-	int count, gfp_mask;
-
-	/*
-	 * non-sg block request. FIXME: check bouncing for isa hosts!
-	 */
-	if ((req->flags & REQ_BLOCK_PC) && !req->bio) {
-		/*
-		 * FIXME: isa bouncing
-		 */
-		if (SCpnt->host->unchecked_isa_dma)
-			goto fail;
-
-		SCpnt->request_bufflen = req->data_len;
-		SCpnt->request_buffer = req->data;
-		req->buffer = req->data;
-		SCpnt->use_sg = 0;
-		return 1;
-	}
-
-	/*
-	 * we used to not use scatter-gather for single segment request,
-	 * but now we do (it makes highmem I/O easier to support without
-	 * kmapping pages)
-	 */
-	SCpnt->use_sg = req->nr_phys_segments;
-
-	gfp_mask = GFP_NOIO;
-	if (in_interrupt()) {
-		gfp_mask &= ~__GFP_WAIT;
-		gfp_mask |= __GFP_HIGH;
-	}
-
-	/*
-	 * if sg table allocation fails, requeue request later.
-	 */
-	sgpnt = scsi_alloc_sgtable(SCpnt, gfp_mask);
-	if (!sgpnt)
-		return 0;
-
-	SCpnt->request_buffer = (char *) sgpnt;
-	SCpnt->request_bufflen = req->nr_sectors << 9;
-	req->buffer = NULL;
-
-	/* 
-	 * Next, walk the list, and fill in the addresses and sizes of
-	 * each segment.
-	 */
-	count = blk_rq_map_sg(req->q, req, SCpnt->request_buffer);
-
-	/*
-	 * mapped well, send it off
-	 */
-	if (count <= SCpnt->use_sg) {
-		SCpnt->use_sg = count;
-		return 1;
-	}
-
-	printk("Incorrect number of segments after building list\n");
-	printk("counted %d, received %d\n", count, SCpnt->use_sg);
-	printk("req nr_sec %lu, cur_nr_sec %u\n", req->nr_sectors, req->current_nr_sectors);
-
-	/*
-	 * kill it. there should be no leftover blocks in this request
-	 */
-fail:
-	SCpnt = scsi_end_request(SCpnt, 0, req->nr_sectors);
-	BUG_ON(SCpnt);
-	return 0;
-}
-
-/*
- * Function:    scsi_initialize_merge_fn()
- *
- * Purpose:     Initialize merge function for a host
- *
- * Arguments:   SHpnt   - Host descriptor.
- *
- * Returns:     Nothing.
- *
- * Lock status: 
- *
- * Notes:
- */
-void scsi_initialize_merge_fn(Scsi_Device * SDpnt)
-{
-	struct Scsi_Host *SHpnt = SDpnt->host;
-	request_queue_t *q = &SDpnt->request_queue;
-	u64 bounce_limit;
-
-	/*
-	 * The generic merging functions work just fine for us.
-	 * Enable highmem I/O, if appropriate.
-	 */
-	bounce_limit = BLK_BOUNCE_HIGH;
-	if (SHpnt->highmem_io) {
-		if (!PCI_DMA_BUS_IS_PHYS)
-			/* Platforms with virtual-DMA translation
- 			 * hardware have no practical limit.
-			 */
-			bounce_limit = BLK_BOUNCE_ANY;
-		else if (SHpnt->pci_dev)
-			bounce_limit = SHpnt->pci_dev->dma_mask;
-	} else if (SHpnt->unchecked_isa_dma)
-		bounce_limit = BLK_BOUNCE_ISA;
-
-	blk_queue_bounce_limit(q, bounce_limit);
-}
diff -Nru a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
--- a/drivers/scsi/scsi_scan.c	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/scsi_scan.c	Fri Oct 25 09:19:29 2002
@@ -483,6 +483,35 @@
 }
 
 /**
+ * scsi_initialize_merge_fn() -Æ£initialize merge function for a host
+ * @sd:		host descriptor
+ */
+static void scsi_initialize_merge_fn(struct scsi_device *sd)
+{
+	request_queue_t *q = &sd->request_queue;
+	struct Scsi_Host *sh = sd->host;
+	u64 bounce_limit;
+
+	if (sh->highmem_io) {
+		if (sh->pci_dev && PCI_DMA_BUS_IS_PHYS) {
+			bounce_limit = sh->pci_dev->dma_mask;
+		} else {
+			/*
+			 * Platforms with virtual-DMA translation
+ 			 * hardware have no practical limit.
+			 */
+			bounce_limit = BLK_BOUNCE_ANY;
+		}
+	} else if (sh->unchecked_isa_dma) {
+		bounce_limit = BLK_BOUNCE_ISA;
+	} else {
+		bounce_limit = BLK_BOUNCE_HIGH;
+	}
+
+	blk_queue_bounce_limit(q, bounce_limit);
+}
+
+/**
  * scsi_alloc_sdev - allocate and setup a Scsi_Device
  *
  * Description:
@@ -1995,29 +2024,28 @@
 	sdevscan->scsi_level = scsi_find_scsi_level(channel, id, shost);
 	res = scsi_probe_and_add_lun(sdevscan, &sdev, NULL);
 	scsi_free_sdev(sdevscan);
-	if (res == SCSI_SCAN_LUN_PRESENT) {
-		BUG_ON(sdev == NULL);
 
-		for (sdt = scsi_devicelist; sdt; sdt = sdt->next)
-			if (sdt->init && sdt->dev_noticed)
-				(*sdt->init) ();
-
-		for (sdt = scsi_devicelist; sdt; sdt = sdt->next)
-			if (sdt->attach) {
-				(*sdt->attach) (sdev);
-				if (sdev->attached) {
-					scsi_build_commandblocks(sdev);
-					if (sdev->current_queue_depth == 0)
-						printk(ALLOC_FAILURE_MSG,
-						       __FUNCTION__);
-				}
-			}
+	if (res != SCSI_SCAN_LUN_PRESENT) 
+		return;
 
-		for (sdt = scsi_devicelist; sdt; sdt = sdt->next)
-			if (sdt->finish && sdt->nr_dev)
-				(*sdt->finish) ();
+	BUG_ON(sdev == NULL);
 
+	scsi_build_commandblocks(sdev);
+	if (sdev->current_queue_depth == 0) {
+		printk(ALLOC_FAILURE_MSG, __FUNCTION__);
+		return;
 	}
+
+	for (sdt = scsi_devicelist; sdt; sdt = sdt->next)
+		if (sdt->init && sdt->dev_noticed)
+			(*sdt->init) ();
+
+	for (sdt = scsi_devicelist; sdt; sdt = sdt->next)
+		if (sdt->attach)
+			(*sdt->attach) (sdev);
+
+	if (!sdev->attached)
+		scsi_release_commandblocks(sdev);
 }
 
 /**
diff -Nru a/drivers/scsi/sd.c b/drivers/scsi/sd.c
--- a/drivers/scsi/sd.c	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/sd.c	Fri Oct 25 09:19:29 2002
@@ -92,7 +92,6 @@
 static void sd_init_onedisk(Scsi_Disk * sdkp, struct gendisk *disk);
 
 static int sd_init(void);
-static void sd_finish(void);
 static int sd_attach(Scsi_Device *);
 static int sd_detect(Scsi_Device *);
 static void sd_detach(Scsi_Device *);
@@ -103,23 +102,19 @@
 static struct notifier_block sd_notifier_block = {sd_notifier, NULL, 0}; 
 
 static struct Scsi_Device_Template sd_template = {
-	module:THIS_MODULE,
-	name:"disk",
-	tag:"sd",
-	scsi_type:TYPE_DISK,
-	major:SCSI_DISK0_MAJOR,
-        /*
-         * Secondary range of majors that this driver handles.
-         */
-	min_major:SCSI_DISK1_MAJOR,
-	max_major:SCSI_DISK7_MAJOR,
-	blk:1,
-	detect:sd_detect,
-	init:sd_init,
-	finish:sd_finish,
-	attach:sd_attach,
-	detach:sd_detach,
-	init_command:sd_init_command,
+	.module		= THIS_MODULE,
+	.name		= "disk",
+	.tag		= "sd",
+	.scsi_type	= TYPE_DISK,
+	.major		= SCSI_DISK0_MAJOR,
+	.min_major	= SCSI_DISK1_MAJOR,
+	.max_major	= SCSI_DISK7_MAJOR,
+	.blk		= 1,
+	.detect		= sd_detect,
+	.init		= sd_init,
+	.attach		= sd_attach,
+	.detach		= sd_detach,
+	.init_command	= sd_init_command,
 };
 
 static void sd_rw_intr(Scsi_Cmnd * SCpnt);
@@ -1291,38 +1286,6 @@
 }
 
 /**
- *	sd_finish - called during driver initialization, after all
- *	the sd_attach() calls are finished.
- *
- *	Note: this function is invoked from the scsi mid-level.
- *	This function is not called after driver initialization has completed.
- *	Specifically later device attachments invoke sd_attach() but not
- *	this function.
- **/
-static void sd_finish()
-{
-	int k;
-	Scsi_Disk * sdkp;
-
-	SCSI_LOG_HLQUEUE(3, printk("sd_finish: \n"));
-
-	for (k = 0; k < sd_template.dev_max; ++k) {
-		sdkp = sd_get_sdisk(k);
-		if (sdkp && (0 == sdkp->capacity) && sdkp->device) {
-			sd_init_onedisk(sdkp, sd_disks[k]);
-			if (sdkp->has_been_registered)
-				continue;
-			set_capacity(sd_disks[k], sdkp->capacity);
-			sd_disks[k]->private_data = sdkp;
-			sd_disks[k]->queue = &sdkp->device->request_queue;
-			add_disk(sd_disks[k]);
-			sdkp->has_been_registered = 1;
-		}
-	}
-	return;
-}
-
-/**
  *	sd_detect - called at the start of driver initialization, once 
  *	for each scsi device (not just disks) present.
  *
@@ -1358,13 +1321,12 @@
  **/
 static int sd_attach(Scsi_Device * sdp)
 {
-	Scsi_Disk *sdkp;
+	Scsi_Disk *sdkp = NULL;	/* shut up lame gcc warning */
 	int dsk_nr;
 	unsigned long iflags;
 	struct gendisk *gd;
 
-	if ((NULL == sdp) ||
-	    ((sdp->type != TYPE_DISK) && (sdp->type != TYPE_MOD)))
+	if ((sdp->type != TYPE_DISK) && (sdp->type != TYPE_MOD))
 		return 0;
 
 	gd = alloc_disk(16);
@@ -1373,15 +1335,16 @@
 
 	SCSI_LOG_HLQUEUE(3, printk("sd_attach: scsi device: <%d,%d,%d,%d>\n", 
 			 sdp->host->host_no, sdp->channel, sdp->id, sdp->lun));
+
 	if (sd_template.nr_dev >= sd_template.dev_max) {
-		sdp->attached--;
 		printk(KERN_ERR "sd_init: no more room for device\n");
-		put_disk(gd);
-		return 1;
+		goto out;
 	}
 
-/* Assume sd_attach is not re-entrant (for time being) */
-/* Also think about sd_attach() and sd_detach() running coincidentally. */
+	/*
+	 * Assume sd_attach is not re-entrant (for time being)
+	 * Also think about sd_attach() and sd_detach() running coincidentally.
+	 */
 	write_lock_irqsave(&sd_dsk_arr_lock, iflags);
 	for (dsk_nr = 0; dsk_nr < sd_template.dev_max; dsk_nr++) {
 		sdkp = sd_dsk_arr[dsk_nr];
@@ -1393,15 +1356,15 @@
 	}
 	write_unlock_irqrestore(&sd_dsk_arr_lock, iflags);
 
-	if (dsk_nr >= sd_template.dev_max) {
-		/* panic("scsi_devices corrupt (sd)");  overkill */
+	if (!sdkp || dsk_nr >= sd_template.dev_max) {
 		printk(KERN_ERR "sd_init: sd_dsk_arr corrupted\n");
-		put_disk(gd);
-		return 1;
+		goto out;
 	}
 
+	sd_init_onedisk(sdkp, gd);
 	sd_template.nr_dev++;
-        gd->de = sdp->de;
+
+	gd->de = sdp->de;
 	gd->major = SD_MAJOR(dsk_nr>>4);
 	gd->first_minor = (dsk_nr & 15)<<4;
 	gd->fops = &sd_fops;
@@ -1409,14 +1372,26 @@
 		sprintf(gd->disk_name, "sd%c%c",'a'+dsk_nr/26-1,'a'+dsk_nr%26);
 	else
 		sprintf(gd->disk_name, "sd%c",'a'+dsk_nr%26);
-        gd->flags = sdp->removable ? GENHD_FL_REMOVABLE : 0;
-        gd->driverfs_dev = &sdp->sdev_driverfs_dev;
-        gd->flags |= GENHD_FL_DRIVERFS | GENHD_FL_DEVFS;
+	gd->flags = sdp->removable ? GENHD_FL_REMOVABLE : 0;
+	gd->driverfs_dev = &sdp->sdev_driverfs_dev;
+	gd->flags |= GENHD_FL_DRIVERFS | GENHD_FL_DEVFS;
+	gd->private_data = sdkp;
+	gd->queue = &sdkp->device->request_queue;
+
+	set_capacity(gd, sdkp->capacity);
+	add_disk(gd);
+
 	sd_disks[dsk_nr] = gd;
+
 	printk(KERN_NOTICE "Attached scsi %sdisk %s at scsi%d, channel %d, "
 	       "id %d, lun %d\n", sdp->removable ? "removable " : "",
 	       gd->disk_name, sdp->host->host_no, sdp->channel, sdp->id, sdp->lun);
 	return 0;
+
+out:
+	sdp->attached--;
+	put_disk(gd);
+	return 1;
 }
 
 static int sd_revalidate(struct gendisk *disk)
@@ -1472,10 +1447,7 @@
 	sdkp->capacity = 0;
 	/* sdkp->detaching = 1; */
 
-	if (sdkp->has_been_registered) {
-		sdkp->has_been_registered = 0;
-		del_gendisk(sd_disks[dsk_nr]);
-	}
+	del_gendisk(sd_disks[dsk_nr]);
 	sdp->attached--;
 	sd_template.dev_noticed--;
 	sd_template.nr_dev--;
@@ -1566,6 +1538,9 @@
 	Scsi_Disk *sdkp = sd_get_sdisk(index);
 	Scsi_Device *SDpnt = sdkp->device;
 	int retries, the_result;
+
+	if (!SDpnt->online)
+		return 0;
 
 	if(verbose) {
 		char buf[16];
diff -Nru a/drivers/scsi/sd.h b/drivers/scsi/sd.h
--- a/drivers/scsi/sd.h	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/sd.h	Fri Oct 25 09:19:29 2002
@@ -25,7 +25,6 @@
 	Scsi_Device *device;
 	unsigned char media_present;
 	unsigned char write_prot;
-	unsigned has_been_registered:1;
 	unsigned WCE:1;         /* state of disk WCE bit */
 	unsigned RCD:1;         /* state of disk RCD bit */
 } Scsi_Disk;
diff -Nru a/drivers/scsi/sg.c b/drivers/scsi/sg.c
--- a/drivers/scsi/sg.c	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/sg.c	Fri Oct 25 09:19:29 2002
@@ -1607,7 +1607,7 @@
 		sdp->de = NULL;
 		device_remove_file(&sdp->sg_driverfs_dev, &dev_attr_type);
 		device_remove_file(&sdp->sg_driverfs_dev, &dev_attr_kdev);
-		put_device(&sdp->sg_driverfs_dev);
+		device_unregister(&sdp->sg_driverfs_dev);
 		if (NULL == sdp->headfp)
 			vfree((char *) sdp);
 	}
diff -Nru a/drivers/scsi/sr.c b/drivers/scsi/sr.c
--- a/drivers/scsi/sr.c	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/sr.c	Fri Oct 25 09:19:29 2002
@@ -63,27 +63,24 @@
 #define SR_TIMEOUT	(30 * HZ)
 
 static int sr_init(void);
-static void sr_finish(void);
 static int sr_attach(Scsi_Device *);
 static int sr_detect(Scsi_Device *);
 static void sr_detach(Scsi_Device *);
 
 static int sr_init_command(Scsi_Cmnd *);
 
-static struct Scsi_Device_Template sr_template =
-{
-	module:THIS_MODULE,
-	name:"cdrom",
-	tag:"sr",
-	scsi_type:TYPE_ROM,
-	major:SCSI_CDROM_MAJOR,
-	blk:1,
-	detect:sr_detect,
-	init:sr_init,
-	finish:sr_finish,
-	attach:sr_attach,
-	detach:sr_detach,
-	init_command:sr_init_command
+static struct Scsi_Device_Template sr_template = {
+	.module		= THIS_MODULE,
+	.name		= "cdrom",
+	.tag		= "sr",
+	.scsi_type	= TYPE_ROM,
+	.major		= SCSI_CDROM_MAJOR,
+	.blk		= 1,
+	.detect		= sr_detect,
+	.init		= sr_init,
+	.attach		= sr_attach,
+	.detach		= sr_detach,
+	.init_command	= sr_init_command
 };
 
 static Scsi_CD *scsi_CDs;
@@ -91,6 +88,7 @@
 static int sr_open(struct cdrom_device_info *, int);
 static void get_sectorsize(Scsi_CD *);
 static void get_capabilities(Scsi_CD *);
+static int sr_init_one(Scsi_CD *, int);
 
 static int sr_media_change(struct cdrom_device_info *, int);
 static int sr_packet(struct cdrom_device_info *, struct cdrom_generic_command *);
@@ -473,10 +471,9 @@
 	if (SDp->type != TYPE_ROM && SDp->type != TYPE_WORM)
 		return 1;
 
-	if (sr_template.nr_dev >= sr_template.dev_max) {
-		SDp->attached--;
-		return 1;
-	}
+	if (sr_template.nr_dev >= sr_template.dev_max)
+		goto fail;
+
 	for (cpnt = scsi_CDs, i = 0; i < sr_template.dev_max; i++, cpnt++)
 		if (!cpnt->device)
 			break;
@@ -484,9 +481,11 @@
 	if (i >= sr_template.dev_max)
 		panic("scsi_devices corrupt (sr)");
 
-
 	scsi_CDs[i].device = SDp;
 
+	if (sr_init_one(cpnt, i))
+		goto fail;
+
 	sr_template.nr_dev++;
 	if (sr_template.nr_dev > sr_template.dev_max)
 		panic("scsi_devices corrupt (sr)");
@@ -494,6 +493,10 @@
 	printk("Attached scsi CD-ROM %s at scsi%d, channel %d, id %d, lun %d\n",
 	       scsi_CDs[i].cdi.name, SDp->host->host_no, SDp->channel, SDp->id, SDp->lun);
 	return 0;
+
+fail:
+	SDp->attached--;
+	return 1;
 }
 
 
@@ -744,64 +747,56 @@
 	return 1;
 }
 
-void sr_finish()
+static int sr_init_one(Scsi_CD *cd, int first_minor)
 {
-	int i;
+	struct gendisk *disk;
 
-	for (i = 0; i < sr_template.nr_dev; ++i) {
-		struct gendisk *disk;
-		Scsi_CD *cd = &scsi_CDs[i];
-		/* If we have already seen this, then skip it.  Comes up
-		 * with loadable modules. */
-		if (cd->disk)
-			continue;
-		disk = alloc_disk(1);
-		if (!disk)
-			continue;
-		if (cd->disk) {
-			put_disk(disk);
-			continue;
-		}
-		disk->major = MAJOR_NR;
-		disk->first_minor = i;
-		strcpy(disk->disk_name, cd->cdi.name);
-		disk->fops = &sr_bdops;
-		disk->flags = GENHD_FL_CD;
-		cd->disk = disk;
-		cd->capacity = 0x1fffff;
-		cd->device->sector_size = 2048;/* A guess, just in case */
-		cd->needs_sector_size = 1;
-		cd->device->changed = 1;	/* force recheck CD type */
+	disk = alloc_disk(1);
+	if (!disk)
+		return -ENOMEM;
+
+	disk->major = MAJOR_NR;
+	disk->first_minor = first_minor;
+	strcpy(disk->disk_name, cd->cdi.name);
+	disk->fops = &sr_bdops;
+	disk->flags = GENHD_FL_CD;
+	cd->disk = disk;
+	cd->capacity = 0x1fffff;
+	cd->device->sector_size = 2048;/* A guess, just in case */
+	cd->needs_sector_size = 1;
+	cd->device->changed = 1;	/* force recheck CD type */
 #if 0
-		/* seems better to leave this for later */
-		get_sectorsize(cd);
-		printk("Scd sectorsize = %d bytes.\n", cd->sector_size);
+	/* seems better to leave this for later */
+	get_sectorsize(cd);
+	printk("Scd sectorsize = %d bytes.\n", cd->sector_size);
 #endif
-		cd->use = 1;
+	cd->use = 1;
 
-		cd->device->ten = 1;
-		cd->device->remap = 1;
-		cd->readcd_known = 0;
-		cd->readcd_cdda = 0;
-
-		cd->cdi.ops = &sr_dops;
-		cd->cdi.handle = cd;
-		cd->cdi.mask = 0;
-		cd->cdi.capacity = 1;
-		/*
-		 *	FIXME: someone needs to handle a get_capabilities
-		 *	failure properly ??
-		 */
-		get_capabilities(cd);
-		sr_vendor_init(cd);
-		disk->de = cd->device->de;
-		disk->driverfs_dev = &cd->device->sdev_driverfs_dev;
-		register_cdrom(&cd->cdi);
-		set_capacity(disk, cd->capacity);
-		disk->private_data = cd;
-		disk->queue = &cd->device->request_queue;
-		add_disk(disk);
-	}
+	cd->device->ten = 1;
+	cd->device->remap = 1;
+	cd->readcd_known = 0;
+	cd->readcd_cdda = 0;
+
+	cd->cdi.ops = &sr_dops;
+	cd->cdi.handle = cd;
+	cd->cdi.mask = 0;
+	cd->cdi.capacity = 1;
+
+	/*
+	 *	FIXME: someone needs to handle a get_capabilities
+	 *	failure properly ??
+	 */
+	get_capabilities(cd);
+	sr_vendor_init(cd);
+	disk->de = cd->device->de;
+	disk->driverfs_dev = &cd->device->sdev_driverfs_dev;
+	register_cdrom(&cd->cdi);
+	set_capacity(disk, cd->capacity);
+	disk->private_data = cd;
+	disk->queue = &cd->device->request_queue;
+	add_disk(disk);
+
+	return 0;
 }
 
 static void sr_detach(Scsi_Device * SDp)
diff -Nru a/drivers/scsi/sym53c8xx.h b/drivers/scsi/sym53c8xx.h
--- a/drivers/scsi/sym53c8xx.h	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/sym53c8xx.h	Fri Oct 25 09:19:29 2002
@@ -91,8 +91,6 @@
 			info:           sym53c8xx_info, 	\
 			queuecommand:   sym53c8xx_queue_command,\
 			slave_attach:   sym53c8xx_slave_attach, \
-			abort:          sym53c8xx_abort,	\
-			reset:          sym53c8xx_reset,	\
 			bios_param:     scsicam_bios_param,	\
 			can_queue:      SCSI_NCR_CAN_QUEUE,	\
 			this_id:        7,			\
diff -Nru a/drivers/scsi/wd7000.h b/drivers/scsi/wd7000.h
--- a/drivers/scsi/wd7000.h	Fri Oct 25 09:19:29 2002
+++ b/drivers/scsi/wd7000.h	Fri Oct 25 09:19:29 2002
@@ -49,7 +49,6 @@
 	detect:			wd7000_detect,			\
 	command:		wd7000_command,			\
 	queuecommand:		wd7000_queuecommand,		\
-	abort:			wd7000_abort,			\
 	eh_bus_reset_handler:	wd7000_bus_reset,		\
 	eh_device_reset_handler:wd7000_device_reset,		\
 	eh_host_reset_handler:	wd7000_host_reset,		\

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-25 14:16 ` James Bottomley
@ 2002-10-25 18:34   ` James Bottomley
  2002-10-25 18:49     ` Mike Anderson
                       ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: James Bottomley @ 2002-10-25 18:34 UTC (permalink / raw
  To: Andrew Morton, linux-scsi@vger.kernel.org, Badari Pulavarty,
	Martin J. Bligh, Jens Axboe, Doug Ledford

[-- Attachment #1: Type: text/plain, Size: 597 bytes --]


James.Bottomley@steeleye.com said:
> This has all the hallmarks of the Qlogic double done bug:  Under
> certain high  stress/bad bus situations, the Qla driver will call done
> twice on a SCSI  command structure.  I take it this is the 6.1.0 qla
> driver, which qlogic has  assured me "really really" has this bug
> fixed?

> Is there any way to switch adapters to see if we can confirm this
> hypothesis? 

Actually, rather than switching adapters, could you try the attached patch.  
As long as the command isn't reused too fast after the scsi_done, it should 
pick up this problem.

James





[-- Attachment #2: tmp.diff --]
[-- Type: text/plain , Size: 818 bytes --]

===== drivers/scsi/scsi.c 1.50 vs edited =====
--- 1.50/drivers/scsi/scsi.c	Tue Oct 22 12:43:29 2002
+++ edited/drivers/scsi/scsi.c	Fri Oct 25 13:23:32 2002
@@ -1224,6 +1224,12 @@
 	SCSI_LOG_MLQUEUE(3, printk("Leaving scsi_do_cmd()\n"));
 }
 
+static void scsi_null_done_method(Scsi_Cmnd *SCp)
+{
+	printk(KERN_EMERG "Done called on already done command\n");
+        dump_stack();
+}
+
 /**
  * scsi_done - Mark this command as done
  * @SCpnt: The SCSI Command which we think we've completed.
@@ -1247,6 +1253,10 @@
 	unsigned long flags;
 	int cpu, tstatus;
 	struct softscsi_data *queue;
+
+        /* clear out the done method to produce an error for done on the
+         * same command */
+        SCpnt->scsi_done = scsi_null_done_method;
 
 	/*
 	 * We don't have to worry about this one timing out any more.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-25 18:34   ` James Bottomley
@ 2002-10-25 18:49     ` Mike Anderson
  2002-10-25 19:08     ` Patrick Mansfield
  2002-10-25 22:23     ` Badari Pulavarty
  2 siblings, 0 replies; 37+ messages in thread
From: Mike Anderson @ 2002-10-25 18:49 UTC (permalink / raw
  To: James Bottomley
  Cc: Andrew Morton, linux-scsi@vger.kernel.org, Badari Pulavarty,
	Martin J. Bligh, Jens Axboe, Doug Ledford

This looks better than my quick hack. I talked to Badari about putting a
printk on the status check of scsi_delete_timer.

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-25 18:34   ` James Bottomley
  2002-10-25 18:49     ` Mike Anderson
@ 2002-10-25 19:08     ` Patrick Mansfield
  2002-10-25 19:41       ` Mike Anderson
  2002-10-25 22:23     ` Badari Pulavarty
  2 siblings, 1 reply; 37+ messages in thread
From: Patrick Mansfield @ 2002-10-25 19:08 UTC (permalink / raw
  To: James Bottomley
  Cc: Andrew Morton, linux-scsi@vger.kernel.org, Badari Pulavarty,
	Martin J. Bligh, Jens Axboe, Doug Ledford

On Fri, Oct 25, 2002 at 01:34:02PM -0500, James Bottomley wrote:
> 
> James.Bottomley@steeleye.com said:
> > This has all the hallmarks of the Qlogic double done bug:  Under
> > certain high  stress/bad bus situations, the Qla driver will call done
> > twice on a SCSI  command structure.  I take it this is the 6.1.0 qla
> > driver, which qlogic has  assured me "really really" has this bug
> > fixed?
> 
> > Is there any way to switch adapters to see if we can confirm this
> > hypothesis? 

These are qla 23xx cards, and are not supported by the qlogicfc driver.

I don't think the feral driver is ported to 2.5.x, so we can't use it.

> Actually, rather than switching adapters, could you try the attached patch.  
> As long as the command isn't reused too fast after the scsi_done, it should 
> pick up this problem.
> 
> James

I tried a similiar patch (printk if SCpnt->state == SCSI_STATE_BHQUEUE),
and got no hits. It looks like the latest qla checks for this state, as
I saw these errors:

Incorrect number of segments after building list
counted 3, received 2
req nr_sec 256, cur_nr_sec 8
end_request: I/O error, dev 08:a0, sector 1334497
qla2x00_status_entry: cmd is NULL: already returned to OS (sp=f39810e0)
cmd_timeout: LOST command state = 0x6 
qla2x00 (2): Did not free all srbs, Free count = 4095, Alloc Count = 4096


I did not try to figure out what happens to the qla after the
qla2x00_status_entry prints the "cmd is NULL".

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-25 19:08     ` Patrick Mansfield
@ 2002-10-25 19:41       ` Mike Anderson
  2002-10-25 19:47         ` Jens Axboe
  0 siblings, 1 reply; 37+ messages in thread
From: Mike Anderson @ 2002-10-25 19:41 UTC (permalink / raw
  To: James Bottomley, Andrew Morton, linux-scsi@vger.kernel.org,
	Badari Pulavarty, Martin J. Bligh, Jens Axboe, Doug Ledford

Patrick Mansfield [patmans@us.ibm.com] wrote:
> Incorrect number of segments after building list
> counted 3, received 2
> req nr_sec 256, cur_nr_sec 8
> end_request: I/O error, dev 08:a0, sector 1334497
> qla2x00_status_entry: cmd is NULL: already returned to OS (sp=f39810e0)
> cmd_timeout: LOST command state = 0x6 
> qla2x00 (2): Did not free all srbs, Free count = 4095, Alloc Count = 4096

I this a signature of sg_tablesize not matching max_sectors.

I heard of an old issue on this.

qla2x00 values are:

max_sectors is 512

SG_SEGMENTS 32

Someone could try setting 

SG_SEGMENTS to  64.

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-25 19:41       ` Mike Anderson
@ 2002-10-25 19:47         ` Jens Axboe
  2002-10-25 22:14           ` James Bottomley
  0 siblings, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2002-10-25 19:47 UTC (permalink / raw
  To: James Bottomley, Andrew Morton, linux-scsi@vger.kernel.org,
	Badari Pulavarty, Martin J. Bligh, Doug Ledford

On Fri, Oct 25 2002, Mike Anderson wrote:
> Patrick Mansfield [patmans@us.ibm.com] wrote:
> > Incorrect number of segments after building list
> > counted 3, received 2
> > req nr_sec 256, cur_nr_sec 8
> > end_request: I/O error, dev 08:a0, sector 1334497
> > qla2x00_status_entry: cmd is NULL: already returned to OS (sp=f39810e0)
> > cmd_timeout: LOST command state = 0x6 
> > qla2x00 (2): Did not free all srbs, Free count = 4095, Alloc Count = 4096
> 
> I this a signature of sg_tablesize not matching max_sectors.
> 
> I heard of an old issue on this.
> 
> qla2x00 values are:
> 
> max_sectors is 512
> 
> SG_SEGMENTS 32
> 
> Someone could try setting 
> 
> SG_SEGMENTS to  64.

Should not matter (and I don't know of any old issues in this regard?).
A request will never be built that exceeds any of the given limits.
Besides, there's no direct max_sectors -> max_segments mapping.

What I make of the above is that the request or bio appears to have been
mangled. This:

Incorrect number of segments after building list
counted 3, received 2

is a _must not_ happen scenario, it's always a grave bug.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-25 19:47         ` Jens Axboe
@ 2002-10-25 22:14           ` James Bottomley
  2002-10-25 22:18             ` Andrew Morton
  0 siblings, 1 reply; 37+ messages in thread
From: James Bottomley @ 2002-10-25 22:14 UTC (permalink / raw
  To: Jens Axboe
  Cc: James Bottomley, Andrew Morton, linux-scsi@vger.kernel.org,
	Badari Pulavarty, Martin J. Bligh, Doug Ledford

I think it may be bound up in another SCSI hang scenario I've been seeing.  
I've instrumented the done on already done (and the locking in the SCSI 
request fn).

The scenario I see is:

driver rejects command.  Mid layer requeues.  Several commands go by.  Now I 
get a done on an already done command.  Unfortunately, the **** dump_stack() 
doesn't print a trace so I still don't know where the command is coming from.

I'll keep looking.

James



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-25 22:14           ` James Bottomley
@ 2002-10-25 22:18             ` Andrew Morton
  0 siblings, 0 replies; 37+ messages in thread
From: Andrew Morton @ 2002-10-25 22:18 UTC (permalink / raw
  To: James Bottomley
  Cc: Jens Axboe, linux-scsi@vger.kernel.org, Badari Pulavarty,
	Martin J. Bligh, Doug Ledford

James Bottomley wrote:
> 
> Unfortunately, the **** dump_stack()
> doesn't print a trace so I still don't know where the command is coming from.

hrm.  Why not?   Maybe there's a BUG happening as well and that
fiddles the console log level?

Did you try using BUG instead of dump_stack()?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-25 18:34   ` James Bottomley
  2002-10-25 18:49     ` Mike Anderson
  2002-10-25 19:08     ` Patrick Mansfield
@ 2002-10-25 22:23     ` Badari Pulavarty
  2002-10-26  0:13       ` James Bottomley
  2 siblings, 1 reply; 37+ messages in thread
From: Badari Pulavarty @ 2002-10-25 22:23 UTC (permalink / raw
  To: James Bottomley
  Cc: Andrew Morton, linux-scsi@vger.kernel.org, Badari Pulavarty,
	Martin J. Bligh, Jens Axboe, Doug Ledford

I Just tried the patch. No Luck. I get same panic as before...
I am using qla2x00src-v6.03.00b6 driver on qla2200 fc controllers.

- Badari

elm3b81 login: Oops: 0002
qla2200  
CPU:    3
EIP:    0060:[<f89d0cd7>]    Not tainted
EFLAGS: 00010002
EIP is at qla2x00_process_completed_request+0x67/0x150 [qla2200]
eax: 00000000   ebx: f65f02dc   ecx: 00000000   edx: 00000000
esi: f6b9017c   edi: 00000000   ebp: d89bfda3   esp: d89bfd24
ds: 0068   es: 0068   ss: 0068
Process dd (pid: 1512, threadinfo=d89be000 task=d5039760)
Stack: f6b9017c 0000711a 00007100 f89d0efb f6b9017c 00000287 00008020 00000000 
       00008020 00007100 02870000 00000000 00000000 c0134768 f642b110 0000d6a1 
       0000000c c014ed4c f642b10c 00000000 00000008 f7f78a1c f6b9017c f6b9017c 
Call Trace:
 [<f89d0efb>] qla2x00_isr+0x13b/0x610 [qla2200]
 [<c0134768>] find_get_page+0x38/0x50
 [<c014ed4c>] __find_get_block_slow+0x2c/0x110
 [<f89cc17b>] qla2x00_intr_handler+0x9b/0x250 [qla2200]
 [<c010950a>] handle_IRQ_event+0x3a/0x60
 [<c0109812>] do_IRQ+0x112/0x1f0
 [<c0107f38>] common_interrupt+0x18/0x20
 [<c020926c>] __copy_from_user+0x4c/0x70
 [<c0136473>] generic_file_write_nolock+0x743/0xa00
 [<c0130071>] pte_alloc_map+0x121/0x130
 [<c0131103>] zeromap_page_range+0xe3/0x180
 [<c01367a5>] generic_file_write+0x55/0x70
 [<c014cf8f>] vfs_write+0xbf/0x160
 [<c01170ea>] do_schedule+0x38a/0x480
 [<c014d09a>] sys_write+0x2a/0x40
 [<c01075f3>] syscall_call+0x7/0xb

Code: 89 b8 24 01 00 00 8b 43 10 89 88 38 01 00 00 8b 43 10 89 90 
 <0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing

> 
> This is a multipart MIME message.
> 
> --==_Exmh_11899173800
> Content-Type: text/plain; charset=us-ascii
> 
> 
> James.Bottomley@steeleye.com said:
> > This has all the hallmarks of the Qlogic double done bug:  Under
> > certain high  stress/bad bus situations, the Qla driver will call done
> > twice on a SCSI  command structure.  I take it this is the 6.1.0 qla
> > driver, which qlogic has  assured me "really really" has this bug
> > fixed?
> 
> > Is there any way to switch adapters to see if we can confirm this
> > hypothesis? 
> 
> Actually, rather than switching adapters, could you try the attached patch.  
> As long as the command isn't reused too fast after the scsi_done, it should 
> pick up this problem.
> 
> James
> 
> 
> 
> 
> 
> --==_Exmh_11899173800
> Content-Type: text/plain ; name="tmp.diff"; charset=us-ascii
> Content-Description: tmp.diff
> Content-Disposition: attachment; filename="tmp.diff"
> 
> ===== drivers/scsi/scsi.c 1.50 vs edited =====
> --- 1.50/drivers/scsi/scsi.c	Tue Oct 22 12:43:29 2002
> +++ edited/drivers/scsi/scsi.c	Fri Oct 25 13:23:32 2002
> @@ -1224,6 +1224,12 @@
>  	SCSI_LOG_MLQUEUE(3, printk("Leaving scsi_do_cmd()\n"));
>  }
>  
> +static void scsi_null_done_method(Scsi_Cmnd *SCp)
> +{
> +	printk(KERN_EMERG "Done called on already done command\n");
> +        dump_stack();
> +}
> +
>  /**
>   * scsi_done - Mark this command as done
>   * @SCpnt: The SCSI Command which we think we've completed.
> @@ -1247,6 +1253,10 @@
>  	unsigned long flags;
>  	int cpu, tstatus;
>  	struct softscsi_data *queue;
> +
> +        /* clear out the done method to produce an error for done on the
> +         * same command */
> +        SCpnt->scsi_done = scsi_null_done_method;
>  
>  	/*
>  	 * We don't have to worry about this one timing out any more.
> 
> --==_Exmh_11899173800--
> 
> 
> 


-- 
Badari Pulavarty
pbadari@us.ibm.com
IBM Linux Technology Center - Kernel Team

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-25 22:23     ` Badari Pulavarty
@ 2002-10-26  0:13       ` James Bottomley
  2002-10-26  0:18         ` Mike Anderson
  2002-10-26  9:29         ` Jens Axboe
  0 siblings, 2 replies; 37+ messages in thread
From: James Bottomley @ 2002-10-26  0:13 UTC (permalink / raw
  To: Badari Pulavarty
  Cc: James Bottomley, Andrew Morton, linux-scsi@vger.kernel.org,
	Martin J. Bligh, Jens Axboe, Doug Ledford

[-- Attachment #1: Type: text/plain, Size: 645 bytes --]

pbadari@us.ibm.com said:
> I Just tried the patch. No Luck. I get same panic as before... I am
> using qla2x00src-v6.03.00b6 driver on qla2200 fc controllers. 

Well, yours may be a qla bug.

However, I've tracked down my hang problem:  If a command is pushed back into 
the block queue as a REQ_SPECIAL using the blk_insert_request() API, it never 
has REQ_CMD cleared.  Eventually it comes back into scsi_request_fn() with 
both REQ_CMD and REQ_SPECIAL set.  This causes the I/O to be initialised again.

The fix is to clear REQ_CMD in blk_insert_request().

This may also be the cause of Patrick's Incorrect number of segments error.

James


[-- Attachment #2: tmp.diff --]
[-- Type: text/plain , Size: 374 bytes --]

===== drivers/block/ll_rw_blk.c 1.123 vs edited =====
--- 1.123/drivers/block/ll_rw_blk.c	Fri Oct 18 12:41:37 2002
+++ edited/drivers/block/ll_rw_blk.c	Fri Oct 25 18:48:12 2002
@@ -1374,6 +1374,7 @@
 	 * must not attempt merges on this) and that it acts as a soft
 	 * barrier
 	 */
+	rq->flags &= ~REQ_CMD;
 	rq->flags |= REQ_SPECIAL | REQ_BARRIER;
 
 	rq->special = data;

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-26  0:13       ` James Bottomley
@ 2002-10-26  0:18         ` Mike Anderson
  2002-10-26  9:29         ` Jens Axboe
  1 sibling, 0 replies; 37+ messages in thread
From: Mike Anderson @ 2002-10-26  0:18 UTC (permalink / raw
  To: James Bottomley
  Cc: Badari Pulavarty, Andrew Morton, linux-scsi@vger.kernel.org,
	Martin J. Bligh, Jens Axboe, Doug Ledford

James Bottomley [James.Bottomley@SteelEye.com] wrote:
> pbadari@us.ibm.com said:
> > I Just tried the patch. No Luck. I get same panic as before... I am
> > using qla2x00src-v6.03.00b6 driver on qla2200 fc controllers. 
> 
> Well, yours may be a qla bug.
> 
> However, I've tracked down my hang problem:  If a command is pushed back into 
> the block queue as a REQ_SPECIAL using the blk_insert_request() API, it never 
> has REQ_CMD cleared.  Eventually it comes back into scsi_request_fn() with 
> both REQ_CMD and REQ_SPECIAL set.  This causes the I/O to be initialised again.
> 
> The fix is to clear REQ_CMD in blk_insert_request().
> 
> This may also be the cause of Patrick's Incorrect number of segments error.

Once we kicked the console log level up Badari's system also showed
segment errors prior to the panic, but there still maybe other issues here.

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-26  0:13       ` James Bottomley
  2002-10-26  0:18         ` Mike Anderson
@ 2002-10-26  9:29         ` Jens Axboe
  2002-10-27  0:50           ` James Bottomley
  1 sibling, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2002-10-26  9:29 UTC (permalink / raw
  To: James Bottomley
  Cc: Badari Pulavarty, Andrew Morton, linux-scsi@vger.kernel.org,
	Martin J. Bligh, Doug Ledford

On Fri, Oct 25 2002, James Bottomley wrote:
> pbadari@us.ibm.com said:
> > I Just tried the patch. No Luck. I get same panic as before... I am
> > using qla2x00src-v6.03.00b6 driver on qla2200 fc controllers. 
> 
> Well, yours may be a qla bug.
> 
> However, I've tracked down my hang problem:  If a command is pushed back into 
> the block queue as a REQ_SPECIAL using the blk_insert_request() API, it never 
> has REQ_CMD cleared.  Eventually it comes back into scsi_request_fn() with 
> both REQ_CMD and REQ_SPECIAL set.  This causes the I/O to be initialised again.
> 
> The fix is to clear REQ_CMD in blk_insert_request().

Irk no, you cannot just clear valid information! This is _not_ a fix,
it's a hack.

Use a different flag, or maybe use REQ_DONTPREP or similar.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-26  9:29         ` Jens Axboe
@ 2002-10-27  0:50           ` James Bottomley
  2002-10-27 21:20             ` Jens Axboe
  0 siblings, 1 reply; 37+ messages in thread
From: James Bottomley @ 2002-10-27  0:50 UTC (permalink / raw
  To: Jens Axboe
  Cc: James Bottomley, Badari Pulavarty, Andrew Morton,
	linux-scsi@vger.kernel.org, Martin J. Bligh, Doug Ledford

[-- Attachment #1: Type: text/plain, Size: 315 bytes --]

axboe@suse.de said:
> Irk no, you cannot just clear valid information! This is _not_ a fix,
> it's a hack.

> Use a different flag, or maybe use REQ_DONTPREP or similar. 

OK, what about this.  It uses REQ_DONTPREP, and could be a precursor to moving 
the SCSI subsystem to using the request prep function.

James


[-- Attachment #2: tmp.diff --]
[-- Type: text/plain , Size: 2773 bytes --]

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.823   -> 1.824  
#	 drivers/scsi/scsi.h	1.30    -> 1.31   
#	drivers/scsi/scsi_lib.c	1.36    -> 1.37   
#	 drivers/scsi/scsi.c	1.50    -> 1.51   
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/10/26	jejb@mulgrave.(none)	1.824
# [SCSI] fix memory etc. leak caused by double preparing requeued commands
# --------------------------------------------
#
diff -Nru a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
--- a/drivers/scsi/scsi.c	Sat Oct 26 19:46:01 2002
+++ b/drivers/scsi/scsi.c	Sat Oct 26 19:46:01 2002
@@ -658,7 +658,7 @@
  * Notes:       This could be called either from an interrupt context or a
  *              normal process context.
  */
-static int scsi_mlqueue_insert(Scsi_Cmnd * cmd, int reason)
+int scsi_mlqueue_insert(Scsi_Cmnd * cmd, int reason)
 {
 	struct Scsi_Host *host = cmd->host;
 	unsigned long flags;
diff -Nru a/drivers/scsi/scsi.h b/drivers/scsi/scsi.h
--- a/drivers/scsi/scsi.h	Sat Oct 26 19:46:01 2002
+++ b/drivers/scsi/scsi.h	Sat Oct 26 19:46:01 2002
@@ -480,6 +480,7 @@
 			void (*done) (struct scsi_cmnd *),
 			int timeout, int retries);
 extern int scsi_dev_init(void);
+extern int scsi_mlqueue_insert(struct scsi_cmnd *, int);
 
 /*
  * Newer request-based interfaces.
diff -Nru a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c	Sat Oct 26 19:46:01 2002
+++ b/drivers/scsi/scsi_lib.c	Sat Oct 26 19:46:01 2002
@@ -1005,7 +1005,8 @@
 		req = NULL;
 		spin_unlock_irq(q->queue_lock);
 
-		if (SCpnt->request->flags & (REQ_CMD | REQ_BLOCK_PC)) {
+		if (!(SCpnt->request->flags & REQ_DONTPREP)
+		    && (SCpnt->request->flags & (REQ_CMD | REQ_BLOCK_PC))) {
 			/*
 			 * This will do a couple of things:
 			 *  1) Fill in the actual SCSI command.
@@ -1026,18 +1027,8 @@
 			 * required).
 			 */
 			if (!scsi_init_io(SCpnt)) {
+				scsi_mlqueue_insert(SCpnt, SCSI_MLQUEUE_DEVICE_BUSY);
 				spin_lock_irq(q->queue_lock);
-				SHpnt->host_busy--;
-				SDpnt->device_busy--;
-				if (SDpnt->device_busy == 0) {
-					SDpnt->starved = 1;
-					SHpnt->some_device_starved = 1;
-				}
-				SCpnt->request->special = SCpnt;
-				SCpnt->request->flags |= REQ_SPECIAL;
-				if(blk_rq_tagged(SCpnt->request))
-					blk_queue_end_tag(q, SCpnt->request);
-				_elv_add_request(q, SCpnt->request, 0, 0);
 				break;
 			}
 
@@ -1058,6 +1049,7 @@
 				continue;
 			}
 		}
+		SCpnt->request->flags |= REQ_DONTPREP;
 		/*
 		 * Finally, initialize any error handling parameters, and set up
 		 * the timers for timeouts.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-27  0:50           ` James Bottomley
@ 2002-10-27 21:20             ` Jens Axboe
  2002-10-27 21:37               ` James Bottomley
  0 siblings, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2002-10-27 21:20 UTC (permalink / raw
  To: James Bottomley
  Cc: Badari Pulavarty, Andrew Morton, linux-scsi@vger.kernel.org,
	Martin J. Bligh, Doug Ledford

On Sat, Oct 26 2002, James Bottomley wrote:
> axboe@suse.de said:
> > Irk no, you cannot just clear valid information! This is _not_ a fix,
> > it's a hack.
> 
> > Use a different flag, or maybe use REQ_DONTPREP or similar. 
> 
> OK, what about this.  It uses REQ_DONTPREP, and could be a precursor
> to moving the SCSI subsystem to using the request prep function.

Sure that would suffice, but lets just move to the prep approach right
away. First just the global one, later we can do one per device type.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-27 21:20             ` Jens Axboe
@ 2002-10-27 21:37               ` James Bottomley
  2002-10-27 21:54                 ` Jens Axboe
  0 siblings, 1 reply; 37+ messages in thread
From: James Bottomley @ 2002-10-27 21:37 UTC (permalink / raw
  To: Jens Axboe
  Cc: James Bottomley, Badari Pulavarty, Andrew Morton,
	linux-scsi@vger.kernel.org, Martin J. Bligh, Doug Ledford

axboe@suse.de said:
> Sure that would suffice, but lets just move to the prep approach right
> away. First just the global one, later we can do one per device type. 

OK, will do.  SCSI currently needs the ability to bail on a request 
preparation in such a way that the request will get re-queued for preparation 
(out of command blocks or other resource shortage), which I believe you're 
working on.

The per device type is more tricky because SCSI can have more than one upper 
level driver driving the device queue.

James



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-27 21:37               ` James Bottomley
@ 2002-10-27 21:54                 ` Jens Axboe
  2002-10-30 17:39                   ` Badari Pulavarty
  0 siblings, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2002-10-27 21:54 UTC (permalink / raw
  To: James Bottomley
  Cc: Badari Pulavarty, Andrew Morton, linux-scsi@vger.kernel.org,
	Martin J. Bligh, Doug Ledford

On Sun, Oct 27 2002, James Bottomley wrote:
> axboe@suse.de said:
> > Sure that would suffice, but lets just move to the prep approach right
> > away. First just the global one, later we can do one per device type. 
> 
> OK, will do.  SCSI currently needs the ability to bail on a request 
> preparation in such a way that the request will get re-queued for preparation 
> (out of command blocks or other resource shortage), which I believe you're 
> working on.

Oh sorry, just forgot to send that out. The change needed for this is
very small. Basically, in blkdev.h, add these types:

#define BLKPREP_OK              0       /* serve it */
#define BLKPREP_KILL            1       /* fatal error, kill */
#define BLKPREP_DEFER           2       /* leave on queue */

I've attached the elv_next_request(), sorry about this format but my
tree is just severly crowded with changes in the block area right now so
it's hard to diff.

Now you can just return BLKPREP_DEFER from the prep function, and it
should do what you want. We might need a bit more logic in there, mainly
to cover the case where we deferred the only request on the queue (your
usual plugging thing). Or I might leave that to the driver, dunno. But
at least this small change should make it possible for you to test.

> The per device type is more tricky because SCSI can have more than one
> upper level driver driving the device queue.

I was thinking more of doing device-type things, such as making the
sector into lba, rejecting write command on ro media, copy actual scsi
command, etc. But that might just fit into a generic function. And it
means we can basically kill the ->init_command() functions.

struct request *elv_next_request(request_queue_t *q)
{
	struct request *rq;
	int ret;

	while ((rq = __elv_next_request(q))) {
		/*
		 * just mark as started even if we don't start it, a request
		 * that has been delayed should not be passed by new incoming
		 * requests
		 */
		rq->flags |= REQ_STARTED;

		if (&rq->queuelist == q->last_merge)
			q->last_merge = NULL;

		if ((rq->flags & REQ_DONTPREP) || !q->prep_rq_fn)
			break;

		ret = q->prep_rq_fn(q, rq);
		if (ret == BLKPREP_OK) {
			break;
		} else if (ret == BLKPREP_DEFER) {
			rq = NULL;
			break;
		} else if (ret == BLKPREP_KILL) {
			blkdev_dequeue_request(rq);
			rq->flags |= REQ_QUIET;
			while (end_that_request_first(rq, 0, rq->nr_sectors))
				;
			end_that_request_last(rq);
		} else {
			printk("%s: bad return=%d\n", __FUNCTION__, ret);
			break;
		}
	}

	return rq;
}

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-27 21:54                 ` Jens Axboe
@ 2002-10-30 17:39                   ` Badari Pulavarty
  2002-10-30 18:16                     ` Jens Axboe
  0 siblings, 1 reply; 37+ messages in thread
From: Badari Pulavarty @ 2002-10-30 17:39 UTC (permalink / raw
  To: Jens Axboe
  Cc: James Bottomley, Badari Pulavarty, Andrew Morton,
	linux-scsi@vger.kernel.org, Martin J. Bligh, Doug Ledford

Hi,

I still get panics while doing filesystem IO on -mm kernels (with
qlogic fc). I get following msgs before getting the panic:

Incorrect number of segments after building list
counted 3, received 2
req nr_sec 256, cur_nr_sec 8

So I debugged why this is happening. Here is the bottom line:

bio->bi_phys_segments calculated by blk_recount_segments()
is not matching the number of sg elements used by blk_rq_map_sg().

I am doing 128K IO. In the following debug output, my pages in the 
IO except for the first one are contiguous. Since q->max_segment_size = 64K,
it is using 3 sg elememnts - which is correct. 

But blk_recount_segments() is not incrementing bio->bi_phys_segments 
due to the BIOVEC_VIRT_MERGEABLE() check. It always fails and
it creats a new segment all the time. (which does not increment
bi_phys_segments).

What does BIOVEC_VIRT_MERGEABLE() supposed to do ? I am guessing
it is supposed to restrict IO crossing 4GB boundary. Is it correct ?


- Badari

Here is the debug output:

blk_recount_segments():
=======================

counted 3, received 2
req nr_sec 256, cur_nr_sec 8
>>>>> made a new seg:1 (c2f76718, 4096, 0)
Loop: brvprv:c2f76718 bvec:c2f77780 offset:0 len:4096
>>>>> made a new seg:2 (c2f77780, 4096, 0)
Loop: brvprv:c2f77780 bvec:c2f777a8 offset:0 len:4096
Loop: brvprv:c2f777a8 bvec:c2f777d0 offset:0 len:4096
Loop: brvprv:c2f777d0 bvec:c2f777f8 offset:0 len:4096
Loop: brvprv:c2f777f8 bvec:c2f77820 offset:0 len:4096
Loop: brvprv:c2f77820 bvec:c2f77848 offset:0 len:4096
Loop: brvprv:c2f77848 bvec:c2f77870 offset:0 len:4096
Loop: brvprv:c2f77870 bvec:c2f77898 offset:0 len:4096
Loop: brvprv:c2f77898 bvec:c2f778c0 offset:0 len:4096
Loop: brvprv:c2f778c0 bvec:c2f778e8 offset:0 len:4096
Loop: brvprv:c2f778e8 bvec:c2f77910 offset:0 len:4096
Loop: brvprv:c2f77910 bvec:c2f77938 offset:0 len:4096
Loop: brvprv:c2f77938 bvec:c2f77960 offset:0 len:4096
Loop: brvprv:c2f77960 bvec:c2f77988 offset:0 len:4096
Loop: brvprv:c2f77988 bvec:c2f779b0 offset:0 len:4096
Loop: brvprv:c2f779b0 bvec:c2f779d8 offset:0 len:4096
>>>>> made a new seg:3 (c2f77a00, 4096, 0)
Loop: brvprv:c2f77a00 bvec:c2f77a28 offset:0 len:4096
Loop: brvprv:c2f77a28 bvec:c2f77a50 offset:0 len:4096
Loop: brvprv:c2f77a50 bvec:c2f77a78 offset:0 len:4096
Loop: brvprv:c2f77a78 bvec:c2f77aa0 offset:0 len:4096
Loop: brvprv:c2f77aa0 bvec:c2f77ac8 offset:0 len:4096
Loop: brvprv:c2f77ac8 bvec:c2f77af0 offset:0 len:4096
Loop: brvprv:c2f77af0 bvec:c2f77b18 offset:0 len:4096
Loop: brvprv:c2f77b18 bvec:c2f77b40 offset:0 len:4096
Loop: brvprv:c2f77b40 bvec:c2f77b68 offset:0 len:4096
Loop: brvprv:c2f77b68 bvec:c2f77b90 offset:0 len:4096
Loop: brvprv:c2f77b90 bvec:c2f77bb8 offset:0 len:4096
Loop: brvprv:c2f77bb8 bvec:c2f77be0 offset:0 len:4096
Loop: brvprv:c2f77be0 bvec:c2f77c08 offset:0 len:4096
Loop: brvprv:c2f77c08 bvec:c2f77c30 offset:0 len:4096
>>>>> returning segs:3


blk_recount_segments():
======================
===>>>>> made a new seg phys:1 hw:1 
=== Loop: brvprv:c2f76718 bvec:c2f77780 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:2 
=== Loop: brvprv:c2f77780 bvec:c2f777a8 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:3 
=== Loop: brvprv:c2f777a8 bvec:c2f777d0 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:4 
=== Loop: brvprv:c2f777d0 bvec:c2f777f8 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:5 
=== Loop: brvprv:c2f777f8 bvec:c2f77820 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:6 
=== Loop: brvprv:c2f77820 bvec:c2f77848 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:7 
=== Loop: brvprv:c2f77848 bvec:c2f77870 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:8 
=== Loop: brvprv:c2f77870 bvec:c2f77898 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:9 
=== Loop: brvprv:c2f77898 bvec:c2f778c0 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:10 
=== Loop: brvprv:c2f778c0 bvec:c2f778e8 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:11 
=== Loop: brvprv:c2f778e8 bvec:c2f77910 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:12 
=== Loop: brvprv:c2f77910 bvec:c2f77938 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:13 
=== Loop: brvprv:c2f77938 bvec:c2f77960 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:14 
=== Loop: brvprv:c2f77960 bvec:c2f77988 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:15 
=== Loop: brvprv:c2f77988 bvec:c2f779b0 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:16 
=== Loop: brvprv:c2f779b0 bvec:c2f779d8 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:17 
=== Loop: brvprv:c2f779d8 bvec:c2f77a00 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:18 
=== Loop: brvprv:c2f77a00 bvec:c2f77a28 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:19 
=== Loop: brvprv:c2f77a28 bvec:c2f77a50 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:20 
=== Loop: brvprv:c2f77a50 bvec:c2f77a78 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:21 
=== Loop: brvprv:c2f77a78 bvec:c2f77aa0 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:22 
=== Loop: brvprv:c2f77aa0 bvec:c2f77ac8 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:23 
=== Loop: brvprv:c2f77ac8 bvec:c2f77af0 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:24 
=== Loop: brvprv:c2f77af0 bvec:c2f77b18 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:25 
=== Loop: brvprv:c2f77b18 bvec:c2f77b40 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:26 
=== Loop: brvprv:c2f77b40 bvec:c2f77b68 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:27 
=== Loop: brvprv:c2f77b68 bvec:c2f77b90 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:28 
=== Loop: brvprv:c2f77b90 bvec:c2f77bb8 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:29 
=== Loop: brvprv:c2f77bb8 bvec:c2f77be0 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:30 
=== Loop: brvprv:c2f77be0 bvec:c2f77c08 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:31 
=== Loop: brvprv:c2f77c08 bvec:c2f77c30 offset:0 len:4096
===>>>>> Going to make a new seg BIOVEC_VIRT_MERGEABLE failed
===>>>>> made a new seg phys:2 hw:32 
blk_recount_segments  phys: 2 hw:32

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-30 17:39                   ` Badari Pulavarty
@ 2002-10-30 18:16                     ` Jens Axboe
  2002-10-30 19:31                       ` Badari Pulavarty
  2002-10-30 20:35                       ` David S. Miller
  0 siblings, 2 replies; 37+ messages in thread
From: Jens Axboe @ 2002-10-30 18:16 UTC (permalink / raw
  Cc: James Bottomley, Badari Pulavarty, Andrew Morton,
	linux-scsi@vger.kernel.org, Martin J. Bligh, Doug Ledford,
	David S. Miller

On Wed, Oct 30 2002, Badari Pulavarty wrote:
> Hi,
> 
> I still get panics while doing filesystem IO on -mm kernels (with
> qlogic fc). I get following msgs before getting the panic:
> 
> Incorrect number of segments after building list
> counted 3, received 2
> req nr_sec 256, cur_nr_sec 8
> 
> So I debugged why this is happening. Here is the bottom line:
> 
> bio->bi_phys_segments calculated by blk_recount_segments()
> is not matching the number of sg elements used by blk_rq_map_sg().
> 
> I am doing 128K IO. In the following debug output, my pages in the 
> IO except for the first one are contiguous. Since q->max_segment_size = 64K,
> it is using 3 sg elememnts - which is correct. 
> 
> But blk_recount_segments() is not incrementing bio->bi_phys_segments 
> due to the BIOVEC_VIRT_MERGEABLE() check. It always fails and
> it creats a new segment all the time. (which does not increment
> bi_phys_segments).
> 
> What does BIOVEC_VIRT_MERGEABLE() supposed to do ? I am guessing
> it is supposed to restrict IO crossing 4GB boundary. Is it correct ?

It's only for platforms with iommu that can do funky remapping tricks
for pages ending on certain boundaries. I'm sure davem or anton can tell
you more about this for sparc64 or ppc64. For x86, it will always be
true. Hmm, looking at it, I can't convince myself that it is right.
Davem, could you please check up on this? I'll be back later tonight to
review it as well. Things are not consistent, I agree on that.

Good debugging, btw!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-30 18:16                     ` Jens Axboe
@ 2002-10-30 19:31                       ` Badari Pulavarty
  2002-10-30 21:36                         ` merlin hughes
  2002-10-30 20:35                       ` David S. Miller
  1 sibling, 1 reply; 37+ messages in thread
From: Badari Pulavarty @ 2002-10-30 19:31 UTC (permalink / raw
  To: Jens Axboe
  Cc: Badari Pulavarty, James Bottomley, Badari Pulavarty,
	Andrew Morton, linux-scsi@vger.kernel.org, merlin

> 
> On Wed, Oct 30 2002, Badari Pulavarty wrote:
> > Hi,
> > 
> > I still get panics while doing filesystem IO on -mm kernels (with
> > qlogic fc). I get following msgs before getting the panic:
> > 
> > Incorrect number of segments after building list
> > counted 3, received 2
> > req nr_sec 256, cur_nr_sec 8
> > 
	....
> > 
> > What does BIOVEC_VIRT_MERGEABLE() supposed to do ? I am guessing
> > it is supposed to restrict IO crossing 4GB boundary. Is it correct ?
> 
> It's only for platforms with iommu that can do funky remapping tricks
> for pages ending on certain boundaries. I'm sure davem or anton can tell
> you more about this for sparc64 or ppc64. For x86, it will always be
> true. Hmm, looking at it, I can't convince myself that it is right.
> Davem, could you please check up on this? I'll be back later tonight to
> review it as well. Things are not consistent, I agree on that.
> 
> Good debugging, btw!
> 
> -- 
> Jens Axboe

How about the following patch ?

- Badari

--- linux/include/linux/bio.h	Wed Oct 30 10:32:52 2002
+++ linux.new/include/linux/bio.h	Wed Oct 30 10:35:06 2002
@@ -160,7 +160,7 @@
 #define BIOVEC_PHYS_MERGEABLE(vec1, vec2)	\
 	((bvec_to_phys((vec1)) + (vec1)->bv_len) == bvec_to_phys((vec2)))
 #define BIOVEC_VIRT_MERGEABLE(vec1, vec2)	\
-	((((bvec_to_phys((vec1)) + (vec1)->bv_len) | bvec_to_phys((vec2))) & (BIO_VMERGE_BOUNDARY - 1)) == 0)
+	(((bvec_to_phys((vec1)) + (vec1)->bv_len) | (BIO_VMERGE_BOUNDARY - 1)) == (bvec_to_phys((vec2)) | (BIO_VMERGE_BOUNDARY - 1)))
 #define __BIO_SEG_BOUNDARY(addr1, addr2, mask) \
 	(((addr1) | (mask)) == (((addr2) - 1) | (mask)))
 #define BIOVEC_SEG_BOUNDARY(q, b1, b2) \

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-30 18:16                     ` Jens Axboe
  2002-10-30 19:31                       ` Badari Pulavarty
@ 2002-10-30 20:35                       ` David S. Miller
  2002-10-30 22:03                         ` Badari Pulavarty
  1 sibling, 1 reply; 37+ messages in thread
From: David S. Miller @ 2002-10-30 20:35 UTC (permalink / raw
  To: axboe
  Cc: badari, James.Bottomley, pbadari, akpm, linux-scsi, Martin.Bligh,
	dledford

   From: Jens Axboe <axboe@suse.de>
   Date: Wed, 30 Oct 2002 19:16:27 +0100

   On Wed, Oct 30 2002, Badari Pulavarty wrote:
   > What does BIOVEC_VIRT_MERGEABLE() supposed to do ? I am guessing
   > it is supposed to restrict IO crossing 4GB boundary. Is it correct ?
   
   It's only for platforms with iommu that can do funky remapping tricks
   for pages ending on certain boundaries. I'm sure davem or anton can tell
   you more about this for sparc64 or ppc64. For x86, it will always be
   true. Hmm, looking at it, I can't convince myself that it is right.
   Davem, could you please check up on this? I'll be back later tonight to
   review it as well. Things are not consistent, I agree on that.

This whole report sounds like ppc64 is setting BIO_VMERGE_BOUNDARY
incorrectly.

If the IOMMU code on ppc64 is not going to merge physical scatterlist
segments which begin/end on a page boundry into a single DMA addr/len
pair, then this port should not be setting BIO_VMERGE_BOUNDARY
non-zero.

Else, if ppc64's IOMMU code will merge such pages, BIO_VMERGE_BOUNDARY
must be defined to the page size used by the IOMMU.

I notice now that the BIO_VMERGE_BOUNDARY define is "#if 0" protected
on ppc64.  It absolutely must be in sync with how the IOMMU code on
the platform behaves.  So some ppc64 expert needs to comment :)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-30 19:31                       ` Badari Pulavarty
@ 2002-10-30 21:36                         ` merlin hughes
  2002-10-30 22:19                           ` Badari Pulavarty
  0 siblings, 1 reply; 37+ messages in thread
From: merlin hughes @ 2002-10-30 21:36 UTC (permalink / raw
  Cc: Jens Axboe, James Bottomley, Badari Pulavarty, Andrew Morton,
	linux-scsi@vger.kernel.org

r/badari@us.ibm.com/2002.10.30/11:31:57
>> 
>> On Wed, Oct 30 2002, Badari Pulavarty wrote:
>> > Hi,
>> > 
>> > I still get panics while doing filesystem IO on -mm kernels (with
>> > qlogic fc). I get following msgs before getting the panic:
>> > 
>> > Incorrect number of segments after building list
>> > counted 3, received 2
>> > req nr_sec 256, cur_nr_sec 8
>> > 
>	....
>> > 
>> > What does BIOVEC_VIRT_MERGEABLE() supposed to do ? I am guessing
>> > it is supposed to restrict IO crossing 4GB boundary. Is it correct ?
>> 
>> It's only for platforms with iommu that can do funky remapping tricks
>> for pages ending on certain boundaries. I'm sure davem or anton can tell
>> you more about this for sparc64 or ppc64. For x86, it will always be
>> true. Hmm, looking at it, I can't convince myself that it is right.
>> Davem, could you please check up on this? I'll be back later tonight to
>> review it as well. Things are not consistent, I agree on that.
>> 
>> Good debugging, btw!
>> 
>> -- 
>> Jens Axboe
>
>How about the following patch ?
>
>- Badari

Hi; if it's of any use, the patch doesn't seem to solve the scsi
problem for me (2.5.44-bk3-badari). I get the usual 'Incorrect number
of segments...' and random kernel death during the boot process.

Merlin

>--- linux/include/linux/bio.h	Wed Oct 30 10:32:52 2002
>+++ linux.new/include/linux/bio.h	Wed Oct 30 10:35:06 2002
>@@ -160,7 +160,7 @@
> #define BIOVEC_PHYS_MERGEABLE(vec1, vec2)	\
> 	((bvec_to_phys((vec1)) + (vec1)->bv_len) == bvec_to_phys((vec2)))
> #define BIOVEC_VIRT_MERGEABLE(vec1, vec2)	\
>-	((((bvec_to_phys((vec1)) + (vec1)->bv_len) | bvec_to_phys((vec2))) & (B
>IO_VMERGE_BOUNDARY - 1)) == 0)
>+	(((bvec_to_phys((vec1)) + (vec1)->bv_len) | (BIO_VMERGE_BOUNDARY - 1)) 
>== (bvec_to_phys((vec2)) | (BIO_VMERGE_BOUNDARY - 1)))
> #define __BIO_SEG_BOUNDARY(addr1, addr2, mask) \
> 	(((addr1) | (mask)) == (((addr2) - 1) | (mask)))
> #define BIOVEC_SEG_BOUNDARY(q, b1, b2) \

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-30 20:35                       ` David S. Miller
@ 2002-10-30 22:03                         ` Badari Pulavarty
  0 siblings, 0 replies; 37+ messages in thread
From: Badari Pulavarty @ 2002-10-30 22:03 UTC (permalink / raw
  To: David S. Miller
  Cc: axboe, badari, James.Bottomley, pbadari, akpm, linux-scsi,
	Martin.Bligh, dledford

> 
>    From: Jens Axboe <axboe@suse.de>
>    Date: Wed, 30 Oct 2002 19:16:27 +0100
> 
>    On Wed, Oct 30 2002, Badari Pulavarty wrote:
>    > What does BIOVEC_VIRT_MERGEABLE() supposed to do ? I am guessing
>    > it is supposed to restrict IO crossing 4GB boundary. Is it correct ?
>    
>    It's only for platforms with iommu that can do funky remapping tricks
>    for pages ending on certain boundaries. I'm sure davem or anton can tell
>    you more about this for sparc64 or ppc64. For x86, it will always be
>    true. Hmm, looking at it, I can't convince myself that it is right.
>    Davem, could you please check up on this? I'll be back later tonight to
>    review it as well. Things are not consistent, I agree on that.
> 
> This whole report sounds like ppc64 is setting BIO_VMERGE_BOUNDARY
> incorrectly.
> 
> If the IOMMU code on ppc64 is not going to merge physical scatterlist
> segments which begin/end on a page boundry into a single DMA addr/len
> pair, then this port should not be setting BIO_VMERGE_BOUNDARY
> non-zero.
> 
> Else, if ppc64's IOMMU code will merge such pages, BIO_VMERGE_BOUNDARY
> must be defined to the page size used by the IOMMU.
> 
> I notice now that the BIO_VMERGE_BOUNDARY define is "#if 0" protected
> on ppc64.  It absolutely must be in sync with how the IOMMU code on
> the platform behaves.  So some ppc64 expert needs to comment :)

Dave,

We are also talking about the correctness of the macro BIOVEC_VIRT_MERGEABLE().
I don't think it is doing the right thing.  (For x86, it is returning always false).

#define BIOVEC_VIRT_MERGEABLE(vec1, vec2)       \
        ((((bvec_to_phys((vec1)) + (vec1)->bv_len) | bvec_to_phys((vec2))) & (BIO_VMERGE_BOUNDARY - 1)) == 0)

I think it should me more like ..

#define BIOVEC_VIRT_MERGEABLE(vec1, vec2)       \
        (((bvec_to_phys((vec1)) + (vec1)->bv_len) | (BIO_VMERGE_BOUNDARY - 1)) == (bvec_to_phys((vec2)) | (BIO_VMERGE_BOUNDARY - 1)))

Do you agree ?

- Badari

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-30 21:36                         ` merlin hughes
@ 2002-10-30 22:19                           ` Badari Pulavarty
  2002-10-31  2:17                             ` merlin
  2002-10-31 15:04                             ` Jens Axboe
  0 siblings, 2 replies; 37+ messages in thread
From: Badari Pulavarty @ 2002-10-30 22:19 UTC (permalink / raw
  To: merlin hughes; +Cc: axboe, linux-scsi

> >- Badari
> 
> Hi; if it's of any use, the patch doesn't seem to solve the scsi
> problem for me (2.5.44-bk3-badari). I get the usual 'Incorrect number
> of segments...' and random kernel death during the boot process.
> 
> Merlin

Hi Merlin,

I am looking at the output  of your problem ..


Oct 28 12:36:09 badb kernel: Incorrect number of segments after building list 
Oct 28 12:36:09 badb kernel: counted 2, received 1 
Oct 28 12:36:09 badb kernel: req nr_sec 8, cur_nr_sec 8 
Oct 28 12:36:09 badb kernel: end_request: I/O error, dev 08:40, sector 6784528 
Oct 28 12:36:09 badb kernel: raid5: Disk failure on scsi/host0/bus0/target4/lun0/part7, disabling device. Operation continuing on 4 devices 
Oct 28 12:36:09 badb kernel: blk: request botched 

Huh !! Your IO size is only 4K. You are using 2 sg entries ?
I am curious on finding out whats happening here ..

Would you mind adding few printk()s to following routine:

drivers/block/ll_rw_blk.c: blk_rq_map_sg()

Do something like this ...

int blk_rq_map_sg(request_queue_t *q, struct request *rq, struct scatterlist *sg)
{
        struct bio_vec *bvec, *bvprv;
        struct bio *bio;
        int nsegs, i, cluster;

        nsegs = 0;
        cluster = q->queue_flags & (1 << QUEUE_FLAG_CLUSTER);

        /*
         * for each bio in rq
         */
        bvprv = NULL;
        rq_for_each_bio(bio, rq) {
                /*
                 * for each segment in bio
                 */
                bio_for_each_segment(bvec, bio, i) {
                        int nbytes = bvec->bv_len;

                        if (bvprv && cluster) {
                                if (sg[nsegs - 1].length + nbytes > q->max_segment_size)
                                        goto new_segment;

printk("Loop: brvprv:%x bvec:%x offset:%d len:%d\n", bvprv->bv_page, bvec->bv_page, bvec->bv_offset, bvec->bv_len);
                                if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec))
                                        goto new_segment;
                                if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec))
                                        goto new_segment;

                                sg[nsegs - 1].length += nbytes;
                        } else {
new_segment:
                                memset(&sg[nsegs],0,sizeof(struct scatterlist));
                                sg[nsegs].page = bvec->bv_page;
                                sg[nsegs].length = nbytes;
                                sg[nsegs].offset = bvec->bv_offset;

                                nsegs++;
printk(">>>>> made a new seg:%d (%x, %d, %d)\n", nsegs, bvec->bv_page, nbytes,  bvec->bv_offset);
                        }
                        bvprv = bvec;
                } /* segments in bio */
        } /* bios in rq */

printk(">>>>> returning segs:%d\n", nsegs);
        return nsegs;
}



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-30 22:19                           ` Badari Pulavarty
@ 2002-10-31  2:17                             ` merlin
  2002-10-31 13:18                               ` Jens Axboe
  2002-10-31 15:04                             ` Jens Axboe
  1 sibling, 1 reply; 37+ messages in thread
From: merlin @ 2002-10-31  2:17 UTC (permalink / raw
  To: Badari Pulavarty; +Cc: axboe, linux-scsi

r/badari@us.ibm.com/2002.10.30/14:19:05
>> >- Badari
>> 
>> Hi; if it's of any use, the patch doesn't seem to solve the scsi
>> problem for me (2.5.44-bk3-badari). I get the usual 'Incorrect number
>> of segments...' and random kernel death during the boot process.
>> 
>> Merlin
>
>Hi Merlin,
>
>I am looking at the output  of your problem ..
>
>
>Oct 28 12:36:09 badb kernel: Incorrect number of segments after building list 
>Oct 28 12:36:09 badb kernel: counted 2, received 1 
>Oct 28 12:36:09 badb kernel: req nr_sec 8, cur_nr_sec 8 
>Oct 28 12:36:09 badb kernel: end_request: I/O error, dev 08:40, sector 6784528
> 
>Oct 28 12:36:09 badb kernel: raid5: Disk failure on scsi/host0/bus0/target4/lu
>n0/part7, disabling device. Operation continuing on 4 devices 
>Oct 28 12:36:09 badb kernel: blk: request botched 
>
>Huh !! Your IO size is only 4K. You are using 2 sg entries ?
>I am curious on finding out whats happening here ..
>
>Would you mind adding few printk()s to following routine:
>
>drivers/block/ll_rw_blk.c: blk_rq_map_sg()
>
>Do something like this ...

Hi Badari,

I grabbed 2.5.45 and added the printks you describe.. The kernel
dies before the syslog is written and I don't have a serial device
to try capturing the output there, but this is what I scribbled
down:

  ...lots of stuff ...
  home: clean
  made a new seg: 1 (c18adc70, 4096, 0)
  Loop: brvprv: c18adc70 bvec:c18ad838 offset:0 length:4096
  made a new seg: 2 (c18ad838, 4096, 0)
  returning segs: 2
  Incorrect number of segments after building list
  counted 2, received 1
  ...some stuff...

Things went on for a few more messages before the kernel died
a death.

It's an SMP box so the seg stuff that precedes the error may
be unrelated; I don't have a good enough grasp to be sure. I
guess an interesting value to know might be max_segment_size.

Half tempted to try a non-SMP build, just to see. Each time I
do this, I get a bit more (recoverable) fs damage, but still..

merlin

>int blk_rq_map_sg(request_queue_t *q, struct request *rq, struct scatterlist *
>sg)
>{
>        struct bio_vec *bvec, *bvprv;
>        struct bio *bio;
>        int nsegs, i, cluster;
>
>        nsegs = 0;
>        cluster = q->queue_flags & (1 << QUEUE_FLAG_CLUSTER);
>
>        /*
>         * for each bio in rq
>         */
>        bvprv = NULL;
>        rq_for_each_bio(bio, rq) {
>                /*
>                 * for each segment in bio
>                 */
>                bio_for_each_segment(bvec, bio, i) {
>                        int nbytes = bvec->bv_len;
>
>                        if (bvprv && cluster) {
>                                if (sg[nsegs - 1].length + nbytes > q->max_seg
>ment_size)
>                                        goto new_segment;
>
>printk("Loop: brvprv:%x bvec:%x offset:%d len:%d\n", bvprv->bv_page, bvec->bv_
>page, bvec->bv_offset, bvec->bv_len);
>                                if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec))
>                                        goto new_segment;
>                                if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec))
>                                        goto new_segment;
>
>                                sg[nsegs - 1].length += nbytes;
>                        } else {
>new_segment:
>                                memset(&sg[nsegs],0,sizeof(struct scatterlist)
>);
>                                sg[nsegs].page = bvec->bv_page;
>                                sg[nsegs].length = nbytes;
>                                sg[nsegs].offset = bvec->bv_offset;
>
>                                nsegs++;
>printk(">>>>> made a new seg:%d (%x, %d, %d)\n", nsegs, bvec->bv_page, nbytes,
>  bvec->bv_offset);
>                        }
>                        bvprv = bvec;
>                } /* segments in bio */
>        } /* bios in rq */
>
>printk(">>>>> returning segs:%d\n", nsegs);
>        return nsegs;
>}

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-31  2:17                             ` merlin
@ 2002-10-31 13:18                               ` Jens Axboe
  2002-10-31 14:41                                 ` merlin
  0 siblings, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2002-10-31 13:18 UTC (permalink / raw
  To: merlin; +Cc: Badari Pulavarty, linux-scsi

On Wed, Oct 30 2002, merlin wrote:
> r/badari@us.ibm.com/2002.10.30/14:19:05
> >> >- Badari
> >> 
> >> Hi; if it's of any use, the patch doesn't seem to solve the scsi
> >> problem for me (2.5.44-bk3-badari). I get the usual 'Incorrect number
> >> of segments...' and random kernel death during the boot process.
> >> 
> >> Merlin
> >
> >Hi Merlin,
> >
> >I am looking at the output  of your problem ..
> >
> >
> >Oct 28 12:36:09 badb kernel: Incorrect number of segments after building list 
> >Oct 28 12:36:09 badb kernel: counted 2, received 1 
> >Oct 28 12:36:09 badb kernel: req nr_sec 8, cur_nr_sec 8 
> >Oct 28 12:36:09 badb kernel: end_request: I/O error, dev 08:40, sector 6784528
> > 
> >Oct 28 12:36:09 badb kernel: raid5: Disk failure on scsi/host0/bus0/target4/lu
> >n0/part7, disabling device. Operation continuing on 4 devices 
> >Oct 28 12:36:09 badb kernel: blk: request botched 
> >
> >Huh !! Your IO size is only 4K. You are using 2 sg entries ?
> >I am curious on finding out whats happening here ..
> >
> >Would you mind adding few printk()s to following routine:
> >
> >drivers/block/ll_rw_blk.c: blk_rq_map_sg()
> >
> >Do something like this ...
> 
> Hi Badari,
> 
> I grabbed 2.5.45 and added the printks you describe.. The kernel
> dies before the syslog is written and I don't have a serial device
> to try capturing the output there, but this is what I scribbled
> down:
> 
>   ...lots of stuff ...
>   home: clean
>   made a new seg: 1 (c18adc70, 4096, 0)
>   Loop: brvprv: c18adc70 bvec:c18ad838 offset:0 length:4096
>   made a new seg: 2 (c18ad838, 4096, 0)
>   returning segs: 2
>   Incorrect number of segments after building list
>   counted 2, received 1
>   ...some stuff...
> 
> Things went on for a few more messages before the kernel died
> a death.
> 
> It's an SMP box so the seg stuff that precedes the error may
> be unrelated; I don't have a good enough grasp to be sure. I
> guess an interesting value to know might be max_segment_size.
> 
> Half tempted to try a non-SMP build, just to see. Each time I
> do this, I get a bit more (recoverable) fs damage, but still..

SMP should not make a difference. Does this patch make a difference?

===== drivers/block/ll_rw_blk.c 1.135 vs edited =====
--- 1.135/drivers/block/ll_rw_blk.c	Mon Oct 28 20:57:59 2002
+++ edited/drivers/block/ll_rw_blk.c	Thu Oct 31 14:17:09 2002
@@ -694,31 +694,23 @@
 	seg_size = nr_phys_segs = nr_hw_segs = 0;
 	bio_for_each_segment(bv, bio, i) {
 		if (bvprv && cluster) {
-			int phys, seg;
-
-			if (seg_size + bv->bv_len > q->max_segment_size) {
-				nr_phys_segs++;
+			if (seg_size + bv->bv_len > q->max_segment_size)
 				goto new_segment;
-			}
 
-			phys = BIOVEC_PHYS_MERGEABLE(bvprv, bv);
-			seg = BIOVEC_SEG_BOUNDARY(q, bvprv, bv);
-			if (!phys || !seg)
-				nr_phys_segs++;
-			if (!seg)
+			if (!BIOVEC_PHYS_MERGEABLE(bvprv, bv))
 				goto new_segment;
-
-			if (!BIOVEC_VIRT_MERGEABLE(bvprv, bv))
+			if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bv))
 				goto new_segment;
 
 			seg_size += bv->bv_len;
 			bvprv = bv;
 			continue;
-		} else {
-			nr_phys_segs++;
 		}
 new_segment:
-		nr_hw_segs++;
+		if (!bvprv || !BIOVEC_VIRT_MERGEABLE(bvprv, bv))
+			nr_hw_segs++;
+
+		nr_phys_segs++;
 		bvprv = bv;
 		seg_size = bv->bv_len;
 	}

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-31 13:18                               ` Jens Axboe
@ 2002-10-31 14:41                                 ` merlin
  2002-10-31 14:46                                   ` Jens Axboe
  0 siblings, 1 reply; 37+ messages in thread
From: merlin @ 2002-10-31 14:41 UTC (permalink / raw
  To: Jens Axboe; +Cc: Badari Pulavarty, linux-scsi

r/axboe@suse.de/2002.10.31/14:18:24
>On Wed, Oct 30 2002, merlin wrote:
>> Hi Badari,
>> 
>> I grabbed 2.5.45 and added the printks you describe.. The kernel
>> dies before the syslog is written and I don't have a serial device
>> to try capturing the output there, but this is what I scribbled
>> down:
>> 
>>   ...lots of stuff ...
>>   home: clean
>>   made a new seg: 1 (c18adc70, 4096, 0)
>>   Loop: brvprv: c18adc70 bvec:c18ad838 offset:0 length:4096
>>   made a new seg: 2 (c18ad838, 4096, 0)
>>   returning segs: 2
>>   Incorrect number of segments after building list
>>   counted 2, received 1
>>   ...some stuff...
>> 
>> Things went on for a few more messages before the kernel died
>> a death.
>> 
>> It's an SMP box so the seg stuff that precedes the error may
>> be unrelated; I don't have a good enough grasp to be sure. I
>> guess an interesting value to know might be max_segment_size.
>> 
>> Half tempted to try a non-SMP build, just to see. Each time I
>> do this, I get a bit more (recoverable) fs damage, but still..
>
>SMP should not make a difference. Does this patch make a difference?

Same error I'm afraid; ``Incorrect number of segments after building list
counted 2, received 1'' follows shortly by kernel panic..

Thanks, merlin

>===== drivers/block/ll_rw_blk.c 1.135 vs edited =====
>--- 1.135/drivers/block/ll_rw_blk.c	Mon Oct 28 20:57:59 2002
>+++ edited/drivers/block/ll_rw_blk.c	Thu Oct 31 14:17:09 2002
>@@ -694,31 +694,23 @@
> 	seg_size = nr_phys_segs = nr_hw_segs = 0;
> 	bio_for_each_segment(bv, bio, i) {
> 		if (bvprv && cluster) {
>-			int phys, seg;
>-
>-			if (seg_size + bv->bv_len > q->max_segment_size) {
>-				nr_phys_segs++;
>+			if (seg_size + bv->bv_len > q->max_segment_size)
> 				goto new_segment;
>-			}
> 
>-			phys = BIOVEC_PHYS_MERGEABLE(bvprv, bv);
>-			seg = BIOVEC_SEG_BOUNDARY(q, bvprv, bv);
>-			if (!phys || !seg)
>-				nr_phys_segs++;
>-			if (!seg)
>+			if (!BIOVEC_PHYS_MERGEABLE(bvprv, bv))
> 				goto new_segment;
>-
>-			if (!BIOVEC_VIRT_MERGEABLE(bvprv, bv))
>+			if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bv))
> 				goto new_segment;
> 
> 			seg_size += bv->bv_len;
> 			bvprv = bv;
> 			continue;
>-		} else {
>-			nr_phys_segs++;
> 		}
> new_segment:
>-		nr_hw_segs++;
>+		if (!bvprv || !BIOVEC_VIRT_MERGEABLE(bvprv, bv))
>+			nr_hw_segs++;
>+
>+		nr_phys_segs++;
> 		bvprv = bv;
> 		seg_size = bv->bv_len;
> 	}
>
>-- 
>Jens Axboe

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-31 14:41                                 ` merlin
@ 2002-10-31 14:46                                   ` Jens Axboe
  0 siblings, 0 replies; 37+ messages in thread
From: Jens Axboe @ 2002-10-31 14:46 UTC (permalink / raw
  To: merlin; +Cc: Badari Pulavarty, linux-scsi

On Thu, Oct 31 2002, merlin wrote:
> r/axboe@suse.de/2002.10.31/14:18:24
> >On Wed, Oct 30 2002, merlin wrote:
> >> Hi Badari,
> >> 
> >> I grabbed 2.5.45 and added the printks you describe.. The kernel
> >> dies before the syslog is written and I don't have a serial device
> >> to try capturing the output there, but this is what I scribbled
> >> down:
> >> 
> >>   ...lots of stuff ...
> >>   home: clean
> >>   made a new seg: 1 (c18adc70, 4096, 0)
> >>   Loop: brvprv: c18adc70 bvec:c18ad838 offset:0 length:4096
> >>   made a new seg: 2 (c18ad838, 4096, 0)
> >>   returning segs: 2
> >>   Incorrect number of segments after building list
> >>   counted 2, received 1
> >>   ...some stuff...
> >> 
> >> Things went on for a few more messages before the kernel died
> >> a death.
> >> 
> >> It's an SMP box so the seg stuff that precedes the error may
> >> be unrelated; I don't have a good enough grasp to be sure. I
> >> guess an interesting value to know might be max_segment_size.
> >> 
> >> Half tempted to try a non-SMP build, just to see. Each time I
> >> do this, I get a bit more (recoverable) fs damage, but still..
> >
> >SMP should not make a difference. Does this patch make a difference?
> 
> Same error I'm afraid; ``Incorrect number of segments after building list
> counted 2, received 1'' follows shortly by kernel panic..

in drivers/scsi/aic7xxx/aic7xxx_linux_host.h:83, change
ENABLE_CLUSTERING to DISABLE_CLUSTERING and repeat the test, please.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-30 22:19                           ` Badari Pulavarty
  2002-10-31  2:17                             ` merlin
@ 2002-10-31 15:04                             ` Jens Axboe
  2002-10-31 15:12                               ` Jens Axboe
  1 sibling, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2002-10-31 15:04 UTC (permalink / raw
  To: Badari Pulavarty; +Cc: merlin hughes, linux-scsi

On Wed, Oct 30 2002, Badari Pulavarty wrote:
> > >- Badari
> > 
> > Hi; if it's of any use, the patch doesn't seem to solve the scsi
> > problem for me (2.5.44-bk3-badari). I get the usual 'Incorrect number
> > of segments...' and random kernel death during the boot process.
> > 
> > Merlin
> 
> Hi Merlin,
> 
> I am looking at the output  of your problem ..
> 
> 
> Oct 28 12:36:09 badb kernel: Incorrect number of segments after building list 
> Oct 28 12:36:09 badb kernel: counted 2, received 1 
> Oct 28 12:36:09 badb kernel: req nr_sec 8, cur_nr_sec 8 
> Oct 28 12:36:09 badb kernel: end_request: I/O error, dev 08:40, sector 6784528 
> Oct 28 12:36:09 badb kernel: raid5: Disk failure on scsi/host0/bus0/target4/lun0/part7, disabling device. Operation continuing on 4 devices 
> 
> Huh !! Your IO size is only 4K. You are using 2 sg entries ?

Even weirder, nr_sec == cur_nr_sec so there can only be one segment or
something is corrupted...

> Oct 28 12:36:09 badb kernel: blk: request botched 

which this one also seems to indicate.

Merlin, do you have any idea whether older 2.5 kernels worked for you?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-31 15:04                             ` Jens Axboe
@ 2002-10-31 15:12                               ` Jens Axboe
  2002-10-31 17:41                                 ` merlin
  0 siblings, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2002-10-31 15:12 UTC (permalink / raw
  To: Badari Pulavarty; +Cc: merlin hughes, linux-scsi

On Thu, Oct 31 2002, Jens Axboe wrote:
> On Wed, Oct 30 2002, Badari Pulavarty wrote:
> > > >- Badari
> > > 
> > > Hi; if it's of any use, the patch doesn't seem to solve the scsi
> > > problem for me (2.5.44-bk3-badari). I get the usual 'Incorrect number
> > > of segments...' and random kernel death during the boot process.
> > > 
> > > Merlin
> > 
> > Hi Merlin,
> > 
> > I am looking at the output  of your problem ..
> > 
> > 
> > Oct 28 12:36:09 badb kernel: Incorrect number of segments after building list 
> > Oct 28 12:36:09 badb kernel: counted 2, received 1 
> > Oct 28 12:36:09 badb kernel: req nr_sec 8, cur_nr_sec 8 
> > Oct 28 12:36:09 badb kernel: end_request: I/O error, dev 08:40, sector 6784528 
> > Oct 28 12:36:09 badb kernel: raid5: Disk failure on scsi/host0/bus0/target4/lun0/part7, disabling device. Operation continuing on 4 devices 
> > 
> > Huh !! Your IO size is only 4K. You are using 2 sg entries ?
> 
> Even weirder, nr_sec == cur_nr_sec so there can only be one segment or
> something is corrupted...

Ah wait, I think I may know at least what is happening in this case.
I've seen numerous reports of software raid problems botching bio's, and
this above one could easily be explained with one of the bio's having
!bi_size. In fact, it's about the only explanation, otherwise there's
just no way we can have nr_sec == cur_nr_sec unless only _one_ bio is
attached to the request.

Merlin, please also add a

	blk_dump_rq_flags(req, "scsi_init_io");

to drivers/scsi/scsi_lib.c:scsi_init_io() before it calls
scsi_end_request() and kills the request (right after the incorrect
segment complaining).

Badari, I'm not so sure that Merlin's and your bug are the same. Is
yours solved by the patch I sent out earlier? AFAICT, that should fix
the segment miscounting.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-31 15:12                               ` Jens Axboe
@ 2002-10-31 17:41                                 ` merlin
  0 siblings, 0 replies; 37+ messages in thread
From: merlin @ 2002-10-31 17:41 UTC (permalink / raw
  To: Jens Axboe; +Cc: Badari Pulavarty, linux-scsi

r/axboe@suse.de/2002.10.31/16:12:12
>On Thu, Oct 31 2002, Jens Axboe wrote:
>> On Wed, Oct 30 2002, Badari Pulavarty wrote:
>> > > >- Badari
>> > > 
>> > > Hi; if it's of any use, the patch doesn't seem to solve the scsi
>> > > problem for me (2.5.44-bk3-badari). I get the usual 'Incorrect number
>> > > of segments...' and random kernel death during the boot process.
>> > > 
>> > > Merlin
>> > 
>> > Hi Merlin,
>> > 
>> > I am looking at the output  of your problem ..
>> > 
>> > 
>> > Oct 28 12:36:09 badb kernel: Incorrect number of segments after building l
>ist 
>> > Oct 28 12:36:09 badb kernel: counted 2, received 1 
>> > Oct 28 12:36:09 badb kernel: req nr_sec 8, cur_nr_sec 8 
>> > Oct 28 12:36:09 badb kernel: end_request: I/O error, dev 08:40, sector 678
>4528 
>> > Oct 28 12:36:09 badb kernel: raid5: Disk failure on scsi/host0/bus0/target
>4/lun0/part7, disabling device. Operation continuing on 4 devices 
>> > 
>> > Huh !! Your IO size is only 4K. You are using 2 sg entries ?
>> 
>> Even weirder, nr_sec == cur_nr_sec so there can only be one segment or
>> something is corrupted...
>
>Ah wait, I think I may know at least what is happening in this case.
>I've seen numerous reports of software raid problems botching bio's, and
>this above one could easily be explained with one of the bio's having
>!bi_size. In fact, it's about the only explanation, otherwise there's
>just no way we can have nr_sec == cur_nr_sec unless only _one_ bio is
>attached to the request.
>
>Merlin, please also add a
>
>	blk_dump_rq_flags(req, "scsi_init_io");
>
>to drivers/scsi/scsi_lib.c:scsi_init_io() before it calls
>scsi_end_request() and kills the request (right after the incorrect
>segment complaining).

Hi Jens,

Disabling clustering in the aic7xxx driver results in blk: request
botched, NULL dereference, panic (but no segment count complaint).

Booting with the nosmp flag results in strange aic7xxx (I believe)
death really early on.

I have tried no earlier 2.5 kernels on the machine.

For peace of mind, I switched 2.5.45 to root from an old
non--s/w-RAID drive in the machine. syslog goes to a working
root, and it lets me bring up the raid devices manually.

Kernel 2.5.45 WITH your ll_rw_blk.c patch, WITHOUT Badari's
VIRT_MERGEABLE patch, WITHOUT disabling aic7xxx clustering,
WITH this extra logging..

I can boot successfully (well, no keyboard and lots if init
garbage. but no crash) which suggests strongly that it is an
md problem. The drive is on a different controller, but I
doubt that's relevant; it's still an aic7xxx.

I could then bring up md0 (RAID 1 across two disks) and do a
basic scan (zcav) through it without problems. I didn't stress
things (typing on a zaurus isn't easy enough) but it seemed
okay.

Then I brought up md3 (RAID 5 across five disks) and ran zcav
across it. Worked for a while, then:

--8<--
Oct 31 12:00:57 badb kernel: (scsi0:A:2:0): data overrun detected in Data-in phase.  Tag == 0x6.
Oct 31 12:00:57 badb kernel: (scsi0:A:2:0): Have seen Data Phase.  Length = 8192.  NumSGs = 2.
Oct 31 12:00:57 badb kernel: sg[0] - Addr 0x037a2c000 : Length 4096
Oct 31 12:00:57 badb kernel: sg[1] - Addr 0x037a31000 : Length 4096
Oct 31 12:01:27 badb kernel: scsi0:0:4:0: Attempting to queue an ABORT message
Oct 31 12:01:27 badb kernel: scsi0: Dumping Card State in Data-in phase, at SEQADDR 0x168
Oct 31 12:01:27 badb kernel: ACCUM = 0x0, SINDEX = 0x91, DINDEX = 0xe4, ARG_2 = 0x1
Oct 31 12:01:27 badb kernel: HCNT = 0x0
Oct 31 12:01:27 badb kernel: SCSISEQ = 0x12, SBLKCTL = 0xa
Oct 31 12:01:27 badb kernel:  DFCNTRL = 0x0, DFSTATUS = 0x89
Oct 31 12:01:27 badb kernel: LASTPHASE = 0x40, SCSISIGI = 0x46, SXFRCTL0 = 0x88
Oct 31 12:01:27 badb kernel: SSTAT0 = 0x2, SSTAT1 = 0x0
Oct 31 12:01:27 badb kernel: SCSIPHASE = 0x0
Oct 31 12:01:27 badb kernel: STACK == 0x63, 0x160, 0x0, 0x34
Oct 31 12:01:27 badb kernel: SCB count = 8
Oct 31 12:01:27 badb kernel: Kernel NEXTQSCB = 0
Oct 31 12:01:27 badb kernel: Card NEXTQSCB = 7
Oct 31 12:01:27 badb kernel: QINFIFO entries: 7 1 3
Oct 31 12:01:27 badb kernel: Waiting Queue entries:
Oct 31 12:01:27 badb kernel: Disconnected Queue entries:
Oct 31 12:01:27 badb kernel: QOUTFIFO entries:
Oct 31 12:01:27 badb kernel: Sequencer Free SCB List: 0 1 2 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Oct 31 12:01:27 badb kernel: Pending list: 3, 1, 7, 6
Oct 31 12:01:27 badb kernel: Kernel Free SCB list: 2 5 4
Oct 31 12:01:27 badb kernel: DevQ(0:0:0): 0 waiting
Oct 31 12:01:27 badb kernel: DevQ(0:1:0): 0 waiting
Oct 31 12:01:27 badb kernel: DevQ(0:2:0): 0 waiting
Oct 31 12:01:27 badb kernel: DevQ(0:3:0): 0 waiting
Oct 31 12:01:27 badb kernel: DevQ(0:4:0): 0 waiting
Oct 31 12:01:27 badb kernel: scsi0:0:4:0: Cmd aborted from QINFIFO
Oct 31 12:01:27 badb kernel: aic7xxx_abort returns 0x2002
Oct 31 12:01:37 badb kernel: scsi0:0:4:0: Attempting to queue an ABORT message
Oct 31 12:01:37 badb kernel: scsi0: Dumping Card State in Data-in phase, at SEQADDR 0x168
Oct 31 12:01:37 badb kernel: ACCUM = 0x0, SINDEX = 0x91, DINDEX = 0xe4, ARG_2 = 0x1
Oct 31 12:01:37 badb kernel: HCNT = 0x0
Oct 31 12:01:37 badb kernel: SCSISEQ = 0x12, SBLKCTL = 0xa
Oct 31 12:01:37 badb kernel:  DFCNTRL = 0x0, DFSTATUS = 0x89
Oct 31 12:01:37 badb kernel: LASTPHASE = 0x40, SCSISIGI = 0x46, SXFRCTL0 = 0x88
Oct 31 12:01:37 badb kernel: SSTAT0 = 0x2, SSTAT1 = 0x0
Oct 31 12:01:37 badb kernel: SCSIPHASE = 0x0
Oct 31 12:01:37 badb kernel: STACK == 0x63, 0x160, 0x0, 0x34
Oct 31 12:01:37 badb kernel: SCB count = 8
Oct 31 12:01:37 badb kernel: Kernel NEXTQSCB = 3
Oct 31 12:01:37 badb kernel: Card NEXTQSCB = 0
Oct 31 12:01:37 badb kernel: QINFIFO entries: 0 1 7
Oct 31 12:01:37 badb kernel: Waiting Queue entries:
Oct 31 12:01:37 badb kernel: Disconnected Queue entries:
Oct 31 12:01:37 badb kernel: QOUTFIFO entries:
Oct 31 12:01:37 badb kernel: Sequencer Free SCB List: 0 1 2 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Oct 31 12:01:37 badb kernel: Pending list: 7, 1, 0, 6
Oct 31 12:01:37 badb kernel: Kernel Free SCB list: 2 5 4
Oct 31 12:01:37 badb kernel: DevQ(0:0:0): 0 waiting
Oct 31 12:01:37 badb kernel: DevQ(0:1:0): 0 waiting
Oct 31 12:01:37 badb kernel: DevQ(0:2:0): 0 waiting
Oct 31 12:01:37 badb kernel: DevQ(0:3:0): 0 waiting
Oct 31 12:01:37 badb kernel: DevQ(0:4:0): 0 waiting
Oct 31 12:01:37 badb kernel: scsi0:0:4:0: Cmd aborted from QINFIFO
Oct 31 12:01:37 badb kernel: aic7xxx_abort returns 0x2002
Oct 31 12:01:37 badb kernel: scsi0:0:3:0: Attempting to queue an ABORT message
Oct 31 12:01:37 badb kernel: scsi0: Dumping Card State in Data-in phase, at SEQADDR 0x168
Oct 31 12:01:37 badb kernel: ACCUM = 0x0, SINDEX = 0x91, DINDEX = 0xe4, ARG_2 = 0x1
Oct 31 12:01:37 badb kernel: HCNT = 0x0
Oct 31 12:01:37 badb kernel: SCSISEQ = 0x12, SBLKCTL = 0xa
Oct 31 12:01:37 badb kernel:  DFCNTRL = 0x0, DFSTATUS = 0x89
Oct 31 12:01:37 badb kernel: LASTPHASE = 0x40, SCSISIGI = 0x46, SXFRCTL0 = 0x88
Oct 31 12:01:37 badb kernel: SSTAT0 = 0x2, SSTAT1 = 0x0
Oct 31 12:01:37 badb kernel: SCSIPHASE = 0x0
Oct 31 12:01:37 badb kernel: STACK == 0x63, 0x160, 0x0, 0x34
Oct 31 12:01:37 badb kernel: SCB count = 8
Oct 31 12:01:37 badb kernel: Kernel NEXTQSCB = 0
Oct 31 12:01:37 badb kernel: Card NEXTQSCB = 3
Oct 31 12:01:37 badb kernel: QINFIFO entries: 3 1
Oct 31 12:01:37 badb kernel: Waiting Queue entries:
Oct 31 12:01:37 badb kernel: Disconnected Queue entries:
Oct 31 12:01:37 badb kernel: QOUTFIFO entries:
Oct 31 12:01:37 badb kernel: Sequencer Free SCB List: 0 1 2 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Oct 31 12:01:37 badb kernel: Pending list: 1, 3, 6
Oct 31 12:01:37 badb kernel: Kernel Free SCB list: 7 2 5 4
Oct 31 12:01:37 badb kernel: DevQ(0:0:0): 0 waiting
Oct 31 12:01:37 badb kernel: DevQ(0:1:0): 0 waiting
Oct 31 12:01:37 badb kernel: DevQ(0:2:0): 0 waiting
Oct 31 12:01:37 badb kernel: DevQ(0:3:0): 0 waiting
Oct 31 12:01:37 badb kernel: DevQ(0:4:0): 0 waiting
Oct 31 12:01:37 badb kernel: scsi0:0:3:0: Cmd aborted from QINFIFO
Oct 31 12:01:37 badb kernel: aic7xxx_abort returns 0x2002
... panic (not logged)
-->8--

I'll reboot again, when people are off this machine, with
the raid devices mounted at boot time, to see if I can get
the segment error and that blk_dump_rq_flags stuff.

merlin

>Badari, I'm not so sure that Merlin's and your bug are the same. Is
>yours solved by the patch I sent out earlier? AFAICT, that should fix
>the segment miscounting.
>
>-- 
>Jens Axboe

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
@ 2002-10-31 17:57 Badari Pulavarty
  2002-10-31 18:46 ` Jens Axboe
  0 siblings, 1 reply; 37+ messages in thread
From: Badari Pulavarty @ 2002-10-31 17:57 UTC (permalink / raw
  To: Jens Axboe; +Cc: linux-scsi



> Badari, I'm not so sure that Merlin's and your bug are the same. Is
> yours solved by the patch I sent out earlier? AFAICT, that should fix
> the segment miscounting.

Jens,

Yes. Your patch did fix my problem. But still I think BIOVEC_VIRT_MERGEABLE
() is not doing
the correct thing for x86. (It is returning FALSE for everything).

#define BIOVEC_VIRT_MERGEABLE(vec1, vec2)       \
        ((((bvec_to_phys((vec1)) + (vec1)->bv_len) | bvec_to_phys((vec2)))
& (BIO_VMERGE_BOUNDARY -1)) == 0)

I think BIO_VMERGE_BOUNDARY should be set to "1" instead of "0" for the
archs where this is not needed.
That will force it to return TRUE always.

And also, I was wondering for x86, where do we check to see if the
IO/segment crossing 4GB boundary.
(something similar to 2.4 BH_PHYS_4G()). Don't we need this for drivers
which  can't handle
IO crossing 4GB boundary ?

Thanks,
Badari




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: possible use-after-free in 2.5.44 scsi changes
  2002-10-31 17:57 Badari Pulavarty
@ 2002-10-31 18:46 ` Jens Axboe
  0 siblings, 0 replies; 37+ messages in thread
From: Jens Axboe @ 2002-10-31 18:46 UTC (permalink / raw
  To: Badari Pulavarty; +Cc: linux-scsi

On Thu, Oct 31 2002, Badari Pulavarty wrote:
> 
> 
> > Badari, I'm not so sure that Merlin's and your bug are the same. Is
> > yours solved by the patch I sent out earlier? AFAICT, that should fix
> > the segment miscounting.
> 
> Jens,
> 
> Yes. Your patch did fix my problem. But still I think BIOVEC_VIRT_MERGEABLE

Great

> () is not doing
> the correct thing for x86. (It is returning FALSE for everything).

It's supposed to :-)

> #define BIOVEC_VIRT_MERGEABLE(vec1, vec2)       \
>         ((((bvec_to_phys((vec1)) + (vec1)->bv_len) | bvec_to_phys((vec2)))
> & (BIO_VMERGE_BOUNDARY -1)) == 0)
> 
> I think BIO_VMERGE_BOUNDARY should be set to "1" instead of "0" for the
> archs where this is not needed.
> That will force it to return TRUE always.

Why should it? We will always be creating a new hardware segment. I
think the logic is cleaner with my patch, in fact I think it was wrong
before. The cluster check is for a plain physical contig segment. We
can't do any sort of funky remapping tricks to make two non-contig pages
appear as one sg segment, so BIOVEC_VIRT_MERGEABLE is supposed to always
return false. But it's not supposed to hinder physical segment
clustering as it did before.

> And also, I was wondering for x86, where do we check to see if the
> IO/segment crossing 4GB boundary.
> (something similar to 2.4 BH_PHYS_4G()). Don't we need this for drivers
> which  can't handle
> IO crossing 4GB boundary ?

BIO_SEG_BOUNDARY and BIOVEC_SEG_BOUNDARY checks for that, see
blk_queue_segment_boundary(). Default is 0xffffffff.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2002-10-31 18:46 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-10-25  1:39 possible use-after-free in 2.5.44 scsi changes Andrew Morton
2002-10-25  4:06 ` Doug Ledford
2002-10-25  4:40   ` Andrew Morton
2002-10-25 14:21     ` James Bottomley
2002-10-25  4:07 ` Patrick Mansfield
2002-10-25 14:16 ` James Bottomley
2002-10-25 18:34   ` James Bottomley
2002-10-25 18:49     ` Mike Anderson
2002-10-25 19:08     ` Patrick Mansfield
2002-10-25 19:41       ` Mike Anderson
2002-10-25 19:47         ` Jens Axboe
2002-10-25 22:14           ` James Bottomley
2002-10-25 22:18             ` Andrew Morton
2002-10-25 22:23     ` Badari Pulavarty
2002-10-26  0:13       ` James Bottomley
2002-10-26  0:18         ` Mike Anderson
2002-10-26  9:29         ` Jens Axboe
2002-10-27  0:50           ` James Bottomley
2002-10-27 21:20             ` Jens Axboe
2002-10-27 21:37               ` James Bottomley
2002-10-27 21:54                 ` Jens Axboe
2002-10-30 17:39                   ` Badari Pulavarty
2002-10-30 18:16                     ` Jens Axboe
2002-10-30 19:31                       ` Badari Pulavarty
2002-10-30 21:36                         ` merlin hughes
2002-10-30 22:19                           ` Badari Pulavarty
2002-10-31  2:17                             ` merlin
2002-10-31 13:18                               ` Jens Axboe
2002-10-31 14:41                                 ` merlin
2002-10-31 14:46                                   ` Jens Axboe
2002-10-31 15:04                             ` Jens Axboe
2002-10-31 15:12                               ` Jens Axboe
2002-10-31 17:41                                 ` merlin
2002-10-30 20:35                       ` David S. Miller
2002-10-30 22:03                         ` Badari Pulavarty
  -- strict thread matches above, loose matches on Subject: below --
2002-10-31 17:57 Badari Pulavarty
2002-10-31 18:46 ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.