* Double failure on RAID-5 array
From: Marc Marais @ 2003-06-16 16:18 UTC
  To: linux-raid

I've been running software RAID on 2.2.19 for some time now (raidtools 0.9, 
patched, etc.) without incident.

I recently upgraded to 2.4.18 (Debian kernel-source package) and last night I 
experienced a strange failure. I'm wondering whether this is somehow related 
to 2.4.18 or just a coincidence.

I'm running a RAID-5 array with 3 WDC 80GB drives (each on a separate IDE 
bus).

hde and hdg are on a CMD649-based controller (on separate channels). It 
appears that a DMA timeout occurred on this controller, causing a reset on 
both hde and hdg - fortunately one channel recovered. I had to reboot to get 
all drives back online.

So much for having devices on separate channels - looks like devices should 
be on separate controllers!
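
For what it's worth, bringing the array back after the reboot shouldn't need 
more than /proc/mdstat and raidtools, assuming only one disk actually got 
marked failed (/dev/md0 and /dev/hde1 below are only examples - I'd check 
/etc/raidtab for the real names first):

  cat /proc/mdstat                 # see which disk the array has marked as failed
  raidhotadd /dev/md0 /dev/hde1    # re-add the dropped disk; RAID-5 then resyncs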

Any suggestions for preventing a recurrence? Are there known issues with 
CMD649 cards and software RAID in 2.4.18? Am I missing some patches for 
2.4.18 relating to software RAID/CMD649/IDE?
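
In the meantime, one workaround I'm tempted to try (purely a guess that the 
CMD649 misbehaves under UDMA load) is to drop or cap DMA on the two drives on 
that controller with hdparm, e.g.:

  hdparm -d0 /dev/hde /dev/hdg       # disable DMA on both CMD649 drives (slower, but avoids the timeout path)
  hdparm -d1 -X66 /dev/hde /dev/hdg  # or keep DMA but cap the transfer mode at UDMA2

Obviously that costs throughput, so it would only be a stopgap until the real 
cause is found.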

I'm hoping my move to the 2.4 series wasn't a mistake - I have a lot of data 
on this array!

Appreciate any help. 

Jun 16 19:26:22 xerces kernel: hde: timeout waiting for DMA
Jun 16 19:26:22 xerces kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
Jun 16 19:26:22 xerces kernel: hde: status error: status=0x50 { DriveReady SeekComplete }
Jun 16 19:26:22 xerces kernel: hde: no DRQ after issuing MULTWRITE
Jun 16 19:26:22 xerces kernel: hde: status error: status=0x50 { DriveReady SeekComplete }
Jun 16 19:26:22 xerces kernel: hde: no DRQ after issuing MULTWRITE
Jun 16 19:26:22 xerces kernel: hdg: timeout waiting for DMA
Jun 16 19:26:22 xerces kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
Jun 16 19:26:22 xerces kernel: hdg: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Jun 16 19:26:22 xerces kernel: hdg: drive not ready for command
Jun 16 19:26:22 xerces kernel: hdg: status timeout: status=0xd0 { Busy }
Jun 16 19:26:22 xerces kernel: hdg: drive not ready for command
Jun 16 19:26:22 xerces kernel: ide3: reset: success
Jun 16 19:26:22 xerces kernel: hde: status error: status=0x50 { DriveReady SeekComplete }
Jun 16 19:26:22 xerces kernel: hde: no DRQ after issuing MULTWRITE
Jun 16 19:26:22 xerces kernel: hde: status error: status=0x50 { DriveReady SeekComplete }
Jun 16 19:26:22 xerces kernel: hde: no DRQ after issuing WRITE
Jun 16 19:26:22 xerces kernel: ide2: reset: master: error (0x50?)
Jun 16 19:26:32 xerces kernel: hde: lost interrupt
Jun 16 19:26:42 xerces kernel: hdg: timeout waiting for DMA
Jun 16 19:26:42 xerces kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
Jun 16 19:26:42 xerces kernel: hdg: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Jun 16 19:26:42 xerces kernel: hdg: drive not ready for command
Jun 16 19:26:42 xerces kernel: hdg: status timeout: status=0xd0 { Busy }
Jun 16 19:26:42 xerces kernel: hdg: drive not ready for command
Jun 16 19:26:42 xerces kernel: hde: lost interrupt
Jun 16 19:26:42 xerces kernel: hde: recal_intr: status=0xd0 { Busy }
Jun 16 19:26:43 xerces kernel: ide3: reset: success
Jun 16 19:26:43 xerces kernel: ide2: reset: master: error (0x50?)


--
Marc Marais
marc@liquid-nexus.net 

* Re: Double failure on RAID-5 array
From: Bernd Schubert @ 2003-06-18 16:50 UTC
  To: linux-raid

Hello,

> I'm hoping my move to 2.4 series wasn't a mistake - I have a lot of data on 
> this array!
> 

Unless you have created anything filesystem-specific that needs 2.4 or later 
(e.g. reiserfs 3.6 format), downgrading to 2.2 shouldn't be that difficult, 
should it?

> Appreciate any help. 
> 
> Jun 16 19:26:22 xerces kernel: hde: timeout waiting for DMA
> Jun 16 19:26:22 xerces kernel: ide_dmaproc: chipset supported ide_dma_timeout 
> func only: 14
> Jun 16 19:26:22 xerces kernel: hde: status error: status=0x50 { DriveReady 
[...]

Looks like a drive hardware problem. You could try running some drive tests 
using smartctl. On the other hand, it could be one of the IDE kernel bugs as 
well; I would upgrade to 2.4.21 in any case.
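
For example, something like the following (assuming smartmontools is 
installed; /dev/hde is taken from your log, run the same for hdg):

  smartctl -a /dev/hde           # dump SMART attributes and the drive's error log
  smartctl -t long /dev/hde      # start a long offline self-test
  smartctl -l selftest /dev/hde  # read the self-test results once it has finished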

Bernd
