* Double failure on RAID-5 array
From: Marc Marais @ 2003-06-16 16:18 UTC
To: linux-raid
I've been running software RAID on 2.2.19 for some time now (raidtools 0.9,
patched etc.) without incident.
I recently upgraded to 2.4.18 (Debian kernel-source package) and last night I
experienced a strange failure. I'm wondering whether this is somehow related to
2.4.18 or whether it was just a coincidence.
I'm running a RAID-5 array with 3 WDC 80GB drives, each on a separate IDE bus.
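For reference, the raidtab for an array like this looks roughly as follows
(the partition numbers, the third drive's device name and the chunk size
aren't stated above, so the values here are only placeholders):

    # 3-disk RAID-5, one member per IDE channel
    raiddev /dev/md0
        raid-level              5
        nr-raid-disks           3
        persistent-superblock   1
        parity-algorithm        left-symmetric
        chunk-size              32
        device                  /dev/hde1
        raid-disk               0
        device                  /dev/hdg1
        raid-disk               1
        device                  /dev/hdc1
        raid-disk               2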
hde and hdg are on a CMD649-based controller (on separate channels). It appears
that a DMA timeout occurred on that controller, causing a reset of both hde and
hdg - fortunately one channel recovered. I had to reboot to get all drives back
online.
So much for putting the devices on separate channels - looks like they should
be on separate controllers!
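(For completeness, the raidtools recovery path after a reboot is roughly the
following - /dev/md0 and /dev/hde1 are placeholders here, since my exact array
and partition names aren't listed above:)

    # see whether the array came back degraded and which member is missing
    cat /proc/mdstat
    # hot-add the kicked member again and let the RAID-5 resync in the background
    raidhotadd /dev/md0 /dev/hde1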
Any suggestions for preventing a recurrence? Are there known issues with
CMD649 cards and software RAID in 2.4.18? Am I missing some patches for
2.4.18 relating to software RAID/CMD649/IDE?
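A possible stopgap I can think of (not sure it's the right fix) would be to
force the two CMD649 drives into a more conservative mode with hdparm, along
these lines:

    # drop back to single-sector transfers (avoids the MULTWRITE path in the log below)
    hdparm -m0 /dev/hde
    hdparm -m0 /dev/hdg
    # or, more drastically, turn off DMA on those drives entirely (slow but safe)
    hdparm -d0 /dev/hde
    hdparm -d0 /dev/hdg

These settings don't survive a reboot, so they would have to go into a boot
script.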
I'm hoping my move to the 2.4 series wasn't a mistake - I have a lot of data on
this array!
Appreciate any help.
Jun 16 19:26:22 xerces kernel: hde: timeout waiting for DMA
Jun 16 19:26:22 xerces kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
Jun 16 19:26:22 xerces kernel: hde: status error: status=0x50 { DriveReady SeekComplete }
Jun 16 19:26:22 xerces kernel: hde: no DRQ after issuing MULTWRITE
Jun 16 19:26:22 xerces kernel: hde: status error: status=0x50 { DriveReady SeekComplete }
Jun 16 19:26:22 xerces kernel: hde: no DRQ after issuing MULTWRITE
Jun 16 19:26:22 xerces kernel: hdg: timeout waiting for DMA
Jun 16 19:26:22 xerces kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
Jun 16 19:26:22 xerces kernel: hdg: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Jun 16 19:26:22 xerces kernel: hdg: drive not ready for command
Jun 16 19:26:22 xerces kernel: hdg: status timeout: status=0xd0 { Busy }
Jun 16 19:26:22 xerces kernel: hdg: drive not ready for command
Jun 16 19:26:22 xerces kernel: ide3: reset: success
Jun 16 19:26:22 xerces kernel: hde: status error: status=0x50 { DriveReady SeekComplete }
Jun 16 19:26:22 xerces kernel: hde: no DRQ after issuing MULTWRITE
Jun 16 19:26:22 xerces kernel: hde: status error: status=0x50 { DriveReady SeekComplete }
Jun 16 19:26:22 xerces kernel: hde: no DRQ after issuing WRITE
Jun 16 19:26:22 xerces kernel: ide2: reset: master: error (0x50?)
Jun 16 19:26:32 xerces kernel: hde: lost interrupt
Jun 16 19:26:42 xerces kernel: hdg: timeout waiting for DMA
Jun 16 19:26:42 xerces kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
Jun 16 19:26:42 xerces kernel: hdg: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Jun 16 19:26:42 xerces kernel: hdg: drive not ready for command
Jun 16 19:26:42 xerces kernel: hdg: status timeout: status=0xd0 { Busy }
Jun 16 19:26:42 xerces kernel: hdg: drive not ready for command
Jun 16 19:26:42 xerces kernel: hde: lost interrupt
Jun 16 19:26:42 xerces kernel: hde: recal_intr: status=0xd0 { Busy }
Jun 16 19:26:43 xerces kernel: ide3: reset: success
Jun 16 19:26:43 xerces kernel: ide2: reset: master: error (0x50?)
--
Marc Marais
marc@liquid-nexus.net
* Double failure on RAID-5 array
From: Marc Marais @ 2003-06-18 7:13 UTC
To: linux-raid
[Resend of the message above; body unchanged.]
* Re: Double failure on RAID-5 array
From: Bernd Schubert @ 2003-06-18 16:50 UTC
To: linux-raid
Hello,
> I'm hoping my move to 2.4 series wasn't a mistake - I have a lot of data on
> this array!
>
Unless you have created filesystem-specific things that require 2.4 and above
(e.g. the reiserfs 3.6 format), downgrading to 2.2 shouldn't be that difficult,
should it?
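As an example of what to check (the filesystem on the array isn't mentioned in
this thread, so ext2/ext3 and /dev/md0 are just assumptions here), the feature
flags can be listed with tune2fs:

    # any feature 2.2 doesn't understand would block a clean downgrade
    tune2fs -l /dev/md0 | grep -i features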
> Appreciate any help.
>
> Jun 16 19:26:22 xerces kernel: hde: timeout waiting for DMA
> Jun 16 19:26:22 xerces kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
> Jun 16 19:26:22 xerces kernel: hde: status error: status=0x50 { DriveReady
[...]
This looks like a drive hardware problem. You could try running some drive
tests using smartctl.
On the other hand, it could be one of the IDE kernel bugs as well; I would
upgrade to 2.4.21 in any case.
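For instance (assuming smartmontools is installed; substitute your real device
names):

    # print identity, SMART health status and the drive's error log
    smartctl -a /dev/hde
    # start a long offline self-test; check the result later with 'smartctl -l selftest /dev/hde'
    smartctl -t long /dev/hde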
Bernd