* Controller problems during reshape -> can't continue reshape after reboot.
From: Tim Small @ 2012-08-20 19:55 UTC
To: linux-raid@vger.kernel.org
Hi,
I was attempting to reshape a RAID5 from 4 to 5 devices. During the
reshape, I had a problem with one of the controller cards in the
machine, so that first one drive had repeated errors (and was
eventually marked as failed), and then several hours later, I/O to
another drive effectively stalled. At this point, /proc/mdstat was
showing the reshape proceeding (with one drive marked as failed), but
the throughput had dropped to zero.
After rebooting the machine (alt-sysrq s, u, b) the array won't
reassemble (with or without '--force')...
(I've now replaced the card, and read all data on all drives
successfully...)
[ 2716.070788] raid5: md1 is not clean -- starting background reconstruction
[ 2716.070984] raid5: reshape will continue
[ 2716.071166] raid5: device sda1 operational as raid disk 0
[ 2716.071350] raid5: device sdi1 operational as raid disk 4
[ 2716.071534] raid5: device sdj1 operational as raid disk 3
[ 2716.071715] raid5: device sdk1 operational as raid disk 1
[ 2716.072217] raid5: allocated 5334kB for md1
[ 2716.072452] 0: w=1 pa=2 pr=4 m=1 a=2 r=5 op1=0 op2=0
[ 2716.072633] 4: w=2 pa=2 pr=4 m=1 a=2 r=5 op1=0 op2=0
[ 2716.072816] 3: w=3 pa=2 pr=4 m=1 a=2 r=5 op1=0 op2=0
[ 2716.073001] 1: w=4 pa=2 pr=4 m=1 a=2 r=5 op1=0 op2=0
[ 2716.073180] raid5: cannot start dirty degraded array for md1
[ 2716.073372] RAID5 conf printout:
[ 2716.073544] --- rd:5 wd:4
[ 2716.073717] disk 0, o:1, dev:sda1
[ 2716.073884] disk 1, o:1, dev:sdk1
[ 2716.074071] disk 3, o:1, dev:sdj1
[ 2716.074239] disk 4, o:1, dev:sdi1
[ 2716.074575] raid5: failed to run raid set md1
[ 2716.074749] md: pers->run() failed ...
Any chance of carrying on where it left off, or should I recreate the
array from scratch?
# cat /etc/debian_version ; uname -a
6.0.2
Linux rodmell 2.6.32-5-amd64 #1 SMP Tue Jun 14 09:42:28 UTC 2011 x86_64
GNU/Linux
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md1 : inactive sda1[0] sdi1[5] sdj1[4] sdk1[1]
7814054112 blocks super 1.2
# mdadm -E /dev/sd[hijak]1
/dev/sda1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x4
Array UUID : 717d7de6:49a886f6:fb20ac87:5a1e8a84
Name : rodmell:1 (local to host rodmell)
Creation Time : Mon Dec 19 18:00:13 2011
Raid Level : raid5
Raid Devices : 5
Avail Dev Size : 3907027056 (1863.02 GiB 2000.40 GB)
Array Size : 15628103680 (7452.06 GiB 8001.59 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 1bf82ae0:82b71e9b:6283dc62:467026fc
Reshape pos'n : 1622353920 (1547.20 GiB 1661.29 GB)
Delta Devices : 1 (4->5)
Update Time : Mon Aug 20 08:42:56 2012
Checksum : 46d057ad - correct
Events : 24587
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AA.AA ('A' == active, '.' == missing)
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x4
Array UUID : 717d7de6:49a886f6:fb20ac87:5a1e8a84
Name : rodmell:1 (local to host rodmell)
Creation Time : Mon Dec 19 18:00:13 2011
Raid Level : raid5
Raid Devices : 5
Avail Dev Size : 3907027056 (1863.02 GiB 2000.40 GB)
Array Size : 15628103680 (7452.06 GiB 8001.59 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 3e9cca4d:3872738b:1903ee56:5a91b935
Reshape pos'n : 10582016 (10.09 GiB 10.84 GB)
Delta Devices : 1 (4->5)
Update Time : Thu Aug 16 17:30:46 2012
Checksum : 12400b18 - correct
Events : 15896
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAAA ('A' == active, '.' == missing)
/dev/sdi1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x4
Array UUID : 717d7de6:49a886f6:fb20ac87:5a1e8a84
Name : rodmell:1 (local to host rodmell)
Creation Time : Mon Dec 19 18:00:13 2011
Raid Level : raid5
Raid Devices : 5
Avail Dev Size : 3907027056 (1863.02 GiB 2000.40 GB)
Array Size : 15628103680 (7452.06 GiB 8001.59 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 904de121:58fbef1d:16546bd7:d3ab29c5
Reshape pos'n : 1622353920 (1547.20 GiB 1661.29 GB)
Delta Devices : 1 (4->5)
Update Time : Fri Aug 17 01:32:23 2012
Checksum : 48e5a3d3 - correct
Events : 24586
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 4
Array State : AA.AA ('A' == active, '.' == missing)
/dev/sdj1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x4
Array UUID : 717d7de6:49a886f6:fb20ac87:5a1e8a84
Name : rodmell:1 (local to host rodmell)
Creation Time : Mon Dec 19 18:00:13 2011
Raid Level : raid5
Raid Devices : 5
Avail Dev Size : 3907027056 (1863.02 GiB 2000.40 GB)
Array Size : 15628103680 (7452.06 GiB 8001.59 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 59efcddf:9e679807:09ce1bc4:d882af69
Reshape pos'n : 1622353920 (1547.20 GiB 1661.29 GB)
Delta Devices : 1 (4->5)
Update Time : Mon Aug 20 08:42:56 2012
Checksum : 81b55c43 - correct
Events : 24587
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AA.AA ('A' == active, '.' == missing)
/dev/sdk1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x4
Array UUID : 717d7de6:49a886f6:fb20ac87:5a1e8a84
Name : rodmell:1 (local to host rodmell)
Creation Time : Mon Dec 19 18:00:13 2011
Raid Level : raid5
Raid Devices : 5
Avail Dev Size : 3907027056 (1863.02 GiB 2000.40 GB)
Array Size : 15628103680 (7452.06 GiB 8001.59 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 31b29cdb:0b70201e:de2036a4:5aecda02
Reshape pos'n : 1622353920 (1547.20 GiB 1661.29 GB)
Delta Devices : 1 (4->5)
Update Time : Mon Aug 20 08:42:56 2012
Checksum : d51e3dc - correct
Events : 24587
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AA.AA ('A' == active, '.' == missing)
Cheers,
Tim.
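
The `mdadm -E` dump above is long, and the fields that matter most when an array refuses to assemble are the per-device event counts. As a rough sketch (the helper name `summarise_examine` is hypothetical and not part of mdadm), the output can be condensed to one line per member, showing how far each one lags behind the highest event count seen:

```shell
#!/bin/sh
# Condense `mdadm -E` output read from stdin to one line per member device.
# summarise_examine is a hypothetical helper name, not an mdadm feature.
summarise_examine() {
    awk '
        # A line such as "/dev/sda1:" starts a new device section.
        /^\/dev\//         { dev = $1; sub(/:$/, "", dev) }
        # "Events : 24587" -> remember the count and track the maximum.
        /^ *Events :/      { events[dev] = $NF; if ($NF + 0 > max) max = $NF + 0 }
        # "Device Role : Active device 0" -> keep the slot number.
        /^ *Device Role :/ { role[dev] = $NF }
        END {
            for (d in events) {
                lag = max - events[d]
                status = (lag == 0) ? "current" : "behind by " lag
                printf "%s role=%s events=%s %s\n", d, role[d], events[d], status
            }
        }
    ' | sort
}
```

Used as `mdadm -E /dev/sd[hijak]1 | summarise_examine`, this would report sdh1 (role 2, the slot shown as missing) thousands of events behind and sdi1 one event behind the other three members, which is consistent with the update times listed above.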
* Re: Controller problems during reshape -> can't continue reshape after reboot.
From: NeilBrown @ 2012-08-20 22:37 UTC
To: Tim Small; +Cc: linux-raid@vger.kernel.org
On Mon, 20 Aug 2012 20:55:38 +0100 Tim Small <tim@buttersideup.com> wrote:
> Hi,
>
> I was attempting to reshape a RAID5 from 4 to 5 devices. During the
> reshape, I had a problem with one of the controller cards in the
> machine, so that first one drive had repeated errors (and was
> eventually marked as failed), and then several hours later, I/O to
> another drive effectively stalled. At this point, /proc/mdstat was
> showing the reshape proceeding (with one drive marked as failed), but
> the throughput had dropped to zero.
>
>
> After rebooting the machine (alt-sysrq s, u, b) the array won't
> reassemble (with or without '--force')...
>
> (I've now replaced the card, and read all data on all drives
> successfully...)
>
> [ 2716.070788] raid5: md1 is not clean -- starting background reconstruction
> [ 2716.070984] raid5: reshape will continue
> [ 2716.071166] raid5: device sda1 operational as raid disk 0
> [ 2716.071350] raid5: device sdi1 operational as raid disk 4
> [ 2716.071534] raid5: device sdj1 operational as raid disk 3
> [ 2716.071715] raid5: device sdk1 operational as raid disk 1
> [ 2716.072217] raid5: allocated 5334kB for md1
> [ 2716.072452] 0: w=1 pa=2 pr=4 m=1 a=2 r=5 op1=0 op2=0
> [ 2716.072633] 4: w=2 pa=2 pr=4 m=1 a=2 r=5 op1=0 op2=0
> [ 2716.072816] 3: w=3 pa=2 pr=4 m=1 a=2 r=5 op1=0 op2=0
> [ 2716.073001] 1: w=4 pa=2 pr=4 m=1 a=2 r=5 op1=0 op2=0
> [ 2716.073180] raid5: cannot start dirty degraded array for md1
> [ 2716.073372] RAID5 conf printout:
> [ 2716.073544] --- rd:5 wd:4
> [ 2716.073717] disk 0, o:1, dev:sda1
> [ 2716.073884] disk 1, o:1, dev:sdk1
> [ 2716.074071] disk 3, o:1, dev:sdj1
> [ 2716.074239] disk 4, o:1, dev:sdi1
> [ 2716.074575] raid5: failed to run raid set md1
> [ 2716.074749] md: pers->run() failed ...
>
>
> Any chance of carrying on where it left off, or should I recreate the
> array from scratch?
What version of mdadm (mdadm -V) ?
Try
echo 1 > /sys/module/md_mod/parameters/start_dirty_degraded
mdadm -S /dev/md1
and then try assembling the array again.
NeilBrown
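
A minimal sketch of the suggested sequence, written so that it only prints the commands unless `RUN=1` is set. The first two commands come from the reply above; the final `mdadm --assemble --force` line and its device list are assumptions pieced together from the kernel log earlier in the thread (the four members reported as operational), and the thread does not state which assemble command was actually used. The function name `resume_reshape` is hypothetical.

```shell
#!/bin/sh
# Dry-run by default: prints the commands. Set RUN=1 (as root) to execute.
# The assemble line and MEMBERS list are assumptions, not from the reply.
MD=${MD:-/dev/md1}
MEMBERS=${MEMBERS:-"/dev/sda1 /dev/sdk1 /dev/sdj1 /dev/sdi1"}
PARAM=/sys/module/md_mod/parameters/start_dirty_degraded

resume_reshape() {
    if [ "${RUN:-0}" = 1 ]; then
        # Allow md to start an array that is both dirty and degraded.
        echo 1 > "$PARAM" &&
        # Stop the inactive, half-assembled array.
        mdadm -S "$MD" &&
        # Reassemble from the surviving members (word splitting intended).
        mdadm --assemble --force "$MD" $MEMBERS
    else
        echo "echo 1 > $PARAM"
        echo "mdadm -S $MD"
        echo "mdadm --assemble --force $MD $MEMBERS"
    fi
}
```

Calling `resume_reshape` with no environment changes shows what would be run. Starting a dirty, degraded array skips a safety check, so it is worth reviewing the printed commands and the member list before setting `RUN=1`.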
* Re: Controller problems during reshape -> can't continue reshape after reboot.
From: John Robinson @ 2012-08-21 0:51 UTC
To: Tim Small; +Cc: linux-raid@vger.kernel.org
On 20/08/2012 20:55, Tim Small wrote:
> I was attempting to reshape a RAID5 from 4 to 5 devices. During the
> reshape, I had a problem with one of the controller cards in the
> machine
Sorry this isn't very helpful, but just out of interest, what kind of
controller card and what was the problem, do you know?
Cheers,
John.
* Re: Controller problems during reshape -> can't continue reshape after reboot.
From: Tim Small @ 2012-08-21 7:36 UTC
To: NeilBrown; +Cc: linux-raid@vger.kernel.org
On 20/08/12 23:37, NeilBrown wrote:
> What version of mdadm (mdadm -V) ?
>
mdadm - v3.1.4 - 31st August 2010
> Try
> echo 1 > /sys/module/md_mod/parameters/start_dirty_degraded
> mdadm -S /dev/md1
>
> and then try assembling the array again.
>
Seems to be working, the reshape is continuing, and an fsck has
completed - thanks.
Tim.
* Re: Controller problems during reshape -> can't continue reshape after reboot.
From: Tim Small @ 2012-08-21 7:51 UTC
To: John Robinson; +Cc: linux-raid@vger.kernel.org
On 21/08/12 01:51, John Robinson wrote:
> On 20/08/2012 20:55, Tim Small wrote:
>> I was attempting to reshape a RAID5 from 4 to 5 devices. During the
>> reshape, I had a problem with one of the controller cards in the
>> machine
>
> Sorry this isn't very helpful, but just out of interest, what kind of
> controller card and what was the problem, do you know?
It was a Marvell 9125 AHCI card (HighPoint-branded), and I've had
numerous problems with them locking up (the 9123s are worse), and have
scrapped all of the Marvell 9123s already, and gone back to using a
combination of Marvell's 88SX7042 (non-AHCI) cards, and Silicon Image
3132 etc. cards.
It's a shame, as Marvell's AHCI cards are readily available, cheap, and
do both 6G SATA and PCIe 2.0, whereas the earlier Marvell cards and
the Silicon Image cards are all 3G SATA and (more importantly for me)
PCIe 1.0, which ends up being a bottleneck for modern spinning disks...
I wish Intel, Silicon Image (or someone else) would make a reliable SATA
PCIe 2.0 card!
Intel and Silicon Image publish their errata, whereas Marvell keep
theirs secret, under NDA, and don't respond to hardware bug reports in
my experience. Avoid.
Tim.
* Re: Controller problems during reshape -> can't continue reshape after reboot.
From: Michael-John Turner @ 2012-08-21 16:15 UTC
To: Tim Small; +Cc: John Robinson, linux-raid@vger.kernel.org
On Tue, Aug 21, 2012 at 08:51:02AM +0100, Tim Small wrote:
> It was a Marvell 9125 AHCI card (HighPoint-branded), and I've had
> numerous problems with them locking up (the 9123s are worse), and have
> scrapped all of the Marvell 9123s already, and gone back to using a
> combination of Marvell's 88SX7042 (non-AHCI) cards, and Silicon Image
> 3132 etc. cards.
FWIW, I have a Marvell 9123[1] in one of my systems and found that
disabling NCQ on the two drives hooked to it[2] stopped any funnies.
Agreed, it's not the best of adapters, but it's been rock solid for the
year or so since I made that change.
[1] "Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller (rev 10)"
[2] A pair of WD Scorpio Blacks in an md RAID1 set
-mj
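
The message above does not say how NCQ was disabled. One common approach, shown here only as a sketch, is to limit the drive's queue depth to 1 through sysfs, which effectively turns NCQ off for that disk. The helper name `set_queue_depth` and the `SYSFS` override (there so the function can be tried against a scratch directory) are hypothetical; the `queue_depth` attribute under `/sys/block/<disk>/device/` is the standard one.

```shell
#!/bin/sh
# Limit a disk's queue depth (1 effectively disables NCQ). Needs root on
# a real system. set_queue_depth and SYSFS are names made up for this sketch.
set_queue_depth() {
    disk=$1
    depth=${2:-1}
    attr="${SYSFS:-/sys}/block/$disk/device/queue_depth"
    # Refuse early if the attribute is absent or not writable.
    if [ ! -w "$attr" ]; then
        echo "cannot write $attr" >&2
        return 1
    fi
    # Write the new depth, then read it back to confirm what is in effect.
    echo "$depth" > "$attr" && echo "$disk queue_depth=$(cat "$attr")"
}
```

For example, `set_queue_depth sdb` would target the disk `sdb` (a placeholder name). The setting does not survive a reboot; the `libata.force=noncq` kernel parameter is another way to get a persistent equivalent.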
--
Michael-John Turner
mj@mjturner.net <> http://mjturner.net/