* Controller problems during reshape -> can't continue reshape after reboot.
@ 2012-08-20 19:55 Tim Small
  2012-08-20 22:37 ` NeilBrown
  2012-08-21  0:51 ` John Robinson
  0 siblings, 2 replies; 6+ messages in thread
From: Tim Small @ 2012-08-20 19:55 UTC
  To: linux-raid@vger.kernel.org

Hi,

I was attempting to reshape a RAID5 from 4 to 5 devices.  During the
reshape, I had a problem with one of the controller cards in the
machine, so that first one drive had repeated errors (and was
eventually marked as failed), and then several hours later, I/O to
another drive effectively stalled.  At this point, /proc/mdstat was
showing the reshape proceeding (with one drive marked as failed), but
the throughput had dropped to zero.


After rebooting the machine (alt-sysrq s, u, b) the array won't
reassemble (with or without '--force')...

(I've now replaced the card, and read all data on all drives
successfully...)

[ 2716.070788] raid5: md1 is not clean -- starting background reconstruction
[ 2716.070984] raid5: reshape will continue
[ 2716.071166] raid5: device sda1 operational as raid disk 0
[ 2716.071350] raid5: device sdi1 operational as raid disk 4
[ 2716.071534] raid5: device sdj1 operational as raid disk 3
[ 2716.071715] raid5: device sdk1 operational as raid disk 1
[ 2716.072217] raid5: allocated 5334kB for md1
[ 2716.072452] 0: w=1 pa=2 pr=4 m=1 a=2 r=5 op1=0 op2=0
[ 2716.072633] 4: w=2 pa=2 pr=4 m=1 a=2 r=5 op1=0 op2=0
[ 2716.072816] 3: w=3 pa=2 pr=4 m=1 a=2 r=5 op1=0 op2=0
[ 2716.073001] 1: w=4 pa=2 pr=4 m=1 a=2 r=5 op1=0 op2=0
[ 2716.073180] raid5: cannot start dirty degraded array for md1
[ 2716.073372] RAID5 conf printout:
[ 2716.073544]  --- rd:5 wd:4
[ 2716.073717]  disk 0, o:1, dev:sda1
[ 2716.073884]  disk 1, o:1, dev:sdk1
[ 2716.074071]  disk 3, o:1, dev:sdj1
[ 2716.074239]  disk 4, o:1, dev:sdi1
[ 2716.074575] raid5: failed to run raid set md1
[ 2716.074749] md: pers->run() failed ...


Any chance of carrying on where it left off, or should I recreate the
array from scratch?

# cat /etc/debian_version ; uname -a
6.0.2
Linux rodmell 2.6.32-5-amd64 #1 SMP Tue Jun 14 09:42:28 UTC 2011 x86_64
GNU/Linux
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md1 : inactive sda1[0] sdi1[5] sdj1[4] sdk1[1]
      7814054112 blocks super 1.2
# mdadm -E /dev/sd[hijak]1
/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 717d7de6:49a886f6:fb20ac87:5a1e8a84
           Name : rodmell:1  (local to host rodmell)
  Creation Time : Mon Dec 19 18:00:13 2011
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 3907027056 (1863.02 GiB 2000.40 GB)
     Array Size : 15628103680 (7452.06 GiB 8001.59 GB)
  Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 1bf82ae0:82b71e9b:6283dc62:467026fc

  Reshape pos'n : 1622353920 (1547.20 GiB 1661.29 GB)
  Delta Devices : 1 (4->5)

    Update Time : Mon Aug 20 08:42:56 2012
       Checksum : 46d057ad - correct
         Events : 24587

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AA.AA ('A' == active, '.' == missing)
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 717d7de6:49a886f6:fb20ac87:5a1e8a84
           Name : rodmell:1  (local to host rodmell)
  Creation Time : Mon Dec 19 18:00:13 2011
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 3907027056 (1863.02 GiB 2000.40 GB)
     Array Size : 15628103680 (7452.06 GiB 8001.59 GB)
  Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 3e9cca4d:3872738b:1903ee56:5a91b935

  Reshape pos'n : 10582016 (10.09 GiB 10.84 GB)
  Delta Devices : 1 (4->5)

    Update Time : Thu Aug 16 17:30:46 2012
       Checksum : 12400b18 - correct
         Events : 15896

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAAA ('A' == active, '.' == missing)
/dev/sdi1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 717d7de6:49a886f6:fb20ac87:5a1e8a84
           Name : rodmell:1  (local to host rodmell)
  Creation Time : Mon Dec 19 18:00:13 2011
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 3907027056 (1863.02 GiB 2000.40 GB)
     Array Size : 15628103680 (7452.06 GiB 8001.59 GB)
  Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 904de121:58fbef1d:16546bd7:d3ab29c5

  Reshape pos'n : 1622353920 (1547.20 GiB 1661.29 GB)
  Delta Devices : 1 (4->5)

    Update Time : Fri Aug 17 01:32:23 2012
       Checksum : 48e5a3d3 - correct
         Events : 24586

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : AA.AA ('A' == active, '.' == missing)
/dev/sdj1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 717d7de6:49a886f6:fb20ac87:5a1e8a84
           Name : rodmell:1  (local to host rodmell)
  Creation Time : Mon Dec 19 18:00:13 2011
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 3907027056 (1863.02 GiB 2000.40 GB)
     Array Size : 15628103680 (7452.06 GiB 8001.59 GB)
  Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 59efcddf:9e679807:09ce1bc4:d882af69

  Reshape pos'n : 1622353920 (1547.20 GiB 1661.29 GB)
  Delta Devices : 1 (4->5)

    Update Time : Mon Aug 20 08:42:56 2012
       Checksum : 81b55c43 - correct
         Events : 24587

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AA.AA ('A' == active, '.' == missing)
/dev/sdk1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 717d7de6:49a886f6:fb20ac87:5a1e8a84
           Name : rodmell:1  (local to host rodmell)
  Creation Time : Mon Dec 19 18:00:13 2011
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 3907027056 (1863.02 GiB 2000.40 GB)
     Array Size : 15628103680 (7452.06 GiB 8001.59 GB)
  Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 31b29cdb:0b70201e:de2036a4:5aecda02

  Reshape pos'n : 1622353920 (1547.20 GiB 1661.29 GB)
  Delta Devices : 1 (4->5)

    Update Time : Mon Aug 20 08:42:56 2012
       Checksum : d51e3dc - correct
         Events : 24587

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AA.AA ('A' == active, '.' == missing)



Cheers,

Tim.
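
The superblock dumps above are easier to compare when reduced to the two
fields that differ between members: the event count and the reshape
position. A minimal sketch, assuming the `mdadm -E` output format shown in
this message; the function name summarise_examine is hypothetical.

```shell
# Summarise `mdadm -E` output read from stdin: one line per device with
# its event count and reshape position.
summarise_examine() {
    awk '
        # A line such as "/dev/sda1:" starts a new device section.
        /^\/dev\//        { dev = $1; sub(/:$/, "", dev) }
        # "Reshape pos.n" avoids a literal apostrophe inside the awk program.
        /Reshape pos.n :/ { pos[dev] = $4 }
        /Events :/        { events[dev] = $3; order[++n] = dev }
        END {
            for (i = 1; i <= n; i++)
                printf "%s events=%s reshape_pos=%s\n",
                       order[i], events[order[i]], pos[order[i]]
        }
    '
}
```

Usage would be `mdadm -E /dev/sd[hijak]1 | summarise_examine`. On the
output above this shows sdh1 at 15896 events, sdi1 at 24586, and the other
three members at 24587, which matches sdh1 being the device that dropped
out early in the reshape.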


* Re: Controller problems during reshape -> can't continue reshape after reboot.
  2012-08-20 19:55 Controller problems during reshape -> can't continue reshape after reboot Tim Small
@ 2012-08-20 22:37 ` NeilBrown
  2012-08-21  7:36   ` Tim Small
  2012-08-21  0:51 ` John Robinson
  1 sibling, 1 reply; 6+ messages in thread
From: NeilBrown @ 2012-08-20 22:37 UTC
  To: Tim Small; +Cc: linux-raid@vger.kernel.org


On Mon, 20 Aug 2012 20:55:38 +0100 Tim Small <tim@buttersideup.com> wrote:

> Hi,
> 
> I was attempting to reshape a RAID5 from 4 to 5 devices.  During the
> reshape, I had a problem with one of the controller cards in the
> machine, so that first one drive, had repeated errors (and was
> eventually marked as failed), and then several hours later, I/O to
> another drive effectively stalled.  At this point, /proc/mdstat was
> showing the reshape proceeding (with one drive marked as failed), but
> the throughput had dropped to zero.
> 
> 
> After rebooting the machine (alt-sysrq s, u, b) the array won't
> reassemble (with or without '--force')...
> 
> (I've now replaced the card, and read all data on all drives
> successfully...)
> 
> [ 2716.070788] raid5: md1 is not clean -- starting background reconstruction
> [ 2716.070984] raid5: reshape will continue
> [ 2716.071166] raid5: device sda1 operational as raid disk 0
> [ 2716.071350] raid5: device sdi1 operational as raid disk 4
> [ 2716.071534] raid5: device sdj1 operational as raid disk 3
> [ 2716.071715] raid5: device sdk1 operational as raid disk 1
> [ 2716.072217] raid5: allocated 5334kB for md1
> [ 2716.072452] 0: w=1 pa=2 pr=4 m=1 a=2 r=5 op1=0 op2=0
> [ 2716.072633] 4: w=2 pa=2 pr=4 m=1 a=2 r=5 op1=0 op2=0
> [ 2716.072816] 3: w=3 pa=2 pr=4 m=1 a=2 r=5 op1=0 op2=0
> [ 2716.073001] 1: w=4 pa=2 pr=4 m=1 a=2 r=5 op1=0 op2=0
> [ 2716.073180] raid5: cannot start dirty degraded array for md1
> [ 2716.073372] RAID5 conf printout:
> [ 2716.073544]  --- rd:5 wd:4
> [ 2716.073717]  disk 0, o:1, dev:sda1
> [ 2716.073884]  disk 1, o:1, dev:sdk1
> [ 2716.074071]  disk 3, o:1, dev:sdj1
> [ 2716.074239]  disk 4, o:1, dev:sdi1
> [ 2716.074575] raid5: failed to run raid set md1
> [ 2716.074749] md: pers->run() failed ...
> 
> 
> Any chance of carrying on where it left off, or should I recreate the
> array from scratch?

What version of mdadm (mdadm -V) ?

Try
  echo 1 > /sys/module/md_mod/parameters/start_dirty_degraded
  mdadm -S /dev/md1

and then try assembling the array again.

NeilBrown
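
The suggested sequence can be sketched as a small script. This is only a
sketch under stated assumptions: the array name and member list are taken
from the /proc/mdstat output in the report, the thread does not show the
assemble command that was actually used (so `--assemble --force` is an
assumption), and the RUN switch and function names are hypothetical. By
default it only prints what it would do.

```shell
#!/bin/sh
# Assumed from the report's /proc/mdstat output; adjust for other systems.
ARRAY=/dev/md1
MEMBERS="/dev/sda1 /dev/sdi1 /dev/sdj1 /dev/sdk1"

# Print the command unless RUN=1 is set, in which case execute it.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

recover() {
    # Let md start an array that is both dirty and degraded.
    run sh -c 'echo 1 > /sys/module/md_mod/parameters/start_dirty_degraded'
    # Stop the inactive, partially assembled array.
    run mdadm -S "$ARRAY"
    # Assemble again from the four current members.
    # MEMBERS is left unquoted on purpose so it splits into separate arguments.
    run mdadm --assemble --force "$ARRAY" $MEMBERS
}
```

Calling `recover` prints the three steps; setting RUN=1 first (as root)
would execute them. The stale member (sdh1 in the report) is deliberately
left out of the list.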




* Re: Controller problems during reshape -> can't continue reshape after reboot.
  2012-08-20 19:55 Controller problems during reshape -> can't continue reshape after reboot Tim Small
  2012-08-20 22:37 ` NeilBrown
@ 2012-08-21  0:51 ` John Robinson
  2012-08-21  7:51   ` Tim Small
  1 sibling, 1 reply; 6+ messages in thread
From: John Robinson @ 2012-08-21  0:51 UTC
  To: Tim Small; +Cc: linux-raid@vger.kernel.org

On 20/08/2012 20:55, Tim Small wrote:
> I was attempting to reshape a RAID5 from 4 to 5 devices.  During the
> reshape, I had a problem with one of the controller cards in the
> machine

Sorry this isn't very helpful, but just out of interest, what kind of 
controller card and what was the problem, do you know?

Cheers,

John.



* Re: Controller problems during reshape -> can't continue reshape after reboot.
  2012-08-20 22:37 ` NeilBrown
@ 2012-08-21  7:36   ` Tim Small
  0 siblings, 0 replies; 6+ messages in thread
From: Tim Small @ 2012-08-21  7:36 UTC
  To: NeilBrown; +Cc: linux-raid@vger.kernel.org

On 20/08/12 23:37, NeilBrown wrote:
> What version of mdadm (mdadm -V) ?
>   

mdadm - v3.1.4 - 31st August 2010

> Try
>   echo 1 > /sys/module/md_mod/parameters/start_dirty_degraded
>   mdadm -S /dev/md1
>
> and then try assembling the array again.
>   

Seems to be working, the reshape is continuing, and an fsck has
completed - thanks.

Tim.



* Re: Controller problems during reshape -> can't continue reshape after reboot.
  2012-08-21  0:51 ` John Robinson
@ 2012-08-21  7:51   ` Tim Small
  2012-08-21 16:15     ` Michael-John Turner
  0 siblings, 1 reply; 6+ messages in thread
From: Tim Small @ 2012-08-21  7:51 UTC
  To: John Robinson; +Cc: linux-raid@vger.kernel.org

On 21/08/12 01:51, John Robinson wrote:
> On 20/08/2012 20:55, Tim Small wrote:
>> I was attempting to reshape a RAID5 from 4 to 5 devices.  During the
>> reshape, I had a problem with one of the controller cards in the
>> machine
>
> Sorry this isn't very helpful, but just out of interest, what kind of
> controller card and what was the problem, do you know?

It was a Marvell 9125 AHCI card (highpoint branded), and I've had
numerous problems with them locking up (the 9123s are worse), and have
scrapped all of the Marvell 9123s already, and gone back to using a
combination of Marvell's 88SX7042 (non-AHCI) cards, and Silicon Image
3132 etc. cards.

It's a shame, as Marvell's AHCI cards are readily available, cheap and
do both 6G SATA, and PCIe 2.0, whereas the earlier Marvell cards, and
the Silicon Image cards are all 3G SATA and (more importantly for me)
PCIe 1.0, which ends up being a bottleneck for modern spinning disks...

I wish Intel, Silicon Image (or someone else) would make a reliable SATA
PCIe 2.0 card!

Intel and Silicon Image publish their errata, whereas Marvell keep
theirs secret, under NDA, and don't respond to hardware bug reports in
my experience.  Avoid.

Tim.
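
The PCIe 1.0 bottleneck mentioned above can be checked with rough
arithmetic. A minimal sketch, assuming a single-lane (x1) card shared by
two disks that each stream about 150 MB/s; both of those figures are
illustrative assumptions, not taken from the message. The per-lane rates
are the nominal PCIe 1.x and 2.0 figures after 8b/10b encoding, and real
throughput is somewhat lower because of protocol overhead.

```shell
# Nominal payload bandwidth of one PCIe lane, in MB/s per direction.
PCIE1_LANE=250
PCIE2_LANE=500

# Illustrative assumptions: sequential rate of one disk and the number
# of disks sharing a single-lane card.
DISK_MBS=150
DISKS=2

demand=$((DISK_MBS * DISKS))

# Print the lane bandwidth left over after the disks' combined demand.
# A negative result means the lane is the bottleneck.
pcie_headroom() {
    echo $(( $1 - demand ))
}

pcie_headroom "$PCIE1_LANE"   # prints -50
pcie_headroom "$PCIE2_LANE"   # prints 200
```

Under these assumptions two disks already exceed one PCIe 1.x lane, while
one PCIe 2.0 lane still has headroom.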



* Re: Controller problems during reshape -> can't continue reshape after reboot.
  2012-08-21  7:51   ` Tim Small
@ 2012-08-21 16:15     ` Michael-John Turner
  0 siblings, 0 replies; 6+ messages in thread
From: Michael-John Turner @ 2012-08-21 16:15 UTC
  To: Tim Small; +Cc: John Robinson, linux-raid@vger.kernel.org

On Tue, Aug 21, 2012 at 08:51:02AM +0100, Tim Small wrote:
> It was a Marvell 9125 AHCI card (highpoint branded), and I've had
> numerous problems with them locking up (the 9123s are worse), and have
> scrapped all of the Marvell 9123s already, and gone back to using a
> combination of Marvell's 88SX7042 (non-AHCI) cards, and Silicon Image
> 3132 etc. cards.

FWIW, I have a Marvell 9123[1] in one of my systems and found that
disabling NCQ on the two drives hooked to it[2] stopped any funnies.
Agreed, it's not the best of adapters, but it's been rock solid for the
year or so since I made that change.

[1] "Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller (rev 10)"
[2] A pair of WD Scorpio Blacks in an md RAID1 set

-mj
-- 
 Michael-John Turner
 mj@mjturner.net      <>     http://mjturner.net/
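
The message does not say how NCQ was disabled. One common way on Linux is
to limit the drive's queue depth to 1 through sysfs; a minimal sketch,
assuming that approach. The function name disable_ncq, the example device
name, and the SYSFS_ROOT override (useful only for trying the function
against a scratch directory) are hypothetical.

```shell
# Disable NCQ on one disk by setting its queue depth to 1.
# Needs root on a real system; the setting is lost on reboot.
disable_ncq() {
    dev=$1    # kernel name of the disk, e.g. sdb (hypothetical example)
    qd="${SYSFS_ROOT:-/sys}/block/$dev/device/queue_depth"
    [ -w "$qd" ] || { echo "cannot write $qd" >&2; return 1; }
    echo 1 > "$qd"
    echo "$dev queue_depth now $(cat "$qd")"
}
```

It would be called once per drive attached to the controller, for example
`disable_ncq sdb`. The `libata.force=noncq` kernel parameter is another
way to get the same effect at boot.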
