* Problem re-shaping RAID6
@ 2010-06-13 19:15 Jérôme Poulin
2010-06-13 23:15 ` Neil Brown
0 siblings, 1 reply; 7+ messages in thread
From: Jérôme Poulin @ 2010-06-13 19:15 UTC
To: linux-raid
I had problems reshaping my RAID6 down 1 disk today and found a
problem in Grow.c:
diff -udpr mdadm-3.1.2/Grow.c mdadm-3.1.2-critical-section/Grow.c
--- mdadm-3.1.2/Grow.c 2010-03-09 23:31:39.000000000 -0500
+++ mdadm-3.1.2-critical-section/Grow.c 2010-06-13 14:57:44.000000000 -0400
@@ -497,7 +497,7 @@ int Grow_reshape(char *devname, int fd,
int rv = 0;
struct supertype *st;
- int nchunk, ochunk;
+ unsigned long nchunk, ochunk;
int nlayout, olayout;
int ndisks, odisks;
int ndata, odata;
After changing this I was able to re-shape the array; it seems the value was
overflowing. Before the fix, I got this message:
root@billsshack:~/mdadm-3.1.2/ > ./mdadm --grow /dev/md0
--raid-devices=6 --backup-file=/mnt/data1/md-raid6-grow-backup.bak
mdadm: Need to backup 4503599627350016K of critical section..
mdadm: /dev/md0: Something wrong - reshape aborted
Now it works:
root@billsshack:~/mdadm-3.1.2/ > ./mdadm --grow /dev/md0
--raid-devices=6 --backup-file=/mnt/data1/md-raid6-grow-backup.bak
mdadm: Need to backup 20480K of critical section..
root@billsshack:~/mdadm-3.1.2/ > cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
[faulty] [linear]
md0 : active raid6 dm-6[0] dm-8[5] dm-5[4] dm-3[3] dm-4[2] dm-7[1]
7814041600 blocks super 0.91 level 6, 1024k chunk, algorithm 2
[6/5] [UUUUUU]
[>....................] reshape = 0.0% (33792/1953510400)
finish=1926.0min speed=16896K/sec
Here's a nice output of GDB:
Breakpoint 1, Grow_reshape (devname=0x7fffffffe390 "/dev/md0",
fd=5, quiet=0,
backup_file=0x7fffffffe3b8
"/mnt/data1/md-raid6-grow-backup.bak", size=1953510400, level=65534,
layout_str=0x0, chunksize=0, raid_disks=6) at Grow.c:939
939 blocks = ochunk/512 * nchunk/512 * odata
* ndata / a;
(gdb) p ochunk/512
$9 = 2048
(gdb) p ochunk/512 * nchunk/512
$10 = -4194304
(gdb) p nchunk
$11 = 1048576
(gdb) p ochunk
$12 = 1048576
(gdb) p ochunk/512
$13 = 2048
(gdb) p nchunk/512
$14 = 2048
(gdb) p 2048*2048
$15 = 4194304
(gdb) p 2048*2048*odata
$16 = 20971520
(gdb) p 2048*2048*odata*ndata
$17 = 83886080
(gdb) p 2048*2048*odata*ndata/a
$18 = 40960
(gdb) p ochunk/512 * nchunk/512 * odata * ndata / a
$19 = 9007199254700032
(gdb) p ochunk/512 * nchunk/512
$20 = -4194304
(gdb)
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Problem re-shaping RAID6
2010-06-13 19:15 Problem re-shaping RAID6 Jérôme Poulin
@ 2010-06-13 23:15 ` Neil Brown
2010-06-14 10:47 ` Nagilum
[not found] ` <4C1BBE6E.5090802@tmr.com>
0 siblings, 2 replies; 7+ messages in thread
From: Neil Brown @ 2010-06-13 23:15 UTC
To: Jérôme Poulin; +Cc: linux-raid
On Sun, 13 Jun 2010 15:15:08 -0400
Jérôme Poulin <jeromepoulin@gmail.com> wrote:
> I had problems reshaping my RAID6 down 1 disk today and found a
> problem in Grow.c:
>
> diff -udpr mdadm-3.1.2/Grow.c mdadm-3.1.2-critical-section/Grow.c
> --- mdadm-3.1.2/Grow.c 2010-03-09 23:31:39.000000000 -0500
> +++ mdadm-3.1.2-critical-section/Grow.c 2010-06-13 14:57:44.000000000 -0400
> @@ -497,7 +497,7 @@ int Grow_reshape(char *devname, int fd,
> int rv = 0;
> struct supertype *st;
>
> - int nchunk, ochunk;
> + unsigned long nchunk, ochunk;
> int nlayout, olayout;
> int ndisks, odisks;
> int ndata, odata;
>
>
> After changing this I was able to re-shape the array; it seems the value was
> overflowing. Before the fix, I got this message:
> root@billsshack:~/mdadm-3.1.2/ > ./mdadm --grow /dev/md0
> --raid-devices=6 --backup-file=/mnt/data1/md-raid6-grow-backup.bak
> mdadm: Need to backup 4503599627350016K of critical section..
> mdadm: /dev/md0: Something wrong - reshape aborted
>
> Now it works:
> root@billsshack:~/mdadm-3.1.2/ > ./mdadm --grow /dev/md0
> --raid-devices=6 --backup-file=/mnt/data1/md-raid6-grow-backup.bak
> mdadm: Need to backup 20480K of critical section..
> root@billsshack:~/mdadm-3.1.2/ > cat /proc/mdstat
> Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> [faulty] [linear]
> md0 : active raid6 dm-6[0] dm-8[5] dm-5[4] dm-3[3] dm-4[2] dm-7[1]
> 7814041600 blocks super 0.91 level 6, 1024k chunk, algorithm 2
> [6/5] [UUUUUU]
> [>....................] reshape = 0.0% (33792/1953510400)
> finish=1926.0min speed=16896K/sec
>
>
> Here's a nice output of GDB:
> Breakpoint 1, Grow_reshape (devname=0x7fffffffe390 "/dev/md0",
> fd=5, quiet=0,
> backup_file=0x7fffffffe3b8
> "/mnt/data1/md-raid6-grow-backup.bak", size=1953510400, level=65534,
> layout_str=0x0, chunksize=0, raid_disks=6) at Grow.c:939
> 939 blocks = ochunk/512 * nchunk/512 * odata
> * ndata / a;
> (gdb) p ochunk/512
> $9 = 2048
> (gdb) p ochunk/512 * nchunk/512
> $10 = -4194304
Thanks.
I fixed this bug a slightly different way
- blocks = ochunk/512 * nchunk/512 * odata * ndata / a;
+ blocks = (ochunk/512) * (nchunk/512) * odata * ndata / a;
See
http://neil.brown.name/git?p=mdadm;a=commitdiff;h=200871adf9e15d5ad985f28c349fd89c386ef48a
I guess it is time to release a 3.1.3...
Thanks,
NeilBrown
> (gdb) p nchunk
> $11 = 1048576
> (gdb) p ochunk
> $12 = 1048576
> (gdb) p ochunk/512
> $13 = 2048
> (gdb) p nchunk/512
> $14 = 2048
> (gdb) p 2048*2048
> $15 = 4194304
> (gdb) p 2048*2048*odata
> $16 = 20971520
> (gdb) p 2048*2048*odata*ndata
> $17 = 83886080
> (gdb) p 2048*2048*odata*ndata/a
> $18 = 40960
> (gdb) p ochunk/512 * nchunk/512 * odata * ndata / a
> $19 = 9007199254700032
> (gdb) p ochunk/512 * nchunk/512
> $20 = -4194304
> (gdb)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Problem re-shaping RAID6
2010-06-13 23:15 ` Neil Brown
@ 2010-06-14 10:47 ` Nagilum
2010-06-17 5:47 ` Neil Brown
[not found] ` <4C1BBE6E.5090802@tmr.com>
1 sibling, 1 reply; 7+ messages in thread
From: Nagilum @ 2010-06-14 10:47 UTC
To: Neil Brown; +Cc: linux-raid
----- Message from neilb@suse.de ---------
Date: Mon, 14 Jun 2010 09:15:18 +1000
From: Neil Brown <neilb@suse.de>
Subject: Re: Problem re-shaping RAID6
To: Jérôme Poulin <jeromepoulin@gmail.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
> Thanks.
> I fixed this bug a slightly different way
>
> - blocks = ochunk/512 * nchunk/512 * odata * ndata / a;
> + blocks = (ochunk/512) * (nchunk/512) * odata * ndata / a;
>
>
> See
> http://neil.brown.name/git?p=mdadm;a=commitdiff;h=200871adf9e15d5ad985f28c349fd89c386ef48a
Those static numbers always make my nose wrinkle.
Don't we have the blocksize somewhere already? I'm also concerned what
happens when true 4k sectors are used..
========================================================================
# _ __ _ __ http://www.nagilum.org/ \n icq://69646724 #
# / |/ /__ ____ _(_) /_ ____ _ nagilum@nagilum.org \n +491776461165 #
# / / _ `/ _ `/ / / // / ' \ Amiga (68k/PPC): AOS/NetBSD/Linux #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/ Mac (PPC): MacOS-X / NetBSD /Linux #
# /___/ x86: FreeBSD/Linux/Solaris/Win2k ARM9: EPOC EV6 #
========================================================================
----------------------------------------------------------------
cakebox.homeunix.net - all the machine one needs..
* Re: Problem re-shaping RAID6
2010-06-14 10:47 ` Nagilum
@ 2010-06-17 5:47 ` Neil Brown
2010-06-17 8:20 ` Michael Evans
2010-06-18 10:53 ` Nagilum
0 siblings, 2 replies; 7+ messages in thread
From: Neil Brown @ 2010-06-17 5:47 UTC
To: Nagilum; +Cc: linux-raid
On Mon, 14 Jun 2010 12:47:23 +0200
Nagilum <nagilum@nagilum.org> wrote:
>
> ----- Message from neilb@suse.de ---------
> Date: Mon, 14 Jun 2010 09:15:18 +1000
> From: Neil Brown <neilb@suse.de>
> Subject: Re: Problem re-shaping RAID6
> To: Jérôme Poulin <jeromepoulin@gmail.com>
> Cc: linux-raid <linux-raid@vger.kernel.org>
>
>
> > Thanks.
> > I fixed this bug a slightly different way
> >
> > - blocks = ochunk/512 * nchunk/512 * odata * ndata / a;
> > + blocks = (ochunk/512) * (nchunk/512) * odata * ndata / a;
> >
> >
> > See
> > http://neil.brown.name/git?p=mdadm;a=commitdiff;h=200871adf9e15d5ad985f28c349fd89c386ef48a
>
> Those static numbers always make my nose wrinkle.
> Don't we have the blocksize somewhere already? I'm also concerned what
> happens when true 4k sectors are used..
>
A sector will always be 512 bytes to Linux, even when we have drives that can
only do IO in multiples of 8 sectors. Changing that would cause way too many
headaches.
Yes, I could possibly use a define for '512'. Sometimes that is
appropriate, but I think 512 is so clearly "bytes_per_sector" it would just
add an unnecessary level of indirection.
It is a question of taste really - no right answers.
Thanks,
NeilBrown
* Re: Problem re-shaping RAID6
2010-06-17 5:47 ` Neil Brown
@ 2010-06-17 8:20 ` Michael Evans
2010-06-18 10:53 ` Nagilum
1 sibling, 0 replies; 7+ messages in thread
From: Michael Evans @ 2010-06-17 8:20 UTC
To: Neil Brown; +Cc: Nagilum, linux-raid
On Wed, Jun 16, 2010 at 10:47 PM, Neil Brown <neilb@suse.de> wrote:
> On Mon, 14 Jun 2010 12:47:23 +0200
> Nagilum <nagilum@nagilum.org> wrote:
>
>>
>> ----- Message from neilb@suse.de ---------
>> Date: Mon, 14 Jun 2010 09:15:18 +1000
>> From: Neil Brown <neilb@suse.de>
>> Subject: Re: Problem re-shaping RAID6
>> To: Jérôme Poulin <jeromepoulin@gmail.com>
>> Cc: linux-raid <linux-raid@vger.kernel.org>
>>
>>
>> > Thanks.
>> > I fixed this bug a slightly different way
>> >
>> > - blocks = ochunk/512 * nchunk/512 * odata * ndata / a;
>> > + blocks = (ochunk/512) * (nchunk/512) * odata * ndata / a;
>> >
>> >
>> > See
>> > http://neil.brown.name/git?p=mdadm;a=commitdiff;h=200871adf9e15d5ad985f28c349fd89c386ef48a
>>
>> Those static numbers always make my nose wrinkle.
>> Don't we have the blocksize somewhere already? I'm also concerned what
>> happens when true 4k sectors are used..
>>
>
> A sector will always be 512 bytes to Linux, even when we have drives that can
> only do IO in multiples of 8 sectors. Changing that would cause way too many
> headaches.
>
> Yes, I could possibly use a define for '512'. Sometimes that is
> appropriate, but I think 512 is so clearly "bytes_per_sector" it would just
> add an unnecessary level of indirection.
> It is a question of taste really - no right answers.
>
> Thanks,
> NeilBrown
>
>
Slightly off topic, but wouldn't an 'always' like that be a valid
reason for starting a 2.7 or 3.0 branch, i.e. something where major
sections of code would /have/ to be changed even if interfaces for
other sections might remain static?
* Re: Problem re-shaping RAID6
2010-06-17 5:47 ` Neil Brown
2010-06-17 8:20 ` Michael Evans
@ 2010-06-18 10:53 ` Nagilum
1 sibling, 0 replies; 7+ messages in thread
From: Nagilum @ 2010-06-18 10:53 UTC
To: Neil Brown; +Cc: linux-raid
----- Message from neilb@suse.de ---------
> A sector will always be 512 bytes to Linux, even when we have drives that can
> only do IO in multiples of 8 sectors.
Oh, I didn't know that.
> Yes, I could possibly use a define for '512'. Sometimes that is
> appropriate, but I think 512 is so clearly "bytes_per_sector" it would just
> add an unnecessary level of indirection.
> It is a question of taste really - no right answers.
Yep, thanks for the explanation!
========================================================================
# _ __ _ __ http://www.nagilum.org/ \n icq://69646724 #
# / |/ /__ ____ _(_) /_ ____ _ nagilum@nagilum.org \n +491776461165 #
# / / _ `/ _ `/ / / // / ' \ Amiga (68k/PPC): AOS/NetBSD/Linux #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/ Mac (PPC): MacOS-X / NetBSD /Linux #
# /___/ x86: FreeBSD/Linux/Solaris/Win2k ARM9: EPOC EV6 #
========================================================================
----------------------------------------------------------------
cakebox.homeunix.net - all the machine one needs..
* Re: Problem re-shaping RAID6
[not found] ` <4C1BBE6E.5090802@tmr.com>
@ 2010-06-29 1:11 ` Neil Brown
0 siblings, 0 replies; 7+ messages in thread
From: Neil Brown @ 2010-06-29 1:11 UTC
To: Bill Davidsen; +Cc: Jérôme Poulin, linux-raid
On Fri, 18 Jun 2010 14:43:58 -0400
Bill Davidsen <davidsen@tmr.com> wrote:
> Neil Brown wrote:
> > On Sun, 13 Jun 2010 15:15:08 -0400
> > Jérôme Poulin <jeromepoulin@gmail.com> wrote:
> >
> >
> >> I had problems reshaping my RAID6 down 1 disk today and found a
> >> problem in Grow.c:
> >>
> >> diff -udpr mdadm-3.1.2/Grow.c mdadm-3.1.2-critical-section/Grow.c
> >> --- mdadm-3.1.2/Grow.c 2010-03-09 23:31:39.000000000 -0500
> >> +++ mdadm-3.1.2-critical-section/Grow.c 2010-06-13 14:57:44.000000000 -0400
> >> @@ -497,7 +497,7 @@ int Grow_reshape(char *devname, int fd,
> >> int rv = 0;
> >> struct supertype *st;
> >>
> >> - int nchunk, ochunk;
> >> + unsigned long nchunk, ochunk;
> >> int nlayout, olayout;
> >> int ndisks, odisks;
> >> int ndata, odata;
> >>
> >>
> >> After changing this I was able to re-shape the array; it seems the value was
> >> overflowing. Before the fix, I got this message:
> >> root@billsshack:~/mdadm-3.1.2/ > ./mdadm --grow /dev/md0
> >> --raid-devices=6 --backup-file=/mnt/data1/md-raid6-grow-backup.bak
> >> mdadm: Need to backup 4503599627350016K of critical section..
> >> mdadm: /dev/md0: Something wrong - reshape aborted
> >>
> >> Now it works:
> >> root@billsshack:~/mdadm-3.1.2/ > ./mdadm --grow /dev/md0
> >> --raid-devices=6 --backup-file=/mnt/data1/md-raid6-grow-backup.bak
> >> mdadm: Need to backup 20480K of critical section..
> >> root@billsshack:~/mdadm-3.1.2/ > cat /proc/mdstat
> >> Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> >> [faulty] [linear]
> >> md0 : active raid6 dm-6[0] dm-8[5] dm-5[4] dm-3[3] dm-4[2] dm-7[1]
> >> 7814041600 blocks super 0.91 level 6, 1024k chunk, algorithm 2
> >> [6/5] [UUUUUU]
> >> [>....................] reshape = 0.0% (33792/1953510400)
> >> finish=1926.0min speed=16896K/sec
> >>
> >>
> >> Here's a nice output of GDB:
> >> Breakpoint 1, Grow_reshape (devname=0x7fffffffe390 "/dev/md0",
> >> fd=5, quiet=0,
> >> backup_file=0x7fffffffe3b8
> >> "/mnt/data1/md-raid6-grow-backup.bak", size=1953510400, level=65534,
> >> layout_str=0x0, chunksize=0, raid_disks=6) at Grow.c:939
> >> 939 blocks = ochunk/512 * nchunk/512 * odata
> >> * ndata / a;
> >> (gdb) p ochunk/512
> >> $9 = 2048
> >> (gdb) p ochunk/512 * nchunk/512
> >> $10 = -4194304
> >>
> >
> >
> > Thanks.
> > I fixed this bug a slightly different way
> >
> > - blocks = ochunk/512 * nchunk/512 * odata * ndata / a;
> > + blocks = (ochunk/512) * (nchunk/512) * odata * ndata / a;
> >
> >
> >
> I don't have a current C standard handy, but I was on the original X3J11
> committee, and I'm fairly sure that even with that grouping, which may
> work fine for gcc, the standard allows the compiler to ignore what you
> did and do the optimization as long as the result "is the same as" doing
what you said, in math terms rather than in the real world. And since other
compilers (such as Intel C) are used to compile the kernel and may be in
the future, and since disks are getting larger and faster, I do prefer
the future-proofing of the original solution.
Seriously? It sounds pretty broken to allow any transformation on fixed-size
integers that would be valid on real numbers... (You did say "real world"...)
Even anything valid on integers is quite possibly not valid on fixed-size
numbers.
But if I were to change it I don't think I would make the variables bigger.
I would do something like
ochunk_sectors = ochunk / 512;
nchunk_sectors = nchunk / 512;
blocks = ochunk_sectors * nchunk_sectors * odata * ndata / a;
but that just makes it clearer to me that if the guarantees there are
different to the guarantees for what I originally proposed, then there is
something seriously wrong with the language.
Do you have any reference to any standard that I could look at and try to
understand?
Thanks.
NeilBrown
>
> > See
> > http://neil.brown.name/git?p=mdadm;a=commitdiff;h=200871adf9e15d5ad985f28c349fd89c386ef48a
> >
> > I guess it is time to release a 3.1.3...
> >
> > Thanks,
> > NeilBrown
> >
>
>
Thread overview: 7+ messages
2010-06-13 19:15 Problem re-shaping RAID6 Jérôme Poulin
2010-06-13 23:15 ` Neil Brown
2010-06-14 10:47 ` Nagilum
2010-06-17 5:47 ` Neil Brown
2010-06-17 8:20 ` Michael Evans
2010-06-18 10:53 ` Nagilum
[not found] ` <4C1BBE6E.5090802@tmr.com>
2010-06-29 1:11 ` Neil Brown