* xlog_write: reservation ran out. Need to up reservation
From: Thomas Klaube @ 2014-08-19 15:34 UTC
  To: xfs

Hi all,

I am currently testing/benchmarking XFS on top of bcache. When I run a heavy
IO workload (fio with 64 threads, read/write) on the device for ~30-45 minutes, I get:

[ 9092.978268] XFS (bcache1): xlog_write: reservation summary:
[ 9092.978268]   trans type  = (null) (42)
[ 9092.978268]   unit res    = 18730384 bytes
[ 9092.978268]   current res = -1640 bytes
[ 9092.978268]   total reg   = 512 bytes (o/flow = 1163749592 bytes)
[ 9092.978268]   ophdrs      = 655304 (ophdr space = 7863648 bytes)
[ 9092.978268]   ophdr + reg = 1171613752 bytes
[ 9092.978268]   num regions = 2
[ 9092.978268] 
[ 9092.978272] XFS (bcache1): region[0]: LR header - 512 bytes
[ 9092.978273] XFS (bcache1): region[1]: commit - 0 bytes
[ 9092.978274] XFS (bcache1): xlog_write: reservation ran out. Need to up reservation
[ 9092.978303] XFS (bcache1): xfs_do_force_shutdown(0x2) called from line 2036 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa04433c8
[ 9092.979189] XFS (bcache1): Log I/O Error Detected.  Shutting down filesystem
[ 9092.979210] XFS (bcache1): Please umount the filesystem and rectify the problem(s)
[ 9092.979238] XFS (bcache1): xfs_do_force_shutdown(0x2) called from line 1497 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa0443b57
[ 9093.183869] XFS (bcache1): xfs_log_force: error 5 returned.
[ 9093.489944] XFS (bcache1): xfs_log_force: error 5 returned.

The kernel is 3.16.1, but this also happens with Ubuntu 3.13.0.34.
With bcache, fio puts ~30k IOPS on the filesystem.

xfs_info:
meta-data=/dev/bcache1           isize=256    agcount=8, agsize=268435455 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=1949957886, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

umount/mount recovers the filesystem and it seems OK afterwards.

I can reproduce this behavior. Is there anything I could try to debug
this?

Regards
Thomas


* Re: xlog_write: reservation ran out. Need to up reservation
From: Dave Chinner @ 2014-08-19 22:55 UTC
  To: Thomas Klaube; +Cc: xfs

On Tue, Aug 19, 2014 at 05:34:30PM +0200, Thomas Klaube wrote:
> Hi all,
> 
> I am currently testing/benchmarking XFS on top of bcache. When I run a heavy
> IO workload (fio with 64 threads, read/write) on the device for ~30-45 minutes, I get:

Can you post the fio job configuration?

> [ 9092.978268] XFS (bcache1): xlog_write: reservation summary:
> [ 9092.978268]   trans type  = (null) (42)
> [ 9092.978268]   unit res    = 18730384 bytes
> [ 9092.978268]   current res = -1640 bytes
> [ 9092.978268]   total reg   = 512 bytes (o/flow = 1163749592 bytes)
> [ 9092.978268]   ophdrs      = 655304 (ophdr space = 7863648 bytes)
> [ 9092.978268]   ophdr + reg = 1171613752 bytes
> [ 9092.978268]   num regions = 2

Oh, my:

> [ 9092.978268]   ophdr + reg = 1171613752 bytes

That's 1,171,613,752 bytes, or 1.1GB of journal data in that
checkpoint. It's more than half the size of the journal, so it has
violated a fundamental constraint (i.e. no checkpoint should be
larger than half the log).

We should be committing the checkpoint once the queued metadata is
beyond 12.5% of log space, or about 250MB in this case. The question
is how did that get delayed for so long that we overran the push
threshold by a factor of 3.5?
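
For reference, a quick sanity check of those numbers against the
xfs_info from the original report (log blocks=521728, bsize=4096),
taking the 12.5% push threshold above at face value:

$ echo $((521728 * 4096))        # journal size, ~2.1GB
2136997888
$ echo $((521728 * 4096 / 8))    # 12.5% push threshold, ~267MB
267124736
$ echo $((521728 * 4096 / 2))    # half-the-log limit, ~1.07GB
1068498944

The reported 1,171,613,752 bytes is indeed well past the half-log
limit.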

Hmmmm - I wonder if bcache is causing some kind of kworker or
workqueue starvation? I really need to see that fio job config and
find out a whole lot more about the hardware and storage config you
are running:

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
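
Roughly, that page asks for things along these lines (a sketch of the
usual commands, not the exact list from the page; /mnt/test is a
placeholder mount point):

$ uname -a                            # kernel version
$ xfs_repair -V                       # xfsprogs version
$ grep -c processor /proc/cpuinfo     # number of CPUs
$ cat /proc/meminfo                   # RAM
$ cat /proc/mounts                    # mount options in use
$ cat /proc/partitions                # storage layout
$ xfs_info /mnt/test                  # filesystem geometry
$ dmesg                               # full kernel log around the failure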

> [ 9092.978268] 
> [ 9092.978272] XFS (bcache1): region[0]: LR header - 512 bytes
> [ 9092.978273] XFS (bcache1): region[1]: commit - 0 bytes
> [ 9092.978274] XFS (bcache1): xlog_write: reservation ran out. Need to up reservation
> [ 9092.978303] XFS (bcache1): xfs_do_force_shutdown(0x2) called from line 2036 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa04433c8
> [ 9092.979189] XFS (bcache1): Log I/O Error Detected.  Shutting down filesystem
> [ 9092.979210] XFS (bcache1): Please umount the filesystem and rectify the problem(s)
> [ 9092.979238] XFS (bcache1): xfs_do_force_shutdown(0x2) called from line 1497 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa0443b57
> [ 9093.183869] XFS (bcache1): xfs_log_force: error 5 returned.
> [ 9093.489944] XFS (bcache1): xfs_log_force: error 5 returned.
> 
> The kernel is 3.16.1, but this also happens with Ubuntu 3.13.0.34.
> With bcache, fio puts ~30k IOPS on the filesystem.

Which is not very much. I do that sort of thing all the time.

> xfs_info:
> meta-data=/dev/bcache1           isize=256    agcount=8, agsize=268435455 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=1949957886, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=521728, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> umount/mount recovers the filesystem and it seems OK afterwards.
> 
> I can reproduce this behavior. Is there anything I could try to debug
> this?

Run the workload directly on the SSD rather than with bcache. Use
mkfs parameters to give you 8 ags and the same size log, and see
if you get the same problem.
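
Something like this should do it (a sketch; /dev/sdXN stands for the
raw SSD partition and /mnt/test for wherever you mount it, and the
log size matches the 521728-block internal log from the xfs_info
above):

mkfs.xfs -f -d agcount=8 -l size=521728b /dev/sdXN
mount /dev/sdXN /mnt/test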

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: xlog_write: reservation ran out. Need to up reservation
From: Thomas Klaube @ 2014-08-21  7:24 UTC
  To: xfs

----- Original Message -----
> From: "Dave Chinner" <david@fromorbit.com>
> To: "Thomas Klaube" <thomas@klaube.net>
> CC: xfs@oss.sgi.com
> Sent: Wednesday, 20 August 2014 00:55:07
> Subject: Re: xlog_write: reservation ran out. Need to up reservation

Hi,

> Can you post the fio job configuration?

First I run this job for 600 seconds:
wtk@ubuntu ~ $ cat write.fio
[rnd]
rw=randwrite
ramp_time=30
runtime=600
time_based
gtod_reduce=1
size=100g
refill_buffers=1
directory=.
iodepth=64
direct=1
blocksize=16k
numjobs=64
nrfiles=1
group_reporting
ioengine=libaio
loops=1

Then I run this job for 2 hours:
wtk@ubuntu ~ $ cat random.fio
[rnd]
rw=randrw
ramp_time=30
runtime=7200
time_based
rwmixread=30
size=100g
refill_buffers=1
directory=.
iodepth=64
direct=1
blocksize=4k
numjobs=64
group_reporting
ioengine=libaio
loops=1

I run this workload on two devices in parallel. One is the bcache device (with XFS); the other is
a non-cached device. The random.fio job causes the problem on the bcache device after ~30-75 minutes.
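
For completeness, the two instances are launched roughly like this (a
sketch; /mnt/bcache and /mnt/plain are placeholders for the two mount
points, and each job writes into its current directory because of
directory=.):

$ (cd /mnt/bcache && fio ~/random.fio) &
$ (cd /mnt/plain && fio ~/random.fio) &
$ wait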

> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

I have sent a mail with all collected data to Dave.

> Run the workload directly on the SSD rather than with bcache. Use
> mkfs parameters to give you 8 ags and the same size log, and see
> if you get the same problem.

I created an XFS filesystem directly on the SSD:
mkfs.xfs -f -d agcount=8 -l size=521728b /dev/sdc1

Then I started the fio jobs as described above and let them run for 10 hours. I could not reproduce the
problem. I will send a mail to the bcache mailing list as well...

Thanx and Regards
Thomas
