* help understanding the output of fio
@ 2024-04-03  5:35 Felix Rubio
  2024-04-03 15:45 ` Patrick Goetz
  0 siblings, 1 reply; 13+ messages in thread
From: Felix Rubio @ 2024-04-03  5:35 UTC (permalink / raw
  To: fio

Hi everybody,

I have recently started using fio, and I am getting the following
output for sequential writes:

write_throughput_i: (groupid=0, jobs=16): err= 0: pid=2301660: Tue Apr  2 21:03:41 2024
   write: IOPS=2, BW=5613KiB/s (5748kB/s)(2048MiB/373607msec); 0 zone resets
     slat (msec): min=41549, max=373605, avg=260175.71, stdev=76630.63
     clat (nsec): min=17445, max=31004, avg=20350.31, stdev=3744.64
      lat (msec): min=235566, max=373605, avg=318209.63, stdev=32743.17
     clat percentiles (nsec):
      |  1.00th=[17536],  5.00th=[17536], 10.00th=[17792], 20.00th=[17792],
      | 30.00th=[18048], 40.00th=[18304], 50.00th=[18304], 60.00th=[18816],
      | 70.00th=[21632], 80.00th=[22144], 90.00th=[27008], 95.00th=[31104],
      | 99.00th=[31104], 99.50th=[31104], 99.90th=[31104], 99.95th=[31104],
      | 99.99th=[31104]
    bw (  MiB/s): min= 2051, max= 2051, per=100.00%, avg=2051.84, stdev= 0.00, samples=16
    iops        : min= 2048, max= 2048, avg=2048.00, stdev= 0.00, samples=16
   lat (usec)   : 20=68.75%, 50=31.25%
   cpu          : usr=0.00%, sys=0.02%, ctx=8350, majf=13, minf=633
   IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
      submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=100.0%, >=64=0.0%
      complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=100.0%, >=64=0.0%
      issued rwts: total=0,1024,0,0 short=0,0,0,0 dropped=0,0,0,0
      latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   WRITE: bw=5613KiB/s (5748kB/s), 5613KiB/s-5613KiB/s (5748kB/s-5748kB/s), io=2048MiB (2147MB), run=373607-373607msec

Do I understand this correctly: is the submission latency (slat) at
minimum 41.5 seconds? I am experiencing problems with my SSD (the
performance is pretty low, which this seems to confirm), but now I am
wondering whether, since slat is the submission latency, this could be a
problem with my OS rather than with my disk.

Thank you
-- 
Felix Rubio
"Don't believe what you're told. Double check."

* Re: help understanding the output of fio
  2024-04-03  5:35 help understanding the output of fio Felix Rubio
@ 2024-04-03 15:45 ` Patrick Goetz
  2024-04-03 19:21   ` Felix Rubio
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick Goetz @ 2024-04-03 15:45 UTC (permalink / raw
  To: Felix Rubio, fio

Hi Felix -

It might be helpful to share your job file or fio command line. Not 
sure, I'm a neophyte as well with many questions such as what happens 
when the number of jobs is less than the iodepth.

On 4/3/24 00:35, Felix Rubio wrote:
> Hi everybody,
> 
> I have started recently to use fio, and I am getting the following 
> output for sequential writes:
> 
> write_throughput_i: (groupid=0, jobs=16): err= 0: pid=2301660: Tue Apr 2 
> 21:03:41 2024
>    write: IOPS=2, BW=5613KiB/s (5748kB/s)(2048MiB/373607msec); 0 zone 
> resets
>      slat (msec): min=41549, max=373605, avg=260175.71, stdev=76630.63
>      clat (nsec): min=17445, max=31004, avg=20350.31, stdev=3744.64
>       lat (msec): min=235566, max=373605, avg=318209.63, stdev=32743.17
>      clat percentiles (nsec):
>       |  1.00th=[17536],  5.00th=[17536], 10.00th=[17792], 20.00th=[17792],
>       | 30.00th=[18048], 40.00th=[18304], 50.00th=[18304], 60.00th=[18816],
>       | 70.00th=[21632], 80.00th=[22144], 90.00th=[27008], 95.00th=[31104],
>       | 99.00th=[31104], 99.50th=[31104], 99.90th=[31104], 99.95th=[31104],
>       | 99.99th=[31104]
>     bw (  MiB/s): min= 2051, max= 2051, per=100.00%, avg=2051.84, stdev= 
> 0.00, samples=16
>     iops        : min= 2048, max= 2048, avg=2048.00, stdev= 0.00, 
> samples=16
>    lat (usec)   : 20=68.75%, 50=31.25%
>    cpu          : usr=0.00%, sys=0.02%, ctx=8350, majf=13, minf=633
>    IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 
>  >=64=100.0%
>       submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=100.0%, 
>  >=64=0.0%
>       complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=100.0%, 
>  >=64=0.0%
>       issued rwts: total=0,1024,0,0 short=0,0,0,0 dropped=0,0,0,0
>       latency   : target=0, window=0, percentile=100.00%, depth=64
> 
> Run status group 0 (all jobs):
>    WRITE: bw=5613KiB/s (5748kB/s), 5613KiB/s-5613KiB/s 
> (5748kB/s-5748kB/s), io=2048MiB (2147MB), run=373607-373607msec
> 
> Should I understand this correctly, the submission latency (slat) is at 
> minimum 41.5 seconds? I am experiencing problems with my SSD disk (the 
> performance is pretty low, which this seems to confirm), but now I am 
> wondering if this could be a problem with my OS and not my disk, being 
> the slat the submission latency?
> 
> Thank you

* Re: help understanding the output of fio
  2024-04-03 15:45 ` Patrick Goetz
@ 2024-04-03 19:21   ` Felix Rubio
  2024-04-04  4:28     ` Damien Le Moal
  0 siblings, 1 reply; 13+ messages in thread
From: Felix Rubio @ 2024-04-03 19:21 UTC (permalink / raw
  To: fio

Hi Patrick,

Thank you for your answer. I am quite lost with the results shown by
fio: I cannot make sense of them. I have a system with 3 ZFS pools: one
has a single SSD, another has 2 striped SSDs, and the third has a single
HDD. On the first pool I am running this command:

fio --numjobs=1 --size=1024M --time_based --runtime=60s --ramp_time=2s 
--ioengine=libaio --direct=1 --verify=0 --group_reporting=1 --bs=1M 
--rw=write

And I am getting this result:

write_throughput_i: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
fio-3.33
Starting 1 process
write_throughput_i: Laying out IO file (1 file / 1024MiB)
Jobs: 1 (f=1): [W(1)][21.1%][eta 03m:56s]
write_throughput_i: (groupid=0, jobs=1): err= 0: pid=235368: Wed Apr  3 21:01:16 2024
   write: IOPS=3, BW=3562KiB/s (3647kB/s)(211MiB/60664msec); 0 zone resets
     slat (msec): min=16, max=1777, avg=287.56, stdev=282.15
     clat (nsec): min=10930, max=31449, avg=12804.00, stdev=3289.32
      lat (msec): min=16, max=1777, avg=288.86, stdev=282.20
     clat percentiles (nsec):
      |  1.00th=[11072],  5.00th=[11328], 10.00th=[11456], 20.00th=[11584],
      | 30.00th=[11840], 40.00th=[11840], 50.00th=[12096], 60.00th=[12224],
      | 70.00th=[12352], 80.00th=[12736], 90.00th=[13248], 95.00th=[13760],
      | 99.00th=[29568], 99.50th=[30848], 99.90th=[31360], 99.95th=[31360],
      | 99.99th=[31360]
    bw (  KiB/s): min= 2048, max=51302, per=100.00%, avg=4347.05, stdev=6195.46, samples=99
    iops        : min=    2, max=   50, avg= 4.24, stdev= 6.04, samples=99
   lat (usec)   : 20=96.19%, 50=3.81%
   cpu          : usr=0.10%, sys=0.32%, ctx=1748, majf=0, minf=37
   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      issued rwts: total=0,210,0,0 short=0,0,0,0 dropped=0,0,0,0
      latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   WRITE: bw=3562KiB/s (3647kB/s), 3562KiB/s-3562KiB/s (3647kB/s-3647kB/s), io=211MiB (221MB), run=60664-60664msec

For a read test, running with the same parameters (changing rw=write to
rw=read), I get:

read_throughput_i: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
fio-3.33
Starting 1 process
read_throughput_i: Laying out IO file (1 file / 1024MiB)
Jobs: 1 (f=1): [R(1)][100.0%][r=1229MiB/s][r=1229 IOPS][eta 00m:00s]
read_throughput_i: (groupid=0, jobs=1): err= 0: pid=245501: Wed Apr  3 21:14:27 2024
   read: IOPS=1283, BW=1284MiB/s (1346MB/s)(75.2GiB/60001msec)
     slat (usec): min=371, max=27065, avg=757.71, stdev=185.43
     clat (usec): min=9, max=358, avg=12.44, stdev= 3.93
      lat (usec): min=383, max=27078, avg=770.15, stdev=185.66
     clat percentiles (usec):
      |  1.00th=[   11],  5.00th=[   12], 10.00th=[   12], 20.00th=[   12],
      | 30.00th=[   12], 40.00th=[   12], 50.00th=[   13], 60.00th=[   13],
      | 70.00th=[   13], 80.00th=[   13], 90.00th=[   13], 95.00th=[   14],
      | 99.00th=[   24], 99.50th=[   29], 99.90th=[   81], 99.95th=[   93],
      | 99.99th=[  105]
    bw (  MiB/s): min= 1148, max= 1444, per=100.00%, avg=1285.12, stdev=86.12, samples=119
    iops        : min= 1148, max= 1444, avg=1284.97, stdev=86.05, samples=119
   lat (usec)   : 10=0.09%, 20=98.90%, 50=0.73%, 100=0.26%, 250=0.02%
   lat (usec)   : 500=0.01%
   cpu          : usr=3.14%, sys=95.75%, ctx=6274, majf=0, minf=37
   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      issued rwts: total=77036,0,0,0 short=0,0,0,0 dropped=0,0,0,0
      latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
    READ: bw=1284MiB/s (1346MB/s), 1284MiB/s-1284MiB/s (1346MB/s-1346MB/s), io=75.2GiB (80.8GB), run=60001-60001msec


If I understand these results properly, I am getting ~4 MBps of write
bandwidth to disk, caused by a submission latency problem, but when
reading sequentially I am getting 1.3 GBps (also very weird; previously
I shared another test with different parameters, multiple jobs, etc.,
but with the same result). I am trying to find out whether this analysis
is correct and whether this is a software or a hardware problem.
---
Felix Rubio
"Don't believe what you're told. Double check."

On 2024-04-03 17:45, Patrick Goetz wrote:
> Hi Felix -
> 
> It might be helpful to share your job file or fio command line. Not
> sure, I'm a neophyte as well with many questions such as what happens
> when the number of jobs is less than the iodepth.
> 
> On 4/3/24 00:35, Felix Rubio wrote:
>> Hi everybody,
>> 
>> I have started recently to use fio, and I am getting the following 
>> output for sequential writes:
>> 
>> write_throughput_i: (groupid=0, jobs=16): err= 0: pid=2301660: Tue Apr 
>> 2 21:03:41 2024
>>    write: IOPS=2, BW=5613KiB/s (5748kB/s)(2048MiB/373607msec); 0 zone 
>> resets
>>      slat (msec): min=41549, max=373605, avg=260175.71, stdev=76630.63
>>      clat (nsec): min=17445, max=31004, avg=20350.31, stdev=3744.64
>>       lat (msec): min=235566, max=373605, avg=318209.63, 
>> stdev=32743.17
>>      clat percentiles (nsec):
>>       |  1.00th=[17536],  5.00th=[17536], 10.00th=[17792], 
>> 20.00th=[17792],
>>       | 30.00th=[18048], 40.00th=[18304], 50.00th=[18304], 
>> 60.00th=[18816],
>>       | 70.00th=[21632], 80.00th=[22144], 90.00th=[27008], 
>> 95.00th=[31104],
>>       | 99.00th=[31104], 99.50th=[31104], 99.90th=[31104], 
>> 99.95th=[31104],
>>       | 99.99th=[31104]
>>     bw (  MiB/s): min= 2051, max= 2051, per=100.00%, avg=2051.84, 
>> stdev= 0.00, samples=16
>>     iops        : min= 2048, max= 2048, avg=2048.00, stdev= 0.00, 
>> samples=16
>>    lat (usec)   : 20=68.75%, 50=31.25%
>>    cpu          : usr=0.00%, sys=0.02%, ctx=8350, majf=13, minf=633
>>    IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,  
>> >=64=100.0%
>>       submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=100.0%, 
>>  >=64=0.0%
>>       complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=100.0%, 
>>  >=64=0.0%
>>       issued rwts: total=0,1024,0,0 short=0,0,0,0 dropped=0,0,0,0
>>       latency   : target=0, window=0, percentile=100.00%, depth=64
>> 
>> Run status group 0 (all jobs):
>>    WRITE: bw=5613KiB/s (5748kB/s), 5613KiB/s-5613KiB/s 
>> (5748kB/s-5748kB/s), io=2048MiB (2147MB), run=373607-373607msec
>> 
>> Should I understand this correctly, the submission latency (slat) is 
>> at minimum 41.5 seconds? I am experiencing problems with my SSD disk 
>> (the performance is pretty low, which this seems to confirm), but now 
>> I am wondering if this could be a problem with my OS and not my disk, 
>> being the slat the submission latency?
>> 
>> Thank you

* Re: help understanding the output of fio
  2024-04-03 19:21   ` Felix Rubio
@ 2024-04-04  4:28     ` Damien Le Moal
  2024-04-04 16:18       ` Patrick Goetz
                         ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Damien Le Moal @ 2024-04-04  4:28 UTC (permalink / raw
  To: Felix Rubio, fio

On 4/4/24 04:21, Felix Rubio wrote:
> Hi Patrick,
> 
> Thank you for your answer. I am quite lost with the results shown by 
> fio: I do not get to understand them. I have a system with 3 ZFS pools: 
> in 1 I have a single SSD, in another I have 2 stripped SSD, and in the 
> third I have a single HDD. On the first pool I am running this command:
> 
> fio --numjobs=1 --size=1024M --time_based --runtime=60s --ramp_time=2s 
> --ioengine=libaio --direct=1 --verify=0 --group_reporting=1 --bs=1M 
> --rw=write

You did not specify --iodepth. If you want to run at QD=1, then use
--ioengine=psync. For QD > 1, use --ioengine=libaio --iodepth=X (X > 1). You can
use --ioengine=io_uring as well.

Also, there is no filename= option here. Are you running this workload on the
ZFS file system, writing to a file? Or are you running this against the SSD block
device file?
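
For example (just a sketch; /tank/fio-testfile is a placeholder path on the
ZFS mount), a QD=8 libaio run against a file could look like:

fio --name=seqwrite_qd8 --filename=/tank/fio-testfile --size=1024M \
    --time_based --runtime=60s --ramp_time=2s --ioengine=libaio --iodepth=8 \
    --verify=0 --group_reporting=1 --bs=1M --rw=write

Pointing the same command at --filename=/dev/<device> would instead test the
raw block device, but any write to it is destructive.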

> 
> And I am getting this result:
> 
> write_throughput_i: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 
> 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
> fio-3.33
> Starting 1 process
> write_throughput_i: Laying out IO file (1 file / 1024MiB)
> Jobs: 1 (f=1): [W(1)][21.1%][eta 03m:56s]
> write_throughput_i: (groupid=0, jobs=1): err= 0: pid=235368: Wed Apr  3 
> 21:01:16 2024
>    write: IOPS=3, BW=3562KiB/s (3647kB/s)(211MiB/60664msec); 0 zone 
> resets

Your drive does 3 I/Os per second... That is very slow... Did you have a look at
dmesg to see if there is anything going on with the driver?

You may want to start with testing 4K writes and increasing the block size from
there (8k, 16k, 32k, ...) and see if there is a value that triggers the slow
behavior.
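
For example, a quick sweep (a sketch; the target file path is a placeholder,
reusing the options from the command above) could be scripted as:

for bs in 4k 8k 16k 32k 64k 128k; do
    fio --name=bs-sweep-$bs --filename=/tank/fio-testfile --size=1G \
        --ioengine=libaio --iodepth=8 --rw=write --bs=$bs \
        --time_based --runtime=30s --group_reporting=1
done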

But this all looks like a HW issue or an FS/driver issue. What type of SSD is this?
NVMe? ATA?

>      slat (msec): min=16, max=1777, avg=287.56, stdev=282.15
>      clat (nsec): min=10930, max=31449, avg=12804.00, stdev=3289.32
>       lat (msec): min=16, max=1777, avg=288.86, stdev=282.20
>      clat percentiles (nsec):
>       |  1.00th=[11072],  5.00th=[11328], 10.00th=[11456], 
> 20.00th=[11584],
>       | 30.00th=[11840], 40.00th=[11840], 50.00th=[12096], 
> 60.00th=[12224],
>       | 70.00th=[12352], 80.00th=[12736], 90.00th=[13248], 
> 95.00th=[13760],
>       | 99.00th=[29568], 99.50th=[30848], 99.90th=[31360], 
> 99.95th=[31360],
>       | 99.99th=[31360]
>     bw (  KiB/s): min= 2048, max=51302, per=100.00%, avg=4347.05, 
> stdev=6195.46, samples=99
>     iops        : min=    2, max=   50, avg= 4.24, stdev= 6.04, 
> samples=99
>    lat (usec)   : 20=96.19%, 50=3.81%
>    cpu          : usr=0.10%, sys=0.32%, ctx=1748, majf=0, minf=37
>    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 
>  >=64=0.0%
>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>  >=64=0.0%
>       complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>  >=64=0.0%
>       issued rwts: total=0,210,0,0 short=0,0,0,0 dropped=0,0,0,0
>       latency   : target=0, window=0, percentile=100.00%, depth=1
> 
> Run status group 0 (all jobs):
>    WRITE: bw=3562KiB/s (3647kB/s), 3562KiB/s-3562KiB/s 
> (3647kB/s-3647kB/s), io=211MiB (221MB), run=60664-60664msec
> 
> For a read test, running the same parameters (changing rw=write by 
> rw=read), I get:
> 
> read_throughput_i: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 
> 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
> fio-3.33
> Starting 1 process
> read_throughput_i: Laying out IO file (1 file / 1024MiB)
> Jobs: 1 (f=1): [R(1)][100.0%][r=1229MiB/s][r=1229 IOPS][eta 00m:00s]
> read_throughput_i: (groupid=0, jobs=1): err= 0: pid=245501: Wed Apr  3 
> 21:14:27 2024
>    read: IOPS=1283, BW=1284MiB/s (1346MB/s)(75.2GiB/60001msec)
>      slat (usec): min=371, max=27065, avg=757.71, stdev=185.43
>      clat (usec): min=9, max=358, avg=12.44, stdev= 3.93
>       lat (usec): min=383, max=27078, avg=770.15, stdev=185.66
>      clat percentiles (usec):
>       |  1.00th=[   11],  5.00th=[   12], 10.00th=[   12], 20.00th=[   
> 12],
>       | 30.00th=[   12], 40.00th=[   12], 50.00th=[   13], 60.00th=[   
> 13],
>       | 70.00th=[   13], 80.00th=[   13], 90.00th=[   13], 95.00th=[   
> 14],
>       | 99.00th=[   24], 99.50th=[   29], 99.90th=[   81], 99.95th=[   
> 93],
>       | 99.99th=[  105]
>     bw (  MiB/s): min= 1148, max= 1444, per=100.00%, avg=1285.12, 
> stdev=86.12, samples=119
>     iops        : min= 1148, max= 1444, avg=1284.97, stdev=86.05, 
> samples=119
>    lat (usec)   : 10=0.09%, 20=98.90%, 50=0.73%, 100=0.26%, 250=0.02%
>    lat (usec)   : 500=0.01%
>    cpu          : usr=3.14%, sys=95.75%, ctx=6274, majf=0, minf=37
>    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 
>  >=64=0.0%
>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>  >=64=0.0%
>       complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>  >=64=0.0%
>       issued rwts: total=77036,0,0,0 short=0,0,0,0 dropped=0,0,0,0
>       latency   : target=0, window=0, percentile=100.00%, depth=1
> 
> Run status group 0 (all jobs):
>     READ: bw=1284MiB/s (1346MB/s), 1284MiB/s-1284MiB/s 
> (1346MB/s-1346MB/s), io=75.2GiB (80.8GB), run=60001-60001msec
> 
> 
> If I understand properly these results, I am getting ~4 MBps of write 
> bandwidth to disk,caused by a submission latency problem , but when 
> reading sequentially I am getting 1.3 GBps (also very weird) (previously 
> I shared another test with different parameters, multiple jobs, etc., 
> but with the same result). I am trying to find out if this analysis is 
> correct and whether is a software or a hardware error.
> ---
> Felix Rubio
> "Don't believe what you're told. Double check."
> 
> On 2024-04-03 17:45, Patrick Goetz wrote:
>> Hi Felix -
>>
>> It might be helpful to share your job file or fio command line. Not
>> sure, I'm a neophyte as well with many questions such as what happens
>> when the number of jobs is less than the iodepth.
>>
>> On 4/3/24 00:35, Felix Rubio wrote:
>>> Hi everybody,
>>>
>>> I have started recently to use fio, and I am getting the following 
>>> output for sequential writes:
>>>
>>> write_throughput_i: (groupid=0, jobs=16): err= 0: pid=2301660: Tue Apr 
>>> 2 21:03:41 2024
>>>    write: IOPS=2, BW=5613KiB/s (5748kB/s)(2048MiB/373607msec); 0 zone 
>>> resets
>>>      slat (msec): min=41549, max=373605, avg=260175.71, stdev=76630.63
>>>      clat (nsec): min=17445, max=31004, avg=20350.31, stdev=3744.64
>>>       lat (msec): min=235566, max=373605, avg=318209.63, 
>>> stdev=32743.17
>>>      clat percentiles (nsec):
>>>       |  1.00th=[17536],  5.00th=[17536], 10.00th=[17792], 
>>> 20.00th=[17792],
>>>       | 30.00th=[18048], 40.00th=[18304], 50.00th=[18304], 
>>> 60.00th=[18816],
>>>       | 70.00th=[21632], 80.00th=[22144], 90.00th=[27008], 
>>> 95.00th=[31104],
>>>       | 99.00th=[31104], 99.50th=[31104], 99.90th=[31104], 
>>> 99.95th=[31104],
>>>       | 99.99th=[31104]
>>>     bw (  MiB/s): min= 2051, max= 2051, per=100.00%, avg=2051.84, 
>>> stdev= 0.00, samples=16
>>>     iops        : min= 2048, max= 2048, avg=2048.00, stdev= 0.00, 
>>> samples=16
>>>    lat (usec)   : 20=68.75%, 50=31.25%
>>>    cpu          : usr=0.00%, sys=0.02%, ctx=8350, majf=13, minf=633
>>>    IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,  
>>>> =64=100.0%
>>>       submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=100.0%, 
>>>  >=64=0.0%
>>>       complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=100.0%, 
>>>  >=64=0.0%
>>>       issued rwts: total=0,1024,0,0 short=0,0,0,0 dropped=0,0,0,0
>>>       latency   : target=0, window=0, percentile=100.00%, depth=64
>>>
>>> Run status group 0 (all jobs):
>>>    WRITE: bw=5613KiB/s (5748kB/s), 5613KiB/s-5613KiB/s 
>>> (5748kB/s-5748kB/s), io=2048MiB (2147MB), run=373607-373607msec
>>>
>>> Should I understand this correctly, the submission latency (slat) is 
>>> at minimum 41.5 seconds? I am experiencing problems with my SSD disk 
>>> (the performance is pretty low, which this seems to confirm), but now 
>>> I am wondering if this could be a problem with my OS and not my disk, 
>>> being the slat the submission latency?
>>>
>>> Thank you
> 

-- 
Damien Le Moal
Western Digital Research


* Re: help understanding the output of fio
  2024-04-04  4:28     ` Damien Le Moal
@ 2024-04-04 16:18       ` Patrick Goetz
  2024-04-04 23:37         ` Damien Le Moal
       [not found]       ` <4d12ea060363d934d84201f834b21a38@kngnt.org>
  2024-04-04 18:43       ` Jeff Johnson
  2 siblings, 1 reply; 13+ messages in thread
From: Patrick Goetz @ 2024-04-04 16:18 UTC (permalink / raw
  To: Damien Le Moal, Felix Rubio, fio

On 4/3/24 23:28, Damien Le Moal wrote:
> 
> Also, there is no filename= option here. Are you running this workload on the
> ZFS file system writing to a file ? Or ar you running this against the SSD block
> device file ?
> 

If he ran this fio test against the SSD block device, wouldn't that blow 
up his zpool?

I read through the entirety of 
https://fio.readthedocs.io/en/latest/fio_doc.html and still feel 
completely clueless about many details. It kind of feels like you need a 
good grasp of how all this is implemented in the kernel to make sense of 
it. Does anyone have any tips for what to read in order to be able to 
master the fio documentation?




* Re: help understanding the output of fio
       [not found]         ` <e9efa7bb-f2b3-47f0-aa03-5191f85562e8@kernel.org>
@ 2024-04-04 18:29           ` Felix Rubio
  0 siblings, 0 replies; 13+ messages in thread
From: Felix Rubio @ 2024-04-04 18:29 UTC (permalink / raw
  To: fio

Hi Damien,

First of all, thank you for your time responding to this email. I do not
know what info I was supposed to provide, so I am adding it as it is
requested. In this case, I am using kernel 6.1.0 from Debian stable. The
ZFS version that I am using is 2.1.11, and unfortunately I cannot test
against the raw drive because I would be destroying my pools (as Patrick
also points out).

I have a feeling that the drives are not the ones to blame because a)
they are reported
(https://ssd.userbenchmark.com/SpeedTest/1471/SanDisk-SDSSDP064G) as
performing at 190+ MBps, way better than I am obtaining, and b) I bought
2 of these drives together and the third separately one year later (so
the chances of getting a bad batch are pretty low). If I run the tests
on the pool that has the two drives, in stripe mode, I get double the
performance... so the SSDs behave consistently in that respect.

Before coming here I passed by the openzfs Libera chat, where some
colleagues tried to help, pointing out that it could be related to the
SATA controller. With respect to that, I have observed that the other
drive (HDD ST3500412AS) connected to that same SATA controller offers
the following results:

BS     bandwidth (MBps)   IOPS
4K          93           23700
8K         101           12300
16k         96            5871
32k         97            2970
64k        103            1568
128k       100             788
256k        99             392

which are consistent with the results observed in 
https://hdd.userbenchmark.com/SpeedTest/5332/ST3500412AS.

So... everything points to either an evil combination of the SATA
controller and the SSD, or a misconfiguration somewhere in my Linux that
I cannot pinpoint. The reason I landed here is... to try to gain some
knowledge, because professionally I am quite far from storage system
benchmarking, so... I can get numbers, but interpreting them is
definitely another discussion :-/

Thank you for any further insights, if any!

Regards,

---
Felix Rubio
"Don't believe what you're told. Double check."

* Re: help understanding the output of fio
  2024-04-04  4:28     ` Damien Le Moal
  2024-04-04 16:18       ` Patrick Goetz
       [not found]       ` <4d12ea060363d934d84201f834b21a38@kngnt.org>
@ 2024-04-04 18:43       ` Jeff Johnson
  2024-04-04 18:59         ` Felix Rubio
  2 siblings, 1 reply; 13+ messages in thread
From: Jeff Johnson @ 2024-04-04 18:43 UTC (permalink / raw
  To: Damien Le Moal; +Cc: Felix Rubio, fio

It looks like you are running fio against a local directory within
your zpool. Your performance is really bad, but this could be impacted
by zpool parameters as well. It is not a pure test of your SSD, which
is what you should do to rule out hardware or block device driver
issues. You can run a read-only test and your zpool will not be
affected. Please take care to ensure all tests against your drive are
read-only. Also, one small point: ZFS does not support direct_io
*yet*. Running a direct_io job against a ZFS directory will have
terrible and unpredictable results. You can run direct_io against raw
block devices, XFS, and ext4 file systems.

Examples:
[sequential read-only]
fio --name=seqread --numjobs=1 --time_based --runtime=60s
--ramp_time=2s --iodepth=8 --ioengine=libaio --direct=1 --verify=0
--group_reporting=1 --bs=1M --rw=read --filename=/dev/{$my_ssd}

[random read-only]
fio --name=randomread --numjobs=1 --time_based --runtime=60s
--ramp_time=2s --iodepth=8 --ioengine=libaio --direct=1 --verify=0
--group_reporting=1 --bs=4k --rw=randread --filename=/dev/{$my_ssd}

If you do not see sequential read performance or random IOPS within
70% of the manufacturer's advertised specs for your drive, you should
start reviewing dmesg/messages and check the drive link states (PCIe
width and speed for NVMe, SATA link speed for SATA).
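
For example (a sketch; device names are placeholders), the negotiated
SATA link speed usually shows up in the kernel log and in the smartctl
info output:

dmesg | grep -i 'sata link up'          # e.g. "SATA link up 6.0 Gbps"
smartctl -i /dev/sda | grep -i 'sata version'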

--Jeff


On Wed, Apr 3, 2024 at 9:28 PM Damien Le Moal <dlemoal@kernel.org> wrote:
>
> On 4/4/24 04:21, Felix Rubio wrote:
> > Hi Patrick,
> >
> > Thank you for your answer. I am quite lost with the results shown by
> > fio: I do not get to understand them. I have a system with 3 ZFS pools:
> > in 1 I have a single SSD, in another I have 2 stripped SSD, and in the
> > third I have a single HDD. On the first pool I am running this command:
> >
> > fio --numjobs=1 --size=1024M --time_based --runtime=60s --ramp_time=2s
> > --ioengine=libaio --direct=1 --verify=0 --group_reporting=1 --bs=1M
> > --rw=write
>
> You did not specify --iodepth. If you want to run at QD=1, the use
> --ioengine=psync. For QD > 1, use --ioengine=libaio --iodepth=X (X > 1). You can
> use --ioengine=io_uring as well.
>
> Also, there is no filename= option here. Are you running this workload on the
> ZFS file system writing to a file ? Or ar you running this against the SSD block
> device file ?
>
> >
> > And I am getting this result:
> >
> > write_throughput_i: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W)
> > 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
> > fio-3.33
> > Starting 1 process
> > write_throughput_i: Laying out IO file (1 file / 1024MiB)
> > Jobs: 1 (f=1): [W(1)][21.1%][eta 03m:56s]
> > write_throughput_i: (groupid=0, jobs=1): err= 0: pid=235368: Wed Apr  3
> > 21:01:16 2024
> >    write: IOPS=3, BW=3562KiB/s (3647kB/s)(211MiB/60664msec); 0 zone
> > resets
>
> Your drive does 3 I/Os per second... That is very slow... Did you have a look at
> dmesg to see if there is anything going on with the driver ?
>
> You may want to start with testing 4K writes and increasing the block size from
> there (8k, 16k, 32k, ...) and see if there is a value that triggers the slow
> behavior.
>
> But this all look like a HW issue or FS/driver issue. What type of SSD is this ?
> NVMe ? ATA ?
>
> >      slat (msec): min=16, max=1777, avg=287.56, stdev=282.15
> >      clat (nsec): min=10930, max=31449, avg=12804.00, stdev=3289.32
> >       lat (msec): min=16, max=1777, avg=288.86, stdev=282.20
> >      clat percentiles (nsec):
> >       |  1.00th=[11072],  5.00th=[11328], 10.00th=[11456],
> > 20.00th=[11584],
> >       | 30.00th=[11840], 40.00th=[11840], 50.00th=[12096],
> > 60.00th=[12224],
> >       | 70.00th=[12352], 80.00th=[12736], 90.00th=[13248],
> > 95.00th=[13760],
> >       | 99.00th=[29568], 99.50th=[30848], 99.90th=[31360],
> > 99.95th=[31360],
> >       | 99.99th=[31360]
> >     bw (  KiB/s): min= 2048, max=51302, per=100.00%, avg=4347.05,
> > stdev=6195.46, samples=99
> >     iops        : min=    2, max=   50, avg= 4.24, stdev= 6.04,
> > samples=99
> >    lat (usec)   : 20=96.19%, 50=3.81%
> >    cpu          : usr=0.10%, sys=0.32%, ctx=1748, majf=0, minf=37
> >    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> >  >=64=0.0%
> >       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >  >=64=0.0%
> >       complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >  >=64=0.0%
> >       issued rwts: total=0,210,0,0 short=0,0,0,0 dropped=0,0,0,0
> >       latency   : target=0, window=0, percentile=100.00%, depth=1
> >
> > Run status group 0 (all jobs):
> >    WRITE: bw=3562KiB/s (3647kB/s), 3562KiB/s-3562KiB/s
> > (3647kB/s-3647kB/s), io=211MiB (221MB), run=60664-60664msec
> >
> > For a read test, running the same parameters (changing rw=write by
> > rw=read), I get:
> >
> > read_throughput_i: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W)
> > 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
> > fio-3.33
> > Starting 1 process
> > read_throughput_i: Laying out IO file (1 file / 1024MiB)
> > Jobs: 1 (f=1): [R(1)][100.0%][r=1229MiB/s][r=1229 IOPS][eta 00m:00s]
> > read_throughput_i: (groupid=0, jobs=1): err= 0: pid=245501: Wed Apr  3
> > 21:14:27 2024
> >    read: IOPS=1283, BW=1284MiB/s (1346MB/s)(75.2GiB/60001msec)
> >      slat (usec): min=371, max=27065, avg=757.71, stdev=185.43
> >      clat (usec): min=9, max=358, avg=12.44, stdev= 3.93
> >       lat (usec): min=383, max=27078, avg=770.15, stdev=185.66
> >      clat percentiles (usec):
> >       |  1.00th=[   11],  5.00th=[   12], 10.00th=[   12], 20.00th=[
> > 12],
> >       | 30.00th=[   12], 40.00th=[   12], 50.00th=[   13], 60.00th=[
> > 13],
> >       | 70.00th=[   13], 80.00th=[   13], 90.00th=[   13], 95.00th=[
> > 14],
> >       | 99.00th=[   24], 99.50th=[   29], 99.90th=[   81], 99.95th=[
> > 93],
> >       | 99.99th=[  105]
> >     bw (  MiB/s): min= 1148, max= 1444, per=100.00%, avg=1285.12,
> > stdev=86.12, samples=119
> >     iops        : min= 1148, max= 1444, avg=1284.97, stdev=86.05,
> > samples=119
> >    lat (usec)   : 10=0.09%, 20=98.90%, 50=0.73%, 100=0.26%, 250=0.02%
> >    lat (usec)   : 500=0.01%
> >    cpu          : usr=3.14%, sys=95.75%, ctx=6274, majf=0, minf=37
> >    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> >  >=64=0.0%
> >       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >  >=64=0.0%
> >       complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >  >=64=0.0%
> >       issued rwts: total=77036,0,0,0 short=0,0,0,0 dropped=0,0,0,0
> >       latency   : target=0, window=0, percentile=100.00%, depth=1
> >
> > Run status group 0 (all jobs):
> >     READ: bw=1284MiB/s (1346MB/s), 1284MiB/s-1284MiB/s
> > (1346MB/s-1346MB/s), io=75.2GiB (80.8GB), run=60001-60001msec
> >
> >
> > If I understand properly these results, I am getting ~4 MBps of write
> > bandwidth to disk,caused by a submission latency problem , but when
> > reading sequentially I am getting 1.3 GBps (also very weird) (previously
> > I shared another test with different parameters, multiple jobs, etc.,
> > but with the same result). I am trying to find out if this analysis is
> > correct and whether is a software or a hardware error.
> > ---
> > Felix Rubio
> > "Don't believe what you're told. Double check."
> >
> > On 2024-04-03 17:45, Patrick Goetz wrote:
> >> Hi Felix -
> >>
> >> It might be helpful to share your job file or fio command line. Not
> >> sure, I'm a neophyte as well with many questions such as what happens
> >> when the number of jobs is less than the iodepth.
> >>
> >> On 4/3/24 00:35, Felix Rubio wrote:
> >>> Hi everybody,
> >>>
> >>> I have started recently to use fio, and I am getting the following
> >>> output for sequential writes:
> >>>
> >>> write_throughput_i: (groupid=0, jobs=16): err= 0: pid=2301660: Tue Apr
> >>> 2 21:03:41 2024
> >>>    write: IOPS=2, BW=5613KiB/s (5748kB/s)(2048MiB/373607msec); 0 zone
> >>> resets
> >>>      slat (msec): min=41549, max=373605, avg=260175.71, stdev=76630.63
> >>>      clat (nsec): min=17445, max=31004, avg=20350.31, stdev=3744.64
> >>>       lat (msec): min=235566, max=373605, avg=318209.63,
> >>> stdev=32743.17
> >>>      clat percentiles (nsec):
> >>>       |  1.00th=[17536],  5.00th=[17536], 10.00th=[17792],
> >>> 20.00th=[17792],
> >>>       | 30.00th=[18048], 40.00th=[18304], 50.00th=[18304],
> >>> 60.00th=[18816],
> >>>       | 70.00th=[21632], 80.00th=[22144], 90.00th=[27008],
> >>> 95.00th=[31104],
> >>>       | 99.00th=[31104], 99.50th=[31104], 99.90th=[31104],
> >>> 99.95th=[31104],
> >>>       | 99.99th=[31104]
> >>>     bw (  MiB/s): min= 2051, max= 2051, per=100.00%, avg=2051.84,
> >>> stdev= 0.00, samples=16
> >>>     iops        : min= 2048, max= 2048, avg=2048.00, stdev= 0.00,
> >>> samples=16
> >>>    lat (usec)   : 20=68.75%, 50=31.25%
> >>>    cpu          : usr=0.00%, sys=0.02%, ctx=8350, majf=13, minf=633
> >>>    IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> >>>> =64=100.0%
> >>>       submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=100.0%,
> >>>  >=64=0.0%
> >>>       complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=100.0%,
> >>>  >=64=0.0%
> >>>       issued rwts: total=0,1024,0,0 short=0,0,0,0 dropped=0,0,0,0
> >>>       latency   : target=0, window=0, percentile=100.00%, depth=64
> >>>
> >>> Run status group 0 (all jobs):
> >>>    WRITE: bw=5613KiB/s (5748kB/s), 5613KiB/s-5613KiB/s
> >>> (5748kB/s-5748kB/s), io=2048MiB (2147MB), run=373607-373607msec
> >>>
> >>> Should I understand this correctly, the submission latency (slat) is
> >>> at minimum 41.5 seconds? I am experiencing problems with my SSD disk
> >>> (the performance is pretty low, which this seems to confirm), but now
> >>> I am wondering if this could be a problem with my OS and not my disk,
> >>> being the slat the submission latency?
> >>>
> >>> Thank you
> >
>
> --
> Damien Le Moal
> Western Digital Research
>
>


-- 
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.johnson@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage

* Re: help understanding the output of fio
  2024-04-04 18:43       ` Jeff Johnson
@ 2024-04-04 18:59         ` Felix Rubio
  2024-04-04 19:07           ` Jeff Johnson
  0 siblings, 1 reply; 13+ messages in thread
From: Felix Rubio @ 2024-04-04 18:59 UTC (permalink / raw
  To: fio

hey Jeff,

Good catch! I have run the following command:

fio --name=seqread --numjobs=1 --time_based --runtime=60s --ramp_time=2s 
--iodepth=8 --ioengine=libaio --direct=1 --verify=0 --group_reporting=1 
--bs=1M --rw=read --size=1G --filename=/dev/sda

(/dev/sda and /dev/sdd are the drives I have on one of the pools), and
this is what I get:

seqread: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=8
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=388MiB/s][r=388 IOPS][eta 00m:00s]
seqread: (groupid=0, jobs=1): err= 0: pid=2368687: Thu Apr  4 20:56:06 2024
   read: IOPS=382, BW=383MiB/s (401MB/s)(22.4GiB/60020msec)
     slat (usec): min=17, max=3098, avg=68.94, stdev=46.04
     clat (msec): min=14, max=367, avg=20.84, stdev= 6.61
      lat (msec): min=15, max=367, avg=20.91, stdev= 6.61
     clat percentiles (msec):
      |  1.00th=[   21],  5.00th=[   21], 10.00th=[   21], 20.00th=[   21],
      | 30.00th=[   21], 40.00th=[   21], 50.00th=[   21], 60.00th=[   21],
      | 70.00th=[   21], 80.00th=[   21], 90.00th=[   21], 95.00th=[   21],
      | 99.00th=[   25], 99.50th=[   31], 99.90th=[   48], 99.95th=[   50],
      | 99.99th=[  368]
    bw (  KiB/s): min=215040, max=399360, per=100.00%, avg=392047.06, stdev=19902.89, samples=120
    iops        : min=  210, max=  390, avg=382.55, stdev=19.43, samples=120
   lat (msec)   : 20=0.19%, 50=99.80%, 100=0.01%, 500=0.03%
   cpu          : usr=0.39%, sys=1.93%, ctx=45947, majf=0, minf=37
   IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      issued rwts: total=22954,0,0,0 short=0,0,0,0 dropped=0,0,0,0
      latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
    READ: bw=383MiB/s (401MB/s), 383MiB/s-383MiB/s (401MB/s-401MB/s), io=22.4GiB (24.1GB), run=60020-60020msec

Disk stats (read/write):
   sda: ios=23817/315, merge=0/0, ticks=549704/132687, in_queue=683613, util=99.93%

400 MBps!!! This is a number I have never experienced. I understand this 
means I need to go back to the openzfs chat/forum?

---
Felix Rubio
"Don't believe what you're told. Double check."

* Re: help understanding the output of fio
  2024-04-04 18:59         ` Felix Rubio
@ 2024-04-04 19:07           ` Jeff Johnson
  2024-04-05  5:41             ` Felix Rubio
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Johnson @ 2024-04-04 19:07 UTC (permalink / raw
  To: Felix Rubio; +Cc: fio

Felix,

Based on your previous emails about the drive, it would appear that the
hardware (SSD, cables, port) is fine and the drive performs.

Go back and run your original ZFS test on your mounted ZFS volume
directory and remove the "--direct=1" from your command as ZFS does
not yet support direct_io and disabling buffered io to the zfs
directory will have very negative impacts. This is a ZFS thing, not
your kernel or hardware.

--Jeff

On Thu, Apr 4, 2024 at 12:00 PM Felix Rubio <felix@kngnt.org> wrote:
>
> hey Jeff,
>
> Good catch! I have run the following command:
>
> fio --name=seqread --numjobs=1 --time_based --runtime=60s --ramp_time=2s
> --iodepth=8 --ioengine=libaio --direct=1 --verify=0 --group_reporting=1
> --bs=1M --rw=read --size=1G --filename=/dev/sda
>
> (/sda and /sdd are the drives I have on one of the pools), and this is
> what I get:
>
> seqread: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB,
> (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=8
> fio-3.33
> Starting 1 process
> Jobs: 1 (f=1): [R(1)][100.0%][r=388MiB/s][r=388 IOPS][eta 00m:00s]
> seqread: (groupid=0, jobs=1): err= 0: pid=2368687: Thu Apr  4 20:56:06
> 2024
>    read: IOPS=382, BW=383MiB/s (401MB/s)(22.4GiB/60020msec)
>      slat (usec): min=17, max=3098, avg=68.94, stdev=46.04
>      clat (msec): min=14, max=367, avg=20.84, stdev= 6.61
>       lat (msec): min=15, max=367, avg=20.91, stdev= 6.61
>      clat percentiles (msec):
>       |  1.00th=[   21],  5.00th=[   21], 10.00th=[   21], 20.00th=[
> 21],
>       | 30.00th=[   21], 40.00th=[   21], 50.00th=[   21], 60.00th=[
> 21],
>       | 70.00th=[   21], 80.00th=[   21], 90.00th=[   21], 95.00th=[
> 21],
>       | 99.00th=[   25], 99.50th=[   31], 99.90th=[   48], 99.95th=[
> 50],
>       | 99.99th=[  368]
>     bw (  KiB/s): min=215040, max=399360, per=100.00%, avg=392047.06,
> stdev=19902.89, samples=120
>     iops        : min=  210, max=  390, avg=382.55, stdev=19.43,
> samples=120
>    lat (msec)   : 20=0.19%, 50=99.80%, 100=0.01%, 500=0.03%
>    cpu          : usr=0.39%, sys=1.93%, ctx=45947, majf=0, minf=37
>    IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=100.0%, 16=0.0%, 32=0.0%,
>  >=64=0.0%
>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>  >=64=0.0%
>       complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%,
>  >=64=0.0%
>       issued rwts: total=22954,0,0,0 short=0,0,0,0 dropped=0,0,0,0
>       latency   : target=0, window=0, percentile=100.00%, depth=8
>
> Run status group 0 (all jobs):
>     READ: bw=383MiB/s (401MB/s), 383MiB/s-383MiB/s (401MB/s-401MB/s),
> io=22.4GiB (24.1GB), run=60020-60020msec
>
> Disk stats (read/write):
>    sda: ios=23817/315, merge=0/0, ticks=549704/132687, in_queue=683613,
> util=99.93%
>
> 400 MBps!!! This is a number I have never experienced. I understand this
> means I need to go back to the openzfs chat/forum?
>
> ---
> Felix Rubio
> "Don't believe what you're told. Double check."
>


-- 
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.johnson@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage

* Re: help understanding the output of fio
  2024-04-04 16:18       ` Patrick Goetz
@ 2024-04-04 23:37         ` Damien Le Moal
  0 siblings, 0 replies; 13+ messages in thread
From: Damien Le Moal @ 2024-04-04 23:37 UTC (permalink / raw
  To: Patrick Goetz, Felix Rubio, fio

On 4/5/24 01:18, Patrick Goetz wrote:
> On 4/3/24 23:28, Damien Le Moal wrote:
>>
>> Also, there is no filename= option here. Are you running this workload on the
>> ZFS file system writing to a file ? Or ar you running this against the SSD block
>> device file ?
>>
> 
> If he ran this fio test against the SSD block device, wouldn't that blow 
> up his zpool?
> 
> I read through the entirety of 
> https://fio.readthedocs.io/en/latest/fio_doc.html and still feel 
> completely clueless about many details. It kind of feels like you need a 
> good grasp of how all this is implemented in the kernel to make sense of 
> it. Does anyone have any tips for what to read in order to be able to 
> master the fio documentation?

Look at the examples in fio/examples/.
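
For instance, a minimal job file along these lines (a sketch only; the
device path is a placeholder) can be run with "fio seqread.fio":

; seqread.fio - minimal sequential read example (placeholder device path)
[global]
ioengine=libaio
iodepth=8
direct=1
bs=1M
time_based
runtime=60

[seqread]
rw=read
filename=/dev/sdX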

-- 
Damien Le Moal
Western Digital Research


* Re: help understanding the output of fio
  2024-04-04 19:07           ` Jeff Johnson
@ 2024-04-05  5:41             ` Felix Rubio
  2024-04-05 18:39               ` Jeff Johnson
  0 siblings, 1 reply; 13+ messages in thread
From: Felix Rubio @ 2024-04-05  5:41 UTC (permalink / raw
  To: fio

Hi Jeff,

I have done the test you suggested, and this is what came up:

read_throughput_rpool: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
fio-3.33
Starting 1 process
read_throughput_rpool: Laying out IO file (1 file / 1024MiB)
Jobs: 1 (f=1): [R(1)][100.0%][r=904MiB/s][r=903 IOPS][eta 00m:00s]
read_throughput_rpool: (groupid=0, jobs=1): err= 0: pid=3284411: Fri Apr  5 07:31:18 2024
   read: IOPS=1092, BW=1092MiB/s (1145MB/s)(64.0GiB/60001msec)
     clat (usec): min=733, max=6031, avg=903.71, stdev=166.68
      lat (usec): min=734, max=6032, avg=904.88, stdev=166.84
     clat percentiles (usec):
      |  1.00th=[  742],  5.00th=[  750], 10.00th=[  750], 20.00th=[  758],
      | 30.00th=[  766], 40.00th=[  791], 50.00th=[  824], 60.00th=[  873],
      | 70.00th=[ 1037], 80.00th=[ 1106], 90.00th=[ 1139], 95.00th=[ 1188],
      | 99.00th=[ 1254], 99.50th=[ 1287], 99.90th=[ 1401], 99.95th=[ 1500],
      | 99.99th=[ 2474]
    bw (  MiB/s): min=  871, max= 1270, per=100.00%, avg=1092.97, stdev=166.88, samples=120
    iops        : min=  871, max= 1270, avg=1092.78, stdev=166.95, samples=120
   lat (usec)   : 750=6.17%, 1000=60.30%
   lat (msec)   : 2=33.51%, 4=0.02%, 10=0.01%
   cpu          : usr=2.04%, sys=97.52%, ctx=2457, majf=0, minf=37
   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      issued rwts: total=65543,0,0,0 short=0,0,0,0 dropped=0,0,0,0
      latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
    READ: bw=1092MiB/s (1145MB/s), 1092MiB/s-1092MiB/s (1145MB/s-1145MB/s), io=64.0GiB (68.7GB), run=60001-60001msec

I take it that the file was being cached (because 1145 MBps on a single
SSD is hard to believe)? For the write test, these are the results:

write_throughput_rpool: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
fio-3.33
Starting 1 process
write_throughput_rpool: Laying out IO file (1 file / 1024MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=4100KiB/s][w=4 IOPS][eta 00m:00s]
write_throughput_rpool: (groupid=0, jobs=1): err= 0: pid=3292023: Fri Apr  5 07:36:11 2024
   write: IOPS=5, BW=5394KiB/s (5524kB/s)(317MiB/60174msec); 0 zone resets
     clat (msec): min=16, max=1063, avg=189.59, stdev=137.19
      lat (msec): min=16, max=1063, avg=189.81, stdev=137.19
     clat percentiles (msec):
      |  1.00th=[   18],  5.00th=[   22], 10.00th=[   41], 20.00th=[   62],
      | 30.00th=[  129], 40.00th=[  144], 50.00th=[  163], 60.00th=[  190],
      | 70.00th=[  228], 80.00th=[  268], 90.00th=[  363], 95.00th=[  489],
      | 99.00th=[  567], 99.50th=[  592], 99.90th=[ 1062], 99.95th=[ 1062],
      | 99.99th=[ 1062]
    bw (  KiB/s): min= 2048, max=41042, per=100.00%, avg=5534.14, stdev=5269.61, samples=117
    iops        : min=    2, max=   40, avg= 5.40, stdev= 5.14, samples=117
   lat (msec)   : 20=3.47%, 50=11.99%, 100=8.83%, 250=51.42%, 500=20.19%
   lat (msec)   : 750=3.79%, 2000=0.32%
   cpu          : usr=0.11%, sys=0.52%, ctx=2673, majf=0, minf=38
   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      issued rwts: total=0,317,0,0 short=0,0,0,0 dropped=0,0,0,0
      latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   WRITE: bw=5394KiB/s (5524kB/s), 5394KiB/s-5394KiB/s (5524kB/s-5524kB/s), io=317MiB (332MB), run=60174-60174msec

Still at 6 MBps. Should I move this discussion to the openzfs team, or
is there anything else still worth testing?

Thank you very much!

---
Felix Rubio
"Don't believe what you're told. Double check."

On 2024-04-04 21:07, Jeff Johnson wrote:
> Felix,
> 
> Based on your previous emails about the drive it would appear that the
> hardware (ssd, cables, port) are fine and the drive performs.
> 
> Go back and run your original ZFS test on your mounted ZFS volume
> directory and remove the "--direct=1" from your command as ZFS does
> not yet support direct_io and disabling buffered io to the zfs
> directory will have very negative impacts. This is a ZFS thing, not
> your kernel or hardware.
> 
> --Jeff
> 
> On Thu, Apr 4, 2024 at 12:00 PM Felix Rubio <felix@kngnt.org> wrote:
>> 
>> hey Jeff,
>> 
>> Good catch! I have run the following command:
>> 
>> fio --name=seqread --numjobs=1 --time_based --runtime=60s 
>> --ramp_time=2s
>> --iodepth=8 --ioengine=libaio --direct=1 --verify=0 
>> --group_reporting=1
>> --bs=1M --rw=read --size=1G --filename=/dev/sda
>> 
>> (/sda and /sdd are the drives I have on one of the pools), and this is
>> what I get:
>> 
>> seqread: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB,
>> (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=8
>> fio-3.33
>> Starting 1 process
>> Jobs: 1 (f=1): [R(1)][100.0%][r=388MiB/s][r=388 IOPS][eta 00m:00s]
>> seqread: (groupid=0, jobs=1): err= 0: pid=2368687: Thu Apr  4 20:56:06
>> 2024
>>    read: IOPS=382, BW=383MiB/s (401MB/s)(22.4GiB/60020msec)
>>      slat (usec): min=17, max=3098, avg=68.94, stdev=46.04
>>      clat (msec): min=14, max=367, avg=20.84, stdev= 6.61
>>       lat (msec): min=15, max=367, avg=20.91, stdev= 6.61
>>      clat percentiles (msec):
>>       |  1.00th=[   21],  5.00th=[   21], 10.00th=[   21], 20.00th=[
>> 21],
>>       | 30.00th=[   21], 40.00th=[   21], 50.00th=[   21], 60.00th=[
>> 21],
>>       | 70.00th=[   21], 80.00th=[   21], 90.00th=[   21], 95.00th=[
>> 21],
>>       | 99.00th=[   25], 99.50th=[   31], 99.90th=[   48], 99.95th=[
>> 50],
>>       | 99.99th=[  368]
>>     bw (  KiB/s): min=215040, max=399360, per=100.00%, avg=392047.06,
>> stdev=19902.89, samples=120
>>     iops        : min=  210, max=  390, avg=382.55, stdev=19.43,
>> samples=120
>>    lat (msec)   : 20=0.19%, 50=99.80%, 100=0.01%, 500=0.03%
>>    cpu          : usr=0.39%, sys=1.93%, ctx=45947, majf=0, minf=37
>>    IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=100.0%, 16=0.0%, 32=0.0%,
>>  >=64=0.0%
>>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>>  >=64=0.0%
>>       complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%,
>>  >=64=0.0%
>>       issued rwts: total=22954,0,0,0 short=0,0,0,0 dropped=0,0,0,0
>>       latency   : target=0, window=0, percentile=100.00%, depth=8
>> 
>> Run status group 0 (all jobs):
>>     READ: bw=383MiB/s (401MB/s), 383MiB/s-383MiB/s (401MB/s-401MB/s),
>> io=22.4GiB (24.1GB), run=60020-60020msec
>> 
>> Disk stats (read/write):
>>    sda: ios=23817/315, merge=0/0, ticks=549704/132687, 
>> in_queue=683613,
>> util=99.93%
>> 
>> 400 MBps!!! This is a number I have never experienced. I understand 
>> this
>> means I need to go back to the openzfs chat/forum?
>> 
>> ---
>> Felix Rubio
>> "Don't believe what you're told. Double check."
>> 
> 
> 
> --
> ------------------------------
> Jeff Johnson
> Co-Founder
> Aeon Computing
> 
> jeff.johnson@aeoncomputing.com
> www.aeoncomputing.com
> t: 858-412-3810 x1001   f: 858-412-3845
> m: 619-204-9061
> 
> 4170 Morena Boulevard, Suite C - San Diego, CA 92117
> 
> High-Performance Computing / Lustre Filesystems / Scale-out Storage

* Re: help understanding the output of fio
  2024-04-05  5:41             ` Felix Rubio
@ 2024-04-05 18:39               ` Jeff Johnson
  2024-04-06  7:03                 ` Felix Rubio
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Johnson @ 2024-04-05 18:39 UTC (permalink / raw
  To: Felix Rubio; +Cc: fio

Felix,

You're using the psync ioengine with a queue depth of 1. This will
slow zfs down a lot. Use libaio and a queue depth of 8-16.
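
For example, something along these lines (a sketch; /myvol is a placeholder
for your mounted dataset, and --direct=1 is intentionally left out because
ZFS does not support it):

fio --name=write_throughput_rpool --numjobs=1 --time_based --runtime=60s \
    --ramp_time=2s --ioengine=libaio --iodepth=8 --verify=0 \
    --group_reporting=1 --bs=1M --rw=write --size=1024M --directory=/myvol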

To temporarily disable ZFS ARC caching on your volume, say your
benchmark volume is mypool/myvol mounted at /myvol, you can do:
zfs get all mypool/myvol > ~/myvol-settings-save.txt
zfs set primarycache=none mypool/myvol
zfs set secondarycache=none mypool/myvol

Then run your fio tests. After you are done benchmarking your ZFS,
refer to the settings saved in ~/myvol-settings-save.txt to set
primarycache and secondarycache back to their original values.
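
For example, if the saved values were the defaults, restoring them would
be (a sketch; check the saved file for the actual values):

zfs set primarycache=all mypool/myvol
zfs set secondarycache=all mypool/myvol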

--Jeff


On Thu, Apr 4, 2024 at 10:41 PM Felix Rubio <felix@kngnt.org> wrote:
>
> Hi Jeff,
>
> I have done the test you suggested, and this is what came up:
>
> read_throughput_rpool: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W)
> 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
> fio-3.33
> Starting 1 process
> read_throughput_rpool: Laying out IO file (1 file / 1024MiB)
> Jobs: 1 (f=1): [R(1)][100.0%][r=904MiB/s][r=903 IOPS][eta 00m:00s]
> read_throughput_rpool: (groupid=0, jobs=1): err= 0: pid=3284411: Fri Apr
>   5 07:31:18 2024
>    read: IOPS=1092, BW=1092MiB/s (1145MB/s)(64.0GiB/60001msec)
>      clat (usec): min=733, max=6031, avg=903.71, stdev=166.68
>       lat (usec): min=734, max=6032, avg=904.88, stdev=166.84
>      clat percentiles (usec):
>       |  1.00th=[  742],  5.00th=[  750], 10.00th=[  750], 20.00th=[
> 758],
>       | 30.00th=[  766], 40.00th=[  791], 50.00th=[  824], 60.00th=[
> 873],
>       | 70.00th=[ 1037], 80.00th=[ 1106], 90.00th=[ 1139], 95.00th=[
> 1188],
>       | 99.00th=[ 1254], 99.50th=[ 1287], 99.90th=[ 1401], 99.95th=[
> 1500],
>       | 99.99th=[ 2474]
>     bw (  MiB/s): min=  871, max= 1270, per=100.00%, avg=1092.97,
> stdev=166.88, samples=120
>     iops        : min=  871, max= 1270, avg=1092.78, stdev=166.95,
> samples=120
>    lat (usec)   : 750=6.17%, 1000=60.30%
>    lat (msec)   : 2=33.51%, 4=0.02%, 10=0.01%
>    cpu          : usr=2.04%, sys=97.52%, ctx=2457, majf=0, minf=37
>    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>  >=64=0.0%
>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>  >=64=0.0%
>       complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>  >=64=0.0%
>       issued rwts: total=65543,0,0,0 short=0,0,0,0 dropped=0,0,0,0
>       latency   : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
>     READ: bw=1092MiB/s (1145MB/s), 1092MiB/s-1092MiB/s
> (1145MB/s-1145MB/s), io=64.0GiB (68.7GB), run=60001-60001msec
>
> I take that the file was being cached (because the 1145 MBps on a single
> SSD disk is hard to believe)? For the write test, these are the results:
>
> write_throughput_rpool: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W)
> 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
> fio-3.33
> Starting 1 process
> write_throughput_rpool: Laying out IO file (1 file / 1024MiB)
> Jobs: 1 (f=1): [W(1)][100.0%][w=4100KiB/s][w=4 IOPS][eta 00m:00s]
> write_throughput_rpool: (groupid=0, jobs=1): err= 0: pid=3292023: Fri
> Apr  5 07:36:11 2024
>    write: IOPS=5, BW=5394KiB/s (5524kB/s)(317MiB/60174msec); 0 zone
> resets
>      clat (msec): min=16, max=1063, avg=189.59, stdev=137.19
>       lat (msec): min=16, max=1063, avg=189.81, stdev=137.19
>      clat percentiles (msec):
>       |  1.00th=[   18],  5.00th=[   22], 10.00th=[   41], 20.00th=[
> 62],
>       | 30.00th=[  129], 40.00th=[  144], 50.00th=[  163], 60.00th=[
> 190],
>       | 70.00th=[  228], 80.00th=[  268], 90.00th=[  363], 95.00th=[
> 489],
>       | 99.00th=[  567], 99.50th=[  592], 99.90th=[ 1062], 99.95th=[
> 1062],
>       | 99.99th=[ 1062]
>     bw (  KiB/s): min= 2048, max=41042, per=100.00%, avg=5534.14,
> stdev=5269.61, samples=117
>     iops        : min=    2, max=   40, avg= 5.40, stdev= 5.14,
> samples=117
>    lat (msec)   : 20=3.47%, 50=11.99%, 100=8.83%, 250=51.42%, 500=20.19%
>    lat (msec)   : 750=3.79%, 2000=0.32%
>    cpu          : usr=0.11%, sys=0.52%, ctx=2673, majf=0, minf=38
>    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>  >=64=0.0%
>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>  >=64=0.0%
>       complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>  >=64=0.0%
>       issued rwts: total=0,317,0,0 short=0,0,0,0 dropped=0,0,0,0
>       latency   : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
>    WRITE: bw=5394KiB/s (5524kB/s), 5394KiB/s-5394KiB/s
> (5524kB/s-5524kB/s), io=317MiB (332MB), run=60174-60174msec
>
> Still at 6 MBps. Should move this discussion to the openzfs team, or is
> there anything else still worth testing?
>
> Thank you very much!
>
> ---
> Felix Rubio
> "Don't believe what you're told. Double check."
>
> On 2024-04-04 21:07, Jeff Johnson wrote:
> > Felix,
> >
> > Based on your previous emails about the drive it would appear that the
> > hardware (ssd, cables, port) are fine and the drive performs.
> >
> > Go back and run your original ZFS test on your mounted ZFS volume
> > directory and remove the "--direct=1" from your command as ZFS does
> > not yet support direct_io and disabling buffered io to the zfs
> > directory will have very negative impacts. This is a ZFS thing, not
> > your kernel or hardware.
> >
> > --Jeff
> >
> > On Thu, Apr 4, 2024 at 12:00 PM Felix Rubio <felix@kngnt.org> wrote:
> >>
> >> hey Jeff,
> >>
> >> Good catch! I have run the following command:
> >>
> >> fio --name=seqread --numjobs=1 --time_based --runtime=60s
> >> --ramp_time=2s
> >> --iodepth=8 --ioengine=libaio --direct=1 --verify=0
> >> --group_reporting=1
> >> --bs=1M --rw=read --size=1G --filename=/dev/sda
> >>
> >> (/sda and /sdd are the drives I have on one of the pools), and this is
> >> what I get:
> >>
> >> seqread: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB,
> >> (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=8
> >> fio-3.33
> >> Starting 1 process
> >> Jobs: 1 (f=1): [R(1)][100.0%][r=388MiB/s][r=388 IOPS][eta 00m:00s]
> >> seqread: (groupid=0, jobs=1): err= 0: pid=2368687: Thu Apr  4 20:56:06
> >> 2024
> >>    read: IOPS=382, BW=383MiB/s (401MB/s)(22.4GiB/60020msec)
> >>      slat (usec): min=17, max=3098, avg=68.94, stdev=46.04
> >>      clat (msec): min=14, max=367, avg=20.84, stdev= 6.61
> >>       lat (msec): min=15, max=367, avg=20.91, stdev= 6.61
> >>      clat percentiles (msec):
> >>       |  1.00th=[   21],  5.00th=[   21], 10.00th=[   21], 20.00th=[
> >> 21],
> >>       | 30.00th=[   21], 40.00th=[   21], 50.00th=[   21], 60.00th=[
> >> 21],
> >>       | 70.00th=[   21], 80.00th=[   21], 90.00th=[   21], 95.00th=[
> >> 21],
> >>       | 99.00th=[   25], 99.50th=[   31], 99.90th=[   48], 99.95th=[
> >> 50],
> >>       | 99.99th=[  368]
> >>     bw (  KiB/s): min=215040, max=399360, per=100.00%, avg=392047.06,
> >> stdev=19902.89, samples=120
> >>     iops        : min=  210, max=  390, avg=382.55, stdev=19.43,
> >> samples=120
> >>    lat (msec)   : 20=0.19%, 50=99.80%, 100=0.01%, 500=0.03%
> >>    cpu          : usr=0.39%, sys=1.93%, ctx=45947, majf=0, minf=37
> >>    IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=100.0%, 16=0.0%, 32=0.0%,
> >>  >=64=0.0%
> >>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >>  >=64=0.0%
> >>       complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%,
> >>  >=64=0.0%
> >>       issued rwts: total=22954,0,0,0 short=0,0,0,0 dropped=0,0,0,0
> >>       latency   : target=0, window=0, percentile=100.00%, depth=8
> >>
> >> Run status group 0 (all jobs):
> >>     READ: bw=383MiB/s (401MB/s), 383MiB/s-383MiB/s (401MB/s-401MB/s),
> >> io=22.4GiB (24.1GB), run=60020-60020msec
> >>
> >> Disk stats (read/write):
> >>    sda: ios=23817/315, merge=0/0, ticks=549704/132687,
> >> in_queue=683613,
> >> util=99.93%
> >>
> >> 400 MBps!!! This is a number I have never experienced. I understand
> >> this
> >> means I need to go back to the openzfs chat/forum?
> >>
> >> ---
> >> Felix Rubio
> >> "Don't believe what you're told. Double check."
> >>
> >
> >
> > --
> > ------------------------------
> > Jeff Johnson
> > Co-Founder
> > Aeon Computing
> >
> > jeff.johnson@aeoncomputing.com
> > www.aeoncomputing.com
> > t: 858-412-3810 x1001   f: 858-412-3845
> > m: 619-204-9061
> >
> > 4170 Morena Boulevard, Suite C - San Diego, CA 92117
> >
> > High-Performance Computing / Lustre Filesystems / Scale-out Storage
>


-- 
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.johnson@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: help understanding the output of fio
  2024-04-05 18:39               ` Jeff Johnson
@ 2024-04-06  7:03                 ` Felix Rubio
  0 siblings, 0 replies; 13+ messages in thread
From: Felix Rubio @ 2024-04-06  7:03 UTC (permalink / raw
  To: fio

Hi Jeff,

I have run the write test using libaio, with the primary and secondary 
caches set to "none":
fio --name=write_throughput_rpool --numjobs=1 --time_based --runtime=60s 
--ramp_time=2s --iodepth=8 --ioengine=libaio --verify=0 
--group_reporting=1 --bs=16k --rw=write --size=1G 
--filename=/var/cache/test
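
(For reference, the caches were turned off with zfs set commands along the 
following lines; "rpool/varcache" is only a placeholder for the dataset 
that actually backs /var/cache, and both properties were restored to their 
saved values after the test.)

# placeholder dataset name: substitute the dataset backing /var/cache
zfs set primarycache=none rpool/varcache
zfs set secondarycache=none rpool/varcache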

These are the results:
fio-3.33
Starting 1 process
write_throughput_rpool: Laying out IO file (1 file / 1024MiB)
Jobs: 1 (f=1): [W(1)][51.6%][w=256KiB/s][w=16 IOPS][eta 00m:59s]
write_throughput_rpool: (groupid=0, jobs=1): err= 0: pid=1330097: Sat 
Apr  6 08:56:27 2024
   write: IOPS=550, BW=8808KiB/s (9019kB/s)(519MiB/60315msec); 0 zone 
resets
     slat (usec): min=185, max=2865.8k, avg=1810.21, stdev=32343.38
     clat (usec): min=19, max=2869.5k, avg=12720.49, stdev=85213.20
      lat (msec): min=2, max=2870, avg=14.53, stdev=91.03
     clat percentiles (msec):
      |  1.00th=[    3],  5.00th=[    3], 10.00th=[    4], 20.00th=[    5],
      | 30.00th=[    6], 40.00th=[    6], 50.00th=[    7], 60.00th=[    8],
      | 70.00th=[    9], 80.00th=[   10], 90.00th=[   13], 95.00th=[   16],
      | 99.00th=[   34], 99.50th=[  313], 99.90th=[ 1217], 99.95th=[ 1687],
      | 99.99th=[ 2869]
    bw (  KiB/s): min=  256, max=42068, per=100.00%, avg=11304.49, 
stdev=9239.48, samples=94
    iops        : min=   16, max= 2629, avg=706.47, stdev=577.45, 
samples=94
   lat (usec)   : 20=0.01%
   lat (msec)   : 2=0.07%, 4=14.03%, 10=68.18%, 20=15.56%, 50=1.40%
   lat (msec)   : 100=0.23%, 250=0.04%, 500=0.06%, 750=0.13%, 1000=0.06%
   lat (msec)   : 2000=0.21%, >=2000=0.04%
   cpu          : usr=0.46%, sys=2.57%, ctx=33743, majf=0, minf=38
   IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=100.0%, 16=0.0%, 32=0.0%, 
 >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
 >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, 
 >=64=0.0%
      issued rwts: total=0,33194,0,0 short=0,0,0,0 dropped=0,0,0,0
      latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   WRITE: bw=8808KiB/s (9019kB/s), 8808KiB/s-8808KiB/s 
(9019kB/s-9019kB/s), io=519MiB (544MB), run=60315-60315msec
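
As a quick consistency check on these numbers, the bandwidth should be 
roughly IOPS times the block size:

    550 IOPS x 16 KiB = 8800 KiB/s

which matches the reported BW=8808KiB/s, so the fio figures are at least 
internally consistent.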

If I instead use multiple jobs, running
fio --name=write_throughput_rpool --numjobs=8 --time_based --runtime=60s 
--ramp_time=2s --iodepth=8 --ioengine=libaio --verify=0 
--group_reporting=1 --bs=16k --rw=write --size=128M 
--filename=/var/cache/test

I get the following results:
write_throughput_rpool: (g=0): rw=write, bs=(R) 16.0KiB-16.0KiB, (W) 
16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=libaio, iodepth=8
...
fio-3.33
Starting 8 processes
write_throughput_rpool: Laying out IO file (1 file / 128MiB)
write_throughput_rpool: Laying out IO file (1 file / 128MiB)
write_throughput_rpool: Laying out IO file (1 file / 128MiB)
write_throughput_rpool: Laying out IO file (1 file / 128MiB)
write_throughput_rpool: Laying out IO file (1 file / 128MiB)
write_throughput_rpool: Laying out IO file (1 file / 128MiB)
write_throughput_rpool: Laying out IO file (1 file / 128MiB)
write_throughput_rpool: Laying out IO file (1 file / 128MiB)
Jobs: 8 (f=2): [f(4),W(1),f(2),W(1)][35.6%][w=2818KiB/s][w=176 IOPS][eta 
01m:54s]
write_throughput_rpool: (groupid=0, jobs=8): err= 0: pid=1338427: Sat 
Apr  6 09:02:00 2024
   write: IOPS=543, BW=8712KiB/s (8921kB/s)(511MiB/60023msec); 0 zone 
resets
     slat (usec): min=733, max=2260.7k, avg=14703.28, stdev=63546.14
     clat (usec): min=4, max=2385.0k, avg=102868.92, stdev=164897.88
      lat (usec): min=774, max=2405.7k, avg=117555.81, stdev=175738.91
     clat percentiles (msec):
      |  1.00th=[   19],  5.00th=[   29], 10.00th=[   36], 20.00th=[   44],
      | 30.00th=[   50], 40.00th=[   55], 50.00th=[   62], 60.00th=[   70],
      | 70.00th=[   81], 80.00th=[  103], 90.00th=[  133], 95.00th=[  359],
      | 99.00th=[ 1045], 99.50th=[ 1167], 99.90th=[ 1385], 99.95th=[ 1754],
      | 99.99th=[ 2366]
    bw (  KiB/s): min=  384, max=33551, per=100.00%, avg=9727.65, 
stdev=850.37, samples=853
    iops        : min=   24, max= 2095, avg=607.90, stdev=53.14, 
samples=853
   lat (usec)   : 10=0.01%, 20=0.02%, 1000=0.01%
   lat (msec)   : 2=0.01%, 4=0.02%, 10=0.10%, 20=1.01%, 50=30.81%
   lat (msec)   : 100=47.01%, 250=15.29%, 500=2.31%, 750=2.00%, 
1000=0.53%
   lat (msec)   : 2000=1.04%, >=2000=0.02%
   cpu          : usr=0.11%, sys=0.45%, ctx=33751, majf=0, minf=295
   IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=100.0%, 16=0.0%, 32=0.0%, 
 >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
 >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, 
 >=64=0.0%
      issued rwts: total=0,32620,0,0 short=0,0,0,0 dropped=0,0,0,0
      latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   WRITE: bw=8712KiB/s (8921kB/s), 8712KiB/s-8712KiB/s 
(8921kB/s-8921kB/s), io=511MiB (535MB), run=60023-60023msec

So... I am getting more or less the same bandwidths in both cases.
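
In case it helps to reproduce this, the multi-job run above can also be 
written as a fio job file. This is only a sketch assuming the same target 
path (/var/cache/test) and options as the command line; the file name 
write_throughput.fio is just an example, and it would be run with 
"fio write_throughput.fio":

; sketch of a job file equivalent to the 8-job command line above
[global]
ioengine=libaio
iodepth=8
bs=16k
rw=write
size=128M
time_based
runtime=60s
ramp_time=2s
verify=0
group_reporting
; same shared target file as in the command line
filename=/var/cache/test

[write_throughput_rpool]
numjobs=8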

---
Felix Rubio
"Don't believe what you're told. Double check."

On 2024-04-05 20:39, Jeff Johnson wrote:
> Felix,
> 
> You're using the psync ioengine with a queue depth of 1. This will
> slow zfs down a lot. Use libaio and a queue depth of 8-16.
> 
> To temporarily disable ZFS ARC caching on your volume, say your
> benchmark volume is mypool/myvol mounted at /myvol, you can do:
> zfs get all mypool/myvol > ~/myvol-settings-save.txt
> zfs set primarycache=none mypool/myvol
> zfs set secondarycache=none mypool/myvol
> 
> then run your fio tests. After you are done benchmarking your ZFS
> refer to the settings saved in ~/myvol-settings-save.txt to set
> primarycache and secondarycache back to original.
> 
> --Jeff
> 
> 
> On Thu, Apr 4, 2024 at 10:41 PM Felix Rubio <felix@kngnt.org> wrote:
>> 
>> Hi Jeff,
>> 
>> I have done the test you suggested, and this is what came up:
>> 
>> read_throughput_rpool: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W)
>> 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
>> fio-3.33
>> Starting 1 process
>> read_throughput_rpool: Laying out IO file (1 file / 1024MiB)
>> Jobs: 1 (f=1): [R(1)][100.0%][r=904MiB/s][r=903 IOPS][eta 00m:00s]
>> read_throughput_rpool: (groupid=0, jobs=1): err= 0: pid=3284411: Fri 
>> Apr
>>   5 07:31:18 2024
>>    read: IOPS=1092, BW=1092MiB/s (1145MB/s)(64.0GiB/60001msec)
>>      clat (usec): min=733, max=6031, avg=903.71, stdev=166.68
>>       lat (usec): min=734, max=6032, avg=904.88, stdev=166.84
>>      clat percentiles (usec):
>>       |  1.00th=[  742],  5.00th=[  750], 10.00th=[  750], 20.00th=[
>> 758],
>>       | 30.00th=[  766], 40.00th=[  791], 50.00th=[  824], 60.00th=[
>> 873],
>>       | 70.00th=[ 1037], 80.00th=[ 1106], 90.00th=[ 1139], 95.00th=[
>> 1188],
>>       | 99.00th=[ 1254], 99.50th=[ 1287], 99.90th=[ 1401], 99.95th=[
>> 1500],
>>       | 99.99th=[ 2474]
>>     bw (  MiB/s): min=  871, max= 1270, per=100.00%, avg=1092.97,
>> stdev=166.88, samples=120
>>     iops        : min=  871, max= 1270, avg=1092.78, stdev=166.95,
>> samples=120
>>    lat (usec)   : 750=6.17%, 1000=60.30%
>>    lat (msec)   : 2=33.51%, 4=0.02%, 10=0.01%
>>    cpu          : usr=2.04%, sys=97.52%, ctx=2457, majf=0, minf=37
>>    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>>  >=64=0.0%
>>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>>  >=64=0.0%
>>       complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>>  >=64=0.0%
>>       issued rwts: total=65543,0,0,0 short=0,0,0,0 dropped=0,0,0,0
>>       latency   : target=0, window=0, percentile=100.00%, depth=1
>> 
>> Run status group 0 (all jobs):
>>     READ: bw=1092MiB/s (1145MB/s), 1092MiB/s-1092MiB/s
>> (1145MB/s-1145MB/s), io=64.0GiB (68.7GB), run=60001-60001msec
>> 
>> I take it that the file was being cached (because the 1145 MBps on a
>> single
>> SSD disk is hard to believe)? For the write test, these are the 
>> results:
>> 
>> write_throughput_rpool: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W)
>> 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
>> fio-3.33
>> Starting 1 process
>> write_throughput_rpool: Laying out IO file (1 file / 1024MiB)
>> Jobs: 1 (f=1): [W(1)][100.0%][w=4100KiB/s][w=4 IOPS][eta 00m:00s]
>> write_throughput_rpool: (groupid=0, jobs=1): err= 0: pid=3292023: Fri
>> Apr  5 07:36:11 2024
>>    write: IOPS=5, BW=5394KiB/s (5524kB/s)(317MiB/60174msec); 0 zone
>> resets
>>      clat (msec): min=16, max=1063, avg=189.59, stdev=137.19
>>       lat (msec): min=16, max=1063, avg=189.81, stdev=137.19
>>      clat percentiles (msec):
>>       |  1.00th=[   18],  5.00th=[   22], 10.00th=[   41], 20.00th=[
>> 62],
>>       | 30.00th=[  129], 40.00th=[  144], 50.00th=[  163], 60.00th=[
>> 190],
>>       | 70.00th=[  228], 80.00th=[  268], 90.00th=[  363], 95.00th=[
>> 489],
>>       | 99.00th=[  567], 99.50th=[  592], 99.90th=[ 1062], 99.95th=[
>> 1062],
>>       | 99.99th=[ 1062]
>>     bw (  KiB/s): min= 2048, max=41042, per=100.00%, avg=5534.14,
>> stdev=5269.61, samples=117
>>     iops        : min=    2, max=   40, avg= 5.40, stdev= 5.14,
>> samples=117
>>    lat (msec)   : 20=3.47%, 50=11.99%, 100=8.83%, 250=51.42%, 
>> 500=20.19%
>>    lat (msec)   : 750=3.79%, 2000=0.32%
>>    cpu          : usr=0.11%, sys=0.52%, ctx=2673, majf=0, minf=38
>>    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>>  >=64=0.0%
>>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>>  >=64=0.0%
>>       complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>>  >=64=0.0%
>>       issued rwts: total=0,317,0,0 short=0,0,0,0 dropped=0,0,0,0
>>       latency   : target=0, window=0, percentile=100.00%, depth=1
>> 
>> Run status group 0 (all jobs):
>>    WRITE: bw=5394KiB/s (5524kB/s), 5394KiB/s-5394KiB/s
>> (5524kB/s-5524kB/s), io=317MiB (332MB), run=60174-60174msec
>> 
>> Still at 6 MBps. Should I move this discussion to the openzfs team, or
>> is
>> there anything else still worth testing?
>> 
>> Thank you very much!
>> 
>> ---
>> Felix Rubio
>> "Don't believe what you're told. Double check."
>> 
>> On 2024-04-04 21:07, Jeff Johnson wrote:
>> > Felix,
>> >
>> > Based on your previous emails about the drive it would appear that the
>> > hardware (ssd, cables, port) are fine and the drive performs.
>> >
>> > Go back and run your original ZFS test on your mounted ZFS volume
>> > directory and remove the "--direct=1" from your command as ZFS does
>> > not yet support direct_io and disabling buffered io to the zfs
>> > directory will have very negative impacts. This is a ZFS thing, not
>> > your kernel or hardware.
>> >
>> > --Jeff
>> >
>> > On Thu, Apr 4, 2024 at 12:00 PM Felix Rubio <felix@kngnt.org> wrote:
>> >>
>> >> hey Jeff,
>> >>
>> >> Good catch! I have run the following command:
>> >>
>> >> fio --name=seqread --numjobs=1 --time_based --runtime=60s
>> >> --ramp_time=2s
>> >> --iodepth=8 --ioengine=libaio --direct=1 --verify=0
>> >> --group_reporting=1
>> >> --bs=1M --rw=read --size=1G --filename=/dev/sda
>> >>
>> >> (/sda and /sdd are the drives I have on one of the pools), and this is
>> >> what I get:
>> >>
>> >> seqread: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB,
>> >> (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=8
>> >> fio-3.33
>> >> Starting 1 process
>> >> Jobs: 1 (f=1): [R(1)][100.0%][r=388MiB/s][r=388 IOPS][eta 00m:00s]
>> >> seqread: (groupid=0, jobs=1): err= 0: pid=2368687: Thu Apr  4 20:56:06
>> >> 2024
>> >>    read: IOPS=382, BW=383MiB/s (401MB/s)(22.4GiB/60020msec)
>> >>      slat (usec): min=17, max=3098, avg=68.94, stdev=46.04
>> >>      clat (msec): min=14, max=367, avg=20.84, stdev= 6.61
>> >>       lat (msec): min=15, max=367, avg=20.91, stdev= 6.61
>> >>      clat percentiles (msec):
>> >>       |  1.00th=[   21],  5.00th=[   21], 10.00th=[   21], 20.00th=[
>> >> 21],
>> >>       | 30.00th=[   21], 40.00th=[   21], 50.00th=[   21], 60.00th=[
>> >> 21],
>> >>       | 70.00th=[   21], 80.00th=[   21], 90.00th=[   21], 95.00th=[
>> >> 21],
>> >>       | 99.00th=[   25], 99.50th=[   31], 99.90th=[   48], 99.95th=[
>> >> 50],
>> >>       | 99.99th=[  368]
>> >>     bw (  KiB/s): min=215040, max=399360, per=100.00%, avg=392047.06,
>> >> stdev=19902.89, samples=120
>> >>     iops        : min=  210, max=  390, avg=382.55, stdev=19.43,
>> >> samples=120
>> >>    lat (msec)   : 20=0.19%, 50=99.80%, 100=0.01%, 500=0.03%
>> >>    cpu          : usr=0.39%, sys=1.93%, ctx=45947, majf=0, minf=37
>> >>    IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=100.0%, 16=0.0%, 32=0.0%,
>> >>  >=64=0.0%
>> >>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>> >>  >=64=0.0%
>> >>       complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%,
>> >>  >=64=0.0%
>> >>       issued rwts: total=22954,0,0,0 short=0,0,0,0 dropped=0,0,0,0
>> >>       latency   : target=0, window=0, percentile=100.00%, depth=8
>> >>
>> >> Run status group 0 (all jobs):
>> >>     READ: bw=383MiB/s (401MB/s), 383MiB/s-383MiB/s (401MB/s-401MB/s),
>> >> io=22.4GiB (24.1GB), run=60020-60020msec
>> >>
>> >> Disk stats (read/write):
>> >>    sda: ios=23817/315, merge=0/0, ticks=549704/132687,
>> >> in_queue=683613,
>> >> util=99.93%
>> >>
>> >> 400 MBps!!! This is a number I have never experienced. I understand
>> >> this
>> >> means I need to go back to the openzfs chat/forum?
>> >>
>> >> ---
>> >> Felix Rubio
>> >> "Don't believe what you're told. Double check."
>> >>
>> >
>> >
>> > --
>> > ------------------------------
>> > Jeff Johnson
>> > Co-Founder
>> > Aeon Computing
>> >
>> > jeff.johnson@aeoncomputing.com
>> > www.aeoncomputing.com
>> > t: 858-412-3810 x1001   f: 858-412-3845
>> > m: 619-204-9061
>> >
>> > 4170 Morena Boulevard, Suite C - San Diego, CA 92117
>> >
>> > High-Performance Computing / Lustre Filesystems / Scale-out Storage
>> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-04-06  7:03 UTC | newest]

Thread overview: 13+ messages
2024-04-03  5:35 help understanding the output of fio Felix Rubio
2024-04-03 15:45 ` Patrick Goetz
2024-04-03 19:21   ` Felix Rubio
2024-04-04  4:28     ` Damien Le Moal
2024-04-04 16:18       ` Patrick Goetz
2024-04-04 23:37         ` Damien Le Moal
     [not found]       ` <4d12ea060363d934d84201f834b21a38@kngnt.org>
     [not found]         ` <e9efa7bb-f2b3-47f0-aa03-5191f85562e8@kernel.org>
2024-04-04 18:29           ` Felix Rubio
2024-04-04 18:43       ` Jeff Johnson
2024-04-04 18:59         ` Felix Rubio
2024-04-04 19:07           ` Jeff Johnson
2024-04-05  5:41             ` Felix Rubio
2024-04-05 18:39               ` Jeff Johnson
2024-04-06  7:03                 ` Felix Rubio
