fio.vger.kernel.org archive mirror
From: Vincent Fu <vincentfu@gmail.com>
To: Surbhi Palande <csurbhi@gmail.com>, fio@vger.kernel.org
Subject: Re: Requesting help with random I/O distribution
Date: Fri, 16 Jun 2023 16:28:38 -0400
Message-ID: <4cf9e355-2c30-8d4d-255d-3a7c395cc61c@gmail.com>
In-Reply-To: <CAMBkX3dN8d5kSNUDpPBBmx34+Ov7JzgK=dmgwAfeAaecLX+-7A@mail.gmail.com>

On 6/8/23 21:11, Surbhi Palande wrote:
> Hi All,
> 
> I am trying to performance test my device mapper for zoned devices; I
> am trying to replicate the 80-20 principle for I/O. I understand that
> this can be done in the following three ways using fio.
> 
> a) zoned - simplest of the three :
> zoned:80/20:20/80
> However, this restricts the first 20% space to get 80% of the I/O and
> vice versa. The good thing though is that zoned distribution can be
> used to achieve a Russian doll effect.
> 
> b) zipf:1.2
> 
> I used fio-genzipf to visualize the random I/O pattern:
> fio-genzipf -t zipf -i 1.2 -b 4096 -g 100GiB
> Generating Zipf distribution with 1.200000 input and 100 GiB size and
> 4096 block_size.
> 
>     Rows           Hits %         Sum %           # Hits          Size
> -----------------------------------------------------------------------
> Top   5.00% 93.31% 93.31% 24459924 93.31G
> |->  10.00%  1.34% 94.65%  352314  1.34G
> |->  15.00%  0.77% 95.42%  201010 785.20M
> |->  20.00%  0.51% 95.92%  132667 518.23M
> |->  25.00%  0.47% 96.39%  122386 478.07M
> |->  30.00%  0.34% 96.73%   89402 349.23M
> |->  35.00%  0.23% 96.97%   61193 239.04M
> |->  40.00%  0.23% 97.20%   61193 239.04M
> |->  45.00%  0.23% 97.43%   61193 239.04M
> |->  50.00%  0.23% 97.67%   61193 239.04M
> |->  55.00%  0.23% 97.90%   61193 239.04M
> |->  60.00%  0.23% 98.13%   61193 239.04M
> |->  65.00%  0.23% 98.37%   61193 239.04M
> |->  70.00%  0.23% 98.60%   61193 239.04M
> |->  75.00%  0.23% 98.83%   61193 239.04M
> |->  80.00%  0.23% 99.07%   61193 239.04M
> |->  85.00%  0.23% 99.30%   61193 239.04M
> |->  90.00%  0.23% 99.53%   61193 239.04M
> |->  95.00%  0.23% 99.77%   61193 239.04M
> |-> 100.00%  0.23% 100.00%   61188 239.02M
> -----------------------------------------------------------------------
> Total 26214400
> 
> I need help with this interpretation. Does this mean that 5% of the
> LBAs get 93.31% of the hits, the next 5% get 1.34%, etc.? It seems
> that way to me.

I haven't thoroughly digested the source code but all indications 
suggest that this is the correct interpretation.
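
As a quick sanity check of that reading, the first row of the table can
be recomputed from the raw hit counts. This is only a sketch; the
variable names are mine, and the numbers are copied from the fio-genzipf
output quoted above.

```shell
# Numbers copied from the quoted fio-genzipf zipf:1.2 table.
total=26214400       # "Total" row: number of generated hits
top_hits=24459924    # "# Hits" for the "Top 5.00%" row
bs=4096              # block size passed with -b

# 100 GiB split into 4096-byte blocks gives the same count as the total.
blocks=$((100 * 1024 * 1024 * 1024 / bs))

# Share of all hits that landed in the hottest 5% of blocks.
pct=$(awk -v h="$top_hits" -v t="$total" 'BEGIN { printf "%.2f", 100 * h / t }')

# The same hits expressed as data volume in GiB.
gib=$(awk -v h="$top_hits" -v b="$bs" 'BEGIN { printf "%.2f", h * b / (1024 * 1024 * 1024) }')

echo "blocks=$blocks top5_pct=$pct top5_gib=$gib"
```

Both values match the "Hits %" and "Size" columns of the first row,
which is consistent with each row describing a 5% slice of the blocks
ranked by popularity.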

> However, this does not have the Russian doll effect, i.e., the top 5%
> of the remaining 95% does not get ~93% of the remaining I/O.

Add "-o 40" or "-o 100" to increase the number of rows. That way you can 
see the distribution within each 5% band. The results suggest to me that 
the distribution is skewed even within each band. If I understand 
correctly what you mean by "Russian doll effect," this distribution does 
follow that pattern to some extent, although the distribution is 
essentially flat in its tail, which is inconsistent with that pattern.

Look at the Zipf probability mass function listed on Wikipedia and 
imagine its shape after you have removed the most frequent values. Even 
if you use a new normalizing constant, it won't have the same shape as 
the original distribution because the distance between, for example, 1/2 
and 1/3 will not be the same as the distance between 1/200 and 1/201.
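
To make that concrete, here is a rough sketch that uses textbook Zipf
weights 1/k^theta over 1000 ranks. That weighting, the helper name
zipf_top_share, and the choice of 1000 ranks are my assumptions for
illustration; this is not fio's generator. It compares the share taken
by the top 5% of all ranks with the share taken by the top 5% of what
is left once the 50 hottest ranks are removed and the rest is
renormalized.

```shell
# Print the percentage of total weight held by the top 5% of the ranks
# lo..hi, with weight 1/k^theta for rank k (textbook Zipf, an assumption).
zipf_top_share() {
    awk -v lo="$1" -v hi="$2" -v theta="$3" 'BEGIN {
        cut = lo + int((hi - lo + 1) * 0.05) - 1   # last rank inside the top 5%
        for (k = lo; k <= hi; k++) {
            w = 1 / (k ^ theta)
            total += w                             # normalizing constant for lo..hi
            if (k <= cut) top += w
        }
        printf "%.1f\n", 100 * top / total
    }'
}

full=$(zipf_top_share 1 1000 1.2)    # whole distribution
rest=$(zipf_top_share 51 1000 1.2)   # hottest 50 ranks removed, renormalized

echo "top 5% of all ranks:       $full% of hits"
echo "top 5% of remaining ranks: $rest% of hits"
```

The second figure comes out well below the first. Neighboring weights
near the head differ a lot (1/2 is 1.5 times 1/3) while those further
out are nearly equal (1/200 is only 1.005 times 1/201), so the
remainder is much flatter than the original and does not repeat the
same skew.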

> Is the 5% range scattered over the disk, or is this similar to the
> zoned distribution, in that a contiguous space gets the 93% of the
> I/O? In that case, this is similar to the zoned distribution, right?

Try running fio and examining the offsets it produces. Then use some 
utilities to extract the offsets and analyze them:

$ fio --name=test --ioengine=null --filesize=10240 --bs=512 \
    --rw=randread --randrepeat=0 --random_distribution=zipf:1.2 \
    --debug=io |
    grep complete: | cut -d ':' -f3 | cut -d ',' -f1 | cut -d '=' -f2 |
    sort | uniq -c | sort -r
       7 0x1200
       5 0x0
       3 0xc00
       2 0x1c00
       1 0x600
       1 0x2200
       1 0x1600

In each row, the first number is the count and the second number is the 
offset. Thus you can see that random_distribution=zipf produces hot 
offsets scattered all over the file rather than packed into one 
contiguous region, which is different from zoned.
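
If hex offsets are hard to read, dividing each one by the block size
gives the block index. A small sketch, using the offsets from the run
above (your run will produce different ones because of --randrepeat=0):

```shell
# Offsets copied from the sample output above, most frequent first.
offsets="0x1200 0x0 0xc00 0x1c00 0x600 0x2200 0x1600"
bs=512   # matches --bs=512 in the fio command

blocks=""
for off in $offsets; do
    # Shell arithmetic accepts the 0x prefix for hexadecimal constants.
    blocks="$blocks $(($off / $bs))"
done
blocks="${blocks# }"   # drop the leading space

echo "$blocks"   # prints: 9 0 6 14 3 17 11
```

With a 10240-byte file and 512-byte blocks there are 20 blocks
(0 through 19), and the most frequently hit ones in this run (9, 0, 6,
14) are not neighbors.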

> c)  pareto -
> fio-genzipf -t pareto -i 0.04 -b 4096 -g 100GiB
> Generating Pareto distribution with 0.040000 input and 100 GiB size
> and 4096 block_size.
> 
>     Rows           Hits %         Sum %           # Hits          Size
> -----------------------------------------------------------------------
> Top   5.00% 93.04% 93.04% 24388831 93.04G
> |->  10.00%  0.99% 94.02%  259143 1012.28M
> |->  15.00%  0.60% 94.63%  158285 618.30M
> |->  20.00%  0.59% 95.22%  154826 604.79M
> |->  25.00%  0.35% 95.57%   92138 359.91M
> |->  30.00%  0.30% 95.87%   77413 302.39M
> |->  35.00%  0.30% 96.16%   77413 302.39M
> |->  40.00%  0.30% 96.46%   77413 302.39M
> |->  45.00%  0.30% 96.75%   77413 302.39M
> |->  50.00%  0.30% 97.05%   77413 302.39M
> |->  55.00%  0.30% 97.34%   77413 302.39M
> |->  60.00%  0.30% 97.64%   77413 302.39M
> |->  65.00%  0.30% 97.93%   77413 302.39M
> |->  70.00%  0.30% 98.23%   77413 302.39M
> |->  75.00%  0.30% 98.52%   77413 302.39M
> |->  80.00%  0.30% 98.82%   77413 302.39M
> |->  85.00%  0.30% 99.11%   77413 302.39M
> |->  90.00%  0.30% 99.41%   77413 302.39M
> |->  95.00%  0.30% 99.70%   77413 302.39M
> |-> 100.00%  0.30% 100.00%   77395 302.32M
> -----------------------------------------------------------------------
> Total 26214400
> 
> This looks pretty much the same to me as the zipf distribution above.
> 
> Is this understanding correct? Or am I missing something here?

There are plenty of discussions online about the relationship between 
the zipf and pareto distributions. I don't have any particular expertise 
to add to what you can already easily find.

Vincent
