Subject: Block IO: more io-cpu-affinity results
From: Alan D. Brunelle @ 2008-04-15 12:47 UTC
  To: linux-kernel; +Cc: Jens Axboe

On a 4-way IA64 box we are seeing definite improvements in overall
system responsiveness w/ the patch series currently in Jens'
io-cpu-affinity branch on his block IO git repository. In this
microbenchmark, I peg 4 processes to 4 separate processors: 2 are doing
CPU-intensive work (sqrts) and 2 are doing IO-intensive work (4KB direct
reads from RAID array cache - thus limiting physical disk accesses).
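
For reference, each pegged worker looks roughly like the sketch below.
This is not the actual harness - the CPU number, device path and run
length are placeholders - but it shows the shape of the two loads: a
sqrt loop and a 4KB O_DIRECT read loop, each pinned w/ sched_setaffinity():

/* Minimal sketch (not the real harness): peg one process to one CPU and
 * run either the sqrt loop or the 4KB O_DIRECT read loop for 4 minutes.
 * Build w/ -lm; device path and CPU number are placeholders. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <math.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t stop;
static void on_alarm(int sig) { (void)sig; stop = 1; }

int main(int argc, char **argv)
{
        /* usage: worker <cpu> sqrt | worker <cpu> io <device> */
        cpu_set_t mask;
        unsigned long count = 0;

        if (argc < 3)
                return 1;

        CPU_ZERO(&mask);
        CPU_SET(atoi(argv[1]), &mask);
        sched_setaffinity(0, sizeof(mask), &mask);      /* peg to one CPU */

        signal(SIGALRM, on_alarm);
        alarm(240);                                     /* 4-minute run */

        if (!strcmp(argv[2], "sqrt")) {                 /* CPU-intensive */
                double x = 0.0;

                while (!stop)
                        x += sqrt((double)count++);
                printf("%lu sqrts (%f)\n", count, x);
        } else {                                        /* IO-intensive */
                void *buf;
                int fd = open(argv[3], O_RDONLY | O_DIRECT);

                if (fd < 0)
                        return 1;
                posix_memalign(&buf, 4096, 4096);       /* O_DIRECT alignment */
                while (!stop) {
                        pread(fd, buf, 4096, 0);        /* same cached 4KB */
                        count++;
                }
                printf("%lu ios\n", count);
                close(fd);
        }
        return 0;
}

The IO worker re-reads the same block, so the requests are satisfied from
the array cache and the physical disks stay out of the picture.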

There are 2 variables: whether rq_affinity is on or off for the devices
under test for the IO-intensive procs, and whether the IO-intensive
procs are pegged onto the same CPU that is handling IRQs for their devices.
The results are averaged over 4-minute runs per permutation.
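
For the record, both knobs are simple sysfs/procfs writes. The sketch
below shows how one permutation might be set up; the device name, IRQ
number and CPU mask are placeholders, and it assumes the branch exposes
the flag as /sys/block/<dev>/queue/rq_affinity:

/* Sketch only: enable rq_affinity on the test device and steer its IRQ.
 * "sdc", IRQ 59 and the CPU mask are placeholders for this example. */
#include <stdio.h>

static void write_str(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (f) {
                fprintf(f, "%s\n", val);
                fclose(f);
        }
}

int main(void)
{
        /* rq=1: complete requests on the CPU that submitted them */
        write_str("/sys/block/sdc/queue/rq_affinity", "1");

        /* local=1 means the IO proc is pegged onto the CPU that services
         * the device's IRQ; here IRQ 59 is steered to CPU 2 (mask 0x4). */
        write_str("/proc/irq/59/smp_affinity", "4");

        return 0;
}

With that in place, local=1 vs local=0 is just a matter of which CPU the
IO-intensive proc is pegged to.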

When the IO-intensive procs are pegged onto the CPU that is handling
IRQs for their devices, we see no real difference between rq_affinity on
and off:

rq=0 local=1     66.616 (M sqrt/sec)   12.312 (K ios/sec)
rq=1 local=1     66.616 (M sqrt/sec)   12.314 (K ios/sec)

Both see 66.616 million sqrts per second and roughly 12,300 IOs per second.

However, when we move the 2 IO-intensive procs onto CPUs that are not
handling their devices' IRQs, we see a definite improvement from turning
rq_affinity on - both in the amount of CPU-intensive work we can do
(about 4%) and in the number of IOs per second achieved (about 1%):

rq=0 local=0     61.929 (M sqrt/sec)   11.911 (K ios/sec)
rq=1 local=0     64.386 (M sqrt/sec)   12.026 (K ios/sec)

Alan



Subject: Re: Block IO: more io-cpu-affinity results
From: Alan D. Brunelle @ 2008-04-15 17:04 UTC
  To: linux-kernel; +Cc: Jens Axboe

Alan D. Brunelle wrote:
> On a 4-way IA64 box we are seeing definite improvements in overall
> system responsiveness w/ the patch series currently in Jens'
> io-cpu-affinity branch on his block IO git repository. In this
> microbenchmark, I peg 4 processes to 4 separate processors: 2 are doing
> CPU-intensive work (sqrts) and 2 are doing IO-intensive work (4KB direct
> reads from RAID array cache - thus limiting physical disk accesses).
> 
> There are 2 variables: whether rq_affinity is on or off for the devices
> under test for the IO-intensive procs, and whether the IO-intensive
> procs are pegged onto the same CPU that is handling IRQs for their devices.
> The results are averaged over 4-minute runs per permutation.
> 
> When the IO-intensive procs are pegged onto the CPU that is handling
> IRQs for their devices, we see no real difference between rq_affinity on
> and off:
> 
> rq=0 local=1     66.616 (M sqrt/sec)   12.312 (K ios/sec)
> rq=1 local=1     66.616 (M sqrt/sec)   12.314 (K ios/sec)
> 
> Both see 66.616 million sqrts per second and roughly 12,300 IOs per second.
> 
> However, when we move the 2 IO-intensive procs onto CPUs that are not
> handling their devices' IRQs, we see a definite improvement from turning
> rq_affinity on - both in the amount of CPU-intensive work we can do
> (about 4%) and in the number of IOs per second achieved (about 1%):
> 
> rq=0 local=0     61.929 (M sqrt/sec)   11.911 (K ios/sec)
> rq=1 local=0     64.386 (M sqrt/sec)   12.026 (K ios/sec)
> 
> Alan
> 

This is even more noticeable on a larger system - a 16-way IA64 box - so
now 8 CPUs are running the IO-intensive load and 8 the CPU-intensive load.

rq=0 local=1    266.437 (M sqrt/sec)   50.018 (K ios/sec)
rq=1 local=1    266.399 (M sqrt/sec)   50.035 (K ios/sec)

rq=0 local=0    219.692 (M sqrt/sec)   39.842 (K ios/sec)
rq=1 local=0    247.406 (M sqrt/sec)   44.995 (K ios/sec)

By setting rq=1 when the IO-intensive procs run on CPUs other than the
ones handling their IRQs, we see a 12.61% improvement in the CPU-intensive
work and a 12.93% improvement in the IO-intensive loads.




However, if we remove the affinitization of the processes - just starting
up 16 processes (8 IO-intensive + 8 CPU-intensive) and letting the
scheduler associate processes w/ CPUs as normal - we see a very different
picture (single run of 4 minutes per rq value):

rq=0 local=0    261.050 (M sqrt/sec)   49.147 (K ios/sec)
rq=1 local=0    264.481 (M sqrt/sec)   42.817 (K ios/sec)

Setting rq to 1 yields a 1.31% improvement for the CPU-intensive
tasks, but a 12.88% reduction in IO-intensive performance.




But that is subject to some initial placement randomness; doing ten
30-second runs, I'm seeing:

rq=0 M sqrt/sec: min=228.877, avg=240.043, max=256.925
rq=1 M sqrt/sec: min=237.202, avg=249.405, max=258.302

rq=0 K ios/sec : min= 46.198, avg= 47.760, max= 50.057
rq=1 K ios/sec : min= 38.076, avg= 41.007, max= 43.271

That works out to a 14.14% decrease in ios/sec when rq=1, with only a
3.90% increase in the CPU-intensive performance.

I'll need to do some work to see what's causing the problem in these
latter tests...

Alan

