dmaengine Archive mirror
From: Andre Glover <andre.glover@linux.intel.com>
To: tom.zanussi@linux.intel.com, minchan@kernel.org,
	senozhatsky@chromium.org, hannes@cmpxchg.org,
	yosryahmed@google.com, nphamcs@gmail.com,
	chengming.zhou@linux.dev, herbert@gondor.apana.org.au,
	davem@davemloft.net, fenghua.yu@intel.com, dave.jiang@intel.com
Cc: wajdi.k.feghali@intel.com, james.guilford@intel.com,
	vinodh.gopal@intel.com, bala.seshasayee@intel.com,
	heath.caldwell@intel.com, kanchana.p.sridhar@intel.com,
	andre.glover@linux.intel.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, 21cnbao@gmail.com, ryan.roberts@arm.com,
	linux-crypto@vger.kernel.org, dmaengine@vger.kernel.org
Subject: [RFC PATCH 0/3] by_n compression and decompression with Intel IAA
Date: Wed,  1 May 2024 14:46:26 -0700
Message-ID: <cover.1714581792.git.andre.glover@linux.intel.com>


With the introduction of the 'canned' compression algorithm [1], we see
lower latency than 'dynamic' Deflate and a better compression ratio than
'fixed' Deflate.

When using IAA hardware-accelerated 'canned' compression, we can take
advantage of the IAA architecture to initiate independent parallel
operations for both compression and decompression. In support of mTHP and
large folio swap-in/out, we have developed an algorithm based on 'canned'
compression, called 'canned-by_n'. It exploits the multiple compression
and decompression engines in the IAA hardware to parallelize each single
compress or decompress operation, greatly reducing latency.

When using the 'canned-by_n' algorithm, the user provides an input buffer,
an output buffer, and a parameter N. The algorithm compresses (or
decompresses) the single input buffer into the single output buffer in
such a way that the operation can be split into up to N independent
operations that run in parallel.

Usage
=====

With the introduction of the 'canned-by_n' algorithm, the user only needs
to do the following to initiate an operation:

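struct crypto_acomp *tfm;
struct acomp_req *req;
DECLARE_CRYPTO_WAIT(wait);
int err;

tfm = crypto_alloc_acomp("deflate-iaa-canned-by_n", 0, 0);
if (IS_ERR(tfm))
	return PTR_ERR(tfm);

/* .... (request allocation and setup elided) .... */

/* Ignored by non 'by_n' algorithms */
req->by_n = N;

err = crypto_wait_req(crypto_acomp_compress(req), &wait);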

In the above example, the only new initialization for an acomp_req is
setting the by_n value N, where N is a power of 2 and 1 <= N <= 64. (64 is
the current limit; it can be raised depending on hardware capability.)

Performance
===========

'Canned-by_n' compression shows promising performance improvements when
applied to recent patches for multi-sized THPs in mm-unstable (7cca940d):
swapping out large folios and storing them in zram, as outlined in [2],
and swapping them back in as large folios [3]. Using a simple
madvise-based benchmark that swaps folios out and in, with folio contents
collected from SPEC benchmarks, we measured an over 16x improvement in
compression latency and close to 10x in decompression latency over lzo-rle
on 64KB mTHPs. This translates to a greater than 10x improvement in zram
write latency and a 7x improvement in zram read latency. The achieved
compression ratio of 2.8 is also better than that of lzo-rle. These
results were obtained with a by_n setting of 8; see the table below for
additional data.

Larger values of N reduce compression and decompression latency through
increased parallelism. At the same time, the overheads also grow with N
and, beyond a point, dominate the cost: in the table below, every latency
column is lowest at N = 8 and rises again from N = 16 onward. The
compression ratio also drops as the input is split further, from 2.93 at
N = 1 to 1.96 at N = 64.

The table below compares performance per 64KB folio with zram on
Sapphire Rapids, with the core frequency fixed at 2500 MHz:

+------------+-------------+---------+-------------+----------+----------+
|            | Compression | Decomp  | Compression | zram     | zram     |
| Algorithm  | latency     | latency | ratio       | write    | read     |
+------------+-------------+---------+-------------+----------+----------+
|            |       Median (ns)     |             |      Median (ns)    |
+------------+-------------+---------+-------------+----------+----------+
|            |             |         |             |          |          |
| IAA by_1   | 34,493      | 20,038  | 2.93        | 40,130   | 24,478   |
| IAA by_2   | 18,830      | 11,888  | 2.93        | 24,149   | 15,536   |
| IAA by_4   | 11,364      |  8,146  | 2.90        | 16,735   | 11,469   |
| IAA by_8   |  8,344      |  6,342  | 2.77        | 13,527   |  9,177   |
| IAA by_16  |  8,837      |  6,549  | 2.33        | 15,309   |  9,547   |
| IAA by_32  | 11,153      |  9,641  | 2.19        | 16,457   | 14,086   |
| IAA by_64  | 18,272      | 16,696  | 1.96        | 24,294   | 20,048   |
|            |             |         |             |          |          |
| lz4        | 139,190     | 33,687  | 2.40        | 144,940  | 37,312   |
|            |             |         |             |          |          |
| lzo-rle    | 138,235     | 61,055  | 2.52        | 143,666  | 64,321   |
|            |             |         |             |          |          |
| zstd       | 251,820     | 90,878  | 3.40        | 256,384  | 94,328   |
+------------+-------------+---------+-------------+----------+----------+

[1] https://lore.kernel.org/all/cover.1710969449.git.andre.glover@linux.intel.com/
[2] https://lore.kernel.org/linux-mm/20240327214816.31191-1-21cnbao@gmail.com/
[3] https://lore.kernel.org/linux-mm/20240304081348.197341-1-21cnbao@gmail.com/

Andre Glover (3):
  crypto: Add pre_alloc and post_free callbacks for acomp algorithms
  crypto: add by_n attribute to acomp_req
  crypto: Add deflate-canned-byN algorithm to IAA

 crypto/acompress.c                         |  13 +
 drivers/crypto/intel/iaa/iaa_crypto.h      |   9 +
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 402 ++++++++++++++++++++-
 include/crypto/acompress.h                 |   4 +
 include/crypto/internal/acompress.h        |   6 +
 5 files changed, 421 insertions(+), 13 deletions(-)

-- 
2.27.0

