All the mail mirrored from lore.kernel.org
* [PATCH v5 0/3] mm/zswap & crypto/compress: remove a couple of memcpy
@ 2024-02-20  6:44 Barry Song
  2024-02-20  6:44 ` [PATCH v5 1/3] crypto: introduce: acomp_is_async to expose if comp drivers might sleep Barry Song
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Barry Song @ 2024-02-20  6:44 UTC
  To: akpm, davem, hannes, herbert, linux-crypto, linux-mm, nphamcs,
	yosryahmed, zhouchengming
  Cc: chriscli, chrisl, ddstreet, linux-kernel, sjenning, vitaly.wool,
	Barry Song

From: Barry Song <v-songbaohua@oppo.com>

This patchset removes a couple of memcpy() calls in zswap and crypto
to improve zswap's performance.

Thanks to Chengming Zhou for the testing and perf data.
Quoting Chengming:
 I just tested these three patches on my server, found improvement in the
 kernel build testcase on a tmpfs with zswap (lz4 + zsmalloc) enabled.
 
         mm-stable 501a06fe8e4c  patched
 real    1m38.028s               1m32.317s
 user    19m11.482s              18m39.439s
 sys     19m26.445s              17m5.646s

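The timings above are easier to compare as relative reductions. A minimal C sketch of that arithmetic follows; to_seconds() and pct_reduction() are hypothetical helper names, and the inputs are simply the minute/second values from the table.

```c
#include <assert.h>

/* Convert a time(1)-style "XmY.YYYs" value into seconds. */
static double to_seconds(int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Percentage reduction from 'before' to 'after' (positive = faster). */
static double pct_reduction(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

Plugging in the table gives roughly a 5.8% reduction in real time, 2.8% in user time and 12.1% in sys time, with sys time showing the largest relative drop.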

This patchset applies to mm-unstable, as zswap has changed a lot
recently.

-v5:
  * remove the helper that exposed algorithm flags; instead, directly
    expose acomp_is_async(), which tests the ASYNC flag, as suggested
    by Herbert;
  * drop the cra_flags fixes for the Intel and HiSilicon async drivers;
    per Herbert, they are now separate patches[1]

[1] https://lore.kernel.org/linux-crypto/20240220044222.197614-1-v-songbaohua@oppo.com/

Barry Song (3):
  crypto: introduce: acomp_is_async to expose if comp drivers might
    sleep
  mm/zswap: remove the memcpy if acomp is not sleepable
  crypto: scompress: remove memcpy if sg_nents is 1

 crypto/scompress.c         | 36 +++++++++++++++++++++++++++++-------
 include/crypto/acompress.h |  6 ++++++
 mm/zswap.c                 |  6 ++++--
 3 files changed, 39 insertions(+), 9 deletions(-)

-- 
2.34.1



* [PATCH v5 1/3] crypto: introduce: acomp_is_async to expose if comp drivers might sleep
  2024-02-20  6:44 [PATCH v5 0/3] mm/zswap & crypto/compress: remove a couple of memcpy Barry Song
@ 2024-02-20  6:44 ` Barry Song
  2024-02-21  5:31   ` Herbert Xu
  2024-02-20  6:44 ` [PATCH v5 2/3] mm/zswap: remove the memcpy if acomp is not sleepable Barry Song
  2024-02-20  6:44 ` [PATCH v5 3/3] crypto: scompress: remove memcpy if sg_nents is 1 Barry Song
  2 siblings, 1 reply; 8+ messages in thread
From: Barry Song @ 2024-02-20  6:44 UTC
  To: akpm, davem, hannes, herbert, linux-crypto, linux-mm, nphamcs,
	yosryahmed, zhouchengming
  Cc: chriscli, chrisl, ddstreet, linux-kernel, sjenning, vitaly.wool,
	Barry Song

From: Barry Song <v-songbaohua@oppo.com>

Users of acomp might want to know whether an acomp transform is really
async so that they can optimize themselves. One typical user that can
benefit from exposing the async state is zswap.

In zswap, zsmalloc is the most commonly used allocator (and perhaps the
only one). With zsmalloc, we cannot sleep while the compressed memory
is mapped, so we copy it to a temporary buffer. Knowing that the
algorithm won't sleep lets zswap avoid that buffer, which gives a
noticeable improvement in zswap load/store latency.

Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 include/crypto/acompress.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/crypto/acompress.h b/include/crypto/acompress.h
index 574cffc90730..80e243611fe2 100644
--- a/include/crypto/acompress.h
+++ b/include/crypto/acompress.h
@@ -160,6 +160,12 @@ static inline void acomp_request_set_tfm(struct acomp_req *req,
 	req->base.tfm = crypto_acomp_tfm(tfm);
 }
 
+static inline bool acomp_is_async(struct crypto_acomp *tfm)
+{
+	return crypto_comp_alg_common(tfm)->base.cra_flags &
+	       CRYPTO_ALG_ASYNC;
+}
+
 static inline struct crypto_acomp *crypto_acomp_reqtfm(struct acomp_req *req)
 {
 	return __crypto_acomp_tfm(req->base.tfm);
-- 
2.34.1

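The new helper is a single bit test on the algorithm's cra_flags. The userspace model below mirrors that shape so its behaviour can be checked outside the kernel; struct model_alg, the MODEL_ALG_* values and the algorithm names are hypothetical stand-ins, not the real crypto API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this model only; the real CRYPTO_ALG_*
 * definitions live in include/linux/crypto.h. */
#define MODEL_ALG_OTHER 0x00000001u /* some unrelated flag */
#define MODEL_ALG_ASYNC 0x00000080u

/* Reduced stand-in for struct crypto_alg: a name and its flags. */
struct model_alg {
	const char *cra_name;
	uint32_t cra_flags;
};

/*
 * Same shape as the proposed acomp_is_async(): mask one bit out of
 * cra_flags. Returning bool normalizes the masked value (0x80 here)
 * to plain true/false for callers.
 */
static bool model_acomp_is_async(const struct model_alg *alg)
{
	return alg->cra_flags & MODEL_ALG_ASYNC;
}
```

A CPU-only compressor registered without the async bit reports false, so a caller such as zswap can treat it as non-sleeping.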


* [PATCH v5 2/3] mm/zswap: remove the memcpy if acomp is not sleepable
  2024-02-20  6:44 [PATCH v5 0/3] mm/zswap & crypto/compress: remove a couple of memcpy Barry Song
  2024-02-20  6:44 ` [PATCH v5 1/3] crypto: introduce: acomp_is_async to expose if comp drivers might sleep Barry Song
@ 2024-02-20  6:44 ` Barry Song
  2024-02-20  6:44 ` [PATCH v5 3/3] crypto: scompress: remove memcpy if sg_nents is 1 Barry Song
  2 siblings, 0 replies; 8+ messages in thread
From: Barry Song @ 2024-02-20  6:44 UTC
  To: akpm, davem, hannes, herbert, linux-crypto, linux-mm, nphamcs,
	yosryahmed, zhouchengming
  Cc: chriscli, chrisl, ddstreet, linux-kernel, sjenning, vitaly.wool,
	Barry Song

From: Barry Song <v-songbaohua@oppo.com>

Most compressors are CPU-based and won't sleep during compression or
decompression, so the memcpy is redundant for them and should be
removed.

This patch determines whether the algorithm may sleep by testing the
CRYPTO_ALG_ASYNC algorithm flag. Generally speaking, "async" and
"sleepable" are semantically similar but not identical. For
compression drivers, however, they are effectively equivalent, for two
reasons. First, the scompress drivers (crypto/deflate.c, lz4.c, zstd.c,
lzo.c, etc.) never sleep. Second, zRAM has used these scompress drivers
in atomic context for years without any concern that they might sleep.

One exception, per Herbert's clarification, is that an async driver can
sometimes still return synchronously. In that case we still do a
redundant memcpy. However, we cannot know whether a particular acomp
request will sleep unless the crypto layer exposes more per-request
details from offload drivers.

Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/zswap.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 51de79aa8659..ef782879291a 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -162,6 +162,7 @@ struct crypto_acomp_ctx {
 	struct crypto_wait wait;
 	u8 *buffer;
 	struct mutex mutex;
+	bool is_sleepable;
 };
 
 /*
@@ -973,6 +974,7 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
 		goto acomp_fail;
 	}
 	acomp_ctx->acomp = acomp;
+	acomp_ctx->is_sleepable = acomp_is_async(acomp);
 
 	req = acomp_request_alloc(acomp_ctx->acomp);
 	if (!req) {
@@ -1100,7 +1102,7 @@ static void zswap_decompress(struct zswap_entry *entry, struct page *page)
 	mutex_lock(&acomp_ctx->mutex);
 
 	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
-	if (!zpool_can_sleep_mapped(zpool)) {
+	if (acomp_ctx->is_sleepable && !zpool_can_sleep_mapped(zpool)) {
 		memcpy(acomp_ctx->buffer, src, entry->length);
 		src = acomp_ctx->buffer;
 		zpool_unmap_handle(zpool, entry->handle);
@@ -1114,7 +1116,7 @@ static void zswap_decompress(struct zswap_entry *entry, struct page *page)
 	BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
 	mutex_unlock(&acomp_ctx->mutex);
 
-	if (zpool_can_sleep_mapped(zpool))
+	if (!acomp_ctx->is_sleepable || zpool_can_sleep_mapped(zpool))
 		zpool_unmap_handle(zpool, entry->handle);
 }
 
-- 
2.34.1

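The two conditions changed in zswap_decompress() are complementary: the compressed data is copied (and the handle unmapped early) only when the algorithm may sleep and the zpool mapping cannot be held across a sleep; otherwise the handle stays mapped and is unmapped after decompression. A small userspace model of that decision, using hypothetical names (struct decomp_plan, plan_decompress()), checks that the handle is unmapped exactly once on every path.

```c
#include <assert.h>
#include <stdbool.h>

/* Which steps zswap_decompress() takes around the zpool mapping. */
struct decomp_plan {
	bool copy_to_buffer; /* memcpy into acomp_ctx->buffer, unmap early */
	bool unmap_late;     /* keep mapped, unmap after decompression */
};

static struct decomp_plan plan_decompress(bool is_sleepable,
					  bool can_sleep_mapped)
{
	struct decomp_plan p = {
		/* Condition guarding the memcpy in the patched code. */
		.copy_to_buffer = is_sleepable && !can_sleep_mapped,
		/* Condition guarding the final zpool_unmap_handle(). */
		.unmap_late = !is_sleepable || can_sleep_mapped,
	};

	/* By De Morgan's law the two conditions are complements, so
	 * exactly one unmap happens whichever path is taken. */
	assert(p.copy_to_buffer != p.unmap_late);
	return p;
}
```

The (false, false) case, a synchronous compressor over zsmalloc, is the one this patch speeds up: decompression reads straight from the mapping with no bounce copy.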


* [PATCH v5 3/3] crypto: scompress: remove memcpy if sg_nents is 1
  2024-02-20  6:44 [PATCH v5 0/3] mm/zswap & crypto/compress: remove a couple of memcpy Barry Song
  2024-02-20  6:44 ` [PATCH v5 1/3] crypto: introduce: acomp_is_async to expose if comp drivers might sleep Barry Song
  2024-02-20  6:44 ` [PATCH v5 2/3] mm/zswap: remove the memcpy if acomp is not sleepable Barry Song
@ 2024-02-20  6:44 ` Barry Song
  2024-02-21  5:35   ` Herbert Xu
  2 siblings, 1 reply; 8+ messages in thread
From: Barry Song @ 2024-02-20  6:44 UTC
  To: akpm, davem, hannes, herbert, linux-crypto, linux-mm, nphamcs,
	yosryahmed, zhouchengming
  Cc: chriscli, chrisl, ddstreet, linux-kernel, sjenning, vitaly.wool,
	Barry Song

From: Barry Song <v-songbaohua@oppo.com>

When sg_nents is 1, which is always true in the current kernel since
zswap, the only user, passes single-entry scatterlists, we can remove
two large memcpy() calls.

Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 crypto/scompress.c | 36 +++++++++++++++++++++++++++++-------
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/crypto/scompress.c b/crypto/scompress.c
index b108a30a7600..50a487eac792 100644
--- a/crypto/scompress.c
+++ b/crypto/scompress.c
@@ -117,6 +117,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
 	struct crypto_scomp *scomp = *tfm_ctx;
 	void **ctx = acomp_request_ctx(req);
 	struct scomp_scratch *scratch;
+	void *src, *dst;
 	unsigned int dlen;
 	int ret;
 
@@ -134,13 +135,25 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
 	scratch = raw_cpu_ptr(&scomp_scratch);
 	spin_lock(&scratch->lock);
 
-	scatterwalk_map_and_copy(scratch->src, req->src, 0, req->slen, 0);
+	if (sg_nents(req->src) == 1) {
+		src = kmap_local_page(sg_page(req->src)) + req->src->offset;
+	} else {
+		scatterwalk_map_and_copy(scratch->src, req->src, 0,
+					 req->slen, 0);
+		src = scratch->src;
+	}
+
+	if (req->dst && sg_nents(req->dst) == 1)
+		dst = kmap_local_page(sg_page(req->dst)) + req->dst->offset;
+	else
+		dst = scratch->dst;
+
 	if (dir)
-		ret = crypto_scomp_compress(scomp, scratch->src, req->slen,
-					    scratch->dst, &req->dlen, *ctx);
+		ret = crypto_scomp_compress(scomp, src, req->slen,
+					    dst, &req->dlen, *ctx);
 	else
-		ret = crypto_scomp_decompress(scomp, scratch->src, req->slen,
-					      scratch->dst, &req->dlen, *ctx);
+		ret = crypto_scomp_decompress(scomp, src, req->slen,
+					      dst, &req->dlen, *ctx);
 	if (!ret) {
 		if (!req->dst) {
 			req->dst = sgl_alloc(req->dlen, GFP_ATOMIC, NULL);
@@ -152,10 +165,19 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
 			ret = -ENOSPC;
 			goto out;
 		}
-		scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
-					 1);
+		if (dst == scratch->dst) {
+			scatterwalk_map_and_copy(scratch->dst, req->dst, 0,
+						 req->dlen, 1);
+		} else {
+			flush_dcache_page(sg_page(req->dst));
+		}
 	}
 out:
+	if (src != scratch->src)
+		kunmap_local(src);
+	if (dst != scratch->dst)
+		kunmap_local(dst);
+
 	spin_unlock(&scratch->lock);
 	return ret;
 }
-- 
2.34.1

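The scompress change follows a "direct pointer or bounce buffer" pattern: use the single segment in place when possible, otherwise gather into the per-CPU scratch buffer, and on exit release only what was actually mapped. The userspace model below covers the source side; struct seg, the live_maps counter standing in for kmap_local_page()/kunmap_local(), and the byte sum standing in for compression are all hypothetical, and the model deliberately ignores the HIGHMEM page-crossing question raised in review.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one scatterlist entry. */
struct seg {
	unsigned char *buf;
	size_t len;
};

/* Stand-in for outstanding kmap_local_page() mappings. */
static int live_maps;

/* Use the only segment in place, or gather all segments into scratch. */
static unsigned char *get_src(const struct seg *sg, int nents,
			      unsigned char *scratch)
{
	size_t off = 0;

	if (nents == 1) {
		live_maps++; /* "kmap_local_page()" */
		return sg[0].buf;
	}
	for (int i = 0; i < nents; i++) {
		memcpy(scratch + off, sg[i].buf, sg[i].len);
		off += sg[i].len;
	}
	return scratch;
}

/* Mirrors the patch's cleanup: only unmap what was mapped. */
static void put_src(unsigned char *src, const unsigned char *scratch)
{
	if (src != scratch)
		live_maps--; /* "kunmap_local()" */
}

/* Toy operation in place of crypto_scomp_compress(): sum the bytes. */
static unsigned long process(const struct seg *sg, int nents, size_t slen,
			     unsigned char *scratch)
{
	unsigned char *src = get_src(sg, nents, scratch);
	unsigned long sum = 0;

	for (size_t i = 0; i < slen; i++)
		sum += src[i];
	put_src(src, scratch);
	return sum;
}
```

Both paths must yield the same result; they differ only in whether a copy happens. The patch applies the same idea to dst, where the direct path calls flush_dcache_page() instead of copying the result back out of scratch->dst.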


* Re: [PATCH v5 1/3] crypto: introduce: acomp_is_async to expose if comp drivers might sleep
  2024-02-20  6:44 ` [PATCH v5 1/3] crypto: introduce: acomp_is_async to expose if comp drivers might sleep Barry Song
@ 2024-02-21  5:31   ` Herbert Xu
  0 siblings, 0 replies; 8+ messages in thread
From: Herbert Xu @ 2024-02-21  5:31 UTC
  To: Barry Song
  Cc: akpm, davem, hannes, linux-crypto, linux-mm, nphamcs, yosryahmed,
	zhouchengming, chriscli, chrisl, ddstreet, linux-kernel, sjenning,
	vitaly.wool, Barry Song

On Tue, Feb 20, 2024 at 07:44:12PM +1300, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> acomp's users might want to know if acomp is really async to
> optimize themselves. One typical user which can benefit from
> exposed async stat is zswap.
> 
> In zswap, zsmalloc is the most commonly used allocator for
> (and perhaps the only one). For zsmalloc, we cannot sleep
> while we map the compressed memory, so we copy it to a
> temporary buffer. By knowing the alg won't sleep can help
> zswap to avoid the need for a buffer. This shows noticeable
> improvement in load/store latency of zswap.
> 
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
>  include/crypto/acompress.h | 6 ++++++
>  1 file changed, 6 insertions(+)

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



* Re: [PATCH v5 3/3] crypto: scompress: remove memcpy if sg_nents is 1
  2024-02-20  6:44 ` [PATCH v5 3/3] crypto: scompress: remove memcpy if sg_nents is 1 Barry Song
@ 2024-02-21  5:35   ` Herbert Xu
  2024-02-21  5:55     ` Barry Song
  0 siblings, 1 reply; 8+ messages in thread
From: Herbert Xu @ 2024-02-21  5:35 UTC
  To: Barry Song
  Cc: akpm, davem, hannes, linux-crypto, linux-mm, nphamcs, yosryahmed,
	zhouchengming, chriscli, chrisl, ddstreet, linux-kernel, sjenning,
	vitaly.wool, Barry Song

On Tue, Feb 20, 2024 at 07:44:14PM +1300, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> while sg_nents is 1 which is always true for the current kernel
> as the only user - zswap is the case, we should remove two big
> memcpy.
> 
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  crypto/scompress.c | 36 +++++++++++++++++++++++++++++-------
>  1 file changed, 29 insertions(+), 7 deletions(-)

This patch is independent of the other two.  Please split it
out so I can apply it directly.

> @@ -134,13 +135,25 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
>  	scratch = raw_cpu_ptr(&scomp_scratch);
>  	spin_lock(&scratch->lock);
>  
> -	scatterwalk_map_and_copy(scratch->src, req->src, 0, req->slen, 0);
> +	if (sg_nents(req->src) == 1) {
> +		src = kmap_local_page(sg_page(req->src)) + req->src->offset;

What if the SG entry is longer than PAGE_SIZE (or indeed crosses a
page boundary)? I think the test needs to be strengthened.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH v5 3/3] crypto: scompress: remove memcpy if sg_nents is 1
  2024-02-21  5:35   ` Herbert Xu
@ 2024-02-21  5:55     ` Barry Song
  2024-02-21  6:10       ` Barry Song
  0 siblings, 1 reply; 8+ messages in thread
From: Barry Song @ 2024-02-21  5:55 UTC
  To: Herbert Xu, akpm
  Cc: davem, hannes, linux-crypto, linux-mm, nphamcs, yosryahmed,
	zhouchengming, chriscli, chrisl, ddstreet, linux-kernel, sjenning,
	vitaly.wool, Barry Song

On Wed, Feb 21, 2024 at 6:35 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Tue, Feb 20, 2024 at 07:44:14PM +1300, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > while sg_nents is 1 which is always true for the current kernel
> > as the only user - zswap is the case, we should remove two big
> > memcpy.
> >
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
> > ---
> >  crypto/scompress.c | 36 +++++++++++++++++++++++++++++-------
> >  1 file changed, 29 insertions(+), 7 deletions(-)
>
> This patch is independent of the other two.  Please split it
> out so I can apply it directly.

Ok. Indeed, patch 3/3 has no dependency on the other patches, so it
should apply cleanly to the crypto tree on its own :-)

Hi Andrew,
Would you please take patches 1/3 and 2/3 through the mm tree, given
Herbert's ack on 1/3?

>
> > @@ -134,13 +135,25 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> >       scratch = raw_cpu_ptr(&scomp_scratch);
> >       spin_lock(&scratch->lock);
> >
> > -     scatterwalk_map_and_copy(scratch->src, req->src, 0, req->slen, 0);
> > +     if (sg_nents(req->src) == 1) {
> > +             src = kmap_local_page(sg_page(req->src)) + req->src->offset;
>
> What if the SG entry is longer than PAGE_SIZE (or indeed crosses a
> page boundary)? I think the test needs to be strengthened.

I don't understand the problem with a single entry crossing two pages,
as the pages are contiguous in both physical and virtual address space
anyway. If they were not contiguous, they would be two entries.

>
> Thanks,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Thanks
Barry


* Re: [PATCH v5 3/3] crypto: scompress: remove memcpy if sg_nents is 1
  2024-02-21  5:55     ` Barry Song
@ 2024-02-21  6:10       ` Barry Song
  0 siblings, 0 replies; 8+ messages in thread
From: Barry Song @ 2024-02-21  6:10 UTC
  To: Herbert Xu, akpm
  Cc: davem, hannes, linux-crypto, linux-mm, nphamcs, yosryahmed,
	zhouchengming, chriscli, chrisl, ddstreet, linux-kernel, sjenning,
	vitaly.wool, Barry Song

On Wed, Feb 21, 2024 at 6:55 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Wed, Feb 21, 2024 at 6:35 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
> >
> > On Tue, Feb 20, 2024 at 07:44:14PM +1300, Barry Song wrote:
> > > From: Barry Song <v-songbaohua@oppo.com>
> > >
> > > while sg_nents is 1 which is always true for the current kernel
> > > as the only user - zswap is the case, we should remove two big
> > > memcpy.
> > >
> > > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > > Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
> > > ---
> > >  crypto/scompress.c | 36 +++++++++++++++++++++++++++++-------
> > >  1 file changed, 29 insertions(+), 7 deletions(-)
> >
> > This patch is independent of the other two.  Please split it
> > out so I can apply it directly.
>
> Ok. OTOH, patch 3/3 has no dependency with other patches. so patch
> 3/3 should be perfectly applicable to crypto :-)
>
> Hi Andrew,
> Would you please handle patch 1/3 and 2/3 in mm-tree given Herbert's ack on
> 1/3?
>
> >
> > > @@ -134,13 +135,25 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> > >       scratch = raw_cpu_ptr(&scomp_scratch);
> > >       spin_lock(&scratch->lock);
> > >
> > > -     scatterwalk_map_and_copy(scratch->src, req->src, 0, req->slen, 0);
> > > +     if (sg_nents(req->src) == 1) {
> > > +             src = kmap_local_page(sg_page(req->src)) + req->src->offset;
> >
> > What if the SG entry is longer than PAGE_SIZE (or indeed crosses a
> > page boundary)? I think the test needs to be strengthened.
>
> I don't understand what is the problem for a nents to cross two pages
> as anyway they are contiguous in both physical and virtual addresses.
> if they are not contiguous, they will be two nents.

On second thought, you are right. Sorry for the noise.
The test was run on a platform without HIGHMEM, such as arm64, so
kmap_local_page() always returns the page_address() of a normal-zone
page.

But on platforms with HIGHMEM, for example arm32 and x86_32, we can't
use the virtual address of the first page as the start address of both
pages, even though they are physically contiguous.

I will rework this. Ideally, we should still avoid the memcpy even when
two pages are within one entry :-) This is exactly the case for zswap,
as the dst is always two pages in case the compressed data is
longer than the original data.
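Herbert's concern comes down to arithmetic: on HIGHMEM systems kmap_local_page() maps a single page, so a lone SG entry can only be accessed through that one mapping if offset + length stays within PAGE_SIZE. The sketch below, assuming 4 KiB pages and using hypothetical helper names, counts the pages an entry touches and shows one possible strengthened fast-path test; it is an illustration, not necessarily the form the reworked patch will take.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u /* assumed 4 KiB pages */

/* Pages touched by bytes [offset, offset + len), with offset measured
 * from the start of the entry's first page. */
static size_t pages_spanned(size_t offset, size_t len)
{
	if (len == 0)
		return 0;
	return (offset + len - 1) / MODEL_PAGE_SIZE -
	       offset / MODEL_PAGE_SIZE + 1;
}

/* Hypothetical stricter check: a single entry that stays inside its
 * first page can be reached through one kmap_local_page() mapping. */
static bool fits_one_kmap(int nents, size_t offset, size_t len)
{
	return nents == 1 && offset + len <= MODEL_PAGE_SIZE;
}
```

A 2 * PAGE_SIZE destination like the zswap dst described above is a single entry yet spans two pages, so it fails this test and would fall back to the memcpy path.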

>
> >
> > Thanks,
> > --
> > Email: Herbert Xu <herbert@gondor.apana.org.au>
> > Home Page: http://gondor.apana.org.au/~herbert/
> > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>

Thanks
Barry


end of thread

Thread overview: 8+ messages
2024-02-20  6:44 [PATCH v5 0/3] mm/zswap & crypto/compress: remove a couple of memcpy Barry Song
2024-02-20  6:44 ` [PATCH v5 1/3] crypto: introduce: acomp_is_async to expose if comp drivers might sleep Barry Song
2024-02-21  5:31   ` Herbert Xu
2024-02-20  6:44 ` [PATCH v5 2/3] mm/zswap: remove the memcpy if acomp is not sleepable Barry Song
2024-02-20  6:44 ` [PATCH v5 3/3] crypto: scompress: remove memcpy if sg_nents is 1 Barry Song
2024-02-21  5:35   ` Herbert Xu
2024-02-21  5:55     ` Barry Song
2024-02-21  6:10       ` Barry Song
