All the mail mirrored from lore.kernel.org
* [PATCH v3] generic/692: generalize the test for non-4K Merkle tree block sizes
@ 2023-01-11 20:58 Eric Biggers
  2023-01-12 12:29 ` Ojaswin Mujoo
  2023-01-18 16:23 ` Zorro Lang
  0 siblings, 2 replies; 3+ messages in thread
From: Eric Biggers @ 2023-01-11 20:58 UTC (permalink / raw)
  To: fstests; +Cc: linux-fscrypt, Ojaswin Mujoo

From: Ojaswin Mujoo <ojaswin@linux.ibm.com>

Because the calculation assumed a 4K Merkle tree block size, the file
size computed for the second test case caused the test to take far too
long to hit EFBIG with larger block sizes such as 64K.  Fix this by
generalizing the calculation to any Merkle tree block size >= 1K.

Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Co-developed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
v3: hashes_per_block, not hash_per_block
v2: Cleaned up the original patch from Ojaswin:
    - Explained the calculations as they are done.
    - Considered 11 levels instead of 8, to account for 1K blocks
      potentially needing up to 11 levels.
    - Increased 'scale' for real number results, and don't use 'scale'
      at all for integer number results.
    - Improved a variable name.
    - Updated commit message.

 tests/generic/692 | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/tests/generic/692 b/tests/generic/692
index d6da734b..95f6ec04 100755
--- a/tests/generic/692
+++ b/tests/generic/692
@@ -51,18 +51,31 @@ _fsv_enable $fsv_file |& _filter_scratch
 #
 # The Merkle tree is stored with the leaf node level (L0) last, but it is
 # written first.  To get an interesting overflow, we need the maximum file size
-# (MAX) to be in the middle of L0 -- ideally near the beginning of L0 so that we
-# don't have to write many blocks before getting an error.
-#
-# With SHA-256 and 4K blocks, there are 128 hashes per block.  Thus, ignoring
-# padding, L0 is 1/128 of the file size while the other levels in total are
-# 1/128**2 + 1/128**3 + 1/128**4 + ... = 1/16256 of the file size.  So still
-# ignoring padding, for L0 start exactly at MAX, the file size must be s such
-# that s + s/16256 = MAX, i.e. s = MAX * (16256/16257).  Then to get a file size
-# where MAX occurs *near* the start of L0 rather than *at* the start, we can
-# just subtract an overestimate of the padding: 64K after the file contents,
-# then 4K per level, where the consideration of 8 levels is sufficient.
-sz=$(echo "scale=20; $max_sz * (16256/16257) - 65536 - 4096*8" | $BC -q | cut -d. -f1)
+# ($max_sz) to be in the middle of L0 -- ideally near the beginning of L0 so
+# that we don't have to write many blocks before getting an error.
+
+bs=$FSV_BLOCK_SIZE
+hash_size=32   # SHA-256
+hashes_per_block=$(echo "scale=30; $bs/$hash_size" | $BC -q)
+
+# Compute the proportion of the original file size that the non-leaf levels of
+# the Merkle tree take up.  Ignoring padding, this is 1/($hashes_per_block^2) +
+# 1/($hashes_per_block^3) + 1/($hashes_per_block^4) + ...  Compute it using the
+# formula for the sum of a geometric series: \sum_{k=0}^{\infty} ar^k = a/(1-r).
+a=$(echo "scale=30; 1/($hashes_per_block^2)" | $BC -q)
+r=$(echo "scale=30; 1/$hashes_per_block" | $BC -q)
+nonleaves_relative_size=$(echo "scale=30; $a/(1-$r)" | $BC -q)
+
+# Compute the original file size where the leaf level L0 starts at $max_sz.
+# Padding is still ignored, so the real value is slightly smaller than this.
+sz=$(echo "$max_sz/(1+$nonleaves_relative_size)" | $BC -q)
+
+# Finally, to get a file size where $max_sz occurs just after the start of L0
+# rather than *at* the start of L0, subtract an overestimate of the padding: 64K
+# after the file contents, then $bs per level for 11 levels.  11 levels is the
+# most levels that can ever be needed, assuming the block size is at least 1K.
+sz=$(echo "$sz - 65536 - $bs*11" | $BC -q)
+
 _fsv_scratch_begin_subtest "still too big: fail on first invalid merkle block"
 truncate -s $sz $fsv_file
 _fsv_enable $fsv_file |& _filter_scratch
-- 
2.39.0
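For readers following the arithmetic, the bc pipeline above can be sketched in
Python using exact rational arithmetic (this is an illustrative sketch, not
part of the patch; the 2**46 max_sz below is a hypothetical value, since the
real $max_sz is derived from the filesystem under test):

```python
from fractions import Fraction

def max_test_file_size(max_sz: int, block_size: int, hash_size: int = 32) -> int:
    """Mirror the patch's bc pipeline: pick a file size whose Merkle tree
    L0 level starts just before max_sz, so only a few hash blocks need to
    be written before the size limit is exceeded."""
    hashes_per_block = Fraction(block_size, hash_size)
    # Non-leaf levels take 1/h^2 + 1/h^3 + ... of the file size; sum the
    # geometric series a/(1 - r) with a = 1/h^2, r = 1/h.
    a = 1 / hashes_per_block**2
    r = 1 / hashes_per_block
    nonleaves_relative_size = a / (1 - r)
    # File size where L0 would start exactly at max_sz, ignoring padding.
    sz = Fraction(max_sz) / (1 + nonleaves_relative_size)
    # Subtract an overestimate of the padding: 64K after the contents plus
    # one block per level, for up to 11 levels (enough for >= 1K blocks).
    return int(sz) - 65536 - block_size * 11

# Hypothetical 2**46-byte limit with SHA-256 and 4K Merkle tree blocks:
print(max_test_file_size(2**46, 4096))
```

Using Fraction avoids having to pick a bc 'scale'; with h hashes per block,
the non-leaf overhead collapses to 1/(h*(h-1)), matching the 1/16256 figure
from the old 4K-only comment.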



* Re: [PATCH v3] generic/692: generalize the test for non-4K Merkle tree block sizes
  2023-01-11 20:58 [PATCH v3] generic/692: generalize the test for non-4K Merkle tree block sizes Eric Biggers
@ 2023-01-12 12:29 ` Ojaswin Mujoo
  2023-01-18 16:23 ` Zorro Lang
From: Ojaswin Mujoo @ 2023-01-12 12:29 UTC (permalink / raw)
  To: Eric Biggers; +Cc: fstests, linux-fscrypt

On Wed, Jan 11, 2023 at 12:58:28PM -0800, Eric Biggers wrote:
Hi Eric,

The comments look much better to me, thanks for the fixup.  Tested on
powerpc with 64K and 4K Merkle tree block sizes, and the test passes
fairly quickly, so the patch looks good to me!


* Re: [PATCH v3] generic/692: generalize the test for non-4K Merkle tree block sizes
  2023-01-11 20:58 [PATCH v3] generic/692: generalize the test for non-4K Merkle tree block sizes Eric Biggers
  2023-01-12 12:29 ` Ojaswin Mujoo
@ 2023-01-18 16:23 ` Zorro Lang
From: Zorro Lang @ 2023-01-18 16:23 UTC (permalink / raw)
  To: Eric Biggers; +Cc: fstests, linux-fscrypt

On Wed, Jan 11, 2023 at 12:58:28PM -0800, Eric Biggers wrote:
> From: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> 
> Because the calculation assumed a 4K Merkle tree block size, the file
> size computed for the second test case caused the test to take far too
> long to hit EFBIG with larger block sizes such as 64K.  Fix this by
> generalizing the calculation to any Merkle tree block size >= 1K.
> 
> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> Co-developed-by: Eric Biggers <ebiggers@google.com>
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
> v3: hashes_per_block, not hash_per_block
> v2: Cleaned up the original patch from Ojaswin:
>     - Explained the calculations as they are done.
>     - Considered 11 levels instead of 8, to account for 1K blocks
>       potentially needing up to 11 levels.
>     - Increased 'scale' for real number results, and don't use 'scale'
>       at all for integer number results.
>     - Improved a variable name.
>     - Updated commit message.

Hmm... there are two duplicate variables for "bc" --- BC and BC_PROG.  I
prefer the latter.  Anyway, it doesn't affect this patch; I'll send another
patch to change all uses of BC to BC_PROG, then remove BC.

Reviewed-by: Zorro Lang <zlang@redhat.com>



