From: Phillip Wood <phillip.wood123@gmail.com>
To: Patrick Steinhardt <ps@pks.im>, git@vger.kernel.org
Cc: Taylor Blau <me@ttaylorr.com>, Toon Claes <toon@iotcl.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 5/5] cat-file: Introduce new option to delimit output with NUL characters
Date: Mon, 5 Jun 2023 16:47:14 +0100
Message-ID: <9900512f-b0da-2e47-f1ab-ed51ec2c78ff@gmail.com>
In-Reply-To: <07a7c34615ec68fa42c725fd34d6144b6b191f03.1685710884.git.ps@pks.im>

Hi Patrick

On 02/06/2023 14:02, Patrick Steinhardt wrote:
> In db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with
> `-z`, 2022-07-22), we have introduced a new mode to read the input via
> NUL-delimited records instead of newline-delimited records. This allows
> the user to query for revisions that have newlines in their path
> component. While unusual, such queries are perfectly valid and thus it
> is clear that we should be able to support them properly.
> 
> Unfortunately, the commit only changed the input to be NUL-delimited,
> but didn't change the output at the same time. While this is fine for
> queries that are processed successfully, it is less so for queries that
> aren't. In the case of missing commits for example the result can become
> entirely unparsable:
> 
> ```
> $ printf "7ce4f05bae8120d9fa258e854a8669f6ea9cb7b1 blob 10\n1234567890\n\n\commit000" |
>      git cat-file --batch -z
> 7ce4f05bae8120d9fa258e854a8669f6ea9cb7b1 blob 10
> 1234567890
> 
> commit missing
> ```
> 
> This is of course a crafted query that is intentionally gaming the
> deficiency, but more benign queries that contain newlines would have
> similar problems.
> 
> Ideally, we should have also changed the output to be NUL-delimited when
> `-z` is specified to avoid this problem. As the input is NUL-delimited,
> it is clear that the output in this case cannot ever contain NUL
> characters by itself. Furthermore, Git does not allow NUL characters in
> revisions anyway, further stressing the point that using NUL-delimited
> output is safe. The only exception is of course the object data itself,
> but as git-cat-file(1) prints the size of the object data clients should
> read until that specified size has been consumed.
> 
> But even though `-z` has only been introduced a few releases ago in Git
> v2.38.0, changing the output format retroactively to also NUL-delimit
> output would be a backwards incompatible change. And while one could
> make the argument that the output is inherently broken already, we need
> to assume that there are existing users out there that use it just fine
> given that revisions containing newlines are quite exotic.
> 
> Instead, introduce a new option `-Z` that switches to NUL-delimited
> input and output. The old `-z` option is marked as deprecated with a
> hint that its output may become unparsable.

The commit message explains the problem well, and I agree that adding a
new option is the cleanest solution.
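
To make the parsing argument concrete, here is a rough sketch of how a
consumer might read one record of `--batch -Z` output in the default
format. It is not Git code: `struct batch_record` and
`parse_batch_record()` are names made up for illustration, and the
sketch assumes the whole output is already in memory.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader-side view of one `--batch -Z` record. */
struct batch_record {
	const char *header; /* NUL-terminated, e.g. "<oid> blob 10" */
	const char *data;   /* object contents, or NULL if there are none */
	size_t size;        /* number of bytes in data */
};

/*
 * Parse one record from buf[0..len). Returns the number of bytes
 * consumed, or 0 if the buffer holds no complete, well-formed record.
 */
size_t parse_batch_record(const char *buf, size_t len,
			  struct batch_record *rec)
{
	const char *nul = memchr(buf, '\0', len);
	const char *sp;
	char *end;
	size_t header_len;
	unsigned long size;

	if (!nul)
		return 0; /* header delimiter not seen yet */
	header_len = (size_t)(nul - buf) + 1; /* includes the NUL */

	rec->header = buf;
	rec->data = NULL;
	rec->size = 0;

	/* The size, if present, is the last space-separated field. */
	sp = strrchr(buf, ' ');
	if (!sp || !isdigit((unsigned char)sp[1]))
		return header_len; /* e.g. "<object> missing" */
	size = strtoul(sp + 1, &end, 10);
	if (*end != '\0')
		return header_len;

	/* The contents are followed by one more delimiter. */
	if (len - header_len <= size)
		return 0;
	if (buf[header_len + size] != '\0')
		return 0;

	rec->data = buf + header_len;
	rec->size = size;
	return header_len + size + 1;
}
```

The header is found by searching for NUL, while the contents are skipped
by length, so a newline (or a NUL) inside the object data cannot be
mistaken for a record boundary.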

> Co-authored-by: Toon Claes <toon@iotcl.com>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>   Documentation/git-cat-file.txt |  13 +++-
>   builtin/cat-file.c             |  55 +++++++++------
>   t/t1006-cat-file.sh            | 123 ++++++++++++++++++++++++---------
>   3 files changed, 137 insertions(+), 54 deletions(-)
> 
> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
> index 411de2e27d..b1f48fdfb1 100644
> --- a/Documentation/git-cat-file.txt
> +++ b/Documentation/git-cat-file.txt
> @@ -14,7 +14,7 @@ SYNOPSIS
>   'git cat-file' (-t | -s) [--allow-unknown-type] <object>
>   'git cat-file' (--batch | --batch-check | --batch-command) [--batch-all-objects]
>   	     [--buffer] [--follow-symlinks] [--unordered]
> -	     [--textconv | --filters] [-z]
> +	     [--textconv | --filters] [-z] [-Z]
>   'git cat-file' (--textconv | --filters)
>   	     [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]
>   
> @@ -246,6 +246,12 @@ respectively print:
>   -z::
>   	Only meaningful with `--batch`, `--batch-check`, or
>   	`--batch-command`; input is NUL-delimited instead of
> +	newline-delimited. This option is deprecated in favor of
> +	`-Z` as the output can otherwise be ambiguous.
> +
> +-Z::
> +	Only meaningful with `--batch`, `--batch-check`, or
> +	`--batch-command`; input and output is NUL-delimited instead of
>   	newline-delimited.

The documentation changes look good. I wonder if we should put the 
documentation for "-Z" above "-z" so users see the preferred option first.

>   
> @@ -384,6 +390,11 @@ notdir SP <size> LF
>   is printed when, during symlink resolution, a file is used as a
>   directory name.
>   
> +Alternatively, when `-Z` is passed, the line feeds in any of the above examples
> +are replaced with NUL terminators. This ensures that output will be parsable if
> +the output itself would contain a linefeed and is thus recommended for
> +scripting purposes.
> +
>   CAVEATS
>   -------
>   
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 001dcb24d6..90ef407d30 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -492,17 +494,18 @@ static void batch_object_write(const char *obj_name,
>   	strbuf_reset(scratch);
>   
>   	if (!opt->format) {
> -		print_default_format(scratch, data);
> +		print_default_format(scratch, data, opt);
>   	} else {
>   		strbuf_expand(scratch, opt->format, expand_format, data);
> -		strbuf_addch(scratch, '\n');
> +		strbuf_addch(scratch, opt->output_delim);
>   	}
>   
>   	batch_write(opt, scratch->buf, scratch->len);
>   
>   	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
> +		char buf[] = {opt->output_delim};

I found this a bit confusing. I think it would be clearer just to do:

	batch_write(opt, &opt->output_delim, 1);
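
For what it's worth, a standalone sketch of why the two forms should be
equivalent. The `struct batch_options` below is a cut-down stand-in
with only the one field, and `struct sink`/`sink_write()` are
hypothetical replacements for the real `batch_write()`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cut-down stand-in for Git's struct batch_options (one field only). */
struct batch_options {
	char output_delim; /* '\n' by default, '\0' with -Z */
};

/* Hypothetical sink that appends to memory instead of stdout. */
struct sink {
	char buf[64];
	size_t len;
};

void sink_write(struct sink *out, const void *data, size_t len)
{
	assert(out->len + len <= sizeof(out->buf));
	memcpy(out->buf + out->len, data, len);
	out->len += len;
}

/* As in the patch: copy the delimiter into a one-element array. */
void write_delim_via_copy(struct sink *out, const struct batch_options *opt)
{
	char buf[] = { opt->output_delim };

	sink_write(out, buf, 1);
}

/* As suggested above: pass the address of the member directly. */
void write_delim_direct(struct sink *out, const struct batch_options *opt)
{
	sink_write(out, &opt->output_delim, 1);
}
```

Both helpers hand the writer a pointer to a single byte holding the
delimiter; the second just skips the temporary.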

>   		print_object_or_die(opt, data);
> -		batch_write(opt, "\n", 1);
> +		batch_write(opt, buf, 1);
>   	}
>   }

> @@ -920,7 +927,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
>   		N_("git cat-file (-t | -s) [--allow-unknown-type] <object>"),
>   		N_("git cat-file (--batch | --batch-check | --batch-command) [--batch-all-objects]\n"
>   		   "             [--buffer] [--follow-symlinks] [--unordered]\n"
> -		   "             [--textconv | --filters] [-z]"),
> +		   "             [--textconv | --filters] [-z] [-Z]"),

If we're recommending that people don't use '-z' then maybe we should 
remove it from the synopsis and mark it with PARSE_OPT_HIDDEN below.

>   		N_("git cat-file (--textconv | --filters)\n"
>   		   "             [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]"),
>   		NULL
> @@ -950,6 +957,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
>   			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
>   			batch_option_callback),
>   		OPT_BOOL('z', NULL, &input_nul_terminated, N_("stdin is NUL-terminated")),
> +		OPT_BOOL('Z', NULL, &nul_terminated, N_("stdin and stdout is NUL-terminated")),

> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 7b985cfded..d73a0be1b9 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -392,17 +393,18 @@ deadbeef
>   
>   "
>   
> -batch_output="$hello_sha1 blob $hello_size
> -$hello_content
> -$commit_sha1 commit $commit_size
> -$commit_content
> -$tag_sha1 tag $tag_size
> -$tag_content
> -deadbeef missing
> - missing"
> +printf "%s\0" \
> +	"$hello_sha1 blob $hello_size" \
> +	"$hello_content" \
> +	"$commit_sha1 commit $commit_size" \
> +	"$commit_content" \
> +	"$tag_sha1 tag $tag_size" \
> +	"$tag_content" \
> +	"deadbeef missing" \
> +	" missing" >batch_output

I think writing the expected output to a file is a good change as we 
always use it with test_cmp. As "-z" is deprecated I think it makes 
sense to model the expected output for "-Z" and use tr for the "-z" 
tests as you have done here. It looks like we have good coverage of the 
new option.
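
The relationship the tests rely on is simply "same records, different
delimiter". As an illustration only, here is a C equivalent of the
`tr "\0" "\n"` step (`nul_to_lf()` is a made-up name, not a Git helper):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* In-place equivalent of `tr "\0" "\n"` for a buffer of known length. */
void nul_to_lf(char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (buf[i] == '\0')
			buf[i] = '\n';
}
```

Note that this translates every NUL byte, including any inside object
contents, so deriving the "-z" expectation this way only works while
the test objects contain no NUL bytes themselves.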

Thanks for working on this

Phillip

>   test_expect_success '--batch with multiple sha1s gives correct format' '
> -	echo "$batch_output" >expect &&
> +	tr "\0" "\n" <batch_output >expect &&
>   	echo_without_newline "$batch_input" >in &&
>   	git cat-file --batch <in >actual &&
>   	test_cmp expect actual
> @@ -410,11 +412,17 @@ test_expect_success '--batch with multiple sha1s gives correct format' '
>   
>   test_expect_success '--batch, -z with multiple sha1s gives correct format' '
>   	echo_without_newline_nul "$batch_input" >in &&
> -	echo "$batch_output" >expect &&
> +	tr "\0" "\n" <batch_output >expect &&
>   	git cat-file --batch -z <in >actual &&
>   	test_cmp expect actual
>   '
>   
> +test_expect_success '--batch, -Z with multiple sha1s gives correct format' '
> +	echo_without_newline_nul "$batch_input" >in &&
> +	git cat-file --batch -Z <in >actual &&
> +	test_cmp batch_output actual
> +'
> +
>   batch_check_input="$hello_sha1
>   $tree_sha1
>   $commit_sha1
> @@ -423,40 +431,55 @@ deadbeef
>   
>   "
>   
> -batch_check_output="$hello_sha1 blob $hello_size
> -$tree_sha1 tree $tree_size
> -$commit_sha1 commit $commit_size
> -$tag_sha1 tag $tag_size
> -deadbeef missing
> - missing"
> +printf "%s\0" \
> +	"$hello_sha1 blob $hello_size" \
> +	"$tree_sha1 tree $tree_size" \
> +	"$commit_sha1 commit $commit_size" \
> +	"$tag_sha1 tag $tag_size" \
> +	"deadbeef missing" \
> +	" missing" >batch_check_output
>   
>   test_expect_success "--batch-check with multiple sha1s gives correct format" '
> -	echo "$batch_check_output" >expect &&
> +	tr "\0" "\n" <batch_check_output >expect &&
>   	echo_without_newline "$batch_check_input" >in &&
>   	git cat-file --batch-check <in >actual &&
>   	test_cmp expect actual
>   '
>   
>   test_expect_success "--batch-check, -z with multiple sha1s gives correct format" '
> -	echo "$batch_check_output" >expect &&
> +	tr "\0" "\n" <batch_check_output >expect &&
>   	echo_without_newline_nul "$batch_check_input" >in &&
>   	git cat-file --batch-check -z <in >actual &&
>   	test_cmp expect actual
>   '
>   
> -test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
> +test_expect_success "--batch-check, -Z with multiple sha1s gives correct format" '
> +	echo_without_newline_nul "$batch_check_input" >in &&
> +	git cat-file --batch-check -Z <in >actual &&
> +	test_cmp batch_check_output actual
> +'
> +
> +test_expect_success FUNNYNAMES 'setup with newline in input' '
>   	touch -- "newline${LF}embedded" &&
>   	git add -- "newline${LF}embedded" &&
>   	git commit -m "file with newline embedded" &&
>   	test_tick &&
>   
> -	printf "HEAD:newline${LF}embedded" >in &&
> -	git cat-file --batch-check -z <in >actual &&
> +	printf "HEAD:newline${LF}embedded" >in
> +'
>   
> +test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
> +	git cat-file --batch-check -z <in >actual &&
>   	echo "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect &&
>   	test_cmp expect actual
>   '
>   
> +test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
> +	git cat-file --batch-check -Z <in >actual &&
> +	printf "%s\0" "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect &&
> +	test_cmp expect actual
> +'
> +
>   batch_command_multiple_info="info $hello_sha1
>   info $tree_sha1
>   info $commit_sha1
> @@ -480,7 +503,13 @@ test_expect_success '--batch-command with multiple info calls gives correct form
>   	echo "$batch_command_multiple_info" | tr "\n" "\0" >in &&
>   	git cat-file --batch-command --buffer -z <in >actual &&
>   
> -	test_cmp expect actual
> +	test_cmp expect actual &&
> +
> +	echo "$batch_command_multiple_info" | tr "\n" "\0" >in &&
> +	tr "\n" "\0" <expect >expect_nul &&
> +	git cat-file --batch-command --buffer -Z <in >actual &&
> +
> +	test_cmp expect_nul actual
>   '
>   
>   batch_command_multiple_contents="contents $hello_sha1
> @@ -490,15 +519,15 @@ contents deadbeef
>   flush"
>   
>   test_expect_success '--batch-command with multiple command calls gives correct format' '
> -	cat >expect <<-EOF &&
> -	$hello_sha1 blob $hello_size
> -	$hello_content
> -	$commit_sha1 commit $commit_size
> -	$commit_content
> -	$tag_sha1 tag $tag_size
> -	$tag_content
> -	deadbeef missing
> -	EOF
> +	printf "%s\0" \
> +		"$hello_sha1 blob $hello_size" \
> +		"$hello_content" \
> +		"$commit_sha1 commit $commit_size" \
> +		"$commit_content" \
> +		"$tag_sha1 tag $tag_size" \
> +		"$tag_content" \
> +		"deadbeef missing" >expect_nul &&
> +	tr "\0" "\n" <expect_nul >expect &&
>   
>   	echo "$batch_command_multiple_contents" >in &&
>   	git cat-file --batch-command --buffer <in >actual &&
> @@ -508,7 +537,12 @@ test_expect_success '--batch-command with multiple command calls gives correct f
>   	echo "$batch_command_multiple_contents" | tr "\n" "\0" >in &&
>   	git cat-file --batch-command --buffer -z <in >actual &&
>   
> -	test_cmp expect actual
> +	test_cmp expect actual &&
> +
> +	echo "$batch_command_multiple_contents" | tr "\n" "\0" >in &&
> +	git cat-file --batch-command --buffer -Z <in >actual &&
> +
> +	test_cmp expect_nul actual
>   '
>   
>   test_expect_success 'setup blobs which are likely to delta' '
> @@ -848,6 +882,13 @@ test_expect_success 'git cat-file --batch-check --follow-symlinks works for brok
>   	test_cmp expect actual
>   '
>   
> +test_expect_success 'git cat-file --batch-check --follow-symlinks -Z works for broken in-repo, same-dir links' '
> +	printf "HEAD:broken-same-dir-link\0" >in &&
> +	printf "dangling 25\0HEAD:broken-same-dir-link\0" >expect &&
> +	git cat-file --batch-check --follow-symlinks -Z <in >actual &&
> +	test_cmp expect actual
> +'
> +
>   test_expect_success 'git cat-file --batch-check --follow-symlinks works for same-dir links-to-links' '
>   	echo HEAD:link-to-link | git cat-file --batch-check --follow-symlinks >actual &&
>   	test_cmp found actual
> @@ -862,6 +903,15 @@ test_expect_success 'git cat-file --batch-check --follow-symlinks works for pare
>   	test_cmp expect actual
>   '
>   
> +test_expect_success 'git cat-file --batch-check --follow-symlinks -Z works for parent-dir links' '
> +	echo HEAD:dir/parent-dir-link | git cat-file --batch-check --follow-symlinks >actual &&
> +	test_cmp found actual &&
> +	printf "notdir 29\0HEAD:dir/parent-dir-link/nope\0" >expect &&
> +	printf "HEAD:dir/parent-dir-link/nope\0" >in &&
> +	git cat-file --batch-check --follow-symlinks -Z <in >actual &&
> +	test_cmp expect actual
> +'
> +
>   test_expect_success 'git cat-file --batch-check --follow-symlinks works for .. links' '
>   	echo dangling 22 >expect &&
>   	echo HEAD:dir/link-dir/nope >>expect &&
> @@ -976,6 +1026,13 @@ test_expect_success 'git cat-file --batch-check --follow-symlink breaks loops' '
>   	test_cmp expect actual
>   '
>   
> +test_expect_success 'git cat-file --batch-check --follow-symlink -Z breaks loops' '
> +	printf "loop 10\0HEAD:loop1\0" >expect &&
> +	printf "HEAD:loop1\0" >in &&
> +	git cat-file --batch-check --follow-symlinks -Z <in >actual &&
> +	test_cmp expect actual
> +'
> +
>   test_expect_success 'git cat-file --batch --follow-symlink returns correct sha and mode' '
>   	echo HEAD:morx | git cat-file --batch >expect &&
>   	echo HEAD:morx | git cat-file --batch --follow-symlinks >actual &&
