From: Phillip Wood <phillip.wood123@gmail.com>
To: Patrick Steinhardt <ps@pks.im>, git@vger.kernel.org
Cc: Taylor Blau <me@ttaylorr.com>, Toon Claes <toon@iotcl.com>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 5/5] cat-file: Introduce new option to delimit output with NUL characters
Date: Mon, 5 Jun 2023 16:47:14 +0100 [thread overview]
Message-ID: <9900512f-b0da-2e47-f1ab-ed51ec2c78ff@gmail.com> (raw)
In-Reply-To: <07a7c34615ec68fa42c725fd34d6144b6b191f03.1685710884.git.ps@pks.im>
Hi Patrick
On 02/06/2023 14:02, Patrick Steinhardt wrote:
> In db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with
> `-z`, 2022-07-22), we have introduced a new mode to read the input via
> NUL-delimited records instead of newline-delimited records. This allows
> the user to query for revisions that have newlines in their path
> component. While unusual, such queries are perfectly valid and thus it
> is clear that we should be able to support them properly.
>
> Unfortunately, the commit only changed the input to be NUL-delimited,
> but didn't change the output at the same time. While this is fine for
> queries that are processed successfully, it is less so for queries that
> aren't. In the case of missing commits for example the result can become
> entirely unparsable:
>
> ```
> $ printf "7ce4f05bae8120d9fa258e854a8669f6ea9cb7b1 blob 10\n1234567890\n\n\commit000" |
> git cat-file --batch -z
> 7ce4f05bae8120d9fa258e854a8669f6ea9cb7b1 blob 10
> 1234567890
>
> commit missing
> ```
>
> This is of course a crafted query that is intentionally gaming the
> deficiency, but more benign queries that contain newlines would have
> similar problems.
>
> Ideally, we should have also changed the output to be NUL-delimited when
> `-z` is specified to avoid this problem. As the input is NUL-delimited,
> it is clear that the output in this case cannot ever contain NUL
> characters by itself. Furthermore, Git does not allow NUL characters in
> revisions anyway, further stressing the point that using NUL-delimited
> output is safe. The only exception is of course the object data itself,
> but as git-cat-file(1) prints the size of the object data clients should
> read until that specified size has been consumed.
>
> But even though `-z` has only been introduced a few releases ago in Git
> v2.38.0, changing the output format retroactively to also NUL-delimit
> output would be a backwards incompatible change. And while one could
> make the argument that the output is inherently broken already, we need
> to assume that there are existing users out there that use it just fine
> given that revisions containing newlines are quite exotic.
>
> Instead, introduce a new option `-Z` that switches to NUL-delimited
> input and output. The old `-z` option is marked as deprecated with a
> hint that its output may become unparsable.
The commit message explains the problem well, I agree adding a new
option is the cleanest solution.
> Co-authored-by: Toon Claes <toon@iotcl.com>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> Documentation/git-cat-file.txt | 13 +++-
> builtin/cat-file.c | 55 +++++++++------
> t/t1006-cat-file.sh | 123 ++++++++++++++++++++++++---------
> 3 files changed, 137 insertions(+), 54 deletions(-)
>
> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
> index 411de2e27d..b1f48fdfb1 100644
> --- a/Documentation/git-cat-file.txt
> +++ b/Documentation/git-cat-file.txt
> @@ -14,7 +14,7 @@ SYNOPSIS
> 'git cat-file' (-t | -s) [--allow-unknown-type] <object>
> 'git cat-file' (--batch | --batch-check | --batch-command) [--batch-all-objects]
> [--buffer] [--follow-symlinks] [--unordered]
> - [--textconv | --filters] [-z]
> + [--textconv | --filters] [-z] [-Z]
> 'git cat-file' (--textconv | --filters)
> [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]
>
> @@ -246,6 +246,12 @@ respectively print:
> -z::
> Only meaningful with `--batch`, `--batch-check`, or
> `--batch-command`; input is NUL-delimited instead of
> + newline-delimited. This option is deprecated in favor of
> + `-Z` as the output can otherwise be ambiguous.
> +
> +-Z::
> + Only meaningful with `--batch`, `--batch-check`, or
> + `--batch-command`; input and output is NUL-delimited instead of
> newline-delimited.
The documentation changes look good. I wonder if we should put the
documentation for "-Z" above "-z" so users see the preferred option first.
>
> @@ -384,6 +390,11 @@ notdir SP <size> LF
> is printed when, during symlink resolution, a file is used as a
> directory name.
>
> +Alternatively, when `-Z` is passed, the line feeds in any of the above examples
> +are replaced with NUL terminators. This ensures that output will be parsable if
> +the output itself would contain a linefeed and is thus recommended for
> +scripting purposes.
> +
> CAVEATS
> -------
>
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 001dcb24d6..90ef407d30 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -492,17 +494,18 @@ static void batch_object_write(const char *obj_name,
> strbuf_reset(scratch);
>
> if (!opt->format) {
> - print_default_format(scratch, data);
> + print_default_format(scratch, data, opt);
> } else {
> strbuf_expand(scratch, opt->format, expand_format, data);
> - strbuf_addch(scratch, '\n');
> + strbuf_addch(scratch, opt->output_delim);
> }
>
> batch_write(opt, scratch->buf, scratch->len);
>
> if (opt->batch_mode == BATCH_MODE_CONTENTS) {
> + char buf[] = {opt->output_delim};
I found this a bit confusing, I think it would be clearer just to do
batch_write(opt, &opt->output_delim, 1);
> print_object_or_die(opt, data);
> - batch_write(opt, "\n", 1);
> + batch_write(opt, buf, 1);
> }
> }
> @@ -920,7 +927,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
> N_("git cat-file (-t | -s) [--allow-unknown-type] <object>"),
> N_("git cat-file (--batch | --batch-check | --batch-command) [--batch-all-objects]\n"
> " [--buffer] [--follow-symlinks] [--unordered]\n"
> - " [--textconv | --filters] [-z]"),
> + " [--textconv | --filters] [-z] [-Z]"),
If we're recommending that people don't use '-z' then maybe we should
remove it from the synopsis and add OPT_HIDDEN to it below.
> N_("git cat-file (--textconv | --filters)\n"
> " [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]"),
> NULL
> @@ -950,6 +957,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
> PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
> batch_option_callback),
> OPT_BOOL('z', NULL, &input_nul_terminated, N_("stdin is NUL-terminated")),
> + OPT_BOOL('Z', NULL, &nul_terminated, N_("stdin and stdout is NUL-terminated")),
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 7b985cfded..d73a0be1b9 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -392,17 +393,18 @@ deadbeef
>
> "
>
> -batch_output="$hello_sha1 blob $hello_size
> -$hello_content
> -$commit_sha1 commit $commit_size
> -$commit_content
> -$tag_sha1 tag $tag_size
> -$tag_content
> -deadbeef missing
> - missing"
> +printf "%s\0" \
> + "$hello_sha1 blob $hello_size" \
> + "$hello_content" \
> + "$commit_sha1 commit $commit_size" \
> + "$commit_content" \
> + "$tag_sha1 tag $tag_size" \
> + "$tag_content" \
> + "deadbeef missing" \
> + " missing" >batch_output
I think writing the expected output to a file is a good change as we
always use it with test_cmp. As "-z" is deprecated I think it makes
sense to model the expected output for "-Z" and use tr for the "-z"
tests as you have done here. It looks like we have good coverage of the
new option.
Thanks for working on this
Phillip
> test_expect_success '--batch with multiple sha1s gives correct format' '
> - echo "$batch_output" >expect &&
> + tr "\0" "\n" <batch_output >expect &&
> echo_without_newline "$batch_input" >in &&
> git cat-file --batch <in >actual &&
> test_cmp expect actual
> @@ -410,11 +412,17 @@ test_expect_success '--batch with multiple sha1s gives correct format' '
>
> test_expect_success '--batch, -z with multiple sha1s gives correct format' '
> echo_without_newline_nul "$batch_input" >in &&
> - echo "$batch_output" >expect &&
> + tr "\0" "\n" <batch_output >expect &&
> git cat-file --batch -z <in >actual &&
> test_cmp expect actual
> '
>
> +test_expect_success '--batch, -Z with multiple sha1s gives correct format' '
> + echo_without_newline_nul "$batch_input" >in &&
> + git cat-file --batch -Z <in >actual &&
> + test_cmp batch_output actual
> +'
> +
> batch_check_input="$hello_sha1
> $tree_sha1
> $commit_sha1
> @@ -423,40 +431,55 @@ deadbeef
>
> "
>
> -batch_check_output="$hello_sha1 blob $hello_size
> -$tree_sha1 tree $tree_size
> -$commit_sha1 commit $commit_size
> -$tag_sha1 tag $tag_size
> -deadbeef missing
> - missing"
> +printf "%s\0" \
> + "$hello_sha1 blob $hello_size" \
> + "$tree_sha1 tree $tree_size" \
> + "$commit_sha1 commit $commit_size" \
> + "$tag_sha1 tag $tag_size" \
> + "deadbeef missing" \
> + " missing" >batch_check_output
>
> test_expect_success "--batch-check with multiple sha1s gives correct format" '
> - echo "$batch_check_output" >expect &&
> + tr "\0" "\n" <batch_check_output >expect &&
> echo_without_newline "$batch_check_input" >in &&
> git cat-file --batch-check <in >actual &&
> test_cmp expect actual
> '
>
> test_expect_success "--batch-check, -z with multiple sha1s gives correct format" '
> - echo "$batch_check_output" >expect &&
> + tr "\0" "\n" <batch_check_output >expect &&
> echo_without_newline_nul "$batch_check_input" >in &&
> git cat-file --batch-check -z <in >actual &&
> test_cmp expect actual
> '
>
> -test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
> +test_expect_success "--batch-check, -Z with multiple sha1s gives correct format" '
> + echo_without_newline_nul "$batch_check_input" >in &&
> + git cat-file --batch-check -Z <in >actual &&
> + test_cmp batch_check_output actual
> +'
> +
> +test_expect_success FUNNYNAMES 'setup with newline in input' '
> touch -- "newline${LF}embedded" &&
> git add -- "newline${LF}embedded" &&
> git commit -m "file with newline embedded" &&
> test_tick &&
>
> - printf "HEAD:newline${LF}embedded" >in &&
> - git cat-file --batch-check -z <in >actual &&
> + printf "HEAD:newline${LF}embedded" >in
> +'
>
> +test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
> + git cat-file --batch-check -z <in >actual &&
> echo "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect &&
> test_cmp expect actual
> '
>
> +test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
> + git cat-file --batch-check -Z <in >actual &&
> + printf "%s\0" "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect &&
> + test_cmp expect actual
> +'
> +
> batch_command_multiple_info="info $hello_sha1
> info $tree_sha1
> info $commit_sha1
> @@ -480,7 +503,13 @@ test_expect_success '--batch-command with multiple info calls gives correct form
> echo "$batch_command_multiple_info" | tr "\n" "\0" >in &&
> git cat-file --batch-command --buffer -z <in >actual &&
>
> - test_cmp expect actual
> + test_cmp expect actual &&
> +
> + echo "$batch_command_multiple_info" | tr "\n" "\0" >in &&
> + tr "\n" "\0" <expect >expect_nul &&
> + git cat-file --batch-command --buffer -Z <in >actual &&
> +
> + test_cmp expect_nul actual
> '
>
> batch_command_multiple_contents="contents $hello_sha1
> @@ -490,15 +519,15 @@ contents deadbeef
> flush"
>
> test_expect_success '--batch-command with multiple command calls gives correct format' '
> - cat >expect <<-EOF &&
> - $hello_sha1 blob $hello_size
> - $hello_content
> - $commit_sha1 commit $commit_size
> - $commit_content
> - $tag_sha1 tag $tag_size
> - $tag_content
> - deadbeef missing
> - EOF
> + printf "%s\0" \
> + "$hello_sha1 blob $hello_size" \
> + "$hello_content" \
> + "$commit_sha1 commit $commit_size" \
> + "$commit_content" \
> + "$tag_sha1 tag $tag_size" \
> + "$tag_content" \
> + "deadbeef missing" >expect_nul &&
> + tr "\0" "\n" <expect_nul >expect &&
>
> echo "$batch_command_multiple_contents" >in &&
> git cat-file --batch-command --buffer <in >actual &&
> @@ -508,7 +537,12 @@ test_expect_success '--batch-command with multiple command calls gives correct f
> echo "$batch_command_multiple_contents" | tr "\n" "\0" >in &&
> git cat-file --batch-command --buffer -z <in >actual &&
>
> - test_cmp expect actual
> + test_cmp expect actual &&
> +
> + echo "$batch_command_multiple_contents" | tr "\n" "\0" >in &&
> + git cat-file --batch-command --buffer -Z <in >actual &&
> +
> + test_cmp expect_nul actual
> '
>
> test_expect_success 'setup blobs which are likely to delta' '
> @@ -848,6 +882,13 @@ test_expect_success 'git cat-file --batch-check --follow-symlinks works for brok
> test_cmp expect actual
> '
>
> +test_expect_success 'git cat-file --batch-check --follow-symlinks -Z works for broken in-repo, same-dir links' '
> + printf "HEAD:broken-same-dir-link\0" >in &&
> + printf "dangling 25\0HEAD:broken-same-dir-link\0" >expect &&
> + git cat-file --batch-check --follow-symlinks -Z <in >actual &&
> + test_cmp expect actual
> +'
> +
> test_expect_success 'git cat-file --batch-check --follow-symlinks works for same-dir links-to-links' '
> echo HEAD:link-to-link | git cat-file --batch-check --follow-symlinks >actual &&
> test_cmp found actual
> @@ -862,6 +903,15 @@ test_expect_success 'git cat-file --batch-check --follow-symlinks works for pare
> test_cmp expect actual
> '
>
> +test_expect_success 'git cat-file --batch-check --follow-symlinks -Z works for parent-dir links' '
> + echo HEAD:dir/parent-dir-link | git cat-file --batch-check --follow-symlinks >actual &&
> + test_cmp found actual &&
> + printf "notdir 29\0HEAD:dir/parent-dir-link/nope\0" >expect &&
> + printf "HEAD:dir/parent-dir-link/nope\0" >in &&
> + git cat-file --batch-check --follow-symlinks -Z <in >actual &&
> + test_cmp expect actual
> +'
> +
> test_expect_success 'git cat-file --batch-check --follow-symlinks works for .. links' '
> echo dangling 22 >expect &&
> echo HEAD:dir/link-dir/nope >>expect &&
> @@ -976,6 +1026,13 @@ test_expect_success 'git cat-file --batch-check --follow-symlink breaks loops' '
> test_cmp expect actual
> '
>
> +test_expect_success 'git cat-file --batch-check --follow-symlink -Z breaks loops' '
> + printf "loop 10\0HEAD:loop1\0" >expect &&
> + printf "HEAD:loop1\0" >in &&
> + git cat-file --batch-check --follow-symlinks -Z <in >actual &&
> + test_cmp expect actual
> +'
> +
> test_expect_success 'git cat-file --batch --follow-symlink returns correct sha and mode' '
> echo HEAD:morx | git cat-file --batch >expect &&
> echo HEAD:morx | git cat-file --batch --follow-symlinks >actual &&
next prev parent reply other threads:[~2023-06-05 15:48 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-02 13:02 [PATCH 0/5] cat-file: introduce NUL-terminated output format Patrick Steinhardt
2023-06-02 13:02 ` [PATCH 1/5] t1006: don't strip timestamps from expected results Patrick Steinhardt
2023-06-02 13:02 ` [PATCH 2/5] t1006: modernize test style to use `test_cmp` Patrick Steinhardt
2023-06-02 13:02 ` [PATCH 3/5] strbuf: provide CRLF-aware helper to read until a specified delimiter Patrick Steinhardt
2023-06-02 13:02 ` [PATCH 4/5] cat-file: simplify reading from standard input Patrick Steinhardt
2023-06-02 13:02 ` [PATCH 5/5] cat-file: Introduce new option to delimit output with NUL characters Patrick Steinhardt
2023-06-05 15:47 ` Phillip Wood [this message]
2023-06-05 23:54 ` Junio C Hamano
2023-06-06 4:52 ` Patrick Steinhardt
2023-06-06 5:22 ` Junio C Hamano
2023-06-06 5:31 ` Patrick Steinhardt
2023-06-12 19:12 ` Junio C Hamano
2023-06-06 5:00 ` Patrick Steinhardt
2023-06-06 1:23 ` Junio C Hamano
2023-06-03 1:44 ` [PATCH 0/5] cat-file: introduce NUL-terminated output format Junio C Hamano
2023-06-06 5:19 ` [PATCH v2 0/5] catfile: " Patrick Steinhardt
2023-06-06 5:19 ` [PATCH v2 1/5] t1006: don't strip timestamps from expected results Patrick Steinhardt
2023-06-06 5:19 ` [PATCH v2 2/5] t1006: modernize test style to use `test_cmp` Patrick Steinhardt
2023-06-06 5:19 ` [PATCH v2 3/5] strbuf: provide CRLF-aware helper to read until a specified delimiter Patrick Steinhardt
2023-06-06 5:19 ` [PATCH v2 4/5] cat-file: simplify reading from standard input Patrick Steinhardt
2023-06-06 5:19 ` [PATCH v2 5/5] cat-file: introduce option to delimit input and output with NUL Patrick Steinhardt
2023-06-12 20:43 ` [PATCH v2 0/5] catfile: introduce NUL-terminated output format Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9900512f-b0da-2e47-f1ab-ed51ec2c78ff@gmail.com \
--to=phillip.wood123@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=phillip.wood@dunelm.org.uk \
--cc=ps@pks.im \
--cc=toon@iotcl.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).