Git Mailing List Archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] cat-file: introduce NUL-terminated output format
@ 2023-06-02 13:02 Patrick Steinhardt
  2023-06-02 13:02 ` [PATCH 1/5] t1006: don't strip timestamps from expected results Patrick Steinhardt
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Patrick Steinhardt @ 2023-06-02 13:02 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Toon Claes, Phillip Wood, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 2485 bytes --]

Hi,

with Git v2.38.0, we have introduced the `-z` option to git-cat-file(1)
that causes it to read NUL-terminated queries from standard input
instead of the newline-terminated queries. This fixes issues when the
query themselves contains a newline, like it can happen if e.g. we need
to pass a path-scoped revision with embedded newlines.

Unfortunately, this change is causing a new problem. When the object
cannot be looked up, then we print an error message that echoes the
input:

```
$ echo does-not-exist | git cat-file --batch
does-not-exist missing
```

Now if the input itself contains newlines then the output can become
inherently unparsable:

```
$ printf "7ce4f05bae8120d9fa258e854a8669f6ea9cb7b1 blob 10\n1234567890\n\n\commit000" |
    git cat-file --batch -z
7ce4f05bae8120d9fa258e854a8669f6ea9cb7b1 blob 10
1234567890

commit missing
```

Ideally, `-z` should have also switched the output to be NUL-terminated.
But it didn't, and we cannot change this issue now without doing a
backwards incompatible change. Instead, this series introduces a new
`-Z` mode that switches both input and output to be NUL-terminated to
fix the issue.

Note that Toon has sent a patch series for this issue to address the
same issue, see the thread starting at [1]. I've collaborated with him
internally at GitLab to arrive at this new patch series which thus
effectively supersedes the patches he has sent. The approach is also
different: while his patches start quoting the output, the approach
chosen by my series only changes the lines to be NUL terminated. This
should make it easier to use for scripted purposes compared to having to
de-quote the input.

I've put all folks that participated in the original thread into Cc.

Patrick

[1]: <20221209150048.2400648-1-toon@iotcl.com>

Patrick Steinhardt (5):
  t1006: don't strip timestamps from expected results
  t1006: modernize test style to use `test_cmp`
  strbuf: provide CRLF-aware helper to read until a specified delimiter
  cat-file: simplify reading from standard input
  cat-file: Introduce new option to delimit output with NUL characters

 Documentation/git-cat-file.txt |  13 +-
 builtin/cat-file.c             |  85 +++++------
 strbuf.c                       |  11 +-
 strbuf.h                       |  12 ++
 t/t1006-cat-file.sh            | 249 +++++++++++++++++++++------------
 5 files changed, 232 insertions(+), 138 deletions(-)

-- 
2.40.1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/5] t1006: don't strip timestamps from expected results
  2023-06-02 13:02 [PATCH 0/5] cat-file: introduce NUL-terminated output format Patrick Steinhardt
@ 2023-06-02 13:02 ` Patrick Steinhardt
  2023-06-02 13:02 ` [PATCH 2/5] t1006: modernize test style to use `test_cmp` Patrick Steinhardt
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Patrick Steinhardt @ 2023-06-02 13:02 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Toon Claes, Phillip Wood, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 7211 bytes --]

In t1006 we have a bunch of tests that verify the output format of the
git-cat-file(1) command. But while part of the output for some tests
would include commit timestamps, we don't verify those but instead strip
them before comparing expected with actual results. This is done by the
function `maybe_remove_timestamp`, which goes all the way back to the
ancient commit b335d3f121 (Add tests for git cat-file, 2008-04-23).

Our tests had been in a different shape back then. Most importantly we
didn't yet have the infrastructure to create objects with deterministic
timestamps. Nowadays we do though, and thus there is no reason anymore
to strip the timestamps.

Refactor the tests to not strip the timestamp anymore.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t1006-cat-file.sh | 68 +++++++++++++++++----------------------------
 1 file changed, 26 insertions(+), 42 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 8eac74b59c..f139d56fb4 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -109,26 +109,12 @@ strlen () {
     echo_without_newline "$1" | wc -c | sed -e 's/^ *//'
 }
 
-maybe_remove_timestamp () {
-	if test -z "$2"; then
-		echo_without_newline "$1"
-	else
-		echo_without_newline "$(printf '%s\n' "$1" | remove_timestamp)"
-	fi
-}
-
-remove_timestamp () {
-	sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//'
-}
-
-
 run_tests () {
     type=$1
     sha1=$2
     size=$3
     content=$4
     pretty_content=$5
-    no_ts=$6
 
     batch_output="$sha1 $type $size
 $content"
@@ -163,21 +149,21 @@ $content"
 
     test -z "$content" ||
     test_expect_success "Content of $type is correct" '
-	maybe_remove_timestamp "$content" $no_ts >expect &&
-	maybe_remove_timestamp "$(git cat-file $type $sha1)" $no_ts >actual &&
+	echo_without_newline "$content" >expect &&
+	git cat-file $type $sha1 >actual &&
 	test_cmp expect actual
     '
 
     test_expect_success "Pretty content of $type is correct" '
-	maybe_remove_timestamp "$pretty_content" $no_ts >expect &&
-	maybe_remove_timestamp "$(git cat-file -p $sha1)" $no_ts >actual &&
+	echo_without_newline "$pretty_content" >expect &&
+	git cat-file -p $sha1 >actual &&
 	test_cmp expect actual
     '
 
     test -z "$content" ||
     test_expect_success "--batch output of $type is correct" '
-	maybe_remove_timestamp "$batch_output" $no_ts >expect &&
-	maybe_remove_timestamp "$(echo $sha1 | git cat-file --batch)" $no_ts >actual &&
+	echo "$batch_output" >expect &&
+	echo $sha1 | git cat-file --batch >actual &&
 	test_cmp expect actual
     '
 
@@ -191,9 +177,8 @@ $content"
     do
 	test -z "$content" ||
 		test_expect_success "--batch-command $opt output of $type content is correct" '
-		maybe_remove_timestamp "$batch_output" $no_ts >expect &&
-		maybe_remove_timestamp "$(test_write_lines "contents $sha1" |
-		git cat-file --batch-command $opt)" $no_ts >actual &&
+		echo "$batch_output" >expect &&
+		test_write_lines "contents $sha1" | git cat-file --batch-command $opt >actual &&
 		test_cmp expect actual
 	'
 
@@ -228,10 +213,9 @@ $content"
     test_expect_success "--batch without type ($type)" '
 	{
 		echo "$size" &&
-		maybe_remove_timestamp "$content" $no_ts
+		echo "$content"
 	} >expect &&
-	echo $sha1 | git cat-file --batch="%(objectsize)" >actual.full &&
-	maybe_remove_timestamp "$(cat actual.full)" $no_ts >actual &&
+	echo $sha1 | git cat-file --batch="%(objectsize)" >actual &&
 	test_cmp expect actual
     '
 
@@ -239,10 +223,9 @@ $content"
     test_expect_success "--batch without size ($type)" '
 	{
 		echo "$type" &&
-		maybe_remove_timestamp "$content" $no_ts
+		echo "$content"
 	} >expect &&
-	echo $sha1 | git cat-file --batch="%(objecttype)" >actual.full &&
-	maybe_remove_timestamp "$(cat actual.full)" $no_ts >actual &&
+	echo $sha1 | git cat-file --batch="%(objecttype)" >actual &&
 	test_cmp expect actual
     '
 }
@@ -284,7 +267,7 @@ test_expect_success '--batch-check without %(rest) considers whole line' '
 
 tree_sha1=$(git write-tree)
 tree_size=$(($(test_oid rawsz) + 13))
-tree_pretty_content="100644 blob $hello_sha1	hello"
+tree_pretty_content="100644 blob $hello_sha1	hello${LF}"
 
 run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
 
@@ -292,12 +275,12 @@ commit_message="Initial commit"
 commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree $tree_sha1)
 commit_size=$(($(test_oid hexsz) + 137))
 commit_content="tree $tree_sha1
-author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 0 +0000
-committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 0 +0000
+author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> $GIT_AUTHOR_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
 
 $commit_message"
 
-run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content" 1
+run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content"
 
 tag_header_without_timestamp="object $hello_sha1
 type blob
@@ -311,7 +294,7 @@ $tag_description"
 tag_sha1=$(echo_without_newline "$tag_content" | git hash-object -t tag --stdin -w)
 tag_size=$(strlen "$tag_content")
 
-run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content" 1
+run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content"
 
 test_expect_success \
     "Reach a blob from a tag pointing to it" \
@@ -400,13 +383,16 @@ deadbeef missing
  missing"
 
 test_expect_success '--batch with multiple sha1s gives correct format' '
-	test "$(maybe_remove_timestamp "$batch_output" 1)" = "$(maybe_remove_timestamp "$(echo_without_newline "$batch_input" | git cat-file --batch)" 1)"
+	echo "$batch_output" >expect &&
+	echo_without_newline "$batch_input" | git cat-file --batch >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success '--batch, -z with multiple sha1s gives correct format' '
 	echo_without_newline_nul "$batch_input" >in &&
-	test "$(maybe_remove_timestamp "$batch_output" 1)" = \
-	"$(maybe_remove_timestamp "$(git cat-file --batch -z <in)" 1)"
+	echo "$batch_output" >expect &&
+	git cat-file --batch -z <in >actual &&
+	test_cmp expect actual
 '
 
 batch_check_input="$hello_sha1
@@ -480,7 +466,7 @@ contents deadbeef
 flush"
 
 test_expect_success '--batch-command with multiple command calls gives correct format' '
-	remove_timestamp >expect <<-EOF &&
+	cat >expect <<-EOF &&
 	$hello_sha1 blob $hello_size
 	$hello_content
 	$commit_sha1 commit $commit_size
@@ -491,15 +477,13 @@ test_expect_success '--batch-command with multiple command calls gives correct f
 	EOF
 
 	echo "$batch_command_multiple_contents" >in &&
-	git cat-file --batch-command --buffer <in >actual_raw &&
+	git cat-file --batch-command --buffer <in >actual &&
 
-	remove_timestamp <actual_raw >actual &&
 	test_cmp expect actual &&
 
 	echo "$batch_command_multiple_contents" | tr "\n" "\0" >in &&
-	git cat-file --batch-command --buffer -z <in >actual_raw &&
+	git cat-file --batch-command --buffer -z <in >actual &&
 
-	remove_timestamp <actual_raw >actual &&
 	test_cmp expect actual
 '
 
-- 
2.40.1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/5] t1006: modernize test style to use `test_cmp`
  2023-06-02 13:02 [PATCH 0/5] cat-file: introduce NUL-terminated output format Patrick Steinhardt
  2023-06-02 13:02 ` [PATCH 1/5] t1006: don't strip timestamps from expected results Patrick Steinhardt
@ 2023-06-02 13:02 ` Patrick Steinhardt
  2023-06-02 13:02 ` [PATCH 3/5] strbuf: provide CRLF-aware helper to read until a specified delimiter Patrick Steinhardt
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Patrick Steinhardt @ 2023-06-02 13:02 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Toon Claes, Phillip Wood, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 4683 bytes --]

The tests for git-cat-file(1) are quite old and haven't ever been
updated since they were introduced. They thus tend to use old idioms
that have since grown outdated. Most importantly, many of the tests use
`test $A = $B` to compare expected and actual output. This has the
downside that it is impossible to tell what exactly is different between
both versions in case the test fails.

Refactor the tests to instead use `test_cmp`. While more verbose, it
both tends to be more readable and will result in a nice diff in case
states don't match.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t1006-cat-file.sh | 70 ++++++++++++++++++++++++++++++---------------
 1 file changed, 47 insertions(+), 23 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index f139d56fb4..7b985cfded 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -296,9 +296,11 @@ tag_size=$(strlen "$tag_content")
 
 run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content"
 
-test_expect_success \
-    "Reach a blob from a tag pointing to it" \
-    "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
+test_expect_success "Reach a blob from a tag pointing to it" '
+	echo_without_newline "$hello_content" >expect &&
+	git cat-file blob $tag_sha1 >actual &&
+	test_cmp expect actual
+'
 
 for batch in batch batch-check batch-command
 do
@@ -334,30 +336,47 @@ do
 done
 
 test_expect_success "--batch-check for a non-existent named object" '
-    test "foobar42 missing
-foobar84 missing" = \
-    "$( ( echo foobar42 && echo_without_newline foobar84 ) | git cat-file --batch-check)"
+	cat >expect <<-EOF &&
+	foobar42 missing
+	foobar84 missing
+	EOF
+
+	printf "foobar42\nfoobar84" >in &&
+	git cat-file --batch-check <in >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success "--batch-check for a non-existent hash" '
-    test "0000000000000000000000000000000000000042 missing
-0000000000000000000000000000000000000084 missing" = \
-    "$( ( echo 0000000000000000000000000000000000000042 &&
-	 echo_without_newline 0000000000000000000000000000000000000084 ) |
-       git cat-file --batch-check)"
+	cat >expect <<-EOF &&
+	0000000000000000000000000000000000000042 missing
+	0000000000000000000000000000000000000084 missing
+	EOF
+
+	printf "0000000000000000000000000000000000000042\n0000000000000000000000000000000000000084" >in &&
+	git cat-file --batch-check <in >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success "--batch for an existent and a non-existent hash" '
-    test "$tag_sha1 tag $tag_size
-$tag_content
-0000000000000000000000000000000000000000 missing" = \
-    "$( ( echo $tag_sha1 &&
-	 echo_without_newline 0000000000000000000000000000000000000000 ) |
-       git cat-file --batch)"
+	cat >expect <<-EOF &&
+	$tag_sha1 tag $tag_size
+	$tag_content
+	0000000000000000000000000000000000000000 missing
+	EOF
+
+	printf "$tag_sha1\n0000000000000000000000000000000000000000" >in &&
+	git cat-file --batch <in >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success "--batch-check for an empty line" '
-    test " missing" = "$(echo | git cat-file --batch-check)"
+	cat >expect <<-EOF &&
+	 missing
+	EOF
+
+	echo >in &&
+	git cat-file --batch-check <in >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success 'empty --batch-check notices missing object' '
@@ -384,7 +403,8 @@ deadbeef missing
 
 test_expect_success '--batch with multiple sha1s gives correct format' '
 	echo "$batch_output" >expect &&
-	echo_without_newline "$batch_input" | git cat-file --batch >actual &&
+	echo_without_newline "$batch_input" >in &&
+	git cat-file --batch <in >actual &&
 	test_cmp expect actual
 '
 
@@ -411,13 +431,17 @@ deadbeef missing
  missing"
 
 test_expect_success "--batch-check with multiple sha1s gives correct format" '
-    test "$batch_check_output" = \
-    "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
+	echo "$batch_check_output" >expect &&
+	echo_without_newline "$batch_check_input" >in &&
+	git cat-file --batch-check <in >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success "--batch-check, -z with multiple sha1s gives correct format" '
-    echo_without_newline_nul "$batch_check_input" >in &&
-    test "$batch_check_output" = "$(git cat-file --batch-check -z <in)"
+	echo "$batch_check_output" >expect &&
+	echo_without_newline_nul "$batch_check_input" >in &&
+	git cat-file --batch-check -z <in >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
-- 
2.40.1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/5] strbuf: provide CRLF-aware helper to read until a specified delimiter
  2023-06-02 13:02 [PATCH 0/5] cat-file: introduce NUL-terminated output format Patrick Steinhardt
  2023-06-02 13:02 ` [PATCH 1/5] t1006: don't strip timestamps from expected results Patrick Steinhardt
  2023-06-02 13:02 ` [PATCH 2/5] t1006: modernize test style to use `test_cmp` Patrick Steinhardt
@ 2023-06-02 13:02 ` Patrick Steinhardt
  2023-06-02 13:02 ` [PATCH 4/5] cat-file: simplify reading from standard input Patrick Steinhardt
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Patrick Steinhardt @ 2023-06-02 13:02 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Toon Claes, Phillip Wood, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 3114 bytes --]

Many of our commands support reading input that is separated either via
newlines or via NUL characters. Furthermore, in order to be a better
cross platform citizen, these commands typically know to strip the CRLF
sequence so that we also support reading newline-separated inputs on
e.g. the Windows platform. This results in the following kind of awkward
pattern:

```
struct strbuf input = STRBUF_INIT;

while (1) {
	int ret;

	if (nul_terminated)
		ret = strbuf_getline_nul(&input, stdin);
	else
		ret = strbuf_getline(&input, stdin);
	if (ret)
		break;

	...
}
```

Introduce a new CRLF-aware helper function that can read up to a user
specified delimiter. If the delimiter is `\n` the function knows to also
strip CRLF, otherwise it will only strip the specified delimiter. This
results in the following, much more readable code pattern:

```
struct strbuf input = STRBUF_INIT;

while (strbuf_getdelim_strip_crlf(&input, stdin, delim) != EOF) {
	...
}
```

The new function will be used in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 strbuf.c | 11 ++++++++---
 strbuf.h | 12 ++++++++++++
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/strbuf.c b/strbuf.c
index 08eec8f1d8..31dc48c0ab 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -721,11 +721,11 @@ static int strbuf_getdelim(struct strbuf *sb, FILE *fp, int term)
 	return 0;
 }
 
-int strbuf_getline(struct strbuf *sb, FILE *fp)
+int strbuf_getdelim_strip_crlf(struct strbuf *sb, FILE *fp, int term)
 {
-	if (strbuf_getwholeline(sb, fp, '\n'))
+	if (strbuf_getwholeline(sb, fp, term))
 		return EOF;
-	if (sb->buf[sb->len - 1] == '\n') {
+	if (term == '\n' && sb->buf[sb->len - 1] == '\n') {
 		strbuf_setlen(sb, sb->len - 1);
 		if (sb->len && sb->buf[sb->len - 1] == '\r')
 			strbuf_setlen(sb, sb->len - 1);
@@ -733,6 +733,11 @@ int strbuf_getline(struct strbuf *sb, FILE *fp)
 	return 0;
 }
 
+int strbuf_getline(struct strbuf *sb, FILE *fp)
+{
+	return strbuf_getdelim_strip_crlf(sb, fp, '\n');
+}
+
 int strbuf_getline_lf(struct strbuf *sb, FILE *fp)
 {
 	return strbuf_getdelim(sb, fp, '\n');
diff --git a/strbuf.h b/strbuf.h
index 3dfeadb44c..0e69b656bc 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -475,6 +475,18 @@ int strbuf_readlink(struct strbuf *sb, const char *path, size_t hint);
  */
 ssize_t strbuf_write(struct strbuf *sb, FILE *stream);
 
+/**
+ * Read from a FILE * until the specified terminator is encountered,
+ * overwriting the existing contents of the strbuf.
+ *
+ * Reading stops after the terminator or at EOF.  The terminator is
+ * removed from the buffer before returning.  If the terminator is LF
+ * and if it is preceded by a CR, then the whole CRLF is stripped.
+ * Returns 0 unless there was nothing left before EOF, in which case
+ * it returns `EOF`.
+ */
+int strbuf_getdelim_strip_crlf(struct strbuf *sb, FILE *fp, int term);
+
 /**
  * Read a line from a FILE *, overwriting the existing contents of
  * the strbuf.  The strbuf_getline*() family of functions share
-- 
2.40.1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/5] cat-file: simplify reading from standard input
  2023-06-02 13:02 [PATCH 0/5] cat-file: introduce NUL-terminated output format Patrick Steinhardt
                   ` (2 preceding siblings ...)
  2023-06-02 13:02 ` [PATCH 3/5] strbuf: provide CRLF-aware helper to read until a specified delimiter Patrick Steinhardt
@ 2023-06-02 13:02 ` Patrick Steinhardt
  2023-06-02 13:02 ` [PATCH 5/5] cat-file: Introduce new option to delimit output with NUL characters Patrick Steinhardt
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Patrick Steinhardt @ 2023-06-02 13:02 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Toon Claes, Phillip Wood, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 3870 bytes --]

The batch modes of git-cat-file(1) read queries from stantard input that
are either newline- or NUL-delimited. This code was introduced via
db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with `-z`,
2022-07-22), which notes that:

"""
The refactoring here is slightly unfortunate, since we turn loops like:

     while (strbuf_getline(&buf, stdin) != EOF)

 into:

     while (1) {
         int ret;
         if (opt->nul_terminated)
             ret = strbuf_getline_nul(&input, stdin);
         else
             ret = strbuf_getline(&input, stdin);

         if (ret == EOF)
             break;
     }
"""

The commit proposed introducing a helper function that is easier to use,
which is just what we have done in the preceding commit. Refactor the
code to use this new helper to simplify the loop.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/cat-file.c | 32 +++++++++-----------------------
 1 file changed, 9 insertions(+), 23 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 0bafc14e6c..001dcb24d6 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -42,7 +42,7 @@ struct batch_options {
 	int all_objects;
 	int unordered;
 	int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
-	int nul_terminated;
+	char input_delim;
 	const char *format;
 };
 
@@ -694,20 +694,12 @@ static void batch_objects_command(struct batch_options *opt,
 	struct queued_cmd *queued_cmd = NULL;
 	size_t alloc = 0, nr = 0;
 
-	while (1) {
-		int i, ret;
+	while (strbuf_getdelim_strip_crlf(&input, stdin, opt->input_delim) != EOF) {
+		int i;
 		const struct parse_cmd *cmd = NULL;
 		const char *p = NULL, *cmd_end;
 		struct queued_cmd call = {0};
 
-		if (opt->nul_terminated)
-			ret = strbuf_getline_nul(&input, stdin);
-		else
-			ret = strbuf_getline(&input, stdin);
-
-		if (ret)
-			break;
-
 		if (!input.len)
 			die(_("empty command in input"));
 		if (isspace(*input.buf))
@@ -851,16 +843,7 @@ static int batch_objects(struct batch_options *opt)
 		goto cleanup;
 	}
 
-	while (1) {
-		int ret;
-		if (opt->nul_terminated)
-			ret = strbuf_getline_nul(&input, stdin);
-		else
-			ret = strbuf_getline(&input, stdin);
-
-		if (ret == EOF)
-			break;
-
+	while (strbuf_getdelim_strip_crlf(&input, stdin, opt->input_delim) != EOF) {
 		if (data.split_on_whitespace) {
 			/*
 			 * Split at first whitespace, tying off the beginning
@@ -929,6 +912,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	const char *exp_type = NULL, *obj_name = NULL;
 	struct batch_options batch = {0};
 	int unknown_type = 0;
+	int input_nul_terminated = 0;
 
 	const char * const usage[] = {
 		N_("git cat-file <type> <object>"),
@@ -965,7 +949,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			N_("like --batch, but don't emit <contents>"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
 			batch_option_callback),
-		OPT_BOOL('z', NULL, &batch.nul_terminated, N_("stdin is NUL-terminated")),
+		OPT_BOOL('z', NULL, &input_nul_terminated, N_("stdin is NUL-terminated")),
 		OPT_CALLBACK_F(0, "batch-command", &batch, N_("format"),
 			N_("read commands from stdin"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
@@ -1024,10 +1008,12 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	else if (batch.all_objects)
 		usage_msg_optf(_("'%s' requires a batch mode"), usage, options,
 			       "--batch-all-objects");
-	else if (batch.nul_terminated)
+	else if (input_nul_terminated)
 		usage_msg_optf(_("'%s' requires a batch mode"), usage, options,
 			       "-z");
 
+	batch.input_delim = input_nul_terminated ? '\0' : '\n';
+
 	/* Batch defaults */
 	if (batch.buffer_output < 0)
 		batch.buffer_output = batch.all_objects;
-- 
2.40.1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 5/5] cat-file: Introduce new option to delimit output with NUL characters
  2023-06-02 13:02 [PATCH 0/5] cat-file: introduce NUL-terminated output format Patrick Steinhardt
                   ` (3 preceding siblings ...)
  2023-06-02 13:02 ` [PATCH 4/5] cat-file: simplify reading from standard input Patrick Steinhardt
@ 2023-06-02 13:02 ` Patrick Steinhardt
  2023-06-05 15:47   ` Phillip Wood
  2023-06-06  1:23   ` Junio C Hamano
  2023-06-03  1:44 ` [PATCH 0/5] cat-file: introduce NUL-terminated output format Junio C Hamano
  2023-06-06  5:19 ` [PATCH v2 0/5] catfile: " Patrick Steinhardt
  6 siblings, 2 replies; 22+ messages in thread
From: Patrick Steinhardt @ 2023-06-02 13:02 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Toon Claes, Phillip Wood, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 17453 bytes --]

In db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with
`-z`, 2022-07-22), we have introduced a new mode to read the input via
NUL-delimited records instead of newline-delimited records. This allows
the user to query for revisions that have newlines in their path
component. While unusual, such queries are perfectly valid and thus it
is clear that we should be able to support them properly.

Unfortunately, the commit only changed the input to be NUL-delimited,
but didn't change the output at the same time. While this is fine for
queries that are processed successfully, it is less so for queries that
aren't. In the case of missing commits for example the result can become
entirely unparsable:

```
$ printf "7ce4f05bae8120d9fa258e854a8669f6ea9cb7b1 blob 10\n1234567890\n\n\commit000" |
    git cat-file --batch -z
7ce4f05bae8120d9fa258e854a8669f6ea9cb7b1 blob 10
1234567890

commit missing
```

This is of course a crafted query that is intentionally gaming the
deficiency, but more benign queries that contain newlines would have
similar problems.

Ideally, we should have also changed the output to be NUL-delimited when
`-z` is specified to avoid this problem. As the input is NUL-delimited,
it is clear that the output in this case cannot ever contain NUL
characters by itself. Furthermore, Git does not allow NUL characters in
revisions anyway, further stressing the point that using NUL-delimited
output is safe. The only exception is of course the object data itself,
but as git-cat-file(1) prints the size of the object data clients should
read until that specified size has been consumed.

But even though `-z` has only been introduced a few releases ago in Git
v2.38.0, changing the output format retroactively to also NUL-delimit
output would be a backwards incompatible change. And while one could
make the argument that the output is inherently broken already, we need
to assume that there are existing users out there that use it just fine
given that revisions containing newlines are quite exotic.

Instead, introduce a new option `-Z` that switches to NUL-delimited
input and output. The old `-z` option is marked as deprecated with a
hint that its output may become unparsable.

Co-authored-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/git-cat-file.txt |  13 +++-
 builtin/cat-file.c             |  55 +++++++++------
 t/t1006-cat-file.sh            | 123 ++++++++++++++++++++++++---------
 3 files changed, 137 insertions(+), 54 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 411de2e27d..b1f48fdfb1 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 'git cat-file' (-t | -s) [--allow-unknown-type] <object>
 'git cat-file' (--batch | --batch-check | --batch-command) [--batch-all-objects]
 	     [--buffer] [--follow-symlinks] [--unordered]
-	     [--textconv | --filters] [-z]
+	     [--textconv | --filters] [-z] [-Z]
 'git cat-file' (--textconv | --filters)
 	     [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]
 
@@ -246,6 +246,12 @@ respectively print:
 -z::
 	Only meaningful with `--batch`, `--batch-check`, or
 	`--batch-command`; input is NUL-delimited instead of
+	newline-delimited. This option is deprecated in favor of
+	`-Z` as the output can otherwise be ambiguous.
+
+-Z::
+	Only meaningful with `--batch`, `--batch-check`, or
+	`--batch-command`; input and output is NUL-delimited instead of
 	newline-delimited.
 
 
@@ -384,6 +390,11 @@ notdir SP <size> LF
 is printed when, during symlink resolution, a file is used as a
 directory name.
 
+Alternatively, when `-Z` is passed, the line feeds in any of the above examples
+are replaced with NUL terminators. This ensures that output will be parsable if
+the output itself would contain a linefeed and is thus recommended for
+scripting purposes.
+
 CAVEATS
 -------
 
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 001dcb24d6..90ef407d30 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -43,6 +43,7 @@ struct batch_options {
 	int unordered;
 	int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
 	char input_delim;
+	char output_delim;
 	const char *format;
 };
 
@@ -437,11 +438,12 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 	}
 }
 
-static void print_default_format(struct strbuf *scratch, struct expand_data *data)
+static void print_default_format(struct strbuf *scratch, struct expand_data *data,
+				 struct batch_options *opt)
 {
-	strbuf_addf(scratch, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid),
+	strbuf_addf(scratch, "%s %s %"PRIuMAX"%c", oid_to_hex(&data->oid),
 		    type_name(data->type),
-		    (uintmax_t)data->size);
+		    (uintmax_t)data->size, opt->output_delim);
 }
 
 /*
@@ -470,8 +472,8 @@ static void batch_object_write(const char *obj_name,
 						       &data->oid, &data->info,
 						       OBJECT_INFO_LOOKUP_REPLACE);
 		if (ret < 0) {
-			printf("%s missing\n",
-			       obj_name ? obj_name : oid_to_hex(&data->oid));
+			printf("%s missing%c",
+			       obj_name ? obj_name : oid_to_hex(&data->oid), opt->output_delim);
 			fflush(stdout);
 			return;
 		}
@@ -492,17 +494,18 @@ static void batch_object_write(const char *obj_name,
 	strbuf_reset(scratch);
 
 	if (!opt->format) {
-		print_default_format(scratch, data);
+		print_default_format(scratch, data, opt);
 	} else {
 		strbuf_expand(scratch, opt->format, expand_format, data);
-		strbuf_addch(scratch, '\n');
+		strbuf_addch(scratch, opt->output_delim);
 	}
 
 	batch_write(opt, scratch->buf, scratch->len);
 
 	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
+		char buf[] = {opt->output_delim};
 		print_object_or_die(opt, data);
-		batch_write(opt, "\n", 1);
+		batch_write(opt, buf, 1);
 	}
 }
 
@@ -520,22 +523,25 @@ static void batch_one_object(const char *obj_name,
 	if (result != FOUND) {
 		switch (result) {
 		case MISSING_OBJECT:
-			printf("%s missing\n", obj_name);
+			printf("%s missing%c", obj_name, opt->output_delim);
 			break;
 		case SHORT_NAME_AMBIGUOUS:
-			printf("%s ambiguous\n", obj_name);
+			printf("%s ambiguous%c", obj_name, opt->output_delim);
 			break;
 		case DANGLING_SYMLINK:
-			printf("dangling %"PRIuMAX"\n%s\n",
-			       (uintmax_t)strlen(obj_name), obj_name);
+			printf("dangling %"PRIuMAX"%c%s%c",
+			       (uintmax_t)strlen(obj_name),
+			       opt->output_delim, obj_name, opt->output_delim);
 			break;
 		case SYMLINK_LOOP:
-			printf("loop %"PRIuMAX"\n%s\n",
-			       (uintmax_t)strlen(obj_name), obj_name);
+			printf("loop %"PRIuMAX"%c%s%c",
+			       (uintmax_t)strlen(obj_name),
+			       opt->output_delim, obj_name, opt->output_delim);
 			break;
 		case NOT_DIR:
-			printf("notdir %"PRIuMAX"\n%s\n",
-			       (uintmax_t)strlen(obj_name), obj_name);
+			printf("notdir %"PRIuMAX"%c%s%c",
+			       (uintmax_t)strlen(obj_name),
+			       opt->output_delim, obj_name, opt->output_delim);
 			break;
 		default:
 			BUG("unknown get_sha1_with_context result %d\n",
@@ -547,9 +553,9 @@ static void batch_one_object(const char *obj_name,
 	}
 
 	if (ctx.mode == 0) {
-		printf("symlink %"PRIuMAX"\n%s\n",
+		printf("symlink %"PRIuMAX"%c%s%c",
 		       (uintmax_t)ctx.symlink_path.len,
-		       ctx.symlink_path.buf);
+		       opt->output_delim, ctx.symlink_path.buf, opt->output_delim);
 		fflush(stdout);
 		return;
 	}
@@ -913,6 +919,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	struct batch_options batch = {0};
 	int unknown_type = 0;
 	int input_nul_terminated = 0;
+	int nul_terminated = 0;
 
 	const char * const usage[] = {
 		N_("git cat-file <type> <object>"),
@@ -920,7 +927,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		N_("git cat-file (-t | -s) [--allow-unknown-type] <object>"),
 		N_("git cat-file (--batch | --batch-check | --batch-command) [--batch-all-objects]\n"
 		   "             [--buffer] [--follow-symlinks] [--unordered]\n"
-		   "             [--textconv | --filters] [-z]"),
+		   "             [--textconv | --filters] [-z] [-Z]"),
 		N_("git cat-file (--textconv | --filters)\n"
 		   "             [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]"),
 		NULL
@@ -950,6 +957,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
 			batch_option_callback),
 		OPT_BOOL('z', NULL, &input_nul_terminated, N_("stdin is NUL-terminated")),
+		OPT_BOOL('Z', NULL, &nul_terminated, N_("stdin and stdout is NUL-terminated")),
 		OPT_CALLBACK_F(0, "batch-command", &batch, N_("format"),
 			N_("read commands from stdin"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
@@ -1011,8 +1019,15 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	else if (input_nul_terminated)
 		usage_msg_optf(_("'%s' requires a batch mode"), usage, options,
 			       "-z");
+	else if (nul_terminated)
+		usage_msg_optf(_("'%s' requires a batch mode"), usage, options,
+			       "-Z");
 
-	batch.input_delim = input_nul_terminated ? '\0' : '\n';
+	batch.input_delim = batch.output_delim = '\n';
+	if (input_nul_terminated)
+		batch.input_delim = '\0';
+	if (nul_terminated)
+		batch.input_delim = batch.output_delim = '\0';
 
 	/* Batch defaults */
 	if (batch.buffer_output < 0)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 7b985cfded..d73a0be1b9 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -89,7 +89,8 @@ done
 for opt in --buffer \
 	--follow-symlinks \
 	--batch-all-objects \
-	-z
+	-z \
+	-Z
 do
 	test_expect_success "usage: bad option combination: $opt without batch mode" '
 		test_incompatible_usage git cat-file $opt &&
@@ -392,17 +393,18 @@ deadbeef
 
 "
 
-batch_output="$hello_sha1 blob $hello_size
-$hello_content
-$commit_sha1 commit $commit_size
-$commit_content
-$tag_sha1 tag $tag_size
-$tag_content
-deadbeef missing
- missing"
+printf "%s\0" \
+	"$hello_sha1 blob $hello_size" \
+	"$hello_content" \
+	"$commit_sha1 commit $commit_size" \
+	"$commit_content" \
+	"$tag_sha1 tag $tag_size" \
+	"$tag_content" \
+	"deadbeef missing" \
+	" missing" >batch_output
 
 test_expect_success '--batch with multiple sha1s gives correct format' '
-	echo "$batch_output" >expect &&
+	tr "\0" "\n" <batch_output >expect &&
 	echo_without_newline "$batch_input" >in &&
 	git cat-file --batch <in >actual &&
 	test_cmp expect actual
@@ -410,11 +412,17 @@ test_expect_success '--batch with multiple sha1s gives correct format' '
 
 test_expect_success '--batch, -z with multiple sha1s gives correct format' '
 	echo_without_newline_nul "$batch_input" >in &&
-	echo "$batch_output" >expect &&
+	tr "\0" "\n" <batch_output >expect &&
 	git cat-file --batch -z <in >actual &&
 	test_cmp expect actual
 '
 
+test_expect_success '--batch, -Z with multiple sha1s gives correct format' '
+	echo_without_newline_nul "$batch_input" >in &&
+	git cat-file --batch -Z <in >actual &&
+	test_cmp batch_output actual
+'
+
 batch_check_input="$hello_sha1
 $tree_sha1
 $commit_sha1
@@ -423,40 +431,55 @@ deadbeef
 
 "
 
-batch_check_output="$hello_sha1 blob $hello_size
-$tree_sha1 tree $tree_size
-$commit_sha1 commit $commit_size
-$tag_sha1 tag $tag_size
-deadbeef missing
- missing"
+printf "%s\0" \
+	"$hello_sha1 blob $hello_size" \
+	"$tree_sha1 tree $tree_size" \
+	"$commit_sha1 commit $commit_size" \
+	"$tag_sha1 tag $tag_size" \
+	"deadbeef missing" \
+	" missing" >batch_check_output
 
 test_expect_success "--batch-check with multiple sha1s gives correct format" '
-	echo "$batch_check_output" >expect &&
+	tr "\0" "\n" <batch_check_output >expect &&
 	echo_without_newline "$batch_check_input" >in &&
 	git cat-file --batch-check <in >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success "--batch-check, -z with multiple sha1s gives correct format" '
-	echo "$batch_check_output" >expect &&
+	tr "\0" "\n" <batch_check_output >expect &&
 	echo_without_newline_nul "$batch_check_input" >in &&
 	git cat-file --batch-check -z <in >actual &&
 	test_cmp expect actual
 '
 
-test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
+test_expect_success "--batch-check, -Z with multiple sha1s gives correct format" '
+	echo_without_newline_nul "$batch_check_input" >in &&
+	git cat-file --batch-check -Z <in >actual &&
+	test_cmp batch_check_output actual
+'
+
+test_expect_success FUNNYNAMES 'setup with newline in input' '
 	touch -- "newline${LF}embedded" &&
 	git add -- "newline${LF}embedded" &&
 	git commit -m "file with newline embedded" &&
 	test_tick &&
 
-	printf "HEAD:newline${LF}embedded" >in &&
-	git cat-file --batch-check -z <in >actual &&
+	printf "HEAD:newline${LF}embedded" >in
+'
 
+test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
+	git cat-file --batch-check -z <in >actual &&
 	echo "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect &&
 	test_cmp expect actual
 '
 
+test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
+	git cat-file --batch-check -Z <in >actual &&
+	printf "%s\0" "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect &&
+	test_cmp expect actual
+'
+
 batch_command_multiple_info="info $hello_sha1
 info $tree_sha1
 info $commit_sha1
@@ -480,7 +503,13 @@ test_expect_success '--batch-command with multiple info calls gives correct form
 	echo "$batch_command_multiple_info" | tr "\n" "\0" >in &&
 	git cat-file --batch-command --buffer -z <in >actual &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+
+	echo "$batch_command_multiple_info" | tr "\n" "\0" >in &&
+	tr "\n" "\0" <expect >expect_nul &&
+	git cat-file --batch-command --buffer -Z <in >actual &&
+
+	test_cmp expect_nul actual
 '
 
 batch_command_multiple_contents="contents $hello_sha1
@@ -490,15 +519,15 @@ contents deadbeef
 flush"
 
 test_expect_success '--batch-command with multiple command calls gives correct format' '
-	cat >expect <<-EOF &&
-	$hello_sha1 blob $hello_size
-	$hello_content
-	$commit_sha1 commit $commit_size
-	$commit_content
-	$tag_sha1 tag $tag_size
-	$tag_content
-	deadbeef missing
-	EOF
+	printf "%s\0" \
+		"$hello_sha1 blob $hello_size" \
+		"$hello_content" \
+		"$commit_sha1 commit $commit_size" \
+		"$commit_content" \
+		"$tag_sha1 tag $tag_size" \
+		"$tag_content" \
+		"deadbeef missing" >expect_nul &&
+	tr "\0" "\n" <expect_nul >expect &&
 
 	echo "$batch_command_multiple_contents" >in &&
 	git cat-file --batch-command --buffer <in >actual &&
@@ -508,7 +537,12 @@ test_expect_success '--batch-command with multiple command calls gives correct f
 	echo "$batch_command_multiple_contents" | tr "\n" "\0" >in &&
 	git cat-file --batch-command --buffer -z <in >actual &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+
+	echo "$batch_command_multiple_contents" | tr "\n" "\0" >in &&
+	git cat-file --batch-command --buffer -Z <in >actual &&
+
+	test_cmp expect_nul actual
 '
 
 test_expect_success 'setup blobs which are likely to delta' '
@@ -848,6 +882,13 @@ test_expect_success 'git cat-file --batch-check --follow-symlinks works for brok
 	test_cmp expect actual
 '
 
+test_expect_success 'git cat-file --batch-check --follow-symlinks -Z works for broken in-repo, same-dir links' '
+	printf "HEAD:broken-same-dir-link\0" >in &&
+	printf "dangling 25\0HEAD:broken-same-dir-link\0" >expect &&
+	git cat-file --batch-check --follow-symlinks -Z <in >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'git cat-file --batch-check --follow-symlinks works for same-dir links-to-links' '
 	echo HEAD:link-to-link | git cat-file --batch-check --follow-symlinks >actual &&
 	test_cmp found actual
@@ -862,6 +903,15 @@ test_expect_success 'git cat-file --batch-check --follow-symlinks works for pare
 	test_cmp expect actual
 '
 
+test_expect_success 'git cat-file --batch-check --follow-symlinks -Z works for parent-dir links' '
+	echo HEAD:dir/parent-dir-link | git cat-file --batch-check --follow-symlinks >actual &&
+	test_cmp found actual &&
+	printf "notdir 29\0HEAD:dir/parent-dir-link/nope\0" >expect &&
+	printf "HEAD:dir/parent-dir-link/nope\0" >in &&
+	git cat-file --batch-check --follow-symlinks -Z <in >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'git cat-file --batch-check --follow-symlinks works for .. links' '
 	echo dangling 22 >expect &&
 	echo HEAD:dir/link-dir/nope >>expect &&
@@ -976,6 +1026,13 @@ test_expect_success 'git cat-file --batch-check --follow-symlink breaks loops' '
 	test_cmp expect actual
 '
 
+test_expect_success 'git cat-file --batch-check --follow-symlink -Z breaks loops' '
+	printf "loop 10\0HEAD:loop1\0" >expect &&
+	printf "HEAD:loop1\0" >in &&
+	git cat-file --batch-check --follow-symlinks -Z <in >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'git cat-file --batch --follow-symlink returns correct sha and mode' '
 	echo HEAD:morx | git cat-file --batch >expect &&
 	echo HEAD:morx | git cat-file --batch --follow-symlinks >actual &&
-- 
2.40.1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/5] cat-file: introduce NUL-terminated output format
  2023-06-02 13:02 [PATCH 0/5] cat-file: introduce NUL-terminated output format Patrick Steinhardt
                   ` (4 preceding siblings ...)
  2023-06-02 13:02 ` [PATCH 5/5] cat-file: Introduce new option to delimit output with NUL characters Patrick Steinhardt
@ 2023-06-03  1:44 ` Junio C Hamano
  2023-06-06  5:19 ` [PATCH v2 0/5] catfile: " Patrick Steinhardt
  6 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2023-06-03  1:44 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Taylor Blau, Toon Claes, Phillip Wood

Patrick Steinhardt <ps@pks.im> writes:

> Note that Toon has sent a patch series for this issue to address the
> same issue, see the thread starting at [1]. I've collaborated with him
> internally at GitLab to arrive at this new patch series which thus
> effectively supersedes the patches he has sent. The approach is also
> different: while his patches start quoting the output, the approach
> chosen by my series only changes the lines to be NUL terminated. This
> should make it easier to use for scripted purposes compared to having to
> de-quote the input.

OK.  Will try to drop the other topic and queue this one instead.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] cat-file: Introduce new option to delimit output with NUL characters
  2023-06-02 13:02 ` [PATCH 5/5] cat-file: Introduce new option to delimit output with NUL characters Patrick Steinhardt
@ 2023-06-05 15:47   ` Phillip Wood
  2023-06-05 23:54     ` Junio C Hamano
  2023-06-06  5:00     ` Patrick Steinhardt
  2023-06-06  1:23   ` Junio C Hamano
  1 sibling, 2 replies; 22+ messages in thread
From: Phillip Wood @ 2023-06-05 15:47 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: Taylor Blau, Toon Claes, Junio C Hamano

Hi Patrick

On 02/06/2023 14:02, Patrick Steinhardt wrote:
> In db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with
> `-z`, 2022-07-22), we have introduced a new mode to read the input via
> NUL-delimited records instead of newline-delimited records. This allows
> the user to query for revisions that have newlines in their path
> component. While unusual, such queries are perfectly valid and thus it
> is clear that we should be able to support them properly.
> 
> Unfortunately, the commit only changed the input to be NUL-delimited,
> but didn't change the output at the same time. While this is fine for
> queries that are processed successfully, it is less so for queries that
> aren't. In the case of missing commits for example the result can become
> entirely unparsable:
> 
> ```
> $ printf "7ce4f05bae8120d9fa258e854a8669f6ea9cb7b1 blob 10\n1234567890\n\n\commit000" |
>      git cat-file --batch -z
> 7ce4f05bae8120d9fa258e854a8669f6ea9cb7b1 blob 10
> 1234567890
> 
> commit missing
> ```
> 
> This is of course a crafted query that is intentionally gaming the
> deficiency, but more benign queries that contain newlines would have
> similar problems.
> 
> Ideally, we should have also changed the output to be NUL-delimited when
> `-z` is specified to avoid this problem. As the input is NUL-delimited,
> it is clear that the output in this case cannot ever contain NUL
> characters by itself. Furthermore, Git does not allow NUL characters in
> revisions anyway, further stressing the point that using NUL-delimited
> output is safe. The only exception is of course the object data itself,
> but as git-cat-file(1) prints the size of the object data clients should
> read until that specified size has been consumed.
> 
> But even though `-z` has only been introduced a few releases ago in Git
> v2.38.0, changing the output format retroactively to also NUL-delimit
> output would be a backwards incompatible change. And while one could
> make the argument that the output is inherently broken already, we need
> to assume that there are existing users out there that use it just fine
> given that revisions containing newlines are quite exotic.
> 
> Instead, introduce a new option `-Z` that switches to NUL-delimited
> input and output. The old `-z` option is marked as deprecated with a
> hint that its output may become unparsable.

The commit message explains the problem well, I agree adding a new 
option is the cleanest solution.

> Co-authored-by: Toon Claes <toon@iotcl.com>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>   Documentation/git-cat-file.txt |  13 +++-
>   builtin/cat-file.c             |  55 +++++++++------
>   t/t1006-cat-file.sh            | 123 ++++++++++++++++++++++++---------
>   3 files changed, 137 insertions(+), 54 deletions(-)
> 
> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
> index 411de2e27d..b1f48fdfb1 100644
> --- a/Documentation/git-cat-file.txt
> +++ b/Documentation/git-cat-file.txt
> @@ -14,7 +14,7 @@ SYNOPSIS
>   'git cat-file' (-t | -s) [--allow-unknown-type] <object>
>   'git cat-file' (--batch | --batch-check | --batch-command) [--batch-all-objects]
>   	     [--buffer] [--follow-symlinks] [--unordered]
> -	     [--textconv | --filters] [-z]
> +	     [--textconv | --filters] [-z] [-Z]
>   'git cat-file' (--textconv | --filters)
>   	     [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]
>   
> @@ -246,6 +246,12 @@ respectively print:
>   -z::
>   	Only meaningful with `--batch`, `--batch-check`, or
>   	`--batch-command`; input is NUL-delimited instead of
> +	newline-delimited. This option is deprecated in favor of
> +	`-Z` as the output can otherwise be ambiguous.
> +
> +-Z::
> +	Only meaningful with `--batch`, `--batch-check`, or
> +	`--batch-command`; input and output is NUL-delimited instead of
>   	newline-delimited.

The documentation changes look good. I wonder if we should put the 
documentation for "-Z" above "-z" so users see the preferred option first.

>   
> @@ -384,6 +390,11 @@ notdir SP <size> LF
>   is printed when, during symlink resolution, a file is used as a
>   directory name.
>   
> +Alternatively, when `-Z` is passed, the line feeds in any of the above examples
> +are replaced with NUL terminators. This ensures that output will be parsable if
> +the output itself would contain a linefeed and is thus recommended for
> +scripting purposes.
> +
>   CAVEATS
>   -------
>   
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 001dcb24d6..90ef407d30 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -492,17 +494,18 @@ static void batch_object_write(const char *obj_name,
>   	strbuf_reset(scratch);
>   
>   	if (!opt->format) {
> -		print_default_format(scratch, data);
> +		print_default_format(scratch, data, opt);
>   	} else {
>   		strbuf_expand(scratch, opt->format, expand_format, data);
> -		strbuf_addch(scratch, '\n');
> +		strbuf_addch(scratch, opt->output_delim);
>   	}
>   
>   	batch_write(opt, scratch->buf, scratch->len);
>   
>   	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
> +		char buf[] = {opt->output_delim};

I found this a bit confusing, I think it would be clearer just to do

	batch_write(opt, &opt->output_delim, 1);

>   		print_object_or_die(opt, data);
> -		batch_write(opt, "\n", 1);
> +		batch_write(opt, buf, 1);
>   	}
>   }

> @@ -920,7 +927,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
>   		N_("git cat-file (-t | -s) [--allow-unknown-type] <object>"),
>   		N_("git cat-file (--batch | --batch-check | --batch-command) [--batch-all-objects]\n"
>   		   "             [--buffer] [--follow-symlinks] [--unordered]\n"
> -		   "             [--textconv | --filters] [-z]"),
> +		   "             [--textconv | --filters] [-z] [-Z]"),

If we're recommending that people don't use '-z' then maybe we should 
remove it from the synopsis and add OPT_HIDDEN to it below.

>   		N_("git cat-file (--textconv | --filters)\n"
>   		   "             [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]"),
>   		NULL
> @@ -950,6 +957,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
>   			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
>   			batch_option_callback),
>   		OPT_BOOL('z', NULL, &input_nul_terminated, N_("stdin is NUL-terminated")),
> +		OPT_BOOL('Z', NULL, &nul_terminated, N_("stdin and stdout is NUL-terminated")),

> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 7b985cfded..d73a0be1b9 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -392,17 +393,18 @@ deadbeef
>   
>   "
>   
> -batch_output="$hello_sha1 blob $hello_size
> -$hello_content
> -$commit_sha1 commit $commit_size
> -$commit_content
> -$tag_sha1 tag $tag_size
> -$tag_content
> -deadbeef missing
> - missing"
> +printf "%s\0" \
> +	"$hello_sha1 blob $hello_size" \
> +	"$hello_content" \
> +	"$commit_sha1 commit $commit_size" \
> +	"$commit_content" \
> +	"$tag_sha1 tag $tag_size" \
> +	"$tag_content" \
> +	"deadbeef missing" \
> +	" missing" >batch_output

I think writing the expected output to a file is a good change as we 
always use it with test_cmp. As "-z" is deprecated I think it makes 
sense to model the expected output for "-Z" and use tr for the "-z" 
tests as you have done here. It looks like we have good coverage of the 
new option.

Thanks for working on this

Phillip

>   test_expect_success '--batch with multiple sha1s gives correct format' '
> -	echo "$batch_output" >expect &&
> +	tr "\0" "\n" <batch_output >expect &&
>   	echo_without_newline "$batch_input" >in &&
>   	git cat-file --batch <in >actual &&
>   	test_cmp expect actual
> @@ -410,11 +412,17 @@ test_expect_success '--batch with multiple sha1s gives correct format' '
>   
>   test_expect_success '--batch, -z with multiple sha1s gives correct format' '
>   	echo_without_newline_nul "$batch_input" >in &&
> -	echo "$batch_output" >expect &&
> +	tr "\0" "\n" <batch_output >expect &&
>   	git cat-file --batch -z <in >actual &&
>   	test_cmp expect actual
>   '
>   
> +test_expect_success '--batch, -Z with multiple sha1s gives correct format' '
> +	echo_without_newline_nul "$batch_input" >in &&
> +	git cat-file --batch -Z <in >actual &&
> +	test_cmp batch_output actual
> +'
> +
>   batch_check_input="$hello_sha1
>   $tree_sha1
>   $commit_sha1
> @@ -423,40 +431,55 @@ deadbeef
>   
>   "
>   
> -batch_check_output="$hello_sha1 blob $hello_size
> -$tree_sha1 tree $tree_size
> -$commit_sha1 commit $commit_size
> -$tag_sha1 tag $tag_size
> -deadbeef missing
> - missing"
> +printf "%s\0" \
> +	"$hello_sha1 blob $hello_size" \
> +	"$tree_sha1 tree $tree_size" \
> +	"$commit_sha1 commit $commit_size" \
> +	"$tag_sha1 tag $tag_size" \
> +	"deadbeef missing" \
> +	" missing" >batch_check_output
>   
>   test_expect_success "--batch-check with multiple sha1s gives correct format" '
> -	echo "$batch_check_output" >expect &&
> +	tr "\0" "\n" <batch_check_output >expect &&
>   	echo_without_newline "$batch_check_input" >in &&
>   	git cat-file --batch-check <in >actual &&
>   	test_cmp expect actual
>   '
>   
>   test_expect_success "--batch-check, -z with multiple sha1s gives correct format" '
> -	echo "$batch_check_output" >expect &&
> +	tr "\0" "\n" <batch_check_output >expect &&
>   	echo_without_newline_nul "$batch_check_input" >in &&
>   	git cat-file --batch-check -z <in >actual &&
>   	test_cmp expect actual
>   '
>   
> -test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
> +test_expect_success "--batch-check, -Z with multiple sha1s gives correct format" '
> +	echo_without_newline_nul "$batch_check_input" >in &&
> +	git cat-file --batch-check -Z <in >actual &&
> +	test_cmp batch_check_output actual
> +'
> +
> +test_expect_success FUNNYNAMES 'setup with newline in input' '
>   	touch -- "newline${LF}embedded" &&
>   	git add -- "newline${LF}embedded" &&
>   	git commit -m "file with newline embedded" &&
>   	test_tick &&
>   
> -	printf "HEAD:newline${LF}embedded" >in &&
> -	git cat-file --batch-check -z <in >actual &&
> +	printf "HEAD:newline${LF}embedded" >in
> +'
>   
> +test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
> +	git cat-file --batch-check -z <in >actual &&
>   	echo "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect &&
>   	test_cmp expect actual
>   '
>   
> +test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
> +	git cat-file --batch-check -Z <in >actual &&
> +	printf "%s\0" "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect &&
> +	test_cmp expect actual
> +'
> +
>   batch_command_multiple_info="info $hello_sha1
>   info $tree_sha1
>   info $commit_sha1
> @@ -480,7 +503,13 @@ test_expect_success '--batch-command with multiple info calls gives correct form
>   	echo "$batch_command_multiple_info" | tr "\n" "\0" >in &&
>   	git cat-file --batch-command --buffer -z <in >actual &&
>   
> -	test_cmp expect actual
> +	test_cmp expect actual &&
> +
> +	echo "$batch_command_multiple_info" | tr "\n" "\0" >in &&
> +	tr "\n" "\0" <expect >expect_nul &&
> +	git cat-file --batch-command --buffer -Z <in >actual &&
> +
> +	test_cmp expect_nul actual
>   '
>   
>   batch_command_multiple_contents="contents $hello_sha1
> @@ -490,15 +519,15 @@ contents deadbeef
>   flush"
>   
>   test_expect_success '--batch-command with multiple command calls gives correct format' '
> -	cat >expect <<-EOF &&
> -	$hello_sha1 blob $hello_size
> -	$hello_content
> -	$commit_sha1 commit $commit_size
> -	$commit_content
> -	$tag_sha1 tag $tag_size
> -	$tag_content
> -	deadbeef missing
> -	EOF
> +	printf "%s\0" \
> +		"$hello_sha1 blob $hello_size" \
> +		"$hello_content" \
> +		"$commit_sha1 commit $commit_size" \
> +		"$commit_content" \
> +		"$tag_sha1 tag $tag_size" \
> +		"$tag_content" \
> +		"deadbeef missing" >expect_nul &&
> +	tr "\0" "\n" <expect_nul >expect &&
>   
>   	echo "$batch_command_multiple_contents" >in &&
>   	git cat-file --batch-command --buffer <in >actual &&
> @@ -508,7 +537,12 @@ test_expect_success '--batch-command with multiple command calls gives correct f
>   	echo "$batch_command_multiple_contents" | tr "\n" "\0" >in &&
>   	git cat-file --batch-command --buffer -z <in >actual &&
>   
> -	test_cmp expect actual
> +	test_cmp expect actual &&
> +
> +	echo "$batch_command_multiple_contents" | tr "\n" "\0" >in &&
> +	git cat-file --batch-command --buffer -Z <in >actual &&
> +
> +	test_cmp expect_nul actual
>   '
>   
>   test_expect_success 'setup blobs which are likely to delta' '
> @@ -848,6 +882,13 @@ test_expect_success 'git cat-file --batch-check --follow-symlinks works for brok
>   	test_cmp expect actual
>   '
>   
> +test_expect_success 'git cat-file --batch-check --follow-symlinks -Z works for broken in-repo, same-dir links' '
> +	printf "HEAD:broken-same-dir-link\0" >in &&
> +	printf "dangling 25\0HEAD:broken-same-dir-link\0" >expect &&
> +	git cat-file --batch-check --follow-symlinks -Z <in >actual &&
> +	test_cmp expect actual
> +'
> +
>   test_expect_success 'git cat-file --batch-check --follow-symlinks works for same-dir links-to-links' '
>   	echo HEAD:link-to-link | git cat-file --batch-check --follow-symlinks >actual &&
>   	test_cmp found actual
> @@ -862,6 +903,15 @@ test_expect_success 'git cat-file --batch-check --follow-symlinks works for pare
>   	test_cmp expect actual
>   '
>   
> +test_expect_success 'git cat-file --batch-check --follow-symlinks -Z works for parent-dir links' '
> +	echo HEAD:dir/parent-dir-link | git cat-file --batch-check --follow-symlinks >actual &&
> +	test_cmp found actual &&
> +	printf "notdir 29\0HEAD:dir/parent-dir-link/nope\0" >expect &&
> +	printf "HEAD:dir/parent-dir-link/nope\0" >in &&
> +	git cat-file --batch-check --follow-symlinks -Z <in >actual &&
> +	test_cmp expect actual
> +'
> +
>   test_expect_success 'git cat-file --batch-check --follow-symlinks works for .. links' '
>   	echo dangling 22 >expect &&
>   	echo HEAD:dir/link-dir/nope >>expect &&
> @@ -976,6 +1026,13 @@ test_expect_success 'git cat-file --batch-check --follow-symlink breaks loops' '
>   	test_cmp expect actual
>   '
>   
> +test_expect_success 'git cat-file --batch-check --follow-symlink -Z breaks loops' '
> +	printf "loop 10\0HEAD:loop1\0" >expect &&
> +	printf "HEAD:loop1\0" >in &&
> +	git cat-file --batch-check --follow-symlinks -Z <in >actual &&
> +	test_cmp expect actual
> +'
> +
>   test_expect_success 'git cat-file --batch --follow-symlink returns correct sha and mode' '
>   	echo HEAD:morx | git cat-file --batch >expect &&
>   	echo HEAD:morx | git cat-file --batch --follow-symlinks >actual &&

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] cat-file: Introduce new option to delimit output with NUL characters
  2023-06-05 15:47   ` Phillip Wood
@ 2023-06-05 23:54     ` Junio C Hamano
  2023-06-06  4:52       ` Patrick Steinhardt
  2023-06-06  5:00     ` Patrick Steinhardt
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2023-06-05 23:54 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Patrick Steinhardt, git, Taylor Blau, Toon Claes

Phillip Wood <phillip.wood123@gmail.com> writes:

>> Instead, introduce a new option `-Z` that switches to NUL-delimited
>> input and output. The old `-z` option is marked as deprecated with a
>> hint that its output may become unparsable.
>
> The commit message explains the problem well, I agree adding a new
> option is the cleanest solution.
>
>...
>>   @@ -246,6 +246,12 @@ respectively print:
>>   -z::
>>   	Only meaningful with `--batch`, `--batch-check`, or
>>   	`--batch-command`; input is NUL-delimited instead of
>> +	newline-delimited. This option is deprecated in favor of
>> +	`-Z` as the output can otherwise be ambiguous.
>> +
>> +-Z::
>> +	Only meaningful with `--batch`, `--batch-check`, or
>> +	`--batch-command`; input and output is NUL-delimited instead of
>>   	newline-delimited.
>
> The documentation changes look good. I wonder if we should put the
> documentation for "-Z" above "-z" so users see the preferred option
> first.

Hmph, I expected "-z" and "-Z" to be orthogonal, the former
controlling how input records are delimited, the latter controlling
how output records are delimited, as it usually is a good idea to
keep things that could be orthogonal to be orthogonal to avoid
unnecessarily robbing users flexibility.  "-Z is a new way that is
preferred over -z" was something I did not expect, actually.

I am not outright rejecting such a deliberately limiting design, but
I'll have to think about it a bit.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] cat-file: Introduce new option to delimit output with NUL characters
  2023-06-02 13:02 ` [PATCH 5/5] cat-file: Introduce new option to delimit output with NUL characters Patrick Steinhardt
  2023-06-05 15:47   ` Phillip Wood
@ 2023-06-06  1:23   ` Junio C Hamano
  1 sibling, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2023-06-06  1:23 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Taylor Blau, Toon Claes, Phillip Wood

Patrick Steinhardt <ps@pks.im> writes:

> Subject: Re: [PATCH 5/5] cat-file: Introduce new option to delimit output with NUL characters

If you are going to make "-Z" and "-z" not orthogonal, then this
should say

    cat-file: introduce -Z option to delimit both input and output with ...

or something.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] cat-file: Introduce new option to delimit output with NUL characters
  2023-06-05 23:54     ` Junio C Hamano
@ 2023-06-06  4:52       ` Patrick Steinhardt
  2023-06-06  5:22         ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Patrick Steinhardt @ 2023-06-06  4:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Phillip Wood, git, Taylor Blau, Toon Claes

[-- Attachment #1: Type: text/plain, Size: 2207 bytes --]

On Tue, Jun 06, 2023 at 08:54:16AM +0900, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
> >> Instead, introduce a new option `-Z` that switches to NUL-delimited
> >> input and output. The old `-z` option is marked as deprecated with a
> >> hint that its output may become unparsable.
> >
> > The commit message explains the problem well, I agree adding a new
> > option is the cleanest solution.
> >
> >...
> >>   @@ -246,6 +246,12 @@ respectively print:
> >>   -z::
> >>   	Only meaningful with `--batch`, `--batch-check`, or
> >>   	`--batch-command`; input is NUL-delimited instead of
> >> +	newline-delimited. This option is deprecated in favor of
> >> +	`-Z` as the output can otherwise be ambiguous.
> >> +
> >> +-Z::
> >> +	Only meaningful with `--batch`, `--batch-check`, or
> >> +	`--batch-command`; input and output is NUL-delimited instead of
> >>   	newline-delimited.
> >
> > The documentation changes look good. I wonder if we should put the
> > documentation for "-Z" above "-z" so users see the preferred option
> > first.
> 
> Hmph, I expected "-z" and "-Z" to be orthogonal, the former
> controlling how input records are delimited, the latter controlling
> how output records are delimited, as it usually is a good idea to
> keep things that could be orthogonal to be orthogonal to avoid
> unnecessarily robbing users flexibility.  "-Z is a new way that is
> preferred over -z" was something I did not expect, actually.
> 
> I am not outright rejecting such a deliberately limiting design, but
> I'll have to think about it a bit.

Well, the way I see it we shouldn't have ever decoupled the input and
output format, and I consider it a bug that `-z` did as it makes it
unusable with arbitrary input. So I'd rather be helping the user of
these modes to do the right thing and NUL-delimit both input and output
than running into these edge cases later down the road.

That being said I'd be fine to change this series to mean "-Z changes
stdout" if you insist. In that case we should be pointing out in our
documentation that "You should never use `-z` without `-Z` when you
process arbitrary input".

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] cat-file: Introduce new option to delimit output with NUL characters
  2023-06-05 15:47   ` Phillip Wood
  2023-06-05 23:54     ` Junio C Hamano
@ 2023-06-06  5:00     ` Patrick Steinhardt
  1 sibling, 0 replies; 22+ messages in thread
From: Patrick Steinhardt @ 2023-06-06  5:00 UTC (permalink / raw)
  To: phillip.wood; +Cc: git, Taylor Blau, Toon Claes, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 4261 bytes --]

On Mon, Jun 05, 2023 at 04:47:14PM +0100, Phillip Wood wrote:
> > @@ -384,6 +390,11 @@ notdir SP <size> LF
> >   is printed when, during symlink resolution, a file is used as a
> >   directory name.
> >   
> > +Alternatively, when `-Z` is passed, the line feeds in any of the above examples
> > +are replaced with NUL terminators. This ensures that output will be parsable if
> > +the output itself would contain a linefeed and is thus recommended for
> > +scripting purposes.
> > +
> >   CAVEATS
> >   -------
> >   
> > diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> > index 001dcb24d6..90ef407d30 100644
> > --- a/builtin/cat-file.c
> > +++ b/builtin/cat-file.c
> > @@ -492,17 +494,18 @@ static void batch_object_write(const char *obj_name,
> >   	strbuf_reset(scratch);
> >   
> >   	if (!opt->format) {
> > -		print_default_format(scratch, data);
> > +		print_default_format(scratch, data, opt);
> >   	} else {
> >   		strbuf_expand(scratch, opt->format, expand_format, data);
> > -		strbuf_addch(scratch, '\n');
> > +		strbuf_addch(scratch, opt->output_delim);
> >   	}
> >   
> >   	batch_write(opt, scratch->buf, scratch->len);
> >   
> >   	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
> > +		char buf[] = {opt->output_delim};
> 
> I found this a bit confusing, I think it would be clearer just to do
> 
> 	batch_write(opt, &opt->output_delim, 1);

Agreed, that's cleaner.

> >   		print_object_or_die(opt, data);
> > -		batch_write(opt, "\n", 1);
> > +		batch_write(opt, buf, 1);
> >   	}
> >   }
> 
> > @@ -920,7 +927,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
> >   		N_("git cat-file (-t | -s) [--allow-unknown-type] <object>"),
> >   		N_("git cat-file (--batch | --batch-check | --batch-command) [--batch-all-objects]\n"
> >   		   "             [--buffer] [--follow-symlinks] [--unordered]\n"
> > -		   "             [--textconv | --filters] [-z]"),
> > +		   "             [--textconv | --filters] [-z] [-Z]"),
> 
> If we're recommending that people don't use '-z' then maybe we should 
> remove it from the synopsis and add OPT_HIDDEN to it below.

I might still change this depending on the conclusion Junio and I will
arrive at, but for now I agree that this makes sense.

> >   		N_("git cat-file (--textconv | --filters)\n"
> >   		   "             [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]"),
> >   		NULL
> > @@ -950,6 +957,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
> >   			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
> >   			batch_option_callback),
> >   		OPT_BOOL('z', NULL, &input_nul_terminated, N_("stdin is NUL-terminated")),
> > +		OPT_BOOL('Z', NULL, &nul_terminated, N_("stdin and stdout is NUL-terminated")),
> 
> > diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> > index 7b985cfded..d73a0be1b9 100755
> > --- a/t/t1006-cat-file.sh
> > +++ b/t/t1006-cat-file.sh
> > @@ -392,17 +393,18 @@ deadbeef
> >   
> >   "
> >   
> > -batch_output="$hello_sha1 blob $hello_size
> > -$hello_content
> > -$commit_sha1 commit $commit_size
> > -$commit_content
> > -$tag_sha1 tag $tag_size
> > -$tag_content
> > -deadbeef missing
> > - missing"
> > +printf "%s\0" \
> > +	"$hello_sha1 blob $hello_size" \
> > +	"$hello_content" \
> > +	"$commit_sha1 commit $commit_size" \
> > +	"$commit_content" \
> > +	"$tag_sha1 tag $tag_size" \
> > +	"$tag_content" \
> > +	"deadbeef missing" \
> > +	" missing" >batch_output
> 
> I think writing the expected output to a file is a good change as we 
> always use it with test_cmp. As "-z" is deprecated I think it makes 
> sense to model the expected output for "-Z" and use tr for the "-z" 
> tests as you have done here. It looks like we have good coverage of the 
> new option.

It's actually also required in order to not have to specify the expected
output twice. While we could leave this as-is, translating it to be NUL
terminated via `tr \n \0` doesn't work as the output contains newlines
in places where we don't want to translate them to NUL delimiters. And
storing the NUL-delimited string in a variable doesn't work either as
shells will truncate the C strings.

Thanks for your review!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 0/5] catfile: introduce NUL-terminated output format
  2023-06-02 13:02 [PATCH 0/5] cat-file: introduce NUL-terminated output format Patrick Steinhardt
                   ` (5 preceding siblings ...)
  2023-06-03  1:44 ` [PATCH 0/5] cat-file: introduce NUL-terminated output format Junio C Hamano
@ 2023-06-06  5:19 ` Patrick Steinhardt
  2023-06-06  5:19   ` [PATCH v2 1/5] t1006: don't strip timestamps from expected results Patrick Steinhardt
                     ` (5 more replies)
  6 siblings, 6 replies; 22+ messages in thread
From: Patrick Steinhardt @ 2023-06-06  5:19 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Toon Claes, Phillip Wood, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 7009 bytes --]

Hi,

this is the second version of my patch series that introduces a new NUL
terminated output format to git-cat-file(1) in order to make its output
unambiguously parsable in the case where the input contains newlines.

Changes compared to v1:

    - Improved the commit subject of v5 to mention that the new option
      changes both input and output to be NUL delimited.

    - Extended the commit message to make a better case for why `-Z`
      changes both input and output format to be NUL terminated, instead
      of having `-z` change the input and `-Z` change the output format.

    - The `-Z` option is now sorted before `-z` in git-cat-file(1).

    - The `-z` option is now deprecated "even more", where it is hidden
      from the synopsis as well as from the `-h` output.

    - A small change to pass the delimiter to `batch_write()` directly
      instead of storing it in a temporary array first.

Thanks for your feedback, Junio and Phillip!

Patrick

Patrick Steinhardt (5):
  t1006: don't strip timestamps from expected results
  t1006: modernize test style to use `test_cmp`
  strbuf: provide CRLF-aware helper to read until a specified delimiter
  cat-file: simplify reading from standard input
  cat-file: introduce option to delimit input and output with NUL

 Documentation/git-cat-file.txt |  15 +-
 builtin/cat-file.c             |  85 +++++------
 strbuf.c                       |  11 +-
 strbuf.h                       |  12 ++
 t/t1006-cat-file.sh            | 249 +++++++++++++++++++++------------
 5 files changed, 233 insertions(+), 139 deletions(-)

Range-diff against v1:
1:  5c8b4a1d70 = 1:  5c8b4a1d70 t1006: don't strip timestamps from expected results
2:  251fc2a387 = 2:  251fc2a387 t1006: modernize test style to use `test_cmp`
3:  8127eeac97 = 3:  8127eeac97 strbuf: provide CRLF-aware helper to read until a specified delimiter
4:  e7cba8dc4c = 4:  e7cba8dc4c cat-file: simplify reading from standard input
5:  07a7c34615 ! 5:  79ed618c84 cat-file: Introduce new option to delimit output with NUL characters
    @@ Metadata
     Author: Patrick Steinhardt <ps@pks.im>
     
      ## Commit message ##
    -    cat-file: Introduce new option to delimit output with NUL characters
    +    cat-file: introduce option to delimit input and output with NUL
     
         In db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with
         `-z`, 2022-07-22), we have introduced a new mode to read the input via
    @@ Commit message
         given that revisions containing newlines are quite exotic.
     
         Instead, introduce a new option `-Z` that switches to NUL-delimited
    -    input and output. The old `-z` option is marked as deprecated with a
    -    hint that its output may become unparsable.
    +    input and output. While this new option could arguably only switch the
    +    output format to be NUL-delimited, the consequence would be that users
    +    have to always specify both `-z` and `-Z` when the input may contain
    +    newlines. On the other hand, if the user knows that there never will be
    +    newlines in the input, they don't have to use either of those options.
    +    There is thus no usecase that would warrant treating input and output
    +    format separately, which is why we instead opt to "do the right thing"
    +    and have `-Z` mean to NUL-terminate both formats.
    +
    +    The old `-z` option is marked as deprecated with a hint that its output
    +    may become unparsable. It is thus hidden both from the synopsis as well
    +    as the command's help output.
     
         Co-authored-by: Toon Claes <toon@iotcl.com>
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
    @@ Documentation/git-cat-file.txt: SYNOPSIS
      'git cat-file' (--batch | --batch-check | --batch-command) [--batch-all-objects]
      	     [--buffer] [--follow-symlinks] [--unordered]
     -	     [--textconv | --filters] [-z]
    -+	     [--textconv | --filters] [-z] [-Z]
    ++	     [--textconv | --filters] [-Z]
      'git cat-file' (--textconv | --filters)
      	     [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]
      
     @@ Documentation/git-cat-file.txt: respectively print:
    - -z::
    - 	Only meaningful with `--batch`, `--batch-check`, or
    - 	`--batch-command`; input is NUL-delimited instead of
    -+	newline-delimited. This option is deprecated in favor of
    -+	`-Z` as the output can otherwise be ambiguous.
    -+
    + 	/etc/passwd
    + --
    + 
     +-Z::
     +	Only meaningful with `--batch`, `--batch-check`, or
     +	`--batch-command`; input and output is NUL-delimited instead of
    - 	newline-delimited.
    ++	newline-delimited.
    ++
    + -z::
    + 	Only meaningful with `--batch`, `--batch-check`, or
    + 	`--batch-command`; input is NUL-delimited instead of
    +-	newline-delimited.
    ++	newline-delimited. This option is deprecated in favor of
    ++	`-Z` as the output can otherwise be ambiguous.
      
      
    + OUTPUT
     @@ Documentation/git-cat-file.txt: notdir SP <size> LF
      is printed when, during symlink resolution, a file is used as a
      directory name.
    @@ builtin/cat-file.c: static void batch_object_write(const char *obj_name,
      	batch_write(opt, scratch->buf, scratch->len);
      
      	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
    -+		char buf[] = {opt->output_delim};
      		print_object_or_die(opt, data);
     -		batch_write(opt, "\n", 1);
    -+		batch_write(opt, buf, 1);
    ++		batch_write(opt, &opt->output_delim, 1);
      	}
      }
      
    @@ builtin/cat-file.c: int cmd_cat_file(int argc, const char **argv, const char *pr
      		N_("git cat-file (--batch | --batch-check | --batch-command) [--batch-all-objects]\n"
      		   "             [--buffer] [--follow-symlinks] [--unordered]\n"
     -		   "             [--textconv | --filters] [-z]"),
    -+		   "             [--textconv | --filters] [-z] [-Z]"),
    ++		   "             [--textconv | --filters] [-Z]"),
      		N_("git cat-file (--textconv | --filters)\n"
      		   "             [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]"),
      		NULL
     @@ builtin/cat-file.c: int cmd_cat_file(int argc, const char **argv, const char *prefix)
    + 			N_("like --batch, but don't emit <contents>"),
      			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
      			batch_option_callback),
    - 		OPT_BOOL('z', NULL, &input_nul_terminated, N_("stdin is NUL-terminated")),
    +-		OPT_BOOL('z', NULL, &input_nul_terminated, N_("stdin is NUL-terminated")),
    ++		OPT_BOOL_F('z', NULL, &input_nul_terminated, N_("stdin is NUL-terminated"),
    ++			PARSE_OPT_HIDDEN),
     +		OPT_BOOL('Z', NULL, &nul_terminated, N_("stdin and stdout is NUL-terminated")),
      		OPT_CALLBACK_F(0, "batch-command", &batch, N_("format"),
      			N_("read commands from stdin"),
-- 
2.41.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 1/5] t1006: don't strip timestamps from expected results
  2023-06-06  5:19 ` [PATCH v2 0/5] catfile: " Patrick Steinhardt
@ 2023-06-06  5:19   ` Patrick Steinhardt
  2023-06-06  5:19   ` [PATCH v2 2/5] t1006: modernize test style to use `test_cmp` Patrick Steinhardt
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Patrick Steinhardt @ 2023-06-06  5:19 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Toon Claes, Phillip Wood, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 7211 bytes --]

In t1006 we have a bunch of tests that verify the output format of the
git-cat-file(1) command. But while part of the output for some tests
would include commit timestamps, we don't verify those but instead strip
them before comparing expected with actual results. This is done by the
function `maybe_remove_timestamp`, which goes all the way back to the
ancient commit b335d3f121 (Add tests for git cat-file, 2008-04-23).

Our tests had been in a different shape back then. Most importantly we
didn't yet have the infrastructure to create objects with deterministic
timestamps. Nowadays we do though, and thus there is no reason anymore
to strip the timestamps.

Refactor the tests to not strip the timestamp anymore.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t1006-cat-file.sh | 68 +++++++++++++++++----------------------------
 1 file changed, 26 insertions(+), 42 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 8eac74b59c..f139d56fb4 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -109,26 +109,12 @@ strlen () {
     echo_without_newline "$1" | wc -c | sed -e 's/^ *//'
 }
 
-maybe_remove_timestamp () {
-	if test -z "$2"; then
-		echo_without_newline "$1"
-	else
-		echo_without_newline "$(printf '%s\n' "$1" | remove_timestamp)"
-	fi
-}
-
-remove_timestamp () {
-	sed -e 's/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//'
-}
-
-
 run_tests () {
     type=$1
     sha1=$2
     size=$3
     content=$4
     pretty_content=$5
-    no_ts=$6
 
     batch_output="$sha1 $type $size
 $content"
@@ -163,21 +149,21 @@ $content"
 
     test -z "$content" ||
     test_expect_success "Content of $type is correct" '
-	maybe_remove_timestamp "$content" $no_ts >expect &&
-	maybe_remove_timestamp "$(git cat-file $type $sha1)" $no_ts >actual &&
+	echo_without_newline "$content" >expect &&
+	git cat-file $type $sha1 >actual &&
 	test_cmp expect actual
     '
 
     test_expect_success "Pretty content of $type is correct" '
-	maybe_remove_timestamp "$pretty_content" $no_ts >expect &&
-	maybe_remove_timestamp "$(git cat-file -p $sha1)" $no_ts >actual &&
+	echo_without_newline "$pretty_content" >expect &&
+	git cat-file -p $sha1 >actual &&
 	test_cmp expect actual
     '
 
     test -z "$content" ||
     test_expect_success "--batch output of $type is correct" '
-	maybe_remove_timestamp "$batch_output" $no_ts >expect &&
-	maybe_remove_timestamp "$(echo $sha1 | git cat-file --batch)" $no_ts >actual &&
+	echo "$batch_output" >expect &&
+	echo $sha1 | git cat-file --batch >actual &&
 	test_cmp expect actual
     '
 
@@ -191,9 +177,8 @@ $content"
     do
 	test -z "$content" ||
 		test_expect_success "--batch-command $opt output of $type content is correct" '
-		maybe_remove_timestamp "$batch_output" $no_ts >expect &&
-		maybe_remove_timestamp "$(test_write_lines "contents $sha1" |
-		git cat-file --batch-command $opt)" $no_ts >actual &&
+		echo "$batch_output" >expect &&
+		test_write_lines "contents $sha1" | git cat-file --batch-command $opt >actual &&
 		test_cmp expect actual
 	'
 
@@ -228,10 +213,9 @@ $content"
     test_expect_success "--batch without type ($type)" '
 	{
 		echo "$size" &&
-		maybe_remove_timestamp "$content" $no_ts
+		echo "$content"
 	} >expect &&
-	echo $sha1 | git cat-file --batch="%(objectsize)" >actual.full &&
-	maybe_remove_timestamp "$(cat actual.full)" $no_ts >actual &&
+	echo $sha1 | git cat-file --batch="%(objectsize)" >actual &&
 	test_cmp expect actual
     '
 
@@ -239,10 +223,9 @@ $content"
     test_expect_success "--batch without size ($type)" '
 	{
 		echo "$type" &&
-		maybe_remove_timestamp "$content" $no_ts
+		echo "$content"
 	} >expect &&
-	echo $sha1 | git cat-file --batch="%(objecttype)" >actual.full &&
-	maybe_remove_timestamp "$(cat actual.full)" $no_ts >actual &&
+	echo $sha1 | git cat-file --batch="%(objecttype)" >actual &&
 	test_cmp expect actual
     '
 }
@@ -284,7 +267,7 @@ test_expect_success '--batch-check without %(rest) considers whole line' '
 
 tree_sha1=$(git write-tree)
 tree_size=$(($(test_oid rawsz) + 13))
-tree_pretty_content="100644 blob $hello_sha1	hello"
+tree_pretty_content="100644 blob $hello_sha1	hello${LF}"
 
 run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
 
@@ -292,12 +275,12 @@ commit_message="Initial commit"
 commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree $tree_sha1)
 commit_size=$(($(test_oid hexsz) + 137))
 commit_content="tree $tree_sha1
-author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 0 +0000
-committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 0 +0000
+author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> $GIT_AUTHOR_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
 
 $commit_message"
 
-run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content" 1
+run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content"
 
 tag_header_without_timestamp="object $hello_sha1
 type blob
@@ -311,7 +294,7 @@ $tag_description"
 tag_sha1=$(echo_without_newline "$tag_content" | git hash-object -t tag --stdin -w)
 tag_size=$(strlen "$tag_content")
 
-run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content" 1
+run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content"
 
 test_expect_success \
     "Reach a blob from a tag pointing to it" \
@@ -400,13 +383,16 @@ deadbeef missing
  missing"
 
 test_expect_success '--batch with multiple sha1s gives correct format' '
-	test "$(maybe_remove_timestamp "$batch_output" 1)" = "$(maybe_remove_timestamp "$(echo_without_newline "$batch_input" | git cat-file --batch)" 1)"
+	echo "$batch_output" >expect &&
+	echo_without_newline "$batch_input" | git cat-file --batch >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success '--batch, -z with multiple sha1s gives correct format' '
 	echo_without_newline_nul "$batch_input" >in &&
-	test "$(maybe_remove_timestamp "$batch_output" 1)" = \
-	"$(maybe_remove_timestamp "$(git cat-file --batch -z <in)" 1)"
+	echo "$batch_output" >expect &&
+	git cat-file --batch -z <in >actual &&
+	test_cmp expect actual
 '
 
 batch_check_input="$hello_sha1
@@ -480,7 +466,7 @@ contents deadbeef
 flush"
 
 test_expect_success '--batch-command with multiple command calls gives correct format' '
-	remove_timestamp >expect <<-EOF &&
+	cat >expect <<-EOF &&
 	$hello_sha1 blob $hello_size
 	$hello_content
 	$commit_sha1 commit $commit_size
@@ -491,15 +477,13 @@ test_expect_success '--batch-command with multiple command calls gives correct f
 	EOF
 
 	echo "$batch_command_multiple_contents" >in &&
-	git cat-file --batch-command --buffer <in >actual_raw &&
+	git cat-file --batch-command --buffer <in >actual &&
 
-	remove_timestamp <actual_raw >actual &&
 	test_cmp expect actual &&
 
 	echo "$batch_command_multiple_contents" | tr "\n" "\0" >in &&
-	git cat-file --batch-command --buffer -z <in >actual_raw &&
+	git cat-file --batch-command --buffer -z <in >actual &&
 
-	remove_timestamp <actual_raw >actual &&
 	test_cmp expect actual
 '
 
-- 
2.41.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 2/5] t1006: modernize test style to use `test_cmp`
  2023-06-06  5:19 ` [PATCH v2 0/5] catfile: " Patrick Steinhardt
  2023-06-06  5:19   ` [PATCH v2 1/5] t1006: don't strip timestamps from expected results Patrick Steinhardt
@ 2023-06-06  5:19   ` Patrick Steinhardt
  2023-06-06  5:19   ` [PATCH v2 3/5] strbuf: provide CRLF-aware helper to read until a specified delimiter Patrick Steinhardt
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Patrick Steinhardt @ 2023-06-06  5:19 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Toon Claes, Phillip Wood, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 4683 bytes --]

The tests for git-cat-file(1) are quite old and haven't ever been
updated since they were introduced. They thus tend to use old idioms
that have since grown outdated. Most importantly, many of the tests use
`test $A = $B` to compare expected and actual output. This has the
downside that it is impossible to tell what exactly is different between
both versions in case the test fails.

Refactor the tests to instead use `test_cmp`. While more verbose, it
both tends to be more readable and will result in a nice diff in case
states don't match.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t1006-cat-file.sh | 70 ++++++++++++++++++++++++++++++---------------
 1 file changed, 47 insertions(+), 23 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index f139d56fb4..7b985cfded 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -296,9 +296,11 @@ tag_size=$(strlen "$tag_content")
 
 run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content"
 
-test_expect_success \
-    "Reach a blob from a tag pointing to it" \
-    "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
+test_expect_success "Reach a blob from a tag pointing to it" '
+	echo_without_newline "$hello_content" >expect &&
+	git cat-file blob $tag_sha1 >actual &&
+	test_cmp expect actual
+'
 
 for batch in batch batch-check batch-command
 do
@@ -334,30 +336,47 @@ do
 done
 
 test_expect_success "--batch-check for a non-existent named object" '
-    test "foobar42 missing
-foobar84 missing" = \
-    "$( ( echo foobar42 && echo_without_newline foobar84 ) | git cat-file --batch-check)"
+	cat >expect <<-EOF &&
+	foobar42 missing
+	foobar84 missing
+	EOF
+
+	printf "foobar42\nfoobar84" >in &&
+	git cat-file --batch-check <in >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success "--batch-check for a non-existent hash" '
-    test "0000000000000000000000000000000000000042 missing
-0000000000000000000000000000000000000084 missing" = \
-    "$( ( echo 0000000000000000000000000000000000000042 &&
-	 echo_without_newline 0000000000000000000000000000000000000084 ) |
-       git cat-file --batch-check)"
+	cat >expect <<-EOF &&
+	0000000000000000000000000000000000000042 missing
+	0000000000000000000000000000000000000084 missing
+	EOF
+
+	printf "0000000000000000000000000000000000000042\n0000000000000000000000000000000000000084" >in &&
+	git cat-file --batch-check <in >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success "--batch for an existent and a non-existent hash" '
-    test "$tag_sha1 tag $tag_size
-$tag_content
-0000000000000000000000000000000000000000 missing" = \
-    "$( ( echo $tag_sha1 &&
-	 echo_without_newline 0000000000000000000000000000000000000000 ) |
-       git cat-file --batch)"
+	cat >expect <<-EOF &&
+	$tag_sha1 tag $tag_size
+	$tag_content
+	0000000000000000000000000000000000000000 missing
+	EOF
+
+	printf "$tag_sha1\n0000000000000000000000000000000000000000" >in &&
+	git cat-file --batch <in >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success "--batch-check for an empty line" '
-    test " missing" = "$(echo | git cat-file --batch-check)"
+	cat >expect <<-EOF &&
+	 missing
+	EOF
+
+	echo >in &&
+	git cat-file --batch-check <in >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success 'empty --batch-check notices missing object' '
@@ -384,7 +403,8 @@ deadbeef missing
 
 test_expect_success '--batch with multiple sha1s gives correct format' '
 	echo "$batch_output" >expect &&
-	echo_without_newline "$batch_input" | git cat-file --batch >actual &&
+	echo_without_newline "$batch_input" >in &&
+	git cat-file --batch <in >actual &&
 	test_cmp expect actual
 '
 
@@ -411,13 +431,17 @@ deadbeef missing
  missing"
 
 test_expect_success "--batch-check with multiple sha1s gives correct format" '
-    test "$batch_check_output" = \
-    "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
+	echo "$batch_check_output" >expect &&
+	echo_without_newline "$batch_check_input" >in &&
+	git cat-file --batch-check <in >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success "--batch-check, -z with multiple sha1s gives correct format" '
-    echo_without_newline_nul "$batch_check_input" >in &&
-    test "$batch_check_output" = "$(git cat-file --batch-check -z <in)"
+	echo "$batch_check_output" >expect &&
+	echo_without_newline_nul "$batch_check_input" >in &&
+	git cat-file --batch-check -z <in >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
-- 
2.41.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 3/5] strbuf: provide CRLF-aware helper to read until a specified delimiter
  2023-06-06  5:19 ` [PATCH v2 0/5] catfile: " Patrick Steinhardt
  2023-06-06  5:19   ` [PATCH v2 1/5] t1006: don't strip timestamps from expected results Patrick Steinhardt
  2023-06-06  5:19   ` [PATCH v2 2/5] t1006: modernize test style to use `test_cmp` Patrick Steinhardt
@ 2023-06-06  5:19   ` Patrick Steinhardt
  2023-06-06  5:19   ` [PATCH v2 4/5] cat-file: simplify reading from standard input Patrick Steinhardt
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Patrick Steinhardt @ 2023-06-06  5:19 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Toon Claes, Phillip Wood, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 3114 bytes --]

Many of our commands support reading input that is separated either via
newlines or via NUL characters. Furthermore, in order to be a better
cross platform citizen, these commands typically know to strip the CRLF
sequence so that we also support reading newline-separated inputs on
e.g. the Windows platform. This results in the following kind of awkward
pattern:

```
struct strbuf input = STRBUF_INIT;

while (1) {
	int ret;

	if (nul_terminated)
		ret = strbuf_getline_nul(&input, stdin);
	else
		ret = strbuf_getline(&input, stdin);
	if (ret)
		break;

	...
}
```

Introduce a new CRLF-aware helper function that can read up to a user
specified delimiter. If the delimiter is `\n` the function knows to also
strip CRLF, otherwise it will only strip the specified delimiter. This
results in the following, much more readable code pattern:

```
struct strbuf input = STRBUF_INIT;

while (strbuf_getdelim_strip_crlf(&input, stdin, delim) != EOF) {
	...
}
```

The new function will be used in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 strbuf.c | 11 ++++++++---
 strbuf.h | 12 ++++++++++++
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/strbuf.c b/strbuf.c
index 08eec8f1d8..31dc48c0ab 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -721,11 +721,11 @@ static int strbuf_getdelim(struct strbuf *sb, FILE *fp, int term)
 	return 0;
 }
 
-int strbuf_getline(struct strbuf *sb, FILE *fp)
+int strbuf_getdelim_strip_crlf(struct strbuf *sb, FILE *fp, int term)
 {
-	if (strbuf_getwholeline(sb, fp, '\n'))
+	if (strbuf_getwholeline(sb, fp, term))
 		return EOF;
-	if (sb->buf[sb->len - 1] == '\n') {
+	if (term == '\n' && sb->buf[sb->len - 1] == '\n') {
 		strbuf_setlen(sb, sb->len - 1);
 		if (sb->len && sb->buf[sb->len - 1] == '\r')
 			strbuf_setlen(sb, sb->len - 1);
@@ -733,6 +733,11 @@ int strbuf_getline(struct strbuf *sb, FILE *fp)
 	return 0;
 }
 
+int strbuf_getline(struct strbuf *sb, FILE *fp)
+{
+	return strbuf_getdelim_strip_crlf(sb, fp, '\n');
+}
+
 int strbuf_getline_lf(struct strbuf *sb, FILE *fp)
 {
 	return strbuf_getdelim(sb, fp, '\n');
diff --git a/strbuf.h b/strbuf.h
index 3dfeadb44c..0e69b656bc 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -475,6 +475,18 @@ int strbuf_readlink(struct strbuf *sb, const char *path, size_t hint);
  */
 ssize_t strbuf_write(struct strbuf *sb, FILE *stream);
 
+/**
+ * Read from a FILE * until the specified terminator is encountered,
+ * overwriting the existing contents of the strbuf.
+ *
+ * Reading stops after the terminator or at EOF.  The terminator is
+ * removed from the buffer before returning.  If the terminator is LF
+ * and if it is preceded by a CR, then the whole CRLF is stripped.
+ * Returns 0 unless there was nothing left before EOF, in which case
+ * it returns `EOF`.
+ */
+int strbuf_getdelim_strip_crlf(struct strbuf *sb, FILE *fp, int term);
+
 /**
  * Read a line from a FILE *, overwriting the existing contents of
  * the strbuf.  The strbuf_getline*() family of functions share
-- 
2.41.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 4/5] cat-file: simplify reading from standard input
  2023-06-06  5:19 ` [PATCH v2 0/5] catfile: " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2023-06-06  5:19   ` [PATCH v2 3/5] strbuf: provide CRLF-aware helper to read until a specified delimiter Patrick Steinhardt
@ 2023-06-06  5:19   ` Patrick Steinhardt
  2023-06-06  5:19   ` [PATCH v2 5/5] cat-file: introduce option to delimit input and output with NUL Patrick Steinhardt
  2023-06-12 20:43   ` [PATCH v2 0/5] catfile: introduce NUL-terminated output format Junio C Hamano
  5 siblings, 0 replies; 22+ messages in thread
From: Patrick Steinhardt @ 2023-06-06  5:19 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Toon Claes, Phillip Wood, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 3870 bytes --]

The batch modes of git-cat-file(1) read queries from stantard input that
are either newline- or NUL-delimited. This code was introduced via
db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with `-z`,
2022-07-22), which notes that:

"""
The refactoring here is slightly unfortunate, since we turn loops like:

     while (strbuf_getline(&buf, stdin) != EOF)

 into:

     while (1) {
         int ret;
         if (opt->nul_terminated)
             ret = strbuf_getline_nul(&input, stdin);
         else
             ret = strbuf_getline(&input, stdin);

         if (ret == EOF)
             break;
     }
"""

The commit proposed introducing a helper function that is easier to use,
which is just what we have done in the preceding commit. Refactor the
code to use this new helper to simplify the loop.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/cat-file.c | 32 +++++++++-----------------------
 1 file changed, 9 insertions(+), 23 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 0bafc14e6c..001dcb24d6 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -42,7 +42,7 @@ struct batch_options {
 	int all_objects;
 	int unordered;
 	int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
-	int nul_terminated;
+	char input_delim;
 	const char *format;
 };
 
@@ -694,20 +694,12 @@ static void batch_objects_command(struct batch_options *opt,
 	struct queued_cmd *queued_cmd = NULL;
 	size_t alloc = 0, nr = 0;
 
-	while (1) {
-		int i, ret;
+	while (strbuf_getdelim_strip_crlf(&input, stdin, opt->input_delim) != EOF) {
+		int i;
 		const struct parse_cmd *cmd = NULL;
 		const char *p = NULL, *cmd_end;
 		struct queued_cmd call = {0};
 
-		if (opt->nul_terminated)
-			ret = strbuf_getline_nul(&input, stdin);
-		else
-			ret = strbuf_getline(&input, stdin);
-
-		if (ret)
-			break;
-
 		if (!input.len)
 			die(_("empty command in input"));
 		if (isspace(*input.buf))
@@ -851,16 +843,7 @@ static int batch_objects(struct batch_options *opt)
 		goto cleanup;
 	}
 
-	while (1) {
-		int ret;
-		if (opt->nul_terminated)
-			ret = strbuf_getline_nul(&input, stdin);
-		else
-			ret = strbuf_getline(&input, stdin);
-
-		if (ret == EOF)
-			break;
-
+	while (strbuf_getdelim_strip_crlf(&input, stdin, opt->input_delim) != EOF) {
 		if (data.split_on_whitespace) {
 			/*
 			 * Split at first whitespace, tying off the beginning
@@ -929,6 +912,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	const char *exp_type = NULL, *obj_name = NULL;
 	struct batch_options batch = {0};
 	int unknown_type = 0;
+	int input_nul_terminated = 0;
 
 	const char * const usage[] = {
 		N_("git cat-file <type> <object>"),
@@ -965,7 +949,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			N_("like --batch, but don't emit <contents>"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
 			batch_option_callback),
-		OPT_BOOL('z', NULL, &batch.nul_terminated, N_("stdin is NUL-terminated")),
+		OPT_BOOL('z', NULL, &input_nul_terminated, N_("stdin is NUL-terminated")),
 		OPT_CALLBACK_F(0, "batch-command", &batch, N_("format"),
 			N_("read commands from stdin"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
@@ -1024,10 +1008,12 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	else if (batch.all_objects)
 		usage_msg_optf(_("'%s' requires a batch mode"), usage, options,
 			       "--batch-all-objects");
-	else if (batch.nul_terminated)
+	else if (input_nul_terminated)
 		usage_msg_optf(_("'%s' requires a batch mode"), usage, options,
 			       "-z");
 
+	batch.input_delim = input_nul_terminated ? '\0' : '\n';
+
 	/* Batch defaults */
 	if (batch.buffer_output < 0)
 		batch.buffer_output = batch.all_objects;
-- 
2.41.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 5/5] cat-file: introduce option to delimit input and output with NUL
  2023-06-06  5:19 ` [PATCH v2 0/5] catfile: " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2023-06-06  5:19   ` [PATCH v2 4/5] cat-file: simplify reading from standard input Patrick Steinhardt
@ 2023-06-06  5:19   ` Patrick Steinhardt
  2023-06-12 20:43   ` [PATCH v2 0/5] catfile: introduce NUL-terminated output format Junio C Hamano
  5 siblings, 0 replies; 22+ messages in thread
From: Patrick Steinhardt @ 2023-06-06  5:19 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Toon Claes, Phillip Wood, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 18248 bytes --]

In db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with
`-z`, 2022-07-22), we have introduced a new mode to read the input via
NUL-delimited records instead of newline-delimited records. This allows
the user to query for revisions that have newlines in their path
component. While unusual, such queries are perfectly valid and thus it
is clear that we should be able to support them properly.

Unfortunately, the commit only changed the input to be NUL-delimited,
but didn't change the output at the same time. While this is fine for
queries that are processed successfully, it is less so for queries that
aren't. In the case of missing commits for example the result can become
entirely unparsable:

```
$ printf "7ce4f05bae8120d9fa258e854a8669f6ea9cb7b1 blob 10\n1234567890\n\n\commit000" |
    git cat-file --batch -z
7ce4f05bae8120d9fa258e854a8669f6ea9cb7b1 blob 10
1234567890

commit missing
```

This is of course a crafted query that is intentionally gaming the
deficiency, but more benign queries that contain newlines would have
similar problems.

Ideally, we should have also changed the output to be NUL-delimited when
`-z` is specified to avoid this problem. As the input is NUL-delimited,
it is clear that the output in this case cannot ever contain NUL
characters by itself. Furthermore, Git does not allow NUL characters in
revisions anyway, further stressing the point that using NUL-delimited
output is safe. The only exception is of course the object data itself,
but as git-cat-file(1) prints the size of the object data clients should
read until that specified size has been consumed.

But even though `-z` has only been introduced a few releases ago in Git
v2.38.0, changing the output format retroactively to also NUL-delimit
output would be a backwards incompatible change. And while one could
make the argument that the output is inherently broken already, we need
to assume that there are existing users out there that use it just fine
given that revisions containing newlines are quite exotic.

Instead, introduce a new option `-Z` that switches to NUL-delimited
input and output. While this new option could arguably only switch the
output format to be NUL-delimited, the consequence would be that users
have to always specify both `-z` and `-Z` when the input may contain
newlines. On the other hand, if the user knows that there never will be
newlines in the input, they don't have to use either of those options.
There is thus no usecase that would warrant treating input and output
format separately, which is why we instead opt to "do the right thing"
and have `-Z` mean to NUL-terminate both formats.

The old `-z` option is marked as deprecated with a hint that its output
may become unparsable. It is thus hidden both from the synopsis as well
as the command's help output.

Co-authored-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/git-cat-file.txt |  15 +++-
 builtin/cat-file.c             |  57 +++++++++------
 t/t1006-cat-file.sh            | 123 ++++++++++++++++++++++++---------
 3 files changed, 139 insertions(+), 56 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 411de2e27d..0e4936d182 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 'git cat-file' (-t | -s) [--allow-unknown-type] <object>
 'git cat-file' (--batch | --batch-check | --batch-command) [--batch-all-objects]
 	     [--buffer] [--follow-symlinks] [--unordered]
-	     [--textconv | --filters] [-z]
+	     [--textconv | --filters] [-Z]
 'git cat-file' (--textconv | --filters)
 	     [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]
 
@@ -243,10 +243,16 @@ respectively print:
 	/etc/passwd
 --
 
+-Z::
+	Only meaningful with `--batch`, `--batch-check`, or
+	`--batch-command`; input and output is NUL-delimited instead of
+	newline-delimited.
+
 -z::
 	Only meaningful with `--batch`, `--batch-check`, or
 	`--batch-command`; input is NUL-delimited instead of
-	newline-delimited.
+	newline-delimited. This option is deprecated in favor of
+	`-Z` as the output can otherwise be ambiguous.
 
 
 OUTPUT
@@ -384,6 +390,11 @@ notdir SP <size> LF
 is printed when, during symlink resolution, a file is used as a
 directory name.
 
+Alternatively, when `-Z` is passed, the line feeds in any of the above examples
+are replaced with NUL terminators. This ensures that output will be parsable if
+the output itself would contain a linefeed and is thus recommended for
+scripting purposes.
+
 CAVEATS
 -------
 
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 001dcb24d6..e5f72229df 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -43,6 +43,7 @@ struct batch_options {
 	int unordered;
 	int transform_mode; /* may be 'w' or 'c' for --filters or --textconv */
 	char input_delim;
+	char output_delim;
 	const char *format;
 };
 
@@ -437,11 +438,12 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 	}
 }
 
-static void print_default_format(struct strbuf *scratch, struct expand_data *data)
+static void print_default_format(struct strbuf *scratch, struct expand_data *data,
+				 struct batch_options *opt)
 {
-	strbuf_addf(scratch, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid),
+	strbuf_addf(scratch, "%s %s %"PRIuMAX"%c", oid_to_hex(&data->oid),
 		    type_name(data->type),
-		    (uintmax_t)data->size);
+		    (uintmax_t)data->size, opt->output_delim);
 }
 
 /*
@@ -470,8 +472,8 @@ static void batch_object_write(const char *obj_name,
 						       &data->oid, &data->info,
 						       OBJECT_INFO_LOOKUP_REPLACE);
 		if (ret < 0) {
-			printf("%s missing\n",
-			       obj_name ? obj_name : oid_to_hex(&data->oid));
+			printf("%s missing%c",
+			       obj_name ? obj_name : oid_to_hex(&data->oid), opt->output_delim);
 			fflush(stdout);
 			return;
 		}
@@ -492,17 +494,17 @@ static void batch_object_write(const char *obj_name,
 	strbuf_reset(scratch);
 
 	if (!opt->format) {
-		print_default_format(scratch, data);
+		print_default_format(scratch, data, opt);
 	} else {
 		strbuf_expand(scratch, opt->format, expand_format, data);
-		strbuf_addch(scratch, '\n');
+		strbuf_addch(scratch, opt->output_delim);
 	}
 
 	batch_write(opt, scratch->buf, scratch->len);
 
 	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
 		print_object_or_die(opt, data);
-		batch_write(opt, "\n", 1);
+		batch_write(opt, &opt->output_delim, 1);
 	}
 }
 
@@ -520,22 +522,25 @@ static void batch_one_object(const char *obj_name,
 	if (result != FOUND) {
 		switch (result) {
 		case MISSING_OBJECT:
-			printf("%s missing\n", obj_name);
+			printf("%s missing%c", obj_name, opt->output_delim);
 			break;
 		case SHORT_NAME_AMBIGUOUS:
-			printf("%s ambiguous\n", obj_name);
+			printf("%s ambiguous%c", obj_name, opt->output_delim);
 			break;
 		case DANGLING_SYMLINK:
-			printf("dangling %"PRIuMAX"\n%s\n",
-			       (uintmax_t)strlen(obj_name), obj_name);
+			printf("dangling %"PRIuMAX"%c%s%c",
+			       (uintmax_t)strlen(obj_name),
+			       opt->output_delim, obj_name, opt->output_delim);
 			break;
 		case SYMLINK_LOOP:
-			printf("loop %"PRIuMAX"\n%s\n",
-			       (uintmax_t)strlen(obj_name), obj_name);
+			printf("loop %"PRIuMAX"%c%s%c",
+			       (uintmax_t)strlen(obj_name),
+			       opt->output_delim, obj_name, opt->output_delim);
 			break;
 		case NOT_DIR:
-			printf("notdir %"PRIuMAX"\n%s\n",
-			       (uintmax_t)strlen(obj_name), obj_name);
+			printf("notdir %"PRIuMAX"%c%s%c",
+			       (uintmax_t)strlen(obj_name),
+			       opt->output_delim, obj_name, opt->output_delim);
 			break;
 		default:
 			BUG("unknown get_sha1_with_context result %d\n",
@@ -547,9 +552,9 @@ static void batch_one_object(const char *obj_name,
 	}
 
 	if (ctx.mode == 0) {
-		printf("symlink %"PRIuMAX"\n%s\n",
+		printf("symlink %"PRIuMAX"%c%s%c",
 		       (uintmax_t)ctx.symlink_path.len,
-		       ctx.symlink_path.buf);
+		       opt->output_delim, ctx.symlink_path.buf, opt->output_delim);
 		fflush(stdout);
 		return;
 	}
@@ -913,6 +918,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	struct batch_options batch = {0};
 	int unknown_type = 0;
 	int input_nul_terminated = 0;
+	int nul_terminated = 0;
 
 	const char * const usage[] = {
 		N_("git cat-file <type> <object>"),
@@ -920,7 +926,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		N_("git cat-file (-t | -s) [--allow-unknown-type] <object>"),
 		N_("git cat-file (--batch | --batch-check | --batch-command) [--batch-all-objects]\n"
 		   "             [--buffer] [--follow-symlinks] [--unordered]\n"
-		   "             [--textconv | --filters] [-z]"),
+		   "             [--textconv | --filters] [-Z]"),
 		N_("git cat-file (--textconv | --filters)\n"
 		   "             [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]"),
 		NULL
@@ -949,7 +955,9 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			N_("like --batch, but don't emit <contents>"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
 			batch_option_callback),
-		OPT_BOOL('z', NULL, &input_nul_terminated, N_("stdin is NUL-terminated")),
+		OPT_BOOL_F('z', NULL, &input_nul_terminated, N_("stdin is NUL-terminated"),
+			PARSE_OPT_HIDDEN),
+		OPT_BOOL('Z', NULL, &nul_terminated, N_("stdin and stdout is NUL-terminated")),
 		OPT_CALLBACK_F(0, "batch-command", &batch, N_("format"),
 			N_("read commands from stdin"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
@@ -1011,8 +1019,15 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	else if (input_nul_terminated)
 		usage_msg_optf(_("'%s' requires a batch mode"), usage, options,
 			       "-z");
+	else if (nul_terminated)
+		usage_msg_optf(_("'%s' requires a batch mode"), usage, options,
+			       "-Z");
 
-	batch.input_delim = input_nul_terminated ? '\0' : '\n';
+	batch.input_delim = batch.output_delim = '\n';
+	if (input_nul_terminated)
+		batch.input_delim = '\0';
+	if (nul_terminated)
+		batch.input_delim = batch.output_delim = '\0';
 
 	/* Batch defaults */
 	if (batch.buffer_output < 0)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 7b985cfded..d73a0be1b9 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -89,7 +89,8 @@ done
 for opt in --buffer \
 	--follow-symlinks \
 	--batch-all-objects \
-	-z
+	-z \
+	-Z
 do
 	test_expect_success "usage: bad option combination: $opt without batch mode" '
 		test_incompatible_usage git cat-file $opt &&
@@ -392,17 +393,18 @@ deadbeef
 
 "
 
-batch_output="$hello_sha1 blob $hello_size
-$hello_content
-$commit_sha1 commit $commit_size
-$commit_content
-$tag_sha1 tag $tag_size
-$tag_content
-deadbeef missing
- missing"
+printf "%s\0" \
+	"$hello_sha1 blob $hello_size" \
+	"$hello_content" \
+	"$commit_sha1 commit $commit_size" \
+	"$commit_content" \
+	"$tag_sha1 tag $tag_size" \
+	"$tag_content" \
+	"deadbeef missing" \
+	" missing" >batch_output
 
 test_expect_success '--batch with multiple sha1s gives correct format' '
-	echo "$batch_output" >expect &&
+	tr "\0" "\n" <batch_output >expect &&
 	echo_without_newline "$batch_input" >in &&
 	git cat-file --batch <in >actual &&
 	test_cmp expect actual
@@ -410,11 +412,17 @@ test_expect_success '--batch with multiple sha1s gives correct format' '
 
 test_expect_success '--batch, -z with multiple sha1s gives correct format' '
 	echo_without_newline_nul "$batch_input" >in &&
-	echo "$batch_output" >expect &&
+	tr "\0" "\n" <batch_output >expect &&
 	git cat-file --batch -z <in >actual &&
 	test_cmp expect actual
 '
 
+test_expect_success '--batch, -Z with multiple sha1s gives correct format' '
+	echo_without_newline_nul "$batch_input" >in &&
+	git cat-file --batch -Z <in >actual &&
+	test_cmp batch_output actual
+'
+
 batch_check_input="$hello_sha1
 $tree_sha1
 $commit_sha1
@@ -423,40 +431,55 @@ deadbeef
 
 "
 
-batch_check_output="$hello_sha1 blob $hello_size
-$tree_sha1 tree $tree_size
-$commit_sha1 commit $commit_size
-$tag_sha1 tag $tag_size
-deadbeef missing
- missing"
+printf "%s\0" \
+	"$hello_sha1 blob $hello_size" \
+	"$tree_sha1 tree $tree_size" \
+	"$commit_sha1 commit $commit_size" \
+	"$tag_sha1 tag $tag_size" \
+	"deadbeef missing" \
+	" missing" >batch_check_output
 
 test_expect_success "--batch-check with multiple sha1s gives correct format" '
-	echo "$batch_check_output" >expect &&
+	tr "\0" "\n" <batch_check_output >expect &&
 	echo_without_newline "$batch_check_input" >in &&
 	git cat-file --batch-check <in >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success "--batch-check, -z with multiple sha1s gives correct format" '
-	echo "$batch_check_output" >expect &&
+	tr "\0" "\n" <batch_check_output >expect &&
 	echo_without_newline_nul "$batch_check_input" >in &&
 	git cat-file --batch-check -z <in >actual &&
 	test_cmp expect actual
 '
 
-test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
+test_expect_success "--batch-check, -Z with multiple sha1s gives correct format" '
+	echo_without_newline_nul "$batch_check_input" >in &&
+	git cat-file --batch-check -Z <in >actual &&
+	test_cmp batch_check_output actual
+'
+
+test_expect_success FUNNYNAMES 'setup with newline in input' '
 	touch -- "newline${LF}embedded" &&
 	git add -- "newline${LF}embedded" &&
 	git commit -m "file with newline embedded" &&
 	test_tick &&
 
-	printf "HEAD:newline${LF}embedded" >in &&
-	git cat-file --batch-check -z <in >actual &&
+	printf "HEAD:newline${LF}embedded" >in
+'
 
+test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
+	git cat-file --batch-check -z <in >actual &&
 	echo "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect &&
 	test_cmp expect actual
 '
 
+test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
+	git cat-file --batch-check -Z <in >actual &&
+	printf "%s\0" "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect &&
+	test_cmp expect actual
+'
+
 batch_command_multiple_info="info $hello_sha1
 info $tree_sha1
 info $commit_sha1
@@ -480,7 +503,13 @@ test_expect_success '--batch-command with multiple info calls gives correct form
 	echo "$batch_command_multiple_info" | tr "\n" "\0" >in &&
 	git cat-file --batch-command --buffer -z <in >actual &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+
+	echo "$batch_command_multiple_info" | tr "\n" "\0" >in &&
+	tr "\n" "\0" <expect >expect_nul &&
+	git cat-file --batch-command --buffer -Z <in >actual &&
+
+	test_cmp expect_nul actual
 '
 
 batch_command_multiple_contents="contents $hello_sha1
@@ -490,15 +519,15 @@ contents deadbeef
 flush"
 
 test_expect_success '--batch-command with multiple command calls gives correct format' '
-	cat >expect <<-EOF &&
-	$hello_sha1 blob $hello_size
-	$hello_content
-	$commit_sha1 commit $commit_size
-	$commit_content
-	$tag_sha1 tag $tag_size
-	$tag_content
-	deadbeef missing
-	EOF
+	printf "%s\0" \
+		"$hello_sha1 blob $hello_size" \
+		"$hello_content" \
+		"$commit_sha1 commit $commit_size" \
+		"$commit_content" \
+		"$tag_sha1 tag $tag_size" \
+		"$tag_content" \
+		"deadbeef missing" >expect_nul &&
+	tr "\0" "\n" <expect_nul >expect &&
 
 	echo "$batch_command_multiple_contents" >in &&
 	git cat-file --batch-command --buffer <in >actual &&
@@ -508,7 +537,12 @@ test_expect_success '--batch-command with multiple command calls gives correct f
 	echo "$batch_command_multiple_contents" | tr "\n" "\0" >in &&
 	git cat-file --batch-command --buffer -z <in >actual &&
 
-	test_cmp expect actual
+	test_cmp expect actual &&
+
+	echo "$batch_command_multiple_contents" | tr "\n" "\0" >in &&
+	git cat-file --batch-command --buffer -Z <in >actual &&
+
+	test_cmp expect_nul actual
 '
 
 test_expect_success 'setup blobs which are likely to delta' '
@@ -848,6 +882,13 @@ test_expect_success 'git cat-file --batch-check --follow-symlinks works for brok
 	test_cmp expect actual
 '
 
+test_expect_success 'git cat-file --batch-check --follow-symlinks -Z works for broken in-repo, same-dir links' '
+	printf "HEAD:broken-same-dir-link\0" >in &&
+	printf "dangling 25\0HEAD:broken-same-dir-link\0" >expect &&
+	git cat-file --batch-check --follow-symlinks -Z <in >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'git cat-file --batch-check --follow-symlinks works for same-dir links-to-links' '
 	echo HEAD:link-to-link | git cat-file --batch-check --follow-symlinks >actual &&
 	test_cmp found actual
@@ -862,6 +903,15 @@ test_expect_success 'git cat-file --batch-check --follow-symlinks works for pare
 	test_cmp expect actual
 '
 
+test_expect_success 'git cat-file --batch-check --follow-symlinks -Z works for parent-dir links' '
+	echo HEAD:dir/parent-dir-link | git cat-file --batch-check --follow-symlinks >actual &&
+	test_cmp found actual &&
+	printf "notdir 29\0HEAD:dir/parent-dir-link/nope\0" >expect &&
+	printf "HEAD:dir/parent-dir-link/nope\0" >in &&
+	git cat-file --batch-check --follow-symlinks -Z <in >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'git cat-file --batch-check --follow-symlinks works for .. links' '
 	echo dangling 22 >expect &&
 	echo HEAD:dir/link-dir/nope >>expect &&
@@ -976,6 +1026,13 @@ test_expect_success 'git cat-file --batch-check --follow-symlink breaks loops' '
 	test_cmp expect actual
 '
 
+test_expect_success 'git cat-file --batch-check --follow-symlink -Z breaks loops' '
+	printf "loop 10\0HEAD:loop1\0" >expect &&
+	printf "HEAD:loop1\0" >in &&
+	git cat-file --batch-check --follow-symlinks -Z <in >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'git cat-file --batch --follow-symlink returns correct sha and mode' '
 	echo HEAD:morx | git cat-file --batch >expect &&
 	echo HEAD:morx | git cat-file --batch --follow-symlinks >actual &&
-- 
2.41.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] cat-file: Introduce new option to delimit output with NUL characters
  2023-06-06  4:52       ` Patrick Steinhardt
@ 2023-06-06  5:22         ` Junio C Hamano
  2023-06-06  5:31           ` Patrick Steinhardt
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2023-06-06  5:22 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Phillip Wood, git, Taylor Blau, Toon Claes

Patrick Steinhardt <ps@pks.im> writes:

>> Hmph, I expected "-z" and "-Z" to be orthogonal, the former
>> controlling how input records are delimited, the latter controlling
>> how output records are delimited, as it usually is a good idea to
>> keep things that could be orthogonal to be orthogonal to avoid
>> unnecessarily robbing users flexibility.  "-Z is a new way that is
>> preferred over -z" was something I did not expect, actually.
>> 
>> I am not outright rejecting such a deliberately limiting design, but
>> I'll have to think about it a bit.
>
> Well, the way I see it we shouldn't have ever decoupled the input and
> output format, and I consider it a bug that `-z` did as it makes it
> unusable with arbitrary input. So I'd rather be helping the user of
> these modes to do the right thing and NUL-delimit both input and output
> than running into these edge cases later down the road.

If that is the direction we want to go in, then the title of this
step must be updated to say "-Z" does both output and input.  Also
with the same number of bytes it takes to say "new option", you can
say what that new option is, so do that.

> That being said I'd be fine to change this series to mean "-Z changes
> stdout" if you insist. In that case we should be pointing out in our
> documentation that "You should never use `-z` without `-Z` when you
> process arbitrary input".

You are not making sense.  If we were to leave them orthogonal to
keep flexibility, it is because there can be cases where using '-Z'
without using '-z' (and vice versa) makes sense; "you should never"
has no place to live in such a world.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] cat-file: Introduce new option to delimit output with NUL characters
  2023-06-06  5:22         ` Junio C Hamano
@ 2023-06-06  5:31           ` Patrick Steinhardt
  2023-06-12 19:12             ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Patrick Steinhardt @ 2023-06-06  5:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Phillip Wood, git, Taylor Blau, Toon Claes

[-- Attachment #1: Type: text/plain, Size: 2865 bytes --]

On Tue, Jun 06, 2023 at 02:22:15PM +0900, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> >> Hmph, I expected "-z" and "-Z" to be orthogonal, the former
> >> controlling how input records are delimited, the latter controlling
> >> how output records are delimited, as it usually is a good idea to
> >> keep things that could be orthogonal to be orthogonal to avoid
> >> unnecessarily robbing users flexibility.  "-Z is a new way that is
> >> preferred over -z" was something I did not expect, actually.
> >> 
> >> I am not outright rejecting such a deliberately limiting design, but
> >> I'll have to think about it a bit.
> >
> > Well, the way I see it we shouldn't have ever decoupled the input and
> > output format, and I consider it a bug that `-z` did as it makes it
> > unusable with arbitrary input. So I'd rather be helping the user of
> > these modes to do the right thing and NUL-delimit both input and output
> > than running into these edge cases later down the road.
> 
> If that is the direction we want to go in, then the title of this
> step must be updated to say "-Z" does both output and input.  Also
> with the same number of bytes it takes to say "new option", you can
> say what that new option is, so do that.

Sorry, my v2 and your mail crossed, so the new title doesn't yet mention
`-Z`. I agree it would make sense to do so, and if there's a v3 I'll
change it accordingly. Otherwise, please feel free to change:

    cat-file: introduce option to delimit input and output with NUL

to

    cat-file: introduce `-Z` to delimit input and output with NUL

> > That being said I'd be fine to change this series to mean "-Z changes
> > stdout" if you insist. In that case we should be pointing out in our
> > documentation that "You should never use `-z` without `-Z` when you
> > process arbitrary input".
> 
> You are not making sense.  If we were to leave them orthogonal to
> keep flexibility, it is because there can be cases where using '-Z'
> without using '-z' (and vice versa) makes sense; "you should never"
> has no place to live in such a world.

Well, that's exactly what I'm arguing: I don't think it does make sense
to keep them orthogonal. I cannot think of usecases where you'd want to
only change either of those options once you start processing arbitrary
input. If you exactly know that your input doesn't contain newlines then
you don't need either `-z` nor `-Z`. On the other hand, if it does, then
you must be using both.

I only proposed the alternative with the intent of being diplomatic and
not fully close the door on your remark in case you felt strongly about
it.

I've thus sent v2 to double down on my current approach where `-Z`
changes both stdin and stdout and tried to make a better case for it in
the commit message.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] cat-file: Introduce new option to delimit output with NUL characters
  2023-06-06  5:31           ` Patrick Steinhardt
@ 2023-06-12 19:12             ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2023-06-12 19:12 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Phillip Wood, git, Taylor Blau, Toon Claes

Patrick Steinhardt <ps@pks.im> writes:

>> > That being said I'd be fine to change this series to mean "-Z changes
>> > stdout" if you insist. In that case we should be pointing out in our
>> > documentation that "You should never use `-z` without `-Z` when you
>> > process arbitrary input".
>> 
>> You are not making sense.  If we were to leave them orthogonal to
>> keep flexibility, it is because there can be cases where using '-Z'
>> without using '-z' (and vice versa) makes sense; "you should never"
>> has no place to live in such a world.
>
> Well, that's exactly what I'm arguing: I don't think it does make sense
> to keep them orthogonal.

I was not commenting on that.

My "you are not making sense" was because you said "I'd be fine to
make -Z to mean 'stdout is NUL delimited'".  If somebody thinks it
is a good idea to make '-Z' mean 'stdout is NUL delimited without
affecting how the input is delimited', it can only be because that
somebody thinks that it makes sense to make input and output
orthogonal.

It is perfectly OK if you are not fine to make -Z to affect only the
standard output stream.  That stance is at least self consistent.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/5] catfile: introduce NUL-terminated output format
  2023-06-06  5:19 ` [PATCH v2 0/5] catfile: " Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2023-06-06  5:19   ` [PATCH v2 5/5] cat-file: introduce option to delimit input and output with NUL Patrick Steinhardt
@ 2023-06-12 20:43   ` Junio C Hamano
  5 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2023-06-12 20:43 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Taylor Blau, Toon Claes, Phillip Wood

Patrick Steinhardt <ps@pks.im> writes:

> this is the second version of my patch series that introduces a new NUL
> terminated output format to git-cat-file(1) in order to make its output
> unambiguously parsable in the case where the input contains newlines.

Will queue.  Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-06-12 20:44 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-02 13:02 [PATCH 0/5] cat-file: introduce NUL-terminated output format Patrick Steinhardt
2023-06-02 13:02 ` [PATCH 1/5] t1006: don't strip timestamps from expected results Patrick Steinhardt
2023-06-02 13:02 ` [PATCH 2/5] t1006: modernize test style to use `test_cmp` Patrick Steinhardt
2023-06-02 13:02 ` [PATCH 3/5] strbuf: provide CRLF-aware helper to read until a specified delimiter Patrick Steinhardt
2023-06-02 13:02 ` [PATCH 4/5] cat-file: simplify reading from standard input Patrick Steinhardt
2023-06-02 13:02 ` [PATCH 5/5] cat-file: Introduce new option to delimit output with NUL characters Patrick Steinhardt
2023-06-05 15:47   ` Phillip Wood
2023-06-05 23:54     ` Junio C Hamano
2023-06-06  4:52       ` Patrick Steinhardt
2023-06-06  5:22         ` Junio C Hamano
2023-06-06  5:31           ` Patrick Steinhardt
2023-06-12 19:12             ` Junio C Hamano
2023-06-06  5:00     ` Patrick Steinhardt
2023-06-06  1:23   ` Junio C Hamano
2023-06-03  1:44 ` [PATCH 0/5] cat-file: introduce NUL-terminated output format Junio C Hamano
2023-06-06  5:19 ` [PATCH v2 0/5] catfile: " Patrick Steinhardt
2023-06-06  5:19   ` [PATCH v2 1/5] t1006: don't strip timestamps from expected results Patrick Steinhardt
2023-06-06  5:19   ` [PATCH v2 2/5] t1006: modernize test style to use `test_cmp` Patrick Steinhardt
2023-06-06  5:19   ` [PATCH v2 3/5] strbuf: provide CRLF-aware helper to read until a specified delimiter Patrick Steinhardt
2023-06-06  5:19   ` [PATCH v2 4/5] cat-file: simplify reading from standard input Patrick Steinhardt
2023-06-06  5:19   ` [PATCH v2 5/5] cat-file: introduce option to delimit input and output with NUL Patrick Steinhardt
2023-06-12 20:43   ` [PATCH v2 0/5] catfile: introduce NUL-terminated output format Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).