Git Mailing List Archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] cat-file: quote-format name in error when using -z
@ 2022-12-09 15:00 Toon Claes
  2022-12-09 15:00 ` [PATCH 1/1] " Toon Claes
  2023-01-05  6:24 ` [PATCH v2 0/1] " Toon Claes
  0 siblings, 2 replies; 32+ messages in thread
From: Toon Claes @ 2022-12-09 15:00 UTC (permalink / raw)
  To: git; +Cc: Toon Claes

Hi,

While we at GitLab have been implementing the use of option `-z` for
`git-cat-file` with `--batch` we discovered an extreme edge-case when passing
paths with newlines to the command. In case the object is not found the error
message include the path in the output and thus the error message might get
spread over multiple lines.

The provided patch sanitizes the output, making is easier machine-parse it.

Toon

Toon Claes (1):
  cat-file: quote-format name in error when using -z

 builtin/cat-file.c  | 10 ++++++++++
 t/t1006-cat-file.sh |  8 ++++++++
 2 files changed, 18 insertions(+)


base-commit: 2e71cbbddd64695d43383c25c7a054ac4ff86882
--
2.39.0.rc0.57

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/1] cat-file: quote-format name in error when using -z
  2022-12-09 15:00 [PATCH 0/1] cat-file: quote-format name in error when using -z Toon Claes
@ 2022-12-09 15:00 ` Toon Claes
  2022-12-09 19:33   ` Phillip Wood
  2023-01-05  6:24 ` [PATCH v2 0/1] " Toon Claes
  1 sibling, 1 reply; 32+ messages in thread
From: Toon Claes @ 2022-12-09 15:00 UTC (permalink / raw)
  To: git; +Cc: Toon Claes

Since it's supported to have NUL-delimited input, introduced in
db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with `-z`,
2022-07-22), it's possible to pass paths that contain newlines. This
works great when the object is found, but when it's not, the input path
is returned in the error message. Because this can contain newlines, the
error message might get spread over multiple lines, making it harder to
machine-parse this error message.

With this change, the input is quote-formatted in the error message, if
needed. This ensures the error message is always on a single line and
makes parsing the error more straightforward.

Signed-off-by: Toon Claes <toon@iotcl.com>
---
 builtin/cat-file.c  | 10 ++++++++++
 t/t1006-cat-file.sh |  8 ++++++++
 2 files changed, 18 insertions(+)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index b3be58b1fb..67dd81238d 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -14,6 +14,7 @@
 #include "tree-walk.h"
 #include "oid-array.h"
 #include "packfile.h"
+#include "quote.h"
 #include "object-store.h"
 #include "promisor-remote.h"
 #include "mailmap.h"
@@ -475,6 +476,13 @@ static void batch_one_object(const char *obj_name,
 	result = get_oid_with_context(the_repository, obj_name,
 				      flags, &data->oid, &ctx);
 	if (result != FOUND) {
+		struct strbuf quoted = STRBUF_INIT;
+
+		if (opt->nul_terminated) {
+			quote_c_style(obj_name, &quoted, NULL, 0);
+			obj_name = quoted.buf;
+		}
+
 		switch (result) {
 		case MISSING_OBJECT:
 			printf("%s missing\n", obj_name);
@@ -499,6 +507,8 @@ static void batch_one_object(const char *obj_name,
 			       result);
 			break;
 		}
+
+		strbuf_release(&quoted);
 		fflush(stdout);
 		return;
 	}
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 23b8942edb..232bfd1723 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -447,6 +447,14 @@ test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
 	test_cmp expect actual
 '

+test_expect_success FUNNYNAMES '--batch-check, -z with newline in non-existent named object' '
+	printf "HEAD:newline${LF}embedded" >in &&
+	git cat-file --batch-check -z <in >actual &&
+
+	printf "\"HEAD:newline\\\\nembedded\" missing\n" >expect &&
+	test_cmp expect actual
+'
+
 batch_command_multiple_info="info $hello_sha1
 info $tree_sha1
 info $commit_sha1
--
2.39.0.rc0.57

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/1] cat-file: quote-format name in error when using -z
  2022-12-09 15:00 ` [PATCH 1/1] " Toon Claes
@ 2022-12-09 19:33   ` Phillip Wood
  2022-12-09 23:58     ` Junio C Hamano
  2022-12-20  5:31     ` Toon Claes
  0 siblings, 2 replies; 32+ messages in thread
From: Phillip Wood @ 2022-12-09 19:33 UTC (permalink / raw)
  To: Toon Claes, git

Hi Toon

On 09/12/2022 15:00, Toon Claes wrote:
> Since it's supported to have NUL-delimited input, introduced in
> db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with `-z`,
> 2022-07-22), it's possible to pass paths that contain newlines. This
> works great when the object is found, but when it's not, the input path
> is returned in the error message. Because this can contain newlines, the
> error message might get spread over multiple lines, making it harder to
> machine-parse this error message.
> 
> With this change, the input is quote-formatted in the error message, if
> needed. This ensures the error message is always on a single line and
> makes parsing the error more straightforward.

Thanks for working on this. I'd previously suggested NUL terminating the 
output of "git cat-file -z" to avoid this problem [1] but quoting the 
object name is a better solution. The implementation looks good, thanks 
for adding a test - are you sure we need the FUNNYNAMES guard on the 
test even though it isn't creating files with funny names?

Best Wishes

Phillip

[1] 
https://lore.kernel.org/git/66b71194-ad0e-18d0-e43b-71e5c47ba111@gmail.com/

> Signed-off-by: Toon Claes <toon@iotcl.com>
> ---
>   builtin/cat-file.c  | 10 ++++++++++
>   t/t1006-cat-file.sh |  8 ++++++++
>   2 files changed, 18 insertions(+)
> 
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index b3be58b1fb..67dd81238d 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -14,6 +14,7 @@
>   #include "tree-walk.h"
>   #include "oid-array.h"
>   #include "packfile.h"
> +#include "quote.h"
>   #include "object-store.h"
>   #include "promisor-remote.h"
>   #include "mailmap.h"
> @@ -475,6 +476,13 @@ static void batch_one_object(const char *obj_name,
>   	result = get_oid_with_context(the_repository, obj_name,
>   				      flags, &data->oid, &ctx);
>   	if (result != FOUND) {
> +		struct strbuf quoted = STRBUF_INIT;
> +
> +		if (opt->nul_terminated) {
> +			quote_c_style(obj_name, &quoted, NULL, 0);
> +			obj_name = quoted.buf;
> +		}
> +
>   		switch (result) {
>   		case MISSING_OBJECT:
>   			printf("%s missing\n", obj_name);
> @@ -499,6 +507,8 @@ static void batch_one_object(const char *obj_name,
>   			       result);
>   			break;
>   		}
> +
> +		strbuf_release(&quoted);
>   		fflush(stdout);
>   		return;
>   	}
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 23b8942edb..232bfd1723 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -447,6 +447,14 @@ test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
>   	test_cmp expect actual
>   '
> 
> +test_expect_success FUNNYNAMES '--batch-check, -z with newline in non-existent named object' '
> +	printf "HEAD:newline${LF}embedded" >in &&
> +	git cat-file --batch-check -z <in >actual &&
> +
> +	printf "\"HEAD:newline\\\\nembedded\" missing\n" >expect &&
> +	test_cmp expect actual
> +'
> +
>   batch_command_multiple_info="info $hello_sha1
>   info $tree_sha1
>   info $commit_sha1
> --
> 2.39.0.rc0.57

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/1] cat-file: quote-format name in error when using -z
  2022-12-09 19:33   ` Phillip Wood
@ 2022-12-09 23:58     ` Junio C Hamano
  2022-12-11 16:30       ` Phillip Wood
  2022-12-20  5:31     ` Toon Claes
  1 sibling, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2022-12-09 23:58 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Toon Claes, git

Phillip Wood <phillip.wood123@gmail.com> writes:

> Hi Toon
>
> On 09/12/2022 15:00, Toon Claes wrote:
>> Since it's supported to have NUL-delimited input, introduced in
>> db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with `-z`,
>> 2022-07-22), it's possible to pass paths that contain newlines. This
>> works great when the object is found, but when it's not, the input path
>> is returned in the error message. Because this can contain newlines, the
>> error message might get spread over multiple lines, making it harder to
>> machine-parse this error message.
>> With this change, the input is quote-formatted in the error message,
>> if
>> needed. This ensures the error message is always on a single line and
>> makes parsing the error more straightforward.
>
> Thanks for working on this. I'd previously suggested NUL terminating
> the output of "git cat-file -z" to avoid this problem [1] but quoting
> the object name is a better solution.

Hmph.  My knee-jerk reaction was that it is utterly disgusting if we
quote when we do NUL-terminated.  Is your "quoting is OK over NUL
terminating" because "-z" applies only to the input?  If so, then I
would agree that is OK, but shouldn't the quoting apply regardless
of how the input is formulated?  Why do we call the cquote helper
only under "-z"?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/1] cat-file: quote-format name in error when using -z
  2022-12-09 23:58     ` Junio C Hamano
@ 2022-12-11 16:30       ` Phillip Wood
  2022-12-12  0:11         ` Junio C Hamano
  0 siblings, 1 reply; 32+ messages in thread
From: Phillip Wood @ 2022-12-11 16:30 UTC (permalink / raw)
  To: Junio C Hamano, Phillip Wood; +Cc: Toon Claes, git

On 09/12/2022 23:58, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
>> Hi Toon
>>
>> On 09/12/2022 15:00, Toon Claes wrote:
>>> Since it's supported to have NUL-delimited input, introduced in
>>> db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with `-z`,
>>> 2022-07-22), it's possible to pass paths that contain newlines. This
>>> works great when the object is found, but when it's not, the input path
>>> is returned in the error message. Because this can contain newlines, the
>>> error message might get spread over multiple lines, making it harder to
>>> machine-parse this error message.
>>> With this change, the input is quote-formatted in the error message,
>>> if
>>> needed. This ensures the error message is always on a single line and
>>> makes parsing the error more straightforward.
>>
>> Thanks for working on this. I'd previously suggested NUL terminating
>> the output of "git cat-file -z" to avoid this problem [1] but quoting
>> the object name is a better solution.
> 
> Hmph.  My knee-jerk reaction was that it is utterly disgusting if we
> quote when we do NUL-terminated.  Is your "quoting is OK over NUL
> terminating" because "-z" applies only to the input?

Yes, if the object exists then delimiting the output with newlines is 
fine, it is only if the object is missing and its name contains a 
newline that there is a problem. It also makes adopting "-z" in existing 
scripts easier as there is no change required when parsing the output.

As "-z" was added in 2.38 there is also a pragmatic reason to prefer 
quoting over NUL terminated output as it allows us to fix this issue 
without changing the output delimiter of an existing option.

> If so, then I
> would agree that is OK, but shouldn't the quoting apply regardless
> of how the input is formulated?  Why do we call the cquote helper
> only under "-z"?

Without "-z" you cannot pass object names that contain newlines so not 
quoting the output does not cause a problem. We could start quoting the 
object name without "-z" but we'd be changing the output without a huge 
benefit.

Best Wishes

Phillip

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/1] cat-file: quote-format name in error when using -z
  2022-12-11 16:30       ` Phillip Wood
@ 2022-12-12  0:11         ` Junio C Hamano
  2022-12-12 11:34           ` Toon Claes
  2022-12-13 15:06           ` Phillip Wood
  0 siblings, 2 replies; 32+ messages in thread
From: Junio C Hamano @ 2022-12-12  0:11 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Toon Claes, git

Phillip Wood <phillip.wood123@gmail.com> writes:

> Without "-z" you cannot pass object names that contain newlines so not
> quoting the output does not cause a problem. We could start quoting
> the object name without "-z" but we'd be changing the output without a
> huge benefit.

That's fair.  The next question is from a devil's advocate:
is switching to the full cquote the best thing to do?

If we were using the full cquote from the very beginning, of course
it is, simply because that is what is used in all other places in
Git.  Using the full cquote does mean a LF byte will be protected
(i.e. instead of shown literally in the middle of other letters
around LF, "other\nletters around LF" would be shown), but pathnames
with backslashes and double quotes in them that have been shown
without problems would be shown differently and will break existing
parsers, which are written lazily with the assumption that they are
perfectly happy to be "simple" thans to not having to deal with LF
(because in their environment a path with LF in it do not matter).

A bit safer thing to do is to replace LF (and not any other bytes
that would be quoted with full cquote) in the path given in these
messages with something else (like NUL to truncate the output
there).  As these answers are given in order, the object names are
not absolutely needed to identify and match up the input and the
output, and properly written parsers would be prepared to see a
response with an object name that it did not request and handle it
sanely, such a change may not break such a parser for a path with
any byte that are modified with full cquote.

The above is with a devil's adovocate hat on, and I do not care too
much, as I do not think butchering backslash with full cquote would
not hurt even existing Windows users (if "HEAD:t\README.txt" named
the same blob as "HEAD:t/README.txt" on a platform, doubling the
backslashes in the output would have made quite a lot of damage, but
I do not think we allow backslashes to name tree paths).

By the way, there is another use of obj_name in batch_object_write()
that can show whatever byte in it literally to the output.

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/1] cat-file: quote-format name in error when using -z
  2022-12-12  0:11         ` Junio C Hamano
@ 2022-12-12 11:34           ` Toon Claes
  2022-12-12 22:09             ` Junio C Hamano
  2022-12-13 15:06           ` Phillip Wood
  1 sibling, 1 reply; 32+ messages in thread
From: Toon Claes @ 2022-12-12 11:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Phillip Wood, Toon Claes, git


Junio C Hamano <gitster@pobox.com> writes:

> Phillip Wood <phillip.wood123@gmail.com> writes:
>
>> Without "-z" you cannot pass object names that contain newlines so not
>> quoting the output does not cause a problem. We could start quoting
>> the object name without "-z" but we'd be changing the output without a
>> huge benefit.
>
> That's fair.  The next question is from a devil's advocate:
> is switching to the full cquote the best thing to do?
>
> If we were using the full cquote from the very beginning, of course
> it is, simply because that is what is used in all other places in
> Git.  Using the full cquote does mean a LF byte will be protected
> (i.e. instead of shown literally in the middle of other letters
> around LF, "other\nletters around LF" would be shown), but pathnames
> with backslashes and double quotes in them that have been shown
> without problems would be shown differently and will break existing
> parsers, which are written lazily with the assumption that they are
> perfectly happy to be "simple" thans to not having to deal with LF
> (because in their environment a path with LF in it do not matter).
>
> A bit safer thing to do is to replace LF (and not any other bytes
> that would be quoted with full cquote) in the path given in these
> messages with something else (like NUL to truncate the output
> there).

So object name "HEAD:other\nletters around LF" would give the error
message "other missing"? That error message would also occur when the
user does not provide -z. I think that might be confusing.

> As these answers are given in order, the object names are
> not absolutely needed to identify and match up the input and the
> output, and properly written parsers would be prepared to see a
> response with an object name that it did not request and handle it
> sanely, such a change may not break such a parser for a path with
> any byte that are modified with full cquote.

Yes, the answers are returned in order, so personally I don't care too
much about the returned object name format. I even would be fine with a
generic error message that omits the input name, for example "object
missing". As long as it's clear that the requested object is not found.

For your information, there is an extreme edge case a user could fake an
object, and that's what we want to avoid as well. For example the
command (line break included):

printf "aabbccddeeff00112233445566778899aabbccdd blob 26
this object is not" | git cat-file --batch -z

Would print:

aabbccddeeff00112233445566778899aabbccdd blob 26
this object is not missing

This is perfectly valid git-cat-file output. Luckily I don't see any way
how this can be abused. Generally I think it's a good idea to not return
the input as-is in any situation. We could only replace newlines, but
cquoting already sanitizes the input, so why not use that?

> The above is with a devil's adovocate hat on, and I do not care too
> much, as I do not think butchering backslash with full cquote would
> not hurt even existing Windows users (if "HEAD:t\README.txt" named
> the same blob as "HEAD:t/README.txt" on a platform, doubling the
> backslashes in the output would have made quite a lot of damage, but
> I do not think we allow backslashes to name tree paths).

> By the way, there is another use of obj_name in batch_object_write()
> that can show whatever byte in it literally to the output.

Ah thanks! I will include in the next version, when we reach a consensus
on when or what to cquote.

--
Toon

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/1] cat-file: quote-format name in error when using -z
  2022-12-12 11:34           ` Toon Claes
@ 2022-12-12 22:09             ` Junio C Hamano
  0 siblings, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2022-12-12 22:09 UTC (permalink / raw)
  To: Toon Claes; +Cc: Phillip Wood, Toon Claes, git

Toon Claes <toon@to1.studio> writes:

> Ah thanks! I will include in the next version, when we reach a consensus
> on when or what to cquote.

I think we already have consensus on when---"-z" may benefit,
otherwise we should not quote.  I am not sure what you mean by "what
to cquote"---are there things other than input object names that may
be ambiguous and require quoting?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/1] cat-file: quote-format name in error when using -z
  2022-12-12  0:11         ` Junio C Hamano
  2022-12-12 11:34           ` Toon Claes
@ 2022-12-13 15:06           ` Phillip Wood
  2022-12-14  8:29             ` Junio C Hamano
  1 sibling, 1 reply; 32+ messages in thread
From: Phillip Wood @ 2022-12-13 15:06 UTC (permalink / raw)
  To: Junio C Hamano, Phillip Wood; +Cc: Toon Claes, git

On 12/12/2022 00:11, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
>> Without "-z" you cannot pass object names that contain newlines so not
>> quoting the output does not cause a problem. We could start quoting
>> the object name without "-z" but we'd be changing the output without a
>> huge benefit.
> 
> That's fair.  The next question is from a devil's advocate:
> is switching to the full cquote the best thing to do?
> 
> If we were using the full cquote from the very beginning, of course
> it is, simply because that is what is used in all other places in
> Git.  Using the full cquote does mean a LF byte will be protected
> (i.e. instead of shown literally in the middle of other letters
> around LF, "other\nletters around LF" would be shown), but pathnames
> with backslashes and double quotes in them that have been shown
> without problems would be shown differently and will break existing
> parsers, which are written lazily with the assumption that they are
> perfectly happy to be "simple" thans to not having to deal with LF
> (because in their environment a path with LF in it do not matter).
> 
> A bit safer thing to do is to replace LF (and not any other bytes
> that would be quoted with full cquote) in the path given in these
> messages with something else (like NUL to truncate the output
> there).  As these answers are given in order, the object names are
> not absolutely needed to identify and match up the input and the
> output, and properly written parsers would be prepared to see a
> response with an object name that it did not request and handle it
> sanely, such a change may not break such a parser for a path with
> any byte that are modified with full cquote.
> 
> The above is with a devil's adovocate hat on, and I do not care too
> much, as I do not think butchering backslash with full cquote would
> not hurt even existing Windows users (if "HEAD:t\README.txt" named
> the same blob as "HEAD:t/README.txt" on a platform, doubling the
> backslashes in the output would have made quite a lot of damage, but
> I do not think we allow backslashes to name tree paths).

By default quoting also affects names that contain non-ascii characters. 
If we're worried about that we could only quote names that contain LF 
but as you point out the answers are given in order so I don't think 
there should be a problem with calling cquote unconditionally. (As this 
option has only existed for one release I suspect there aren't that many 
users yet as well)

Best Wishes

Phillip

> By the way, there is another use of obj_name in batch_object_write()
> that can show whatever byte in it literally to the output.
> 
> Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/1] cat-file: quote-format name in error when using -z
  2022-12-13 15:06           ` Phillip Wood
@ 2022-12-14  8:29             ` Junio C Hamano
  0 siblings, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2022-12-14  8:29 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Toon Claes, git

Phillip Wood <phillip.wood123@gmail.com> writes:

> By default quoting also affects names that contain non-ascii
> characters. If we're worried about that we could only quote names that
> contain LF but as you point out the answers are given in order so I
> don't think there should be a problem with calling cquote
> unconditionally. (As this option has only existed for one release I
> suspect there aren't that many users yet as well)

Good point.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/1] cat-file: quote-format name in error when using -z
  2022-12-09 19:33   ` Phillip Wood
  2022-12-09 23:58     ` Junio C Hamano
@ 2022-12-20  5:31     ` Toon Claes
  2022-12-20 10:18       ` Phillip Wood
  1 sibling, 1 reply; 32+ messages in thread
From: Toon Claes @ 2022-12-20  5:31 UTC (permalink / raw)
  To: phillip.wood; +Cc: Toon Claes, git


Phillip Wood <phillip.wood123@gmail.com> writes:

> Thanks for working on this. I'd previously suggested NUL terminating the output
> of "git cat-file -z" to avoid this problem [1]
>
> [1]
> https://lore.kernel.org/git/66b71194-ad0e-18d0-e43b-71e5c47ba111@gmail.com/

What happened to this proposal? I don't see any replies to that. That's
a bit sad, because it would have been nice to have it this behavior from
the start.

> but quoting the object name is a better solution.

I would not say it's a better solution, but it's a less invasive
solution that /minimizes/ breaking changes. Ideally I'd like to have NUL
terminated output for "git cat-file -z". In a success situation I
assume this would return:

    <oid> SP <type> SP <size> NUL
    <contents> NUL

In a failure situation something like:

    <object> SP missing NUL

So when you pass -z you can keep reading until the first NUL and then
you'll know if you should read for contents as well.

Would you consider change behavior to this now?

--
Toon

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/1] cat-file: quote-format name in error when using -z
  2022-12-20  5:31     ` Toon Claes
@ 2022-12-20 10:18       ` Phillip Wood
  2022-12-21 12:42         ` Toon Claes
  0 siblings, 1 reply; 32+ messages in thread
From: Phillip Wood @ 2022-12-20 10:18 UTC (permalink / raw)
  To: Toon Claes; +Cc: Toon Claes, git, Junio C Hamano, Taylor Blau

Hi Toon

On 20/12/2022 05:31, Toon Claes wrote:
> 
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
>> Thanks for working on this. I'd previously suggested NUL terminating the output
>> of "git cat-file -z" to avoid this problem [1]
>>
>> [1]
>> https://lore.kernel.org/git/66b71194-ad0e-18d0-e43b-71e5c47ba111@gmail.com/
> 
> What happened to this proposal? I don't see any replies to that. That's
> a bit sad, because it would have been nice to have it this behavior from
> the start.

Yes it would have been nice, unfortunately it feel through the cracks 
due to a combination of unfortunate timing and lack of time. I think the 
patch was probably in next by the time I realized the problem and wrote 
that email. Taylor was on holiday at the time and then I went away 
around the time he got back. I know it was on his todo list but I think 
he was busy catching up from being away getting ready for git merge. I 
had other things I was working on and unfortunately didn't get round to 
following it up.

>> but quoting the object name is a better solution.
> 
> I would not say it's a better solution, but it's a less invasive
> solution that /minimizes/ breaking changes. Ideally I'd like to have NUL
> terminated output for "git cat-file -z". In a success situation I
> assume this would return:
> 
>      <oid> SP <type> SP <size> NUL
>      <contents> NUL
> 
> In a failure situation something like:
> 
>      <object> SP missing NUL
> 
> So when you pass -z you can keep reading until the first NUL and then
> you'll know if you should read for contents as well.
> 
> Would you consider change behavior to this now?

Hmm I'm not sure (and luckily it isn't up to me to take the final 
decision!). I really don't know how many people are using "-z" I suspect 
it is not many and so the fallout wouldn't be too bad but we'd still be 
inconveniencing some users. I theory we could so "sorry we made a 
mistake when implementing this feature and have decided to change it" 
but we have a solution in your patch which hopefully does not break any 
users (I say hopefully because think if one is careful and keep track of 
which responses you've read it is possible to detect missing objects now 
even if their name contains a new line but it will be messy and probably 
slightly inefficient but anyone doing that will notice the change in 
output).

Overall I'd say it is tempting but risky and inconvenient to any 
existing users of "-z" (assuming someone else is actually using it). 
Quoting the object name is definitively the safer option at this point.

Best Wishes

Phillip

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/1] cat-file: quote-format name in error when using -z
  2022-12-20 10:18       ` Phillip Wood
@ 2022-12-21 12:42         ` Toon Claes
  0 siblings, 0 replies; 32+ messages in thread
From: Toon Claes @ 2022-12-21 12:42 UTC (permalink / raw)
  To: phillip.wood; +Cc: Toon Claes, git, Junio C Hamano, Taylor Blau


Phillip Wood <phillip.wood123@gmail.com> writes:

> Yes it would have been nice, unfortunately it feel through the cracks due to a
> combination of unfortunate timing and lack of time. I think the patch was
> probably in next by the time I realized the problem and wrote that email. Taylor
> was on holiday at the time and then I went away around the time he got back. I
> know it was on his todo list but I think he was busy catching up from being away
> getting ready for git merge. I had other things I was working on and
> unfortunately didn't get round to following it up.

Oh sorry, I didn't want to put blame on anyone. Things happen. I just
wasn't sure I missed anything.

> Overall I'd say it is tempting but risky and inconvenient to any existing users
> of "-z" (assuming someone else is actually using it). Quoting the object name is
> definitively the safer option at this point.

I assume Taylor would be one of the consumers of output of "-z", so
thanks for putting him in Cc to get his take on this.

--
Toon

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 0/1] cat-file: quote-format name in error when using -z
  2022-12-09 15:00 [PATCH 0/1] cat-file: quote-format name in error when using -z Toon Claes
  2022-12-09 15:00 ` [PATCH 1/1] " Toon Claes
@ 2023-01-05  6:24 ` Toon Claes
  2023-01-05  6:24   ` [PATCH v2 1/1] " Toon Claes
  2023-01-16 19:07   ` [PATCH v3 0/1] " Toon Claes
  1 sibling, 2 replies; 32+ messages in thread
From: Toon Claes @ 2023-01-05  6:24 UTC (permalink / raw)
  To: git; +Cc: Toon Claes, Phillip Wood

This is the second revision of the patch to quote-format the filename in the
error message of git-cat-file(1), when using option `-z`.

Changes since v1:
* Added the same fix to batch_object_write().
* Removed FUNNYNAMES from the added test.

Toon Claes (1):
  cat-file: quote-format name in error when using -z

 builtin/cat-file.c  | 19 +++++++++++++++++++
 t/t1006-cat-file.sh |  8 ++++++++
 2 files changed, 27 insertions(+)

Range-diff against v1:
1:  c13602e28e ! 1:  4f39eb0719 cat-file: quote-format name in error when using -z
    @@ builtin/cat-file.c
      #include "object-store.h"
      #include "promisor-remote.h"
      #include "mailmap.h"
    +@@ builtin/cat-file.c: static void batch_object_write(const char *obj_name,
    + 						       &data->oid, &data->info,
    + 						       OBJECT_INFO_LOOKUP_REPLACE);
    + 		if (ret < 0) {
    ++			struct strbuf quoted = STRBUF_INIT;
    ++
    ++			if (opt->nul_terminated &&
    ++			    obj_name) {
    ++				quote_c_style(obj_name, &quoted, NULL, 0);
    ++				obj_name = quoted.buf;
    ++			}
    ++
    + 			printf("%s missing\n",
    + 			       obj_name ? obj_name : oid_to_hex(&data->oid));
    ++			strbuf_release(&quoted);
    + 			fflush(stdout);
    + 			return;
    + 		}
     @@ builtin/cat-file.c: static void batch_one_object(const char *obj_name,
      	result = get_oid_with_context(the_repository, obj_name,
      				      flags, &data->oid, &ctx);
    @@ t/t1006-cat-file.sh: test_expect_success FUNNYNAMES '--batch-check, -z with newl
      	test_cmp expect actual
      '

    -+test_expect_success FUNNYNAMES '--batch-check, -z with newline in non-existent named object' '
    ++test_expect_success '--batch-check, -z with newline in non-existent named object' '
     +	printf "HEAD:newline${LF}embedded" >in &&
     +	git cat-file --batch-check -z <in >actual &&
     +
--
2.38.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 1/1] cat-file: quote-format name in error when using -z
  2023-01-05  6:24 ` [PATCH v2 0/1] " Toon Claes
@ 2023-01-05  6:24   ` Toon Claes
  2023-01-16 19:07   ` [PATCH v3 0/1] " Toon Claes
  1 sibling, 0 replies; 32+ messages in thread
From: Toon Claes @ 2023-01-05  6:24 UTC (permalink / raw)
  To: git; +Cc: Toon Claes, Phillip Wood

Since it's supported to have NUL-delimited input, introduced in
db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with `-z`,
2022-07-22), it's possible to pass paths that contain newlines. This
works great when the object is found, but when it's not, the input path
is returned in the error message. Because this can contain newlines, the
error message might get spread over multiple lines, making it harder to
machine-parse this error message.

With this change, the input is quote-formatted in the error message, if
needed. This ensures the error message is always on a single line and
makes parsing the error more straightforward.

Signed-off-by: Toon Claes <toon@iotcl.com>
---
 builtin/cat-file.c  | 19 +++++++++++++++++++
 t/t1006-cat-file.sh |  8 ++++++++
 2 files changed, 27 insertions(+)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index b3be58b1fb..6d7d5e2188 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -14,6 +14,7 @@
 #include "tree-walk.h"
 #include "oid-array.h"
 #include "packfile.h"
+#include "quote.h"
 #include "object-store.h"
 #include "promisor-remote.h"
 #include "mailmap.h"
@@ -439,8 +440,17 @@ static void batch_object_write(const char *obj_name,
 						       &data->oid, &data->info,
 						       OBJECT_INFO_LOOKUP_REPLACE);
 		if (ret < 0) {
+			struct strbuf quoted = STRBUF_INIT;
+
+			if (opt->nul_terminated &&
+			    obj_name) {
+				quote_c_style(obj_name, &quoted, NULL, 0);
+				obj_name = quoted.buf;
+			}
+
 			printf("%s missing\n",
 			       obj_name ? obj_name : oid_to_hex(&data->oid));
+			strbuf_release(&quoted);
 			fflush(stdout);
 			return;
 		}
@@ -475,6 +485,13 @@ static void batch_one_object(const char *obj_name,
 	result = get_oid_with_context(the_repository, obj_name,
 				      flags, &data->oid, &ctx);
 	if (result != FOUND) {
+		struct strbuf quoted = STRBUF_INIT;
+
+		if (opt->nul_terminated) {
+			quote_c_style(obj_name, &quoted, NULL, 0);
+			obj_name = quoted.buf;
+		}
+
 		switch (result) {
 		case MISSING_OBJECT:
 			printf("%s missing\n", obj_name);
@@ -499,6 +516,8 @@ static void batch_one_object(const char *obj_name,
 			       result);
 			break;
 		}
+
+		strbuf_release(&quoted);
 		fflush(stdout);
 		return;
 	}
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 23b8942edb..11344a35bf 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -447,6 +447,14 @@ test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
 	test_cmp expect actual
 '

+test_expect_success '--batch-check, -z with newline in non-existent named object' '
+	printf "HEAD:newline${LF}embedded" >in &&
+	git cat-file --batch-check -z <in >actual &&
+
+	printf "\"HEAD:newline\\\\nembedded\" missing\n" >expect &&
+	test_cmp expect actual
+'
+
 batch_command_multiple_info="info $hello_sha1
 info $tree_sha1
 info $commit_sha1
--
2.38.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 0/1] cat-file: quote-format name in error when using -z
  2023-01-05  6:24 ` [PATCH v2 0/1] " Toon Claes
  2023-01-05  6:24   ` [PATCH v2 1/1] " Toon Claes
@ 2023-01-16 19:07   ` Toon Claes
  2023-01-16 19:07     ` [PATCH v3 1/1] " Toon Claes
  2023-03-03 19:17     ` [PATCH v4 0/2] " Toon Claes
  1 sibling, 2 replies; 32+ messages in thread
From: Toon Claes @ 2023-01-16 19:07 UTC (permalink / raw)
  To: git; +Cc: Toon Claes, Phillip Wood

Hi there,

This third revision of this patch fixes broken test case that was added in
t1006.

Toon Claes (1):
  cat-file: quote-format name in error when using -z

 builtin/cat-file.c  | 19 +++++++++++++++++++
 t/t1006-cat-file.sh |  8 ++++++++
 2 files changed, 27 insertions(+)

Range-diff against v2:
1:  4f39eb0719 ! 1:  a56932572c cat-file: quote-format name in error when using -z
    @@ t/t1006-cat-file.sh: test_expect_success FUNNYNAMES '--batch-check, -z with newl
      '

     +test_expect_success '--batch-check, -z with newline in non-existent named object' '
    -+	printf "HEAD:newline${LF}embedded" >in &&
    ++	printf "HEAD:newline${LF}missing" >in &&
     +	git cat-file --batch-check -z <in >actual &&
     +
    -+	printf "\"HEAD:newline\\\\nembedded\" missing\n" >expect &&
    ++	printf "\"HEAD:newline\\\\nmissing\" missing\n" >expect &&
     +	test_cmp expect actual
     +'
     +
--
2.39.0.rc0.57.g2e71cbbddd.dirty

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 1/1] cat-file: quote-format name in error when using -z
  2023-01-16 19:07   ` [PATCH v3 0/1] " Toon Claes
@ 2023-01-16 19:07     ` Toon Claes
  2023-01-17 15:24       ` Phillip Wood
  2023-03-03 19:17     ` [PATCH v4 0/2] " Toon Claes
  1 sibling, 1 reply; 32+ messages in thread
From: Toon Claes @ 2023-01-16 19:07 UTC (permalink / raw)
  To: git; +Cc: Toon Claes, Phillip Wood

Since it's supported to have NUL-delimited input, introduced in
db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with `-z`,
2022-07-22), it's possible to pass paths that contain newlines. This
works great when the object is found, but when it's not, the input path
is returned in the error message. Because this can contain newlines, the
error message might get spread over multiple lines, making it harder to
machine-parse this error message.

With this change, the input is quote-formatted in the error message, if
needed. This ensures the error message is always on a single line and
makes parsing the error more straightforward.

Signed-off-by: Toon Claes <toon@iotcl.com>
---
 builtin/cat-file.c  | 19 +++++++++++++++++++
 t/t1006-cat-file.sh |  8 ++++++++
 2 files changed, 27 insertions(+)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index cc17635e76..b678f69773 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -14,6 +14,7 @@
 #include "tree-walk.h"
 #include "oid-array.h"
 #include "packfile.h"
+#include "quote.h"
 #include "object-store.h"
 #include "promisor-remote.h"
 #include "mailmap.h"
@@ -455,8 +456,17 @@ static void batch_object_write(const char *obj_name,
 						       &data->oid, &data->info,
 						       OBJECT_INFO_LOOKUP_REPLACE);
 		if (ret < 0) {
+			struct strbuf quoted = STRBUF_INIT;
+
+			if (opt->nul_terminated &&
+			    obj_name) {
+				quote_c_style(obj_name, &quoted, NULL, 0);
+				obj_name = quoted.buf;
+			}
+
 			printf("%s missing\n",
 			       obj_name ? obj_name : oid_to_hex(&data->oid));
+			strbuf_release(&quoted);
 			fflush(stdout);
 			return;
 		}
@@ -503,6 +513,13 @@ static void batch_one_object(const char *obj_name,
 	result = get_oid_with_context(the_repository, obj_name,
 				      flags, &data->oid, &ctx);
 	if (result != FOUND) {
+		struct strbuf quoted = STRBUF_INIT;
+
+		if (opt->nul_terminated) {
+			quote_c_style(obj_name, &quoted, NULL, 0);
+			obj_name = quoted.buf;
+		}
+
 		switch (result) {
 		case MISSING_OBJECT:
 			printf("%s missing\n", obj_name);
@@ -527,6 +544,8 @@ static void batch_one_object(const char *obj_name,
 			       result);
 			break;
 		}
+
+		strbuf_release(&quoted);
 		fflush(stdout);
 		return;
 	}
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 23b8942edb..e8ce20e739 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -447,6 +447,14 @@ test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
 	test_cmp expect actual
 '

+test_expect_success '--batch-check, -z with newline in non-existent named object' '
+	printf "HEAD:newline${LF}missing" >in &&
+	git cat-file --batch-check -z <in >actual &&
+
+	printf "\"HEAD:newline\\\\nmissing\" missing\n" >expect &&
+	test_cmp expect actual
+'
+
 batch_command_multiple_info="info $hello_sha1
 info $tree_sha1
 info $commit_sha1
--
2.39.0.rc0.57.g2e71cbbddd.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 1/1] cat-file: quote-format name in error when using -z
  2023-01-16 19:07     ` [PATCH v3 1/1] " Toon Claes
@ 2023-01-17 15:24       ` Phillip Wood
  0 siblings, 0 replies; 32+ messages in thread
From: Phillip Wood @ 2023-01-17 15:24 UTC (permalink / raw)
  To: Toon Claes, git; +Cc: Phillip Wood

Hi Toon

On 16/01/2023 19:07, Toon Claes wrote:
> Since it's supported to have NUL-delimited input, introduced in
> db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with `-z`,
> 2022-07-22), it's possible to pass paths that contain newlines. This
> works great when the object is found, but when it's not, the input path
> is returned in the error message. Because this can contain newlines, the
> error message might get spread over multiple lines, making it harder to
> machine-parse this error message.
> 
> With this change, the input is quote-formatted in the error message, if
> needed. This ensures the error message is always on a single line and
> makes parsing the error more straightforward.

This looks good but it would be nice to have test coverage for 
batch_one_object() as well as batch_object_write()

Best Wishes

Phillip

> Signed-off-by: Toon Claes <toon@iotcl.com>
> ---
>   builtin/cat-file.c  | 19 +++++++++++++++++++
>   t/t1006-cat-file.sh |  8 ++++++++
>   2 files changed, 27 insertions(+)
> 
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index cc17635e76..b678f69773 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -14,6 +14,7 @@
>   #include "tree-walk.h"
>   #include "oid-array.h"
>   #include "packfile.h"
> +#include "quote.h"
>   #include "object-store.h"
>   #include "promisor-remote.h"
>   #include "mailmap.h"
> @@ -455,8 +456,17 @@ static void batch_object_write(const char *obj_name,
>   						       &data->oid, &data->info,
>   						       OBJECT_INFO_LOOKUP_REPLACE);
>   		if (ret < 0) {
> +			struct strbuf quoted = STRBUF_INIT;
> +
> +			if (opt->nul_terminated &&
> +			    obj_name) {
> +				quote_c_style(obj_name, &quoted, NULL, 0);
> +				obj_name = quoted.buf;
> +			}
> +
>   			printf("%s missing\n",
>   			       obj_name ? obj_name : oid_to_hex(&data->oid));
> +			strbuf_release(&quoted);
>   			fflush(stdout);
>   			return;
>   		}
> @@ -503,6 +513,13 @@ static void batch_one_object(const char *obj_name,
>   	result = get_oid_with_context(the_repository, obj_name,
>   				      flags, &data->oid, &ctx);
>   	if (result != FOUND) {
> +		struct strbuf quoted = STRBUF_INIT;
> +
> +		if (opt->nul_terminated) {
> +			quote_c_style(obj_name, &quoted, NULL, 0);
> +			obj_name = quoted.buf;
> +		}
> +
>   		switch (result) {
>   		case MISSING_OBJECT:
>   			printf("%s missing\n", obj_name);
> @@ -527,6 +544,8 @@ static void batch_one_object(const char *obj_name,
>   			       result);
>   			break;
>   		}
> +
> +		strbuf_release(&quoted);
>   		fflush(stdout);
>   		return;
>   	}
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 23b8942edb..e8ce20e739 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -447,6 +447,14 @@ test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
>   	test_cmp expect actual
>   '
> 
> +test_expect_success '--batch-check, -z with newline in non-existent named object' '
> +	printf "HEAD:newline${LF}missing" >in &&
> +	git cat-file --batch-check -z <in >actual &&
> +
> +	printf "\"HEAD:newline\\\\nmissing\" missing\n" >expect &&
> +	test_cmp expect actual
> +'
> +
>   batch_command_multiple_info="info $hello_sha1
>   info $tree_sha1
>   info $commit_sha1
> --
> 2.39.0.rc0.57.g2e71cbbddd.dirty

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v4 0/2] cat-file: quote-format name in error when using -z
  2023-01-16 19:07   ` [PATCH v3 0/1] " Toon Claes
  2023-01-16 19:07     ` [PATCH v3 1/1] " Toon Claes
@ 2023-03-03 19:17     ` Toon Claes
  2023-03-03 19:17       ` [PATCH v4 1/2] cat-file: extract printing batch error message into function Toon Claes
                         ` (2 more replies)
  1 sibling, 3 replies; 32+ messages in thread
From: Toon Claes @ 2023-03-03 19:17 UTC (permalink / raw)
  To: toon; +Cc: git, phillip.wood123

Hi,

This is the fourth revision of my patch to fix the error message for
git-cat-file(1) when using --batch and -z.

I was asked to provide test coverage for both code paths that produce this kind
of error message. Instead I've decided to extract the error formatting into a
separate function. For two reasons:

 * It deduplicates code.

 * The code path that was not tested is hit by --batch-all-objects. I was not
  able to set up a test that hits that code. I'm happy to write that test, if
  anyone can give me a pointer how to produce a "missing" error with that
  option.


Toon



Toon Claes (2):
  cat-file: extract printing batch error message into function
  cat-file: quote-format name in error when using -z

 builtin/cat-file.c  | 71 +++++++++++++++++++++++++++------------------
 t/t1006-cat-file.sh |  8 +++++
 2 files changed, 51 insertions(+), 28 deletions(-)

Range-diff against v3:
-:  ---------- > 1:  b2f4ce88f2 cat-file: extract printing batch error message into function
1:  a56932572c ! 2:  9e31796dc1 cat-file: quote-format name in error when using -z
    @@ builtin/cat-file.c
      #include "object-store.h"
      #include "promisor-remote.h"
      #include "mailmap.h"
    -@@ builtin/cat-file.c: static void batch_object_write(const char *obj_name,
    - 						       &data->oid, &data->info,
    - 						       OBJECT_INFO_LOOKUP_REPLACE);
    - 		if (ret < 0) {
    -+			struct strbuf quoted = STRBUF_INIT;
    -+
    -+			if (opt->nul_terminated &&
    -+			    obj_name) {
    -+				quote_c_style(obj_name, &quoted, NULL, 0);
    -+				obj_name = quoted.buf;
    -+			}
    -+
    - 			printf("%s missing\n",
    - 			       obj_name ? obj_name : oid_to_hex(&data->oid));
    -+			strbuf_release(&quoted);
    - 			fflush(stdout);
    - 			return;
    - 		}
    -@@ builtin/cat-file.c: static void batch_one_object(const char *obj_name,
    - 	result = get_oid_with_context(the_repository, obj_name,
    - 				      flags, &data->oid, &ctx);
    - 	if (result != FOUND) {
    +@@ builtin/cat-file.c: static void batch_print_error(const char *obj_name,
    + 			      enum get_oid_result result,
    + 			      struct batch_options* opt)
    + {
     +		struct strbuf quoted = STRBUF_INIT;
     +
    -+		if (opt->nul_terminated) {
    ++		if (opt->nul_terminated &&
    ++		    obj_name) {
     +			quote_c_style(obj_name, &quoted, NULL, 0);
     +			obj_name = quoted.buf;
     +		}
    @@ builtin/cat-file.c: static void batch_one_object(const char *obj_name,
      		switch (result) {
      		case MISSING_OBJECT:
      			printf("%s missing\n", obj_name);
    -@@ builtin/cat-file.c: static void batch_one_object(const char *obj_name,
    - 			       result);
    +@@ builtin/cat-file.c: static void batch_print_error(const char *obj_name,
      			break;
      		}
    -+
    -+		strbuf_release(&quoted);
      		fflush(stdout);
    - 		return;
    - 	}
    ++		strbuf_release(&quoted);
    + }
    +
    + /*

      ## t/t1006-cat-file.sh ##
     @@ t/t1006-cat-file.sh: test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
--
2.39.2

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v4 1/2] cat-file: extract printing batch error message into function
  2023-03-03 19:17     ` [PATCH v4 0/2] " Toon Claes
@ 2023-03-03 19:17       ` Toon Claes
  2023-03-03 20:26         ` Junio C Hamano
  2023-03-03 19:17       ` [PATCH v4 2/2] " Toon Claes
  2023-03-03 20:14       ` [PATCH v4 0/2] " Junio C Hamano
  2 siblings, 1 reply; 32+ messages in thread
From: Toon Claes @ 2023-03-03 19:17 UTC (permalink / raw)
  To: toon; +Cc: git, phillip.wood123

There are two callsites that were formatting an error message in batch
mode if an object could not be found. We're about to make changes to
that and to avoid doing that twice, we extract this into a separate
function.

Signed-off-by: Toon Claes <toon@iotcl.com>
---
 builtin/cat-file.c | 61 +++++++++++++++++++++++++---------------------
 1 file changed, 33 insertions(+), 28 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index cc17635e76..0c47190f17 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -429,6 +429,37 @@ static void print_default_format(struct strbuf *scratch, struct expand_data *dat
 		    (uintmax_t)data->size);
 }
 
+static void batch_print_error(const char *obj_name,
+			      enum get_oid_result result,
+			      struct batch_options* opt)
+{
+		switch (result) {
+		case MISSING_OBJECT:
+			printf("%s missing\n", obj_name);
+			break;
+		case SHORT_NAME_AMBIGUOUS:
+			printf("%s ambiguous\n", obj_name);
+			break;
+		case DANGLING_SYMLINK:
+			printf("dangling %"PRIuMAX"\n%s\n",
+			       (uintmax_t)strlen(obj_name), obj_name);
+			break;
+		case SYMLINK_LOOP:
+			printf("loop %"PRIuMAX"\n%s\n",
+			       (uintmax_t)strlen(obj_name), obj_name);
+			break;
+		case NOT_DIR:
+			printf("notdir %"PRIuMAX"\n%s\n",
+			       (uintmax_t)strlen(obj_name), obj_name);
+			break;
+		default:
+			BUG("unknown get_sha1_with_context result %d\n",
+			       result);
+			break;
+		}
+		fflush(stdout);
+}
+
 /*
  * If "pack" is non-NULL, then "offset" is the byte offset within the pack from
  * which the object may be accessed (though note that we may also rely on
@@ -455,9 +486,7 @@ static void batch_object_write(const char *obj_name,
 						       &data->oid, &data->info,
 						       OBJECT_INFO_LOOKUP_REPLACE);
 		if (ret < 0) {
-			printf("%s missing\n",
-			       obj_name ? obj_name : oid_to_hex(&data->oid));
-			fflush(stdout);
+			batch_print_error(obj_name, MISSING_OBJECT, opt);
 			return;
 		}
 
@@ -503,31 +532,7 @@ static void batch_one_object(const char *obj_name,
 	result = get_oid_with_context(the_repository, obj_name,
 				      flags, &data->oid, &ctx);
 	if (result != FOUND) {
-		switch (result) {
-		case MISSING_OBJECT:
-			printf("%s missing\n", obj_name);
-			break;
-		case SHORT_NAME_AMBIGUOUS:
-			printf("%s ambiguous\n", obj_name);
-			break;
-		case DANGLING_SYMLINK:
-			printf("dangling %"PRIuMAX"\n%s\n",
-			       (uintmax_t)strlen(obj_name), obj_name);
-			break;
-		case SYMLINK_LOOP:
-			printf("loop %"PRIuMAX"\n%s\n",
-			       (uintmax_t)strlen(obj_name), obj_name);
-			break;
-		case NOT_DIR:
-			printf("notdir %"PRIuMAX"\n%s\n",
-			       (uintmax_t)strlen(obj_name), obj_name);
-			break;
-		default:
-			BUG("unknown get_sha1_with_context result %d\n",
-			       result);
-			break;
-		}
-		fflush(stdout);
+		batch_print_error(obj_name, result, opt);
 		return;
 	}
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v4 2/2] cat-file: quote-format name in error when using -z
  2023-03-03 19:17     ` [PATCH v4 0/2] " Toon Claes
  2023-03-03 19:17       ` [PATCH v4 1/2] cat-file: extract printing batch error message into function Toon Claes
@ 2023-03-03 19:17       ` Toon Claes
  2023-03-03 20:14       ` [PATCH v4 0/2] " Junio C Hamano
  2 siblings, 0 replies; 32+ messages in thread
From: Toon Claes @ 2023-03-03 19:17 UTC (permalink / raw)
  To: toon; +Cc: git, phillip.wood123

Since it's supported to have NUL-delimited input, introduced in
db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with `-z`,
2022-07-22), it's possible to pass paths that contain newlines. This
works great when the object is found, but when it's not, the input path
is returned in the error message. Because this can contain newlines, the
error message might get spread over multiple lines, making it harder to
machine-parse this error message.

With this change, the input is quote-formatted in the error message, if
needed. This ensures the error message is always on a single line and
makes parsing the error more straightforward.

Signed-off-by: Toon Claes <toon@iotcl.com>
---
 builtin/cat-file.c  | 10 ++++++++++
 t/t1006-cat-file.sh |  8 ++++++++
 2 files changed, 18 insertions(+)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 0c47190f17..e0c7bfaa56 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -14,6 +14,7 @@
 #include "tree-walk.h"
 #include "oid-array.h"
 #include "packfile.h"
+#include "quote.h"
 #include "object-store.h"
 #include "promisor-remote.h"
 #include "mailmap.h"
@@ -433,6 +434,14 @@ static void batch_print_error(const char *obj_name,
 			      enum get_oid_result result,
 			      struct batch_options* opt)
 {
+		struct strbuf quoted = STRBUF_INIT;
+
+		if (opt->nul_terminated &&
+		    obj_name) {
+			quote_c_style(obj_name, &quoted, NULL, 0);
+			obj_name = quoted.buf;
+		}
+
 		switch (result) {
 		case MISSING_OBJECT:
 			printf("%s missing\n", obj_name);
@@ -458,6 +467,7 @@ static void batch_print_error(const char *obj_name,
 			break;
 		}
 		fflush(stdout);
+		strbuf_release(&quoted);
 }
 
 /*
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 2d875b17d8..6ac3881305 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -447,6 +447,14 @@ test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
 	test_cmp expect actual
 '
 
+test_expect_success '--batch-check, -z with newline in non-existent named object' '
+	printf "HEAD:newline${LF}missing" >in &&
+	git cat-file --batch-check -z <in >actual &&
+
+	printf "\"HEAD:newline\\\\nmissing\" missing\n" >expect &&
+	test_cmp expect actual
+'
+
 batch_command_multiple_info="info $hello_sha1
 info $tree_sha1
 info $commit_sha1
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 0/2] cat-file: quote-format name in error when using -z
  2023-03-03 19:17     ` [PATCH v4 0/2] " Toon Claes
  2023-03-03 19:17       ` [PATCH v4 1/2] cat-file: extract printing batch error message into function Toon Claes
  2023-03-03 19:17       ` [PATCH v4 2/2] " Toon Claes
@ 2023-03-03 20:14       ` Junio C Hamano
  2 siblings, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2023-03-03 20:14 UTC (permalink / raw)
  To: Toon Claes; +Cc: git, phillip.wood123

Toon Claes <toon@iotcl.com> writes:

> I was asked to provide test coverage for both code paths that produce this kind
> of error message. Instead I've decided to ...

Is code refactoring a substitute for making sure we have adequate
test coverage?  If not, "Instead I've decided to" is a strange thing
to say at this point in the cover letter.

Relative to the previous round, you cleaned up the code by
introducing a helper function to show error messages, which gave us
reduced duplication of the code.  This is a very good thing and is
worth mentioning.

You however didn't get around to add adequate test coverage to the
resulting code, which was one thing that the previous round was
found lacking, and you can use help to improve these patches in this
area, to help them get merged.  Some of the error cases happen only
in a corrupt repository that lacks objects that are expected to be
there (e.g. a tree points at a blob in a non-lazy clone and blob is
missing), so such a test may have to deliberately corrupt the test
repository by removing .git/object/??/* files.  

Hopefully we'll hear from those who are willing to help.  Thanks for
working on this topic.






^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 1/2] cat-file: extract printing batch error message into function
  2023-03-03 19:17       ` [PATCH v4 1/2] cat-file: extract printing batch error message into function Toon Claes
@ 2023-03-03 20:26         ` Junio C Hamano
  2023-03-03 23:14           ` Junio C Hamano
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2023-03-03 20:26 UTC (permalink / raw)
  To: Toon Claes; +Cc: git, phillip.wood123

Toon Claes <toon@iotcl.com> writes:

> There are two callsites that were formatting an error message in batch
> mode if an object could not be found. We're about to make changes to
> that and to avoid doing that twice, we extract this into a separate
> function.
>
> Signed-off-by: Toon Claes <toon@iotcl.com>
> ...
> +static void batch_print_error(const char *obj_name,
> +			      enum get_oid_result result,
> +			      struct batch_options* opt)
> +{
> +		switch (result) {

The body of this function is indented way too deep.  You should lose
one HT at the beginning from all its lines.

> +		case MISSING_OBJECT:
> +			printf("%s missing\n", obj_name);
> +			break;

This one expects that obj_name is always usable as a string.

> @@ -455,9 +486,7 @@ static void batch_object_write(const char *obj_name,
>  						       &data->oid, &data->info,
>  						       OBJECT_INFO_LOOKUP_REPLACE);
>  		if (ret < 0) {
> -			printf("%s missing\n",
> -			       obj_name ? obj_name : oid_to_hex(&data->oid));
> -			fflush(stdout);

This caller used to be prepared for the case where obj_name is NULL
and used the hexadecimal object name stored in data->oid in such a
case, but now ...

> +			batch_print_error(obj_name, MISSING_OBJECT, opt);

... the updated caller assumes that obj_name is always safe to feed
to printf("%s") above.

Maybe obj_name is always not-NULL when the control reaches this
point in which case the new code would be safe, but if that is the
case, the proposed log message should explain how that is true to
justify this change.

As batch_object_cb() makes a call to batch_object_write() with
obj_name set to NULL, I do not think this change is defensible,
though.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 1/2] cat-file: extract printing batch error message into function
  2023-03-03 20:26         ` Junio C Hamano
@ 2023-03-03 23:14           ` Junio C Hamano
  2023-05-10 19:01             ` [PATCH v5 0/1] cat-file: quote-format name in error when using -z Toon Claes
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2023-03-03 23:14 UTC (permalink / raw)
  To: Toon Claes; +Cc: git, phillip.wood123

Junio C Hamano <gitster@pobox.com> writes:

> As batch_object_cb() makes a call to batch_object_write() with
> obj_name set to NULL, I do not think this change is defensible,
> though.

This indeed seems to break t5313 when queued on top of 'master'; it
tries to run "git cat-file --batch-all-objects --batch-check" and
hits the exact codepath where a missing object is sent to the error
codepath without obj_name set to anything.

I guess we now have an existing test that you can mimic that
exhibits "missing" error?  I do not know offhand if this test
already qualifies as the test coverage Phillip wanted to make sure
exists.

..... >8 .......... >8 .......... >8 .......... >8 .......... >8 .....

expecting success of 5313.3 'pack/index object count mismatch': 
	do_pack $object &&
	munge $pack 8 "\377\0\0\0" &&
	clear_base &&

	# We enumerate the objects from the completely-fine
	# .idx, but notice later that the .pack is bogus
	# and fail to show any data.
	echo "$object missing" >expect &&
	git cat-file --batch-all-objects --batch-check >actual &&
	test_cmp expect actual &&

	# ...and here fail to load the object (without segfaulting),
	# but fallback to a good copy if available.
	test_must_fail git cat-file blob $object &&
	restore_base &&
	git cat-file blob $object >actual &&
	test_cmp file actual &&

	# ...and make sure that index-pack --verify, which has its
	# own reading routines, does not segfault.
	test_must_fail git index-pack --verify $pack

4+0 records in
4+0 records out
4 bytes copied, 0.000119221 s, 33.6 kB/s
error: packfile .git/objects/pack/pack-67be769e2843d598c78218852612520795998892.pack claims to have 4278190080 objects while index indicates 1 objects
error: packfile .git/objects/pack/pack-67be769e2843d598c78218852612520795998892.pack claims to have 4278190080 objects while index indicates 1 objects
--- expect	2023-03-03 23:11:37.504011940 +0000
+++ actual	2023-03-03 23:11:37.508012250 +0000
@@ -1 +1 @@
-fff0a2476aa5c8e60a3ef21cfc66e0cc670920be missing
+(null) missing
not ok 3 - pack/index object count mismatch

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v5 0/1] cat-file: quote-format name in error when using -z
  2023-03-03 23:14           ` Junio C Hamano
@ 2023-05-10 19:01             ` Toon Claes
  2023-05-10 19:01               ` [PATCH v5 1/1] " Toon Claes
  0 siblings, 1 reply; 32+ messages in thread
From: Toon Claes @ 2023-05-10 19:01 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Toon Claes

This is the fifth revision of this patch. The code changes are identical to v3,
but there was one comment[1] about adding test coverage for
batch_object_write(). In v4 I tried another approach, but that didn't work well.
So for this version I'm just adding the test to version 3 as requested. See
below the range-diff against v3, ignoring v4 ever happened.

Kind regards,
Toon

[1]: https://public-inbox.org/git/2a2a46f0-a9bc-06a6-72e1-28800518777c@dunelm.org.uk/

Toon Claes (1):
  cat-file: quote-format name in error when using -z

 builtin/cat-file.c  | 19 +++++++++++++++++++
 t/t1006-cat-file.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

Range-diff against v3:
1:  16f7a123d8 ! 1:  14bccfa9fd cat-file: quote-format name in error when using -z
    @@ t/t1006-cat-file.sh: test_expect_success FUNNYNAMES '--batch-check, -z with newl
     +	printf "\"HEAD:newline\\\\nmissing\" missing\n" >expect &&
     +	test_cmp expect actual
     +'
    ++
    ++test_expect_success FUNNYNAMES '--batch-check, -z with missing object having newline in name' '
    ++	git init missing-object-newline &&
    ++	(
    ++		cd missing-object-newline &&
    ++		file="newline${LF}embedded" &&
    ++		echo_without_newline "hello" > $file &&
    ++		git add "$file" &&
    ++		git commit -m "file with newline embedded" &&
    ++		test_tick &&
    ++
    ++		sha1=$(git rev-parse HEAD:"$file") &&
    ++		rm .git/objects/$(test_oid_to_path $sha1) &&
    ++		printf "HEAD:$file" >in &&
    ++		git cat-file --batch-check -z <in >actual &&
    ++		printf "\"HEAD:newline\\\\nembedded\" missing\n" >expect &&
    ++		test_cmp expect actual
    ++	)
    ++'
     +
      batch_command_multiple_info="info $hello_sha1
      info $tree_sha1
--
2.40.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v5 1/1] cat-file: quote-format name in error when using -z
  2023-05-10 19:01             ` [PATCH v5 0/1] cat-file: quote-format name in error when using -z Toon Claes
@ 2023-05-10 19:01               ` Toon Claes
  2023-05-10 20:13                 ` Junio C Hamano
  0 siblings, 1 reply; 32+ messages in thread
From: Toon Claes @ 2023-05-10 19:01 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Toon Claes

Since it's supported to have NUL-delimited input, introduced in
db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with `-z`,
2022-07-22), it's possible to pass paths that contain newlines. This
works great when the object is found, but when it's not, the input path
is returned in the error message. Because this can contain newlines, the
error message might get spread over multiple lines, making it harder to
machine-parse this error message.

With this change, the input is quote-formatted in the error message, if
needed. This ensures the error message is always on a single line and
makes parsing the error more straightforward.

Signed-off-by: Toon Claes <toon@iotcl.com>
---
 builtin/cat-file.c  | 19 +++++++++++++++++++
 t/t1006-cat-file.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 0bafc14e6c..0ff31c4593 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -25,6 +25,7 @@
 #include "object-store.h"
 #include "replace-object.h"
 #include "promisor-remote.h"
+#include "quote.h"
 #include "mailmap.h"
 #include "write-or-die.h"
 
@@ -470,8 +471,17 @@ static void batch_object_write(const char *obj_name,
 						       &data->oid, &data->info,
 						       OBJECT_INFO_LOOKUP_REPLACE);
 		if (ret < 0) {
+			struct strbuf quoted = STRBUF_INIT;
+
+			if (opt->nul_terminated &&
+			    obj_name) {
+				quote_c_style(obj_name, &quoted, NULL, 0);
+				obj_name = quoted.buf;
+			}
+
 			printf("%s missing\n",
 			       obj_name ? obj_name : oid_to_hex(&data->oid));
+			strbuf_release(&quoted);
 			fflush(stdout);
 			return;
 		}
@@ -518,6 +528,13 @@ static void batch_one_object(const char *obj_name,
 	result = get_oid_with_context(the_repository, obj_name,
 				      flags, &data->oid, &ctx);
 	if (result != FOUND) {
+		struct strbuf quoted = STRBUF_INIT;
+
+		if (opt->nul_terminated) {
+			quote_c_style(obj_name, &quoted, NULL, 0);
+			obj_name = quoted.buf;
+		}
+
 		switch (result) {
 		case MISSING_OBJECT:
 			printf("%s missing\n", obj_name);
@@ -542,6 +559,8 @@ static void batch_one_object(const char *obj_name,
 			       result);
 			break;
 		}
+
+		strbuf_release(&quoted);
 		fflush(stdout);
 		return;
 	}
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 8eac74b59c..03c1bfb86b 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -447,6 +447,33 @@ test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
 	test_cmp expect actual
 '
 
+test_expect_success '--batch-check, -z with newline in non-existent named object' '
+	printf "HEAD:newline${LF}missing" >in &&
+	git cat-file --batch-check -z <in >actual &&
+
+	printf "\"HEAD:newline\\\\nmissing\" missing\n" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success FUNNYNAMES '--batch-check, -z with missing object having newline in name' '
+	git init missing-object-newline &&
+	(
+		cd missing-object-newline &&
+		file="newline${LF}embedded" &&
+		echo_without_newline "hello" > $file &&
+		git add "$file" &&
+		git commit -m "file with newline embedded" &&
+		test_tick &&
+
+		sha1=$(git rev-parse HEAD:"$file") &&
+		rm .git/objects/$(test_oid_to_path $sha1) &&
+		printf "HEAD:$file" >in &&
+		git cat-file --batch-check -z <in >actual &&
+		printf "\"HEAD:newline\\\\nembedded\" missing\n" >expect &&
+		test_cmp expect actual
+	)
+'
+
 batch_command_multiple_info="info $hello_sha1
 info $tree_sha1
 info $commit_sha1
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v5 1/1] cat-file: quote-format name in error when using -z
  2023-05-10 19:01               ` [PATCH v5 1/1] " Toon Claes
@ 2023-05-10 20:13                 ` Junio C Hamano
  2023-05-12  8:54                   ` Toon Claes
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2023-05-10 20:13 UTC (permalink / raw)
  To: Toon Claes; +Cc: git, Phillip Wood

Toon Claes <toon@iotcl.com> writes:

> Since it's supported to have NUL-delimited input, introduced in
> db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with `-z`,
> 2022-07-22), it's possible to pass paths that contain newlines.

It has been a while since I saw this patch the last time, and it did
not immediately click to me how "pass paths" relates to passing
object names, which is what "--batch-check" takes.  Perhaps

    "cat-file --batch-check" may be fed object names of blobs and
    trees as "<commit>:<path>", and with its "-z" option, it is
    possible to feed <commit> or <path> with newlines in it.

or something?

> This
> works great when the object is found, but when it's not, the input path
> is returned in the error message. Because this can contain newlines, the
> error message might get spread over multiple lines, making it harder to
> machine-parse this error message.

Good description.  I may suggest

    "the input path is returned" -> "the input is shown verbatim"

because <path> is not the only thing that can contain LF.  E.g.

    $ git show -s 'HEAD^{/introduced in
    > db9d67}'

can find the commit resulting from this patch, so

    $ printf "%s\0" 'HEAD^{/introduced in
    > db9d67}:builtin/cat-file.cc' |
    > git cat-file --batch-check -z 

would be an input record with newline in it, that has no newline in
the path.

> With this change, the input is quote-formatted in the error message, if
> needed. This ensures the error message is always on a single line and
> makes parsing the error more straightforward.

Drop "With this change, ..." and give a command to the codebase to
c-quote the object name in the output, e.g.

    C-quote the object name from the input in the error message as
    needed, to ensure that the error message is on a single line and
    ...

The other side of the coin, however, is that an existing project
that is sane enough not to have a path with LF in it, but is not
sane enough to avoid a path with double-quote in it, would now stop
being able to parse the error message for a missing path.

It is a regression, and we may argue that it is not a large enough
regression to block progress given by this patch, but if it is not
common enough to have funny characters in the paths then we wouldn't
be seeing this patch in the first place.  So I would prefer to see
that we at least admit that we are deliberately making this change
with a known regression.  Perhaps add something like

    Note that if a project already parses the error message
    correctly because it does not have paths with newlines, this is
    a breaking change if it has paths with characters that need
    c-quoting (like double quotes and backslashes) that are not
    newlines.  We consider that this is a small enough price to pay
    to allow newlines in paths because ...

and fill the "because ..." part is sensible.  I am not filling the
"because" part, simply because I do not offhand see any good excuse
to rob Peter to pay Paul in this case.

> @@ -470,8 +471,17 @@ static void batch_object_write(const char *obj_name,
>  						       &data->oid, &data->info,
>  						       OBJECT_INFO_LOOKUP_REPLACE);
>  		if (ret < 0) {
> +			struct strbuf quoted = STRBUF_INIT;
> +
> +			if (opt->nul_terminated &&
> +			    obj_name) {
> +				quote_c_style(obj_name, &quoted, NULL, 0);
> +				obj_name = quoted.buf;
> +			}
> +
>  			printf("%s missing\n",
>  			       obj_name ? obj_name : oid_to_hex(&data->oid));
> +			strbuf_release(&quoted);
>  			fflush(stdout);
>  			return;
>  		}

Interesting.

When running "--batch-all-objects", we do not have obj_name, and we
do not have anything to "quote", but the fallback label is the full
hexadecimal object name, and we do not have to worry about line
breaks.

OK.

> @@ -518,6 +528,13 @@ static void batch_one_object(const char *obj_name,
>  	result = get_oid_with_context(the_repository, obj_name,
>  				      flags, &data->oid, &ctx);
>  	if (result != FOUND) {
> +		struct strbuf quoted = STRBUF_INIT;
> +
> +		if (opt->nul_terminated) {
> +			quote_c_style(obj_name, &quoted, NULL, 0);
> +			obj_name = quoted.buf;
> +		}
> +
>  		switch (result) {
>  		case MISSING_OBJECT:
>  			printf("%s missing\n", obj_name);
> @@ -542,6 +559,8 @@ static void batch_one_object(const char *obj_name,
>  			       result);
>  			break;
>  		}
> +
> +		strbuf_release(&quoted);
>  		fflush(stdout);
>  		return;
>  	}

This is the side that runs without "--batch-all-objects" and always
has non-null obj_name, so it looks a bit different from the other
hunk and is more straight-forward.  Looking good.

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v5 1/1] cat-file: quote-format name in error when using -z
  2023-05-10 20:13                 ` Junio C Hamano
@ 2023-05-12  8:54                   ` Toon Claes
  2023-05-12 16:57                     ` Junio C Hamano
  0 siblings, 1 reply; 32+ messages in thread
From: Toon Claes @ 2023-05-12  8:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Phillip Wood, Taylor Blau


Junio C Hamano <gitster@pobox.com> writes:

> Toon Claes <toon@iotcl.com> writes:
>
>> Since it's supported to have NUL-delimited input, introduced in
>> db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with `-z`,
>> 2022-07-22), it's possible to pass paths that contain newlines.
>
> It has been a while since I saw this patch the last time, and it did
> not immediately click to me how "pass paths" relates to passing
> object names, which is what "--batch-check" takes.  Perhaps
>
>     "cat-file --batch-check" may be fed object names of blobs and
>     trees as "<commit>:<path>", and with its "-z" option, it is
>     possible to feed <commit> or <path> with newlines in it.
>
> or something?

Good suggestion.

>> This
>> works great when the object is found, but when it's not, the input path
>> is returned in the error message. Because this can contain newlines, the
>> error message might get spread over multiple lines, making it harder to
>> machine-parse this error message.
>
> Good description.  I may suggest
>
>     "the input path is returned" -> "the input is shown verbatim"
>
> because <path> is not the only thing that can contain LF.  E.g.
>
>     $ git show -s 'HEAD^{/introduced in
>     > db9d67}'
>
> can find the commit resulting from this patch, so
>
>     $ printf "%s\0" 'HEAD^{/introduced in
>     > db9d67}:builtin/cat-file.cc' |
>     > git cat-file --batch-check -z

Thanks for that thorough explanation, makes a lot of sense.

>
> would be an input record with newline in it, that has no newline in
> the path.
>
>> With this change, the input is quote-formatted in the error message, if
>> needed. This ensures the error message is always on a single line and
>> makes parsing the error more straightforward.
>
> Drop "With this change, ..." and give a command to the codebase to
> c-quote the object name in the output, e.g.
>
>     C-quote the object name from the input in the error message as
>     needed, to ensure that the error message is on a single line and
>     ...

Sure, I'll update the commit message accordingly.

> The other side of the coin, however, is that an existing project
> that is sane enough not to have a path with LF in it, but is not
> sane enough to avoid a path with double-quote in it, would now stop
> being able to parse the error message for a missing path.

Ah interesting, this is not a case I did consider before.

> It is a regression, and we may argue that it is not a large enough
> regression to block progress given by this patch, but if it is not
> common enough to have funny characters in the paths then we wouldn't
> be seeing this patch in the first place.  So I would prefer to see
> that we at least admit that we are deliberately making this change
> with a known regression.  Perhaps add something like
>
>     Note that if a project already parses the error message
>     correctly because it does not have paths with newlines, this is
>     a breaking change if it has paths with characters that need
>     c-quoting (like double quotes and backslashes) that are not
>     newlines.  We consider that this is a small enough price to pay
>     to allow newlines in paths because ...
>
> and fill the "because ..." part is sensible.  I am not filling the
> "because" part, simply because I do not offhand see any good excuse
> to rob Peter to pay Paul in this case.

I intended to finalize this patch sooner, but other priorities popped
up. The -z flag was added in v2.38 and it would have been nice to have
the patch in v2.40, this would reduce the number of Peters affected. Now
we're a couple months and yet another release further between
introduction of the flag and making the regression, so I feel your
sentiment.

I previous conversations[1] we've been talking about making the output
NUL-terminated as well. We agreed on the cquote fix because the -z flag
was still fresh, but maybe at this time we need to revisit this.

Ideally the output should be NUL-terminated if -z is used. This was also
suggested[2] when the flag was introduced. Obviously we cannot change
this now, because it would break behavior for *everyone* using -z, not
only when funny names are used. So if we want to go this route, we
should only do so with another flag (e.g. `--null-output`) or a config
option.

But I was looking at the git-config(1) documentation:

> core.quotePath::
> 	Commands that output paths (e.g. 'ls-files', 'diff'), will
> 	quote "unusual" characters in the pathname by enclosing the
> 	pathname in double-quotes and escaping those characters with
> 	backslashes in the same way C escapes control characters (e.g.
> 	`\t` for TAB, `\n` for LF, `\\` for backslash) or bytes with
> 	values larger than 0x80 (e.g. octal `\302\265` for "micro" in
> 	UTF-8).  If this variable is set to false, bytes higher than
> 	0x80 are not considered "unusual" any more. Double-quotes,
> 	backslash and control characters are always escaped regardless
> 	of the setting of this variable.  A simple space character is
> 	not considered "unusual".  Many commands can output pathnames
> 	completely verbatim using the `-z` option. The default value
> 	is true.

If you read this, the changes of this patch fully contradict this. Also
documentation on other commands (e.g. git-check-ignore(1)) using `-z`
will mention the verbatim output. So at the moment I'm leaning toward
another solution. I'm looping in Taylor as he added the flag initially,
so he might have some insights how they are using it and how they expect
it to work.

-- Toon

[1]: https://lore.kernel.org/git/xmqq5yekyvrh.fsf@gitster.g/
[2]: https://lore.kernel.org/git/66b71194-ad0e-18d0-e43b-71e5c47ba111@gmail.com/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v5 1/1] cat-file: quote-format name in error when using -z
  2023-05-12  8:54                   ` Toon Claes
@ 2023-05-12 16:57                     ` Junio C Hamano
  2023-05-15  8:47                       ` Phillip Wood
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2023-05-12 16:57 UTC (permalink / raw)
  To: Toon Claes; +Cc: git, Phillip Wood, Taylor Blau

Toon Claes <toon@iotcl.com> writes:

> Ideally the output should be NUL-terminated if -z is used. This was also
> suggested[2] when the flag was introduced. Obviously we cannot change
> this now, because it would break behavior for *everyone* using -z, not
> only when funny names are used. So if we want to go this route, we
> should only do so with another flag (e.g. `--null-output`) or a config
> option.

Yes, `--null-output` came also to my mind.  As this new mode of
output is for consumption by programs, letting them read
NUL-terminated records is a viable, if cumbersome, possibility.

> But I was looking at the git-config(1) documentation:
>
>> core.quotePath::
>> 	Commands that output paths (e.g. 'ls-files', 'diff'), will
>> 	quote "unusual" characters in the pathname by enclosing the
>> 	pathname in double-quotes and escaping those characters with
>> 	backslashes in the same way C escapes control characters (e.g.
>> 	`\t` for TAB, `\n` for LF, `\\` for backslash) or bytes with
>> 	values larger than 0x80 (e.g. octal `\302\265` for "micro" in
>> 	UTF-8).  If this variable is set to false, bytes higher than
>> 	0x80 are not considered "unusual" any more. Double-quotes,
>> 	backslash and control characters are always escaped regardless
>> 	of the setting of this variable.  A simple space character is
>> 	not considered "unusual".  Many commands can output pathnames
>> 	completely verbatim using the `-z` option. The default value
>> 	is true.
>
> If you read this, the changes of this patch fully contradict this.

Hmph, I do not quite see where the contradiction is.  If you mean
"Many commands can output" part, I do not think it applies here.
First, your "cat-file" does not have to be a part of "many".  More
importantly, the mention of `-z` there is about the option accepted
by the diff family of commants, e.g. "git diff --name-only -z
HEAD^", that is an output record separator.  Your "-z" is about the
input record separator, and if you are not changing "-z" to suddenly
mean both input and output  separator to break existing scripts that
expect "-z" only applies to input, the above "completely verbatim"
does not apply to you.

> Also
> documentation on other commands (e.g. git-check-ignore(1)) using `-z`
> will mention the verbatim output.

Again, it is about the output.

Stepping back a bit, how big a problem is this in real life?  It
certainly is possible to create a pathname with funny byte values in
it, and in some environments,letters like single-quote that are
considered cumbersome to handle by those who are used to CLI
programs may be commonplace.  But a path with newline?  Or any
control character for that matter?  And this is not even the primary
output from the program but is an error message for consumption by
humans, no?

I am wondering if it is simpler to just declare that the paths
output in error messages have certain bytes, probably all control
characters other than HT, replaced with a dot, and tell the users
not to rely on the pathnames being intact if they contain funny
bytes in them.  That way, with the definition of "work" being "you
can read the path out of error messages that talk about it", paths
with bytes that c-quote mechanism butchers, like double quotes and
backslashes, that have worked before will not be broken, and paths
with LF or CRLF in them that have never worked would not work, but
at least does not break the input stream of whoever is reading the
error messages line by line.

I dunno.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v5 1/1] cat-file: quote-format name in error when using -z
  2023-05-12 16:57                     ` Junio C Hamano
@ 2023-05-15  8:47                       ` Phillip Wood
  2023-05-15 17:20                         ` Junio C Hamano
  0 siblings, 1 reply; 32+ messages in thread
From: Phillip Wood @ 2023-05-15  8:47 UTC (permalink / raw)
  To: Junio C Hamano, Toon Claes; +Cc: git, Taylor Blau

On 12/05/2023 17:57, Junio C Hamano wrote:
> Toon Claes <toon@iotcl.com> writes:
> Stepping back a bit, how big a problem is this in real life?  It
> certainly is possible to create a pathname with funny byte values in
> it, and in some environments,letters like single-quote that are
> considered cumbersome to handle by those who are used to CLI
> programs may be commonplace.  But a path with newline?  Or any
> control character for that matter?  And this is not even the primary
> output from the program but is an error message for consumption by
> humans, no?
> 
> I am wondering if it is simpler to just declare that the paths
> output in error messages have certain bytes, probably all control
> characters other than HT, replaced with a dot, and tell the users
> not to rely on the pathnames being intact if they contain funny
> bytes in them.

We could only c-quote the name when it contains a control character 
other that HT. That way names containing double quotes and backslashes 
are unchanged but it will still be possible to parse the path from the 
error message. If we're going to munge the name we might as well use our 
standard quoting rather than some ad-hoc scheme.

Best Wishes

Phillip

   That way, with the definition of "work" being "you
> can read the path out of error messages that talk about it", paths
> with bytes that c-quote mechanism butchers, like double quotes and
> backslashes, that have worked before will not be broken, and paths
> with LF or CRLF in them that have never worked would not work, but
> at least does not break the input stream of whoever is reading the
> error messages line by line.
> 
> I dunno.
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v5 1/1] cat-file: quote-format name in error when using -z
  2023-05-15  8:47                       ` Phillip Wood
@ 2023-05-15 17:20                         ` Junio C Hamano
  2023-06-02 13:29                           ` Phillip Wood
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2023-05-15 17:20 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Toon Claes, git, Taylor Blau

Phillip Wood <phillip.wood123@gmail.com> writes:

> On 12/05/2023 17:57, Junio C Hamano wrote:
>> Toon Claes <toon@iotcl.com> writes:
>> Stepping back a bit, how big a problem is this in real life?  It
>> certainly is possible to create a pathname with funny byte values in
>> it, and in some environments,letters like single-quote that are
>> considered cumbersome to handle by those who are used to CLI
>> programs may be commonplace.  But a path with newline?  Or any
>> control character for that matter?  And this is not even the primary
>> output from the program but is an error message for consumption by
>> humans, no?
>> I am wondering if it is simpler to just declare that the paths
>> output in error messages have certain bytes, probably all control
>> characters other than HT, replaced with a dot, and tell the users
>> not to rely on the pathnames being intact if they contain funny
>> bytes in them.
>
> We could only c-quote the name when it contains a control character
> other that HT. That way names containing double quotes and backslashes
> are unchanged but it will still be possible to parse the path from the
> error message. If we're going to munge the name we might as well use
> our standard quoting rather than some ad-hoc scheme.

In the above suggestion, I gave up and no longer aim to do
"quoting".  A more appropriate word for the approach is "redacting".
The message essentially is: If you use truly problematic bytes in
your path, they are redacted (so do not use them if it hurts).

This is because I am not sure how "names containing dq and bs are
unchanged" can be done without ambiguity.  If I see a message that
comes out of this:

	printf("%s missing\n", obj_name);

and it looks like

	"a\nb" missing

how do I tell if it is complaining about the object the user named
with a three-byte string (i.e. lowercase-A, newline, lowercase-B),
or a six-byte string (i.e. dq, lowercase-A, bs, lowercase-N,
lowercase-B, dq)?

If we were forbidding '"' to appear in a refname, then we could take
advantage of the fact that the name of an object inside a tree at a
funny path would not start with '"', to disambiguate.  For the
three- and six-byte string cases above, the formatting function will
give these messages (referred to as "sample output" below):

	"master:a\nb" missing
	master:"a\nb" missing

because of your "we do not exactly do our standard c-quote; we
exempt dq and bs from the bytes to be quoted" rule.

But it still feels a bit misleading.  This codepath may have the
whole objectname as a single string so that c-quoting the entire
"<commit> <colon> <path>" inside a single c-quoted string that
begins with a dq is easy, but not all codepaths are lucky and some
may have to show <commit> and <path> separately, concatenated with
<colon> at the outermost output layer, which means that the second
one from the sample output may still mean the path with three-byte
name in the tree of 'master' commit.

And worse yet, because

	git branch '"master'

is possible (even though nobody sane would do that), so "treat the
string as c-quoted only if the object name as a whole begins with a
dq", this disambiguation idea would not work.  The first one from
the sample output could be the blob at the path with a five-byte
string name (i.e. lowercase-A, bs, lowercase-N, lowercase-B, dq)
in the tree of the commit at the tip of branch with seven-byte
string name (i.e. dq followed by 'master').

So, I dunno.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v5 1/1] cat-file: quote-format name in error when using -z
  2023-05-15 17:20                         ` Junio C Hamano
@ 2023-06-02 13:29                           ` Phillip Wood
  0 siblings, 0 replies; 32+ messages in thread
From: Phillip Wood @ 2023-06-02 13:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Toon Claes, git, Taylor Blau

Hi Junio

Sorry for the slow reply, I had intended to reply before but got 
distracted and forgot about it.

On 15/05/2023 18:20, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
>> On 12/05/2023 17:57, Junio C Hamano wrote:
>>> Toon Claes <toon@iotcl.com> writes:
>>> Stepping back a bit, how big a problem is this in real life?  It
>>> certainly is possible to create a pathname with funny byte values in
>>> it, and in some environments,letters like single-quote that are
>>> considered cumbersome to handle by those who are used to CLI
>>> programs may be commonplace.  But a path with newline?  Or any
>>> control character for that matter?  And this is not even the primary
>>> output from the program but is an error message for consumption by
>>> humans, no?
>>> I am wondering if it is simpler to just declare that the paths
>>> output in error messages have certain bytes, probably all control
>>> characters other than HT, replaced with a dot, and tell the users
>>> not to rely on the pathnames being intact if they contain funny
>>> bytes in them.
>>
>> We could only c-quote the name when it contains a control character
>> other that HT. That way names containing double quotes and backslashes
>> are unchanged but it will still be possible to parse the path from the
>> error message. If we're going to munge the name we might as well use
>> our standard quoting rather than some ad-hoc scheme.
> 
> In the above suggestion, I gave up and no longer aim to do
> "quoting".  A more appropriate word for the approach is "redacting".
> The message essentially is: If you use truly problematic bytes in
> your path, they are redacted (so do not use them if it hurts).
> 
> This is because I am not sure how "names containing dq and bs are
> unchanged" can be done without ambiguity.

D'oh, I should have thought of that. You're right it ends up being 
ambiguous. Anyway Patrick has just posted a patch to add NUL terminated 
output which looks like a cleaner approach.

Best Wishes

Phillip

>  If I see a message that
> comes out of this:
> 
> 	printf("%s missing\n", obj_name);
> 
> and it looks like
> 
> 	"a\nb" missing
> 
> how do I tell if it is complaining about the object the user named
> with a three-byte string (i.e. lowercase-A, newline, lowercase-B),
> or a six-byte string (i.e. dq, lowercase-A, bs, lowercase-N,
> lowercase-B, dq)?
> 
> If we were forbidding '"' to appear in a refname, then we could take
> advantage of the fact that the name of an object inside a tree at a
> funny path would not start with '"', to disambiguate.  For the
> three- and six-byte string cases above, the formatting function will
> give these messages (referred to as "sample output" below):
> 
> 	"master:a\nb" missing
> 	master:"a\nb" missing
> 
> because of your "we do not exactly do our standard c-quote; we
> exempt dq and bs from the bytes to be quoted" rule.
> 
> But it still feels a bit misleading.  This codepath may have the
> whole objectname as a single string so that c-quoting the entire
> "<commit> <colon> <path>" inside a single c-quoted string that
> begins with a dq is easy, but not all codepaths are lucky and some
> may have to show <commit> and <path> separately, concatenated with
> <colon> at the outermost output layer, which means that the second
> one from the sample output may still mean the path with three-byte
> name in the tree of 'master' commit.
> 
> And worse yet, because
> 
> 	git branch '"master'
> 
> is possible (even though nobody sane would do that), so "treat the
> string as c-quoted only if the object name as a whole begins with a
> dq", this disambiguation idea would not work.  The first one from
> the sample output could be the blob at the path with a five-byte
> string name (i.e. lowercase-A, bs, lowercase-N, lowercase-B, dq)
> in the tree of the commit at the tip of branch with seven-byte
> string name (i.e. dq followed by 'master').
> 
> So, I dunno.


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2023-06-02 13:29 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-09 15:00 [PATCH 0/1] cat-file: quote-format name in error when using -z Toon Claes
2022-12-09 15:00 ` [PATCH 1/1] " Toon Claes
2022-12-09 19:33   ` Phillip Wood
2022-12-09 23:58     ` Junio C Hamano
2022-12-11 16:30       ` Phillip Wood
2022-12-12  0:11         ` Junio C Hamano
2022-12-12 11:34           ` Toon Claes
2022-12-12 22:09             ` Junio C Hamano
2022-12-13 15:06           ` Phillip Wood
2022-12-14  8:29             ` Junio C Hamano
2022-12-20  5:31     ` Toon Claes
2022-12-20 10:18       ` Phillip Wood
2022-12-21 12:42         ` Toon Claes
2023-01-05  6:24 ` [PATCH v2 0/1] " Toon Claes
2023-01-05  6:24   ` [PATCH v2 1/1] " Toon Claes
2023-01-16 19:07   ` [PATCH v3 0/1] " Toon Claes
2023-01-16 19:07     ` [PATCH v3 1/1] " Toon Claes
2023-01-17 15:24       ` Phillip Wood
2023-03-03 19:17     ` [PATCH v4 0/2] " Toon Claes
2023-03-03 19:17       ` [PATCH v4 1/2] cat-file: extract printing batch error message into function Toon Claes
2023-03-03 20:26         ` Junio C Hamano
2023-03-03 23:14           ` Junio C Hamano
2023-05-10 19:01             ` [PATCH v5 0/1] cat-file: quote-format name in error when using -z Toon Claes
2023-05-10 19:01               ` [PATCH v5 1/1] " Toon Claes
2023-05-10 20:13                 ` Junio C Hamano
2023-05-12  8:54                   ` Toon Claes
2023-05-12 16:57                     ` Junio C Hamano
2023-05-15  8:47                       ` Phillip Wood
2023-05-15 17:20                         ` Junio C Hamano
2023-06-02 13:29                           ` Phillip Wood
2023-03-03 19:17       ` [PATCH v4 2/2] " Toon Claes
2023-03-03 20:14       ` [PATCH v4 0/2] " Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).