Git Mailing List Archive mirror
* [PATCH 3/3] parse_commit(): handle broken whitespace-only timestamp
@ 2023-04-22 13:50 Jeff King
  2023-04-22 15:53 ` René Scharfe
  2023-04-24 18:01 ` Junio C Hamano
  0 siblings, 2 replies; 6+ messages in thread
From: Jeff King @ 2023-04-22 13:50 UTC (permalink / raw)
  To: Thomas Bock; +Cc: Derrick Stolee, Junio C Hamano, git

The comment in parse_commit_date() claims that parse_timestamp() will
not walk past the end of the buffer we've been given, since it will hit
the newline at "eol" and stop. This is usually true when dateptr
contains actual numbers to parse. But with a line like:

   committer name <email>   \n

with just whitespace, and no numbers, parse_timestamp() will consume
that newline as part of the leading whitespace, and we may walk past our
"tail" pointer (which itself is set from the "size" parameter passed in
to parse_commit_buffer()).
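A minimal sketch of the underlying behavior. It assumes that parse_timestamp() behaves like strtoumax(); treat that as an assumption about Git's compat layer. strtoumax() skips leading whitespace, including '\n', before it looks for digits. The helper name parse_like_parse_timestamp() is hypothetical. It also reports how many bytes were consumed, so you can see that the newline was swallowed.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: parse a base-10 number at p the way
 * parse_commit_date() calls parse_timestamp(), assuming that is
 * strtoumax(). *consumed receives the number of bytes read.
 */
static uintmax_t parse_like_parse_timestamp(const char *p, size_t *consumed)
{
	char *end;
	uintmax_t value;

	assert(p && consumed);
	value = strtoumax(p, &end, 10); /* skips spaces AND '\n' first */
	*consumed = (size_t)(end - p);  /* 0 if no conversion happened */
	return value;
}
```

With "   \n42" it returns 42 after consuming all six bytes, newline included. With a whitespace-only "   \n" it performs no conversion and returns 0.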

In practice this can't cause us to walk off the end of an array, because
we always add an extra NUL byte to the end of objects we load from disk
(as a defense against exactly this kind of bug). However, you can see
the behavior in action when "committer" is the final header (which it
usually is, unless there's an encoding) and the subject line can be
parsed as an integer. We walk right past the newline on the committer
line, as well as the "\n\n" separator, and mistake the subject for the
timestamp.
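To make the failure concrete, here is a simplified, hypothetical model of the pre-patch logic. naive_commit_date() is not a Git function, and it again assumes that parse_timestamp() is strtoumax(). It walks back from the end of the committer line to the last '>', then parses from there with no whitespace-only check. Given a whitespace-only date followed by the blank separator line and a numeric subject, it returns the subject.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of the pre-patch parse: line points at the start
 * of the committer header, which ends at the first '\n'.
 */
static uintmax_t naive_commit_date(const char *line)
{
	const char *eol = strchr(line, '\n');
	const char *dateptr;

	assert(eol);
	/* Go back to just after the '>' that closes the email. */
	dateptr = eol;
	while (dateptr > line && dateptr[-1] != '>')
		dateptr--;
	if (dateptr == line || dateptr == eol)
		return 0;
	/* No whitespace-only check: strtoumax() may skip past eol. */
	return strtoumax(dateptr, NULL, 10);
}
```

A normal line such as "committer A U Thor <author@example.com> 1700000000 +0000\n" yields 1700000000. The same line with only spaces after '>', followed by "\n1234567890\n", yields 1234567890, which is the subject line.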

The new test demonstrates such a case. I also added a test to check this
case against the pretty-print formatter, which uses split_ident_line().
It's not subject to the same bug, because it insists that there be one
or more digits in the timestamp.

We can use the same logic here. If there's a non-whitespace but
non-digit value (say "committer name <email> foo"), then
parse_timestamp() would already have returned 0 anyway. So the only
change should be for this "whitespace only" case.
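The claim that only the whitespace-only case changes can be checked with a small side-by-side sketch. date_before() and date_after() are hypothetical stand-ins for the tail of parse_commit_date() before and after this patch. Both take dateptr and eol directly and assume that parse_timestamp() is strtoumax().

```c
#include <assert.h>
#include <ctype.h>
#include <inttypes.h>
#include <stdint.h>
#include <string.h>

/* Pre-patch tail (sketch): trust strtoumax() to stop at eol. */
static uintmax_t date_before(const char *dateptr, const char *eol)
{
	assert(*eol == '\n');
	return strtoumax(dateptr, NULL, 10);
}

/*
 * Post-patch tail (sketch): skip whitespace without crossing eol, then
 * require the next byte to be a digit, using strchr() as the patch does.
 */
static uintmax_t date_after(const char *dateptr, const char *eol)
{
	assert(*eol == '\n');
	while (dateptr < eol && isspace((unsigned char)*dateptr))
		dateptr++;
	if (!strchr("0123456789", *dateptr))
		return 0;
	return strtoumax(dateptr, NULL, 10);
}
```

For " foo\n" both return 0, and for " 1234 +0000\n" both return 1234. Only "   \n\n1234567890\n" differs: before the patch it yields 1234567890, after it yields 0.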

Signed-off-by: Jeff King <peff@peff.net>
---
 commit.c               | 10 ++++++++++
 t/t4212-log-corrupt.sh | 29 +++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/commit.c b/commit.c
index ede810ac1c..56877322d3 100644
--- a/commit.c
+++ b/commit.c
@@ -120,6 +120,16 @@ static timestamp_t parse_commit_date(const char *buf, const char *tail)
 	if (dateptr == buf || dateptr == eol)
 		return 0;
 
+	/*
+	 * trim leading whitespace; parse_timestamp() will do this itself, but
+	 * it will walk past the newline at eol while doing so. So we insist
+	 * that there is at least one digit here.
+	 */
+	while (dateptr < eol && isspace(*dateptr))
+		dateptr++;
+	if (!strchr("0123456789", *dateptr))
+		return 0;
+
 	/* dateptr < eol && *eol == '\n', so parsing will stop at eol */
 	return parse_timestamp(dateptr, NULL, 10);
 }
diff --git a/t/t4212-log-corrupt.sh b/t/t4212-log-corrupt.sh
index af4b35ff56..d4ef48d646 100755
--- a/t/t4212-log-corrupt.sh
+++ b/t/t4212-log-corrupt.sh
@@ -92,4 +92,33 @@ test_expect_success 'absurdly far-in-future date' '
 	git log -1 --format=%ad $commit
 '
 
+test_expect_success 'create commit with whitespace committer date' '
+	# It is important that this subject line is numeric, since we want to
+	# be sure we are not confused by skipping whitespace and accidentally
+	# parsing the subject as a timestamp.
+	#
+	# Do not use munge_author_date here. Besides not hitting the committer
+	# line, it leaves the timezone intact, and we want nothing but
+	# whitespace.
+	test_commit 1234567890 &&
+	git cat-file commit HEAD >commit.orig &&
+	sed "s/>.*/>    /" <commit.orig >commit.munge &&
+	ws_commit=$(git hash-object --literally -w -t commit commit.munge)
+'
+
+test_expect_success '--until treats whitespace date as sentinel' '
+	echo $ws_commit >expect &&
+	git rev-list --until=1980-01-01 $ws_commit >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'pretty-printer handles whitespace date' '
+	# as with the %ad test above, we will show these as the empty string,
+	# not the 1970 epoch date. This is intentional; see 7d9a281941 (t4212:
+	# test bogus timestamps with git-log, 2014-02-24) for more discussion.
+	echo : >expect &&
+	git log -1 --format="%at:%ct" $ws_commit >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.40.0.653.g15ca972062

* Re: [PATCH 3/3] parse_commit(): handle broken whitespace-only timestamp
  2023-04-22 13:50 [PATCH 3/3] parse_commit(): handle broken whitespace-only timestamp Jeff King
@ 2023-04-22 15:53 ` René Scharfe
  2023-04-23  0:37   ` Jeff King
  2023-04-24 18:01 ` Junio C Hamano
  1 sibling, 1 reply; 6+ messages in thread
From: René Scharfe @ 2023-04-22 15:53 UTC (permalink / raw)
  To: Jeff King, Thomas Bock; +Cc: Derrick Stolee, Junio C Hamano, git

Am 22.04.23 um 15:50 schrieb Jeff King:
> diff --git a/commit.c b/commit.c
> index ede810ac1c..56877322d3 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -120,6 +120,16 @@ static timestamp_t parse_commit_date(const char *buf, const char *tail)
>  	if (dateptr == buf || dateptr == eol)
>  		return 0;
>
> +	/*
> +	 * trim leading whitespace; parse_timestamp() will do this itself, but
> +	 * it will walk past the newline at eol while doing so. So we insist
> +	 * that there is at least one digit here.
> +	 */
> +	while (dateptr < eol && isspace(*dateptr))
> +		dateptr++;
> +	if (!strchr("0123456789", *dateptr))

You could use (our own) isdigit() here instead.  It's more concise and
efficient.

René


* Re: [PATCH 3/3] parse_commit(): handle broken whitespace-only timestamp
  2023-04-22 15:53 ` René Scharfe
@ 2023-04-23  0:37   ` Jeff King
  2023-04-25  5:56     ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2023-04-23  0:37 UTC (permalink / raw)
  To: René Scharfe; +Cc: Thomas Bock, Derrick Stolee, Junio C Hamano, git

On Sat, Apr 22, 2023 at 05:53:10PM +0200, René Scharfe wrote:

> Am 22.04.23 um 15:50 schrieb Jeff King:
> > diff --git a/commit.c b/commit.c
> > index ede810ac1c..56877322d3 100644
> > --- a/commit.c
> > +++ b/commit.c
> > @@ -120,6 +120,16 @@ static timestamp_t parse_commit_date(const char *buf, const char *tail)
> >  	if (dateptr == buf || dateptr == eol)
> >  		return 0;
> >
> > +	/*
> > +	 * trim leading whitespace; parse_timestamp() will do this itself, but
> > +	 * it will walk past the newline at eol while doing so. So we insist
> > +	 * that there is at least one digit here.
> > +	 */
> > +	while (dateptr < eol && isspace(*dateptr))
> > +		dateptr++;
> > +	if (!strchr("0123456789", *dateptr))
> 
> You could use (our own) isdigit() here instead.  It's more concise and
> efficient.

Heh, yes, that is much better. I had strspn() on the mind since that is
what split_ident_line() uses.

I think it could even just be:

  if (dateptr != eol)

which implies that we found some non-whitespace character, and then we
rely on parse_timestamp() to return 0 (which is what the current code is
effectively doing).

-Peff

* Re: [PATCH 3/3] parse_commit(): handle broken whitespace-only timestamp
  2023-04-22 13:50 [PATCH 3/3] parse_commit(): handle broken whitespace-only timestamp Jeff King
  2023-04-22 15:53 ` René Scharfe
@ 2023-04-24 18:01 ` Junio C Hamano
  2023-04-25  5:27   ` Jeff King
  1 sibling, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2023-04-24 18:01 UTC (permalink / raw)
  To: Jeff King; +Cc: Thomas Bock, Derrick Stolee, git

Jeff King <peff@peff.net> writes:

> The comment in parse_commit_date() claims that parse_timestamp() will
> not walk past the end of the buffer we've been given, since it will hit
> the newline at "eol" and stop. This is usually true when dateptr
> contains actual numbers to parse. But with a line like:
>
>    committer name <email>   \n

I was wondering about this case while reading [2/3] ;-)

> ...
> In practice this can't cause us to walk off the end of an array, because
> we always add an extra NUL byte to the end of objects we load from disk
> (as a defense against exactly this kind of bug). However, you can see
> the behavior in action when "committer" is the final header (which it
> usually is, unless there's an encoding) and the subject line can be
> parsed as an integer. We walk right past the newline on the committer
> line, as well as the "\n\n" separator, and mistake the subject for the
> timestamp.


> +	/*
> +	 * trim leading whitespace; parse_timestamp() will do this itself, but
> +	 * it will walk past the newline at eol while doing so. So we insist
> +	 * that there is at least one digit here.
> +	 */

"one digit" -> "one non-whitespace".

> +	while (dateptr < eol && isspace(*dateptr))
> +		dateptr++;

This is an expected change, but

> +	if (!strchr("0123456789", *dateptr))
> +		return 0;

this is not.  Isn't the only problematic case the one where dateptr is
at eol?  That is what the proposed log message argued.

>  	/* dateptr < eol && *eol == '\n', so parsing will stop at eol */

This comment is slightly stale.  dateptr < eol, *eol == '\n', and we
know the string starting at dateptr is not a run of whitespace and
that is what makes the parsing stop at eol.

> diff --git a/t/t4212-log-corrupt.sh b/t/t4212-log-corrupt.sh
> index af4b35ff56..d4ef48d646 100755
> --- a/t/t4212-log-corrupt.sh
> +++ b/t/t4212-log-corrupt.sh
> @@ -92,4 +92,33 @@ test_expect_success 'absurdly far-in-future date' '
>  	git log -1 --format=%ad $commit
>  '
>  
> +test_expect_success 'create commit with whitespace committer date' '
> +	# It is important that this subject line is numeric, since we want to
> +	# be sure we are not confused by skipping whitespace and accidentally
> +	# parsing the subject as a timestamp.

Nice.

> +	# Do not use munge_author_date here. Besides not hitting the committer
> +	# line, it leaves the timezone intact, and we want nothing but
> +	# whitespace.
> +	test_commit 1234567890 &&
> +	git cat-file commit HEAD >commit.orig &&
> +	sed "s/>.*/>    /" <commit.orig >commit.munge &&
> +	ws_commit=$(git hash-object --literally -w -t commit commit.munge)
> +'
> +
> +test_expect_success '--until treats whitespace date as sentinel' '
> +	echo $ws_commit >expect &&
> +	git rev-list --until=1980-01-01 $ws_commit >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'pretty-printer handles whitespace date' '
> +	# as with the %ad test above, we will show these as the empty string,
> +	# not the 1970 epoch date. This is intentional; see 7d9a281941 (t4212:
> +	# test bogus timestamps with git-log, 2014-02-24) for more discussion.
> +	echo : >expect &&
> +	git log -1 --format="%at:%ct" $ws_commit >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_done

* Re: [PATCH 3/3] parse_commit(): handle broken whitespace-only timestamp
  2023-04-24 18:01 ` Junio C Hamano
@ 2023-04-25  5:27   ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2023-04-25  5:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Thomas Bock, Derrick Stolee, git

On Mon, Apr 24, 2023 at 11:01:26AM -0700, Junio C Hamano wrote:

> > +	/*
> > +	 * trim leading whitespace; parse_timestamp() will do this itself, but
> > +	 * it will walk past the newline at eol while doing so. So we insist
> > +	 * that there is at least one digit here.
> > +	 */
> 
> "one digit" -> "one non-whitespace".
> 
> > +	while (dateptr < eol && isspace(*dateptr))
> > +		dateptr++;
> 
> This is an expected change, but
> 
> > +	if (!strchr("0123456789", *dateptr))
> > +		return 0;
> 
> this is not.  Isn't the only problematic case the one where dateptr is
> at eol?  That is what the proposed log message argued.

Yes, that would be sufficient. I was moving things slightly closer to
what split_ident_line() does by actually checking for numbers. But that
led to the final paragraph in the commit message explaining how it all
ends up the same either way.

So I'll swap this out for:

  if (dateptr == eol)

which I think requires less explanation, as it leaves the function more
like it was originally (and the behavior is the same either way).

> >  	/* dateptr < eol && *eol == '\n', so parsing will stop at eol */
> 
> This comment is slightly stale.  dateptr < eol, *eol == '\n', and we
> know the string starting at dateptr is not a run of whitespace and
> that is what makes the parsing stop at eol.

Yeah, I hoped the extra context of the earlier comment would be enough. ;)
But it is probably better to spell it out by expanding this comment.
The code is certainly tricky enough.

-Peff

* Re: [PATCH 3/3] parse_commit(): handle broken whitespace-only timestamp
  2023-04-23  0:37   ` Jeff King
@ 2023-04-25  5:56     ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2023-04-25  5:56 UTC (permalink / raw)
  To: René Scharfe; +Cc: Thomas Bock, Derrick Stolee, Junio C Hamano, git

On Sat, Apr 22, 2023 at 08:37:15PM -0400, Jeff King wrote:

> > You could use (our own) isdigit() here instead.  It's more concise and
> > efficient.
> 
> Heh, yes, that is much better. I had strspn() on the mind since that is
> what split_ident_line() uses.
> 
> I think it could even just be:
> 
>   if (dateptr != eol)
> 
> which implies that we found some non-whitespace character, and then we
> rely on parse_timestamp() to return 0 (which is what the current code is
> effectively doing).

This should be "dateptr == eol" of course, because the body of the
conditional is "return 0" to signal an error. It's correct in the v2 of
the series I just sent out.

-Peff

Thread overview: 6+ messages
2023-04-22 13:50 [PATCH 3/3] parse_commit(): handle broken whitespace-only timestamp Jeff King
2023-04-22 15:53 ` René Scharfe
2023-04-23  0:37   ` Jeff King
2023-04-25  5:56     ` Jeff King
2023-04-24 18:01 ` Junio C Hamano
2023-04-25  5:27   ` Jeff King
