* [PATCH 0/8] pretty: minor bugfixing, some refactorings
@ 2025-03-19 7:23 Martin Ågren
2025-03-19 7:23 ` [PATCH 1/8] pretty: tighten function signature to not take `void *` Martin Ågren
` (7 more replies)
0 siblings, 8 replies; 22+ messages in thread
From: Martin Ågren @ 2025-03-19 7:23 UTC (permalink / raw)
To: git
Cc: Patrick Steinhardt, Jeff King, René Scharfe, Junio C Hamano,
Andy Koppe
Hi,
I've been going through pretty.c with an eye to pulling apart parsing of
placeholders and formatting of strings. These are some initial cleanups
I've accumulated in the process. I think they should be valuable on
their own, even if I eventually abandon the bigger project.
Martin Ågren (8):
pretty: tighten function signature to not take `void *`
pretty: simplify if-else to reduce code duplication
Two more or less simple cleanups.
pretty: collect padding-related fields in separate struct
pretty: fix parsing of half-valid "%<" and "%>" placeholders
pretty: after padding, reset padding info
Fix a bug or two in parsing and handling of padding placeholders.
pretty: refactor parsing of line-wrapping "%w" placeholder
pretty: refactor parsing of magic
pretty: refactor parsing of decoration options
Further splitting without any intended changes in behavior. These
refactorings aren't followed by any patch that actually benefits from
them. Still, I do think these end up making the code a tiny bit easier
to understand by way of having smaller functions and collecting related
pieces of data into smaller structs.
Unless the original authors are (TTBOMK) no longer around, I'm cc-ing
them on the individual patches and on this cover letter.
Martin
pretty.c | 300 +++++++++++++++++++++-------------
t/t4205-log-pretty-formats.sh | 15 ++
2 files changed, 197 insertions(+), 118 deletions(-)
--
2.49.0.472.ge94155a9ec
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/8] pretty: tighten function signature to not take `void *`
2025-03-19 7:23 [PATCH 0/8] pretty: minor bugfixing, some refactorings Martin Ågren
@ 2025-03-19 7:23 ` Martin Ågren
2025-03-20 9:18 ` Patrick Steinhardt
2025-03-19 7:23 ` [PATCH 2/8] pretty: simplify if-else to reduce code duplication Martin Ågren
` (6 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: Martin Ågren @ 2025-03-19 7:23 UTC (permalink / raw)
To: git
We take a `void *` and immediately cast it. Both callers already have
this pointer as the right type, so tighten the interface and stop
casting.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
pretty.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/pretty.c b/pretty.c
index 0bc8ad8a9a..a4e5fc5c50 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1437,9 +1437,8 @@ static void free_decoration_options(const struct decoration_options *opts)
static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
const char *placeholder,
- void *context)
+ struct format_commit_context *c)
{
- struct format_commit_context *c = context;
const struct commit *commit = c->commit;
const char *msg = c->message;
struct commit_list *p;
--
2.49.0.472.ge94155a9ec
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 2/8] pretty: simplify if-else to reduce code duplication
2025-03-19 7:23 [PATCH 0/8] pretty: minor bugfixing, some refactorings Martin Ågren
2025-03-19 7:23 ` [PATCH 1/8] pretty: tighten function signature to not take `void *` Martin Ågren
@ 2025-03-19 7:23 ` Martin Ågren
2025-03-20 9:17 ` Patrick Steinhardt
2025-03-24 3:50 ` Jeff King
2025-03-19 7:23 ` [PATCH 3/8] pretty: collect padding-related fields in separate struct Martin Ågren
` (5 subsequent siblings)
7 siblings, 2 replies; 22+ messages in thread
From: Martin Ågren @ 2025-03-19 7:23 UTC (permalink / raw)
To: git; +Cc: Jeff King
First we look for "auto,", then we try "always,", then we fall back to
the default, which is to do exactly the same thing as we do for "auto,".
The amount of code duplication isn't huge, but still: reading this code
carefully requires spending at least *some* time on making sure the two
blocks of code are indeed identical.
Rearrange the checks so that we end with the default case,
opportunistically consuming the "auto," which may or may not be there.
In the "always," case, we don't actually *do* anything, so if we were
into golfing, we'd just write the whole thing as a single
if (!skip_prefix(begin, "always,", &begin)) {
...
}
If we ever learn something new besides "always," and "auto," we'd need
to pull things apart again. Plus we still need somewhere to place the
comment. Let's focus on code de-duplication rather than golfing for now.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
pretty.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/pretty.c b/pretty.c
index a4e5fc5c50..6a4264dd01 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1076,13 +1076,11 @@ static size_t parse_color(struct strbuf *sb, /* in UTF-8 */
if (!end)
return 0;
- if (skip_prefix(begin, "auto,", &begin)) {
- if (!want_color(c->pretty_ctx->color))
- return end - placeholder + 1;
- } else if (skip_prefix(begin, "always,", &begin)) {
+ if (skip_prefix(begin, "always,", &begin)) {
/* nothing to do; we do not respect want_color at all */
} else {
/* the default is the same as "auto" */
+ skip_prefix(begin, "auto,", &begin);
if (!want_color(c->pretty_ctx->color))
return end - placeholder + 1;
}
--
2.49.0.472.ge94155a9ec
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 3/8] pretty: collect padding-related fields in separate struct
2025-03-19 7:23 [PATCH 0/8] pretty: minor bugfixing, some refactorings Martin Ågren
2025-03-19 7:23 ` [PATCH 1/8] pretty: tighten function signature to not take `void *` Martin Ågren
2025-03-19 7:23 ` [PATCH 2/8] pretty: simplify if-else to reduce code duplication Martin Ågren
@ 2025-03-19 7:23 ` Martin Ågren
2025-03-19 7:23 ` [PATCH 4/8] pretty: fix parsing of half-valid "%<" and "%>" placeholders Martin Ågren
` (4 subsequent siblings)
7 siblings, 0 replies; 22+ messages in thread
From: Martin Ågren @ 2025-03-19 7:23 UTC (permalink / raw)
To: git
Padding ("%<" and "%>") involves three fields of `struct
format_commit_context`. This goes all the way back to commits a57523428b
(pretty: support padding placeholders, %< %> and %><, 2013-04-19) and
1640632b4f (pretty: support %>> that steal trailing spaces, 2013-04-19).
These fields are not used for anything else.
Make that clearer by collecting them into their own little struct. Let
our parser populate just that struct to make it obvious that the rest of
the big struct does not influence the parsing.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
pretty.c | 42 +++++++++++++++++++++++-------------------
1 file changed, 23 insertions(+), 19 deletions(-)
diff --git a/pretty.c b/pretty.c
index 6a4264dd01..e5e8ef24fa 100644
--- a/pretty.c
+++ b/pretty.c
@@ -887,6 +887,12 @@ enum trunc_type {
trunc_right
};
+struct padding_args {
+ enum flush_type flush_type;
+ enum trunc_type truncate;
+ int padding;
+};
+
struct format_commit_context {
struct repository *repository;
const struct commit *commit;
@@ -894,13 +900,11 @@ struct format_commit_context {
unsigned commit_header_parsed:1;
unsigned commit_message_parsed:1;
struct signature_check signature_check;
- enum flush_type flush_type;
- enum trunc_type truncate;
const char *message;
char *commit_encoding;
size_t width, indent1, indent2;
int auto_color;
- int padding;
+ struct padding_args pad;
/* These offsets are relative to the start of the commit message. */
struct chunk author;
@@ -1112,7 +1116,7 @@ static size_t parse_color(struct strbuf *sb, /* in UTF-8 */
}
static size_t parse_padding_placeholder(const char *placeholder,
- struct format_commit_context *c)
+ struct padding_args *p)
{
const char *ch = placeholder;
enum flush_type flush_type;
@@ -1167,8 +1171,8 @@ static size_t parse_padding_placeholder(const char *placeholder,
if (width < 0)
return 0;
}
- c->padding = to_column ? -width : width;
- c->flush_type = flush_type;
+ p->padding = to_column ? -width : width;
+ p->flush_type = flush_type;
if (*end == ',') {
start = end + 1;
@@ -1176,15 +1180,15 @@ static size_t parse_padding_placeholder(const char *placeholder,
if (!end || end == start)
return 0;
if (starts_with(start, "trunc)"))
- c->truncate = trunc_right;
+ p->truncate = trunc_right;
else if (starts_with(start, "ltrunc)"))
- c->truncate = trunc_left;
+ p->truncate = trunc_left;
else if (starts_with(start, "mtrunc)"))
- c->truncate = trunc_middle;
+ p->truncate = trunc_middle;
else
return 0;
} else
- c->truncate = trunc_none;
+ p->truncate = trunc_none;
return end - placeholder + 1;
}
@@ -1504,7 +1508,7 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
case '<':
case '>':
- return parse_padding_placeholder(placeholder, c);
+ return parse_padding_placeholder(placeholder, &c->pad);
}
if (skip_prefix(placeholder, "(describe", &arg)) {
@@ -1788,7 +1792,7 @@ static size_t format_and_pad_commit(struct strbuf *sb, /* in UTF-8 */
{
struct strbuf local_sb = STRBUF_INIT;
size_t total_consumed = 0;
- int len, padding = c->padding;
+ int len, padding = c->pad.padding;
if (padding < 0) {
const char *start = strrchr(sb->buf, '\n');
@@ -1815,7 +1819,7 @@ static size_t format_and_pad_commit(struct strbuf *sb, /* in UTF-8 */
}
len = utf8_strnwidth(local_sb.buf, local_sb.len, 1);
- if (c->flush_type == flush_left_and_steal) {
+ if (c->pad.flush_type == flush_left_and_steal) {
const char *ch = sb->buf + sb->len - 1;
while (len > padding && ch > sb->buf) {
const char *p;
@@ -1841,11 +1845,11 @@ static size_t format_and_pad_commit(struct strbuf *sb, /* in UTF-8 */
ch = p - 1;
}
strbuf_setlen(sb, ch + 1 - sb->buf);
- c->flush_type = flush_left;
+ c->pad.flush_type = flush_left;
}
if (len > padding) {
- switch (c->truncate) {
+ switch (c->pad.truncate) {
case trunc_left:
strbuf_utf8_replace(&local_sb,
0, len - (padding - 2),
@@ -1868,9 +1872,9 @@ static size_t format_and_pad_commit(struct strbuf *sb, /* in UTF-8 */
strbuf_addbuf(sb, &local_sb);
} else {
size_t sb_len = sb->len, offset = 0;
- if (c->flush_type == flush_left)
+ if (c->pad.flush_type == flush_left)
offset = padding - len;
- else if (c->flush_type == flush_both)
+ else if (c->pad.flush_type == flush_both)
offset = (padding - len) / 2;
/*
* we calculate padding in columns, now
@@ -1882,7 +1886,7 @@ static size_t format_and_pad_commit(struct strbuf *sb, /* in UTF-8 */
local_sb.len);
}
strbuf_release(&local_sb);
- c->flush_type = no_flush;
+ c->pad.flush_type = no_flush;
return total_consumed;
}
@@ -1927,7 +1931,7 @@ static size_t format_commit_item(struct strbuf *sb, /* in UTF-8 */
}
orig_len = sb->len;
- if (context->flush_type == no_flush)
+ if (context->pad.flush_type == no_flush)
consumed = format_commit_one(sb, placeholder, context);
else
consumed = format_and_pad_commit(sb, placeholder, context);
--
2.49.0.472.ge94155a9ec
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 4/8] pretty: fix parsing of half-valid "%<" and "%>" placeholders
2025-03-19 7:23 [PATCH 0/8] pretty: minor bugfixing, some refactorings Martin Ågren
` (2 preceding siblings ...)
2025-03-19 7:23 ` [PATCH 3/8] pretty: collect padding-related fields in separate struct Martin Ågren
@ 2025-03-19 7:23 ` Martin Ågren
2025-03-20 9:18 ` Patrick Steinhardt
2025-03-19 7:23 ` [PATCH 5/8] pretty: after padding, reset padding info Martin Ågren
` (3 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: Martin Ågren @ 2025-03-19 7:23 UTC (permalink / raw)
To: git
When we parse a padding directive ("%<" or "%>"), we might populate a
few of the struct's fields before bailing. This can result in such
half-parsed information being used to actually introduce some
padding/truncation.
When parsing a "%<" or "%>", only store the parsed data after parsing
successfully. The added test would have failed before this commit. It
also shows how the existing behavior is hardly something someone can
rely on since the non-consumed modifier ("%<(10,bad)") shows up verbatim
in the pretty output.
We could let the caller use a temporary struct and only copy the data on
success. Let's instead make our parsing function easy to use correctly
by letting it only touch the output struct in the success case.
While setting up a temporary struct for parsing into, we might as well
initialize it to a well-defined state. It's unnecessary for the current
implementation since it always writes to all three fields in a
successful case, but some future-proofing shouldn't hurt.
Note that the test relies on first using a correct placeholder
"%<(4,trunc)" where "trunc" (`trunc_right`) lingers in our struct until
it's then used instead of the invalid "bad". The next commit will teach
us to clean up any remnants of "%<(4,trunc)" after handling it.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
pretty.c | 18 ++++++++++++------
t/t4205-log-pretty-formats.sh | 6 ++++++
2 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/pretty.c b/pretty.c
index e5e8ef24fa..a4fa052f8b 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1121,6 +1121,11 @@ static size_t parse_padding_placeholder(const char *placeholder,
const char *ch = placeholder;
enum flush_type flush_type;
int to_column = 0;
+ struct padding_args ans = {
+ .flush_type = no_flush,
+ .truncate = trunc_none,
+ .padding = 0,
+ };
switch (*ch++) {
case '<':
@@ -1171,8 +1176,8 @@ static size_t parse_padding_placeholder(const char *placeholder,
if (width < 0)
return 0;
}
- p->padding = to_column ? -width : width;
- p->flush_type = flush_type;
+ ans.padding = to_column ? -width : width;
+ ans.flush_type = flush_type;
if (*end == ',') {
start = end + 1;
@@ -1180,16 +1185,17 @@ static size_t parse_padding_placeholder(const char *placeholder,
if (!end || end == start)
return 0;
if (starts_with(start, "trunc)"))
- p->truncate = trunc_right;
+ ans.truncate = trunc_right;
else if (starts_with(start, "ltrunc)"))
- p->truncate = trunc_left;
+ ans.truncate = trunc_left;
else if (starts_with(start, "mtrunc)"))
- p->truncate = trunc_middle;
+ ans.truncate = trunc_middle;
else
return 0;
} else
- p->truncate = trunc_none;
+ ans.truncate = trunc_none;
+ *p = ans;
return end - placeholder + 1;
}
return 0;
diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
index f81e42a84d..26987ecd77 100755
--- a/t/t4205-log-pretty-formats.sh
+++ b/t/t4205-log-pretty-formats.sh
@@ -1130,6 +1130,12 @@ test_expect_success 'log --pretty with invalid padding format' '
test_cmp expect actual
'
+test_expect_success 'semi-parseable padding format does not get semi-applied' '
+ git log -1 --pretty="format:%<(4,trunc)%H%%<(10,bad)%H" >expect &&
+ git log -1 --pretty="format:%<(4,trunc)%H%<(10,bad)%H" >actual &&
+ test_cmp expect actual
+'
+
test_expect_success 'log --pretty with magical wrapping directives' '
commit_id=$(git commit-tree HEAD^{tree} -m "describe me") &&
git tag describe-me $commit_id &&
--
2.49.0.472.ge94155a9ec
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 5/8] pretty: after padding, reset padding info
2025-03-19 7:23 [PATCH 0/8] pretty: minor bugfixing, some refactorings Martin Ågren
` (3 preceding siblings ...)
2025-03-19 7:23 ` [PATCH 4/8] pretty: fix parsing of half-valid "%<" and "%>" placeholders Martin Ågren
@ 2025-03-19 7:23 ` Martin Ågren
2025-03-20 9:18 ` Patrick Steinhardt
2025-03-19 7:23 ` [PATCH 6/8] pretty: refactor parsing of line-wrapping "%w" placeholder Martin Ågren
` (2 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: Martin Ågren @ 2025-03-19 7:23 UTC (permalink / raw)
To: git
After handling a padding directive ("%<" or "%>"), we leave the `struct
padding_args` in a halfway state. We modify it a bit as we apply the
padding/truncation so that by the time we're done, it can't be in quite
as many states as when we started. Still, we don't fully restore it to
its default, no-action state.
"%<" and "%>" should only affect the next placeholder, but leaving a bit
of state around doesn't make it obvious that we don't spill any of it
into our handling of later placeholders. The previous commit closed off
a way of populating only half the `struct padding_args`, thereby fixing
a bug that *also* relied on then having the other half contain this kind
of lingering data.
After that fix, I haven't figured out a way to provoke a bug using just
this here half of the issue. Still, after handling padding, let's drop
all remnants of the previous "%<" or "%>".
Unlike the bug fixed in the previous commit, this could have some
realistic chance of regressing something for someone if they've actually
been using such state leftovers (knowingly or not). Still, it seems
worthwhile to try to tighten this.
This change to pretty.c would have been sufficient to make the test
added in the previous commit pass. Belt and suspenders.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
pretty.c | 2 ++
t/t4205-log-pretty-formats.sh | 9 +++++++++
2 files changed, 11 insertions(+)
diff --git a/pretty.c b/pretty.c
index a4fa052f8b..f53e77ed86 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1893,6 +1893,8 @@ static size_t format_and_pad_commit(struct strbuf *sb, /* in UTF-8 */
}
strbuf_release(&local_sb);
c->pad.flush_type = no_flush;
+ c->pad.truncate = trunc_none;
+ c->pad.padding = 0;
return total_consumed;
}
diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
index 26987ecd77..d34a7cec09 100755
--- a/t/t4205-log-pretty-formats.sh
+++ b/t/t4205-log-pretty-formats.sh
@@ -1124,6 +1124,15 @@ test_expect_success 'log --pretty with space stealing' '
test_cmp expect actual
'
+test_expect_success 'only the next placeholder gets truncated' '
+ {
+ git log -1 --pretty="format:%<(4,trunc)%H" &&
+ printf "$(git rev-parse HEAD)"
+ } >expect &&
+ git log -1 --pretty="format:%<(4,trunc)%H%H" >actual &&
+ test_cmp expect actual
+'
+
test_expect_success 'log --pretty with invalid padding format' '
printf "%s%%<(20" "$(git rev-parse HEAD)" >expect &&
git log -1 --pretty="format:%H%<(20" >actual &&
--
2.49.0.472.ge94155a9ec
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 6/8] pretty: refactor parsing of line-wrapping "%w" placeholder
2025-03-19 7:23 [PATCH 0/8] pretty: minor bugfixing, some refactorings Martin Ågren
` (4 preceding siblings ...)
2025-03-19 7:23 ` [PATCH 5/8] pretty: after padding, reset padding info Martin Ågren
@ 2025-03-19 7:23 ` Martin Ågren
2025-03-20 9:18 ` Patrick Steinhardt
2025-03-19 7:23 ` [PATCH 7/8] pretty: refactor parsing of magic Martin Ågren
2025-03-19 7:23 ` [PATCH 8/8] pretty: refactor parsing of decoration options Martin Ågren
7 siblings, 1 reply; 22+ messages in thread
From: Martin Ågren @ 2025-03-19 7:23 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, René Scharfe
Our parsing of a "%w" placeholder is quite a big chunk of code in
the middle of our switch for handling a few different placeholders. We
parse into three different variables, then use them to compare to and
update existing values in the big `struct format_commit_context`.
Pull out a helper function for parsing such a "%w" placeholder. Define a
struct for collecting the three variables.
Unlike recent commits, parsing and subsequent use are already a bit more
separated in the sense that we don't parse directly into the big context
struct. Thus, unlike the preceding commits, this does not fix any bugs
that I'm aware of. There's still value in separating parsing and usage
more clearly and simplifying `format_commit_one()`.
Note that we use two different types for these values, `unsigned long`
when parsing, `size_t` when eventually applying. Let's go for `size_t`
in our struct. I don't know if there are platforms where assigning an
`unsigned long` to a `size_t` could truncate the value, but since we
already verify the values to be at most 16 KiB, we should be able to fit
them into any sane `size_t`s.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
pretty.c | 120 +++++++++++++++++++++++++++++++++++--------------------
1 file changed, 76 insertions(+), 44 deletions(-)
diff --git a/pretty.c b/pretty.c
index f53e77ed86..c44ff87481 100644
--- a/pretty.c
+++ b/pretty.c
@@ -893,6 +893,10 @@ struct padding_args {
int padding;
};
+struct rewrap_args {
+ size_t width, indent1, indent2;
+};
+
struct format_commit_context {
struct repository *repository;
const struct commit *commit;
@@ -902,7 +906,7 @@ struct format_commit_context {
struct signature_check signature_check;
const char *message;
char *commit_encoding;
- size_t width, indent1, indent2;
+ struct rewrap_args rewrap;
int auto_color;
struct padding_args pad;
@@ -1034,18 +1038,21 @@ static void strbuf_wrap(struct strbuf *sb, size_t pos,
static void rewrap_message_tail(struct strbuf *sb,
struct format_commit_context *c,
- size_t new_width, size_t new_indent1,
- size_t new_indent2)
+ const struct rewrap_args *new_rewrap)
{
- if (c->width == new_width && c->indent1 == new_indent1 &&
- c->indent2 == new_indent2)
+ const struct rewrap_args *old_rewrap = &c->rewrap;
+
+ if (old_rewrap->width == new_rewrap->width &&
+ old_rewrap->indent1 == new_rewrap->indent1 &&
+ old_rewrap->indent2 == new_rewrap->indent2)
return;
+
if (c->wrap_start < sb->len)
- strbuf_wrap(sb, c->wrap_start, c->width, c->indent1, c->indent2);
+ strbuf_wrap(sb, c->wrap_start, old_rewrap->width,
+ old_rewrap->indent1, old_rewrap->indent2);
+
c->wrap_start = sb->len;
- c->width = new_width;
- c->indent1 = new_indent1;
- c->indent2 = new_indent2;
+ c->rewrap = *new_rewrap;
}
static int format_reflog_person(struct strbuf *sb,
@@ -1443,6 +1450,57 @@ static void free_decoration_options(const struct decoration_options *opts)
free(opts->tag);
}
+static size_t parse_rewrap(const char *placeholder, struct rewrap_args *rewrap)
+{
+ unsigned long width = 0, indent1 = 0, indent2 = 0;
+ char *next;
+ const char *start;
+ const char *end;
+
+ memset(rewrap, 0, sizeof(*rewrap));
+
+ if (placeholder[1] != '(')
+ return 0;
+
+ start = placeholder + 2;
+ end = strchr(start, ')');
+
+ if (!end)
+ return 0;
+ if (end > start) {
+ width = strtoul(start, &next, 10);
+ if (*next == ',') {
+ indent1 = strtoul(next + 1, &next, 10);
+ if (*next == ',') {
+ indent2 = strtoul(next + 1,
+ &next, 10);
+ }
+ }
+ if (*next != ')')
+ return 0;
+ }
+
+ /*
+ * We need to limit the format here as it allows the
+ * user to prepend arbitrarily many bytes to the buffer
+ * when rewrapping.
+ */
+ if (width > FORMATTING_LIMIT ||
+ indent1 > FORMATTING_LIMIT ||
+ indent2 > FORMATTING_LIMIT)
+ return 0;
+
+ /*
+ * These values are small enough to fit in any
+ * real-world size_t.
+ */
+ rewrap->width = width;
+ rewrap->indent1 = indent1;
+ rewrap->indent2 = indent2;
+
+ return end - placeholder + 1;
+}
+
static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
const char *placeholder,
struct format_commit_context *c)
@@ -1478,40 +1536,13 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
return ret;
}
case 'w':
- if (placeholder[1] == '(') {
- unsigned long width = 0, indent1 = 0, indent2 = 0;
- char *next;
- const char *start = placeholder + 2;
- const char *end = strchr(start, ')');
- if (!end)
- return 0;
- if (end > start) {
- width = strtoul(start, &next, 10);
- if (*next == ',') {
- indent1 = strtoul(next + 1, &next, 10);
- if (*next == ',') {
- indent2 = strtoul(next + 1,
- &next, 10);
- }
- }
- if (*next != ')')
- return 0;
- }
-
- /*
- * We need to limit the format here as it allows the
- * user to prepend arbitrarily many bytes to the buffer
- * when rewrapping.
- */
- if (width > FORMATTING_LIMIT ||
- indent1 > FORMATTING_LIMIT ||
- indent2 > FORMATTING_LIMIT)
- return 0;
- rewrap_message_tail(sb, c, width, indent1, indent2);
- return end - placeholder + 1;
- } else
- return 0;
-
+ {
+ struct rewrap_args rewrap;
+ res = parse_rewrap(placeholder, &rewrap);
+ if (res)
+ rewrap_message_tail(sb, c, &rewrap);
+ return res;
+ }
case '<':
case '>':
return parse_padding_placeholder(placeholder, &c->pad);
@@ -2005,6 +2036,7 @@ void repo_format_commit_message(struct repository *r,
};
const char *output_enc = pretty_ctx->output_encoding;
const char *utf8 = "UTF-8";
+ const struct rewrap_args rewrap_reset = { 0 };
while (strbuf_expand_step(sb, &format)) {
size_t len;
@@ -2016,7 +2048,7 @@ void repo_format_commit_message(struct repository *r,
else
strbuf_addch(sb, '%');
}
- rewrap_message_tail(sb, &context, 0, 0, 0);
+ rewrap_message_tail(sb, &context, &rewrap_reset);
/*
* Convert output to an actual output encoding; note that
--
2.49.0.472.ge94155a9ec
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 7/8] pretty: refactor parsing of magic
2025-03-19 7:23 [PATCH 0/8] pretty: minor bugfixing, some refactorings Martin Ågren
` (5 preceding siblings ...)
2025-03-19 7:23 ` [PATCH 6/8] pretty: refactor parsing of line-wrapping "%w" placeholder Martin Ågren
@ 2025-03-19 7:23 ` Martin Ågren
2025-03-20 9:18 ` Patrick Steinhardt
2025-03-19 7:23 ` [PATCH 8/8] pretty: refactor parsing of decoration options Martin Ågren
7 siblings, 1 reply; 22+ messages in thread
From: Martin Ågren @ 2025-03-19 7:23 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano
Similar to the previous commit, pull out our parsing of initial
placeholder magic into a separate function. This helps make it a bit
easier to get an overview of `format_commit_item()`. It also represents
another small step towards separating the parsing of placeholders from
subsequent usage of the parsed information.
This diff might be a bit easier to read with `-w`.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
pretty.c | 69 ++++++++++++++++++++++++++++++++++----------------------
1 file changed, 42 insertions(+), 27 deletions(-)
diff --git a/pretty.c b/pretty.c
index c44ff87481..ddc7fd6aab 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1929,17 +1929,17 @@ static size_t format_and_pad_commit(struct strbuf *sb, /* in UTF-8 */
return total_consumed;
}
-static size_t format_commit_item(struct strbuf *sb, /* in UTF-8 */
- const char *placeholder,
- struct format_commit_context *context)
+enum magic {
+ NO_MAGIC,
+ ADD_LF_BEFORE_NON_EMPTY,
+ DEL_LF_BEFORE_EMPTY,
+ ADD_SP_BEFORE_NON_EMPTY
+};
+
+/* 2 for 'bad magic', otherwise whether we consumed 0 or 1 chars. */
+static size_t parse_magic(const char *placeholder, enum magic *ret)
{
- size_t consumed, orig_len;
- enum {
- NO_MAGIC,
- ADD_LF_BEFORE_NON_EMPTY,
- DEL_LF_BEFORE_EMPTY,
- ADD_SP_BEFORE_NON_EMPTY
- } magic = NO_MAGIC;
+ enum magic magic;
switch (placeholder[0]) {
case '-':
@@ -1952,28 +1952,43 @@ static size_t format_commit_item(struct strbuf *sb, /* in UTF-8 */
magic = ADD_SP_BEFORE_NON_EMPTY;
break;
default:
- break;
+ *ret = NO_MAGIC;
+ return 0;
}
- if (magic != NO_MAGIC) {
- placeholder++;
- switch (placeholder[0]) {
- case 'w':
- /*
- * `%+w()` cannot ever expand to a non-empty string,
- * and it potentially changes the layout of preceding
- * contents. We're thus not able to handle the magic in
- * this combination and refuse the pattern.
- */
- return 0;
- };
- }
+ switch (placeholder[1]) {
+ case 'w':
+ /*
+ * `%+w()` cannot ever expand to a non-empty string,
+ * and it potentially changes the layout of preceding
+ * contents. We're thus not able to handle the magic in
+ * this combination and refuse the pattern.
+ */
+ *ret = NO_MAGIC;
+ return 2;
+ };
+
+ *ret = magic;
+ return 1;
+}
+
+static size_t format_commit_item(struct strbuf *sb, /* in UTF-8 */
+ const char *placeholder,
+ struct format_commit_context *context)
+{
+ size_t consumed, orig_len;
+ enum magic magic;
+
+ consumed = parse_magic(placeholder, &magic);
+ if (consumed > 1)
+ return 0;
+ placeholder += consumed;
orig_len = sb->len;
if (context->pad.flush_type == no_flush)
- consumed = format_commit_one(sb, placeholder, context);
+ consumed += format_commit_one(sb, placeholder, context);
else
- consumed = format_and_pad_commit(sb, placeholder, context);
+ consumed += format_and_pad_commit(sb, placeholder, context);
if (magic == NO_MAGIC)
return consumed;
@@ -1986,7 +2001,7 @@ static size_t format_commit_item(struct strbuf *sb, /* in UTF-8 */
else if (magic == ADD_SP_BEFORE_NON_EMPTY)
strbuf_insertstr(sb, orig_len, " ");
}
- return consumed + 1;
+ return consumed;
}
void userformat_find_requirements(const char *fmt, struct userformat_want *w)
--
2.49.0.472.ge94155a9ec
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 8/8] pretty: refactor parsing of decoration options
2025-03-19 7:23 [PATCH 0/8] pretty: minor bugfixing, some refactorings Martin Ågren
` (6 preceding siblings ...)
2025-03-19 7:23 ` [PATCH 7/8] pretty: refactor parsing of magic Martin Ågren
@ 2025-03-19 7:23 ` Martin Ågren
7 siblings, 0 replies; 22+ messages in thread
From: Martin Ågren @ 2025-03-19 7:23 UTC (permalink / raw)
To: git; +Cc: Andy Koppe
After having spotted "%(decorate", we see if there's a ':' and, if so,
reach out to `parse_decoration_options()`. We then verify there's a
closing ')' before actually considering the placeholder valid. Pull the
handling of ':' and ')' into `parse_decoration_options()` so that it's
more of a one-stop shop for handling everything after "%(decorate". Let
this include freeing up resources in the error path to make it really
easy to use this function.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
pretty.c | 52 ++++++++++++++++++++++++++++++----------------------
1 file changed, 30 insertions(+), 22 deletions(-)
diff --git a/pretty.c b/pretty.c
index ddc7fd6aab..d5a8ceb7ef 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1430,17 +1430,6 @@ static int parse_decoration_option(const char **arg,
return 0;
}
-static void parse_decoration_options(const char **arg,
- struct decoration_options *opts)
-{
- while (parse_decoration_option(arg, "prefix", &opts->prefix) ||
- parse_decoration_option(arg, "suffix", &opts->suffix) ||
- parse_decoration_option(arg, "separator", &opts->separator) ||
- parse_decoration_option(arg, "pointer", &opts->pointer) ||
- parse_decoration_option(arg, "tag", &opts->tag))
- ;
-}
-
static void free_decoration_options(const struct decoration_options *opts)
{
free(opts->prefix);
@@ -1450,6 +1439,30 @@ static void free_decoration_options(const struct decoration_options *opts)
free(opts->tag);
}
+static int parse_decoration_options(const char **arg,
+ struct decoration_options *opts)
+{
+ memset(opts, 0, sizeof(*opts));
+
+ if (**arg == ':') {
+ (*arg)++;
+ while (parse_decoration_option(arg, "prefix", &opts->prefix) ||
+ parse_decoration_option(arg, "suffix", &opts->suffix) ||
+ parse_decoration_option(arg, "separator", &opts->separator) ||
+ parse_decoration_option(arg, "pointer", &opts->pointer) ||
+ parse_decoration_option(arg, "tag", &opts->tag))
+ ;
+ }
+
+ if (**arg != ')') {
+ free_decoration_options(opts);
+ return -1;
+ }
+ (*arg)++;
+
+ return 0;
+}
+
static size_t parse_rewrap(const char *placeholder, struct rewrap_args *rewrap)
{
unsigned long width = 0, indent1 = 0, indent2 = 0;
@@ -1735,20 +1748,15 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
}
if (skip_prefix(placeholder, "(decorate", &arg)) {
- struct decoration_options opts = { NULL };
- size_t ret = 0;
+ struct decoration_options opts;
- if (*arg == ':') {
- arg++;
- parse_decoration_options(&arg, &opts);
- }
- if (*arg == ')') {
- format_decorations(sb, commit, c->auto_color, &opts);
- ret = arg - placeholder + 1;
- }
+ if (parse_decoration_options(&arg, &opts) < 0)
+ return 0;
+
+ format_decorations(sb, commit, c->auto_color, &opts);
free_decoration_options(&opts);
- return ret;
+ return arg - placeholder;
}
/* For the rest we have to parse the commit header. */
--
2.49.0.472.ge94155a9ec
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 2/8] pretty: simplify if-else to reduce code duplication
2025-03-19 7:23 ` [PATCH 2/8] pretty: simplify if-else to reduce code duplication Martin Ågren
@ 2025-03-20 9:17 ` Patrick Steinhardt
2025-03-20 16:10 ` Martin Ågren
2025-03-24 3:50 ` Jeff King
1 sibling, 1 reply; 22+ messages in thread
From: Patrick Steinhardt @ 2025-03-20 9:17 UTC (permalink / raw)
To: Martin Ågren; +Cc: git, Jeff King
On Wed, Mar 19, 2025 at 08:23:35AM +0100, Martin Ågren wrote:
> First we look for "auto,", then we try "always,", then we fall back to
Nit: we typically have the body carry enough context so that it makes
sense even without reading the commit subject.
> the default, which is to do exactly the same thing as we do for "auto,".
> The amount of code duplication isn't huge, but still: reading this code
> carefully requires spending at least *some* time on making sure the two
> blocks of code are indeed identical.
>
> Rearrange the checks so that we end with the default case,
> opportunistically consuming the "auto," which may or may not be there.
>
> In the "always," case, we don't actually *do* anything, so if we were
> into golfing, we'd just write the whole thing as a single
>
> if (!skip_prefix(begin, "always,", &begin)) {
> ...
> }
>
> If we ever learn something new besides "always," and "auto," we'd need
> to pull things apart again. Plus we still need somewhere to place the
> comment. Let's focus on code de-duplication rather than golfing for now.
>
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
> pretty.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/pretty.c b/pretty.c
> index a4e5fc5c50..6a4264dd01 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -1076,13 +1076,11 @@ static size_t parse_color(struct strbuf *sb, /* in UTF-8 */
> if (!end)
> return 0;
>
> - if (skip_prefix(begin, "auto,", &begin)) {
> - if (!want_color(c->pretty_ctx->color))
> - return end - placeholder + 1;
> - } else if (skip_prefix(begin, "always,", &begin)) {
> + if (skip_prefix(begin, "always,", &begin)) {
> /* nothing to do; we do not respect want_color at all */
> } else {
> /* the default is the same as "auto" */
> + skip_prefix(begin, "auto,", &begin);
> if (!want_color(c->pretty_ctx->color))
> return end - placeholder + 1;
> }
Okay, this change should lead to the same results as before indeed. As
you mention it does require us to be more careful if we ever were to
introduce another option here. But I still think it's fine to simplify
the code like this.
Patrick
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/8] pretty: tighten function signature to not take `void *`
2025-03-19 7:23 ` [PATCH 1/8] pretty: tighten function signature to not take `void *` Martin Ågren
@ 2025-03-20 9:18 ` Patrick Steinhardt
0 siblings, 0 replies; 22+ messages in thread
From: Patrick Steinhardt @ 2025-03-20 9:18 UTC (permalink / raw)
To: Martin Ågren; +Cc: git
On Wed, Mar 19, 2025 at 08:23:34AM +0100, Martin Ågren wrote:
> We take a `void *` and immediately cast it. Both callers already have
> this pointer as the right type, so tighten the interface and stop
> casting.
>
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
> pretty.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/pretty.c b/pretty.c
> index 0bc8ad8a9a..a4e5fc5c50 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -1437,9 +1437,8 @@ static void free_decoration_options(const struct decoration_options *opts)
>
> static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
> const char *placeholder,
> - void *context)
> + struct format_commit_context *c)
> {
> - struct format_commit_context *c = context;
> const struct commit *commit = c->commit;
> const char *msg = c->message;
> struct commit_list *p;
Makes sense. The function has been introduced all the way back in
9fa708dab1c (Pretty-format: %[+-]x to tweak inter-item newlines,
2009-10-04), and at that point in time the callers only had `void *`
contexts available. That has changed eventually, so I agree that it is
nice to adapt accordingly now.
Patrick
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/8] pretty: fix parsing of half-valid "%<" and "%>" placeholders
2025-03-19 7:23 ` [PATCH 4/8] pretty: fix parsing of half-valid "%<" and "%>" placeholders Martin Ågren
@ 2025-03-20 9:18 ` Patrick Steinhardt
2025-03-20 16:11 ` Martin Ågren
0 siblings, 1 reply; 22+ messages in thread
From: Patrick Steinhardt @ 2025-03-20 9:18 UTC (permalink / raw)
To: Martin Ågren; +Cc: git
On Wed, Mar 19, 2025 at 08:23:37AM +0100, Martin Ågren wrote:
> When we parse a padding directive ("%<" or "%>"), we might populate a
> few of the struct's fields before bailing. This can result in such
> half-parsed information being used to actually introduce some
> padding/truncation.
>
> When parsing a "%<" or "%>", only store the parsed data after parsing
> successfully. The added test would have failed before this commit. It
> also shows how the existing behavior is hardly something someone can
> rely on since the non-consumed modifier ("%<(10,bad)") shows up verbatim
> in the pretty output.
Ideally I'd expect us to die when seeing misformatted placeholders like
this. This is way less confusing to the user as otherwise things _look_
like they work, but we silently do the wrong thing.
That being said, I have no idea whether we can do such a change now
without breaking existing usecases. As you rightfully argue the result
already is wrong, but with my proposal we'd completely refuse to do
anything. Which I'd argue is a good thing in the end.
> We could let the caller use a temporary struct and only copy the data on
> success. Let's instead make our parsing function easy to use correctly
> by letting it only touch the output struct in the success case.
s/success/&ful/
> While setting up a temporary struct for parsing into, we might as well
> initialize it to a well-defined state. It's unnecessary for the current
> implementation since it always writes to all three fields in a
> successful case, but some future-proofing shouldn't hurt.
>
> Note that the test relies on first using a correct placeholder
> "%<(4,trunc)" where "trunc" (`trunc_right`) lingers in our struct until
> it's then used instead of the invalid "bad". The next commit will teach
> us to clean up any remnants of "%<(4,trunc)" after handling it.
>
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
> pretty.c | 18 ++++++++++++------
> t/t4205-log-pretty-formats.sh | 6 ++++++
> 2 files changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/pretty.c b/pretty.c
> index e5e8ef24fa..a4fa052f8b 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -1121,6 +1121,11 @@ static size_t parse_padding_placeholder(const char *placeholder,
> const char *ch = placeholder;
> enum flush_type flush_type;
> int to_column = 0;
> + struct padding_args ans = {
> + .flush_type = no_flush,
> + .truncate = trunc_none,
> + .padding = 0,
> + };
>
> switch (*ch++) {
> case '<':
I honestly have no idea what `ans` stands for. You could call it
`result` to signify that it's what we'll ultimately bubble up to the
caller in the successful case.
Patrick
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 5/8] pretty: after padding, reset padding info
2025-03-19 7:23 ` [PATCH 5/8] pretty: after padding, reset padding info Martin Ågren
@ 2025-03-20 9:18 ` Patrick Steinhardt
2025-03-20 16:11 ` Martin Ågren
0 siblings, 1 reply; 22+ messages in thread
From: Patrick Steinhardt @ 2025-03-20 9:18 UTC (permalink / raw)
To: Martin Ågren; +Cc: git
On Wed, Mar 19, 2025 at 08:23:38AM +0100, Martin Ågren wrote:
> After handling a padding directive ("%<" or "%>"), we leave the `struct
> padding_args` in a halfway state. We modify it a bit as we apply the
> padding/truncation so that by the time we're done, it can't be in quite
> as many states as when we started. Still, we don't fully restore it to
> its default, no-action state.
>
> "%<" and "%>" should only affect the next placeholder, but leaving a bit
> of state around doesn't make it obvious that we don't spill any of it
> into our handling of later placeholders. The previous commit closed off
> a way of populating only half the `struct padding_args`, thereby fixing
> a bug that *also* relied on then having the other half contain this kind
> of lingering data.
>
> After that fix, I haven't figured out a way to provoke a bug using just
> this here half of the issue. Still, after handling padding, let's drop
> all remnants of the previous "%<" or "%>".
>
> Unlike the bug fixed in the previous commit, this could have some
> realistic chance of regressing something for someone if they've actually
> been using such state leftovers (knowingly or not). Still, it seems
> worthwhile to try to tighten this.
Yeah, I agree. It's very surprising that we retain only a subset of
state, and that does feel like a bug to me.
> This change to pretty.c would have been sufficient to make the test
> added in the previous commit pass. Belt and suspenders.
>
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
> pretty.c | 2 ++
> t/t4205-log-pretty-formats.sh | 9 +++++++++
> 2 files changed, 11 insertions(+)
>
> diff --git a/pretty.c b/pretty.c
> index a4fa052f8b..f53e77ed86 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -1893,6 +1893,8 @@ static size_t format_and_pad_commit(struct strbuf *sb, /* in UTF-8 */
> }
> strbuf_release(&local_sb);
> c->pad.flush_type = no_flush;
> + c->pad.truncate = trunc_none;
> + c->pad.padding = 0;
> return total_consumed;
> }
This is using the same default values now as you started to use in the
preceding commit. It might make sense to introduce a macro or function
to initialize the structure so that we don't duplicate initialization.
Patrick
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 6/8] pretty: refactor parsing of line-wrapping "%w" placeholder
2025-03-19 7:23 ` [PATCH 6/8] pretty: refactor parsing of line-wrapping "%w" placeholder Martin Ågren
@ 2025-03-20 9:18 ` Patrick Steinhardt
2025-03-20 16:11 ` Martin Ågren
0 siblings, 1 reply; 22+ messages in thread
From: Patrick Steinhardt @ 2025-03-20 9:18 UTC (permalink / raw)
To: Martin Ågren; +Cc: git, René Scharfe
On Wed, Mar 19, 2025 at 08:23:39AM +0100, Martin Ågren wrote:
> Our parsing of a "%w" placeholder is quite a big chunk of code in
> the middle of our switch for handling a few different placeholders. We
> parse into three different variables, then use them to compare to and
> update existing values in the big `struct format_commit_context`.
>
> Pull out a helper function for parsing such a "%w" placeholder. Define a
> struct for collecting the three variables.
>
> Unlike recent commits, parsing and subsequent use are already a bit more
> separated in the sense that we don't parse directly into the big context
> struct. Thus, unlike the preceding commits, this does not fix any bugs
> that I'm aware of. There's still value in separating parsing and usage
> more clearly and simplifying `format_commit_one()`.
>
> Note that we use two different types for these values, `unsigned long`
> when parsing, `size_t` when eventually applying. Let's go for `size_t`
> in our struct. I don't know if there are platforms where assigning an
> `unsigned long` to a `size_t` could truncate the value, but since we
> already verify the values to be at most 16 KiB, we should be able to fit
> them into any sane `size_t`s.
>
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
> pretty.c | 120 +++++++++++++++++++++++++++++++++++--------------------
> 1 file changed, 76 insertions(+), 44 deletions(-)
>
> diff --git a/pretty.c b/pretty.c
> index f53e77ed86..c44ff87481 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -893,6 +893,10 @@ struct padding_args {
> int padding;
> };
>
> +struct rewrap_args {
> + size_t width, indent1, indent2;
> +};
> +
> struct format_commit_context {
> struct repository *repository;
> const struct commit *commit;
> @@ -902,7 +906,7 @@ struct format_commit_context {
> struct signature_check signature_check;
> const char *message;
> char *commit_encoding;
> - size_t width, indent1, indent2;
> + struct rewrap_args rewrap;
> int auto_color;
> struct padding_args pad;
>
> @@ -1034,18 +1038,21 @@ static void strbuf_wrap(struct strbuf *sb, size_t pos,
>
> static void rewrap_message_tail(struct strbuf *sb,
> struct format_commit_context *c,
> - size_t new_width, size_t new_indent1,
> - size_t new_indent2)
> + const struct rewrap_args *new_rewrap)
> {
> - if (c->width == new_width && c->indent1 == new_indent1 &&
> - c->indent2 == new_indent2)
> + const struct rewrap_args *old_rewrap = &c->rewrap;
> +
> + if (old_rewrap->width == new_rewrap->width &&
> + old_rewrap->indent1 == new_rewrap->indent1 &&
> + old_rewrap->indent2 == new_rewrap->indent2)
> return;
> +
> if (c->wrap_start < sb->len)
> - strbuf_wrap(sb, c->wrap_start, c->width, c->indent1, c->indent2);
> + strbuf_wrap(sb, c->wrap_start, old_rewrap->width,
> + old_rewrap->indent1, old_rewrap->indent2);
> +
> c->wrap_start = sb->len;
> - c->width = new_width;
> - c->indent1 = new_indent1;
> - c->indent2 = new_indent2;
> + c->rewrap = *new_rewrap;
> }
>
> static int format_reflog_person(struct strbuf *sb,
> @@ -1443,6 +1450,57 @@ static void free_decoration_options(const struct decoration_options *opts)
> free(opts->tag);
> }
>
> +static size_t parse_rewrap(const char *placeholder, struct rewrap_args *rewrap)
> +{
> + unsigned long width = 0, indent1 = 0, indent2 = 0;
> + char *next;
> + const char *start;
> + const char *end;
> +
> + memset(rewrap, 0, sizeof(*rewrap));
The `memset()` feels rather unnecessary as we only use the result at our
single caller in case we return successfully. And if we do, we know to
initialize all struct fields.
> + if (placeholder[1] != '(')
> + return 0;
This matches the `else` branch. It's nice that it's converted into an
early return.
> + start = placeholder + 2;
> + end = strchr(start, ')');
> +
> + if (!end)
> + return 0;
> + if (end > start) {
> + width = strtoul(start, &next, 10);
> + if (*next == ',') {
> + indent1 = strtoul(next + 1, &next, 10);
> + if (*next == ',') {
> + indent2 = strtoul(next + 1,
> + &next, 10);
> + }
> + }
> + if (*next != ')')
> + return 0;
> + }
> +
> + /*
> + * We need to limit the format here as it allows the
> + * user to prepend arbitrarily many bytes to the buffer
> + * when rewrapping.
> + */
> + if (width > FORMATTING_LIMIT ||
> + indent1 > FORMATTING_LIMIT ||
> + indent2 > FORMATTING_LIMIT)
> + return 0;
And all of the above matches the `if` branch, except that we don't
perform the rewrap itself.
Looks good.
Patrick
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 7/8] pretty: refactor parsing of magic
2025-03-19 7:23 ` [PATCH 7/8] pretty: refactor parsing of magic Martin Ågren
@ 2025-03-20 9:18 ` Patrick Steinhardt
2025-03-20 16:12 ` Martin Ågren
0 siblings, 1 reply; 22+ messages in thread
From: Patrick Steinhardt @ 2025-03-20 9:18 UTC (permalink / raw)
To: Martin Ågren; +Cc: git, Junio C Hamano
On Wed, Mar 19, 2025 at 08:23:40AM +0100, Martin Ågren wrote:
> Similar to the previous commit, pull out our parsing of initial
> placeholder magic into a separate function. This helps make it a bit
> easier to get an overview of `format_commit_item()`. It also represents
> another small step towards separating the parsing of placeholders from
> subsequent usage of the parsed information.
>
> This diff might be a bit easier to read with `-w`.
>
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
> pretty.c | 69 ++++++++++++++++++++++++++++++++++----------------------
> 1 file changed, 42 insertions(+), 27 deletions(-)
>
> diff --git a/pretty.c b/pretty.c
> index c44ff87481..ddc7fd6aab 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -1929,17 +1929,17 @@ static size_t format_and_pad_commit(struct strbuf *sb, /* in UTF-8 */
> return total_consumed;
> }
>
> -static size_t format_commit_item(struct strbuf *sb, /* in UTF-8 */
> - const char *placeholder,
> - struct format_commit_context *context)
> +enum magic {
> + NO_MAGIC,
> + ADD_LF_BEFORE_NON_EMPTY,
> + DEL_LF_BEFORE_EMPTY,
> + ADD_SP_BEFORE_NON_EMPTY
> +};
> +
It would be nice to give all of these enums a common prefix, e.g.:
enum magic {
MAGIC_NONE,
MAGIC_ADD_LF_BEFORE_NON_EMPTY,
MAGIC_DEL_LF_BEFORE_EMPTY,
MAGIC_ADD_SP_BEFORE_NON_EMPTY
};
Makes it easier to see that things belong together and it provides
proper namespacing.
> +/* 2 for 'bad magic', otherwise whether we consumed 0 or 1 chars. */
> +static size_t parse_magic(const char *placeholder, enum magic *ret)
> {
> - size_t consumed, orig_len;
> - enum {
> - NO_MAGIC,
> - ADD_LF_BEFORE_NON_EMPTY,
> - DEL_LF_BEFORE_EMPTY,
> - ADD_SP_BEFORE_NON_EMPTY
> - } magic = NO_MAGIC;
> + enum magic magic;
>
> switch (placeholder[0]) {
> case '-':
On the other hand you simply retain existing names. I don't insist on
the refactoring, but still thing it would be nice as the enum has wider
scope now.
> @@ -1952,28 +1952,43 @@ static size_t format_commit_item(struct strbuf *sb, /* in UTF-8 */
> magic = ADD_SP_BEFORE_NON_EMPTY;
> break;
> default:
> - break;
> + *ret = NO_MAGIC;
> + return 0;
> }
> - if (magic != NO_MAGIC) {
> - placeholder++;
>
> - switch (placeholder[0]) {
> - case 'w':
> - /*
> - * `%+w()` cannot ever expand to a non-empty string,
> - * and it potentially changes the layout of preceding
> - * contents. We're thus not able to handle the magic in
> - * this combination and refuse the pattern.
> - */
> - return 0;
> - };
> - }
> + switch (placeholder[1]) {
> + case 'w':
> + /*
> + * `%+w()` cannot ever expand to a non-empty string,
> + * and it potentially changes the layout of preceding
> + * contents. We're thus not able to handle the magic in
> + * this combination and refuse the pattern.
> + */
> + *ret = NO_MAGIC;
> + return 2;
> + };
> +
> + *ret = magic;
> + return 1;
> +}
> +
> +static size_t format_commit_item(struct strbuf *sb, /* in UTF-8 */
> + const char *placeholder,
> + struct format_commit_context *context)
> +{
> + size_t consumed, orig_len;
> + enum magic magic;
> +
> + consumed = parse_magic(placeholder, &magic);
> + if (consumed > 1)
> + return 0;
> + placeholder += consumed;
>
> orig_len = sb->len;
> if (context->pad.flush_type == no_flush)
> - consumed = format_commit_one(sb, placeholder, context);
> + consumed += format_commit_one(sb, placeholder, context);
> else
> - consumed = format_and_pad_commit(sb, placeholder, context);
> + consumed += format_and_pad_commit(sb, placeholder, context);
> if (magic == NO_MAGIC)
> return consumed;
>
> @@ -1986,7 +2001,7 @@ static size_t format_commit_item(struct strbuf *sb, /* in UTF-8 */
> else if (magic == ADD_SP_BEFORE_NON_EMPTY)
> strbuf_insertstr(sb, orig_len, " ");
> }
> - return consumed + 1;
> + return consumed;
It took me a bit to figure out why this is equivalent to what we had
before. But:
- If `parse_magic()` returns bigger than 1 we'd have exited early, so
this return here is never hit.
- If it returns `0` we have hit `NO_MAGIC`, and we have another early
return for this case.
So we only end up here in case `consumed = parse_magic(...)` is 1, and
then we add the result from `format_and_pad_commit()` to that value.
Which means that the refactoring is true to the original spirit.
Patrick
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/8] pretty: simplify if-else to reduce code duplication
2025-03-20 9:17 ` Patrick Steinhardt
@ 2025-03-20 16:10 ` Martin Ågren
0 siblings, 0 replies; 22+ messages in thread
From: Martin Ågren @ 2025-03-20 16:10 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Jeff King
Hi Patrick,
Thanks for reviewing!
On Thu, 20 Mar 2025 at 10:18, Patrick Steinhardt <ps@pks.im> wrote:
>
> On Wed, Mar 19, 2025 at 08:23:35AM +0100, Martin Ågren wrote:
> > First we look for "auto,", then we try "always,", then we fall back to
>
> Nit: we typically have the body carry enough context so that it makes
> sense even without reading the commit subject.
Thanks. I'll add some context: "After spotting "%C", we first ..."
Martin
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/8] pretty: fix parsing of half-valid "%<" and "%>" placeholders
2025-03-20 9:18 ` Patrick Steinhardt
@ 2025-03-20 16:11 ` Martin Ågren
2025-03-24 10:10 ` Patrick Steinhardt
0 siblings, 1 reply; 22+ messages in thread
From: Martin Ågren @ 2025-03-20 16:11 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On Thu, 20 Mar 2025 at 10:18, Patrick Steinhardt <ps@pks.im> wrote:
>
> On Wed, Mar 19, 2025 at 08:23:37AM +0100, Martin Ågren wrote:
> > When parsing a "%<" or "%>", only store the parsed data after parsing
> > successfully. The added test would have failed before this commit. It
> > also shows how the existing behavior is hardly something someone can
> > rely on since the non-consumed modifier ("%<(10,bad)") shows up verbatim
> > in the pretty output.
>
> Ideally I'd expect us to die when seeing misformatted placeholders like
> this. This is way less confusing to the user as otherwise things _look_
> like they work, but we silently do the wrong thing.
Right. I can see how it makes some kind of sense to print what we don't
understand when it's something short and simple like "%X". But for more
complex "%X(first,second)" it's kind of obvious that a misspelled
"X(fist,second)" isn't something you want in the output. The whole "if
we can't parse, return zero as the number of consumed characters so that
we can print verbatim while looking for next '%'" is a central piece of
the design here. One could certainly imagine a "strict" mode.
> That being said, I have no idea whether we can do such a change now
> without breaking existing usecases. As you rightfully argue the result
> already is wrong, but with my proposal we'd completely refuse to do
> anything. Which I'd argue is a good thing in the end.
I can see the value of a strict mode, with command line options and
config switches and whatnot, maybe even a changed default behavior at
some point. I'd rather punt on that for now. TBH, I'd be afraid to do a
hard switch from "0 means print it instead" to "0 means die". I don't
disagree that it would be a better end-game though, at some point.
> > We could let the caller use a temporary struct and only copy the data on
> > success. Let's instead make our parsing function easy to use correctly
> > by letting it only touch the output struct in the success case.
>
> s/success/&ful/
Thanks.
> > + struct padding_args ans = {
> > + .flush_type = no_flush,
> > + .truncate = trunc_none,
> > + .padding = 0,
> > + };
> >
> > switch (*ch++) {
> > case '<':
>
> I honestly have no idea what `ans` stands for. You could call it
> `result` to signify that it's what we'll ultimately bubble up to the
> caller in the successful case.
Fair. :-) It's "answer", but "result" is much better. Thanks.
Martin
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 5/8] pretty: after padding, reset padding info
2025-03-20 9:18 ` Patrick Steinhardt
@ 2025-03-20 16:11 ` Martin Ågren
0 siblings, 0 replies; 22+ messages in thread
From: Martin Ågren @ 2025-03-20 16:11 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On Thu, 20 Mar 2025 at 10:18, Patrick Steinhardt <ps@pks.im> wrote:
>
> On Wed, Mar 19, 2025 at 08:23:38AM +0100, Martin Ågren wrote:
> Yeah, I agree. It's very surprising that we retain only a subset of
> state, and that does feel like a bug to me.
>
> > c->pad.flush_type = no_flush;
> > + c->pad.truncate = trunc_none;
> > + c->pad.padding = 0;
> > return total_consumed;
> > }
>
> This is using the same default values now as you started to use in the
> preceding commit. It might make sense to introduce a macro or function
> to initialize the structure so that we don't duplicate initialization.
Good point. I'll make the preceding commit use a new
`padding_args_clear()`, then reuse it here.
BTW, we rely on initializing the struct with all-zeroes to put it in
this cleared state. Which is true, since the "none"/"no" enum members
are indeed zero. That's not explicit though. I'm thinking of adding a
preparatory patch to make `no_flush` and `trunc_none` be explicitly
zero, and see if there are other such enum values in this file.
Martin
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 6/8] pretty: refactor parsing of line-wrapping "%w" placeholder
2025-03-20 9:18 ` Patrick Steinhardt
@ 2025-03-20 16:11 ` Martin Ågren
0 siblings, 0 replies; 22+ messages in thread
From: Martin Ågren @ 2025-03-20 16:11 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, René Scharfe
On Thu, 20 Mar 2025 at 10:18, Patrick Steinhardt <ps@pks.im> wrote:
>
> On Wed, Mar 19, 2025 at 08:23:39AM +0100, Martin Ågren wrote:
> > + memset(rewrap, 0, sizeof(*rewrap));
>
> The `memset()` feels rather unnecessary as we only use the result at our
> single caller in case we return successfully. And if we do, we know to
> initialize all struct fields.
Thanks, good point. I'll drop it.
Martin
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 7/8] pretty: refactor parsing of magic
2025-03-20 9:18 ` Patrick Steinhardt
@ 2025-03-20 16:12 ` Martin Ågren
0 siblings, 0 replies; 22+ messages in thread
From: Martin Ågren @ 2025-03-20 16:12 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Junio C Hamano
On Thu, 20 Mar 2025 at 10:18, Patrick Steinhardt <ps@pks.im> wrote:
>
> On Wed, Mar 19, 2025 at 08:23:40AM +0100, Martin Ågren wrote:
> > +enum magic {
> > + NO_MAGIC,
> > + ADD_LF_BEFORE_NON_EMPTY,
> > + DEL_LF_BEFORE_EMPTY,
> > + ADD_SP_BEFORE_NON_EMPTY
> > +};
> > +
>
> It would be nice to give all of these enums a common prefix, e.g.:
>
> enum magic {
> MAGIC_NONE,
> MAGIC_ADD_LF_BEFORE_NON_EMPTY,
> MAGIC_DEL_LF_BEFORE_EMPTY,
> MAGIC_ADD_SP_BEFORE_NON_EMPTY
> };
>
> Makes it easier to see that things belong together and it provides
> proper namespacing.
Agreed, good point.
> On the other hand you simply retain existing names. I don't insist on
> the refactoring, but still thing it would be nice as the enum has wider
> scope now.
Right. It's only file-scoped, but that's still a bigger scope... I'll
rename them to give them all a common prefix as suggested.
> It took me a bit to figure out why this is equivalent to what we had
> before. But:
>
> - If `parse_magic()` returns bigger than 1 we'd have exited early, so
> this return here is never hit.
>
> - If it returns `0` we have hit `NO_MAGIC`, and we have another early
> return for this case.
>
> So we only end up here in case `consumed = parse_magic(...)` is 1, and
> then we add the result from `format_and_pad_commit()` to that value.
> Which means that the refactoring is true to the original spirit.
If there's anything in particular you think should be called out in the
commit message to assist future readers, just let me know. I'll take
your points above as inspiration for things to highlight better.
There's also the return value "2", which is a bit, well, magic. Or at
least fairly arbitrary. I kind of preferred it over switching to a
signed type in this one spot though.
Thanks for all your very helpful comments.
Martin
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/8] pretty: simplify if-else to reduce code duplication
2025-03-19 7:23 ` [PATCH 2/8] pretty: simplify if-else to reduce code duplication Martin Ågren
2025-03-20 9:17 ` Patrick Steinhardt
@ 2025-03-24 3:50 ` Jeff King
1 sibling, 0 replies; 22+ messages in thread
From: Jeff King @ 2025-03-24 3:50 UTC (permalink / raw)
To: Martin Ågren; +Cc: git
On Wed, Mar 19, 2025 at 08:23:35AM +0100, Martin Ågren wrote:
> First we look for "auto,", then we try "always,", then we fall back to
> the default, which is to do exactly the same thing as we do for "auto,".
> The amount of code duplication isn't huge, but still: reading this code
> carefully requires spending at least *some* time on making sure the two
> blocks of code are indeed identical.
>
> Rearrange the checks so that we end with the default case,
> opportunistically consuming the "auto," which may or may not be there.
OK. The duplicated lines are not all that long, but I don't mind
collapsing the cases, especially with the explanatory comment that's
there.
> In the "always," case, we don't actually *do* anything, so if we were
> into golfing, we'd just write the whole thing as a single
>
> if (!skip_prefix(begin, "always,", &begin)) {
> ...
> }
>
> If we ever learn something new besides "always," and "auto," we'd need
> to pull things apart again. Plus we still need somewhere to place the
> comment. Let's focus on code de-duplication rather than golfing for now.
Yeah, I think what you wrote in the patch is much better than trying to
golf further.
So looks good to me.
-Peff
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/8] pretty: fix parsing of half-valid "%<" and "%>" placeholders
2025-03-20 16:11 ` Martin Ågren
@ 2025-03-24 10:10 ` Patrick Steinhardt
0 siblings, 0 replies; 22+ messages in thread
From: Patrick Steinhardt @ 2025-03-24 10:10 UTC (permalink / raw)
To: Martin Ågren; +Cc: git
On Thu, Mar 20, 2025 at 05:11:09PM +0100, Martin Ågren wrote:
> On Thu, 20 Mar 2025 at 10:18, Patrick Steinhardt <ps@pks.im> wrote:
> >
> > On Wed, Mar 19, 2025 at 08:23:37AM +0100, Martin Ågren wrote:
> > > When parsing a "%<" or "%>", only store the parsed data after parsing
> > > successfully. The added test would have failed before this commit. It
> > > also shows how the existing behavior is hardly something someone can
> > > rely on since the non-consumed modifier ("%<(10,bad)") shows up verbatim
> > > in the pretty output.
> >
> > Ideally I'd expect us to die when seeing misformatted placeholders like
> > this. This is way less confusing to the user as otherwise things _look_
> > like they work, but we silently do the wrong thing.
>
> Right. I can see how it makes some kind of sense to print what we don't
> understand when it's something short and simple like "%X". But for more
> complex "%X(first,second)" it's kind of obvious that a misspelled
> "X(fist,second)" isn't something you want in the output. The whole "if
> we can't parse, return zero as the number of consumed characters so that
> we can print verbatim while looking for next '%'" is a central piece of
> the design here. One could certainly imagine a "strict" mode.
>
> > That being said, I have no idea whether we can do such a change now
> > without breaking existing usecases. As you rightfully argue the result
> > already is wrong, but with my proposal we'd completely refuse to do
> > anything. Which I'd argue is a good thing in the end.
>
> I can see the value of a strict mode, with command line options and
> config switches and whatnot, maybe even a changed default behavior at
> some point. I'd rather punt on that for now. TBH, I'd be afraid to do a
> hard switch from "0 means print it instead" to "0 means die". I don't
> disagree that it would be a better end-game though, at some point.
Yup, I fully agree that this is a bit more of a risky change and that it
doesn't have to be part of this patch series.
Patrick
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2025-03-24 10:10 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-19 7:23 [PATCH 0/8] pretty: minor bugfixing, some refactorings Martin Ågren
2025-03-19 7:23 ` [PATCH 1/8] pretty: tighten function signature to not take `void *` Martin Ågren
2025-03-20 9:18 ` Patrick Steinhardt
2025-03-19 7:23 ` [PATCH 2/8] pretty: simplify if-else to reduce code duplication Martin Ågren
2025-03-20 9:17 ` Patrick Steinhardt
2025-03-20 16:10 ` Martin Ågren
2025-03-24 3:50 ` Jeff King
2025-03-19 7:23 ` [PATCH 3/8] pretty: collect padding-related fields in separate struct Martin Ågren
2025-03-19 7:23 ` [PATCH 4/8] pretty: fix parsing of half-valid "%<" and "%>" placeholders Martin Ågren
2025-03-20 9:18 ` Patrick Steinhardt
2025-03-20 16:11 ` Martin Ågren
2025-03-24 10:10 ` Patrick Steinhardt
2025-03-19 7:23 ` [PATCH 5/8] pretty: after padding, reset padding info Martin Ågren
2025-03-20 9:18 ` Patrick Steinhardt
2025-03-20 16:11 ` Martin Ågren
2025-03-19 7:23 ` [PATCH 6/8] pretty: refactor parsing of line-wrapping "%w" placeholder Martin Ågren
2025-03-20 9:18 ` Patrick Steinhardt
2025-03-20 16:11 ` Martin Ågren
2025-03-19 7:23 ` [PATCH 7/8] pretty: refactor parsing of magic Martin Ågren
2025-03-20 9:18 ` Patrick Steinhardt
2025-03-20 16:12 ` Martin Ågren
2025-03-19 7:23 ` [PATCH 8/8] pretty: refactor parsing of decoration options Martin Ågren
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).