Git Mailing List Archive mirror
 help / color / mirror / Atom feed
From: Glen Choo <chooglen@google.com>
To: Linus Arver via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: Linus Arver <linusa@google.com>
Subject: Re: [PATCH 2/5] trailer: split process_input_file into separate pieces
Date: Mon, 07 Aug 2023 15:39:28 -0700	[thread overview]
Message-ID: <kl6l5y5qa34v.fsf@chooglen-macbookpro.roam.corp.google.com> (raw)
In-Reply-To: <d023c297dcac0bb96f681dc1fc0116a649c2efec.1691211879.git.gitgitgadget@gmail.com>

"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Currently, process_input_file does three things:
>
>     (1) parse the input string for trailers,
>     (2) print text before the trailers, and
>     (3) calculate the position of the input where the trailers end.
>
> Rename this function to parse_trailers(), and make it only do
> (1).

Okay, process_input_file() is a very unhelpful name (What does it mean
to "process a file"?). In contrast, parse_trailers() is more
self-descriptive (It parses trailers into some appropriate format, so it
shouldn't do things like print.) Makes sense.

Is there some additional, unstated purpose behind this change besides
"move things around for readability"? E.g. do you intend to move
parse_trailers() to a future trailer parsing library? If so, that would
be useful context to evaluate the goodness of this split.

> The caller of this function, process_trailers, becomes responsible
> for (2) and (3). These items belong inside process_trailers because they
> are both concerned with printing the surrounding text around
> trailers (which is already one of the immediate concerns of
> process_trailers).

I agree that (2) doesn't belong in parse_trailers(). OTOH, (3) sounds
like something that belongs in parse_trailers() - you have to parse
trailers in order to tell where the trailers start and end, so it makes
sense for the parsing function to give those values.

> diff --git a/trailer.c b/trailer.c
> index dff3fafe865..16fbba03d07 100644
> --- a/trailer.c
> +++ b/trailer.c
> @@ -961,28 +961,23 @@ static void unfold_value(struct strbuf *val)
>  	strbuf_release(&out);
>  }
>  
> -static size_t process_input_file(FILE *outfile,
> -				 const char *str,
> -				 struct list_head *head,
> -				 const struct process_trailer_options *opts)
> +/*
> + * Parse trailers in "str" and populate the "head" linked list structure.
> + */
> +static void parse_trailers(struct trailer_info *info,

"info" is an out parameter, and IIRC we typically put out parameters
towards the end. I didn't find a callout in CodingGuidelines, though, so
idk if this is an ironclad rule or not.

> +			     const char *str,
> +			     struct list_head *head,
> +			     const struct process_trailer_options *opts)
>  {
> -	struct trailer_info info;
>  	struct strbuf tok = STRBUF_INIT;
>  	struct strbuf val = STRBUF_INIT;
>  	size_t i;
>  
> -	trailer_info_get(&info, str, opts);
> -
> -	/* Print lines before the trailers as is */
> -	if (!opts->only_trailers)
> -		fwrite(str, 1, info.trailer_start - str, outfile);

We no longer fwrite the contents before the trailer, okay.

> +	trailer_info_get(info, str, opts);

This is where we actually get the start and end of trailers, and each
trailer string. This is parsing out the trailers from a string, so what
other parsing is left? Reading ahead shows that we're actually parsing
the trailer string into a "struct trailer_item". Okay, so this function
is basically a wrapper around trailer_info_get() that also "returns" the
parsed trailer_items.

> -	if (!opts->only_trailers && !info.blank_line_before_trailer)
> -		fprintf(outfile, "\n");
> -

So we don't print the trailing line. Also makes sense.

> @@ -1003,9 +998,7 @@ static size_t process_input_file(FILE *outfile,
>  		}
>  	}
>  
> -	trailer_info_release(&info);
> -
> -	return info.trailer_end - str;
> +	trailer_info_release(info);
>  }
>  

Even though "info" is a pointer passed into this function, we are
_release-ing it. This is not an umabiguously good change, IMO. Before,
"info" was never used outside of this function, so we should obviously
release it before returning. However, now that "info" is an out
parameter, we should be more careful about releasing it. I don't think
it's obvious that the caller will see the right values for
info.trailer_end and info.trailer_start, but free()-d values for
info.trailers, and a meaningless value for info.trailer_nr (since the
items were free()-d).

I think it might be better to update the comment on parse_trailers()
like so:

  /*
   * Parse trailers in "str", populating the trailer info and "head"
   * linked list structure.
   */

and make it the caller's responsibility to call trailer_info_release().
We could move this call to where we "free_all(head)".

>  static void free_all(struct list_head *head)
> @@ -1054,6 +1047,7 @@ void process_trailers(const char *file,
>  {
>  	LIST_HEAD(head);
>  	struct strbuf sb = STRBUF_INIT;
> +	struct trailer_info info;
>  	size_t trailer_end;
>  	FILE *outfile = stdout;
>  
> @@ -1064,8 +1058,16 @@ void process_trailers(const char *file,
>  	if (opts->in_place)
>  		outfile = create_in_place_tempfile(file);

Thinking out loud, should we move the creation of outfile next to where
we first use it?

> +	parse_trailers(&info, sb.buf, &head, opts);
> +	trailer_end = info.trailer_end - sb.buf;
> +
>  	/* Print the lines before the trailers */
> -	trailer_end = process_input_file(outfile, sb.buf, &head, opts);
> +	if (!opts->only_trailers)
> +		fwrite(sb.buf, 1, info.trailer_start - sb.buf, outfile);

I'm not sure if it is an unambiguously good change for the caller to
learn how to compute the start and end of the trailer sections by doing
pointer arithmetic, but I guess format_trailer_info() does this anyway,
so your proposal to move (3) outside of the parse_trailers() makes
sense.

It feels a bit non-obvious that trailer_start and trailer_end are
pointing inside the input string. I wonder if we should just return the
_start and _end offsets directly instead of returning pointers. I.e.:

   struct trailer_info {
     int blank_line_before_trailer;
 -  /*
 -   * Pointers to the start and end of the trailer block found. If there
 -   * is no trailer block found, these 2 pointers point to the end of the
 -   * input string.
 -   */
 -   const char *trailer_start, *trailer_end;
 +   /* Offsets to the trailer block start and end in the input string */
 +   size_t *trailer_start, *trailer_end;

Which makes their intended use fairly unambiguous. A quick grep suggests
that in trailer.c, we're roughly as likely to use the pointer directly
vs using it to do pointer arithmetic, so converging on one use might be
a win for readability. The only other user outside of trailer.c is
sequencer.c, which doesn't care about the return type - it only checks
if there are trailers.

  reply	other threads:[~2023-08-07 22:39 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-05  5:04 [PATCH 0/5] Trailer readability cleanups Linus Arver via GitGitGadget
2023-08-05  5:04 ` [PATCH 1/5] trailer: separate public from internal portion of trailer_iterator Linus Arver via GitGitGadget
2023-08-07 21:16   ` Glen Choo
2023-08-08 12:19     ` Phillip Wood
2023-08-10 23:15       ` Linus Arver
2023-08-10 22:50     ` Linus Arver
2023-08-05  5:04 ` [PATCH 2/5] trailer: split process_input_file into separate pieces Linus Arver via GitGitGadget
2023-08-07 22:39   ` Glen Choo [this message]
2023-08-11  0:41     ` Linus Arver
2023-08-05  5:04 ` [PATCH 3/5] trailer: split process_command_line_args into separate functions Linus Arver via GitGitGadget
2023-08-07 22:51   ` Glen Choo
2023-08-11  0:59     ` Linus Arver
2023-08-11  1:02       ` Linus Arver
2023-08-11 21:11       ` Glen Choo
2023-08-05  5:04 ` [PATCH 4/5] trailer: teach find_patch_start about --no-divider Linus Arver via GitGitGadget
2023-08-07 23:28   ` Glen Choo
2023-08-11  1:25     ` Linus Arver
2023-08-11 20:51       ` Glen Choo
2023-08-05  5:04 ` [PATCH 5/5] trailer: rename *_DEFAULT enums to *_UNSPECIFIED Linus Arver via GitGitGadget
2023-08-07 23:45   ` Glen Choo
2023-08-11 18:00     ` Linus Arver
2023-09-09  6:16 ` [PATCH v2 0/6] Trailer readability cleanups Linus Arver via GitGitGadget
2023-09-09  6:16   ` [PATCH v2 1/6] trailer: separate public from internal portion of trailer_iterator Linus Arver via GitGitGadget
2023-09-11 17:10     ` Junio C Hamano
2023-09-09  6:16   ` [PATCH v2 2/6] trailer: split process_input_file into separate pieces Linus Arver via GitGitGadget
2023-09-11 17:10     ` Junio C Hamano
2023-09-09  6:16   ` [PATCH v2 3/6] trailer: split process_command_line_args into separate functions Linus Arver via GitGitGadget
2023-09-09  6:16   ` [PATCH v2 4/6] trailer: teach find_patch_start about --no-divider Linus Arver via GitGitGadget
2023-09-11 17:25     ` Junio C Hamano
2023-09-14  2:19       ` Linus Arver
2023-09-14  3:12         ` Junio C Hamano
2023-09-14  5:31           ` Linus Arver
2023-09-09  6:16   ` [PATCH v2 5/6] trailer: rename *_DEFAULT enums to *_UNSPECIFIED Linus Arver via GitGitGadget
2023-09-11 18:54     ` Junio C Hamano
2023-09-14  2:41       ` Linus Arver
2023-09-14  3:16         ` Junio C Hamano
2023-09-22 18:23           ` Linus Arver
2023-09-22 19:48             ` Junio C Hamano
2023-09-26  5:30               ` Linus Arver
2023-09-09  6:16   ` [PATCH v2 6/6] trailer: use offsets for trailer_start/trailer_end Linus Arver via GitGitGadget
2023-09-11 19:01     ` Junio C Hamano
2023-09-14  1:21       ` Linus Arver
2023-09-14  3:18         ` Linus Arver
2023-09-22 19:50   ` [PATCH v3 0/9] Trailer readability cleanups Linus Arver via GitGitGadget
2023-09-22 19:50     ` [PATCH v3 1/9] trailer: separate public from internal portion of trailer_iterator Linus Arver via GitGitGadget
2023-09-22 19:50     ` [PATCH v3 2/9] trailer: split process_input_file into separate pieces Linus Arver via GitGitGadget
2023-09-22 19:50     ` [PATCH v3 3/9] trailer: split process_command_line_args into separate functions Linus Arver via GitGitGadget
2023-09-22 19:50     ` [PATCH v3 4/9] trailer: rename *_DEFAULT enums to *_UNSPECIFIED Linus Arver via GitGitGadget
2023-09-22 19:50     ` [PATCH v3 5/9] commit: ignore_non_trailer computes number of bytes to ignore Linus Arver via GitGitGadget
2023-09-22 19:50     ` [PATCH v3 6/9] trailer: find the end of the log message Linus Arver via GitGitGadget
2023-09-22 19:50     ` [PATCH v3 7/9] trailer: use offsets for trailer_start/trailer_end Linus Arver via GitGitGadget
2023-09-22 19:50     ` [PATCH v3 8/9] trailer: only use trailer_block_* variables if trailers were found Linus Arver via GitGitGadget
2023-09-22 19:50     ` [PATCH v3 9/9] trailer: make stack variable names match field names Linus Arver via GitGitGadget
2023-09-22 22:47     ` [PATCH v3 0/9] Trailer readability cleanups Junio C Hamano
2023-09-22 23:13       ` Linus Arver
2023-09-23  0:48         ` Junio C Hamano
2023-09-26  5:40           ` Linus Arver
2023-09-26  6:22     ` [PATCH v4 0/4] " Linus Arver via GitGitGadget
2023-09-26  6:22       ` [PATCH v4 1/4] commit: ignore_non_trailer computes number of bytes to ignore Linus Arver via GitGitGadget
2023-09-26  6:22       ` [PATCH v4 2/4] trailer: find the end of the log message Linus Arver via GitGitGadget
2023-09-28 23:16         ` Jonathan Tan
2023-10-20  0:24           ` Linus Arver
2023-10-20  0:36             ` Junio C Hamano
2023-09-26  6:22       ` [PATCH v4 3/4] trailer: use offsets for trailer_start/trailer_end Linus Arver via GitGitGadget
2023-09-26  6:22       ` [PATCH v4 4/4] trailer: only use trailer_block_* variables if trailers were found Linus Arver via GitGitGadget
2023-10-20 19:01       ` [PATCH v5 0/3] Trailer readability cleanups Linus Arver via GitGitGadget
2023-10-20 19:01         ` [PATCH v5 1/3] commit: ignore_non_trailer computes number of bytes to ignore Linus Arver via GitGitGadget
2023-10-20 19:01         ` [PATCH v5 2/3] trailer: find the end of the log message Linus Arver via GitGitGadget
2023-10-20 21:29           ` Junio C Hamano
2023-12-29  6:42             ` Linus Arver
2023-12-29 21:03               ` Linus Arver
2023-10-20 19:01         ` [PATCH v5 3/3] trailer: use offsets for trailer_start/trailer_end Linus Arver via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=kl6l5y5qa34v.fsf@chooglen-macbookpro.roam.corp.google.com \
    --to=chooglen@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=linusa@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).