Git Mailing List Archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Chris Torek <chris.torek@gmail.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 0/6] banned: mark `strok()` as banned
Date: Tue, 18 Apr 2023 15:18:40 -0400	[thread overview]
Message-ID: <cover.1681845518.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1681428696.git.me@ttaylorr.com>

Here is a medium-sized reroll of my series to add `strtok()` to the list
of banned functions.

Some notable things that have changed since the first round[1] include:

  - The behavior of `string_list_split_in_place()` is unchanged, and
    `string_list_split_in_place_multi()` now has the more sane behavior
    of removing all delimiter characters from its output, similar to
    strtok().

  - We now guard against dangerous cases of assigning `list->nr` with a
    new `string_list_setlen()` function.

  - More test cases are added in t0063.

  - `strtok_r()` is no longer on the banned list, and `strtok()` now
    *is* on the list, after I forgot to remove a stray `#if 0` left over
    from debugging.

As always, a range-diff is included below for convenience. Thanks in
advance for your review!

[1]: https://lore.kernel.org/git/cover.1681428696.git.me@ttaylorr.com/

Taylor Blau (6):
  string-list: introduce `string_list_split_in_place_multi()`
  string-list: introduce `string_list_setlen()`
  t/helper/test-hashmap.c: avoid using `strtok()`
  t/helper/test-oidmap.c: avoid using `strtok()`
  t/helper/test-json-writer.c: avoid using `strtok()`
  banned.h: mark `strtok()` as banned

 banned.h                    |   2 +
 string-list.c               |  35 ++++++++++--
 string-list.h               |  17 ++++++
 t/helper/test-hashmap.c     |  30 ++++++++---
 t/helper/test-json-writer.c |  76 ++++++++++++++++----------
 t/helper/test-oidmap.c      |  20 ++++---
 t/helper/test-string-list.c |  15 ++++++
 t/t0063-string-list.sh      | 105 +++++++++++++++++++++++++-----------
 8 files changed, 224 insertions(+), 76 deletions(-)

Range-diff against v1:
1:  dda218c8c1 ! 1:  6658b231a9 string-list: introduce `string_list_split_in_place_multi()`
    @@ Commit message
         `_multi` variant splits the given string any any character appearing in
         the string `delim`.
     
    +    Like `strtok()`, the `_multi` variant skips past sequential delimiting
    +    characters. For example:
    +
    +        string_list_split_in_place(&xs, xstrdup("foo::bar::baz"), ":", -1);
    +
    +    would place in `xs` the elements "foo", "bar", and "baz".
    +
         Instead of using `strchr(2)` to locate the first occurrence of the given
         delimiter character, `string_list_split_in_place_multi()` uses
    -    `strpbrk(2)` to find the first occurrence of *any* character in the given
    -    delimiter string.
    +    `strcspn(2)` to move past the initial segment of characters comprised of
    +    any characters in the delimiting set.
    +
    +    When only a single delimiting character is provided, `strcspn(2)` has
    +    equivalent performance to `strchr(2)`. Modern `strcspn(2)`
    +    implementations treat an empty delimiter or the singleton delimiter as a
    +    special case and fall back to calling strchrnul(). Both glibc[1] and
    +    musl[2] implement `strcspn(2)` this way.
     
         Since the `_multi` variant is a generalization of the original
         implementation, reimplement `string_list_split_in_place()` in terms of
         the more general function by providing a single-character string for the
         list of accepted delimiters.
     
    +    To avoid regressions, update t0063 in this patch as well. Any "common"
    +    test cases (i.e., those that produce the same result whether you call
    +    `string_list_split()` or `string_list_split_in_place_multi()`) are
    +    grouped into a loop which is parameterized over the function to test.
    +
    +    Any cases which aren't common (of which there is one existing case, and
    +    a handful of new ones added which are specific to the `_multi` variant)
    +    are tested independently.
    +
    +    [1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35
    +    [2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11
    +
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## string-list.c ##
    @@ string-list.c: int string_list_split(struct string_list *list, const char *strin
      
     -int string_list_split_in_place(struct string_list *list, char *string,
     -			       int delim, int maxsplit)
    -+int string_list_split_in_place_multi(struct string_list *list, char *string,
    -+				     const char *delim, int maxsplit)
    ++static int string_list_split_in_place_1(struct string_list *list, char *string,
    ++					const char *delim, int maxsplit,
    ++					unsigned runs)
      {
      	int count = 0;
      	char *p = string, *end;
     @@ string-list.c: int string_list_split_in_place(struct string_list *list, char *string,
    + 		die("internal error in string_list_split_in_place(): "
    + 		    "list->strdup_strings must not be set");
    + 	for (;;) {
    ++		if (runs)
    ++			p += strspn(p, delim);
    ++
    + 		count++;
    + 		if (maxsplit >= 0 && count > maxsplit) {
      			string_list_append(list, p);
      			return count;
      		}
     -		end = strchr(p, delim);
    -+		end = strpbrk(p, delim);
    - 		if (end) {
    +-		if (end) {
    ++		end = p + strcspn(p, delim);
    ++		if (end && *end) {
      			*end = '\0';
      			string_list_append(list, p);
    + 			p = end + 1;
     @@ string-list.c: int string_list_split_in_place(struct string_list *list, char *string,
      		}
      	}
      }
     +
    ++int string_list_split_in_place_multi(struct string_list *list, char *string,
    ++				     const char *delim, int maxsplit)
    ++{
    ++	return string_list_split_in_place_1(list, string, delim, maxsplit, 1);
    ++}
    ++
     +int string_list_split_in_place(struct string_list *list, char *string,
     +			       int delim, int maxsplit)
     +{
     +	char delim_s[2] = { delim, 0 };
     +
    -+	return string_list_split_in_place_multi(list, string, delim_s,
    -+						maxsplit);
    ++	return string_list_split_in_place_1(list, string, delim_s, maxsplit, 0);
     +}
     
      ## string-list.h ##
    @@ string-list.h: int string_list_split(struct string_list *list, const char *strin
     + *
     + * The "_multi" variant splits the given string on any character
     + * appearing in "delim", and the non-"_multi" variant splits only on the
    -+ * given character.
    ++ * given character. The "_multi" variant behaves like `strtok()` where
    ++ * no element contains the delimiting byte(s).
       */
     +int string_list_split_in_place_multi(struct string_list *list, char *string,
     +				     const char *delim, int maxsplit);
      int string_list_split_in_place(struct string_list *list, char *string,
      			       int delim, int maxsplit);
      #endif /* STRING_LIST_H */
    +
    + ## t/helper/test-string-list.c ##
    +@@ t/helper/test-string-list.c: int cmd__string_list(int argc, const char **argv)
    + 		return 0;
    + 	}
    + 
    ++	if (argc == 5 && !strcmp(argv[1], "split_in_place_multi")) {
    ++		struct string_list list = STRING_LIST_INIT_NODUP;
    ++		int i;
    ++		char *s = xstrdup(argv[2]);
    ++		const char *delim = argv[3];
    ++		int maxsplit = atoi(argv[4]);
    ++
    ++		i = string_list_split_in_place_multi(&list, s, delim, maxsplit);
    ++		printf("%d\n", i);
    ++		write_list(&list);
    ++		string_list_clear(&list, 0);
    ++		free(s);
    ++		return 0;
    ++	}
    ++
    + 	if (argc == 4 && !strcmp(argv[1], "filter")) {
    + 		/*
    + 		 * Retain only the items that have the specified prefix.
    +
    + ## t/t0063-string-list.sh ##
    +@@ t/t0063-string-list.sh: test_split () {
    + 	"
    + }
    + 
    +-test_split "foo:bar:baz" ":" "-1" <<EOF
    +-3
    +-[0]: "foo"
    +-[1]: "bar"
    +-[2]: "baz"
    +-EOF
    ++test_split_in_place_multi () {
    ++	cat >expected &&
    ++	test_expect_success "split_in_place_multi $1 at $2, max $3" "
    ++		test-tool string-list split_in_place_multi '$1' '$2' '$3' >actual &&
    ++		test_cmp expected actual
    ++	"
    ++}
    + 
    +-test_split "foo:bar:baz" ":" "0" <<EOF
    +-1
    +-[0]: "foo:bar:baz"
    +-EOF
    ++for test_fn in test_split test_split_in_place_multi
    ++do
    ++	$test_fn "foo:bar:baz" ":" "-1" <<-\EOF
    ++	3
    ++	[0]: "foo"
    ++	[1]: "bar"
    ++	[2]: "baz"
    ++	EOF
    + 
    +-test_split "foo:bar:baz" ":" "1" <<EOF
    +-2
    +-[0]: "foo"
    +-[1]: "bar:baz"
    +-EOF
    ++	$test_fn "foo:bar:baz" ":" "0" <<-\EOF
    ++	1
    ++	[0]: "foo:bar:baz"
    ++	EOF
    + 
    +-test_split "foo:bar:baz" ":" "2" <<EOF
    +-3
    +-[0]: "foo"
    +-[1]: "bar"
    +-[2]: "baz"
    +-EOF
    ++	$test_fn "foo:bar:baz" ":" "1" <<-\EOF
    ++	2
    ++	[0]: "foo"
    ++	[1]: "bar:baz"
    ++	EOF
    + 
    +-test_split "foo:bar:" ":" "-1" <<EOF
    +-3
    +-[0]: "foo"
    +-[1]: "bar"
    +-[2]: ""
    +-EOF
    ++	$test_fn "foo:bar:baz" ":" "2" <<-\EOF
    ++	3
    ++	[0]: "foo"
    ++	[1]: "bar"
    ++	[2]: "baz"
    ++	EOF
    + 
    +-test_split "" ":" "-1" <<EOF
    +-1
    +-[0]: ""
    +-EOF
    ++	$test_fn "foo:bar:" ":" "-1" <<-\EOF
    ++	3
    ++	[0]: "foo"
    ++	[1]: "bar"
    ++	[2]: ""
    ++	EOF
    ++
    ++	$test_fn "" ":" "-1" <<-\EOF
    ++	1
    ++	[0]: ""
    ++	EOF
    ++done
    + 
    + test_split ":" ":" "-1" <<EOF
    + 2
    +@@ t/t0063-string-list.sh: test_split ":" ":" "-1" <<EOF
    + [1]: ""
    + EOF
    + 
    ++test_split_in_place_multi "foo:;:bar:;:baz" ":;" "-1" <<-\EOF
    ++3
    ++[0]: "foo"
    ++[1]: "bar"
    ++[2]: "baz"
    ++EOF
    ++
    ++test_split_in_place_multi "foo:;:bar:;:baz" ":;" "0" <<-\EOF
    ++1
    ++[0]: "foo:;:bar:;:baz"
    ++EOF
    ++
    ++test_split_in_place_multi "foo:;:bar:;:baz" ":;" "1" <<-\EOF
    ++2
    ++[0]: "foo"
    ++[1]: "bar:;:baz"
    ++EOF
    ++
    ++test_split_in_place_multi "foo:;:bar:;:baz" ":;" "2" <<-\EOF
    ++3
    ++[0]: "foo"
    ++[1]: "bar"
    ++[2]: "baz"
    ++EOF
    ++
    ++test_split_in_place_multi "foo:;:bar:;:" ":;" "-1" <<-\EOF
    ++3
    ++[0]: "foo"
    ++[1]: "bar"
    ++[2]: ""
    ++EOF
    ++
    + test_expect_success "test filter_string_list" '
    + 	test "x-" = "x$(test-tool string-list filter - y)" &&
    + 	test "x-" = "x$(test-tool string-list filter no y)" &&
-:  ---------- > 2:  2a20ad8bc5 string-list: introduce `string_list_setlen()`
2:  0f199468a3 ! 3:  0ae07dec36 t/helper/test-hashmap.c: avoid using `strtok()`
    @@ t/helper/test-hashmap.c: int cmd__hashmap(int argc, const char **argv)
     +		 * By doing so, we'll instead overwrite the existing
     +		 * entries and avoid re-allocating.
     +		 */
    -+		parts.nr = 0;
    ++		string_list_setlen(&parts, 0);
      		/* break line into command and up to two parameters */
     -		cmd = strtok(line.buf, DELIM);
     +		string_list_split_in_place_multi(&parts, line.buf, DELIM, 2);
3:  135222d04e ! 4:  a659431e9c t/helper/test-oidmap.c: avoid using `strtok()`
    @@ t/helper/test-oidmap.c: int cmd__oidmap(int argc UNUSED, const char **argv UNUSE
      		struct object_id oid;
      
     +		/* see the comment in cmd__hashmap() */
    -+		parts.nr = 0;
    ++		string_list_setlen(&parts, 0);
      		/* break line into command and up to two parameters */
     -		cmd = strtok(line.buf, DELIM);
     +		string_list_split_in_place_multi(&parts, line.buf, DELIM, 2);
4:  ae29d4d892 ! 5:  fc6cd23698 t/helper/test-json-writer.c: avoid using `strtok()`
    @@ t/helper/test-json-writer.c: static int scripted(void)
      
     -		verb = strtok(line, " ");
     +		/* see the comment in cmd__hashmap() */
    -+		parts.nr = 0;
    ++		string_list_setlen(&parts, 0);
     +		/* break line into command and zero or more tokens */
     +		string_list_split_in_place(&parts, line, ' ', -1);
     +
5:  1d955f8bc6 ! 6:  56d2318a6d banned.h: mark `strtok()`, `strtok_r()` as banned
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    banned.h: mark `strtok()`, `strtok_r()` as banned
    +    banned.h: mark `strtok()` as banned
     
         `strtok_r()` is reentrant, but `strtok()` is not, meaning that using it
         is not thread-safe.
     
    -    We could ban `strtok()` and force callers to use its reentrant
    -    counterpart, but there are a few drawbacks to doing so:
    +    `strtok()` has a couple of drawbacks that make it undesirable to have
    +    any new instances. In addition to being thread-unsafe, it also
    +    encourages confusing data flows, where `strtok()` may be called from
    +    multiple functions with its first argument as NULL, making it unclear
    +    from the immediate context which string is being tokenized.
     
    -      - `strtok_r()` forces the caller to maintain an extra string pointer
    -        to pass as its `saveptr` value
    +    Now that we have removed all instances of `strtok()` from the tree,
    +    let's ban `strtok()` to avoid introducing new ones in the future. If new
    +    callers should arise, they can either use:
     
    -      - `strtok_r()` also requires that its `saveptr` value be unmodified
    -        between calls.
    +      - `string_list_split_in_place()`,
    +      - `string_list_split_in_place_multi()`, or
    +      - `strtok_r()`.
     
    -      - `strtok()` (and by extension, `strtok_r()`) is confusing when used
    -        across multiple functions, since the caller is supposed to pass NULL
    -        as its first argument after the first call. This makes it difficult
    -        to determine what string is actually being tokenized without clear
    -        dataflow.
    +    Callers are encouraged to use either of the string_list functions when
    +    appropriate over `strtok_r()`, since the latter suffers from the same
    +    confusing data-flow problem as `strtok()` does.
     
    -    So while we could ban only `strtok()`, there really is no good reason to
    -    use either when callers could instead use the much friendlier
    -    `string_list_split_in_place()` API, which avoids the above issues.
    +    But callers may prefer `strtok_r()` when the number of tokens in a given
    +    string is unknown, and they want to split and process them one at a
    +    time, so `strtok_r()` is left off the banned.h list.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    @@ banned.h
      #define strncpy(x,y,n) BANNED(strncpy)
      #undef strncat
      #define strncat(x,y,n) BANNED(strncat)
    -+#if 0
     +#undef strtok
     +#define strtok(x,y) BANNED(strtok)
    -+#undef strtok_r
    -+#define strtok_r(x,y,z) BANNED(strtok_r)
    -+#endif
      
      #undef sprintf
      #undef vsprintf
-- 
2.40.0.358.g56d2318a6d

  parent reply	other threads:[~2023-04-18 19:20 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-13 23:31 [PATCH 0/5] banned: mark `strok()`, `strtok_r()` as banned Taylor Blau
2023-04-13 23:31 ` [PATCH 1/5] string-list: introduce `string_list_split_in_place_multi()` Taylor Blau
2023-04-18 10:10   ` Jeff King
2023-04-18 17:08     ` Taylor Blau
2023-04-13 23:31 ` [PATCH 2/5] t/helper/test-hashmap.c: avoid using `strtok()` Taylor Blau
2023-04-18 10:23   ` Jeff King
2023-04-18 18:06     ` Taylor Blau
2023-04-13 23:31 ` [PATCH 3/5] t/helper/test-oidmap.c: " Taylor Blau
2023-04-13 23:31 ` [PATCH 4/5] t/helper/test-json-writer.c: " Taylor Blau
2023-04-13 23:31 ` [PATCH 5/5] banned.h: mark `strtok()`, `strtok_r()` as banned Taylor Blau
2023-04-14  1:39   ` Junio C Hamano
2023-04-14  2:08     ` Chris Torek
2023-04-14 13:41     ` Taylor Blau
2023-04-18 19:18 ` Taylor Blau [this message]
2023-04-18 19:18   ` [PATCH v2 1/6] string-list: introduce `string_list_split_in_place_multi()` Taylor Blau
2023-04-18 19:39     ` Junio C Hamano
2023-04-18 20:54       ` Taylor Blau
2023-04-22 11:12     ` Jeff King
2023-04-22 15:53       ` René Scharfe
2023-04-23  0:35         ` Jeff King
2023-04-24 16:24           ` Junio C Hamano
2023-04-23  2:38       ` [PATCH v2 1/6] string-list: introduce `string_list_split_in_place_multi()`t Taylor Blau
2023-04-23  2:40         ` Taylor Blau
2023-04-18 19:18   ` [PATCH v2 2/6] string-list: introduce `string_list_setlen()` Taylor Blau
2023-04-22 11:14     ` Jeff King
2023-04-18 19:18   ` [PATCH v2 3/6] t/helper/test-hashmap.c: avoid using `strtok()` Taylor Blau
2023-04-22 11:16     ` Jeff King
2023-04-24 21:19       ` Taylor Blau
2023-04-18 19:18   ` [PATCH v2 4/6] t/helper/test-oidmap.c: " Taylor Blau
2023-04-18 19:18   ` [PATCH v2 5/6] t/helper/test-json-writer.c: " Taylor Blau
2023-04-18 19:18   ` [PATCH v2 6/6] banned.h: mark `strtok()` as banned Taylor Blau
2023-04-24 22:20 ` [PATCH v3 0/6] banned: mark `strok()`, `strtok_r()` " Taylor Blau
2023-04-24 22:20   ` [PATCH v3 1/6] string-list: multi-delimiter `string_list_split_in_place()` Taylor Blau
2023-04-24 22:20   ` [PATCH v3 2/6] string-list: introduce `string_list_setlen()` Taylor Blau
2023-04-25  6:21     ` Jeff King
2023-04-25 21:00       ` Taylor Blau
2023-04-24 22:20   ` [PATCH v3 3/6] t/helper/test-hashmap.c: avoid using `strtok()` Taylor Blau
2023-04-24 22:20   ` [PATCH v3 4/6] t/helper/test-oidmap.c: " Taylor Blau
2023-04-24 22:20   ` [PATCH v3 5/6] t/helper/test-json-writer.c: " Taylor Blau
2023-04-25 13:57     ` Jeff Hostetler
2023-04-24 22:20   ` [PATCH v3 6/6] banned.h: mark `strtok()` and `strtok_r()` as banned Taylor Blau
2023-04-24 22:25     ` Chris Torek
2023-04-24 23:00       ` Taylor Blau
2023-04-25  6:26     ` Jeff King
2023-04-25 21:02       ` Taylor Blau
2023-04-25  6:27   ` [PATCH v3 0/6] banned: mark `strok()`, " Jeff King
2023-04-25 21:03     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1681845518.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=chris.torek@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).