From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Chris Torek <chris.torek@gmail.com>,
Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 0/6] banned: mark `strok()` as banned
Date: Tue, 18 Apr 2023 15:18:40 -0400 [thread overview]
Message-ID: <cover.1681845518.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1681428696.git.me@ttaylorr.com>
Here is a medium-sized reroll of my series to add `strtok()` to the list
of banned functions.
Some notable things that have changed since the first round[1] include:
- The behavior of `string_list_split_in_place()` is unchanged, and
`string_list_split_in_place_multi()` now has the more sane behavior
of removing all delimiter characters from its output, similar to
strtok().
- We now guard against dangerous cases of assigning `list->nr` with a
new `string_list_setlen()` function.
- More test cases are added in t0063.
- `strtok_r()` is no longer on the banned list, and `strtok()` now
*is* on the list, after I forgot to remove a stray `#if 0` left over
from debugging.
As always, a range-diff is included below for convenience. Thanks in
advance for your review!
[1]: https://lore.kernel.org/git/cover.1681428696.git.me@ttaylorr.com/
Taylor Blau (6):
string-list: introduce `string_list_split_in_place_multi()`
string-list: introduce `string_list_setlen()`
t/helper/test-hashmap.c: avoid using `strtok()`
t/helper/test-oidmap.c: avoid using `strtok()`
t/helper/test-json-writer.c: avoid using `strtok()`
banned.h: mark `strtok()` as banned
banned.h | 2 +
string-list.c | 35 ++++++++++--
string-list.h | 17 ++++++
t/helper/test-hashmap.c | 30 ++++++++---
t/helper/test-json-writer.c | 76 ++++++++++++++++----------
t/helper/test-oidmap.c | 20 ++++---
t/helper/test-string-list.c | 15 ++++++
t/t0063-string-list.sh | 105 +++++++++++++++++++++++++-----------
8 files changed, 224 insertions(+), 76 deletions(-)
Range-diff against v1:
1: dda218c8c1 ! 1: 6658b231a9 string-list: introduce `string_list_split_in_place_multi()`
@@ Commit message
`_multi` variant splits the given string any any character appearing in
the string `delim`.
+ Like `strtok()`, the `_multi` variant skips past sequential delimiting
+ characters. For example:
+
+ string_list_split_in_place(&xs, xstrdup("foo::bar::baz"), ":", -1);
+
+ would place in `xs` the elements "foo", "bar", and "baz".
+
Instead of using `strchr(2)` to locate the first occurrence of the given
delimiter character, `string_list_split_in_place_multi()` uses
- `strpbrk(2)` to find the first occurrence of *any* character in the given
- delimiter string.
+ `strcspn(2)` to move past the initial segment of characters comprised of
+ any characters in the delimiting set.
+
+ When only a single delimiting character is provided, `strcspn(2)` has
+ equivalent performance to `strchr(2)`. Modern `strcspn(2)`
+ implementations treat an empty delimiter or the singleton delimiter as a
+ special case and fall back to calling strchrnul(). Both glibc[1] and
+ musl[2] implement `strcspn(2)` this way.
Since the `_multi` variant is a generalization of the original
implementation, reimplement `string_list_split_in_place()` in terms of
the more general function by providing a single-character string for the
list of accepted delimiters.
+ To avoid regressions, update t0063 in this patch as well. Any "common"
+ test cases (i.e., those that produce the same result whether you call
+ `string_list_split()` or `string_list_split_in_place_multi()`) are
+ grouped into a loop which is parameterized over the function to test.
+
+ Any cases which aren't common (of which there is one existing case, and
+ a handful of new ones added which are specific to the `_multi` variant)
+ are tested independently.
+
+ [1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35
+ [2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11
+
Signed-off-by: Taylor Blau <me@ttaylorr.com>
## string-list.c ##
@@ string-list.c: int string_list_split(struct string_list *list, const char *strin
-int string_list_split_in_place(struct string_list *list, char *string,
- int delim, int maxsplit)
-+int string_list_split_in_place_multi(struct string_list *list, char *string,
-+ const char *delim, int maxsplit)
++static int string_list_split_in_place_1(struct string_list *list, char *string,
++ const char *delim, int maxsplit,
++ unsigned runs)
{
int count = 0;
char *p = string, *end;
@@ string-list.c: int string_list_split_in_place(struct string_list *list, char *string,
+ die("internal error in string_list_split_in_place(): "
+ "list->strdup_strings must not be set");
+ for (;;) {
++ if (runs)
++ p += strspn(p, delim);
++
+ count++;
+ if (maxsplit >= 0 && count > maxsplit) {
string_list_append(list, p);
return count;
}
- end = strchr(p, delim);
-+ end = strpbrk(p, delim);
- if (end) {
+- if (end) {
++ end = p + strcspn(p, delim);
++ if (end && *end) {
*end = '\0';
string_list_append(list, p);
+ p = end + 1;
@@ string-list.c: int string_list_split_in_place(struct string_list *list, char *string,
}
}
}
+
++int string_list_split_in_place_multi(struct string_list *list, char *string,
++ const char *delim, int maxsplit)
++{
++ return string_list_split_in_place_1(list, string, delim, maxsplit, 1);
++}
++
+int string_list_split_in_place(struct string_list *list, char *string,
+ int delim, int maxsplit)
+{
+ char delim_s[2] = { delim, 0 };
+
-+ return string_list_split_in_place_multi(list, string, delim_s,
-+ maxsplit);
++ return string_list_split_in_place_1(list, string, delim_s, maxsplit, 0);
+}
## string-list.h ##
@@ string-list.h: int string_list_split(struct string_list *list, const char *strin
+ *
+ * The "_multi" variant splits the given string on any character
+ * appearing in "delim", and the non-"_multi" variant splits only on the
-+ * given character.
++ * given character. The "_multi" variant behaves like `strtok()` where
++ * no element contains the delimiting byte(s).
*/
+int string_list_split_in_place_multi(struct string_list *list, char *string,
+ const char *delim, int maxsplit);
int string_list_split_in_place(struct string_list *list, char *string,
int delim, int maxsplit);
#endif /* STRING_LIST_H */
+
+ ## t/helper/test-string-list.c ##
+@@ t/helper/test-string-list.c: int cmd__string_list(int argc, const char **argv)
+ return 0;
+ }
+
++ if (argc == 5 && !strcmp(argv[1], "split_in_place_multi")) {
++ struct string_list list = STRING_LIST_INIT_NODUP;
++ int i;
++ char *s = xstrdup(argv[2]);
++ const char *delim = argv[3];
++ int maxsplit = atoi(argv[4]);
++
++ i = string_list_split_in_place_multi(&list, s, delim, maxsplit);
++ printf("%d\n", i);
++ write_list(&list);
++ string_list_clear(&list, 0);
++ free(s);
++ return 0;
++ }
++
+ if (argc == 4 && !strcmp(argv[1], "filter")) {
+ /*
+ * Retain only the items that have the specified prefix.
+
+ ## t/t0063-string-list.sh ##
+@@ t/t0063-string-list.sh: test_split () {
+ "
+ }
+
+-test_split "foo:bar:baz" ":" "-1" <<EOF
+-3
+-[0]: "foo"
+-[1]: "bar"
+-[2]: "baz"
+-EOF
++test_split_in_place_multi () {
++ cat >expected &&
++ test_expect_success "split_in_place_multi $1 at $2, max $3" "
++ test-tool string-list split_in_place_multi '$1' '$2' '$3' >actual &&
++ test_cmp expected actual
++ "
++}
+
+-test_split "foo:bar:baz" ":" "0" <<EOF
+-1
+-[0]: "foo:bar:baz"
+-EOF
++for test_fn in test_split test_split_in_place_multi
++do
++ $test_fn "foo:bar:baz" ":" "-1" <<-\EOF
++ 3
++ [0]: "foo"
++ [1]: "bar"
++ [2]: "baz"
++ EOF
+
+-test_split "foo:bar:baz" ":" "1" <<EOF
+-2
+-[0]: "foo"
+-[1]: "bar:baz"
+-EOF
++ $test_fn "foo:bar:baz" ":" "0" <<-\EOF
++ 1
++ [0]: "foo:bar:baz"
++ EOF
+
+-test_split "foo:bar:baz" ":" "2" <<EOF
+-3
+-[0]: "foo"
+-[1]: "bar"
+-[2]: "baz"
+-EOF
++ $test_fn "foo:bar:baz" ":" "1" <<-\EOF
++ 2
++ [0]: "foo"
++ [1]: "bar:baz"
++ EOF
+
+-test_split "foo:bar:" ":" "-1" <<EOF
+-3
+-[0]: "foo"
+-[1]: "bar"
+-[2]: ""
+-EOF
++ $test_fn "foo:bar:baz" ":" "2" <<-\EOF
++ 3
++ [0]: "foo"
++ [1]: "bar"
++ [2]: "baz"
++ EOF
+
+-test_split "" ":" "-1" <<EOF
+-1
+-[0]: ""
+-EOF
++ $test_fn "foo:bar:" ":" "-1" <<-\EOF
++ 3
++ [0]: "foo"
++ [1]: "bar"
++ [2]: ""
++ EOF
++
++ $test_fn "" ":" "-1" <<-\EOF
++ 1
++ [0]: ""
++ EOF
++done
+
+ test_split ":" ":" "-1" <<EOF
+ 2
+@@ t/t0063-string-list.sh: test_split ":" ":" "-1" <<EOF
+ [1]: ""
+ EOF
+
++test_split_in_place_multi "foo:;:bar:;:baz" ":;" "-1" <<-\EOF
++3
++[0]: "foo"
++[1]: "bar"
++[2]: "baz"
++EOF
++
++test_split_in_place_multi "foo:;:bar:;:baz" ":;" "0" <<-\EOF
++1
++[0]: "foo:;:bar:;:baz"
++EOF
++
++test_split_in_place_multi "foo:;:bar:;:baz" ":;" "1" <<-\EOF
++2
++[0]: "foo"
++[1]: "bar:;:baz"
++EOF
++
++test_split_in_place_multi "foo:;:bar:;:baz" ":;" "2" <<-\EOF
++3
++[0]: "foo"
++[1]: "bar"
++[2]: "baz"
++EOF
++
++test_split_in_place_multi "foo:;:bar:;:" ":;" "-1" <<-\EOF
++3
++[0]: "foo"
++[1]: "bar"
++[2]: ""
++EOF
++
+ test_expect_success "test filter_string_list" '
+ test "x-" = "x$(test-tool string-list filter - y)" &&
+ test "x-" = "x$(test-tool string-list filter no y)" &&
-: ---------- > 2: 2a20ad8bc5 string-list: introduce `string_list_setlen()`
2: 0f199468a3 ! 3: 0ae07dec36 t/helper/test-hashmap.c: avoid using `strtok()`
@@ t/helper/test-hashmap.c: int cmd__hashmap(int argc, const char **argv)
+ * By doing so, we'll instead overwrite the existing
+ * entries and avoid re-allocating.
+ */
-+ parts.nr = 0;
++ string_list_setlen(&parts, 0);
/* break line into command and up to two parameters */
- cmd = strtok(line.buf, DELIM);
+ string_list_split_in_place_multi(&parts, line.buf, DELIM, 2);
3: 135222d04e ! 4: a659431e9c t/helper/test-oidmap.c: avoid using `strtok()`
@@ t/helper/test-oidmap.c: int cmd__oidmap(int argc UNUSED, const char **argv UNUSE
struct object_id oid;
+ /* see the comment in cmd__hashmap() */
-+ parts.nr = 0;
++ string_list_setlen(&parts, 0);
/* break line into command and up to two parameters */
- cmd = strtok(line.buf, DELIM);
+ string_list_split_in_place_multi(&parts, line.buf, DELIM, 2);
4: ae29d4d892 ! 5: fc6cd23698 t/helper/test-json-writer.c: avoid using `strtok()`
@@ t/helper/test-json-writer.c: static int scripted(void)
- verb = strtok(line, " ");
+ /* see the comment in cmd__hashmap() */
-+ parts.nr = 0;
++ string_list_setlen(&parts, 0);
+ /* break line into command and zero or more tokens */
+ string_list_split_in_place(&parts, line, ' ', -1);
+
5: 1d955f8bc6 ! 6: 56d2318a6d banned.h: mark `strtok()`, `strtok_r()` as banned
@@ Metadata
Author: Taylor Blau <me@ttaylorr.com>
## Commit message ##
- banned.h: mark `strtok()`, `strtok_r()` as banned
+ banned.h: mark `strtok()` as banned
`strtok_r()` is reentrant, but `strtok()` is not, meaning that using it
is not thread-safe.
- We could ban `strtok()` and force callers to use its reentrant
- counterpart, but there are a few drawbacks to doing so:
+ `strtok()` has a couple of drawbacks that make it undesirable to have
+ any new instances. In addition to being thread-unsafe, it also
+ encourages confusing data flows, where `strtok()` may be called from
+ multiple functions with its first argument as NULL, making it unclear
+ from the immediate context which string is being tokenized.
- - `strtok_r()` forces the caller to maintain an extra string pointer
- to pass as its `saveptr` value
+ Now that we have removed all instances of `strtok()` from the tree,
+ let's ban `strtok()` to avoid introducing new ones in the future. If new
+ callers should arise, they can either use:
- - `strtok_r()` also requires that its `saveptr` value be unmodified
- between calls.
+ - `string_list_split_in_place()`,
+ - `string_list_split_in_place_multi()`, or
+ - `strtok_r()`.
- - `strtok()` (and by extension, `strtok_r()`) is confusing when used
- across multiple functions, since the caller is supposed to pass NULL
- as its first argument after the first call. This makes it difficult
- to determine what string is actually being tokenized without clear
- dataflow.
+ Callers are encouraged to use either of the string_list functions when
+ appropriate over `strtok_r()`, since the latter suffers from the same
+ confusing data-flow problem as `strtok()` does.
- So while we could ban only `strtok()`, there really is no good reason to
- use either when callers could instead use the much friendlier
- `string_list_split_in_place()` API, which avoids the above issues.
+ But callers may prefer `strtok_r()` when the number of tokens in a given
+ string is unknown, and they want to split and process them one at a
+ time, so `strtok_r()` is left off the banned.h list.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
@@ banned.h
#define strncpy(x,y,n) BANNED(strncpy)
#undef strncat
#define strncat(x,y,n) BANNED(strncat)
-+#if 0
+#undef strtok
+#define strtok(x,y) BANNED(strtok)
-+#undef strtok_r
-+#define strtok_r(x,y,z) BANNED(strtok_r)
-+#endif
#undef sprintf
#undef vsprintf
--
2.40.0.358.g56d2318a6d
next prev parent reply other threads:[~2023-04-18 19:20 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-13 23:31 [PATCH 0/5] banned: mark `strok()`, `strtok_r()` as banned Taylor Blau
2023-04-13 23:31 ` [PATCH 1/5] string-list: introduce `string_list_split_in_place_multi()` Taylor Blau
2023-04-18 10:10 ` Jeff King
2023-04-18 17:08 ` Taylor Blau
2023-04-13 23:31 ` [PATCH 2/5] t/helper/test-hashmap.c: avoid using `strtok()` Taylor Blau
2023-04-18 10:23 ` Jeff King
2023-04-18 18:06 ` Taylor Blau
2023-04-13 23:31 ` [PATCH 3/5] t/helper/test-oidmap.c: " Taylor Blau
2023-04-13 23:31 ` [PATCH 4/5] t/helper/test-json-writer.c: " Taylor Blau
2023-04-13 23:31 ` [PATCH 5/5] banned.h: mark `strtok()`, `strtok_r()` as banned Taylor Blau
2023-04-14 1:39 ` Junio C Hamano
2023-04-14 2:08 ` Chris Torek
2023-04-14 13:41 ` Taylor Blau
2023-04-18 19:18 ` Taylor Blau [this message]
2023-04-18 19:18 ` [PATCH v2 1/6] string-list: introduce `string_list_split_in_place_multi()` Taylor Blau
2023-04-18 19:39 ` Junio C Hamano
2023-04-18 20:54 ` Taylor Blau
2023-04-22 11:12 ` Jeff King
2023-04-22 15:53 ` René Scharfe
2023-04-23 0:35 ` Jeff King
2023-04-24 16:24 ` Junio C Hamano
2023-04-23 2:38 ` [PATCH v2 1/6] string-list: introduce `string_list_split_in_place_multi()`t Taylor Blau
2023-04-23 2:40 ` Taylor Blau
2023-04-18 19:18 ` [PATCH v2 2/6] string-list: introduce `string_list_setlen()` Taylor Blau
2023-04-22 11:14 ` Jeff King
2023-04-18 19:18 ` [PATCH v2 3/6] t/helper/test-hashmap.c: avoid using `strtok()` Taylor Blau
2023-04-22 11:16 ` Jeff King
2023-04-24 21:19 ` Taylor Blau
2023-04-18 19:18 ` [PATCH v2 4/6] t/helper/test-oidmap.c: " Taylor Blau
2023-04-18 19:18 ` [PATCH v2 5/6] t/helper/test-json-writer.c: " Taylor Blau
2023-04-18 19:18 ` [PATCH v2 6/6] banned.h: mark `strtok()` as banned Taylor Blau
2023-04-24 22:20 ` [PATCH v3 0/6] banned: mark `strok()`, `strtok_r()` " Taylor Blau
2023-04-24 22:20 ` [PATCH v3 1/6] string-list: multi-delimiter `string_list_split_in_place()` Taylor Blau
2023-04-24 22:20 ` [PATCH v3 2/6] string-list: introduce `string_list_setlen()` Taylor Blau
2023-04-25 6:21 ` Jeff King
2023-04-25 21:00 ` Taylor Blau
2023-04-24 22:20 ` [PATCH v3 3/6] t/helper/test-hashmap.c: avoid using `strtok()` Taylor Blau
2023-04-24 22:20 ` [PATCH v3 4/6] t/helper/test-oidmap.c: " Taylor Blau
2023-04-24 22:20 ` [PATCH v3 5/6] t/helper/test-json-writer.c: " Taylor Blau
2023-04-25 13:57 ` Jeff Hostetler
2023-04-24 22:20 ` [PATCH v3 6/6] banned.h: mark `strtok()` and `strtok_r()` as banned Taylor Blau
2023-04-24 22:25 ` Chris Torek
2023-04-24 23:00 ` Taylor Blau
2023-04-25 6:26 ` Jeff King
2023-04-25 21:02 ` Taylor Blau
2023-04-25 6:27 ` [PATCH v3 0/6] banned: mark `strok()`, " Jeff King
2023-04-25 21:03 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1681845518.git.me@ttaylorr.com \
--to=me@ttaylorr.com \
--cc=chris.torek@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).