Git Mailing List Archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: "Jeff King" <peff@peff.net>,
	"Chris Torek" <chris.torek@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Jeff Hostetler" <jeffhostetler@github.com>,
	"René Scharfe" <l.s.r@web.de>
Subject: [PATCH v3 1/6] string-list: multi-delimiter `string_list_split_in_place()`
Date: Mon, 24 Apr 2023 18:20:10 -0400	[thread overview]
Message-ID: <59d3e778b6c80b6bd10c1126cdc3f7e86558f3e7.1682374789.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1682374789.git.me@ttaylorr.com>

Enhance `string_list_split_in_place()` to accept multiple characters as
delimiters instead of a single character.

Instead of using `strchr(2)` to locate the first occurrence of the given
delimiter character, `string_list_split_in_place_multi()` uses
`strcspn(2)` to move past the initial segment of characters comprised of
any characters in the delimiting set.

When only a single delimiting character is provided, `strpbrk(2)` (which
is implemented with `strcspn(2)`) has equivalent performance to
`strchr(2)`. Modern `strcspn(2)` implementations treat an empty
delimiter or the singleton delimiter as a special case and fall back to
calling strchrnul(). Both glibc[1] and musl[2] implement `strcspn(2)`
this way.

This change is one step to removing `strtok(2)` from the tree. Note that
`string_list_split_in_place()` is not a strict replacement for
`strtok()`, since it will happily turn sequential delimiter characters
into empty entries in the resulting string_list. For example:

    string_list_split_in_place(&xs, "foo:;:bar:;:baz", ":;", -1)

would yield a string list of:

    ["foo", "", "", "bar", "", "", "baz"]

Callers that wish to emulate the behavior of strtok(2) more directly
should call `string_list_remove_empty_items()` after splitting.

To avoid regressions for the new multi-character delimter cases, update
t0063 in this patch as well.

[1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35
[2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/gc.c                |  4 +--
 diff.c                      |  2 +-
 notes.c                     |  2 +-
 refs/packed-backend.c       |  2 +-
 string-list.c               |  4 +--
 string-list.h               |  2 +-
 t/helper/test-string-list.c |  4 +--
 t/t0063-string-list.sh      | 51 +++++++++++++++++++++++++++++++++++++
 8 files changed, 61 insertions(+), 10 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index edd98d35a5..f68e976704 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1687,11 +1687,11 @@ static int get_schedule_cmd(const char **cmd, int *is_available)
 	if (is_available)
 		*is_available = 0;
 
-	string_list_split_in_place(&list, testing, ',', -1);
+	string_list_split_in_place(&list, testing, ",", -1);
 	for_each_string_list_item(item, &list) {
 		struct string_list pair = STRING_LIST_INIT_NODUP;
 
-		if (string_list_split_in_place(&pair, item->string, ':', 2) != 2)
+		if (string_list_split_in_place(&pair, item->string, ":", 2) != 2)
 			continue;
 
 		if (!strcmp(*cmd, pair.items[0].string)) {
diff --git a/diff.c b/diff.c
index 78b0fdd8ca..378a0248e1 100644
--- a/diff.c
+++ b/diff.c
@@ -134,7 +134,7 @@ static int parse_dirstat_params(struct diff_options *options, const char *params
 	int i;
 
 	if (*params_copy)
-		string_list_split_in_place(&params, params_copy, ',', -1);
+		string_list_split_in_place(&params, params_copy, ",", -1);
 	for (i = 0; i < params.nr; i++) {
 		const char *p = params.items[i].string;
 		if (!strcmp(p, "changes")) {
diff --git a/notes.c b/notes.c
index 45fb7f22d1..eee806f626 100644
--- a/notes.c
+++ b/notes.c
@@ -963,7 +963,7 @@ void string_list_add_refs_from_colon_sep(struct string_list *list,
 	char *globs_copy = xstrdup(globs);
 	int i;
 
-	string_list_split_in_place(&split, globs_copy, ':', -1);
+	string_list_split_in_place(&split, globs_copy, ":", -1);
 	string_list_remove_empty_items(&split, 0);
 
 	for (i = 0; i < split.nr; i++)
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 1eba1015dd..cc903baa7e 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -650,7 +650,7 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs)
 					 snapshot->buf,
 					 snapshot->eof - snapshot->buf);
 
-		string_list_split_in_place(&traits, p, ' ', -1);
+		string_list_split_in_place(&traits, p, " ", -1);
 
 		if (unsorted_string_list_has_string(&traits, "fully-peeled"))
 			snapshot->peeled = PEELED_FULLY;
diff --git a/string-list.c b/string-list.c
index db473f273e..5f5b60fe1c 100644
--- a/string-list.c
+++ b/string-list.c
@@ -301,7 +301,7 @@ int string_list_split(struct string_list *list, const char *string,
 }
 
 int string_list_split_in_place(struct string_list *list, char *string,
-			       int delim, int maxsplit)
+			       const char *delim, int maxsplit)
 {
 	int count = 0;
 	char *p = string, *end;
@@ -315,7 +315,7 @@ int string_list_split_in_place(struct string_list *list, char *string,
 			string_list_append(list, p);
 			return count;
 		}
-		end = strchr(p, delim);
+		end = strpbrk(p, delim);
 		if (end) {
 			*end = '\0';
 			string_list_append(list, p);
diff --git a/string-list.h b/string-list.h
index c7b0d5d000..77854840f7 100644
--- a/string-list.h
+++ b/string-list.h
@@ -270,5 +270,5 @@ int string_list_split(struct string_list *list, const char *string,
  * list->strdup_strings must *not* be set.
  */
 int string_list_split_in_place(struct string_list *list, char *string,
-			       int delim, int maxsplit);
+			       const char *delim, int maxsplit);
 #endif /* STRING_LIST_H */
diff --git a/t/helper/test-string-list.c b/t/helper/test-string-list.c
index 2123dda85b..63df88575c 100644
--- a/t/helper/test-string-list.c
+++ b/t/helper/test-string-list.c
@@ -62,7 +62,7 @@ int cmd__string_list(int argc, const char **argv)
 		struct string_list list = STRING_LIST_INIT_NODUP;
 		int i;
 		char *s = xstrdup(argv[2]);
-		int delim = *argv[3];
+		const char *delim = argv[3];
 		int maxsplit = atoi(argv[4]);
 
 		i = string_list_split_in_place(&list, s, delim, maxsplit);
@@ -111,7 +111,7 @@ int cmd__string_list(int argc, const char **argv)
 		 */
 		if (sb.len && sb.buf[sb.len - 1] == '\n')
 			strbuf_setlen(&sb, sb.len - 1);
-		string_list_split_in_place(&list, sb.buf, '\n', -1);
+		string_list_split_in_place(&list, sb.buf, "\n", -1);
 
 		string_list_sort(&list);
 
diff --git a/t/t0063-string-list.sh b/t/t0063-string-list.sh
index 46d4839194..1fee6d9010 100755
--- a/t/t0063-string-list.sh
+++ b/t/t0063-string-list.sh
@@ -18,6 +18,14 @@ test_split () {
 	"
 }
 
+test_split_in_place() {
+	cat >expected &&
+	test_expect_success "split (in place) $1 at $2, max $3" "
+		test-tool string-list split_in_place '$1' '$2' '$3' >actual &&
+		test_cmp expected actual
+	"
+}
+
 test_split "foo:bar:baz" ":" "-1" <<EOF
 3
 [0]: "foo"
@@ -61,6 +69,49 @@ test_split ":" ":" "-1" <<EOF
 [1]: ""
 EOF
 
+test_split_in_place "foo:;:bar:;:baz:;:" ":;" "-1" <<EOF
+10
+[0]: "foo"
+[1]: ""
+[2]: ""
+[3]: "bar"
+[4]: ""
+[5]: ""
+[6]: "baz"
+[7]: ""
+[8]: ""
+[9]: ""
+EOF
+
+test_split_in_place "foo:;:bar:;:baz" ":;" "0" <<EOF
+1
+[0]: "foo:;:bar:;:baz"
+EOF
+
+test_split_in_place "foo:;:bar:;:baz" ":;" "1" <<EOF
+2
+[0]: "foo"
+[1]: ";:bar:;:baz"
+EOF
+
+test_split_in_place "foo:;:bar:;:baz" ":;" "2" <<EOF
+3
+[0]: "foo"
+[1]: ""
+[2]: ":bar:;:baz"
+EOF
+
+test_split_in_place "foo:;:bar:;:" ":;" "-1" <<EOF
+7
+[0]: "foo"
+[1]: ""
+[2]: ""
+[3]: "bar"
+[4]: ""
+[5]: ""
+[6]: ""
+EOF
+
 test_expect_success "test filter_string_list" '
 	test "x-" = "x$(test-tool string-list filter - y)" &&
 	test "x-" = "x$(test-tool string-list filter no y)" &&
-- 
2.40.0.380.gd2df7d2365


  reply	other threads:[~2023-04-24 22:20 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-13 23:31 [PATCH 0/5] banned: mark `strok()`, `strtok_r()` as banned Taylor Blau
2023-04-13 23:31 ` [PATCH 1/5] string-list: introduce `string_list_split_in_place_multi()` Taylor Blau
2023-04-18 10:10   ` Jeff King
2023-04-18 17:08     ` Taylor Blau
2023-04-13 23:31 ` [PATCH 2/5] t/helper/test-hashmap.c: avoid using `strtok()` Taylor Blau
2023-04-18 10:23   ` Jeff King
2023-04-18 18:06     ` Taylor Blau
2023-04-13 23:31 ` [PATCH 3/5] t/helper/test-oidmap.c: " Taylor Blau
2023-04-13 23:31 ` [PATCH 4/5] t/helper/test-json-writer.c: " Taylor Blau
2023-04-13 23:31 ` [PATCH 5/5] banned.h: mark `strtok()`, `strtok_r()` as banned Taylor Blau
2023-04-14  1:39   ` Junio C Hamano
2023-04-14  2:08     ` Chris Torek
2023-04-14 13:41     ` Taylor Blau
2023-04-18 19:18 ` [PATCH v2 0/6] banned: mark `strok()` " Taylor Blau
2023-04-18 19:18   ` [PATCH v2 1/6] string-list: introduce `string_list_split_in_place_multi()` Taylor Blau
2023-04-18 19:39     ` Junio C Hamano
2023-04-18 20:54       ` Taylor Blau
2023-04-22 11:12     ` Jeff King
2023-04-22 15:53       ` René Scharfe
2023-04-23  0:35         ` Jeff King
2023-04-24 16:24           ` Junio C Hamano
2023-04-23  2:38       ` [PATCH v2 1/6] string-list: introduce `string_list_split_in_place_multi()`t Taylor Blau
2023-04-23  2:40         ` Taylor Blau
2023-04-18 19:18   ` [PATCH v2 2/6] string-list: introduce `string_list_setlen()` Taylor Blau
2023-04-22 11:14     ` Jeff King
2023-04-18 19:18   ` [PATCH v2 3/6] t/helper/test-hashmap.c: avoid using `strtok()` Taylor Blau
2023-04-22 11:16     ` Jeff King
2023-04-24 21:19       ` Taylor Blau
2023-04-18 19:18   ` [PATCH v2 4/6] t/helper/test-oidmap.c: " Taylor Blau
2023-04-18 19:18   ` [PATCH v2 5/6] t/helper/test-json-writer.c: " Taylor Blau
2023-04-18 19:18   ` [PATCH v2 6/6] banned.h: mark `strtok()` as banned Taylor Blau
2023-04-24 22:20 ` [PATCH v3 0/6] banned: mark `strok()`, `strtok_r()` " Taylor Blau
2023-04-24 22:20   ` Taylor Blau [this message]
2023-04-24 22:20   ` [PATCH v3 2/6] string-list: introduce `string_list_setlen()` Taylor Blau
2023-04-25  6:21     ` Jeff King
2023-04-25 21:00       ` Taylor Blau
2023-04-24 22:20   ` [PATCH v3 3/6] t/helper/test-hashmap.c: avoid using `strtok()` Taylor Blau
2023-04-24 22:20   ` [PATCH v3 4/6] t/helper/test-oidmap.c: " Taylor Blau
2023-04-24 22:20   ` [PATCH v3 5/6] t/helper/test-json-writer.c: " Taylor Blau
2023-04-25 13:57     ` Jeff Hostetler
2023-04-24 22:20   ` [PATCH v3 6/6] banned.h: mark `strtok()` and `strtok_r()` as banned Taylor Blau
2023-04-24 22:25     ` Chris Torek
2023-04-24 23:00       ` Taylor Blau
2023-04-25  6:26     ` Jeff King
2023-04-25 21:02       ` Taylor Blau
2023-04-25  6:27   ` [PATCH v3 0/6] banned: mark `strok()`, " Jeff King
2023-04-25 21:03     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=59d3e778b6c80b6bd10c1126cdc3f7e86558f3e7.1682374789.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=chris.torek@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jeffhostetler@github.com \
    --cc=l.s.r@web.de \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).