Git Mailing List Archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Chris Torek <chris.torek@gmail.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 1/6] string-list: introduce `string_list_split_in_place_multi()`
Date: Tue, 18 Apr 2023 15:18:43 -0400	[thread overview]
Message-ID: <6658b231a906dde6acbe7ce156da693ef7dc40e6.1681845518.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1681845518.git.me@ttaylorr.com>

Introduce a variant of the `string_list_split_in_place()` function that
takes a string of accepted delimiters.

By contrast to its cousin `string_list_split_in_place()` which splits
the given string at every instance of the single character `delim`, the
`_multi` variant splits the given string any any character appearing in
the string `delim`.

Like `strtok()`, the `_multi` variant skips past sequential delimiting
characters. For example:

    string_list_split_in_place(&xs, xstrdup("foo::bar::baz"), ":", -1);

would place in `xs` the elements "foo", "bar", and "baz".

Instead of using `strchr(2)` to locate the first occurrence of the given
delimiter character, `string_list_split_in_place_multi()` uses
`strcspn(2)` to move past the initial segment of characters comprised of
any characters in the delimiting set.

When only a single delimiting character is provided, `strcspn(2)` has
equivalent performance to `strchr(2)`. Modern `strcspn(2)`
implementations treat an empty delimiter or the singleton delimiter as a
special case and fall back to calling strchrnul(). Both glibc[1] and
musl[2] implement `strcspn(2)` this way.

Since the `_multi` variant is a generalization of the original
implementation, reimplement `string_list_split_in_place()` in terms of
the more general function by providing a single-character string for the
list of accepted delimiters.

To avoid regressions, update t0063 in this patch as well. Any "common"
test cases (i.e., those that produce the same result whether you call
`string_list_split()` or `string_list_split_in_place_multi()`) are
grouped into a loop which is parameterized over the function to test.

Any cases which aren't common (of which there is one existing case, and
a handful of new ones added which are specific to the `_multi` variant)
are tested independently.

[1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35
[2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 string-list.c               |  26 +++++++--
 string-list.h               |   7 +++
 t/helper/test-string-list.c |  15 ++++++
 t/t0063-string-list.sh      | 105 +++++++++++++++++++++++++-----------
 4 files changed, 118 insertions(+), 35 deletions(-)

diff --git a/string-list.c b/string-list.c
index db473f273e..b27a53f2e1 100644
--- a/string-list.c
+++ b/string-list.c
@@ -300,8 +300,9 @@ int string_list_split(struct string_list *list, const char *string,
 	}
 }
 
-int string_list_split_in_place(struct string_list *list, char *string,
-			       int delim, int maxsplit)
+static int string_list_split_in_place_1(struct string_list *list, char *string,
+					const char *delim, int maxsplit,
+					unsigned runs)
 {
 	int count = 0;
 	char *p = string, *end;
@@ -310,13 +311,16 @@ int string_list_split_in_place(struct string_list *list, char *string,
 		die("internal error in string_list_split_in_place(): "
 		    "list->strdup_strings must not be set");
 	for (;;) {
+		if (runs)
+			p += strspn(p, delim);
+
 		count++;
 		if (maxsplit >= 0 && count > maxsplit) {
 			string_list_append(list, p);
 			return count;
 		}
-		end = strchr(p, delim);
-		if (end) {
+		end = p + strcspn(p, delim);
+		if (end && *end) {
 			*end = '\0';
 			string_list_append(list, p);
 			p = end + 1;
@@ -326,3 +330,17 @@ int string_list_split_in_place(struct string_list *list, char *string,
 		}
 	}
 }
+
+int string_list_split_in_place_multi(struct string_list *list, char *string,
+				     const char *delim, int maxsplit)
+{
+	return string_list_split_in_place_1(list, string, delim, maxsplit, 1);
+}
+
+int string_list_split_in_place(struct string_list *list, char *string,
+			       int delim, int maxsplit)
+{
+	char delim_s[2] = { delim, 0 };
+
+	return string_list_split_in_place_1(list, string, delim_s, maxsplit, 0);
+}
diff --git a/string-list.h b/string-list.h
index c7b0d5d000..f01bbb0bb6 100644
--- a/string-list.h
+++ b/string-list.h
@@ -268,7 +268,14 @@ int string_list_split(struct string_list *list, const char *string,
  * new string_list_items point into string (which therefore must not
  * be modified or freed while the string_list is in use).
  * list->strdup_strings must *not* be set.
+ *
+ * The "_multi" variant splits the given string on any character
+ * appearing in "delim", and the non-"_multi" variant splits only on the
+ * given character. The "_multi" variant behaves like `strtok()` where
+ * no element contains the delimiting byte(s).
  */
+int string_list_split_in_place_multi(struct string_list *list, char *string,
+				     const char *delim, int maxsplit);
 int string_list_split_in_place(struct string_list *list, char *string,
 			       int delim, int maxsplit);
 #endif /* STRING_LIST_H */
diff --git a/t/helper/test-string-list.c b/t/helper/test-string-list.c
index 2123dda85b..119bc9e1c9 100644
--- a/t/helper/test-string-list.c
+++ b/t/helper/test-string-list.c
@@ -73,6 +73,21 @@ int cmd__string_list(int argc, const char **argv)
 		return 0;
 	}
 
+	if (argc == 5 && !strcmp(argv[1], "split_in_place_multi")) {
+		struct string_list list = STRING_LIST_INIT_NODUP;
+		int i;
+		char *s = xstrdup(argv[2]);
+		const char *delim = argv[3];
+		int maxsplit = atoi(argv[4]);
+
+		i = string_list_split_in_place_multi(&list, s, delim, maxsplit);
+		printf("%d\n", i);
+		write_list(&list);
+		string_list_clear(&list, 0);
+		free(s);
+		return 0;
+	}
+
 	if (argc == 4 && !strcmp(argv[1], "filter")) {
 		/*
 		 * Retain only the items that have the specified prefix.
diff --git a/t/t0063-string-list.sh b/t/t0063-string-list.sh
index 46d4839194..9c5094616a 100755
--- a/t/t0063-string-list.sh
+++ b/t/t0063-string-list.sh
@@ -18,42 +18,53 @@ test_split () {
 	"
 }
 
-test_split "foo:bar:baz" ":" "-1" <<EOF
-3
-[0]: "foo"
-[1]: "bar"
-[2]: "baz"
-EOF
+test_split_in_place_multi () {
+	cat >expected &&
+	test_expect_success "split_in_place_multi $1 at $2, max $3" "
+		test-tool string-list split_in_place_multi '$1' '$2' '$3' >actual &&
+		test_cmp expected actual
+	"
+}
 
-test_split "foo:bar:baz" ":" "0" <<EOF
-1
-[0]: "foo:bar:baz"
-EOF
+for test_fn in test_split test_split_in_place_multi
+do
+	$test_fn "foo:bar:baz" ":" "-1" <<-\EOF
+	3
+	[0]: "foo"
+	[1]: "bar"
+	[2]: "baz"
+	EOF
 
-test_split "foo:bar:baz" ":" "1" <<EOF
-2
-[0]: "foo"
-[1]: "bar:baz"
-EOF
+	$test_fn "foo:bar:baz" ":" "0" <<-\EOF
+	1
+	[0]: "foo:bar:baz"
+	EOF
 
-test_split "foo:bar:baz" ":" "2" <<EOF
-3
-[0]: "foo"
-[1]: "bar"
-[2]: "baz"
-EOF
+	$test_fn "foo:bar:baz" ":" "1" <<-\EOF
+	2
+	[0]: "foo"
+	[1]: "bar:baz"
+	EOF
 
-test_split "foo:bar:" ":" "-1" <<EOF
-3
-[0]: "foo"
-[1]: "bar"
-[2]: ""
-EOF
+	$test_fn "foo:bar:baz" ":" "2" <<-\EOF
+	3
+	[0]: "foo"
+	[1]: "bar"
+	[2]: "baz"
+	EOF
 
-test_split "" ":" "-1" <<EOF
-1
-[0]: ""
-EOF
+	$test_fn "foo:bar:" ":" "-1" <<-\EOF
+	3
+	[0]: "foo"
+	[1]: "bar"
+	[2]: ""
+	EOF
+
+	$test_fn "" ":" "-1" <<-\EOF
+	1
+	[0]: ""
+	EOF
+done
 
 test_split ":" ":" "-1" <<EOF
 2
@@ -61,6 +72,38 @@ test_split ":" ":" "-1" <<EOF
 [1]: ""
 EOF
 
+test_split_in_place_multi "foo:;:bar:;:baz" ":;" "-1" <<-\EOF
+3
+[0]: "foo"
+[1]: "bar"
+[2]: "baz"
+EOF
+
+test_split_in_place_multi "foo:;:bar:;:baz" ":;" "0" <<-\EOF
+1
+[0]: "foo:;:bar:;:baz"
+EOF
+
+test_split_in_place_multi "foo:;:bar:;:baz" ":;" "1" <<-\EOF
+2
+[0]: "foo"
+[1]: "bar:;:baz"
+EOF
+
+test_split_in_place_multi "foo:;:bar:;:baz" ":;" "2" <<-\EOF
+3
+[0]: "foo"
+[1]: "bar"
+[2]: "baz"
+EOF
+
+test_split_in_place_multi "foo:;:bar:;:" ":;" "-1" <<-\EOF
+3
+[0]: "foo"
+[1]: "bar"
+[2]: ""
+EOF
+
 test_expect_success "test filter_string_list" '
 	test "x-" = "x$(test-tool string-list filter - y)" &&
 	test "x-" = "x$(test-tool string-list filter no y)" &&
-- 
2.40.0.358.g56d2318a6d


  reply	other threads:[~2023-04-18 19:20 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-13 23:31 [PATCH 0/5] banned: mark `strok()`, `strtok_r()` as banned Taylor Blau
2023-04-13 23:31 ` [PATCH 1/5] string-list: introduce `string_list_split_in_place_multi()` Taylor Blau
2023-04-18 10:10   ` Jeff King
2023-04-18 17:08     ` Taylor Blau
2023-04-13 23:31 ` [PATCH 2/5] t/helper/test-hashmap.c: avoid using `strtok()` Taylor Blau
2023-04-18 10:23   ` Jeff King
2023-04-18 18:06     ` Taylor Blau
2023-04-13 23:31 ` [PATCH 3/5] t/helper/test-oidmap.c: " Taylor Blau
2023-04-13 23:31 ` [PATCH 4/5] t/helper/test-json-writer.c: " Taylor Blau
2023-04-13 23:31 ` [PATCH 5/5] banned.h: mark `strtok()`, `strtok_r()` as banned Taylor Blau
2023-04-14  1:39   ` Junio C Hamano
2023-04-14  2:08     ` Chris Torek
2023-04-14 13:41     ` Taylor Blau
2023-04-18 19:18 ` [PATCH v2 0/6] banned: mark `strok()` " Taylor Blau
2023-04-18 19:18   ` Taylor Blau [this message]
2023-04-18 19:39     ` [PATCH v2 1/6] string-list: introduce `string_list_split_in_place_multi()` Junio C Hamano
2023-04-18 20:54       ` Taylor Blau
2023-04-22 11:12     ` Jeff King
2023-04-22 15:53       ` René Scharfe
2023-04-23  0:35         ` Jeff King
2023-04-24 16:24           ` Junio C Hamano
2023-04-23  2:38       ` [PATCH v2 1/6] string-list: introduce `string_list_split_in_place_multi()`t Taylor Blau
2023-04-23  2:40         ` Taylor Blau
2023-04-18 19:18   ` [PATCH v2 2/6] string-list: introduce `string_list_setlen()` Taylor Blau
2023-04-22 11:14     ` Jeff King
2023-04-18 19:18   ` [PATCH v2 3/6] t/helper/test-hashmap.c: avoid using `strtok()` Taylor Blau
2023-04-22 11:16     ` Jeff King
2023-04-24 21:19       ` Taylor Blau
2023-04-18 19:18   ` [PATCH v2 4/6] t/helper/test-oidmap.c: " Taylor Blau
2023-04-18 19:18   ` [PATCH v2 5/6] t/helper/test-json-writer.c: " Taylor Blau
2023-04-18 19:18   ` [PATCH v2 6/6] banned.h: mark `strtok()` as banned Taylor Blau
2023-04-24 22:20 ` [PATCH v3 0/6] banned: mark `strok()`, `strtok_r()` " Taylor Blau
2023-04-24 22:20   ` [PATCH v3 1/6] string-list: multi-delimiter `string_list_split_in_place()` Taylor Blau
2023-04-24 22:20   ` [PATCH v3 2/6] string-list: introduce `string_list_setlen()` Taylor Blau
2023-04-25  6:21     ` Jeff King
2023-04-25 21:00       ` Taylor Blau
2023-04-24 22:20   ` [PATCH v3 3/6] t/helper/test-hashmap.c: avoid using `strtok()` Taylor Blau
2023-04-24 22:20   ` [PATCH v3 4/6] t/helper/test-oidmap.c: " Taylor Blau
2023-04-24 22:20   ` [PATCH v3 5/6] t/helper/test-json-writer.c: " Taylor Blau
2023-04-25 13:57     ` Jeff Hostetler
2023-04-24 22:20   ` [PATCH v3 6/6] banned.h: mark `strtok()` and `strtok_r()` as banned Taylor Blau
2023-04-24 22:25     ` Chris Torek
2023-04-24 23:00       ` Taylor Blau
2023-04-25  6:26     ` Jeff King
2023-04-25 21:02       ` Taylor Blau
2023-04-25  6:27   ` [PATCH v3 0/6] banned: mark `strok()`, " Jeff King
2023-04-25 21:03     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6658b231a906dde6acbe7ce156da693ef7dc40e6.1681845518.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=chris.torek@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).