All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] name-rev: use memory pool for name strings
@ 2024-02-25 11:39 René Scharfe
  2024-02-25 11:39 ` [PATCH 1/2] mem-pool: add mem_pool_strfmt() René Scharfe
  2024-02-25 11:39 ` [PATCH 2/2] name-rev: use mem_pool_strfmt() René Scharfe
  0 siblings, 2 replies; 10+ messages in thread
From: René Scharfe @ 2024-02-25 11:39 UTC (permalink / raw
  To: git

Avoid awkward string buffer pre-sizing by adding and using an efficient
mem-pool-backed string format function.

  mem-pool: add mem_pool_strfmt()
  name-rev: use mem_pool_strfmt()

 builtin/name-rev.c | 39 ++++++++++++++++++++-------------------
 mem-pool.c         | 39 +++++++++++++++++++++++++++++++++++++++
 mem-pool.h         |  5 +++++
 3 files changed, 64 insertions(+), 19 deletions(-)

--
2.44.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] mem-pool: add mem_pool_strfmt()
  2024-02-25 11:39 [PATCH 0/2] name-rev: use memory pool for name strings René Scharfe
@ 2024-02-25 11:39 ` René Scharfe
  2024-02-26  7:08   ` Jeff King
  2024-02-25 11:39 ` [PATCH 2/2] name-rev: use mem_pool_strfmt() René Scharfe
  1 sibling, 1 reply; 10+ messages in thread
From: René Scharfe @ 2024-02-25 11:39 UTC (permalink / raw
  To: git

Add a function for building a string, printf style, using a memory pool.
It uses the free space in the current block in the first attempt.  If
that suffices then the result can already be used without copying or
reformatting.

For strings that are significantly shorter on average than the block
size (ca. 1 MiB by default) this is the case most of the time, leading
to a better perfomance than a solution that doesn't access mem-pool
internals.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 mem-pool.c | 39 +++++++++++++++++++++++++++++++++++++++
 mem-pool.h |  5 +++++
 2 files changed, 44 insertions(+)

diff --git a/mem-pool.c b/mem-pool.c
index c7d6256020..2078c22b09 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -107,6 +107,45 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len)
 	return r;
 }

+static char *mem_pool_strvfmt(struct mem_pool *pool, const char *fmt,
+			      va_list ap)
+{
+	struct mp_block *block = pool->mp_block;
+	char *next_free = block ? block->next_free : NULL;
+	size_t available = block ? block->end - block->next_free : 0;
+	va_list cp;
+	int len, len2;
+	char *ret;
+
+	va_copy(cp, ap);
+	len = vsnprintf(next_free, available, fmt, cp);
+	va_end(cp);
+	if (len < 0)
+		BUG("your vsnprintf is broken (returned %d)", len);
+
+	ret = mem_pool_alloc(pool, len + 1);  /* 1 for NUL */
+
+	/* Shortcut; relies on mem_pool_alloc() not touching buffer contents. */
+	if (ret == next_free)
+		return ret;
+
+	len2 = vsnprintf(ret, len + 1, fmt, ap);
+	if (len2 != len)
+		BUG("your vsnprintf is broken (returns inconsistent lengths)");
+	return ret;
+}
+
+char *mem_pool_strfmt(struct mem_pool *pool, const char *fmt, ...)
+{
+	va_list ap;
+	char *ret;
+
+	va_start(ap, fmt);
+	ret = mem_pool_strvfmt(pool, fmt, ap);
+	va_end(ap);
+	return ret;
+}
+
 void *mem_pool_calloc(struct mem_pool *pool, size_t count, size_t size)
 {
 	size_t len = st_mult(count, size);
diff --git a/mem-pool.h b/mem-pool.h
index fe7507f022..d1c66413ec 100644
--- a/mem-pool.h
+++ b/mem-pool.h
@@ -47,6 +47,11 @@ void *mem_pool_calloc(struct mem_pool *pool, size_t count, size_t size);
 char *mem_pool_strdup(struct mem_pool *pool, const char *str);
 char *mem_pool_strndup(struct mem_pool *pool, const char *str, size_t len);

+/*
+ * Allocate memory from the memory pool and format a string into it.
+ */
+char *mem_pool_strfmt(struct mem_pool *pool, const char *fmt, ...);
+
 /*
  * Move the memory associated with the 'src' pool to the 'dst' pool. The 'src'
  * pool will be empty and not contain any memory. It still needs to be free'd
--
2.44.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] name-rev: use mem_pool_strfmt()
  2024-02-25 11:39 [PATCH 0/2] name-rev: use memory pool for name strings René Scharfe
  2024-02-25 11:39 ` [PATCH 1/2] mem-pool: add mem_pool_strfmt() René Scharfe
@ 2024-02-25 11:39 ` René Scharfe
  2024-02-26  2:05   ` Junio C Hamano
  1 sibling, 1 reply; 10+ messages in thread
From: René Scharfe @ 2024-02-25 11:39 UTC (permalink / raw
  To: git

1c56fc2084 (name-rev: pre-size buffer in get_parent_name(), 2020-02-04)
got a big performance boost in an unusual repository by calculating the
name length in advance.  This is a bit awkward, as it references the
name components twice.

Use a memory pool to store the strings for the struct rev_name member
tip_name.  Using mem_pool_strfmt() allows efficient allocation without
explicit size calculation.  This simplifies the formatting part of the
code without giving up performance:

Benchmark 1: ./git_2.44.0 -C ../chromium/src name-rev --all
  Time (mean ± σ):      1.231 s ±  0.013 s    [User: 1.082 s, System: 0.136 s]
  Range (min … max):    1.214 s …  1.252 s    10 runs

Benchmark 2: ./git -C ../chromium/src name-rev --all
  Time (mean ± σ):      1.220 s ±  0.020 s    [User: 1.083 s, System: 0.130 s]
  Range (min … max):    1.197 s …  1.254 s    10 runs

Don't bother discarding the memory pool just before exiting.  The effort
for that would be very low, but actually measurable in the above
example, with no benefit to users.  At least UNLEAK it to calm down leak
checkers.  This addresses the leaks that 45a14f578e (Revert "name-rev:
release unused name strings", 2022-04-22) brought back.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
This doesn't make any test script leak-free, though.

 builtin/name-rev.c | 39 ++++++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 2dd1807c4e..ad9930c831 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -15,6 +15,7 @@
 #include "commit-slab.h"
 #include "commit-graph.h"
 #include "wildmatch.h"
+#include "mem-pool.h"

 /*
  * One day.  See the 'name a rev shortly after epoch' test in t6120 when
@@ -155,30 +156,25 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 	return name;
 }

-static char *get_parent_name(const struct rev_name *name, int parent_number)
+static char *get_parent_name(const struct rev_name *name, int parent_number,
+			     struct mem_pool *string_pool)
 {
-	struct strbuf sb = STRBUF_INIT;
 	size_t len;

 	strip_suffix(name->tip_name, "^0", &len);
 	if (name->generation > 0) {
-		strbuf_grow(&sb, len +
-			    1 + decimal_width(name->generation) +
-			    1 + decimal_width(parent_number));
-		strbuf_addf(&sb, "%.*s~%d^%d", (int)len, name->tip_name,
-			    name->generation, parent_number);
+		return mem_pool_strfmt(string_pool, "%.*s~%d^%d",
+				       (int)len, name->tip_name,
+				       name->generation, parent_number);
 	} else {
-		strbuf_grow(&sb, len +
-			    1 + decimal_width(parent_number));
-		strbuf_addf(&sb, "%.*s^%d", (int)len, name->tip_name,
-			    parent_number);
+		return mem_pool_strfmt(string_pool, "%.*s^%d",
+				       (int)len, name->tip_name, parent_number);
 	}
-	return strbuf_detach(&sb, NULL);
 }

 static void name_rev(struct commit *start_commit,
 		const char *tip_name, timestamp_t taggerdate,
-		int from_tag, int deref)
+		int from_tag, int deref, struct mem_pool *string_pool)
 {
 	struct prio_queue queue;
 	struct commit *commit;
@@ -195,9 +191,10 @@ static void name_rev(struct commit *start_commit,
 	if (!start_name)
 		return;
 	if (deref)
-		start_name->tip_name = xstrfmt("%s^0", tip_name);
+		start_name->tip_name = mem_pool_strfmt(string_pool, "%s^0",
+						       tip_name);
 	else
-		start_name->tip_name = xstrdup(tip_name);
+		start_name->tip_name = mem_pool_strdup(string_pool, tip_name);

 	memset(&queue, 0, sizeof(queue)); /* Use the prio_queue as LIFO */
 	prio_queue_put(&queue, start_commit);
@@ -235,7 +232,8 @@ static void name_rev(struct commit *start_commit,
 				if (parent_number > 1)
 					parent_name->tip_name =
 						get_parent_name(name,
-								parent_number);
+								parent_number,
+								string_pool);
 				else
 					parent_name->tip_name = name->tip_name;
 				ALLOC_GROW(parents_to_queue,
@@ -415,7 +413,7 @@ static int name_ref(const char *path, const struct object_id *oid,
 	return 0;
 }

-static void name_tips(void)
+static void name_tips(struct mem_pool *string_pool)
 {
 	int i;

@@ -428,7 +426,7 @@ static void name_tips(void)
 		struct tip_table_entry *e = &tip_table.table[i];
 		if (e->commit) {
 			name_rev(e->commit, e->refname, e->taggerdate,
-				 e->from_tag, e->deref);
+				 e->from_tag, e->deref, string_pool);
 		}
 	}
 }
@@ -561,6 +559,7 @@ static void name_rev_line(char *p, struct name_ref_data *data)

 int cmd_name_rev(int argc, const char **argv, const char *prefix)
 {
+	struct mem_pool string_pool;
 	struct object_array revs = OBJECT_ARRAY_INIT;
 	int all = 0, annotate_stdin = 0, transform_stdin = 0, allow_undefined = 1, always = 0, peel_tag = 0;
 	struct name_ref_data data = { 0, 0, STRING_LIST_INIT_NODUP, STRING_LIST_INIT_NODUP };
@@ -587,6 +586,7 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
 		OPT_END(),
 	};

+	mem_pool_init(&string_pool, 0);
 	init_commit_rev_name(&rev_names);
 	git_config(git_default_config, NULL);
 	argc = parse_options(argc, argv, prefix, opts, name_rev_usage, 0);
@@ -648,7 +648,7 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
 	adjust_cutoff_timestamp_for_slop();

 	for_each_ref(name_ref, &data);
-	name_tips();
+	name_tips(&string_pool);

 	if (annotate_stdin) {
 		struct strbuf sb = STRBUF_INIT;
@@ -676,6 +676,7 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
 				  always, allow_undefined, data.name_only);
 	}

+	UNLEAK(string_pool);
 	UNLEAK(revs);
 	return 0;
 }
--
2.44.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] name-rev: use mem_pool_strfmt()
  2024-02-25 11:39 ` [PATCH 2/2] name-rev: use mem_pool_strfmt() René Scharfe
@ 2024-02-26  2:05   ` Junio C Hamano
  2024-02-26 18:06     ` René Scharfe
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2024-02-26  2:05 UTC (permalink / raw
  To: René Scharfe; +Cc: git

René Scharfe <l.s.r@web.de> writes:

> 1c56fc2084 (name-rev: pre-size buffer in get_parent_name(), 2020-02-04)
> got a big performance boost in an unusual repository by calculating the
> name length in advance.  This is a bit awkward, as it references the
> name components twice.
>
> Use a memory pool to store the strings for the struct rev_name member
> tip_name.  Using mem_pool_strfmt() allows efficient allocation without
> explicit size calculation.  This simplifies the formatting part of the
> code without giving up performance:
>
> Benchmark 1: ./git_2.44.0 -C ../chromium/src name-rev --all
>   Time (mean ± σ):      1.231 s ±  0.013 s    [User: 1.082 s, System: 0.136 s]
>   Range (min … max):    1.214 s …  1.252 s    10 runs
>
> Benchmark 2: ./git -C ../chromium/src name-rev --all
>   Time (mean ± σ):      1.220 s ±  0.020 s    [User: 1.083 s, System: 0.130 s]
>   Range (min … max):    1.197 s …  1.254 s    10 runs
>
> Don't bother discarding the memory pool just before exiting.  The effort
> for that would be very low, but actually measurable in the above
> example, with no benefit to users.  At least UNLEAK it to calm down leak
> checkers.  This addresses the leaks that 45a14f578e (Revert "name-rev:
> release unused name strings", 2022-04-22) brought back.
>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
> This doesn't make any test script leak-free, though.

Hmph, is the root cause of the leak because no sensible ownership
rules are applied to .tip_name?  Sometimes it is allocated for the
paritcular rev_name, but some other times the pointer is copied from
another rev_name.tip_name?  As the way currently the code uses the
.tip_name member seems to be "allocate and use without any freeing",
I tend to agree that throwing them into mem-pool does make sense.

>  builtin/name-rev.c | 39 ++++++++++++++++++++-------------------
>  1 file changed, 20 insertions(+), 19 deletions(-)
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index 2dd1807c4e..ad9930c831 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -15,6 +15,7 @@
>  #include "commit-slab.h"
>  #include "commit-graph.h"
>  #include "wildmatch.h"
> +#include "mem-pool.h"
>
>  /*
>   * One day.  See the 'name a rev shortly after epoch' test in t6120 when
> @@ -155,30 +156,25 @@ static struct rev_name *create_or_update_name(struct commit *commit,
>  	return name;
>  }
>
> -static char *get_parent_name(const struct rev_name *name, int parent_number)
> +static char *get_parent_name(const struct rev_name *name, int parent_number,
> +			     struct mem_pool *string_pool)
>  {
> -	struct strbuf sb = STRBUF_INIT;
>  	size_t len;
>
>  	strip_suffix(name->tip_name, "^0", &len);
>  	if (name->generation > 0) {
> -		strbuf_grow(&sb, len +
> -			    1 + decimal_width(name->generation) +
> -			    1 + decimal_width(parent_number));
> -		strbuf_addf(&sb, "%.*s~%d^%d", (int)len, name->tip_name,
> -			    name->generation, parent_number);
> +		return mem_pool_strfmt(string_pool, "%.*s~%d^%d",
> +				       (int)len, name->tip_name,
> +				       name->generation, parent_number);
>  	} else {
> -		strbuf_grow(&sb, len +
> -			    1 + decimal_width(parent_number));
> -		strbuf_addf(&sb, "%.*s^%d", (int)len, name->tip_name,
> -			    parent_number);
> +		return mem_pool_strfmt(string_pool, "%.*s^%d",
> +				       (int)len, name->tip_name, parent_number);
>  	}
> -	return strbuf_detach(&sb, NULL);
>  }
>
>  static void name_rev(struct commit *start_commit,
>  		const char *tip_name, timestamp_t taggerdate,
> -		int from_tag, int deref)
> +		int from_tag, int deref, struct mem_pool *string_pool)
>  {
>  	struct prio_queue queue;
>  	struct commit *commit;
> @@ -195,9 +191,10 @@ static void name_rev(struct commit *start_commit,
>  	if (!start_name)
>  		return;
>  	if (deref)
> -		start_name->tip_name = xstrfmt("%s^0", tip_name);
> +		start_name->tip_name = mem_pool_strfmt(string_pool, "%s^0",
> +						       tip_name);
>  	else
> -		start_name->tip_name = xstrdup(tip_name);
> +		start_name->tip_name = mem_pool_strdup(string_pool, tip_name);
>
>  	memset(&queue, 0, sizeof(queue)); /* Use the prio_queue as LIFO */
>  	prio_queue_put(&queue, start_commit);
> @@ -235,7 +232,8 @@ static void name_rev(struct commit *start_commit,
>  				if (parent_number > 1)
>  					parent_name->tip_name =
>  						get_parent_name(name,
> -								parent_number);
> +								parent_number,
> +								string_pool);
>  				else
>  					parent_name->tip_name = name->tip_name;
>  				ALLOC_GROW(parents_to_queue,
> @@ -415,7 +413,7 @@ static int name_ref(const char *path, const struct object_id *oid,
>  	return 0;
>  }
>
> -static void name_tips(void)
> +static void name_tips(struct mem_pool *string_pool)
>  {
>  	int i;
>
> @@ -428,7 +426,7 @@ static void name_tips(void)
>  		struct tip_table_entry *e = &tip_table.table[i];
>  		if (e->commit) {
>  			name_rev(e->commit, e->refname, e->taggerdate,
> -				 e->from_tag, e->deref);
> +				 e->from_tag, e->deref, string_pool);
>  		}
>  	}
>  }
> @@ -561,6 +559,7 @@ static void name_rev_line(char *p, struct name_ref_data *data)
>
>  int cmd_name_rev(int argc, const char **argv, const char *prefix)
>  {
> +	struct mem_pool string_pool;
>  	struct object_array revs = OBJECT_ARRAY_INIT;
>  	int all = 0, annotate_stdin = 0, transform_stdin = 0, allow_undefined = 1, always = 0, peel_tag = 0;
>  	struct name_ref_data data = { 0, 0, STRING_LIST_INIT_NODUP, STRING_LIST_INIT_NODUP };
> @@ -587,6 +586,7 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
>  		OPT_END(),
>  	};
>
> +	mem_pool_init(&string_pool, 0);
>  	init_commit_rev_name(&rev_names);
>  	git_config(git_default_config, NULL);
>  	argc = parse_options(argc, argv, prefix, opts, name_rev_usage, 0);
> @@ -648,7 +648,7 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
>  	adjust_cutoff_timestamp_for_slop();
>
>  	for_each_ref(name_ref, &data);
> -	name_tips();
> +	name_tips(&string_pool);
>
>  	if (annotate_stdin) {
>  		struct strbuf sb = STRBUF_INIT;
> @@ -676,6 +676,7 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
>  				  always, allow_undefined, data.name_only);
>  	}
>
> +	UNLEAK(string_pool);
>  	UNLEAK(revs);
>  	return 0;
>  }
> --
> 2.44.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] mem-pool: add mem_pool_strfmt()
  2024-02-25 11:39 ` [PATCH 1/2] mem-pool: add mem_pool_strfmt() René Scharfe
@ 2024-02-26  7:08   ` Jeff King
  2024-02-26 18:17     ` René Scharfe
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2024-02-26  7:08 UTC (permalink / raw
  To: René Scharfe; +Cc: git

On Sun, Feb 25, 2024 at 12:39:44PM +0100, René Scharfe wrote:

> +static char *mem_pool_strvfmt(struct mem_pool *pool, const char *fmt,
> +			      va_list ap)
> +{
> +	struct mp_block *block = pool->mp_block;
> +	char *next_free = block ? block->next_free : NULL;
> +	size_t available = block ? block->end - block->next_free : 0;
> +	va_list cp;
> +	int len, len2;
> +	char *ret;
> +
> +	va_copy(cp, ap);
> +	len = vsnprintf(next_free, available, fmt, cp);
> +	va_end(cp);
> +	if (len < 0)
> +		BUG("your vsnprintf is broken (returned %d)", len);
> +
> +	ret = mem_pool_alloc(pool, len + 1);  /* 1 for NUL */
> +
> +	/* Shortcut; relies on mem_pool_alloc() not touching buffer contents. */
> +	if (ret == next_free)
> +		return ret;
> +
> +	len2 = vsnprintf(ret, len + 1, fmt, ap);
> +	if (len2 != len)
> +		BUG("your vsnprintf is broken (returns inconsistent lengths)");
> +	return ret;
> +}

This is pulling heavily from strbuf_vaddf(). This might be a dumb idea,
but... would it be reasonable to instead push a global flag that causes
xmalloc() to use a memory pool instead of the regular heap?

Then you could do something like:

  push_mem_pool(pool);
  str = xstrfmt("%.*s~%d^%d", ...etc...);
  pop_mem_pool(pool);

It's a little more involved at the caller, but it means that it now
works for all allocations, not just this one string helper.

Obviously you'd want it to be a thread-local value to prevent races. But
I still wonder if it could cause havoc when some sub-function makes an
allocation that the caller does not expect.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] name-rev: use mem_pool_strfmt()
  2024-02-26  2:05   ` Junio C Hamano
@ 2024-02-26 18:06     ` René Scharfe
  0 siblings, 0 replies; 10+ messages in thread
From: René Scharfe @ 2024-02-26 18:06 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

Am 26.02.24 um 03:05 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> 1c56fc2084 (name-rev: pre-size buffer in get_parent_name(), 2020-02-04)
>> got a big performance boost in an unusual repository by calculating the
>> name length in advance.  This is a bit awkward, as it references the
>> name components twice.
>>
>> Use a memory pool to store the strings for the struct rev_name member
>> tip_name.  Using mem_pool_strfmt() allows efficient allocation without
>> explicit size calculation.  This simplifies the formatting part of the
>> code without giving up performance:
>>
>> Benchmark 1: ./git_2.44.0 -C ../chromium/src name-rev --all
>>   Time (mean ± σ):      1.231 s ±  0.013 s    [User: 1.082 s, System: 0.136 s]
>>   Range (min … max):    1.214 s …  1.252 s    10 runs
>>
>> Benchmark 2: ./git -C ../chromium/src name-rev --all
>>   Time (mean ± σ):      1.220 s ±  0.020 s    [User: 1.083 s, System: 0.130 s]
>>   Range (min … max):    1.197 s …  1.254 s    10 runs
>>
>> Don't bother discarding the memory pool just before exiting.  The effort
>> for that would be very low, but actually measurable in the above
>> example, with no benefit to users.  At least UNLEAK it to calm down leak
>> checkers.  This addresses the leaks that 45a14f578e (Revert "name-rev:
>> release unused name strings", 2022-04-22) brought back.
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>> This doesn't make any test script leak-free, though.
>
> Hmph, is the root cause of the leak because no sensible ownership
> rules are applied to .tip_name?  Sometimes it is allocated for the
> paritcular rev_name, but some other times the pointer is copied from
> another rev_name.tip_name?  As the way currently the code uses the
> .tip_name member seems to be "allocate and use without any freeing",
> I tend to agree that throwing them into mem-pool does make sense.

Yes, the tip_name string is shared between child and first parent (and
its first parent and so on, until a better name is found.  Without this
sharing the peak memory footprint of "git name-rev --all" in Git's repo
goes from 40011328 to 46286528 for me currently.

I'm not too worried about the leak, though, as we can't release the
memory until we're done anyway.  The ability to build strings without
having to calculate their sizes in advance (either by running vsnprintf
twice, which is slow, or by doing format-specific calculations) is more
interesting to me.

The reason why none of the test scripts become leak-free is other
commands (i.e. not git name-rev) that are still leaking.

René

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] mem-pool: add mem_pool_strfmt()
  2024-02-26  7:08   ` Jeff King
@ 2024-02-26 18:17     ` René Scharfe
  2024-02-27  7:53       ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: René Scharfe @ 2024-02-26 18:17 UTC (permalink / raw
  To: Jeff King; +Cc: git

Am 26.02.24 um 08:08 schrieb Jeff King:
> On Sun, Feb 25, 2024 at 12:39:44PM +0100, René Scharfe wrote:
>
>> +static char *mem_pool_strvfmt(struct mem_pool *pool, const char *fmt,
>> +			      va_list ap)
>> +{
>> +	struct mp_block *block = pool->mp_block;
>> +	char *next_free = block ? block->next_free : NULL;
>> +	size_t available = block ? block->end - block->next_free : 0;
>> +	va_list cp;
>> +	int len, len2;
>> +	char *ret;
>> +
>> +	va_copy(cp, ap);
>> +	len = vsnprintf(next_free, available, fmt, cp);
>> +	va_end(cp);
>> +	if (len < 0)
>> +		BUG("your vsnprintf is broken (returned %d)", len);
>> +
>> +	ret = mem_pool_alloc(pool, len + 1);  /* 1 for NUL */
>> +
>> +	/* Shortcut; relies on mem_pool_alloc() not touching buffer contents. */
>> +	if (ret == next_free)
>> +		return ret;
>> +
>> +	len2 = vsnprintf(ret, len + 1, fmt, ap);
>> +	if (len2 != len)
>> +		BUG("your vsnprintf is broken (returns inconsistent lengths)");
>> +	return ret;
>> +}
>
> This is pulling heavily from strbuf_vaddf(). This might be a dumb idea,
> but... would it be reasonable to instead push a global flag that causes
> xmalloc() to use a memory pool instead of the regular heap?
>
> Then you could do something like:
>
>   push_mem_pool(pool);
>   str = xstrfmt("%.*s~%d^%d", ...etc...);
>   pop_mem_pool(pool);
>
> It's a little more involved at the caller, but it means that it now
> works for all allocations, not just this one string helper.

That would allow to keep track of allocations that would otherwise leak.
We can achieve that more easily by pushing the pointer to a global array
and never freeing it.  Hmm.

It would not allow the shortcut of using the vast pool as a scratch
space to format the string with a single vsnprintf call in most cases.
Or am I missing something?

René

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] mem-pool: add mem_pool_strfmt()
  2024-02-26 18:17     ` René Scharfe
@ 2024-02-27  7:53       ` Jeff King
  2024-02-27 17:58         ` René Scharfe
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2024-02-27  7:53 UTC (permalink / raw
  To: René Scharfe; +Cc: git

On Mon, Feb 26, 2024 at 07:17:00PM +0100, René Scharfe wrote:

> > This is pulling heavily from strbuf_vaddf(). This might be a dumb idea,
> > but... would it be reasonable to instead push a global flag that causes
> > xmalloc() to use a memory pool instead of the regular heap?
> >
> > Then you could do something like:
> >
> >   push_mem_pool(pool);
> >   str = xstrfmt("%.*s~%d^%d", ...etc...);
> >   pop_mem_pool(pool);
> >
> > It's a little more involved at the caller, but it means that it now
> > works for all allocations, not just this one string helper.
> 
> That would allow to keep track of allocations that would otherwise leak.
> We can achieve that more easily by pushing the pointer to a global array
> and never freeing it.  Hmm.

I see two uses for memory pools:

  1. You want to optimize allocations (either locality or per-allocation
     overhead).

  2. You want to make a bunch of allocations with the same lifetime
     without worrying about their lifetime otherwise. And then you can
     free them all in one swoop at the end.

And my impression is that you were interested in (2) here. If what
you're saying is that another way to do that is:

  str = xstrfmt(...whatever...);
  attach_to_pool(pool, str);

  ...much later...
  free_pool(pool);

then yeah, I'd agree. And that is a lot less tricky / invasive than what
I suggested.

> It would not allow the shortcut of using the vast pool as a scratch
> space to format the string with a single vsnprintf call in most cases.
> Or am I missing something?

So here it sounds like you do care about some of the performance
aspects. So no, it would not allow that. You'd be using the vast pool of
heap memory provided by malloc(), and trusting that a call to malloc()
is not that expensive in practice. I don't know how true that is, or how
much it matters for this case.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] mem-pool: add mem_pool_strfmt()
  2024-02-27  7:53       ` Jeff King
@ 2024-02-27 17:58         ` René Scharfe
  2024-03-07  9:58           ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: René Scharfe @ 2024-02-27 17:58 UTC (permalink / raw
  To: Jeff King; +Cc: git

Am 27.02.24 um 08:53 schrieb Jeff King:
> On Mon, Feb 26, 2024 at 07:17:00PM +0100, René Scharfe wrote:
>
>>> This is pulling heavily from strbuf_vaddf(). This might be a dumb idea,
>>> but... would it be reasonable to instead push a global flag that causes
>>> xmalloc() to use a memory pool instead of the regular heap?
>>>
>>> Then you could do something like:
>>>
>>>   push_mem_pool(pool);
>>>   str = xstrfmt("%.*s~%d^%d", ...etc...);
>>>   pop_mem_pool(pool);
>>>
>>> It's a little more involved at the caller, but it means that it now
>>> works for all allocations, not just this one string helper.
>>
>> That would allow to keep track of allocations that would otherwise leak.
>> We can achieve that more easily by pushing the pointer to a global array
>> and never freeing it.  Hmm.
>
> I see two uses for memory pools:
>
>   1. You want to optimize allocations (either locality or per-allocation
>      overhead).
>
>   2. You want to make a bunch of allocations with the same lifetime
>      without worrying about their lifetime otherwise. And then you can
>      free them all in one swoop at the end.
>
> And my impression is that you were interested in (2) here. If what
> you're saying is that another way to do that is:
>
>   str = xstrfmt(...whatever...);
>   attach_to_pool(pool, str);
>
>   ...much later...
>   free_pool(pool);
>
> then yeah, I'd agree. And that is a lot less tricky / invasive than what
> I suggested.
>
>> It would not allow the shortcut of using the vast pool as a scratch
>> space to format the string with a single vsnprintf call in most cases.
>> Or am I missing something?
>
> So here it sounds like you do care about some of the performance
> aspects. So no, it would not allow that. You'd be using the vast pool of
> heap memory provided by malloc(), and trusting that a call to malloc()
> is not that expensive in practice. I don't know how true that is, or how
> much it matters for this case.

In the specific use case we can look at three cases:

1. xstrfmt() [before 1c56fc2084]
2. size calculation + pre-size + strbuf_addf() [1c56fc2084]
3. mem_pool_strfmt() [this patch]

The performance of 2 and 3 is basically the same, 1 was worse.

xstrfmt() and strbuf_addf() both wrap strbuf_vaddf(), which uses
malloc() and vsnprintf().  My conclusion is that malloc() is fast
enough, but running vsnprintf() twice is slow (first time to determine
the allocation size, second time to actually build the string).  With a
memory pool we almost always only need to call it once per string, and
that's why I use it here.

The benefit of this patch (to me) is better code readability (no more
manual pre-sizing) without sacrificing performance.

The ability to clear (or UNLEAK) all these strings in one go is just a
bonus.

René

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] mem-pool: add mem_pool_strfmt()
  2024-02-27 17:58         ` René Scharfe
@ 2024-03-07  9:58           ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2024-03-07  9:58 UTC (permalink / raw
  To: René Scharfe; +Cc: git

On Tue, Feb 27, 2024 at 06:58:26PM +0100, René Scharfe wrote:

> >> It would not allow the shortcut of using the vast pool as a scratch
> >> space to format the string with a single vsnprintf call in most cases.
> >> Or am I missing something?
> >
> > So here it sounds like you do care about some of the performance
> > aspects. So no, it would not allow that. You'd be using the vast pool of
> > heap memory provided by malloc(), and trusting that a call to malloc()
> > is not that expensive in practice. I don't know how true that is, or how
> > much it matters for this case.
> 
> In the specific use case we can look at three cases:
> 
> 1. xstrfmt() [before 1c56fc2084]
> 2. size calculation + pre-size + strbuf_addf() [1c56fc2084]
> 3. mem_pool_strfmt() [this patch]
> 
> The performance of 2 and 3 is basically the same, 1 was worse.
> 
> xstrfmt() and strbuf_addf() both wrap strbuf_vaddf(), which uses
> malloc() and vsnprintf().  My conclusion is that malloc() is fast
> enough, but running vsnprintf() twice is slow (first time to determine
> the allocation size, second time to actually build the string).  With a
> memory pool we almost always only need to call it once per string, and
> that's why I use it here.
> 
> The benefit of this patch (to me) is better code readability (no more
> manual pre-sizing) without sacrificing performance.
> 
> The ability to clear (or UNLEAK) all these strings in one go is just a
> bonus.

Ah, OK. I admit I did not read the series all that carefully. Mostly I
am just concerned about a world where there are parallel-universe
versions of every allocating function that takes a mem-pool. If it's
limited to this obvious string formatting variant that may not be too
bad, though.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-03-07  9:58 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-25 11:39 [PATCH 0/2] name-rev: use memory pool for name strings René Scharfe
2024-02-25 11:39 ` [PATCH 1/2] mem-pool: add mem_pool_strfmt() René Scharfe
2024-02-26  7:08   ` Jeff King
2024-02-26 18:17     ` René Scharfe
2024-02-27  7:53       ` Jeff King
2024-02-27 17:58         ` René Scharfe
2024-03-07  9:58           ` Jeff King
2024-02-25 11:39 ` [PATCH 2/2] name-rev: use mem_pool_strfmt() René Scharfe
2024-02-26  2:05   ` Junio C Hamano
2024-02-26 18:06     ` René Scharfe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.