Git Mailing List Archive mirror
 help / color / mirror / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>, Git List <git@vger.kernel.org>
Subject: Re: free and errno, was Re: [PATCH] apply: replace mksnpath() with a mkpathdup() call
Date: Sun, 14 Apr 2024 17:17:23 +0200	[thread overview]
Message-ID: <10b7fff3-7a2c-4555-9210-8000aae43583@web.de> (raw)
In-Reply-To: <20240407011834.GA1085004@coredump.intra.peff.net>

Am 07.04.24 um 03:18 schrieb Jeff King:
> On Sat, Apr 06, 2024 at 07:45:17PM +0200, René Scharfe wrote:
>
>> I calmed down a bit now.  And ask myself how widespread the issue actually
>> is.  Used the following silly Coccinelle rule to find functions that use
>> errno after a free(3) call:
>>
>> @@
>> @@
>> - free(...);
>>   ...
>>   errno
>>
>> Found only a handful of places, and they all set errno explicitly, so
>> they are fine.
>
> Is that enough, though? Imagine I have:
>
>   int foo(const char *x)
>   {
>      char *y = xstfrmt("something/%s", x);
>      int fd = open(y, ...);
>      free(y);
>      return fd;
>   }
>
> Then errno is important if some caller of foo() checks errno after foo()
> returns an error. And that caller might not even be in the same file.

Yes.

> In other words, it really depends on the contract of foo() with respect
> to errno. And I don't think there is a way in C to specify that
> contract in away that the compiler can understand.

The context attribute of sparse could be used, I guess.  We'd have to
duplicate the declaration of all library functions, though, to tag them
with a may_have_changed_errno attribute.  And we'd need to clear it on
success, so we'd have to modify all callers.  On the flip side it would
allow detecting consecutive errno changes, e.g. by two I/O functions
with no error checking in between.  But the implementation effort seems
way too high.

It would be easier if the error information was in one place instead of
one bit in the return value (NULL or less than 0) and the rest in errno.

>> No idea how to match any use of errno except assignment.  And no idea how
>> to find indirect callers of free(3) that use errno with no potential
>> assignment in between.
>
> Yeah, I guess the indirect callers of free() are really the flip-side.
> My example was indirect users of errno. ;)

Indeed.

>>> The other reason is that macros (especially of system names) can create
>>> their own headaches.  We could require xfree() everywhere as a
>>> complement to xmalloc (or maybe even just places where the errno
>>> preservation seems useful), but that's one more thing to remember.
>>
>> An xfree() to go along with xmalloc()/xrealloc()/xcalloc()/xstrdup() would
>> fit in nicely and might be easier to remember than free() after a while.
>> Having to convert thousands of calls is unappealing, though.
>
> My biggest concern with it is that we'd have to remember to use it, and
> there's not good compiler enforcement. But I guess coccinelle can help
> us there.q

Yes, converting all free(3) calls is easy with Coccinelle, and the same
semantic patch can be used to enforce the use of xfree().  Still the
initial diff would be huge (2000+ changed lines in 300+ files).

> My secondary concern is that it might make people think that xmalloc()
> and xfree() are always paired, and thus you can do clever things in one
> as long as the other matches it. But we sometimes free memory from
> system routines like getline(). Maybe a scary comment would be enough?

Amazing foresight!  Currently we only use getline(3) (which can act like
realloc(3)) in contrib/credential/, though.  Any others?

The "x" prefix doesn't promise exclusivity (hermetic seal?), but it
certainly suggests there are some shared feature between the allocator
functions and xfree(), while they only have in common that the are
wrapped.  We could call the latter errno_preserving_free() instead or so.

I'm more worried about the overhead.  For a 0-1% slowdown (in the case of
git log) git_free() or xfree() give us accurate error numbers on all
platforms.  Non-misleading error codes are worth a lot (seen my share of
cursed messages), but xfree() is only necessary in the error handling
code.  The happy path can call free(3) directly without harm.

But how to classify call sites accordingly?  It's easy for programmers
once they know it's necessary.  Is there a way in C, macro or Coccinelle
to have our cake and only eat it if really needed?  I don't see any. :-|

>> Found four places that did not expect free(3) to mess up their errno by
>> running the test suite with that.  Patch below.
>
> These are perhaps worth fixing (though not if we come up with a
> universal solution). But I'd be surprised if they are the only ones. By
> its nature, this problem only manifests when there are actual errors,
> and our test suite is mostly checking happy paths. So I'd assume there
> are a ton of "if (ret < 0) { free(foo); return -1; }" spots that are
> simply not exercised by the test suite at all.

Makes sense.

René

  reply	other threads:[~2024-04-14 15:17 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-04 21:08 [PATCH] apply: replace mksnpath() with a mkpathdup() call René Scharfe
2024-04-04 21:29 ` Junio C Hamano
2024-04-04 22:53 ` free and errno, was " Jeff King
2024-04-04 23:08   ` Junio C Hamano
2024-04-05 10:52   ` René Scharfe
2024-04-05 17:35     ` Jeff King
2024-04-05 17:41       ` Jeff King
2024-04-06 17:45       ` René Scharfe
2024-04-07  1:18         ` Jeff King
2024-04-14 15:17           ` René Scharfe [this message]
2024-04-24  1:11             ` Jeff King
2024-04-05 10:53 ` [PATCH v2 1/2] apply: avoid fixed-size buffer in create_one_file() René Scharfe
2024-04-05 10:56   ` [PATCH v2 2/2] path: remove mksnpath() René Scharfe
2024-04-05 17:37     ` Jeff King
2024-04-05 16:51   ` [PATCH v2 1/2] apply: avoid fixed-size buffer in create_one_file() Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=10b7fff3-7a2c-4555-9210-8000aae43583@web.de \
    --to=l.s.r@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).