Git Mailing List Archive mirror
 help / color / mirror / Atom feed
From: Kyle Lippincott <spectral@google.com>
To: Calvin Wan <calvinwan@google.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	 rsbecker@nexbridge.com, phillip.wood@dunelm.org.uk,
	 Josh Steadmon <steadmon@google.com>,
	Emily Shaffer <nasamuffin@google.com>,
	 Enrico Mrass <emrass@google.com>
Subject: Re: [RFD] Libification proposal: separate internal and external interfaces
Date: Wed, 8 May 2024 18:00:24 -0700	[thread overview]
Message-ID: <CAO_smVg_RmL2_ERXG0uZ+CgEHZBm34PvmNucbSTdsEDLjU-fhA@mail.gmail.com> (raw)
In-Reply-To: <20240422162617.308366-1-calvinwan@google.com>

Apologies for being absent from a thread that I was the initial author
of, there was a family emergency.

On Mon, Apr 22, 2024 at 9:26 AM Calvin Wan <calvinwan@google.com> wrote:
>
> Thanks everyone for your initial comments on this discussion. I wanted
> to provide some examples of how an internal/external interface could
> look in practice -- originally I had intended to use git-std-lib v6 as
> that example, but found that it fell short due to feedback that only
> being able to expose a smaller subset of functions in that library would
> be insufficient for users since they should have the same tools that we
> have for building Git. In this reply, I have two examples of paths
> forward that such an interface could look like for future libraries
> (both methods would require a non-trivial amount of code change so this
> seemed like a better idea than completely refactoring git-std-lib twice).
>
> Part of the reason for wanting to expose a smaller subset of library
> functions initially was to avoid having to expose functions that do
> things a library function shouldn't, mainly those with die() calls. I
> chose `strbuf_grow()` as the example function to be libified with an
> internal/external interface since it has a die() call and in a library,
> we would want to pass that error up rather than die()ing. I have two
> ideas for how such an interface could look. For reference, this is how
> `strbuf_grow()` currently looks:
>
> void strbuf_grow(struct strbuf *sb, size_t extra)
> {
>         int new_buf = !sb->alloc;
>         if (unsigned_add_overflows(extra, 1) ||
>             unsigned_add_overflows(sb->len, extra + 1))
>                 die("you want to use way too much memory");
>         if (new_buf)
>                 sb->buf = NULL;
>         ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
>         if (new_buf)
>                 sb->buf[0] = '\0';
> }
>
> The first idea involves turning `strbuf_grow()` into a wrapper function
> that invokes its equivalent library function, eg.
> `libgit_strbuf_grow()`:
>
> int libgit_strbuf_grow(struct strbuf *sb, size_t extra)
> {
>         int new_buf = !sb->alloc;
>         if (unsigned_add_overflows(extra, 1) ||
>             unsigned_add_overflows(sb->len, extra + 1))
>                 return -1;
>         if (new_buf)
>                 sb->buf = NULL;
>         ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
>         if (new_buf)
>                 sb->buf[0] = '\0';
>         return 0;
> }
>
> void strbuf_grow(struct strbuf *sb, size_t extra)
> {
>         if (libgit_strbuf_grow(sb, extra))
>                 die("you want to use way too much memory");
> }
>

When writing the initial document, I was thinking of something almost
the exact opposite of this, but I hadn't been thinking about the
problems with the code calling `die` and similar. I was thinking along
the lines of:

// In a .c file that is "library internal".
// This translation unit can assume that we've done #include
"git-compat-util.h" and anything else it wants.
int strbuf_grow(struct strbuf *sb, size_t extra)
{
        int new_buf = !sb->alloc;
        if (unsigned_add_overflows(extra, 1) ||
            unsigned_add_overflows(sb->len, extra + 1))
                return -1;
        if (new_buf)
                sb->buf = NULL;
        ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
        if (new_buf)
                sb->buf[0] = '\0';
        return 0;
}

// In a .c file for the interface as used by other projects:
int gitlib_strbuf_grow(struct strbuf *sb, size_t extra)  // Or maybe int64_t?
{
        return strbuf_grow(sb, extra);
}

After reading this thread, I agree that this isn't sufficient to avoid
churn, and I like the proposed interface, but with one main tweak:

// RENAMED from previous code block (no other changes)
// In a .c file that is "library internal".
// This translation unit can assume that we've done #include
"git-compat-util.h" and anything else it wants.
int strbuf_grow_impl(struct strbuf *sb, size_t extra)
{
        int new_buf = !sb->alloc;
        if (unsigned_add_overflows(extra, 1) ||
            unsigned_add_overflows(sb->len, extra + 1))
                return -1;
        if (new_buf)
                sb->buf = NULL;
        ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
        if (new_buf)
                sb->buf[0] = '\0';
        return 0;
}

// In a .c file for the interface as used by other projects:
int gitlib_strbuf_grow(struct strbuf *sb, size_t extra)
{
        return strbuf_grow_impl(sb, extra);
}

// NEW from previous code block
// In a .c file for the interface as used by the git project itself:
void strbuf_grow(struct strbuf *sb, size_t extra)
{
        if (strbuf_grow_impl(sb, extra))
                die("you want to use way too much memory")
}

I'm recommending this pattern primarily because of our platform
support concerns. If we can't elevate the entire project to assume
that C99 is available in a standards compliant way, we can't have
header files that look like this be part of building the `git` binary
itself (or any of the helper binaries):

#include <stdint.h>  /* Our platform support policy doesn't allow this */
int gitlib_strbuf_grow(struct strbuf *sb, size_t extra);

It's not just the #includes, though. As stated in the original
document, we run into problems with platform-defined types and
everything else that's tweaked in/provided by git-compat-util.h:
- This header file that's included in the non-git projects can't use
`off_t` or `struct stat`.
- This header file can't assume that any types related to sockets are
available, because those come from <sys/socket.h> on Linux and from
winsock2.h on Windows.
- It can't assume that we have `NORETURN` (and it can't assume that we
don't need it), or `MAYBE_UNUSED`, or ...

Most of those issues _may_ be able to be resolved by having a
"gitlib-compat-util.h" file included at the top of the "external
project" .h file. But that's insufficient. Example:

#include <unistd.h>
#include "git/gitlib.h"  // Oops, the `#define _GNU_SOURCE` in the
transitive "gitlib-compat-util.h" has no effect!

Or the opposite:

#include "git/gitlib.h"  // Oops, this set _FILE_OFFSET_BITS=64 when
the project wasn't expecting it!
#include <unistd.h>  // For this translation unit only, `off_t` might
be a different size than elsewhere in the project, I hope you like
debugging segfaults.

The only ways I could come up with to solve these problems were to
hold the "external interface" to a different standard, that is
simultaneously both more permissive (it can assume C99), and
restrictive (it can't rely on things like off_t), and this makes it
incompatible with these external interfaces being used by the git
project itself, which has a broader set of platforms it needs to
support. But the external interfaces must be very simple wrappers
around code that _is_ shared with the git executable.

> (Note a context object could also be added as a parameter to
> `libgit_strbuf_grow()` for error messages and possibly global variables.)
>
> In this scenario, we would be exposing `libgit_strbuf_grow()` to
> external consumers of the library, while not having to refactor internal
> uses of `strbuf_grow()`. This method would reduce initial churn within
> the codebase, however, we would want to eventually get rid of
> `strbuf_grow()` and use `libgit_strbuf_grow()` internally as well. I
> envision that it would be easier to remove die()'s all at once, from top
> down, once libification has progressed further since top level callers
> do not have to worry about refactoring any callers to accomodate passing
> up error messages/codes.
>
> The shortfall of this approach is that we'd be carrying two different
> functions for every library function until we are able to remove all of
> them. It would also create additional toil for Git contributors to
> figure out which version of the function should be used.
>
> The second idea removes the need for two different functions by removing
> the wrapper function and instead refactoring all callers of
> `strbuf_grow()` (and subsequently callers of other library functions).
>
> int libgit_strbuf_grow(struct strbuf *sb, size_t extra)
> {
>         int new_buf = !sb->alloc;
>         if (unsigned_add_overflows(extra, 1) ||
>             unsigned_add_overflows(sb->len, extra + 1))
>                 return -1;
>         if (new_buf)
>                 sb->buf = NULL;
>         ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
>         if (new_buf)
>                 sb->buf[0] = '\0';
>         return 0;
> }
>
> void strbuf_grow_caller() {
>         strbuf *sb;
>         size_t extra;
>
>         // if only success/failure is passed up
>         if (libgit_strbuf_grow(sb, extra))
>                 die("you want to use way too much memory");
>
>         // if context object is used
>         if (libgit_strbuf_grow(sb, extra, context_obj))
>                 die(context_obj->error_msg);
>
>         // if there are multiple error codes that can be passed up
>         if (libgit_strbuf_grow(sb, extra) == -1)
>                 die("you want to use way too much memory");
>         else if (libgit_strbuf_grow(sb, extra) == -2)
>                 die("some other error");
> }
>
> One shortcoming of this approach is the need to refactor all callers of
> library functions, but that can be handled better and the churn made
> more visible with a coccinelle patch. Another shortcoming is the need
> for lengthier code blocks whenever calling a library function, however,
> it could also be seen as a benefit since the caller would understand the
> function can die(). These error messages would also ideally be passed up
> as well in the future rather than die()ing.
>
> While I tried to find a solution that avoided the shortcomings of both
> approaches, I think that answer simply does not exist so the ideas above
> are what I believe to be the least disruptive options. I'm wondering
> which interface would be more suitable, and also open to hearing if
> there are any other ideas!

  parent reply	other threads:[~2024-05-09  1:00 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-02 14:18 [RFD] Libification proposal: separate internal and external interfaces Calvin Wan
2024-04-07 21:33 ` brian m. carlson
2024-04-07 21:48   ` rsbecker
2024-04-08  1:09     ` brian m. carlson
2024-04-08 11:07       ` rsbecker
2024-04-08 21:29       ` Junio C Hamano
2024-04-09  0:35         ` brian m. carlson
2024-04-09 17:26           ` Calvin Wan
2024-04-09  9:40         ` Phillip Wood
2024-04-09 17:30           ` Calvin Wan
2024-04-22 16:26 ` Calvin Wan
2024-04-22 20:28   ` Junio C Hamano
2024-04-23  9:57   ` phillip.wood123
2024-05-09  1:00   ` Kyle Lippincott [this message]
2024-05-10  9:52     ` Phillip Wood
2024-05-10 21:35       ` Kyle Lippincott
2024-05-09 19:45   ` Kyle Lippincott
2024-05-09 20:14     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAO_smVg_RmL2_ERXG0uZ+CgEHZBm34PvmNucbSTdsEDLjU-fhA@mail.gmail.com \
    --to=spectral@google.com \
    --cc=calvinwan@google.com \
    --cc=emrass@google.com \
    --cc=git@vger.kernel.org \
    --cc=nasamuffin@google.com \
    --cc=phillip.wood@dunelm.org.uk \
    --cc=rsbecker@nexbridge.com \
    --cc=sandals@crustytoothpaste.net \
    --cc=steadmon@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).