Video conference libification eng discussion, today at 16:30 UTC

Git Mailing List Archive mirror
 help / color / mirror / Atom feed

* Video conference libification eng discussion, today at 16:30 UTC
@ 2023-05-18 15:56 Emily Shaffer
  2023-05-18 17:35 ` Emily Shaffer
  0 siblings, 1 reply; 2+ messages in thread
From: Emily Shaffer @ 2023-05-18 15:56 UTC (permalink / raw)
  To: Git List

Hello folks,

Google is hosting a standing engineering discussion about libifying
Git, for contributors interested in participating in or hearing about
the progress of this effort. Expect this discussion to be free-form
and casual - no powerpoints here!

We're hoping to hold this meeting every Thursday at 9:30am Pacific
(16:30 UTC) via Google Meet.

To get an invite to the video conference and the notes document,
please reply to this email. Please also add points to the agenda
(template follows) if you want to raise awareness of them.

We'll choose a topic to discuss at the beginning of the meeting, and
I'll reply afterwards with the notes.

- (asynchronous) What's cooking in libification?
  - Patches we sent regarding libification
  - Patches for review related to the libification effort
- (asynchronous) What happened in the past 1-2 weeks that interested
parties or intermittent contributors need to know?
- (asynchronous) Where are you stuck? What eng discussions do we want
to have? (We'll choose a topic from this list at the beginning of the
meeting.)

Here's the notes from last week:

*   What's cooking in libification?
    *   Patches we sent regarding libification
        *   https://lore.kernel.org/git/pull.1497.git.git.1682104398.gitgitgadget@gmail.com/
config RFC
            *   Some split opinion upstream, interested in reaching
agreement before continuing
            *   Whether to stand up libified API in parallel, or
replace internal callers with new API
            *   Not many places are using extra metadata. Avarab is
suggesting to only change the API invoked for some callers, not all of
them.
            *   Glen: my concern is that the config API is already
huge (bigger than it needs to be). I don't want to add another
parallel set of APIs there 🙁
            *   And, passing the metadata is something we probably
wanted for a while, but considered too churny. With that refactor i
think we can improve a lot of the diagnostics and error reporting in
config in git overall. Would be less painful to not carry the burden
of the two APIs and add new logging piecemeal.
            *   Plus, we need to be able to make these large scale
refactors for libification in general
            *   Emily: how can we move forward? Pros and cons? Get
avar to come to a chalk talk?
            *   Jonathan: avar proposed something, glen explained why
he disagreed, i don't think it's our turn, i think it's avar's turn.
            *   Junio: avar is on pat leave 🙂
            *   Emily: so we shouldn't be waiting for avar, we should
be convincing someone else on the list to agree with glen, right?
            *   Jonathan: originally i was interested in migrating
stuff to the configset first, but past me didn't say that, so i trust
past me too 🙂
            *   There's a summary of that approach in the cover letter
at the bottom.
            *   Glen: avar proposed another option, also linked in the
cover letter.
            *   Jonathan: based on the nature of these patches,
probably there is not less churn by converting to configset first.
            *   Glen: I found the conversion to configset to be really
difficult actually
            *   Glen: The other approach suggested - the config API
has a low-level get\_config\_from\_file() which doesn't evaluate
includes etc. and git\_config(), which reads various sources and
caches in a configset. git\_config() does cache. Avar's alternative
proposal is that since very few places need to read from file
directly, we can drop get\_config\_from\_file() and always use the
caching api to read from things. So we do one complete pass and then
read from cache the rest of the time.
            *   There's a little extra cost of memory and runtime but
it's not that big, and we have a smaller public API.
            *   But it's taking away a sensible purpose-build API and
replacing it with something more specific to the git binary. We don't
need the caching to read a bundle-uri file, for example, we can just
read the file directly.
            *   Junio: one example of reading from file is bundle-uri,
but that's not really a config, same for gitmodules? Why are we still
reading from config file directly at all, even the oldest git config
functions go through the in-memory cache, right?
            *   Glen: yeah, we only use this to read stuff in config
syntax that isn't actually config, those examples and also the
sequencer.
            *   For libification, it makes sense for a libified caller
to want to leave things in config file format, and we can put that in
a low-level config library somewhere, right? But config.h is huge now
so it's not suited to a public api
            *   Jonathan: are you looking for others' opinions? Do you
want to rearrange the patches?
            *   Glen: Would like someone else to agree the approach is
promising before I start
            *   Jonathan: well, I agree 🙂
            *   Emily: might hold more weight to find a non-googler reviewer
            *   Randall: I don't know enough about config to take a
look at it unfortunately. I have a gut feeling of trepidation and not
sure why. Not offering to review it in depth 🙂
            *   Glen: because of the scale of the change, maybe?
            *   Randall: users of the API vs. internal users of the
API, what's the use case for not using cached config?
            *   Emily: probably makes sense for VFS to use a file-only
lib for reading local, gitmodules, etc.
            *   Junio: that's not a reason not to use caching, though, right?
            *   Glen: basically, we don't have a general "read from
file and put it in cache". Implementing it wouldn't necessarily make
the API simpler, so nobody implemented it internally yet
            *   Randall: Are we increasing IO as a tradeoff? If we
normally should be using the cache, then getting config values
shouldn't usually require going to the filesystem (except the first
time). But if we have an alternate path to config data, then we're
double-reading, right? Does this approach increase the amount of
redundant IO we perform?
            *   Junio: historical context: we didn't have
git\_config\_get\_foo() that names a variable, we only had the
git\_config() callback, and that callback always reads all the files.
If you had this call twice, then it was very expensive, historically.
That's why we added caching behavior.
            *   Jonathan: nobody is proposing double read. The
callback approach doesn't double-read anymore, right?
    *   Patches for review related to the libification effort
        *   Elijah’s final cleanup for cache.h:
https://lore.kernel.org/git/pull.1525.git.1683431149.gitgitgadget@gmail.com/
            *   Calvin: currently finishing up review for this
            *   Jonathan: does it need more reviews than calvin's?
            *   Calvin: my main concern is that the split between
objstore and objstore-lowlevel is not super clear. Otherwise i'm happy
with the series
            *   Jonathan: is that a blocking concern?
            *   Calvin: depends on elijah's response. Right now it's
unclear which file to add a new function to, so we need some kind of
definition. So yes, it's blocking-ish? But I think elijah can solve
the concern easily
            *   Jonathan: i'll take a peek tomorrow
            *   Glen: on a previous series, elijah referred to
low-level as having no textual dependency on the header. Would be good
to clarify this upstream
*   (asynchronous) What happened in the past 1-2 weeks that interested
parties or intermittent contributors need to know?
*   (asynchronous) Where are you stuck? What eng discussions do we
want to have? (We'll choose a topic from this list at the beginning of
the meeting.)
*
*   Glen: how are the c-tap-harness discussions going?
    *   Josh is taking this series over including pursuing
discussions, so no update here unfortunately

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Video conference libification eng discussion, today at 16:30 UTC
  2023-05-18 15:56 Video conference libification eng discussion, today at 16:30 UTC Emily Shaffer
@ 2023-05-18 17:35 ` Emily Shaffer
  0 siblings, 0 replies; 2+ messages in thread
From: Emily Shaffer @ 2023-05-18 17:35 UTC (permalink / raw)
  To: Git List

Since I've been not very good at posting notes in a timely manner, I
added the comment-only link to the Google Doc to the event on
tinyurl.com/gitcal.

Here are the notes from today:

*   (asynchronous) What's cooking in libification?
    *   Patches we sent regarding libification
        *   Git-compat-util cleanup
https://lore.kernel.org/git/20230516170932.1358685-1-calvinwan@google.com/
            *   Emily will look tomorrow
        *   C TAP Harness v2
https://lore.kernel.org/git/20230517-unit-tests-v2-v2-0-21b5b60f4b32@google.com/
    *   Patches for review related to the libification effort
*   (asynchronous) What happened in the past 1-2 weeks that interested
parties or intermittent contributors need to know?
*   (asynchronous) Where are you stuck? What eng discussions do we
want to have? (We'll choose a topic from this list at the beginning of
the meeting.)
*   Session topic: &lt;TBD>
*   Glen: C++ compatibility - maybe better to revisit with more folks later
    *   Trying to include these C libs from C++ code. Banned.h becomes
tricky; the C++ will have to #include some C headers we want, which
include C libraries we want. A lot of the Git C headers aren't C++
compatible (inline functions defined in header). What about
header-only libraries which aren't C++ compatible
    *   Emily: do we have any specific examples of problematic header-only libs?
    *   Glen: I need to grab some specific examples, but khash and
hash both seem to have problems
    *   Emily: at least for banned.h we can make sure to include it
from the source only, not the headers
    *   Junio: what kind of problem is it? Does just 'extern c' fix it?
    *   Glen: It fixes some of it, but e.g. implicit cast to/from
void\* doesn't work even after extern c. We do that kind of assignment
in macros a lot
    *   Junio: is it possible to update the macros and make them C++ friendly?
    *   Glen: Yeah, we can, adding the explicit cast is OK for example.
    *   Junio: Might be worth the effort
    *   Emily: it's noise as far as Git is concerned, so do we really
care? Will the patches be reasonable to accept?
    *   Junio: for explicit goal of libification you kind of need to
be prepared to be built from other languages though
    *   Junio: could be acceptable, we had a similar effort some years
ago with a name collision between cpp/c
    *   Glen: if we have prior art where git is included from c++
codebase, then it might be useful
    *   Emily: can we refuse to expose some types to the public (e.g.
if khash doesn't work in c++ at all, then we can refuse to use khash
on a library interface?)
    *   Jonathan: that makes it easier to consume for other languages
because header-only macro libraries are really difficult to write
bindings for - e.g. khash writes the types as well as functions on the
fly, that makes it very difficult to bind to
    *   Emily: is there a performance hit for moving inlined functions
into the source instead of the header?
    *   Calvin: the compiler inlines some other stuff sometimes too,
so your inline function getting moved into the relevant .c might get
inlined anyway….
    *   Emily: maybe we just try it and see with godbolt or post-preprocessor
    *   Calvin: header-only files are a problem because we can't
libify them, because there's no .c to turn into .o/a/so, right? Can we
just shove it into something more generalized like khash-wrapper.[ch]
    *   Emily: because khash does code generation, not really.
    *   Glen: we could use an intermediate header to declare the
functions which would be generated, we can use that to hide khash for
a specific "template" type. If we really need to expose a
post-generated khash at an API we can do this
    *   It might be difficult to not ever expose khash, because the
headers might want it?
    *   Emily: &lt;contrived example about khash.h, config.h, config-public.h>
    *   Junio: packed-bitmap code uses khash, if we look for
identifies starting with kh\_\*, these are exposed, there are some
places where we can change the function definition, but the return
types are using kh\_iter, which is tricky to avoid exposing. Should we
hide it? Wrap it? Use it as is? Maybe we'll end up hiding some stuff
that was generated and exposing some other stuff that's basically
constant? Not sure
    *   Emily: for packed-bitmap for example, do we already know
consistently which type khash will use to generate stuff? Is it easy
to convert here to post-generated code instead of template/macro
generation stuff?
    *   Glen: The generation stuff happens when you invoke the C
macro, so when you already typed KHASH\_INIT(...), you only have so
many types that are used.
    *   Junio: if you grep kh\_.\* in the .h files, there aren't that
many places using it. Oidset exposes khash types, etc. Might be
tedious to wrap them but shouldn't be difficult, if we need it for
external libraries to call without being aware that those functions
are khash based.
    *   Jonathan: if they're just pointers, we can probably just
forward decl the generated shape. Elijah sent some code that does
something like this already.
    *   Elijah: if you look for KHASH\_INIT functions, we instantiate
6 of these throughout the codebase - one's specific for fsmonitor and
one specific to delta islands, and then the rest are used in a few
places for oidset, oidpathmap, etc. It's not that many places

On Thu, May 18, 2023 at 8:56 AM Emily Shaffer <nasamuffin@google.com> wrote:
>
> Hello folks,
>
> Google is hosting a standing engineering discussion about libifying
> Git, for contributors interested in participating in or hearing about
> the progress of this effort. Expect this discussion to be free-form
> and casual - no powerpoints here!
>
> We're hoping to hold this meeting every Thursday at 9:30am Pacific
> (16:30 UTC) via Google Meet.
>
> To get an invite to the video conference and the notes document,
> please reply to this email. Please also add points to the agenda
> (template follows) if you want to raise awareness of them.
>
> We'll choose a topic to discuss at the beginning of the meeting, and
> I'll reply afterwards with the notes.
>
> - (asynchronous) What's cooking in libification?
>   - Patches we sent regarding libification
>   - Patches for review related to the libification effort
> - (asynchronous) What happened in the past 1-2 weeks that interested
> parties or intermittent contributors need to know?
> - (asynchronous) Where are you stuck? What eng discussions do we want
> to have? (We'll choose a topic from this list at the beginning of the
> meeting.)
>
> Here's the notes from last week:
>
> *   What's cooking in libification?
>     *   Patches we sent regarding libification
>         *   https://lore.kernel.org/git/pull.1497.git.git.1682104398.gitgitgadget@gmail.com/
> config RFC
>             *   Some split opinion upstream, interested in reaching
> agreement before continuing
>             *   Whether to stand up libified API in parallel, or
> replace internal callers with new API
>             *   Not many places are using extra metadata. Avarab is
> suggesting to only change the API invoked for some callers, not all of
> them.
>             *   Glen: my concern is that the config API is already
> huge (bigger than it needs to be). I don't want to add another
> parallel set of APIs there 🙁
>             *   And, passing the metadata is something we probably
> wanted for a while, but considered too churny. With that refactor i
> think we can improve a lot of the diagnostics and error reporting in
> config in git overall. Would be less painful to not carry the burden
> of the two APIs and add new logging piecemeal.
>             *   Plus, we need to be able to make these large scale
> refactors for libification in general
>             *   Emily: how can we move forward? Pros and cons? Get
> avar to come to a chalk talk?
>             *   Jonathan: avar proposed something, glen explained why
> he disagreed, i don't think it's our turn, i think it's avar's turn.
>             *   Junio: avar is on pat leave 🙂
>             *   Emily: so we shouldn't be waiting for avar, we should
> be convincing someone else on the list to agree with glen, right?
>             *   Jonathan: originally i was interested in migrating
> stuff to the configset first, but past me didn't say that, so i trust
> past me too 🙂
>             *   There's a summary of that approach in the cover letter
> at the bottom.
>             *   Glen: avar proposed another option, also linked in the
> cover letter.
>             *   Jonathan: based on the nature of these patches,
> probably there is not less churn by converting to configset first.
>             *   Glen: I found the conversion to configset to be really
> difficult actually
>             *   Glen: The other approach suggested - the config API
> has a low-level get\_config\_from\_file() which doesn't evaluate
> includes etc. and git\_config(), which reads various sources and
> caches in a configset. git\_config() does cache. Avar's alternative
> proposal is that since very few places need to read from file
> directly, we can drop get\_config\_from\_file() and always use the
> caching api to read from things. So we do one complete pass and then
> read from cache the rest of the time.
>             *   There's a little extra cost of memory and runtime but
> it's not that big, and we have a smaller public API.
>             *   But it's taking away a sensible purpose-build API and
> replacing it with something more specific to the git binary. We don't
> need the caching to read a bundle-uri file, for example, we can just
> read the file directly.
>             *   Junio: one example of reading from file is bundle-uri,
> but that's not really a config, same for gitmodules? Why are we still
> reading from config file directly at all, even the oldest git config
> functions go through the in-memory cache, right?
>             *   Glen: yeah, we only use this to read stuff in config
> syntax that isn't actually config, those examples and also the
> sequencer.
>             *   For libification, it makes sense for a libified caller
> to want to leave things in config file format, and we can put that in
> a low-level config library somewhere, right? But config.h is huge now
> so it's not suited to a public api
>             *   Jonathan: are you looking for others' opinions? Do you
> want to rearrange the patches?
>             *   Glen: Would like someone else to agree the approach is
> promising before I start
>             *   Jonathan: well, I agree 🙂
>             *   Emily: might hold more weight to find a non-googler reviewer
>             *   Randall: I don't know enough about config to take a
> look at it unfortunately. I have a gut feeling of trepidation and not
> sure why. Not offering to review it in depth 🙂
>             *   Glen: because of the scale of the change, maybe?
>             *   Randall: users of the API vs. internal users of the
> API, what's the use case for not using cached config?
>             *   Emily: probably makes sense for VFS to use a file-only
> lib for reading local, gitmodules, etc.
>             *   Junio: that's not a reason not to use caching, though, right?
>             *   Glen: basically, we don't have a general "read from
> file and put it in cache". Implementing it wouldn't necessarily make
> the API simpler, so nobody implemented it internally yet
>             *   Randall: Are we increasing IO as a tradeoff? If we
> normally should be using the cache, then getting config values
> shouldn't usually require going to the filesystem (except the first
> time). But if we have an alternate path to config data, then we're
> double-reading, right? Does this approach increase the amount of
> redundant IO we perform?
>             *   Junio: historical context: we didn't have
> git\_config\_get\_foo() that names a variable, we only had the
> git\_config() callback, and that callback always reads all the files.
> If you had this call twice, then it was very expensive, historically.
> That's why we added caching behavior.
>             *   Jonathan: nobody is proposing double read. The
> callback approach doesn't double-read anymore, right?
>     *   Patches for review related to the libification effort
>         *   Elijah’s final cleanup for cache.h:
> https://lore.kernel.org/git/pull.1525.git.1683431149.gitgitgadget@gmail.com/
>             *   Calvin: currently finishing up review for this
>             *   Jonathan: does it need more reviews than calvin's?
>             *   Calvin: my main concern is that the split between
> objstore and objstore-lowlevel is not super clear. Otherwise i'm happy
> with the series
>             *   Jonathan: is that a blocking concern?
>             *   Calvin: depends on elijah's response. Right now it's
> unclear which file to add a new function to, so we need some kind of
> definition. So yes, it's blocking-ish? But I think elijah can solve
> the concern easily
>             *   Jonathan: i'll take a peek tomorrow
>             *   Glen: on a previous series, elijah referred to
> low-level as having no textual dependency on the header. Would be good
> to clarify this upstream
> *   (asynchronous) What happened in the past 1-2 weeks that interested
> parties or intermittent contributors need to know?
> *   (asynchronous) Where are you stuck? What eng discussions do we
> want to have? (We'll choose a topic from this list at the beginning of
> the meeting.)
> *
> *   Glen: how are the c-tap-harness discussions going?
>     *   Josh is taking this series over including pursuing
> discussions, so no update here unfortunately

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2023-05-18 17:36 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-18 15:56 Video conference libification eng discussion, today at 16:30 UTC Emily Shaffer
2023-05-18 17:35 ` Emily Shaffer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).