All the mail mirrored from lore.kernel.org
* How to mirror and augment a git repository
@ 2023-03-04 12:19 Sebastian Tennant
  2023-03-05  9:02 ` Bagas Sanjaya
  2023-03-06  8:08 ` Jeff King
  0 siblings, 2 replies; 10+ messages in thread
From: Sebastian Tennant @ 2023-03-04 12:19 UTC
  To: git

Hello list,

I wish to mirror _and augment_ an upstream git repository.

              .--------.
              |Upstream|
              '--------'
                   |
          .----------------.
          |Augmented mirror|
          '----------------'
           /       |       \
   .--------. .--------. .--------.
   |Client#1| |Client#2| |Client#3|
   '--------' '--------' '--------'

Clients of the augmented mirror must have access to everything
available from upstream but must also be able to collaborate on
additional development branches not available from upstream.

Initial approach:

 Augmented mirror:

   $ git clone --mirror <upstream> upstream
   $ cd upstream
   $ git remote update  # regular cron job

 Clients (bare repo & worktrees preferred):

   $ git clone --bare <mirror> mirror
   $ cd mirror
   $ git config remote.origin.fetch\
         "+refs/heads/*:refs/remotes/origin/*"
   $ git remote update


This arrangement worked fine until I decided to run:

   $ git remote prune origin

on the augmented mirror and lost all the additional development
branches the clients had added and shared amongst themselves.

I've tried running the augmented mirror as a plain bare repo, i.e.

   $ git config --unset remote.origin.fetch
   $ git config --unset remote.origin.mirror

but then the cron job (git remote update) is no longer sufficient in
making all upstream activity available downstream.

So, how best to run an augmented mirror such as this?

If my initial approach was correct, is there a way to protect the
additional branches so that ‘git remote prune origin’ may be run
safely on the augmented mirror from time to time?

Any help/tips/pointers/suggestions much appreciated.

Sebastian


* Re: How to mirror and augment a git repository
  2023-03-04 12:19 How to mirror and augment a git repository Sebastian Tennant
@ 2023-03-05  9:02 ` Bagas Sanjaya
  2023-03-05  9:50   ` Sebastian Tennant
  2023-03-06  8:08 ` Jeff King
  1 sibling, 1 reply; 10+ messages in thread
From: Bagas Sanjaya @ 2023-03-05  9:02 UTC
  To: Sebastian Tennant, git

On 3/4/23 19:19, Sebastian Tennant wrote:
> Hello list,
> 
> I wish to mirror _and augment_ an upstream git repository.
> 
>               .--------.
>               |Upstream|
>               '--------'
>                    |
>           .----------------.
>           |Augmented mirror|
>           '----------------'
>            /       |       \
>    .--------. .--------. .--------.
>    |Client#1| |Client#2| |Client#3|
>    '--------' '--------' '--------'
> 
> Clients of the augmented mirror must have access to everything
> available from upstream but must also be able to collaborate on
> additional development branches not available from upstream.
> 

I guess the augmented mirror is an integration tree for changes from
clients before they go upstream, right?

-- 
An old man doll... just what I always wanted! - Clara



* Re: How to mirror and augment a git repository
  2023-03-05  9:02 ` Bagas Sanjaya
@ 2023-03-05  9:50   ` Sebastian Tennant
  0 siblings, 0 replies; 10+ messages in thread
From: Sebastian Tennant @ 2023-03-05  9:50 UTC
  To: Bagas Sanjaya; +Cc: git

Quoth Bagas Sanjaya <bagasdotme@gmail.com>
on Sun, 5 Mar 2023 16:02:05 +0700:
> On 3/4/23 19:19, Sebastian Tennant wrote:
>> Hello list,
>> 
>> I wish to mirror _and augment_ an upstream git repository.
>> 
>>               .--------.
>>               |Upstream|
>>               '--------'
>>                    |
>>           .----------------.
>>           |Augmented mirror|
>>           '----------------'
>>            /       |       \
>>    .--------. .--------. .--------.
>>    |Client#1| |Client#2| |Client#3|
>>    '--------' '--------' '--------'
>> 
>> Clients of the augmented mirror must have access to everything
>> available from upstream but must also be able to collaborate on
>> additional development branches not available from upstream.
>> 
>
> I guess the augmented mirror is an integration tree for changes from
> clients before they go upstream, right?

Some of the changes introduced by the clients may end up being
accepted upstream.  The augmented mirror is primarily a way of
collaborating on and distributing customisations between clients.


* Re: How to mirror and augment a git repository
  2023-03-04 12:19 How to mirror and augment a git repository Sebastian Tennant
  2023-03-05  9:02 ` Bagas Sanjaya
@ 2023-03-06  8:08 ` Jeff King
  2023-03-08 16:54   ` Sebastian Tennant
  1 sibling, 1 reply; 10+ messages in thread
From: Jeff King @ 2023-03-06  8:08 UTC
  To: Sebastian Tennant; +Cc: git

On Sat, Mar 04, 2023 at 12:19:16PM +0000, Sebastian Tennant wrote:

>  Augmented mirror:
> 
>    $ git clone --mirror <upstream> upstream
>    $ cd upstream
>    $ git remote update  # regular cron job

The problem here is that your refspec for "origin" in the mirror will be
"+refs/*:refs/*". So it claims to have responsibility for the whole refs
namespace. And because of the "+", it will happily overwrite local refs
when fetching, including branches pushed up by the client. You noticed
it most with "prune", because that deletes local branches not present in
upstream repo. But a similar problem would happen if both a client and
the upstream had a branch named "foo".
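
A minimal sketch of that failure, using throwaway repositories. The
directory names, the "main" branch, and the "client-topic" branch are
all hypothetical stand-ins, and the client push is simulated by creating
the branch directly on the mirror:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
# Identity for the throwaway commit (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the real upstream: one commit on a branch called "main".
git init -q upstream-src
git -C upstream-src symbolic-ref HEAD refs/heads/main
git -C upstream-src commit -q --allow-empty -m "upstream commit"

# A mirror clone is given the catch-all refspec.
git clone -q --mirror upstream-src mirror.git
refspec=$(git -C mirror.git config --get remote.origin.fetch)

# Simulate a client having pushed a branch that upstream does not have.
git -C mirror.git branch --no-track client-topic main

before=$(git -C mirror.git for-each-ref --format='%(refname)' refs/heads | xargs)
git -C mirror.git remote prune origin >/dev/null
after=$(git -C mirror.git for-each-ref --format='%(refname)' refs/heads | xargs)

echo "refspec: $refspec"
echo "before:  $before"
echo "after:   $after"
```

The "after" list no longer contains client-topic, because the refspec
makes every ref under refs/ a candidate for pruning.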

> I've tried running the augmented mirror as a plain bare repo, i.e.
> 
>    $ git config --unset remote.origin.fetch
>    $ git config --unset remote.origin.mirror
> 
> but then the cron job (git remote update) is no longer sufficient in
> making all upstream activity available downstream.

Right. If you drop the fetch refspec entirely, then it will only fetch
HEAD during your cron jobs, which is not what you want. You want a
refspec that tells Git to fetch everything, but you need to divide up
the "refs/" namespace into local stuff and mirrored stuff.

You could use the normal "+refs/heads/*:refs/remotes/origin/*" refspec,
but it's awkward for the clients to access "refs/remotes/" on the
mirror.

So you probably want to keep the upstream branches in "refs/heads/", but
mark a special part of the namespace. Like:

  cd augmented-mirror
  git config remote.origin.fetch '+refs/heads/*:refs/heads/upstream/*'

And then "git fetch" will pull all of the remote branches into the
"upstream/" namespace. And it's safe to prune, because it will only
delete branches in refs/heads/upstream/ (and you may want to just "git
fetch --prune" as you fetch via cron, which is a little more efficient
than a separate "git remote prune").

Clients can name their branches whatever they like, as long as they
don't start with "upstream/", and that won't interfere with the mirror.
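
As a rough sketch of that layout (repository paths and the branch names
"main", "old-feature" and "client-topic" are made up for illustration,
and the client push is again simulated on the mirror itself):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
# Identity for the throwaway commit (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in upstream with two branches.
git init -q upstream-src
git -C upstream-src symbolic-ref HEAD refs/heads/main
git -C upstream-src commit -q --allow-empty -m "upstream commit"
git -C upstream-src branch old-feature

# Mirror whose fetches are confined to refs/heads/upstream/.
git init -q --bare augmented-mirror.git
cd augmented-mirror.git
git remote add origin "$tmp/upstream-src"
git config remote.origin.fetch '+refs/heads/*:refs/heads/upstream/*'
git fetch -q --prune origin

# A client-owned branch outside the upstream/ prefix.
git branch --no-track client-topic refs/heads/upstream/main

# Upstream deletes a branch; the next pruning fetch follows suit,
# but only inside refs/heads/upstream/.
git -C "$tmp/upstream-src" branch -q -D old-feature
git fetch -q --prune origin

refs=$(git for-each-ref --format='%(refname)' refs/heads | xargs)
echo "$refs"
```

The client branch survives the pruning fetch while the deleted upstream
branch disappears from the upstream/ prefix.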

-Peff


* Re: How to mirror and augment a git repository
  2023-03-06  8:08 ` Jeff King
@ 2023-03-08 16:54   ` Sebastian Tennant
  2023-03-09  3:12     ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Sebastian Tennant @ 2023-03-08 16:54 UTC
  To: Jeff King; +Cc: git

Hello Jeff,

Thanks for your explanation.  It's been really helpful.

Quoth Jeff King <peff@peff.net>
on Mon, 6 Mar 2023 03:08:50 -0500:
> On Sat, Mar 04, 2023 at 12:19:16PM +0000, Sebastian Tennant wrote:
>
>>  Augmented mirror:
>>
>>    $ git clone --mirror <upstream> upstream
>>    $ cd upstream
>>    $ git remote update  # regular cron job
>
> The problem here is that your refspec for "origin" in the mirror
> will be "+refs/*:refs/*". So it claims to have responsibility for
> the whole refs namespace. And because of the "+", it will happily
> overwrite local refs when fetching, including branches pushed up by
> the client. You noticed it most with "prune", because that deletes
> local branches not present in upstream repo. But a similar problem
> would happen if both a client and the upstream had a branch named
> "foo".

Understood.

>> I've tried running the augmented mirror as a plain bare repo, i.e.
>>
>>    $ git config --unset remote.origin.fetch
>>    $ git config --unset remote.origin.mirror
>>
>> but then the cron job (git remote update) is no longer sufficient in
>> making all upstream activity available downstream.
>
> Right. If you drop the fetch refspec entirely, then it will only fetch
> HEAD during your cron jobs, which is not what you want.

Precisely.

> You want a refspec that tells Git to fetch everything, but you need
> to divide up the "refs/" namespace into local stuff and mirrored
> stuff.
>
> You could use the normal "+refs/heads/*:refs/remotes/origin/*" refspec,
> but it's awkward for the clients to access "refs/remotes/" on the
> mirror.

Indeed.  To fetch a known ref, a client (also with the normal fetch
refspec) would have to do something like this, for example:

 $ git fetch origin\
       refs/remotes/origin/<ref>:refs/remotes/upstream/<ref>

Alternatively, they could add an additional fetch refspec to their
config:

[remote "origin"]
 ...
 fetch = +refs/heads/*:refs/remotes/origin/*             # normal
 fetch = +refs/remotes/origin/*:refs/remotes/upstream/*  # additional

This would have the advantage of fetching all the upstream refs on the
next update giving them a better idea of what's happening upstream.

Is my understanding more or less correct?

> So you probably want to keep the upstream branches in "refs/heads/",
> but mark a special part of the namespace. Like:
>
>   cd augmented-mirror
>   git config remote.origin.fetch '+refs/heads/*:refs/heads/upstream/*'
>
> And then "git fetch" will pull all of the remote branches into the
> "upstream/" namespace.

I see.  And creating the 'upstream' namespace under 'heads' (instead
of under 'remotes') is crucial, since any client (with a normal fetch
refspec) will then receive those refs automatically, i.e. without the
additional fetch refspec I've described above.

> And it's safe to prune, because it will only delete branches in
> refs/heads/upstream/ (and you may want to just "git fetch --prune"
> as you fetch via cron, which is a little more efficient than a
> separate "git remote prune").

Understood and noted.

> Clients can name their branches whatever they like, as long as they
> don't start with "upstream/", and that won't interfere with the
> mirror.

Got it.  Thanks again.

Sebastian


* Re: How to mirror and augment a git repository
  2023-03-08 16:54   ` Sebastian Tennant
@ 2023-03-09  3:12     ` Jeff King
  2023-03-11 10:47       ` Sebastian Tennant
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2023-03-09  3:12 UTC
  To: Sebastian Tennant; +Cc: git

On Wed, Mar 08, 2023 at 04:54:39PM +0000, Sebastian Tennant wrote:

> > You could use the normal "+refs/heads/*:refs/remotes/origin/*" refspec,
> > but it's awkward for the clients to access "refs/remotes/" on the
> > mirror.
> 
> Indeed.  To fetch a known ref, a client (also with the normal fetch
> refspec) would have to do something like this, for example:
> 
>  $ git fetch origin\
>        refs/remotes/origin/<ref>:refs/remotes/upstream/<ref>
> 
> Alternatively, they could add an additional fetch refspec to their
> config:
> 
> [remote "origin"]
>  ...
>  fetch = +refs/heads/*:refs/remotes/origin/*             # normal
>  fetch = +refs/remotes/origin/*:refs/remotes/upstream/*  # additional
> 
> This would have the advantage of fetching all the upstream refs on the
> next update giving them a better idea of what's happening upstream.
> 
> Is my understanding more or less correct?

Yes, that's exactly correct. In some ways it is cleaner than using
"refs/heads/upstream" because it leaves the whole "refs/heads/"
namespace intact for local client branches. It depends on how much of a
pain it is to configure the extra refspec in each client. :)

-Peff


* Re: How to mirror and augment a git repository
  2023-03-09  3:12     ` Jeff King
@ 2023-03-11 10:47       ` Sebastian Tennant
  2023-03-11 14:55         ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Sebastian Tennant @ 2023-03-11 10:47 UTC
  To: Jeff King; +Cc: git

Hello Jeff,

Alas, I thought I understood fetch refspecs, but it appears not.

Quoth Jeff King <peff@peff.net>
on Wed, 8 Mar 2023 22:12:29 -0500:
> On Wed, Mar 08, 2023 at 04:54:39PM +0000, Sebastian Tennant wrote:
> […]
>> Indeed.  To fetch a known ref, a client (also with the normal fetch
>> refspec) would have to do something like this, for example:
>>
>>  $ git fetch origin\
>>        refs/remotes/origin/<ref>:refs/remotes/upstream/<ref>
>>
>> Alternatively, they could add an additional fetch refspec to their
>> config:
>>
>> [remote "origin"]
>>  ...
>>  fetch = +refs/heads/*:refs/remotes/origin/*             # normal
>>  fetch = +refs/remotes/origin/*:refs/remotes/upstream/*  # additional
>>
>> This would have the advantage of fetching all the upstream refs on
>> the next update giving them a better idea of what's happening
>> upstream.
>>
>> Is my understanding more or less correct?
>
> Yes, that's exactly correct. In some ways it is cleaner than using
> "refs/heads/upstream" because it leaves the whole "refs/heads/"
> namespace intact for local client branches. It depends on how much
> of a pain it is to configure the extra refspec in each client. :)

I decided to go with the cleaner approach.

Here are the actions I'm taking to configure the mirror:

 $ git clone --bare https://url.of/project.git
 $ cd project.git
 $ git remote rename origin upstream
 $ git config remote.upstream.fetch\
       '+refs/heads/*:refs/remotes/upstream/*'
 $ git fetch upstream --prune

At this point there are no refs under refs/heads (and very many under
refs/remotes/upstream).

Here are the actions I'm then taking on a client:

 $ git clone --bare mirror:path/to/project.git
 $ cd project.git
 $ git remote rename origin mirror
 $ git config remote.mirror.fetch\
       '+refs/heads/*:refs/remotes/mirror/*'

At this point, both mirror and client have a normal fetch refspec,
i.e. no additional refspec has been added to the client, yet when I
run:

 $ git fetch mirror --prune

on the client, all the refs on mirror under refs/remotes/upstream are
fetched and placed under refs/remotes/mirror on the client.

My understanding of refspec:

 +refs/heads/*:refs/remotes/mirror/*

is "fetch only those refs under refs/heads and place them under
refs/remotes/mirror", which in this case should mean that no refs are
fetched (since there are none on mirror under refs/heads).

What's going on here that I'm just not getting?

If I add the additional refspec to the client:

 $ git config --add remote.mirror.fetch\
   '+refs/remotes/upstream/*:refs/remotes/upstream/*'

and fetch once more, I end up with all the refs already under
refs/remotes/mirror duplicated under refs/remotes/upstream.

Sebastian


* Re: How to mirror and augment a git repository
  2023-03-11 10:47       ` Sebastian Tennant
@ 2023-03-11 14:55         ` Jeff King
  2023-03-12 18:11           ` Sebastian Tennant
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2023-03-11 14:55 UTC
  To: Sebastian Tennant; +Cc: git

On Sat, Mar 11, 2023 at 10:47:40AM +0000, Sebastian Tennant wrote:

> Here are the actions I'm taking to configure the mirror:
> 
>  $ git clone --bare https://url.of/project.git
>  $ cd project.git
>  $ git remote rename origin upstream
>  $ git config remote.upstream.fetch\
>        '+refs/heads/*:refs/remotes/upstream/*'
>  $ git fetch upstream --prune
> 
> At this point there are no refs under refs/heads (and very many under
> refs/remotes/upstream).

You'd have refs under refs/heads at this point. They were created when
you did the original bare clone (since bare clones fetch all heads to
start with, though they don't set up a refspec).

And they won't be deleted by the pruning fetch, of course, because you
configured the refspec to limit itself to refs/remotes/upstream on the
local side.

If you don't want them (and I think you don't), you can just initialize
the repository directly, and then fetch, like:

  git init --bare project.git
  cd project.git
  git config remote.upstream.url https://url.of/project.git
  [and then configure refspec and fetch --prune as before]
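
Spelled out as a sketch that contrasts the two ways of creating the
mirror. A local throwaway repository stands in for
https://url.of/project.git, and the directory names "cloned.git" and
"upstream-src" are hypothetical:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
# Identity for the throwaway commit (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in upstream with a single branch "main".
git init -q upstream-src
git -C upstream-src symbolic-ref HEAD refs/heads/main
git -C upstream-src commit -q --allow-empty -m "upstream commit"

# 1. "clone --bare" copies the heads into refs/heads/ and
#    configures no fetch refspec.
git clone -q --bare upstream-src cloned.git
cloned_heads=$(git -C cloned.git for-each-ref --format='%(refname)' refs/heads | xargs)
cloned_refspec=$(git -C cloned.git config --get remote.origin.fetch || true)

# 2. "init --bare" plus an explicit refspec leaves refs/heads/ empty.
git init -q --bare project.git
cd project.git
git config remote.upstream.url "$tmp/upstream-src"
git config remote.upstream.fetch '+refs/heads/*:refs/remotes/upstream/*'
git fetch -q --prune upstream
init_heads=$(git for-each-ref --format='%(refname)' refs/heads | xargs)
init_main=$(git rev-parse --verify -q refs/remotes/upstream/main)
up_main=$(git -C "$tmp/upstream-src" rev-parse main)

echo "clone --bare heads: $cloned_heads (refspec: '$cloned_refspec')"
echo "init --bare heads:  '$init_heads'"
echo "upstream/main:      $init_main"
```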

> Here are the actions I'm then taking on a client:
> 
>  $ git clone --bare mirror:path/to/project.git
>  $ cd project.git
>  $ git remote rename origin mirror
>  $ git config remote.mirror.fetch\
>        '+refs/heads/*:refs/remotes/mirror/*'

This bare clone will do the same thing. So you'll end up with a copy of
all of the heads created in the earlier step. Worse, they won't be the
current state of those branches, but stale ones left from when you
created the mirror repo.

I think you want _two_ refspecs in the clients:

  - one to fetch the client-local branches stored on the mirror. That
    is:

      +refs/heads/*:refs/remotes/origin/*

    and those branches just appear as normal.

  - one to fetch the mirrored upstream branches from the special
    namespace on the mirror. That one is:

      +refs/remotes/upstream/*:refs/remotes/upstream/*
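
A sketch of a client set up that way, assuming the remote is named
"mirror" as in your commands (so the first refspec lands in
refs/remotes/mirror/ rather than refs/remotes/origin/). The local paths
and the "client-topic" branch are hypothetical, and the client push is
simulated by creating the branch on the mirror:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
# Identity for the throwaway commit (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in upstream with a single branch "main".
git init -q upstream-src
git -C upstream-src symbolic-ref HEAD refs/heads/main
git -C upstream-src commit -q --allow-empty -m "upstream commit"

# Mirror: upstream branches live under refs/remotes/upstream/.
git init -q --bare mirror.git
git -C mirror.git config remote.upstream.url "$tmp/upstream-src"
git -C mirror.git config remote.upstream.fetch '+refs/heads/*:refs/remotes/upstream/*'
git -C mirror.git fetch -q --prune upstream
# Stand-in for a branch pushed to the mirror by some client.
git -C mirror.git branch --no-track client-topic refs/remotes/upstream/main

# Client: one refspec for shared branches, one for mirrored upstream.
git init -q --bare client.git
cd client.git
git config remote.mirror.url "$tmp/mirror.git"
git config remote.mirror.fetch '+refs/heads/*:refs/remotes/mirror/*'
git config --add remote.mirror.fetch '+refs/remotes/upstream/*:refs/remotes/upstream/*'
git fetch -q --prune mirror

shared=$(git for-each-ref --format='%(refname)' refs/remotes/mirror | xargs)
has_upstream_main=$(git rev-parse --verify -q refs/remotes/upstream/main >/dev/null && echo yes || echo no)
echo "shared branches:      $shared"
echo "upstream/main present: $has_upstream_main"
```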

> At this point, both mirror and client have a normal fetch refspec,
> i.e. no additional refspec has been added to the client, yet when I
> run:
> 
>  $ git fetch mirror --prune
> 
> on the client, all the refs on mirror under refs/remotes/upstream are
> fetched and placed under refs/remotes/mirror on the client.

The refspec you showed above for the client is fetching from
refs/heads/* on the remote side. So it will never look at
refs/remotes/upstream from the mirror.

> My understanding of refspec:
> 
>  +refs/heads/*:refs/remotes/mirror/*
> 
> is "fetch only those refs under refs/heads and place them under
> refs/remotes/mirror", which in this case should mean that no refs are
> fetched (since there are none on mirror under refs/heads).

Your understanding is right, but the gotcha of how "clone --bare" works
is making things more confusing.

> What's going on here that I'm just not getting?
> 
> If I add the additional refspec to the client:
> 
>  $ git config --add remote.mirror.fetch\
>    '+refs/remotes/upstream/*:refs/remotes/upstream/*'
> 
> and fetch once more, I end up with all the refs already under
> refs/remotes/mirror duplicated under refs/remotes/upstream.

Right. Those are the copies you actually _want_ to have. The ones pulled
from refs/heads/* aren't. You'd expect that to be empty to start, and
then eventually it would be populated by pushes from local clients.

-Peff


* Re: How to mirror and augment a git repository
  2023-03-11 14:55         ` Jeff King
@ 2023-03-12 18:11           ` Sebastian Tennant
  2023-03-13 16:30             ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Sebastian Tennant @ 2023-03-12 18:11 UTC
  To: Jeff King; +Cc: git

Quoth Jeff King <peff@peff.net>
on Sat, 11 Mar 2023 09:55:13 -0500:
> On Sat, Mar 11, 2023 at 10:47:40AM +0000, Sebastian Tennant wrote:
>
>> Here are the actions I'm taking to configure the mirror:
>>
>>  $ git clone --bare https://url.of/project.git
>>  $ cd project.git
>>  $ git remote rename origin upstream
>>  $ git config remote.upstream.fetch\
>>        '+refs/heads/*:refs/remotes/upstream/*'
>>  $ git fetch upstream --prune
>>
>> At this point there are no refs under refs/heads (and very many under
>> refs/remotes/upstream).
>
> You'd have refs under refs/heads at this point.  They were created
> when you did the original bare clone (since bare clones fetch all
> heads to start with, though they don't set up a refspec).

I see the refs under refs/heads at last!

My mistake was looking for them in the file system (instead of in file
‘packed-refs’).

> And they won't be deleted by the pruning fetch, of course, because
> you configured the refspec to limit itself to refs/remotes/upstream
> on the local side.
>
> If you don't want them (and I think you don't), you can just initialize
> the repository directly, and then fetch, like:
>
>   git init --bare project.git
>   cd project.git
>   git config remote.upstream.url https://url.of/project.git
>   [and then configure refspec and fetch --prune as before]

This is precisely the behaviour I've been wanting (and mistakenly
expecting).

>> Here are the actions I'm then taking on a client:
>>
>>  $ git clone --bare mirror:path/to/project.git
>>  $ cd project.git
>>  $ git remote rename origin mirror
>>  $ git config remote.mirror.fetch\
>>        '+refs/heads/*:refs/remotes/mirror/*'
>
> This bare clone will do the same thing. So you'll end up with a copy
> of all of the heads created in the earlier step. Worse, they won't
> be the current state of those branches, but stale ones left from
> when you created the mirror repo.
>
> I think you want _two_ refspecs in the clients:
>
>   - one to fetch the client-local branches stored on the mirror. That
>     is:
>
>       +refs/heads/*:refs/remotes/origin/*
>
>     and those branches just appear as normal.
>
>   - one to fetch the mirrored upstream branches from the special
>     namespace on the mirror. That one is:
>
>       +refs/remotes/upstream/*:refs/remotes/upstream/*

Yup, this is what I want, and actually what I already have in place,
configuration wise.

The problem was in my partial understanding of the consequences of
passing option --bare to ‘git clone’ (and in my not thinking to look
in file ‘packed-refs’).

Thanks again Jeff.

Sebastian


* Re: How to mirror and augment a git repository
  2023-03-12 18:11           ` Sebastian Tennant
@ 2023-03-13 16:30             ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2023-03-13 16:30 UTC
  To: Sebastian Tennant; +Cc: git

On Sun, Mar 12, 2023 at 06:11:22PM +0000, Sebastian Tennant wrote:

> > You'd have refs under refs/heads at this point.  They were created
> > when you did the original bare clone (since bare clones fetch all
> > heads to start with, though they don't set up a refspec).
> 
> I see the refs under refs/heads at last!
> 
> My mistake was looking for them in the file system (instead of in file
> ‘packed-refs’).

The best way to check is "git for-each-ref", which handles the storage
details (including if we ever move to a new storage mechanism like
reftable).
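
For instance, in a throwaway repository (the name "repo" and the branch
"main" are hypothetical), packing the refs removes the loose file while
the branch is still listed:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
# Identity for the throwaway commit (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo
git -C repo symbolic-ref HEAD refs/heads/main
git -C repo commit -q --allow-empty -m "initial commit"

# Move every ref out of its loose file.
git -C repo pack-refs --all

# Looking in the file system finds nothing for the branch...
if [ -e repo/.git/refs/heads/main ]; then loose=yes; else loose=no; fi
# ...but for-each-ref still reports it.
listed=$(git -C repo for-each-ref --format='%(refname)' refs/heads)

echo "loose file present: $loose"
echo "for-each-ref says:  $listed"
```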

> The problem was in my partial understanding of the consequences of
> passing option --bare to ‘git clone’ (and in my not thinking to look
> in file ‘packed-refs’).

To be fair, I usually have to double-check the rules for "--bare"
myself. :)

Glad everything is working now.

-Peff
