Git Mailing List Archive mirror
* "git fetch --refetch" and multiple (separate/orphan) branches
@ 2023-06-02 21:22 Tao Klerks
From: Tao Klerks @ 2023-06-02 21:22 UTC
  To: git; +Cc: Robert Coup

Hi folks,

I just recently noticed that "--refetch" was added in 2.36, and I got
pretty excited - the ability to "fill in" missing blobs after a
too-filtered clone is something that I've wanted a number of times, as
I mentioned in 2021 in thread
https://public-inbox.org/git/CAPMMpohOuXX-0YOjV46jFZFvx7mQdj0p7s8SDR4SQxj5hEhCgg@mail.gmail.com/
.

When I first ran "git fetch --refetch" today however (git 2.38.1,
against server git/2.38.4.gl1), with a configured blob filter of
"blob:limit=1100M", a much higher size than any blob in the history, it only
got a *relatively* small number of objects - 3GB of data rather than
the 18GB that a new unfiltered fetch would have retrieved.
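The filter setup above is not spelled out in the thread. Here is a minimal sketch of how a looser blob filter could be re-applied. It assumes the filter is configured through "remote.<name>.partialclonefilter" (the config key git uses to remember a partial-clone filter) and that "origin" is already a promisor remote. "refetch_with_filter" is a hypothetical helper name, not part of git.

```shell
#!/bin/bash
# Hypothetical helper: store a new partial-clone filter for a remote and
# refetch so that objects excluded by the old filter get downloaded.
# Assumes <remote> is already a promisor remote (e.g. from a filtered clone).
refetch_with_filter() {
  local remote="$1" filter="$2"
  # Filter spec syntax is e.g. "blob:limit=1100m" or "blob:none".
  git config "remote.$remote.partialclonefilter" "$filter" || return
  # Fetch everything a fresh clone would, applying the configured filter.
  git fetch --refetch "$remote"
}
```

For example: "refetch_with_filter origin blob:limit=1100m". The git-fetch documentation also describes passing "--filter=" together with "--refetch" as an alternative to editing the config first.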

After some more testing I tried again, and got the expected outcome
that time. The relevant difference between the two attempts is that in
the first case, when I only got some of the objects I expected, there
was an updated tag as a result of the fetch. The second time, when I
got everything, there were no updated refs.

In this repository there are several "independent" sets of branches,
and the tag updated in that first fetch belongs to one of the
smaller-history branches.

What I believe is happening is that *if* there are refs to be updated
(or new refs, presumably), *then* the objects returned to the client
are only those required for those refs. If, on the other hand, there
are no updated refs, then you get what is advertised in the doc: "all
objects as a fresh clone would [...]".

I've tested a couple of different scenarios and the behavior seems
consistent with this explanation.

In a repo where all branches are derived from the same history, this
probably isn't very noticeable; in the repo I'm working on it makes a
huge difference, so the only way I can imagine getting "correct"
behavior would be to always do a "git fetch" right before the "git
fetch --refetch".
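The workaround in the previous paragraph can be wrapped in a small function. This is only a sketch. "fetch_then_refetch" is a hypothetical name, and the function relies on the behavior observed in this message (a refetch with no pending ref updates transfers everything), not on documented git semantics.

```shell
#!/bin/bash
# Hypothetical helper: bring all refs up to date with a normal fetch first,
# so that the following --refetch sees no ref updates and (per the behavior
# reported in this thread) transfers everything a fresh clone would.
fetch_then_refetch() {
  local remote="${1:-origin}"
  # Ordinary incremental fetch; updates any changed refs.
  git fetch "$remote" || return
  # Full refetch, now with no pending ref updates.
  git fetch --refetch "$remote"
}
```

There is still a window between the two commands. If someone pushes in between, the refetch sees a ref update again and may only partially refetch.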

Is this a bug, or expected behavior that should be noted in the doc,
or do we consider the multiple-independent-branches use case to be
edge-casey enough to be an easter egg for people like me?

Thanks,
Tao


* Re: "git fetch --refetch" and multiple (separate/orphan) branches
@ 2023-06-03  8:18 Robert Coup
From: Robert Coup @ 2023-06-03  8:18 UTC
  To: Tao Klerks; +Cc: git

Hi Tao

On Fri, 2 Jun 2023 at 22:23, Tao Klerks <tao@klerks.biz> wrote:

> What I believe is happening is that *if* there are refs to be updated
> (or new refs, presumably), *then* the objects returned to the client
> are only those required for those refs. If, on the other hand, there
> are no updated refs, then you get what is advertised in the doc: "all
> objects as a fresh clone would [...]".
>
> I've tested a couple of different scenarios and the behavior seems
> consistent with this explanation.

Do you have a repo & steps that could reproduce this easily? Otherwise
I can try and work up something.

> Is this a bug, or expected behavior that should be noted in the doc,
> or do we consider the multiple-independent-branches usecase to be
> edge-casey enough to be an easter egg for people like me?

At first glance it appears to be a bug.

Thanks,

Rob :)


* Re: "git fetch --refetch" and multiple (separate/orphan) branches
@ 2023-08-10  7:14 Tao Klerks
From: Tao Klerks @ 2023-08-10  7:14 UTC
  To: Robert Coup; +Cc: git

Hi Robert,

Sorry about the extended delay, I haven't had a chance to do "git
hacking" in a while.

On Sat, Jun 3, 2023 at 10:18 AM Robert Coup <robert@coup.net.nz> wrote:
>
> On Fri, 2 Jun 2023 at 22:23, Tao Klerks <tao@klerks.biz> wrote:
>
> > What I believe is happening is that *if* there are refs to be updated
> > (or new refs, presumably), *then* the objects returned to the client
> > are only those required for those refs. If, on the other hand, there
> > are no updated refs, then you get what is advertised in the doc: "all
> > objects as a fresh clone would [...]".
> >
> > I've tested a couple of different scenarios and the behavior seems
> > consistent with this explanation.
>
> Do you have a repo & steps that could reproduce this easily? Otherwise
> I can try and work up something.
>

Does the following work? It shows that after another client pushes a
change to the orphan branch, a refetch in the original client gets
only about half the objects (the ones for the orphan branch that was
updated), while a second refetch right after, with no new changes,
gets all 600-or-so objects.


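#!/bin/bash
# Repro: "git fetch --refetch" with two unrelated histories (main and an
# orphan branch). Needs bash for $RANDOM, and a configured user.name and
# user.email because commits are created.

# Append a random line to <repo>/datafile and commit it, <n> times.
create_n_commits() {
  for i in $(seq "$2"); do
    echo "another new line $RANDOM" >> "$1/datafile"
    git -C "$1" add datafile
    git -C "$1" commit -m anothercommit -q
  done
}

mkdir refetch-testing
SERVERFOLDER=refetch-testing/server
git init --bare "$SERVERFOLDER"

CLIENTFOLDER=refetch-testing/client
git init "$CLIENTFOLDER"
# The URL is resolved relative to the client's directory (git -C).
git -C "$CLIENTFOLDER" remote add origin "../server"

# First history: 100 commits on "main".
git -C "$CLIENTFOLDER" checkout -b main
create_n_commits "$CLIENTFOLDER" 100
git -C "$CLIENTFOLDER" push origin HEAD

# Second, unrelated history: 100 commits on the orphan branch "orphan".
git -C "$CLIENTFOLDER" checkout --orphan orphan
create_n_commits "$CLIENTFOLDER" 100
git -C "$CLIENTFOLDER" push origin HEAD

# Nothing changed on the server, so this refetches everything.
echo "---HERE IS A NORMAL FULL REFETCH---"
git -C "$CLIENTFOLDER" fetch --refetch
echo "---NORMAL FULL REFETCH ENDS---"

# A second client adds 5 commits to "orphan" only.
OTHERCLIENTFOLDER=refetch-testing/otherclient
git clone "$SERVERFOLDER" "$OTHERCLIENTFOLDER"
git -C "$OTHERCLIENTFOLDER" checkout orphan
create_n_commits "$OTHERCLIENTFOLDER" 5
git -C "$OTHERCLIENTFOLDER" push origin HEAD

# One ref now needs updating; per this report, only "orphan" objects come back.
echo "---HERE IS A WEIRD PARTIAL REFETCH OF ONE BRANCH ONLY---"
git -C "$CLIENTFOLDER" fetch --refetch
echo "---WEIRD PARTIAL REFETCH ENDS---"

# Refs are up to date again, so this is a full refetch once more.
echo "---HERE IS NORMAL REFETCH AGAIN---"
git -C "$CLIENTFOLDER" fetch --refetch
echo "---NORMAL REFETCH ENDS---"

rm -rf refetch-testing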
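One rough way to check the numbers in this repro is to count the objects reachable from all refs. That count is approximately what a full refetch should transfer. Each create_n_commits iteration adds one commit, one tree and one blob, so the 200 commits pushed by the script correspond to about 600 objects. Once the other client's 5 extra commits are fetched, the total is about 615, and the orphan branch alone accounts for roughly half of that (315). "reachable_object_count" is a hypothetical helper that only wraps "git rev-list --objects --all".

```shell
#!/bin/bash
# Hypothetical helper: print the number of objects reachable from all refs
# in <repo>, i.e. roughly what a full (unfiltered) refetch would transfer.
reachable_object_count() {
  local repo="$1"
  # One output line per object; tr strips the padding some wc versions add.
  git -C "$repo" rev-list --objects --all | wc -l | tr -d ' '
}
```

For example, calling reachable_object_count "$CLIENTFOLDER" just before the final "rm -rf" gives a number to compare against the object totals in each refetch's progress output.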
