Git Mailing List Archive mirror
From: Fraser Hanson <fraser.hanson@gmail.com>
To: Glen Choo <chooglen@google.com>
Cc: git@vger.kernel.org
Subject: Re: git fetch recursion problem
Date: Mon, 19 Jun 2023 11:40:12 -0700
Message-ID: <CA+3o5aO8oGnSLwTB52nHPsfCU0tSpkkkDV3dZcZ-8vt=BhoNAA@mail.gmail.com>
In-Reply-To: <CA+3o5aNgChKi-m6F_sYr4Sc+VXP-K2BCMpTpY8Km+kH5u9tkCQ@mail.gmail.com>

I figured out what is going on here.
This problem is caused by the server configuration, but git should
handle it more gracefully because the problem is easy to hit and
the consequences are severe.

The root cause is that the git server is using the git "dumb http" protocol.
The git server runs apache configured like this:
    <Directory "/srv/git">
        Options +Indexes
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>

Apache does not pass requests to git-http-backend.
The server's /srv/git/ dir contains bare git repositories created with
`git clone --mirror`.
The repositories on the server must be prepared before download with
`git update-server-info`.
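
A quick way to confirm which protocol a server speaks is to look at the
Content-Type of its info/refs response.  A smart http server answers
with application/x-git-upload-pack-advertisement; a dumb server just
returns the static file with whatever type Apache assigns.  The helper
below is a minimal sketch, not part of git: classify_git_http is a
made-up name, and the curl invocation in the comment assumes curl is
installed and the server is reachable.

```shell
# Classify a git http server from the Content-Type of its
# info/refs?service=git-upload-pack response.
# classify_git_http is a hypothetical helper name, not a git command.
classify_git_http() {
    case "$1" in
        application/x-git-upload-pack-advertisement*) echo smart ;;
        *) echo dumb ;;
    esac
}

# Usage against a real server (needs network access and curl):
#   url=http://172.20.208.191/git/kmarius/jsregexp.git
#   ctype=$(curl -s -o /dev/null -w '%{content_type}' \
#       "$url/info/refs?service=git-upload-pack")
#   classify_git_http "$ctype"
```

With the Apache configuration above, this check would be expected to
report "dumb", since nothing routes the request to git-http-backend.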

This is how the bug develops.

### git client requests a partial clone from the server:
$ git clone --filter=blob:none http://172.20.208.191/git/kmarius/jsregexp.git

### git clone succeeds, but there are no *.promisor files (these
should exist for a partial clone)
$ cd jsregexp
$ find .git -name \*.promisor | wc -l
0

### the cloned repository is still configured as a partial clone
$ cat .git/config
[core]
    repositoryformatversion = 1
    filemode = true
    bare = false
    logallrefupdates = true
[remote "origin"]
    url = http://172.20.208.191/git/kmarius/jsregexp.git
    fetch = +refs/heads/*:refs/remotes/origin/*
    promisor = true
    partialclonefilter = blob:none
[branch "master"]
    remote = origin
    merge = refs/heads/master

### git pull works, as long as there are no new commits added to the
server's git repo
$ git pull
Already up to date.

### Next, add a commit into the server's git repository.
### Then update the server's git repository with this:
server# cd /srv/git/kmarius/jsregexp.git
server# git update-server-info

### Back on the client side, git pull is now broken because the
### underlying git fetch recurses:

$ GIT_TRACE=1 git fetch 2>&1 | head -20
14:21:03.574765 git.c:460               trace: built-in: git fetch
14:21:03.575455 run-command.c:655       trace: run_command:
GIT_DIR=.git git remote-http origin
http://172.20.208.191/git/kmarius/jsregexp.git
14:21:03.578239 git.c:750               trace: exec: git-remote-http
origin http://172.20.208.191/git/kmarius/jsregexp.git
14:21:03.578349 run-command.c:655       trace: run_command:
git-remote-http origin http://172.20.208.191/git/kmarius/jsregexp.git
14:21:03.590589 run-command.c:655       trace: run_command: git -c
fetch.negotiationAlgorithm=noop fetch origin --no-tags
--no-write-fetch-head --recurse-submodules=no --filter=blob:none
--stdin
14:21:03.593654 git.c:460               trace: built-in: git fetch
origin --no-tags --no-write-fetch-head --recurse-submodules=no
--filter=blob:none --stdin
14:21:03.594362 run-command.c:655       trace: run_command: git
remote-http origin http://172.20.208.191/git/kmarius/jsregexp.git
14:21:03.597148 git.c:750               trace: exec: git-remote-http
origin http://172.20.208.191/git/kmarius/jsregexp.git
14:21:03.597253 run-command.c:655       trace: run_command:
git-remote-http origin http://172.20.208.191/git/kmarius/jsregexp.git
14:21:03.609592 run-command.c:655       trace: run_command: git -c
fetch.negotiationAlgorithm=noop fetch origin --no-tags
--no-write-fetch-head --recurse-submodules=no --filter=blob:none
--stdin
14:21:03.612749 git.c:460               trace: built-in: git fetch
origin --no-tags --no-write-fetch-head --recurse-submodules=no
--filter=blob:none --stdin
14:21:03.613392 run-command.c:655       trace: run_command: git
remote-http origin http://172.20.208.191/git/kmarius/jsregexp.git
14:21:03.616123 git.c:750               trace: exec: git-remote-http
origin http://172.20.208.191/git/kmarius/jsregexp.git
14:21:03.616235 run-command.c:655       trace: run_command:
git-remote-http origin http://172.20.208.191/git/kmarius/jsregexp.git
14:21:03.630921 run-command.c:655       trace: run_command: git -c
fetch.negotiationAlgorithm=noop fetch origin --no-tags
--no-write-fetch-head --recurse-submodules=no --filter=blob:none
--stdin
14:21:03.634074 git.c:460               trace: built-in: git fetch
origin --no-tags --no-write-fetch-head --recurse-submodules=no
--filter=blob:none --stdin
14:21:03.634746 run-command.c:655       trace: run_command: git
remote-http origin http://172.20.208.191/git/kmarius/jsregexp.git
14:21:03.636388 git.c:750               trace: exec: git-remote-http
origin http://172.20.208.191/git/kmarius/jsregexp.git
14:21:03.636443 run-command.c:655       trace: run_command:
git-remote-http origin http://172.20.208.191/git/kmarius/jsregexp.git
14:21:03.649168 run-command.c:655       trace: run_command: git -c
fetch.negotiationAlgorithm=noop fetch origin --no-tags
--no-write-fetch-head --recurse-submodules=no --filter=blob:none
--stdin
...

This cycle repeats forever, spawning new sub-processes each time.

The impact can be severe.  I discovered this while using Neovim's
lazy.nvim plugin manager within a secure, firewalled environment
without internet access. The editor is configured to pull packages
from an intranet mirror site containing bare clones of GitHub
projects.  The multi-threaded lazy.nvim plugin manager attempts to
update all 30 or so repositories in my configuration simultaneously.
All of the repos with fresh commits on our dumb http mirror server hit
the bug, which causes many git processes to spawn very quickly.  The
system locks up in seconds, and the OOM killer shows up too late to
save it.

Git should handle this situation better.  The following would be nice:
* when the dumb http protocol is used for clone or fetch, log an
info-level message which is visible with GIT_TRACE
* when a git partial clone is performed over dumb http protocol, log a
warning-level message explaining that the resulting repo may be broken
* when a 'git fetch' is done from a broken repository, don't
repeatedly spawn git-remote-http processes forever
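
To illustrate the last point, here is one way a process could bound
this kind of self-recursion: pass a depth counter to each child and
refuse to spawn once a limit is reached.  This is purely a toy sketch
in shell.  DEMO_FETCH_DEPTH, DEMO_MAX_DEPTH and demo_fetch are made-up
names; none of them are git settings or commands, and this is not a
claim about how git does or should implement such a guard.

```shell
# Toy recursion guard.  All names here are hypothetical, not git's.
DEMO_MAX_DEPTH=3

demo_fetch() {
    depth="${DEMO_FETCH_DEPTH:-0}"
    if [ "$depth" -ge "$DEMO_MAX_DEPTH" ]; then
        echo "refusing to recurse: depth $depth reached limit $DEMO_MAX_DEPTH" >&2
        return 1
    fi
    echo "fetch at depth $depth"
    # Stand-in for "objects are still missing, spawn another fetch".
    # The subshell plays the role of the child process and receives
    # the incremented counter.
    (
        DEMO_FETCH_DEPTH=$((depth + 1))
        demo_fetch
    )
}
```

Calling demo_fetch prints one line per level and then stops with a
non-zero status instead of spawning children without end.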


Thread overview: 5+ messages
2023-05-27 17:57 git fetch recursion problem Fraser Hanson
2023-05-30 19:11 ` Fraser Hanson
2023-06-01 22:40 ` Glen Choo
2023-06-01 22:57   ` Fraser Hanson
2023-06-19 18:40     ` Fraser Hanson [this message]
