Git Mailing List Archive mirror
From: <rsbecker@nexbridge.com>
To: "'Dragan Simic'" <dsimic@manjaro.org>, <git@vger.kernel.org>
Subject: RE: [RFC] Available disk space when automatic garbage collection kicks in
Date: Wed, 12 Jun 2024 13:04:12 -0400
Message-ID: <123401dabcea$8fe50110$afaf0330$@nexbridge.com>
In-Reply-To: <164fc547afd66caf58019b6c614b5134@manjaro.org>

On Wednesday, June 12, 2024 12:25 PM, Dragan Simic wrote:
>[Maybe this RFC deserves a "bump", so let me try.]
>On 2024-04-08 18:29, Dragan Simic wrote:
>> Hello all,
>>
>> A few days ago I noticed a rather unusual issue, but still a
>> realistic one.  When automatic garbage collection kicks in, as a
>> result of gc.auto >= 0, which is also the default, the local
>> repository can be left in a rather strange state if there isn't enough
>> free space available on the respective filesystem for writing the
>> objects, etc.
>>
>> It might be a good idea to estimate the required amount of free
>> filesystem space before starting the garbage collection, be it
>> automatic or manual, and refuse the operation if there isn't enough
>> free space available.
>>
>> As a note, the need_to_gc() function already does something a bit
>> similar with the available system RAM.
>>
>> Any thoughts?
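A rough sketch of the pre-check suggested in the quoted message, written in Python for brevity. It assumes, as a heuristic that is not taken from Git, that a full repack may temporarily need about as much free space as the existing object store, because new packs are written before the old ones are deleted. The function names and the `safety_factor` parameter are hypothetical; this is not how need_to_gc() works.

```python
import os
import shutil


def objects_dir_size(objects_dir: str) -> int:
    """Sum the apparent sizes of all files under objects_dir (loose objects and packs)."""
    total = 0
    for root, _dirs, files in os.walk(objects_dir):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except FileNotFoundError:
                # A concurrent process may have pruned or repacked the file.
                continue
    return total


def enough_space_for_gc(objects_dir: str, safety_factor: float = 1.0) -> bool:
    """Hypothetical pre-check: refuse gc unless free space covers the estimate.

    Heuristic assumption: gc may temporarily need roughly
    safety_factor * (current object store size) of extra space.
    """
    needed = int(objects_dir_size(objects_dir) * safety_factor)
    free = shutil.disk_usage(objects_dir).free
    return free >= needed
```

A caller would skip gc (or warn) when `enough_space_for_gc(".git/objects")` returns False. The estimate is deliberately crude; delta compression can make the new packs much smaller than the old data.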

I am not sure there is a good portable way to estimate the required free
space reliably using OS APIs, particularly with virtual disks and shared
file sets. One edge case is a massive repository whose .git content lives on
a separate file set, so an estimate taken for the working tree's filesystem
would not catch the problem described above.
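To illustrate that edge case, a minimal sketch that compares device IDs (`st_dev`) to detect whether .git/objects sits on a different filesystem from the working tree, and that asks for free space on the objects directory itself. The helper names are hypothetical, and `st_dev` may not reflect where space is really allocated on some virtual, overlay, or network filesystems, which is part of the portability concern.

```python
import os
import shutil


def same_filesystem(path_a: str, path_b: str) -> bool:
    """True if both paths report the same device ID (st_dev).

    Caveat: virtual, overlay, or network filesystems may report st_dev
    values that do not match where space is actually allocated.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def free_bytes_for_objects(git_dir: str) -> int:
    """Free space, as reported by shutil.disk_usage(), for <git_dir>/objects.

    Querying the objects directory (not the working tree) matters when
    .git/objects has been placed on a separate file set.
    """
    objects_dir = os.path.join(git_dir, "objects")
    return shutil.disk_usage(objects_dir).free
```

For example, `same_filesystem(".", ".git/objects")` returning False is a hint that a free-space check against the working tree would be misleading.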

It might be useful to add a configuration item such as:

    gc.reserve = size   # possibly with a kb, mb, gb, tb, or similar suffix

indicating how much space must be reserved before starting the operation.
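A sketch of how the proposed value might be parsed. `gc.reserve` does not exist in Git; the accepted forms below are an assumption. Git's existing integer options accept k, m, and g suffixes scaled by powers of 1024, so this sketch keeps that convention and additionally allows t and an optional trailing "b".

```python
import re

# 1024-based multipliers, matching Git's existing k/m/g integer suffixes;
# "t" is an extension for this hypothetical key.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_reserve_size(value: str) -> int:
    """Parse a hypothetical gc.reserve value such as "512m", "2GB" or "4096".

    Suffixes are case-insensitive; a trailing "b" is optional.
    Raises ValueError for anything else.
    """
    m = re.fullmatch(r"\s*(\d+)\s*([kmgt]?)b?\s*", value, flags=re.IGNORECASE)
    if m is None:
        raise ValueError(f"invalid gc.reserve value: {value!r}")
    number, unit = m.groups()
    return int(number) * _UNITS[unit.lower()]
```

Rejecting malformed values outright (rather than treating them as 0) avoids silently disabling the safety check.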

gc would then create a file of the reserved size, with real content, inside
.git (or .git/objects). If the file cannot be created, gc is suppressed.
Creation can fail for reasons other than lack of space, permissions for
example. Note also that some file systems do not actually allocate the space
when a file is merely extended by setting its EOF, so that technique, while
fast, will not work portably.
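A minimal sketch of the reservation step, assuming a hypothetical helper name and a caller-chosen path. It writes actual bytes instead of using ftruncate() or seeking past EOF, and treats any OSError (out of space, quota, permissions, a leftover file) as "do not run gc".

```python
import os

CHUNK = 1 << 20  # write in 1 MiB pieces so memory use stays bounded


def try_reserve(path: str, size: int) -> bool:
    """Hypothetical helper: write `size` bytes of real data to `path`.

    Returns False if the file cannot be created or filled, whatever the
    cause (ENOSPC, EDQUOT, EACCES, ...); the caller then skips gc.
    """
    # Random bytes rather than zeros: some filesystems compress runs of zeros,
    # so a file of zeros might not consume the space it claims. (Block-level
    # deduplication could still collapse the repeated chunk.)
    block = os.urandom(CHUNK)
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except OSError:
        return False  # permissions, missing directory, or a leftover reserve file
    ok = False
    try:
        remaining = size
        while remaining > 0:
            # Real writes, not ftruncate() or seeking past EOF, which may
            # only produce a sparse file with no blocks allocated.
            remaining -= os.write(fd, block[:min(CHUNK, remaining)])
        # Deferred errors (e.g. on NFS) may only be reported on fsync or close.
        os.fsync(fd)
        ok = True
    except OSError:
        pass
    finally:
        try:
            os.close(fd)
        except OSError:
            ok = False
        if not ok:
            os.unlink(path)  # never leave a partial reserve file behind
    return ok
```

As noted above, this costs a full write of the reserved size, which is the price of a check that does not depend on filesystem-specific free-space reporting.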

Once the reservation succeeds, the file can be removed (hopefully NFS will
close it properly), provided a lock is put in place first, and then gc runs.
It might be useful to do this even for a non-automatic gc. While this can be
expensive (the reserved space is effectively written twice, once for the
reservation and once by gc itself), it is safer.
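The ordering described above (lock, reserve, release the reserve, run gc) could be sketched as follows. The lock and reserve file names are hypothetical, and the reservation and gc steps are passed in as callables so the sketch stays independent of any particular implementation. Note that such a lock only keeps out processes that honour it; anything else writing to the same filesystem can still consume the freed space.

```python
import os
from typing import Callable


def guarded_gc(git_dir: str, reserve_bytes: int,
               reserve: Callable[[str, int], bool],
               run_gc: Callable[[], None]) -> bool:
    """Take a lock, prove the space exists, give it back, then run gc.

    File names are hypothetical. Returns False (gc suppressed) if the lock
    is already held or the reservation fails. `reserve` is expected to
    clean up its own file when it returns False.
    """
    lock_path = os.path.join(git_dir, "gc-reserve.lock")
    reserve_path = os.path.join(git_dir, "objects", "gc-reserve.tmp")
    try:
        lock_fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False  # another gc (or a stale lock) is in the way
    try:
        if not reserve(reserve_path, reserve_bytes):
            return False
        # Release the space just before gc needs it, while still holding the
        # lock, so cooperating processes cannot start writing in between.
        os.unlink(reserve_path)
        run_gc()
        return True
    finally:
        os.close(lock_fd)
        os.unlink(lock_path)
```

Keeping the lock held across both the reservation and the gc run is the important design point; releasing it in between would reopen the window the reservation was meant to close.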

Just a thought.

Randall.


Thread overview: 4+ messages
2024-04-08 16:29 [RFC] Available disk space when automatic garbage collection kicks in Dragan Simic
2024-06-12 16:25 ` Dragan Simic
2024-06-12 17:04   ` rsbecker [this message]
2024-06-12 17:25     ` Dragan Simic