* problem with modification time of packfiles
From: Andreas Amann @ 2015-10-18 21:37 UTC
To: git
git (2.6.1) sometimes updates the modification time of a packfile, even if it
has not changed at all.
On my system this triggers quite expensive and unnecessary backup
operations, which I would prefer to avoid. Is there a simple way to
keep the mtime of packfiles fixed, once they are created?
Apparently the undesired mtime update is done in
sha1_file.c:freshen_file() which is called (indirectly) by
write_sha1_file(). However, I did not understand why this is done.
Any clarification, and pointers on how the mtime can be kept constant,
would be appreciated.
Thanks,
Andreas
* Re: problem with modification time of packfiles
From: brian m. carlson @ 2015-10-19 2:57 UTC
To: Andreas Amann; +Cc: git
On Sun, Oct 18, 2015 at 10:37:55PM +0100, Andreas Amann wrote:
> git (2.6.1) sometimes updates the modification time of a packfile, even if it
> has not changed at all.
>
> On my system this triggers quite expensive and unnecessary backup
> operations, which I would prefer to avoid. Is there a simple way to
> keep the mtime of packfiles fixed, once they are created?
>
> Apparently the undesired mtime update is done in
> sha1_file.c:freshen_file() which is called (indirectly) by
> write_sha1_file(). However, I did not understand why this is done.
>
> Any clarification, and pointers on how the mtime can be kept constant,
> would be appreciated.
This is required to avoid deleting items that might still be needed.
The commit message for the commit that introduced that function is as
follows:

    write_sha1_file: freshen existing objects

    When we try to write a loose object file, we first check
    whether that object already exists. If so, we skip the
    write as an optimization. However, this can interfere with
    prune's strategy of using mtimes to mark files in progress.

    For example, if a branch contains a particular tree object
    and is deleted, that tree object may become unreachable, and
    have an old mtime. If a new operation then tries to write
    the same tree, this ends up as a noop; we notice we
    already have the object and do nothing. A prune running
    simultaneously with this operation will see the object as
    old, and may delete it.

    We can solve this by "freshening" objects that we avoid
    writing by updating their mtime. The algorithm for doing so
    is essentially the same as that of has_sha1_file. Therefore
    we provide a new (static) interface "check_and_freshen",
    which finds and optionally freshens the object. It's trivial
    to implement freshening and simple checking by tweaking a
    single parameter.

    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>

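The freshening idea in the commit message above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not git's actual sha1_file.c code: the names check_and_freshen() and freshen_file() are borrowed from the thread, but their single-path signatures here are simplifications, and prune_would_delete() is a hypothetical stand-in for prune's "older than the expiry cutoff" test. It uses only POSIX stat() and utime() to show how bumping the mtime of an existing file moves it out of a prune's expiry window.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <utime.h>

/* Simplified sketch: set a file's atime and mtime to "now".
   Returns 1 on success, 0 on failure. */
static int freshen_file(const char *path)
{
    struct utimbuf t;
    t.actime = t.modtime = time(NULL);
    if (utime(path, &t) != 0) {
        perror("utime");
        return 0;
    }
    return 1;
}

/* Simplified sketch: return 1 if the object file exists. When `freshen`
   is nonzero, also bump its mtime so a concurrent prune sees it as
   recently used instead of old. Returns 0 if the caller must write it. */
static int check_and_freshen(const char *path, int freshen)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;
    if (freshen)
        return freshen_file(path);
    return 1;
}

/* Hypothetical prune rule: an unreachable object whose mtime is older
   than `expire` is eligible for deletion. */
static int prune_would_delete(const char *path, time_t expire)
{
    struct stat st;
    return stat(path, &st) == 0 && st.st_mtime < expire;
}
```

For example, with the cutoff set two weeks in the past, a file whose mtime is years old is reported as deletable until check_and_freshen(path, 1) runs, after which it is not.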
Perhaps implementing a backup strategy based on content instead of mtime
would be more successful.
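One way to read the suggestion of a content-based backup strategy is to decide by comparing bytes rather than timestamps. The sketch below assumes the backup tool keeps a previous copy of each file to compare against; same_contents() and needs_backup() are hypothetical helpers, not part of git or of any particular backup program. Under this rule, a freshened packfile whose bytes are unchanged does not trigger a copy.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if both files can be opened and have
   identical contents, 0 if they differ or either cannot be opened. */
static int same_contents(const char *a, const char *b)
{
    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    int same = fa != NULL && fb != NULL;
    char ba[4096], bb[4096];

    while (same) {
        size_t na = fread(ba, 1, sizeof ba, fa);
        size_t nb = fread(bb, 1, sizeof bb, fb);
        if (na != nb || memcmp(ba, bb, na) != 0)
            same = 0;           /* lengths or bytes diverge */
        else if (na == 0)
            break;              /* both reached EOF together */
    }
    if (fa)
        fclose(fa);
    if (fb)
        fclose(fb);
    return same;
}

/* Hypothetical backup rule: copy only when the bytes changed,
   ignoring mtime entirely. */
static int needs_backup(const char *live, const char *backup_copy)
{
    return !same_contents(live, backup_copy);
}
```

Comparing full contents costs a read of both files; a tool might instead compare stored checksums, but the decision rule stays the same.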
--
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187
* Re: problem with modification time of packfiles
From: Andreas Amann @ 2015-10-19 19:59 UTC
To: brian m. carlson; +Cc: git
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> On Sun, Oct 18, 2015 at 10:37:55PM +0100, Andreas Amann wrote:
>> git (2.6.1) sometimes updates the modification time of a packfile, even if it
>> has not changed at all.
>>
>> On my system this triggers quite expensive and unnecessary backup
>> operations, which I would prefer to avoid. Is there a simple way to
>> keep the mtime of packfiles fixed, once they are created?
>>
>> Apparently the undesired mtime update is done in
>> sha1_file.c:freshen_file() which is called (indirectly) by
>> write_sha1_file(). However, I did not understand why this is done.
>>
>> Any clarification, and pointers on how the mtime can be kept constant,
>> would be appreciated.
>
> This is required to avoid deleting items that might still be needed.
> The commit message for the commit that introduced that function is as
> follows:
>
> write_sha1_file: freshen existing objects
>
> When we try to write a loose object file, we first check
> whether that object already exists. If so, we skip the
> write as an optimization. However, this can interfere with
> prune's strategy of using mtimes to mark files in progress.
>
Thank you for your answer. However, this reasoning only applies to loose
objects and not packfiles.
My understanding is that "git prune" will not prune any pack files
(except those starting with tmp_). Only "git repack" should do that.
Repack, however, seems to be mtime-agnostic, and therefore it does not seem
to be necessary to freshen packfiles.
It therefore seems that git freshens packfiles unnecessarily, which can
lead to expensive and unnecessary backup operations.
Given this, would a trivial patch to remove the freshening of packfiles
be acceptable?
Alternatively, maybe it would be preferable to use ctime instead of
mtime to mark recently used packfiles and loose objects? This might be
more natural, as mtime is usually associated with a "modification" of
the file itself, which does not happen here. ctime, on the other hand,
indicates attribute changes. (In this case, chmod() could be used
instead of utime() to update ctime.)
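The ctime alternative can be sketched as follows, assuming a POSIX system. touch_ctime() is a hypothetical helper, not anything in git: it re-applies a file's existing permission bits with chmod(), which POSIX specifies marks the file's status-change time (st_ctime) for update on success, while st_mtime is left untouched.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: mark a file as "recently used" by updating only
   its ctime. Returns 1 on success, 0 on failure. */
static int touch_ctime(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;
    /* Re-apply the current permission bits. The mode does not change,
       but a successful chmod() updates st_ctime; st_mtime is unaffected. */
    if (chmod(path, st.st_mode & 07777) != 0) {
        perror("chmod");
        return 0;
    }
    return 1;
}
```

Ordinary programs cannot set ctime to an arbitrary value, and it also changes on other metadata updates (for example, a change in link count), so it is a coarser signal than mtime.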
* Re: problem with modification time of packfiles
From: brian m. carlson @ 2015-10-19 23:09 UTC
To: Andreas Amann; +Cc: git, Jeff King, Junio C Hamano
On Mon, Oct 19, 2015 at 08:59:15PM +0100, Andreas Amann wrote:
> Thank you for your answer. However, this reasoning only applies to loose
> objects and not packfiles.
>
> My understanding is that "git prune" will not prune any pack files
> (except those starting with tmp_). Only "git repack" should do that.
> Repack, however, seems to be mtime-agnostic, and therefore it does not seem
> to be necessary to freshen packfiles.
>
> It therefore seems that git freshens packfiles unnecessarily, which can
> lead to expensive and unnecessary backup operations.
>
> Given this, would a trivial patch to remove the freshening of packfiles
> be acceptable?
I'm not familiar enough with the code to say for certain, but it looks
like you're right. Peff, Junio, do you think this is safe, or is there
something we're missing?
* Re: problem with modification time of packfiles
From: Jeff King @ 2015-10-19 23:52 UTC
To: brian m. carlson; +Cc: Andreas Amann, git, Junio C Hamano
On Mon, Oct 19, 2015 at 11:09:19PM +0000, brian m. carlson wrote:
> On Mon, Oct 19, 2015 at 08:59:15PM +0100, Andreas Amann wrote:
> > Thank you for your answer. However, this reasoning only applies to loose
> > objects and not packfiles.
> >
> > My understanding is that "git prune" will not prune any pack files
> > (except those starting with tmp_). Only "git repack" should do that.
> > Repack, however, seems to be mtime-agnostic, and therefore it does not seem
> > to be necessary to freshen packfiles.
> >
> > It therefore seems that git freshens packfiles unnecessarily, which can
> > lead to expensive and unnecessary backup operations.
> >
> > Given this, would a trivial patch to remove the freshening of packfiles
> > be acceptable?
>
> I'm not familiar enough with the code to say for certain, but it looks
> like you're right. Peff, Junio, do you think this is safe, or is there
> something we're missing?
No, it's not safe. When doing a full repack, we pack only reachable
objects. Unreachable ones are either loosened and given the mtime of the
packfile (from which they can then be pruned), or discarded if the pack
mtime is already old (as an optimization to avoid writing and then
immediately pruning).
See builtin/pack-objects.c:loosen_unused_packed_objects.
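The repack behavior described above can be modeled with a small decision function. This is a hypothetical simplification, not the code in builtin/pack-objects.c: enum unreachable_action, decide_unreachable(), and pack_mtime_after_noop_write() are invented for illustration, and `expire` stands in for a prune cutoff such as gc.pruneExpire (two weeks by default). It shows why a no-op write that does not freshen an old pack can leave an object, which a concurrent operation has just "written" but not yet referenced, to be discarded by a full repack.

```c
#include <time.h>

/* Hypothetical model: what a full repack does with an unreachable
   object it finds in an existing pack. */
enum unreachable_action { LOOSEN_WITH_PACK_MTIME, DISCARD };

/* If the pack is already older than the prune cutoff, a loosened copy
   would be pruned immediately, so skip writing it; otherwise loosen it
   and give it the pack's mtime. */
static enum unreachable_action decide_unreachable(time_t pack_mtime,
                                                  time_t expire)
{
    return pack_mtime < expire ? DISCARD : LOOSEN_WITH_PACK_MTIME;
}

/* Hypothetical: the pack mtime repack will see after another process
   "writes" an object already stored in that pack. Without freshening,
   the pack keeps its old mtime. */
static time_t pack_mtime_after_noop_write(time_t pack_mtime, time_t now,
                                          int freshen)
{
    return freshen ? now : pack_mtime;
}
```

With a 30-day-old pack and a two-week cutoff, the unfreshened case yields DISCARD, while the freshened case loosens the object with a current mtime that a subsequent prune would leave alone.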
-Peff