* problem with modification time of packfiles
@ 2015-10-18 21:37 Andreas Amann
  2015-10-19  2:57 ` brian m. carlson
  0 siblings, 1 reply; 5+ messages in thread
From: Andreas Amann @ 2015-10-18 21:37 UTC
  To: git

git (2.6.1) sometimes updates the modification time of a packfile, even if it
has not changed at all.

On my system this triggers quite expensive and unnecessary backup
operations, which I would prefer to avoid.  Is there a simple way to
keep the mtime of packfiles fixed once they are created?

Apparently the undesired mtime update is done in
sha1_file.c:freshen_file(), which is called (indirectly) by
write_sha1_file().  However, I did not understand why this is done.

Any clarification, and pointers on how mtime can be kept constant, would
be appreciated.

Thanks,
Andreas


* Re: problem with modification time of packfiles
  2015-10-18 21:37 problem with modification time of packfiles Andreas Amann
@ 2015-10-19  2:57 ` brian m. carlson
  2015-10-19 19:59   ` Andreas Amann
  0 siblings, 1 reply; 5+ messages in thread
From: brian m. carlson @ 2015-10-19  2:57 UTC
  To: Andreas Amann; +Cc: git


On Sun, Oct 18, 2015 at 10:37:55PM +0100, Andreas Amann wrote:
> git (2.6.1) sometimes updates the modification time of a packfile, even if it
> has not changed at all.
> 
> On my system this triggers quite expensive and unnecessary backup
> operations, which I would prefer to avoid.  Is there a simple way to
> keep the mtime of packfiles fixed once they are created?
> 
> Apparently the undesired mtime update is done in
> sha1_file.c:freshen_file(), which is called (indirectly) by
> write_sha1_file().  However, I did not understand why this is done.
> 
> Any clarification, and pointers on how mtime can be kept constant, would
> be appreciated.

This is required to avoid deleting items that might still be needed.
The commit message for the commit that introduced that function is as
follows:

  write_sha1_file: freshen existing objects
  
  When we try to write a loose object file, we first check
  whether that object already exists. If so, we skip the
  write as an optimization. However, this can interfere with
  prune's strategy of using mtimes to mark files in progress.
  
  For example, if a branch contains a particular tree object
  and is deleted, that tree object may become unreachable, and
  have an old mtime. If a new operation then tries to write
  the same tree, this ends up as a noop; we notice we
  already have the object and do nothing. A prune running
  simultaneously with this operation will see the object as
  old, and may delete it.
  
  We can solve this by "freshening" objects that we avoid
  writing by updating their mtime. The algorithm for doing so
  is essentially the same as that of has_sha1_file. Therefore
  we provide a new (static) interface "check_and_freshen",
  which finds and optionally freshens the object. It's trivial
  to implement freshening and simple checking by tweaking a
  single parameter.
  
  Signed-off-by: Jeff King <peff@peff.net>
  Signed-off-by: Junio C Hamano <gitster@pobox.com>
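
The freshening described in this commit message can be pictured with a
small Python sketch. It is an illustration only, not git's C code: the
names check_and_freshen and write_object echo the message, but their
signatures, the path-based lookup, and the age_seconds helper are
assumptions made for the example.

```python
import os
import time


def check_and_freshen(path, freshen):
    """Return True if the object file exists; optionally bump its mtime."""
    if not os.path.exists(path):
        return False
    if freshen:
        # Passing None sets both atime and mtime to the current time.
        os.utime(path, None)
    return True


def write_object(path, data):
    """Write an object unless it already exists.

    An existing object is not rewritten, but it is freshened so that a
    concurrent mtime-based prune does not treat it as old.
    Returns True if a new file was written, False if the write was skipped.
    """
    if check_and_freshen(path, freshen=True):
        return False
    with open(path, "wb") as f:
        f.write(data)
    return True


def age_seconds(path):
    """How old the file looks to an mtime-based expiry check (hypothetical helper)."""
    return time.time() - os.stat(path).st_mtime
```

Calling write_object() on an object that already exists leaves its
content untouched but resets its apparent age to roughly zero, which is
the mtime change observed in the original report.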

Perhaps implementing a backup strategy based on content instead of mtime
would be more successful.
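
As a rough sketch of what a content-based check could look like, the
Python below compares a SHA-256 digest of each file against a stored
manifest, so a file whose mtime changed but whose bytes did not is not
reported. The function names and the in-memory manifest dict are
hypothetical; no particular backup tool is implied.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(paths, manifest):
    """Return the paths whose content differs from the manifest.

    The manifest (path -> digest) is updated in place, so a second call
    with unchanged content returns an empty list even if mtimes moved.
    """
    changed = []
    for path in paths:
        digest = file_digest(path)
        if manifest.get(path) != digest:
            changed.append(path)
            manifest[path] = digest
    return changed
```

The trade-off is that every file has to be read to compute its digest,
which can itself be costly for large packfiles.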
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187



* Re: problem with modification time of packfiles
  2015-10-19  2:57 ` brian m. carlson
@ 2015-10-19 19:59   ` Andreas Amann
  2015-10-19 23:09     ` brian m. carlson
  0 siblings, 1 reply; 5+ messages in thread
From: Andreas Amann @ 2015-10-19 19:59 UTC
  To: brian m. carlson; +Cc: git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> On Sun, Oct 18, 2015 at 10:37:55PM +0100, Andreas Amann wrote:
>> git (2.6.1) sometimes updates the modification time of a packfile, even if it
>> has not changed at all.
>> 
>> On my system this triggers quite expensive and unnecessary backup
>> operations, which I would prefer to avoid.  Is there a simple way to
>> keep the mtime of packfiles fixed once they are created?
>> 
>> Apparently the undesired mtime update is done in
>> sha1_file.c:freshen_file(), which is called (indirectly) by
>> write_sha1_file().  However, I did not understand why this is done.
>> 
>> Any clarification, and pointers on how mtime can be kept constant, would
>> be appreciated.
>
> This is required to avoid deleting items that might still be needed.
> The commit message for the commit that introduced that function is as
> follows:
>
>   write_sha1_file: freshen existing objects
>   
>   When we try to write a loose object file, we first check
>   whether that object already exists. If so, we skip the
>   write as an optimization. However, this can interfere with
>   prune's strategy of using mtimes to mark files in progress.
>   

Thank you for your answer.  However, this reasoning only applies to loose
objects and not packfiles.

My understanding is that "git prune" will not prune any pack files
(except those starting with tmp_).  Only "git repack" should do that.
Repack, however, seems to be mtime-agnostic, so it does not seem
to be necessary to freshen packfiles.

It therefore seems that git freshens packfiles unnecessarily, which can
lead to expensive and unnecessary backup operations. 

Given this, would a trivial patch to remove the freshening of packfiles
be acceptable?

Alternatively, maybe it would be preferable to use ctime instead of
mtime to mark recently used packfiles and loose objects?  This might be
more natural, as mtime is usually associated with a "modification" of
the file itself, which does not happen here.  ctime, on the other hand,
indicates attribute changes.  (In this case chmod() could be used
instead of utime() to update ctime.)
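
To illustrate the chmod() idea, here is a minimal Python sketch that
re-applies a file's existing permission bits. On POSIX systems a
successful chmod() marks ctime for update while leaving mtime alone.
The helper name touch_ctime is hypothetical, and the behaviour is
assumed for POSIX only (on Windows, st_ctime means something else).

```python
import os
import stat


def touch_ctime(path):
    """Update ctime without touching mtime by re-applying the current mode.

    Assumes a POSIX filesystem, where chmod() updates the status-change
    time even when the permission bits stay the same.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode)
```

Two caveats apply to this sketch: utime() itself also updates ctime, so
ctime already moves whenever mtime is freshened, and some backup tools
consult ctime as well, so whether this avoids a re-backup depends on
the tool.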


* Re: problem with modification time of packfiles
  2015-10-19 19:59   ` Andreas Amann
@ 2015-10-19 23:09     ` brian m. carlson
  2015-10-19 23:52       ` Jeff King
  0 siblings, 1 reply; 5+ messages in thread
From: brian m. carlson @ 2015-10-19 23:09 UTC
  To: Andreas Amann; +Cc: git, Jeff King, Junio C Hamano


On Mon, Oct 19, 2015 at 08:59:15PM +0100, Andreas Amann wrote:
> Thank you for your answer.  However, this reasoning only applies to loose
> objects and not packfiles.
> 
> My understanding is that "git prune" will not prune any pack files
> (except those starting with tmp_).  Only "git repack" should do that.
> Repack, however, seems to be mtime-agnostic, so it does not seem
> to be necessary to freshen packfiles.
> 
> It therefore seems that git freshens packfiles unnecessarily, which can
> lead to expensive and unnecessary backup operations.
> 
> Given this, would a trivial patch to remove the freshening of packfiles
> be acceptable?

I'm not familiar enough with the code to say for certain, but it looks
like you're right.  Peff, Junio, do you think this is safe, or is there
something we're missing?
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187



* Re: problem with modification time of packfiles
  2015-10-19 23:09     ` brian m. carlson
@ 2015-10-19 23:52       ` Jeff King
  0 siblings, 0 replies; 5+ messages in thread
From: Jeff King @ 2015-10-19 23:52 UTC
  To: brian m. carlson; +Cc: Andreas Amann, git, Junio C Hamano

On Mon, Oct 19, 2015 at 11:09:19PM +0000, brian m. carlson wrote:

> On Mon, Oct 19, 2015 at 08:59:15PM +0100, Andreas Amann wrote:
> > Thank you for your answer.  However, this reasoning only applies to loose
> > objects and not packfiles.
> > 
> > My understanding is that "git prune" will not prune any pack files
> > (except those starting with tmp_).  Only "git repack" should do that.
> > Repack, however, seems to be mtime-agnostic, so it does not seem
> > to be necessary to freshen packfiles.
> > 
> > It therefore seems that git freshens packfiles unnecessarily, which can
> > lead to expensive and unnecessary backup operations.
> > 
> > Given this, would a trivial patch to remove the freshening of packfiles
> > be acceptable?
> 
> I'm not familiar enough with the code to say for certain, but it looks
> like you're right.  Peff, Junio, do you think this is safe, or is there
> something we're missing?

No, it's not safe. When doing a full repack, we pack only reachable
objects. Unreachable ones are either loosened and given the mtime of the
packfile (from which they can then be pruned), or discarded if the pack
mtime is already old (as an optimization to avoid writing and then
immediately pruning).

See builtin/pack-objects.c:loosen_unused_packed_objects.
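
The decision described above can be modelled in a few lines of Python.
This is a simplified sketch, not the logic of
loosen_unused_packed_objects itself: the function name, the return
values, and the two-week grace period are assumptions chosen for
illustration.

```python
TWO_WEEKS = 14 * 24 * 60 * 60  # assumed grace period, in seconds


def handle_unreachable(pack_mtime, now, grace=TWO_WEEKS):
    """Decide what a full repack does with an unreachable packed object.

    Returns ("discard", None) when the pack's mtime is older than the
    grace period, since the loose copy would be pruned immediately.
    Otherwise returns ("loosen", pack_mtime): the object is written out
    loose and inherits the pack's mtime.
    """
    if pack_mtime < now - grace:
        return ("discard", None)
    return ("loosen", pack_mtime)
```

In this model, a pack whose mtime is never freshened eventually looks
stale, so an object that a recent operation relied on (and skipped
writing because it already existed in the pack) would fall into the
"discard" branch.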

-Peff

