All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* Commit/merge messages for binary files
@ 2011-02-17 20:40 Piotr Krukowiecki
  2011-02-18 13:53 ` Matthieu Moy
  0 siblings, 1 reply; 12+ messages in thread
From: Piotr Krukowiecki @ 2011-02-17 20:40 UTC (permalink / raw
  To: git

Hi,

there's a different output when committing change and when merging
change for a binary file.
Do the insertions/deletions have any meaning for binary files?

For example:

$ git commit -m Updated
[topic 5da30ce] Updated
 1 files changed, 8 insertions(+), 103 deletions(-)
 rewrite blob.o (100%)

$ git merge topic
Updating 75ab259..5da30ce
Fast-forward
 blob.o |  Bin 25920 -> 4364 bytes
 1 files changed, 0 insertions(+), 0 deletions(-)

-- 
Piotrek

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Commit/merge messages for binary files
  2011-02-17 20:40 Commit/merge messages for binary files Piotr Krukowiecki
@ 2011-02-18 13:53 ` Matthieu Moy
  2011-02-18 20:30   ` Piotr Krukowiecki
  0 siblings, 1 reply; 12+ messages in thread
From: Matthieu Moy @ 2011-02-18 13:53 UTC (permalink / raw
  To: Piotr Krukowiecki; +Cc: git

Piotr Krukowiecki <piotr.krukowiecki@gmail.com> writes:

> Hi,
>
> there's a different output when committing change and when merging
> change for a binary file.
> Do the insertions/deletions have any meaning for binary files?

No. They're inserted/deleted *lines*, and that wouldn't make sense for
binary files.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Commit/merge messages for binary files
  2011-02-18 13:53 ` Matthieu Moy
@ 2011-02-18 20:30   ` Piotr Krukowiecki
  2011-02-18 21:19     ` Matthieu Moy
  0 siblings, 1 reply; 12+ messages in thread
From: Piotr Krukowiecki @ 2011-02-18 20:30 UTC (permalink / raw
  To: Matthieu Moy; +Cc: git

W dniu 18.02.2011 14:53, Matthieu Moy pisze:
> Piotr Krukowiecki <piotr.krukowiecki@gmail.com> writes:
> 
>> Hi,
>>
>> there's a different output when committing change and when merging
>> change for a binary file.
>> Do the insertions/deletions have any meaning for binary files?
> 
> No. They're inserted/deleted *lines*, and that wouldn't make sense for
> binary files.
> 

So it's a bug?

-- 
Piotr Krukowiecki

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Commit/merge messages for binary files
  2011-02-18 20:30   ` Piotr Krukowiecki
@ 2011-02-18 21:19     ` Matthieu Moy
  2011-02-18 21:42       ` Piotr Krukowiecki
  0 siblings, 1 reply; 12+ messages in thread
From: Matthieu Moy @ 2011-02-18 21:19 UTC (permalink / raw
  To: Piotr Krukowiecki; +Cc: git

Piotr Krukowiecki <piotr.krukowiecki@gmail.com> writes:

> W dniu 18.02.2011 14:53, Matthieu Moy pisze:
>> Piotr Krukowiecki <piotr.krukowiecki@gmail.com> writes:
>> 
>>> Hi,
>>>
>>> there's a different output when committing change and when merging
>>> change for a binary file.
>>> Do the insertions/deletions have any meaning for binary files?
>> 
>> No. They're inserted/deleted *lines*, and that wouldn't make sense for
>> binary files.
>
> So it's a bug?

I don't see any bug. There were no insertion/deletion in text files,
hence you see 0 for both.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Commit/merge messages for binary files
  2011-02-18 21:19     ` Matthieu Moy
@ 2011-02-18 21:42       ` Piotr Krukowiecki
  2011-02-19  7:06         ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Piotr Krukowiecki @ 2011-02-18 21:42 UTC (permalink / raw
  To: Matthieu Moy; +Cc: git

W dniu 18.02.2011 22:19, Matthieu Moy pisze:
> Piotr Krukowiecki <piotr.krukowiecki@gmail.com> writes:
> 
>> W dniu 18.02.2011 14:53, Matthieu Moy pisze:
>>> Piotr Krukowiecki <piotr.krukowiecki@gmail.com> writes:
>>>
>>>> Hi,
>>>>
>>>> there's a different output when committing change and when merging
>>>> change for a binary file.
>>>> Do the insertions/deletions have any meaning for binary files?
>>>
>>> No. They're inserted/deleted *lines*, and that wouldn't make sense for
>>> binary files.
>>
>> So it's a bug?
> 
> I don't see any bug. There were no insertion/deletion in text files,
> hence you see 0 for both.

Maybe I wasn't clear. Both the commit and the merge was for the same binary
file.

In case of commit there were non-zero insertions/deletions. 
See example below - notice both update binary file blob.o:

$ git commit -m Updated
[topic 5da30ce] Updated
 1 files changed, 8 insertions(+), 103 deletions(-)
 rewrite blob.o (100%)

$ git merge topic
Updating 75ab259..5da30ce
Fast-forward
 blob.o |  Bin 25920 -> 4364 bytes
 1 files changed, 0 insertions(+), 0 deletions(-)


Hope that helps,

-- 
Piotr Krukowiecki

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Commit/merge messages for binary files
  2011-02-18 21:42       ` Piotr Krukowiecki
@ 2011-02-19  7:06         ` Jeff King
  2011-02-19  8:03           ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2011-02-19  7:06 UTC (permalink / raw
  To: Piotr Krukowiecki; +Cc: Matthieu Moy, git

On Fri, Feb 18, 2011 at 10:42:42PM +0100, Piotr Krukowiecki wrote:

> Maybe I wasn't clear. Both the commit and the merge was for the same binary
> file.
> 
> In case of commit there were non-zero insertions/deletions. 
> See example below - notice both update binary file blob.o:
> 
> $ git commit -m Updated
> [topic 5da30ce] Updated
>  1 files changed, 8 insertions(+), 103 deletions(-)
>  rewrite blob.o (100%)

Yeah, that seems weird. I think it might have to do with break
detection in the diff, which is on during the commit summary diff:

  $ git init
  $ dd if=/dev/urandom of=foo.rand bs=1k count=4
  $ git add foo.rand
  $ git commit -m one
  [master (root-commit) 8c5783a] one
   1 files changed, 0 insertions(+), 0 deletions(-)
   create mode 100644 foo.rand
  $ git show --oneline --stat --summary
  8c5783a one
   foo.rand |  Bin 0 -> 4096 bytes
   1 files changed, 0 insertions(+), 0 deletions(-)
   create mode 100644 foo.rand

So far so good. Now let's rewrite it.

  $ dd if=/dev/urandom of=foo.rand bs=1k count=4
  $ git commit -a -m two
  [master 43961bc] two
   1 files changed, 15 insertions(+), 19 deletions(-)
   rewrite foo.rand (100%)

Now that seems wrong. What about doing another diff:

  $ git show --oneline --stat --summary
  43961bc two
   foo.rand |  Bin 4096 -> 4096 bytes
   1 files changed, 0 insertions(+), 0 deletions(-)

That's right. Now let's turn on break detection:

  $ git show --oneline --stat --summary -B
  43961bc two
   foo.rand |   34 +++++++++++++++-------------------
   1 files changed, 15 insertions(+), 19 deletions(-)
   rewrite foo.rand (100%)

Broken again. So I guess we have some problem with making sure we treat
broken filepairs as binary.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Commit/merge messages for binary files
  2011-02-19  7:06         ` Jeff King
@ 2011-02-19  8:03           ` Jeff King
  2011-02-19  8:04             ` [PATCH 1/2] diff: handle diffstat of rewritten " Jeff King
  2011-02-19  8:16             ` [PATCH 2/2] diff: don't retrieve binary blobs for diffstat Jeff King
  0 siblings, 2 replies; 12+ messages in thread
From: Jeff King @ 2011-02-19  8:03 UTC (permalink / raw
  To: Piotr Krukowiecki; +Cc: Junio C Hamano, Matthieu Moy, git

On Sat, Feb 19, 2011 at 02:06:01AM -0500, Jeff King wrote:

> Now that seems wrong. What about doing another diff:
> 
>   $ git show --oneline --stat --summary
>   43961bc two
>    foo.rand |  Bin 4096 -> 4096 bytes
>    1 files changed, 0 insertions(+), 0 deletions(-)
> 
> That's right. Now let's turn on break detection:
> 
>   $ git show --oneline --stat --summary -B
>   43961bc two
>    foo.rand |   34 +++++++++++++++-------------------
>    1 files changed, 15 insertions(+), 19 deletions(-)
>    rewrite foo.rand (100%)
> 
> Broken again. So I guess we have some problem with making sure we treat
> broken filepairs as binary.

Yep, builtin_diffstat just didn't handle that case. Two patch series to
follow:

  [1/2]: diff: handle diffstat of rewritten binary files
  [2/2]: diff: don't retrieve binary blobs for diffstat

The first one fixes the bug, and the second is a convenient optimization
I noticed while there.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] diff: handle diffstat of rewritten binary files
  2011-02-19  8:03           ` Jeff King
@ 2011-02-19  8:04             ` Jeff King
  2011-02-19  8:16             ` [PATCH 2/2] diff: don't retrieve binary blobs for diffstat Jeff King
  1 sibling, 0 replies; 12+ messages in thread
From: Jeff King @ 2011-02-19  8:04 UTC (permalink / raw
  To: Piotr Krukowiecki; +Cc: Junio C Hamano, Matthieu Moy, git

The logic in builtin_diffstat assumes that a
complete_rewrite pair should have its lines counted. This is
nonsensical for binary files and leads to confusing things
like:

  $ git diff --stat --summary HEAD^ HEAD
   foo.rand |  Bin 4096 -> 4096 bytes
   1 files changed, 0 insertions(+), 0 deletions(-)

  $ git diff --stat --summary -B HEAD^ HEAD
   foo.rand |   34 +++++++++++++++-------------------
   1 files changed, 15 insertions(+), 19 deletions(-)
   rewrite foo.rand (100%)

So let's reorder the function to handle binary files first
(which from diffstat's perspective look like complete
rewrites anyway), then rewrites, then actual diffstats.

There are two bonus prizes to this reorder:

  1. It gets rid of a now-superfluous goto.

  2. The binary case is at the top, which means we can
     further optimize it in the next patch.

Signed-off-by: Jeff King <peff@peff.net>
---
 diff.c |   24 ++++++++++++++----------
 1 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/diff.c b/diff.c
index 5422c43..2ac0fe9 100644
--- a/diff.c
+++ b/diff.c
@@ -2079,25 +2079,30 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 		data->is_unmerged = 1;
 		return;
 	}
-	if (complete_rewrite) {
+
+	if (diff_filespec_is_binary(one) || diff_filespec_is_binary(two)) {
+		if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
+			die("unable to read files to diff");
+		data->is_binary = 1;
+		data->added = mf2.size;
+		data->deleted = mf1.size;
+	}
+
+	else if (complete_rewrite) {
 		diff_populate_filespec(one, 0);
 		diff_populate_filespec(two, 0);
 		data->deleted = count_lines(one->data, one->size);
 		data->added = count_lines(two->data, two->size);
-		goto free_and_return;
 	}
-	if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
-		die("unable to read files to diff");
 
-	if (diff_filespec_is_binary(one) || diff_filespec_is_binary(two)) {
-		data->is_binary = 1;
-		data->added = mf2.size;
-		data->deleted = mf1.size;
-	} else {
+	else {
 		/* Crazy xdl interfaces.. */
 		xpparam_t xpp;
 		xdemitconf_t xecfg;
 
+		if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
+			die("unable to read files to diff");
+
 		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
 		xpp.flags = o->xdl_opts;
@@ -2105,7 +2110,6 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 			      &xpp, &xecfg);
 	}
 
- free_and_return:
 	diff_free_filespec_data(one);
 	diff_free_filespec_data(two);
 }
-- 
1.7.4.1.26.g3372c

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] diff: don't retrieve binary blobs for diffstat
  2011-02-19  8:03           ` Jeff King
  2011-02-19  8:04             ` [PATCH 1/2] diff: handle diffstat of rewritten " Jeff King
@ 2011-02-19  8:16             ` Jeff King
  2011-02-21 23:33               ` Junio C Hamano
  1 sibling, 1 reply; 12+ messages in thread
From: Jeff King @ 2011-02-19  8:16 UTC (permalink / raw
  To: Piotr Krukowiecki; +Cc: Junio C Hamano, Matthieu Moy, git

We only need the size, which is much cheaper to get,
especially if it is a big binary file.

Signed-off-by: Jeff King <peff@peff.net>
---
This of course is only really helpful if you have marked the files as
binary via gitattributes, since otherwise we have to pull in the blob to
find out that it's binary. :)

But in my real-world photo/video repo, which has media files marked via
gitattributes as binary (but to textconv exif tags, of course). The
commit in question has 26 files totalling 88 megabytes.

  $ time git show --stat >old
  real    0m0.428s
  user    0m0.392s
  sys     0m0.032s

  $ time git.jk.diffstat-binary show --stat >new
  real    0m0.005s
  user    0m0.004s
  sys     0m0.000s

  $ cmp old new && echo ok
  ok

8500% speedup isn't too bad. :)

 diff.c |   15 +++++++++++----
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index 2ac0fe9..6640857 100644
--- a/diff.c
+++ b/diff.c
@@ -245,6 +245,15 @@ static int fill_mmfile(mmfile_t *mf, struct diff_filespec *one)
 	return 0;
 }
 
+/* like fill_mmfile, but only for size, so we can avoid retrieving blob */
+static unsigned long diff_filespec_size(struct diff_filespec *one)
+{
+	if (!DIFF_FILE_VALID(one))
+		return 0;
+	diff_populate_filespec(one, 1);
+	return one->size;
+}
+
 static int count_trailing_blank(mmfile_t *mf, unsigned ws_rule)
 {
 	char *ptr = mf->ptr;
@@ -2081,11 +2090,9 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 	}
 
 	if (diff_filespec_is_binary(one) || diff_filespec_is_binary(two)) {
-		if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
-			die("unable to read files to diff");
 		data->is_binary = 1;
-		data->added = mf2.size;
-		data->deleted = mf1.size;
+		data->added = diff_filespec_size(two);
+		data->deleted = diff_filespec_size(one);
 	}
 
 	else if (complete_rewrite) {
-- 
1.7.4.1.26.g3372c

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] diff: don't retrieve binary blobs for diffstat
  2011-02-19  8:16             ` [PATCH 2/2] diff: don't retrieve binary blobs for diffstat Jeff King
@ 2011-02-21 23:33               ` Junio C Hamano
  2011-02-22 15:37                 ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2011-02-21 23:33 UTC (permalink / raw
  To: Jeff King; +Cc: Piotr Krukowiecki, Junio C Hamano, Matthieu Moy, git

Jeff King <peff@peff.net> writes:

> We only need the size, which is much cheaper to get,
> especially if it is a big binary file.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---

Nice ;-)  Do we want a test or two for 1/2 by the way?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] diff: don't retrieve binary blobs for diffstat
  2011-02-21 23:33               ` Junio C Hamano
@ 2011-02-22 15:37                 ` Jeff King
  2011-02-22 19:00                   ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2011-02-22 15:37 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Piotr Krukowiecki, Matthieu Moy, git

On Mon, Feb 21, 2011 at 03:33:05PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > We only need the size, which is much cheaper to get,
> > especially if it is a big binary file.
> >
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> 
> Nice ;-)  Do we want a test or two for 1/2 by the way?

Yeah, it's probably a good idea. Can you squash this in, or should I
resend?

---
diff --git a/t/t4031-diff-rewrite-binary.sh b/t/t4031-diff-rewrite-binary.sh
index 7e7b307..7d7470f 100755
--- a/t/t4031-diff-rewrite-binary.sh
+++ b/t/t4031-diff-rewrite-binary.sh
@@ -44,6 +44,13 @@ test_expect_success 'rewrite diff can show binary patch' '
 	grep "GIT binary patch" diff
 '
 
+test_expect_success 'rewrite diff --stat shows binary changes' '
+	git diff -B --stat --summary >diff &&
+	grep "Bin" diff &&
+	grep "0 insertions.*0 deletions" diff &&
+	grep " rewrite file" diff
+'
+
 {
 	echo "#!$SHELL_PATH"
 	cat <<'EOF'

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] diff: don't retrieve binary blobs for diffstat
  2011-02-22 15:37                 ` Jeff King
@ 2011-02-22 19:00                   ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2011-02-22 19:00 UTC (permalink / raw
  To: Jeff King; +Cc: Junio C Hamano, Piotr Krukowiecki, Matthieu Moy, git

Jeff King <peff@peff.net> writes:

>> Nice ;-)  Do we want a test or two for 1/2 by the way?
>
> Yeah, it's probably a good idea. Can you squash this in,...

Done; thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2011-02-22 19:00 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-17 20:40 Commit/merge messages for binary files Piotr Krukowiecki
2011-02-18 13:53 ` Matthieu Moy
2011-02-18 20:30   ` Piotr Krukowiecki
2011-02-18 21:19     ` Matthieu Moy
2011-02-18 21:42       ` Piotr Krukowiecki
2011-02-19  7:06         ` Jeff King
2011-02-19  8:03           ` Jeff King
2011-02-19  8:04             ` [PATCH 1/2] diff: handle diffstat of rewritten " Jeff King
2011-02-19  8:16             ` [PATCH 2/2] diff: don't retrieve binary blobs for diffstat Jeff King
2011-02-21 23:33               ` Junio C Hamano
2011-02-22 15:37                 ` Jeff King
2011-02-22 19:00                   ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.