* Commit/merge messages for binary files
@ 2011-02-17 20:40 Piotr Krukowiecki
2011-02-18 13:53 ` Matthieu Moy
0 siblings, 1 reply; 12+ messages in thread
From: Piotr Krukowiecki @ 2011-02-17 20:40 UTC (permalink / raw
To: git
Hi,
there's a different output when committing change and when merging
change for a binary file.
Do the insertions/deletions have any meaning for binary files?
For example:
$ git commit -m Updated
[topic 5da30ce] Updated
1 files changed, 8 insertions(+), 103 deletions(-)
rewrite blob.o (100%)
$ git merge topic
Updating 75ab259..5da30ce
Fast-forward
blob.o | Bin 25920 -> 4364 bytes
1 files changed, 0 insertions(+), 0 deletions(-)
--
Piotrek
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Commit/merge messages for binary files
2011-02-17 20:40 Commit/merge messages for binary files Piotr Krukowiecki
@ 2011-02-18 13:53 ` Matthieu Moy
2011-02-18 20:30 ` Piotr Krukowiecki
0 siblings, 1 reply; 12+ messages in thread
From: Matthieu Moy @ 2011-02-18 13:53 UTC (permalink / raw
To: Piotr Krukowiecki; +Cc: git
Piotr Krukowiecki <piotr.krukowiecki@gmail.com> writes:
> Hi,
>
> there's a different output when committing change and when merging
> change for a binary file.
> Do the insertions/deletions have any meaning for binary files?
No. They're inserted/deleted *lines*, and that wouldn't make sense for
binary files.
--
Matthieu Moy
http://www-verimag.imag.fr/~moy/
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Commit/merge messages for binary files
2011-02-18 13:53 ` Matthieu Moy
@ 2011-02-18 20:30 ` Piotr Krukowiecki
2011-02-18 21:19 ` Matthieu Moy
0 siblings, 1 reply; 12+ messages in thread
From: Piotr Krukowiecki @ 2011-02-18 20:30 UTC (permalink / raw
To: Matthieu Moy; +Cc: git
W dniu 18.02.2011 14:53, Matthieu Moy pisze:
> Piotr Krukowiecki <piotr.krukowiecki@gmail.com> writes:
>
>> Hi,
>>
>> there's a different output when committing change and when merging
>> change for a binary file.
>> Do the insertions/deletions have any meaning for binary files?
>
> No. They're inserted/deleted *lines*, and that wouldn't make sense for
> binary files.
>
So it's a bug?
--
Piotr Krukowiecki
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Commit/merge messages for binary files
2011-02-18 20:30 ` Piotr Krukowiecki
@ 2011-02-18 21:19 ` Matthieu Moy
2011-02-18 21:42 ` Piotr Krukowiecki
0 siblings, 1 reply; 12+ messages in thread
From: Matthieu Moy @ 2011-02-18 21:19 UTC (permalink / raw
To: Piotr Krukowiecki; +Cc: git
Piotr Krukowiecki <piotr.krukowiecki@gmail.com> writes:
> W dniu 18.02.2011 14:53, Matthieu Moy pisze:
>> Piotr Krukowiecki <piotr.krukowiecki@gmail.com> writes:
>>
>>> Hi,
>>>
>>> there's a different output when committing change and when merging
>>> change for a binary file.
>>> Do the insertions/deletions have any meaning for binary files?
>>
>> No. They're inserted/deleted *lines*, and that wouldn't make sense for
>> binary files.
>
> So it's a bug?
I don't see any bug. There were no insertion/deletion in text files,
hence you see 0 for both.
--
Matthieu Moy
http://www-verimag.imag.fr/~moy/
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Commit/merge messages for binary files
2011-02-18 21:19 ` Matthieu Moy
@ 2011-02-18 21:42 ` Piotr Krukowiecki
2011-02-19 7:06 ` Jeff King
0 siblings, 1 reply; 12+ messages in thread
From: Piotr Krukowiecki @ 2011-02-18 21:42 UTC (permalink / raw
To: Matthieu Moy; +Cc: git
W dniu 18.02.2011 22:19, Matthieu Moy pisze:
> Piotr Krukowiecki <piotr.krukowiecki@gmail.com> writes:
>
>> W dniu 18.02.2011 14:53, Matthieu Moy pisze:
>>> Piotr Krukowiecki <piotr.krukowiecki@gmail.com> writes:
>>>
>>>> Hi,
>>>>
>>>> there's a different output when committing change and when merging
>>>> change for a binary file.
>>>> Do the insertions/deletions have any meaning for binary files?
>>>
>>> No. They're inserted/deleted *lines*, and that wouldn't make sense for
>>> binary files.
>>
>> So it's a bug?
>
> I don't see any bug. There were no insertion/deletion in text files,
> hence you see 0 for both.
Maybe I wasn't clear. Both the commit and the merge was for the same binary
file.
In case of commit there were non-zero insertions/deletions.
See example below - notice both update binary file blob.o:
$ git commit -m Updated
[topic 5da30ce] Updated
1 files changed, 8 insertions(+), 103 deletions(-)
rewrite blob.o (100%)
$ git merge topic
Updating 75ab259..5da30ce
Fast-forward
blob.o | Bin 25920 -> 4364 bytes
1 files changed, 0 insertions(+), 0 deletions(-)
Hope that helps,
--
Piotr Krukowiecki
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Commit/merge messages for binary files
2011-02-18 21:42 ` Piotr Krukowiecki
@ 2011-02-19 7:06 ` Jeff King
2011-02-19 8:03 ` Jeff King
0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2011-02-19 7:06 UTC (permalink / raw
To: Piotr Krukowiecki; +Cc: Matthieu Moy, git
On Fri, Feb 18, 2011 at 10:42:42PM +0100, Piotr Krukowiecki wrote:
> Maybe I wasn't clear. Both the commit and the merge was for the same binary
> file.
>
> In case of commit there were non-zero insertions/deletions.
> See example below - notice both update binary file blob.o:
>
> $ git commit -m Updated
> [topic 5da30ce] Updated
> 1 files changed, 8 insertions(+), 103 deletions(-)
> rewrite blob.o (100%)
Yeah, that seems weird. I think it might have to do with break
detection in the diff, which is on during the commit summary diff:
$ git init
$ dd if=/dev/urandom of=foo.rand bs=1k count=4
$ git add foo.rand
$ git commit -m one
[master (root-commit) 8c5783a] one
1 files changed, 0 insertions(+), 0 deletions(-)
create mode 100644 foo.rand
$ git show --oneline --stat --summary
8c5783a one
foo.rand | Bin 0 -> 4096 bytes
1 files changed, 0 insertions(+), 0 deletions(-)
create mode 100644 foo.rand
So far so good. Now let's rewrite it.
$ dd if=/dev/urandom of=foo.rand bs=1k count=4
$ git commit -a -m two
[master 43961bc] two
1 files changed, 15 insertions(+), 19 deletions(-)
rewrite foo.rand (100%)
Now that seems wrong. What about doing another diff:
$ git show --oneline --stat --summary
43961bc two
foo.rand | Bin 4096 -> 4096 bytes
1 files changed, 0 insertions(+), 0 deletions(-)
That's right. Now let's turn on break detection:
$ git show --oneline --stat --summary -B
43961bc two
foo.rand | 34 +++++++++++++++-------------------
1 files changed, 15 insertions(+), 19 deletions(-)
rewrite foo.rand (100%)
Broken again. So I guess we have some problem with making sure we treat
broken filepairs as binary.
-Peff
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Commit/merge messages for binary files
2011-02-19 7:06 ` Jeff King
@ 2011-02-19 8:03 ` Jeff King
2011-02-19 8:04 ` [PATCH 1/2] diff: handle diffstat of rewritten " Jeff King
2011-02-19 8:16 ` [PATCH 2/2] diff: don't retrieve binary blobs for diffstat Jeff King
0 siblings, 2 replies; 12+ messages in thread
From: Jeff King @ 2011-02-19 8:03 UTC (permalink / raw
To: Piotr Krukowiecki; +Cc: Junio C Hamano, Matthieu Moy, git
On Sat, Feb 19, 2011 at 02:06:01AM -0500, Jeff King wrote:
> Now that seems wrong. What about doing another diff:
>
> $ git show --oneline --stat --summary
> 43961bc two
> foo.rand | Bin 4096 -> 4096 bytes
> 1 files changed, 0 insertions(+), 0 deletions(-)
>
> That's right. Now let's turn on break detection:
>
> $ git show --oneline --stat --summary -B
> 43961bc two
> foo.rand | 34 +++++++++++++++-------------------
> 1 files changed, 15 insertions(+), 19 deletions(-)
> rewrite foo.rand (100%)
>
> Broken again. So I guess we have some problem with making sure we treat
> broken filepairs as binary.
Yep, builtin_diffstat just didn't handle that case. Two patch series to
follow:
[1/2]: diff: handle diffstat of rewritten binary files
[2/2]: diff: don't retrieve binary blobs for diffstat
The first one fixes the bug, and the second is a convenient optimization
I noticed while there.
-Peff
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/2] diff: handle diffstat of rewritten binary files
2011-02-19 8:03 ` Jeff King
@ 2011-02-19 8:04 ` Jeff King
2011-02-19 8:16 ` [PATCH 2/2] diff: don't retrieve binary blobs for diffstat Jeff King
1 sibling, 0 replies; 12+ messages in thread
From: Jeff King @ 2011-02-19 8:04 UTC (permalink / raw
To: Piotr Krukowiecki; +Cc: Junio C Hamano, Matthieu Moy, git
The logic in builtin_diffstat assumes that a
complete_rewrite pair should have its lines counted. This is
nonsensical for binary files and leads to confusing things
like:
$ git diff --stat --summary HEAD^ HEAD
foo.rand | Bin 4096 -> 4096 bytes
1 files changed, 0 insertions(+), 0 deletions(-)
$ git diff --stat --summary -B HEAD^ HEAD
foo.rand | 34 +++++++++++++++-------------------
1 files changed, 15 insertions(+), 19 deletions(-)
rewrite foo.rand (100%)
So let's reorder the function to handle binary files first
(which from diffstat's perspective look like complete
rewrites anyway), then rewrites, then actual diffstats.
There are two bonus prizes to this reorder:
1. It gets rid of a now-superfluous goto.
2. The binary case is at the top, which means we can
further optimize it in the next patch.
Signed-off-by: Jeff King <peff@peff.net>
---
diff.c | 24 ++++++++++++++----------
1 files changed, 14 insertions(+), 10 deletions(-)
diff --git a/diff.c b/diff.c
index 5422c43..2ac0fe9 100644
--- a/diff.c
+++ b/diff.c
@@ -2079,25 +2079,30 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
data->is_unmerged = 1;
return;
}
- if (complete_rewrite) {
+
+ if (diff_filespec_is_binary(one) || diff_filespec_is_binary(two)) {
+ if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
+ die("unable to read files to diff");
+ data->is_binary = 1;
+ data->added = mf2.size;
+ data->deleted = mf1.size;
+ }
+
+ else if (complete_rewrite) {
diff_populate_filespec(one, 0);
diff_populate_filespec(two, 0);
data->deleted = count_lines(one->data, one->size);
data->added = count_lines(two->data, two->size);
- goto free_and_return;
}
- if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
- die("unable to read files to diff");
- if (diff_filespec_is_binary(one) || diff_filespec_is_binary(two)) {
- data->is_binary = 1;
- data->added = mf2.size;
- data->deleted = mf1.size;
- } else {
+ else {
/* Crazy xdl interfaces.. */
xpparam_t xpp;
xdemitconf_t xecfg;
+ if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
+ die("unable to read files to diff");
+
memset(&xpp, 0, sizeof(xpp));
memset(&xecfg, 0, sizeof(xecfg));
xpp.flags = o->xdl_opts;
@@ -2105,7 +2110,6 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
&xpp, &xecfg);
}
- free_and_return:
diff_free_filespec_data(one);
diff_free_filespec_data(two);
}
--
1.7.4.1.26.g3372c
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/2] diff: don't retrieve binary blobs for diffstat
2011-02-19 8:03 ` Jeff King
2011-02-19 8:04 ` [PATCH 1/2] diff: handle diffstat of rewritten " Jeff King
@ 2011-02-19 8:16 ` Jeff King
2011-02-21 23:33 ` Junio C Hamano
1 sibling, 1 reply; 12+ messages in thread
From: Jeff King @ 2011-02-19 8:16 UTC (permalink / raw
To: Piotr Krukowiecki; +Cc: Junio C Hamano, Matthieu Moy, git
We only need the size, which is much cheaper to get,
especially if it is a big binary file.
Signed-off-by: Jeff King <peff@peff.net>
---
This of course is only really helpful if you have marked the files as
binary via gitattributes, since otherwise we have to pull in the blob to
find out that it's binary. :)
But in my real-world photo/video repo, which has media files marked via
gitattributes as binary (but to textconv exif tags, of course). The
commit in question has 26 files totalling 88 megabytes.
$ time git show --stat >old
real 0m0.428s
user 0m0.392s
sys 0m0.032s
$ time git.jk.diffstat-binary show --stat >new
real 0m0.005s
user 0m0.004s
sys 0m0.000s
$ cmp old new && echo ok
ok
8500% speedup isn't too bad. :)
diff.c | 15 +++++++++++----
1 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/diff.c b/diff.c
index 2ac0fe9..6640857 100644
--- a/diff.c
+++ b/diff.c
@@ -245,6 +245,15 @@ static int fill_mmfile(mmfile_t *mf, struct diff_filespec *one)
return 0;
}
+/* like fill_mmfile, but only for size, so we can avoid retrieving blob */
+static unsigned long diff_filespec_size(struct diff_filespec *one)
+{
+ if (!DIFF_FILE_VALID(one))
+ return 0;
+ diff_populate_filespec(one, 1);
+ return one->size;
+}
+
static int count_trailing_blank(mmfile_t *mf, unsigned ws_rule)
{
char *ptr = mf->ptr;
@@ -2081,11 +2090,9 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
}
if (diff_filespec_is_binary(one) || diff_filespec_is_binary(two)) {
- if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
- die("unable to read files to diff");
data->is_binary = 1;
- data->added = mf2.size;
- data->deleted = mf1.size;
+ data->added = diff_filespec_size(two);
+ data->deleted = diff_filespec_size(one);
}
else if (complete_rewrite) {
--
1.7.4.1.26.g3372c
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] diff: don't retrieve binary blobs for diffstat
2011-02-19 8:16 ` [PATCH 2/2] diff: don't retrieve binary blobs for diffstat Jeff King
@ 2011-02-21 23:33 ` Junio C Hamano
2011-02-22 15:37 ` Jeff King
0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2011-02-21 23:33 UTC (permalink / raw
To: Jeff King; +Cc: Piotr Krukowiecki, Junio C Hamano, Matthieu Moy, git
Jeff King <peff@peff.net> writes:
> We only need the size, which is much cheaper to get,
> especially if it is a big binary file.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
Nice ;-) Do we want a test or two for 1/2 by the way?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] diff: don't retrieve binary blobs for diffstat
2011-02-21 23:33 ` Junio C Hamano
@ 2011-02-22 15:37 ` Jeff King
2011-02-22 19:00 ` Junio C Hamano
0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2011-02-22 15:37 UTC (permalink / raw
To: Junio C Hamano; +Cc: Piotr Krukowiecki, Matthieu Moy, git
On Mon, Feb 21, 2011 at 03:33:05PM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > We only need the size, which is much cheaper to get,
> > especially if it is a big binary file.
> >
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
>
> Nice ;-) Do we want a test or two for 1/2 by the way?
Yeah, it's probably a good idea. Can you squash this in, or should I
resend?
---
diff --git a/t/t4031-diff-rewrite-binary.sh b/t/t4031-diff-rewrite-binary.sh
index 7e7b307..7d7470f 100755
--- a/t/t4031-diff-rewrite-binary.sh
+++ b/t/t4031-diff-rewrite-binary.sh
@@ -44,6 +44,13 @@ test_expect_success 'rewrite diff can show binary patch' '
grep "GIT binary patch" diff
'
+test_expect_success 'rewrite diff --stat shows binary changes' '
+ git diff -B --stat --summary >diff &&
+ grep "Bin" diff &&
+ grep "0 insertions.*0 deletions" diff &&
+ grep " rewrite file" diff
+'
+
{
echo "#!$SHELL_PATH"
cat <<'EOF'
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] diff: don't retrieve binary blobs for diffstat
2011-02-22 15:37 ` Jeff King
@ 2011-02-22 19:00 ` Junio C Hamano
0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2011-02-22 19:00 UTC (permalink / raw
To: Jeff King; +Cc: Junio C Hamano, Piotr Krukowiecki, Matthieu Moy, git
Jeff King <peff@peff.net> writes:
>> Nice ;-) Do we want a test or two for 1/2 by the way?
>
> Yeah, it's probably a good idea. Can you squash this in,...
Done; thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2011-02-22 19:00 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-17 20:40 Commit/merge messages for binary files Piotr Krukowiecki
2011-02-18 13:53 ` Matthieu Moy
2011-02-18 20:30 ` Piotr Krukowiecki
2011-02-18 21:19 ` Matthieu Moy
2011-02-18 21:42 ` Piotr Krukowiecki
2011-02-19 7:06 ` Jeff King
2011-02-19 8:03 ` Jeff King
2011-02-19 8:04 ` [PATCH 1/2] diff: handle diffstat of rewritten " Jeff King
2011-02-19 8:16 ` [PATCH 2/2] diff: don't retrieve binary blobs for diffstat Jeff King
2011-02-21 23:33 ` Junio C Hamano
2011-02-22 15:37 ` Jeff King
2011-02-22 19:00 ` Junio C Hamano
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.