Git Mailing List Archive mirror
* smudge filters do not round trip through `git diff` / `git apply`
@ 2022-11-01 20:55 Anthony Sottile
From: Anthony Sottile @ 2022-11-01 20:55 UTC
  To: Git Mailing List

this is boiled down from a larger problem outlined here:
https://github.com/pre-commit/pre-commit/issues/776

I've had some time to sit down and poke at this today -- here's my
minimal reproduction using just `git` and `git-crypt` -- though my
users have reported this also seems to affect other smudge filters.

I understand `git-crypt` is not associated with the git project,
however it was an easy, readily-available smudge filter to demonstrate
the problem.  I'm using git-crypt 0.6.0 (from ubuntu 22.04).

(the key material below is not sensitive -- I generated it afresh in a
docker container)

```bash
#!/usr/bin/env bash
set -euxo pipefail

rm -rf repo

git --version
git init --quiet repo -b main
cd repo
git commit --allow-empty -m 'Initial empty commit'

# deterministic git-crypt key so the output is stable
base64 -d > keyfile <<EOF
AEdJVENSWVBUS0VZAAAAAgAAAAAAAAABAAAABAAAAAAAAAADAAAAIIBi0O4iuCHghpYj4Teb6F72
KjTRHePBTf/6XC6fiVqvAAAABQAAAECTwWTDHfx0/Ytw3IZrVhonb5IPTr7kio27u0prnb8X25ui
9k4UqrdRQy8ZtBERv6wnHwC8A6q7CamRZ22L4q7UAAAAAA==
EOF
git-crypt unlock keyfile

echo 'f filter=git-crypt diff=git-crypt' > .gitattributes
git add .gitattributes
echo 'hello world' > f
git add f
rm f && touch f

tree="$(git write-tree)"
! git diff-index \
    --ignore-submodules \
    --binary \
    --exit-code \
    --no-color \
    --no-ext-diff \
    --no-textconv \
    "$tree" -- > patch

git checkout -- .
git apply patch || (echo FAILED && cat patch && exit 1)
```

here's my output:

```console
$ bash t.sh
+ rm -rf repo
+ git --version
git version 2.38.1.381.gc03801e19c
+ git init --quiet repo -b main
+ cd repo
+ git commit --allow-empty -m 'Initial empty commit'
[main (root-commit) 97d9520] Initial empty commit
+ base64 -d
+ git-crypt unlock keyfile
+ echo 'f filter=git-crypt diff=git-crypt'
+ git add .gitattributes
+ echo 'hello world'
+ git add f
+ rm f
+ touch f
++ git write-tree
+ tree=beca08f8b3c0774060f3e28e081ac69a80a1a10d
+ git diff-index --ignore-submodules --binary --exit-code --no-color --no-ext-diff --no-textconv beca08f8b3c0774060f3e28e081ac69a80a1a10d --
+ git checkout -- .
+ git apply patch
error: binary patch to 'f' creates incorrect result (expecting 2f89279ce748725a41cec60d5025b22efc863b42, got e69de29bb2d1d6434b8b29ae775ad8c2e48c5391)
error: f: patch does not apply
+ echo FAILED
FAILED
+ cat patch
diff --git a/f b/f
index ee7d1b67cd31482ae9bc772a0b2d016c81e1c613..2f89279ce748725a41cec60d5025b22efc863b42 100644
GIT binary patch
literal 0
HcmV?d00001

literal 34
qcmZQ@_Y83kiVO&0IB9<_f2(d}=e+;CXFaaIzZ@c>%yE8!<X-^jJPz&v

+ exit 1
```

I traced through the execution and it appears that the filter is
still running despite the `--no-textconv` setting (the trace shows
`git-crypt clean` being invoked), which may explain this:

```console
+ GIT_TRACE=2
+ git diff-index --ignore-submodules --binary --exit-code --no-color --no-ext-diff --no-textconv beca08f8b3c0774060f3e28e081ac69a80a1a10d --
20:40:21.551771 git.c:455               trace: built-in: git diff-index --ignore-submodules --binary --exit-code --no-color --no-ext-diff --no-textconv beca08f8b3c0774060f3e28e081ac69a80a1a10d --
20:40:21.552190 run-command.c:668       trace: run_command: '"git-crypt" clean'
20:40:21.555249 git.c:455               trace: built-in: git rev-parse --git-dir
```

as shown in the output, I'm testing with a build of the current primary
branch of git (2.38.1.381.gc03801e19c) -- though I usually use 2.34.1
(ubuntu 22.04)

oddly enough, using `--textconv` instead of `--no-textconv` "fixes" the
problem -- but that is unsatisfactory for my use case (I don't want to
rely on the state of filters installed, etc.)

the error message seems to occur due to the comparison of the hash in
the `index a...b` line above the patch hunk
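
To make that comparison concrete, here is a small sketch that pulls the two object ids out of an `index a..b mode` header line. The sample line is copied from the patch in this message; the variable names are only illustrative. Judging by the error message, the second id is the value `git apply` expects the patched file to hash to.

```bash
# Header line copied from the patch in this message (joined onto one line).
index_line='index ee7d1b67cd31482ae9bc772a0b2d016c81e1c613..2f89279ce748725a41cec60d5025b22efc863b42 100644'

# Strip the leading "index " and the trailing file mode.
hashes="${index_line#index }"
hashes="${hashes%% *}"

# Split "pre..post" on the ".." separator.
pre_image="${hashes%%..*}"
post_image="${hashes##*..}"

echo "pre-image:  $pre_image"
echo "post-image: $post_image"
```

The post-image id printed here is the "expecting" value in the error, while the "got" value is what the file actually hashed to after the binary hunk was applied.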

`e69de29bb2d1d6434b8b29ae775ad8c2e48c5391` is the hash of an empty file:

```console
$ git hash-object -w /dev/null
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```
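
For anyone who wants to check that id without a repository: a blob id is the SHA-1 of the header `blob <size>\0` followed by the raw content. The sketch below assumes a SHA-1 repository and that `sha1sum` (coreutils) is available; `git_blob_sha1` is a name made up for this example, not a git command.

```bash
# Compute a git blob id for whatever arrives on stdin, without using git.
# Assumes SHA-1 object ids and the coreutils `sha1sum` tool.
git_blob_sha1() {
    tmp="$(mktemp)"
    cat > "$tmp"                              # buffer stdin so it can be sized
    size="$(wc -c < "$tmp" | tr -d ' ')"      # content length in bytes
    { printf 'blob %s\0' "$size"; cat "$tmp"; } | sha1sum | cut -d' ' -f1
    rm -f "$tmp"
}

git_blob_sha1 < /dev/null
# prints e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```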

`2f89279ce748725a41cec60d5025b22efc863b42` appears to be the hash of
the empty file after it has passed through the filter (git-crypt's
clean step, per the trace). I used base64 here since it's binary
nonsense -- I got the contents of this blob by committing the empty
file and then fished the object out of the git database instead of
trying to do the patch dance:

```console
$ base64 -d <<< 'AEdJVENSWVBUAGoDwWO5GXWQ4B1kIQ==' | git hash-object -w /dev/stdin
2f89279ce748725a41cec60d5025b22efc863b42
```
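
The same effect can be sketched without git-crypt. The filter named `demo` below is hypothetical: its clean step just prepends three bytes. As far as I can tell, `git hash-object <file>` applies the clean filter selected by `.gitattributes` unless `--no-filters` is given, so the two ids for an empty file should differ in the same way.

```bash
set -eu

# Throwaway repository in a temporary directory.
workdir="$(mktemp -d)"
cd "$workdir"
git init --quiet demo
cd demo

# Hypothetical filter: clean prepends "HDR", smudge strips it again.
git config filter.demo.clean 'printf HDR; cat'
git config filter.demo.smudge 'tail -c +4'
echo 'f filter=demo' > .gitattributes

# An empty working-tree file, as in the reproduction above.
: > f

raw="$(git hash-object --no-filters f)"   # hash of the bytes on disk
cleaned="$(git hash-object f)"            # hash after the clean filter

echo "raw:     $raw"
echo "cleaned: $cleaned"
```

Here `raw` should be the empty-blob id, while `cleaned` is the id of the three-byte blob produced by the clean step.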

I *believe* the fix here is to avoid smudging in `git diff
--no-textconv` -- I started a patch where I added a `HASH_NO_TEXTCONV`
flag to `cache.h` but wasn't super sure on where to go from there and
decided I should ask first whether this is the right approach to take!

anthony
