Git Mailing List Archive mirror
From: Tao Klerks <tao@klerks.biz>
To: "Torsten Bögershausen" <tboegi@web.de>
Cc: "brian m. carlson" <sandals@crustytoothpaste.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	git <git@vger.kernel.org>
Subject: Re: icase pathspec magic support in ls-tree
Date: Fri, 14 Oct 2022 10:31:06 +0200
Message-ID: <CAPMMpogoX+R7eHkTZQKkx6HiS4ksk_sjryyuAufD_xuLfCVD+A@mail.gmail.com>
In-Reply-To: <20221014045140.7ibix3632w4uset5@tb-raspi4>

On Fri, Oct 14, 2022 at 6:51 AM Torsten Bögershausen <tboegi@web.de> wrote:
>
> On Thu, Oct 13, 2022 at 08:35:11AM +0200, Tao Klerks wrote:
>
> Did you ever consider to write a shell script,
> that can detect icase-collisions ?
>
> For example, we can use Linux:
>  git ls-files | tr 'A-Z' 'a-z' | sort | uniq -d ; echo $?
>  include/uapi/linux/netfilter_ipv4/ipt_ecn.h
>  include/uapi/linux/netfilter_ipv4/ipt_ttl.h
>  [snip the other files]
>
> The GNU versions of uniq allow an even shorter command,
> (But the POSIX versions don't)
>
> git ls-files  | sort | uniq -i -d
>
> I think that a script like this could do the trick:
>
> #!/bin/sh
> ret=1
> >/tmp/$$-exp
> git ls-files  | sort | uniq -i -d >/tmp/$$-act &&
>   cmp /tmp/$$-exp /tmp/$$-act &&
>     ret=0
>     rm -f /tmp/$$-exp /tmp/$$-act
>     exit $ret
>
>
> ####################
> The usage of files in /tmp is probably debatable,
> I want just illustrate how a combination of shell
> scripts in combination with existing commands can be used.
>
> The biggest step may be to introduce a server-side hook
> that does a check.
> But once that is done and working, you probably do
> not want to miss it.

Thanks for the proposal! Sorry I was a bit vague in my "I suspect I'll
have to do a full-tree duplicate-file-search on every ref update", but
your suggestion is almost exactly what I meant.

On my machine, on this repo, a full-tree case-insensitive duplicate
search costs me about 800ms for 100k files, or 1,800ms for 200k files:

git ls-tree --name-only -r $NEWHASH | sort | uniq -i -d

I need to use ls-tree rather than ls-files because this command is
meant to run in an update hook, and in a server-side update hook there
is no working tree and no (relevant) index.

The 800ms for 100k files are composed of 200ms of ls-tree, 600ms of
sort, and about 10ms of uniq.
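
For illustration, the full-tree check could be wrapped for an update
hook roughly as below. This is only a sketch under a few assumptions:
the function names icase_dupes and check_tree are made up for this
example, paths are assumed not to contain newlines, and it uses the
POSIX tr/sort/uniq -d combination from the quoted mail rather than GNU
"uniq -i", so only ASCII letters are case-folded and collisions are
reported in lowercase.

```shell
#!/bin/sh

# Read one path per line on stdin and print each path (lowercased)
# that occurs more than once when ASCII case is ignored.
icase_dupes() {
    LC_ALL=C tr 'A-Z' 'a-z' | LC_ALL=C sort | LC_ALL=C uniq -d
}

# Hypothetical helper: scan the whole tree of the given commit and
# return non-zero if any case-insensitive collision is found.
check_tree() {
    dupes=$(git ls-tree --name-only -r "$1" | icase_dupes)
    if [ -n "$dupes" ]; then
        printf 'case-insensitive path collisions:\n%s\n' "$dupes" >&2
        return 1
    fi
    return 0
}
```

In an update hook, which is invoked with the ref name, the old hash
and the new hash as arguments, this might be called as
check_tree "$3" || exit 1. A real hook would also want to skip ref
deletions, where the new hash is all zeros.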

My intent with supporting icase pathspec magic was to do something like:

git --icase-pathspecs ls-tree --name-only -r $NEWHASH -- \
    PATHS OF ADDED FILES | sort | uniq -i -d

Which would be near-instantaneous in the vast majority of cases (and
I'd have some file count limit past which I would fall back to doing
the full tree, to avoid excessive command lengths). Unfortunately,
"--icase-pathspecs" is not supported in ls-tree, hence this thread :)

But yes, ultimately, paying the cost of a full duplicate search on
every update in the server hook has seemed like the only sensible way
of doing this, until I thought a little harder about Elijah's
suggestion, that is!

More in the next part of the thread.
