Git Mailing List Archive mirror
* git grep -E doesn't accept \b word boundaries?
@ 2023-05-03 19:04 Kevin Ushey
  2023-05-03 19:35 ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Kevin Ushey @ 2023-05-03 19:04 UTC (permalink / raw)
  To: git

Hello,

I'm seeing the following, which I believe is unexpected. I have a file
with contents:

$ cat hello.txt
WholeWord
Whole Word
Whole

I can use `git grep` to search with word boundaries; e.g.

$ git grep --untracked '\bWhole\b'
hello.txt:Whole Word
hello.txt:Whole

However, if I add `-E` to use extended regular expressions, the same
invocation finds no search results.

$ git grep --untracked -E '\bWhole\b'

This does seem to work as expected with the '-w' flag, e.g.

$ git grep --untracked -E -w 'Whole'
hello.txt:Whole Word
hello.txt:Whole

as well as with POSIX word boundaries, e.g.

$ git grep --untracked -E '[[:<:]]Whole[[:>:]]'
hello.txt:Whole Word
hello.txt:Whole

Is this a bug, or am I misunderstanding some behavior in `git grep`?
For posterity:

$ git grep --untracked -G '\bWhole\b'
hello.txt:Whole Word
hello.txt:Whole

$ git grep --untracked -E '\bWhole\b'

$ git grep --untracked -P '\bWhole\b'
hello.txt:Whole Word
hello.txt:Whole

For what it's worth, I don't see this issue with an older version of
`git` on an Ubuntu 22.04 VM:

root@96722b73f316:~/test# git --version
git version 2.34.1
root@96722b73f316:~/test# git grep --untracked -E '\bWhole\b'
hello.txt:Whole Word
hello.txt:Whole
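
The comparison above can be rerun on any machine with a short script. This is only a sketch: the scratch repository and the `count_hits` helper are illustrative names, and the counts printed for `-G`, `-E` and `-P` depend on the regex library the local git was built against, so no expected output is given for them.

```shell
#!/bin/sh
# Sketch: compare how each git grep regex flavor treats \b on this machine.
# The scratch repository and the count_hits helper are illustrative only.

scratch=$(mktemp -d)
git init -q "$scratch"
cd "$scratch" || exit 1
printf 'WholeWord\nWhole Word\nWhole\n' > hello.txt

# Print the number of matching lines in hello.txt for the given options.
# Prints 0 when nothing matches or when the flavor is unavailable
# (for example, -P on a git built without PCRE support).
count_hits() {
    hits=$(git grep --untracked -c "$@" -- hello.txt 2>/dev/null | cut -d: -f2)
    echo "${hits:-0}"
}

echo "-G:    $(count_hits -G '\bWhole\b')"
echo "-E:    $(count_hits -E '\bWhole\b')"
echo "-P:    $(count_hits -P '\bWhole\b')"
echo "-E -w: $(count_hits -E -w 'Whole')"
```

A count of 0 on the `-E` line next to 2 on the others would match the behavior described in this report.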

Thanks,
Kevin

------

[System Info]
git version:
git version 2.40.1
cpu: arm64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
uname: Darwin 22.4.0 Darwin Kernel Version 22.4.0: Mon Mar  6 20:59:28
PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6000 arm64
compiler info: clang: 14.0.3 (clang-1403.0.22.14.1)
libc info: no libc information available
$SHELL (typically, interactive shell): /opt/homebrew/bin/bash


* Re: git grep -E doesn't accept \b word boundaries?
  2023-05-03 19:04 git grep -E doesn't accept \b word boundaries? Kevin Ushey
@ 2023-05-03 19:35 ` Junio C Hamano
  2023-05-03 20:32   ` Kevin Ushey
  0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2023-05-03 19:35 UTC (permalink / raw)
  To: Kevin Ushey; +Cc: git

Kevin Ushey <kevinushey@gmail.com> writes:

> I'm seeing the following, which I believe is unexpected. I have a file
> with contents:
>
> $ cat hello.txt
> WholeWord
> Whole Word
> Whole
>
> I can use `git grep` to search with word boundaries; e.g.
>
> $ git grep --untracked '\bWhole\b'
> hello.txt:Whole Word
> hello.txt:Whole
>
> However, if I add `-E` to use extended regular expressions, the same
> invocation finds no search results.
>
> $ git grep --untracked -E '\bWhole\b'

Does not seem to reproduce for me.  In a randomly picked repository
(the source to git itself), I did

    $ cat >hello.txt
    WholeWord
    Whole Word
    Whole
    ^D

and "git grep --untracked -E '\bWhole\b' hello.txt" with or without
the "-E" option shows the same two lines as hits.

Without the pathspec hello.txt, the output includes one line from
unpack-trees.c as well, but the hits from the untracked hello.txt
are the same.

The tip of 'master', v2.40.0, v2.38.4, v2.37.4, v2.35.4 (they are by
no means significant milestones---just some random versions I picked
to test) all behave the same way.


* Re: git grep -E doesn't accept \b word boundaries?
  2023-05-03 19:35 ` Junio C Hamano
@ 2023-05-03 20:32   ` Kevin Ushey
  2023-05-03 20:45     ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Kevin Ushey @ 2023-05-03 20:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Thanks for the quick response! I wonder if this issue could be macOS-specific?

I just tried building git from sources, and I was able to reproduce
the issue with 2.39.3:

$ ./git --version
git version 2.39.3
$ ./git grep -E '\bupdate\b'

But everything works okay for me with 2.38.5:

$ ./git --version
git version 2.38.5
kevin@MBP-P2MQ:~/projects/git [(HEAD detached at v2.38.5)]
$ ./git grep -E '\bupdate\b'
.github/workflows/l10n.yml:          sudo apt-get update -q &&
.gitignore:/git-update-index
.gitignore:/git-update-ref
< ... etc ...>

I see this bit in the release notes, which seems potentially related:

https://github.com/git/git/blob/69c786637d7a7fe3b2b8f7d989af095f5f49c3a8/Documentation/RelNotes/2.39.0.txt#L64-L65

And indeed, I can't reproduce the issue if I compile git 2.39.3 with
'make NO_REGEX=1'. So, perhaps a difference between git's compat regex
library and the one provided by macOS?

Thanks,
Kevin



* Re: git grep -E doesn't accept \b word boundaries?
  2023-05-03 20:32   ` Kevin Ushey
@ 2023-05-03 20:45     ` Junio C Hamano
  0 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2023-05-03 20:45 UTC (permalink / raw)
  To: Kevin Ushey; +Cc: git

Kevin Ushey <kevinushey@gmail.com> writes:

> Thanks for the quick response! I wonder if this issue could be macOS-specific?

Ah, yes, I somehow thought you mentioned Ubuntu and totally blocked
the macOS part out of my mind, but I do recall that builds using the
macOS native regexp library have been reported as broken a few times
recently on this list.

    https://lore.kernel.org/git/?q=macOS+regexp

finds this thread, which unfortunately was mistitled to make it
sound as if it were about "-P", but the issue in the thread was
about extended regexps.

  https://lore.kernel.org/git/03fd7ddb-8241-1a0a-3e82-d8083e4ce0f7@web.de/
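
Until the platform regexp behavior is sorted out, one possible workaround (besides `-w` and `-P`, which the original report shows working) is to spell the word boundary out with plain POSIX ERE constructs instead of relying on `\b`. The sketch below assumes the word is a literal made of alphanumerics; `word_pattern` is a hypothetical helper, not part of git.

```shell
#!/bin/sh
# Sketch: emulate \bWhole\b using only POSIX ERE constructs.
# word_pattern is a hypothetical helper, not part of git.

# Wrap a literal word so it must be preceded and followed by a
# non-word character (or by the start/end of the line).
word_pattern() {
    printf '(^|[^[:alnum:]_])%s([^[:alnum:]_]|$)' "$1"
}

pattern=$(word_pattern Whole)

# Plain grep -E is used here so the sketch runs outside a repository;
# the same string can be passed to `git grep -E`.
printf 'WholeWord\nWhole Word\nWhole\n' | grep -E "$pattern"
```

Unlike `\b`, this form consumes the neighboring character, so it is suited to finding matching lines rather than extracting the exact match with `-o`.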

