All the mail mirrored from lore.kernel.org
* [PATCH 00/12] FSMonitor edge cases on case-insensitive file systems
@ 2024-02-13 20:52 Jeff Hostetler via GitGitGadget
  2024-02-13 20:52 ` [PATCH 01/12] sparse-index: pass string length to index_file_exists() Jeff Hostetler via GitGitGadget
                   ` (12 more replies)
  0 siblings, 13 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-13 20:52 UTC
  To: git; +Cc: Jeff Hostetler

Fix FSMonitor client code to detect case-incorrect FSEvents and map them to
the canonical case expected by the index.

FSEvents are delivered to the FSMonitor daemon using the observed case which
may or may not match the expected case stored in the index for tracked files
and/or directories. This caused index_name_pos() to report a negative index
position (defined as the suggested insertion point). Since the value was
negative, the FSMonitor refresh lookup would not invalidate the
CE_FSMONITOR_VALID bit on the "expected" (case-insensitive-equivalent)
cache-entries. Therefore, git status would not report them as modified.
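
To make the failure mode concrete, here is a minimal standalone sketch (not
code from this series) of a sorted-array lookup that, like index_name_pos(),
returns (-insertion_point - 1) on a miss. The helper toy_name_pos() and the
toy_index array are hypothetical names invented for this illustration; the
paths mirror the layout used by the new t7527 test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy stand-in for index_name_pos(): binary search over a sorted,
 * case-sensitive list of paths.  Returns the position on a hit, or
 * (-insertion_point - 1) on a miss.  Hypothetical helper, not Git's API.
 */
static int toy_name_pos(const char *const *names, int nr, const char *name)
{
	int lo = 0, hi = nr;

	assert(names && name);
	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}

/* Sorted bytewise, so uppercase names come before lowercase ones. */
static const char *const toy_index[] = {
	"AAA", "BBB", "dir1/dir2/dir3/file3", "dir1/dir2/file2",
	"dir1/file1", "yyy", "zzz",
};
#define TOY_NR ((int)(sizeof(toy_index) / sizeof(toy_index[0])))
```

With this table, looking up "dir1/DIR2/dir3/file3" misses and yields a
negative value, so a caller that only acts on non-negative positions never
touches the entry for "dir1/dir2/dir3/file3".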

This was a fairly obscure problem and only happened when the case of a
sub-directory or a file was artificially changed.

The fix first runs the original lookup using the observed case. If that fails,
it assumes that the observed pathname refers to a file and uses the
case-insensitive name-hash hashmap to find an equivalent path (cache-entry)
in the index. If that fails, it assumes the pathname refers to a directory
and uses the case-insensitive dir-name-hash to find the equivalent directory
and then repeats the index_name_pos() lookup to find a directory or
suggested insertion point with the expected case.
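
The order of those lookups can be sketched roughly as follows. This is an
illustration only: it uses linear scans over a small array where the series
uses the name-hash and dir-name-hash hashmaps, and every name in it
(toy_lookup(), toy_paths, the toy_stage enum) is made up for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive compare of the first n bytes (ASCII only). */
static int toy_ncasecmp(const char *a, const char *b, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ca = tolower((unsigned char)a[i]);
		int cb = tolower((unsigned char)b[i]);

		if (ca != cb)
			return ca - cb;
		if (!ca)
			break;
	}
	return 0;
}

enum toy_stage { TOY_MISS, TOY_EXACT, TOY_FILE_ICASE, TOY_DIR_ICASE };

/*
 * Map an OBSERVED path to an entry spelled in the EXPECTED case.
 * Stage 1: exact match.  Stage 2: case-insensitive match on a file.
 * Stage 3: treat the path as a directory and match it against the
 * leading directory of an entry.  *match receives the index of the
 * entry that supplied the spelling, or -1 on a miss.
 */
static enum toy_stage toy_lookup(const char *const *names, int nr,
				 const char *observed, int *match)
{
	size_t len = strlen(observed);
	size_t dlen = len;
	int i;

	assert(names && observed && match);
	*match = -1;

	for (i = 0; i < nr; i++)
		if (!strcmp(names[i], observed)) {
			*match = i;
			return TOY_EXACT;
		}

	for (i = 0; i < nr; i++)
		if (strlen(names[i]) == len &&
		    !toy_ncasecmp(names[i], observed, len)) {
			*match = i;
			return TOY_FILE_ICASE;
		}

	/* Ignore a trailing slash when comparing directory prefixes. */
	if (dlen && observed[dlen - 1] == '/')
		dlen--;
	for (i = 0; i < nr; i++)
		if (strlen(names[i]) > dlen && names[i][dlen] == '/' &&
		    !toy_ncasecmp(names[i], observed, dlen)) {
			*match = i;
			return TOY_DIR_ICASE;
		}

	return TOY_MISS;
}

static const char *const toy_paths[] = {
	"AAA", "dir1/dir2/dir3/file-3-a", "dir1/dir2/dir4/FILE-4-A", "zzz",
};
#define TOY_NR_PATHS ((int)(sizeof(toy_paths) / sizeof(toy_paths[0])))
```

For a directory match, the first bytes of the matching entry (up to the
length of the observed directory name) give the expected spelling, which
can then be fed back into the ordinary case-sensitive lookup.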

Two new test cases were added to t7527 to demonstrate this.

Since this was rather obscure, I also added some additional tracing under
the GIT_TRACE_FSMONITOR key.

I also did considerable refactoring of the original code before adding the
new lookups.

Finally, I made more explicit the relationship between the FSEvents and the
(new) sparse-index directory cache-entries, since sparse-index was added
slightly after the FSMonitor feature.

Jeff Hostetler (12):
  sparse-index: pass string length to index_file_exists()
  name-hash: add index_dir_exists2()
  t7527: add case-insensitive test for FSMonitor
  fsmonitor: refactor refresh callback on directory events
  fsmonitor: refactor refresh callback for non-directory events
  fsmonitor: clarify handling of directory events in callback
  fsmonitor: refactor untracked-cache invalidation
  fsmonitor: support case-insensitive directory events
  fsmonitor: refactor non-directory callback
  fsmonitor: support case-insensitive non-directory events
  fsmonitor: refactor bit invalidation in refresh callback
  t7527: update case-insensitive fsmonitor test

 fsmonitor.c                  | 338 +++++++++++++++++++++++++++++------
 name-hash.c                  |  16 ++
 name-hash.h                  |   2 +
 sparse-index.c               |   4 +-
 t/t7527-builtin-fsmonitor.sh | 220 +++++++++++++++++++++++
 5 files changed, 522 insertions(+), 58 deletions(-)


base-commit: 3526e67d917bcd03f317a058208fa02737654637
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1662%2Fjeffhostetler%2Ffsmonitor-ignore-case-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1662/jeffhostetler/fsmonitor-ignore-case-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1662
-- 
gitgitgadget


* [PATCH 01/12] sparse-index: pass string length to index_file_exists()
  2024-02-13 20:52 [PATCH 00/12] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
@ 2024-02-13 20:52 ` Jeff Hostetler via GitGitGadget
  2024-02-13 22:07   ` Junio C Hamano
  2024-02-13 20:52 ` [PATCH 02/12] name-hash: add index_dir_exists2() Jeff Hostetler via GitGitGadget
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-13 20:52 UTC
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

The call to index_file_exists() in the loop in expand_to_path() passes
the wrong string length.  Let's fix that.

The loop in expand_to_path() searches the name-hash for each
sub-directory prefix in the provided pathname. That is, it searches
for "dir1/" then "dir1/dir2/" then "dir1/dir2/dir3/" and so on until
it finds a cache-entry representing a sparse directory.

The code creates "strbuf path_mutable" to contain the working pathname
and modifies the buffer in-place by temporarily replacing the character
following each successive "/" with NUL for the duration of the call to
index_file_exists().

The code does not update strbuf.len during this substitution, so
path_mutable.len still holds the length of the full pathname.

Pass the length of the truncated prefix instead.
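
The prefix walk can be sketched in isolation as follows. This is a
simplified illustration, not the code in sparse-index.c: it works on a plain
char buffer instead of a strbuf, and toy_each_dir_prefix(), toy_prefix_fn
and toy_count_prefix() are hypothetical names. The point it shows is that
the prefix length has to be derived from the pointer at the time of the
call, because the buffer's overall length never changes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: gets a NUL-terminated prefix and its length. */
typedef int (*toy_prefix_fn)(const char *prefix, size_t len, void *data);

/*
 * Visit "dir1/", "dir1/dir2/", ... by temporarily writing NUL after
 * each '/'.  Stops early when fn() returns non-zero and returns that
 * value.  The buffer is restored before returning.
 */
static int toy_each_dir_prefix(char *buf, toy_prefix_fn fn, void *data)
{
	char *replace = strchr(buf, '/');
	int ret = 0;

	assert(fn);
	while (replace && !ret) {
		char temp;
		size_t substr_len;

		replace++;			/* step past the '/' */
		temp = *replace;
		*replace = '\0';		/* truncate in place */
		substr_len = (size_t)(replace - buf);
		ret = fn(buf, substr_len, data);
		*replace = temp;		/* restore the buffer */
		replace = strchr(replace, '/');
	}
	return ret;
}

/* Example callback: counts prefixes and checks len against the string. */
static int toy_count_prefix(const char *prefix, size_t len, void *data)
{
	int *count = data;

	assert(strlen(prefix) == len);
	(*count)++;
	return 0;
}
```

Calling toy_each_dir_prefix() on "dir1/dir2/dir3/file3" with
toy_count_prefix() visits three prefixes and leaves the buffer unchanged.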

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 sparse-index.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sparse-index.c b/sparse-index.c
index 3578feb2837..e48e40cae71 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -579,8 +579,9 @@ void expand_to_path(struct index_state *istate,
 		replace++;
 		temp = *replace;
 		*replace = '\0';
+		substr_len = replace - path_mutable.buf;
 		if (index_file_exists(istate, path_mutable.buf,
-				      path_mutable.len, icase)) {
+				      substr_len, icase)) {
 			/*
 			 * We found a parent directory in the name-hash
 			 * hashtable, because only sparse directory entries
@@ -593,7 +594,6 @@ void expand_to_path(struct index_state *istate,
 		}
 
 		*replace = temp;
-		substr_len = replace - path_mutable.buf;
 	}
 
 cleanup:
-- 
gitgitgadget



* [PATCH 02/12] name-hash: add index_dir_exists2()
  2024-02-13 20:52 [PATCH 00/12] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
  2024-02-13 20:52 ` [PATCH 01/12] sparse-index: pass string length to index_file_exists() Jeff Hostetler via GitGitGadget
@ 2024-02-13 20:52 ` Jeff Hostetler via GitGitGadget
  2024-02-13 21:43   ` Junio C Hamano
  2024-02-15  9:31   ` Patrick Steinhardt
  2024-02-13 20:52 ` [PATCH 03/12] t7527: add case-insensitive test for FSMonitor Jeff Hostetler via GitGitGadget
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-13 20:52 UTC
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Create a new version of index_dir_exists() to return the canonical
spelling of the matched directory prefix.

The existing index_dir_exists() returns a boolean to indicate if
there is a case-insensitive match in the directory name-hash, but
it doesn't tell the caller the exact spelling of that match.

The new version also copies the matched spelling to a provided strbuf.
This lets the caller, for example, then call index_name_pos() with the
correct case to search the cache-entry array for the real insertion
position.
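
The difference between a boolean answer and one that also reports the
canonical spelling can be sketched with a toy table. Everything here is an
assumption made for illustration: toy_dir_exists2() and toy_dirs are
hypothetical, the table is scanned linearly instead of hashed, the output is
a caller-supplied char buffer rather than a strbuf, and the choice to store
directories without a trailing slash is specific to this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Directories as spelled in the (toy) index. */
static const char *const toy_dirs[] = {
	"dir1", "dir1/dir2", "dir1/dir2/dir3",
};
#define TOY_NR_DIRS (sizeof(toy_dirs) / sizeof(toy_dirs[0]))

/* Return 1 if the first n bytes match ignoring ASCII case. */
static int toy_same_icase(const char *a, const char *b, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tolower((unsigned char)a[i]) !=
		    tolower((unsigned char)b[i]))
			return 0;
	return 1;
}

/*
 * Case-insensitive directory lookup.  On a hit, copy the canonical
 * spelling into "canonical" and return 1.  On a miss (or if the
 * buffer is too small), leave "canonical" empty and return 0.
 */
static int toy_dir_exists2(const char *name, size_t namelen,
			   char *canonical, size_t size)
{
	size_t i;

	assert(name && canonical && size > 0);
	canonical[0] = '\0';

	for (i = 0; i < TOY_NR_DIRS; i++) {
		const char *dir = toy_dirs[i];
		size_t dirlen = strlen(dir);

		if (dirlen != namelen || !toy_same_icase(dir, name, namelen))
			continue;
		if (dirlen + 1 > size)
			return 0;	/* caller's buffer is too small */
		memcpy(canonical, dir, dirlen + 1);
		return 1;
	}
	return 0;
}
```

A caller that observed "dir1/DIR2" gets back "dir1/dir2" and can retry a
case-sensitive search with that spelling.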

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 name-hash.c | 16 ++++++++++++++++
 name-hash.h |  2 ++
 2 files changed, 18 insertions(+)

diff --git a/name-hash.c b/name-hash.c
index 251f036eef6..d735c81acb3 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -694,6 +694,22 @@ int index_dir_exists(struct index_state *istate, const char *name, int namelen)
 	dir = find_dir_entry(istate, name, namelen);
 	return dir && dir->nr;
 }
+int index_dir_exists2(struct index_state *istate, const char *name, int namelen,
+		      struct strbuf *canonical_path)
+{
+	struct dir_entry *dir;
+
+	strbuf_init(canonical_path, namelen+1);
+
+	lazy_init_name_hash(istate);
+	expand_to_path(istate, name, namelen, 0);
+	dir = find_dir_entry(istate, name, namelen);
+
+	if (dir && dir->nr)
+		strbuf_add(canonical_path, dir->name, dir->namelen);
+
+	return dir && dir->nr;
+}
 
 void adjust_dirname_case(struct index_state *istate, char *name)
 {
diff --git a/name-hash.h b/name-hash.h
index b1b4b0fb337..2fcac5c4870 100644
--- a/name-hash.h
+++ b/name-hash.h
@@ -5,6 +5,8 @@ struct cache_entry;
 struct index_state;
 
 int index_dir_exists(struct index_state *istate, const char *name, int namelen);
+int index_dir_exists2(struct index_state *istate, const char *name, int namelen,
+		      struct strbuf *canonical_path);
 void adjust_dirname_case(struct index_state *istate, char *name);
 struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);
 
-- 
gitgitgadget



* [PATCH 03/12] t7527: add case-insensitive test for FSMonitor
  2024-02-13 20:52 [PATCH 00/12] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
  2024-02-13 20:52 ` [PATCH 01/12] sparse-index: pass string length to index_file_exists() Jeff Hostetler via GitGitGadget
  2024-02-13 20:52 ` [PATCH 02/12] name-hash: add index_dir_exists2() Jeff Hostetler via GitGitGadget
@ 2024-02-13 20:52 ` Jeff Hostetler via GitGitGadget
  2024-02-13 20:52 ` [PATCH 04/12] fsmonitor: refactor refresh callback on directory events Jeff Hostetler via GitGitGadget
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-13 20:52 UTC
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

The FSMonitor client code trusts the spelling of the pathnames in the
FSEvents received from the FSMonitor daemon.  On case-insensitive file
systems, these OBSERVED pathnames may be spelled differently than the
EXPECTED pathnames listed in the .git/index.  This causes a miss when
using `index_name_pos()` which expects the given case to be correct.

When this happens, the FSMonitor client code does not update the state
of the CE_FSMONITOR_VALID bit when refreshing the index (and before
starting to scan the worktree).

This results in modified files NOT being reported by `git status` when
there is a discrepancy in the case-spelling of a tracked file's
pathname.

This commit contains a (rather contrived) test case to demonstrate
this.  A later commit in this series will update the FSMonitor client
code to recognize these discrepancies and update the CE_ bit accordingly.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 t/t7527-builtin-fsmonitor.sh | 217 +++++++++++++++++++++++++++++++++++
 1 file changed, 217 insertions(+)

diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
index 78503158fd6..5cd68b2ea82 100755
--- a/t/t7527-builtin-fsmonitor.sh
+++ b/t/t7527-builtin-fsmonitor.sh
@@ -1037,4 +1037,221 @@ test_expect_success 'split-index and FSMonitor work well together' '
 	)
 '
 
+# The FSMonitor daemon reports the OBSERVED pathname of modified files
+# and thus contains the OBSERVED spelling on case-insensitive file
+# systems.  The daemon does not (and should not) load the .git/index
+# file and therefore does not know the expected case-spelling.  Since
+# it is possible for the user to create files/subdirectories with the
+# incorrect case, a modified file event for a tracked file will not have
+# the EXPECTED case. This can cause `index_name_pos()` to incorrectly
+# report that the file is untracked. This causes the client to fail to
+# mark the file as possibly dirty (keeping the CE_FSMONITOR_VALID bit
+# set) so that `git status` will avoid inspecting it and thus not
+# present it in the status output.
+#
+# The setup is a little contrived.
+#
+test_expect_success CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
+	test_when_finished "stop_daemon_delete_repo subdir_case_wrong" &&
+
+	git init subdir_case_wrong &&
+	(
+		cd subdir_case_wrong &&
+		echo x >AAA &&
+		echo x >BBB &&
+
+		mkdir dir1 &&
+		echo x >dir1/file1 &&
+		mkdir dir1/dir2 &&
+		echo x >dir1/dir2/file2 &&
+		mkdir dir1/dir2/dir3 &&
+		echo x >dir1/dir2/dir3/file3 &&
+
+		echo x >yyy &&
+		echo x >zzz &&
+		git add . &&
+		git commit -m "data" &&
+
+		# This will cause "dir1/" and everything under it
+		# to be deleted.
+		git sparse-checkout set --cone --sparse-index &&
+
+		# Create dir2 with the wrong case and then let Git
+		# repopulate dir3 -- it will not correct the spelling
+		# of dir2.
+		mkdir dir1 &&
+		mkdir dir1/DIR2 &&
+		git sparse-checkout add dir1/dir2/dir3
+	) &&
+
+	start_daemon -C subdir_case_wrong --tf "$PWD/subdir_case_wrong.trace" &&
+
+	# Enable FSMonitor in the client. Run enough commands for
+	# the .git/index to sync up with the daemon with everything
+	# marked clean.
+	git -C subdir_case_wrong config core.fsmonitor true &&
+	git -C subdir_case_wrong update-index --fsmonitor &&
+	git -C subdir_case_wrong status &&
+
+	# Make some files dirty so that FSMonitor gets FSEvents for
+	# each of them.
+	echo xx >>subdir_case_wrong/AAA &&
+	echo xx >>subdir_case_wrong/dir1/DIR2/dir3/file3 &&
+	echo xx >>subdir_case_wrong/zzz &&
+
+	GIT_TRACE_FSMONITOR="$PWD/subdir_case_wrong.log" \
+		git -C subdir_case_wrong --no-optional-locks status --short \
+			>"$PWD/subdir_case_wrong.out" &&
+
+	# "git status" should have gotten file events for each of
+	# the 3 files.
+	#
+	# "dir2" should be in the observed case on disk.
+	grep "fsmonitor_refresh_callback" \
+		<"$PWD/subdir_case_wrong.log" \
+		>"$PWD/subdir_case_wrong.log1" &&
+
+	grep -q "AAA.*pos 0" "$PWD/subdir_case_wrong.log1" &&
+	grep -q "zzz.*pos 6" "$PWD/subdir_case_wrong.log1" &&
+
+	grep -q "dir1/DIR2/dir3/file3.*pos -3" "$PWD/subdir_case_wrong.log1" &&
+
+	# The refresh-callbacks should have caused "git status" to clear
+	# the CE_FSMONITOR_VALID bit on each of those files and caused
+	# the worktree scan to visit them and mark them as modified.
+	grep -q " M AAA" "$PWD/subdir_case_wrong.out" &&
+	grep -q " M zzz" "$PWD/subdir_case_wrong.out" &&
+
+	# However, with the fsmonitor client bug, the "(pos -3)" causes
+	# the client to not update the bit and never rescan the file
+	# and therefore not report it as dirty.
+	! grep -q " M dir1/dir2/dir3/file3" "$PWD/subdir_case_wrong.out"
+'
+
+test_expect_success CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
+	test_when_finished "stop_daemon_delete_repo file_case_wrong" &&
+
+	git init file_case_wrong &&
+	(
+		cd file_case_wrong &&
+		echo x >AAA &&
+		echo x >BBB &&
+
+		mkdir dir1 &&
+		mkdir dir1/dir2 &&
+		mkdir dir1/dir2/dir3 &&
+		echo x >dir1/dir2/dir3/FILE-3-B &&
+		echo x >dir1/dir2/dir3/XXXX-3-X &&
+		echo x >dir1/dir2/dir3/file-3-a &&
+		echo x >dir1/dir2/dir3/yyyy-3-y &&
+		mkdir dir1/dir2/dir4 &&
+		echo x >dir1/dir2/dir4/FILE-4-A &&
+		echo x >dir1/dir2/dir4/XXXX-4-X &&
+		echo x >dir1/dir2/dir4/file-4-b &&
+		echo x >dir1/dir2/dir4/yyyy-4-y &&
+
+		echo x >yyy &&
+		echo x >zzz &&
+		git add . &&
+		git commit -m "data"
+	) &&
+
+	start_daemon -C file_case_wrong --tf "$PWD/file_case_wrong.trace" &&
+
+	# Enable FSMonitor in the client. Run enough commands for
+	# the .git/index to sync up with the daemon with everything
+	# marked clean.
+	git -C file_case_wrong config core.fsmonitor true &&
+	git -C file_case_wrong update-index --fsmonitor &&
+	git -C file_case_wrong status &&
+
+	# Make some files dirty so that FSMonitor gets FSEvents for
+	# each of them.
+	echo xx >>file_case_wrong/AAA &&
+	echo xx >>file_case_wrong/zzz &&
+
+	# Rename some files so that FSMonitor sees a create and delete
+	# FSEvent for each.  (A simple "mv foo FOO" is not portable
+	# between macOS and Windows. It works on both platforms, but makes
+	# the test messy, since (1) one platform updates "ctime" on the
+	# moved file and one does not and (2) it causes a directory event
+	# on one platform and not on the other which causes additional
+	# scanning during "git status" which causes a "H" vs "h" discrepancy
+	# in "git ls-files -f".)  So old-school it and move it out of the
+	# way and copy it to the case-incorrect name so that we get fresh
+	# "ctime" and "mtime" values.
+
+	mv file_case_wrong/dir1/dir2/dir3/file-3-a file_case_wrong/dir1/dir2/dir3/ORIG &&
+	cp file_case_wrong/dir1/dir2/dir3/ORIG     file_case_wrong/dir1/dir2/dir3/FILE-3-A &&
+	rm file_case_wrong/dir1/dir2/dir3/ORIG &&
+	mv file_case_wrong/dir1/dir2/dir4/FILE-4-A file_case_wrong/dir1/dir2/dir4/ORIG &&
+	cp file_case_wrong/dir1/dir2/dir4/ORIG     file_case_wrong/dir1/dir2/dir4/file-4-a &&
+	rm file_case_wrong/dir1/dir2/dir4/ORIG &&
+
+	# Run status enough times to fully sync.
+	#
+	# The first instance should get the create and delete FSEvents
+	# for each pair.  Status should update the index with a new FSM
+	# token (so the next invocation will not see data for these
+	# events).
+
+	GIT_TRACE_FSMONITOR="$PWD/file_case_wrong-try1.log" \
+		git -C file_case_wrong status --short \
+			>"$PWD/file_case_wrong-try1.out" &&
+	grep -q "fsmonitor_refresh_callback.*FILE-3-A.*pos -3" "$PWD/file_case_wrong-try1.log" &&
+	grep -q "fsmonitor_refresh_callback.*file-3-a.*pos 4"  "$PWD/file_case_wrong-try1.log" &&
+	grep -q "fsmonitor_refresh_callback.*FILE-4-A.*pos 6"  "$PWD/file_case_wrong-try1.log" &&
+	grep -q "fsmonitor_refresh_callback.*file-4-a.*pos -9" "$PWD/file_case_wrong-try1.log" &&
+
+	# FSM refresh will have invalidated the FSM bit and cause a regular
+	# (real) scan of these tracked files, so they should have "H" status.
+	# (We will not see a "h" status until the next refresh (on the next
+	# command).)
+
+	git -C file_case_wrong ls-files -f >"$PWD/file_case_wrong-lsf1.out" &&
+	grep -q "H dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-lsf1.out" &&
+	grep -q "H dir1/dir2/dir4/FILE-4-A" "$PWD/file_case_wrong-lsf1.out" &&
+
+
+	# Try the status again. We assume that the above status command
+	# advanced the token so that the next one will not see those events.
+
+	GIT_TRACE_FSMONITOR="$PWD/file_case_wrong-try2.log" \
+		git -C file_case_wrong status --short \
+			>"$PWD/file_case_wrong-try2.out" &&
+	! grep -q "fsmonitor_refresh_callback.*FILE-3-A.*pos" "$PWD/file_case_wrong-try2.log" &&
+	! grep -q "fsmonitor_refresh_callback.*file-3-a.*pos" "$PWD/file_case_wrong-try2.log" &&
+	! grep -q "fsmonitor_refresh_callback.*FILE-4-A.*pos" "$PWD/file_case_wrong-try2.log" &&
+	! grep -q "fsmonitor_refresh_callback.*file-4-a.*pos" "$PWD/file_case_wrong-try2.log" &&
+
+	# FSM refresh saw nothing, so it will mark all files as valid,
+	# so they should now have "h" status.
+
+	git -C file_case_wrong ls-files -f >"$PWD/file_case_wrong-lsf2.out" &&
+	grep -q "h dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-lsf2.out" &&
+	grep -q "h dir1/dir2/dir4/FILE-4-A" "$PWD/file_case_wrong-lsf2.out" &&
+
+
+	# We now have files with clean content, but with case-incorrect
+	# file names.  Modify them to see if status properly reports
+	# them.
+
+	echo xx >>file_case_wrong/dir1/dir2/dir3/FILE-3-A &&
+	echo xx >>file_case_wrong/dir1/dir2/dir4/file-4-a &&
+
+	GIT_TRACE_FSMONITOR="$PWD/file_case_wrong-try3.log" \
+		git -C file_case_wrong --no-optional-locks status --short \
+			>"$PWD/file_case_wrong-try3.out" &&
+	# FSEvents are in observed case.
+	grep -q "fsmonitor_refresh_callback.*FILE-3-A.*pos -3" "$PWD/file_case_wrong-try3.log" &&
+	grep -q "fsmonitor_refresh_callback.*file-4-a.*pos -9" "$PWD/file_case_wrong-try3.log" &&
+
+	# Status should say these files are modified, but with the case
+	# bug, the "pos -3" cause the client to not update the FSM bit
+	# and never cause the file to be rescanned and therefore to not
+	# report it dirty.
+	! grep -q " M dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-try3.out" &&
+	! grep -q " M dir1/dir2/dir4/FILE-4-A" "$PWD/file_case_wrong-try3.out"
+'
+
 test_done
-- 
gitgitgadget



* [PATCH 04/12] fsmonitor: refactor refresh callback on directory events
  2024-02-13 20:52 [PATCH 00/12] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                   ` (2 preceding siblings ...)
  2024-02-13 20:52 ` [PATCH 03/12] t7527: add case-insensitive test for FSMonitor Jeff Hostetler via GitGitGadget
@ 2024-02-13 20:52 ` Jeff Hostetler via GitGitGadget
  2024-02-15  9:32   ` Patrick Steinhardt
  2024-02-13 20:52 ` [PATCH 05/12] fsmonitor: refactor refresh callback for non-directory events Jeff Hostetler via GitGitGadget
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-13 20:52 UTC
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 52 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index f670c509378..b1ef01bf3cd 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -183,6 +183,35 @@ static int query_fsmonitor_hook(struct repository *r,
 	return result;
 }
 
+static void fsmonitor_refresh_callback_slash(
+	struct index_state *istate, const char *name, int len, int pos)
+{
+	int i;
+
+	/*
+	 * The daemon can decorate directory events, such as
+	 * moves or renames, with a trailing slash if the OS
+	 * FS Event contains sufficient information, such as
+	 * MacOS.
+	 *
+	 * Use this to invalidate the entire cone under that
+	 * directory.
+	 *
+	 * We do not expect an exact match because the index
+	 * does not normally contain directory entries, so we
+	 * start at the insertion point and scan.
+	 */
+	if (pos < 0)
+		pos = -pos - 1;
+
+	/* Mark all entries for the folder invalid */
+	for (i = pos; i < istate->cache_nr; i++) {
+		if (!starts_with(istate->cache[i]->name, name))
+			break;
+		istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
+	}
+}
+
 static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 {
 	int i, len = strlen(name);
@@ -193,28 +222,7 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 			 name, pos);
 
 	if (name[len - 1] == '/') {
-		/*
-		 * The daemon can decorate directory events, such as
-		 * moves or renames, with a trailing slash if the OS
-		 * FS Event contains sufficient information, such as
-		 * MacOS.
-		 *
-		 * Use this to invalidate the entire cone under that
-		 * directory.
-		 *
-		 * We do not expect an exact match because the index
-		 * does not normally contain directory entries, so we
-		 * start at the insertion point and scan.
-		 */
-		if (pos < 0)
-			pos = -pos - 1;
-
-		/* Mark all entries for the folder invalid */
-		for (i = pos; i < istate->cache_nr; i++) {
-			if (!starts_with(istate->cache[i]->name, name))
-				break;
-			istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
-		}
+		fsmonitor_refresh_callback_slash(istate, name, len, pos);
 
 		/*
 		 * We need to remove the traling "/" from the path
-- 
gitgitgadget



* [PATCH 05/12] fsmonitor: refactor refresh callback for non-directory events
  2024-02-13 20:52 [PATCH 00/12] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                   ` (3 preceding siblings ...)
  2024-02-13 20:52 ` [PATCH 04/12] fsmonitor: refactor refresh callback on directory events Jeff Hostetler via GitGitGadget
@ 2024-02-13 20:52 ` Jeff Hostetler via GitGitGadget
  2024-02-14  1:34   ` Junio C Hamano
  2024-02-15  9:32   ` Patrick Steinhardt
  2024-02-13 20:52 ` [PATCH 06/12] fsmonitor: clarify handling of directory events in callback Jeff Hostetler via GitGitGadget
                   ` (7 subsequent siblings)
  12 siblings, 2 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-13 20:52 UTC
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 66 ++++++++++++++++++++++++++++++-----------------------
 1 file changed, 38 insertions(+), 28 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index b1ef01bf3cd..614270fa5e8 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -183,6 +183,42 @@ static int query_fsmonitor_hook(struct repository *r,
 	return result;
 }
 
+static void fsmonitor_refresh_callback_unqualified(
+	struct index_state *istate, const char *name, int len, int pos)
+{
+	int i;
+
+	if (pos >= 0) {
+		/*
+		 * We have an exact match for this path and can just
+		 * invalidate it.
+		 */
+		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
+	} else {
+		/*
+		 * The path is not a tracked file -or- it is a
+		 * directory event on a platform that cannot
+		 * distinguish between file and directory events in
+		 * the event handler, such as Windows.
+		 *
+		 * Scan as if it is a directory and invalidate the
+		 * cone under it.  (But remember to ignore items
+		 * between "name" and "name/", such as "name-" and
+		 * "name.".
+		 */
+		pos = -pos - 1;
+
+		for (i = pos; i < istate->cache_nr; i++) {
+			if (!starts_with(istate->cache[i]->name, name))
+				break;
+			if ((unsigned char)istate->cache[i]->name[len] > '/')
+				break;
+			if (istate->cache[i]->name[len] == '/')
+				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
+		}
+	}
+}
+
 static void fsmonitor_refresh_callback_slash(
 	struct index_state *istate, const char *name, int len, int pos)
 {
@@ -214,7 +250,7 @@ static void fsmonitor_refresh_callback_slash(
 
 static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 {
-	int i, len = strlen(name);
+	int len = strlen(name);
 	int pos = index_name_pos(istate, name, len);
 
 	trace_printf_key(&trace_fsmonitor,
@@ -229,34 +265,8 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 		 * for the untracked cache.
 		 */
 		name[len - 1] = '\0';
-	} else if (pos >= 0) {
-		/*
-		 * We have an exact match for this path and can just
-		 * invalidate it.
-		 */
-		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
 	} else {
-		/*
-		 * The path is not a tracked file -or- it is a
-		 * directory event on a platform that cannot
-		 * distinguish between file and directory events in
-		 * the event handler, such as Windows.
-		 *
-		 * Scan as if it is a directory and invalidate the
-		 * cone under it.  (But remember to ignore items
-		 * between "name" and "name/", such as "name-" and
-		 * "name.".
-		 */
-		pos = -pos - 1;
-
-		for (i = pos; i < istate->cache_nr; i++) {
-			if (!starts_with(istate->cache[i]->name, name))
-				break;
-			if ((unsigned char)istate->cache[i]->name[len] > '/')
-				break;
-			if (istate->cache[i]->name[len] == '/')
-				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
-		}
+		fsmonitor_refresh_callback_unqualified(istate, name, len, pos);
 	}
 
 	/*
-- 
gitgitgadget



* [PATCH 06/12] fsmonitor: clarify handling of directory events in callback
  2024-02-13 20:52 [PATCH 00/12] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                   ` (4 preceding siblings ...)
  2024-02-13 20:52 ` [PATCH 05/12] fsmonitor: refactor refresh callback for non-directory events Jeff Hostetler via GitGitGadget
@ 2024-02-13 20:52 ` Jeff Hostetler via GitGitGadget
  2024-02-14  7:47   ` Junio C Hamano
  2024-02-15  9:32   ` Patrick Steinhardt
  2024-02-13 20:52 ` [PATCH 07/12] fsmonitor: refactor untracked-cache invalidation Jeff Hostetler via GitGitGadget
                   ` (6 subsequent siblings)
  12 siblings, 2 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-13 20:52 UTC
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 47 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 33 insertions(+), 14 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index 614270fa5e8..754fe20cfd0 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -219,24 +219,40 @@ static void fsmonitor_refresh_callback_unqualified(
 	}
 }
 
-static void fsmonitor_refresh_callback_slash(
+/*
+ * The daemon can decorate directory events, such as a move or rename,
+ * by adding a trailing slash to the observed name.  Use this to
+ * explicitly invalidate the entire cone under that directory.
+ *
+ * The daemon can only reliably do that if the OS FSEvent contains
+ * sufficient information in the event.
+ *
+ * macOS FSEvents have enough information.
+ *
+ * Other platforms may or may not be able to do it (and it might
+ * depend on the type of event (for example, a daemon could lstat() an
+ * observed pathname after a rename, but not after a delete)).
+ *
+ * If we find an exact match in the index for a path with a trailing
+ * slash, it means that we matched a sparse-index directory in a
+ * cone-mode sparse-checkout (since that's the only time we have
+ * directories in the index).  We should never see this in practice
+ * (because sparse directories should not be present and therefore
+ * not generating FS events).  Either way, we can treat them in the
+ * same way and just invalidate the cache-entry and the untracked
+ * cache (and in this case, the forward cache-entry scan won't find
+ * anything and it doesn't hurt to let it run).
+ *
+ * Return the number of cache-entries that we invalidated.  We will
+ * use this later to determine if we need to attempt a second
+ * case-insensitive search.
+ */
+static int fsmonitor_refresh_callback_slash(
 	struct index_state *istate, const char *name, int len, int pos)
 {
 	int i;
+	int nr_in_cone = 0;
 
-	/*
-	 * The daemon can decorate directory events, such as
-	 * moves or renames, with a trailing slash if the OS
-	 * FS Event contains sufficient information, such as
-	 * MacOS.
-	 *
-	 * Use this to invalidate the entire cone under that
-	 * directory.
-	 *
-	 * We do not expect an exact match because the index
-	 * does not normally contain directory entries, so we
-	 * start at the insertion point and scan.
-	 */
 	if (pos < 0)
 		pos = -pos - 1;
 
@@ -245,7 +261,10 @@ static void fsmonitor_refresh_callback_slash(
 		if (!starts_with(istate->cache[i]->name, name))
 			break;
 		istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
+		nr_in_cone++;
 	}
+
+	return nr_in_cone;
 }
 
 static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
-- 
gitgitgadget



* [PATCH 07/12] fsmonitor: refactor untracked-cache invalidation
  2024-02-13 20:52 [PATCH 00/12] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                   ` (5 preceding siblings ...)
  2024-02-13 20:52 ` [PATCH 06/12] fsmonitor: clarify handling of directory events in callback Jeff Hostetler via GitGitGadget
@ 2024-02-13 20:52 ` Jeff Hostetler via GitGitGadget
  2024-02-14 16:46   ` Junio C Hamano
  2024-02-15  9:32   ` Patrick Steinhardt
  2024-02-13 20:52 ` [PATCH 08/12] fsmonitor: support case-insensitive directory events Jeff Hostetler via GitGitGadget
                   ` (5 subsequent siblings)
  12 siblings, 2 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-13 20:52 UTC
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 38 ++++++++++++++++++++++++++------------
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index 754fe20cfd0..14585b6c516 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -183,11 +183,35 @@ static int query_fsmonitor_hook(struct repository *r,
 	return result;
 }
 
+/*
+ * Invalidate the untracked cache for the given pathname.  Copy the
+ * buffer to a proper null-terminated string (since the untracked
+ * cache code does not use (buf, len) style arguments).  Also strip any
+ * trailing slash.
+ */
+static void my_invalidate_untracked_cache(
+	struct index_state *istate, const char *name, int len)
+{
+	struct strbuf work_path = STRBUF_INIT;
+
+	if (!len)
+		return;
+
+	if (name[len-1] == '/')
+		len--;
+
+	strbuf_add(&work_path, name, len);
+	untracked_cache_invalidate_path(istate, work_path.buf, 0);
+	strbuf_release(&work_path);
+}
+
 static void fsmonitor_refresh_callback_unqualified(
 	struct index_state *istate, const char *name, int len, int pos)
 {
 	int i;
 
+	my_invalidate_untracked_cache(istate, name, len);
+
 	if (pos >= 0) {
 		/*
 		 * We have an exact match for this path and can just
@@ -253,6 +277,8 @@ static int fsmonitor_refresh_callback_slash(
 	int i;
 	int nr_in_cone = 0;
 
+	my_invalidate_untracked_cache(istate, name, len);
+
 	if (pos < 0)
 		pos = -pos - 1;
 
@@ -278,21 +304,9 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 
 	if (name[len - 1] == '/') {
 		fsmonitor_refresh_callback_slash(istate, name, len, pos);
-
-		/*
-		 * We need to remove the traling "/" from the path
-		 * for the untracked cache.
-		 */
-		name[len - 1] = '\0';
 	} else {
 		fsmonitor_refresh_callback_unqualified(istate, name, len, pos);
 	}
-
-	/*
-	 * Mark the untracked cache dirty even if it wasn't found in the index
-	 * as it could be a new untracked file.
-	 */
-	untracked_cache_invalidate_path(istate, name, 0);
 }
 
 /*
-- 
gitgitgadget



* [PATCH 08/12] fsmonitor: support case-insensitive directory events
  2024-02-13 20:52 [PATCH 00/12] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                   ` (6 preceding siblings ...)
  2024-02-13 20:52 ` [PATCH 07/12] fsmonitor: refactor untracked-cache invalidation Jeff Hostetler via GitGitGadget
@ 2024-02-13 20:52 ` Jeff Hostetler via GitGitGadget
  2024-02-15  9:32   ` Patrick Steinhardt
  2024-02-13 20:52 ` [PATCH 09/12] fsmonitor: refactor non-directory callback Jeff Hostetler via GitGitGadget
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-13 20:52 UTC (permalink / raw
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Teach fsmonitor_refresh_callback() to fall back to a case-insensitive
lookup when the case-sensitive lookup fails on a case-insensitive
file system.  Without the fallback, 'git status' can report stale
status for files when the case of a path in the worktree differs from
the case recorded in the index.

The FSMonitor daemon sends FSEvents using the observed spelling
of each pathname.  On case-insensitive file systems this may
differ from the expected spelling.

The existing code uses index_name_pos() to find the cache-entry for
the pathname in the FSEvent and clear the CE_FSMONITOR_VALID bit so
that the worktree scan/index refresh will revisit and revalidate the
path.

On a case-insensitive file system, the exact match lookup may fail
to find the associated cache-entry. This causes status to think that
the cached CE flags are correct and skip over the file.

Update the handling of directory-style FSEvents (ones containing a
path with a trailing slash) to optionally use the name-hash if the
case-correct search does not find a match.

(The FSMonitor daemon can send directory FSEvents if the OS provides
that information.)
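
The fallback order described above can be sketched in isolation as
follows.  This is only an illustration under stated assumptions, not
the code in this patch: `struct toy_entry`, `toy_invalidate_cone()`,
`toy_canonical_prefix()` and `toy_refresh_dir()` are hypothetical
names, and a linear case-folding scan stands in for the directory
name-hash.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>

/* Hypothetical stand-in for a cache-entry: a path plus a "valid" flag. */
struct toy_entry {
	const char *name;
	int fsmonitor_valid;
};

/* Invalidate every entry whose name starts with prefix (exact case). */
static int toy_invalidate_cone(struct toy_entry *e, int nr,
			       const char *prefix, size_t len)
{
	int i, nr_in_cone = 0;

	for (i = 0; i < nr; i++) {
		if (strncmp(e[i].name, prefix, len))
			continue;
		e[i].fsmonitor_valid = 0;
		nr_in_cone++;
	}
	return nr_in_cone;
}

/*
 * Find an entry whose first len bytes match prefix when case is
 * ignored.  Returns that entry's name (canonical spelling) or NULL.
 */
static const char *toy_canonical_prefix(struct toy_entry *e, int nr,
					const char *prefix, size_t len)
{
	int i;

	for (i = 0; i < nr; i++)
		if (!strncasecmp(e[i].name, prefix, len))
			return e[i].name;
	return NULL;
}

/*
 * Handle a directory event: try the observed spelling first and only
 * fall back to the case-insensitive search when nothing matched.
 */
static int toy_refresh_dir(struct toy_entry *e, int nr,
			   const char *observed, int ignore_case)
{
	size_t len = strlen(observed);
	const char *canonical;
	int nr_in_cone;

	/* The observed pathname must include the trailing slash. */
	assert(len > 0 && observed[len - 1] == '/');

	nr_in_cone = toy_invalidate_cone(e, nr, observed, len);
	if (nr_in_cone || !ignore_case)
		return nr_in_cone;

	canonical = toy_canonical_prefix(e, nr, observed, len);
	if (!canonical)
		return 0; /* untracked */
	return toy_invalidate_cone(e, nr, canonical, len);
}
```

With entries "dir1/dir2/file3" and "dir1/other", an event for
"dir1/DIR2/" invalidates nothing when ignore_case is 0 and exactly
the first entry when it is 1.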

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 120 insertions(+), 2 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index 14585b6c516..73e6ac82af7 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -5,6 +5,7 @@
 #include "ewah/ewok.h"
 #include "fsmonitor.h"
 #include "fsmonitor-ipc.h"
+#include "name-hash.h"
 #include "run-command.h"
 #include "strbuf.h"
 #include "trace2.h"
@@ -183,6 +184,9 @@ static int query_fsmonitor_hook(struct repository *r,
 	return result;
 }
 
+static int fsmonitor_refresh_callback_slash(
+	struct index_state *istate, const char *name, int len, int pos);
+
 /*
  * Invalidate the untracked cache for the given pathname.  Copy the
  * buffer to a proper null-terminated string (since the untracked
@@ -205,6 +209,84 @@ static void my_invalidate_untracked_cache(
 	strbuf_release(&work_path);
 }
 
+/*
+ * Use the name-hash to lookup the pathname.
+ *
+ * Returns the number of cache-entries that we invalidated.
+ */
+static int my_callback_name_hash(
+	struct index_state *istate, const char *name, int len)
+{
+	struct cache_entry *ce = NULL;
+
+	ce = index_file_exists(istate, name, len, 1);
+	if (!ce)
+		return 0;
+
+	/*
+	 * The index contains a case-insensitive match for the pathname.
+	 * This could either be a regular file or a sparse-index directory.
+	 *
+	 * We should not have seen FSEvents for a sparse-index directory,
+	 * but we handle it just in case.
+	 *
+	 * Either way, we know that there are not any cache-entries for
+	 * children inside the cone of the directory, so we don't need to
+	 * do the usual scan.
+	 */
+	trace_printf_key(&trace_fsmonitor,
+			 "fsmonitor_refresh_callback map '%s' '%s'",
+			 name, ce->name);
+
+	my_invalidate_untracked_cache(istate, ce->name, ce->ce_namelen);
+
+	ce->ce_flags &= ~CE_FSMONITOR_VALID;
+	return 1;
+}
+
+/*
+ * Use the directory name-hash to find the correct-case spelling
+ * of the directory.  Use the canonical spelling to invalidate all
+ * of the cache-entries within the matching cone.
+ *
+ * The pathname MUST NOT have a trailing slash.
+ *
+ * Returns the number of cache-entries that we invalidated.
+ */
+static int my_callback_dir_name_hash(
+	struct index_state *istate, const char *name, int len)
+{
+	struct strbuf canonical_path = STRBUF_INIT;
+	int pos;
+	int nr_in_cone;
+
+	if (!index_dir_exists2(istate, name, len, &canonical_path))
+		return 0; /* name is untracked */
+	if (!memcmp(name, canonical_path.buf, len)) {
+		strbuf_release(&canonical_path);
+		return 0; /* should not happen */
+	}
+
+	trace_printf_key(&trace_fsmonitor,
+			 "fsmonitor_refresh_callback map '%s' '%s'",
+			 name, canonical_path.buf);
+
+	/*
+	 * The directory name-hash only tells us the corrected
+	 * spelling of the prefix.  We have to use this canonical
+	 * path to do a lookup in the cache-entry array so that we
+	 * repeat the original search using the case-corrected
+	 * spelling.
+	 */
+	strbuf_addch(&canonical_path, '/');
+	pos = index_name_pos(istate, canonical_path.buf,
+			     canonical_path.len);
+	nr_in_cone = fsmonitor_refresh_callback_slash(
+		istate, canonical_path.buf, canonical_path.len, pos);
+	strbuf_release(&canonical_path);
+	return nr_in_cone;
+}
+
 static void fsmonitor_refresh_callback_unqualified(
 	struct index_state *istate, const char *name, int len, int pos)
 {
@@ -269,7 +351,10 @@ static void fsmonitor_refresh_callback_unqualified(
  *
  * Return the number of cache-entries that we invalidated.  We will
  * use this later to determine if we need to attempt a second
- * case-insensitive search.
+ * case-insensitive search.  That is, if an observed-case search yields
+ * any results, we assume the prefix is case-correct.  If there are
+ * no matches, we still don't know if the observed path is simply
+ * untracked or case-incorrect.
  */
 static int fsmonitor_refresh_callback_slash(
 	struct index_state *istate, const char *name, int len, int pos)
@@ -293,17 +378,50 @@ static int fsmonitor_refresh_callback_slash(
 	return nr_in_cone;
 }
 
+/*
+ * On a case-insensitive FS, use the name-hash and directory name-hash
+ * to map the case of the observed path to the canonical case expected
+ * by the index.
+ *
+ * The given pathname includes the trailing slash.
+ *
+ * Return the number of cache-entries that we invalidated.
+ */
+static int fsmonitor_refresh_callback_slash_icase(
+	struct index_state *istate, const char *name, int len)
+{
+	int nr_in_cone;
+
+	/*
+	 * Look for a case-incorrect sparse-index directory.
+	 */
+	nr_in_cone = my_callback_name_hash(istate, name, len);
+	if (nr_in_cone)
+		return nr_in_cone;
+
+	/*
+	 * (len-1) because we do not include the trailing slash in the
+	 * pathname.
+	 */
+	nr_in_cone = my_callback_dir_name_hash(istate, name, len-1);
+	return nr_in_cone;
+}
+
 static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 {
 	int len = strlen(name);
 	int pos = index_name_pos(istate, name, len);
+	int nr_in_cone;
+
 
 	trace_printf_key(&trace_fsmonitor,
 			 "fsmonitor_refresh_callback '%s' (pos %d)",
 			 name, pos);
 
 	if (name[len - 1] == '/') {
-		fsmonitor_refresh_callback_slash(istate, name, len, pos);
+		nr_in_cone = fsmonitor_refresh_callback_slash(istate, name, len, pos);
+		if (ignore_case && !nr_in_cone)
+			fsmonitor_refresh_callback_slash_icase(istate, name, len);
 	} else {
 		fsmonitor_refresh_callback_unqualified(istate, name, len, pos);
 	}
-- 
gitgitgadget



* [PATCH 09/12] fsmonitor: refactor non-directory callback
  2024-02-13 20:52 [PATCH 00/12] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                   ` (7 preceding siblings ...)
  2024-02-13 20:52 ` [PATCH 08/12] fsmonitor: support case-insensitive directory events Jeff Hostetler via GitGitGadget
@ 2024-02-13 20:52 ` Jeff Hostetler via GitGitGadget
  2024-02-15  9:32   ` Patrick Steinhardt
  2024-02-13 20:52 ` [PATCH 10/12] fsmonitor: support case-insensitive non-directory events Jeff Hostetler via GitGitGadget
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-13 20:52 UTC (permalink / raw
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Refactor the fsmonitor_refresh_callback_unqualified() code
to try to use the _callback_slash() code and avoid having
a custom filter in the child cache-entry scanner.

On platforms that DO NOT annotate FS events with a trailing
slash, if we fail to find an exact match for the pathname
in the index, we do not know if the pathname represents a
directory or simply an untracked file.  Pretend that the pathname
is a directory and try again before assuming it is an untracked
file.
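
To illustrate why the retry with a trailing slash is needed, here is
a small sketch of the insertion-point arithmetic.  It rests on
assumptions: `toy_name_pos()` is a hypothetical stand-in for
index_name_pos() that uses plain strcmp() ordering over an array of
strings, and `toy_count_cone()` is an illustrative helper, not a
function in fsmonitor.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for index_name_pos(): binary search over a
 * sorted array of names.  Returns the position on an exact match, or
 * (-insertion_point - 1) when the name is absent.
 */
static int toy_name_pos(const char **names, int nr, const char *name)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(names[mid], name);

		if (!cmp)
			return mid;
		if (cmp < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -lo - 1;
}

/*
 * Count the entries under "name/" for an observed name that has no
 * trailing slash, by retrying the lookup with the slash appended.
 */
static int toy_count_cone(const char **names, int nr, const char *name)
{
	char dir[256];
	int pos, i, n = 0;
	size_t len;

	len = (size_t)snprintf(dir, sizeof(dir), "%s/", name);
	assert(len < sizeof(dir));

	pos = toy_name_pos(names, nr, dir);
	if (pos < 0)
		pos = -pos - 1;
	for (i = pos; i < nr && !strncmp(names[i], dir, len); i++)
		n++;
	return n;
}
```

In a sorted list such as "foo-", "foo-bar", "foo/a", "foo/b", "foo0",
the insertion point for "foo" lands before "foo-", whereas the
insertion point for "foo/" lands directly on the first entry of the
cone, so no custom filter is needed in the scan.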

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 59 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 35 insertions(+), 24 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index 73e6ac82af7..cb27bae8aa8 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -287,41 +287,52 @@ static int my_callback_dir_name_hash(
 	return nr_in_cone;
 }
 
-static void fsmonitor_refresh_callback_unqualified(
+/*
+ * The daemon sent an observed pathname without a trailing slash.
+ * (This is the normal case.)  We do not know if it is a tracked or
+ * untracked file, a sparse-directory, or a populated directory (on a
+ * platform such as Windows where FSEvents are not qualified).
+ *
+ * The pathname contains the observed case reported by the FS. We
+ * do not know whether it is case-correct or case-incorrect.
+ *
+ * Assume it is case-correct and try an exact match.
+ *
+ * Return the number of cache-entries that we invalidated.
+ */
+static int fsmonitor_refresh_callback_unqualified(
 	struct index_state *istate, const char *name, int len, int pos)
 {
-	int i;
-
 	my_invalidate_untracked_cache(istate, name, len);
 
 	if (pos >= 0) {
 		/*
-		 * We have an exact match for this path and can just
-		 * invalidate it.
+		 * An exact match on a tracked file. We assume that we
+		 * do not need to scan forward for a sparse-directory
+		 * cache-entry with the same pathname, nor for a cone
+		 * at that directory. (That is, assume no D/F conflicts.)
 		 */
 		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
+		return 1;
 	} else {
+		int nr_in_cone;
+		struct strbuf work_path = STRBUF_INIT;
+
 		/*
-		 * The path is not a tracked file -or- it is a
-		 * directory event on a platform that cannot
-		 * distinguish between file and directory events in
-		 * the event handler, such as Windows.
-		 *
-		 * Scan as if it is a directory and invalidate the
-		 * cone under it.  (But remember to ignore items
-		 * between "name" and "name/", such as "name-" and
-		 * "name.".
+		 * The negative "pos" gives us the suggested insertion
+		 * point for the pathname (without the trailing slash).
+		 * We need to see if there is a directory with that
+		 * prefix, but there can be lots of pathnames between
+		 * "foo" and "foo/" like "foo-" or "foo-bar", so we
+		 * don't want to do our own scan.
 		 */
-		pos = -pos - 1;
-
-		for (i = pos; i < istate->cache_nr; i++) {
-			if (!starts_with(istate->cache[i]->name, name))
-				break;
-			if ((unsigned char)istate->cache[i]->name[len] > '/')
-				break;
-			if (istate->cache[i]->name[len] == '/')
-				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
-		}
+		strbuf_add(&work_path, name, len);
+		strbuf_addch(&work_path, '/');
+		pos = index_name_pos(istate, work_path.buf, work_path.len);
+		nr_in_cone = fsmonitor_refresh_callback_slash(
+			istate, work_path.buf, work_path.len, pos);
+		strbuf_release(&work_path);
+		return nr_in_cone;
 	}
 }
 
-- 
gitgitgadget



* [PATCH 10/12] fsmonitor: support case-insensitive non-directory events
  2024-02-13 20:52 [PATCH 00/12] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                   ` (8 preceding siblings ...)
  2024-02-13 20:52 ` [PATCH 09/12] fsmonitor: refactor non-directory callback Jeff Hostetler via GitGitGadget
@ 2024-02-13 20:52 ` Jeff Hostetler via GitGitGadget
  2024-02-13 20:52 ` [PATCH 11/12] fsmonitor: refactor bit invalidation in refresh callback Jeff Hostetler via GitGitGadget
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-13 20:52 UTC (permalink / raw
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index cb27bae8aa8..a7847f07a40 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -336,6 +336,36 @@ static int fsmonitor_refresh_callback_unqualified(
 	}
 }
 
+/*
+ * On a case-insensitive FS, use the name-hash to map the case of
+ * the observed path to the canonical case expected by the index.
+ *
+ * The given pathname DOES NOT include the trailing slash.
+ *
+ * Return the number of cache-entries that we invalidated.
+ */
+static int fsmonitor_refresh_callback_unqualified_icase(
+	struct index_state *istate, const char *name, int len)
+{
+	int nr_in_cone;
+
+	/*
+	 * Look for a case-incorrect match for this non-directory
+	 * pathname.
+	 */
+	nr_in_cone = my_callback_name_hash(istate, name, len);
+	if (nr_in_cone)
+		return nr_in_cone;
+
+	/*
+	 * Try the directory name-hash and see if there is a
+	 * case-incorrect directory with this pathname.
+	 * (len) because we don't have a trailing slash.
+	 */
+	nr_in_cone = my_callback_dir_name_hash(istate, name, len);
+	return nr_in_cone;
+}
+
 /*
  * The daemon can decorate directory events, such as a move or rename,
  * by adding a trailing slash to the observed name.  Use this to
@@ -434,7 +464,9 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 		if (ignore_case && !nr_in_cone)
 			fsmonitor_refresh_callback_slash_icase(istate, name, len);
 	} else {
-		fsmonitor_refresh_callback_unqualified(istate, name, len, pos);
+		nr_in_cone = fsmonitor_refresh_callback_unqualified(istate, name, len, pos);
+		if (ignore_case && !nr_in_cone)
+			fsmonitor_refresh_callback_unqualified_icase(istate, name, len);
 	}
 }
 
-- 
gitgitgadget



* [PATCH 11/12] fsmonitor: refactor bit invalidation in refresh callback
  2024-02-13 20:52 [PATCH 00/12] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                   ` (9 preceding siblings ...)
  2024-02-13 20:52 ` [PATCH 10/12] fsmonitor: support case-insensitive non-directory events Jeff Hostetler via GitGitGadget
@ 2024-02-13 20:52 ` Jeff Hostetler via GitGitGadget
  2024-02-15  9:32   ` Patrick Steinhardt
  2024-02-13 20:52 ` [PATCH 12/12] t7527: update case-insenstive fsmonitor test Jeff Hostetler via GitGitGadget
  2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
  12 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-13 20:52 UTC (permalink / raw
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Refactor code in the fsmonitor_refresh_callback() call chain dealing
with invalidating the CE_FSMONITOR_VALID bit and add a trace message.

During the refresh, we clear the CE_FSMONITOR_VALID bit in response to
data from the FSMonitor daemon (so that a later phase will lstat() and
verify the true state of the file).

Create a new function to clear the bit and add some unique tracing for
it to help debug edge cases.

This is similar to the existing `mark_fsmonitor_invalid()` function,
but the callers here have already invalidated the untracked cache, so
the new function only clears the bit and emits its own trace message.
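
The idea of clearing the bit while tracing only real transitions can
be sketched on its own.  The names below (`struct toy_entry`,
`TOY_FSMONITOR_VALID`, `toy_invalidate()`) and the flag value are
hypothetical, and a stdio stream stands in for the trace key.

```c
#include <assert.h>
#include <stdio.h>

#define TOY_FSMONITOR_VALID (1u << 0) /* hypothetical flag value */

/* Hypothetical stand-in for a cache-entry. */
struct toy_entry {
	const char *name;
	unsigned int flags;
};

/*
 * Clear the valid bit.  Write a trace line only when the bit was
 * actually set, so repeated events for one path do not add noise.
 * Returns 1 if the bit was set before the call, otherwise 0.
 */
static int toy_invalidate(struct toy_entry *ce, FILE *trace)
{
	int was_valid;

	assert(ce != NULL);
	was_valid = !!(ce->flags & TOY_FSMONITOR_VALID);

	if (was_valid && trace)
		fprintf(trace, "invalidate '%s'\n", ce->name);
	ce->flags &= ~TOY_FSMONITOR_VALID;
	return was_valid;
}
```

Other flag bits are left untouched, and a second call on the same
entry is a silent no-op.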

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index a7847f07a40..75c7f73f68d 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -209,6 +209,20 @@ static void my_invalidate_untracked_cache(
 	strbuf_release(&work_path);
 }
 
+/*
+ * Invalidate the FSM bit on this CE.  This is like mark_fsmonitor_invalid()
+ * but we've already handled the untracked-cache and I want a different
+ * trace message.
+ */
+static void my_invalidate_ce_fsm(struct cache_entry *ce)
+{
+	if (ce->ce_flags & CE_FSMONITOR_VALID)
+		trace_printf_key(&trace_fsmonitor,
+				 "fsmonitor_refresh_cb_invalidate '%s'",
+				 ce->name);
+	ce->ce_flags &= ~CE_FSMONITOR_VALID;
+}
+
 /*
  * Use the name-hash to lookup the pathname.
  *
@@ -240,7 +254,7 @@ static int my_callback_name_hash(
 
 	my_invalidate_untracked_cache(istate, ce->name, ce->ce_namelen);
 
-	ce->ce_flags &= ~CE_FSMONITOR_VALID;
+	my_invalidate_ce_fsm(ce);
 	return 1;
 }
 
@@ -312,7 +326,7 @@ static int fsmonitor_refresh_callback_unqualified(
 		 * cache-entry with the same pathname, nor for a cone
 		 * at that directory. (That is, assume no D/F conflicts.)
 		 */
-		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
+		my_invalidate_ce_fsm(istate->cache[pos]);
 		return 1;
 	} else {
 		int nr_in_cone;
@@ -412,7 +426,7 @@ static int fsmonitor_refresh_callback_slash(
 	for (i = pos; i < istate->cache_nr; i++) {
 		if (!starts_with(istate->cache[i]->name, name))
 			break;
-		istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
+		my_invalidate_ce_fsm(istate->cache[i]);
 		nr_in_cone++;
 	}
 
-- 
gitgitgadget



* [PATCH 12/12] t7527: update case-insenstive fsmonitor test
  2024-02-13 20:52 [PATCH 00/12] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                   ` (10 preceding siblings ...)
  2024-02-13 20:52 ` [PATCH 11/12] fsmonitor: refactor bit invalidation in refresh callback Jeff Hostetler via GitGitGadget
@ 2024-02-13 20:52 ` Jeff Hostetler via GitGitGadget
  2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
  12 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-13 20:52 UTC (permalink / raw
  To: git; +Cc: Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Now that the FSMonitor client has been updated to better
handle events on case-insensitive file systems, update the
two tests that demonstrated the bug.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 t/t7527-builtin-fsmonitor.sh | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
index 5cd68b2ea82..03af8539ca8 100755
--- a/t/t7527-builtin-fsmonitor.sh
+++ b/t/t7527-builtin-fsmonitor.sh
@@ -1116,16 +1116,17 @@ test_expect_success CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
 
 	grep -q "dir1/DIR2/dir3/file3.*pos -3" "$PWD/subdir_case_wrong.log1" &&
 
+	# Also verify that we get a mapping event to correct the case.
+	grep -q "map.*dir1/DIR2/dir3/file3.*dir1/dir2/dir3/file3" \
+		"$PWD/subdir_case_wrong.log1" &&
+
 	# The refresh-callbacks should have caused "git status" to clear
 	# the CE_FSMONITOR_VALID bit on each of those files and caused
 	# the worktree scan to visit them and mark them as modified.
 	grep -q " M AAA" "$PWD/subdir_case_wrong.out" &&
 	grep -q " M zzz" "$PWD/subdir_case_wrong.out" &&
 
-	# However, with the fsmonitor client bug, the "(pos -3)" causes
-	# the client to not update the bit and never rescan the file
-	# and therefore not report it as dirty.
-	! grep -q " M dir1/dir2/dir3/file3" "$PWD/subdir_case_wrong.out"
+	grep -q " M dir1/dir2/dir3/file3" "$PWD/subdir_case_wrong.out"
 '
 
 test_expect_success CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
@@ -1246,12 +1247,14 @@ test_expect_success CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
 	grep -q "fsmonitor_refresh_callback.*FILE-3-A.*pos -3" "$PWD/file_case_wrong-try3.log" &&
 	grep -q "fsmonitor_refresh_callback.*file-4-a.*pos -9" "$PWD/file_case_wrong-try3.log" &&
 
-	# Status should say these files are modified, but with the case
-	# bug, the "pos -3" cause the client to not update the FSM bit
-	# and never cause the file to be rescanned and therefore to not
-	# report it dirty.
-	! grep -q " M dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-try3.out" &&
-	! grep -q " M dir1/dir2/dir4/FILE-4-A" "$PWD/file_case_wrong-try3.out"
+	# Also verify that we get a mapping event to correct the case.
+	grep -q "fsmonitor_refresh_callback map.*dir1/dir2/dir3/FILE-3-A.*dir1/dir2/dir3/file-3-a" \
+		"$PWD/file_case_wrong-try3.log" &&
+	grep -q "fsmonitor_refresh_callback map.*dir1/dir2/dir4/file-4-a.*dir1/dir2/dir4/FILE-4-A" \
+		"$PWD/file_case_wrong-try3.log" &&
+
+	grep -q " M dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-try3.out" &&
+	grep -q " M dir1/dir2/dir4/FILE-4-A" "$PWD/file_case_wrong-try3.out"
 '
 
 test_done
-- 
gitgitgadget


* Re: [PATCH 02/12] name-hash: add index_dir_exists2()
  2024-02-13 20:52 ` [PATCH 02/12] name-hash: add index_dir_exists2() Jeff Hostetler via GitGitGadget
@ 2024-02-13 21:43   ` Junio C Hamano
  2024-02-20 17:38     ` Jeff Hostetler
  2024-02-15  9:31   ` Patrick Steinhardt
  1 sibling, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2024-02-13 21:43 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Create a new version of index_dir_exists() to return the canonical
> spelling of the matched directory prefix.
>
> The existing index_dir_exists() returns a boolean to indicate if
> there is a case-insensitive match in the directory name-hash, but
> it doesn't tell the caller the exact spelling of that match.
>
> The new version also copies the matched spelling to a provided strbuf.
> This lets the caller, for example, then call index_name_pos() with the
> correct case to search the cache-entry array for the real insertion
> position.
>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  name-hash.c | 16 ++++++++++++++++
>  name-hash.h |  2 ++
>  2 files changed, 18 insertions(+)
>
> diff --git a/name-hash.c b/name-hash.c
> index 251f036eef6..d735c81acb3 100644
> --- a/name-hash.c
> +++ b/name-hash.c
> @@ -694,6 +694,22 @@ int index_dir_exists(struct index_state *istate, const char *name, int namelen)
>  	dir = find_dir_entry(istate, name, namelen);
>  	return dir && dir->nr;
>  }
> +int index_dir_exists2(struct index_state *istate, const char *name, int namelen,
> +		      struct strbuf *canonical_path)
> +{
> +	struct dir_entry *dir;
> +
> +	strbuf_init(canonical_path, namelen+1);
> +
> +	lazy_init_name_hash(istate);
> +	expand_to_path(istate, name, namelen, 0);
> +	dir = find_dir_entry(istate, name, namelen);
> +
> +	if (dir && dir->nr)
> +		strbuf_add(canonical_path, dir->name, dir->namelen);
> +
> +	return dir && dir->nr;
> +}
>  
>  void adjust_dirname_case(struct index_state *istate, char *name)

Missing inter-function blank line, before the new function.

I wonder if we can avoid such repetition---the body of
index_dir_exists() is 100% shared with this new function.

Isn't it extremely unusual to receive "struct strbuf *" and call
strbuf_init() on it?  It means that the caller is expected to have a
strbuf and pass a pointer to it, but also it is expected to leave
the strbuf uninitialized.

I'd understand if it calls strbuf_reset(), but it may not even be
necessary, if we make it responsibility of the caller to pass a
valid strbuf to be appended into.

	int index_dir_find(struct index_state *istate,
			   const char *name, int namelen,
			   struct strbuf *canonical_path)
	{
		struct dir_entry *dir;

		lazy_init_name_hash(istate);
		expand_to_path(istate, name, namelen, 0);
		dir = find_dir_entry(istate, name, namelen);

		if (canonical_path && dir && dir->nr) {
			// strbuf_reset(canonical_path); ???
			strbuf_add(canonical_path, dir->name, dir->namelen);
		}
		return dir && dir->nr;
	}

Then we can do

	#define index_dir_exists(i, n, l) index_dir_find((i), (n), (l), NULL)

in the header for existing callers.
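
The optional-output-plus-macro pattern suggested above can be tried
out standalone.  This is a rough sketch under assumptions, not Git
code: `toy_dirs`, `toy_dir_find()` and `toy_dir_exists()` are
hypothetical, and a fixed char buffer replaces the strbuf.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical table of known directory spellings. */
static const char *toy_dirs[] = { "dir1", "dir1/dir2" };

/*
 * Return 1 if a case-insensitive match exists.  When "canonical" is
 * non-NULL, also copy the stored spelling into it (NUL-terminated).
 */
static int toy_dir_find(const char *name, size_t len,
			char *canonical, size_t canonical_size)
{
	size_t i;

	for (i = 0; i < sizeof(toy_dirs) / sizeof(toy_dirs[0]); i++) {
		if (strlen(toy_dirs[i]) != len ||
		    strncasecmp(toy_dirs[i], name, len))
			continue;
		if (canonical) {
			assert(len < canonical_size);
			memcpy(canonical, toy_dirs[i], len + 1);
		}
		return 1;
	}
	return 0;
}

/* Existing boolean-style callers keep their shorter signature. */
#define toy_dir_exists(n, l) toy_dir_find((n), (l), NULL, 0)
```

Callers that only need a yes/no answer use the macro; callers that
need the canonical spelling pass a buffer and get both.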

>  {
> diff --git a/name-hash.h b/name-hash.h
> index b1b4b0fb337..2fcac5c4870 100644
> --- a/name-hash.h
> +++ b/name-hash.h
> @@ -5,6 +5,8 @@ struct cache_entry;
>  struct index_state;
>  
>  int index_dir_exists(struct index_state *istate, const char *name, int namelen);
> +int index_dir_exists2(struct index_state *istate, const char *name, int namelen,
> +		      struct strbuf *canonical_path);
>  void adjust_dirname_case(struct index_state *istate, char *name);
>  struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);


* Re: [PATCH 01/12] sparse-index: pass string length to index_file_exists()
  2024-02-13 20:52 ` [PATCH 01/12] sparse-index: pass string length to index_file_exists() Jeff Hostetler via GitGitGadget
@ 2024-02-13 22:07   ` Junio C Hamano
  2024-02-20 17:34     ` Jeff Hostetler
  0 siblings, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2024-02-13 22:07 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhostetler@github.com>
>
> The call to index_file_exists() in the loop in expand_to_path() passes
> the wrong string length.  Let's fix that.
>
> The loop in expand_to_path() searches the name-hash for each
> sub-directory prefix in the provided pathname. That is, by searching
> for "dir1/" then "dir1/dir2/" then "dir1/dir2/dir3/" and so on until
> it finds a cache-entry representing a sparse directory.
>
> The code creates "strbuf path_mutable" to contain the working pathname
> and modifies the buffer in-place by temporarily replacing the character
> following each successive "/" with NUL for the duration of the call to
> index_file_exists().
>
> It does not update the strbuf.len during this substitution.
>
> Pass the patched length of the prefix path instead.
>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---

This looked familiar, and it turns out that

https://lore.kernel.org/git/pull.1649.git.1706897095273.gitgitgadget@gmail.com/

has already been merged to 'master'.
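
For readers following the quoted description, the prefix-length issue
can be sketched in isolation.  This is an illustration only:
`toy_prefix_fn`, `toy_each_prefix()` and `toy_sum_len()` are
hypothetical names and do not exist in sparse-index.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical callback type: receives each "dir/" prefix and its length. */
typedef void (*toy_prefix_fn)(const char *prefix, size_t len, void *data);

/*
 * Visit "a/", "a/b/", ... for the path held in a writable buffer.
 * The byte after each slash is temporarily replaced by NUL, and the
 * length passed along is that of the prefix, not of the whole path.
 */
static void toy_each_prefix(char *path, toy_prefix_fn fn, void *data)
{
	char *slash = path;

	while ((slash = strchr(slash, '/')) != NULL) {
		char *replace = slash + 1;
		char temp = *replace;

		*replace = '\0';
		fn(path, (size_t)(replace - path), data);
		*replace = temp;
		slash = replace;
	}
}

/* Example callback: check the invariant and accumulate the lengths. */
static void toy_sum_len(const char *prefix, size_t len, void *data)
{
	assert(strlen(prefix) == len);
	*(size_t *)data += len;
}
```

Passing the full buffer length instead of `replace - path` would
break the `strlen(prefix) == len` invariant checked in the callback.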


>  sparse-index.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/sparse-index.c b/sparse-index.c
> index 3578feb2837..e48e40cae71 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -579,8 +579,9 @@ void expand_to_path(struct index_state *istate,
>  		replace++;
>  		temp = *replace;
>  		*replace = '\0';
> +		substr_len = replace - path_mutable.buf;
>  		if (index_file_exists(istate, path_mutable.buf,
> -				      path_mutable.len, icase)) {
> +				      substr_len, icase)) {
>  			/*
>  			 * We found a parent directory in the name-hash
>  			 * hashtable, because only sparse directory entries
> @@ -593,7 +594,6 @@ void expand_to_path(struct index_state *istate,
>  		}
>  
>  		*replace = temp;
> -		substr_len = replace - path_mutable.buf;
>  	}
>  
>  cleanup:


* Re: [PATCH 05/12] fsmonitor: refactor refresh callback for non-directory events
  2024-02-13 20:52 ` [PATCH 05/12] fsmonitor: refactor refresh callback for non-directory events Jeff Hostetler via GitGitGadget
@ 2024-02-14  1:34   ` Junio C Hamano
  2024-02-15  9:32   ` Patrick Steinhardt
  1 sibling, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2024-02-14  1:34 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c | 66 ++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 38 insertions(+), 28 deletions(-)

Up to this point, I found it a very pleasant read.  Nothing
surprising or unexpected.  Just a simple series of nice clean-ups.

Thanks.


* Re: [PATCH 06/12] fsmonitor: clarify handling of directory events in callback
  2024-02-13 20:52 ` [PATCH 06/12] fsmonitor: clarify handling of directory events in callback Jeff Hostetler via GitGitGadget
@ 2024-02-14  7:47   ` Junio C Hamano
  2024-02-20 18:56     ` Jeff Hostetler
  2024-02-15  9:32   ` Patrick Steinhardt
  1 sibling, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2024-02-14  7:47 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c | 47 +++++++++++++++++++++++++++++++++--------------
>  1 file changed, 33 insertions(+), 14 deletions(-)
>
> diff --git a/fsmonitor.c b/fsmonitor.c
> index 614270fa5e8..754fe20cfd0 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -219,24 +219,40 @@ static void fsmonitor_refresh_callback_unqualified(
>  	}
>  }
>  
> -static void fsmonitor_refresh_callback_slash(
> +/*
> + * The daemon can decorate directory events, such as a move or rename,
> + * by adding a trailing slash to the observed name.  Use this to
> + * explicitly invalidate the entire cone under that directory.
> + *
> + * The daemon can only reliably do that if the OS FSEvent contains
> + * sufficient information in the event.
> + *
> + * macOS FSEvents have enough information.
> + *
> + * Other platforms may or may not be able to do it (and it might
> + * depend on the type of event (for example, a daemon could lstat() an
> + * observed pathname after a rename, but not after a delete)).
> + *
> + * If we find an exact match in the index for a path with a trailing
> + * slash, it means that we matched a sparse-index directory in a
> + * cone-mode sparse-checkout (since that's the only time we have
> + * directories in the index).  We should never see this in practice
> + * (because sparse directories should not be present and therefore
> + * not generating FS events).  Either way, we can treat them in the
> + * same way and just invalidate the cache-entry and the untracked
> + * cache (and in this case, the forward cache-entry scan won't find
> + * anything and it doesn't hurt to let it run).
> + *
> + * Return the number of cache-entries that we invalidated.  We will
> + * use this later to determine if we need to attempt a second
> + * case-insensitive search.
> + */
> +static int fsmonitor_refresh_callback_slash(
>  	struct index_state *istate, const char *name, int len, int pos)
>  {

This was split out a few patches ago, and the caller of course
ignored the return value (void), but now it returns an integer, and
this change is without a corresponding update to the caller, which
leaves readers puzzled.

Perhaps a future patch either updates the existing caller or adds a
new caller that utilizes the returned value, but then at least the
proposed commit message for this step should hint how it helps the
caller(s) we will see in the future steps if this function returns
the number of entries invalidated, iow, how the caller is expected
to use the returned value from here, no?

Alternatively, this step can limit itself to what the commit title
claims to do---to clarify what the helper does with enhanced in-code
comments.  Then a future step that updates the caller to care about
the return value can have both the changes to this callee as well as
the caller---which may make it easier to see how the returned info
helps the caller.  I dunno which is more reasonable.
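
To make the question concrete, here is a minimal stand-alone sketch of
the "convert a negative position to an insertion point, scan the cone,
and report how many entries were invalidated" pattern from the quoted
hunk.  Everything named toy_* is hypothetical and is not Git code; a
sorted array of names stands in for the index.  A zero return is the
signal a caller could use to decide whether a second, case-insensitive
lookup is worth attempting.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cache entry; not Git's struct cache_entry. */
struct toy_entry {
	const char *name;
	int fsmonitor_valid;
};

/*
 * Binary search over entries sorted in strcmp() order.  Returns the
 * index on an exact match, otherwise -(insertion point) - 1.
 */
static int toy_name_pos(const struct toy_entry *e, int nr, const char *name)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(e[mid].name, name);

		if (!cmp)
			return mid;
		if (cmp < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -lo - 1;
}

/* Invalidate every entry under "dir/" and return how many were touched. */
static int toy_invalidate_cone(struct toy_entry *e, int nr,
			       const char *dir_slash)
{
	size_t len = strlen(dir_slash);
	int pos = toy_name_pos(e, nr, dir_slash);
	int i, count = 0;

	if (pos < 0)
		pos = -pos - 1;	/* start at the suggested insertion point */

	for (i = pos; i < nr; i++) {
		if (strncmp(e[i].name, dir_slash, len))
			break;	/* left the cone */
		e[i].fsmonitor_valid = 0;
		count++;
	}
	return count;
}
```

With entries "Makefile", "dir/a", "dir/b", "dir2/c" and "z", a lookup
of "dir/" touches two entries, while the case-incorrect "DIR/" touches
none, which is exactly the case the returned count lets a caller detect.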

Thanks.



>  	int i;
> +	int nr_in_cone = 0;
>  
> -	/*
> -	 * The daemon can decorate directory events, such as
> -	 * moves or renames, with a trailing slash if the OS
> -	 * FS Event contains sufficient information, such as
> -	 * MacOS.
> -	 *
> -	 * Use this to invalidate the entire cone under that
> -	 * directory.
> -	 *
> -	 * We do not expect an exact match because the index
> -	 * does not normally contain directory entries, so we
> -	 * start at the insertion point and scan.
> -	 */
>  	if (pos < 0)
>  		pos = -pos - 1;
>  
> @@ -245,7 +261,10 @@ static void fsmonitor_refresh_callback_slash(
>  		if (!starts_with(istate->cache[i]->name, name))
>  			break;
>  		istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> +		nr_in_cone++;
>  	}
> +
> +	return nr_in_cone;
>  }
>  
>  static void fsmonitor_refresh_callback(struct index_state *istate, char *name)

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 07/12] fsmonitor: refactor untracked-cache invalidation
  2024-02-13 20:52 ` [PATCH 07/12] fsmonitor: refactor untracked-cache invalidation Jeff Hostetler via GitGitGadget
@ 2024-02-14 16:46   ` Junio C Hamano
  2024-02-15  9:32   ` Patrick Steinhardt
  1 sibling, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2024-02-14 16:46 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c | 38 ++++++++++++++++++++++++++------------
>  1 file changed, 26 insertions(+), 12 deletions(-)

Sorry, but the proposed commit log is way lacking for this
particular step.  Readers have already understood, after reading
steps like [04/12] and [05/12], that you use the verb "refactor" in
its usual sense, i.e. reorganize the code around without changing
behaviour in order to enhance readability and to make it easier for
code reuse in future steps, and these two steps did exactly that:
helper functions are split out of larger functions, presumably
either to allow adding new callers to the helpers, or to make the
result of adding more code to the caller easier to follow [*].

However, the changes in this step look vastly different, and it is
not even clear if this change intends to keep the behaviour before
and after it the same, or if it does, how they are the same.

I can sort-of see that the original code made a call to
untracked_cache_invalidate_path() at the very end of the
fsmonitor_refresh_callback(), but the updated code no longer does
so.  Why?  Is it because it is the root cause of an unstated bug
that we don't do so until the end in the current code?  Is it
because the order does not matter (how and why?) and the resulting
code becomes better (how?  simpler to follow? more performant?
avoids duplicated work?  something else)?
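
For readers following along, the new helper in the quoted hunk copies
the path into a temporary buffer and drops one trailing slash, where
the old code overwrote the slash in the caller's buffer.  A minimal
stand-alone sketch of that copy-and-strip step follows; the name
toy_strip_trailing_slash is hypothetical, and plain malloc() stands in
for the strbuf used in the patch.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a freshly allocated, NUL-terminated copy of name[0..len)
 * with at most one trailing '/' removed.  Returns NULL when len is 0
 * or when allocation fails.  The caller frees the result.
 */
static char *toy_strip_trailing_slash(const char *name, size_t len)
{
	char *copy;

	if (!len)
		return NULL;
	if (name[len - 1] == '/')
		len--;

	copy = malloc(len + 1);
	if (!copy)
		return NULL;
	memcpy(copy, name, len);
	copy[len] = '\0';
	return copy;
}
```

Because the input is only read, the parameter can be const, which is
what lets both split-out helpers call it with their const name.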

It does not help to call a new helper function with a cryptic "my_"
name, either.

Please try again?  Thanks.


[Footnote] 

 * These two are vastly different goals, and there may be other
   reasons why you are doing such refactoring.  It would have been
   nicer if such preliminary refactoring steps had explained what
   the intended course of evolution for the code involved in the
   refactoring is.



>
> diff --git a/fsmonitor.c b/fsmonitor.c
> index 754fe20cfd0..14585b6c516 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -183,11 +183,35 @@ static int query_fsmonitor_hook(struct repository *r,
>  	return result;
>  }
>  
> +/*
> + * Invalidate the untracked cache for the given pathname.  Copy the
> + * buffer to a proper null-terminated string (since the untracked
> + * cache code does not use (buf, len) style argument).  Also strip any
> + * trailing slash.
> + */
> +static void my_invalidate_untracked_cache(
> +	struct index_state *istate, const char *name, int len)
> +{
> +	struct strbuf work_path = STRBUF_INIT;
> +
> +	if (!len)
> +		return;
> +
> +	if (name[len-1] == '/')
> +		len--;
> +
> +	strbuf_add(&work_path, name, len);
> +	untracked_cache_invalidate_path(istate, work_path.buf, 0);
> +	strbuf_release(&work_path);
> +}
> +
>  static void fsmonitor_refresh_callback_unqualified(
>  	struct index_state *istate, const char *name, int len, int pos)
>  {
>  	int i;
>  
> +	my_invalidate_untracked_cache(istate, name, len);
> +
>  	if (pos >= 0) {
>  		/*
>  		 * We have an exact match for this path and can just
> @@ -253,6 +277,8 @@ static int fsmonitor_refresh_callback_slash(
>  	int i;
>  	int nr_in_cone = 0;
>  
> +	my_invalidate_untracked_cache(istate, name, len);
> +
>  	if (pos < 0)
>  		pos = -pos - 1;
>  
> @@ -278,21 +304,9 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
>  
>  	if (name[len - 1] == '/') {
>  		fsmonitor_refresh_callback_slash(istate, name, len, pos);
> -
> -		/*
> -		 * We need to remove the traling "/" from the path
> -		 * for the untracked cache.
> -		 */
> -		name[len - 1] = '\0';
>  	} else {
>  		fsmonitor_refresh_callback_unqualified(istate, name, len, pos);
>  	}
> -
> -	/*
> -	 * Mark the untracked cache dirty even if it wasn't found in the index
> -	 * as it could be a new untracked file.
> -	 */
> -	untracked_cache_invalidate_path(istate, name, 0);
>  }
>  
>  /*

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 02/12] name-hash: add index_dir_exists2()
  2024-02-13 20:52 ` [PATCH 02/12] name-hash: add index_dir_exists2() Jeff Hostetler via GitGitGadget
  2024-02-13 21:43   ` Junio C Hamano
@ 2024-02-15  9:31   ` Patrick Steinhardt
  1 sibling, 0 replies; 91+ messages in thread
From: Patrick Steinhardt @ 2024-02-15  9:31 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler

[-- Attachment #1: Type: text/plain, Size: 2763 bytes --]

On Tue, Feb 13, 2024 at 08:52:11PM +0000, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhostetler@github.com>
> 
> Create a new version of index_dir_exists() to return the canonical
> spelling of the matched directory prefix.
> 
> The existing index_dir_exists() returns a boolean to indicate if
> there is a case-insensitive match in the directory name-hash, but
> it doesn't tell the caller the exact spelling of that match.
> 
> The new version also copies the matched spelling to a provided strbuf.
> This lets the caller, for example, then call index_name_pos() with the
> correct case to search the cache-entry array for the real insertion
> position.
> 
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  name-hash.c | 16 ++++++++++++++++
>  name-hash.h |  2 ++
>  2 files changed, 18 insertions(+)
> 
> diff --git a/name-hash.c b/name-hash.c
> index 251f036eef6..d735c81acb3 100644
> --- a/name-hash.c
> +++ b/name-hash.c
> @@ -694,6 +694,22 @@ int index_dir_exists(struct index_state *istate, const char *name, int namelen)
>  	dir = find_dir_entry(istate, name, namelen);
>  	return dir && dir->nr;
>  }
> +int index_dir_exists2(struct index_state *istate, const char *name, int namelen,
> +		      struct strbuf *canonical_path)
> +{
> +	struct dir_entry *dir;
> +
> +	strbuf_init(canonical_path, namelen+1);

Missing spaces: `namelen + 1`.

> +	lazy_init_name_hash(istate);
> +	expand_to_path(istate, name, namelen, 0);
> +	dir = find_dir_entry(istate, name, namelen);
> +
> +	if (dir && dir->nr)
> +		strbuf_add(canonical_path, dir->name, dir->namelen);
> +
> +	return dir && dir->nr;
> +}

Can we maybe give this function a more descriptive name?
`index_dir_exists2()` doesn't give the reader any indicator what is
different about it compared to `index_dir_exists()`. How about
`index_dir_exists_with_canonical()`?
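
To illustrate what such a name would promise the caller, here is a
minimal stand-alone sketch of a case-insensitive lookup that also hands
back the canonical spelling. It is not the name-hash implementation:
the toy_* names and the fixed directory list are hypothetical, a linear
scan stands in for the hashmap, and the comparison is ASCII-only.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical directory prefixes, spelled as the index would have them. */
static const char *toy_dirs[] = { "Documentation", "builtin", "t/helper" };

/* ASCII-only, case-insensitive comparison of two buffers of length len. */
static int toy_eq_icase(const char *a, const char *b, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return 0;
	return 1;
}

/*
 * Look up name[0..namelen) ignoring case.  On a match, copy the
 * canonical spelling into out (NUL-terminated) and return 1.
 * Otherwise leave out empty and return 0.
 */
static int toy_dir_exists_with_canonical(const char *name, size_t namelen,
					 char *out, size_t outsz)
{
	size_t i;

	if (outsz)
		out[0] = '\0';

	for (i = 0; i < sizeof(toy_dirs) / sizeof(toy_dirs[0]); i++) {
		size_t dlen = strlen(toy_dirs[i]);

		if (dlen != namelen || dlen >= outsz)
			continue;
		if (!toy_eq_icase(toy_dirs[i], name, namelen))
			continue;
		memcpy(out, toy_dirs[i], dlen + 1);
		return 1;
	}
	return 0;
}
```

The boolean result answers "does it exist?", while the output buffer
answers "how is it spelled?", which is the distinction the function
name should convey.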

>  void adjust_dirname_case(struct index_state *istate, char *name)
>  {
> diff --git a/name-hash.h b/name-hash.h
> index b1b4b0fb337..2fcac5c4870 100644
> --- a/name-hash.h
> +++ b/name-hash.h
> @@ -5,6 +5,8 @@ struct cache_entry;
>  struct index_state;
>  
>  int index_dir_exists(struct index_state *istate, const char *name, int namelen);
> +int index_dir_exists2(struct index_state *istate, const char *name, int namelen,
> +		      struct strbuf *canonical_path);

It would also be great to add comments here that explain what those
functions do and what the difference between them is.

Patrick

>  void adjust_dirname_case(struct index_state *istate, char *name);
>  struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);
>  
> -- 
> gitgitgadget
> 
> 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 04/12] fsmonitor: refactor refresh callback on directory events
  2024-02-13 20:52 ` [PATCH 04/12] fsmonitor: refactor refresh callback on directory events Jeff Hostetler via GitGitGadget
@ 2024-02-15  9:32   ` Patrick Steinhardt
  2024-02-20 18:54     ` Jeff Hostetler
  0 siblings, 1 reply; 91+ messages in thread
From: Patrick Steinhardt @ 2024-02-15  9:32 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler

[-- Attachment #1: Type: text/plain, Size: 3170 bytes --]

On Tue, Feb 13, 2024 at 08:52:13PM +0000, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhostetler@github.com>
> 
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c | 52 ++++++++++++++++++++++++++++++----------------------
>  1 file changed, 30 insertions(+), 22 deletions(-)
> 
> diff --git a/fsmonitor.c b/fsmonitor.c
> index f670c509378..b1ef01bf3cd 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -183,6 +183,35 @@ static int query_fsmonitor_hook(struct repository *r,
>  	return result;
>  }
>  
> +static void fsmonitor_refresh_callback_slash(
> +	struct index_state *istate, const char *name, int len, int pos)

`len` should be `size_t` as it tracks the length of the name. This is
a preexisting issue already because `fsmonitor_refresh_callback()`
assigns `int len = strlen(name)`, which is wrong.

> +{
> +	int i;

`i` is used to iterate through `istate->cache_nr`, which is an `unsigned
int` and not an `int`. I really wish we would compile the Git code base
with `-Wconversion`, but that's a rather big undertaking.

Anyway, none of these issues are new as you merely move the code into a
new function.
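
As a minimal stand-alone sketch of the type concern (hypothetical toy_*
names, not Git code): the loop below keeps the length in a `size_t` and
the index in the same unsigned type as the entry count, and the second
helper spells out the conversion a compiler applies when an `int` is
compared against an `unsigned int`.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count consecutive entries, starting at pos, whose names begin with
 * prefix.  The index and the bound share one unsigned type.
 */
static size_t toy_count_prefixed(const char **names, unsigned int nr,
				 unsigned int pos, const char *prefix)
{
	size_t len = strlen(prefix);
	size_t count = 0;
	unsigned int i;

	for (i = pos; i < nr; i++) {
		if (strncmp(names[i], prefix, len))
			break;
		count++;
	}
	return count;
}

/*
 * The cast makes explicit what "i < nr" does implicitly when i is an
 * int and nr is an unsigned int: i is converted to unsigned first, so
 * a negative i becomes a very large value.
 */
static int toy_as_unsigned_less(int i, unsigned int nr)
{
	return (unsigned int)i < nr;
}
```

A negative index therefore never compares as "less than" the count,
which is the kind of surprise `-Wconversion` and `-Wsign-compare` are
meant to flag.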

Patrick

> +	/*
> +	 * The daemon can decorate directory events, such as
> +	 * moves or renames, with a trailing slash if the OS
> +	 * FS Event contains sufficient information, such as
> +	 * MacOS.
> +	 *
> +	 * Use this to invalidate the entire cone under that
> +	 * directory.
> +	 *
> +	 * We do not expect an exact match because the index
> +	 * does not normally contain directory entries, so we
> +	 * start at the insertion point and scan.
> +	 */
> +	if (pos < 0)
> +		pos = -pos - 1;
> +
> +	/* Mark all entries for the folder invalid */
> +	for (i = pos; i < istate->cache_nr; i++) {
> +		if (!starts_with(istate->cache[i]->name, name))
> +			break;
> +		istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> +	}
> +}
> +
>  static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
>  {
>  	int i, len = strlen(name);
> @@ -193,28 +222,7 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
>  			 name, pos);
>  
>  	if (name[len - 1] == '/') {
> -		/*
> -		 * The daemon can decorate directory events, such as
> -		 * moves or renames, with a trailing slash if the OS
> -		 * FS Event contains sufficient information, such as
> -		 * MacOS.
> -		 *
> -		 * Use this to invalidate the entire cone under that
> -		 * directory.
> -		 *
> -		 * We do not expect an exact match because the index
> -		 * does not normally contain directory entries, so we
> -		 * start at the insertion point and scan.
> -		 */
> -		if (pos < 0)
> -			pos = -pos - 1;
> -
> -		/* Mark all entries for the folder invalid */
> -		for (i = pos; i < istate->cache_nr; i++) {
> -			if (!starts_with(istate->cache[i]->name, name))
> -				break;
> -			istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> -		}
> +		fsmonitor_refresh_callback_slash(istate, name, len, pos);
>  
>  		/*
>  		 * We need to remove the traling "/" from the path
> -- 
> gitgitgadget
> 
> 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/12] fsmonitor: refactor refresh callback for non-directory events
  2024-02-13 20:52 ` [PATCH 05/12] fsmonitor: refactor refresh callback for non-directory events Jeff Hostetler via GitGitGadget
  2024-02-14  1:34   ` Junio C Hamano
@ 2024-02-15  9:32   ` Patrick Steinhardt
  1 sibling, 0 replies; 91+ messages in thread
From: Patrick Steinhardt @ 2024-02-15  9:32 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler

[-- Attachment #1: Type: text/plain, Size: 3494 bytes --]

On Tue, Feb 13, 2024 at 08:52:14PM +0000, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhostetler@github.com>
> 
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c | 66 ++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 38 insertions(+), 28 deletions(-)
> 
> diff --git a/fsmonitor.c b/fsmonitor.c
> index b1ef01bf3cd..614270fa5e8 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -183,6 +183,42 @@ static int query_fsmonitor_hook(struct repository *r,
>  	return result;
>  }
>  
> +static void fsmonitor_refresh_callback_unqualified(
> +	struct index_state *istate, const char *name, int len, int pos)
> +{
> +	int i;

Same remarks here regarding the integer types. But again, not a fault of
your patch.

Patrick

> +
> +	if (pos >= 0) {
> +		/*
> +		 * We have an exact match for this path and can just
> +		 * invalidate it.
> +		 */
> +		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
> +	} else {
> +		/*
> +		 * The path is not a tracked file -or- it is a
> +		 * directory event on a platform that cannot
> +		 * distinguish between file and directory events in
> +		 * the event handler, such as Windows.
> +		 *
> +		 * Scan as if it is a directory and invalidate the
> +		 * cone under it.  (But remember to ignore items
> +		 * between "name" and "name/", such as "name-" and
> +		 * "name.".
> +		 */
> +		pos = -pos - 1;
> +
> +		for (i = pos; i < istate->cache_nr; i++) {
> +			if (!starts_with(istate->cache[i]->name, name))
> +				break;
> +			if ((unsigned char)istate->cache[i]->name[len] > '/')
> +				break;
> +			if (istate->cache[i]->name[len] == '/')
> +				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> +		}
> +	}
> +}
> +
>  static void fsmonitor_refresh_callback_slash(
>  	struct index_state *istate, const char *name, int len, int pos)
>  {
> @@ -214,7 +250,7 @@ static void fsmonitor_refresh_callback_slash(
>  
>  static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
>  {
> -	int i, len = strlen(name);
> +	int len = strlen(name);
>  	int pos = index_name_pos(istate, name, len);
>  
>  	trace_printf_key(&trace_fsmonitor,
> @@ -229,34 +265,8 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
>  		 * for the untracked cache.
>  		 */
>  		name[len - 1] = '\0';
> -	} else if (pos >= 0) {
> -		/*
> -		 * We have an exact match for this path and can just
> -		 * invalidate it.
> -		 */
> -		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
>  	} else {
> -		/*
> -		 * The path is not a tracked file -or- it is a
> -		 * directory event on a platform that cannot
> -		 * distinguish between file and directory events in
> -		 * the event handler, such as Windows.
> -		 *
> -		 * Scan as if it is a directory and invalidate the
> -		 * cone under it.  (But remember to ignore items
> -		 * between "name" and "name/", such as "name-" and
> -		 * "name.".
> -		 */
> -		pos = -pos - 1;
> -
> -		for (i = pos; i < istate->cache_nr; i++) {
> -			if (!starts_with(istate->cache[i]->name, name))
> -				break;
> -			if ((unsigned char)istate->cache[i]->name[len] > '/')
> -				break;
> -			if (istate->cache[i]->name[len] == '/')
> -				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> -		}
> +		fsmonitor_refresh_callback_unqualified(istate, name, len, pos);
>  	}
>  
>  	/*
> -- 
> gitgitgadget
> 
> 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 06/12] fsmonitor: clarify handling of directory events in callback
  2024-02-13 20:52 ` [PATCH 06/12] fsmonitor: clarify handling of directory events in callback Jeff Hostetler via GitGitGadget
  2024-02-14  7:47   ` Junio C Hamano
@ 2024-02-15  9:32   ` Patrick Steinhardt
  2024-02-20 19:10     ` Jeff Hostetler
  1 sibling, 1 reply; 91+ messages in thread
From: Patrick Steinhardt @ 2024-02-15  9:32 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler

[-- Attachment #1: Type: text/plain, Size: 3130 bytes --]

On Tue, Feb 13, 2024 at 08:52:15PM +0000, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhostetler@github.com>
> 
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c | 47 +++++++++++++++++++++++++++++++++--------------
>  1 file changed, 33 insertions(+), 14 deletions(-)
> 
> diff --git a/fsmonitor.c b/fsmonitor.c
> index 614270fa5e8..754fe20cfd0 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -219,24 +219,40 @@ static void fsmonitor_refresh_callback_unqualified(
>  	}
>  }
>  
> -static void fsmonitor_refresh_callback_slash(
> +/*
> + * The daemon can decorate directory events, such as a move or rename,
> + * by adding a trailing slash to the observed name.  Use this to
> + * explicitly invalidate the entire cone under that directory.
> + *
> + * The daemon can only reliably do that if the OS FSEvent contains
> + * sufficient information in the event.
> + *
> + * macOS FSEvents have enough information.
> + *
> + * Other platforms may or may not be able to do it (and it might
> + * depend on the type of event (for example, a daemon could lstat() an
> + * observed pathname after a rename, but not after a delete)).
> + *
> + * If we find an exact match in the index for a path with a trailing
> + * slash, it means that we matched a sparse-index directory in a
> + * cone-mode sparse-checkout (since that's the only time we have
> + * directories in the index).  We should never see this in practice
> + * (because sparse directories should not be present and therefore
> + * not generating FS events).  Either way, we can treat them in the
> + * same way and just invalidate the cache-entry and the untracked
> + * cache (and in this case, the forward cache-entry scan won't find
> + * anything and it doesn't hurt to let it run).
> + *
> + * Return the number of cache-entries that we invalidated.  We will
> + * use this later to determine if we need to attempt a second
> + * case-insensitive search.
> + */
> +static int fsmonitor_refresh_callback_slash(
>  	struct index_state *istate, const char *name, int len, int pos)
>  {
>  	int i;
> +	int nr_in_cone = 0;

Should we return `size_t` instead of `int`?

Patrick

> -	/*
> -	 * The daemon can decorate directory events, such as
> -	 * moves or renames, with a trailing slash if the OS
> -	 * FS Event contains sufficient information, such as
> -	 * MacOS.
> -	 *
> -	 * Use this to invalidate the entire cone under that
> -	 * directory.
> -	 *
> -	 * We do not expect an exact match because the index
> -	 * does not normally contain directory entries, so we
> -	 * start at the insertion point and scan.
> -	 */
>  	if (pos < 0)
>  		pos = -pos - 1;
>  
> @@ -245,7 +261,10 @@ static void fsmonitor_refresh_callback_slash(
>  		if (!starts_with(istate->cache[i]->name, name))
>  			break;
>  		istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> +		nr_in_cone++;
>  	}
> +
> +	return nr_in_cone;
>  }
>  
>  static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
> -- 
> gitgitgadget
> 
> 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 07/12] fsmonitor: refactor untracked-cache invalidation
  2024-02-13 20:52 ` [PATCH 07/12] fsmonitor: refactor untracked-cache invalidation Jeff Hostetler via GitGitGadget
  2024-02-14 16:46   ` Junio C Hamano
@ 2024-02-15  9:32   ` Patrick Steinhardt
  1 sibling, 0 replies; 91+ messages in thread
From: Patrick Steinhardt @ 2024-02-15  9:32 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler

[-- Attachment #1: Type: text/plain, Size: 2616 bytes --]

On Tue, Feb 13, 2024 at 08:52:16PM +0000, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhostetler@github.com>
> 
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>

Junio already mentioned that this change does more than a mere
refactoring, which leaves the reader puzzled a bit.

> ---
>  fsmonitor.c | 38 ++++++++++++++++++++++++++------------
>  1 file changed, 26 insertions(+), 12 deletions(-)
> 
> diff --git a/fsmonitor.c b/fsmonitor.c
> index 754fe20cfd0..14585b6c516 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -183,11 +183,35 @@ static int query_fsmonitor_hook(struct repository *r,
>  	return result;
>  }
>  
> +/*
> + * Invalidate the untracked cache for the given pathname.  Copy the
> + * buffer to a proper null-terminated string (since the untracked
> + * cache code does not use (buf, len) style argument).  Also strip any
> + * trailing slash.
> + */
> +static void my_invalidate_untracked_cache(
> +	struct index_state *istate, const char *name, int len)
> +{
> +	struct strbuf work_path = STRBUF_INIT;
> +
> +	if (!len)
> +		return;
> +
> +	if (name[len-1] == '/')
> +		len--;
> +
> +	strbuf_add(&work_path, name, len);
> +	untracked_cache_invalidate_path(istate, work_path.buf, 0);
> +	strbuf_release(&work_path);
> +}
> +
>  static void fsmonitor_refresh_callback_unqualified(
>  	struct index_state *istate, const char *name, int len, int pos)
>  {
>  	int i;
>  
> +	my_invalidate_untracked_cache(istate, name, len);
> +
>  	if (pos >= 0) {
>  		/*
>  		 * We have an exact match for this path and can just
> @@ -253,6 +277,8 @@ static int fsmonitor_refresh_callback_slash(
>  	int i;
>  	int nr_in_cone = 0;
>  
> +	my_invalidate_untracked_cache(istate, name, len);
> +
>  	if (pos < 0)
>  		pos = -pos - 1;
>  
> @@ -278,21 +304,9 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
>  
>  	if (name[len - 1] == '/') {
>  		fsmonitor_refresh_callback_slash(istate, name, len, pos);
> -
> -		/*
> -		 * We need to remove the traling "/" from the path
> -		 * for the untracked cache.
> -		 */
> -		name[len - 1] = '\0';
>  	} else {
>  		fsmonitor_refresh_callback_unqualified(istate, name, len, pos);
>  	}

We can drop the braces here as both branches are now single-line
statements.

Patrick

> -
> -	/*
> -	 * Mark the untracked cache dirty even if it wasn't found in the index
> -	 * as it could be a new untracked file.
> -	 */
> -	untracked_cache_invalidate_path(istate, name, 0);
>  }
>  
>  /*
> -- 
> gitgitgadget
> 
> 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 08/12] fsmonitor: support case-insensitive directory events
  2024-02-13 20:52 ` [PATCH 08/12] fsmonitor: support case-insensitive directory events Jeff Hostetler via GitGitGadget
@ 2024-02-15  9:32   ` Patrick Steinhardt
  0 siblings, 0 replies; 91+ messages in thread
From: Patrick Steinhardt @ 2024-02-15  9:32 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler

[-- Attachment #1: Type: text/plain, Size: 8603 bytes --]

On Tue, Feb 13, 2024 at 08:52:17PM +0000, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhostetler@github.com>
> 
> Teach fsmonitor_refresh_callback() to handle case-insensitive
> lookups if case-sensitive lookups fail on case-insensitive systems.
> This can cause 'git status' to report stale status for files if there
> are case issues/errors in the worktree.
> 
> The FSMonitor daemon sends FSEvents using the observed spelling
> of each pathname.  On case-insensitive file systems this may be
> different than the expected case spelling.
> 
> The existing code uses index_name_pos() to find the cache-entry for
> the pathname in the FSEvent and clear the CE_FSMONITOR_VALID bit so
> that the worktree scan/index refresh will revisit and revalidate the
> path.
> 
> On a case-insensitive file system, the exact match lookup may fail
> to find the associated cache-entry. This causes status to think that
> the cached CE flags are correct and skip over the file.
> 
> Update the handling of directory-style FSEvents (ones containing a
> path with a trailing slash) to optionally use the name-hash if the
> case-correct search does not find a match.
> 
> (The FSMonitor daemon can send directory FSEvents if the OS provides
> that information.)
> 
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 120 insertions(+), 2 deletions(-)
> 
> diff --git a/fsmonitor.c b/fsmonitor.c
> index 14585b6c516..73e6ac82af7 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -5,6 +5,7 @@
>  #include "ewah/ewok.h"
>  #include "fsmonitor.h"
>  #include "fsmonitor-ipc.h"
> +#include "name-hash.h"
>  #include "run-command.h"
>  #include "strbuf.h"
>  #include "trace2.h"
> @@ -183,6 +184,9 @@ static int query_fsmonitor_hook(struct repository *r,
>  	return result;
>  }
>  
> +static int fsmonitor_refresh_callback_slash(
> +	struct index_state *istate, const char *name, int len, int pos);
> +
>  /*
>   * Invalidate the untracked cache for the given pathname.  Copy the
>   * buffer to a proper null-terminated string (since the untracked
> @@ -205,6 +209,84 @@ static void my_invalidate_untracked_cache(
>  	strbuf_release(&work_path);
>  }
>  
> +/*
> + * Use the name-hash to lookup the pathname.
> + *
> + * Returns the number of cache-entries that we invalidated.
> + */

The function not only looks up the path name, but also invalidates the
corresponding cache entry. You imply this with the second sentence, but
this could be a bit more explicit.

> +static int my_callback_name_hash(
> +	struct index_state *istate, const char *name, int len)

I find the naming conventions here to be weird with the `my_` prefix.

> +{
> +	struct cache_entry *ce = NULL;
> +
> +	ce = index_file_exists(istate, name, len, 1);
> +	if (!ce)
> +		return 0;

Okay, `index_file_exists()` is called with `icase == 1` here. But is
that the correct thing to do on case-sensitive platforms? I would have
expected us to honor `core.ignoreCase` here.

Turns out, we only end up calling this function when `ignore_case` is
set, so we already do. I think this can be clarified both by giving the
function a better name and by documenting this in the comment. Also,
neither this nor the next function is really a callback -- they only
happen to be called by a callback function.

I'd think something like `lookup_and_invalidate_path_icase()` and
`lookup_and_invalidate_dir_icase()` could help to clarify intent.
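
A minimal stand-alone sketch of the intent such names would describe:
try the observed spelling first, and fall back to a case-insensitive
match only when case is being ignored. All toy_* names are
hypothetical, the `ignore_case` parameter stands in for the global of
the same name, and linear scans stand in for index_name_pos() and the
name-hash.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a tracked file; not Git's struct cache_entry. */
struct toy_file {
	const char *name;
	int fsmonitor_valid;
};

/* ASCII-only, case-insensitive equality of two NUL-terminated strings. */
static int toy_same_icase(const char *a, const char *b)
{
	for (; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
	return *a == *b;	/* equal only if both ended together */
}

/*
 * Invalidate the entry matching name.  Try the observed spelling
 * first; fall back to a case-insensitive match only when ignore_case
 * is set.  Returns the number of entries invalidated (0 or 1).
 */
static int toy_lookup_and_invalidate(struct toy_file *files, size_t nr,
				     const char *name, int ignore_case)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (!strcmp(files[i].name, name)) {
			files[i].fsmonitor_valid = 0;
			return 1;
		}
	}

	if (!ignore_case)
		return 0;

	for (i = 0; i < nr; i++) {
		if (toy_same_icase(files[i].name, name)) {
			files[i].fsmonitor_valid = 0;
			return 1;
		}
	}
	return 0;
}
```

With a tracked "README.md", an event for "readme.MD" invalidates
nothing when case matters and one entry when it does not.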

> +	/*
> +	 * The index contains a case-insensitive match for the pathname.
> +	 * This could either be a regular file or a sparse-index directory.
> +	 *
> +	 * We should not have seen FSEvents for a sparse-index directory,
> +	 * but we handle it just in case.
> +	 *
> +	 * Either way, we know that there are not any cache-entries for
> +	 * children inside the cone of the directory, so we don't need to
> +	 * do the usual scan.
> +	 */
> +	trace_printf_key(&trace_fsmonitor,
> +			 "fsmonitor_refresh_callback map '%s' '%s'",
> +			 name, ce->name);
> +
> +	my_invalidate_untracked_cache(istate, ce->name, ce->ce_namelen);
> +
> +	ce->ce_flags &= ~CE_FSMONITOR_VALID;
> +	return 1;
> +}
> +
> +/*
> + * Use the directory name-hash to find the correct-case spelling
> + * of the directory.  Use the canonical spelling to invalidate all
> + * of the cache-entries within the matching cone.
> + *
> + * The pathname MUST NOT have a trailing slash.
> + *
> + * Returns the number of cache-entries that we invalidated.
> + */
> +static int my_callback_dir_name_hash(
> +	struct index_state *istate, const char *name, int len)
> +{
> +	struct strbuf canonical_path = STRBUF_INIT;
> +	int pos;
> +	int nr_in_cone;
> +
> +	if (!index_dir_exists2(istate, name, len, &canonical_path))
> +		return 0; /* name is untracked */
> +	if (!memcmp(name, canonical_path.buf, len)) {
> +		strbuf_release(&canonical_path);
> +		return 0; /* should not happen */
> +	}

So in other words, this function should only be called when we know that
casing differs, and thus the passed-in name and the canonical name
should never be the same? If this case shouldn't ever happen, shouldn't
we report this as an error or use `BUG()` instead of silently ignoring
this mismatch of expectations?

Patrick

> +	trace_printf_key(&trace_fsmonitor,
> +			 "fsmonitor_refresh_callback map '%s' '%s'",
> +			 name, canonical_path.buf);
> +
> +	/*
> +	 * The directory name-hash only tells us the corrected
> +	 * spelling of the prefix.  We have to use this canonical
> +	 * path to do a lookup in the cache-entry array so that we
> +	 * we repeat the original search using the case-corrected
> +	 * spelling.
> +	 */
> +	strbuf_addch(&canonical_path, '/');
> +	pos = index_name_pos(istate, canonical_path.buf,
> +			     canonical_path.len);
> +	nr_in_cone = fsmonitor_refresh_callback_slash(
> +		istate, canonical_path.buf, canonical_path.len, pos);
> +	strbuf_release(&canonical_path);
> +	return nr_in_cone;
> +}
> +
>  static void fsmonitor_refresh_callback_unqualified(
>  	struct index_state *istate, const char *name, int len, int pos)
>  {
> @@ -269,7 +351,10 @@ static void fsmonitor_refresh_callback_unqualified(
>   *
>   * Return the number of cache-entries that we invalidated.  We will
>   * use this later to determine if we need to attempt a second
> - * case-insensitive search.
> + * case-insensitive search.  That is, if an observed-case search yields
> + * any results, we assume the prefix is case-correct.  If there are
> + * no matches, we still don't know if the observed path is simply
> + * untracked or case-incorrect.
>   */
>  static int fsmonitor_refresh_callback_slash(
>  	struct index_state *istate, const char *name, int len, int pos)
> @@ -293,17 +378,50 @@ static int fsmonitor_refresh_callback_slash(
>  	return nr_in_cone;
>  }
>  
> +/*
> + * On a case-insensitive FS, use the name-hash and directory name-hash
> + * to map the case of the observed path to the canonical case expected
> + * by the index.
> + *
> + * The given pathname includes the trailing slash.
> + *
> + * Return the number of cache-entries that we invalidated.
> + */
> +static int fsmonitor_refresh_callback_slash_icase(
> +	struct index_state *istate, const char *name, int len)
> +{
> +	int nr_in_cone;
> +
> +	/*
> +	 * Look for a case-incorrect sparse-index directory.
> +	 */
> +	nr_in_cone = my_callback_name_hash(istate, name, len);
> +	if (nr_in_cone)
> +		return nr_in_cone;
> +
> +	/*
> +	 * (len-1) because we do not include the trailing slash in the
> +	 * pathname.
> +	 */
> +	nr_in_cone = my_callback_dir_name_hash(istate, name, len-1);
> +	return nr_in_cone;
> +}
> +
>  static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
>  {
>  	int len = strlen(name);
>  	int pos = index_name_pos(istate, name, len);
> +	int nr_in_cone;
> +
>  
>  	trace_printf_key(&trace_fsmonitor,
>  			 "fsmonitor_refresh_callback '%s' (pos %d)",
>  			 name, pos);
>  
>  	if (name[len - 1] == '/') {
> -		fsmonitor_refresh_callback_slash(istate, name, len, pos);
> +		nr_in_cone = fsmonitor_refresh_callback_slash(istate, name, len, pos);
> +		if (ignore_case && !nr_in_cone)
> +			fsmonitor_refresh_callback_slash_icase(istate, name, len);
>  	} else {
>  		fsmonitor_refresh_callback_unqualified(istate, name, len, pos);
>  	}
> -- 
> gitgitgadget
> 
> 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 11/12] fsmonitor: refactor bit invalidation in refresh callback
  2024-02-13 20:52 ` [PATCH 11/12] fsmonitor: refactor bit invalidation in refresh callback Jeff Hostetler via GitGitGadget
@ 2024-02-15  9:32   ` Patrick Steinhardt
  0 siblings, 0 replies; 91+ messages in thread
From: Patrick Steinhardt @ 2024-02-15  9:32 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler


On Tue, Feb 13, 2024 at 08:52:20PM +0000, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhostetler@github.com>
> 
> Refactor code in the fsmonitor_refresh_callback() call chain dealing
> with invalidating the CE_FSMONITOR_VALID bit and add a trace message.
> 
> During the refresh, we clear the CE_FSMONITOR_VALID bit in response to
> data from the FSMonitor daemon (so that a later phase will lstat() and
> verify the true state of the file).
> 
> Create a new function to clear the bit and add some unique tracing for
> it to help debug edge cases.
> 
> This is similar to the existing `mark_fsmonitor_invalid()` function,
> but we don't need the extra stuff that it does.
> 
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/fsmonitor.c b/fsmonitor.c
> index a7847f07a40..75c7f73f68d 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -209,6 +209,20 @@ static void my_invalidate_untracked_cache(
>  	strbuf_release(&work_path);
>  }
>  
> +/*
> + * Invalidate the FSM bit on this CE.  This is like mark_fsmonitor_invalid()
> + * but we've already handled the untracked-cache and I want a different
> + * trace message.
> + */
> +static void my_invalidate_ce_fsm(struct cache_entry *ce)
> +{
> +	if (ce->ce_flags & CE_FSMONITOR_VALID)
> +		trace_printf_key(&trace_fsmonitor,
> +				 "fsmonitor_refresh_cb_invalidate '%s'",
> +				 ce->name);
> +	ce->ce_flags &= ~CE_FSMONITOR_VALID;
> +}

Same comment here regarding the `my_` prefix.

Patrick

> +
>  /*
>   * Use the name-hash to lookup the pathname.
>   *
> @@ -240,7 +254,7 @@ static int my_callback_name_hash(
>  
>  	my_invalidate_untracked_cache(istate, ce->name, ce->ce_namelen);
>  
> -	ce->ce_flags &= ~CE_FSMONITOR_VALID;
> +	my_invalidate_ce_fsm(ce);
>  	return 1;
>  }
>  
> @@ -312,7 +326,7 @@ static int fsmonitor_refresh_callback_unqualified(
>  		 * cache-entry with the same pathname, nor for a cone
>  		 * at that directory. (That is, assume no D/F conflicts.)
>  		 */
> -		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
> +		my_invalidate_ce_fsm(istate->cache[pos]);
>  		return 1;
>  	} else {
>  		int nr_in_cone;
> @@ -412,7 +426,7 @@ static int fsmonitor_refresh_callback_slash(
>  	for (i = pos; i < istate->cache_nr; i++) {
>  		if (!starts_with(istate->cache[i]->name, name))
>  			break;
> -		istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> +		my_invalidate_ce_fsm(istate->cache[i]);
>  		nr_in_cone++;
>  	}
>  
> -- 
> gitgitgadget
> 
> 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 09/12] fsmonitor: refactor non-directory callback
  2024-02-13 20:52 ` [PATCH 09/12] fsmonitor: refactor non-directory callback Jeff Hostetler via GitGitGadget
@ 2024-02-15  9:32   ` Patrick Steinhardt
  0 siblings, 0 replies; 91+ messages in thread
From: Patrick Steinhardt @ 2024-02-15  9:32 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler


On Tue, Feb 13, 2024 at 08:52:18PM +0000, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhostetler@github.com>
> 
> Refactor the fsmonitor_refresh_callback_unqualified() code
> to try to use the _callback_slash() code and avoid having
> a custom filter in the child cache-entry scanner.
> 
> On platforms that DO NOT annotate FS events with a trailing
> slash, if we fail to find an exact match for the pathname
> in the index, we do not know if the pathname represents a
> directory or simply an untracked file.  Pretend that the pathname
> is a directory and try again before assuming it is an untracked
> file.
> 
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c | 59 +++++++++++++++++++++++++++++++----------------------
>  1 file changed, 35 insertions(+), 24 deletions(-)
> 
> diff --git a/fsmonitor.c b/fsmonitor.c
> index 73e6ac82af7..cb27bae8aa8 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -287,41 +287,52 @@ static int my_callback_dir_name_hash(
>  	return nr_in_cone;
>  }
>  
> -static void fsmonitor_refresh_callback_unqualified(
> +/*
> + * The daemon sent an observed pathname without a trailing slash.
> + * (This is the normal case.)  We do not know if it is a tracked or
> + * untracked file, a sparse-directory, or a populated directory (on a
> + * platform such as Windows where FSEvents are not qualified).
> + *
> + * The pathname contains the observed case reported by the FS. We
> + * do not know if it is case-correct or -incorrect.
> + *
> + * Assume it is case-correct and try an exact match.
> + *
> + * Return the number of cache-entries that we invalidated.
> + */
> +static int fsmonitor_refresh_callback_unqualified(
>  	struct index_state *istate, const char *name, int len, int pos)
>  {
> -	int i;
> -
>  	my_invalidate_untracked_cache(istate, name, len);
>  
>  	if (pos >= 0) {
>  		/*
> -		 * We have an exact match for this path and can just
> -		 * invalidate it.
> +		 * An exact match on a tracked file. We assume that we
> +		 * do not need to scan forward for a sparse-directory
> +		 * cache-entry with the same pathname, nor for a cone
> +		 * at that directory. (That is, assume no D/F conflicts.)
>  		 */
>  		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
> +		return 1;
>  	} else {
> +		int nr_in_cone;
> +		struct strbuf work_path = STRBUF_INIT;
> +
>  		/*
> -		 * The path is not a tracked file -or- it is a
> -		 * directory event on a platform that cannot
> -		 * distinguish between file and directory events in
> -		 * the event handler, such as Windows.
> -		 *
> -		 * Scan as if it is a directory and invalidate the
> -		 * cone under it.  (But remember to ignore items
> -		 * between "name" and "name/", such as "name-" and
> -		 * "name.".
> +		 * The negative "pos" gives us the suggested insertion
> +		 * point for the pathname (without the trailing slash).
> +		 * We need to see if there is a directory with that
> +		 * prefix, but there can be lots of pathnames between
> +		 * "foo" and "foo/" like "foo-" or "foo-bar", so we
> +		 * don't want to do our own scan.
>  		 */
> -		pos = -pos - 1;
> -
> -		for (i = pos; i < istate->cache_nr; i++) {
> -			if (!starts_with(istate->cache[i]->name, name))
> -				break;
> -			if ((unsigned char)istate->cache[i]->name[len] > '/')
> -				break;
> -			if (istate->cache[i]->name[len] == '/')
> -				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> -		}
> +		strbuf_add(&work_path, name, len);
> +		strbuf_addch(&work_path, '/');
> +		pos = index_name_pos(istate, work_path.buf, work_path.len);
> +		nr_in_cone = fsmonitor_refresh_callback_slash(
> +			istate, work_path.buf, work_path.len, pos);
> +		strbuf_release(&work_path);
> +		return nr_in_cone;

I didn't spot any users of this return value, and Junio also mentioned
this in a preceding patch for a different function. Would it make sense
to introduce this return value as-needed in a later patch so that the
reader isn't left wondering why it's changed now?

Patrick
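
As an aside on the comment in the hunk above about pathnames sorting
between "foo" and "foo/": a minimal, self-contained sketch of that
ordering may help. Everything here is a toy stand-in (toy_index,
toy_name_pos() and toy_count_cone() are hypothetical names, not Git
functions); it only assumes the contract described in this thread,
namely byte-ordered names and a negative result that encodes the
suggested insertion point.

```c
#include <stddef.h>
#include <string.h>

/*
 * Toy stand-in for the cache-entry array: names in byte order.
 * '-' (0x2d) and '.' (0x2e) sort before '/' (0x2f), so "foo-bar"
 * and "foo.c" land between "foo" and "foo/".
 */
static const char *toy_index[] = {
	"bar", "foo-bar", "foo.c", "foo/a", "foo/b", "fooz",
};
#define TOY_NR (sizeof(toy_index) / sizeof(toy_index[0]))

/*
 * Hypothetical lookup: the position on an exact match, otherwise
 * (-1 - insertion_point).
 */
int toy_name_pos(const char *name)
{
	size_t lo = 0, hi = TOY_NR;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = strcmp(toy_index[mid], name);

		if (!cmp)
			return (int)mid;
		if (cmp < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1 - (int)lo;
}

/*
 * Count the entries under a slash-terminated directory prefix by
 * searching for "dir/" itself, which skips "dir-" and "dir." for free.
 */
int toy_count_cone(const char *dir_with_slash)
{
	size_t len = strlen(dir_with_slash);
	int pos = toy_name_pos(dir_with_slash);
	size_t i;
	int nr = 0;

	if (pos < 0)
		pos = -pos - 1; /* decode the insertion point */
	for (i = (size_t)pos; i < TOY_NR; i++) {
		if (strncmp(toy_index[i], dir_with_slash, len))
			break;
		nr++;
	}
	return nr;
}
```

Searching for "foo" gives an insertion point in front of "foo-bar", so
a scan from there would need a filter; searching for "foo/" lands
directly on the first entry of the cone.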

>  	}
>  }
>  
> -- 
> gitgitgadget
> 
> 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 01/12] sparse-index: pass string length to index_file_exists()
  2024-02-13 22:07   ` Junio C Hamano
@ 2024-02-20 17:34     ` Jeff Hostetler
  0 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler @ 2024-02-20 17:34 UTC (permalink / raw
  To: Junio C Hamano, Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler



On 2/13/24 5:07 PM, Junio C Hamano wrote:
> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Jeff Hostetler <jeffhostetler@github.com>
>>
>> The call to index_file_exists() in the loop in expand_to_path() passes
>> the wrong string length.  Let's fix that.
>>
>> The loop in expand_to_path() searches the name-hash for each
>> sub-directory prefix in the provided pathname. That is, by searching
>> for "dir1/" then "dir1/dir2/" then "dir1/dir2/dir3/" and so on until
>> it finds a cache-entry representing a sparse directory.
>>
>> The code creates "strbuf path_mutable" to contain the working pathname
>> and modifies the buffer in-place by temporarily replacing the character
>> following each successive "/" with NUL for the duration of the call to
>> index_file_exists().
>>
>> It does not update the strbuf.len during this substitution.
>>
>> Pass the patched length of the prefix path instead.
>>
>> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
>> ---
> 
> This looked familiar, and it turns out that
> 
> https://lore.kernel.org/git/pull.1649.git.1706897095273.gitgitgadget@gmail.com/
> 
> has already been merged to 'master'.
> 
> 

Great, thanks!  I'll drop from the next version.
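
For readers following along, the idea in the quoted commit message
(look up each directory prefix and pass the length of that prefix, not
of the whole buffer) can be sketched as below. This is only an
illustration: dir_prefix_lengths() is a hypothetical helper, and it
reports (pointer, length) pairs rather than temporarily writing NUL
bytes into the buffer as the quoted description says the real code does.

```c
#include <stddef.h>

/*
 * Hypothetical helper: record the length of every directory prefix of
 * "path", including its trailing '/', so that a lookup can be handed
 * the prefix length rather than the length of the whole buffer.
 * Returns the number of lengths written to "out".
 */
size_t dir_prefix_lengths(const char *path, size_t len,
			  size_t *out, size_t out_cap)
{
	size_t i, nr = 0;

	for (i = 0; i < len && nr < out_cap; i++)
		if (path[i] == '/')
			out[nr++] = i + 1;
	return nr;
}
```

For "dir1/dir2/dir3/file" this yields the lengths of "dir1/",
"dir1/dir2/" and "dir1/dir2/dir3/", in that order.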


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 02/12] name-hash: add index_dir_exists2()
  2024-02-13 21:43   ` Junio C Hamano
@ 2024-02-20 17:38     ` Jeff Hostetler
  2024-02-20 19:34       ` Junio C Hamano
  0 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler @ 2024-02-20 17:38 UTC (permalink / raw
  To: Junio C Hamano, Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler



On 2/13/24 4:43 PM, Junio C Hamano wrote:
> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Jeff Hostetler <jeffhostetler@github.com>
>>
>> Create a new version of index_dir_exists() to return the canonical
>> spelling of the matched directory prefix.
>>
>> The existing index_dir_exists() returns a boolean to indicate if
>> there is a case-insensitive match in the directory name-hash, but
>> it doesn't tell the caller the exact spelling of that match.
>>
>> The new version also copies the matched spelling to a provided strbuf.
>> This lets the caller, for example, then call index_name_pos() with the
>> correct case to search the cache-entry array for the real insertion
>> position.
>>
>> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
>> ---
>>   name-hash.c | 16 ++++++++++++++++
>>   name-hash.h |  2 ++
>>   2 files changed, 18 insertions(+)
>>
>> diff --git a/name-hash.c b/name-hash.c
>> index 251f036eef6..d735c81acb3 100644
>> --- a/name-hash.c
>> +++ b/name-hash.c
>> @@ -694,6 +694,22 @@ int index_dir_exists(struct index_state *istate, const char *name, int namelen)
>>   	dir = find_dir_entry(istate, name, namelen);
>>   	return dir && dir->nr;
>>   }
>> +int index_dir_exists2(struct index_state *istate, const char *name, int namelen,
>> +		      struct strbuf *canonical_path)
>> +{
>> +	struct dir_entry *dir;
>> +
>> +	strbuf_init(canonical_path, namelen+1);
>> +
>> +	lazy_init_name_hash(istate);
>> +	expand_to_path(istate, name, namelen, 0);
>> +	dir = find_dir_entry(istate, name, namelen);
>> +
>> +	if (dir && dir->nr)
>> +		strbuf_add(canonical_path, dir->name, dir->namelen);
>> +
>> +	return dir && dir->nr;
>> +}
>>   
>>   void adjust_dirname_case(struct index_state *istate, char *name)
> 
> Missing inter-function blank line, before the new function.
> 
> I wonder if we can avoid such repetition---the body of
> index_dir_exists() is 100% shared with this new function.
> 
> Isn't it extremely unusual to receive "struct strbuf *" and call
> strbuf_init() on it?  It means that the caller is expected to have a
> strbuf and pass a pointer to it, but also it is expected to leave
> the strbuf uninitialized.
> 
> I'd understand if it calls strbuf_reset(), but it may not even be
> necessary, if we make it responsibility of the caller to pass a
> valid strbuf to be appended into.
> 
> 	int index_dir_find(struct index_state *istate,
> 			   const char *name, int namelen,
> 			   struct strbuf *canonical_path)
> 	{
>                  struct dir_entry *dir;
> 
>                  lazy_init_name_hash(istate);
>                  expand_to_path(istate, name, namelen, 0);
>                  dir = find_dir_entry(istate, name, namelen);
> 
>                  if (canonical_path && dir && dir->nr) {
> 			// strbuf_reset(canonical_path); ???
>                  	strbuf_add(canonical_path, dir->name, dir->namelen);
> 		}
>                  return dir && dir->nr;
> 	}
> 
> Then we can do
> 
> 	#define index_dir_exists(i, n, l) index_dir_find((i), (n), (l), NULL)
> 
> in the header for existing callers.
> 

I'm always a little hesitant to change the signature of an existing
function and chase all of the callers in the middle of another
task.  It can sometimes be distracting to reviewers.

I like your macro approach here. I'll do that in the next version.

Thanks
Jeff
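
To make the shape of that approach concrete, here is a small
standalone sketch of "one lookup with an optional out-parameter, plus
a wrapper macro for boolean callers". It is not the name-hash code:
toy_dirs, toy_dir_find() and toy_dir_exists() are hypothetical, the
table is a plain array rather than a hashmap, and a fixed char buffer
stands in for the strbuf.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Toy table standing in for the directory name-hash. */
static const char *toy_dirs[] = { "Documentation", "builtin", "t/helper" };
#define TOY_DIRS_NR (sizeof(toy_dirs) / sizeof(toy_dirs[0]))

/* Compare "len" bytes ignoring ASCII case. */
static int same_icase(const char *a, const char *b, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return 0;
	return 1;
}

/*
 * Hypothetical lookup: returns 1 on a case-insensitive match.  If
 * "canonical" is non-NULL, the stored spelling is copied into it
 * (truncated to cap - 1 bytes and NUL-terminated).
 */
int toy_dir_find(const char *name, size_t len, char *canonical, size_t cap)
{
	size_t i;

	for (i = 0; i < TOY_DIRS_NR; i++) {
		if (strlen(toy_dirs[i]) != len ||
		    !same_icase(toy_dirs[i], name, len))
			continue;
		if (canonical && cap) {
			size_t n = len < cap - 1 ? len : cap - 1;

			memcpy(canonical, toy_dirs[i], n);
			canonical[n] = '\0';
		}
		return 1;
	}
	return 0;
}

/* Callers that only want a yes/no answer keep their old call shape. */
#define toy_dir_exists(n, l) toy_dir_find((n), (l), NULL, 0)
```

The point of the pattern is that the boolean and the "tell me the
spelling" variants share one body, so they cannot drift apart.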

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 04/12] fsmonitor: refactor refresh callback on directory events
  2024-02-15  9:32   ` Patrick Steinhardt
@ 2024-02-20 18:54     ` Jeff Hostetler
  2024-02-21 12:54       ` Patrick Steinhardt
  0 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler @ 2024-02-20 18:54 UTC (permalink / raw
  To: Patrick Steinhardt, Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler



On 2/15/24 4:32 AM, Patrick Steinhardt wrote:
> On Tue, Feb 13, 2024 at 08:52:13PM +0000, Jeff Hostetler via GitGitGadget wrote:
>> From: Jeff Hostetler <jeffhostetler@github.com>
>>
>> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
>> ---
>>   fsmonitor.c | 52 ++++++++++++++++++++++++++++++----------------------
>>   1 file changed, 30 insertions(+), 22 deletions(-)
>>
>> diff --git a/fsmonitor.c b/fsmonitor.c
>> index f670c509378..b1ef01bf3cd 100644
>> --- a/fsmonitor.c
>> +++ b/fsmonitor.c
>> @@ -183,6 +183,35 @@ static int query_fsmonitor_hook(struct repository *r,
>>   	return result;
>>   }
>>   
>> +static void fsmonitor_refresh_callback_slash(
>> +	struct index_state *istate, const char *name, int len, int pos)
> 
> `len` should be `size_t` as it tracks the length of the name. This is
> a preexisting issue already because `fsmonitor_refresh_callback()`
> assigns `int len = strlen(name)`, which is wrong.
> 
>> +{
>> +	int i;
> 
> `i` is used to iterate through `istate->cache_nr`, which is an `unsigned
> int` and not an `int`. I really wish we would compile the Git code base
> with `-Wconversion`, but that's a rather big undertaking.
> 
> Anyway, none of these issues are new as you merely move the code into a
> new function.
> 
> Patrick
> 

Yeah, the int types are a bit of a mess, but I'd like to defer that to
another series.

There are several things going on here as you point out.

(1) 'int len' for the length of a pathname buffer.  This could be fixed,
but Git is unlikely to work correctly with a >2GB pathname anyway, so
this is not urgent.

(2) 'int i' is signed, but should be unsigned -- but elsewhere in this
function we use 'int pos', which is an index on the same array and has
double duty as the suggested insertion point when negative.  So we can
fix the type of 'i', but that just sets us up to hide errors with 'pos',
since they'd have different types.  Also, since it is really unlikely
for Git to work with >2G cache-entries, I think this one can wait too.

I don't mean to be confrontational, but I think these changes should
wait until another series that is focused on just those types of issues.

Jeff
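
A tiny sketch of the double duty of 'pos' mentioned in (2), and of one
way a signed/unsigned mix could hide a mistake. Both helpers are
hypothetical and exist only for illustration; the encoding assumed is
the one described in this thread (a negative value carries the
suggested insertion point).

```c
#include <stddef.h>

/*
 * Hypothetical helper: decode a lookup result that is either a match
 * position (>= 0) or a negative value encoding the insertion point
 * as (-1 - insertion_point).
 */
size_t toy_scan_start(int pos)
{
	return pos >= 0 ? (size_t)pos : (size_t)(-(pos + 1));
}

/*
 * What a mixed comparison effectively does: the signed value is
 * converted to unsigned first, so a still-encoded negative "pos"
 * turns into a very large number instead of failing loudly.
 */
int toy_mixed_less_than(unsigned int i, int pos)
{
	return i < (unsigned int)pos;
}
```

With both variables signed, forgetting to decode 'pos' tends to show
up quickly as a loop that never runs; with an unsigned 'i' the same
slip compares against a huge bound.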


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 06/12] fsmonitor: clarify handling of directory events in callback
  2024-02-14  7:47   ` Junio C Hamano
@ 2024-02-20 18:56     ` Jeff Hostetler
  2024-02-20 19:24       ` Junio C Hamano
  0 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler @ 2024-02-20 18:56 UTC (permalink / raw
  To: Junio C Hamano, Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler



On 2/14/24 2:47 AM, Junio C Hamano wrote:
> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Jeff Hostetler <jeffhostetler@github.com>
>>
>> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
>> ---
>>   fsmonitor.c | 47 +++++++++++++++++++++++++++++++++--------------
>>   1 file changed, 33 insertions(+), 14 deletions(-)
>>
>> diff --git a/fsmonitor.c b/fsmonitor.c
>> index 614270fa5e8..754fe20cfd0 100644
>> --- a/fsmonitor.c
>> +++ b/fsmonitor.c
>> @@ -219,24 +219,40 @@ static void fsmonitor_refresh_callback_unqualified(
>>   	}
>>   }
>>   
>> -static void fsmonitor_refresh_callback_slash(
...
>> +static int fsmonitor_refresh_callback_slash(
>>   	struct index_state *istate, const char *name, int len, int pos)
>>   {
> 
> This was split out a few patches ago, and the caller of course
> ignored the return value (void), but now it returns an integer, and
> this change is without a corresponding update to the caller, which
> leaves readers puzzled.
> 
> Perhaps a future patch either updates the existing caller or adds a
> new caller that utilize the returned value, but then at least the
> proposed commit message for this step should hint how it helps the
> caller(s) we will see in the future steps if this function returns
> the number of entries invalidated, iow, how the caller is expected
> to use the returned value from here, no?
> 
> Alternatively, this step can limit itself to what the commit title
> claims to do---to clarify what the helper does with enhanced in-code
> comments.  Then a future step that updates the caller to care about
> the return value can have both the changes to this callee as well as
> the caller---which may make it easier to see how the returned info
> helps the caller.  I dunno which is more reasonable.
> 

I'll split this into 2 commits.  One for the refactor and one for
the new return value.

Jeff


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 06/12] fsmonitor: clarify handling of directory events in callback
  2024-02-15  9:32   ` Patrick Steinhardt
@ 2024-02-20 19:10     ` Jeff Hostetler
  0 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler @ 2024-02-20 19:10 UTC (permalink / raw
  To: Patrick Steinhardt, Jeff Hostetler via GitGitGadget; +Cc: git, Jeff Hostetler



On 2/15/24 4:32 AM, Patrick Steinhardt wrote:
> On Tue, Feb 13, 2024 at 08:52:15PM +0000, Jeff Hostetler via GitGitGadget wrote:
>> From: Jeff Hostetler <jeffhostetler@github.com>
>>
>> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
>> ---
>>   fsmonitor.c | 47 +++++++++++++++++++++++++++++++++--------------
>>   1 file changed, 33 insertions(+), 14 deletions(-)
>>
>> diff --git a/fsmonitor.c b/fsmonitor.c
>> index 614270fa5e8..754fe20cfd0 100644
>> --- a/fsmonitor.c
>> +++ b/fsmonitor.c
>> @@ -219,24 +219,40 @@ static void fsmonitor_refresh_callback_unqualified(
...
>> +static int fsmonitor_refresh_callback_slash(
>>   	struct index_state *istate, const char *name, int len, int pos)
>>   {
>>   	int i;
>> +	int nr_in_cone = 0;
> 
> Should we return `size_t` instead of `int`?
> 
> Patrick

yeah, I can fix all of the return values to be 'size_t' since
that is new functionality and not colliding with the existing
usages for 'i' and 'pos' that I mentioned in a response on a
previous thread.

Thanks
Jeff


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 06/12] fsmonitor: clarify handling of directory events in callback
  2024-02-20 18:56     ` Jeff Hostetler
@ 2024-02-20 19:24       ` Junio C Hamano
  0 siblings, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2024-02-20 19:24 UTC (permalink / raw
  To: Jeff Hostetler; +Cc: Jeff Hostetler via GitGitGadget, git, Jeff Hostetler

Jeff Hostetler <git@jeffhostetler.com> writes:

> I'll split this into 2 commits.  One for the refactor and one for
> the new return value.

And the latter one that makes the return value richer contains the
caller that makes use of the returned value?  That's great.  It
would make it very much easier to read the resulting commit, as the
presence of the callers and how they use the returned value would
make it self evident why it makes sense to return the number of
entries invalidated.

Thanks.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 02/12] name-hash: add index_dir_exists2()
  2024-02-20 17:38     ` Jeff Hostetler
@ 2024-02-20 19:34       ` Junio C Hamano
  0 siblings, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2024-02-20 19:34 UTC (permalink / raw
  To: Jeff Hostetler; +Cc: Jeff Hostetler via GitGitGadget, git, Jeff Hostetler

Jeff Hostetler <git@jeffhostetler.com> writes:

> I'm always a little hesitant to change the signature of an existing
> function and chase all of the callers in the middle of another
> task.  It can sometimes be distracting to reviewers.

Of course we all should be hesitant.  In addition to reviewers,
there are topics in flight and topics people are cooking but not
posted that will be affected.  So it is perfectly fine to introduce
an enhanced version as needed under different name (but let's not
give it a meaningless name like "foo2" where it is totally unclear
and unexplained what its difference from "foo" is from the name),
but if it is meant as an enhanced version, we should aim to share
the code and rewrite the original in terms of the enhanced one,
instead of simply duplicating to risk unnecessary divergence of the
two functions in the future.

Thanks.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 04/12] fsmonitor: refactor refresh callback on directory events
  2024-02-20 18:54     ` Jeff Hostetler
@ 2024-02-21 12:54       ` Patrick Steinhardt
  0 siblings, 0 replies; 91+ messages in thread
From: Patrick Steinhardt @ 2024-02-21 12:54 UTC (permalink / raw
  To: Jeff Hostetler; +Cc: Jeff Hostetler via GitGitGadget, git, Jeff Hostetler


On Tue, Feb 20, 2024 at 01:54:44PM -0500, Jeff Hostetler wrote:
> 
> 
> On 2/15/24 4:32 AM, Patrick Steinhardt wrote:
> > On Tue, Feb 13, 2024 at 08:52:13PM +0000, Jeff Hostetler via GitGitGadget wrote:
> > > From: Jeff Hostetler <jeffhostetler@github.com>
> > > 
> > > Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> > > ---
> > >   fsmonitor.c | 52 ++++++++++++++++++++++++++++++----------------------
> > >   1 file changed, 30 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/fsmonitor.c b/fsmonitor.c
> > > index f670c509378..b1ef01bf3cd 100644
> > > --- a/fsmonitor.c
> > > +++ b/fsmonitor.c
> > > @@ -183,6 +183,35 @@ static int query_fsmonitor_hook(struct repository *r,
> > >   	return result;
> > >   }
> > > +static void fsmonitor_refresh_callback_slash(
> > > +	struct index_state *istate, const char *name, int len, int pos)
> > 
> > `len` should be `size_t` as it tracks the length of the name. This is
> > a preexisting issue already because `fsmonitor_refresh_callback()`
> > assigns `int len = strlen(name)`, which is wrong.
> > 
> > > +{
> > > +	int i;
> > 
> > `i` is used to iterate through `istate->cache_nr`, which is an `unsigned
> > int` and not an `int`. I really wish we would compile the Git code base
> > with `-Wconversion`, but that's a rather big undertaking.
> > 
> > Anyway, none of these issues are new as you merely move the code into a
> > new function.
> > 
> > Patrick
> > 
> 
> Yeah, the int types are a bit of a mess, but I'd like to defer that to
> another series.
> 
> There are several things going on here as you point out.
> 
> (1) 'int len' for the length of a pathname buffer.  This could be fixed,
> but Git is unlikely to work correctly with a >2GB pathname anyway, so
> this is not urgent.
> 
> (2) 'int i' is signed, but should be unsigned -- but elsewhere in this
> function we use 'int pos', which is an index on the same array and has
> double duty as the suggested insertion point when negative.  So we can
> fix the type of 'i', but that just sets us up to hide errors with 'pos',
> since they'd have different types.  Also, since it is really unlikely
> for Git to work with >2G cache-entries, I think this one can wait too.
> 
> I don't mean to be confrontational, but I think these changes should
> wait until another series that is focused on just those types of issues.

No worries, this doesn't come across as confrontational at all. As I
said myself, the problems aren't really new to your patch series but are
preexisting and run deeper than what you can see in the diffs here.
So I'm perfectly fine with deferring this.

Patrick


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems
  2024-02-13 20:52 [PATCH 00/12] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                   ` (11 preceding siblings ...)
  2024-02-13 20:52 ` [PATCH 12/12] t7527: update case-insenstive fsmonitor test Jeff Hostetler via GitGitGadget
@ 2024-02-23  3:18 ` Jeff Hostetler via GitGitGadget
  2024-02-23  3:18   ` [PATCH v2 01/16] name-hash: add index_dir_find() Jeff Hostetler via GitGitGadget
                     ` (16 more replies)
  12 siblings, 17 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-23  3:18 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler

Here is version 2. I think I have addressed all of the comments on the first
version and have greatly consolidated/streamlined the "icase" code.

Fix FSMonitor client code to detect case-incorrect FSEvents
and map them to the canonical case expected by the index.

FSEvents are delivered to the FSMonitor daemon using the observed case which
may or may not match the expected case stored in the index for tracked files
and/or directories. This caused index_name_pos() to report a negative index
position (defined as the suggested insertion point). Since the value was
negative, the FSMonitor refresh lookup would not invalidate the
CE_FSMONITOR_VALID bit on the "expected" (case-insensitive-equivalent)
cache-entries. Therefore, git status would not report them as modified.

This was a fairly obscure problem and only happened when the case of a
sub-directory or a file was artificially changed.

This first runs the original lookup using the observed case. If that fails,
it assumes that the observed pathname refers to a file and uses the
case-insensitive name-hash hashmap to find an equivalent path (cache-entry)
in the index. If that fails, it assumes the pathname refers to a directory
and uses the case-insensitive dir-name-hash to find the equivalent directory
and then repeats the index_name_pos() lookup to find a directory or
suggested insertion point with the expected case.
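
The lookup order described above can be sketched in a few lines of
standalone C. This is a toy model only: demo_index, invalidate_file(),
invalidate_cone() and demo_refresh() are hypothetical names, linear
scans stand in for index_name_pos() and the two name-hash tables, and
the observed path is assumed to arrive without a trailing slash.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct demo_entry {
	const char *name;
	int fsmonitor_valid; /* stand-in for CE_FSMONITOR_VALID */
};

/* Toy index, sorted by name; every entry starts out "valid". */
static struct demo_entry demo_index[] = {
	{ "Dir1/a.txt", 1 },
	{ "Dir1/b.txt", 1 },
	{ "README", 1 },
};
#define DEMO_NR (sizeof(demo_index) / sizeof(demo_index[0]))

/* Compare "len" bytes, optionally ignoring ASCII case. */
static int same_bytes(const char *a, const char *b, size_t len, int icase)
{
	size_t i;

	if (!icase)
		return !memcmp(a, b, len);
	for (i = 0; i < len; i++)
		if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return 0;
	return 1;
}

/* Clear the bit on the entry whose full name matches; return the count. */
static size_t invalidate_file(const char *name, size_t len, int icase)
{
	size_t i, nr = 0;

	for (i = 0; i < DEMO_NR; i++) {
		const char *e = demo_index[i].name;

		if (strlen(e) == len && same_bytes(e, name, len, icase)) {
			demo_index[i].fsmonitor_valid = 0;
			nr++;
		}
	}
	return nr;
}

/* Clear the bit on every entry under "dir/"; return the count. */
static size_t invalidate_cone(const char *dir, size_t len, int icase)
{
	size_t i, nr = 0;

	for (i = 0; i < DEMO_NR; i++) {
		const char *e = demo_index[i].name;

		if (strlen(e) > len && e[len] == '/' &&
		    same_bytes(e, dir, len, icase)) {
			demo_index[i].fsmonitor_valid = 0;
			nr++;
		}
	}
	return nr;
}

/* Observed case first; case-insensitive file, then directory, as fallbacks. */
size_t demo_refresh(const char *observed, int ignore_case)
{
	size_t len = strlen(observed);
	size_t nr = invalidate_file(observed, len, 0);

	if (!nr)
		nr = invalidate_cone(observed, len, 0);
	if (nr || !ignore_case)
		return nr;

	nr = invalidate_file(observed, len, 1);
	if (nr)
		return nr;
	return invalidate_cone(observed, len, 1);
}
```

The returned count is what lets the caller decide whether the
case-insensitive fallbacks are needed at all.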

Two new test cases were added to t7527 to demonstrate this.

Since this was rather obscure, I also added some additional tracing under
the GIT_TRACE_FSMONITOR key.

I also did considerable refactoring of the original code before adding the
new lookups.

Finally, I made more explicit the relationship between the FSEvents and the
(new) sparse-index directory cache-entries, since sparse-index was added
slightly after the FSMonitor feature.

Jeff Hostetler (16):
  name-hash: add index_dir_find()
  t7527: add case-insensitve test for FSMonitor
  t7527: temporarily disable case-insensitive tests
  fsmonitor: refactor refresh callback on directory events
  fsmonitor: clarify handling of directory events in callback helper
  fsmonitor: refactor refresh callback for non-directory events
  dir: create untracked_cache_invalidate_trimmed_path()
  fsmonitor: refactor untracked-cache invalidation
  fsmonitor: move untracked invalidation into helper functions
  fsmonitor: return invalidated cache-entry count on directory event
  fsmonitor: remove custom loop from non-directory path handler
  fsmonitor: return invalided cache-entry count on non-directory event
  fsmonitor: trace the new invalidated cache-entry count
  fsmonitor: support case-insensitive events
  fsmonitor: refactor bit invalidation in refresh callback
  t7527: update case-insenstive fsmonitor test

 dir.c                        |  20 +++
 dir.h                        |   7 +
 fsmonitor.c                  | 299 ++++++++++++++++++++++++++++-------
 name-hash.c                  |   9 +-
 name-hash.h                  |   7 +-
 t/t7527-builtin-fsmonitor.sh | 220 ++++++++++++++++++++++++++
 6 files changed, 506 insertions(+), 56 deletions(-)


base-commit: f41f85c9ec8d4d46de0fd5fded88db94d3ec8c11
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1662%2Fjeffhostetler%2Ffsmonitor-ignore-case-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1662/jeffhostetler/fsmonitor-ignore-case-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1662

Range-diff vs v1:

  1:  6f81e2e3060 <  -:  ----------- sparse-index: pass string length to index_file_exists()
  2:  3464545fe3f !  1:  03b07d9c25e name-hash: add index_dir_exists2()
     @@ Metadata
      Author: Jeff Hostetler <jeffhostetler@github.com>
      
       ## Commit message ##
     -    name-hash: add index_dir_exists2()
     +    name-hash: add index_dir_find()
      
     -    Create a new version of index_dir_exists() to return the canonical
     +    Replace the index_dir_exists() function with index_dir_find() and
     +    change the API to take an optional strbuf to return the canonical
          spelling of the matched directory prefix.
      
     +    Create an index_dir_exists() wrapper macro for existing callers.
     +
          The existing index_dir_exists() returns a boolean to indicate if
          there is a case-insensitive match in the directory name-hash, but
          it doesn't tell the caller the exact spelling of that match.
     @@ Commit message
          Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
      
       ## name-hash.c ##
     -@@ name-hash.c: int index_dir_exists(struct index_state *istate, const char *name, int namelen)
     - 	dir = find_dir_entry(istate, name, namelen);
     - 	return dir && dir->nr;
     +@@ name-hash.c: static int same_name(const struct cache_entry *ce, const char *name, int namelen
     + 	return slow_same_name(name, namelen, ce->name, len);
       }
     -+int index_dir_exists2(struct index_state *istate, const char *name, int namelen,
     -+		      struct strbuf *canonical_path)
     -+{
     -+	struct dir_entry *dir;
     -+
     -+	strbuf_init(canonical_path, namelen+1);
     -+
     -+	lazy_init_name_hash(istate);
     -+	expand_to_path(istate, name, namelen, 0);
     -+	dir = find_dir_entry(istate, name, namelen);
     + 
     +-int index_dir_exists(struct index_state *istate, const char *name, int namelen)
     ++int index_dir_find(struct index_state *istate, const char *name, int namelen,
     ++		   struct strbuf *canonical_path)
     + {
     + 	struct dir_entry *dir;
     + 
     + 	lazy_init_name_hash(istate);
     + 	expand_to_path(istate, name, namelen, 0);
     + 	dir = find_dir_entry(istate, name, namelen);
      +
     -+	if (dir && dir->nr)
     ++	if (canonical_path && dir && dir->nr) {
     ++		strbuf_reset(canonical_path);
      +		strbuf_add(canonical_path, dir->name, dir->namelen);
     ++	}
      +
     -+	return dir && dir->nr;
     -+}
     + 	return dir && dir->nr;
     + }
       
     - void adjust_dirname_case(struct index_state *istate, char *name)
     - {
      
       ## name-hash.h ##
     -@@ name-hash.h: struct cache_entry;
     +@@
     + struct cache_entry;
       struct index_state;
       
     - int index_dir_exists(struct index_state *istate, const char *name, int namelen);
     -+int index_dir_exists2(struct index_state *istate, const char *name, int namelen,
     -+		      struct strbuf *canonical_path);
     +-int index_dir_exists(struct index_state *istate, const char *name, int namelen);
     ++
     ++int index_dir_find(struct index_state *istate, const char *name, int namelen,
     ++		   struct strbuf *canonical_path);
     ++
     ++#define index_dir_exists(i, n, l) index_dir_find((i), (n), (l), NULL)
     ++
       void adjust_dirname_case(struct index_state *istate, char *name);
       struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);
       
  3:  272d7805f47 =  2:  7778cee1c10 t7527: add case-insensitive test for FSMonitor
  -:  ----------- >  3:  dad079ade7f t7527: temporarily disable case-insensitive tests
  4:  3fb8e0d0a7c !  4:  5516670e30e fsmonitor: refactor refresh callback on directory events
     @@ Metadata
       ## Commit message ##
          fsmonitor: refactor refresh callback on directory events
      
     +    Move the code to handle directory FSEvents (containing pathnames with
     +    a trailing slash) into a helper function.
     +
          Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
      
       ## fsmonitor.c ##
     @@ fsmonitor.c: static int query_fsmonitor_hook(struct repository *r,
       	return result;
       }
       
     -+static void fsmonitor_refresh_callback_slash(
     -+	struct index_state *istate, const char *name, int len, int pos)
     ++static void handle_path_with_trailing_slash(
     ++	struct index_state *istate, const char *name, int pos)
      +{
      +	int i;
      +
     @@ fsmonitor.c: static void fsmonitor_refresh_callback(struct index_state *istate,
      -				break;
      -			istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
      -		}
     -+		fsmonitor_refresh_callback_slash(istate, name, len, pos);
     ++		handle_path_with_trailing_slash(istate, name, pos);
       
       		/*
       		 * We need to remove the traling "/" from the path
  6:  5b6f8bd1fe7 !  5:  c04fd4eae94 fsmonitor: clarify handling of directory events in callback
     @@ Metadata
      Author: Jeff Hostetler <jeffhostetler@github.com>
      
       ## Commit message ##
     -    fsmonitor: clarify handling of directory events in callback
     +    fsmonitor: clarify handling of directory events in callback helper
     +
     +    Improve documentation of the refresh callback helper function
     +    used for directory FSEvents.
      
          Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
      
       ## fsmonitor.c ##
     -@@ fsmonitor.c: static void fsmonitor_refresh_callback_unqualified(
     - 	}
     +@@ fsmonitor.c: static int query_fsmonitor_hook(struct repository *r,
     + 	return result;
       }
       
     --static void fsmonitor_refresh_callback_slash(
      +/*
      + * The daemon can decorate directory events, such as a move or rename,
      + * by adding a trailing slash to the observed name.  Use this to
     @@ fsmonitor.c: static void fsmonitor_refresh_callback_unqualified(
      + * same way and just invalidate the cache-entry and the untracked
      + * cache (and in this case, the forward cache-entry scan won't find
      + * anything and it doesn't hurt to let it run).
     -+ *
     -+ * Return the number of cache-entries that we invalidated.  We will
     -+ * use this later to determine if we need to attempt a second
     -+ * case-insensitive search.
      + */
     -+static int fsmonitor_refresh_callback_slash(
     - 	struct index_state *istate, const char *name, int len, int pos)
     + static void handle_path_with_trailing_slash(
     + 	struct index_state *istate, const char *name, int pos)
       {
       	int i;
     -+	int nr_in_cone = 0;
       
      -	/*
      -	 * The daemon can decorate directory events, such as
     @@ fsmonitor.c: static void fsmonitor_refresh_callback_unqualified(
       	if (pos < 0)
       		pos = -pos - 1;
       
     -@@ fsmonitor.c: static void fsmonitor_refresh_callback_slash(
     - 		if (!starts_with(istate->cache[i]->name, name))
     - 			break;
     - 		istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
     -+		nr_in_cone++;
     - 	}
     -+
     -+	return nr_in_cone;
     - }
     - 
     - static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
  5:  0896d4af907 !  6:  7ee6ca1aefd fsmonitor: refactor refresh callback for non-directory events
     @@ Metadata
       ## Commit message ##
          fsmonitor: refactor refresh callback for non-directory events
      
     +    Move the code to handle unqualified FSEvents (without a trailing slash)
     +    into a helper function.
     +
          Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
      
       ## fsmonitor.c ##
     @@ fsmonitor.c: static int query_fsmonitor_hook(struct repository *r,
       	return result;
       }
       
     -+static void fsmonitor_refresh_callback_unqualified(
     -+	struct index_state *istate, const char *name, int len, int pos)
     ++static void handle_path_without_trailing_slash(
     ++	struct index_state *istate, const char *name, int pos)
      +{
      +	int i;
      +
     @@ fsmonitor.c: static int query_fsmonitor_hook(struct repository *r,
      +		 * between "name" and "name/", such as "name-" and
      +		 * "name.".
      +		 */
     ++		int len = strlen(name);
      +		pos = -pos - 1;
      +
      +		for (i = pos; i < istate->cache_nr; i++) {
     @@ fsmonitor.c: static int query_fsmonitor_hook(struct repository *r,
      +	}
      +}
      +
     - static void fsmonitor_refresh_callback_slash(
     - 	struct index_state *istate, const char *name, int len, int pos)
     - {
     -@@ fsmonitor.c: static void fsmonitor_refresh_callback_slash(
     + /*
     +  * The daemon can decorate directory events, such as a move or rename,
     +  * by adding a trailing slash to the observed name.  Use this to
     +@@ fsmonitor.c: static void handle_path_with_trailing_slash(
       
       static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
       {
     @@ fsmonitor.c: static void fsmonitor_refresh_callback(struct index_state *istate,
      -			if (istate->cache[i]->name[len] == '/')
      -				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
      -		}
     -+		fsmonitor_refresh_callback_unqualified(istate, name, len, pos);
     ++		handle_path_without_trailing_slash(istate, name, pos);
       	}
       
       	/*
  -:  ----------- >  7:  99c0d3e0742 dir: create untracked_cache_invalidate_trimmed_path()
  -:  ----------- >  8:  f2d6765d84f fsmonitor: refactor untracked-cache invalidation
  7:  1df4019931c !  9:  af6f57ab3e6 fsmonitor: refactor untracked-cache invalidation
     @@ Metadata
      Author: Jeff Hostetler <jeffhostetler@github.com>
      
       ## Commit message ##
     -    fsmonitor: refactor untracked-cache invalidation
     +    fsmonitor: move untracked invalidation into helper functions
     +
     +    Move the call to invalidate the untracked cache for the FSEvent
     +    pathname into the two helper functions.
     +
     +    In a later commit in this series, we will call these helpers
     +    from other contexts and it is safer to include the UC invalidation
     +    in the helper than to remember to also add it to each helper
     +    call-site.
      
          Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
      
       ## fsmonitor.c ##
     -@@ fsmonitor.c: static int query_fsmonitor_hook(struct repository *r,
     - 	return result;
     - }
     - 
     -+/*
     -+ * Invalidate the untracked cache for the given pathname.  Copy the
     -+ * buffer to a proper null-terminated string (since the untracked
     -+ * cache code does not use (buf, len) style argument).  Also strip any
     -+ * trailing slash.
     -+ */
     -+static void my_invalidate_untracked_cache(
     -+	struct index_state *istate, const char *name, int len)
     -+{
     -+	struct strbuf work_path = STRBUF_INIT;
     -+
     -+	if (!len)
     -+		return;
     -+
     -+	if (name[len-1] == '/')
     -+		len--;
     -+
     -+	strbuf_add(&work_path, name, len);
     -+	untracked_cache_invalidate_path(istate, work_path.buf, 0);
     -+	strbuf_release(&work_path);
     -+}
     -+
     - static void fsmonitor_refresh_callback_unqualified(
     - 	struct index_state *istate, const char *name, int len, int pos)
     +@@ fsmonitor.c: static void handle_path_without_trailing_slash(
       {
       	int i;
       
     -+	my_invalidate_untracked_cache(istate, name, len);
     ++	/*
     ++	 * Mark the untracked cache dirty for this path (regardless of
     ++	 * whether or not we find an exact match for it in the index).
     ++	 * Since the path is unqualified (no trailing slash hint in the
     ++	 * FSEvent), it may refer to a file or directory. So we should
     ++	 * not assume one or the other and should always let the untracked
     ++	 * cache decide what needs to be invalidated.
     ++	 */
     ++	untracked_cache_invalidate_trimmed_path(istate, name, 0);
      +
       	if (pos >= 0) {
       		/*
       		 * We have an exact match for this path and can just
     -@@ fsmonitor.c: static int fsmonitor_refresh_callback_slash(
     +@@ fsmonitor.c: static void handle_path_with_trailing_slash(
     + {
       	int i;
     - 	int nr_in_cone = 0;
       
     -+	my_invalidate_untracked_cache(istate, name, len);
     ++	/*
     ++	 * Mark the untracked cache dirty for this directory path
     ++	 * (regardless of whether or not we find an exact match for it
     ++	 * in the index or find it to be a proper prefix of one or more
     ++	 * files in the index), since the FSEvent is hinting that
     ++	 * there may be changes on or within the directory.
     ++	 */
     ++	untracked_cache_invalidate_trimmed_path(istate, name, 0);
      +
       	if (pos < 0)
       		pos = -pos - 1;
       
      @@ fsmonitor.c: static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
     - 
     - 	if (name[len - 1] == '/') {
     - 		fsmonitor_refresh_callback_slash(istate, name, len, pos);
     --
     --		/*
     --		 * We need to remove the traling "/" from the path
     --		 * for the untracked cache.
     --		 */
     --		name[len - 1] = '\0';
       	} else {
     - 		fsmonitor_refresh_callback_unqualified(istate, name, len, pos);
     + 		handle_path_without_trailing_slash(istate, name, pos);
       	}
      -
      -	/*
      -	 * Mark the untracked cache dirty even if it wasn't found in the index
     --	 * as it could be a new untracked file.
     +-	 * as it could be a new untracked file.  (Let the untracked cache
     +-	 * layer silently deal with any trailing slash.)
      -	 */
     --	untracked_cache_invalidate_path(istate, name, 0);
     +-	untracked_cache_invalidate_trimmed_path(istate, name, 0);
       }
       
       /*
  -:  ----------- > 10:  623c6f06e21 fsmonitor: return invalidated cache-entry count on directory event
  9:  a0cc4c8274c ! 11:  1853f77d333 fsmonitor: refactor non-directory callback
     @@ Metadata
      Author: Jeff Hostetler <jeffhostetler@github.com>
      
       ## Commit message ##
     -    fsmonitor: refactor non-directory callback
     +    fsmonitor: remove custom loop from non-directory path handler
      
     -    Refactor the fsmonitor_refresh_callback_unqualified() code
     -    to try to use the _callback_slash() code and avoid having
     -    a custom filter in the child cache-entry scanner.
     +    Refactor the code that handles refresh events for pathnames that do
     +    not contain a trailing slash.  Instead of using a custom loop to try
     +    to scan the index and detect if the FSEvent named a file or might be a
     +    directory prefix, use the recently created helper function to do that.
     +
     +    Also update the comments to describe what and why we are doing this.
      
          On platforms that DO NOT annotate FS events with a trailing
          slash, if we fail to find an exact match for the pathname
     @@ Commit message
          Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
      
       ## fsmonitor.c ##
     -@@ fsmonitor.c: static int my_callback_dir_name_hash(
     - 	return nr_in_cone;
     +@@ fsmonitor.c: static int query_fsmonitor_hook(struct repository *r,
     + 	return result;
       }
       
     --static void fsmonitor_refresh_callback_unqualified(
     ++static size_t handle_path_with_trailing_slash(
     ++	struct index_state *istate, const char *name, int pos);
     ++
      +/*
      + * The daemon sent an observed pathname without a trailing slash.
      + * (This is the normal case.)  We do not know if it is a tracked or
     @@ fsmonitor.c: static int my_callback_dir_name_hash(
      + * do not know it is case-correct or -incorrect.
      + *
      + * Assume it is case-correct and try an exact match.
     -+ *
     -+ * Return the number of cache-entries that we invalidated.
      + */
     -+static int fsmonitor_refresh_callback_unqualified(
     - 	struct index_state *istate, const char *name, int len, int pos)
     + static void handle_path_without_trailing_slash(
     + 	struct index_state *istate, const char *name, int pos)
       {
      -	int i;
      -
     - 	my_invalidate_untracked_cache(istate, name, len);
     + 	/*
     + 	 * Mark the untracked cache dirty for this path (regardless of
     + 	 * whether or not we find an exact match for it in the index).
     +@@ fsmonitor.c: static void handle_path_without_trailing_slash(
       
       	if (pos >= 0) {
       		/*
     @@ fsmonitor.c: static int my_callback_dir_name_hash(
      +		 * at that directory. (That is, assume no D/F conflicts.)
       		 */
       		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
     -+		return 1;
       	} else {
     -+		int nr_in_cone;
      +		struct strbuf work_path = STRBUF_INIT;
      +
       		/*
     @@ fsmonitor.c: static int my_callback_dir_name_hash(
      +		 * "foo" and "foo/" like "foo-" or "foo-bar", so we
      +		 * don't want to do our own scan.
       		 */
     +-		int len = strlen(name);
      -		pos = -pos - 1;
      -
      -		for (i = pos; i < istate->cache_nr; i++) {
     @@ fsmonitor.c: static int my_callback_dir_name_hash(
      -			if (istate->cache[i]->name[len] == '/')
      -				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
      -		}
     -+		strbuf_add(&work_path, name, len);
     ++		strbuf_add(&work_path, name, strlen(name));
      +		strbuf_addch(&work_path, '/');
      +		pos = index_name_pos(istate, work_path.buf, work_path.len);
     -+		nr_in_cone = fsmonitor_refresh_callback_slash(
     -+			istate, work_path.buf, work_path.len, pos);
     ++		handle_path_with_trailing_slash(istate, work_path.buf, pos);
      +		strbuf_release(&work_path);
     -+		return nr_in_cone;
       	}
       }
       
 10:  bf18401f56c ! 12:  f77d68c78ad fsmonitor: support case-insensitive non-directory events
     @@ Metadata
      Author: Jeff Hostetler <jeffhostetler@github.com>
      
       ## Commit message ##
     -    fsmonitor: support case-insensitive non-directory events
     +    fsmonitor: return invalidated cache-entry count on non-directory event
     +
     +    Teach the refresh callback helper function for unqualified FSEvents
     +    (pathnames without a trailing slash) to return the number of
     +    cache-entries that were invalidated in response to the event.
     +
     +    This will be used in a later commit to help determine if the observed
     +    pathname was (possibly) case-incorrect (on a case-insensitive
     +    file system).
      
          Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
      
       ## fsmonitor.c ##
     -@@ fsmonitor.c: static int fsmonitor_refresh_callback_unqualified(
     - 	}
     - }
     - 
     -+/*
     -+ * On a case-insensitive FS, use the name-hash to map the case of
     -+ * the observed path to the canonical case expected by the index.
     -+ *
     -+ * The given pathname DOES NOT include the trailing slash.
     +@@ fsmonitor.c: static size_t handle_path_with_trailing_slash(
     +  * do not know it is case-correct or -incorrect.
     +  *
     +  * Assume it is case-correct and try an exact match.
      + *
      + * Return the number of cache-entries that we invalidated.
     -+ */
     -+static int fsmonitor_refresh_callback_unqualified_icase(
     -+	struct index_state *istate, const char *name, int len)
     -+{
     -+	int nr_in_cone;
     -+
     -+	/*
     -+	 * Look for a case-incorrect match for this non-directory
     -+	 * pathname.
     -+	 */
     -+	nr_in_cone = my_callback_name_hash(istate, name, len);
     -+	if (nr_in_cone)
     -+		return nr_in_cone;
     -+
     -+	/*
     -+	 * Try the directory name-hash and see if there is a
     -+	 * case-incorrect directory with this pathanme.
     -+	 * (len) because we don't have a trailing slash.
     -+	 */
     -+	nr_in_cone = my_callback_dir_name_hash(istate, name, len);
     -+	return nr_in_cone;
     -+}
     -+
     - /*
     -  * The daemon can decorate directory events, such as a move or rename,
     -  * by adding a trailing slash to the observed name.  Use this to
     -@@ fsmonitor.c: static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
     - 		if (ignore_case && !nr_in_cone)
     - 			fsmonitor_refresh_callback_slash_icase(istate, name, len);
     +  */
     +-static void handle_path_without_trailing_slash(
     ++static size_t handle_path_without_trailing_slash(
     + 	struct index_state *istate, const char *name, int pos)
     + {
     + 	/*
     +@@ fsmonitor.c: static void handle_path_without_trailing_slash(
     + 		 * at that directory. (That is, assume no D/F conflicts.)
     + 		 */
     + 		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
     ++		return 1;
       	} else {
     --		fsmonitor_refresh_callback_unqualified(istate, name, len, pos);
     -+		nr_in_cone = fsmonitor_refresh_callback_unqualified(istate, name, len, pos);
     -+		if (ignore_case && !nr_in_cone)
     -+			fsmonitor_refresh_callback_unqualified_icase(istate, name, len);
     ++		size_t nr_in_cone;
     + 		struct strbuf work_path = STRBUF_INIT;
     + 
     + 		/*
     +@@ fsmonitor.c: static void handle_path_without_trailing_slash(
     + 		strbuf_add(&work_path, name, strlen(name));
     + 		strbuf_addch(&work_path, '/');
     + 		pos = index_name_pos(istate, work_path.buf, work_path.len);
     +-		handle_path_with_trailing_slash(istate, work_path.buf, pos);
     ++		nr_in_cone = handle_path_with_trailing_slash(
     ++			istate, work_path.buf, pos);
     + 		strbuf_release(&work_path);
     ++		return nr_in_cone;
       	}
       }
       
  -:  ----------- > 13:  58b36673e15 fsmonitor: trace the new invalidated cache-entry count
  8:  e0029a2aad6 ! 14:  288f3f4e54e fsmonitor: support case-insensitive directory events
     @@ Metadata
      Author: Jeff Hostetler <jeffhostetler@github.com>
      
       ## Commit message ##
     -    fsmonitor: support case-insensitive directory events
     +    fsmonitor: support case-insensitive events
      
          Teach fsmonitor_refresh_callback() to handle case-insensitive
          lookups if case-sensitive lookups fail on case-insensitive systems.
     @@ Commit message
          to find the associated cache-entry. This causes status to think that
          the cached CE flags are correct and skip over the file.
      
     -    Update the handling of directory-style FSEvents (ones containing a
     -    path with a trailing slash) to optionally use the name-hash if the
     -    case-correct search does not find a match.
     -
     -    (The FSMonitor daemon can send directory FSEvents if the OS provides
     -    that information.)
     +    Update event handling to optionally use the name-hash and dir-name-hash
     +    if necessary.
      
          Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
      
     @@ fsmonitor.c
       #include "strbuf.h"
       #include "trace2.h"
      @@ fsmonitor.c: static int query_fsmonitor_hook(struct repository *r,
     - 	return result;
     - }
     - 
     -+static int fsmonitor_refresh_callback_slash(
     -+	struct index_state *istate, const char *name, int len, int pos);
     -+
     - /*
     -  * Invalidate the untracked cache for the given pathname.  Copy the
     -  * buffer to a proper null-terminated string (since the untracked
     -@@ fsmonitor.c: static void my_invalidate_untracked_cache(
     - 	strbuf_release(&work_path);
     - }
     + static size_t handle_path_with_trailing_slash(
     + 	struct index_state *istate, const char *name, int pos);
       
      +/*
     -+ * Use the name-hash to lookup the pathname.
     ++ * Use the name-hash to do a case-insensitive cache-entry lookup with
     ++ * the pathname and invalidate the cache-entry.
      + *
      + * Returns the number of cache-entries that we invalidated.
      + */
     -+static int my_callback_name_hash(
     -+	struct index_state *istate, const char *name, int len)
     ++static size_t handle_using_name_hash_icase(
     ++	struct index_state *istate, const char *name)
      +{
      +	struct cache_entry *ce = NULL;
      +
     -+	ce = index_file_exists(istate, name, len, 1);
     ++	ce = index_file_exists(istate, name, strlen(name), 1);
      +	if (!ce)
      +		return 0;
      +
      +	/*
     -+	 * The index contains a case-insensitive match for the pathname.
     -+	 * This could either be a regular file or a sparse-index directory.
     ++	 * A case-insensitive search in the name-hash using the
     ++	 * observed pathname found a cache-entry, so the observed path
     ++	 * is case-incorrect.  Invalidate the cache-entry and use the
     ++	 * correct spelling from the cache-entry to invalidate the
     ++	 * untracked-cache.  Since we now have sparse-directories in
     ++	 * the index, the observed pathname may represent a regular
     ++	 * file or a sparse-index directory.
      +	 *
     -+	 * We should not have seen FSEvents for a sparse-index directory,
     -+	 * but we handle it just in case.
     ++	 * Note that we should not have seen FSEvents for a
     ++	 * sparse-index directory, but we handle it just in case.
      +	 *
      +	 * Either way, we know that there are not any cache-entries for
      +	 * children inside the cone of the directory, so we don't need to
      +	 * do the usual scan.
      +	 */
      +	trace_printf_key(&trace_fsmonitor,
     -+			 "fsmonitor_refresh_callback map '%s' '%s'",
     ++			 "fsmonitor_refresh_callback MAP: '%s' '%s'",
      +			 name, ce->name);
      +
     -+	my_invalidate_untracked_cache(istate, ce->name, ce->ce_namelen);
     ++	untracked_cache_invalidate_trimmed_path(istate, ce->name, 0);
      +
      +	ce->ce_flags &= ~CE_FSMONITOR_VALID;
      +	return 1;
      +}
      +
      +/*
     -+ * Use the directory name-hash to find the correct-case spelling
     -+ * of the directory.  Use the canonical spelling to invalidate all
     -+ * of the cache-entries within the matching cone.
     -+ *
     -+ * The pathname MUST NOT have a trailing slash.
     ++ * Use the dir-name-hash to find the correct-case spelling of the
     ++ * directory.  Use the canonical spelling to invalidate all of the
     ++ * cache-entries within the matching cone.
      + *
      + * Returns the number of cache-entries that we invalidated.
      + */
     -+static int my_callback_dir_name_hash(
     -+	struct index_state *istate, const char *name, int len)
     ++static size_t handle_using_dir_name_hash_icase(
     ++	struct index_state *istate, const char *name)
      +{
      +	struct strbuf canonical_path = STRBUF_INIT;
      +	int pos;
     -+	int nr_in_cone;
     ++	size_t len = strlen(name);
     ++	size_t nr_in_cone;
     ++
     ++	if (name[len - 1] == '/')
     ++		len--;
      +
     -+	if (!index_dir_exists2(istate, name, len, &canonical_path))
     ++	if (!index_dir_find(istate, name, len, &canonical_path))
      +		return 0; /* name is untracked */
     -+	if (!memcmp(name, canonical_path.buf, len)) {
     ++
     ++	if (!memcmp(name, canonical_path.buf, canonical_path.len)) {
      +		strbuf_release(&canonical_path);
     ++		/*
     ++		 * NEEDSWORK: Our caller already tried an exact match
     ++		 * and failed to find one.  They called us to do an
     ++		 * ICASE match, so we should never get an exact match,
     ++		 * so we could promote this to a BUG() here if we
     ++		 * wanted to.  It doesn't hurt anything to just return
     ++		 * 0 and go on because we should never get here.  Or we
     ++		 * could just get rid of the memcmp() and this "if"
     ++		 * clause completely.
     ++		 */
      +		return 0; /* should not happen */
      +	}
      +
      +	trace_printf_key(&trace_fsmonitor,
     -+			 "fsmonitor_refresh_callback map '%s' '%s'",
     ++			 "fsmonitor_refresh_callback MAP: '%s' '%s'",
      +			 name, canonical_path.buf);
      +
      +	/*
     -+	 * The directory name-hash only tells us the corrected
     -+	 * spelling of the prefix.  We have to use this canonical
     -+	 * path to do a lookup in the cache-entry array so that we
     -+	 * we repeat the original search using the case-corrected
     -+	 * spelling.
     ++	 * The dir-name-hash only tells us the corrected spelling of
     ++	 * the prefix.  We have to use this canonical path to do a
     ++	 * lookup in the cache-entry array so that we repeat the
     ++	 * original search using the case-corrected spelling.
      +	 */
      +	strbuf_addch(&canonical_path, '/');
      +	pos = index_name_pos(istate, canonical_path.buf,
      +			     canonical_path.len);
     -+	nr_in_cone = fsmonitor_refresh_callback_slash(
     -+		istate, canonical_path.buf, canonical_path.len, pos);
     ++	nr_in_cone = handle_path_with_trailing_slash(
     ++		istate, canonical_path.buf, pos);
      +	strbuf_release(&canonical_path);
      +	return nr_in_cone;
      +}
      +
     - static void fsmonitor_refresh_callback_unqualified(
     - 	struct index_state *istate, const char *name, int len, int pos)
     - {
     -@@ fsmonitor.c: static void fsmonitor_refresh_callback_unqualified(
     -  *
     -  * Return the number of cache-entries that we invalidated.  We will
     -  * use this later to determine if we need to attempt a second
     -- * case-insensitive search.
     -+ * case-insensitive search.  That is, if a observed-case search yields
     -+ * any results, we assume the prefix is case-correct.  If there are
     -+ * no matches, we still don't know if the observed path is simply
     -+ * untracked or case-incorrect.
     -  */
     - static int fsmonitor_refresh_callback_slash(
     - 	struct index_state *istate, const char *name, int len, int pos)
     -@@ fsmonitor.c: static int fsmonitor_refresh_callback_slash(
     - 	return nr_in_cone;
     - }
     + /*
     +  * The daemon sent an observed pathname without a trailing slash.
     +  * (This is the normal case.)  We do not know if it is a tracked or
     +@@ fsmonitor.c: static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
     + 	else
     + 		nr_in_cone = handle_path_without_trailing_slash(istate, name, pos);
       
     -+/*
     -+ * On a case-insensitive FS, use the name-hash and directory name-hash
     -+ * to map the case of the observed path to the canonical case expected
     -+ * by the index.
     -+ *
     -+ * The given pathname includes the trailing slash.
     -+ *
     -+ * Return the number of cache-entries that we invalidated.
     -+ */
     -+static int fsmonitor_refresh_callback_slash_icase(
     -+	struct index_state *istate, const char *name, int len)
     -+{
     -+	int nr_in_cone;
     -+
     -+	/*
     -+	 * Look for a case-incorrect sparse-index directory.
     -+	 */
     -+	nr_in_cone = my_callback_name_hash(istate, name, len);
     -+	if (nr_in_cone)
     -+		return nr_in_cone;
     -+
      +	/*
     -+	 * (len-1) because we do not include the trailing slash in the
     -+	 * pathname.
     ++	 * If we did not find an exact match for this pathname or any
     ++	 * cache-entries with this directory prefix and we're on a
     ++	 * case-insensitive file system, try again using the name-hash
     ++	 * and dir-name-hash.
      +	 */
     -+	nr_in_cone = my_callback_dir_name_hash(istate, name, len-1);
     -+	return nr_in_cone;
     -+}
     -+
     - static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
     - {
     - 	int len = strlen(name);
     - 	int pos = index_name_pos(istate, name, len);
     -+	int nr_in_cone;
     ++	if (!nr_in_cone && ignore_case) {
     ++		nr_in_cone = handle_using_name_hash_icase(istate, name);
     ++		if (!nr_in_cone)
     ++			nr_in_cone = handle_using_dir_name_hash_icase(
     ++				istate, name);
     ++	}
      +
     - 
     - 	trace_printf_key(&trace_fsmonitor,
     - 			 "fsmonitor_refresh_callback '%s' (pos %d)",
     - 			 name, pos);
     - 
     - 	if (name[len - 1] == '/') {
     --		fsmonitor_refresh_callback_slash(istate, name, len, pos);
     -+		nr_in_cone = fsmonitor_refresh_callback_slash(istate, name, len, pos);
     -+		if (ignore_case && !nr_in_cone)
     -+			fsmonitor_refresh_callback_slash_icase(istate, name, len);
     - 	} else {
     - 		fsmonitor_refresh_callback_unqualified(istate, name, len, pos);
     - 	}
     + 	if (nr_in_cone)
     + 		trace_printf_key(&trace_fsmonitor,
     + 				 "fsmonitor_refresh_callback CNT: %d",
 11:  7775de735f4 ! 15:  3a20065dbf8 fsmonitor: refactor bit invalidation in refresh callback
     @@ Commit message
          Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
      
       ## fsmonitor.c ##
     -@@ fsmonitor.c: static void my_invalidate_untracked_cache(
     - 	strbuf_release(&work_path);
     - }
     +@@ fsmonitor.c: static int query_fsmonitor_hook(struct repository *r,
     + static size_t handle_path_with_trailing_slash(
     + 	struct index_state *istate, const char *name, int pos);
       
      +/*
      + * Invalidate the FSM bit on this CE.  This is like mark_fsmonitor_invalid()
      + * but we've already handled the untracked-cache and I want a different
      + * trace message.
      + */
     -+static void my_invalidate_ce_fsm(struct cache_entry *ce)
     ++static void invalidate_ce_fsm(struct cache_entry *ce)
      +{
      +	if (ce->ce_flags & CE_FSMONITOR_VALID)
      +		trace_printf_key(&trace_fsmonitor,
     -+				 "fsmonitor_refresh_cb_invalidate '%s'",
     ++				 "fsmonitor_refresh_callback INV: '%s'",
      +				 ce->name);
      +	ce->ce_flags &= ~CE_FSMONITOR_VALID;
      +}
      +
       /*
     -  * Use the name-hash to lookup the pathname.
     -  *
     -@@ fsmonitor.c: static int my_callback_name_hash(
     +  * Use the name-hash to do a case-insensitive cache-entry lookup with
     +  * the pathname and invalidate the cache-entry.
     +@@ fsmonitor.c: static size_t handle_using_name_hash_icase(
       
     - 	my_invalidate_untracked_cache(istate, ce->name, ce->ce_namelen);
     + 	untracked_cache_invalidate_trimmed_path(istate, ce->name, 0);
       
      -	ce->ce_flags &= ~CE_FSMONITOR_VALID;
     -+	my_invalidate_ce_fsm(ce);
     ++	invalidate_ce_fsm(ce);
       	return 1;
       }
       
     -@@ fsmonitor.c: static int fsmonitor_refresh_callback_unqualified(
     +@@ fsmonitor.c: static size_t handle_path_without_trailing_slash(
       		 * cache-entry with the same pathname, nor for a cone
       		 * at that directory. (That is, assume no D/F conflicts.)
       		 */
      -		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
     -+		my_invalidate_ce_fsm(istate->cache[pos]);
     ++		invalidate_ce_fsm(istate->cache[pos]);
       		return 1;
       	} else {
     - 		int nr_in_cone;
     -@@ fsmonitor.c: static int fsmonitor_refresh_callback_slash(
     + 		size_t nr_in_cone;
     +@@ fsmonitor.c: static size_t handle_path_with_trailing_slash(
       	for (i = pos; i < istate->cache_nr; i++) {
       		if (!starts_with(istate->cache[i]->name, name))
       			break;
      -		istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
     -+		my_invalidate_ce_fsm(istate->cache[i]);
     ++		invalidate_ce_fsm(istate->cache[i]);
       		nr_in_cone++;
       	}
       
 12:  63edb68303f ! 16:  467d3c1fe2c t7527: update case-insenstive fsmonitor test
     @@ Commit message
      
          Now that the FSMonitor client has been updated to better
          handle events on case-insenstive file systems, update the
     -    two tests that demonstrated the bug.
     +    two tests that demonstrated the bug and remove the temporary
     +    SKIPME prereq.
      
          Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
      
       ## t/t7527-builtin-fsmonitor.sh ##
     -@@ t/t7527-builtin-fsmonitor.sh: test_expect_success CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
     +@@ t/t7527-builtin-fsmonitor.sh: test_expect_success 'split-index and FSMonitor work well together' '
     + #
     + # The setup is a little contrived.
     + #
     +-test_expect_success SKIPME,CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
     ++test_expect_success CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
     + 	test_when_finished "stop_daemon_delete_repo subdir_case_wrong" &&
     + 
     + 	git init subdir_case_wrong &&
     +@@ t/t7527-builtin-fsmonitor.sh: test_expect_success SKIPME,CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on d
       
       	grep -q "dir1/DIR2/dir3/file3.*pos -3" "$PWD/subdir_case_wrong.log1" &&
       
      +	# Also verify that we get a mapping event to correct the case.
     -+	grep -q "map.*dir1/DIR2/dir3/file3.*dir1/dir2/dir3/file3" \
     ++	grep -q "MAP:.*dir1/DIR2/dir3/file3.*dir1/dir2/dir3/file3" \
      +		"$PWD/subdir_case_wrong.log1" &&
      +
       	# The refresh-callbacks should have caused "git status" to clear
     @@ t/t7527-builtin-fsmonitor.sh: test_expect_success CASE_INSENSITIVE_FS 'fsmonitor
      +	grep -q " M dir1/dir2/dir3/file3" "$PWD/subdir_case_wrong.out"
       '
       
     - test_expect_success CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
     -@@ t/t7527-builtin-fsmonitor.sh: test_expect_success CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
     +-test_expect_success SKIPME,CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
     ++test_expect_success CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
     + 	test_when_finished "stop_daemon_delete_repo file_case_wrong" &&
     + 
     + 	git init file_case_wrong &&
     +@@ t/t7527-builtin-fsmonitor.sh: test_expect_success SKIPME,CASE_INSENSITIVE_FS 'fsmonitor file case wrong on dis
       	grep -q "fsmonitor_refresh_callback.*FILE-3-A.*pos -3" "$PWD/file_case_wrong-try3.log" &&
       	grep -q "fsmonitor_refresh_callback.*file-4-a.*pos -9" "$PWD/file_case_wrong-try3.log" &&
       
     @@ t/t7527-builtin-fsmonitor.sh: test_expect_success CASE_INSENSITIVE_FS 'fsmonitor
      -	! grep -q " M dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-try3.out" &&
      -	! grep -q " M dir1/dir2/dir4/FILE-4-A" "$PWD/file_case_wrong-try3.out"
      +	# Also verify that we get a mapping event to correct the case.
     -+	grep -q "fsmonitor_refresh_callback map.*dir1/dir2/dir3/FILE-3-A.*dir1/dir2/dir3/file-3-a" \
     ++	grep -q "fsmonitor_refresh_callback MAP:.*dir1/dir2/dir3/FILE-3-A.*dir1/dir2/dir3/file-3-a" \
      +		"$PWD/file_case_wrong-try3.log" &&
     -+	grep -q "fsmonitor_refresh_callback map.*dir1/dir2/dir4/file-4-a.*dir1/dir2/dir4/FILE-4-A" \
     ++	grep -q "fsmonitor_refresh_callback MAP:.*dir1/dir2/dir4/file-4-a.*dir1/dir2/dir4/FILE-4-A" \
      +		"$PWD/file_case_wrong-try3.log" &&
      +
      +	grep -q " M dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-try3.out" &&

-- 
gitgitgadget


* [PATCH v2 01/16] name-hash: add index_dir_find()
  2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
@ 2024-02-23  3:18   ` Jeff Hostetler via GitGitGadget
  2024-02-23  6:37     ` Junio C Hamano
  2024-02-23  3:18   ` [PATCH v2 02/16] t7527: add case-insensitve test for FSMonitor Jeff Hostetler via GitGitGadget
                     ` (15 subsequent siblings)
  16 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-23  3:18 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Replace the index_dir_exists() function with index_dir_find() and
change the API to take an optional strbuf to return the canonical
spelling of the matched directory prefix.

Create an index_dir_exists() wrapper macro for existing callers.

The existing index_dir_exists() returns a boolean to indicate if
there is a case-insensitive match in the directory name-hash, but
it doesn't tell the caller the exact spelling of that match.

The new version also copies the matched spelling to a provided strbuf.
This lets the caller, for example, then call index_name_pos() with the
correct case to search the cache-entry array for the real insertion
position.
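
As a rough illustration of the API shape described above (not the actual
name-hash code), the sketch below does a case-insensitive directory lookup
and optionally hands back the canonical spelling. Everything prefixed
`toy_` is hypothetical, and a linear scan stands in for the real hashmap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>	/* strncasecmp() (POSIX) */

/* Hypothetical stand-in for the dir-name-hash: canonical spellings. */
static const char *toy_dirs[] = { "dir1", "dir1/dir2", "dir1/dir2/dir3" };

/*
 * Toy analogue of index_dir_find(): case-insensitive lookup that
 * optionally copies the canonical spelling into 'canonical' (which
 * may be NULL).  Returns 1 on a match and 0 otherwise.  The copy is
 * skipped if the buffer is too small.
 */
static int toy_dir_find(const char *name, size_t namelen,
			char *canonical, size_t canonical_size)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(toy_dirs) / sizeof(toy_dirs[0]); i++) {
		const char *dir = toy_dirs[i];

		if (strlen(dir) != namelen || strncasecmp(dir, name, namelen))
			continue;
		if (canonical && canonical_size > namelen) {
			memcpy(canonical, dir, namelen);
			canonical[namelen] = '\0';
		}
		return 1;
	}
	return 0;
}

/* Boolean-only wrapper, in the spirit of the index_dir_exists() macro. */
#define toy_dir_exists(n, l) toy_dir_find((n), (l), NULL, 0)
```

With this toy table, looking up "dir1/DIR2" (length 9) succeeds and fills
the buffer with "dir1/dir2", which a caller could then feed to a
case-sensitive search.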

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 name-hash.c | 9 ++++++++-
 name-hash.h | 7 ++++++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/name-hash.c b/name-hash.c
index 251f036eef6..3a58ce03d9c 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -685,13 +685,20 @@ static int same_name(const struct cache_entry *ce, const char *name, int namelen
 	return slow_same_name(name, namelen, ce->name, len);
 }
 
-int index_dir_exists(struct index_state *istate, const char *name, int namelen)
+int index_dir_find(struct index_state *istate, const char *name, int namelen,
+		   struct strbuf *canonical_path)
 {
 	struct dir_entry *dir;
 
 	lazy_init_name_hash(istate);
 	expand_to_path(istate, name, namelen, 0);
 	dir = find_dir_entry(istate, name, namelen);
+
+	if (canonical_path && dir && dir->nr) {
+		strbuf_reset(canonical_path);
+		strbuf_add(canonical_path, dir->name, dir->namelen);
+	}
+
 	return dir && dir->nr;
 }
 
diff --git a/name-hash.h b/name-hash.h
index b1b4b0fb337..0cbfc428631 100644
--- a/name-hash.h
+++ b/name-hash.h
@@ -4,7 +4,12 @@
 struct cache_entry;
 struct index_state;
 
-int index_dir_exists(struct index_state *istate, const char *name, int namelen);
+
+int index_dir_find(struct index_state *istate, const char *name, int namelen,
+		   struct strbuf *canonical_path);
+
+#define index_dir_exists(i, n, l) index_dir_find((i), (n), (l), NULL)
+
 void adjust_dirname_case(struct index_state *istate, char *name);
 struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);
 
-- 
gitgitgadget



* [PATCH v2 02/16] t7527: add case-insensitve test for FSMonitor
  2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
  2024-02-23  3:18   ` [PATCH v2 01/16] name-hash: add index_dir_find() Jeff Hostetler via GitGitGadget
@ 2024-02-23  3:18   ` Jeff Hostetler via GitGitGadget
  2024-02-23  3:18   ` [PATCH v2 03/16] t7527: temporarily disable case-insensitive tests Jeff Hostetler via GitGitGadget
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-23  3:18 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

The FSMonitor client code trusts the spelling of the pathnames in the
FSEvents received from the FSMonitor daemon.  On case-insensitive file
systems, these OBSERVED pathnames may be spelled differently than the
EXPECTED pathnames listed in the .git/index.  This causes a miss when
using `index_name_pos()` which expects the given case to be correct.

When this happens, the FSMonitor client code does not update the state
of the CE_FSMONITOR_VALID bit when refreshing the index (and before
starting to scan the worktree).

This results in modified files NOT being reported by `git status` when
there is a discrepancy in the case-spelling of a tracked file's
pathname.

This commit contains a (rather contrived) test case to demonstrate
this.  A later commit in this series will update the FSMonitor client
code to recognize these discrepancies and update the CE_ bit accordingly.
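
To make the miss concrete, here is a minimal sketch of a case-sensitive
binary search that, like `index_name_pos()`, reports a miss as
(-insertion_point - 1). The `toy_` names and the seven-entry array are
assumptions chosen to resemble the layout of the first test, not the
real index code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy analogue of index_name_pos(): byte-wise binary search over a
 * sorted array.  Returns the position on an exact match, otherwise
 * (-insertion_point - 1).
 */
static int toy_name_pos(const char *const *names, size_t nr, const char *name)
{
	size_t lo = 0, hi = nr;

	assert(names != NULL && name != NULL);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return (int)mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -(int)lo - 1;
}

/* Hypothetical EXPECTED spellings, in byte-wise sorted order. */
static const char *const toy_index[] = {
	"AAA", "BBB",
	"dir1/dir2/dir3/file3", "dir1/dir2/file2", "dir1/file1",
	"yyy", "zzz",
};
```

Searching this array for the OBSERVED spelling "dir1/DIR2/dir3/file3"
yields -3 (insertion point 2, because 'D' sorts before 'd'), while the
EXPECTED spelling "dir1/dir2/dir3/file3" is found at position 2.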

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 t/t7527-builtin-fsmonitor.sh | 217 +++++++++++++++++++++++++++++++++++
 1 file changed, 217 insertions(+)

diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
index 363f9dc0e41..3d21295f789 100755
--- a/t/t7527-builtin-fsmonitor.sh
+++ b/t/t7527-builtin-fsmonitor.sh
@@ -1037,4 +1037,221 @@ test_expect_success 'split-index and FSMonitor work well together' '
 	)
 '
 
+# The FSMonitor daemon reports the OBSERVED pathname of modified files
+# and thus contains the OBSERVED spelling on case-insensitive file
+# systems.  The daemon does not (and should not) load the .git/index
+# file and therefore does not know the expected case-spelling.  Since
+# it is possible for the user to create files/subdirectories with the
+# incorrect case, a modified file event for a tracked file may not
+# have the EXPECTED case. This can cause `index_name_pos()` to
+# incorrectly report that the file is untracked. The client then fails
+# to mark the file as possibly dirty (keeping the CE_FSMONITOR_VALID
+# bit set), so `git status` avoids inspecting the file and does not
+# present it in the status output.
+#
+# The setup is a little contrived.
+#
+test_expect_success CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
+	test_when_finished "stop_daemon_delete_repo subdir_case_wrong" &&
+
+	git init subdir_case_wrong &&
+	(
+		cd subdir_case_wrong &&
+		echo x >AAA &&
+		echo x >BBB &&
+
+		mkdir dir1 &&
+		echo x >dir1/file1 &&
+		mkdir dir1/dir2 &&
+		echo x >dir1/dir2/file2 &&
+		mkdir dir1/dir2/dir3 &&
+		echo x >dir1/dir2/dir3/file3 &&
+
+		echo x >yyy &&
+		echo x >zzz &&
+		git add . &&
+		git commit -m "data" &&
+
+		# This will cause "dir1/" and everything under it
+		# to be deleted.
+		git sparse-checkout set --cone --sparse-index &&
+
+		# Create dir2 with the wrong case and then let Git
+		# repopulate dir3 -- it will not correct the spelling
+		# of dir2.
+		mkdir dir1 &&
+		mkdir dir1/DIR2 &&
+		git sparse-checkout add dir1/dir2/dir3
+	) &&
+
+	start_daemon -C subdir_case_wrong --tf "$PWD/subdir_case_wrong.trace" &&
+
+	# Enable FSMonitor in the client. Run enough commands for
+	# the .git/index to sync up with the daemon with everything
+	# marked clean.
+	git -C subdir_case_wrong config core.fsmonitor true &&
+	git -C subdir_case_wrong update-index --fsmonitor &&
+	git -C subdir_case_wrong status &&
+
+	# Make some files dirty so that FSMonitor gets FSEvents for
+	# each of them.
+	echo xx >>subdir_case_wrong/AAA &&
+	echo xx >>subdir_case_wrong/dir1/DIR2/dir3/file3 &&
+	echo xx >>subdir_case_wrong/zzz &&
+
+	GIT_TRACE_FSMONITOR="$PWD/subdir_case_wrong.log" \
+		git -C subdir_case_wrong --no-optional-locks status --short \
+			>"$PWD/subdir_case_wrong.out" &&
+
+	# "git status" should have gotten file events for each of
+	# the 3 files.
+	#
+	# "dir2" should be in the observed case on disk.
+	grep "fsmonitor_refresh_callback" \
+		<"$PWD/subdir_case_wrong.log" \
+		>"$PWD/subdir_case_wrong.log1" &&
+
+	grep -q "AAA.*pos 0" "$PWD/subdir_case_wrong.log1" &&
+	grep -q "zzz.*pos 6" "$PWD/subdir_case_wrong.log1" &&
+
+	grep -q "dir1/DIR2/dir3/file3.*pos -3" "$PWD/subdir_case_wrong.log1" &&
+
+	# The refresh-callbacks should have caused "git status" to clear
+	# the CE_FSMONITOR_VALID bit on each of those files and caused
+	# the worktree scan to visit them and mark them as modified.
+	grep -q " M AAA" "$PWD/subdir_case_wrong.out" &&
+	grep -q " M zzz" "$PWD/subdir_case_wrong.out" &&
+
+	# However, with the fsmonitor client bug, the "(pos -3)" causes
+	# the client to not update the bit and never rescan the file
+	# and therefore not report it as dirty.
+	! grep -q " M dir1/dir2/dir3/file3" "$PWD/subdir_case_wrong.out"
+'
+
+test_expect_success CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
+	test_when_finished "stop_daemon_delete_repo file_case_wrong" &&
+
+	git init file_case_wrong &&
+	(
+		cd file_case_wrong &&
+		echo x >AAA &&
+		echo x >BBB &&
+
+		mkdir dir1 &&
+		mkdir dir1/dir2 &&
+		mkdir dir1/dir2/dir3 &&
+		echo x >dir1/dir2/dir3/FILE-3-B &&
+		echo x >dir1/dir2/dir3/XXXX-3-X &&
+		echo x >dir1/dir2/dir3/file-3-a &&
+		echo x >dir1/dir2/dir3/yyyy-3-y &&
+		mkdir dir1/dir2/dir4 &&
+		echo x >dir1/dir2/dir4/FILE-4-A &&
+		echo x >dir1/dir2/dir4/XXXX-4-X &&
+		echo x >dir1/dir2/dir4/file-4-b &&
+		echo x >dir1/dir2/dir4/yyyy-4-y &&
+
+		echo x >yyy &&
+		echo x >zzz &&
+		git add . &&
+		git commit -m "data"
+	) &&
+
+	start_daemon -C file_case_wrong --tf "$PWD/file_case_wrong.trace" &&
+
+	# Enable FSMonitor in the client. Run enough commands for
+	# the .git/index to sync up with the daemon with everything
+	# marked clean.
+	git -C file_case_wrong config core.fsmonitor true &&
+	git -C file_case_wrong update-index --fsmonitor &&
+	git -C file_case_wrong status &&
+
+	# Make some files dirty so that FSMonitor gets FSEvents for
+	# each of them.
+	echo xx >>file_case_wrong/AAA &&
+	echo xx >>file_case_wrong/zzz &&
+
+	# Rename some files so that FSMonitor sees a create and delete
+	# FSEvent for each.  (A simple "mv foo FOO" is not portable
+	# between macOS and Windows. It works on both platforms, but makes
+	# the test messy, since (1) one platform updates "ctime" on the
+	# moved file and one does not and (2) it causes a directory event
+	# on one platform and not on the other which causes additional
+	# scanning during "git status" which causes a "H" vs "h" discrepancy
+	# in "git ls-files -f".)  So old-school it and move it out of the
+	# way and copy it to the case-incorrect name so that we get fresh
+	# "ctime" and "mtime" values.
+
+	mv file_case_wrong/dir1/dir2/dir3/file-3-a file_case_wrong/dir1/dir2/dir3/ORIG &&
+	cp file_case_wrong/dir1/dir2/dir3/ORIG     file_case_wrong/dir1/dir2/dir3/FILE-3-A &&
+	rm file_case_wrong/dir1/dir2/dir3/ORIG &&
+	mv file_case_wrong/dir1/dir2/dir4/FILE-4-A file_case_wrong/dir1/dir2/dir4/ORIG &&
+	cp file_case_wrong/dir1/dir2/dir4/ORIG     file_case_wrong/dir1/dir2/dir4/file-4-a &&
+	rm file_case_wrong/dir1/dir2/dir4/ORIG &&
+
+	# Run status enough times to fully sync.
+	#
+	# The first instance should get the create and delete FSEvents
+	# for each pair.  Status should update the index with a new FSM
+	# token (so the next invocation will not see data for these
+	# events).
+
+	GIT_TRACE_FSMONITOR="$PWD/file_case_wrong-try1.log" \
+		git -C file_case_wrong status --short \
+			>"$PWD/file_case_wrong-try1.out" &&
+	grep -q "fsmonitor_refresh_callback.*FILE-3-A.*pos -3" "$PWD/file_case_wrong-try1.log" &&
+	grep -q "fsmonitor_refresh_callback.*file-3-a.*pos 4"  "$PWD/file_case_wrong-try1.log" &&
+	grep -q "fsmonitor_refresh_callback.*FILE-4-A.*pos 6"  "$PWD/file_case_wrong-try1.log" &&
+	grep -q "fsmonitor_refresh_callback.*file-4-a.*pos -9" "$PWD/file_case_wrong-try1.log" &&
+
+	# FSM refresh will have invalidated the FSM bit and cause a regular
+	# (real) scan of these tracked files, so they should have "H" status.
+	# (We will not see a "h" status until the next refresh (on the next
+	# command).)
+
+	git -C file_case_wrong ls-files -f >"$PWD/file_case_wrong-lsf1.out" &&
+	grep -q "H dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-lsf1.out" &&
+	grep -q "H dir1/dir2/dir4/FILE-4-A" "$PWD/file_case_wrong-lsf1.out" &&
+
+
+	# Try the status again. We assume that the above status command
+	# advanced the token so that the next one will not see those events.
+
+	GIT_TRACE_FSMONITOR="$PWD/file_case_wrong-try2.log" \
+		git -C file_case_wrong status --short \
+			>"$PWD/file_case_wrong-try2.out" &&
+	! grep -q "fsmonitor_refresh_callback.*FILE-3-A.*pos" "$PWD/file_case_wrong-try2.log" &&
+	! grep -q "fsmonitor_refresh_callback.*file-3-a.*pos" "$PWD/file_case_wrong-try2.log" &&
+	! grep -q "fsmonitor_refresh_callback.*FILE-4-A.*pos" "$PWD/file_case_wrong-try2.log" &&
+	! grep -q "fsmonitor_refresh_callback.*file-4-a.*pos" "$PWD/file_case_wrong-try2.log" &&
+
+	# FSM refresh saw nothing, so it will mark all files as valid,
+	# so they should now have "h" status.
+
+	git -C file_case_wrong ls-files -f >"$PWD/file_case_wrong-lsf2.out" &&
+	grep -q "h dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-lsf2.out" &&
+	grep -q "h dir1/dir2/dir4/FILE-4-A" "$PWD/file_case_wrong-lsf2.out" &&
+
+
+	# We now have files with clean content, but with case-incorrect
+	# file names.  Modify them to see if status properly reports
+	# them.
+
+	echo xx >>file_case_wrong/dir1/dir2/dir3/FILE-3-A &&
+	echo xx >>file_case_wrong/dir1/dir2/dir4/file-4-a &&
+
+	GIT_TRACE_FSMONITOR="$PWD/file_case_wrong-try3.log" \
+		git -C file_case_wrong --no-optional-locks status --short \
+			>"$PWD/file_case_wrong-try3.out" &&
+	# FSEvents are in observed case.
+	grep -q "fsmonitor_refresh_callback.*FILE-3-A.*pos -3" "$PWD/file_case_wrong-try3.log" &&
+	grep -q "fsmonitor_refresh_callback.*file-4-a.*pos -9" "$PWD/file_case_wrong-try3.log" &&
+
+	# Status should say these files are modified, but with the case
+	# bug, the negative "pos" values cause the client to not update
+	# the FSM bit and never rescan the files, and therefore to not
+	# report them dirty.
+	! grep -q " M dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-try3.out" &&
+	! grep -q " M dir1/dir2/dir4/FILE-4-A" "$PWD/file_case_wrong-try3.out"
+'
+
 test_done
-- 
gitgitgadget



* [PATCH v2 03/16] t7527: temporarily disable case-insensitive tests
  2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
  2024-02-23  3:18   ` [PATCH v2 01/16] name-hash: add index_dir_find() Jeff Hostetler via GitGitGadget
  2024-02-23  3:18   ` [PATCH v2 02/16] t7527: add case-insensitve test for FSMonitor Jeff Hostetler via GitGitGadget
@ 2024-02-23  3:18   ` Jeff Hostetler via GitGitGadget
  2024-02-23  8:17     ` Junio C Hamano
  2024-02-23  3:18   ` [PATCH v2 04/16] fsmonitor: refactor refresh callback on directory events Jeff Hostetler via GitGitGadget
                     ` (13 subsequent siblings)
  16 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-23  3:18 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Add a non-existent "SKIPME" prereq to the case-insensitive tests.

The previous commit added test cases to demonstrate an error where
FSMonitor can get confused on a case-insensitive file system when the
on-disk spelling of a file or directory is wrong.  Let's disable those
tests before we incrementally teach Git to properly recognize and
handle those types of problems (so that a bisect between here and the
final commit in this patch series won't throw a false alarm).

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 t/t7527-builtin-fsmonitor.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
index 3d21295f789..4acb547819c 100755
--- a/t/t7527-builtin-fsmonitor.sh
+++ b/t/t7527-builtin-fsmonitor.sh
@@ -1051,7 +1051,7 @@ test_expect_success 'split-index and FSMonitor work well together' '
 #
 # The setup is a little contrived.
 #
-test_expect_success CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
+test_expect_success SKIPME,CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
 	test_when_finished "stop_daemon_delete_repo subdir_case_wrong" &&
 
 	git init subdir_case_wrong &&
@@ -1128,7 +1128,7 @@ test_expect_success CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
 	! grep -q " M dir1/dir2/dir3/file3" "$PWD/subdir_case_wrong.out"
 '
 
-test_expect_success CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
+test_expect_success SKIPME,CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
 	test_when_finished "stop_daemon_delete_repo file_case_wrong" &&
 
 	git init file_case_wrong &&
-- 
gitgitgadget



* [PATCH v2 04/16] fsmonitor: refactor refresh callback on directory events
  2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                     ` (2 preceding siblings ...)
  2024-02-23  3:18   ` [PATCH v2 03/16] t7527: temporarily disable case-insensitive tests Jeff Hostetler via GitGitGadget
@ 2024-02-23  3:18   ` Jeff Hostetler via GitGitGadget
  2024-02-23  8:18     ` Junio C Hamano
  2024-02-23  3:18   ` [PATCH v2 05/16] fsmonitor: clarify handling of directory events in callback helper Jeff Hostetler via GitGitGadget
                     ` (12 subsequent siblings)
  16 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-23  3:18 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Move the code to handle directory FSEvents (containing pathnames with
a trailing slash) into a helper function.
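
For readers unfamiliar with the technique, the following is a minimal
sketch of the forward scan the helper performs: decode a negative
position into an insertion point, then clear the valid bit on every
entry sharing the directory prefix. The `toy_` types and names are
hypothetical, and returning a count is a convenience of the sketch (the
helper in this patch returns void).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_VALID 1u	/* hypothetical stand-in for CE_FSMONITOR_VALID */

struct toy_entry {
	const char *name;
	unsigned int flags;
};

/*
 * Clear TOY_VALID on every entry whose name starts with 'dir' (which
 * is expected to end in '/'), scanning forward from 'pos'.  A negative
 * 'pos' is decoded as (-insertion_point - 1).  Returns the number of
 * entries visited inside the cone.
 */
static size_t toy_invalidate_cone(struct toy_entry *entries, size_t nr,
				  const char *dir, int pos)
{
	size_t i, n = 0, dirlen;

	assert(entries != NULL && dir != NULL);
	dirlen = strlen(dir);
	if (pos < 0)
		pos = -pos - 1;

	for (i = (size_t)pos; i < nr; i++) {
		if (strncmp(entries[i].name, dir, dirlen))
			break;	/* sorted order keeps the cone contiguous */
		entries[i].flags &= ~TOY_VALID;
		n++;
	}
	return n;
}
```

Because the entries are sorted, the scan can stop at the first name that
does not start with the directory prefix.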

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 52 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index f670c509378..6fecae9aeb2 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -183,6 +183,35 @@ static int query_fsmonitor_hook(struct repository *r,
 	return result;
 }
 
+static void handle_path_with_trailing_slash(
+	struct index_state *istate, const char *name, int pos)
+{
+	int i;
+
+	/*
+	 * The daemon can decorate directory events, such as
+	 * moves or renames, with a trailing slash if the OS
+	 * FS Event contains sufficient information, such as
+	 * MacOS.
+	 *
+	 * Use this to invalidate the entire cone under that
+	 * directory.
+	 *
+	 * We do not expect an exact match because the index
+	 * does not normally contain directory entries, so we
+	 * start at the insertion point and scan.
+	 */
+	if (pos < 0)
+		pos = -pos - 1;
+
+	/* Mark all entries for the folder invalid */
+	for (i = pos; i < istate->cache_nr; i++) {
+		if (!starts_with(istate->cache[i]->name, name))
+			break;
+		istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
+	}
+}
+
 static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 {
 	int i, len = strlen(name);
@@ -193,28 +222,7 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 			 name, pos);
 
 	if (name[len - 1] == '/') {
-		/*
-		 * The daemon can decorate directory events, such as
-		 * moves or renames, with a trailing slash if the OS
-		 * FS Event contains sufficient information, such as
-		 * MacOS.
-		 *
-		 * Use this to invalidate the entire cone under that
-		 * directory.
-		 *
-		 * We do not expect an exact match because the index
-		 * does not normally contain directory entries, so we
-		 * start at the insertion point and scan.
-		 */
-		if (pos < 0)
-			pos = -pos - 1;
-
-		/* Mark all entries for the folder invalid */
-		for (i = pos; i < istate->cache_nr; i++) {
-			if (!starts_with(istate->cache[i]->name, name))
-				break;
-			istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
-		}
+		handle_path_with_trailing_slash(istate, name, pos);
 
 		/*
 		 * We need to remove the traling "/" from the path
-- 
gitgitgadget



* [PATCH v2 05/16] fsmonitor: clarify handling of directory events in callback helper
  2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                     ` (3 preceding siblings ...)
  2024-02-23  3:18   ` [PATCH v2 04/16] fsmonitor: refactor refresh callback on directory events Jeff Hostetler via GitGitGadget
@ 2024-02-23  3:18   ` Jeff Hostetler via GitGitGadget
  2024-02-23  3:18   ` [PATCH v2 06/16] fsmonitor: refactor refresh callback for non-directory events Jeff Hostetler via GitGitGadget
                     ` (11 subsequent siblings)
  16 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-23  3:18 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Improve documentation of the refresh callback helper function
used for directory FSEvents.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index 6fecae9aeb2..29cce32d81c 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -183,24 +183,35 @@ static int query_fsmonitor_hook(struct repository *r,
 	return result;
 }
 
+/*
+ * The daemon can decorate directory events, such as a move or rename,
+ * by adding a trailing slash to the observed name.  Use this to
+ * explicitly invalidate the entire cone under that directory.
+ *
+ * The daemon can only reliably do that if the OS FSEvent contains
+ * sufficient information in the event.
+ *
+ * macOS FSEvents have enough information.
+ *
+ * Other platforms may or may not be able to do it (and it might
+ * depend on the type of event (for example, a daemon could lstat() an
+ * observed pathname after a rename, but not after a delete)).
+ *
+ * If we find an exact match in the index for a path with a trailing
+ * slash, it means that we matched a sparse-index directory in a
+ * cone-mode sparse-checkout (since that's the only time we have
+ * directories in the index).  We should never see this in practice
+ * (because sparse directories should not be present and therefore
+ * not generating FS events).  Either way, we can treat them in the
+ * same way and just invalidate the cache-entry and the untracked
+ * cache (and in this case, the forward cache-entry scan won't find
+ * anything and it doesn't hurt to let it run).
+ */
 static void handle_path_with_trailing_slash(
 	struct index_state *istate, const char *name, int pos)
 {
 	int i;
 
-	/*
-	 * The daemon can decorate directory events, such as
-	 * moves or renames, with a trailing slash if the OS
-	 * FS Event contains sufficient information, such as
-	 * MacOS.
-	 *
-	 * Use this to invalidate the entire cone under that
-	 * directory.
-	 *
-	 * We do not expect an exact match because the index
-	 * does not normally contain directory entries, so we
-	 * start at the insertion point and scan.
-	 */
 	if (pos < 0)
 		pos = -pos - 1;
 
-- 
gitgitgadget



* [PATCH v2 06/16] fsmonitor: refactor refresh callback for non-directory events
  2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                     ` (4 preceding siblings ...)
  2024-02-23  3:18   ` [PATCH v2 05/16] fsmonitor: clarify handling of directory events in callback helper Jeff Hostetler via GitGitGadget
@ 2024-02-23  3:18   ` Jeff Hostetler via GitGitGadget
  2024-02-23  8:18     ` Junio C Hamano
  2024-02-25 12:30     ` Torsten Bögershausen
  2024-02-23  3:18   ` [PATCH v2 07/16] dir: create untracked_cache_invalidate_trimmed_path() Jeff Hostetler via GitGitGadget
                     ` (10 subsequent siblings)
  16 siblings, 2 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-23  3:18 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Move the code to handle unqualified FSEvents (without a trailing slash)
into a helper function.
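
The sibling-skipping rule in the moved code is easy to misread, so here
is a minimal sketch of it under toy assumptions: names that sort between
"name" and "name/" (such as "name-" and "name.") are passed over, and
the scan stops at the first byte after the prefix that sorts above '/'.
The `toy_` types and names are hypothetical, and the returned count is a
convenience of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_VALID 1u	/* hypothetical stand-in for CE_FSMONITOR_VALID */

struct toy_entry {
	const char *name;
	unsigned int flags;
};

/*
 * 'pos' >= 0: exact match, invalidate that one entry.
 * 'pos' <  0: treat 'name' as a possible directory and invalidate
 *             only entries of the form "name/...".
 * Returns the number of entries invalidated.
 */
static size_t toy_invalidate_maybe_dir(struct toy_entry *entries, size_t nr,
				       const char *name, int pos)
{
	size_t i, n = 0, len;

	assert(entries != NULL && name != NULL);
	if (pos >= 0) {
		entries[pos].flags &= ~TOY_VALID;
		return 1;
	}

	len = strlen(name);
	for (i = (size_t)(-pos - 1); i < nr; i++) {
		const char *ename = entries[i].name;

		if (strncmp(ename, name, len))
			break;		/* left the "name*" range */
		if ((unsigned char)ename[len] > '/')
			break;		/* past every "name/..." entry */
		if (ename[len] == '/') {
			entries[i].flags &= ~TOY_VALID;
			n++;
		}
	}
	return n;
}
```

Given sorted entries "dir1-old", "dir1.txt", "dir1/file1" and "dir10",
an event for "dir1" with insertion point 1 (pos -2) would invalidate
only "dir1/file1".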

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 67 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 39 insertions(+), 28 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index 29cce32d81c..364198d258f 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -183,6 +183,43 @@ static int query_fsmonitor_hook(struct repository *r,
 	return result;
 }
 
+static void handle_path_without_trailing_slash(
+	struct index_state *istate, const char *name, int pos)
+{
+	int i;
+
+	if (pos >= 0) {
+		/*
+		 * We have an exact match for this path and can just
+		 * invalidate it.
+		 */
+		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
+	} else {
+		/*
+		 * The path is not a tracked file -or- it is a
+		 * directory event on a platform that cannot
+		 * distinguish between file and directory events in
+		 * the event handler, such as Windows.
+		 *
+		 * Scan as if it is a directory and invalidate the
+		 * cone under it.  (But remember to ignore items
+		 * between "name" and "name/", such as "name-" and
+		 * "name.".)
+		 */
+		int len = strlen(name);
+		pos = -pos - 1;
+
+		for (i = pos; i < istate->cache_nr; i++) {
+			if (!starts_with(istate->cache[i]->name, name))
+				break;
+			if ((unsigned char)istate->cache[i]->name[len] > '/')
+				break;
+			if (istate->cache[i]->name[len] == '/')
+				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
+		}
+	}
+}
+
 /*
  * The daemon can decorate directory events, such as a move or rename,
  * by adding a trailing slash to the observed name.  Use this to
@@ -225,7 +262,7 @@ static void handle_path_with_trailing_slash(
 
 static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 {
-	int i, len = strlen(name);
+	int len = strlen(name);
 	int pos = index_name_pos(istate, name, len);
 
 	trace_printf_key(&trace_fsmonitor,
@@ -240,34 +277,8 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 		 * for the untracked cache.
 		 */
 		name[len - 1] = '\0';
-	} else if (pos >= 0) {
-		/*
-		 * We have an exact match for this path and can just
-		 * invalidate it.
-		 */
-		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
 	} else {
-		/*
-		 * The path is not a tracked file -or- it is a
-		 * directory event on a platform that cannot
-		 * distinguish between file and directory events in
-		 * the event handler, such as Windows.
-		 *
-		 * Scan as if it is a directory and invalidate the
-		 * cone under it.  (But remember to ignore items
-		 * between "name" and "name/", such as "name-" and
-		 * "name.".
-		 */
-		pos = -pos - 1;
-
-		for (i = pos; i < istate->cache_nr; i++) {
-			if (!starts_with(istate->cache[i]->name, name))
-				break;
-			if ((unsigned char)istate->cache[i]->name[len] > '/')
-				break;
-			if (istate->cache[i]->name[len] == '/')
-				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
-		}
+		handle_path_without_trailing_slash(istate, name, pos);
 	}
 
 	/*
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH v2 07/16] dir: create untracked_cache_invalidate_trimmed_path()
  2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                     ` (5 preceding siblings ...)
  2024-02-23  3:18   ` [PATCH v2 06/16] fsmonitor: refactor refresh callback for non-directory events Jeff Hostetler via GitGitGadget
@ 2024-02-23  3:18   ` Jeff Hostetler via GitGitGadget
  2024-02-25 12:35     ` Torsten Bögershausen
  2024-02-23  3:18   ` [PATCH v2 08/16] fsmonitor: refactor untracked-cache invalidation Jeff Hostetler via GitGitGadget
                     ` (9 subsequent siblings)
  16 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-23  3:18 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Create a wrapper function for untracked_cache_invalidate_path()
that silently trims a trailing slash, if present, before calling
the wrapped function.

The untracked cache expects to be called with a pathname that
does not contain a trailing slash.  This can make it inconvenient
for callers that have a directory path.  Let's hide this complexity.

This will be used by a later commit in the FSMonitor code which
may receive directory pathnames from an FSEvent.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
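For readers without the git tree at hand, a rough standalone sketch of
the same idea: trim one trailing slash into a temporary copy so that the
caller's buffer is never modified.  The names call_with_trimmed_path()
and record_path() are hypothetical, and malloc() stands in for the
strbuf used in the patch below.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sink standing in for untracked_cache_invalidate_path(). */
typedef void (*path_fn)(const char *path, void *data);

/* Example sink: remember the last path seen in a 64-byte buffer. */
static void record_path(const char *path, void *data)
{
	snprintf((char *)data, 64, "%s", path);
}

/*
 * Call "fn" with "path" minus one trailing slash, if present.  The
 * caller's string is only read, never written.
 */
static void call_with_trimmed_path(const char *path, path_fn fn, void *data)
{
	size_t len = strlen(path);
	char *tmp;

	if (!len)
		return;
	if (path[len - 1] != '/') {
		fn(path, data);
		return;
	}

	tmp = malloc(len);	/* len - 1 bytes of path plus the NUL */
	if (!tmp)
		return;
	memcpy(tmp, path, len - 1);
	tmp[len - 1] = '\0';
	fn(tmp, data);
	free(tmp);
}
```

Taking a "const char *" is the point of the wrapper: a caller holding a
borrowed pathname can pass it without making its own copy.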
 dir.c | 20 ++++++++++++++++++++
 dir.h |  7 +++++++
 2 files changed, 27 insertions(+)

diff --git a/dir.c b/dir.c
index ac699542302..1157f3e43fa 100644
--- a/dir.c
+++ b/dir.c
@@ -3918,6 +3918,26 @@ void untracked_cache_invalidate_path(struct index_state *istate,
 				 path, strlen(path));
 }
 
+void untracked_cache_invalidate_trimmed_path(struct index_state *istate,
+					     const char *path,
+					     int safe_path)
+{
+	size_t len = strlen(path);
+
+	if (!len)
+		return; /* should not happen */
+
+	if (path[len - 1] != '/') {
+		untracked_cache_invalidate_path(istate, path, safe_path);
+	} else {
+		struct strbuf tmp = STRBUF_INIT;
+
+		strbuf_add(&tmp, path, len - 1);
+		untracked_cache_invalidate_path(istate, tmp.buf, safe_path);
+		strbuf_release(&tmp);
+	}
+}
+
 void untracked_cache_remove_from_index(struct index_state *istate,
 				       const char *path)
 {
diff --git a/dir.h b/dir.h
index 98aa85fcc0e..45a7b9ec5f2 100644
--- a/dir.h
+++ b/dir.h
@@ -576,6 +576,13 @@ int cmp_dir_entry(const void *p1, const void *p2);
 int check_dir_entry_contains(const struct dir_entry *out, const struct dir_entry *in);
 
 void untracked_cache_invalidate_path(struct index_state *, const char *, int safe_path);
+/*
+ * Invalidate the untracked-cache for this path, but first strip
+ * off a trailing slash, if present.
+ */
+void untracked_cache_invalidate_trimmed_path(struct index_state *,
+					     const char *path,
+					     int safe_path);
 void untracked_cache_remove_from_index(struct index_state *, const char *);
 void untracked_cache_add_to_index(struct index_state *, const char *);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH v2 08/16] fsmonitor: refactor untracked-cache invalidation
  2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                     ` (6 preceding siblings ...)
  2024-02-23  3:18   ` [PATCH v2 07/16] dir: create untracked_cache_invalidate_trimmed_path() Jeff Hostetler via GitGitGadget
@ 2024-02-23  3:18   ` Jeff Hostetler via GitGitGadget
  2024-02-23  3:18   ` [PATCH v2 09/16] fsmonitor: move untracked invalidation into helper functions Jeff Hostetler via GitGitGadget
                     ` (8 subsequent siblings)
  16 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-23  3:18 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Update fsmonitor_refresh_callback() to use the new
untracked_cache_invalidate_trimmed_path() to invalidate
the cache using the observed pathname without needing to
modify the caller's buffer.

Previously, we modified the caller's buffer when the observed pathname
contained a trailing slash (and did not restore it).  This wasn't a
problem for the single use-case caller, but felt dirty nonetheless.  In
a later commit we will want to invalidate case-corrected versions of
the pathname (using possibly borrowed pathnames from the name-hash or
dir-name-hash) and we may not want to keep the tradition of altering
the passed-in pathname.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index 364198d258f..2787f7ca5d1 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -271,21 +271,16 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 
 	if (name[len - 1] == '/') {
 		handle_path_with_trailing_slash(istate, name, pos);
-
-		/*
-		 * We need to remove the traling "/" from the path
-		 * for the untracked cache.
-		 */
-		name[len - 1] = '\0';
 	} else {
 		handle_path_without_trailing_slash(istate, name, pos);
 	}
 
 	/*
 	 * Mark the untracked cache dirty even if it wasn't found in the index
-	 * as it could be a new untracked file.
+	 * as it could be a new untracked file.  (Let the untracked cache
+	 * layer silently deal with any trailing slash.)
 	 */
-	untracked_cache_invalidate_path(istate, name, 0);
+	untracked_cache_invalidate_trimmed_path(istate, name, 0);
 }
 
 /*
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH v2 09/16] fsmonitor: move untracked invalidation into helper functions
  2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                     ` (7 preceding siblings ...)
  2024-02-23  3:18   ` [PATCH v2 08/16] fsmonitor: refactor untracked-cache invalidation Jeff Hostetler via GitGitGadget
@ 2024-02-23  3:18   ` Jeff Hostetler via GitGitGadget
  2024-02-23 17:36     ` Junio C Hamano
  2024-02-23  3:18   ` [PATCH v2 10/16] fsmonitor: return invalidated cache-entry count on directory event Jeff Hostetler via GitGitGadget
                     ` (7 subsequent siblings)
  16 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-23  3:18 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Move the call to invalidate the untracked cache for the FSEvent
pathname into the two helper functions.

In a later commit in this series, we will call these helpers
from other contexts and it is safer to include the UC invalidation
in the helper than to remember to also add it to each helper
call-site.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index 2787f7ca5d1..2f58ee2fe5a 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -188,6 +188,16 @@ static void handle_path_without_trailing_slash(
 {
 	int i;
 
+	/*
+	 * Mark the untracked cache dirty for this path (regardless of
+	 * whether or not we find an exact match for it in the index).
+	 * Since the path is unqualified (no trailing slash hint in the
+	 * FSEvent), it may refer to a file or directory. So we should
+	 * not assume one or the other and should always let the untracked
+	 * cache decide what needs to be invalidated.
+	 */
+	untracked_cache_invalidate_trimmed_path(istate, name, 0);
+
 	if (pos >= 0) {
 		/*
 		 * We have an exact match for this path and can just
@@ -249,6 +259,15 @@ static void handle_path_with_trailing_slash(
 {
 	int i;
 
+	/*
+	 * Mark the untracked cache dirty for this directory path
+	 * (regardless of whether or not we find an exact match for it
+	 * in the index or find it to be a proper prefix of one or more
+	 * files in the index), since the FSEvent is hinting that
+	 * there may be changes on or within the directory.
+	 */
+	untracked_cache_invalidate_trimmed_path(istate, name, 0);
+
 	if (pos < 0)
 		pos = -pos - 1;
 
@@ -274,13 +293,6 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 	} else {
 		handle_path_without_trailing_slash(istate, name, pos);
 	}
-
-	/*
-	 * Mark the untracked cache dirty even if it wasn't found in the index
-	 * as it could be a new untracked file.  (Let the untracked cache
-	 * layer silently deal with any trailing slash.)
-	 */
-	untracked_cache_invalidate_trimmed_path(istate, name, 0);
 }
 
 /*
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH v2 10/16] fsmonitor: return invalidated cache-entry count on directory event
  2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                     ` (8 preceding siblings ...)
  2024-02-23  3:18   ` [PATCH v2 09/16] fsmonitor: move untracked invalidation into helper functions Jeff Hostetler via GitGitGadget
@ 2024-02-23  3:18   ` Jeff Hostetler via GitGitGadget
  2024-02-23  3:18   ` [PATCH v2 11/16] fsmonitor: remove custom loop from non-directory path handler Jeff Hostetler via GitGitGadget
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-23  3:18 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Teach the refresh callback helper function for directory FSEvents to
return the number of cache-entries that were invalidated in response
to a directory event.

This will be used in a later commit to help determine if the observed
pathname in the FSEvent was a (possibly) case-incorrect directory
prefix (on a case-insensitive filesystem) of one or more actual
cache-entries.

If there exists at least one case-insensitive prefix match, then we
can assume that the directory is a (case-incorrect) prefix of at least
one tracked item rather than a completely unknown/untracked file or
directory.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
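A small standalone sketch of what the returned count tells the caller.
It is an illustration only: "struct entry" and FLAG_VALID are
hypothetical stand-ins for the cache-entry and CE_FSMONITOR_VALID, and
"pos" follows the index_name_pos() convention used in the diff
(non-negative on an exact match, otherwise -(insertion point) - 1).

```c
#include <stddef.h>
#include <string.h>

#define FLAG_VALID 0x1u	/* hypothetical stand-in for CE_FSMONITOR_VALID */

struct entry {
	const char *name;
	unsigned int flags;
};

/*
 * "dir" must end in '/'.  Clear FLAG_VALID on every entry that has
 * "dir" as a prefix, starting at "pos", and return how many entries
 * were visited in that cone.
 */
static size_t invalidate_dir_cone(struct entry *e, size_t nr,
				  const char *dir, int pos)
{
	size_t dirlen = strlen(dir);
	size_t i, nr_in_cone = 0;

	if (pos < 0)
		pos = -pos - 1;	/* decode the suggested insertion point */

	for (i = (size_t)pos; i < nr; i++) {
		if (strncmp(e[i].name, dir, dirlen))
			break;
		e[i].flags &= ~FLAG_VALID;
		nr_in_cone++;
	}
	return nr_in_cone;
}
```

A result of zero for "DIR/" against an index holding "dir/x" and "dir/y"
is the ambiguous case described above: the path is either untracked or
spelled with the wrong case.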
 fsmonitor.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index 2f58ee2fe5a..9424bd17230 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -253,11 +253,20 @@ static void handle_path_without_trailing_slash(
  * same way and just invalidate the cache-entry and the untracked
  * cache (and in this case, the forward cache-entry scan won't find
  * anything and it doesn't hurt to let it run).
+ *
+ * Return the number of cache-entries that we invalidated.  We will
+ * use this later to determine if we need to attempt a second
+ * case-insensitive search on case-insensitive file systems.  That is,
+ * if the search using the observed-case in the FSEvent yields any
+ * results, we assume the prefix is case-correct.  If there are no
+ * matches, we still don't know if the observed path is simply
+ * untracked or case-incorrect.
  */
-static void handle_path_with_trailing_slash(
+static size_t handle_path_with_trailing_slash(
 	struct index_state *istate, const char *name, int pos)
 {
 	int i;
+	size_t nr_in_cone = 0;
 
 	/*
 	 * Mark the untracked cache dirty for this directory path
@@ -276,7 +285,10 @@ static void handle_path_with_trailing_slash(
 		if (!starts_with(istate->cache[i]->name, name))
 			break;
 		istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
+		nr_in_cone++;
 	}
+
+	return nr_in_cone;
 }
 
 static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH v2 11/16] fsmonitor: remove custom loop from non-directory path handler
  2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                     ` (9 preceding siblings ...)
  2024-02-23  3:18   ` [PATCH v2 10/16] fsmonitor: return invalidated cache-entry count on directory event Jeff Hostetler via GitGitGadget
@ 2024-02-23  3:18   ` Jeff Hostetler via GitGitGadget
  2024-02-23 17:47     ` Junio C Hamano
  2024-02-23  3:18   ` [PATCH v2 12/16] fsmonitor: return invalided cache-entry count on non-directory event Jeff Hostetler via GitGitGadget
                     ` (5 subsequent siblings)
  16 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-23  3:18 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Refactor the code that handles refresh events for pathnames that do
not contain a trailing slash.  Instead of using a custom loop to try
to scan the index and detect if the FSEvent named a file or might be a
directory prefix, use the recently created helper function to do that.

Also update the comments to describe what and why we are doing this.

On platforms that DO NOT annotate FS events with a trailing
slash, if we fail to find an exact match for the pathname
in the index, we do not know if the pathname represents a
directory or simply an untracked file.  Pretend that the pathname
is a directory and try again before assuming it is an untracked
file.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
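To illustrate why the second lookup with an appended slash is simpler
than scanning forward from the first insertion point, here is a
standalone sketch.  name_pos() is a hypothetical binary search that
mimics the index_name_pos() return convention over a sorted array of
strings; it is not git's implementation.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the position of "name" in the sorted array, or
 * -(insertion point) - 1 if it is not present.
 */
static int name_pos(const char **names, int nr, const char *name)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}

/*
 * Pretend "name" is a directory: look up "name/" and return the
 * position where its cone would start, or -1 if the path does not
 * fit in this sketch's fixed buffer.
 */
static int dir_cone_start(const char **names, int nr, const char *name)
{
	char dir[4096];
	int n = snprintf(dir, sizeof(dir), "%s/", name);
	int pos;

	if (n < 0 || (size_t)n >= sizeof(dir))
		return -1;
	pos = name_pos(names, nr, dir);
	return pos < 0 ? -pos - 1 : pos;
}
```

Given "foo-", "foo-bar", "foo/a", "foo/b" and "fop", the insertion point
for "foo" is 0, in front of the unrelated "foo-" entries, while the
insertion point for "foo/" is 2, which is the first entry of the cone.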
 fsmonitor.c | 55 ++++++++++++++++++++++++++++++-----------------------
 1 file changed, 31 insertions(+), 24 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index 9424bd17230..a51c17cda70 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -183,11 +183,23 @@ static int query_fsmonitor_hook(struct repository *r,
 	return result;
 }
 
+static size_t handle_path_with_trailing_slash(
+	struct index_state *istate, const char *name, int pos);
+
+/*
+ * The daemon sent an observed pathname without a trailing slash.
+ * (This is the normal case.)  We do not know if it is a tracked or
+ * untracked file, a sparse-directory, or a populated directory (on a
+ * platform such as Windows where FSEvents are not qualified).
+ *
+ * The pathname contains the observed case reported by the FS. We
+ * do not know it is case-correct or -incorrect.
+ *
+ * Assume it is case-correct and try an exact match.
+ */
 static void handle_path_without_trailing_slash(
 	struct index_state *istate, const char *name, int pos)
 {
-	int i;
-
 	/*
 	 * Mark the untracked cache dirty for this path (regardless of
 	 * whether or not we find an exact match for it in the index).
@@ -200,33 +212,28 @@ static void handle_path_without_trailing_slash(
 
 	if (pos >= 0) {
 		/*
-		 * We have an exact match for this path and can just
-		 * invalidate it.
+		 * An exact match on a tracked file. We assume that we
+		 * do not need to scan forward for a sparse-directory
+		 * cache-entry with the same pathname, nor for a cone
+		 * at that directory. (That is, assume no D/F conflicts.)
 		 */
 		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
 	} else {
+		struct strbuf work_path = STRBUF_INIT;
+
 		/*
-		 * The path is not a tracked file -or- it is a
-		 * directory event on a platform that cannot
-		 * distinguish between file and directory events in
-		 * the event handler, such as Windows.
-		 *
-		 * Scan as if it is a directory and invalidate the
-		 * cone under it.  (But remember to ignore items
-		 * between "name" and "name/", such as "name-" and
-		 * "name.".
+		 * The negative "pos" gives us the suggested insertion
+		 * point for the pathname (without the trailing slash).
+		 * We need to see if there is a directory with that
+		 * prefix, but there can be lots of pathnames between
+		 * "foo" and "foo/" like "foo-" or "foo-bar", so we
+		 * don't want to do our own scan.
 		 */
-		int len = strlen(name);
-		pos = -pos - 1;
-
-		for (i = pos; i < istate->cache_nr; i++) {
-			if (!starts_with(istate->cache[i]->name, name))
-				break;
-			if ((unsigned char)istate->cache[i]->name[len] > '/')
-				break;
-			if (istate->cache[i]->name[len] == '/')
-				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
-		}
+		strbuf_add(&work_path, name, strlen(name));
+		strbuf_addch(&work_path, '/');
+		pos = index_name_pos(istate, work_path.buf, work_path.len);
+		handle_path_with_trailing_slash(istate, work_path.buf, pos);
+		strbuf_release(&work_path);
 	}
 }
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH v2 12/16] fsmonitor: return invalided cache-entry count on non-directory event
  2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                     ` (10 preceding siblings ...)
  2024-02-23  3:18   ` [PATCH v2 11/16] fsmonitor: remove custom loop from non-directory path handler Jeff Hostetler via GitGitGadget
@ 2024-02-23  3:18   ` Jeff Hostetler via GitGitGadget
  2024-02-23 17:51     ` Junio C Hamano
  2024-02-23  3:18   ` [PATCH v2 13/16] fsmonitor: trace the new invalidated cache-entry count Jeff Hostetler via GitGitGadget
                     ` (4 subsequent siblings)
  16 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-23  3:18 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Teach the refresh callback helper function for unqualified FSEvents
(pathnames without a trailing slash) to return the number of
cache-entries that were invalidated in response to the event.

This will be used in a later commit to help determine if the observed
pathname was (possibly) case-incorrect (when on a case-insensitive
file system).

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index a51c17cda70..c16ed5d8758 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -196,8 +196,10 @@ static size_t handle_path_with_trailing_slash(
  * do not know it is case-correct or -incorrect.
  *
  * Assume it is case-correct and try an exact match.
+ *
+ * Return the number of cache-entries that we invalidated.
  */
-static void handle_path_without_trailing_slash(
+static size_t handle_path_without_trailing_slash(
 	struct index_state *istate, const char *name, int pos)
 {
 	/*
@@ -218,7 +220,9 @@ static void handle_path_without_trailing_slash(
 		 * at that directory. (That is, assume no D/F conflicts.)
 		 */
 		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
+		return 1;
 	} else {
+		size_t nr_in_cone;
 		struct strbuf work_path = STRBUF_INIT;
 
 		/*
@@ -232,8 +236,10 @@ static void handle_path_without_trailing_slash(
 		strbuf_add(&work_path, name, strlen(name));
 		strbuf_addch(&work_path, '/');
 		pos = index_name_pos(istate, work_path.buf, work_path.len);
-		handle_path_with_trailing_slash(istate, work_path.buf, pos);
+		nr_in_cone = handle_path_with_trailing_slash(
+			istate, work_path.buf, pos);
 		strbuf_release(&work_path);
+		return nr_in_cone;
 	}
 }
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH v2 13/16] fsmonitor: trace the new invalidated cache-entry count
  2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                     ` (11 preceding siblings ...)
  2024-02-23  3:18   ` [PATCH v2 12/16] fsmonitor: return invalided cache-entry count on non-directory event Jeff Hostetler via GitGitGadget
@ 2024-02-23  3:18   ` Jeff Hostetler via GitGitGadget
  2024-02-23 17:53     ` Junio C Hamano
  2024-02-23  3:18   ` [PATCH v2 14/16] fsmonitor: support case-insensitive events Jeff Hostetler via GitGitGadget
                     ` (3 subsequent siblings)
  16 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-23  3:18 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Consolidate the directory/non-directory calls to the refresh handler
code.  Log the resulting count of invalidated cache-entries.

The nr_in_cone value will be used in a later commit to decide if
we also need to try to do case-insensitive lookups.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index c16ed5d8758..739ddbf7aca 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -308,16 +308,21 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 {
 	int len = strlen(name);
 	int pos = index_name_pos(istate, name, len);
+	size_t nr_in_cone;
 
 	trace_printf_key(&trace_fsmonitor,
 			 "fsmonitor_refresh_callback '%s' (pos %d)",
 			 name, pos);
 
-	if (name[len - 1] == '/') {
-		handle_path_with_trailing_slash(istate, name, pos);
-	} else {
-		handle_path_without_trailing_slash(istate, name, pos);
-	}
+	if (name[len - 1] == '/')
+		nr_in_cone = handle_path_with_trailing_slash(istate, name, pos);
+	else
+		nr_in_cone = handle_path_without_trailing_slash(istate, name, pos);
+
+	if (nr_in_cone)
+		trace_printf_key(&trace_fsmonitor,
+				 "fsmonitor_refresh_callback CNT: %d",
+				 (int)nr_in_cone);
 }
 
 /*
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH v2 14/16] fsmonitor: support case-insensitive events
  2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                     ` (12 preceding siblings ...)
  2024-02-23  3:18   ` [PATCH v2 13/16] fsmonitor: trace the new invalidated cache-entry count Jeff Hostetler via GitGitGadget
@ 2024-02-23  3:18   ` Jeff Hostetler via GitGitGadget
  2024-02-23 18:14     ` Junio C Hamano
  2024-02-25 13:10     ` Torsten Bögershausen
  2024-02-23  3:18   ` [PATCH v2 15/16] fsmonitor: refactor bit invalidation in refresh callback Jeff Hostetler via GitGitGadget
                     ` (2 subsequent siblings)
  16 siblings, 2 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-23  3:18 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Teach fsmonitor_refresh_callback() to handle case-insensitive
lookups if case-sensitive lookups fail on case-insensitive systems.
Without this, 'git status' can report stale status for files if there
are case issues/errors in the worktree.

The FSMonitor daemon sends FSEvents using the observed spelling
of each pathname.  On case-insensitive file systems this may be
different than the expected case spelling.

The existing code uses index_name_pos() to find the cache-entry for
the pathname in the FSEvent and clear the CE_FSMONITOR_VALID bit so
that the worktree scan/index refresh will revisit and revalidate the
path.

On a case-insensitive file system, the exact match lookup may fail
to find the associated cache-entry. This causes status to think that
the cached CE flags are correct and skip over the file.

Update event handling to optionally use the name-hash and dir-name-hash
if necessary.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
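The lookup order can be sketched in isolation as follows.  This is only
an illustration under loud assumptions: find_entry() and equal_icase()
are hypothetical names, the linear scans stand in for index_name_pos()
and the name-hash hashmap, and the case folding is ASCII-only.

```c
#include <ctype.h>
#include <string.h>

/* ASCII-only case-insensitive equality; real file systems fold more. */
static int equal_icase(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;
}

/*
 * Return the position of the entry matching the observed pathname, or
 * -1.  An exact match is always tried first; the case-insensitive pass
 * only runs when "ignore_case" is set and the exact pass found nothing.
 */
static int find_entry(const char **names, int nr, const char *observed,
		      int ignore_case)
{
	int i;

	for (i = 0; i < nr; i++)
		if (!strcmp(names[i], observed))
			return i;

	if (!ignore_case)
		return -1;

	for (i = 0; i < nr; i++)
		if (equal_icase(names[i], observed))
			return i;

	return -1;
}
```

The entry that is found carries the expected spelling (for example
"Dir1/File.txt" for an observed "dir1/file.txt"), and that spelling is
what the patch uses for the follow-up invalidation.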
 fsmonitor.c | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 110 insertions(+)

diff --git a/fsmonitor.c b/fsmonitor.c
index 739ddbf7aca..ac638a61c00 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -5,6 +5,7 @@
 #include "ewah/ewok.h"
 #include "fsmonitor.h"
 #include "fsmonitor-ipc.h"
+#include "name-hash.h"
 #include "run-command.h"
 #include "strbuf.h"
 #include "trace2.h"
@@ -186,6 +187,102 @@ static int query_fsmonitor_hook(struct repository *r,
 static size_t handle_path_with_trailing_slash(
 	struct index_state *istate, const char *name, int pos);
 
+/*
+ * Use the name-hash to do a case-insensitive cache-entry lookup with
+ * the pathname and invalidate the cache-entry.
+ *
+ * Returns the number of cache-entries that we invalidated.
+ */
+static size_t handle_using_name_hash_icase(
+	struct index_state *istate, const char *name)
+{
+	struct cache_entry *ce = NULL;
+
+	ce = index_file_exists(istate, name, strlen(name), 1);
+	if (!ce)
+		return 0;
+
+	/*
+	 * A case-insensitive search in the name-hash using the
+	 * observed pathname found a cache-entry, so the observed path
+	 * is case-incorrect.  Invalidate the cache-entry and use the
+	 * correct spelling from the cache-entry to invalidate the
+	 * untracked-cache.  Since we now have sparse-directories in
+	 * the index, the observed pathname may represent a regular
+	 * file or a sparse-index directory.
+	 *
+	 * Note that we should not have seen FSEvents for a
+	 * sparse-index directory, but we handle it just in case.
+	 *
+	 * Either way, we know that there are not any cache-entries for
+	 * children inside the cone of the directory, so we don't need to
+	 * do the usual scan.
+	 */
+	trace_printf_key(&trace_fsmonitor,
+			 "fsmonitor_refresh_callback MAP: '%s' '%s'",
+			 name, ce->name);
+
+	untracked_cache_invalidate_trimmed_path(istate, ce->name, 0);
+
+	ce->ce_flags &= ~CE_FSMONITOR_VALID;
+	return 1;
+}
+
+/*
+ * Use the dir-name-hash to find the correct-case spelling of the
+ * directory.  Use the canonical spelling to invalidate all of the
+ * cache-entries within the matching cone.
+ *
+ * Returns the number of cache-entries that we invalidated.
+ */
+static size_t handle_using_dir_name_hash_icase(
+	struct index_state *istate, const char *name)
+{
+	struct strbuf canonical_path = STRBUF_INIT;
+	int pos;
+	size_t len = strlen(name);
+	size_t nr_in_cone;
+
+	if (name[len - 1] == '/')
+		len--;
+
+	if (!index_dir_find(istate, name, len, &canonical_path))
+		return 0; /* name is untracked */
+
+	if (!memcmp(name, canonical_path.buf, canonical_path.len)) {
+		strbuf_release(&canonical_path);
+		/*
+		 * NEEDSWORK: Our caller already tried an exact match
+		 * and failed to find one.  They called us to do an
+		 * ICASE match, so we should never get an exact match,
+		 * so we could promote this to a BUG() here if we
+		 * wanted to.  It doesn't hurt anything to just return
+		 * 0 and go on because we should never get here.  Or we
+		 * could just get rid of the memcmp() and this "if"
+		 * clause completely.
+		 */
+		return 0; /* should not happen */
+	}
+
+	trace_printf_key(&trace_fsmonitor,
+			 "fsmonitor_refresh_callback MAP: '%s' '%s'",
+			 name, canonical_path.buf);
+
+	/*
+	 * The dir-name-hash only tells us the corrected spelling of
+	 * the prefix.  We have to use this canonical path to do a
+	 * lookup in the cache-entry array so that we repeat the
+	 * original search using the case-corrected spelling.
+	 */
+	strbuf_addch(&canonical_path, '/');
+	pos = index_name_pos(istate, canonical_path.buf,
+			     canonical_path.len);
+	nr_in_cone = handle_path_with_trailing_slash(
+		istate, canonical_path.buf, pos);
+	strbuf_release(&canonical_path);
+	return nr_in_cone;
+}
+
 /*
  * The daemon sent an observed pathname without a trailing slash.
  * (This is the normal case.)  We do not know if it is a tracked or
@@ -319,6 +416,19 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 	else
 		nr_in_cone = handle_path_without_trailing_slash(istate, name, pos);
 
+	/*
+	 * If we did not find an exact match for this pathname or any
+	 * cache-entries with this directory prefix and we're on a
+	 * case-insensitive file system, try again using the name-hash
+	 * and dir-name-hash.
+	 */
+	if (!nr_in_cone && ignore_case) {
+		nr_in_cone = handle_using_name_hash_icase(istate, name);
+		if (!nr_in_cone)
+			nr_in_cone = handle_using_dir_name_hash_icase(
+				istate, name);
+	}
+
 	if (nr_in_cone)
 		trace_printf_key(&trace_fsmonitor,
 				 "fsmonitor_refresh_callback CNT: %d",
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH v2 15/16] fsmonitor: refactor bit invalidation in refresh callback
  2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                     ` (13 preceding siblings ...)
  2024-02-23  3:18   ` [PATCH v2 14/16] fsmonitor: support case-insensitive events Jeff Hostetler via GitGitGadget
@ 2024-02-23  3:18   ` Jeff Hostetler via GitGitGadget
  2024-02-23 18:18     ` Junio C Hamano
  2024-02-23  3:18   ` [PATCH v2 16/16] t7527: update case-insenstive fsmonitor test Jeff Hostetler via GitGitGadget
  2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
  16 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-23  3:18 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Refactor code in the fsmonitor_refresh_callback() call chain dealing
with invalidating the CE_FSMONITOR_VALID bit and add a trace message.

During the refresh, we clear the CE_FSMONITOR_VALID bit in response to
data from the FSMonitor daemon (so that a later phase will lstat() and
verify the true state of the file).

Create a new function to clear the bit and add some unique tracing for
it to help debug edge cases.

This is similar to the existing `mark_fsmonitor_invalid()` function,
but we don't need the extra stuff that it does.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index ac638a61c00..0667a8c297c 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -187,6 +187,20 @@ static int query_fsmonitor_hook(struct repository *r,
 static size_t handle_path_with_trailing_slash(
 	struct index_state *istate, const char *name, int pos);
 
+/*
+ * Invalidate the FSM bit on this CE.  This is like mark_fsmonitor_invalid()
+ * but we've already handled the untracked-cache and I want a different
+ * trace message.
+ */
+static void invalidate_ce_fsm(struct cache_entry *ce)
+{
+	if (ce->ce_flags & CE_FSMONITOR_VALID)
+		trace_printf_key(&trace_fsmonitor,
+				 "fsmonitor_refresh_callback INV: '%s'",
+				 ce->name);
+	ce->ce_flags &= ~CE_FSMONITOR_VALID;
+}
+
 /*
  * Use the name-hash to do a case-insensitive cache-entry lookup with
  * the pathname and invalidate the cache-entry.
@@ -224,7 +238,7 @@ static size_t handle_using_name_hash_icase(
 
 	untracked_cache_invalidate_trimmed_path(istate, ce->name, 0);
 
-	ce->ce_flags &= ~CE_FSMONITOR_VALID;
+	invalidate_ce_fsm(ce);
 	return 1;
 }
 
@@ -316,7 +330,7 @@ static size_t handle_path_without_trailing_slash(
 		 * cache-entry with the same pathname, nor for a cone
 		 * at that directory. (That is, assume no D/F conflicts.)
 		 */
-		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
+		invalidate_ce_fsm(istate->cache[pos]);
 		return 1;
 	} else {
 		size_t nr_in_cone;
@@ -394,7 +408,7 @@ static size_t handle_path_with_trailing_slash(
 	for (i = pos; i < istate->cache_nr; i++) {
 		if (!starts_with(istate->cache[i]->name, name))
 			break;
-		istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
+		invalidate_ce_fsm(istate->cache[i]);
 		nr_in_cone++;
 	}
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH v2 16/16] t7527: update case-insenstive fsmonitor test
  2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                     ` (14 preceding siblings ...)
  2024-02-23  3:18   ` [PATCH v2 15/16] fsmonitor: refactor bit invalidation in refresh callback Jeff Hostetler via GitGitGadget
@ 2024-02-23  3:18   ` Jeff Hostetler via GitGitGadget
  2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
  16 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-23  3:18 UTC (permalink / raw
  To: git; +Cc: Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler,
	Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Now that the FSMonitor client has been updated to better
handle events on case-insensitive file systems, update the
two tests that demonstrated the bug and remove the temporary
SKIPME prereq.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 t/t7527-builtin-fsmonitor.sh | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
index 4acb547819c..939521a0fac 100755
--- a/t/t7527-builtin-fsmonitor.sh
+++ b/t/t7527-builtin-fsmonitor.sh
@@ -1051,7 +1051,7 @@ test_expect_success 'split-index and FSMonitor work well together' '
 #
 # The setup is a little contrived.
 #
-test_expect_success SKIPME,CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
+test_expect_success CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
 	test_when_finished "stop_daemon_delete_repo subdir_case_wrong" &&
 
 	git init subdir_case_wrong &&
@@ -1116,19 +1116,20 @@ test_expect_success SKIPME,CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on d
 
 	grep -q "dir1/DIR2/dir3/file3.*pos -3" "$PWD/subdir_case_wrong.log1" &&
 
+	# Also verify that we get a mapping event to correct the case.
+	grep -q "MAP:.*dir1/DIR2/dir3/file3.*dir1/dir2/dir3/file3" \
+		"$PWD/subdir_case_wrong.log1" &&
+
 	# The refresh-callbacks should have caused "git status" to clear
 	# the CE_FSMONITOR_VALID bit on each of those files and caused
 	# the worktree scan to visit them and mark them as modified.
 	grep -q " M AAA" "$PWD/subdir_case_wrong.out" &&
 	grep -q " M zzz" "$PWD/subdir_case_wrong.out" &&
 
-	# However, with the fsmonitor client bug, the "(pos -3)" causes
-	# the client to not update the bit and never rescan the file
-	# and therefore not report it as dirty.
-	! grep -q " M dir1/dir2/dir3/file3" "$PWD/subdir_case_wrong.out"
+	grep -q " M dir1/dir2/dir3/file3" "$PWD/subdir_case_wrong.out"
 '
 
-test_expect_success SKIPME,CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
+test_expect_success CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
 	test_when_finished "stop_daemon_delete_repo file_case_wrong" &&
 
 	git init file_case_wrong &&
@@ -1246,12 +1247,14 @@ test_expect_success SKIPME,CASE_INSENSITIVE_FS 'fsmonitor file case wrong on dis
 	grep -q "fsmonitor_refresh_callback.*FILE-3-A.*pos -3" "$PWD/file_case_wrong-try3.log" &&
 	grep -q "fsmonitor_refresh_callback.*file-4-a.*pos -9" "$PWD/file_case_wrong-try3.log" &&
 
-	# Status should say these files are modified, but with the case
-	# bug, the "pos -3" cause the client to not update the FSM bit
-	# and never cause the file to be rescanned and therefore to not
-	# report it dirty.
-	! grep -q " M dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-try3.out" &&
-	! grep -q " M dir1/dir2/dir4/FILE-4-A" "$PWD/file_case_wrong-try3.out"
+	# Also verify that we get a mapping event to correct the case.
+	grep -q "fsmonitor_refresh_callback MAP:.*dir1/dir2/dir3/FILE-3-A.*dir1/dir2/dir3/file-3-a" \
+		"$PWD/file_case_wrong-try3.log" &&
+	grep -q "fsmonitor_refresh_callback MAP:.*dir1/dir2/dir4/file-4-a.*dir1/dir2/dir4/FILE-4-A" \
+		"$PWD/file_case_wrong-try3.log" &&
+
+	grep -q " M dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-try3.out" &&
+	grep -q " M dir1/dir2/dir4/FILE-4-A" "$PWD/file_case_wrong-try3.out"
 '
 
 test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 01/16] name-hash: add index_dir_find()
  2024-02-23  3:18   ` [PATCH v2 01/16] name-hash: add index_dir_find() Jeff Hostetler via GitGitGadget
@ 2024-02-23  6:37     ` Junio C Hamano
  0 siblings, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2024-02-23  6:37 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Replace the index_dir_exists() function with index_dir_find() and
> change the API to take an optional strbuf to return the canonical
> spelling of the matched directory prefix.
>
> Create an index_dir_exists() wrapper macro for existing callers.
>
> The existing index_dir_exists() returns a boolean to indicate if
> there is a case-insensitive match in the directory name-hash, but
> it doesn't tell the caller the exact spelling of that match.
>
> The new version also copies the matched spelling to a provided strbuf.
> This lets the caller, for example, then call index_name_pos() with the
> correct case to search the cache-entry array for the real insertion
> position.

The usual way to compose a log message of this project is to

 - Give an observation on how the current system works in the present
   tense (so no need to say "Currently X is Y", just "X is Y"), and
   discuss what you perceive as a problem in it.

 - Propose a solution (optional---often, problem description
   trivially leads to an obvious solution in readers' minds).

 - Give commands to the codebase to "become like so".

in this order.

I think the third paragraph you wrote should come at the beginning,
then the first (now second) paragraph should describe more clearly
that index_dir_find() is a new function and what it does (perhaps by
reusing what is in the "The new version also..."  paragraph),
without mentioning index_dir_exists().

The second (now third) paragraph then can talk about reimplementing
index_dir_exists() in terms of index_dir_find().

The patch text looks good.

Thanks.
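
For readers following along, the API shape described in the log
message can be illustrated with a minimal sketch.  This is not the
code from the patch: the toy_* names and the flat array standing in
for the directory name-hash are made up for illustration, and
strncasecmp() is assumed to be available (POSIX).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>	/* strncasecmp() (POSIX) */

/* Hypothetical stand-in for the directory name-hash: a flat list. */
struct toy_dir {
	const char *name;	/* canonical spelling, no trailing slash */
	int nr;			/* number of tracked entries below it */
};

/*
 * Case-insensitive lookup.  Returns 1 if a directory with at least
 * one entry matches.  If "canonical" is non-NULL and large enough,
 * the canonical spelling is copied into it; otherwise nothing is
 * copied and only the boolean result is returned.
 */
static int toy_dir_find(const struct toy_dir *dirs, size_t nr_dirs,
			const char *name, size_t namelen,
			char *canonical, size_t canonical_size)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < nr_dirs; i++) {
		const struct toy_dir *d = &dirs[i];

		if (strlen(d->name) != namelen ||
		    strncasecmp(d->name, name, namelen))
			continue;
		if (!d->nr)
			return 0;
		if (canonical && canonical_size > namelen) {
			memcpy(canonical, d->name, namelen);
			canonical[namelen] = '\0';
		}
		return 1;
	}
	return 0;
}

/* Existing boolean callers keep working through a thin wrapper. */
#define toy_dir_exists(dirs, nr, name, len) \
	toy_dir_find((dirs), (nr), (name), (len), NULL, 0)
```

With directories "dir1" and "dir1/dir2" in the list, looking up
"dir1/DIR2" succeeds and hands back "dir1/dir2", which is the
spelling a caller would then want to search the sorted entry array
with.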

> -int index_dir_exists(struct index_state *istate, const char *name, int namelen)
> +int index_dir_find(struct index_state *istate, const char *name, int namelen,
> +		   struct strbuf *canonical_path)
>  {
>  	struct dir_entry *dir;
>  
>  	lazy_init_name_hash(istate);
>  	expand_to_path(istate, name, namelen, 0);
>  	dir = find_dir_entry(istate, name, namelen);
> +
> +	if (canonical_path && dir && dir->nr) {
> +		strbuf_reset(canonical_path);
> +		strbuf_add(canonical_path, dir->name, dir->namelen);
> +	}
> +
>  	return dir && dir->nr;
>  }
>  
> diff --git a/name-hash.h b/name-hash.h
> index b1b4b0fb337..0cbfc428631 100644
> --- a/name-hash.h
> +++ b/name-hash.h
> @@ -4,7 +4,12 @@
>  struct cache_entry;
>  struct index_state;
>  
> -int index_dir_exists(struct index_state *istate, const char *name, int namelen);
> +
> +int index_dir_find(struct index_state *istate, const char *name, int namelen,
> +		   struct strbuf *canonical_path);
> +
> +#define index_dir_exists(i, n, l) index_dir_find((i), (n), (l), NULL)
> +
>  void adjust_dirname_case(struct index_state *istate, char *name);
>  struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 03/16] t7527: temporarily disable case-insensitive tests
  2024-02-23  3:18   ` [PATCH v2 03/16] t7527: temporarily disable case-insensitive tests Jeff Hostetler via GitGitGadget
@ 2024-02-23  8:17     ` Junio C Hamano
  2024-02-26 17:12       ` Jeff Hostetler
  0 siblings, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2024-02-23  8:17 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Add non-existent "SKIPME" prereq to the case-insensitive tests.
>
> The previous commit added test cases to demonstrate an error where
> FSMonitor can get confused on a case-insensitive file system when the
> on-disk spelling of a file or directory is wrong.  Let's disable those
> tests before we incrementally teach Git to properly recognize and
> handle those types of problems (so that a bisect between here and the
> final commit in this patch series won't throw a false alarm).

You talk about bisection, but hasn't the previous step already
broken bisection without these SKIPME prerequisites?  IOW, shouldn't
this step be squashed into the previous one?

Also, it is much more common to replace "test_expect_success" with
"test_expect_failure" to indicate that the steps are broken.  Was
there a reason why we chose to do it differently?

> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  t/t7527-builtin-fsmonitor.sh | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
> index 3d21295f789..4acb547819c 100755
> --- a/t/t7527-builtin-fsmonitor.sh
> +++ b/t/t7527-builtin-fsmonitor.sh
> @@ -1051,7 +1051,7 @@ test_expect_success 'split-index and FSMonitor work well together' '
>  #
>  # The setup is a little contrived.
>  #
> -test_expect_success CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
> +test_expect_success SKIPME,CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
>  	test_when_finished "stop_daemon_delete_repo subdir_case_wrong" &&
>  
>  	git init subdir_case_wrong &&
> @@ -1128,7 +1128,7 @@ test_expect_success CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
>  	! grep -q " M dir1/dir2/dir3/file3" "$PWD/subdir_case_wrong.out"
>  '
>  
> -test_expect_success CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
> +test_expect_success SKIPME,CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
>  	test_when_finished "stop_daemon_delete_repo file_case_wrong" &&
>  
>  	git init file_case_wrong &&

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 04/16] fsmonitor: refactor refresh callback on directory events
  2024-02-23  3:18   ` [PATCH v2 04/16] fsmonitor: refactor refresh callback on directory events Jeff Hostetler via GitGitGadget
@ 2024-02-23  8:18     ` Junio C Hamano
  0 siblings, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2024-02-23  8:18 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Move the code to handle directory FSEvents (containing pathnames with
> a trailing slash) into a helper function.
>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c | 52 ++++++++++++++++++++++++++++++----------------------
>  1 file changed, 30 insertions(+), 22 deletions(-)

Nothing unexpected to see here.  Looking good.
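
As a side note on the "pos = -pos - 1" idiom in the helper quoted
below: a failed lookup in a sorted array is commonly reported as a
negative value that encodes the insertion point.  A minimal sketch,
assuming a plain sorted array of strings rather than the real
cache-entry array (the toy_* names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Binary search over a bytewise-sorted array of names.  Returns the
 * index on an exact match, otherwise (-insertion_point - 1).
 */
static int toy_name_pos(const char **names, int nr, const char *name)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(names[mid], name);

		if (!cmp)
			return mid;
		if (cmp < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -lo - 1;
}

/* Count the entries under "dir" (which must end with a slash). */
static size_t toy_count_cone(const char **names, int nr, const char *dir)
{
	size_t dirlen = strlen(dir), nr_in_cone = 0;
	int i, pos = toy_name_pos(names, nr, dir);

	assert(dirlen && dir[dirlen - 1] == '/');
	if (pos < 0)
		pos = -pos - 1;	/* decode the insertion point */
	for (i = pos; i < nr; i++) {
		if (strncmp(names[i], dir, dirlen))
			break;	/* left the cone */
		nr_in_cone++;
	}
	return nr_in_cone;
}
```

Given the names "AAA", "dir1/dir2/file", "dir1/file" and "zzz", a
lookup of "dir1/" finds two entries in the cone, while "DIR1/"
decodes to the same insertion point but matches nothing, which is
the case-sensitivity problem the series is about.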

> diff --git a/fsmonitor.c b/fsmonitor.c
> index f670c509378..6fecae9aeb2 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -183,6 +183,35 @@ static int query_fsmonitor_hook(struct repository *r,
>  	return result;
>  }
>  
> +static void handle_path_with_trailing_slash(
> +	struct index_state *istate, const char *name, int pos)
> +{
> +	int i;
> +
> +	/*
> +	 * The daemon can decorate directory events, such as
> +	 * moves or renames, with a trailing slash if the OS
> +	 * FS Event contains sufficient information, such as
> +	 * MacOS.
> +	 *
> +	 * Use this to invalidate the entire cone under that
> +	 * directory.
> +	 *
> +	 * We do not expect an exact match because the index
> +	 * does not normally contain directory entries, so we
> +	 * start at the insertion point and scan.
> +	 */
> +	if (pos < 0)
> +		pos = -pos - 1;
> +
> +	/* Mark all entries for the folder invalid */
> +	for (i = pos; i < istate->cache_nr; i++) {
> +		if (!starts_with(istate->cache[i]->name, name))
> +			break;
> +		istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> +	}
> +}
> +
>  static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
>  {
>  	int i, len = strlen(name);
> @@ -193,28 +222,7 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
>  			 name, pos);
>  
>  	if (name[len - 1] == '/') {
> -		/*
> -		 * The daemon can decorate directory events, such as
> -		 * moves or renames, with a trailing slash if the OS
> -		 * FS Event contains sufficient information, such as
> -		 * MacOS.
> -		 *
> -		 * Use this to invalidate the entire cone under that
> -		 * directory.
> -		 *
> -		 * We do not expect an exact match because the index
> -		 * does not normally contain directory entries, so we
> -		 * start at the insertion point and scan.
> -		 */
> -		if (pos < 0)
> -			pos = -pos - 1;
> -
> -		/* Mark all entries for the folder invalid */
> -		for (i = pos; i < istate->cache_nr; i++) {
> -			if (!starts_with(istate->cache[i]->name, name))
> -				break;
> -			istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> -		}
> +		handle_path_with_trailing_slash(istate, name, pos);
>  
>  		/*
>  		 * We need to remove the traling "/" from the path

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 06/16] fsmonitor: refactor refresh callback for non-directory events
  2024-02-23  3:18   ` [PATCH v2 06/16] fsmonitor: refactor refresh callback for non-directory events Jeff Hostetler via GitGitGadget
@ 2024-02-23  8:18     ` Junio C Hamano
  2024-02-25 12:30     ` Torsten Bögershausen
  1 sibling, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2024-02-23  8:18 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Move the code handle unqualified FSEvents (without a trailing slash)
> into a helper function.

-ECANNOTPARSE.  "code handle" -> "code that handles"?

In the patch text itself, there is nothing unexpected.  Looking
good.

Thanks.
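
The remark in the quoted comment about ignoring items between
"name" and "name/" follows from bytewise sort order: '-' and '.'
sort before '/', so siblings such as "name-x" and "name.c" sit
between the insertion point of "name" and the first entry under
"name/".  A minimal sketch of that scan, assuming a plain sorted
array of strings (the toy_count_under() name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * "names" is sorted bytewise; "start" is the insertion point of
 * "name" (no trailing slash, no exact match).  Count the entries
 * below "name/", skipping siblings such as "name-x" and "name.c".
 */
static size_t toy_count_under(const char **names, size_t nr, size_t start,
			      const char *name)
{
	size_t len = strlen(name), count = 0, i;

	assert(start <= nr);
	for (i = start; i < nr; i++) {
		if (strncmp(names[i], name, len))
			break;		/* no longer shares the prefix */
		if ((unsigned char)names[i][len] > '/')
			break;		/* sorted past "name/" */
		if (names[i][len] == '/')
			count++;	/* inside the directory */
	}
	return count;
}
```

For the sorted list "dir-old", "dir.txt", "dir/a", "dir/b", "dir0",
"zzz" and the name "dir" (insertion point 0), the scan steps over
the first two entries, counts the two under "dir/", and stops at
"dir0".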

> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c | 67 +++++++++++++++++++++++++++++++----------------------
>  1 file changed, 39 insertions(+), 28 deletions(-)
>
> diff --git a/fsmonitor.c b/fsmonitor.c
> index 29cce32d81c..364198d258f 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -183,6 +183,43 @@ static int query_fsmonitor_hook(struct repository *r,
>  	return result;
>  }
>  
> +static void handle_path_without_trailing_slash(
> +	struct index_state *istate, const char *name, int pos)
> +{
> +	int i;
> +
> +	if (pos >= 0) {
> +		/*
> +		 * We have an exact match for this path and can just
> +		 * invalidate it.
> +		 */
> +		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
> +	} else {
> +		/*
> +		 * The path is not a tracked file -or- it is a
> +		 * directory event on a platform that cannot
> +		 * distinguish between file and directory events in
> +		 * the event handler, such as Windows.
> +		 *
> +		 * Scan as if it is a directory and invalidate the
> +		 * cone under it.  (But remember to ignore items
> +		 * between "name" and "name/", such as "name-" and
> +		 * "name.".
> +		 */
> +		int len = strlen(name);
> +		pos = -pos - 1;
> +
> +		for (i = pos; i < istate->cache_nr; i++) {
> +			if (!starts_with(istate->cache[i]->name, name))
> +				break;
> +			if ((unsigned char)istate->cache[i]->name[len] > '/')
> +				break;
> +			if (istate->cache[i]->name[len] == '/')
> +				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> +		}
> +	}
> +}
> +
>  /*
>   * The daemon can decorate directory events, such as a move or rename,
>   * by adding a trailing slash to the observed name.  Use this to
> @@ -225,7 +262,7 @@ static void handle_path_with_trailing_slash(
>  
>  static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
>  {
> -	int i, len = strlen(name);
> +	int len = strlen(name);
>  	int pos = index_name_pos(istate, name, len);
>  
>  	trace_printf_key(&trace_fsmonitor,
> @@ -240,34 +277,8 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
>  		 * for the untracked cache.
>  		 */
>  		name[len - 1] = '\0';
> -	} else if (pos >= 0) {
> -		/*
> -		 * We have an exact match for this path and can just
> -		 * invalidate it.
> -		 */
> -		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
>  	} else {
> -		/*
> -		 * The path is not a tracked file -or- it is a
> -		 * directory event on a platform that cannot
> -		 * distinguish between file and directory events in
> -		 * the event handler, such as Windows.
> -		 *
> -		 * Scan as if it is a directory and invalidate the
> -		 * cone under it.  (But remember to ignore items
> -		 * between "name" and "name/", such as "name-" and
> -		 * "name.".
> -		 */
> -		pos = -pos - 1;
> -
> -		for (i = pos; i < istate->cache_nr; i++) {
> -			if (!starts_with(istate->cache[i]->name, name))
> -				break;
> -			if ((unsigned char)istate->cache[i]->name[len] > '/')
> -				break;
> -			if (istate->cache[i]->name[len] == '/')
> -				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> -		}
> +		handle_path_without_trailing_slash(istate, name, pos);
>  	}
>  
>  	/*

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 09/16] fsmonitor: move untracked invalidation into helper functions
  2024-02-23  3:18   ` [PATCH v2 09/16] fsmonitor: move untracked invalidation into helper functions Jeff Hostetler via GitGitGadget
@ 2024-02-23 17:36     ` Junio C Hamano
  2024-02-26 18:45       ` Jeff Hostetler
  0 siblings, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2024-02-23 17:36 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Move the call to invalidate the untracked cache for the FSEvent
> pathname into the two helper functions.
>
> In a later commit in this series, we will call these helpers
> from other contexts and it is safer to include the UC invalidation
> in the helper than to remember to also add it to each helper
> call-site.
>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c | 26 +++++++++++++++++++-------
>  1 file changed, 19 insertions(+), 7 deletions(-)

Thanks.  The steps in this iteration make this move much less
confusing to me than in the previous one.  We used to call one of
"handle path with/without trailing slash" functions and then called
the invalidation.  Now the invalidation happens in these "handle path"
functions.

The unexplained change in behaviour is that we used to do the rest
of "handle path" and invalidation was done at the end.  Now we do it
upfront.  I think the "rest" works solely based on what is in the
main in-core index array (i.e. the_index.cache[] aka active_cache[])
and affects only what is in the in-core index array, while
untracked_cache_invalidate*() works solely based on what is in the
untracked cache extension (i.e. the_index.untracked) and affects
only what is in there, so the order of these two does not matter.

Am I correct?

Or does it affect correctness or performance or whatever in any way?
IOW, is there a reason why it is better to do the invalidation first
and then do the "rest" afterwards (hence this patch flips the order of
the two to _improve_ something)?

Thanks.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 11/16] fsmonitor: remove custom loop from non-directory path handler
  2024-02-23  3:18   ` [PATCH v2 11/16] fsmonitor: remove custom loop from non-directory path handler Jeff Hostetler via GitGitGadget
@ 2024-02-23 17:47     ` Junio C Hamano
  0 siblings, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2024-02-23 17:47 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  static void handle_path_without_trailing_slash(
>  	struct index_state *istate, const char *name, int pos)
>  {
> -	int i;
> -
>  	/*
>  	 * Mark the untracked cache dirty for this path (regardless of
>  	 * whether or not we find an exact match for it in the index).
> @@ -200,33 +212,28 @@ static void handle_path_without_trailing_slash(
>  
>  	if (pos >= 0) {
>  		/*
> -		 * We have an exact match for this path and can just
> -		 * invalidate it.
> +		 * An exact match on a tracked file. We assume that we
> +		 * do not need to scan forward for a sparse-directory
> +		 * cache-entry with the same pathname, nor for a cone
> +		 * at that directory. (That is, assume no D/F conflicts.)
>  		 */
>  		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
>  	} else {
> +		struct strbuf work_path = STRBUF_INIT;
> +
>  		/*
> +		strbuf_add(&work_path, name, strlen(name));
> +		strbuf_addch(&work_path, '/');
> +		pos = index_name_pos(istate, work_path.buf, work_path.len);
> +		handle_path_with_trailing_slash(istate, work_path.buf, pos);
> +		strbuf_release(&work_path);
>  	}
>  }

The "with trailing slash" variant is returning a useful value to
this caller that ignores it, but we do not yet return a value from
this function, so that is OK.  The name being a name that may be in
different case from what we know in the index is not yet handled in
this step (we have "Assume it is case-correct" in the comment) and
that applies for both the main array of cache entries as well as the
untracked cache.

It will be exciting to see how these are lifted.  The main array has
some helper functions that use name-hash features to help icase
matches, but I do not offhand recall what we have for the untracked
cache side.
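
To spell out why the custom loop can go away: once a slash is
appended to the name, a plain prefix match no longer sees siblings
like "name-x", so the directory handler can be reused as-is.  A
minimal sketch of that idea, assuming a plain array of strings and
linear scans in place of the real index lookups (the
toy_handle_nondir() name and the fixed-size buffer are made up for
illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 on an exact match for "name".  Otherwise treat "name" as
 * a directory: build "name/" and count the entries that start with
 * it.  Names too long for the scratch buffer are reported as 0.
 */
static size_t toy_handle_nondir(const char **names, size_t nr,
				const char *name)
{
	char work_path[256];
	size_t len, count = 0, i;

	assert(name != NULL);
	len = strlen(name);

	for (i = 0; i < nr; i++)
		if (!strcmp(names[i], name))
			return 1;	/* exact match on a tracked file */

	if (len + 2 > sizeof(work_path))
		return 0;
	memcpy(work_path, name, len);
	work_path[len] = '/';
	work_path[len + 1] = '\0';

	for (i = 0; i < nr; i++)
		if (!strncmp(names[i], work_path, len + 1))
			count++;	/* inside the "name/" cone */
	return count;
}
```

With "dir-old", "dir.txt", "dir/a", "dir/b", "dir0" and "zzz" in
the array, the name "dir" yields 2 and "zzz" yields 1, without any
special-casing of the characters around '/'.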

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 12/16] fsmonitor: return invalided cache-entry count on non-directory event
  2024-02-23  3:18   ` [PATCH v2 12/16] fsmonitor: return invalided cache-entry count on non-directory event Jeff Hostetler via GitGitGadget
@ 2024-02-23 17:51     ` Junio C Hamano
  0 siblings, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2024-02-23 17:51 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Teah the refresh callback helper function for unqualified FSEvents

Teach?

> (pathnames without a trailing slash) to return the number of
> cache-entries that were invalidated in response to the event.
>
> This will be used in a later commit to help determine if the observed
> pathname was (possibly) case-incorrect (when on a case-insensitive
> file system).
>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)

I do not see anything unexpected in the change to the code below.
Looking good.

Thanks.

> diff --git a/fsmonitor.c b/fsmonitor.c
> index a51c17cda70..c16ed5d8758 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -196,8 +196,10 @@ static size_t handle_path_with_trailing_slash(
>   * do not know it is case-correct or -incorrect.
>   *
>   * Assume it is case-correct and try an exact match.
> + *
> + * Return the number of cache-entries that we invalidated.
>   */
> -static void handle_path_without_trailing_slash(
> +static size_t handle_path_without_trailing_slash(
>  	struct index_state *istate, const char *name, int pos)
>  {
>  	/*
> @@ -218,7 +220,9 @@ static void handle_path_without_trailing_slash(
>  		 * at that directory. (That is, assume no D/F conflicts.)
>  		 */
>  		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
> +		return 1;
>  	} else {
> +		size_t nr_in_cone;
>  		struct strbuf work_path = STRBUF_INIT;
>  
>  		/*
> @@ -232,8 +236,10 @@ static void handle_path_without_trailing_slash(
>  		strbuf_add(&work_path, name, strlen(name));
>  		strbuf_addch(&work_path, '/');
>  		pos = index_name_pos(istate, work_path.buf, work_path.len);
> -		handle_path_with_trailing_slash(istate, work_path.buf, pos);
> +		nr_in_cone = handle_path_with_trailing_slash(
> +			istate, work_path.buf, pos);
>  		strbuf_release(&work_path);
> +		return nr_in_cone;
>  	}
>  }

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 13/16] fsmonitor: trace the new invalidated cache-entry count
  2024-02-23  3:18   ` [PATCH v2 13/16] fsmonitor: trace the new invalidated cache-entry count Jeff Hostetler via GitGitGadget
@ 2024-02-23 17:53     ` Junio C Hamano
  0 siblings, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2024-02-23 17:53 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Consolidate the directory/non-directory calls to the refresh handler
> code.  Log the resulting count of invalidated cache-entries.

OK.  Again, there is nothing surprising in the changes in the patch.
Looking good.

> The nr_in_cone value will be used in a later commit to decide if
> we also need to try to do case-insensitive lookups.
>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)

> diff --git a/fsmonitor.c b/fsmonitor.c
> index c16ed5d8758..739ddbf7aca 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -308,16 +308,21 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
>  {
>  	int len = strlen(name);
>  	int pos = index_name_pos(istate, name, len);
> +	size_t nr_in_cone;
>  
>  	trace_printf_key(&trace_fsmonitor,
>  			 "fsmonitor_refresh_callback '%s' (pos %d)",
>  			 name, pos);
>  
> -	if (name[len - 1] == '/') {
> -		handle_path_with_trailing_slash(istate, name, pos);
> -	} else {
> -		handle_path_without_trailing_slash(istate, name, pos);
> -	}
> +	if (name[len - 1] == '/')
> +		nr_in_cone = handle_path_with_trailing_slash(istate, name, pos);
> +	else
> +		nr_in_cone = handle_path_without_trailing_slash(istate, name, pos);
> +
> +	if (nr_in_cone)
> +		trace_printf_key(&trace_fsmonitor,
> +				 "fsmonitor_refresh_callback CNT: %d",
> +				 (int)nr_in_cone);
>  }
>  
>  /*

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 14/16] fsmonitor: support case-insensitive events
  2024-02-23  3:18   ` [PATCH v2 14/16] fsmonitor: support case-insensitive events Jeff Hostetler via GitGitGadget
@ 2024-02-23 18:14     ` Junio C Hamano
  2024-02-26 20:41       ` Jeff Hostetler
  2024-02-25 13:10     ` Torsten Bögershausen
  1 sibling, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2024-02-23 18:14 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +/*
> + * Use the name-hash to do a case-insensitive cache-entry lookup with
> + * the pathname and invalidate the cache-entry.
> + *
> + * Returns the number of cache-entries that we invalidated.
> + */
> +static size_t handle_using_name_hash_icase(
> +	struct index_state *istate, const char *name)
> +{
> +	struct cache_entry *ce = NULL;
> +
> +	ce = index_file_exists(istate, name, strlen(name), 1);
> +	if (!ce)
> +		return 0;
> +
> +	/*
> +	 * A case-insensitive search in the name-hash using the
> +	 * observed pathname found a cache-entry, so the observed path
> +	 * is case-incorrect.  Invalidate the cache-entry and use the
> +	 * correct spelling from the cache-entry to invalidate the
> +	 * untracked-cache.  Since we now have sparse-directories in
> +	 * the index, the observed pathname may represent a regular
> +	 * file or a sparse-index directory.
> +	 *
> +	 * Note that we should not have seen FSEvents for a
> +	 * sparse-index directory, but we handle it just in case.
> +	 *
> +	 * Either way, we know that there are not any cache-entries for
> +	 * children inside the cone of the directory, so we don't need to
> +	 * do the usual scan.
> +	 */
> +	trace_printf_key(&trace_fsmonitor,
> +			 "fsmonitor_refresh_callback MAP: '%s' '%s'",
> +			 name, ce->name);
> +
> +	untracked_cache_invalidate_trimmed_path(istate, ce->name, 0);
> +	ce->ce_flags &= ~CE_FSMONITOR_VALID;
> +	return 1;
> +}

You first ask the name-hash to turn the incoming "name" into the
case variant that we know about, i.e. ce->name, and use that to
access the untracked cache.  Clever and makes sense.  But if we have
ce->name, doesn't it mean the name is tracked?  Do we find anything
useful to do in the untracked cache invalidation codepath in that
case?

An FSMonitor event with case-incorrect pathname for a directory may
not be this trivial, I presume, and I expect that is what the
remainder of this patch is about.
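
The file-level fallback discussed above (exact lookup first, then a
case-insensitive one that maps the observed name to the spelling we
know) can be sketched as follows.  This is only an illustration
under stated assumptions: a linear scan stands in for the name-hash,
the toy_* names and the "fsmonitor_valid" field are hypothetical,
and strcasecmp() is assumed to be available (POSIX).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>	/* strcasecmp() (POSIX) */

struct toy_entry {
	const char *name;	/* spelling recorded in the index */
	int fsmonitor_valid;	/* stand-in for CE_FSMONITOR_VALID */
};

static struct toy_entry *toy_find(struct toy_entry *entries, size_t nr,
				  const char *name, int icase)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int cmp = icase ? strcasecmp(entries[i].name, name)
				: strcmp(entries[i].name, name);
		if (!cmp)
			return &entries[i];
	}
	return NULL;
}

/*
 * Invalidate the entry for "observed", falling back to a
 * case-insensitive match.  Returns the number of entries
 * invalidated and reports the canonical spelling via "canonical".
 */
static size_t toy_refresh(struct toy_entry *entries, size_t nr,
			  const char *observed, const char **canonical)
{
	struct toy_entry *e;

	assert(observed != NULL);
	e = toy_find(entries, nr, observed, 0);
	if (!e)
		e = toy_find(entries, nr, observed, 1);
	if (!e)
		return 0;	/* untracked, or a directory */

	e->fsmonitor_valid = 0;
	if (canonical)
		*canonical = e->name;
	return 1;
}
```

An event for "dir1/dir2/dir3/FILE-3-A" then clears the flag on the
entry spelled "dir1/dir2/dir3/file-3-a" and returns that spelling,
which is what any later untracked-cache invalidation would want to
be keyed on.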

> +
> +/*
> + * Use the dir-name-hash to find the correct-case spelling of the
> + * directory.  Use the canonical spelling to invalidate all of the
> + * cache-entries within the matching cone.
> + *
> + * Returns the number of cache-entries that we invalidated.
> + */
> +static size_t handle_using_dir_name_hash_icase(
> +	struct index_state *istate, const char *name)

It is a bit unfortunate that here on the name-hash side we contrast
the two helper function variants as "dir-name" vs "name", while the
original handle_path side uses "without_slash" vs "with_slash".

If I understand correctly, it is not like there are two distinct
hashes, "name-hash" vs "dir-name-hash".  Both of these helpers use
the same "name-hash" mechanism, and this function differs from the
previous one in that it is about a directory, which is why it has
"dir" in its name.  I wonder if renaming the other one to have
"nondir" in its name, and the without_slash and with_slash pair
to match, e.g., handle_nondir_path() vs handle_dir_path(), or
something like that, would make the names of these four functions
easier to contrast and understand?

> +{
> +	struct strbuf canonical_path = STRBUF_INIT;
> +	int pos;
> +	size_t len = strlen(name);
> +	size_t nr_in_cone;
> +
> +	if (name[len - 1] == '/')
> +		len--;
> +
> +	if (!index_dir_find(istate, name, len, &canonical_path))
> +		return 0; /* name is untracked */
> +
> +	if (!memcmp(name, canonical_path.buf, canonical_path.len)) {
> +		strbuf_release(&canonical_path);
> +		/*
> +		 * NEEDSWORK: Our caller already tried an exact match
> +		 * and failed to find one.  They called us to do an
> +		 * ICASE match, so we should never get an exact match,
> +		 * so we could promote this to a BUG() here if we
> +		 * wanted to.  It doesn't hurt anything to just return
> +		 * 0 and go on becaus we should never get here.  Or we
> +		 * could just get rid of the memcmp() and this "if"
> +		 * clause completely.
> +		 */
> +		return 0; /* should not happen */
> +	}

"becaus" -> "because".

If we should never get here, having BUG("we should never get here")
would not hurt anything, either.  On the other hand, silently
returning 0 will hide the bug under the carpet, and I am not sure it
is fair to call it "doesn't hurt anything".
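
To make the contrast concrete, here is a minimal sketch of the
"fail loudly" variant.  SKETCH_BUG() and check_icase_mapping() are
hypothetical names made up for this illustration; SKETCH_BUG() only
stands in for Git's BUG() so that the snippet compiles on its own.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * SKETCH_BUG() is a hypothetical stand-in for Git's BUG(): report the
 * broken invariant and abort instead of silently carrying on.
 */
#define SKETCH_BUG(msg) \
	do { \
		fprintf(stderr, "BUG: %s\n", (msg)); \
		abort(); \
	} while (0)

/*
 * Hypothetical helper: called only after an exact-case lookup failed,
 * so "observed" and "canonical" are expected to differ in case.
 * Returns 1 when the caller should go on with the canonical spelling.
 */
static int check_icase_mapping(const char *observed, const char *canonical)
{
	assert(observed && canonical);
	if (!strcmp(observed, canonical))
		SKETCH_BUG("icase fallback found an exact match");
	return 1;
}
```

With this shape, a broken assumption shows up as an abort at the point
where it is violated, rather than as a "0 entries invalidated" result
that looks the same as an untracked path.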

> +
> +	trace_printf_key(&trace_fsmonitor,
> +			 "fsmonitor_refresh_callback MAP: '%s' '%s'",
> +			 name, canonical_path.buf);
> +
> +	/*
> +	 * The dir-name-hash only tells us the corrected spelling of
> +	 * the prefix.  We have to use this canonical path to do a
> +	 * lookup in the cache-entry array so that we repeat the
> +	 * original search using the case-corrected spelling.
> +	 */
> +	strbuf_addch(&canonical_path, '/');
> +	pos = index_name_pos(istate, canonical_path.buf,
> +			     canonical_path.len);
> +	nr_in_cone = handle_path_with_trailing_slash(
> +		istate, canonical_path.buf, pos);
> +	strbuf_release(&canonical_path);
> +	return nr_in_cone;
> +}

Nice.  Do we need to give this corrected name to help untracked
cache invalidation from the caller that called us?

> @@ -319,6 +416,19 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
>  	else
>  		nr_in_cone = handle_path_without_trailing_slash(istate, name, pos);
>  
> +	/*
> +	 * If we did not find an exact match for this pathname or any
> +	 * cache-entries with this directory prefix and we're on a
> +	 * case-insensitive file system, try again using the name-hash
> +	 * and dir-name-hash.
> +	 */
> +	if (!nr_in_cone && ignore_case) {
> +		nr_in_cone = handle_using_name_hash_icase(istate, name);
> +		if (!nr_in_cone)
> +			nr_in_cone = handle_using_dir_name_hash_icase(
> +				istate, name);
> +	}

It might be interesting to learn how often we go through these
"fallback" code paths by tracing.  Maybe it will become too noisy?
I dunno.

>  	if (nr_in_cone)
>  		trace_printf_key(&trace_fsmonitor,
>  				 "fsmonitor_refresh_callback CNT: %d",


* Re: [PATCH v2 15/16] fsmonitor: refactor bit invalidation in refresh callback
  2024-02-23  3:18   ` [PATCH v2 15/16] fsmonitor: refactor bit invalidation in refresh callback Jeff Hostetler via GitGitGadget
@ 2024-02-23 18:18     ` Junio C Hamano
  0 siblings, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2024-02-23 18:18 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Refactor code in the fsmonitor_refresh_callback() call chain dealing
> with invalidating the CE_FSMONITOR_VALID bit and add a trace message.
>
> During the refresh, we clear the CE_FSMONITOR_VALID bit in response to
> data from the FSMonitor daemon (so that a later phase will lstat() and
> verify the true state of the file).
>
> Create a new function to clear the bit and add some unique tracing for
> it to help debug edge cases.
>
> This is similar to the existing `mark_fsmonitor_invalid()` function,
> but we don't need the extra stuff that it does.
>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/fsmonitor.c b/fsmonitor.c
> index ac638a61c00..0667a8c297c 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -187,6 +187,20 @@ static int query_fsmonitor_hook(struct repository *r,
>  static size_t handle_path_with_trailing_slash(
>  	struct index_state *istate, const char *name, int pos);
>  
> +/*
> + * Invalidate the FSM bit on this CE.  This is like mark_fsmonitor_invalid()
> + * but we've already handled the untracked-cache and I want a different
> + * trace message.
> + */

"I want" -> "want" perhaps.

More importantly, when new developers come and want to touch this
file in the future, how would they choose which one to call?  Would
it make a better comment if we rewrote the above for such future
developers as intended audiences?

> +static void invalidate_ce_fsm(struct cache_entry *ce)
> +{
> +	if (ce->ce_flags & CE_FSMONITOR_VALID)
> +		trace_printf_key(&trace_fsmonitor,
> +				 "fsmonitor_refresh_callback INV: '%s'",
> +				 ce->name);
> +	ce->ce_flags &= ~CE_FSMONITOR_VALID;
> +}
> +
>  /*
>   * Use the name-hash to do a case-insensitive cache-entry lookup with
>   * the pathname and invalidate the cache-entry.
> @@ -224,7 +238,7 @@ static size_t handle_using_name_hash_icase(
>  
>  	untracked_cache_invalidate_trimmed_path(istate, ce->name, 0);
>  
> -	ce->ce_flags &= ~CE_FSMONITOR_VALID;
> +	invalidate_ce_fsm(ce);
>  	return 1;
>  }
>  
> @@ -316,7 +330,7 @@ static size_t handle_path_without_trailing_slash(
>  		 * cache-entry with the same pathname, nor for a cone
>  		 * at that directory. (That is, assume no D/F conflicts.)
>  		 */
> -		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
> +		invalidate_ce_fsm(istate->cache[pos]);
>  		return 1;
>  	} else {
>  		size_t nr_in_cone;
> @@ -394,7 +408,7 @@ static size_t handle_path_with_trailing_slash(
>  	for (i = pos; i < istate->cache_nr; i++) {
>  		if (!starts_with(istate->cache[i]->name, name))
>  			break;
> -		istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> +		invalidate_ce_fsm(istate->cache[i]);
>  		nr_in_cone++;
>  	}

Nice.


* Re: [PATCH v2 06/16] fsmonitor: refactor refresh callback for non-directory events
  2024-02-23  3:18   ` [PATCH v2 06/16] fsmonitor: refactor refresh callback for non-directory events Jeff Hostetler via GitGitGadget
  2024-02-23  8:18     ` Junio C Hamano
@ 2024-02-25 12:30     ` Torsten Bögershausen
  2024-02-25 17:24       ` Junio C Hamano
  1 sibling, 1 reply; 91+ messages in thread
From: Torsten Bögershausen @ 2024-02-25 12:30 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler

On Fri, Feb 23, 2024 at 03:18:10AM +0000, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Move the code handle unqualified FSEvents (without a trailing slash)
> into a helper function.
>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c | 67 +++++++++++++++++++++++++++++++----------------------
>  1 file changed, 39 insertions(+), 28 deletions(-)
>
> diff --git a/fsmonitor.c b/fsmonitor.c
> index 29cce32d81c..364198d258f 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -183,6 +183,43 @@ static int query_fsmonitor_hook(struct repository *r,
>  	return result;
>  }
>
> +static void handle_path_without_trailing_slash(
> +	struct index_state *istate, const char *name, int pos)
> +{
> +	int i;
> +
> +	if (pos >= 0) {
> +		/*
> +		 * We have an exact match for this path and can just
> +		 * invalidate it.
> +		 */
> +		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
> +	} else {
> +		/*
> +		 * The path is not a tracked file -or- it is a
> +		 * directory event on a platform that cannot
> +		 * distinguish between file and directory events in
> +		 * the event handler, such as Windows.
> +		 *
> +		 * Scan as if it is a directory and invalidate the
> +		 * cone under it.  (But remember to ignore items
> +		 * between "name" and "name/", such as "name-" and
> +		 * "name.".
> +		 */
> +		int len = strlen(name);

Should this be
	size_t len = strlen(name);
instead?

> +		pos = -pos - 1;
> +
> +		for (i = pos; i < istate->cache_nr; i++) {
> +			if (!starts_with(istate->cache[i]->name, name))
> +				break;
> +			if ((unsigned char)istate->cache[i]->name[len] > '/')
> +				break;

Hm, this covers all digits, letters, and :;<=>?
but not e.g. !+-. (and others). What am I missing?


> +			if (istate->cache[i]->name[len] == '/')
> +				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> +		}
> +	}
> +}
> +
>  /*
>   * The daemon can decorate directory events, such as a move or rename,
>   * by adding a trailing slash to the observed name.  Use this to
> @@ -225,7 +262,7 @@ static void handle_path_with_trailing_slash(
>
>  static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
>  {
> -	int i, len = strlen(name);
> +	int len = strlen(name);

Same here: size_t len = strlen(name)?

>  	int pos = index_name_pos(istate, name, len);
>
>  	trace_printf_key(&trace_fsmonitor,
> @@ -240,34 +277,8 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
>  		 * for the untracked cache.
>  		 */
>  		name[len - 1] = '\0';
> -	} else if (pos >= 0) {
> -		/*
> -		 * We have an exact match for this path and can just
> -		 * invalidate it.
> -		 */
> -		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
>  	} else {
> -		/*
> -		 * The path is not a tracked file -or- it is a
> -		 * directory event on a platform that cannot
> -		 * distinguish between file and directory events in
> -		 * the event handler, such as Windows.
> -		 *
> -		 * Scan as if it is a directory and invalidate the
> -		 * cone under it.  (But remember to ignore items
> -		 * between "name" and "name/", such as "name-" and
> -		 * "name.".
> -		 */
> -		pos = -pos - 1;
> -
> -		for (i = pos; i < istate->cache_nr; i++) {
> -			if (!starts_with(istate->cache[i]->name, name))
> -				break;
> -			if ((unsigned char)istate->cache[i]->name[len] > '/')
> -				break;
> -			if (istate->cache[i]->name[len] == '/')
> -				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
> -		}
> +		handle_path_without_trailing_slash(istate, name, pos);
>  	}
>
>  	/*
> --
> gitgitgadget
>
>



* Re: [PATCH v2 07/16] dir: create untracked_cache_invalidate_trimmed_path()
  2024-02-23  3:18   ` [PATCH v2 07/16] dir: create untracked_cache_invalidate_trimmed_path() Jeff Hostetler via GitGitGadget
@ 2024-02-25 12:35     ` Torsten Bögershausen
  0 siblings, 0 replies; 91+ messages in thread
From: Torsten Bögershausen @ 2024-02-25 12:35 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler

On Fri, Feb 23, 2024 at 03:18:11AM +0000, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Create a wrapper function for untracked_cache_invalidate_path()
> that silently trims a trailing slash, if present, before calling
> the wrapped function.
>
> The untracked cache expects to be called with a pathname that
> does not contain a trailing slash.  This can make it inconvenient
> for callers that have a directory path.  Lets hide this complexity.
>
> This will be used by a later commit in the FSMonitor code which
> may receive directory pathnames from an FSEvent.
>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  dir.c | 20 ++++++++++++++++++++
>  dir.h |  7 +++++++
>  2 files changed, 27 insertions(+)
>
> diff --git a/dir.c b/dir.c
> index ac699542302..1157f3e43fa 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -3918,6 +3918,26 @@ void untracked_cache_invalidate_path(struct index_state *istate,
>  				 path, strlen(path));
>  }
>
> +void untracked_cache_invalidate_trimmed_path(struct index_state *istate,
> +					     const char *path,
> +					     int safe_path)
> +{
> +	size_t len = strlen(path);
> +
> +	if (!len)
> +		return; /* should not happen */

Should a BUG() be used here?  Or bug()?  For the record, please see
Documentation/technical/api-error-handling.txt

> +
> +	if (path[len - 1] != '/') {
> +		untracked_cache_invalidate_path(istate, path, safe_path);
> +	} else {
> +		struct strbuf tmp = STRBUF_INIT;
> +
> +		strbuf_add(&tmp, path, len - 1);
> +		untracked_cache_invalidate_path(istate, tmp.buf, safe_path);
> +		strbuf_release(&tmp);
> +	}
> +}
> +
>  void untracked_cache_remove_from_index(struct index_state *istate,
>  				       const char *path)
>  {
> diff --git a/dir.h b/dir.h
> index 98aa85fcc0e..45a7b9ec5f2 100644
> --- a/dir.h
> +++ b/dir.h
> @@ -576,6 +576,13 @@ int cmp_dir_entry(const void *p1, const void *p2);
>  int check_dir_entry_contains(const struct dir_entry *out, const struct dir_entry *in);
>
>  void untracked_cache_invalidate_path(struct index_state *, const char *, int safe_path);
> +/*
> + * Invalidate the untracked-cache for this path, but first strip
> + * off a trailing slash, if present.
> + */
> +void untracked_cache_invalidate_trimmed_path(struct index_state *,
> +					     const char *path,
> +					     int safe_path);
>  void untracked_cache_remove_from_index(struct index_state *, const char *);
>  void untracked_cache_add_to_index(struct index_state *, const char *);
>
> --
> gitgitgadget
>
>


* Re: [PATCH v2 14/16] fsmonitor: support case-insensitive events
  2024-02-23  3:18   ` [PATCH v2 14/16] fsmonitor: support case-insensitive events Jeff Hostetler via GitGitGadget
  2024-02-23 18:14     ` Junio C Hamano
@ 2024-02-25 13:10     ` Torsten Bögershausen
  2024-02-26 20:47       ` Jeff Hostetler
  1 sibling, 1 reply; 91+ messages in thread
From: Torsten Bögershausen @ 2024-02-25 13:10 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Patrick Steinhardt, Jeff Hostetler, Jeff Hostetler

On Fri, Feb 23, 2024 at 03:18:18AM +0000, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Teach fsmonitor_refresh_callback() to handle case-insensitive
> lookups if case-sensitive lookups fail on case-insensitive systems.
> This can cause 'git status' to report stale status for files if there
> are case issues/errors in the worktree.
>
> The FSMonitor daemon sends FSEvents using the observed spelling
> of each pathname.  On case-insensitive file systems this may be
> different than the expected case spelling.
>
> The existing code uses index_name_pos() to find the cache-entry for
> the pathname in the FSEvent and clear the CE_FSMONITOR_VALID bit so
> that the worktree scan/index refresh will revisit and revalidate the
> path.
>
> On a case-insensitive file system, the exact match lookup may fail
> to find the associated cache-entry. This causes status to think that
> the cached CE flags are correct and skip over the file.
>
> Update event handling to optionally use the name-hash and dir-name-hash
> if necessary.
>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 110 insertions(+)
>
> diff --git a/fsmonitor.c b/fsmonitor.c
> index 739ddbf7aca..ac638a61c00 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -5,6 +5,7 @@
>  #include "ewah/ewok.h"
>  #include "fsmonitor.h"
>  #include "fsmonitor-ipc.h"
> +#include "name-hash.h"
>  #include "run-command.h"
>  #include "strbuf.h"
>  #include "trace2.h"
> @@ -186,6 +187,102 @@ static int query_fsmonitor_hook(struct repository *r,
>  static size_t handle_path_with_trailing_slash(
>  	struct index_state *istate, const char *name, int pos);
>
> +/*
> + * Use the name-hash to do a case-insensitive cache-entry lookup with
> + * the pathname and invalidate the cache-entry.
> + *
> + * Returns the number of cache-entries that we invalidated.
> + */
> +static size_t handle_using_name_hash_icase(
> +	struct index_state *istate, const char *name)
> +{
> +	struct cache_entry *ce = NULL;
> +
> +	ce = index_file_exists(istate, name, strlen(name), 1);
> +	if (!ce)
> +		return 0;
> +
> +	/*
> +	 * A case-insensitive search in the name-hash using the
> +	 * observed pathname found a cache-entry, so the observed path
> +	 * is case-incorrect.  Invalidate the cache-entry and use the
> +	 * correct spelling from the cache-entry to invalidate the
> +	 * untracked-cache.  Since we now have sparse-directories in
> +	 * the index, the observed pathname may represent a regular
> +	 * file or a sparse-index directory.
> +	 *
> +	 * Note that we should not have seen FSEvents for a
> +	 * sparse-index directory, but we handle it just in case.
> +	 *
> +	 * Either way, we know that there are not any cache-entries for
> +	 * children inside the cone of the directory, so we don't need to
> +	 * do the usual scan.
> +	 */
> +	trace_printf_key(&trace_fsmonitor,
> +			 "fsmonitor_refresh_callback MAP: '%s' '%s'",
> +			 name, ce->name);
> +
> +	untracked_cache_invalidate_trimmed_path(istate, ce->name, 0);
> +
> +	ce->ce_flags &= ~CE_FSMONITOR_VALID;
> +	return 1;
> +}
> +
> +/*
> + * Use the dir-name-hash to find the correct-case spelling of the
> + * directory.  Use the canonical spelling to invalidate all of the
> + * cache-entries within the matching cone.
> + *
> + * Returns the number of cache-entries that we invalidated.
> + */
> +static size_t handle_using_dir_name_hash_icase(
> +	struct index_state *istate, const char *name)
> +{
> +	struct strbuf canonical_path = STRBUF_INIT;
> +	int pos;
> +	size_t len = strlen(name);
> +	size_t nr_in_cone;
> +
> +	if (name[len - 1] == '/')
> +		len--;
> +
> +	if (!index_dir_find(istate, name, len, &canonical_path))
> +		return 0; /* name is untracked */
> +
> +	if (!memcmp(name, canonical_path.buf, canonical_path.len)) {
> +		strbuf_release(&canonical_path);
> +		/*
> +		 * NEEDSWORK: Our caller already tried an exact match
> +		 * and failed to find one.  They called us to do an
> +		 * ICASE match, so we should never get an exact match,
> +		 * so we could promote this to a BUG() here if we
> +		 * wanted to.  It doesn't hurt anything to just return
> +		 * 0 and go on becaus we should never get here.  Or we
> +		 * could just get rid of the memcmp() and this "if"
> +		 * clause completely.
> +		 */
> +		return 0; /* should not happen */

In theory there may be a race condition when a directory is
renamed more than once in quick succession.
I don't think that "it did not match exactly before, but
now it matches" is a problem.
Question: does it make sense to just remove this check?
We might then find out that the "corrected spelling (tm)"
of "DIR1" is neither "dir1" nor "Dir1" but exactly "DIR1".
Would that be a problem?


> +	}
> +
> +	trace_printf_key(&trace_fsmonitor,
> +			 "fsmonitor_refresh_callback MAP: '%s' '%s'",
> +			 name, canonical_path.buf);
> +
> +	/*
> +	 * The dir-name-hash only tells us the corrected spelling of
> +	 * the prefix.  We have to use this canonical path to do a
> +	 * lookup in the cache-entry array so that we repeat the
> +	 * original search using the case-corrected spelling.
> +	 */
> +	strbuf_addch(&canonical_path, '/');
> +	pos = index_name_pos(istate, canonical_path.buf,
> +			     canonical_path.len);
> +	nr_in_cone = handle_path_with_trailing_slash(
> +		istate, canonical_path.buf, pos);
> +	strbuf_release(&canonical_path);
> +	return nr_in_cone;
> +}
> +
>  /*
>   * The daemon sent an observed pathname without a trailing slash.
>   * (This is the normal case.)  We do not know if it is a tracked or
> @@ -319,6 +416,19 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
>  	else
>  		nr_in_cone = handle_path_without_trailing_slash(istate, name, pos);
>
> +	/*
> +	 * If we did not find an exact match for this pathname or any
> +	 * cache-entries with this directory prefix and we're on a
> +	 * case-insensitive file system, try again using the name-hash
> +	 * and dir-name-hash.
> +	 */
> +	if (!nr_in_cone && ignore_case) {
> +		nr_in_cone = handle_using_name_hash_icase(istate, name);
> +		if (!nr_in_cone)
> +			nr_in_cone = handle_using_dir_name_hash_icase(
> +				istate, name);
> +	}
> +
>  	if (nr_in_cone)
>  		trace_printf_key(&trace_fsmonitor,
>  				 "fsmonitor_refresh_callback CNT: %d",
> --
> gitgitgadget
>
>


* Re: [PATCH v2 06/16] fsmonitor: refactor refresh callback for non-directory events
  2024-02-25 12:30     ` Torsten Bögershausen
@ 2024-02-25 17:24       ` Junio C Hamano
  0 siblings, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2024-02-25 17:24 UTC (permalink / raw
  To: Torsten Bögershausen
  Cc: Jeff Hostetler via GitGitGadget, git, Patrick Steinhardt,
	Jeff Hostetler, Jeff Hostetler

Torsten Bögershausen <tboegi@web.de> writes:

>> +		pos = -pos - 1;
>> +
>> +		for (i = pos; i < istate->cache_nr; i++) {
>> +			if (!starts_with(istate->cache[i]->name, name))
>> +				break;
>> +			if ((unsigned char)istate->cache[i]->name[len] > '/')
>> +				break;
>
> Hm, this covers all digits, letters, :;<=>?
> but not e.g. !+-. (and others). What do i miss ?

This is scanning an in-core array of cache entries, which is sorted
by name in lexicographic order, and the loop knows that files under
the directory "foo", whose pathnames all share prefix "foo/", would
sort between "foo.h" and "foo00", because "." sorts before "/" and
"0" sorts after "/".

It is trying to find where in the array a hypothetical directory
would appear, if any of the files in it existed in the array, and
exiting early, taking advantage of the fact that after seeing
something that sorts after a '/', it will never see an entry that
shares cache[i]->name[] as a prefix.

The loop is not new code in the patch, of course; it merely got
moved from elsewhere below.
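
A standalone sketch of the same scan may make the sort-order argument
easier to see.  count_in_cone() and the plain array of names are
hypothetical stand-ins for the loop over istate->cache[]; the only
assumption is that the names are sorted bytewise, as strcmp() would
order them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the scan in the patch: "names" plays the
 * role of istate->cache[i]->name and must be sorted bytewise.
 * Starting at "start" (the insertion point for "dir"), count the
 * entries that live under directory "dir" (no trailing slash).
 */
static size_t count_in_cone(const char **names, size_t nr, size_t start,
			    const char *dir)
{
	size_t len = strlen(dir);
	size_t i, count = 0;

	assert(len > 0);
	for (i = start; i < nr; i++) {
		if (strncmp(names[i], dir, len))
			break;	/* no longer shares the prefix */
		if ((unsigned char)names[i][len] > '/')
			break;	/* sorted past "dir/", nothing left to find */
		if (names[i][len] == '/')
			count++;	/* an entry inside the directory */
		/* else "dir-", "dir." and friends: skip, keep scanning */
	}
	return count;
}
```

Given the sorted names "foo-bar", "foo.h", "foo/a", "foo/b", "foo00"
and "fop", a scan for "foo" from position 0 skips the first two
entries, counts the two under "foo/", and stops at "foo00" because
'0' sorts after '/'.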

>> -		/*
>> -		 * The path is not a tracked file -or- it is a
>> -		 * directory event on a platform that cannot
>> -		 * distinguish between file and directory events in
>> -		 * the event handler, such as Windows.
>> -		 *
>> -		 * Scan as if it is a directory and invalidate the
>> -		 * cone under it.  (But remember to ignore items
>> -		 * between "name" and "name/", such as "name-" and
>> -		 * "name.".
>> -		 */
>> -		pos = -pos - 1;
>> -
>> -		for (i = pos; i < istate->cache_nr; i++) {
>> -			if (!starts_with(istate->cache[i]->name, name))
>> -				break;
>> -			if ((unsigned char)istate->cache[i]->name[len] > '/')
>> -				break;
>> -			if (istate->cache[i]->name[len] == '/')
>> -				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
>> -		}
>> +		handle_path_without_trailing_slash(istate, name, pos);
>>  	}
>>
>>  	/*
>> --
>> gitgitgadget
>>
>>


* Re: [PATCH v2 03/16] t7527: temporarily disable case-insensitive tests
  2024-02-23  8:17     ` Junio C Hamano
@ 2024-02-26 17:12       ` Jeff Hostetler
  0 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler @ 2024-02-26 17:12 UTC (permalink / raw
  To: Junio C Hamano, Jeff Hostetler via GitGitGadget
  Cc: git, Patrick Steinhardt, Jeff Hostetler



On 2/23/24 3:17 AM, Junio C Hamano wrote:
> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Jeff Hostetler <jeffhostetler@github.com>
>>
>> Add non-existent "SKIPME" prereq to the case-insensitive tests.
>>
>> The previous commit added test cases to demonstrate an error where
>> FSMonitor can get confused on a case-insensitive file system when the
>> on-disk spelling of a file or directory is wrong.  Let's disable those
>> tests before we incrementally teach Git to properly recognize and
>> handle those types of problems (so that a bisect between here and the
>> final commit in this patch series won't throw a false alarm).
> 
> You talk about bisection, but hasn't the previous step already
> broken bisection without these SKIPME prerequisites?  IOW, shouldn't
> this step squashed into the previous?
> 
> Also, it is much more common to replace "test_expect_success" with
> "test_expect_failure" to indicate that the steps are broken.  Was
> there a reason why we choose to do it differently?

In step 2 I created tests with individual step failures baked into
the "! grep -q" steps at the bottom of each test.  I didn't want a
failure in the 50-60 lines of setup code to cause a false alarm.
So the step 2 test "succeeds" by detecting that the output is
incomplete/wrong.

I wanted to use a "test_must_fail" on those individual grep lines
rather than a negated grep, but something complained that the
function only works on "git" commands.

I added the SKIPME here in step 3 so that I could fix the series
in small steps without worrying about which of the small steps
caused the file or directory case to stop being broken (which might
cause confusion if someone were bisecting in this part of the history).

Let me try again with the normal "test_expect_failure" in step 2,
drop step 3, and smash step 16 into step 14.  With the rearranging
that I did in V2, both directories and files should be fixed in the
same final step -- rather than in separate steps.

Thanks
Jeff


> 
>> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
>> ---
>>   t/t7527-builtin-fsmonitor.sh | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
>> index 3d21295f789..4acb547819c 100755
>> --- a/t/t7527-builtin-fsmonitor.sh
>> +++ b/t/t7527-builtin-fsmonitor.sh
>> @@ -1051,7 +1051,7 @@ test_expect_success 'split-index and FSMonitor work well together' '
>>   #
>>   # The setup is a little contrived.
>>   #
>> -test_expect_success CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
>> +test_expect_success SKIPME,CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
>>   	test_when_finished "stop_daemon_delete_repo subdir_case_wrong" &&
>>   
>>   	git init subdir_case_wrong &&
>> @@ -1128,7 +1128,7 @@ test_expect_success CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
>>   	! grep -q " M dir1/dir2/dir3/file3" "$PWD/subdir_case_wrong.out"
>>   '
>>   
>> -test_expect_success CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
>> +test_expect_success SKIPME,CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
>>   	test_when_finished "stop_daemon_delete_repo file_case_wrong" &&
>>   
>>   	git init file_case_wrong &&


* Re: [PATCH v2 09/16] fsmonitor: move untracked invalidation into helper functions
  2024-02-23 17:36     ` Junio C Hamano
@ 2024-02-26 18:45       ` Jeff Hostetler
  0 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler @ 2024-02-26 18:45 UTC (permalink / raw
  To: Junio C Hamano, Jeff Hostetler via GitGitGadget
  Cc: git, Patrick Steinhardt, Jeff Hostetler



On 2/23/24 12:36 PM, Junio C Hamano wrote:
> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Jeff Hostetler <jeffhostetler@github.com>
>>
>> Move the call to invalidate the untracked cache for the FSEvent
>> pathname into the two helper functions.
>>
>> In a later commit in this series, we will call these helpers
>> from other contexts and it safer to include the UC invalidation
>> in the helper than to remember to also add it to each helper
>> call-site.
>>
>> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
>> ---
>>   fsmonitor.c | 26 +++++++++++++++++++-------
>>   1 file changed, 19 insertions(+), 7 deletions(-)
> 
> Thanks.  The steps in this iteration makes this move much less
> confusing to me than in the previous one.  We used to call one of
> "handle path with/without trailing slash" functions and then called
> the invalidation.  Now the invalidation happens in these "handle path"
> functions.
> 
> The unexplained change in behaviour is that we used to do the rest
> of "handle path" and invalidation was done at the end.  Now we do it
> upfront.  I think the "rest" works solely based on what is in the
> main in-core index array (i.e. the_index.cache[] aka active_cache[])
> and affects only what is in the in-core index array, while
> untracked_cache_invalidate*() works solely based on what is in the
> untracked cache extension (i.e. the_index.untracked) and affects
> only what is in there, so the order of these two does not matter.
> 
> Am I correct?
> 
> Or does it affect correctness or performance or whatever in any way?
> IOW, is there a reason why it is better to do the invalidation first
> and then doing the "rest" after (hence this patch flips the order of
> two to _improve_ something)?
> 
> Thanks.

The ce_flags invalidation and the untracked-cache invalidation are
independent (as far as I could tell) and it doesn't matter which
order we do them.  Moving the UC to the start of the function was
an attempt to avoid the usual "goto the bottom" or the need to guard
against early "return" statements that were present in some of the
original code (or my various refactorings).  Moving it to the top
just let me get it out of the way and not have to contrive things.
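
As a rough sketch of that shape (struct sketch_state and
sketch_handle_path() are hypothetical names for illustration, not the
real helpers): because the two invalidations are independent, doing
the untracked-cache one first means no early return can skip it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state: a counter for UC invalidations and one CE bit. */
struct sketch_state {
	int uc_invalidations;
	int ce_valid;
};

/* Returns the number of cache-entries invalidated (0 or 1 here). */
static size_t sketch_handle_path(struct sketch_state *s, int pos)
{
	assert(s);

	/* Independent of everything below, so get it out of the way. */
	s->uc_invalidations++;

	if (pos < 0)
		return 0;	/* early return; UC step already done */

	s->ce_valid = 0;
	return 1;
}
```

Had the untracked-cache step been left at the bottom, the early return
would have needed either a "goto" to a common exit or a duplicate call.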

I'll update the commit message.

Thanks
Jeff



* Re: [PATCH v2 14/16] fsmonitor: support case-insensitive events
  2024-02-23 18:14     ` Junio C Hamano
@ 2024-02-26 20:41       ` Jeff Hostetler
  2024-02-26 21:18         ` Junio C Hamano
  0 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler @ 2024-02-26 20:41 UTC (permalink / raw
  To: Junio C Hamano, Jeff Hostetler via GitGitGadget
  Cc: git, Patrick Steinhardt, Jeff Hostetler



On 2/23/24 1:14 PM, Junio C Hamano wrote:
> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +/*
>> + * Use the name-hash to do a case-insensitive cache-entry lookup with
>> + * the pathname and invalidate the cache-entry.
>> + *
>> + * Returns the number of cache-entries that we invalidated.
>> + */
>> +static size_t handle_using_name_hash_icase(
>> +	struct index_state *istate, const char *name)
>> +{
>> +	struct cache_entry *ce = NULL;
>> +
>> +	ce = index_file_exists(istate, name, strlen(name), 1);
>> +	if (!ce)
>> +		return 0;
>> +
>> +	/*
>> +	 * A case-insensitive search in the name-hash using the
>> +	 * observed pathname found a cache-entry, so the observed path
>> +	 * is case-incorrect.  Invalidate the cache-entry and use the
>> +	 * correct spelling from the cache-entry to invalidate the
>> +	 * untracked-cache.  Since we now have sparse-directories in
>> +	 * the index, the observed pathname may represent a regular
>> +	 * file or a sparse-index directory.
>> +	 *
>> +	 * Note that we should not have seen FSEvents for a
>> +	 * sparse-index directory, but we handle it just in case.
>> +	 *
>> +	 * Either way, we know that there are not any cache-entries for
>> +	 * children inside the cone of the directory, so we don't need to
>> +	 * do the usual scan.
>> +	 */
>> +	trace_printf_key(&trace_fsmonitor,
>> +			 "fsmonitor_refresh_callback MAP: '%s' '%s'",
>> +			 name, ce->name);
>> +
>> +	untracked_cache_invalidate_trimmed_path(istate, ce->name, 0);
>> +	ce->ce_flags &= ~CE_FSMONITOR_VALID;
>> +	return 1;
>> +}
> 
> You first ask the name-hash to turn the incoming "name" into the
> case variant that we know about, i.e. ce->name, and use that to
> access the untracked cache.  Clever and makes sense.  But if we have
> ce->name, doesn't it mean the name is tracked?  Do we find anything
> useful to do in the untracked cache invalidation codepath in that
> case?
> 
> An FSmonitor event with case-incorrect pathname for a directory may
> not be this trivial, I presume, and I expect that is what the
> remainder of this patch is about.

We're going to use "handle_using_name_hash_icase()" to look up both
qualified (with trailing slash provided by the daemon) paths and
unqualified paths (either a file or a directory on a platform that
can't tell), so there are 3 cases to worry about.

If we fail to find a cache-entry in the name-hash, we know nothing
about the path and we still have the three cases to worry about and
we let the caller deal with that.

If we DO find a matching cache-entry, then it is either a tracked
file or one of the new sparse-directories cache-entries.  We now know
the correct case-spelling.  I don't think it is possible for the UC
to have an entry for this spelling, so you're right, we may not need
to explicitly invalidate the UC here.  I'll add a comment to the code
about this.


> 
>> +
>> +/*
>> + * Use the dir-name-hash to find the correct-case spelling of the
>> + * directory.  Use the canonical spelling to invalidate all of the
>> + * cache-entries within the matching cone.
>> + *
>> + * Returns the number of cache-entries that we invalidated.
>> + */
>> +static size_t handle_using_dir_name_hash_icase(
>> +	struct index_state *istate, const char *name)
> 
> It is a bit unfortunate that here on the name-hash side we contrast
> the two helper function variants as "dir-name" vs "name", while the
> original handle_path side use "without_slash" vs "with_slash".
> 
> If I understand correctly, it is not like there are two distinct
> hashes, "name-hash" vs "dir-name-hash".  Both of these helpers use
> the same "name-hash" mechanism, and this function differs from the
> previous one in that it is about a directory, which is why it has
> "dir" in its name.  I wonder if we renamed the other one with
> "nondir" in its name, and the other without_slash and with_slash
> pair to match, e.g., handle_nondir_path() vs handle_dir_path(), or
> something like that, the resulting names for these four functions
> become easier to contrast and understand?

name-hash.[ch] has 2 distinct hash-maps inside it: the "name-hash"
that we typically think about, and a well-hidden "dir-name-hash"
in the same source file.

The "name-hash" maps each cache-entry's pathname to its ce* in
the cache-entry[] (case-insensitively).

The "dir-name-hash" maps each unique directory prefix over all
of the cache-entries to the case-correct prefix.  That is, if the
index contains "dir1/Dir2/DIR3/file1" and "dir1/dir4/file2", the
dir hash will have 4 entries
     { "dir1", "dir1/Dir2", "dir1/Dir2/DIR3", "dir1/dir4" }.
This lets us do lookups without having to do a linear search on
the entire cache-entry[] every time.
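
To make the prefix table concrete, here is a minimal sketch of the
idea.  It is not the name-hash.c implementation: the struct, the
function names and the fixed-size flat array are all made up for
illustration, and a linear scan stands in for the real hashmap.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_DIRS 16
#define MAX_LEN  64

/* Hypothetical stand-in for the dir-name-hash: a flat array of prefixes. */
struct toy_dir_table {
	char names[MAX_DIRS][MAX_LEN];
	size_t nr;
};

/* Compare two byte ranges of equal length, ignoring ASCII case. */
int toy_same_icase(const char *a, const char *b, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return 0;
	return 1;
}

/* Return the stored (case-correct) spelling of a prefix, or NULL. */
const char *toy_dir_find(const struct toy_dir_table *t,
			 const char *name, size_t len)
{
	size_t i;

	for (i = 0; i < t->nr; i++)
		if (strlen(t->names[i]) == len &&
		    toy_same_icase(t->names[i], name, len))
			return t->names[i];
	return NULL;
}

/* Record every directory prefix of one tracked path, skipping duplicates. */
void toy_dir_add_path(struct toy_dir_table *t, const char *path)
{
	const char *slash;

	for (slash = strchr(path, '/'); slash; slash = strchr(slash + 1, '/')) {
		size_t len = (size_t)(slash - path);

		if (toy_dir_find(t, path, len))
			continue; /* prefix already recorded */
		assert(t->nr < MAX_DIRS && len < MAX_LEN);
		memcpy(t->names[t->nr], path, len);
		t->names[t->nr][len] = '\0';
		t->nr++;
	}
}
```

Starting from a zeroed table, adding "dir1/Dir2/DIR3/file1" and
"dir1/dir4/file2" leaves the four prefixes listed above, and a lookup
for "DIR1/DIR2" returns the stored spelling "dir1/Dir2".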

These 2 hashes are demand-loaded only when needed (and usually only
when ignore_case is set IIRC).

When "handle_using_dir_name_hash_icase()" is called we still don't
know if the pathname is actually a file or directory; all we know
is that we did not find a case-sensitive exact match nor a
case-insensitive match against the cache-entry[] using the name-hash.
The pathname could be a (unqualified) directory or just a plain
untracked file.  So here, if we find it in the dir-name-hash, we now
know that it is a directory and that there was a case-error and we now
know the directory's correct case-spelling.

So we use that discovered case-correct spelling to invalidate the
untracked-cache.
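
The overall lookup order can be sketched roughly as below.  This is
only an illustration over a plain array of tracked paths; the enum and
function names are hypothetical, and the real code uses
index_name_pos(), the name-hash and the dir-name-hash rather than
linear scans.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum toy_match {
	TOY_EXACT,      /* exact-case hit on a tracked path or its cone */
	TOY_FILE_ICASE, /* case-insensitive hit on a tracked file */
	TOY_DIR_ICASE,  /* case-insensitive hit on a directory prefix */
	TOY_UNTRACKED   /* nothing in the index matches */
};

/* Compare len bytes; ASCII case is ignored when icase is set. */
int toy_eq(const char *a, const char *b, size_t len, int icase)
{
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char x = (unsigned char)a[i];
		unsigned char y = (unsigned char)b[i];

		if (icase ? tolower(x) != tolower(y) : x != y)
			return 0;
	}
	return 1;
}

/*
 * Classify an observed pathname: try the exact case first (file, then
 * directory cone), and only then repeat both checks ignoring case.
 */
enum toy_match toy_classify(const char *const *tracked, size_t nr,
			    const char *name)
{
	size_t len = strlen(name);
	size_t i;
	int icase;

	if (len && name[len - 1] == '/')
		len--; /* the daemon may append a slash for directories */
	if (!len)
		return TOY_UNTRACKED;

	for (icase = 0; icase <= 1; icase++) {
		/* Whole-path match: the name refers to a tracked file. */
		for (i = 0; i < nr; i++)
			if (strlen(tracked[i]) == len &&
			    toy_eq(tracked[i], name, len, icase))
				return icase ? TOY_FILE_ICASE : TOY_EXACT;

		/* Prefix match: the name is a directory containing tracked files. */
		for (i = 0; i < nr; i++)
			if (strlen(tracked[i]) > len &&
			    tracked[i][len] == '/' &&
			    toy_eq(tracked[i], name, len, icase))
				return icase ? TOY_DIR_ICASE : TOY_EXACT;
	}
	return TOY_UNTRACKED;
}
```

With "dir1/dir2/dir3/file3" tracked, an event for
"dir1/DIR2/dir3/file3" falls through to the case-insensitive file
match, an event for "dir1/DIR2/" falls through to the directory match,
and an untracked "foo.obj" runs every check before it is reported as
untracked.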

> 
>> +{
>> +	struct strbuf canonical_path = STRBUF_INIT;
>> +	int pos;
>> +	size_t len = strlen(name);
>> +	size_t nr_in_cone;
>> +
>> +	if (name[len - 1] == '/')
>> +		len--;
>> +
>> +	if (!index_dir_find(istate, name, len, &canonical_path))
>> +		return 0; /* name is untracked */
>> +
>> +	if (!memcmp(name, canonical_path.buf, canonical_path.len)) {
>> +		strbuf_release(&canonical_path);
>> +		/*
>> +		 * NEEDSWORK: Our caller already tried an exact match
>> +		 * and failed to find one.  They called us to do an
>> +		 * ICASE match, so we should never get an exact match,
>> +		 * so we could promote this to a BUG() here if we
>> +		 * wanted to.  It doesn't hurt anything to just return
>> +		 * 0 and go on becaus we should never get here.  Or we
>> +		 * could just get rid of the memcmp() and this "if"
>> +		 * clause completely.
>> +		 */
>> +		return 0; /* should not happen */
>> +	}
> 
> "becaus" -> "because".
> 
> If we should never get here, having BUG("we should never get here")
> would not hurt anything, either.  On the other hand, silently
> returning 0 will hide the bug under the carpet, and I am not sure it
> is fair to call it "doesn't hurt anything".

I'll make it a BUG().

> 
>> +
>> +	trace_printf_key(&trace_fsmonitor,
>> +			 "fsmonitor_refresh_callback MAP: '%s' '%s'",
>> +			 name, canonical_path.buf);
>> +
>> +	/*
>> +	 * The dir-name-hash only tells us the corrected spelling of
>> +	 * the prefix.  We have to use this canonical path to do a
>> +	 * lookup in the cache-entry array so that we repeat the
>> +	 * original search using the case-corrected spelling.
>> +	 */
>> +	strbuf_addch(&canonical_path, '/');
>> +	pos = index_name_pos(istate, canonical_path.buf,
>> +			     canonical_path.len);
>> +	nr_in_cone = handle_path_with_trailing_slash(
>> +		istate, canonical_path.buf, pos);
>> +	strbuf_release(&canonical_path);
>> +	return nr_in_cone;
>> +}
> 
> Nice.  Do we need to give this corrected name to help untracked
> cache invalidation from the caller that called us?

In an earlier commit, I moved the call to invalidate the untracked-cache
into the two handle_path_with[out]_trailing_slash() functions so that
we wouldn't have to worry about it here.

> 
>> @@ -319,6 +416,19 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
>>   	else
>>   		nr_in_cone = handle_path_without_trailing_slash(istate, name, pos);
>>   
>> +	/*
>> +	 * If we did not find an exact match for this pathname or any
>> +	 * cache-entries with this directory prefix and we're on a
>> +	 * case-insensitive file system, try again using the name-hash
>> +	 * and dir-name-hash.
>> +	 */
>> +	if (!nr_in_cone && ignore_case) {
>> +		nr_in_cone = handle_using_name_hash_icase(istate, name);
>> +		if (!nr_in_cone)
>> +			nr_in_cone = handle_using_dir_name_hash_icase(
>> +				istate, name);
>> +	}
> 
> It might be interesting to learn how often we go through these
> "fallback" code paths by tracing.  Maybe it will become too noisy?
> I dunno.

I'm afraid it will be very noisy. On Windows and Mac we'll probably end
up falling back for anything that is untracked, unfortunately.  That is,
if there is no cache-entry for "foo.obj", then we'll look for a
case-error (maybe there is a tracked "FOO.OBJ" file in the index or a
"Foo.Obj" directory), before we can say it is untracked.

I'm not happy about this (and no, I haven't had time to measure the
perf hit we'll take), but right now I'm just worried about the
correctness -- I've had several reports of stale/incomplete status
when IDE tools change file/directory case in unexpected ways....

Thanks
Jeff


* Re: [PATCH v2 14/16] fsmonitor: support case-insensitive events
  2024-02-25 13:10     ` Torsten Bögershausen
@ 2024-02-26 20:47       ` Jeff Hostetler
  0 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler @ 2024-02-26 20:47 UTC (permalink / raw
  To: Torsten Bögershausen, Jeff Hostetler via GitGitGadget
  Cc: git, Patrick Steinhardt, Jeff Hostetler



On 2/25/24 8:10 AM, Torsten Bögershausen wrote:
>> +	if (!memcmp(name, canonical_path.buf, canonical_path.len)) {
>> +		strbuf_release(&canonical_path);
>> +		/*
>> +		 * NEEDSWORK: Our caller already tried an exact match
>> +		 * and failed to find one.  They called us to do an
>> +		 * ICASE match, so we should never get an exact match,
>> +		 * so we could promote this to a BUG() here if we
>> +		 * wanted to.  It doesn't hurt anything to just return
>> +		 * 0 and go on becaus we should never get here.  Or we
>> +		 * could just get rid of the memcmp() and this "if"
>> +		 * clause completely.
>> +		 */
>> +		return 0; /* should not happen */
> 
> In very very theory, there may be a race-condition,
> when a directory is renamed very fast, more than once.
> I don't think, that the "it did not match exactly, but
> now it matches" is a problem.
> Question: Does it make sense to just remove this ?
> And, may be, find out that the "corrected spelling (tm)"
> of "DIR1" is not "dir1", neither "Dir1", but, exactly, "DIR1" ?
> Would that be a problem ?
> 

I just meant that the dir-name-hash that we computed when
we loaded the index found an exact-case match here that
wasn't found when we called index_name_pos() and the negative
"pos" didn't point to this exact-case prefix.  This should
not happen.

Yeah I didn't think it should be a fatal condition, but
since it shouldn't happen, we can make it a BUG() and see.

Jeff


* Re: [PATCH v2 14/16] fsmonitor: support case-insensitive events
  2024-02-26 20:41       ` Jeff Hostetler
@ 2024-02-26 21:18         ` Junio C Hamano
  0 siblings, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2024-02-26 21:18 UTC (permalink / raw
  To: Jeff Hostetler
  Cc: Jeff Hostetler via GitGitGadget, git, Patrick Steinhardt,
	Jeff Hostetler

Jeff Hostetler <git@jeffhostetler.com> writes:

> I'm not happy about this (and no, I haven't had time to measure the
> perf hit we'll take), but right now I'm just worried about the
> correctness -- I've had several reports of stale/incomplete status
> when IDE tools change file/directory case in unexpected ways....

Of course, it is a good discipline, and I fully support the
direction, to focus on the correctness first.


* [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems
  2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                     ` (15 preceding siblings ...)
  2024-02-23  3:18   ` [PATCH v2 16/16] t7527: update case-insenstive fsmonitor test Jeff Hostetler via GitGitGadget
@ 2024-02-26 21:39   ` Jeff Hostetler via GitGitGadget
  2024-02-26 21:39     ` [PATCH v3 01/14] name-hash: add index_dir_find() Jeff Hostetler via GitGitGadget
                       ` (15 more replies)
  16 siblings, 16 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-26 21:39 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Jeff Hostetler, Torsten Bögershausen,
	Jeff Hostetler

Here is version 3. I think I have addressed the remaining comments.

I cleaned up the test code to use the test_expect_failure at the beginning
and squashed in the test_expect_success version of tests into the final
commit in the series.

I moved the invalidate_ce_fsm() commit earlier in the series, so that the
final commit actually uses all of the up-to-this-point changes to fix the
problem.

I converted a few "should not happens" to BUG()s.

Thanks to everyone for their time and attention reviewing this.

Jeff

Jeff Hostetler (14):
  name-hash: add index_dir_find()
  t7527: add case-insensitve test for FSMonitor
  fsmonitor: refactor refresh callback on directory events
  fsmonitor: clarify handling of directory events in callback helper
  fsmonitor: refactor refresh callback for non-directory events
  dir: create untracked_cache_invalidate_trimmed_path()
  fsmonitor: refactor untracked-cache invalidation
  fsmonitor: move untracked-cache invalidation into helper functions
  fsmonitor: return invalidated cache-entry count on directory event
  fsmonitor: remove custom loop from non-directory path handler
  fsmonitor: return invalided cache-entry count on non-directory event
  fsmonitor: trace the new invalidated cache-entry count
  fsmonitor: refactor bit invalidation in refresh callback
  fsmonitor: support case-insensitive events

 dir.c                        |  20 +++
 dir.h                        |   7 +
 fsmonitor.c                  | 312 +++++++++++++++++++++++++++++------
 name-hash.c                  |   9 +-
 name-hash.h                  |   7 +-
 t/t7527-builtin-fsmonitor.sh | 223 +++++++++++++++++++++++++
 6 files changed, 522 insertions(+), 56 deletions(-)


base-commit: f41f85c9ec8d4d46de0fd5fded88db94d3ec8c11
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1662%2Fjeffhostetler%2Ffsmonitor-ignore-case-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1662/jeffhostetler/fsmonitor-ignore-case-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1662

Range-diff vs v2:

  1:  03b07d9c25e !  1:  64ae07aaeaa name-hash: add index_dir_find()
     @@ Metadata
       ## Commit message ##
          name-hash: add index_dir_find()
      
     -    Replace the index_dir_exists() function with index_dir_find() and
     -    change the API to take an optional strbuf to return the canonical
     -    spelling of the matched directory prefix.
     +    index_dir_exists() returns a boolean to indicate if there is a
     +    case-insensitive match in the directory name-hash, but does not
     +    provide the caller with the exact spelling of that match.
      
     -    Create an index_dir_exists() wrapper macro for existing callers.
     +    Create index_dir_find() to do the case-insensitive search *and*
     +    optionally return the spelling of the matched directory prefix in a
     +    provided strbuf.
      
     -    The existing index_dir_exists() returns a boolean to indicate if
     -    there is a case-insensitive match in the directory name-hash, but
     -    it doesn't tell the caller the exact spelling of that match.
     -
     -    The new version also copies the matched spelling to a provided strbuf.
     -    This lets the caller, for example, then call index_name_pos() with the
     -    correct case to search the cache-entry array for the real insertion
     -    position.
     +    To avoid code duplication, convert index_dir_exists() to be a trivial
     +    wrapper around the new index_dir_find().
      
          Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
      
  2:  7778cee1c10 !  2:  beeebf55963 t7527: add case-insensitve test for FSMonitor
     @@ t/t7527-builtin-fsmonitor.sh: test_expect_success 'split-index and FSMonitor wor
      +#
      +# The setup is a little contrived.
      +#
     -+test_expect_success CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
     ++test_expect_failure CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
      +	test_when_finished "stop_daemon_delete_repo subdir_case_wrong" &&
      +
      +	git init subdir_case_wrong &&
     @@ t/t7527-builtin-fsmonitor.sh: test_expect_success 'split-index and FSMonitor wor
      +	grep -q " M AAA" "$PWD/subdir_case_wrong.out" &&
      +	grep -q " M zzz" "$PWD/subdir_case_wrong.out" &&
      +
     -+	# However, with the fsmonitor client bug, the "(pos -3)" causes
     -+	# the client to not update the bit and never rescan the file
     -+	# and therefore not report it as dirty.
     -+	! grep -q " M dir1/dir2/dir3/file3" "$PWD/subdir_case_wrong.out"
     ++	# Expect Breakage: with the case confusion, the "(pos -3)" causes
     ++	# the client to not clear the CE_FSMONITOR_VALID bit and therefore
     ++	# status will not rescan the file and therefore not report it as dirty.
     ++	grep -q " M dir1/dir2/dir3/file3" "$PWD/subdir_case_wrong.out"
      +'
      +
     -+test_expect_success CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
     ++test_expect_failure CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
      +	test_when_finished "stop_daemon_delete_repo file_case_wrong" &&
      +
      +	git init file_case_wrong &&
     @@ t/t7527-builtin-fsmonitor.sh: test_expect_success 'split-index and FSMonitor wor
      +	grep -q "fsmonitor_refresh_callback.*FILE-3-A.*pos -3" "$PWD/file_case_wrong-try3.log" &&
      +	grep -q "fsmonitor_refresh_callback.*file-4-a.*pos -9" "$PWD/file_case_wrong-try3.log" &&
      +
     -+	# Status should say these files are modified, but with the case
     -+	# bug, the "pos -3" cause the client to not update the FSM bit
     -+	# and never cause the file to be rescanned and therefore to not
     -+	# report it dirty.
     -+	! grep -q " M dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-try3.out" &&
     -+	! grep -q " M dir1/dir2/dir4/FILE-4-A" "$PWD/file_case_wrong-try3.out"
     ++	# Expect Breakage: with the case confusion, the "(pos-3)" and
     ++	# "(pos -9)" causes the client to not clear the CE_FSMONITOR_VALID
     ++	# bit and therefore status will not rescan the files and therefore
     ++	# not report them as dirty.
     ++	grep -q " M dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-try3.out" &&
     ++	grep -q " M dir1/dir2/dir4/FILE-4-A" "$PWD/file_case_wrong-try3.out"
      +'
      +
       test_done
  3:  dad079ade7f <  -:  ----------- t7527: temporarily disable case-insensitive tests
  4:  5516670e30e =  3:  518cb4dd5df fsmonitor: refactor refresh callback on directory events
  5:  c04fd4eae94 =  4:  9a4b5bf990b fsmonitor: clarify handling of directory events in callback helper
  6:  7ee6ca1aefd !  5:  348b9b0c94e fsmonitor: refactor refresh callback for non-directory events
     @@ Metadata
       ## Commit message ##
          fsmonitor: refactor refresh callback for non-directory events
      
     -    Move the code handle unqualified FSEvents (without a trailing slash)
     -    into a helper function.
     +    Move the code that handles unqualified FSEvents (without a trailing
     +    slash) into a helper function.
      
          Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
      
  7:  99c0d3e0742 !  6:  ed735e3f1cb dir: create untracked_cache_invalidate_trimmed_path()
     @@ dir.c: void untracked_cache_invalidate_path(struct index_state *istate,
      +	size_t len = strlen(path);
      +
      +	if (!len)
     -+		return; /* should not happen */
     ++		BUG("untracked_cache_invalidate_trimmed_path given zero length path");
      +
      +	if (path[len - 1] != '/') {
      +		untracked_cache_invalidate_path(istate, path, safe_path);
  8:  f2d6765d84f =  7:  2a43c6cbe0d fsmonitor: refactor untracked-cache invalidation
  9:  af6f57ab3e6 !  8:  6e87ea6deaf fsmonitor: move untracked invalidation into helper functions
     @@ Metadata
      Author: Jeff Hostetler <jeffhostetler@github.com>
      
       ## Commit message ##
     -    fsmonitor: move untracked invalidation into helper functions
     +    fsmonitor: move untracked-cache invalidation into helper functions
      
     -    Move the call to invalidate the untracked cache for the FSEvent
     +    Move the call to invalidate the untracked-cache for the FSEvent
          pathname into the two helper functions.
      
          In a later commit in this series, we will call these helpers
          from other contexts and it safer to include the UC invalidation
     -    in the helper than to remember to also add it to each helper
     +    in the helpers than to remember to also add it to each helper
          call-site.
      
     +    This has the side-effect of invalidating the UC *before* we
     +    invalidate the ce_flags in the cache-entry.  These activities
     +    are independent and do not affect each other.  Also, by doing
     +    the UC work first, we can avoid worrying about "early returns"
     +    or the need for the usual "goto the end" in each of the
     +    handler functions.
     +
          Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
      
       ## fsmonitor.c ##
 10:  623c6f06e21 =  9:  5fea8b9476e fsmonitor: return invalidated cache-entry count on directory event
 11:  1853f77d333 = 10:  3fa7536cf80 fsmonitor: remove custom loop from non-directory path handler
 12:  f77d68c78ad ! 11:  53f73c1515d fsmonitor: return invalided cache-entry count on non-directory event
     @@ Metadata
       ## Commit message ##
          fsmonitor: return invalided cache-entry count on non-directory event
      
     -    Teah the refresh callback helper function for unqualified FSEvents
     +    Teach the refresh callback helper function for unqualified FSEvents
          (pathnames without a trailing slash) to return the number of
          cache-entries that were invalided in response to the event.
      
 13:  58b36673e15 = 12:  0148319aea5 fsmonitor: trace the new invalidated cache-entry count
 15:  3a20065dbf8 ! 13:  04867eccfcd fsmonitor: refactor bit invalidation in refresh callback
     @@ Commit message
          it to help debug edge cases.
      
          This is similar to the existing `mark_fsmonitor_invalid()` function,
     -    but we don't need the extra stuff that it does.
     +    but it also does untracked-cache invalidation and we've already
     +    handled that in the refresh-callback handlers, so but we don't need
     +    to repeat that.
      
          Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
      
       ## fsmonitor.c ##
      @@ fsmonitor.c: static int query_fsmonitor_hook(struct repository *r,
     - static size_t handle_path_with_trailing_slash(
     - 	struct index_state *istate, const char *name, int pos);
     + 	return result;
     + }
       
      +/*
      + * Invalidate the FSM bit on this CE.  This is like mark_fsmonitor_invalid()
     -+ * but we've already handled the untracked-cache and I want a different
     -+ * trace message.
     ++ * but we've already handled the untracked-cache, so let's not repeat that
     ++ * work.  This also lets us have a different trace message so that we can
     ++ * see everything that was done as part of the refresh-callback.
      + */
      +static void invalidate_ce_fsm(struct cache_entry *ce)
      +{
     -+	if (ce->ce_flags & CE_FSMONITOR_VALID)
     ++	if (ce->ce_flags & CE_FSMONITOR_VALID) {
      +		trace_printf_key(&trace_fsmonitor,
      +				 "fsmonitor_refresh_callback INV: '%s'",
      +				 ce->name);
     -+	ce->ce_flags &= ~CE_FSMONITOR_VALID;
     ++		ce->ce_flags &= ~CE_FSMONITOR_VALID;
     ++	}
      +}
      +
     - /*
     -  * Use the name-hash to do a case-insensitive cache-entry lookup with
     -  * the pathname and invalidate the cache-entry.
     -@@ fsmonitor.c: static size_t handle_using_name_hash_icase(
     - 
     - 	untracked_cache_invalidate_trimmed_path(istate, ce->name, 0);
     - 
     --	ce->ce_flags &= ~CE_FSMONITOR_VALID;
     -+	invalidate_ce_fsm(ce);
     - 	return 1;
     - }
     + static size_t handle_path_with_trailing_slash(
     + 	struct index_state *istate, const char *name, int pos);
       
      @@ fsmonitor.c: static size_t handle_path_without_trailing_slash(
       		 * cache-entry with the same pathname, nor for a cone
 14:  288f3f4e54e ! 14:  ec036c04d1b fsmonitor: support case-insensitive events
     @@ Commit message
          Update event handling to optionally use the name-hash and dir-name-hash
          if necessary.
      
     +    Also update t7527 to convert the "test_expect_failure" to "_success"
     +    now that we have fixed the bug.
     +
          Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
      
       ## fsmonitor.c ##
     @@ fsmonitor.c
       #include "run-command.h"
       #include "strbuf.h"
       #include "trace2.h"
     -@@ fsmonitor.c: static int query_fsmonitor_hook(struct repository *r,
     +@@ fsmonitor.c: static void invalidate_ce_fsm(struct cache_entry *ce)
       static size_t handle_path_with_trailing_slash(
       	struct index_state *istate, const char *name, int pos);
       
     @@ fsmonitor.c: static int query_fsmonitor_hook(struct repository *r,
      +			 "fsmonitor_refresh_callback MAP: '%s' '%s'",
      +			 name, ce->name);
      +
     ++	/*
     ++	 * NEEDSWORK: We used the name-hash to find the correct
     ++	 * case-spelling of the pathname in the cache-entry[], so
     ++	 * technically this is a tracked file or a sparse-directory.
     ++	 * It should not have any entries in the untracked-cache, so
     ++	 * we should not need to use the case-corrected spelling to
     ++	 * invalidate the the untracked-cache.  So we may not need to
     ++	 * do this.  For now, I'm going to be conservative and always
     ++	 * do it; we can revisit this later.
     ++	 */
      +	untracked_cache_invalidate_trimmed_path(istate, ce->name, 0);
      +
     -+	ce->ce_flags &= ~CE_FSMONITOR_VALID;
     ++	invalidate_ce_fsm(ce);
      +	return 1;
      +}
      +
     @@ fsmonitor.c: static int query_fsmonitor_hook(struct repository *r,
      +		 * ICASE match, so we should never get an exact match,
      +		 * so we could promote this to a BUG() here if we
      +		 * wanted to.  It doesn't hurt anything to just return
     -+		 * 0 and go on becaus we should never get here.  Or we
     ++		 * 0 and go on because we should never get here.  Or we
      +		 * could just get rid of the memcmp() and this "if"
      +		 * clause completely.
      +		 */
     -+		return 0; /* should not happen */
     ++		BUG("handle_using_dir_name_hash_icase(%s) did not exact match",
     ++		    name);
      +	}
      +
      +	trace_printf_key(&trace_fsmonitor,
     @@ fsmonitor.c: static void fsmonitor_refresh_callback(struct index_state *istate,
       	if (nr_in_cone)
       		trace_printf_key(&trace_fsmonitor,
       				 "fsmonitor_refresh_callback CNT: %d",
     +
     + ## t/t7527-builtin-fsmonitor.sh ##
     +@@ t/t7527-builtin-fsmonitor.sh: test_expect_success 'split-index and FSMonitor work well together' '
     + #
     + # The setup is a little contrived.
     + #
     +-test_expect_failure CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
     ++test_expect_success CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
     + 	test_when_finished "stop_daemon_delete_repo subdir_case_wrong" &&
     + 
     + 	git init subdir_case_wrong &&
     +@@ t/t7527-builtin-fsmonitor.sh: test_expect_failure CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
     + 
     + 	grep -q "dir1/DIR2/dir3/file3.*pos -3" "$PWD/subdir_case_wrong.log1" &&
     + 
     ++	# Verify that we get a mapping event to correct the case.
     ++	grep -q "MAP:.*dir1/DIR2/dir3/file3.*dir1/dir2/dir3/file3" \
     ++		"$PWD/subdir_case_wrong.log1" &&
     ++
     + 	# The refresh-callbacks should have caused "git status" to clear
     + 	# the CE_FSMONITOR_VALID bit on each of those files and caused
     + 	# the worktree scan to visit them and mark them as modified.
     + 	grep -q " M AAA" "$PWD/subdir_case_wrong.out" &&
     + 	grep -q " M zzz" "$PWD/subdir_case_wrong.out" &&
     +-
     +-	# Expect Breakage: with the case confusion, the "(pos -3)" causes
     +-	# the client to not clear the CE_FSMONITOR_VALID bit and therefore
     +-	# status will not rescan the file and therefore not report it as dirty.
     + 	grep -q " M dir1/dir2/dir3/file3" "$PWD/subdir_case_wrong.out"
     + '
     + 
     +-test_expect_failure CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
     ++test_expect_success CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
     + 	test_when_finished "stop_daemon_delete_repo file_case_wrong" &&
     + 
     + 	git init file_case_wrong &&
     +@@ t/t7527-builtin-fsmonitor.sh: test_expect_failure CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
     + 	GIT_TRACE_FSMONITOR="$PWD/file_case_wrong-try3.log" \
     + 		git -C file_case_wrong --no-optional-locks status --short \
     + 			>"$PWD/file_case_wrong-try3.out" &&
     ++
     ++	# Verify that we get a mapping event to correct the case.
     ++	grep -q "fsmonitor_refresh_callback MAP:.*dir1/dir2/dir3/FILE-3-A.*dir1/dir2/dir3/file-3-a" \
     ++		"$PWD/file_case_wrong-try3.log" &&
     ++	grep -q "fsmonitor_refresh_callback MAP:.*dir1/dir2/dir4/file-4-a.*dir1/dir2/dir4/FILE-4-A" \
     ++		"$PWD/file_case_wrong-try3.log" &&
     ++
     + 	# FSEvents are in observed case.
     + 	grep -q "fsmonitor_refresh_callback.*FILE-3-A.*pos -3" "$PWD/file_case_wrong-try3.log" &&
     + 	grep -q "fsmonitor_refresh_callback.*file-4-a.*pos -9" "$PWD/file_case_wrong-try3.log" &&
     + 
     +-	# Expect Breakage: with the case confusion, the "(pos-3)" and
     +-	# "(pos -9)" causes the client to not clear the CE_FSMONITOR_VALID
     +-	# bit and therefore status will not rescan the files and therefore
     +-	# not report them as dirty.
     ++	# The refresh-callbacks should have caused "git status" to clear
     ++	# the CE_FSMONITOR_VALID bit on each of those files and caused
     ++	# the worktree scan to visit them and mark them as modified.
     + 	grep -q " M dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-try3.out" &&
     + 	grep -q " M dir1/dir2/dir4/FILE-4-A" "$PWD/file_case_wrong-try3.out"
     + '
 16:  467d3c1fe2c <  -:  ----------- t7527: update case-insenstive fsmonitor test

-- 
gitgitgadget


* [PATCH v3 01/14] name-hash: add index_dir_find()
  2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
@ 2024-02-26 21:39     ` Jeff Hostetler via GitGitGadget
  2024-02-26 21:39     ` [PATCH v3 02/14] t7527: add case-insensitve test for FSMonitor Jeff Hostetler via GitGitGadget
                       ` (14 subsequent siblings)
  15 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-26 21:39 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Jeff Hostetler, Torsten Bögershausen,
	Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

index_dir_exists() returns a boolean to indicate if there is a
case-insensitive match in the directory name-hash, but does not
provide the caller with the exact spelling of that match.

Create index_dir_find() to do the case-insensitive search *and*
optionally return the spelling of the matched directory prefix in a
provided strbuf.

To avoid code duplication, convert index_dir_exists() to be a trivial
wrapper around the new index_dir_find().

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 name-hash.c | 9 ++++++++-
 name-hash.h | 7 ++++++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/name-hash.c b/name-hash.c
index 251f036eef6..3a58ce03d9c 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -685,13 +685,20 @@ static int same_name(const struct cache_entry *ce, const char *name, int namelen
 	return slow_same_name(name, namelen, ce->name, len);
 }
 
-int index_dir_exists(struct index_state *istate, const char *name, int namelen)
+int index_dir_find(struct index_state *istate, const char *name, int namelen,
+		   struct strbuf *canonical_path)
 {
 	struct dir_entry *dir;
 
 	lazy_init_name_hash(istate);
 	expand_to_path(istate, name, namelen, 0);
 	dir = find_dir_entry(istate, name, namelen);
+
+	if (canonical_path && dir && dir->nr) {
+		strbuf_reset(canonical_path);
+		strbuf_add(canonical_path, dir->name, dir->namelen);
+	}
+
 	return dir && dir->nr;
 }
 
diff --git a/name-hash.h b/name-hash.h
index b1b4b0fb337..0cbfc428631 100644
--- a/name-hash.h
+++ b/name-hash.h
@@ -4,7 +4,12 @@
 struct cache_entry;
 struct index_state;
 
-int index_dir_exists(struct index_state *istate, const char *name, int namelen);
+
+int index_dir_find(struct index_state *istate, const char *name, int namelen,
+		   struct strbuf *canonical_path);
+
+#define index_dir_exists(i, n, l) index_dir_find((i), (n), (l), NULL)
+
 void adjust_dirname_case(struct index_state *istate, char *name);
 struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);
 
-- 
gitgitgadget



* [PATCH v3 02/14] t7527: add case-insensitve test for FSMonitor
  2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
  2024-02-26 21:39     ` [PATCH v3 01/14] name-hash: add index_dir_find() Jeff Hostetler via GitGitGadget
@ 2024-02-26 21:39     ` Jeff Hostetler via GitGitGadget
  2024-02-26 21:39     ` [PATCH v3 03/14] fsmonitor: refactor refresh callback on directory events Jeff Hostetler via GitGitGadget
                       ` (13 subsequent siblings)
  15 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-26 21:39 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Jeff Hostetler, Torsten Bögershausen,
	Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

The FSMonitor client code trusts the spelling of the pathnames in the
FSEvents received from the FSMonitor daemon.  On case-insensitive file
systems, these OBSERVED pathnames may be spelled differently than the
EXPECTED pathnames listed in the .git/index.  This causes a miss when
using `index_name_pos()` which expects the given case to be correct.

When this happens, the FSMonitor client code does not update the state
of the CE_FSMONITOR_VALID bit when refreshing the index (and before
starting to scan the worktree).

This results in modified files NOT being reported by `git status` when
there is a discrepancy in the case-spelling of a tracked file's
pathname.

This commit contains a (rather contrived) test case to demonstrate
this.  A later commit in this series will update the FSMonitor client
code to recognize these discrepancies and update the CE_ bit accordingly.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 t/t7527-builtin-fsmonitor.sh | 217 +++++++++++++++++++++++++++++++++++
 1 file changed, 217 insertions(+)

diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
index 363f9dc0e41..830f2d9de33 100755
--- a/t/t7527-builtin-fsmonitor.sh
+++ b/t/t7527-builtin-fsmonitor.sh
@@ -1037,4 +1037,221 @@ test_expect_success 'split-index and FSMonitor work well together' '
 	)
 '
 
+# The FSMonitor daemon reports the OBSERVED pathname of modified files
+# and thus contains the OBSERVED spelling on case-insensitive file
+# systems.  The daemon does not (and should not) load the .git/index
+# file and therefore does not know the expected case-spelling.  Since
+# it is possible for the user to create files/subdirectories with the
+# incorrect case, a modified file event for a tracked file will not have
+# the EXPECTED case. This can cause `index_name_pos()` to incorrectly
+# report that the file is untracked. This causes the client to fail to
+# mark the file as possibly dirty (keeping the CE_FSMONITOR_VALID bit
+# set) so that `git status` will avoid inspecting it and thus not
+# report it in the status output.
+#
+# The setup is a little contrived.
+#
+test_expect_failure CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
+	test_when_finished "stop_daemon_delete_repo subdir_case_wrong" &&
+
+	git init subdir_case_wrong &&
+	(
+		cd subdir_case_wrong &&
+		echo x >AAA &&
+		echo x >BBB &&
+
+		mkdir dir1 &&
+		echo x >dir1/file1 &&
+		mkdir dir1/dir2 &&
+		echo x >dir1/dir2/file2 &&
+		mkdir dir1/dir2/dir3 &&
+		echo x >dir1/dir2/dir3/file3 &&
+
+		echo x >yyy &&
+		echo x >zzz &&
+		git add . &&
+		git commit -m "data" &&
+
+		# This will cause "dir1/" and everything under it
+		# to be deleted.
+		git sparse-checkout set --cone --sparse-index &&
+
+		# Create dir2 with the wrong case and then let Git
+		# repopulate dir3 -- it will not correct the spelling
+		# of dir2.
+		mkdir dir1 &&
+		mkdir dir1/DIR2 &&
+		git sparse-checkout add dir1/dir2/dir3
+	) &&
+
+	start_daemon -C subdir_case_wrong --tf "$PWD/subdir_case_wrong.trace" &&
+
+	# Enable FSMonitor in the client. Run enough commands for
+	# the .git/index to sync up with the daemon with everything
+	# marked clean.
+	git -C subdir_case_wrong config core.fsmonitor true &&
+	git -C subdir_case_wrong update-index --fsmonitor &&
+	git -C subdir_case_wrong status &&
+
+	# Make some files dirty so that FSMonitor gets FSEvents for
+	# each of them.
+	echo xx >>subdir_case_wrong/AAA &&
+	echo xx >>subdir_case_wrong/dir1/DIR2/dir3/file3 &&
+	echo xx >>subdir_case_wrong/zzz &&
+
+	GIT_TRACE_FSMONITOR="$PWD/subdir_case_wrong.log" \
+		git -C subdir_case_wrong --no-optional-locks status --short \
+			>"$PWD/subdir_case_wrong.out" &&
+
+	# "git status" should have gotten file events for each of
+	# the 3 files.
+	#
+	# "dir2" should be in the observed case on disk.
+	grep "fsmonitor_refresh_callback" \
+		<"$PWD/subdir_case_wrong.log" \
+		>"$PWD/subdir_case_wrong.log1" &&
+
+	grep -q "AAA.*pos 0" "$PWD/subdir_case_wrong.log1" &&
+	grep -q "zzz.*pos 6" "$PWD/subdir_case_wrong.log1" &&
+
+	grep -q "dir1/DIR2/dir3/file3.*pos -3" "$PWD/subdir_case_wrong.log1" &&
+
+	# The refresh-callbacks should have caused "git status" to clear
+	# the CE_FSMONITOR_VALID bit on each of those files and caused
+	# the worktree scan to visit them and mark them as modified.
+	grep -q " M AAA" "$PWD/subdir_case_wrong.out" &&
+	grep -q " M zzz" "$PWD/subdir_case_wrong.out" &&
+
+	# Expect Breakage: with the case confusion, the "(pos -3)" causes
+	# the client to not clear the CE_FSMONITOR_VALID bit and therefore
+	# status will not rescan the file and therefore not report it as dirty.
+	grep -q " M dir1/dir2/dir3/file3" "$PWD/subdir_case_wrong.out"
+'
+
+test_expect_failure CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
+	test_when_finished "stop_daemon_delete_repo file_case_wrong" &&
+
+	git init file_case_wrong &&
+	(
+		cd file_case_wrong &&
+		echo x >AAA &&
+		echo x >BBB &&
+
+		mkdir dir1 &&
+		mkdir dir1/dir2 &&
+		mkdir dir1/dir2/dir3 &&
+		echo x >dir1/dir2/dir3/FILE-3-B &&
+		echo x >dir1/dir2/dir3/XXXX-3-X &&
+		echo x >dir1/dir2/dir3/file-3-a &&
+		echo x >dir1/dir2/dir3/yyyy-3-y &&
+		mkdir dir1/dir2/dir4 &&
+		echo x >dir1/dir2/dir4/FILE-4-A &&
+		echo x >dir1/dir2/dir4/XXXX-4-X &&
+		echo x >dir1/dir2/dir4/file-4-b &&
+		echo x >dir1/dir2/dir4/yyyy-4-y &&
+
+		echo x >yyy &&
+		echo x >zzz &&
+		git add . &&
+		git commit -m "data"
+	) &&
+
+	start_daemon -C file_case_wrong --tf "$PWD/file_case_wrong.trace" &&
+
+	# Enable FSMonitor in the client. Run enough commands for
+	# the .git/index to sync up with the daemon with everything
+	# marked clean.
+	git -C file_case_wrong config core.fsmonitor true &&
+	git -C file_case_wrong update-index --fsmonitor &&
+	git -C file_case_wrong status &&
+
+	# Make some files dirty so that FSMonitor gets FSEvents for
+	# each of them.
+	echo xx >>file_case_wrong/AAA &&
+	echo xx >>file_case_wrong/zzz &&
+
+	# Rename some files so that FSMonitor sees a create and delete
+	# FSEvent for each.  (A simple "mv foo FOO" is not portable
+	# between macOS and Windows. It works on both platforms, but makes
+	# the test messy, since (1) one platform updates "ctime" on the
+	# moved file and one does not and (2) it causes a directory event
+	# on one platform and not on the other which causes additional
+	# scanning during "git status" which causes a "H" vs "h" discrepancy
+	# in "git ls-files -f".)  So old-school it and move it out of the
+	# way and copy it to the case-incorrect name so that we get fresh
+	# "ctime" and "mtime" values.
+
+	mv file_case_wrong/dir1/dir2/dir3/file-3-a file_case_wrong/dir1/dir2/dir3/ORIG &&
+	cp file_case_wrong/dir1/dir2/dir3/ORIG     file_case_wrong/dir1/dir2/dir3/FILE-3-A &&
+	rm file_case_wrong/dir1/dir2/dir3/ORIG &&
+	mv file_case_wrong/dir1/dir2/dir4/FILE-4-A file_case_wrong/dir1/dir2/dir4/ORIG &&
+	cp file_case_wrong/dir1/dir2/dir4/ORIG     file_case_wrong/dir1/dir2/dir4/file-4-a &&
+	rm file_case_wrong/dir1/dir2/dir4/ORIG &&
+
+	# Run status enough times to fully sync.
+	#
+	# The first instance should get the create and delete FSEvents
+	# for each pair.  Status should update the index with a new FSM
+	# token (so the next invocation will not see data for these
+	# events).
+
+	GIT_TRACE_FSMONITOR="$PWD/file_case_wrong-try1.log" \
+		git -C file_case_wrong status --short \
+			>"$PWD/file_case_wrong-try1.out" &&
+	grep -q "fsmonitor_refresh_callback.*FILE-3-A.*pos -3" "$PWD/file_case_wrong-try1.log" &&
+	grep -q "fsmonitor_refresh_callback.*file-3-a.*pos 4"  "$PWD/file_case_wrong-try1.log" &&
+	grep -q "fsmonitor_refresh_callback.*FILE-4-A.*pos 6"  "$PWD/file_case_wrong-try1.log" &&
+	grep -q "fsmonitor_refresh_callback.*file-4-a.*pos -9" "$PWD/file_case_wrong-try1.log" &&
+
+	# FSM refresh will have invalidated the FSM bit and cause a regular
+	# (real) scan of these tracked files, so they should have "H" status.
+	# (We will not see a "h" status until the next refresh (on the next
+	# command).)
+
+	git -C file_case_wrong ls-files -f >"$PWD/file_case_wrong-lsf1.out" &&
+	grep -q "H dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-lsf1.out" &&
+	grep -q "H dir1/dir2/dir4/FILE-4-A" "$PWD/file_case_wrong-lsf1.out" &&
+
+
+	# Try the status again. We assume that the above status command
+	# advanced the token so that the next one will not see those events.
+
+	GIT_TRACE_FSMONITOR="$PWD/file_case_wrong-try2.log" \
+		git -C file_case_wrong status --short \
+			>"$PWD/file_case_wrong-try2.out" &&
+	! grep -q "fsmonitor_refresh_callback.*FILE-3-A.*pos" "$PWD/file_case_wrong-try2.log" &&
+	! grep -q "fsmonitor_refresh_callback.*file-3-a.*pos" "$PWD/file_case_wrong-try2.log" &&
+	! grep -q "fsmonitor_refresh_callback.*FILE-4-A.*pos" "$PWD/file_case_wrong-try2.log" &&
+	! grep -q "fsmonitor_refresh_callback.*file-4-a.*pos" "$PWD/file_case_wrong-try2.log" &&
+
+	# FSM refresh saw nothing, so it will mark all files as valid,
+	# so they should now have "h" status.
+
+	git -C file_case_wrong ls-files -f >"$PWD/file_case_wrong-lsf2.out" &&
+	grep -q "h dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-lsf2.out" &&
+	grep -q "h dir1/dir2/dir4/FILE-4-A" "$PWD/file_case_wrong-lsf2.out" &&
+
+
+	# We now have files with clean content, but with case-incorrect
+	# file names.  Modify them to see if status properly reports
+	# them.
+
+	echo xx >>file_case_wrong/dir1/dir2/dir3/FILE-3-A &&
+	echo xx >>file_case_wrong/dir1/dir2/dir4/file-4-a &&
+
+	GIT_TRACE_FSMONITOR="$PWD/file_case_wrong-try3.log" \
+		git -C file_case_wrong --no-optional-locks status --short \
+			>"$PWD/file_case_wrong-try3.out" &&
+	# FSEvents are in observed case.
+	grep -q "fsmonitor_refresh_callback.*FILE-3-A.*pos -3" "$PWD/file_case_wrong-try3.log" &&
+	grep -q "fsmonitor_refresh_callback.*file-4-a.*pos -9" "$PWD/file_case_wrong-try3.log" &&
+
+	# Expect Breakage: with the case confusion, the "(pos -3)" and
+	# "(pos -9)" cause the client to not clear the CE_FSMONITOR_VALID
+	# bit and therefore status will not rescan the files and therefore
+	# not report them as dirty.
+	grep -q " M dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-try3.out" &&
+	grep -q " M dir1/dir2/dir4/FILE-4-A" "$PWD/file_case_wrong-try3.out"
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 91+ messages in thread
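
The "pos -3" values that the tests above grep for follow from the usual sorted-array convention: an exact match returns its position, and a miss returns -(insertion point) - 1. A minimal sketch, assuming the index for the first test holds the seven entries implied by its "pos 0" and "pos 6" expectations; `toy_index` and `toy_name_pos()` are hypothetical stand-ins and not Git's `index_name_pos()`:

```c
#include <assert.h>
#include <string.h>

/* Assumed index contents for the subdir test, in byte-wise sorted order. */
static const char *const toy_index[] = {
	"AAA", "BBB", "dir1/dir2/dir3/file3", "dir1/dir2/file2",
	"dir1/file1", "yyy", "zzz",
};
#define TOY_INDEX_NR ((int)(sizeof(toy_index) / sizeof(toy_index[0])))

/*
 * Case-sensitive binary search.  Returns the position on an exact
 * match, otherwise -(insertion point) - 1.
 */
static int toy_name_pos(const char *name)
{
	int lo = 0, hi = TOY_INDEX_NR;

	assert(name != NULL);
	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, toy_index[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}
```

Because 'D' sorts before 'd', the observed path "dir1/DIR2/dir3/file3" misses and lands at insertion point 2, which is reported as -3 even though the expected entry sits at exactly that position.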

* [PATCH v3 03/14] fsmonitor: refactor refresh callback on directory events
  2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
  2024-02-26 21:39     ` [PATCH v3 01/14] name-hash: add index_dir_find() Jeff Hostetler via GitGitGadget
  2024-02-26 21:39     ` [PATCH v3 02/14] t7527: add case-insensitve test for FSMonitor Jeff Hostetler via GitGitGadget
@ 2024-02-26 21:39     ` Jeff Hostetler via GitGitGadget
  2024-02-26 21:39     ` [PATCH v3 04/14] fsmonitor: clarify handling of directory events in callback helper Jeff Hostetler via GitGitGadget
                       ` (12 subsequent siblings)
  15 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-26 21:39 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Jeff Hostetler, Torsten Bögershausen,
	Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Move the code to handle directory FSEvents (containing pathnames with
a trailing slash) into a helper function.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 52 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index f670c509378..6fecae9aeb2 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -183,6 +183,35 @@ static int query_fsmonitor_hook(struct repository *r,
 	return result;
 }
 
+static void handle_path_with_trailing_slash(
+	struct index_state *istate, const char *name, int pos)
+{
+	int i;
+
+	/*
+	 * The daemon can decorate directory events, such as
+	 * moves or renames, with a trailing slash if the OS
+	 * FS Event contains sufficient information, such as
+	 * MacOS.
+	 *
+	 * Use this to invalidate the entire cone under that
+	 * directory.
+	 *
+	 * We do not expect an exact match because the index
+	 * does not normally contain directory entries, so we
+	 * start at the insertion point and scan.
+	 */
+	if (pos < 0)
+		pos = -pos - 1;
+
+	/* Mark all entries for the folder invalid */
+	for (i = pos; i < istate->cache_nr; i++) {
+		if (!starts_with(istate->cache[i]->name, name))
+			break;
+		istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
+	}
+}
+
 static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 {
 	int i, len = strlen(name);
@@ -193,28 +222,7 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 			 name, pos);
 
 	if (name[len - 1] == '/') {
-		/*
-		 * The daemon can decorate directory events, such as
-		 * moves or renames, with a trailing slash if the OS
-		 * FS Event contains sufficient information, such as
-		 * MacOS.
-		 *
-		 * Use this to invalidate the entire cone under that
-		 * directory.
-		 *
-		 * We do not expect an exact match because the index
-		 * does not normally contain directory entries, so we
-		 * start at the insertion point and scan.
-		 */
-		if (pos < 0)
-			pos = -pos - 1;
-
-		/* Mark all entries for the folder invalid */
-		for (i = pos; i < istate->cache_nr; i++) {
-			if (!starts_with(istate->cache[i]->name, name))
-				break;
-			istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
-		}
+		handle_path_with_trailing_slash(istate, name, pos);
 
 		/*
 		 * We need to remove the traling "/" from the path
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH v3 04/14] fsmonitor: clarify handling of directory events in callback helper
  2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                       ` (2 preceding siblings ...)
  2024-02-26 21:39     ` [PATCH v3 03/14] fsmonitor: refactor refresh callback on directory events Jeff Hostetler via GitGitGadget
@ 2024-02-26 21:39     ` Jeff Hostetler via GitGitGadget
  2024-02-26 21:39     ` [PATCH v3 05/14] fsmonitor: refactor refresh callback for non-directory events Jeff Hostetler via GitGitGadget
                       ` (11 subsequent siblings)
  15 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-26 21:39 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Jeff Hostetler, Torsten Bögershausen,
	Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Improve documentation of the refresh callback helper function
used for directory FSEvents.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index 6fecae9aeb2..29cce32d81c 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -183,24 +183,35 @@ static int query_fsmonitor_hook(struct repository *r,
 	return result;
 }
 
+/*
+ * The daemon can decorate directory events, such as a move or rename,
+ * by adding a trailing slash to the observed name.  Use this to
+ * explicitly invalidate the entire cone under that directory.
+ *
+ * The daemon can only reliably do that if the OS FSEvent contains
+ * sufficient information in the event.
+ *
+ * macOS FSEvents have enough information.
+ *
+ * Other platforms may or may not be able to do it (and it might
+ * depend on the type of event (for example, a daemon could lstat() an
+ * observed pathname after a rename, but not after a delete)).
+ *
+ * If we find an exact match in the index for a path with a trailing
+ * slash, it means that we matched a sparse-index directory in a
+ * cone-mode sparse-checkout (since that's the only time we have
+ * directories in the index).  We should never see this in practice
+ * (because sparse directories should not be present and therefore
+ * not generating FS events).  Either way, we can treat them in the
+ * same way and just invalidate the cache-entry and the untracked
+ * cache (and in this case, the forward cache-entry scan won't find
+ * anything and it doesn't hurt to let it run).
+ */
 static void handle_path_with_trailing_slash(
 	struct index_state *istate, const char *name, int pos)
 {
 	int i;
 
-	/*
-	 * The daemon can decorate directory events, such as
-	 * moves or renames, with a trailing slash if the OS
-	 * FS Event contains sufficient information, such as
-	 * MacOS.
-	 *
-	 * Use this to invalidate the entire cone under that
-	 * directory.
-	 *
-	 * We do not expect an exact match because the index
-	 * does not normally contain directory entries, so we
-	 * start at the insertion point and scan.
-	 */
 	if (pos < 0)
 		pos = -pos - 1;
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH v3 05/14] fsmonitor: refactor refresh callback for non-directory events
  2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                       ` (3 preceding siblings ...)
  2024-02-26 21:39     ` [PATCH v3 04/14] fsmonitor: clarify handling of directory events in callback helper Jeff Hostetler via GitGitGadget
@ 2024-02-26 21:39     ` Jeff Hostetler via GitGitGadget
  2024-02-26 21:39     ` [PATCH v3 06/14] dir: create untracked_cache_invalidate_trimmed_path() Jeff Hostetler via GitGitGadget
                       ` (10 subsequent siblings)
  15 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-26 21:39 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Jeff Hostetler, Torsten Bögershausen,
	Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Move the code that handles unqualified FSEvents (without a trailing
slash) into a helper function.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 67 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 39 insertions(+), 28 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index 29cce32d81c..364198d258f 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -183,6 +183,43 @@ static int query_fsmonitor_hook(struct repository *r,
 	return result;
 }
 
+static void handle_path_without_trailing_slash(
+	struct index_state *istate, const char *name, int pos)
+{
+	int i;
+
+	if (pos >= 0) {
+		/*
+		 * We have an exact match for this path and can just
+		 * invalidate it.
+		 */
+		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
+	} else {
+		/*
+		 * The path is not a tracked file -or- it is a
+		 * directory event on a platform that cannot
+		 * distinguish between file and directory events in
+		 * the event handler, such as Windows.
+		 *
+		 * Scan as if it is a directory and invalidate the
+		 * cone under it.  (But remember to ignore items
+		 * between "name" and "name/", such as "name-" and
+		 * "name.".
+		 */
+		int len = strlen(name);
+		pos = -pos - 1;
+
+		for (i = pos; i < istate->cache_nr; i++) {
+			if (!starts_with(istate->cache[i]->name, name))
+				break;
+			if ((unsigned char)istate->cache[i]->name[len] > '/')
+				break;
+			if (istate->cache[i]->name[len] == '/')
+				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
+		}
+	}
+}
+
 /*
  * The daemon can decorate directory events, such as a move or rename,
  * by adding a trailing slash to the observed name.  Use this to
@@ -225,7 +262,7 @@ static void handle_path_with_trailing_slash(
 
 static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 {
-	int i, len = strlen(name);
+	int len = strlen(name);
 	int pos = index_name_pos(istate, name, len);
 
 	trace_printf_key(&trace_fsmonitor,
@@ -240,34 +277,8 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 		 * for the untracked cache.
 		 */
 		name[len - 1] = '\0';
-	} else if (pos >= 0) {
-		/*
-		 * We have an exact match for this path and can just
-		 * invalidate it.
-		 */
-		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
 	} else {
-		/*
-		 * The path is not a tracked file -or- it is a
-		 * directory event on a platform that cannot
-		 * distinguish between file and directory events in
-		 * the event handler, such as Windows.
-		 *
-		 * Scan as if it is a directory and invalidate the
-		 * cone under it.  (But remember to ignore items
-		 * between "name" and "name/", such as "name-" and
-		 * "name.".
-		 */
-		pos = -pos - 1;
-
-		for (i = pos; i < istate->cache_nr; i++) {
-			if (!starts_with(istate->cache[i]->name, name))
-				break;
-			if ((unsigned char)istate->cache[i]->name[len] > '/')
-				break;
-			if (istate->cache[i]->name[len] == '/')
-				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
-		}
+		handle_path_without_trailing_slash(istate, name, pos);
 	}
 
 	/*
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 91+ messages in thread
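
The scan in the unqualified-path helper relies on sort order: entries such as "name-" and "name." sort between "name" and "name/", so they have to be skipped rather than treated as the end of the range. A small sketch of that loop, assuming a simplified entry type; `struct toy_entry` and `toy_invalidate_cone()` are hypothetical, and the return count is added here only to make the behavior observable:

```c
#include <assert.h>
#include <string.h>

struct toy_entry {
	const char *name;
	int fsmonitor_valid; /* stands in for the CE_FSMONITOR_VALID bit */
};

/*
 * Starting at insertion point `pos` in a byte-wise sorted array, clear
 * the valid flag on every entry under "name/".  Returns how many
 * entries were invalidated.
 */
static int toy_invalidate_cone(struct toy_entry *e, int nr,
			       const char *name, int pos)
{
	size_t len = strlen(name);
	int i, n = 0;

	assert(pos >= 0 && pos <= nr);
	for (i = pos; i < nr; i++) {
		if (strncmp(e[i].name, name, len))
			break; /* no longer shares the "name" prefix */
		if ((unsigned char)e[i].name[len] > '/')
			break; /* sorted past every "name/..." entry */
		if (e[i].name[len] == '/') {
			e[i].fsmonitor_valid = 0;
			n++;
		}
		/* else: "name-", "name." and friends; keep scanning */
	}
	return n;
}
```

With entries "dir-x", "dir.txt", "dir/a", "dir/b", "dir0" and an event for "dir", only the two "dir/" entries are invalidated; the loop stops at "dir0" because '0' sorts after '/'.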

* [PATCH v3 06/14] dir: create untracked_cache_invalidate_trimmed_path()
  2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                       ` (4 preceding siblings ...)
  2024-02-26 21:39     ` [PATCH v3 05/14] fsmonitor: refactor refresh callback for non-directory events Jeff Hostetler via GitGitGadget
@ 2024-02-26 21:39     ` Jeff Hostetler via GitGitGadget
  2024-02-26 21:39     ` [PATCH v3 07/14] fsmonitor: refactor untracked-cache invalidation Jeff Hostetler via GitGitGadget
                       ` (9 subsequent siblings)
  15 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-26 21:39 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Jeff Hostetler, Torsten Bögershausen,
	Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Create a wrapper function for untracked_cache_invalidate_path()
that silently trims a trailing slash, if present, before calling
the wrapped function.

The untracked cache expects to be called with a pathname that
does not contain a trailing slash.  This can make it inconvenient
for callers that have a directory path.  Let's hide this complexity.

This will be used by a later commit in the FSMonitor code which
may receive directory pathnames from an FSEvent.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 dir.c | 20 ++++++++++++++++++++
 dir.h |  7 +++++++
 2 files changed, 27 insertions(+)

diff --git a/dir.c b/dir.c
index ac699542302..20ebe4cba26 100644
--- a/dir.c
+++ b/dir.c
@@ -3918,6 +3918,26 @@ void untracked_cache_invalidate_path(struct index_state *istate,
 				 path, strlen(path));
 }
 
+void untracked_cache_invalidate_trimmed_path(struct index_state *istate,
+					     const char *path,
+					     int safe_path)
+{
+	size_t len = strlen(path);
+
+	if (!len)
+		BUG("untracked_cache_invalidate_trimmed_path given zero length path");
+
+	if (path[len - 1] != '/') {
+		untracked_cache_invalidate_path(istate, path, safe_path);
+	} else {
+		struct strbuf tmp = STRBUF_INIT;
+
+		strbuf_add(&tmp, path, len - 1);
+		untracked_cache_invalidate_path(istate, tmp.buf, safe_path);
+		strbuf_release(&tmp);
+	}
+}
+
 void untracked_cache_remove_from_index(struct index_state *istate,
 				       const char *path)
 {
diff --git a/dir.h b/dir.h
index 98aa85fcc0e..45a7b9ec5f2 100644
--- a/dir.h
+++ b/dir.h
@@ -576,6 +576,13 @@ int cmp_dir_entry(const void *p1, const void *p2);
 int check_dir_entry_contains(const struct dir_entry *out, const struct dir_entry *in);
 
 void untracked_cache_invalidate_path(struct index_state *, const char *, int safe_path);
+/*
+ * Invalidate the untracked-cache for this path, but first strip
+ * off a trailing slash, if present.
+ */
+void untracked_cache_invalidate_trimmed_path(struct index_state *,
+					     const char *path,
+					     int safe_path);
 void untracked_cache_remove_from_index(struct index_state *, const char *);
 void untracked_cache_add_to_index(struct index_state *, const char *);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 91+ messages in thread
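
The wrapper idea in this patch (strip one trailing slash into a temporary copy and leave the caller's string alone) can be illustrated without Git's `strbuf`. This is a sketch only: `toy_invalidate_path()` is a hypothetical stand-in that records its argument instead of touching an untracked cache, and `assert()` takes the place of `BUG()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the wrapped function: remembers the last path. */
static char toy_last_invalidated[256];

static void toy_invalidate_path(const char *path)
{
	assert(strlen(path) < sizeof(toy_last_invalidated));
	strcpy(toy_last_invalidated, path);
}

/* Forward `path` to the wrapped function without any trailing slash. */
static void toy_invalidate_trimmed_path(const char *path)
{
	size_t len = strlen(path);
	char *tmp;

	assert(len > 0); /* the patch reports a BUG() for an empty path */

	if (path[len - 1] != '/') {
		toy_invalidate_path(path);
		return;
	}

	tmp = malloc(len); /* len - 1 bytes of path plus the NUL */
	if (!tmp)
		abort();
	memcpy(tmp, path, len - 1);
	tmp[len - 1] = '\0';
	toy_invalidate_path(tmp);
	free(tmp);
}
```

Copying costs an allocation per directory event, but it lets the parameter stay `const char *`, which matters once callers pass pathnames borrowed from other data structures.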

* [PATCH v3 07/14] fsmonitor: refactor untracked-cache invalidation
  2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                       ` (5 preceding siblings ...)
  2024-02-26 21:39     ` [PATCH v3 06/14] dir: create untracked_cache_invalidate_trimmed_path() Jeff Hostetler via GitGitGadget
@ 2024-02-26 21:39     ` Jeff Hostetler via GitGitGadget
  2024-02-26 21:39     ` [PATCH v3 08/14] fsmonitor: move untracked-cache invalidation into helper functions Jeff Hostetler via GitGitGadget
                       ` (8 subsequent siblings)
  15 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-26 21:39 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Jeff Hostetler, Torsten Bögershausen,
	Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Update fsmonitor_refresh_callback() to use the new
untracked_cache_invalidate_trimmed_path() to invalidate
the cache using the observed pathname without needing to
modify the caller's buffer.

Previously, we modified the caller's buffer when the observed pathname
contained a trailing slash (and did not restore it).  This wasn't a
problem for the single use-case caller, but felt dirty nonetheless.  In
a later commit we will want to invalidate case-corrected versions of
the pathname (using possibly borrowed pathnames from the name-hash or
dir-name-hash) and we may not want to keep the tradition of altering
the passed-in pathname.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index 364198d258f..2787f7ca5d1 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -271,21 +271,16 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 
 	if (name[len - 1] == '/') {
 		handle_path_with_trailing_slash(istate, name, pos);
-
-		/*
-		 * We need to remove the traling "/" from the path
-		 * for the untracked cache.
-		 */
-		name[len - 1] = '\0';
 	} else {
 		handle_path_without_trailing_slash(istate, name, pos);
 	}
 
 	/*
 	 * Mark the untracked cache dirty even if it wasn't found in the index
-	 * as it could be a new untracked file.
+	 * as it could be a new untracked file.  (Let the untracked cache
+	 * layer silently deal with any trailing slash.)
 	 */
-	untracked_cache_invalidate_path(istate, name, 0);
+	untracked_cache_invalidate_trimmed_path(istate, name, 0);
 }
 
 /*
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH v3 08/14] fsmonitor: move untracked-cache invalidation into helper functions
  2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                       ` (6 preceding siblings ...)
  2024-02-26 21:39     ` [PATCH v3 07/14] fsmonitor: refactor untracked-cache invalidation Jeff Hostetler via GitGitGadget
@ 2024-02-26 21:39     ` Jeff Hostetler via GitGitGadget
  2024-02-26 21:39     ` [PATCH v3 09/14] fsmonitor: return invalidated cache-entry count on directory event Jeff Hostetler via GitGitGadget
                       ` (7 subsequent siblings)
  15 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-26 21:39 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Jeff Hostetler, Torsten Bögershausen,
	Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Move the call to invalidate the untracked-cache for the FSEvent
pathname into the two helper functions.

In a later commit in this series, we will call these helpers
from other contexts and it is safer to include the UC invalidation
in the helpers than to remember to also add it to each helper
call-site.

This has the side-effect of invalidating the UC *before* we
invalidate the ce_flags in the cache-entry.  These activities
are independent and do not affect each other.  Also, by doing
the UC work first, we can avoid worrying about "early returns"
or the need for the usual "goto the end" in each of the
handler functions.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index 2787f7ca5d1..2f58ee2fe5a 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -188,6 +188,16 @@ static void handle_path_without_trailing_slash(
 {
 	int i;
 
+	/*
+	 * Mark the untracked cache dirty for this path (regardless of
+	 * whether or not we find an exact match for it in the index).
+	 * Since the path is unqualified (no trailing slash hint in the
+	 * FSEvent), it may refer to a file or directory. So we should
+	 * not assume one or the other and should always let the untracked
+	 * cache decide what needs to be invalidated.
+	 */
+	untracked_cache_invalidate_trimmed_path(istate, name, 0);
+
 	if (pos >= 0) {
 		/*
 		 * We have an exact match for this path and can just
@@ -249,6 +259,15 @@ static void handle_path_with_trailing_slash(
 {
 	int i;
 
+	/*
+	 * Mark the untracked cache dirty for this directory path
+	 * (regardless of whether or not we find an exact match for it
+	 * in the index or find it to be a proper prefix of one or more
+	 * files in the index), since the FSEvent is hinting that
+	 * there may be changes on or within the directory.
+	 */
+	untracked_cache_invalidate_trimmed_path(istate, name, 0);
+
 	if (pos < 0)
 		pos = -pos - 1;
 
@@ -274,13 +293,6 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 	} else {
 		handle_path_without_trailing_slash(istate, name, pos);
 	}
-
-	/*
-	 * Mark the untracked cache dirty even if it wasn't found in the index
-	 * as it could be a new untracked file.  (Let the untracked cache
-	 * layer silently deal with any trailing slash.)
-	 */
-	untracked_cache_invalidate_trimmed_path(istate, name, 0);
 }
 
 /*
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH v3 09/14] fsmonitor: return invalidated cache-entry count on directory event
  2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                       ` (7 preceding siblings ...)
  2024-02-26 21:39     ` [PATCH v3 08/14] fsmonitor: move untracked-cache invalidation into helper functions Jeff Hostetler via GitGitGadget
@ 2024-02-26 21:39     ` Jeff Hostetler via GitGitGadget
  2024-02-26 21:39     ` [PATCH v3 10/14] fsmonitor: remove custom loop from non-directory path handler Jeff Hostetler via GitGitGadget
                       ` (6 subsequent siblings)
  15 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-26 21:39 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Jeff Hostetler, Torsten Bögershausen,
	Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Teach the refresh callback helper function for directory FSEvents to
return the number of cache-entries that were invalidated in response
to a directory event.

This will be used in a later commit to help determine if the observed
pathname in the FSEvent was a (possibly) case-incorrect directory
prefix (on a case-insensitive filesystem) of one or more actual
cache-entries.

If there exists at least one case-insensitive prefix match, then we
can assume that the directory is a (case-incorrect) prefix of at least
one tracked item rather than a completely unknown/untracked file or
directory.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
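For readers following along, the counting described above can be
sketched on its own.  This is a minimal illustration, not the code in
fsmonitor.c: it assumes the index is modelled as a plain sorted array
of pathnames, and the name count_in_cone() is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the index: a sorted array of pathnames.
 * Starting at `pos`, count the entries that begin with `prefix`
 * (for example "dir1/") and stop at the first entry that does not.
 */
static size_t count_in_cone(const char *const *paths, size_t nr,
			    size_t pos, const char *prefix)
{
	size_t prefix_len = strlen(prefix);
	size_t n = 0;
	size_t i;

	for (i = pos; i < nr; i++) {
		if (strncmp(paths[i], prefix, prefix_len))
			break;
		n++;
	}
	return n;
}
```

A count of zero is the ambiguous case the commit message mentions: the
prefix may be untracked, or it may only differ in case from a tracked
directory.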
 fsmonitor.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index 2f58ee2fe5a..9424bd17230 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -253,11 +253,20 @@ static void handle_path_without_trailing_slash(
  * same way and just invalidate the cache-entry and the untracked
  * cache (and in this case, the forward cache-entry scan won't find
  * anything and it doesn't hurt to let it run).
+ *
+ * Return the number of cache-entries that we invalidated.  We will
+ * use this later to determine if we need to attempt a second
+ * case-insensitive search on case-insensitive file systems.  That is,
+ * if the search using the observed-case in the FSEvent yields any
+ * results, we assume the prefix is case-correct.  If there are no
+ * matches, we still don't know if the observed path is simply
+ * untracked or case-incorrect.
  */
-static void handle_path_with_trailing_slash(
+static size_t handle_path_with_trailing_slash(
 	struct index_state *istate, const char *name, int pos)
 {
 	int i;
+	size_t nr_in_cone = 0;
 
 	/*
 	 * Mark the untracked cache dirty for this directory path
@@ -276,7 +285,10 @@ static void handle_path_with_trailing_slash(
 		if (!starts_with(istate->cache[i]->name, name))
 			break;
 		istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
+		nr_in_cone++;
 	}
+
+	return nr_in_cone;
 }
 
 static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
-- 
gitgitgadget


* [PATCH v3 10/14] fsmonitor: remove custom loop from non-directory path handler
  2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                       ` (8 preceding siblings ...)
  2024-02-26 21:39     ` [PATCH v3 09/14] fsmonitor: return invalidated cache-entry count on directory event Jeff Hostetler via GitGitGadget
@ 2024-02-26 21:39     ` Jeff Hostetler via GitGitGadget
  2024-02-26 21:39     ` [PATCH v3 11/14] fsmonitor: return invalided cache-entry count on non-directory event Jeff Hostetler via GitGitGadget
                       ` (5 subsequent siblings)
  15 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-26 21:39 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Jeff Hostetler, Torsten Bögershausen,
	Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Refactor the code that handles refresh events for pathnames that do
not contain a trailing slash.  Instead of using a custom loop to try
to scan the index and detect if the FSEvent named a file or might be a
directory prefix, use the recently created helper function to do that.

Also update the comments to describe what and why we are doing this.

On platforms that DO NOT annotate FS events with a trailing
slash, if we fail to find an exact match for the pathname
in the index, we do not know if the pathname represents a
directory or simply an untracked file.  Pretend that the pathname
is a directory and try again before assuming it is an untracked
file.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
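The reason a second lookup with a trailing slash is simpler than a
hand-written scan can be seen in a small sketch.  It assumes a sorted
array of pathnames compared bytewise and a hypothetical name_pos()
helper that mimics the convention described for index_name_pos():
the position on an exact match, otherwise (-insertion_point - 1).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for index_name_pos(): binary-search a sorted
 * array of pathnames.  Return the position on an exact match, or
 * (-insertion_point - 1) when the name is absent.
 */
static int name_pos(const char *const *paths, int nr, const char *name)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(paths[mid], name);

		if (!cmp)
			return mid;
		if (cmp < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -lo - 1;
}
```

With the entries "foo-", "foo-bar", "foo.c", "foo/a", "foo/b", the
insertion point for "foo" is 0, while the insertion point for "foo/"
is 3, because '-' and '.' both sort before '/'.  Searching for "foo/"
directly lands on the start of the directory cone and skips the
unrelated neighbours.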
 fsmonitor.c | 55 ++++++++++++++++++++++++++++++-----------------------
 1 file changed, 31 insertions(+), 24 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index 9424bd17230..a51c17cda70 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -183,11 +183,23 @@ static int query_fsmonitor_hook(struct repository *r,
 	return result;
 }
 
+static size_t handle_path_with_trailing_slash(
+	struct index_state *istate, const char *name, int pos);
+
+/*
+ * The daemon sent an observed pathname without a trailing slash.
+ * (This is the normal case.)  We do not know if it is a tracked or
+ * untracked file, a sparse-directory, or a populated directory (on a
+ * platform such as Windows where FSEvents are not qualified).
+ *
+ * The pathname contains the observed case reported by the FS. We
+ * do not know it is case-correct or -incorrect.
+ *
+ * Assume it is case-correct and try an exact match.
+ */
 static void handle_path_without_trailing_slash(
 	struct index_state *istate, const char *name, int pos)
 {
-	int i;
-
 	/*
 	 * Mark the untracked cache dirty for this path (regardless of
 	 * whether or not we find an exact match for it in the index).
@@ -200,33 +212,28 @@ static void handle_path_without_trailing_slash(
 
 	if (pos >= 0) {
 		/*
-		 * We have an exact match for this path and can just
-		 * invalidate it.
+		 * An exact match on a tracked file. We assume that we
+		 * do not need to scan forward for a sparse-directory
+		 * cache-entry with the same pathname, nor for a cone
+		 * at that directory. (That is, assume no D/F conflicts.)
 		 */
 		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
 	} else {
+		struct strbuf work_path = STRBUF_INIT;
+
 		/*
-		 * The path is not a tracked file -or- it is a
-		 * directory event on a platform that cannot
-		 * distinguish between file and directory events in
-		 * the event handler, such as Windows.
-		 *
-		 * Scan as if it is a directory and invalidate the
-		 * cone under it.  (But remember to ignore items
-		 * between "name" and "name/", such as "name-" and
-		 * "name.".
+		 * The negative "pos" gives us the suggested insertion
+		 * point for the pathname (without the trailing slash).
+		 * We need to see if there is a directory with that
+		 * prefix, but there can be lots of pathnames between
+		 * "foo" and "foo/" like "foo-" or "foo-bar", so we
+		 * don't want to do our own scan.
 		 */
-		int len = strlen(name);
-		pos = -pos - 1;
-
-		for (i = pos; i < istate->cache_nr; i++) {
-			if (!starts_with(istate->cache[i]->name, name))
-				break;
-			if ((unsigned char)istate->cache[i]->name[len] > '/')
-				break;
-			if (istate->cache[i]->name[len] == '/')
-				istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
-		}
+		strbuf_add(&work_path, name, strlen(name));
+		strbuf_addch(&work_path, '/');
+		pos = index_name_pos(istate, work_path.buf, work_path.len);
+		handle_path_with_trailing_slash(istate, work_path.buf, pos);
+		strbuf_release(&work_path);
 	}
 }
 
-- 
gitgitgadget


* [PATCH v3 11/14] fsmonitor: return invalided cache-entry count on non-directory event
  2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                       ` (9 preceding siblings ...)
  2024-02-26 21:39     ` [PATCH v3 10/14] fsmonitor: remove custom loop from non-directory path handler Jeff Hostetler via GitGitGadget
@ 2024-02-26 21:39     ` Jeff Hostetler via GitGitGadget
  2024-03-06 12:58       ` Patrick Steinhardt
  2024-02-26 21:39     ` [PATCH v3 12/14] fsmonitor: trace the new invalidated cache-entry count Jeff Hostetler via GitGitGadget
                       ` (4 subsequent siblings)
  15 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-26 21:39 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Jeff Hostetler, Torsten Bögershausen,
	Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Teach the refresh callback helper function for unqualified FSEvents
(pathnames without a trailing slash) to return the number of
cache-entries that were invalidated in response to the event.

This will be used in a later commit to help determine if the observed
pathname was (possibly) case-incorrect (when on a case-insensitive
file system).

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index a51c17cda70..c16ed5d8758 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -196,8 +196,10 @@ static size_t handle_path_with_trailing_slash(
  * do not know it is case-correct or -incorrect.
  *
  * Assume it is case-correct and try an exact match.
+ *
+ * Return the number of cache-entries that we invalidated.
  */
-static void handle_path_without_trailing_slash(
+static size_t handle_path_without_trailing_slash(
 	struct index_state *istate, const char *name, int pos)
 {
 	/*
@@ -218,7 +220,9 @@ static void handle_path_without_trailing_slash(
 		 * at that directory. (That is, assume no D/F conflicts.)
 		 */
 		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
+		return 1;
 	} else {
+		size_t nr_in_cone;
 		struct strbuf work_path = STRBUF_INIT;
 
 		/*
@@ -232,8 +236,10 @@ static void handle_path_without_trailing_slash(
 		strbuf_add(&work_path, name, strlen(name));
 		strbuf_addch(&work_path, '/');
 		pos = index_name_pos(istate, work_path.buf, work_path.len);
-		handle_path_with_trailing_slash(istate, work_path.buf, pos);
+		nr_in_cone = handle_path_with_trailing_slash(
+			istate, work_path.buf, pos);
 		strbuf_release(&work_path);
+		return nr_in_cone;
 	}
 }
 
-- 
gitgitgadget


* [PATCH v3 12/14] fsmonitor: trace the new invalidated cache-entry count
  2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                       ` (10 preceding siblings ...)
  2024-02-26 21:39     ` [PATCH v3 11/14] fsmonitor: return invalided cache-entry count on non-directory event Jeff Hostetler via GitGitGadget
@ 2024-02-26 21:39     ` Jeff Hostetler via GitGitGadget
  2024-02-26 21:39     ` [PATCH v3 13/14] fsmonitor: refactor bit invalidation in refresh callback Jeff Hostetler via GitGitGadget
                       ` (3 subsequent siblings)
  15 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-26 21:39 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Jeff Hostetler, Torsten Bögershausen,
	Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Consolidate the directory/non-directory calls to the refresh handler
code.  Log the resulting count of invalidated cache-entries.

The nr_in_cone value will be used in a later commit to decide if
we also need to try to do case-insensitive lookups.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index c16ed5d8758..739ddbf7aca 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -308,16 +308,21 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 {
 	int len = strlen(name);
 	int pos = index_name_pos(istate, name, len);
+	size_t nr_in_cone;
 
 	trace_printf_key(&trace_fsmonitor,
 			 "fsmonitor_refresh_callback '%s' (pos %d)",
 			 name, pos);
 
-	if (name[len - 1] == '/') {
-		handle_path_with_trailing_slash(istate, name, pos);
-	} else {
-		handle_path_without_trailing_slash(istate, name, pos);
-	}
+	if (name[len - 1] == '/')
+		nr_in_cone = handle_path_with_trailing_slash(istate, name, pos);
+	else
+		nr_in_cone = handle_path_without_trailing_slash(istate, name, pos);
+
+	if (nr_in_cone)
+		trace_printf_key(&trace_fsmonitor,
+				 "fsmonitor_refresh_callback CNT: %d",
+				 (int)nr_in_cone);
 }
 
 /*
-- 
gitgitgadget


* [PATCH v3 13/14] fsmonitor: refactor bit invalidation in refresh callback
  2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                       ` (11 preceding siblings ...)
  2024-02-26 21:39     ` [PATCH v3 12/14] fsmonitor: trace the new invalidated cache-entry count Jeff Hostetler via GitGitGadget
@ 2024-02-26 21:39     ` Jeff Hostetler via GitGitGadget
  2024-02-26 21:39     ` [PATCH v3 14/14] fsmonitor: support case-insensitive events Jeff Hostetler via GitGitGadget
                       ` (2 subsequent siblings)
  15 siblings, 0 replies; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-26 21:39 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Jeff Hostetler, Torsten Bögershausen,
	Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Refactor code in the fsmonitor_refresh_callback() call chain dealing
with invalidating the CE_FSMONITOR_VALID bit and add a trace message.

During the refresh, we clear the CE_FSMONITOR_VALID bit in response to
data from the FSMonitor daemon (so that a later phase will lstat() and
verify the true state of the file).

Create a new function to clear the bit and add some unique tracing for
it to help debug edge cases.

This is similar to the existing `mark_fsmonitor_invalid()` function,
but that function also does untracked-cache invalidation.  We've
already handled that in the refresh-callback handlers, so we don't
need to repeat it.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 fsmonitor.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index 739ddbf7aca..3c87449be87 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -183,6 +183,22 @@ static int query_fsmonitor_hook(struct repository *r,
 	return result;
 }
 
+/*
+ * Invalidate the FSM bit on this CE.  This is like mark_fsmonitor_invalid()
+ * but we've already handled the untracked-cache, so let's not repeat that
+ * work.  This also lets us have a different trace message so that we can
+ * see everything that was done as part of the refresh-callback.
+ */
+static void invalidate_ce_fsm(struct cache_entry *ce)
+{
+	if (ce->ce_flags & CE_FSMONITOR_VALID) {
+		trace_printf_key(&trace_fsmonitor,
+				 "fsmonitor_refresh_callback INV: '%s'",
+				 ce->name);
+		ce->ce_flags &= ~CE_FSMONITOR_VALID;
+	}
+}
+
 static size_t handle_path_with_trailing_slash(
 	struct index_state *istate, const char *name, int pos);
 
@@ -219,7 +235,7 @@ static size_t handle_path_without_trailing_slash(
 		 * cache-entry with the same pathname, nor for a cone
 		 * at that directory. (That is, assume no D/F conflicts.)
 		 */
-		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
+		invalidate_ce_fsm(istate->cache[pos]);
 		return 1;
 	} else {
 		size_t nr_in_cone;
@@ -297,7 +313,7 @@ static size_t handle_path_with_trailing_slash(
 	for (i = pos; i < istate->cache_nr; i++) {
 		if (!starts_with(istate->cache[i]->name, name))
 			break;
-		istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
+		invalidate_ce_fsm(istate->cache[i]);
 		nr_in_cone++;
 	}
 
-- 
gitgitgadget


* [PATCH v3 14/14] fsmonitor: support case-insensitive events
  2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                       ` (12 preceding siblings ...)
  2024-02-26 21:39     ` [PATCH v3 13/14] fsmonitor: refactor bit invalidation in refresh callback Jeff Hostetler via GitGitGadget
@ 2024-02-26 21:39     ` Jeff Hostetler via GitGitGadget
  2024-03-06 12:58       ` Patrick Steinhardt
  2024-02-27  1:40     ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Junio C Hamano
  2024-03-06 12:58     ` Patrick Steinhardt
  15 siblings, 1 reply; 91+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2024-02-26 21:39 UTC (permalink / raw
  To: git
  Cc: Patrick Steinhardt, Jeff Hostetler, Torsten Bögershausen,
	Jeff Hostetler, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Teach fsmonitor_refresh_callback() to fall back to case-insensitive
lookups if case-sensitive lookups fail on case-insensitive systems.
Without this, 'git status' can report stale status for files if there
are case issues/errors in the worktree.

The FSMonitor daemon sends FSEvents using the observed spelling
of each pathname.  On case-insensitive file systems this may be
different than the expected case spelling.

The existing code uses index_name_pos() to find the cache-entry for
the pathname in the FSEvent and clear the CE_FSMONITOR_VALID bit so
that the worktree scan/index refresh will revisit and revalidate the
path.

On a case-insensitive file system, the exact match lookup may fail
to find the associated cache-entry. This causes status to think that
the cached CE flags are correct and skip over the file.

Update event handling to optionally use the name-hash and dir-name-hash
if necessary.

Also update t7527 to convert the "test_expect_failure" to "_success"
now that we have fixed the bug.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
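The lookup order described above (exact case first, then a
case-insensitive file match, then a case-insensitive directory match)
can be sketched as follows.  This is only an illustration under
stated assumptions: the index is a plain array of pathnames, linear
scans with strncasecmp() stand in for the name-hash and
dir-name-hash, and classify_event() and enum match_kind are
hypothetical names that do not exist in git.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>	/* strncasecmp() (POSIX) */

enum match_kind {
	MATCH_NONE,		/* untracked as far as we can tell */
	MATCH_EXACT,		/* observed case matches the index */
	MATCH_FILE_ICASE,	/* file found only when ignoring case */
	MATCH_DIR_ICASE		/* directory prefix found only when ignoring case */
};

/*
 * Hypothetical sketch: classify an observed pathname against an
 * array of tracked pathnames, trying the cheapest interpretation
 * first and falling back to case-insensitive comparisons.
 */
static enum match_kind classify_event(const char *const *paths, size_t nr,
				      const char *name)
{
	size_t len = strlen(name);
	size_t i;

	/* Ignore a trailing slash so "dir/" and "dir" are treated alike. */
	if (len && name[len - 1] == '/')
		len--;

	/* 1. Exact case: a tracked file or a directory prefix. */
	for (i = 0; i < nr; i++)
		if (!strncmp(paths[i], name, len) &&
		    (paths[i][len] == '\0' || paths[i][len] == '/'))
			return MATCH_EXACT;

	/* 2. Case-insensitive match on a whole pathname (a file). */
	for (i = 0; i < nr; i++)
		if (!strncasecmp(paths[i], name, len) && paths[i][len] == '\0')
			return MATCH_FILE_ICASE;

	/* 3. Case-insensitive match on a directory prefix. */
	for (i = 0; i < nr; i++)
		if (!strncasecmp(paths[i], name, len) && paths[i][len] == '/')
			return MATCH_DIR_ICASE;

	return MATCH_NONE;
}
```

In the sketch, as in the patch, the case-insensitive steps only run
when the exact-case step finds nothing, so the common case pays no
extra cost.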
 fsmonitor.c                  | 121 +++++++++++++++++++++++++++++++++++
 t/t7527-builtin-fsmonitor.sh |  26 +++++---
 2 files changed, 137 insertions(+), 10 deletions(-)

diff --git a/fsmonitor.c b/fsmonitor.c
index 3c87449be87..2b17d60bbbe 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -5,6 +5,7 @@
 #include "ewah/ewok.h"
 #include "fsmonitor.h"
 #include "fsmonitor-ipc.h"
+#include "name-hash.h"
 #include "run-command.h"
 #include "strbuf.h"
 #include "trace2.h"
@@ -202,6 +203,113 @@ static void invalidate_ce_fsm(struct cache_entry *ce)
 static size_t handle_path_with_trailing_slash(
 	struct index_state *istate, const char *name, int pos);
 
+/*
+ * Use the name-hash to do a case-insensitive cache-entry lookup with
+ * the pathname and invalidate the cache-entry.
+ *
+ * Returns the number of cache-entries that we invalidated.
+ */
+static size_t handle_using_name_hash_icase(
+	struct index_state *istate, const char *name)
+{
+	struct cache_entry *ce = NULL;
+
+	ce = index_file_exists(istate, name, strlen(name), 1);
+	if (!ce)
+		return 0;
+
+	/*
+	 * A case-insensitive search in the name-hash using the
+	 * observed pathname found a cache-entry, so the observed path
+	 * is case-incorrect.  Invalidate the cache-entry and use the
+	 * correct spelling from the cache-entry to invalidate the
+	 * untracked-cache.  Since we now have sparse-directories in
+	 * the index, the observed pathname may represent a regular
+	 * file or a sparse-index directory.
+	 *
+	 * Note that we should not have seen FSEvents for a
+	 * sparse-index directory, but we handle it just in case.
+	 *
+	 * Either way, we know that there are not any cache-entries for
+	 * children inside the cone of the directory, so we don't need to
+	 * do the usual scan.
+	 */
+	trace_printf_key(&trace_fsmonitor,
+			 "fsmonitor_refresh_callback MAP: '%s' '%s'",
+			 name, ce->name);
+
+	/*
+	 * NEEDSWORK: We used the name-hash to find the correct
+	 * case-spelling of the pathname in the cache-entry[], so
+	 * technically this is a tracked file or a sparse-directory.
+	 * It should not have any entries in the untracked-cache, so
+	 * we should not need to use the case-corrected spelling to
+	 * invalidate the untracked-cache.  So we may not need to
+	 * do this.  For now, I'm going to be conservative and always
+	 * do it; we can revisit this later.
+	 */
+	untracked_cache_invalidate_trimmed_path(istate, ce->name, 0);
+
+	invalidate_ce_fsm(ce);
+	return 1;
+}
+
+/*
+ * Use the dir-name-hash to find the correct-case spelling of the
+ * directory.  Use the canonical spelling to invalidate all of the
+ * cache-entries within the matching cone.
+ *
+ * Returns the number of cache-entries that we invalidated.
+ */
+static size_t handle_using_dir_name_hash_icase(
+	struct index_state *istate, const char *name)
+{
+	struct strbuf canonical_path = STRBUF_INIT;
+	int pos;
+	size_t len = strlen(name);
+	size_t nr_in_cone;
+
+	if (name[len - 1] == '/')
+		len--;
+
+	if (!index_dir_find(istate, name, len, &canonical_path))
+		return 0; /* name is untracked */
+
+	if (!memcmp(name, canonical_path.buf, canonical_path.len)) {
+		strbuf_release(&canonical_path);
+		/*
+		 * NEEDSWORK: Our caller already tried an exact match
+		 * and failed to find one.  They called us to do an
+		 * ICASE match, so we should never get an exact match,
+		 * so we could promote this to a BUG() here if we
+		 * wanted to.  It doesn't hurt anything to just return
+		 * 0 and go on because we should never get here.  Or we
+		 * could just get rid of the memcmp() and this "if"
+		 * clause completely.
+		 */
+		BUG("handle_using_dir_name_hash_icase(%s) did not exact match",
+		    name);
+	}
+
+	trace_printf_key(&trace_fsmonitor,
+			 "fsmonitor_refresh_callback MAP: '%s' '%s'",
+			 name, canonical_path.buf);
+
+	/*
+	 * The dir-name-hash only tells us the corrected spelling of
+	 * the prefix.  We have to use this canonical path to do a
+	 * lookup in the cache-entry array so that we repeat the
+	 * original search using the case-corrected spelling.
+	 */
+	strbuf_addch(&canonical_path, '/');
+	pos = index_name_pos(istate, canonical_path.buf,
+			     canonical_path.len);
+	nr_in_cone = handle_path_with_trailing_slash(
+		istate, canonical_path.buf, pos);
+	strbuf_release(&canonical_path);
+	return nr_in_cone;
+}
+
 /*
  * The daemon sent an observed pathname without a trailing slash.
  * (This is the normal case.)  We do not know if it is a tracked or
@@ -335,6 +443,19 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
 	else
 		nr_in_cone = handle_path_without_trailing_slash(istate, name, pos);
 
+	/*
+	 * If we did not find an exact match for this pathname or any
+	 * cache-entries with this directory prefix and we're on a
+	 * case-insensitive file system, try again using the name-hash
+	 * and dir-name-hash.
+	 */
+	if (!nr_in_cone && ignore_case) {
+		nr_in_cone = handle_using_name_hash_icase(istate, name);
+		if (!nr_in_cone)
+			nr_in_cone = handle_using_dir_name_hash_icase(
+				istate, name);
+	}
+
 	if (nr_in_cone)
 		trace_printf_key(&trace_fsmonitor,
 				 "fsmonitor_refresh_callback CNT: %d",
diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
index 830f2d9de33..730f3c7f810 100755
--- a/t/t7527-builtin-fsmonitor.sh
+++ b/t/t7527-builtin-fsmonitor.sh
@@ -1051,7 +1051,7 @@ test_expect_success 'split-index and FSMonitor work well together' '
 #
 # The setup is a little contrived.
 #
-test_expect_failure CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
+test_expect_success CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
 	test_when_finished "stop_daemon_delete_repo subdir_case_wrong" &&
 
 	git init subdir_case_wrong &&
@@ -1116,19 +1116,19 @@ test_expect_failure CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
 
 	grep -q "dir1/DIR2/dir3/file3.*pos -3" "$PWD/subdir_case_wrong.log1" &&
 
+	# Verify that we get a mapping event to correct the case.
+	grep -q "MAP:.*dir1/DIR2/dir3/file3.*dir1/dir2/dir3/file3" \
+		"$PWD/subdir_case_wrong.log1" &&
+
 	# The refresh-callbacks should have caused "git status" to clear
 	# the CE_FSMONITOR_VALID bit on each of those files and caused
 	# the worktree scan to visit them and mark them as modified.
 	grep -q " M AAA" "$PWD/subdir_case_wrong.out" &&
 	grep -q " M zzz" "$PWD/subdir_case_wrong.out" &&
-
-	# Expect Breakage: with the case confusion, the "(pos -3)" causes
-	# the client to not clear the CE_FSMONITOR_VALID bit and therefore
-	# status will not rescan the file and therefore not report it as dirty.
 	grep -q " M dir1/dir2/dir3/file3" "$PWD/subdir_case_wrong.out"
 '
 
-test_expect_failure CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
+test_expect_success CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
 	test_when_finished "stop_daemon_delete_repo file_case_wrong" &&
 
 	git init file_case_wrong &&
@@ -1242,14 +1242,20 @@ test_expect_failure CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
 	GIT_TRACE_FSMONITOR="$PWD/file_case_wrong-try3.log" \
 		git -C file_case_wrong --no-optional-locks status --short \
 			>"$PWD/file_case_wrong-try3.out" &&
+
+	# Verify that we get a mapping event to correct the case.
+	grep -q "fsmonitor_refresh_callback MAP:.*dir1/dir2/dir3/FILE-3-A.*dir1/dir2/dir3/file-3-a" \
+		"$PWD/file_case_wrong-try3.log" &&
+	grep -q "fsmonitor_refresh_callback MAP:.*dir1/dir2/dir4/file-4-a.*dir1/dir2/dir4/FILE-4-A" \
+		"$PWD/file_case_wrong-try3.log" &&
+
 	# FSEvents are in observed case.
 	grep -q "fsmonitor_refresh_callback.*FILE-3-A.*pos -3" "$PWD/file_case_wrong-try3.log" &&
 	grep -q "fsmonitor_refresh_callback.*file-4-a.*pos -9" "$PWD/file_case_wrong-try3.log" &&
 
-	# Expect Breakage: with the case confusion, the "(pos-3)" and
-	# "(pos -9)" causes the client to not clear the CE_FSMONITOR_VALID
-	# bit and therefore status will not rescan the files and therefore
-	# not report them as dirty.
+	# The refresh-callbacks should have caused "git status" to clear
+	# the CE_FSMONITOR_VALID bit on each of those files and caused
+	# the worktree scan to visit them and mark them as modified.
 	grep -q " M dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-try3.out" &&
 	grep -q " M dir1/dir2/dir4/FILE-4-A" "$PWD/file_case_wrong-try3.out"
 '
-- 
gitgitgadget

* Re: [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems
  2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                       ` (13 preceding siblings ...)
  2024-02-26 21:39     ` [PATCH v3 14/14] fsmonitor: support case-insensitive events Jeff Hostetler via GitGitGadget
@ 2024-02-27  1:40     ` Junio C Hamano
  2024-03-06 12:58     ` Patrick Steinhardt
  15 siblings, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2024-02-27  1:40 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Patrick Steinhardt, Jeff Hostetler,
	Torsten Bögershausen, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Here is version 3. I think I have addressed the remaining comments.
>
> I cleaned up the test code to use the test_expect_failure at the beginning
> and squashed in the test_expect_success version of tests into the final
> commit in the series.
>
> I moved the invalidate_ce_fsm() commit earlier in the series, so that the
> final commit actually uses all of the up-to-this-point changes to fix the
> problem.
>
> I converted a few "should not happens" to BUG()s.
>
> Thanks to everyone for their time and attention reviewing this. Jeff
>
> Jeff Hostetler (14):
>   name-hash: add index_dir_find()
>   t7527: add case-insensitve test for FSMonitor
>   fsmonitor: refactor refresh callback on directory events
>   fsmonitor: clarify handling of directory events in callback helper
>   fsmonitor: refactor refresh callback for non-directory events
>   dir: create untracked_cache_invalidate_trimmed_path()
>   fsmonitor: refactor untracked-cache invalidation
>   fsmonitor: move untracked-cache invalidation into helper functions
>   fsmonitor: return invalidated cache-entry count on directory event
>   fsmonitor: remove custom loop from non-directory path handler
>   fsmonitor: return invalided cache-entry count on non-directory event
>   fsmonitor: trace the new invalidated cache-entry count
>   fsmonitor: refactor bit invalidation in refresh callback
>   fsmonitor: support case-insensitive events
>
>  dir.c                        |  20 +++
>  dir.h                        |   7 +
>  fsmonitor.c                  | 312 +++++++++++++++++++++++++++++------
>  name-hash.c                  |   9 +-
>  name-hash.h                  |   7 +-
>  t/t7527-builtin-fsmonitor.sh | 223 +++++++++++++++++++++++++
>  6 files changed, 522 insertions(+), 56 deletions(-)

Much nicer.  Will queue.  Thanks.

* Re: [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems
  2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
                       ` (14 preceding siblings ...)
  2024-02-27  1:40     ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Junio C Hamano
@ 2024-03-06 12:58     ` Patrick Steinhardt
  2024-03-06 17:09       ` Junio C Hamano
  2024-03-06 18:10       ` Jeff Hostetler
  15 siblings, 2 replies; 91+ messages in thread
From: Patrick Steinhardt @ 2024-03-06 12:58 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Jeff Hostetler, Torsten Bögershausen, Jeff Hostetler

On Mon, Feb 26, 2024 at 09:39:11PM +0000, Jeff Hostetler via GitGitGadget wrote:
> Here is version 3. I think I have addressed the remaining comments.
> 
> I cleaned up the test code to use the test_expect_failure at the beginning
> and squashed in the test_expect_success version of tests into the final
> commit in the series.
> 
> I moved the invalidate_ce_fsm() commit earlier in the series, so that the
> final commit actually uses all of the up-to-this-point changes to fix the
> problem.
> 
> I converted a few "should not happens" to BUG()s.
> 
> Thanks to everyone for their time and attention reviewing this. Jeff

I gave this whole patch series a read and didn't find much to complain
about. There are a couple of nits, but none of them really require a
reroll in my opinion.

Thanks!

Patrick

* Re: [PATCH v3 11/14] fsmonitor: return invalided cache-entry count on non-directory event
  2024-02-26 21:39     ` [PATCH v3 11/14] fsmonitor: return invalided cache-entry count on non-directory event Jeff Hostetler via GitGitGadget
@ 2024-03-06 12:58       ` Patrick Steinhardt
  0 siblings, 0 replies; 91+ messages in thread
From: Patrick Steinhardt @ 2024-03-06 12:58 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Jeff Hostetler, Torsten Bögershausen, Jeff Hostetler

Nit, not worth a reroll: the subject says "invalided" instead of
"invalidated".

Patrick

On Mon, Feb 26, 2024 at 09:39:22PM +0000, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhostetler@github.com>
> 
> Teach the refresh callback helper function for unqualified FSEvents
> (pathnames without a trailing slash) to return the number of
> cache-entries that were invalided in response to the event.
> 
> This will be used in a later commit to help determine if the observed
> pathname was (possibly) case-incorrect when (on a case-insensitive
> file system).
> 
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fsmonitor.c b/fsmonitor.c
> index a51c17cda70..c16ed5d8758 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -196,8 +196,10 @@ static size_t handle_path_with_trailing_slash(
>   * do not know it is case-correct or -incorrect.
>   *
>   * Assume it is case-correct and try an exact match.
> + *
> + * Return the number of cache-entries that we invalidated.
>   */
> -static void handle_path_without_trailing_slash(
> +static size_t handle_path_without_trailing_slash(
>  	struct index_state *istate, const char *name, int pos)
>  {
>  	/*
> @@ -218,7 +220,9 @@ static void handle_path_without_trailing_slash(
>  		 * at that directory. (That is, assume no D/F conflicts.)
>  		 */
>  		istate->cache[pos]->ce_flags &= ~CE_FSMONITOR_VALID;
> +		return 1;
>  	} else {
> +		size_t nr_in_cone;
>  		struct strbuf work_path = STRBUF_INIT;
>  
>  		/*
> @@ -232,8 +236,10 @@ static void handle_path_without_trailing_slash(
>  		strbuf_add(&work_path, name, strlen(name));
>  		strbuf_addch(&work_path, '/');
>  		pos = index_name_pos(istate, work_path.buf, work_path.len);
> -		handle_path_with_trailing_slash(istate, work_path.buf, pos);
> +		nr_in_cone = handle_path_with_trailing_slash(
> +			istate, work_path.buf, pos);
>  		strbuf_release(&work_path);
> +		return nr_in_cone;
>  	}
>  }
>  
> -- 
> gitgitgadget
> 

* Re: [PATCH v3 14/14] fsmonitor: support case-insensitive events
  2024-02-26 21:39     ` [PATCH v3 14/14] fsmonitor: support case-insensitive events Jeff Hostetler via GitGitGadget
@ 2024-03-06 12:58       ` Patrick Steinhardt
  0 siblings, 0 replies; 91+ messages in thread
From: Patrick Steinhardt @ 2024-03-06 12:58 UTC (permalink / raw
  To: Jeff Hostetler via GitGitGadget
  Cc: git, Jeff Hostetler, Torsten Bögershausen, Jeff Hostetler

On Mon, Feb 26, 2024 at 09:39:25PM +0000, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhostetler@github.com>
> 
> Teach fsmonitor_refresh_callback() to handle case-insensitive
> lookups if case-sensitive lookups fail on case-insensitive systems.
> Without this, 'git status' can report stale status for files if there
> are case issues/errors in the worktree.
> 
> The FSMonitor daemon sends FSEvents using the observed spelling
> of each pathname.  On case-insensitive file systems this may be
> different than the expected case spelling.
> 
> The existing code uses index_name_pos() to find the cache-entry for
> the pathname in the FSEvent and clear the CE_FSMONITOR_VALID bit so
> that the worktree scan/index refresh will revisit and revalidate the
> path.
> 
> On a case-insensitive file system, the exact match lookup may fail
> to find the associated cache-entry. This causes status to think that
> the cached CE flags are correct and skip over the file.
> 
> Update event handling to optionally use the name-hash and dir-name-hash
> if necessary.
> 
> Also update t7527 to convert the "test_expect_failure" to "_success"
> now that we have fixed the bug.
> 
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  fsmonitor.c                  | 121 +++++++++++++++++++++++++++++++++++
>  t/t7527-builtin-fsmonitor.sh |  26 +++++---
>  2 files changed, 137 insertions(+), 10 deletions(-)
> 
> diff --git a/fsmonitor.c b/fsmonitor.c
> index 3c87449be87..2b17d60bbbe 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -5,6 +5,7 @@
>  #include "ewah/ewok.h"
>  #include "fsmonitor.h"
>  #include "fsmonitor-ipc.h"
> +#include "name-hash.h"
>  #include "run-command.h"
>  #include "strbuf.h"
>  #include "trace2.h"
> @@ -202,6 +203,113 @@ static void invalidate_ce_fsm(struct cache_entry *ce)
>  static size_t handle_path_with_trailing_slash(
>  	struct index_state *istate, const char *name, int pos);
>  
> +/*
> + * Use the name-hash to do a case-insensitive cache-entry lookup with
> + * the pathname and invalidate the cache-entry.
> + *
> + * Returns the number of cache-entries that we invalidated.
> + */
> +static size_t handle_using_name_hash_icase(
> +	struct index_state *istate, const char *name)
> +{
> +	struct cache_entry *ce = NULL;
> +
> +	ce = index_file_exists(istate, name, strlen(name), 1);
> +	if (!ce)
> +		return 0;
> +
> +	/*
> +	 * A case-insensitive search in the name-hash using the
> +	 * observed pathname found a cache-entry, so the observed path
> +	 * is case-incorrect.  Invalidate the cache-entry and use the
> +	 * correct spelling from the cache-entry to invalidate the
> +	 * untracked-cache.  Since we now have sparse-directories in
> +	 * the index, the observed pathname may represent a regular
> +	 * file or a sparse-index directory.
> +	 *
> +	 * Note that we should not have seen FSEvents for a
> +	 * sparse-index directory, but we handle it just in case.
> +	 *
> +	 * Either way, we know that there are not any cache-entries for
> +	 * children inside the cone of the directory, so we don't need to
> +	 * do the usual scan.
> +	 */
> +	trace_printf_key(&trace_fsmonitor,
> +			 "fsmonitor_refresh_callback MAP: '%s' '%s'",
> +			 name, ce->name);
> +
> +	/*
> +	 * NEEDSWORK: We used the name-hash to find the correct
> +	 * case-spelling of the pathname in the cache-entry[], so
> +	 * technically this is a tracked file or a sparse-directory.
> +	 * It should not have any entries in the untracked-cache, so
> +	 * we should not need to use the case-corrected spelling to
> +	 * invalidate the untracked-cache.  So we may not need to
> +	 * do this.  For now, I'm going to be conservative and always
> +	 * do it; we can revisit this later.
> +	 */
> +	untracked_cache_invalidate_trimmed_path(istate, ce->name, 0);
> +
> +	invalidate_ce_fsm(ce);
> +	return 1;
> +}
> +
> +/*
> + * Use the dir-name-hash to find the correct-case spelling of the
> + * directory.  Use the canonical spelling to invalidate all of the
> + * cache-entries within the matching cone.
> + *
> + * Returns the number of cache-entries that we invalidated.
> + */
> +static size_t handle_using_dir_name_hash_icase(
> +	struct index_state *istate, const char *name)
> +{
> +	struct strbuf canonical_path = STRBUF_INIT;
> +	int pos;
> +	size_t len = strlen(name);
> +	size_t nr_in_cone;
> +
> +	if (name[len - 1] == '/')
> +		len--;

Nit: this could use `strip_suffix()`.
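
A minimal sketch of what that could look like. Git's strip_suffix()
takes the string, the suffix, and an out-parameter that receives the
length without the suffix. The stand-in below, strip_suffix_sketch(),
is a hypothetical name that mimics that behavior so the example
compiles outside the Git tree; dir_name_len() is likewise made up for
illustration and is not part of the patch.

```c
#include <assert.h>
#include <string.h>

/*
 * Stand-in modeled on Git's strip_suffix(): always store strlen(str)
 * in *len, and if str ends with suffix, subtract the suffix length
 * and return 1. Otherwise return 0.
 */
static int strip_suffix_sketch(const char *str, const char *suffix,
			       size_t *len)
{
	size_t suflen;

	assert(str && suffix && len);
	suflen = strlen(suffix);
	*len = strlen(str);
	if (*len < suflen ||
	    memcmp(str + (*len - suflen), suffix, suflen))
		return 0;
	*len -= suflen;
	return 1;
}

/*
 * Hypothetical helper: length of name with one optional trailing
 * slash removed, replacing the open-coded
 * "if (name[len - 1] == '/') len--;".
 */
static size_t dir_name_len(const char *name)
{
	size_t len;

	strip_suffix_sketch(name, "/", &len);
	return len;
}
```

With this shape, dir_name_len("dir1/DIR2/") and
dir_name_len("dir1/DIR2") both yield 9. A side effect is that an
empty name no longer reads name[len - 1] with len == 0.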

> +
> +	if (!index_dir_find(istate, name, len, &canonical_path))
> +		return 0; /* name is untracked */
> +
> +	if (!memcmp(name, canonical_path.buf, canonical_path.len)) {
> +		strbuf_release(&canonical_path);
> +		/*
> +		 * NEEDSWORK: Our caller already tried an exact match
> +		 * and failed to find one.  They called us to do an
> +		 * ICASE match, so we should never get an exact match,
> +		 * so we could promote this to a BUG() here if we
> +		 * wanted to.  It doesn't hurt anything to just return

Nit: this comment is stale as this has been promoted to a BUG already.

Patrick

> +		 * 0 and go on because we should never get here.  Or we
> +		 * could just get rid of the memcmp() and this "if"
> +		 * clause completely.
> +		 */
> +		BUG("handle_using_dir_name_hash_icase(%s) did not exact match",
> +		    name);
> +	}
> +
> +	trace_printf_key(&trace_fsmonitor,
> +			 "fsmonitor_refresh_callback MAP: '%s' '%s'",
> +			 name, canonical_path.buf);
> +
> +	/*
> +	 * The dir-name-hash only tells us the corrected spelling of
> +	 * the prefix.  We have to use this canonical path to do a
> +	 * lookup in the cache-entry array so that we repeat the
> +	 * original search using the case-corrected spelling.
> +	 */
> +	strbuf_addch(&canonical_path, '/');
> +	pos = index_name_pos(istate, canonical_path.buf,
> +			     canonical_path.len);
> +	nr_in_cone = handle_path_with_trailing_slash(
> +		istate, canonical_path.buf, pos);
> +	strbuf_release(&canonical_path);
> +	return nr_in_cone;
> +}
> +
>  /*
>   * The daemon sent an observed pathname without a trailing slash.
>   * (This is the normal case.)  We do not know if it is a tracked or
> @@ -335,6 +443,19 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
>  	else
>  		nr_in_cone = handle_path_without_trailing_slash(istate, name, pos);
>  
> +	/*
> +	 * If we did not find an exact match for this pathname or any
> +	 * cache-entries with this directory prefix and we're on a
> +	 * case-insensitive file system, try again using the name-hash
> +	 * and dir-name-hash.
> +	 */
> +	if (!nr_in_cone && ignore_case) {
> +		nr_in_cone = handle_using_name_hash_icase(istate, name);
> +		if (!nr_in_cone)
> +			nr_in_cone = handle_using_dir_name_hash_icase(
> +				istate, name);
> +	}
> +
>  	if (nr_in_cone)
>  		trace_printf_key(&trace_fsmonitor,
>  				 "fsmonitor_refresh_callback CNT: %d",
> diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
> index 830f2d9de33..730f3c7f810 100755
> --- a/t/t7527-builtin-fsmonitor.sh
> +++ b/t/t7527-builtin-fsmonitor.sh
> @@ -1051,7 +1051,7 @@ test_expect_success 'split-index and FSMonitor work well together' '
>  #
>  # The setup is a little contrived.
>  #
> -test_expect_failure CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
> +test_expect_success CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
>  	test_when_finished "stop_daemon_delete_repo subdir_case_wrong" &&
>  
>  	git init subdir_case_wrong &&
> @@ -1116,19 +1116,19 @@ test_expect_failure CASE_INSENSITIVE_FS 'fsmonitor subdir case wrong on disk' '
>  
>  	grep -q "dir1/DIR2/dir3/file3.*pos -3" "$PWD/subdir_case_wrong.log1" &&
>  
> +	# Verify that we get a mapping event to correct the case.
> +	grep -q "MAP:.*dir1/DIR2/dir3/file3.*dir1/dir2/dir3/file3" \
> +		"$PWD/subdir_case_wrong.log1" &&
> +
>  	# The refresh-callbacks should have caused "git status" to clear
>  	# the CE_FSMONITOR_VALID bit on each of those files and caused
>  	# the worktree scan to visit them and mark them as modified.
>  	grep -q " M AAA" "$PWD/subdir_case_wrong.out" &&
>  	grep -q " M zzz" "$PWD/subdir_case_wrong.out" &&
> -
> -	# Expect Breakage: with the case confusion, the "(pos -3)" causes
> -	# the client to not clear the CE_FSMONITOR_VALID bit and therefore
> -	# status will not rescan the file and therefore not report it as dirty.
>  	grep -q " M dir1/dir2/dir3/file3" "$PWD/subdir_case_wrong.out"
>  '
>  
> -test_expect_failure CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
> +test_expect_success CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
>  	test_when_finished "stop_daemon_delete_repo file_case_wrong" &&
>  
>  	git init file_case_wrong &&
> @@ -1242,14 +1242,20 @@ test_expect_failure CASE_INSENSITIVE_FS 'fsmonitor file case wrong on disk' '
>  	GIT_TRACE_FSMONITOR="$PWD/file_case_wrong-try3.log" \
>  		git -C file_case_wrong --no-optional-locks status --short \
>  			>"$PWD/file_case_wrong-try3.out" &&
> +
> +	# Verify that we get a mapping event to correct the case.
> +	grep -q "fsmonitor_refresh_callback MAP:.*dir1/dir2/dir3/FILE-3-A.*dir1/dir2/dir3/file-3-a" \
> +		"$PWD/file_case_wrong-try3.log" &&
> +	grep -q "fsmonitor_refresh_callback MAP:.*dir1/dir2/dir4/file-4-a.*dir1/dir2/dir4/FILE-4-A" \
> +		"$PWD/file_case_wrong-try3.log" &&
> +
>  	# FSEvents are in observed case.
>  	grep -q "fsmonitor_refresh_callback.*FILE-3-A.*pos -3" "$PWD/file_case_wrong-try3.log" &&
>  	grep -q "fsmonitor_refresh_callback.*file-4-a.*pos -9" "$PWD/file_case_wrong-try3.log" &&
>  
> -	# Expect Breakage: with the case confusion, the "(pos-3)" and
> -	# "(pos -9)" causes the client to not clear the CE_FSMONITOR_VALID
> -	# bit and therefore status will not rescan the files and therefore
> -	# not report them as dirty.
> +	# The refresh-callbacks should have caused "git status" to clear
> +	# the CE_FSMONITOR_VALID bit on each of those files and caused
> +	# the worktree scan to visit them and mark them as modified.
>  	grep -q " M dir1/dir2/dir3/file-3-a" "$PWD/file_case_wrong-try3.out" &&
>  	grep -q " M dir1/dir2/dir4/FILE-4-A" "$PWD/file_case_wrong-try3.out"
>  '
> -- 
> gitgitgadget
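
For readers following along, the lookup order described in the commit
message can be sketched with a toy index. This is not the code from
the patch: toy_index, toy_name_pos(), toy_name_pos_icase() and
toy_lookup() are hypothetical names, the linear strcasecmp() scan
merely stands in for the name-hash, and directory events (the
dir-name-hash path) are left out. The only convention borrowed from
Git is that a failed exact lookup returns -1 minus the insertion
point, which is why the traces show negative "pos" values.

```c
#include <assert.h>
#include <string.h>
#include <strings.h> /* strcasecmp() (POSIX) */

/* Toy stand-in for the sorted cache-entry array. */
static const char *toy_index[] = {
	"AAA",
	"dir1/dir2/dir3/file3",
	"zzz",
};
#define TOY_NR (sizeof(toy_index) / sizeof(toy_index[0]))

/*
 * Exact (case-sensitive) binary search. Returns the position when
 * found, otherwise -1 - <insertion point>.
 */
static int toy_name_pos(const char *name)
{
	int lo = 0, hi = (int)TOY_NR;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, toy_index[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1 - lo;
}

/* Case-insensitive scan; stands in for a name-hash lookup. */
static int toy_name_pos_icase(const char *name)
{
	size_t i;

	for (i = 0; i < TOY_NR; i++)
		if (!strcasecmp(name, toy_index[i]))
			return (int)i;
	return -1;
}

/*
 * Try the observed spelling first. Only when that fails and the
 * file system ignores case, retry case-insensitively. Returns the
 * position of the entry to invalidate, or -1 if there is none.
 */
static int toy_lookup(const char *name, int ignore_case)
{
	int pos;

	assert(name);
	pos = toy_name_pos(name);
	if (pos >= 0)
		return pos;
	if (!ignore_case)
		return -1;
	return toy_name_pos_icase(name);
}
```

In this toy index, the observed path "dir1/DIR2/dir3/file3" misses the
exact search and yields a negative position, while the fallback maps
it to the entry spelled "dir1/dir2/dir3/file3".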

* Re: [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems
  2024-03-06 12:58     ` Patrick Steinhardt
@ 2024-03-06 17:09       ` Junio C Hamano
  2024-03-06 18:10       ` Jeff Hostetler
  1 sibling, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2024-03-06 17:09 UTC (permalink / raw
  To: Patrick Steinhardt
  Cc: Jeff Hostetler via GitGitGadget, git, Jeff Hostetler,
	Torsten Bögershausen, Jeff Hostetler

Patrick Steinhardt <ps@pks.im> writes:

> On Mon, Feb 26, 2024 at 09:39:11PM +0000, Jeff Hostetler via GitGitGadget wrote:
>> Here is version 3. I think I have addressed the remaining comments.
>> 
>> I cleaned up the test code to use the test_expect_failure at the beginning
>> and squashed in the test_expect_success version of tests into the final
>> commit in the series.
>> 
>> I moved the invalidate_ce_fsm() commit earlier in the series, so that the
>> final commit actually uses all of the up-to-this-point changes to fix the
>> problem.
>> 
>> I converted a few "should not happens" to BUG()s.
>> 
>> Thanks to everyone for their time and attention reviewing this. Jeff
>
> I gave this whole patch series a read and didn't find much to complain about.
> There are a couple of nits, but none of them really require a reroll in
> my opinion.
>
> Thanks!
>
> Patrick

Thanks.

* Re: [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems
  2024-03-06 12:58     ` Patrick Steinhardt
  2024-03-06 17:09       ` Junio C Hamano
@ 2024-03-06 18:10       ` Jeff Hostetler
  1 sibling, 0 replies; 91+ messages in thread
From: Jeff Hostetler @ 2024-03-06 18:10 UTC (permalink / raw
  To: Patrick Steinhardt, Jeff Hostetler via GitGitGadget
  Cc: git, Torsten Bögershausen, Jeff Hostetler



On 3/6/24 7:58 AM, Patrick Steinhardt wrote:
> On Mon, Feb 26, 2024 at 09:39:11PM +0000, Jeff Hostetler via GitGitGadget wrote:
>> Here is version 3. I think I have addressed the remaining comments.
>>
>> I cleaned up the test code to use the test_expect_failure at the beginning
>> and squashed in the test_expect_success version of tests into the final
>> commit in the series.
>>
>> I moved the invalidate_ce_fsm() commit earlier in the series, so that the
>> final commit actually uses all of the up-to-this-point changes to fix the
>> problem.
>>
>> I converted a few "should not happens" to BUG()s.
>>
>> Thanks to everyone for their time and attention reviewing this. Jeff
> 
> I gave this whole patch series a read and didn't find much to complain about.
> There are a couple of nits, but none of them really require a reroll in
> my opinion.
> 
> Thanks!
> 
> Patrick

Thanks!!!
Jeff

Thread overview: 91+ messages
2024-02-13 20:52 [PATCH 00/12] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
2024-02-13 20:52 ` [PATCH 01/12] sparse-index: pass string length to index_file_exists() Jeff Hostetler via GitGitGadget
2024-02-13 22:07   ` Junio C Hamano
2024-02-20 17:34     ` Jeff Hostetler
2024-02-13 20:52 ` [PATCH 02/12] name-hash: add index_dir_exists2() Jeff Hostetler via GitGitGadget
2024-02-13 21:43   ` Junio C Hamano
2024-02-20 17:38     ` Jeff Hostetler
2024-02-20 19:34       ` Junio C Hamano
2024-02-15  9:31   ` Patrick Steinhardt
2024-02-13 20:52 ` [PATCH 03/12] t7527: add case-insensitve test for FSMonitor Jeff Hostetler via GitGitGadget
2024-02-13 20:52 ` [PATCH 04/12] fsmonitor: refactor refresh callback on directory events Jeff Hostetler via GitGitGadget
2024-02-15  9:32   ` Patrick Steinhardt
2024-02-20 18:54     ` Jeff Hostetler
2024-02-21 12:54       ` Patrick Steinhardt
2024-02-13 20:52 ` [PATCH 05/12] fsmonitor: refactor refresh callback for non-directory events Jeff Hostetler via GitGitGadget
2024-02-14  1:34   ` Junio C Hamano
2024-02-15  9:32   ` Patrick Steinhardt
2024-02-13 20:52 ` [PATCH 06/12] fsmonitor: clarify handling of directory events in callback Jeff Hostetler via GitGitGadget
2024-02-14  7:47   ` Junio C Hamano
2024-02-20 18:56     ` Jeff Hostetler
2024-02-20 19:24       ` Junio C Hamano
2024-02-15  9:32   ` Patrick Steinhardt
2024-02-20 19:10     ` Jeff Hostetler
2024-02-13 20:52 ` [PATCH 07/12] fsmonitor: refactor untracked-cache invalidation Jeff Hostetler via GitGitGadget
2024-02-14 16:46   ` Junio C Hamano
2024-02-15  9:32   ` Patrick Steinhardt
2024-02-13 20:52 ` [PATCH 08/12] fsmonitor: support case-insensitive directory events Jeff Hostetler via GitGitGadget
2024-02-15  9:32   ` Patrick Steinhardt
2024-02-13 20:52 ` [PATCH 09/12] fsmonitor: refactor non-directory callback Jeff Hostetler via GitGitGadget
2024-02-15  9:32   ` Patrick Steinhardt
2024-02-13 20:52 ` [PATCH 10/12] fsmonitor: support case-insensitive non-directory events Jeff Hostetler via GitGitGadget
2024-02-13 20:52 ` [PATCH 11/12] fsmonitor: refactor bit invalidation in refresh callback Jeff Hostetler via GitGitGadget
2024-02-15  9:32   ` Patrick Steinhardt
2024-02-13 20:52 ` [PATCH 12/12] t7527: update case-insenstive fsmonitor test Jeff Hostetler via GitGitGadget
2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
2024-02-23  3:18   ` [PATCH v2 01/16] name-hash: add index_dir_find() Jeff Hostetler via GitGitGadget
2024-02-23  6:37     ` Junio C Hamano
2024-02-23  3:18   ` [PATCH v2 02/16] t7527: add case-insensitve test for FSMonitor Jeff Hostetler via GitGitGadget
2024-02-23  3:18   ` [PATCH v2 03/16] t7527: temporarily disable case-insensitive tests Jeff Hostetler via GitGitGadget
2024-02-23  8:17     ` Junio C Hamano
2024-02-26 17:12       ` Jeff Hostetler
2024-02-23  3:18   ` [PATCH v2 04/16] fsmonitor: refactor refresh callback on directory events Jeff Hostetler via GitGitGadget
2024-02-23  8:18     ` Junio C Hamano
2024-02-23  3:18   ` [PATCH v2 05/16] fsmonitor: clarify handling of directory events in callback helper Jeff Hostetler via GitGitGadget
2024-02-23  3:18   ` [PATCH v2 06/16] fsmonitor: refactor refresh callback for non-directory events Jeff Hostetler via GitGitGadget
2024-02-23  8:18     ` Junio C Hamano
2024-02-25 12:30     ` Torsten Bögershausen
2024-02-25 17:24       ` Junio C Hamano
2024-02-23  3:18   ` [PATCH v2 07/16] dir: create untracked_cache_invalidate_trimmed_path() Jeff Hostetler via GitGitGadget
2024-02-25 12:35     ` Torsten Bögershausen
2024-02-23  3:18   ` [PATCH v2 08/16] fsmonitor: refactor untracked-cache invalidation Jeff Hostetler via GitGitGadget
2024-02-23  3:18   ` [PATCH v2 09/16] fsmonitor: move untracked invalidation into helper functions Jeff Hostetler via GitGitGadget
2024-02-23 17:36     ` Junio C Hamano
2024-02-26 18:45       ` Jeff Hostetler
2024-02-23  3:18   ` [PATCH v2 10/16] fsmonitor: return invalidated cache-entry count on directory event Jeff Hostetler via GitGitGadget
2024-02-23  3:18   ` [PATCH v2 11/16] fsmonitor: remove custom loop from non-directory path handler Jeff Hostetler via GitGitGadget
2024-02-23 17:47     ` Junio C Hamano
2024-02-23  3:18   ` [PATCH v2 12/16] fsmonitor: return invalided cache-entry count on non-directory event Jeff Hostetler via GitGitGadget
2024-02-23 17:51     ` Junio C Hamano
2024-02-23  3:18   ` [PATCH v2 13/16] fsmonitor: trace the new invalidated cache-entry count Jeff Hostetler via GitGitGadget
2024-02-23 17:53     ` Junio C Hamano
2024-02-23  3:18   ` [PATCH v2 14/16] fsmonitor: support case-insensitive events Jeff Hostetler via GitGitGadget
2024-02-23 18:14     ` Junio C Hamano
2024-02-26 20:41       ` Jeff Hostetler
2024-02-26 21:18         ` Junio C Hamano
2024-02-25 13:10     ` Torsten Bögershausen
2024-02-26 20:47       ` Jeff Hostetler
2024-02-23  3:18   ` [PATCH v2 15/16] fsmonitor: refactor bit invalidation in refresh callback Jeff Hostetler via GitGitGadget
2024-02-23 18:18     ` Junio C Hamano
2024-02-23  3:18   ` [PATCH v2 16/16] t7527: update case-insenstive fsmonitor test Jeff Hostetler via GitGitGadget
2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 01/14] name-hash: add index_dir_find() Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 02/14] t7527: add case-insensitve test for FSMonitor Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 03/14] fsmonitor: refactor refresh callback on directory events Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 04/14] fsmonitor: clarify handling of directory events in callback helper Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 05/14] fsmonitor: refactor refresh callback for non-directory events Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 06/14] dir: create untracked_cache_invalidate_trimmed_path() Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 07/14] fsmonitor: refactor untracked-cache invalidation Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 08/14] fsmonitor: move untracked-cache invalidation into helper functions Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 09/14] fsmonitor: return invalidated cache-entry count on directory event Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 10/14] fsmonitor: remove custom loop from non-directory path handler Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 11/14] fsmonitor: return invalided cache-entry count on non-directory event Jeff Hostetler via GitGitGadget
2024-03-06 12:58       ` Patrick Steinhardt
2024-02-26 21:39     ` [PATCH v3 12/14] fsmonitor: trace the new invalidated cache-entry count Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 13/14] fsmonitor: refactor bit invalidation in refresh callback Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 14/14] fsmonitor: support case-insensitive events Jeff Hostetler via GitGitGadget
2024-03-06 12:58       ` Patrick Steinhardt
2024-02-27  1:40     ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Junio C Hamano
2024-03-06 12:58     ` Patrick Steinhardt
2024-03-06 17:09       ` Junio C Hamano
2024-03-06 18:10       ` Jeff Hostetler