Git Mailing List Archive mirror
 help / color / mirror / Atom feed
* Design issue in git merge driver interface
@ 2023-06-22 16:50 Joshua Hudson
  2023-06-22 19:12 ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Joshua Hudson @ 2023-06-22 16:50 UTC (permalink / raw
  To: git

We had a major incident here involving bad merges. It was traced to a 
problem with a custom merge driver.

On studying why the merge driver failed, we found it was in an 
unrunnable state (the logical equivalent of calling abort()) and could 
not mark files as conflicted. Since the users were using a graphical 
frontend the error messages disappeared into the bitbucket.

I've come to the definite conclusion that treating the file as 
conflicted was wrong and the merge needed to bail out in the middle as 
though git merge itself bombed. But there doesn't seem to be any way in 
the protocol to actually do it.

Looking at the merge driver found that some things cannot be handled, 
such as OOM condition. The fault has to propagate upwards, unwinding as 
it goes.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Design issue in git merge driver interface
  2023-06-22 16:50 Design issue in git merge driver interface Joshua Hudson
@ 2023-06-22 19:12 ` Junio C Hamano
  2023-06-23  0:33   ` [PATCH] ll-merge: killing the external merge driver aborts the merge Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2023-06-22 19:12 UTC (permalink / raw
  To: Joshua Hudson; +Cc: git, Elijah Newren

Joshua Hudson <jhudson@cedaron.com> writes:

> Looking at the merge driver found that some things cannot be handled,
> such as OOM condition. The fault has to propagate upwards, unwinding
> as it goes.

Even though the end-user facing documentation says:

    The merge driver is expected to leave the result of the merge in
    the file named with `%A` by overwriting it, and exit with zero
    status if it managed to merge them cleanly, or non-zero if there
    were conflicts.

the ll-merge.c:ll_ext_merge() function that calls an external merge
driver does this:

        static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
                ...
                status = run_command(&child);
                ...
                ret = (status > 0) ? LL_MERGE_CONFLICT : status;
                return ret;
        }

so a true "failure" from run_command() to run the external merge
driver will be noticed as a failure by the upper layer of the
callchain.  merge-ort.c:merge_3way() relays the return value of
ll-merge.c:ll_merge() and merge-ort.c:handle_content_merge() reacts
to a negative return as an _("Failed to execute internal merge")
error, for example.  merge-recursive uses the same logic.

Unfortunately, I see no provision for the merge driver to actively
signal such a condition.  The return value of run_command() is a
return value from run-command.c:wait_or_whine() and exit status of
the process is cleansed with WEXITSTATUS() so we cannot make it
negative X-<.

In the worst case, we may retroactively have to reserve one exit
status so that the external merge driver can actively say "I give
up" to cause LL_MERGE_ERROR to be returned from the codepath, but I
wonder if it is safe to abuse "exit due to signal" (which shows up
as a return value greater than 128) as such a "merge driver went
away without leaving a useful result"?  Elijah, what do you think?

Stepping back a bit and even disregarding such a merge driver that
OOMs, if a long-running merge driver is killed, by definition we
cannot trust what the driver left on the filesystem, so handling
"exit due to signal" case differently does sound like a sensible
thing to do, at least to me, offhand.

And once we have such an enhancement to the ll-ext-merge interface,
a merge driver that voluntarily "gives up" can send a signal to kill
itself (or call abort(3)).

With a tentative commit log message (which would need to be updated
to mention what the triggering topic was that led to this
enhancement) but without associated documentation update and test,
here is to summarize and illustrate the above idea.

----- >8 ---------- >8 ---------- >8 -----
ll-merge: external merge driver died with a signal causes an error

When an external merge driver dies with a signal, we should not
expect that the result left on the filesystem is in any useful
state.  However, because the current code uses the return value from
run_command() and declares any positive value as a sign that the
driver successfully left conflicts in the result, and because the
return value from run_command() for a subprocess that died upon a
signal is positive, we end up treating whatever garbage left on the
filesystem as the result the merge driver wanted to leave us.

run_command() returns larger than 128 (WTERMSIG(status) + 128, to be
exact) when it notices that the subprocess died with a signal, so
detect such a case and return LL_MERGE_ERROR from ll_ext_merge().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 ll-merge.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git c/ll-merge.c w/ll-merge.c
index 07ec16e8e5..5599f55ffc 100644
--- c/ll-merge.c
+++ w/ll-merge.c
@@ -243,7 +243,14 @@ static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
 		unlink_or_warn(temp[i]);
 	strbuf_release(&cmd);
 	strbuf_release(&path_sq);
-	ret = (status > 0) ? LL_MERGE_CONFLICT : status;
+
+	if (!status)
+		ret = LL_MERGE_OK;
+	else if (status <= 128)
+		ret = LL_MERGE_CONFLICT;
+	else
+		/* died due to a signal: WTERMSIG(status) + 128 */
+		ret = LL_MERGE_ERROR;
 	return ret;
 }
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH] ll-merge: killing the external merge driver aborts the merge
  2023-06-22 19:12 ` Junio C Hamano
@ 2023-06-23  0:33   ` Junio C Hamano
  2023-06-23  6:25     ` Elijah Newren
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2023-06-23  0:33 UTC (permalink / raw
  To: git; +Cc: Joshua Hudson, Elijah Newren

When an external merge driver dies with a signal, we should not
expect that the result left on the filesystem is in any useful
state.  However, because the current code uses the return value from
run_command() and declares any positive value as a sign that the
driver successfully left conflicts in the result, and because the
return value from run_command() for a subprocess that died upon a
signal is positive, we end up treating whatever garbage left on the
filesystem as the result the merge driver wanted to leave us.

run_command() returns larger than 128 (WTERMSIG(status) + 128, to be
exact) when it notices that the subprocess died with a signal, so
detect such a case and return LL_MERGE_ERROR from ll_ext_merge().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * This time with an updated title, a minimal documentation, and an
   additional test.

 Documentation/gitattributes.txt |  5 ++++-
 ll-merge.c                      |  9 ++++++++-
 t/t6406-merge-attr.sh           | 23 +++++++++++++++++++++++
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 02a3ec83e4..6deb89a296 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -1132,7 +1132,10 @@ size (see below).
 The merge driver is expected to leave the result of the merge in
 the file named with `%A` by overwriting it, and exit with zero
 status if it managed to merge them cleanly, or non-zero if there
-were conflicts.
+were conflicts.  When the driver crashes (e.g. killed by SEGV),
+it is expected to exit with non-zero status that are higher than
+128, and in such a case, the merge results in a failure (which is
+different from producing a conflict).
 
 The `merge.*.recursive` variable specifies what other merge
 driver to use when the merge driver is called for an internal
diff --git a/ll-merge.c b/ll-merge.c
index 07ec16e8e5..ba45aa2f79 100644
--- a/ll-merge.c
+++ b/ll-merge.c
@@ -243,7 +243,14 @@ static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
 		unlink_or_warn(temp[i]);
 	strbuf_release(&cmd);
 	strbuf_release(&path_sq);
-	ret = (status > 0) ? LL_MERGE_CONFLICT : status;
+
+	if (!status)
+		ret = LL_MERGE_OK;
+	else if (status <= 128)
+		ret = LL_MERGE_CONFLICT;
+	else
+		/* died due to a signal: WTERMSIG(status) + 128 */
+		ret = LL_MERGE_ERROR;
 	return ret;
 }
 
diff --git a/t/t6406-merge-attr.sh b/t/t6406-merge-attr.sh
index 5e4e4dd6d9..b50aedbc00 100755
--- a/t/t6406-merge-attr.sh
+++ b/t/t6406-merge-attr.sh
@@ -56,6 +56,12 @@ test_expect_success setup '
 	) >"$ours+"
 	cat "$ours+" >"$ours"
 	rm -f "$ours+"
+
+	if test -f ./please-abort
+	then
+		echo >>./please-abort killing myself
+		kill -9 $$
+	fi
 	exit "$exit"
 	EOF
 	chmod +x ./custom-merge
@@ -162,6 +168,23 @@ test_expect_success 'custom merge backend' '
 	rm -f $o $a $b
 '
 
+test_expect_success 'custom merge driver that is killed with a signal' '
+	test_when_finished "rm -f output please-abort" &&
+
+	git reset --hard anchor &&
+	git config --replace-all \
+	merge.custom.driver "./custom-merge %O %A %B 0 %P" &&
+	git config --replace-all \
+	merge.custom.name "custom merge driver for testing" &&
+
+	>./please-abort &&
+	echo "* merge=custom" >.gitattributes &&
+	test_must_fail git merge main &&
+	git ls-files -u >output &&
+	git diff --name-only HEAD >>output &&
+	test_must_be_empty output
+'
+
 test_expect_success 'up-to-date merge without common ancestor' '
 	git init repo1 &&
 	git init repo2 &&
-- 
2.41.0-159-g0bfa463d37


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] ll-merge: killing the external merge driver aborts the merge
  2023-06-23  0:33   ` [PATCH] ll-merge: killing the external merge driver aborts the merge Junio C Hamano
@ 2023-06-23  6:25     ` Elijah Newren
  2023-06-23 16:26       ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Elijah Newren @ 2023-06-23  6:25 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, Joshua Hudson

On Thu, Jun 22, 2023 at 5:33 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> When an external merge driver dies with a signal, we should not
> expect that the result left on the filesystem is in any useful
> state.  However, because the current code uses the return value from
> run_command() and declares any positive value as a sign that the
> driver successfully left conflicts in the result, and because the
> return value from run_command() for a subprocess that died upon a
> signal is positive, we end up treating whatever garbage left on the
> filesystem as the result the merge driver wanted to leave us.

Yeah, I think the tradition was exit code == number of conflicts for
some merge processes.  Not particularly useful when the driver died
from some signal.

> run_command() returns larger than 128 (WTERMSIG(status) + 128, to be
> exact) when it notices that the subprocess died with a signal, so
> detect such a case and return LL_MERGE_ERROR from ll_ext_merge().

Makes sense.

>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>
>  * This time with an updated title, a minimal documentation, and an
>    additional test.
>
>  Documentation/gitattributes.txt |  5 ++++-
>  ll-merge.c                      |  9 ++++++++-
>  t/t6406-merge-attr.sh           | 23 +++++++++++++++++++++++
>  3 files changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 02a3ec83e4..6deb89a296 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -1132,7 +1132,10 @@ size (see below).
>  The merge driver is expected to leave the result of the merge in
>  the file named with `%A` by overwriting it, and exit with zero
>  status if it managed to merge them cleanly, or non-zero if there
> -were conflicts.
> +were conflicts.  When the driver crashes (e.g. killed by SEGV),
> +it is expected to exit with non-zero status that are higher than
> +128, and in such a case, the merge results in a failure (which is
> +different from producing a conflict).

Looks good.

>  The `merge.*.recursive` variable specifies what other merge
>  driver to use when the merge driver is called for an internal
> diff --git a/ll-merge.c b/ll-merge.c
> index 07ec16e8e5..ba45aa2f79 100644
> --- a/ll-merge.c
> +++ b/ll-merge.c
> @@ -243,7 +243,14 @@ static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
>                 unlink_or_warn(temp[i]);
>         strbuf_release(&cmd);
>         strbuf_release(&path_sq);
> -       ret = (status > 0) ? LL_MERGE_CONFLICT : status;
> +
> +       if (!status)
> +               ret = LL_MERGE_OK;
> +       else if (status <= 128)
> +               ret = LL_MERGE_CONFLICT;
> +       else
> +               /* died due to a signal: WTERMSIG(status) + 128 */
> +               ret = LL_MERGE_ERROR;
>         return ret;
>  }

Likewise.

> diff --git a/t/t6406-merge-attr.sh b/t/t6406-merge-attr.sh
> index 5e4e4dd6d9..b50aedbc00 100755
> --- a/t/t6406-merge-attr.sh
> +++ b/t/t6406-merge-attr.sh
> @@ -56,6 +56,12 @@ test_expect_success setup '
>         ) >"$ours+"
>         cat "$ours+" >"$ours"
>         rm -f "$ours+"
> +
> +       if test -f ./please-abort
> +       then
> +               echo >>./please-abort killing myself
> +               kill -9 $$
> +       fi
>         exit "$exit"
>         EOF
>         chmod +x ./custom-merge
> @@ -162,6 +168,23 @@ test_expect_success 'custom merge backend' '
>         rm -f $o $a $b
>  '
>
> +test_expect_success 'custom merge driver that is killed with a signal' '
> +       test_when_finished "rm -f output please-abort" &&
> +
> +       git reset --hard anchor &&
> +       git config --replace-all \
> +       merge.custom.driver "./custom-merge %O %A %B 0 %P" &&
> +       git config --replace-all \
> +       merge.custom.name "custom merge driver for testing" &&
> +
> +       >./please-abort &&
> +       echo "* merge=custom" >.gitattributes &&
> +       test_must_fail git merge main &&
> +       git ls-files -u >output &&
> +       git diff --name-only HEAD >>output &&
> +       test_must_be_empty output
> +'
> +

I was about to comment that we needed to clean up the please-abort
file, then realized I just missed it in my first reading of the
test_when_finished line.  So, patch looks good.

Reviewed-by: Elijah Newren <newren@gmail.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ll-merge: killing the external merge driver aborts the merge
  2023-06-23  6:25     ` Elijah Newren
@ 2023-06-23 16:26       ` Junio C Hamano
  2023-06-23 23:31         ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2023-06-23 16:26 UTC (permalink / raw
  To: Elijah Newren; +Cc: git, Joshua Hudson

Elijah Newren <newren@gmail.com> writes:

> Reviewed-by: Elijah Newren <newren@gmail.com>


Thanks for a quick review.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ll-merge: killing the external merge driver aborts the merge
  2023-06-23 16:26       ` Junio C Hamano
@ 2023-06-23 23:31         ` Junio C Hamano
  2023-06-25 13:40           ` Joshua Hudson
  2023-06-27 12:02           ` Johannes Schindelin
  0 siblings, 2 replies; 15+ messages in thread
From: Junio C Hamano @ 2023-06-23 23:31 UTC (permalink / raw
  To: git; +Cc: Elijah Newren, Joshua Hudson

Junio C Hamano <gitster@pobox.com> writes:

> Elijah Newren <newren@gmail.com> writes:
>
>> Reviewed-by: Elijah Newren <newren@gmail.com>
>
>
> Thanks for a quick review.

Unfortunately Windows does not seem to correctly detect the aborting
merge driver.  Does run_command() there report process death due to
signals differently, I wonder?

https://github.com/git/git/actions/runs/5360400800/jobs/9725341775#step:6:285

shows that on Windows, aborted external merge driver is not noticed
and we happily take the auto-merged result, ouch.

I am tempted to protect this step of the test with a prerequisite to
skip it on Windows for now.  Anybody with better idea?

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ll-merge: killing the external merge driver aborts the merge
  2023-06-23 23:31         ` Junio C Hamano
@ 2023-06-25 13:40           ` Joshua Hudson
  2023-06-27 12:02           ` Johannes Schindelin
  1 sibling, 0 replies; 15+ messages in thread
From: Joshua Hudson @ 2023-06-25 13:40 UTC (permalink / raw
  To: Junio C Hamano, git; +Cc: Elijah Newren

On 6/23/2023 4:31 PM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Elijah Newren <newren@gmail.com> writes:
>>
>>> Reviewed-by: Elijah Newren <newren@gmail.com>
>>
>> Thanks for a quick review.
> Unfortunately Windows does not seem to correctly detect the aborting
> merge driver.  Does run_command() there report process death due to
> signals differently, I wonder?
>
> https://github.com/git/git/actions/runs/5360400800/jobs/9725341775#step:6:285
>
> shows that on Windows, aborted external merge driver is not noticed
> and we happily take the auto-merged result, ouch.
>
> I am tempted to protect this step of the test with a prerequisite to
> skip it on Windows for now.  Anybody with better idea?
>
> Thanks.

I would suggest putting in the correct test harness on Windows. abort() 
doesn't work very well.

(Sample code only--read description below)

#ifdef WIN32
     TerminateProcess(GetCurrentProcess(), 131); /* something that looks 
like it passes the test */
     TerminateProcess(GetCurrentProcess(), 0x80070485); /* actual exit 
code for process that cannot start because its missing a shared library */
#else
     abort();
#endif

I only want to fight so hard with the Unix to Windows translation layer 
you use. Strictly speaking, the second TerminateProcess() line is what 
should be in the test harness, but if it doesn't work go with the first 
one. Then I at least have something to work with.

I'm not going to lie to you. We are doing development on Windows, and 
the merge driver is written using a different portability layer. I am 
prepared to build a native shim to make it work if I have to. I am not 
in a position where I can build git and test any development code; 
getting the fix to us will be a long journey.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ll-merge: killing the external merge driver aborts the merge
  2023-06-23 23:31         ` Junio C Hamano
  2023-06-25 13:40           ` Joshua Hudson
@ 2023-06-27 12:02           ` Johannes Schindelin
  2023-06-27 13:43             ` Joshua Hudson
  2023-06-27 19:08             ` Junio C Hamano
  1 sibling, 2 replies; 15+ messages in thread
From: Johannes Schindelin @ 2023-06-27 12:02 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, Elijah Newren, Joshua Hudson

Hi,


On Fri, 23 Jun 2023, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
>
> > Elijah Newren <newren@gmail.com> writes:
> >
> >> Reviewed-by: Elijah Newren <newren@gmail.com>
> >
> >
> > Thanks for a quick review.
>
> Unfortunately Windows does not seem to correctly detect the aborting
> merge driver.  Does run_command() there report process death due to
> signals differently, I wonder?
>
> https://github.com/git/git/actions/runs/5360400800/jobs/9725341775#step:6:285
>
> shows that on Windows, aborted external merge driver is not noticed
> and we happily take the auto-merged result, ouch.

Hmm. I tried to verify this, but failed. With this patch:

```diff
diff --git a/git.c b/git.c
index 2f42da20f4e0..3c513e3f2cb1 100644
--- a/git.c
+++ b/git.c
@@ -330,6 +330,8 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
 			setenv(GIT_ATTR_SOURCE_ENVIRONMENT, cmd, 1);
 			if (envchanged)
 				*envchanged = 1;
+		} else if (!strcmp(cmd, "--abort")) {
+			abort();
 		} else {
 			fprintf(stderr, _("unknown option: %s\n"), cmd);
 			usage(git_usage_string);
```

I get this:


```console
$ ./git.exe --abort

$ echo $?
3
```

For that reason, I am somehow doubtful that the `abort()` is actually
called?!?

Ciao,
Johannes

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] ll-merge: killing the external merge driver aborts the merge
  2023-06-27 12:02           ` Johannes Schindelin
@ 2023-06-27 13:43             ` Joshua Hudson
  2023-06-27 19:08             ` Junio C Hamano
  1 sibling, 0 replies; 15+ messages in thread
From: Joshua Hudson @ 2023-06-27 13:43 UTC (permalink / raw
  To: Johannes Schindelin, Junio C Hamano; +Cc: git, Elijah Newren


On 6/27/2023 5:02 AM, Johannes Schindelin wrote:
> Hi,
>
>
> On Fri, 23 Jun 2023, Junio C Hamano wrote:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> Elijah Newren <newren@gmail.com> writes:
>>>
>>>> Reviewed-by: Elijah Newren <newren@gmail.com>
>>>
>>> Thanks for a quick review.
>> Unfortunately Windows does not seem to correctly detect the aborting
>> merge driver.  Does run_command() there report process death due to
>> signals differently, I wonder?
>>
>> https://github.com/git/git/actions/runs/5360400800/jobs/9725341775#step:6:285
>>
>> shows that on Windows, aborted external merge driver is not noticed
>> and we happily take the auto-merged result, ouch.
> Hmm. I tried to verify this, but failed. With this patch:
>
> ```diff
> diff --git a/git.c b/git.c
> index 2f42da20f4e0..3c513e3f2cb1 100644
> --- a/git.c
> +++ b/git.c
> @@ -330,6 +330,8 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
>   			setenv(GIT_ATTR_SOURCE_ENVIRONMENT, cmd, 1);
>   			if (envchanged)
>   				*envchanged = 1;
> +		} else if (!strcmp(cmd, "--abort")) {
> +			abort();
>   		} else {
>   			fprintf(stderr, _("unknown option: %s\n"), cmd);
>   			usage(git_usage_string);
> ```
>
> I get this:
>
>
> ```console
> $ ./git.exe --abort
>
> $ echo $?
> 3
> ```
>
> For that reason, I am somehow doubtful that the `abort()` is actually
> called?!?
>
> Ciao,
> Johannes

abort(); does _exit(3); on Windows because there are no signals. This is 
easily changed by providing abort like so:

void abort() { _exit(131 /* or whatever else you think goes here */); }


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ll-merge: killing the external merge driver aborts the merge
  2023-06-27 12:02           ` Johannes Schindelin
  2023-06-27 13:43             ` Joshua Hudson
@ 2023-06-27 19:08             ` Junio C Hamano
  2023-06-27 19:10               ` Joshua Hudson
  1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2023-06-27 19:08 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: git, Elijah Newren, Joshua Hudson

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Fri, 23 Jun 2023, Junio C Hamano wrote:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>> > Elijah Newren <newren@gmail.com> writes:
>> >
>> >> Reviewed-by: Elijah Newren <newren@gmail.com>
>> >
>> >
>> > Thanks for a quick review.
>>
>> Unfortunately Windows does not seem to correctly detect the aborting

Sorry, I did not mean "abort(3)" literally.  What I meant was that
an external merge driver that gets spawned via the run_command()
interface may not die by calling exit()---like "killed by signal"
(including "segfaulting").  The new test script piece added in the
patch did "kill -9 $$" to kill the external merge driver itself,
which gets reported as "killed by signal" from run_command() by
returning the signal number + 128, but that did not pass Windows CI.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ll-merge: killing the external merge driver aborts the merge
  2023-06-27 19:08             ` Junio C Hamano
@ 2023-06-27 19:10               ` Joshua Hudson
  2023-06-27 20:04                 ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Joshua Hudson @ 2023-06-27 19:10 UTC (permalink / raw
  To: Junio C Hamano, Johannes Schindelin; +Cc: git, Elijah Newren


On 6/27/2023 12:08 PM, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>> On Fri, 23 Jun 2023, Junio C Hamano wrote:
>>
>>> Junio C Hamano <gitster@pobox.com> writes:
>>>
>>>> Elijah Newren <newren@gmail.com> writes:
>>>>
>>>>> Reviewed-by: Elijah Newren <newren@gmail.com>
>>>>
>>>> Thanks for a quick review.
>>> Unfortunately Windows does not seem to correctly detect the aborting
> Sorry, I did not mean "abort(3)" literally.  What I meant was that
> an external merge driver that gets spawned via the run_command()
> interface may not die by calling exit()---like "killed by signal"
> (including "segfaulting").  The new test script piece added in the
> patch did "kill -9 $$" to kill the external merge driver itself,
> which gets reported as "killed by signal" from run_command() by
> returning the signal number + 128, but that did not pass Windows CI.
>
Do you need me to provide a windows test harness?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ll-merge: killing the external merge driver aborts the merge
  2023-06-27 19:10               ` Joshua Hudson
@ 2023-06-27 20:04                 ` Junio C Hamano
  2023-06-27 21:14                   ` Joshua Hudson
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2023-06-27 20:04 UTC (permalink / raw
  To: Joshua Hudson; +Cc: Johannes Schindelin, git, Elijah Newren

Joshua Hudson <jhudson@cedaron.com> writes:

> On 6/27/2023 12:08 PM, Junio C Hamano wrote:
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>>> On Fri, 23 Jun 2023, Junio C Hamano wrote:
>>>
>>>> Junio C Hamano <gitster@pobox.com> writes:
>>>>
>>>>> Elijah Newren <newren@gmail.com> writes:
>>>>>
>>>>>> Reviewed-by: Elijah Newren <newren@gmail.com>
>>>>>
>>>>> Thanks for a quick review.
>>>> Unfortunately Windows does not seem to correctly detect the aborting
>> Sorry, I did not mean "abort(3)" literally.  What I meant was that
>> an external merge driver that gets spawned via the run_command()
>> interface may not die by calling exit()---like "killed by signal"
>> (including "segfaulting").  The new test script piece added in the
>> patch did "kill -9 $$" to kill the external merge driver itself,
>> which gets reported as "killed by signal" from run_command() by
>> returning the signal number + 128, but that did not pass Windows CI.
>>
> Do you need me to provide a windows test harness?

Sorry, I do not understand the question.

FWIW how "external merge driver that kills itself by sending a
signal to itself does not get noticed on Windows" appears in our
tests can be seen at

https://github.com/git/git/actions/runs/5360824580/jobs/9727137272

The job is "win test(0)", part of our standard Windows test harness
implemented as part of our GitHub Actions CI test.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ll-merge: killing the external merge driver aborts the merge
  2023-06-27 20:04                 ` Junio C Hamano
@ 2023-06-27 21:14                   ` Joshua Hudson
  2023-06-27 21:26                     ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Joshua Hudson @ 2023-06-27 21:14 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Johannes Schindelin, git, Elijah Newren


On 6/27/2023 1:04 PM, Junio C Hamano wrote:
> Joshua Hudson <jhudson@cedaron.com> writes:
>
>> On 6/27/2023 12:08 PM, Junio C Hamano wrote:
>>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>>
>>>> On Fri, 23 Jun 2023, Junio C Hamano wrote:
>>>>
>>>>> Junio C Hamano <gitster@pobox.com> writes:
>>>>>
>>>>>> Elijah Newren <newren@gmail.com> writes:
>>>>>>
>>>>>>> Reviewed-by: Elijah Newren <newren@gmail.com>
>>>>>> Thanks for a quick review.
>>>>> Unfortunately Windows does not seem to correctly detect the aborting
>>> Sorry, I did not mean "abort(3)" literally.  What I meant was that
>>> an external merge driver that gets spawned via the run_command()
>>> interface may not die by calling exit()---like "killed by signal"
>>> (including "segfaulting").  The new test script piece added in the
>>> patch did "kill -9 $$" to kill the external merge driver itself,
>>> which gets reported as "killed by signal" from run_command() by
>>> returning the signal number + 128, but that did not pass Windows CI.
>>>
>> Do you need me to provide a windows test harness?
> Sorry, I do not understand the question.
>
> FWIW how "external merge driver that kills itself by sending a
> signal to itself does not get noticed on Windows" appears in our
> tests can be seen at
>
> https://github.com/git/git/actions/runs/5360824580/jobs/9727137272
>
> The job is "win test(0)", part of our standard Windows test harness
> implemented as part of our GitHub Actions CI test.
>
> Thanks.
Try changing kill -9 $$ to exit 137 # 128 + 9

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ll-merge: killing the external merge driver aborts the merge
  2023-06-27 21:14                   ` Joshua Hudson
@ 2023-06-27 21:26                     ` Junio C Hamano
  2023-06-27 21:32                       ` Joshua Hudson
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2023-06-27 21:26 UTC (permalink / raw
  To: Joshua Hudson; +Cc: Johannes Schindelin, git, Elijah Newren

Joshua Hudson <jhudson@cedaron.com> writes:

> Try changing kill -9 $$ to exit 137 # 128 + 9

Yeah, but then (1) we are not simulating a case where the external
merge driver hits a segfault or receives a signal from outside and
dies involuntarily, and (2) we are codifying that even on Windows,
program that was killed by signal N must exit with 128 + N, and
these are the reasons why I did not go that route.

Stepping back a bit, how does one typically diagnose programatically
on Windows, after "spawning" a separate program, if the program died
involuntarily and/or got killed?  It does not have to be "exit with
128 + signal number"---as long as it can be done programatically and
reliably, we would be happy.  The code to diagnose how the spawned
program exited in run_command(), which is in finish_command() and
then in wait_or_whine(), may have to be updated with such a piece of
Windows specific knowledge.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] ll-merge: killing the external merge driver aborts the merge
  2023-06-27 21:26                     ` Junio C Hamano
@ 2023-06-27 21:32                       ` Joshua Hudson
  0 siblings, 0 replies; 15+ messages in thread
From: Joshua Hudson @ 2023-06-27 21:32 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Johannes Schindelin, git, Elijah Newren

On 6/27/2023 2:26 PM, Junio C Hamano wrote:
> Joshua Hudson <jhudson@cedaron.com> writes:
>
>> Try changing kill -9 $$ to exit 137 # 128 + 9
> Yeah, but then (1) we are not simulating a case where the external
> merge driver hits a segfault or receives a signal from outside and
> dies involuntarily, and (2) we are codifying that even on Windows,
> program that was killed by signal N must exit with 128 + N, and
> these are the reasons why I did not go that route.
>
> Stepping back a bit, how does one typically diagnose programatically
> on Windows, after "spawning" a separate program, if the program died
> involuntarily and/or got killed?  It does not have to be "exit with
> 128 + signal number"---as long as it can be done programatically and
> reliably, we would be happy.  The code to diagnose how the spawned
> program exited in run_command(), which is in finish_command() and
> then in wait_or_whine(), may have to be updated with such a piece of
> Windows specific knowledge.

abort() => 3

Killed => no you can't detect it

Faulted => exit code has the high bit set ( >= 0x8000000 )

My starting off with "the logical equivalent of calling abort()" has 
proven to be an unfortunate word choice. I need to harden up the exit 
pathway on my side _anyway_. An OOM at least does turn into Faulted.


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-06-27 21:32 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-22 16:50 Design issue in git merge driver interface Joshua Hudson
2023-06-22 19:12 ` Junio C Hamano
2023-06-23  0:33   ` [PATCH] ll-merge: killing the external merge driver aborts the merge Junio C Hamano
2023-06-23  6:25     ` Elijah Newren
2023-06-23 16:26       ` Junio C Hamano
2023-06-23 23:31         ` Junio C Hamano
2023-06-25 13:40           ` Joshua Hudson
2023-06-27 12:02           ` Johannes Schindelin
2023-06-27 13:43             ` Joshua Hudson
2023-06-27 19:08             ` Junio C Hamano
2023-06-27 19:10               ` Joshua Hudson
2023-06-27 20:04                 ` Junio C Hamano
2023-06-27 21:14                   ` Joshua Hudson
2023-06-27 21:26                     ` Junio C Hamano
2023-06-27 21:32                       ` Joshua Hudson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).