Git Mailing List Archive mirror
 help / color / mirror / Atom feed
* tests broken with curl-8.7.0
@ 2024-03-30  0:02 Jeff King
  2024-03-30  8:54 ` Daniel Stenberg
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2024-03-30  0:02 UTC (permalink / raw
  To: git; +Cc: Daniel Stenberg

I noticed some http-related failures in the test suite on my Debian
unstable system, which recently got an upgraded curl package. It looks
like it's related to cases where we use the remote-curl helper in
"connect" mode (i.e., protocol v2) and the http buffer is small
(requiring us to stream the data to curl). Besides just running t5551,
an easy reproduction is:

  [this works]
  $ git ls-remote https://git.kernel.org/pub/scm/git/git.git | wc -l
  1867

  [this doesn't]
  $ git -c http.postbuffer=65536 ls-remote https://git.kernel.org/pub/scm/git/git.git
  fatal: expected flush after ref listing

The error message comes from ls-remote itself, which was expecting a
FLUSH packet from the remote. Instead it gets the RESPONSE_END from
remote-curl (remember that in connect mode, remote-curl is just ferrying
bytes back and forth between ls-remote and the server).

It works with older versions of libcurl, but not 8.7.0 (or 8.7.1).
Bisecting in libcurl points to 9369c30cd (lib: Curl_read/Curl_write
clarifications, 2024-02-15).

Running with GIT_TRACE_CURL=1 shows weirdness on the POST we send to
issue the ls-refs command. With older curl, I see this:

  => Send header: POST /pub/scm/git/git.git/git-upload-pack HTTP/1.1
  => Send header: Host: git.kernel.org
  => Send header: User-Agent: git/2.44.0.789.g252ee96bc5.dirty
  => Send header: Accept-Encoding: deflate, gzip
  => Send header: Content-Type: application/x-git-upload-pack-request
  => Send header: Accept: application/x-git-upload-pack-result
  => Send header: Git-Protocol: version=2
  => Send header: Transfer-Encoding: chunked
  => Send header:
  => Send data: 14..0014command=ls-refs...
  => Send data: 2a..002aagent=git/2.44.0.789.g252ee96bc5.dirty..
  [and so on until...]
  == Info: Signaling end of chunked upload via terminating chunk.

But with the broken version, I get:

  => Send header: POST /pub/scm/git/git.git/git-upload-pack HTTP/1.1
  => Send header: Host: git.kernel.org
  => Send header: User-Agent: git/2.44.0.789.g252ee96bc5.dirty
  => Send header: Accept-Encoding: deflate, gzip, br, zstd
  => Send header: Content-Type: application/x-git-upload-pack-request
  => Send header: Accept: application/x-git-upload-pack-result
  => Send header: Git-Protocol: version=2
  => Send header: Transfer-Encoding: chunked
  => Send header:
  => Send data, 0000000014 bytes (0x0000000e)
  => Send data: 4..0014..0....
  == Info: upload completely sent off: 14 bytes

So we only get the first 4 bytes, and then we quit (the double mention
of 14 is confusing, but I think it is both the size of the pkt-line
("command=ls-refs\n") but also the length of the 4-byte string when
framed with chunked transfer-encoding). Those 4 bytes are the first
thing returned by rpc_out(), which we use as our CURLOPT_READFUNCTION.

It's possible that we're doing something wrong with our read/write
function callbacks. But I don't see how; we say "here's 4 bytes", but
then we never get called again. It's like curl is giving up on trying to
read the post input early for some reason.

I'm not sure how to dig further. That commit is pretty big and scary. I
did check that the tip of master in curl.git is still affected (I'd
hoped maybe the 0-length write fixes in b30d694a027 would be related,
but that's not it).

Ideas?

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: tests broken with curl-8.7.0
  2024-03-30  0:02 tests broken with curl-8.7.0 Jeff King
@ 2024-03-30  8:54 ` Daniel Stenberg
  2024-04-02 20:02   ` [PATCH 0/2] git+curl 8.7.0 workaround Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Stenberg @ 2024-03-30  8:54 UTC (permalink / raw
  To: Jeff King; +Cc: git

On Fri, 29 Mar 2024, Jeff King wrote:

> I noticed some http-related failures in the test suite on my Debian unstable 
> system, which recently got an upgraded curl package. It looks like it's 
> related to cases where we use the remote-curl helper in "connect" mode 
> (i.e., protocol v2) and the http buffer is small (requiring us to stream the 
> data to curl). Besides just running t5551, an easy reproduction is:

This smells like a libcurl regression to me. I "imported" this into our issue 
tracker here: https://github.com/curl/curl/issues/13229

-- 

  / daniel.haxx.se

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 0/2] git+curl 8.7.0 workaround
  2024-03-30  8:54 ` Daniel Stenberg
@ 2024-04-02 20:02   ` Jeff King
  2024-04-02 20:05     ` [PATCH 1/2] http: reset POSTFIELDSIZE when clearing curl handle Jeff King
                       ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Jeff King @ 2024-04-02 20:02 UTC (permalink / raw
  To: git; +Cc: Daniel Stenberg

On Sat, Mar 30, 2024 at 09:54:02AM +0100, Daniel Stenberg wrote:

> On Fri, 29 Mar 2024, Jeff King wrote:
> 
> > I noticed some http-related failures in the test suite on my Debian
> > unstable system, which recently got an upgraded curl package. It looks
> > like it's related to cases where we use the remote-curl helper in
> > "connect" mode (i.e., protocol v2) and the http buffer is small
> > (requiring us to stream the data to curl). Besides just running t5551,
> > an easy reproduction is:
> 
> This smells like a libcurl regression to me. I "imported" this into our
> issue tracker here: https://github.com/curl/curl/issues/13229

This was all resolved in that issue, but just to summarize for the list
here: it was a regression in curl and there's a fix already. Thanks
Daniel for your (as usual) prompt digging into the problem (and likewise
to Stefan for the actual fix).

Ultimately the issue will be fixed by moving to a different version of
libcurl, but here's an easy workaround in the meantime, with a small doc
cleanup I found along the way.

  [1/2]: http: reset POSTFIELDSIZE when clearing curl handle
  [2/2]: INSTALL: bump libcurl version to 7.21.3

 INSTALL | 2 +-
 http.c  | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] http: reset POSTFIELDSIZE when clearing curl handle
  2024-04-02 20:02   ` [PATCH 0/2] git+curl 8.7.0 workaround Jeff King
@ 2024-04-02 20:05     ` Jeff King
  2024-04-02 20:27       ` Junio C Hamano
                         ` (2 more replies)
  2024-04-02 20:06     ` [PATCH 2/2] INSTALL: bump libcurl version to 7.21.3 Jeff King
                       ` (2 subsequent siblings)
  3 siblings, 3 replies; 15+ messages in thread
From: Jeff King @ 2024-04-02 20:05 UTC (permalink / raw
  To: git; +Cc: Daniel Stenberg

In get_active_slot(), we return a CURL handle that may have been used
before (reusing them is good because it lets curl reuse the same
connection across many requests). We set a few curl options back to
defaults that may have been modified by previous requests.

We reset POSTFIELDS to NULL, but do not reset POSTFIELDSIZE (which
defaults to "-1"). This usually doesn't matter because most POSTs will
set both fields together anyway. But there is one exception: when
handling a large request in remote-curl's post_rpc(), we don't set
_either_, and instead set a READFUNCTION to stream data into libcurl.

This can interact weirdly with a stale POSTFIELDSIZE setting, because
curl will assume it should read only some set number of bytes from our
READFUNCTION. However, it has worked in practice because we also
manually set a "Transfer-Encoding: chunked" header, which libcurl uses
as a clue to set the POSTFIELDSIZE to -1 itself.

So everything works, but we're better off resetting the size manually
for a few reasons:

  - there was a regression in curl 8.7.0 where the chunked header
    detection didn't kick in, causing any large HTTP requests made by
    Git to fail. This has since been fixed (but not yet released). In
    the issue, curl folks recommended setting it explicitly to -1:

      https://github.com/curl/curl/issues/13229#issuecomment-2029826058

    and it indeed works around the regression. So even though it won't
    be strictly necessary after the fix there, this will help folks who
    end up using the affected libcurl versions.

  - it's consistent with what a new curl handle would look like. Since
    get_active_slot() may or may not return a used handle, this reduces
    the possibility of heisenbugs that only appear with certain request
    patterns.

Note that the recommendation in the curl issue is to actually drop the
manual Transfer-Encoding header. Modern libcurl will add the header
itself when streaming from a READFUNCTION. However, that code wasn't
added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST
if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to
support back to 7.19.5, so those older versions still need the manual
header.

Signed-off-by: Jeff King <peff@peff.net>
---
 http.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/http.c b/http.c
index e73b136e58..3d80bd6116 100644
--- a/http.c
+++ b/http.c
@@ -1452,6 +1452,7 @@ struct active_request_slot *get_active_slot(void)
 	curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL);
 	curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL);
 	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL);
+	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, -1L);
 	curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0);
 	curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1);
 	curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1);
-- 
2.44.0.789.g5ea01f6724


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] INSTALL: bump libcurl version to 7.21.3
  2024-04-02 20:02   ` [PATCH 0/2] git+curl 8.7.0 workaround Jeff King
  2024-04-02 20:05     ` [PATCH 1/2] http: reset POSTFIELDSIZE when clearing curl handle Jeff King
@ 2024-04-02 20:06     ` Jeff King
  2024-04-02 20:21     ` [PATCH 0/2] git+curl 8.7.0 workaround rsbecker
  2024-04-05 20:04     ` [PATCH 3/2] remote-curl: add Transfer-Encoding header only for older curl Jeff King
  3 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2024-04-02 20:06 UTC (permalink / raw
  To: git; +Cc: Daniel Stenberg

Our documentation claims we support curl versions back to 7.19.5. But we
can no longer compile with that version since adding an unconditional
use of CURLOPT_RESOLVE in 511cfd3bff (http: add custom hostname to IP
address resolutions, 2022-05-16). That feature wasn't added to libcurl
until 7.21.3.

We could add #ifdefs to make this work back to 7.19.5. But given that
nobody noticed the compilation failure in the intervening two years, it
makes more sense to bump the version in the documentation to 7.21.3
(which is itself over 13 years old).

We could perhaps go forward even more (which would let us drop some
cruft from git-curl-compat.h), but this should be an obviously safe
jump, and we can move forward later.

Note that user-visible syntax for CURLOPT_RESOLVE has grown new features
in subsequent curl versions. Our documentation mentions "+" and "-"
entries, which require more recent versions than 7.21.3. We could
perhaps clarify that in our docs, but it's probably not worth cluttering
them with restrictions of ancient curl versions.

Signed-off-by: Jeff King <peff@peff.net>
---
 INSTALL | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/INSTALL b/INSTALL
index c6fb240c91..2a46d04592 100644
--- a/INSTALL
+++ b/INSTALL
@@ -139,7 +139,7 @@ Issues of note:
 	  not need that functionality, use NO_CURL to build without
 	  it.
 
-	  Git requires version "7.19.5" or later of "libcurl" to build
+	  Git requires version "7.21.3" or later of "libcurl" to build
 	  without NO_CURL. This version requirement may be bumped in
 	  the future.
 
-- 
2.44.0.789.g5ea01f6724

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* RE: [PATCH 0/2] git+curl 8.7.0 workaround
  2024-04-02 20:02   ` [PATCH 0/2] git+curl 8.7.0 workaround Jeff King
  2024-04-02 20:05     ` [PATCH 1/2] http: reset POSTFIELDSIZE when clearing curl handle Jeff King
  2024-04-02 20:06     ` [PATCH 2/2] INSTALL: bump libcurl version to 7.21.3 Jeff King
@ 2024-04-02 20:21     ` rsbecker
  2024-04-02 20:31       ` Jeff King
  2024-04-05 20:04     ` [PATCH 3/2] remote-curl: add Transfer-Encoding header only for older curl Jeff King
  3 siblings, 1 reply; 15+ messages in thread
From: rsbecker @ 2024-04-02 20:21 UTC (permalink / raw
  To: 'Jeff King', git; +Cc: 'Daniel Stenberg'

On Tuesday, April 2, 2024 4:03 PM, Peff wrote:
>To: git@vger.kernel.org
>Cc: Daniel Stenberg <daniel@haxx.se>
>Subject: [PATCH 0/2] git+curl 8.7.0 workaround
>
>On Sat, Mar 30, 2024 at 09:54:02AM +0100, Daniel Stenberg wrote:
>
>> On Fri, 29 Mar 2024, Jeff King wrote:
>>
>> > I noticed some http-related failures in the test suite on my Debian
>> > unstable system, which recently got an upgraded curl package. It
>> > looks like it's related to cases where we use the remote-curl helper
>> > in "connect" mode (i.e., protocol v2) and the http buffer is small
>> > (requiring us to stream the data to curl). Besides just running
>> > t5551, an easy reproduction is:
>>
>> This smells like a libcurl regression to me. I "imported" this into
>> our issue tracker here: https://github.com/curl/curl/issues/13229
>
>This was all resolved in that issue, but just to summarize for the list
>here: it was a regression in curl and there's a fix already. Thanks Daniel for your (as
>usual) prompt digging into the problem (and likewise to Stefan for the actual fix).
>
>Ultimately the issue will be fixed by moving to a different version of libcurl, but
>here's an easy workaround in the meantime, with a small doc cleanup I found along
>the way.
>
>  [1/2]: http: reset POSTFIELDSIZE when clearing curl handle
>  [2/2]: INSTALL: bump libcurl version to 7.21.3
>
> INSTALL | 2 +-
> http.c  | 1 +
> 2 files changed, 2 insertions(+), 1 deletion(-)

Do we have an ETA for this fix? That or do we know when curl is planning on resolving this?

Thanks,
Randall


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] http: reset POSTFIELDSIZE when clearing curl handle
  2024-04-02 20:05     ` [PATCH 1/2] http: reset POSTFIELDSIZE when clearing curl handle Jeff King
@ 2024-04-02 20:27       ` Junio C Hamano
  2024-04-03  3:20       ` Jeff King
  2024-04-03  6:30       ` Patrick Steinhardt
  2 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2024-04-02 20:27 UTC (permalink / raw
  To: Jeff King; +Cc: git, Daniel Stenberg

Jeff King <peff@peff.net> writes:

> In get_active_slot(), we return a CURL handle that may have been used
> before (reusing them is good because it lets curl reuse the same
> connection across many requests). We set a few curl options back to
> defaults that may have been modified by previous requests.
>
> We reset POSTFIELDS to NULL, but do not reset POSTFIELDSIZE (which
> defaults to "-1"). This usually doesn't matter because most POSTs will
> set both fields together anyway. But there is one exception: when
> handling a large request in remote-curl's post_rpc(), we don't set
> _either_, and instead set a READFUNCTION to stream data into libcurl.
>
> This can interact weirdly with a stale POSTFIELDSIZE setting, because
> curl will assume it should read only some set number of bytes from our
> READFUNCTION. However, it has worked in practice because we also
> manually set a "Transfer-Encoding: chunked" header, which libcurl uses
> as a clue to set the POSTFIELDSIZE to -1 itself.
>
> So everything works, but we're better off resetting the size manually
> for a few reasons:
>
>   - there was a regression in curl 8.7.0 where the chunked header
>     detection didn't kick in, causing any large HTTP requests made by
>     Git to fail. This has since been fixed (but not yet released). In
>     the issue, curl folks recommended setting it explicitly to -1:
>
>       https://github.com/curl/curl/issues/13229#issuecomment-2029826058
>
>     and it indeed works around the regression. So even though it won't
>     be strictly necessary after the fix there, this will help folks who
>     end up using the affected libcurl versions.
>
>   - it's consistent with what a new curl handle would look like. Since
>     get_active_slot() may or may not return a used handle, this reduces
>     the possibility of heisenbugs that only appear with certain request
>     patterns.
>
> Note that the recommendation in the curl issue is to actually drop the
> manual Transfer-Encoding header. Modern libcurl will add the header
> itself when streaming from a READFUNCTION. However, that code wasn't
> added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST
> if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to
> support back to 7.19.5, so those older versions still need the manual
> header.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  http.c | 1 +
>  1 file changed, 1 insertion(+)

As always, well articulated.  Thanks.  Will queue.

> diff --git a/http.c b/http.c
> index e73b136e58..3d80bd6116 100644
> --- a/http.c
> +++ b/http.c
> @@ -1452,6 +1452,7 @@ struct active_request_slot *get_active_slot(void)
>  	curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL);
>  	curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL);
>  	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL);
> +	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, -1L);
>  	curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0);
>  	curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1);
>  	curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] git+curl 8.7.0 workaround
  2024-04-02 20:21     ` [PATCH 0/2] git+curl 8.7.0 workaround rsbecker
@ 2024-04-02 20:31       ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2024-04-02 20:31 UTC (permalink / raw
  To: rsbecker; +Cc: git, 'Daniel Stenberg'

On Tue, Apr 02, 2024 at 04:21:56PM -0400, rsbecker@nexbridge.com wrote:

> Do we have an ETA for this fix? That or do we know when curl is
> planning on resolving this?

The fix is in curl's master branch already. According to their
docs/RELEASE-PROCEDURE.md, the next release would be May 22. But Daniel
is the more authoritative word. ;)

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] http: reset POSTFIELDSIZE when clearing curl handle
  2024-04-02 20:05     ` [PATCH 1/2] http: reset POSTFIELDSIZE when clearing curl handle Jeff King
  2024-04-02 20:27       ` Junio C Hamano
@ 2024-04-03  3:20       ` Jeff King
  2024-04-03  6:30       ` Patrick Steinhardt
  2 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2024-04-03  3:20 UTC (permalink / raw
  To: git; +Cc: Daniel Stenberg

On Tue, Apr 02, 2024 at 04:05:17PM -0400, Jeff King wrote:

> So everything works, but we're better off resetting the size manually
> for a few reasons:
> 
>   - there was a regression in curl 8.7.0 where the chunked header
>     detection didn't kick in, causing any large HTTP requests made by
>     Git to fail. This has since been fixed (but not yet released). In
>     the issue, curl folks recommended setting it explicitly to -1:
> 
>       https://github.com/curl/curl/issues/13229#issuecomment-2029826058
> 
>     and it indeed works around the regression. So even though it won't
>     be strictly necessary after the fix there, this will help folks who
>     end up using the affected libcurl versions.

Hmph. This isn't quite enough to make things work with 8.7.0, because
there are two things wrong there:

  1. curl uses the leftover POSTFIELDSIZE, which this patch fixes. So
     HTTP/1.1 is fine (including t5551).

  2. In HTTP/2 mode, it sends chunked data, even though HTTP/2 does not
     use "Transfer-Encoding: chunked" (it correctly does not send the
     header, but it puts unnecessary chunk framing around the data).

     t5559 (which covers HTTP/2) fails, and so does accessing HTTP/2
     capable hosts like github.com (though score another win for t5559;
     it has introduced some hassles, but it diagnosed a real problem we
     would not have otherwise seen in the test suite).

This second problem can be fixed by eliminating the manual
transfer-encoding header, which is what's confusing curl's HTTP/2 code.
But as I said earlier...

> Note that the recommendation in the curl issue is to actually drop the
> manual Transfer-Encoding header. Modern libcurl will add the header
> itself when streaming from a READFUNCTION. However, that code wasn't
> added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST
> if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to
> support back to 7.19.5, so those older versions still need the manual
> header.

...we can't do so unconditionally. So we'd need something like:

diff --git a/remote-curl.c b/remote-curl.c
index 31b02b8840..215bcc6e10 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -955,7 +955,9 @@ static int post_rpc(struct rpc_state *rpc, int stateless_connect, int flush_rece
 		/* The request body is large and the size cannot be predicted.
 		 * We must use chunked encoding to send it.
 		 */
+#if LIBCURL_VERSION_NUM < 0x074200
 		headers = curl_slist_append(headers, "Transfer-Encoding: chunked");
+#endif
 		rpc->initial_buffer = 1;
 		curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, rpc_out);
 		curl_easy_setopt(slot->curl, CURLOPT_INFILE, rpc);

I was hoping not to have to carry any version-specific cruft (especially
since newer versions of curl should fix the regression). But it's not
too bad, and it actually marks the code as something that we can ditch
in the future when that version of curl becomes obsolete.

I can try to prepare a cleaner patch for it tomorrow, but comments
welcome in the meantime.

-Peff

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] http: reset POSTFIELDSIZE when clearing curl handle
  2024-04-02 20:05     ` [PATCH 1/2] http: reset POSTFIELDSIZE when clearing curl handle Jeff King
  2024-04-02 20:27       ` Junio C Hamano
  2024-04-03  3:20       ` Jeff King
@ 2024-04-03  6:30       ` Patrick Steinhardt
  2024-04-03  6:34         ` Patrick Steinhardt
  2 siblings, 1 reply; 15+ messages in thread
From: Patrick Steinhardt @ 2024-04-03  6:30 UTC (permalink / raw
  To: Jeff King; +Cc: git, Daniel Stenberg

[-- Attachment #1: Type: text/plain, Size: 4596 bytes --]

On Tue, Apr 02, 2024 at 04:05:17PM -0400, Jeff King wrote:
> In get_active_slot(), we return a CURL handle that may have been used
> before (reusing them is good because it lets curl reuse the same
> connection across many requests). We set a few curl options back to
> defaults that may have been modified by previous requests.
> 
> We reset POSTFIELDS to NULL, but do not reset POSTFIELDSIZE (which
> defaults to "-1"). This usually doesn't matter because most POSTs will
> set both fields together anyway. But there is one exception: when
> handling a large request in remote-curl's post_rpc(), we don't set
> _either_, and instead set a READFUNCTION to stream data into libcurl.
> 
> This can interact weirdly with a stale POSTFIELDSIZE setting, because
> curl will assume it should read only some set number of bytes from our
> READFUNCTION. However, it has worked in practice because we also
> manually set a "Transfer-Encoding: chunked" header, which libcurl uses
> as a clue to set the POSTFIELDSIZE to -1 itself.
> 
> So everything works, but we're better off resetting the size manually
> for a few reasons:
> 
>   - there was a regression in curl 8.7.0 where the chunked header
>     detection didn't kick in, causing any large HTTP requests made by
>     Git to fail. This has since been fixed (but not yet released). In
>     the issue, curl folks recommended setting it explicitly to -1:
> 
>       https://github.com/curl/curl/issues/13229#issuecomment-2029826058
> 
>     and it indeed works around the regression. So even though it won't
>     be strictly necessary after the fix there, this will help folks who
>     end up using the affected libcurl versions.
> 
>   - it's consistent with what a new curl handle would look like. Since
>     get_active_slot() may or may not return a used handle, this reduces
>     the possibility of heisenbugs that only appear with certain request
>     patterns.
> 
> Note that the recommendation in the curl issue is to actually drop the
> manual Transfer-Encoding header. Modern libcurl will add the header
> itself when streaming from a READFUNCTION. However, that code wasn't
> added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST
> if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to
> support back to 7.19.5, so those older versions still need the manual
> header.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  http.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/http.c b/http.c
> index e73b136e58..3d80bd6116 100644
> --- a/http.c
> +++ b/http.c
> @@ -1452,6 +1452,7 @@ struct active_request_slot *get_active_slot(void)
>  	curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL);
>  	curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL);
>  	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL);
> +	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, -1L);
>  	curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0);
>  	curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1);
>  	curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1);

Can't we refactor this code to instead use `curl_easy_reset()`? That
function already resets most of the data we want to reset and would also
end up setting `POSFIELDSIZE = -1` via `Curl_init_userdefined()`. So
wouldn't the following be a more sensible fix?

diff --git a/http.c b/http.c
index e73b136e58..e5f5bc23db 100644
--- a/http.c
+++ b/http.c
@@ -1442,20 +1442,14 @@ struct active_request_slot *get_active_slot(void)
 	slot->finished = NULL;
 	slot->callback_data = NULL;
 	slot->callback_func = NULL;
+	curl_easy_reset(slot->curl);
 	curl_easy_setopt(slot->curl, CURLOPT_COOKIEFILE, curl_cookie_file);
 	if (curl_save_cookies)
 		curl_easy_setopt(slot->curl, CURLOPT_COOKIEJAR, curl_cookie_file);
 	curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, pragma_header);
 	curl_easy_setopt(slot->curl, CURLOPT_RESOLVE, host_resolutions);
 	curl_easy_setopt(slot->curl, CURLOPT_ERRORBUFFER, curl_errorstr);
-	curl_easy_setopt(slot->curl, CURLOPT_CUSTOMREQUEST, NULL);
-	curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL);
-	curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL);
-	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL);
-	curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0);
-	curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1);
 	curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1);
-	curl_easy_setopt(slot->curl, CURLOPT_RANGE, NULL);
 
 	/*
 	 * Default following to off unless "ALWAYS" is configured; this gives

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] http: reset POSTFIELDSIZE when clearing curl handle
  2024-04-03  6:30       ` Patrick Steinhardt
@ 2024-04-03  6:34         ` Patrick Steinhardt
  2024-04-03 20:18           ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Patrick Steinhardt @ 2024-04-03  6:34 UTC (permalink / raw
  To: Jeff King; +Cc: git, Daniel Stenberg

[-- Attachment #1: Type: text/plain, Size: 5054 bytes --]

On Wed, Apr 03, 2024 at 08:30:54AM +0200, Patrick Steinhardt wrote:
> On Tue, Apr 02, 2024 at 04:05:17PM -0400, Jeff King wrote:
> > In get_active_slot(), we return a CURL handle that may have been used
> > before (reusing them is good because it lets curl reuse the same
> > connection across many requests). We set a few curl options back to
> > defaults that may have been modified by previous requests.
> > 
> > We reset POSTFIELDS to NULL, but do not reset POSTFIELDSIZE (which
> > defaults to "-1"). This usually doesn't matter because most POSTs will
> > set both fields together anyway. But there is one exception: when
> > handling a large request in remote-curl's post_rpc(), we don't set
> > _either_, and instead set a READFUNCTION to stream data into libcurl.
> > 
> > This can interact weirdly with a stale POSTFIELDSIZE setting, because
> > curl will assume it should read only some set number of bytes from our
> > READFUNCTION. However, it has worked in practice because we also
> > manually set a "Transfer-Encoding: chunked" header, which libcurl uses
> > as a clue to set the POSTFIELDSIZE to -1 itself.
> > 
> > So everything works, but we're better off resetting the size manually
> > for a few reasons:
> > 
> >   - there was a regression in curl 8.7.0 where the chunked header
> >     detection didn't kick in, causing any large HTTP requests made by
> >     Git to fail. This has since been fixed (but not yet released). In
> >     the issue, curl folks recommended setting it explicitly to -1:
> > 
> >       https://github.com/curl/curl/issues/13229#issuecomment-2029826058
> > 
> >     and it indeed works around the regression. So even though it won't
> >     be strictly necessary after the fix there, this will help folks who
> >     end up using the affected libcurl versions.
> > 
> >   - it's consistent with what a new curl handle would look like. Since
> >     get_active_slot() may or may not return a used handle, this reduces
> >     the possibility of heisenbugs that only appear with certain request
> >     patterns.
> > 
> > Note that the recommendation in the curl issue is to actually drop the
> > manual Transfer-Encoding header. Modern libcurl will add the header
> > itself when streaming from a READFUNCTION. However, that code wasn't
> > added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST
> > if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to
> > support back to 7.19.5, so those older versions still need the manual
> > header.
> > 
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> >  http.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/http.c b/http.c
> > index e73b136e58..3d80bd6116 100644
> > --- a/http.c
> > +++ b/http.c
> > @@ -1452,6 +1452,7 @@ struct active_request_slot *get_active_slot(void)
> >  	curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL);
> >  	curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL);
> >  	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL);
> > +	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, -1L);
> >  	curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0);
> >  	curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1);
> >  	curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1);
> 
> Can't we refactor this code to instead use `curl_easy_reset()`? That
> function already resets most of the data we want to reset and would also
> end up setting `POSFIELDSIZE = -1` via `Curl_init_userdefined()`. So
> wouldn't the following be a more sensible fix?
> 
> diff --git a/http.c b/http.c
> index e73b136e58..e5f5bc23db 100644
> --- a/http.c
> +++ b/http.c
> @@ -1442,20 +1442,14 @@ struct active_request_slot *get_active_slot(void)
>  	slot->finished = NULL;
>  	slot->callback_data = NULL;
>  	slot->callback_func = NULL;
> +	curl_easy_reset(slot->curl);
>  	curl_easy_setopt(slot->curl, CURLOPT_COOKIEFILE, curl_cookie_file);
>  	if (curl_save_cookies)
>  		curl_easy_setopt(slot->curl, CURLOPT_COOKIEJAR, curl_cookie_file);
>  	curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, pragma_header);
>  	curl_easy_setopt(slot->curl, CURLOPT_RESOLVE, host_resolutions);
>  	curl_easy_setopt(slot->curl, CURLOPT_ERRORBUFFER, curl_errorstr);
> -	curl_easy_setopt(slot->curl, CURLOPT_CUSTOMREQUEST, NULL);
> -	curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL);
> -	curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL);
> -	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL);
> -	curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0);
> -	curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1);
>  	curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1);
> -	curl_easy_setopt(slot->curl, CURLOPT_RANGE, NULL);
>  
>  	/*
>  	 * Default following to off unless "ALWAYS" is configured; this gives

Oh well, the answer is "no", or at least not as easily as this, as the
failing tests tell us. I guess it resets more data than we actually want
it to reset, but I didn't dig any deeper than that.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] http: reset POSTFIELDSIZE when clearing curl handle
  2024-04-03  6:34         ` Patrick Steinhardt
@ 2024-04-03 20:18           ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2024-04-03 20:18 UTC (permalink / raw
  To: Patrick Steinhardt; +Cc: git, Daniel Stenberg

On Wed, Apr 03, 2024 at 08:34:37AM +0200, Patrick Steinhardt wrote:

> > Can't we refactor this code to instead use `curl_easy_reset()`? That
> > function already resets most of the data we want to reset and would also
> > end up setting `POSFIELDSIZE = -1` via `Curl_init_userdefined()`. So
> > wouldn't the following be a more sensible fix?
> [...]
> Oh well, the answer is "no", or at least not as easily as this, as the
> failing tests tell us. I guess it resets more data than we actually want
> it to reset, but I didn't dig any deeper than that.

Yeah. The curl setup is really in two parts:

  1. we make a handle in get_curl_handle(), which also sets up a bunch
     of options. We use that to make a single "curl_default" handle, and
     then when we want a new handle we curl_easy_duphandle() that

  2. when we want to make a request we call get_active_slot(), which
     will either return an already-used handle or duphandle() a new one.
     And then reset some options, but also do some more setup.

Your patch touches spot (2), so it's erasing all of the setup done in
(1). I don't think there's a way to say "go back to state when we called
duphandle(), but keep reusing connections, etc".

Possibly it could be solved by pushing all of the setup from (1) into
(2). Though that would probably require separating out some config
handling, etc, from the actual curl_easy_setopt() calls (we wouldn't
want to complain about invalid config for _every_ request, for example,
just once per program run).

This is all also a bit more complicated than it needs to be for
smart-http. The dumb-http code may want to have several handles going at
once to do individual object/pack downloads. Whereas for smart http, we
really are only working with one handle, and doing one request at a
time. I don't know if we'll ever drop dumb-http support, but there would
probably be a lot of cleanup possibilities if we did.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 3/2] remote-curl: add Transfer-Encoding header only for older curl
  2024-04-02 20:02   ` [PATCH 0/2] git+curl 8.7.0 workaround Jeff King
                       ` (2 preceding siblings ...)
  2024-04-02 20:21     ` [PATCH 0/2] git+curl 8.7.0 workaround rsbecker
@ 2024-04-05 20:04     ` Jeff King
  2024-04-05 21:30       ` Daniel Stenberg
  2024-04-05 21:49       ` Junio C Hamano
  3 siblings, 2 replies; 15+ messages in thread
From: Jeff King @ 2024-04-05 20:04 UTC (permalink / raw
  To: git; +Cc: Daniel Stenberg, Junio C Hamano

On Tue, Apr 02, 2024 at 04:02:54PM -0400, Jeff King wrote:

> Ultimately the issue will be fixed by moving to a different version of
> libcurl, but here's an easy workaround in the meantime, with a small doc
> cleanup I found along the way.
> 
>   [1/2]: http: reset POSTFIELDSIZE when clearing curl handle
>   [2/2]: INSTALL: bump libcurl version to 7.21.3

Here's a final patch on top that gives us a workaround for the HTTP/2
half of this. It's a cleaned-up version of what I showed earlier.

-- >8 --
Subject: remote-curl: add Transfer-Encoding header only for older curl

As of curl 7.66.0, we don't need to manually specify a "chunked"
Transfer-Encoding header. Instead, modern curl deduces the need for it
in a POST that has a POSTFIELDSIZE of -1 and uses READFUNCTION rather
than POSTFIELDS.

That version is recent enough that we can't just drop the header; we
need to do so conditionally. Since it's only a single line, it seems
like the simplest thing would just be to keep setting it unconditionally
(after all, the #ifdefs are much longer than the actual code). But
there's another wrinkle: HTTP/2.

Curl may choose to use HTTP/2 under the hood if the server supports it.
And in that protocol, we do not use the chunked encoding for streaming
at all. Most versions of curl handle this just fine by recognizing and
removing the header. But there's a regression in curl 8.7.0 and 8.7.1
where it doesn't, and large requests over HTTP/2 are broken (which t5559
notices). That regression has since been fixed upstream, but not yet
released.

Make the setting of this header conditional, which will let Git work
even with those buggy curl versions. And as a bonus, it serves as a
reminder that we can eventually clean up the code as we bump the
supported curl versions.

Signed-off-by: Jeff King <peff@peff.net>
---
Manually tested against curl 8.7.0 (where skipping the header fixes
t5559), as well as 7.65.0 (which continues to pass t5551, but would fail
if we didn't set the header).

 git-curl-compat.h | 9 +++++++++
 remote-curl.c     | 3 +++
 2 files changed, 12 insertions(+)

diff --git a/git-curl-compat.h b/git-curl-compat.h
index fd96b3cdff..e1d0bdd273 100644
--- a/git-curl-compat.h
+++ b/git-curl-compat.h
@@ -126,6 +126,15 @@
 #define GIT_CURL_HAVE_CURLSSLSET_NO_BACKENDS
 #endif
 
+/**
+ * Versions before curl 7.66.0 (September 2019) required manually setting the
+ * transfer-encoding for a streaming POST; after that this is handled
+ * automatically.
+ */
+#if LIBCURL_VERSION_NUM < 0x074200
+#define GIT_CURL_NEED_TRANSFER_ENCODING_HEADER
+#endif
+
 /**
  * CURLOPT_PROTOCOLS_STR and CURLOPT_REDIR_PROTOCOLS_STR were added in 7.85.0,
  * released in August 2022.
diff --git a/remote-curl.c b/remote-curl.c
index 31b02b8840..0b6d7815fd 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -1,4 +1,5 @@
 #include "git-compat-util.h"
+#include "git-curl-compat.h"
 #include "config.h"
 #include "environment.h"
 #include "gettext.h"
@@ -955,7 +956,9 @@ static int post_rpc(struct rpc_state *rpc, int stateless_connect, int flush_rece
 		/* The request body is large and the size cannot be predicted.
 		 * We must use chunked encoding to send it.
 		 */
+#ifdef GIT_CURL_NEED_TRANSFER_ENCODING_HEADER
 		headers = curl_slist_append(headers, "Transfer-Encoding: chunked");
+#endif
 		rpc->initial_buffer = 1;
 		curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, rpc_out);
 		curl_easy_setopt(slot->curl, CURLOPT_INFILE, rpc);
-- 
2.44.0.872.gbe303efee2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/2] remote-curl: add Transfer-Encoding header only for older curl
  2024-04-05 20:04     ` [PATCH 3/2] remote-curl: add Transfer-Encoding header only for older curl Jeff King
@ 2024-04-05 21:30       ` Daniel Stenberg
  2024-04-05 21:49       ` Junio C Hamano
  1 sibling, 0 replies; 15+ messages in thread
From: Daniel Stenberg @ 2024-04-05 21:30 UTC (permalink / raw
  To: Jeff King; +Cc: git, Junio C Hamano

On Fri, 5 Apr 2024, Jeff King wrote:

> Here's a final patch on top that gives us a workaround for the HTTP/2
> half of this. It's a cleaned-up version of what I showed earlier.

+1 from me.

(and appologies for this regression, but you know...)

-- 

  / daniel.haxx.se

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/2] remote-curl: add Transfer-Encoding header only for older curl
  2024-04-05 20:04     ` [PATCH 3/2] remote-curl: add Transfer-Encoding header only for older curl Jeff King
  2024-04-05 21:30       ` Daniel Stenberg
@ 2024-04-05 21:49       ` Junio C Hamano
  1 sibling, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2024-04-05 21:49 UTC (permalink / raw
  To: Jeff King; +Cc: git, Daniel Stenberg

Jeff King <peff@peff.net> writes:

> ... And as a bonus, it serves as a
> reminder that we can eventually clean up the code as we bump the
> supported curl versions.

;-).

The way git-curl-compat.h lays out various features/workarounds and
the lowest version that needs them makes it very easy to do that.

Thanks, will queue.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-04-05 21:49 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-30  0:02 tests broken with curl-8.7.0 Jeff King
2024-03-30  8:54 ` Daniel Stenberg
2024-04-02 20:02   ` [PATCH 0/2] git+curl 8.7.0 workaround Jeff King
2024-04-02 20:05     ` [PATCH 1/2] http: reset POSTFIELDSIZE when clearing curl handle Jeff King
2024-04-02 20:27       ` Junio C Hamano
2024-04-03  3:20       ` Jeff King
2024-04-03  6:30       ` Patrick Steinhardt
2024-04-03  6:34         ` Patrick Steinhardt
2024-04-03 20:18           ` Jeff King
2024-04-02 20:06     ` [PATCH 2/2] INSTALL: bump libcurl version to 7.21.3 Jeff King
2024-04-02 20:21     ` [PATCH 0/2] git+curl 8.7.0 workaround rsbecker
2024-04-02 20:31       ` Jeff King
2024-04-05 20:04     ` [PATCH 3/2] remote-curl: add Transfer-Encoding header only for older curl Jeff King
2024-04-05 21:30       ` Daniel Stenberg
2024-04-05 21:49       ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).