* [PATCH 00/13] builtin: implement, document and test url-parse
@ 2024-04-28 22:30 Matheus Moreira via GitGitGadget
  2024-04-28 22:30 ` [PATCH 01/13] url: move helper function to URL header and source Matheus Afonso Martins Moreira via GitGitGadget
                   ` (13 more replies)
  0 siblings, 14 replies; 20+ messages in thread
From: Matheus Moreira via GitGitGadget @ 2024-04-28 22:30 UTC
  To: git; +Cc: Matheus Moreira

Git commands accept a wide variety of URL syntaxes, not just standard URLs.
This makes parsing git URLs difficult, since standard URL parsers cannot
be used. Even a dedicated external parser would have to track git's
development closely in case support for new URL schemes is added.

These patches introduce a new url-parse builtin that exposes git's native
URL parsing algorithms as a plumbing command, allowing other programs to
call upon git itself to parse git URLs and extract their components.

This should be quite useful for scripts. For example, a script might want
to add remotes to repositories, naming each one after the domain name where
the repository is hosted. The new builtin lets it parse the git URL and
extract the host name, which can then be used as input for other
operations. This would be difficult to implement otherwise because git
also supports scp-style URLs.

Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>

Matheus Afonso Martins Moreira (13):
  url: move helper function to URL header and source
  urlmatch: define url_parse function
  builtin: create url-parse command
  url-parse: add URL parsing helper function
  url-parse: enumerate possible URL components
  url-parse: define component extraction helper fn
  url-parse: define string to component converter fn
  url-parse: define usage and options
  url-parse: parse options given on the command line
  url-parse: validate all given git URLs
  url-parse: output URL components selected by user
  Documentation: describe the url-parse builtin
  tests: add tests for the new url-parse builtin

 .gitignore                      |   1 +
 Documentation/git-url-parse.txt |  59 ++++++++++
 Makefile                        |   1 +
 builtin.h                       |   1 +
 builtin/url-parse.c             | 132 ++++++++++++++++++++++
 command-list.txt                |   1 +
 connect.c                       |   8 --
 connect.h                       |   1 -
 git.c                           |   1 +
 remote.c                        |   1 +
 t/t9904-url-parse.sh            | 194 ++++++++++++++++++++++++++++++++
 url.c                           |   8 ++
 url.h                           |   2 +
 urlmatch.c                      |  90 +++++++++++++++
 urlmatch.h                      |   1 +
 15 files changed, 492 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/git-url-parse.txt
 create mode 100644 builtin/url-parse.c
 create mode 100755 t/t9904-url-parse.sh


base-commit: e326e520101dcf43a0499c3adc2df7eca30add2d
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1715%2Fmatheusmoreira%2Furl-parse-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1715/matheusmoreira/url-parse-v1
Pull-Request: https://github.com/git/git/pull/1715
-- 
gitgitgadget


* [PATCH 01/13] url: move helper function to URL header and source
  2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
  2024-04-28 22:30 ` [PATCH 02/13] urlmatch: define url_parse function Matheus Afonso Martins Moreira via GitGitGadget
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC
  To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira

From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>

url_is_local_not_ssh() will be needed outside of connect.c, so move its
definition to url.c and its declaration to url.h.

Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
 connect.c | 8 --------
 connect.h | 1 -
 remote.c  | 1 +
 url.c     | 8 ++++++++
 url.h     | 2 ++
 5 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/connect.c b/connect.c
index 0d77737a536..0cd9439501b 100644
--- a/connect.c
+++ b/connect.c
@@ -693,14 +693,6 @@ enum protocol {
 	PROTO_GIT
 };
 
-int url_is_local_not_ssh(const char *url)
-{
-	const char *colon = strchr(url, ':');
-	const char *slash = strchr(url, '/');
-	return !colon || (slash && slash < colon) ||
-		(has_dos_drive_prefix(url) && is_valid_path(url));
-}
-
 static const char *prot_name(enum protocol protocol)
 {
 	switch (protocol) {
diff --git a/connect.h b/connect.h
index 1645126c17f..8d84f6656b1 100644
--- a/connect.h
+++ b/connect.h
@@ -13,7 +13,6 @@ int git_connection_is_socket(struct child_process *conn);
 int server_supports(const char *feature);
 int parse_feature_request(const char *features, const char *feature);
 const char *server_feature_value(const char *feature, size_t *len_ret);
-int url_is_local_not_ssh(const char *url);
 
 struct packet_reader;
 enum protocol_version discover_version(struct packet_reader *reader);
diff --git a/remote.c b/remote.c
index 2b650b813b7..2425dbc4660 100644
--- a/remote.c
+++ b/remote.c
@@ -5,6 +5,7 @@
 #include "gettext.h"
 #include "hex.h"
 #include "remote.h"
+#include "url.h"
 #include "urlmatch.h"
 #include "refs.h"
 #include "refspec.h"
diff --git a/url.c b/url.c
index 282b12495ae..c36818c3037 100644
--- a/url.c
+++ b/url.c
@@ -119,3 +119,11 @@ void str_end_url_with_slash(const char *url, char **dest)
 	free(*dest);
 	*dest = strbuf_detach(&buf, NULL);
 }
+
+int url_is_local_not_ssh(const char *url)
+{
+	const char *colon = strchr(url, ':');
+	const char *slash = strchr(url, '/');
+	return !colon || (slash && slash < colon) ||
+		(has_dos_drive_prefix(url) && is_valid_path(url));
+}
diff --git a/url.h b/url.h
index 2a27c342776..867d3af6691 100644
--- a/url.h
+++ b/url.h
@@ -21,4 +21,6 @@ char *url_decode_parameter_value(const char **query);
 void end_url_with_slash(struct strbuf *buf, const char *url);
 void str_end_url_with_slash(const char *url, char **dest);
 
+int url_is_local_not_ssh(const char *url);
+
 #endif /* URL_H */
-- 
gitgitgadget
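The moved helper treats a URL as a local path when it contains no colon, or when a slash appears before the first colon; otherwise a `host:path` string is taken to be scp-style SSH. The standalone sketch below restates that rule. It is an approximation rather than git's code: `local_not_ssh_sketch` is a hypothetical name, the letter-plus-colon test stands in for `has_dos_drive_prefix()` (git only recognizes drive prefixes on Windows builds, so it is applied only when `dos_drives` is non-zero), and the `is_valid_path()` check is omitted.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Sketch of the url_is_local_not_ssh() rule (hypothetical name).
 * Approximation: the drive-letter test stands in for git's
 * has_dos_drive_prefix() and is applied only when dos_drives != 0;
 * git's additional is_valid_path() check is omitted.
 */
static int local_not_ssh_sketch(const char *url, int dos_drives)
{
	const char *colon, *slash;
	int drive;

	assert(url != NULL);
	colon = strchr(url, ':');
	slash = strchr(url, '/');
	/* "C:..." looks like a Windows drive, not an scp-style host. */
	drive = dos_drives && isalpha((unsigned char)url[0]) && url[1] == ':';

	/* Local if there is no colon, or a slash precedes the first colon. */
	return !colon || (slash && slash < colon) || drive;
}
```

With `dos_drives` set to 0, `C:/repo` is therefore classified as scp-style `host:path`, while `./dir:with:colons` stays local because its slash comes first.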



* [PATCH 02/13] urlmatch: define url_parse function
  2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
  2024-04-28 22:30 ` [PATCH 01/13] url: move helper function to URL header and source Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
  2024-05-01 22:18   ` Ghanshyam Thakkar
  2024-04-28 22:30 ` [PATCH 03/13] builtin: create url-parse command Matheus Afonso Martins Moreira via GitGitGadget
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 20+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC
  To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira

From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>

Define a general parsing function that supports all Git URLs,
including scp-style URLs such as hostname:~user/repo.
It has the same interface as the URL normalization function
and uses the same data structures, which makes it easy to use.
It is adapted from the algorithm connect.c uses to process URLs,
so it should accept the same inputs.

Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
 urlmatch.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 urlmatch.h |  1 +
 2 files changed, 91 insertions(+)

diff --git a/urlmatch.c b/urlmatch.c
index 1d0254abacb..5a442e31fa2 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -3,6 +3,7 @@
 #include "hex-ll.h"
 #include "strbuf.h"
 #include "urlmatch.h"
+#include "url.h"
 
 #define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
 #define URL_DIGIT "0123456789"
@@ -438,6 +439,95 @@ char *url_normalize(const char *url, struct url_info *out_info)
 	return url_normalize_1(url, out_info, 0);
 }
 
+enum protocol {
+	PROTO_UNKNOWN = 0,
+	PROTO_LOCAL,
+	PROTO_FILE,
+	PROTO_SSH,
+	PROTO_GIT,
+};
+
+static enum protocol url_get_protocol(const char *name, size_t n)
+{
+	if (!strncmp(name, "ssh", n))
+		return PROTO_SSH;
+	if (!strncmp(name, "git", n))
+		return PROTO_GIT;
+	if (!strncmp(name, "git+ssh", n)) /* deprecated - do not use */
+		return PROTO_SSH;
+	if (!strncmp(name, "ssh+git", n)) /* deprecated - do not use */
+		return PROTO_SSH;
+	if (!strncmp(name, "file", n))
+		return PROTO_FILE;
+	return PROTO_UNKNOWN;
+}
+
+char *url_parse(const char *url_orig, struct url_info *out_info)
+{
+	struct strbuf url;
+	char *host, *separator;
+	char *detached, *normalized;
+	enum protocol protocol = PROTO_LOCAL;
+	struct url_info local_info;
+	struct url_info *info = out_info? out_info : &local_info;
+	bool scp_syntax = false;
+
+	if (is_url(url_orig)) {
+		url_orig = url_decode(url_orig);
+	} else {
+		url_orig = xstrdup(url_orig);
+	}
+
+	strbuf_init(&url, strlen(url_orig) + sizeof("ssh://"));
+	strbuf_addstr(&url, url_orig);
+
+	host = strstr(url.buf, "://");
+	if (host) {
+		protocol = url_get_protocol(url.buf, host - url.buf);
+		host += 3;
+	} else {
+		if (!url_is_local_not_ssh(url.buf)) {
+			scp_syntax = true;
+			protocol = PROTO_SSH;
+			strbuf_insertstr(&url, 0, "ssh://");
+			host = url.buf + 6;
+		}
+	}
+
+	/* path starts after ':' in scp style SSH URLs */
+	if (scp_syntax) {
+		separator = strchr(host, ':');
+		if (separator) {
+			if (separator[1] == '/')
+				strbuf_remove(&url, separator - url.buf, 1);
+			else
+				*separator = '/';
+		}
+	}
+
+	detached = strbuf_detach(&url, NULL);
+	normalized = url_normalize(detached, info);
+	free(detached);
+
+	if (!normalized) {
+		return NULL;
+	}
+
+	/* point path to ~ for URL's like this:
+	 *
+	 *     ssh://host.xz/~user/repo
+	 *     git://host.xz/~user/repo
+	 *     host.xz:~user/repo
+	 *
+	 */
+	if (protocol == PROTO_GIT || protocol == PROTO_SSH) {
+		if (normalized[info->path_off + 1] == '~')
+			info->path_off++;
+	}
+
+	return normalized;
+}
+
 static size_t url_match_prefix(const char *url,
 			       const char *url_prefix,
 			       size_t url_prefix_len)
diff --git a/urlmatch.h b/urlmatch.h
index 5ba85cea139..6b3ce428582 100644
--- a/urlmatch.h
+++ b/urlmatch.h
@@ -35,6 +35,7 @@ struct url_info {
 };
 
 char *url_normalize(const char *, struct url_info *);
+char *url_parse(const char *, struct url_info *);
 
 struct urlmatch_item {
 	size_t hostmatch_len;
-- 
gitgitgadget
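url_parse() takes the protocol from the text before `://`, and otherwise rewrites scp-style input into an `ssh://` URL before handing it to url_normalize(). The sketch below restates both steps as standalone C. One detail is worth noting: `strncmp(name, kw, n)` compares only the `n` bytes of the scheme, so a scheme that is a prefix of a keyword (for example `s` or `gi`) also matches; the hypothetical `scheme_is()` helper adds a length check so only exact names match. All names here (`sketch_get_protocol`, `scp_to_ssh_sketch`, the `SK_*` values) are illustrative, and the rewrite assumes the caller has already ruled out local paths and `://` URLs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical protocol values mirroring the patch's enum protocol. */
enum sketch_protocol { SK_UNKNOWN, SK_LOCAL, SK_FILE, SK_SSH, SK_GIT };

/* Exact match of the n-byte scheme at name against keyword kw. */
static int scheme_is(const char *name, size_t n, const char *kw)
{
	return strlen(kw) == n && !strncmp(name, kw, n);
}

/* Classify the n-byte scheme at name (hypothetical helper). */
static enum sketch_protocol sketch_get_protocol(const char *name, size_t n)
{
	if (scheme_is(name, n, "ssh") || scheme_is(name, n, "git+ssh") ||
	    scheme_is(name, n, "ssh+git"))
		return SK_SSH;
	if (scheme_is(name, n, "git"))
		return SK_GIT;
	if (scheme_is(name, n, "file"))
		return SK_FILE;
	return SK_UNKNOWN;
}

/*
 * Hypothetical helper: rewrite scp-style "host:path" as
 * "ssh://host/path" without doubling a leading '/' in path.
 * Returns 0 on success, -1 if there is no ':' or out is too small.
 */
static int scp_to_ssh_sketch(const char *url, char *out, size_t outsz)
{
	const char *colon, *rest;
	int n;

	assert(url != NULL && out != NULL);
	colon = strchr(url, ':');
	if (!colon)
		return -1;
	rest = colon + 1;
	if (*rest == '/')
		rest++; /* "host:/abs" -> "ssh://host/abs" */
	n = snprintf(out, outsz, "ssh://%.*s/%s", (int)(colon - url), url, rest);
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

For example, `sketch_get_protocol("gi://h/p", 2)` yields `SK_UNKNOWN` here, whereas a plain `strncmp()` against `"git"` with `n == 2` reports a match.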



* [PATCH 03/13] builtin: create url-parse command
  2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
  2024-04-28 22:30 ` [PATCH 01/13] url: move helper function to URL header and source Matheus Afonso Martins Moreira via GitGitGadget
  2024-04-28 22:30 ` [PATCH 02/13] urlmatch: define url_parse function Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
  2024-04-28 22:30 ` [PATCH 04/13] url-parse: add URL parsing helper function Matheus Afonso Martins Moreira via GitGitGadget
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC
  To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira

From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>

Git commands can accept a rather wide variety of URL syntaxes.
The range of accepted inputs might expand even more in the future.
This makes parsing URL components difficult, since standard URL
parsers cannot be used. Extracting the components of a git URL would
require implementing every scheme that git itself supports, not to
mention tracking its development continuously in case new URL schemes
are added.

The url-parse builtin command is designed to solve this problem
by exposing git's native URL parsing facilities as a plumbing command.
Other programs can then call upon git itself to parse git URLs and
extract their components. This should be quite useful for scripts.

Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
 .gitignore          |  1 +
 Makefile            |  1 +
 builtin.h           |  1 +
 builtin/url-parse.c | 18 ++++++++++++++++++
 command-list.txt    |  1 +
 git.c               |  1 +
 6 files changed, 23 insertions(+)
 create mode 100644 builtin/url-parse.c

diff --git a/.gitignore b/.gitignore
index 612c0f6a0ff..4f8dde600a5 100644
--- a/.gitignore
+++ b/.gitignore
@@ -174,6 +174,7 @@
 /git-update-server-info
 /git-upload-archive
 /git-upload-pack
+/git-url-parse
 /git-var
 /git-verify-commit
 /git-verify-pack
diff --git a/Makefile b/Makefile
index 1e31acc72ec..b6054b5c1f4 100644
--- a/Makefile
+++ b/Makefile
@@ -1326,6 +1326,7 @@ BUILTIN_OBJS += builtin/update-ref.o
 BUILTIN_OBJS += builtin/update-server-info.o
 BUILTIN_OBJS += builtin/upload-archive.o
 BUILTIN_OBJS += builtin/upload-pack.o
+BUILTIN_OBJS += builtin/url-parse.o
 BUILTIN_OBJS += builtin/var.o
 BUILTIN_OBJS += builtin/verify-commit.o
 BUILTIN_OBJS += builtin/verify-pack.o
diff --git a/builtin.h b/builtin.h
index 28280636da8..e8858808943 100644
--- a/builtin.h
+++ b/builtin.h
@@ -240,6 +240,7 @@ int cmd_update_server_info(int argc, const char **argv, const char *prefix);
 int cmd_upload_archive(int argc, const char **argv, const char *prefix);
 int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix);
 int cmd_upload_pack(int argc, const char **argv, const char *prefix);
+int cmd_url_parse(int argc, const char **argv, const char *prefix);
 int cmd_var(int argc, const char **argv, const char *prefix);
 int cmd_verify_commit(int argc, const char **argv, const char *prefix);
 int cmd_verify_tag(int argc, const char **argv, const char *prefix);
diff --git a/builtin/url-parse.c b/builtin/url-parse.c
new file mode 100644
index 00000000000..994ccec4b2e
--- /dev/null
+++ b/builtin/url-parse.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ *
+ * url-parse - parses git URLs and extracts their components
+ *
+ * Copyright © 2024 Matheus Afonso Martins Moreira
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2.
+ */
+
+#include "builtin.h"
+#include "gettext.h"
+
+int cmd_url_parse(int argc, const char **argv, const char *prefix)
+{
+	return 0;
+}
diff --git a/command-list.txt b/command-list.txt
index c4cd0f352b8..6d89b6c4dc6 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -196,6 +196,7 @@ git-update-ref                          plumbingmanipulators
 git-update-server-info                  synchingrepositories
 git-upload-archive                      synchelpers
 git-upload-pack                         synchelpers
+git-url-parse                           plumbinginterrogators
 git-var                                 plumbinginterrogators
 git-verify-commit                       ancillaryinterrogators
 git-verify-pack                         plumbinginterrogators
diff --git a/git.c b/git.c
index 654d615a188..7aac812d9d4 100644
--- a/git.c
+++ b/git.c
@@ -625,6 +625,7 @@ static struct cmd_struct commands[] = {
 	{ "upload-archive", cmd_upload_archive, NO_PARSEOPT },
 	{ "upload-archive--writer", cmd_upload_archive_writer, NO_PARSEOPT },
 	{ "upload-pack", cmd_upload_pack },
+	{ "url-parse", cmd_url_parse, NO_PARSEOPT },
 	{ "var", cmd_var, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "verify-commit", cmd_verify_commit, RUN_SETUP },
 	{ "verify-pack", cmd_verify_pack },
-- 
gitgitgadget



* [PATCH 04/13] url-parse: add URL parsing helper function
  2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
                   ` (2 preceding siblings ...)
  2024-04-28 22:30 ` [PATCH 03/13] builtin: create url-parse command Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
  2024-04-28 22:30 ` [PATCH 05/13] url-parse: enumerate possible URL components Matheus Afonso Martins Moreira via GitGitGadget
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC
  To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira

From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>

This function either successfully parses a URL
or dies with an error message. Since this is a
plumbing command, the error message is not translated.

Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
 builtin/url-parse.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/builtin/url-parse.c b/builtin/url-parse.c
index 994ccec4b2e..933e63aaa0a 100644
--- a/builtin/url-parse.c
+++ b/builtin/url-parse.c
@@ -11,6 +11,16 @@
 
 #include "builtin.h"
 #include "gettext.h"
+#include "urlmatch.h"
+
+static void parse_or_die(const char *url, struct url_info *info)
+{
+	if (url_parse(url, info)) {
+		return;
+	} else {
+		die("invalid git URL '%s', %s", url, info->err);
+	}
+}
 
 int cmd_url_parse(int argc, const char **argv, const char *prefix)
 {
-- 
gitgitgadget



* [PATCH 05/13] url-parse: enumerate possible URL components
  2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
                   ` (3 preceding siblings ...)
  2024-04-28 22:30 ` [PATCH 04/13] url-parse: add URL parsing helper function Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
  2024-04-28 22:30 ` [PATCH 06/13] url-parse: define component extraction helper fn Matheus Afonso Martins Moreira via GitGitGadget
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC
  To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira

From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>

Create an enumeration containing all the git URL components
that may be selected by the user. URL_NONE is used when the
user did not select any component; in that case, the command
only checks that the URLs parse and exits successfully if they do.

Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
 builtin/url-parse.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/builtin/url-parse.c b/builtin/url-parse.c
index 933e63aaa0a..d250338422e 100644
--- a/builtin/url-parse.c
+++ b/builtin/url-parse.c
@@ -13,6 +13,16 @@
 #include "gettext.h"
 #include "urlmatch.h"
 
+enum url_component {
+	URL_NONE = 0,
+	URL_PROTOCOL,
+	URL_USER,
+	URL_PASSWORD,
+	URL_HOST,
+	URL_PORT,
+	URL_PATH,
+};
+
 static void parse_or_die(const char *url, struct url_info *info)
 {
 	if (url_parse(url, info)) {
-- 
gitgitgadget



* [PATCH 06/13] url-parse: define component extraction helper fn
  2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
                   ` (4 preceding siblings ...)
  2024-04-28 22:30 ` [PATCH 05/13] url-parse: enumerate possible URL components Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
  2024-04-28 22:30 ` [PATCH 07/13] url-parse: define string to component converter fn Matheus Afonso Martins Moreira via GitGitGadget
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC
  To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira

From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>

The extract function returns a newly allocated string
containing the specified git URL component, or NULL for
URL_NONE. The caller must free the string.

Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
 builtin/url-parse.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/builtin/url-parse.c b/builtin/url-parse.c
index d250338422e..b8ac46dcdeb 100644
--- a/builtin/url-parse.c
+++ b/builtin/url-parse.c
@@ -32,6 +32,42 @@ static void parse_or_die(const char *url, struct url_info *info)
 	}
 }
 
+static char *extract(enum url_component component, struct url_info *info)
+{
+	size_t offset, length;
+
+	switch (component) {
+	case URL_PROTOCOL:
+		offset = 0;
+		length = info->scheme_len;
+		break;
+	case URL_USER:
+		offset = info->user_off;
+		length = info->user_len;
+		break;
+	case URL_PASSWORD:
+		offset = info->passwd_off;
+		length = info->passwd_len;
+		break;
+	case URL_HOST:
+		offset = info->host_off;
+		length = info->host_len;
+		break;
+	case URL_PORT:
+		offset = info->port_off;
+		length = info->port_len;
+		break;
+	case URL_PATH:
+		offset = info->path_off;
+		length = info->path_len;
+		break;
+	case URL_NONE:
+		return NULL;
+	}
+
+	return xstrndup(info->url + offset, length);
+}
+
 int cmd_url_parse(int argc, const char **argv, const char *prefix)
 {
 	return 0;
-- 
gitgitgadget
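extract() copies a component out of the normalized URL using the offset/length pairs that url_normalize() records in `struct url_info`. The sketch below shows the same idea with a hypothetical `struct span_info` holding only a few of those fields. A zero-length span yields an empty string, and when a path offset is moved forward by one (as url_parse() does to make the path start at `~`), shortening the length by one as well keeps the span ending exactly where the path ends. `extract_span` and `point_path_at_tilde` are illustrative names, not git functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of the offset/length fields of struct url_info. */
struct span_info {
	const char *url;   /* normalized URL */
	size_t scheme_len; /* the scheme always starts at offset 0 */
	size_t host_off, host_len;
	size_t path_off, path_len;
};

/* Copy url[off, off + len) into a new NUL-terminated string; caller frees. */
static char *extract_span(const struct span_info *info, size_t off, size_t len)
{
	char *s = malloc(len + 1);

	if (!s)
		return NULL;
	memcpy(s, info->url + off, len);
	s[len] = '\0';
	return s;
}

/* Make the path start at '~' for ".../~user/repo": move offset AND length. */
static void point_path_at_tilde(struct span_info *info)
{
	if (info->path_len > 1 && info->url[info->path_off + 1] == '~') {
		info->path_off++;
		info->path_len--;
	}
}
```

In this sketch the offsets for `ssh://example.com/~user/repo` are filled in by hand; in git they come from url_normalize().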



* [PATCH 07/13] url-parse: define string to component converter fn
  2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
                   ` (5 preceding siblings ...)
  2024-04-28 22:30 ` [PATCH 06/13] url-parse: define component extraction helper fn Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
  2024-04-28 22:30 ` [PATCH 08/13] url-parse: define usage and options Matheus Afonso Martins Moreira via GitGitGadget
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC
  To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira

From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>

Convert a git URL component name to its corresponding
enumeration value so that it can be conveniently used
internally by the url-parse command. Die on unknown names.

Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
 builtin/url-parse.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/builtin/url-parse.c b/builtin/url-parse.c
index b8ac46dcdeb..15923460a78 100644
--- a/builtin/url-parse.c
+++ b/builtin/url-parse.c
@@ -32,6 +32,23 @@ static void parse_or_die(const char *url, struct url_info *info)
 	}
 }
 
+static enum url_component get_component_or_die(const char *arg)
+{
+	if (!strcmp("path", arg))
+		return URL_PATH;
+	if (!strcmp("host", arg))
+		return URL_HOST;
+	if (!strcmp("protocol", arg))
+		return URL_PROTOCOL;
+	if (!strcmp("user", arg))
+		return URL_USER;
+	if (!strcmp("password", arg))
+		return URL_PASSWORD;
+	if (!strcmp("port", arg))
+		return URL_PORT;
+	die("invalid git URL component '%s'", arg);
+}
+
 static char *extract(enum url_component component, struct url_info *info)
 {
 	size_t offset, length;
-- 
gitgitgadget
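The chain of `strcmp()` calls can also be written as a lookup table, which keeps each component name next to its enumeration value and turns adding a component into a one-line change. This is one possible variant, not the patch's code: the enum, table and `component_from_name()` below are hypothetical, and the function returns `C_NONE` for an unknown name instead of dying, leaving the `die()` call to its caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical copy of the component enumeration. */
enum url_component_sketch {
	C_NONE = 0, C_PROTOCOL, C_USER, C_PASSWORD, C_HOST, C_PORT, C_PATH,
};

/* Hypothetical name table; adding a component means adding one row. */
static const struct {
	const char *name;
	enum url_component_sketch component;
} component_names[] = {
	{ "protocol", C_PROTOCOL },
	{ "user", C_USER },
	{ "password", C_PASSWORD },
	{ "host", C_HOST },
	{ "port", C_PORT },
	{ "path", C_PATH },
};

/* Return the matching component, or C_NONE for an unknown name. */
static enum url_component_sketch component_from_name(const char *arg)
{
	size_t i;

	assert(arg != NULL);
	for (i = 0; i < sizeof(component_names) / sizeof(component_names[0]); i++)
		if (!strcmp(component_names[i].name, arg))
			return component_names[i].component;
	return C_NONE;
}
```

Since no valid name maps to `C_NONE`, a caller can treat that value as "unknown component" and report the error itself.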



* [PATCH 08/13] url-parse: define usage and options
  2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
                   ` (6 preceding siblings ...)
  2024-04-28 22:30 ` [PATCH 07/13] url-parse: define string to component converter fn Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
  2024-04-28 22:30 ` [PATCH 09/13] url-parse: parse options given on the command line Matheus Afonso Martins Moreira via GitGitGadget
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC
  To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira

From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>

Define the usage strings and the option table expected by git's
option parser.

Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
 builtin/url-parse.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/builtin/url-parse.c b/builtin/url-parse.c
index 15923460a78..c6095b37ede 100644
--- a/builtin/url-parse.c
+++ b/builtin/url-parse.c
@@ -11,8 +11,22 @@
 
 #include "builtin.h"
 #include "gettext.h"
+#include "parse-options.h"
 #include "urlmatch.h"
 
+static const char * const builtin_url_parse_usage[] = {
+	N_("git url-parse [<options>] [--] <url>..."),
+	NULL
+};
+
+static char *component_arg = NULL;
+
+static struct option builtin_url_parse_options[] = {
+	OPT_STRING('c', "component", &component_arg, "<component>", \
+		N_("which URL component to extract")),
+	OPT_END(),
+};
+
 enum url_component {
 	URL_NONE = 0,
 	URL_PROTOCOL,
-- 
gitgitgadget



* [PATCH 09/13] url-parse: parse options given on the command line
  2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
                   ` (7 preceding siblings ...)
  2024-04-28 22:30 ` [PATCH 08/13] url-parse: define usage and options Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
  2024-04-28 22:30 ` [PATCH 10/13] url-parse: validate all given git URLs Matheus Afonso Martins Moreira via GitGitGadget
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC
  To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira

From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>

Prepare to handle input by parsing the command line options
and removing them from the arguments vector.

Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
 builtin/url-parse.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/builtin/url-parse.c b/builtin/url-parse.c
index c6095b37ede..03030035b4f 100644
--- a/builtin/url-parse.c
+++ b/builtin/url-parse.c
@@ -101,5 +101,10 @@ static char *extract(enum url_component component, struct url_info *info)
 
 int cmd_url_parse(int argc, const char **argv, const char *prefix)
 {
+	argc = parse_options(argc, argv, prefix,
+		builtin_url_parse_options,
+		builtin_url_parse_usage,
+		0);
+
 	return 0;
 }
-- 
gitgitgadget



* [PATCH 10/13] url-parse: validate all given git URLs
  2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
                   ` (8 preceding siblings ...)
  2024-04-28 22:30 ` [PATCH 09/13] url-parse: parse options given on the command line Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
  2024-04-28 22:30 ` [PATCH 11/13] url-parse: output URL components selected by user Matheus Afonso Martins Moreira via GitGitGadget
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC
  To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira

From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>

Parse all the git URLs given as input on the command line.
Die if a URL cannot be parsed.

Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
 builtin/url-parse.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/builtin/url-parse.c b/builtin/url-parse.c
index 03030035b4f..ab996eadf38 100644
--- a/builtin/url-parse.c
+++ b/builtin/url-parse.c
@@ -101,10 +101,18 @@ static char *extract(enum url_component component, struct url_info *info)
 
 int cmd_url_parse(int argc, const char **argv, const char *prefix)
 {
+	struct url_info info;
+	int i;
+
 	argc = parse_options(argc, argv, prefix,
 		builtin_url_parse_options,
 		builtin_url_parse_usage,
 		0);
 
+	for (i = 0; i < argc; ++i) {
+		parse_or_die(argv[i], &info);
+		free(info.url);
+	}
+
 	return 0;
 }
-- 
gitgitgadget



* [PATCH 11/13] url-parse: output URL components selected by user
  2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
                   ` (9 preceding siblings ...)
  2024-04-28 22:30 ` [PATCH 10/13] url-parse: validate all given git URLs Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
  2024-04-28 22:31 ` [PATCH 12/13] Documentation: describe the url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC
  To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira

From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>

Extract the selected component from each of the given git URLs
and print it to standard output, one per line.

Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
 builtin/url-parse.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/builtin/url-parse.c b/builtin/url-parse.c
index ab996eadf38..6c1a8676bad 100644
--- a/builtin/url-parse.c
+++ b/builtin/url-parse.c
@@ -102,6 +102,8 @@ static char *extract(enum url_component component, struct url_info *info)
 int cmd_url_parse(int argc, const char **argv, const char *prefix)
 {
 	struct url_info info;
+	enum url_component selected = URL_NONE;
+	char *extracted;
 	int i;
 
 	argc = parse_options(argc, argv, prefix,
@@ -109,8 +111,20 @@ int cmd_url_parse(int argc, const char **argv, const char *prefix)
 		builtin_url_parse_usage,
 		0);
 
+	if (component_arg)
+		selected = get_component_or_die(component_arg);
+
 	for (i = 0; i < argc; ++i) {
 		parse_or_die(argv[i], &info);
+
+		if (selected != URL_NONE) {
+			extracted = extract(selected, &info);
+			if (extracted) {
+				puts(extracted);
+				free(extracted);
+			}
+		}
+
 		free(info.url);
 	}
 
-- 
gitgitgadget



* [PATCH 12/13] Documentation: describe the url-parse builtin
  2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
                   ` (10 preceding siblings ...)
  2024-04-28 22:30 ` [PATCH 11/13] url-parse: output URL components selected by user Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:31 ` Matheus Afonso Martins Moreira via GitGitGadget
  2024-04-30  7:37   ` Ghanshyam Thakkar
  2024-04-28 22:31 ` [PATCH 13/13] tests: add tests for the new " Matheus Afonso Martins Moreira via GitGitGadget
  2024-04-29 20:53 ` [PATCH 00/13] builtin: implement, document and test url-parse Torsten Bögershausen
  13 siblings, 1 reply; 20+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:31 UTC
  To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira

From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>

The new url-parse builtin validates git URLs
and optionally extracts their components.

Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
 Documentation/git-url-parse.txt | 59 +++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)
 create mode 100644 Documentation/git-url-parse.txt

diff --git a/Documentation/git-url-parse.txt b/Documentation/git-url-parse.txt
new file mode 100644
index 00000000000..bfbbad6c033
--- /dev/null
+++ b/Documentation/git-url-parse.txt
@@ -0,0 +1,59 @@
+git-url-parse(1)
+================
+
+NAME
+----
+git-url-parse - Parse and extract git URL components
+
+SYNOPSIS
+--------
+[verse]
+'git url-parse' [<options>] [--] <url>...
+
+DESCRIPTION
+-----------
+
+Git supports many ways to specify URLs, some of them non-standard.
+For example, git supports the scp-style [user@]host:[path] format.
+This command makes it easier for other programs to work with git URLs
+by parsing any URL git accepts and extracting its components.
+
+OPTIONS
+-------
+
+-c <arg>::
+--component <arg>::
+	Extract the `<arg>` component from the given git URLs.
+	`<arg>` can be one of:
+	`protocol`, `user`, `password`, `host`, `port`, `path`.
+
+EXAMPLES
+--------
+
+* Print the host name:
++
+------------
+$ git url-parse --component host https://example.com/user/repo
+example.com
+------------
+
+* Print the path:
++
+------------
+$ git url-parse --component path https://example.com/user/repo
+/user/repo
+$ git url-parse --component path example.com:~user/repo
+~user/repo
+$ git url-parse --component path example.com:user/repo
+/user/repo
+------------
+
+* Validate URLs without outputting anything:
++
+------------
+$ git url-parse https://example.com/user/repo example.com:~user/repo
+------------
+
+GIT
+---
+Part of the linkgit:git[1] suite
-- 
gitgitgadget



* [PATCH 13/13] tests: add tests for the new url-parse builtin
  2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
                   ` (11 preceding siblings ...)
  2024-04-28 22:31 ` [PATCH 12/13] Documentation: describe the url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:31 ` Matheus Afonso Martins Moreira via GitGitGadget
  2024-04-29 20:53 ` [PATCH 00/13] builtin: implement, document and test url-parse Torsten Bögershausen
  13 siblings, 0 replies; 20+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:31 UTC
  To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira

From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>

Test git URL parsing, validation and component extraction
on all documented git URL schemes and syntaxes.

Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
 t/t9904-url-parse.sh | 194 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 194 insertions(+)
 create mode 100755 t/t9904-url-parse.sh

diff --git a/t/t9904-url-parse.sh b/t/t9904-url-parse.sh
new file mode 100755
index 00000000000..f147f00591c
--- /dev/null
+++ b/t/t9904-url-parse.sh
@@ -0,0 +1,194 @@
+#!/bin/sh
+#
+# Copyright © 2024 Matheus Afonso Martins Moreira
+#
+
+test_description='git url-parse tests'
+
+. ./test-lib.sh
+
+test_expect_success 'git url-parse -- ssh syntax' '
+	git url-parse "ssh://user@example.com:1234/repository/path" &&
+	git url-parse "ssh://user@example.com/repository/path" &&
+	git url-parse "ssh://example.com:1234/repository/path" &&
+	git url-parse "ssh://example.com/repository/path"
+'
+
+test_expect_success 'git url-parse -- git syntax' '
+	git url-parse "git://example.com:1234/repository/path" &&
+	git url-parse "git://example.com/repository/path"
+'
+
+test_expect_success 'git url-parse -- http syntax' '
+	git url-parse "https://example.com:1234/repository/path" &&
+	git url-parse "https://example.com/repository/path" &&
+	git url-parse "http://example.com:1234/repository/path" &&
+	git url-parse "http://example.com/repository/path"
+'
+
+test_expect_success 'git url-parse -- scp syntax' '
+	git url-parse "user@example.com:/repository/path" &&
+	git url-parse "example.com:/repository/path"
+'
+
+test_expect_success 'git url-parse -- username expansion - ssh syntax' '
+	git url-parse "ssh://user@example.com:1234/~user/repository" &&
+	git url-parse "ssh://user@example.com/~user/repository" &&
+	git url-parse "ssh://example.com:1234/~user/repository" &&
+	git url-parse "ssh://example.com/~user/repository"
+'
+
+test_expect_success 'git url-parse -- username expansion - git syntax' '
+	git url-parse "git://example.com:1234/~user/repository" &&
+	git url-parse "git://example.com/~user/repository"
+'
+
+test_expect_success 'git url-parse -- username expansion - scp syntax' '
+	git url-parse "user@example.com:~user/repository" &&
+	git url-parse "example.com:~user/repository"
+'
+
+test_expect_success 'git url-parse -- file urls' '
+	git url-parse "file:///repository/path" &&
+	git url-parse "file:///" &&
+	git url-parse "file://"
+'
+
+test_expect_success 'git url-parse -c protocol -- ssh syntax' '
+	test ssh = "$(git url-parse -c protocol "ssh://user@example.com:1234/repository/path")" &&
+	test ssh = "$(git url-parse -c protocol "ssh://user@example.com/repository/path")" &&
+	test ssh = "$(git url-parse -c protocol "ssh://example.com:1234/repository/path")" &&
+	test ssh = "$(git url-parse -c protocol "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c protocol -- git syntax' '
+	test git = "$(git url-parse -c protocol "git://example.com:1234/repository/path")" &&
+	test git = "$(git url-parse -c protocol "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c protocol -- http syntax' '
+	test https = "$(git url-parse -c protocol "https://example.com:1234/repository/path")" &&
+	test https = "$(git url-parse -c protocol "https://example.com/repository/path")" &&
+	test http = "$(git url-parse -c protocol "http://example.com:1234/repository/path")" &&
+	test http = "$(git url-parse -c protocol "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c protocol -- scp syntax' '
+	test ssh = "$(git url-parse -c protocol "user@example.com:/repository/path")" &&
+	test ssh = "$(git url-parse -c protocol "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c user -- ssh syntax' '
+	test user = "$(git url-parse -c user "ssh://user@example.com:1234/repository/path")" &&
+	test user = "$(git url-parse -c user "ssh://user@example.com/repository/path")" &&
+	test "" = "$(git url-parse -c user "ssh://example.com:1234/repository/path")" &&
+	test "" = "$(git url-parse -c user "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c user -- git syntax' '
+	test "" = "$(git url-parse -c user "git://example.com:1234/repository/path")" &&
+	test "" = "$(git url-parse -c user "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c user -- http syntax' '
+	test "" = "$(git url-parse -c user "https://example.com:1234/repository/path")" &&
+	test "" = "$(git url-parse -c user "https://example.com/repository/path")" &&
+	test "" = "$(git url-parse -c user "http://example.com:1234/repository/path")" &&
+	test "" = "$(git url-parse -c user "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c user -- scp syntax' '
+	test user = "$(git url-parse -c user "user@example.com:/repository/path")" &&
+	test "" = "$(git url-parse -c user "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c host -- ssh syntax' '
+	test example.com = "$(git url-parse -c host "ssh://user@example.com:1234/repository/path")" &&
+	test example.com = "$(git url-parse -c host "ssh://user@example.com/repository/path")" &&
+	test example.com = "$(git url-parse -c host "ssh://example.com:1234/repository/path")" &&
+	test example.com = "$(git url-parse -c host "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c host -- git syntax' '
+	test example.com = "$(git url-parse -c host "git://example.com:1234/repository/path")" &&
+	test example.com = "$(git url-parse -c host "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c host -- http syntax' '
+	test example.com = "$(git url-parse -c host "https://example.com:1234/repository/path")" &&
+	test example.com = "$(git url-parse -c host "https://example.com/repository/path")" &&
+	test example.com = "$(git url-parse -c host "http://example.com:1234/repository/path")" &&
+	test example.com = "$(git url-parse -c host "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c host -- scp syntax' '
+	test example.com = "$(git url-parse -c host "user@example.com:/repository/path")" &&
+	test example.com = "$(git url-parse -c host "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c port -- ssh syntax' '
+	test 1234 = "$(git url-parse -c port "ssh://user@example.com:1234/repository/path")" &&
+	test "" = "$(git url-parse -c port "ssh://user@example.com/repository/path")" &&
+	test 1234 = "$(git url-parse -c port "ssh://example.com:1234/repository/path")" &&
+	test "" = "$(git url-parse -c port "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c port -- git syntax' '
+	test 1234 = "$(git url-parse -c port "git://example.com:1234/repository/path")" &&
+	test "" = "$(git url-parse -c port "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c port -- http syntax' '
+	test 1234 = "$(git url-parse -c port "https://example.com:1234/repository/path")" &&
+	test "" = "$(git url-parse -c port "https://example.com/repository/path")" &&
+	test 1234 = "$(git url-parse -c port "http://example.com:1234/repository/path")" &&
+	test "" = "$(git url-parse -c port "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c port -- scp syntax' '
+	test "" = "$(git url-parse -c port "user@example.com:/repository/path")" &&
+	test "" = "$(git url-parse -c port "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- ssh syntax' '
+	test "/repository/path" = "$(git url-parse -c path "ssh://user@example.com:1234/repository/path")" &&
+	test "/repository/path" = "$(git url-parse -c path "ssh://user@example.com/repository/path")" &&
+	test "/repository/path" = "$(git url-parse -c path "ssh://example.com:1234/repository/path")" &&
+	test "/repository/path" = "$(git url-parse -c path "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- git syntax' '
+	test "/repository/path" = "$(git url-parse -c path "git://example.com:1234/repository/path")" &&
+	test "/repository/path" = "$(git url-parse -c path "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- http syntax' '
+	test "/repository/path" = "$(git url-parse -c path "https://example.com:1234/repository/path")" &&
+	test "/repository/path" = "$(git url-parse -c path "https://example.com/repository/path")" &&
+	test "/repository/path" = "$(git url-parse -c path "http://example.com:1234/repository/path")" &&
+	test "/repository/path" = "$(git url-parse -c path "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- scp syntax' '
+	test "/repository/path" = "$(git url-parse -c path "user@example.com:/repository/path")" &&
+	test "/repository/path" = "$(git url-parse -c path "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- username expansion - ssh syntax' '
+	test "~user/repository" = "$(git url-parse -c path "ssh://user@example.com:1234/~user/repository")" &&
+	test "~user/repository" = "$(git url-parse -c path "ssh://user@example.com/~user/repository")" &&
+	test "~user/repository" = "$(git url-parse -c path "ssh://example.com:1234/~user/repository")" &&
+	test "~user/repository" = "$(git url-parse -c path "ssh://example.com/~user/repository")"
+'
+
+test_expect_success 'git url-parse -c path -- username expansion - git syntax' '
+	test "~user/repository" = "$(git url-parse -c path "git://example.com:1234/~user/repository")" &&
+	test "~user/repository" = "$(git url-parse -c path "git://example.com/~user/repository")"
+'
+
+test_expect_success 'git url-parse -c path -- username expansion - scp syntax' '
+	test "~user/repository" = "$(git url-parse -c path "user@example.com:~user/repository")" &&
+	test "~user/repository" = "$(git url-parse -c path "example.com:~user/repository")"
+'
+
+test_done
-- 
gitgitgadget


* Re: [PATCH 00/13] builtin: implement, document and test url-parse
  2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
                   ` (12 preceding siblings ...)
  2024-04-28 22:31 ` [PATCH 13/13] tests: add tests for the new " Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-29 20:53 ` Torsten Bögershausen
  2024-04-29 22:04   ` Reply to community feedback Matheus Afonso Martins Moreira
  13 siblings, 1 reply; 20+ messages in thread
From: Torsten Bögershausen @ 2024-04-29 20:53 UTC
  To: Matheus Moreira via GitGitGadget; +Cc: git, Matheus Moreira

On Sun, Apr 28, 2024 at 10:30:48PM +0000, Matheus Moreira via GitGitGadget wrote:
> Git commands accept a wide variety of URL syntaxes, not just standard URLs.
> This can make parsing git URLs difficult since standard URL parsers cannot
> be used. Even if an external parser were implemented, it would have to track
> git's development closely in case support for any new URL schemes is added.
>
> These patches introduce a new url-parse builtin command that exposes git's
> native URL parsing algorithms as a plumbing command, allowing other programs
> to then call upon git itself to parse the git URLs and their components.
>
> This should be quite useful for scripts. For example, a script might want to
> add remotes to repositories, naming them according to the domain name where
> the repository is hosted. This new builtin allows it to parse the git URL
> and extract its host name which can then be used as input for other
> operations. This would be difficult to implement otherwise due to git's
> support for scp style URLs.
>

All in all, having a URL parser as such is a good thing, thanks for working
on that.

There are, however, some notes and questions, up for discussion:

- are there any plans to integrate the parser into connect.c and fetch ?
  Speaking as a person who managed to break the parsing of URLs once,
  with the good intention of improving things, I had to learn that
  test cases are important.
  Some work can be seen in t5601-clone.sh
  Especially, when dealing with literal IPv6 addresses, the ones with []
  and the simplified ssh syntax 'myhost:src' are interesting to test.
  Git itself strives to be RFC compliant when parsing URLs, but
  we do not fully guarantee to be "fully certified".
  And some features using the [] syntax to embed a port number
  inside the simplified ssh syntax were never documented,
  but were used in practice, and are now part of the test suite.
  See "[myhost:123]:src" in t5601

- Or is this new tool just a helper to verify "good" URLs,
  and not accepting our legacy parser quirks ?
  Then we still should see some IPv6 tests ?
  Or maybe not, as we prefer hostnames these days ?

- One minor comment:
  in 02/13 we read:
        +enum protocol {
        +       PROTO_UNKNOWN = 0,
        +       PROTO_LOCAL,
        +       PROTO_FILE,
        +       PROTO_SSH,
        +       PROTO_GIT,
  The RFC 1738 uses the term "scheme" here, and using the very generic
  term "protocol" may lead to name clashes later.
  Would something like "git_scheme" or so be better ?

- One minor comment:
   In 13/13 we read:
        +       git url-parse "file:///" &&
        +       git url-parse "file://"

  I think that the "///" version is superfluous; it should already
  be covered by the "//" version
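In the url_get_protocol() function quoted from 02/13 further down this thread, each comparison is strncmp(name, "ssh", n), where n is the length of the text before "://". Because strncmp() stops after n bytes, a truncated scheme such as "s://host/repo" (n = 1) or an empty one (n = 0) is also reported as PROTO_SSH. The sketch below applies the "scheme" naming suggested above and requires the length and the bytes to match exactly. The enum and function names are hypothetical, not part of the series.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical rename of the series' `enum protocol`. */
enum git_scheme {
	GIT_SCHEME_UNKNOWN = 0,
	GIT_SCHEME_LOCAL,
	GIT_SCHEME_FILE,
	GIT_SCHEME_SSH,
	GIT_SCHEME_GIT,
};

static const struct {
	const char *name;
	enum git_scheme scheme;
} scheme_table[] = {
	{ "ssh", GIT_SCHEME_SSH },
	{ "git", GIT_SCHEME_GIT },
	{ "git+ssh", GIT_SCHEME_SSH }, /* deprecated spelling */
	{ "ssh+git", GIT_SCHEME_SSH }, /* deprecated spelling */
	{ "file", GIT_SCHEME_FILE },
};

/*
 * Map the first n bytes of `name` (the text before "://") to a scheme.
 * Both the length and the bytes must match, so a prefix such as "s"
 * or an empty scheme is not mistaken for "ssh".
 */
static enum git_scheme git_scheme_from_name(const char *name, size_t n)
{
	size_t i;

	for (i = 0; i < sizeof(scheme_table) / sizeof(scheme_table[0]); i++) {
		if (strlen(scheme_table[i].name) == n &&
		    !memcmp(name, scheme_table[i].name, n))
			return scheme_table[i].scheme;
	}
	return GIT_SCHEME_UNKNOWN;
}
```

A table also keeps the deprecated spellings next to their canonical scheme, so adding or retiring a spelling is a one-line change.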



* Reply to community feedback
  2024-04-29 20:53 ` [PATCH 00/13] builtin: implement, document and test url-parse Torsten Bögershausen
@ 2024-04-29 22:04   ` Matheus Afonso Martins Moreira
  2024-04-30  6:51     ` Torsten Bögershausen
  0 siblings, 1 reply; 20+ messages in thread
From: Matheus Afonso Martins Moreira @ 2024-04-29 22:04 UTC
  To: tboegi; +Cc: git

Thank you for your feedback.

> are there any plans to integrate the parser into connect.c and fetch ?

Yes.

That was my intention but I was not confident enough to touch connect.c
before getting feedback from the community, since it's critical code
and it is my first contribution.

I do want to merge all URL parsing in git into this one function though,
thereby creating a "single point of truth". This is so that if the algorithm
is modified the changes are visible to the URL parser builtin as well.

> Speaking as a person who managed to break the parsing of URLs once,
> with the good intention of improving things, I had to learn that
> test cases are important.

Absolutely agree.

When adding test cases, I looked at the possibilities enumerated in urls.txt
and generated test cases based on those. I also looked at the urlmatch.h
test cases. However...

> Some work can be seen in t5601-clone.sh

... I did not think to check those.

> Especially, when dealing with literal IPv6 addresses,
> the ones with [] and the simplified ssh syntax 'myhost:src'
> are interesting to test.

You're right about that. I shall prepare an updated v2 patchset
with more test cases, and also any other changes/improvements
requested by maintainers.

> And some features using the [] syntax to embed a port number
> inside the simplified ssh syntax were never documented,
> but were used in practice, and are now part of the test suite.
> See "[myhost:123]:src" in t5601

Indeed, I did not read anything of the sort when I checked it.
Would you like me to commit a note to this effect to urls.txt ?

> Or is this new tool just a helper to verify "good" URLs,
> and not accepting our legacy parser quirks ?

It is my intention that this builtin be able to accept, parse
and decompose all types of URLs that git itself can accept.

> Then we still should see some IPv6 tests ?

I will add them!

> Or maybe not, as we prefer hostnames these days ?

I would have to defer that choice to someone more experienced
with the codebase. Please advise on how to proceed.

> The RFC 1738 uses the term "scheme" here, and using the very generic
> term "protocol" may lead to name clashes later.
> Would something like "git_scheme" or so be better ?

Scheme does seem like a better word if it's the terminology used by RFCs.
I can change that in a new version if necessary.
That code is based on the existing connect.c parsing code though.

> I think that the "///" version is superfluous; it should already
> be covered by the "//" version

I thought it was a good idea because of existing precedent:
my first approach to creating the test cases was to copy the
ones from t0110-urlmatch-normalization.sh which did have many
cases such as those. Then as I developed the code I came to
believe that it was not necessary: I call url_normalize
in the url_parse function and url_normalize is already being
tested. I think I just forgot to delete those lines.

Reading that file over once again, it does have IPv6 address
test cases. So I should probably go over it again.

Thanks again for the feedback,

  Matheus


* Re: Reply to community feedback
  2024-04-29 22:04   ` Reply to community feedback Matheus Afonso Martins Moreira
@ 2024-04-30  6:51     ` Torsten Bögershausen
  0 siblings, 0 replies; 20+ messages in thread
From: Torsten Bögershausen @ 2024-04-30  6:51 UTC
  To: Matheus Afonso Martins Moreira; +Cc: git

On Mon, Apr 29, 2024 at 07:04:40PM -0300, Matheus Afonso Martins Moreira wrote:

> Thank you for your feedback.
>
> > are there any plans to integrate the parser into connect.c and fetch ?
>
> Yes.
>
> That was my intention but I was not confident enough to touch connect.c
> before getting feedback from the community, since it's critical code
> and it is my first contribution.

Welcome to the Git community.

I wasn't aware of t0110 as a test case...

>
> I do want to merge all URL parsing in git into this one function though,
> thereby creating a "single point of truth". This is so that if the algorithm
> is modified the changes are visible to the URL parser builtin as well.
>

That is a good thing to do. Be prepared for a longer journey, since we have
this legacy stuff to deal with. But I am happy to help with reviews, even
if they may take some days.

[]

> When adding test cases, I looked at the possibilities enumerated in urls.txt
> and generated test cases based on those. I also looked at the urlmatch.h
> test cases. However...
>
> > Some work can be seen in t5601-clone.sh
>
> ... I did not think to check those.
>
> > Especially, when dealing with literal IPv6 addresses,
> > the ones with [] and the simplified ssh syntax 'myhost:src'
> > are interesting to test.
>
> You're right about that. I shall prepare an updated v2 patchset
> with more test cases, and also any other changes/improvements
> requested by maintainers.
>
> > And some features using the [] syntax to embed a port number
> > inside the simplified ssh syntax were never documented,
> > but were used in practice, and are now part of the test suite.
> > See "[myhost:123]:src" in t5601
>
> Indeed, I did not read anything of the sort when I checked it.
> Would you like me to commit a note to this effect to urls.txt ?

In short: please don't.
This kind of syntax was never meant to be used.
The official "ssh://myhost:123/src" is recommended.
When IPv6 parsing was added, people discovered that the brackets could be
used to "protect" the ':' from being taken as the separator between the
hostname and the path, and so could be used to separate the hostname from
the port. Once that was used in real life, it was too late to change it.
If we now get a better debug tool, it could mention that this is
a legacy feature, and recommend the longer "ssh://" syntax.
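The legacy "[myhost:123]:src" form described above can be made concrete with a small C sketch. It is hypothetical and is not git's connect.c code: it only recognizes a '[' at the very start of the address (no "user@[...]" prefix), and its struct and function names are invented for illustration. It shows how the brackets keep the ':' inside them from being read as the host/path separator, and how the bracketed text may then be split into a host and a numeric port.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for this sketch; not a git data structure. */
struct scp_address {
	char host[256];
	char port[8];    /* empty string when no port was given */
	char path[1024];
};

/*
 * Split an scp-style "host:path" address. When the address starts with
 * '[', every ':' up to the matching ']' is protected, so "[::1]:repo"
 * and the legacy "[myhost:123]:src" are not split at their first colon.
 * Returns false when no host/path separator can be found.
 */
static bool split_scp_address(const char *addr, struct scp_address *out)
{
	const char *host_start = addr;
	const char *host_end;
	const char *sep;
	size_t host_len;
	char *colon;

	if (addr[0] == '[') {
		const char *rbracket = strchr(addr, ']');

		if (!rbracket || rbracket[1] != ':')
			return false;
		host_start = addr + 1;
		host_end = rbracket;
		sep = rbracket + 1;
	} else {
		sep = strchr(addr, ':');
		if (!sep)
			return false;
		host_end = sep;
	}

	host_len = (size_t)(host_end - host_start);
	if (host_len == 0 || host_len >= sizeof(out->host) ||
	    strlen(sep + 1) >= sizeof(out->path))
		return false;
	memcpy(out->host, host_start, host_len);
	out->host[host_len] = '\0';
	strcpy(out->path, sep + 1);
	out->port[0] = '\0';

	/*
	 * Legacy quirk: a bracketed "host:port" is read as a host plus a
	 * numeric port. Split at the first ':' only when everything after
	 * it is a decimal number below 65536, so IPv6 literals such as
	 * "::1" or "fe80::1" are left intact.
	 */
	colon = strchr(out->host, ':');
	if (colon && isdigit((unsigned char)colon[1])) {
		char *end;
		long port = strtol(colon + 1, &end, 10);

		if (*end == '\0' && port < 65536 &&
		    strlen(colon + 1) < sizeof(out->port)) {
			strcpy(out->port, colon + 1);
			*colon = '\0';
		}
	}
	return true;
}
```

A debug tool built on something like this could report the port found inside the brackets and suggest the equivalent "ssh://" spelling instead.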

>
> > Or is this new tool just a helper to verify "good" URLs,
> > and not accepting our legacy parser quirks ?
>
> It is my intention that this builtin be able to accept, parse
> and decompose all types of URLs that git itself can accept.
>
> > Then we still should see some IPv6 tests ?
>
> I will add them!
>
> > Or maybe not, as we prefer hostnames these days ?
>
> I would have to defer that choice to someone more experienced
> with the codebase. Please advise on how to proceed.

Re-reading this email conversation,
I think that we should support (in the future)
what we support today.
Having a new parser tool means that there is a chance to reject
those URLs with a note/hint that they are deprecated and should
be replaced by a proper one.
From my point of view, this means that all existing test cases should pass
even with the new parser, as a general approach.
Deprecating things is hard, may take years, and may be done in a separate
task/patch series. Or it may be part of this one, in separate commits.

>
> > The RFC 1738 uses the term "scheme" here, and using the very generic
> > term "protocol" may lead to name clashes later.
> > Would something like "git_scheme" or so be better ?
>
> Scheme does seem like a better word if it's the terminology used by RFCs.
> I can change that in a new version if necessary.
> That code is based on the existing connect.c parsing code though.
>
> > I think that the "///" version is superfluous; it should already
> > be covered by the "//" version
>
> I thought it was a good idea because of existing precedent:
> my first approach to creating the test cases was to copy the
> ones from t0110-urlmatch-normalization.sh which did have many
> cases such as those. Then as I developed the code I came to
> believe that it was not necessary: I call url_normalize
> in the url_parse function and url_normalize is already being
> tested. I think I just forgot to delete those lines.
>
> Reading that file over once again, it does have IPv6 address
> test cases. So I should probably go over it again.
>
> Thanks again for the feedback,
>
>   Matheus
>


* Re: [PATCH 12/13] Documentation: describe the url-parse builtin
  2024-04-28 22:31 ` [PATCH 12/13] Documentation: describe the url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-30  7:37   ` Ghanshyam Thakkar
  0 siblings, 0 replies; 20+ messages in thread
From: Ghanshyam Thakkar @ 2024-04-30  7:37 UTC
  To: Matheus Afonso Martins Moreira via GitGitGadget
  Cc: git, Matheus Moreira, Matheus Afonso Martins Moreira

On Sun, 28 Apr 2024, Matheus Afonso Martins Moreira via GitGitGadget <gitgitgadget@gmail.com> wrote:
> +* Print the path:
> ++
> +------------
> +$ git url-parse --component path https://example.com/user/repo
> +/usr/repo

s/usr/user/

Thanks.


* Re: [PATCH 02/13] urlmatch: define url_parse function
  2024-04-28 22:30 ` [PATCH 02/13] urlmatch: define url_parse function Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-05-01 22:18   ` Ghanshyam Thakkar
  2024-05-02  4:02     ` Torsten Bögershausen
  0 siblings, 1 reply; 20+ messages in thread
From: Ghanshyam Thakkar @ 2024-05-01 22:18 UTC
  To: Matheus Afonso Martins Moreira via GitGitGadget
  Cc: git, Matheus Moreira, Matheus Afonso Martins Moreira

On Sun, 28 Apr 2024, Matheus Afonso Martins Moreira via GitGitGadget <gitgitgadget@gmail.com> wrote:
> From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
> 
> Define general parsing function that supports all Git URLs
> including scp style URLs such as hostname:~user/repo.
> Has the same interface as the URL normalization function
> and uses the same data structures, facilitating its use.
> It's adapted from the algorithm used to process URLs in connect.c,
> so it should support the same inputs.
> 
> Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
> ---
>  urlmatch.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  urlmatch.h |  1 +
>  2 files changed, 91 insertions(+)
> 
> diff --git a/urlmatch.c b/urlmatch.c
> index 1d0254abacb..5a442e31fa2 100644
> --- a/urlmatch.c
> +++ b/urlmatch.c
> @@ -3,6 +3,7 @@
>  #include "hex-ll.h"
>  #include "strbuf.h"
>  #include "urlmatch.h"
> +#include "url.h"
>  
>  #define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
>  #define URL_DIGIT "0123456789"
> @@ -438,6 +439,95 @@ char *url_normalize(const char *url, struct url_info *out_info)
>  	return url_normalize_1(url, out_info, 0);
>  }
>  
> +enum protocol {
> +	PROTO_UNKNOWN = 0,
> +	PROTO_LOCAL,
> +	PROTO_FILE,
> +	PROTO_SSH,
> +	PROTO_GIT,
> +};
> +
> +static enum protocol url_get_protocol(const char *name, size_t n)
> +{
> +	if (!strncmp(name, "ssh", n))
> +		return PROTO_SSH;
> +	if (!strncmp(name, "git", n))
> +		return PROTO_GIT;
> +	if (!strncmp(name, "git+ssh", n)) /* deprecated - do not use */
> +		return PROTO_SSH;
> +	if (!strncmp(name, "ssh+git", n)) /* deprecated - do not use */
> +		return PROTO_SSH;
> +	if (!strncmp(name, "file", n))
> +		return PROTO_FILE;
> +	return PROTO_UNKNOWN;
> +}
> +
> +char *url_parse(const char *url_orig, struct url_info *out_info)
> +{
> +	struct strbuf url;
> +	char *host, *separator;
> +	char *detached, *normalized;
> +	enum protocol protocol = PROTO_LOCAL;
> +	struct url_info local_info;
> +	struct url_info *info = out_info? out_info : &local_info;
> +	bool scp_syntax = false;
> +
> +	if (is_url(url_orig)) {
> +		url_orig = url_decode(url_orig);
> +	} else {
> +		url_orig = xstrdup(url_orig);
> +	}
> +
> +	strbuf_init(&url, strlen(url_orig) + sizeof("ssh://"));
> +	strbuf_addstr(&url, url_orig);
> +
> +	host = strstr(url.buf, "://");
> +	if (host) {
> +		protocol = url_get_protocol(url.buf, host - url.buf);
> +		host += 3;
> +	} else {
> +		if (!url_is_local_not_ssh(url.buf)) {
> +			scp_syntax = true;
> +			protocol = PROTO_SSH;
> +			strbuf_insertstr(&url, 0, "ssh://");
> +			host = url.buf + 6;
> +		}
> +	}
Interesting. 

    `
    $ ./git url-parse -c protocol file:/test/test
    ssh
    `

seems like only having a single slash after the 'protocol:' prints
'ssh' always (I think this may not even be a valid url). After this 'else'
block, the url turns into 'ssh://file/test/test'. Will examine the details
later. It's not your code's doing, but rather the result of
url_is_local_not_ssh(). But just wanted to point this out and ask if this
should error out or is this an intended behavior that I can't figure out. 

Thanks.


* Re: [PATCH 02/13] urlmatch: define url_parse function
  2024-05-01 22:18   ` Ghanshyam Thakkar
@ 2024-05-02  4:02     ` Torsten Bögershausen
  0 siblings, 0 replies; 20+ messages in thread
From: Torsten Bögershausen @ 2024-05-02  4:02 UTC
  To: Ghanshyam Thakkar
  Cc: Matheus Afonso Martins Moreira via GitGitGadget, git,
	Matheus Moreira, Matheus Afonso Martins Moreira

[]
> Interesting.
>
>     `
>     $ ./git url-parse -c protocol file:/test/test
>     ssh
>     `
>
> seems like only having a single slash after the 'protocol:' prints
> 'ssh' always (I think this may not even be a valid url). After this 'else'
> block, the url turns into 'ssh://file/test/test'. Will examine the details
> later. It's not your code's doing, but rather the result of
> url_is_local_not_ssh(). But just wanted to point this out and ask if this
> should error out or is this an intended behavior that I can't figure out.

ssh is the correct answer; try something like

`git clone localhost:/home/myself/project/git.git`

It is the scp syntax, supported by Git as well.
From `man scp`

    scp copies files between hosts on a network.
    []
    The source and target may be specified as a local pathname,
    a remote host with optional path in the form
    [user@]host:[path],
    or a URI in the form scp://[user@]host[:port][/path].
    Local file names can be made explicit using absolute or relative pathnames
    to avoid scp treating file names containing ‘:’ as host specifiers.

So yes, they share similar problems
with the ':' that could mean different things when using the short form.
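The rule at work here can be sketched in a few lines of C. This is a simplified, hypothetical illustration (the function name is invented, and Windows drive letters such as "C:\repo" are ignored), not git's url_is_local_not_ssh(). It assumes the caller has already handled "scheme://" URLs, as url_parse() in 02/13 does before falling back to the scp check. An address is treated as scp-style when it contains a ':' and no '/' appears before that ':'.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Simplified colon-before-slash rule for addresses without "://":
 * no ':' at all, or a '/' before the first ':', means a local path;
 * otherwise the text before the ':' is taken as a host, scp-style.
 */
static bool looks_like_scp_address(const char *addr)
{
	const char *colon = strchr(addr, ':');
	const char *slash = strchr(addr, '/');

	if (!colon)
		return false;
	return !slash || colon < slash;
}
```

Under this rule "file:/test/test" is read as host "file" with path "/test/test", which matches the 'ssh://file/test/test' rewrite observed above. As the scp man page suggests, an explicit relative path such as "./file:/test/test" keeps the address local.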





Thread overview: 20+ messages
2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 01/13] url: move helper function to URL header and source Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 02/13] urlmatch: define url_parse function Matheus Afonso Martins Moreira via GitGitGadget
2024-05-01 22:18   ` Ghanshyam Thakkar
2024-05-02  4:02     ` Torsten Bögershausen
2024-04-28 22:30 ` [PATCH 03/13] builtin: create url-parse command Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 04/13] url-parse: add URL parsing helper function Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 05/13] url-parse: enumerate possible URL components Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 06/13] url-parse: define component extraction helper fn Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 07/13] url-parse: define string to component converter fn Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 08/13] url-parse: define usage and options Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 09/13] url-parse: parse options given on the command line Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 10/13] url-parse: validate all given git URLs Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 11/13] url-parse: output URL components selected by user Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:31 ` [PATCH 12/13] Documentation: describe the url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
2024-04-30  7:37   ` Ghanshyam Thakkar
2024-04-28 22:31 ` [PATCH 13/13] tests: add tests for the new " Matheus Afonso Martins Moreira via GitGitGadget
2024-04-29 20:53 ` [PATCH 00/13] builtin: implement, document and test url-parse Torsten Bögershausen
2024-04-29 22:04   ` Reply to community feedback Matheus Afonso Martins Moreira
2024-04-30  6:51     ` Torsten Bögershausen
