All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 6/8] check_refname_format(): add FULLY_QUALIFIED flag
Date: Tue, 30 Apr 2024 06:01:45 -0400	[thread overview]
Message-ID: <20240430100145.GB1279403@coredump.intra.peff.net> (raw)
In-Reply-To: <ZjB5bHiF5kAcRMpP@tanuki>

On Tue, Apr 30, 2024 at 06:54:04AM +0200, Patrick Steinhardt wrote:

> On Mon, Apr 29, 2024 at 04:33:25AM -0400, Jeff King wrote:
> > Before operating on a refname we get from a user, we usually check that
> > it's syntactically valid. As a general rule, refs should be in the
> > "refs/" namespace, the exception being HEAD and pseudorefs like
> > FETCH_HEAD, etc. Those pseudorefs should consist only of all-caps and
> > dash. But the syntactic rules are not enforced by check_refname_format().
> 
> s/dash/underscore, right?

Yep, thanks.

> > Making things even more complicated, refname_is_safe() does enforce
> > these syntax restrictions! When that function was added in 2014, we
> > would have refused to work with such refs entirely. But we stopped being
> > so picky in 03afcbee9b (read_packed_refs: avoid double-checking sane
> > refs, 2015-04-16). That rationale there is that check_refname_format()
> > is supposed to contain a superset of the checks of refname_is_safe().
> > The idea being that we usually would rely on the more-strict
> > check_refname_format(), but for certain operations (e.g., deleting a
> > ref) we want to allow invalid names as long as they are not unsafe
> > (e.g., not escaping the on-disk "refs/" hierarchy).
> 
> I still think we should eventually merge these functions. It's not
> exactly obvious why one would use one or the other. So if we had a
> function with strict default behaviour, where the caller can ask for
> some loosening of the behaviour via flags, then I think it would become
> a ton easier to do the right thing.
> 
> In any case, that doesn't need to be part of this patch series.

Yeah, I think it would be fine to merge them with a flag. With the one
exception that started this topic (and which I think could go away after
this series), every caller of refname_is_safe() only does so in
conjunction with check_refname_format() to check "well, is it at least
safe?".

You will have to tweak the return value somehow to indicate "ok" versus
"bad but safe" versus "unsafe" (e.g., resolving sets REF_BAD_NAME but
continues for the middle one). I think I'd prefer to leave that out of
this series (and after this, I think it becomes easier as a pure
refactoring since the behavior remains the same).

> > The whole ALLOW_ONELEVEL thing is a long-standing confusion, and
> > unfortunately has made it into the hands of users via "git
> > check-ref-format --allow-onelevel". So I think it is there to stay.
> > Possibly we should expose this new feature as --fully-qualified or
> > similar.
> 
> Hm, that's really too bad. I wonder whether we should eventually start
> to deprecate `--allow-onelevel` in favor of `--fully-qualified`. We
> would continue to accept the flag, but remove it from our documentation
> such that scripts start to move over. Then some day, we may replace
> `ALLOW_ONELEVEL` with something like `ALLOW_ROOT_REF` that allows refs
> in the root directory while honoring `is_pseudoref_syntax()`.

I don't know if we could ever get rid of --allow-onelevel. If you want
to check a branch name, say, the replacement for it is to ask about
"refs/heads/$name". But sometimes you don't actually know how the short
name is going to be used, but you want to make sure it's syntactically
valid. E.g., validating a refspec may involve a name like "main" on its
own. I suspect it would be OK in practice to just give it an arbitrary
"refs/foo/$main", but that feels kind of hacky.

-Peff

> > @@ -288,6 +288,15 @@ static int check_or_sanitize_refname(const char *refname, int flags,
> >  {
> >  	int component_len, component_count = 0;
> >  
> > +	if ((flags & REFNAME_FULLY_QUALIFIED)) {
> > +		const char *bare_ref;
> > +
> > +		parse_worktree_ref(refname, NULL, NULL, &bare_ref);
> > +		if (!starts_with(bare_ref, "refs/") &&
> > +		    !is_pseudoref_syntax(bare_ref))
> > +			return -1;
> > +	}
> > +
> >  	if (!strcmp(refname, "@")) {
> >  		/* Refname is a single character '@'. */
> >  		if (sanitized)
> > @@ -322,8 +331,11 @@ static int check_or_sanitize_refname(const char *refname, int flags,
> >  		else
> >  			return -1;
> >  	}
> > -	if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
> > +
> > +	if (!(flags & (REFNAME_ALLOW_ONELEVEL | REFNAME_FULLY_QUALIFIED)) &&
> > +	    component_count < 2)
> >  		return -1; /* Refname has only one component. */
> > +
> 
> I first thought that we don't have to handle REFNAME_FULLY_QUALIFIED
> here because the above should already handle it. But we can of course
> have a single component, only, when the ref is "refs/".

I hadn't really considered that case. The reason we have to handle
FULLY_QUALIFIED here is that without it, "FETCH_HEAD" (or for that
matter "HEAD") is forbidden as having only a single component. The
earlier hunk only rejects bad things, so we still end up in this code.

I think that "refs/" is forbidden both before and after my patch because
it's invalid to have a zero-length component (so "foo//bar" is
forbidden, but so is "foo/" because of the empty component on the end).a

-Peff

  reply	other threads:[~2024-04-30 10:01 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-29  8:33 [PATCH 6/8] check_refname_format(): add FULLY_QUALIFIED flag Jeff King
2024-04-29 16:01 ` Karthik Nayak
2024-04-30  9:47   ` Jeff King
2024-04-30 10:05     ` Patrick Steinhardt
2024-04-30  4:54 ` Patrick Steinhardt
2024-04-30 10:01   ` Jeff King [this message]
2024-04-30 10:09     ` Patrick Steinhardt
2024-04-30 17:00       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240430100145.GB1279403@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.