On Tue, May 09, 2023 at 04:55:55PM -0400, Taylor Blau wrote: > On Tue, May 09, 2023 at 05:15:43PM +0200, Patrick Steinhardt wrote: > > On Mon, May 08, 2023 at 06:00:08PM -0400, Taylor Blau wrote: > > > > > Note that this optimization is only possible when none of the excluded > > > pattern(s) have special meta-characters in them. To see why this is the > > > case, consider the exclusion pattern "refs/foo[a]". In general, in order > > > to find the location of the first record that matches this pattern, we > > > could only consider up to the first meta-character, "refs/foo". But this > > > doesn't work, since the excluded region we'd come up with would include > > > "refs/foobar", even though it is not excluded. > > > > Is this generally true though? A naive implementation would iterate > > through all references and find the first reference that matches the > > exclusion regular exepression. From thereon we continue to iterate until > > we find the first entry that doesn't match. This may cause us to end up > > with a suboptimal skip list, but the skip list would still be valid. > > > > As I said, this implementation would be naive as we're now forced to > > iterate through all references starting at the beginning. I assume that > > your implementation will instead use a binary search to locate the first > > entry that matches the exclusion pattern and the last pattern. But the > > way this paragraph is formulated makes it sound like this is a general > > fact, even though it is a fact that derives from the implementation. > > > > I of course don't propose to change the algorithm here, but instead to > > clarify where this restriction actually comes from and why the tradeoff > > makes sense. > > In the example you include, it's possible. But consider something like: > > $ git for-each-ref --exclude='refs/foo[ac]' > > The region that matches that expression ("refs/fooa", "refs/fooc" and > everything underneath them) does not have to appear as a continuous > single region in the packed-refs file. If you have, say, "refs/foobar", > that will appear between the two regions you want to exclude. > > So I think you *might* be able to do it in general, but at the very > least it would involve splitting each character class and finding the > start and end of any region(s) that it matches. > > Even so, you'd have to try and match each entry as you determine the > width of the excluded region, at which point you're at par with > enumerating them anyway and having the caller discard any entries it > doesn't want. Alternatively you could also do this on a best-effort basis and only find the first matching region. But anyway, as said: I'm fine with the limitations but think that we should document better where they come from. The current commit message sounds like the limitation is of general nature even though it is in fact a conciously-chosen tradeoff that allows us to make the implementation more efficient for most cases. Patrick