Date | Commit message (Collapse) |
|
We don't need v2 features nor scalability to test POP3 stuff.
|
|
This fixes t/mda.t with git 1.8.5
|
|
Older versions of git lack --batch-all-objects, and 2.6+ is
new enough already since v2, lei, etc all depend on it.
|
|
CentOS 7.x ships with git 1.8.5, so unless a CentOS 7.x user
enables 3rd-party repos[1], they'll be stuck with a version
of git without `--stable' (though I'm becoming skeptical of
indexing patchids at all).
[1] https://public-inbox.org/meta/20210421151308.yz5hzkgm75klunpe@nitro.local/
|
|
Test::More distributed with Perl 5.16.3 on CentOS 7.x expects
the `$how_many' argument for `skip' and warns when it's
uninitialized, so quiet that warning down.
|
|
But new ideas keep popping into muh brain :x
|
|
musl uses "I/O error" while glibc uses "Input/output error"
I wish something like strerrorname_np(3) were portable
and built into Perl so we could just match on /EIO/.
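
To illustrate matching on the symbolic errno name instead of the
libc-specific message text, here is a minimal Python sketch (not the
project's Perl code). `errno_name' is a hypothetical helper; Python
happens to ship the name table that the message wishes Perl had.

```python
import errno
import os

def errno_name(exc):
    """Return the symbolic name (e.g. 'EIO') for an OSError, or None."""
    return errno.errorcode.get(exc.errno)

err = OSError(errno.EIO, os.strerror(errno.EIO))
# The message text varies by libc; the symbolic name does not.
print(errno_name(err), "-", err.strerror)
```

A test can then compare against "EIO" no matter which libc produced
the message.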
|
|
While it's not in a code path intended for WwwCoderepo and RepoAtom
(those classes provide their own ->zflush), this can future-proof
our code against future subclasses at a minor performance cost.
|
|
Our read buffering only worked well with the stdout buffering on
glibc and *BSD libc, but not musl. When reading the stdout of
git(1), we are likely to get smaller buffers and require more
reads on musl-based systems (tested Alpine Linux 3.19.0).
Thus we must prevent ->translate from being called with an empty
argument list (denoting EOF). We'll also avoid some local
variable assignments while at it and favor the non-OO ->zflush
dispatch inside RepoAtom and WwwCoderepo subclasses.
|
|
My user home directory on Alpine has S_ISGID set on it and every
subdirectory inherits it. This includes my work tree and the
t/data-gen/* subdirectories. So just ignore the presence (or
non-presence) of the S_ISGID bit on directories descended from
the cached t/data-gen/* directories.
Now, public-inbox-convert may want to preserve S_ISGID on the
newly-created v2 inbox, but that's a separate discussion.
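
As a rough illustration of ignoring the setgid bit when comparing
directory modes, here is a Python sketch. It is not the test suite's
actual code, and `same_mode_ignoring_setgid' is a made-up name.

```python
import stat

def same_mode_ignoring_setgid(a, b):
    """Compare two st_mode values while masking out S_ISGID."""
    mask = ~stat.S_ISGID
    return (a & mask) == (b & mask)

# A setgid directory (02755) and a plain one (0755) compare equal.
print(same_mode_ignoring_setgid(0o42755, 0o40755))
```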
|
|
And use it in convert-compact.t. This gives us nicer errors for
debugging a problem I noticed on Alpine Linux (tested 3.19.0)
|
|
Somewhat surprising that BSD::Resource hasn't been packaged for
Alpine, but otherwise pretty straightforward mapping with some
dependencies filled in manually.
|
|
This makes the C++ build work on Alpine Linux (tested 3.19.0)
without having to install g++ to get the `c++' executable.
I've tested this change with and without g++ on Alpine so it'll
continue to work if a user decides to install g++.
This should continue to work if the Xapian package on Alpine is
changed to link against libc++ instead of libstdc++, since we
only add `-lstdc++' as a fallback. For reference, Xapian is
already linked against libc++ and not libstdc++ on FreeBSD 13.x
|
|
We don't actually need Inline::C support to build a standalone
executable implemented in C++.
|
|
The musl strftime(3) implementation on Alpine Linux 3.19.0
doesn't support `%k' and `%k' isn't in POSIX, either. So we
fall back to using the `sprintf' perlop in the user-facing UI
since leading zeroes require needless overhead for my eyes and
brain to parse in the time.
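
A sketch of the same fallback in Python: format the hour with a
printf-style width instead of relying on the non-POSIX `%k'. The
function name and the use of UTC are assumptions made for the example.

```python
import time

def hour_min(ts):
    """Format HH:MM with a space-padded hour, as `%k' would."""
    tm = time.gmtime(ts)  # UTC keeps the example deterministic
    return "%2d:%02d" % (tm.tm_hour, tm.tm_min)

print(hour_min(0))
```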
|
|
`lei inspect' uses the `iso8601' sub from LeiOverview.
|
|
BusyBox lsof(1) ignores the `-p PID' argument and shows
the open files for every process it knows about. BusyBox
lsof also lacks the `NODE' column of the non-BusyBox
implementation, so we'll rely on /proc/PID/fd/ in those
cases since the deleted file checks are Linux-only and
it's common to have procfs mounted on /proc on Linux.
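
The procfs approach can be sketched in Python as follows. This is a
Linux-only illustration, not the test helper itself, and both function
names are hypothetical. It relies on the kernel appending " (deleted)"
to the readlink(2) result of an unlinked file.

```python
import os

def deleted_targets(targets):
    """Filter readlink(2) results for files the kernel marks as deleted."""
    suffix = " (deleted)"
    return [t[:-len(suffix)] for t in targets if t.endswith(suffix)]

def open_deleted_files(pid):
    """Linux-only: list deleted-but-open files of `pid' via procfs."""
    fd_dir = "/proc/%d/fd" % pid
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            pass  # the FD was closed while we were scanning
    return deleted_targets(targets)
```

Unlike `lsof -p PID' on BusyBox, this only ever looks at the one
process it is asked about.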
|
|
While join(1) is POSIX, busybox on Alpine 3.19.0 does not
provide its functionality. So just skip tests for now since
it's too much trouble to provide a workaround for an otherwise
common POSIX command.
|
|
Alpine Linux ships git-http-backend in the `git-daemon'
package separately from `git', so we must test for its
existence before attempting to test functionality which
depends on it.
|
|
There are many Linux systems (GNU or otherwise) which do not have
strace(1) installed.
|
|
Our pure-Perl (PublicInbox::AddressPP) fallback is closer to the
preferred Email::Address::XS (EAX) behavior than Mail::Address
is for ->name support. EAX tends to be overkill with good spam
filtering, and using our own fallback means life is easier for
users with neither C/XS build tools nor a pre-built EAX package.
|
|
Post-image blob OIDs are what solver already works with, and
longer OIDs may not be available in historical mail archives.
`patchid' turns out to be unsuitable since:
1) git's default diff algorithm has changed over time
2) users may use different diff options to improve readability
Of course, we could eventually run `lei rediff' during the index
phase to regenerate patchids, but that's out-of-scope for now
and likely to be too expensive.
|
|
This will allow us to use p2q-compatible specifications such as
"dfpost7" to only capture blob OIDs which are 7 characters in
length (the indexer will always index down to 7 characters)
|
|
A quick build check can detect bugs more quickly than normal runtime
tests.
|
|
While chdir simplifies path manipulation on our end, its use
falls over when PERL5LIB/@INC contains relative paths which need
to be made absolute. It's fewer lines of code to eliminate
chdir usage than it is to keep using relative paths in most
places.
|
|
Most xap_terms callers do not benefit from the hashref
return value, and we can delay hashmap use until
List::Util::uniqstr if needed.
|
|
Xapian has always sorted termlist iterators, so we now:
1) break out of the iterator loop early on non-matches
2) avoid doing sorting ourselves
As a result, we'll also favor the wantarray forms of xap_terms
and all_terms to preserve sort order in most cases.
Confirmed by the Xapian maintainer: <20231201184844.GO4059@survex.com>
Link: https://lists.xapian.org/pipermail/xapian-discuss/2023-December/010013.html
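
The early-exit idea can be sketched without Xapian at all: given an
already-sorted list of terms, seek to the prefix and stop at the first
non-match. The Python below is an illustration only. The term values
are made up, and `terms_with_prefix' is not the signature of xap_terms.

```python
import bisect

def terms_with_prefix(sorted_terms, prefix):
    """Collect suffixes of terms starting with `prefix', in sorted order."""
    start = bisect.bisect_left(sorted_terms, prefix)  # like skip_to
    out = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted input: no later term can match
        out.append(term[len(prefix):])
    return out

print(terms_with_prefix(["A1", "Qabc", "Qabd", "Xfoo", "Z"], "Q"))
```

Because the input is sorted, the result is already sorted and needs no
extra sort pass.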
|
|
As of SpamAssassin 4.0.0, spamc(1) corrupts messages with NUL in
the body when the `--headers' switch is used. This increases
transport costs, but most spamc/spamd setups are via local
sockets, so it's unlikely to be significant.
Link: https://bugs.debian.org/1057749
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
|
|
There's no need to recurse and trigger deep recursion warnings
when we hit a coderepo with a known hash (SHA-1 vs SHA-256).
Noticed while pruning the 1200+ repos on a git.kernel.org
mirror.
|
|
Our code aims to respect $ENV{PWD} (and therefore symlinks) as
much as possible to ensure portability across devices when repos
and indices are on portable or shared storage. Thus we can't
rely on Cwd::abs_path and ought to favor File::Spec->rel2abs
whenever absolute paths are required.
I noticed this when working on a VM where my worktree is a
symlink to a more reliable device.
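
A loose Python analogue of preferring File::Spec->rel2abs over
Cwd::abs_path: build the absolute path from $PWD (or a given base)
without resolving symlinks. `abs_keep_symlinks' is a hypothetical
name, and the fallback to os.getcwd() is an assumption of this sketch.

```python
import os

def abs_keep_symlinks(path, base=None):
    """Make `path' absolute relative to `base' without resolving symlinks."""
    if os.path.isabs(path):
        return path
    if base is None:
        # Prefer the logical working directory the user sees
        base = os.environ.get("PWD") or os.getcwd()
    return os.path.join(base, path)

print(abs_keep_symlinks("repo.git", "/mnt/link/work"))
```

By contrast, os.path.realpath (like Cwd::abs_path) would replace the
symlinked component with its target on the underlying device.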
|
|
This future proofs the index against git auto-abbreviation
needing more characters as the repo grows. It'll be useful for
joining against inboxes using dfpre.
As with emails, we'll continue indexing abbreviated blob OIDs
down to 7 hex characters so a SHA-1 git repo will have all
abbreviations of the OID from 7-39 hex characters in addition
to the 40 character unabbreviated form.
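
The set of indexed abbreviations described above can be enumerated
with a short Python sketch. The function name and the example OID are
made up for illustration.

```python
def oid_abbreviations(oid, min_len=7):
    """Return every prefix of `oid' from `min_len' chars to full length."""
    return [oid[:n] for n in range(min_len, len(oid) + 1)]

sha1 = "0123456789abcdef0123456789abcdef01234567"  # 40 hex chars
abbrevs = oid_abbreviations(sha1)
print(len(abbrevs), abbrevs[0], abbrevs[-1])
```

For a 40-character SHA-1 OID this yields 34 entries: the 33
abbreviations of 7 to 39 characters plus the unabbreviated form.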
|
|
Oddly, Perl did not warn about this. Spotted while confirming
abbreviated OIDs are also indexed when unabbreviated OIDs
appear.
|
|
It looks like DragonFly inherited this from FreeBSD to
allow us to save some syscalls.
|
|
I forgot to set TMPDIR=/path/to/non-tmpfs again.
|
|
I mixed up "flush" with "close" :x
Fixes: 87b7f633f241 (xap_helper: implement mset endpoint for WWW, IMAP, etc...)
|
|
By ignoring SIGPIPE, we hit our own error path and emit an informative
error message instead of dying abruptly and requiring somebody to run
`echo $?' to see the child status from their shell.
|
|
Kyle Meyer <kyle@kyleam.com> wrote:
> Eric Wong writes:
> > +Treat the name of the public inbox as it's unqualified URL when
>
> s/it's/its/
Thanks, will push this fix out:
-------8<------
Subject: [PATCH] doc: config: fix grammar for nameIsUrl
Reported-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87bkbazp5g.fsf@kyleam.com/
|
|
As with mail search, a cindex may be updated while WWW is
serving requests. Thus we must reopen the Xapian DB when
the revision we're using becomes stale.
|
|
We no longer vivify the intermediate $ibx->{-hide} hashref,
instead we use $ibx->{-hide_$KEY} directly. This avoids
an intermediate hashref and extra hash table lookups.
|
|
This is a convenient (and slightly memory-saving) alternative to
specifying a `publicinbox.*.url' entry for every single inbox
when using publicinbox.wwwListing.
|
|
For inboxes associated with an extindex (currently only the
special "all" one), we can share the git process across
all those inboxes unambiguously when retrieving full SHA-1
blobs.
The comment for my proposed patch is also out-of-date as that
git speedup has been a part of git since 2.33.
|
|
We no longer trigger git cleanups from the Inbox package since
`git cat-file' users have their own cleanup to support git
coderepos not associated with any inbox.
This change means we unconditionally expire SQLite and Xapian
FDs and some internal caches regardless of git activity. The
old logic was irrelevant to Gcf2 (libgit2) users anyways since
we couldn't determine whether or not an inbox was active based
on {inflight} git requests, and upcoming changes will make it
inaccurate for all extindex/cindex users as well.
Opening SQLite and Xapian DBs is fairly cheap; so it's a small
price to pay to reduce memory use and fragmentation.
|
|
This brings a no-op -cindex scan of a git.kernel.org mirror
down from 70s to 10s with a hot cache on a busy machine.
CPU-intensive SHA-256 fingerprinting of the `git show-ref'
result can be parallelized on shard workers. Future changes can
move more of the initial scan setup phase into shard workers for
more parallelism.
But most of the performance for skipping unchanged repos is
gained from delaying the commit time reading until we've seen
the fingerprint is out-of-date, since reading commit times
requires a large amount of I/O compared to only reading refs
for fingerprints.
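
The skip-unchanged-repos logic can be sketched in Python as below. It
assumes the fingerprint is simply the SHA-256 of the raw `git show-ref'
output. The function names, and how the previous fingerprint is stored,
are hypothetical.

```python
import hashlib

def refs_fingerprint(show_ref_output):
    """Hash the raw bytes of `git show-ref' output."""
    return hashlib.sha256(show_ref_output).hexdigest()

def needs_reindex(show_ref_output, stored_fingerprint):
    """Only a changed fingerprint justifies the costly commit-time reads."""
    return refs_fingerprint(show_ref_output) != stored_fingerprint

refs = b"0123456789abcdef0123456789abcdef01234567 refs/heads/master\n"
fp = refs_fingerprint(refs)
print(needs_reindex(refs, fp))
```

Reading refs is cheap compared to walking commits, so the expensive
work is deferred until the comparison fails.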
|
|
When setting up stdin for commands, the write_file API is
convenient enough nowadays to not be worth having special
support with process spawning.
When reading stdout of commands, we should probably be using
utf8_maybe everywhere since there'll always be legacy encodings
in git repos.
Reading regular files with :utf8 also results in worse memory
management since the file size cannot be used as a hint.
|
|
We no longer fork after cidx_init, so there's no need to spend
CPU cycles on the getpid() syscall, especially since it's no
longer cached on glibc while syscalls are also more expensive
these days due to CPU vulnerability mitigations.
|
|
It saves some code in case we keep libgit2 around.
|
|
This will allow WWW to use a combined LeiALE-like
thing to reduce git processes.
|
|
This fixes the case where we're running both SHA-256 and SHA-1.
There are no tests for SHA-256 yet, but the bug is pretty obvious
upon reading the code.
|
|
We only use it as a boolean flag, and there's no need to waste
space for common, non-error cases.
|
|
Explicitly drop support for "\n" in git coderepo pathnames as
we do other stuff. Gcf2 (our libgit2 helper) was always
broken with "\n" in pathnames, and I'm not sure if cgit config
files work with them, either. Dealing with newline characters
requires extra complexity that I'm not willing to deal with when
managing alternates files.
|