Date | Commit message |
|
getpid() isn't cached by glibc nowadays and system calls are
more expensive due to CPU vulnerability mitigations. To
ensure we switch to the new semantics properly, introduce
a new `on_destroy' function to simplify callers.
Furthermore, OnDestroy correctness is often tied to the
process which creates it, so make the new API default to
being guarded against running in subprocesses.
For cases which require running in all children, a new
PublicInbox::OnDestroy::all call is provided.
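The PID guard described above can be sketched roughly as follows.
This is a Python illustration of the idea only, not the project's
Perl code; the class layout, the `fire' method and the
`all_processes' flag are hypothetical names invented for the sketch.

```python
import os


class OnDestroy:
    """Run a callback at cleanup time, by default only in the creating process."""

    def __init__(self, callback, *args, all_processes=False):
        self.callback = callback
        self.args = args
        # Look up the PID once at creation instead of on every check.
        # None means "no guard": run in whichever process fires it.
        self.owner_pid = None if all_processes else os.getpid()

    def fire(self):
        """Run the callback at most once; return True if it actually ran."""
        cb, self.callback = self.callback, None
        if cb is None:
            return False  # already fired
        if self.owner_pid is not None and self.owner_pid != os.getpid():
            return False  # we are in a forked child: skip the cleanup
        cb(*self.args)
        return True

    def __del__(self):
        # Rough analogue of Perl's DESTROY: fire when the object goes away.
        self.fire()
```

A guarded instance forked into a child does nothing there when it
is destroyed, while one created with all_processes=True runs in
every process that destroys it.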
|
|
We can rely on Process::IO->attached_pid and work towards
simplifying popen_rd.
|
|
This ensures we handle RNG reseeding and resetting the event
loop properly in child processes after forking.
|
|
remove_tree from File::Path 2.09 (from Perl 5.16.3 on CentOS 7.x)
doesn't seem to work properly on File::Temp objects. Since
File::Temp->newdir sets CLEANUP=>1 by default anyways, we'll
just rely on that to perform cleanup instead of doing it ourselves.
|
|
xcpdb is necessary for upgrading Xapian backends (e.g. glass to
honey), thus codesearch indices (cindex) must be supported.
Resharding is also useful if CPU count is altered on system
upgrades or downgrades.
cindex Xapian sharding is completely different than anything
else we do, so the resharding operation must be a special case
based on existing cindex sharding rules.
|
|
This is helpful if compacting multiple
inboxes/extindices/cindices sequentially from the CLI.
|
|
This is much easier to support than xcpdb since it's 1:1 and
doesn't follow a different sharding scheme than the inboxes and
extindices.
|
|
This likely fixes indexlevel preservation for some v2 inboxes
on some systems, too, since (apparently) we need to sort shards
numerically to get Xapian metadata working properly on a
combined (multi-shard) Xapian DB.
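To illustrate why the sort order matters, here is a small Python
sketch (not the project's Perl code). The directory entries are
made up, and `sort_shards' is a hypothetical helper name.

```python
import re


def sort_shards(names):
    """Return the all-digit entries ordered by integer value."""
    shards = [n for n in names if re.fullmatch(r"[0-9]+", n)]
    return sorted(shards, key=int)


names = ["10", "9", "0", "1", "2", "11", "over.sqlite3"]
# A plain string sort puts shard 10 before shard 2:
print(sorted(n for n in names if n[0].isdigit()))
# ['0', '1', '10', '11', '2', '9']
print(sort_shards(names))
# ['0', '1', '2', '9', '10', '11']
```

With ten or more shards, a string sort no longer puts shard 0
first, followed by 1, 2 and so on.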
|
|
This lets us get rid of some awkwardness around the old API
and single-use subroutines while saving us some LoC.
|
|
btrfs is Linux-only at the moment (and likely to remain that way
for practical purposes). So rely on Linux ABI stability and use
the `syscall' and `ioctl' perlops rather than relying on Inline::C.
Inline::C (and gcc||clang) are monstrous dependencies which we
can't expect users to have.
This makes supporting new architectures more difficult, but new
architectures come along rarely and this reduces the burden for
the majority of Linux users on popular architectures (while
still avoiding the distribution of pre-built binaries).
Link: https://public-inbox.org/meta/YbCPWGaJEkV6eWfo@codewreck.org/
|
|
It's possible for the rename() sequence to cause read-only
daemons using ->xdb_shards_flat to load an incomplete set of
contiguous shards and get invalid docids for search results.
With this change, we favor the case where search is momentarily
unavailable rather than giving wrong results during the small
window where Xapcmd->commit_changes runs.
|
|
"Correct" meaning the permissions match that of the parent
xap15 or ei15 directory.
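One way to get a new directory to match its parent is sketched
below in Python. This is an assumption about the general technique,
not a copy of what the commit does; `mkdir_like_parent' and the
directory names are hypothetical.

```python
import os
import stat
import tempfile


def mkdir_like_parent(path):
    """Create `path' and give it the same permission bits as its parent."""
    parent = os.path.dirname(os.path.abspath(path))
    mode = stat.S_IMODE(os.stat(parent).st_mode)
    os.mkdir(path)
    # mkdir(2) is subject to the umask; an explicit chmod(2) is not.
    os.chmod(path, mode)
    return mode


with tempfile.TemporaryDirectory() as root:
    os.chmod(root, 0o750)
    new_dir = os.path.join(root, "shard-tmp")
    mkdir_like_parent(new_dir)
    print(oct(stat.S_IMODE(os.stat(new_dir).st_mode)))
```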
|
|
This fixes the occasional t/lei-sigpipe.t infinite loop
under "make check-run".
Link: http://nntp.perl.org/group/perl.perl5.porters/258784
<CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com>
Followup-to: b552bb9150775fe4 ("daemon+watch: fix localization of %SIG for non-signalfd users")
|
|
None of our code elsewhere accounts for non-*nix pathnames and
it's not worth our time to start. So stop wasting CPU cycles
giving the illusion that we'd care about non-*nix pathnames.
|
|
Xapian bindings may not be installed or may be out-of-date w.r.t.
the Perl version; improve the visibility of errors in those cases.
Clean up and drop some redundant checks while we're at it.
Cc: "Toke Høiland-Jørgensen" <toke@toke.dk>
Link: https://public-inbox.org/meta/87k0ky5mbd.fsf@toke.dk/
|
|
Since extindex uses Xapian shards in a similar way to
v2 inboxes, we'll support -xcpdb (reshard+upgrade) and
-compact all the same to give admins tuning+upgrade
options.
|
|
File::Temp only requires four 'X' characters (unlike mkstemp(3),
which requires six). So only give it 4 to avoid an
80-column violation and maybe save metadata space on FSes.
|
|
Make some notes about sub usage; this may be converted
to use workqueues once the cmsg dependency is dropped.
|
|
This prevents name conflicts leading to retries and slowdowns in
temporary file name generation. No actual data corruption
resulted because all temporary files are opened with O_EXCL
anyways.
This may increase security for IMAP, NNTP, and HTTPS sessions
using TLS, but it's all public data anyways.
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
This simplifies all ->with_umask callers and opens the
door for further optimizations to delay/elide process spawning.
|
|
Perl readdir detects list context and can return an array
suitable for the grep op. From there, we can rely on
substr to remove the ".git" suffix and integerize the value
to save a few bytes before letting List::Util::max return
the value.
This is how we detect Xapian shards nowadays, too, and
we'll also use defined-or (//) to simplify the return
value there.
We'll also simplify InboxWritable->git_dir_latest,
remove some callers, and consider removing it entirely.
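The same list-filter-and-max approach looks roughly like this in
Python. It is only an analogue of the Perl idiom described above;
`max_epoch' and the sample directory names are hypothetical.

```python
import os
import re
import tempfile

EPOCH_RE = re.compile(r"([0-9]+)\.git")


def max_epoch(git_root):
    """Return the highest N among N.git entries, or None if there are none."""
    matches = map(EPOCH_RE.fullmatch, os.listdir(git_root))
    epochs = [int(m.group(1)) for m in matches if m]
    return max(epochs, default=None)


with tempfile.TemporaryDirectory() as root:
    for name in ("0.git", "2.git", "10.git", "notes.git", "3.git.tmp"):
        os.mkdir(os.path.join(root, name))
    print(max_epoch(root))  # 10
```

Converting to integers before taking the maximum is what keeps
10.git ahead of 2.git.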
|
|
We'll be using per-sync-state {ibx} refs instead, so make parts
of the v2 indexing code less-dependent on $self->{ibx} where
$self is a V2Writable object.
|
|
`->connect' is confused with the perlfunc for the `connect(2)'
syscall, and also `DBI->connect'. Since SQLite doesn't use
sockets, the word "connect" needlessly confuses me. Give
it a short name to match the field name we use for it, which
also matches the variable name used by the DBI(3pm) and
DBD::SQLite(3pm) manpages.
|
|
No need to localize it, here, since we can just refer to it
in the `$opt' hashref. Hopefully this improves readability
for others like it does for me.
I sometimes wonder if the concept of a stack in high-level
languages is even necessary...
|
|
--sequential-shard also disables the copy parallelism (--jobs),
so it can be useful for systems which are unable to handle
parallel random I/O but still want many shards.
There was a missing "use strict", too, which is fixed.
|
|
In case there are unbalanced shards AND we're limiting parallelism
while using many shards, spawn the next task in the queue ASAP
once a task is done, instead of waiting for all tasks to finish
before spawning the next batch.
Unbalanced shards probably aren't a big issue for most users;
however, many smaller shards with few jobs can be useful for HDD
users to reduce the effect of random writes.
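The scheduling change can be sketched as below: keep up to `jobs'
children running and start the next queued task as soon as any
child exits. This is a Python illustration assuming a POSIX system
and Python 3.9+, not the project's code; the `process_queue' name
is only borrowed for readability.

```python
import os
import sys


def process_queue(commands, jobs):
    """Run argv lists with at most `jobs' children; return {pid: exit code}."""
    jobs = max(1, jobs)
    pending = list(commands)
    running = set()
    results = {}
    while pending or running:
        # Top up to the limit before blocking, so a free slot is
        # refilled right away instead of after the whole batch ends.
        while pending and len(running) < jobs:
            argv = pending.pop(0)
            running.add(os.posix_spawn(argv[0], argv, os.environ))
        pid, status = os.wait()  # blocks until any one child exits
        if pid in running:
            running.remove(pid)
            results[pid] = os.waitstatus_to_exitcode(status)
        else:
            print(f"W: reaped unknown PID {pid}", file=sys.stderr)
    return results
```

With one slow task and several fast ones, the fast ones no longer
wait behind the slow one just because they landed in a later batch.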
|
|
We don't need to fully-qualify when referring to subs in
the same namespace, nor do we need to make a SCALAR ref only
to dereference it.
(Yes, still learning Perl :x)
|
|
-index now invokes ->DESTROY like xcpdb does, which is necessary
to clean up $INBOX_DIR/msgmap-XXXXXXX files. We'll also exit
with the expected values for various signals by adding 128
as described in <https://www.tldp.org/LDP/abs/html/exitcodes.html>.
-xcpdb now terminates worker processes and xapian-compact(1)
invocations when prematurely killed, too.
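The 128+signal convention, combined with a normal exit path so
that cleanup still runs, might look like this in Python. This is
a sketch only; the set of signals handled here is an assumption,
and both function names are hypothetical.

```python
import signal
import sys


def exit_status_for(signum):
    """Shell convention: a process killed by signal N reports 128 + N."""
    return 128 + signum


def install_exit_handlers(signums=(signal.SIGINT, signal.SIGTERM,
                                   signal.SIGHUP, signal.SIGPIPE)):
    """Turn fatal signals into an ordinary exit so cleanup code can run."""
    def handler(signum, frame):
        # sys.exit raises SystemExit, which unwinds the stack: `finally'
        # blocks and context managers get a chance to remove temp files.
        sys.exit(exit_status_for(signum))

    for signum in signums:
        signal.signal(signum, handler)
```

For example, SIGTERM (15) becomes exit status 143 and SIGINT (2)
becomes 130.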
|
|
fileno(DIRHANDLE) only works on Perl 5.22+, so we need to use
dirfd(3) ourselves from Inline::C (or rely on chattr(1) being
installed).
While we're at it, rename `set_nodatacow' to `nodatacow_fd'
for consistency with `nodatacow_dir'.
|
|
We'll continue supporting `--no-sync' even though it has yet to
make it into a release, but the term `sync' is overloaded in our
codebase which may be confusing to new hackers and users.
None of our code nor dependencies issue the sync(2) syscall,
either, only fsync(2) and fdatasync(2).
|
|
We replaced Xtmpdir with File::Temp->newdir in
commit 2a3e3a0469f54f6a4f80bf04614e5ddd794a6c5e
("xapcmd: replace Xtmpdirs with File::Temp->newdir")
but forgot to remove the outdated comment.
|
|
We already "use" it starting with commit
cd8dd7b08fddc7c2b5f218c3fcaa5dca5f9ad945
("search: support SWIG-generated Xapian.pm"),
so there's no need to require it redundantly.
|
|
I find myself mindlessly adding "-c" to public-inbox-index,
and other users may do the same. Instead of erroring out,
we'll just silently ignore it for now and allow
public-inbox-compact to work on SQLite-only inboxes.
We'll only check for xapian-compact if search exists, since
it won't be needed in case we support SQLite VACUUM.
|
|
This gives an opportunity for users already suffering from CoW
fragmentation to at least get the Xapian DBs off CoW. Aside
from over.sqlite3 in v1, the SQLite DBs remain untouched; though
VACUUM support may come in the future.
|
|
And -compact supports --jobs=0 like -index to disable parallel
execution. Running three xapian-compact processes in parallel
on a USB 2.0 HDD is pretty painful.
|
|
This allows us to speed up indexing operations to SQLite
and Xapian.
Unfortunately, it doesn't affect operations using
`xapian-compact' and the compactor API, since that doesn't seem
to support Xapian::DB_NO_SYNC, yet.
|
|
This was a bug. I'm not sure where it matters yet, but it
may matter in the future.
|
|
While it makes the code flow slightly less well in some places,
it saves us runtime allocations and indentation.
|
|
We must not trigger wakeups on InboxIdle users until after we've
renamed all files into place. Otherwise, the InboxIdle caller
may just reopen the old (soon-to-be-unlinked) file.
This fixes occasional test failures in t/nntpd.t
Fixes: f977826a17f8735e ("lock: reduce inotify wakeups")
|
|
We can reduce the amount of platform-specific code by always
relying on IN_MODIFY/NOTE_WRITE notifications from lock release.
This reduces the number of times our read-only daemons will
need to wake up when -watch sees no-op message changes
(e.g. replied, seen, recent flag changes).
|
|
Otherwise, the waitpid(-1, 0) call in Xapcmd::process_queue()
may reap it in a subsequent test when using t/run.perl to reuse
processes for testing.
While we're at it, make Xapcmd::process_queue warn about unknown
PIDs in case other PIDs leak through to us in the future.
|
|
It's more convenient to specify `-c' / `--compact' on the
command-line when reindexing than it is to invoke
public-inbox-compact(1) separately.
This is especially convenient in low-space situations when
public-inbox-index is operating on multiple inboxes
sequentially, as compaction can happen immediately after
indexing each inbox, instead of waiting until all inboxes are
indexed.
|
|
I didn't wait until September to do it, this year!
|
|
The old lock scope was only sufficient for protecting against
concurrent modifications from the common -mda, -watch, or -learn
writers.
It was not sufficient for protecting against parallel -compact
or -xcpdb invocations from eager admins. Most of the time this
only leads to confusing and misleading warning messages, but
parallel xcpdb --reshard could lead to errors.
|
|
This allows us to simplify version checking by avoiding
"//" or "||" operators sprinkled around.
|
|
The public-inbox-compact wrapper displays progress by default,
anyways, and there's not a lot of output, so simplify our
code by using popen_rd instead of spawn + optional pipe.
While we're at it, use "while (<HANDLE>)" to display
progress as it happens, since "foreach (<$HANDLE>)"
slurps the contents into an array, first.
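The difference between streaming and slurping a pipe can be shown
with a Python analogue (not the Perl code in question). The
`show_progress' function and its `emit' parameter are hypothetical
names.

```python
import subprocess
import sys


def show_progress(argv, emit=sys.stdout.write):
    """Pass a child's stdout to `emit' line by line; return its exit code."""
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        # Iterating over the pipe hands us each line as soon as it is
        # read, like Perl's while (<HANDLE>).  proc.stdout.readlines()
        # would block until EOF and build the whole list first, like
        # foreach (<$HANDLE>).
        for line in proc.stdout:
            emit(line)
    return proc.returncode
```

Note that the child may still buffer its own output when writing
to a pipe; streaming on the reading side only removes the delay
added by the reader.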
|
|
There's a bunch of leftover "require" and "use" statements we no
longer need and can get rid of, along with some excessive
imports via "use".
IO::Handle usage isn't always obvious, so add comments
describing why a package loads it. Along the same lines,
document the tmpdir support as the reason we depend on
File::Temp 0.19, even though every Perl 5.10.1+ user has it.
While we're at it, favor "use" over "require", since it gives
us extra compile-time checking.
|
|
We can save callers the trouble of {-hold} and {-dev_null}
refs as well as the trouble of calling fileno().
|
|
Xapian upstream is slowly phasing out the XS-based Search::Xapian
in favor of the SWIG-generated "Xapian" package. While both Debian
and FreeBSD have Search::Xapian, OpenBSD only includes the "Xapian"
binding.
More information about the status of the "Xapian" Perl module here:
https://trac.xapian.org/ticket/523
|