path: root/lib/PublicInbox/Xapcmd.pm
Date Commit message
2024-04-03 treewide: avoid getpid() for OnDestroy checks
getpid() isn't cached by glibc nowadays and system calls are more expensive due to CPU vulnerability mitigations. To ensure we switch to the new semantics properly, introduce a new `on_destroy' function to simplify callers. Furthermore, most OnDestroy correctness is often tied to the process which creates it, so make the new API default to being guarded against running in subprocesses. For cases which require running in all children, a new PublicInbox::OnDestroy::all call is provided.
2023-11-09 xapcmd: get rid of scalar wantarray popen_rd
We can rely on Process::IO->attached_pid and work towards simplifying popen_rd.
2023-10-18 ds: introduce and use do_fork helper
This ensures we handle RNG reseeding and resetting the event loop properly in child processes after forking.
2023-06-09 xapcmd: rely on File::Temp cleanup for temporary dir
remove_tree from File::Path 2.09 (from Perl 5.16.3 on CentOS 7.x) doesn't seem to work properly on File::Temp objects. Since File::Temp->newdir sets CLEANUP=>1 by default anyways, we'll just rely on that to perform cleanup instead of doing it ourselves.
2023-05-04 xcpdb: support cindex upgrades and resharding
xcpdb is necessary for upgrading Xapian backends (e.g. glass to honey), thus codesearch indices (cindex) must be supported. Resharding is also useful if CPU count is altered on system upgrades or downgrades. cindex Xapian sharding is completely different than anything else we do, so the resharding operation must be a special case based on existing cindex sharding rules.
2023-05-04 compact+xcpdb: ux: include basename(*dir) in progress
This is helpful if compacting multiple inboxes/extindices/cindices sequentially from the CLI.
2023-05-03 compact: support codesearch indices
This is much easier to support than xcpdb since it's 1:1 and doesn't follow a different sharding scheme than the inboxes and extindices.
2023-04-26 xcpdb: preserve indexlevel for extindex
This likely fixes indexlevel preservation for some v2 on some systems, too, since (apparently) we need to sort shards numerically to get Xapian metadata working properly on a combined (multi-shard) Xapian DB.
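The numeric-versus-lexical ordering issue mentioned above can be illustrated with a minimal sketch. It is written in Python rather than the project's Perl, and `sort_shards` is a hypothetical helper, not a function from the codebase; it only assumes that shard directories are named by their integer shard number.

```python
import re

def sort_shards(names):
    """Return shard directory names ordered by numeric value.

    Entries that are not purely numeric (e.g. SQLite files) are skipped.
    """
    return sorted((n for n in names if re.fullmatch(r"[0-9]+", n)), key=int)

shards = ["0", "1", "10", "2", "11", "3"]
print(sorted(shards))       # lexical order: ['0', '1', '10', '11', '2', '3']
print(sort_shards(shards))  # numeric order: ['0', '1', '2', '3', '10', '11']
```

With ten or more shards, a plain string sort puts "10" before "2", which is the kind of misordering the commit describes.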
2023-04-07 umask: rely on the OnDestroy-based call where applicable
This lets us get rid of some awkwardness around the old API and single-use subroutines while saving us some LoC.
2022-01-31 rewrite Linux nodatacow use in pure Perl w/o system
btrfs is Linux-only at the moment (and likely to remain that way for practical purposes). So rely on Linux ABI stability and use the `syscall' and `ioctl' perlops rather than relying on Inline::C. Inline::C (and gcc||clang) are monstrous dependencies which we can't expect users to have. This makes supporting new architectures more difficult, but new architectures come along rarely and this reduces the burden for the majority of Linux users on popular architectures (while still avoiding the distribution of pre-built binaries). Link: https://public-inbox.org/meta/YbCPWGaJEkV6eWfo@codewreck.org/
2021-09-23 xcpdb: avoid race when shards are added
It's possible for the rename() sequence to cause read-only daemons using ->xdb_shards_flat to load an incomplete set of contiguous shards and get invalid docids for search results. With this change, we favor the case where search is momentarily unavailable rather than giving wrong results during the small window where Xapcmd->commit_changes runs.
2021-09-23 xcpdb: -R$SHARDS creates new shards with correct perms
"Correct" meaning the permissions match that of the parent xap15 or ei15 directory.
2021-09-22 treewide: fix %SIG localization, harder
This fixes the occasional t/lei-sigpipe.t infinite loop under "make check-run". Link: http://nntp.perl.org/group/perl.perl5.porters/258784 <CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com> Followup-to: b552bb9150775fe4 ("daemon+watch: fix localization of %SIG for non-signalfd users")
2021-08-11 treewide: use *nix-specific dirname regexps
None of our code elsewhere accounts for non-*nix pathnames and it's not worth our time to start. So stop wasting CPU cycles giving the illusion that we'd care about non-*nix pathnames.
2021-08-08 searchidx: die on Xapian load errors
Xapian bindings may not be installed or may be out-of-date w.r.t. the Perl version; improve the visibility of errors in those cases. Clean up and drop some redundant checks while we're at it. Cc: "Toke Høiland-Jørgensen" <toke@toke.dk> Link: https://public-inbox.org/meta/87k0ky5mbd.fsf@toke.dk/
2021-07-31 extindex: -xcpdb and -compact support
Since extindex uses Xapian shards in a similar way to v2 inboxes, we'll support -xcpdb (reshard+upgrade) and -compact all the same to give admins tuning+upgrade options.
2021-03-28 treewide: shorten temporary filename
File::Temp only requires four 'X' characters (unlike mkstemp(3), which requires six). So only give it 4 to avoid an 80-column violation and maybe save metadata space on FSes.
2021-02-07 xapcmd: avoid potential die surprise in children
Make some notes about sub usage; this may be converted to use workqueues once the cmsg dependency is dropped.
2021-01-24 treewide: reseed RNG in child processes
This prevents name conflicts leading to retries and slowdowns in temporary file name generation. No actual data corruption resulted because all temporary files are opened with O_EXCL anyways. This may increase security for IMAP, NNTP, and HTTPS sessions using TLS, but it's all public data anyways.
2021-01-01 update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-25 inboxwritable: delay umask_prepare calls
This simplifies all ->with_umask callers and opens the door for further optimizations to delay/elide process spawning.
2020-12-17 inbox: simplify v2 epoch counting
Perl readdir detects list context and can return an array suitable for the grep op. From there, we can rely on substr to remove the ".git" suffix and integerize the value to save a few bytes before letting List::Util::max return the value. This is how we detect Xapian shards nowadays, too, and we'll also use defined-or (//) to simplify the return value there. We'll also simplify InboxWritable->git_dir_latest, remove some callers, and consider removing it entirely.
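The epoch-detection steps described above (list the directory, keep entries that look like `N.git`, strip the suffix, convert to an integer, take the maximum) can be sketched as follows. This is a Python illustration of the idea, not the Perl code from the commit; `max_epoch` and its `None` return value for an empty directory are assumptions made for the example.

```python
import os
import re

def max_epoch(git_dir):
    """Return the highest N among entries named N.git, or None if there are none."""
    epochs = [int(name[:-4])                      # drop the ".git" suffix
              for name in os.listdir(git_dir)
              if re.fullmatch(r"[0-9]+\.git", name)]
    return max(epochs, default=None)
```

Returning `None` when nothing matches plays the role that defined-or (//) plays in the Perl version: the caller decides what an empty result means.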
2020-11-07 v2: some changes for ExtSearchIdx compatibility
We'll be using per-sync-state {ibx} refs instead, so make parts of the v2 indexing code less-dependent on $self->{ibx} where $self is a V2Writable object.
2020-08-27 over: rename ->connect method to ->dbh
`->connect' is confused with the perlfunc for the `connect(2)' syscall, and also `DBI->connect'. Since SQLite doesn't use sockets, the word "connect" needlessly confuses me. Give it a short name to match the field name we use for it, which also matches the variable name used by the DBI(3pm) and DBD::SQLite(3pm) manpages.
2020-08-20 xapcmd: simplify {reindex} parameter passing
No need to localize it, here, since we can just refer to it in the `$opt' hashref. Hopefully this improves readability for others like it does for me. I sometimes wonder if the concept of a stack in high-level languages is even necessary...
2020-08-13 xcpdb: wire up new index options and --help
--sequential-shard also disables the copy parallelism (--jobs), so it can be useful for systems unable to handle parallel random I/O but still want many shards. There was a missing "use strict", too, which is fixed.
2020-08-13 xapcmd: reduce CPU idling when shards exceeds job count
In case there are unbalanced shards AND we're limiting parallelism while using many shards, spawn the next task in the queue ASAP once a task is done, instead of waiting for all tasks to finish before spawning the next batch. Unbalanced shards probably aren't a big issue for most users; however, many smaller shards with few jobs can be useful for HDD users to reduce the effect of random writes.
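The difference between waiting for a whole batch and spawning the next task as soon as a slot frees up can be shown with a small scheduling model. This is a Python sketch under stated assumptions, not the process-handling code from Xapcmd.pm: the function names and the per-shard durations are hypothetical, and tasks are modelled as plain numbers rather than child processes.

```python
import heapq

def batch_makespan(durations, jobs):
    """Total time when each batch of `jobs` tasks must finish before the next starts."""
    return sum(max(durations[i:i + jobs]) for i in range(0, len(durations), jobs))

def asap_makespan(durations, jobs):
    """Total time when a queued task starts as soon as any running task finishes."""
    finish = []                      # min-heap of finish times of running tasks
    for d in durations:
        # Reuse the slot that frees up first once all `jobs` slots are busy.
        start = heapq.heappop(finish) if len(finish) >= jobs else 0
        heapq.heappush(finish, start + d)
    return max(finish, default=0)

shard_times = [5, 1, 1, 1]             # hypothetical costs, one unbalanced shard
print(batch_makespan(shard_times, 2))  # 6
print(asap_makespan(shard_times, 2))   # 5
```

In the batch model the second job slot sits idle while the large shard finishes; in the ASAP model it works through the small shards in the meantime.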
2020-08-13 xapcmd: simplify sub reference
We don't need to fully-qualify when referring to subs in the same namespace, nor do we need to make a SCALAR ref only to dereference it. (Yes, still learning Perl :x)
2020-08-10 index+xcpdb: improve SIG{INT,TERM,HUP,PIPE} behavior
-index now invokes ->DESTROY like xcpdb does, which is necessary to clean up $INBOX_DIR/msgmap-XXXXXXX files. We'll also exit with the expected values for various signals by adding 128 as described in <https://www.tldp.org/LDP/abs/html/exitcodes.html>. -xcpdb now terminates worker processes and xapian-compact(1) invocations when prematurely killed, too.
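The "add 128" convention mentioned above is simple arithmetic on the signal number. A minimal Python sketch, assuming a POSIX platform where these four signals exist; `exit_status_for` is a hypothetical name used only for illustration:

```python
import signal

def exit_status_for(signum):
    """Conventional shell exit status for a process terminated by `signum`."""
    return 128 + int(signum)

# The four signals named in the commit subject (POSIX only).
for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGPIPE):
    print(sig.name, exit_status_for(sig))
```

SIGINT is signal 2, so a process interrupted with Ctrl-C conventionally reports exit status 130.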
2020-08-08 support setting No_COW on Perl <5.22
fileno(DIRHANDLE) only works on Perl 5.22+, so we need to use dirfd(3) ourselves from Inline::C (or rely on chattr(1) being installed). While we're at it, rename `set_nodatacow' to `nodatacow_fd' for consistency with `nodatacow_dir'.
2020-08-07 index+xcpdb: rename `--no-sync' to `--no-fsync'
We'll continue supporting `--no-sync' even though it has yet to make it into a release, but the term `sync' is overloaded in our codebase, which may be confusing to new hackers and users. None of our code nor dependencies issue the sync(2) syscall, either, only fsync(2) and fdatasync(2).
2020-08-07 xapcmd: drop outdated comment
We replaced Xtmpdir with File::Temp->newdir in commit 2a3e3a0469f54f6a4f80bf04614e5ddd794a6c5e ("xapcmd: replace Xtmpdirs with File::Temp->newdir") but forgot to remove the outdated comment.
2020-08-07 xapcmd: remove redundant searchidx require
We already "use" it starting with commit cd8dd7b08fddc7c2b5f218c3fcaa5dca5f9ad945 ("search: support SWIG-generated Xapian.pm"), so there's no need to require it redundantly.
2020-08-07 xapcmd: quietly no-op on indexlevel=basic
I find myself mindlessly adding "-c" to public-inbox-index, and other users may do the same. Instead of erroring out, we'll just silently ignore it for now and allow public-inbox-compact to work on SQLite-only inboxes. We'll only check for xapian-compact if search exists, since it won't be needed in case we support SQLite VACUUM.
2020-07-29 xapcmd: -xcpdb and -compact disable CoW, too
This gives an opportunity for users already suffering from CoW fragmentation to at least get the Xapian DBs off CoW. Aside from over.sqlite3 in v1, the SQLite DBs remain untouched; though VACUUM support may come in the future.
2020-07-26 index: --compact respects --jobs
And -compact supports --jobs=0 like -index to disable parallel execution. Running three xapian-compact processes in parallel on a USB 2.0 HDD is pretty painful.
2020-07-25 index+xcpdb: support --no-sync flag
This allows us to speed up indexing operations to SQLite and Xapian. Unfortunately, it doesn't affect operations using `xapian-compact' and the compactor API, since that doesn't seem to support Xapian::DB_NO_SYNC, yet.
2020-07-25 xapcmd: set {from} properly for v1 inboxes
This was a bug. I'm not sure where it matters yet, but it may matter in the future.
2020-07-17 with_umask: pass args to callback
While it makes the code flow slightly less well in some places, it saves us runtime allocations and indentation.
2020-07-14 xapcmd: delay over->check_inodes trigger
We must not trigger wakeups on InboxIdle users until after we've renamed all files into place. Otherwise, the InboxIdle caller may just reopen the old (soon-to-be-unlinked) file. This fixes occasional test failures in t/nntpd.t. Fixes: f977826a17f8735e ("lock: reduce inotify wakeups")
2020-06-25 lock: reduce inotify wakeups
We can reduce the amount of platform-specific code by always relying on IN_MODIFY/NOTE_WRITE notifications from lock release. This reduces the number of times our read-only daemons will need to wake up when -watch sees no-op message changes (e.g. replied, seen, recent flag changes).
2020-04-15 testcommon: DESTROY: wait for killed daemon
Otherwise, the waitpid(-1, 0) call in Xapcmd::process_queue() may reap it in a subsequent test when using t/run.perl to reuse processes for testing. While we're at it, make Xapcmd::process_queue warn about unknown PIDs in case other PIDs leak through to us in the future.
2020-03-29 index: support --compact / -c on command-line
It's more convenient to specify `-c' / `--compact' on the command-line when reindexing than it is to invoke public-inbox-compact(1) separately. This is especially convenient in low-space situations when public-inbox-index is operating on multiple inboxes sequentially, as compaction can happen immediately after indexing each inbox, instead of waiting until all inboxes are indexed.
2020-02-06 treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-27 xapcmd: increase scope of lock
The old lock scope was only sufficient for protecting against concurrent modifications from the common -mda, -watch, or -learn writers. It was not sufficient for protecting against parallel -compact or -xcpdb invocations from eager admins. Most of the time this only leads to confusing and misleading warning messages, but parallel xcpdb --reshard could lead to errors.
2020-01-27 inbox: add ->version method
This allows us to simplify version checking by avoiding "//" or "||" operators sprinkled around.
2020-01-13 xapcmd: use popen_rd for running xapian-compact
public-inbox-compact wrapper displays progress by default, anyways, and there's not a lot of output, so simplify our code by using popen_rd instead of spawn + optional pipe. While we're at it use "while (<HANDLE>)" to display progress as it happens, since "foreach (<$HANDLE>)" slurps the contents into an array, first.
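The streaming-versus-slurping distinction above has a direct analogue in other languages. The following Python sketch is not the project's popen_rd; `stream_lines` is a hypothetical helper and the child command is a stand-in for xapian-compact. It reads a child's output one line at a time instead of collecting it all into a list first.

```python
import subprocess
import sys

def stream_lines(cmd):
    """Yield each output line of `cmd` as it arrives from the child."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:     # reads one line at a time, no slurping
            yield line.rstrip("\n")
    # Leaving the `with` block waits for the child, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)

# Stand-in child process; the real tool would be xapian-compact.
child = [sys.executable, "-c", "print('pass 1'); print('pass 2')"]
for progress in stream_lines(child):
    print(progress)
```

Calling `proc.stdout.readlines()` instead would behave like the `foreach` form: nothing is displayed until the child has closed its output.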
2020-01-06 treewide: "require" + "use" cleanup and docs
There's a bunch of leftover "require" and "use" statements we no longer need and can get rid of, along with some excessive imports via "use". IO::Handle usage isn't always obvious, so add comments describing why a package loads it. Along the same lines, document the tmpdir support as the reason we depend on File::Temp 0.19, even though every Perl 5.10.1+ user has it. While we're at it, favor "use" over "require", since it gives us extra compile-time checking.
2019-12-30 spawn: allow passing GLOB handles for redirects
We can save callers the trouble of {-hold} and {-dev_null} refs as well as the trouble of calling fileno().
2019-12-24 search: support SWIG-generated Xapian.pm
Xapian upstream is slowly phasing out the XS-based Search::Xapian in favor of the SWIG-generated "Xapian" package. While both Debian and FreeBSD have Search::Xapian, OpenBSD only includes the "Xapian" binding. More information about the status of the "Xapian" Perl module here: https://trac.xapian.org/ticket/523