path: root/lib/PublicInbox/XapHelper.pm
Date Commit message
2024-05-20 xap_helper: drop DB handles on EMFILE/ENFILE/etc...
This allows the process to recover if we get the SHARD_COST calculation wrong, e.g. if Xapian uses more FDs than expected in new versions. We'll no longer attempt to recover from ENOMEM and similar errors during Xapian DB initialization and instead just tear down the process (as we do in other places).
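The recovery path described above can be sketched roughly as follows. This is only an illustration: `open_with_fd_recovery`, `open_fn`, and `drop_handles` are hypothetical names, the single retry is an assumption, and the real code deals with Xapian exceptions rather than a plain errno-returning callback.

```cpp
#include <cerrno>
#include <functional>

// Hypothetical convention: open_fn returns 0 on success, or -1 with errno set.
// drop_handles releases every cached DB handle to free up descriptors.
int open_with_fd_recovery(const std::function<int()> &open_fn,
                          const std::function<void()> &drop_handles)
{
    int rc = open_fn();
    if (rc != 0 && (errno == EMFILE || errno == ENFILE)) {
        // FD table exhausted: drop cached handles and retry once.
        drop_handles();
        rc = open_fn();
    }
    // Any other failure (e.g. ENOMEM) is left for the caller, which
    // per the commit message would tear down the process.
    return rc;
}
```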
2024-05-20 xap_helper: expire DB handles when FD table is near full
For long-lived daemons across config reloads, we shouldn't keep Xapian DBs open forever under FD pressure. So estimate the number of FDs we need per-shard and start clearing some out if we have too many open. While we're at it, hoist out our ulimit_n helper and share it across extindex and the Perl XapHelper implementation.
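A minimal sketch of the FD-budget check, assuming a fixed per-shard estimate. The value of `kShardCost`, the `reserved` headroom parameter, and the name `fd_table_near_full` are all hypothetical; the commit message does not give the real SHARD_COST or the exact policy.

```cpp
#include <sys/resource.h>
#include <cstdint>

// Hypothetical estimate of FDs consumed per open Xapian shard.
constexpr std::uint64_t kShardCost = 8;

// Soft RLIMIT_NOFILE (what `ulimit -n` reports); 0 if it cannot be read.
std::uint64_t ulimit_n()
{
    struct rlimit r;
    if (getrlimit(RLIMIT_NOFILE, &r) != 0)
        return 0;
    return static_cast<std::uint64_t>(r.rlim_cur);
}

// True if opening `new_shards` more shards would exceed the FD budget,
// keeping `reserved` descriptors free for sockets, logs, and so on.
bool fd_table_near_full(std::uint64_t open_shards, std::uint64_t new_shards,
                        std::uint64_t fd_limit, std::uint64_t reserved)
{
    return (open_shards + new_shards) * kShardCost + reserved > fd_limit;
}
```

A caller would pass `ulimit_n()` as `fd_limit` and expire some cached handles whenever the check returns true.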
2024-05-20 xap_helper: key search instances by -Q params, too
In addition to the shards which comprise the xap_helper search instance, we also account for changes in altid and indexheader in case xap_helper lifetime exceeds the given PublicInbox::Config. xap_helper will be Config lifetime agnostic since it's possible to run -netd and -httpd instances with multiple Config files, but a single xap_helper instance (with workers) should be able to service all of them.
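One way to picture the keying, as a sketch only: build the cache key from the shard directories followed by the -Q parameters. The function name `srch_key`, the NUL separators, and the representation of both inputs as string lists are assumptions made for illustration.

```cpp
#include <string>
#include <vector>

// Builds a cache key from shard directories plus -Q parameters.
// NUL bytes are used as separators since they cannot occur in paths.
std::string srch_key(const std::vector<std::string> &shard_dirs,
                     const std::vector<std::string> &q_params)
{
    std::string key;
    for (const auto &d : shard_dirs) {
        key += d;
        key += '\0';
    }
    key += '\0'; // marks the end of the shard section
    for (const auto &q : q_params) {
        key += q;
        key += '\0';
    }
    return key;
}
```

With such a key, two requests for the same shards but different extra prefixes map to different cached search instances.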
2024-05-08 xap_helper: unconditionally reopen DBs on reuse
Reopening Xapian DBs is a fairly cheap operation and Xapian avoids doing work when nothing's changed, so just do it to ensure we always get the latest updates in search results. The old synchronous search interface worked around this by having a timer based expiration in hopes of mitigating fragmentation problems, but perhaps that's not worth doing anymore now that memory fragmentation from Perl itself is better understood.
2024-05-06 search: fix altid search with XapHelper process
External Xapian helper processes need to support non-standard QueryParser prefixes. The only way to do this is to specify these prefixes in every `mset' request since we have no idea if the XH worker servicing the request has initialized the extra prefixes, yet.
2024-04-28 xap_helper: implement alarm(2)-based timeout
alarm(2) delivering SIGALRM seems sufficient for Xapian since Xapian doesn't block signals (which would necessitate the use of SIGKILL via RLIMIT_CPU hard limit). When Xapian gets stuck in `D' state on slow storage, SIGKILL would not make a difference, either (at least not on Linux). Relying on RLIMIT_CPU is also trickier since we must account for CPU time already consumed by a process for unrelated requests. Thus we just rely on a simple alarm-based timeout. This also avoids requiring the optional BSD::Resource module in the (mostly) Perl implementation (and avoids potential bugs given my meager arithmetic skills).
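As a rough illustration of an alarm(2)-based deadline, the sketch below sets a flag from the SIGALRM handler and relies on blocking syscalls failing with EINTR. The name `with_timeout` and the flag-based handling are assumptions; the commit message does not say what the real handler does when the alarm fires.

```cpp
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t timed_out;

static void on_alarm(int) { timed_out = 1; }

// Runs fn() under a SIGALRM deadline of `secs` seconds.
// Returns false if the alarm fired before fn() returned.
template <class Fn> bool with_timeout(unsigned secs, Fn fn)
{
    struct sigaction sa = {}, old;
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    // No SA_RESTART: blocking syscalls should fail with EINTR.
    sigaction(SIGALRM, &sa, &old);
    timed_out = 0;
    alarm(secs);
    fn();
    alarm(0); // cancel any pending alarm
    sigaction(SIGALRM, &old, nullptr);
    return !timed_out;
}
```

Unlike an RLIMIT_CPU-based limit, this measures wall-clock time per request, so CPU time already spent on earlier requests does not need to be accounted for.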
2024-04-28 xap_helper: reopen logs in daemons
When read-only daemons reopen log files via SIGUSR1, be sure to propagate it to Xapian helper processes to ensure old log files can be closed and archived.
2024-04-24 www: wire up search to use async xap_helper
The C++ version of xap_helper will allow more complex and expensive queries. Both the Perl and C++-only versions will allow offloading search into a separate process which can be killed via ITIMER_REAL or RLIMIT_CPU in the face of overload. The xap_helper `mset' command wrapper is simplified to unconditionally return rank, percentage, and estimated matches information. This may slightly penalize mbox retrievals and lei users, but perhaps that can be a different command entirely.
2024-04-24 xap_helper: drop terms+data from `mset' command
Retrieving Xapian document terms, data (and possibly values) and transferring them to the Perl side would increase complexity and I/O on both the Perl and C++ sides. It would require more I/O in C++ and transient memory use on the Perl side where slow mset iteration gives an opportunity to dictate memory release rate. So let's ignore the document-related stuff here for now for ease-of-development. We can reconsider this change if dropping Xapian Perl bindings entirely and relying on JAOT C++ ever becomes a possibility.
2024-04-03 treewide: avoid getpid() for OnDestroy checks
getpid() isn't cached by glibc nowadays and system calls are more expensive due to CPU vulnerability mitigations. To ensure we switch to the new semantics properly, introduce a new `on_destroy' function to simplify callers. Furthermore, OnDestroy correctness is often tied to the process which creates it, so make the new API default to being guarded against running in subprocesses. For cases which require running in all children, a new PublicInbox::OnDestroy::all call is provided.
2023-12-09 xap_helper: support term length limit
This will allow us to use p2q-compatible specifications such as "dfpost7" to only capture blob OIDs which are 7 characters in length (the indexer will always index down to 7 characters).
2023-11-29 xap_helper: implement mset endpoint for WWW, IMAP, etc...
The C++ version will allow us to take full advantage of Xapian's APIs for better queries, and the Perl bindings version can still be advantageous in the future since we'll be able to support timeouts effectively.
2023-11-21 cindex: rename --associate to --join, test w/ real repos
The association data is just stored as deflated JSON in Xapian metadata keys of shard[0] for now. It should be reasonably compact and fit in memory, since we'll assume sane, non-malicious git coderepo history. The new cindex-join.t test requires TEST_REMOTE_JOIN=1 to be set in the environment and tests the joins against the inboxes and coderepos of two small projects with a common history. Internally, we'll use `ibx_off' and `root_off' instead of `ibx_id' and `root_id', since `_id' may be mistaken for columns in an SQL database, which they are not.
2023-11-13 xap_helper: stricter and harsher error handling
We'll require an error stream for dump_ibx and dump_roots commands; they're too important to ignore. Instead of writing code to provide diagnostics for errors, rely on abort(3) and the -ggdb3 compiler flag to generate nice core dumps for gdb since all commands sent to xap_helper are from internal users. We'll even abort on most usage errors since they could be bugs in split2argv or our use of getopt(3). We'll also just exit on ENOMEM errors since it's the easiest way to recover from those errors by starting a new process which closes all open Xapian DB handles.
2023-11-13 xap_helper: Perl dump_ibx respects `-m MAX'
The C++ version does, so the Perl/XS version should, too; even if we intentionally avoid using it right now.
2023-11-03 move read_all, try_cat, and poll_in to PublicInbox::IO
The IO package seems like a better home for I/O subs than the Git package. We lose the 60 second read timeout for `git cat-file --batch-*' processes since it's probably not necessary given how reliable the code has proven and things would fall over hard in other ways if the storage device were completely hosed.
2023-11-02 xap_helper.pm: use do_fork to Reset and reseed
We may start using rand() in the worker someday if we need to seed a hash function for caching. It saves us some LoC in the meantime.
2023-11-01 xap_helper.pm: quiet undefined die at shutdown
Another attempt at doing what commit 35de8fdcbf290e25 (xap_helper.pm: quiet undefined warnings at shutdown, 2023-10-23) tried to do. It turns out perl croaks (not warn/carp) when it sees an undefined file handle, here.
2023-10-23 xap_helper.pm: quiet undefined warnings at shutdown
We can't force EBADF with high-level I/O wrappers like we can in C, so instead we quiet Perl itself.
2023-10-18 xap_helper: autodie for getsockopt
Only caveat is we can't use bareword filehandles, but that's a minor inconvenience.
2023-10-18 xap_helper: simplify SIGTERM exit checks
We can just close the socket FD to ensure things fail ASAP when a SIGTERM hits instead of wasting time making getppid(2) syscalls.
2023-10-18 xap_helper: die more easily in both implementations
We don't need to tolerate bad requests since it's only handling requests from the parent process. So simplify error management and just die||exit if we get a bad request.
2023-10-18 use read_all in more places to improve safety
`readline' ops may not detect errors on partial reads. This saves us some code to reduce cognitive overhead for readers. We'll also support reusing a destination buffer so it can work more nicely with existing code.
2023-10-06 ipc: lower-level send_cmd/recv_cmd handle EINTR directly
This ensures script/lei $send_cmd usage is EINTR-safe (since I prefer to avoid loading PublicInbox::IPC for startup time). Overall, it saves us some code, too.
2023-10-04 xap_helper.pm: use EINTR-aware recv_cmd
The code is already loaded, so there's no point in avoiding it.
2023-10-04 xap_helper: retry flock on EINTR
While signals are currently blocked in these helpers, they may not always be...
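The usual shape of an EINTR retry around flock(2) looks like the sketch below. The wrapper name `flock_retry` is hypothetical and not taken from the source.

```cpp
#include <sys/file.h>
#include <cerrno>

// Calls flock(2), retrying for as long as it is interrupted by a signal.
// Returns 0 on success, or -1 with errno set to something other than EINTR.
int flock_retry(int fd, int op)
{
    int rc;
    do {
        rc = flock(fd, op);
    } while (rc != 0 && errno == EINTR);
    return rc;
}
```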
2023-09-25 ds: force event_loop wakeup on final child death
Reaping children needs to keep the event_loop spinning another round when the @post_loop_do callback may be used to check on process exit during shutdown. This allows us to get rid of the hacky SetLoopTimeout calls in lei-daemon and XapHelper.pm during process shutdown if we're trying to wait for all PIDs to exit before leaving the event loop.
2023-09-15 xap_helper: detect non-socket STDIN
We don't want to get into a worker respawn loop if somebody just decides to start the executable from a terminal.
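One possible way to detect a non-socket descriptor is an fstat(2) check, sketched below. Whether xap_helper uses fstat, getsockopt, or something else is not stated in the commit message, and `fd_is_socket` is a hypothetical name.

```cpp
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

// True only if fd is open and refers to a socket.
bool fd_is_socket(int fd)
{
    struct stat sb;
    return fstat(fd, &sb) == 0 && S_ISSOCK(sb.st_mode);
}
```

A helper started from a terminal would see `fd_is_socket(STDIN_FILENO)` return false and could exit immediately instead of spawning workers.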
2023-09-05 xap_helper: support SIGTTIN+SIGTTOU worker adjustments
Being able to tune worker process counts on-the-fly when xap_helper gets used with -{netd,httpd,imapd} will be useful for tuning new setups.
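A sketch of signal-driven worker tuning, assuming the common convention that SIGTTIN raises and SIGTTOU lowers the desired worker count. The variable `want_workers`, the floor of zero, and the function names are hypothetical; a real supervisor would also fork or reap workers to match the new target.

```cpp
#include <signal.h>

// Desired number of workers, adjusted from signal handlers.
static volatile sig_atomic_t want_workers = 1;

static void on_ttin(int) { want_workers = want_workers + 1; }

static void on_ttou(int)
{
    if (want_workers > 0) // assumed floor; the real minimum may differ
        want_workers = want_workers - 1;
}

// Installs both handlers; without them, the default action for
// SIGTTIN and SIGTTOU is to stop the process.
void install_worker_signals()
{
    struct sigaction sa = {};
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sa.sa_handler = on_ttin;
    sigaction(SIGTTIN, &sa, nullptr);
    sa.sa_handler = on_ttou;
    sigaction(SIGTTOU, &sa, nullptr);
}
```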
2023-09-01 xap_helper: deal with Xapian::DocNotFoundError
It's possible for a long mset streaming operation to hit missing documents after a database reopen if deletes hit the DB.
2023-08-24 xap_helper: reopen+retry in MSetIterator loops
It's possible to hit a DatabaseModifiedError while iterating through an MSet. We'll retry in these cases and clean up some code in both the Perl and C++ implementations.
2023-08-24 cindex: implement dump_roots in C++
It's now just `dump_roots' instead of `dump_shard_roots', since this doesn't need to be tied to the concept of shards. I'm still shaky with C++, but intend to keep using stuff like hsearch(3) to make life easier for C hackers :P
2023-08-24 introduce optional C++ xap_helper
This allows us to perform the expensive "dump_ibx" operations in native C++ code using the Xapian C++ library. This provides the majority of the speedup with the -cindex --associate switch. Eventually this may be expanded to cover all uses of Xapian within the project to ensure we have access to Xapian APIs which aren't available in XS|SWIG bindings; and also for ease-of-installation on systems which don't provide pre-packaged Perl Xapian bindings (e.g. OpenBSD 7.3) but do provide Xapian development libraries. Most of the C++ code is still C, as I'm not remotely as familiar with C++ as I am with C. I suspect many users and potential hackers, coming from the git, Linux kernel, and glibc worlds, are in the same boat.