git multi-pack-index files were creating swap storms and OOM-ing
on my system, so providing an option to disable them seems prudent
given the minor startup time regression.
|
|
alarm(2) delivering SIGALRM seems sufficient for Xapian since
Xapian doesn't block signals (which would necessitate the use of
SIGKILL via RLIMIT_CPU hard limit). When Xapian gets stuck in
`D' state on slow storage, SIGKILL would not make a difference,
either (at least not on Linux).
Relying on RLIMIT_CPU is also trickier since we must account for
CPU time already consumed by a process for unrelated requests.
Thus we just rely on a simple alarm-based timeout. This also
avoids requiring the optional BSD::Resource module in the (mostly)
Perl implementation (and avoids potential bugs given my meager
arithmetic skills).
|
|
When read-only daemons reopen log files via SIGUSR1, be sure to
propagate it to Xapian helper processes to ensure old log files
can be closed and archived.
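Propagation amounts to forwarding the signal to each helper PID;
a minimal C sketch (the real daemons are Perl, and
`propagate_signal' is a hypothetical name):

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* forward a signal (e.g. SIGUSR1 for log reopening) to helper
 * processes; returns the number of kill() calls that failed */
static int propagate_signal(const pid_t *pids, size_t n, int sig)
{
	size_t i;
	int nerr = 0;

	for (i = 0; i < n; i++)
		if (kill(pids[i], sig) != 0)
			nerr++;
	return nerr;
}
```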
|
|
Only public-facing daemons use it, currently, and all
public-facing daemons will pre-spawn it as early as feasible.
lei will need it eventually to handle queries requiring C++,
but I'm not certain what path to take with lei, yet...
|
|
We should almost always be calling `check_build' instead of
`build'. Using ccache masked some of the overhead from
this, but various linker implementations are still slow.
|
|
Xapian helper processes are disabled by default once again.
However, they can be enabled via the new `-X INTEGER' parameter.
One big positive is that Xapian helpers spawned by the
top-level daemon can be shared freely across all workers
for improved load balancing and reduced memory use.
|
|
We need to be able to handle resource limitation errors in
public-facing daemons.
|
|
While existing callers (lei, *-index, -watch) are private,
we should not block the event loop in public-facing servers
when we hit ETOOMANYREFS, ENOMEM, or ENOBUFS.
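Those errno values typically come from sendmsg(2) when passing
file descriptors; a C sketch of the classification (the helper
name is hypothetical; the servers themselves are Perl):

```c
#include <errno.h>

/* return nonzero if an FD-passing sendmsg() failure is a transient
 * resource shortage worth retrying later from the event loop,
 * zero if it should be treated as fatal for that client */
static int is_transient_send_err(int err)
{
	switch (err) {
#ifdef ETOOMANYREFS
	case ETOOMANYREFS: /* too many in-flight SCM_RIGHTS descriptors */
#endif
	case ENOMEM:
	case ENOBUFS:
	case EAGAIN:
		return 1;
	default:
		return 0;
	}
}
```

ETOOMANYREFS is Linux-specific, hence the #ifdef guard.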
|
|
We were already silently relying on v5.10 features (`//') and
on all the regexps working correctly with v5.12 unicode_strings.
|
|
systemd setups may use role accounts (e.g. `news') with
XDG_CACHE_HOME unset and a non-existent HOME directory
which the user has no permission to create.
In those cases, fall back to using PERL_INLINE_DIRECTORY if
available for building the just-ahead-of-time C++ binary.
|
|
The C++ version of xap_helper will allow more complex and
expensive queries.  Both the Perl and C++-only versions will
allow offloading search into a separate process which can be
killed via ITIMER_REAL or RLIMIT_CPU in the face of overload.
The xap_helper `mset' command wrapper is simplified to
unconditionally return rank, percentage, and estimated matches
information. This may slightly penalize mbox retrievals and
lei users, but perhaps that can be a different command entirely.
|
|
This makes upcoming changes easier to understand.
|
|
Retrieving Xapian document terms, data (and possibly values) and
transferring them to the Perl side would increase complexity and
I/O on both the Perl and C++ sides.  It would require more I/O
in C++ and transient memory use on the Perl side, where slow mset
iteration gives us an opportunity to dictate the memory release
rate.  So let's ignore the document-related stuff here for now
for ease of development.  We can reconsider this change if
dropping the Xapian Perl bindings entirely and relying on JAOT
C++ ever becomes a possibility.
|
|
It's never straightforward to pick an ideal number of processes
for anything, and Xapian helper processes are no exception since
there may be massive disparities in CPU count and I/O
performance.  So default to a single worker in the C++ version
for now, since that's the default for the Perl/(XS|SWIG)
version and also for our normal public-facing daemons.
This keeps the behavior of the Perl+(XS|SWIG) and C++ versions
as similar as possible.
|
|
It hasn't been used since 2016 when we started working on
improved streamability of gigantic responses.
Fixes: 95d4bf7aded4 (atom: switch to getline/close for response bodies, 2016-12-03)
|
|
Write barriers can take a long time to finish, especially when
commands are issued in parallel.  So handle it asynchronously
without blocking lei-daemon by making EOFpipe a little more
flexible by supporting arguments to the callback function.
This is another step towards improving parallel use of lei.
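The EOFpipe idea: watch the read end of a pipe and fire a
callback (now with a caller-supplied argument) once the writer
closes it.  A blocking C sketch for illustration only; the real
EOFpipe is a non-blocking Perl class inside the event loop:

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

typedef void (*eof_cb)(void *arg);

/* block until the write end of `fd' is closed, then run cb(arg) */
static int wait_for_eof(int fd, eof_cb cb, void *arg)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	char buf[64];
	ssize_t n;

	for (;;) {
		if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
			return -1;
		n = read(fd, buf, sizeof(buf));
		if (n == 0) { /* EOF: writer closed its end */
			cb(arg);
			return 0;
		}
		if (n < 0 && errno != EINTR && errno != EAGAIN)
			return -1;
	}
}

/* example callback taking an argument */
static void mark_done(void *arg) { *(int *)arg = 1; }
```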
|
|
Schedule a timer to stop shard workers and the git-cat-file
process after a `barrier' command. This allows us to save some
memory again when the lei-daemon is idle but preserves the fork
overhead reduction when issuing many commands in parallel or in
quick succession.
|
|
barrier (synchronous checkpoint) is better than ->done with
parallel lei commands being issued (via '&' or different
terminals), since repeatedly stopping and restarting processes
doesn't play nicely with expensive tasks like `lei reindex'.
This introduces a slight regression in maintaining more
processes (and thus resource use) when lei is idle, but that'll
be fixed in the next commit.
|
|
Since data going to git is the most important, always ensure
data is written to git before attempting to write anything to
SQLite or Xapian.
|
|
Noticed while working on other things...
Fixes: 299aac294ec3 (lei: do label/keyword parsing in optparse, 2023-10-02)
|
|
We shouldn't attempt to reap a process again after it's been
reaped asynchronously in the SIGCHLD handler. Noticed while
working on changes to get lei/store to use checkpointing.
|
|
This should improve `lei blob' and `lei rediff' functionality
for folks relying on `lei index' and allows future work to
improve parallelism via checkpointing in lei/store.
|
|
This adds support for the "POST /$INBOX/$MSGID/?x=m&q=..."
endpoint added last year to support per-thread searches:
764035c83 (www: support POST /$INBOX/$MSGID/?x=m&q=, 2023-03-30)
This only supports instances of public-inbox running 764035c83
or later, but unfortunately there hasn't been a release since then.
|
|
Noticed while trying to make other reliability improvements to
lei...
|
|
By reducing internal event loop iterations, this brings the time
for 300+ inboxes down from ~32ms to ~27ms.  It should also be
more consistent on servers with busy event loops since all the
Xapian DB traffic happens at once, theoretically improving cache
utilization.
|
|
This fixes compile errors on platforms we can't explicitly
support from pure Perl due to the lack of syscall stability
guarantees by the OS developers.
Reported-by: Gaelan Steele <gbs@canishe.com>
Tested-by: Gaelan Steele <gbs@canishe.com>
|
|
There are still some places where on_destroy isn't suitable;
this gets rid of getpid() calls in most of those cases to
reduce syscall costs and clean up syscall trace output.
|
|
getpid() isn't cached by glibc nowadays and system calls are
more expensive due to CPU vulnerability mitigations. To
ensure we switch to the new semantics properly, introduce
a new `on_destroy' function to simplify callers.
Furthermore, OnDestroy correctness is often tied to the
process which creates the object, so make the new API default
to guarding against running in subprocesses.
For cases which require running in all children, a new
PublicInbox::OnDestroy::all call is provided.
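The PID-guard semantics can be sketched in C (OnDestroy itself
is a Perl class; these names are illustrative):

```c
#include <sys/types.h>
#include <unistd.h>

/* a cleanup action that only fires in the process which created it */
struct on_destroy {
	pid_t owner; /* PID at creation time */
	void (*cb)(void *);
	void *arg;
};

static void on_destroy_init(struct on_destroy *od, void (*cb)(void *),
			    void *arg)
{
	od->owner = getpid();
	od->cb = cb;
	od->arg = arg;
}

static void on_destroy_fire(struct on_destroy *od)
{
	/* guard: skip the callback if we are in a forked subprocess */
	if (od->cb && getpid() == od->owner)
		od->cb(od->arg);
	od->cb = NULL; /* fire at most once */
}

/* example callback: count how many times cleanup ran */
static void count_cleanup(void *arg) { ++*(int *)arg; }
```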
|
|
PID guards for OnDestroy will be the default in an upcoming
change. In the meantime, LeiMirror was the only user and
didn't actually need it.
|
|
While PublicInbox::Config is responsible for some instances of
setting $git->{nick}, more PublicInbox::Git objects may be
created from loading the cindex and we should do our best to
reuse that memory, too.
Followup-to: 84ed7ec1c887 (dedupe inbox names, coderepo nicks + git dirs, 2024-03-04)
|
|
With my current mirror of lore + gko, this saves over 300K
allocations and brings the allocation count in this area down
to under 5K. The reduction in AV refs saves around 45MB RAM
according to measurements done live via Devel::Mwrap.
|
|
Wrap the entire solver command chain with a dedicated limiter.
The normal limiter is designed for longer-lived commands or ones
which serve a single HTTP request (e.g. git-http-backend or
cgit) and is not effective for the short, memory- and
CPU-intensive commands used for solver.
Each overall solver request is both memory- and CPU-intensive: it
spawns several short-lived git processes(*) in addition to a
longer-lived `git cat-file --batch' process.
Thus running parallel solvers from a single -netd/-httpd worker
(which has its own parallelization) results in excessive
parallelism that is both memory- and CPU-bound (not network-bound)
and cascades into slowdowns for handling simpler requests.
Parallel solvers were also responsible for the increased lifetime
and frequency of zombies since the event loop was too saturated
to reap them.
We'll also return 503 on excessive solver queueing, since these
require an FD for the client HTTP(S) socket to be held onto.
(*) git (update-index|apply|ls-files) are all run by solver and
short-lived
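The dedicated limiter behaves roughly like a counting semaphore
with a bounded queue; a C sketch of the admission logic (the
names and thresholds are illustrative, not the Perl
implementation):

```c
/* admission control for parallel solver requests */
struct limiter {
	int running; /* currently-executing requests */
	int queued;  /* requests waiting for a slot */
	int max;     /* max concurrent requests */
	int qmax;    /* max queued before rejecting with 503 */
};

/* returns 1 = run now, 0 = queued for later, -1 = reject (HTTP 503) */
static int limiter_admit(struct limiter *l)
{
	if (l->running < l->max) {
		l->running++;
		return 1;
	}
	if (l->queued < l->qmax) {
		l->queued++;
		return 0;
	}
	return -1; /* each queued request pins a client socket FD */
}

/* when a request finishes, promote one queued request if any */
static void limiter_done(struct limiter *l)
{
	if (l->queued > 0)
		l->queued--; /* a queued request starts running */
	else
		l->running--;
}
```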
|
|
Fortunately, this only affects `--multi-accept=' users, with
`--multi-accept=-1' users getting infinite loops.
I noticed this when EMFILE was reached on my setup, but any
error should cause us to give up accept(2) (at least
temporarily) and allow work for other items in the event loop to
be processed.
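The fixed loop structure, sketched in C: any accept(2) error
ends the batch instead of looping forever (illustrative only;
the daemon is Perl):

```c
#include <sys/socket.h>
#include <unistd.h>

/* accept up to `max' connections per event-loop wakeup;
 * max < 0 means "accept until exhausted".  ANY error (EAGAIN,
 * EMFILE, ...) ends the batch so other event-loop work can run */
static int multi_accept(int lfd, int max, void (*on_conn)(int))
{
	int n = 0;

	while (max < 0 || n < max) {
		int cfd = accept(lfd, NULL, NULL);

		if (cfd < 0)
			break; /* retry on the next wakeup, if ever */
		on_conn(cfd);
		n++;
	}
	return n;
}

/* example connection handler */
static void close_conn(int fd) { close(fd); }
```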
|
|
We must chomp the newline in the branch name if it's set.
Reported-by: Rob Herring <robh@kernel.org>
Link: https://public-inbox.org/meta/CAL_JsqK7P4gjLPyvzxNEcYmxT4j6Ah5f3Pz1RqDHxmysTg3aEg@mail.gmail.com/
Fixes: 73830410e4336b77 (treewide: use run_qx where appropriate, 2023-10-27)
|
|
This allows accurate reporting of the error location and can be
made to dump a Perl backtrace via PERL5OPT='-MCarp=verbose'.
Noticed while tracking down fast-import failures.
Link: https://public-inbox.org/meta/CAL_JsqK7P4gjLPyvzxNEcYmxT4j6Ah5f3Pz1RqDHxmysTg3aEg@mail.gmail.com/
|
|
Noticed while tracking down fast-import crash bug report.
Link: https://public-inbox.org/meta/CAL_JsqK7P4gjLPyvzxNEcYmxT4j6Ah5f3Pz1RqDHxmysTg3aEg@mail.gmail.com/
|
|
Inbox names, coderepo nicks, git_dir values are used heavily
as hash keys by the read-only coderepo WWW pieces.
Relying on CoW for mutable scalars on newer Perl doesn't work
well since CoW for those scalars is limited to 256 references,
and we blow past that number when mapping thousands of coderepos
and inboxes to each other.  Instead, make the hash key up-front
and get the resulting string to point directly at the buffer
used by the hash key.
|
|
It's not really relevant at the moment, but a sufficiently
smart implementation could eventually save some memory here.
Perl already optimizes in-place sort (@x = sort @x), so there's
precedent for a potential future where a Perl implementation
could generally optimize in-place operations for non-builtin
subroutines, too.
|
|
Repeatedly allocating an anonymous sub is an expensive operation
and a potential source of leaks in older Perl. Instead,
`local'-ize a global and use a permanent sub to work around the
old Encode 2.87..3.12 leak.
|
|
We are just using the odd ref+deref (`${\...}') syntax and
don't need to calculate line numbers ourselves, nowadays.
|
|
While fast build times from -O0 are critical to my sanity when
actively working on C++, the files installed via package
managers or `make install' aren't likely to change frequently.
In that case, expensive -O2 optimizations make sense since the
10-20s saved from a single large --join more than covers the
cost of waiting on g++ to optimize.
|
|
If publicinbox.cgitrc is set in the config file, we'll ensure
cgit sees it as CGIT_CONFIG since the configured
publicinbox.cgitrc knob may not be the default path the cgit.cgi
binary was configured to use.
Furthermore, we'll respect CGIT_CONFIG in the environment if
publicinbox.cgitrc is unset in the config file at -httpd/-netd
startup.
|
|
The "patch is too large to show" text is now set off by an <hr>
to prevent it from being confused as part of a commit message
(or from somebody intentionally inserting that text in a commit
message to confuse readers).  A previously-missing </pre> is
also added before the <hr> tag for the related commit search form.
|
|
Similar to commit cbe2548c91859dfb923548ea85d8531b90d53dc3
(www_coderepo: use OnDestroy to render summary view,
2023-04-09), we can rely on OnDestroy and Qspawn to run
dependencies in a structured way and with some extra parallelism
for SMP users.
Perl (as opposed to POSIX sh) allows us to easily avoid
expensive patch generation for large root commits, and also avoid
needless `git patch-id' invocations for patches which are too
big to show.
Avoiding patch-id alone saved nearly 2s from the linux.git root
commit[1] with patch generation enabled and brought response
times down to ~6s (still slow). Avoiding patch generation for
root commits brings it down to a few hundred milliseconds on a
public-facing server (nobody wants a 355MB patch rendered as
HTML, right?).
[1] torvalds/linux.git 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
|
|
SIGPIPE (13) can be quite common with unreliable connections
and impatient clients, so just ignore them.
|
|
Štěpán Němec <stepnem@smrk.net> wrote:
> Eric Wong wrote:
> > Subject: [PATCH] view: decode In-Reply-To comments added by Gnus
> Or just "some MUAs"? Who knows who else...
Yeah, I wouldn't be surprised if there were more...
---8<---
Subject: [PATCH] view: decode In-Reply-To comments added by some MUAs
Emacs-based MUAs (e.g. Gnus and rmail) can do it, and maybe
some others, too. I noticed it in
<https://yhbt.net/lore/git/xmqqr0ho9oi9.fsf@gitster.g/>
while scanning for something else.
|
|
Setting $SIG{__WARN__} at the top-level no longer has any effect
since we localize $SIG{__WARN__} when entering ->event_step on
a per-listener basis.
Fixes: 60d262483a4d (daemon: use per-listener SIG{__WARN__} callbacks, 2022-08-08)
|
|
The packaged Perl on OpenBSD i386 supports 64-bit file offsets
but not 64-bit integer support for 'q' and 'Q' with `pack'.
Since servers aren't likely to require lock files larger than
2 GB (we'd need an inbox with >2 billion messages), we can
work around the Perl build limitation with explicit padding.
File::FcntlLock isn't packaged for OpenBSD <= 7.4 (but should be
in future releases), but I can test i386 OpenBSD on an extremely
slow VM.
Big endian support can be done, too, but I have no idea if
there's 32-bit BE users around nowadays...
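What the pack() template must emulate is the kernel's struct
flock, whose l_start/l_len fields are off_t (64 bits wide with
large-file support, even on 32-bit builds); the C equivalent of
the lock is simply:

```c
#include <fcntl.h>
#include <unistd.h>

/* take (and wait for) a whole-file write lock; l_start and l_len
 * are off_t, which the Perl pack() workaround emulates with
 * explicit zero padding when 'q' is unavailable */
static int lock_file(int fd)
{
	struct flock fl;

	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0; /* 0 == lock to EOF, i.e. the whole file */
	fl.l_pid = 0;
	return fcntl(fd, F_SETLKW, &fl);
}
```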
|
|
MH sequence numbers can be analogous to IMAP UIDs and NNTP
article numbers (or more like IMAP MSNs with clients which
pack).  In any case, sort them numerically by default to avoid
surprising users who treat NNTP spools and mlmmj archives as MH
folders.  This gives more coherent git history and resulting
NNTP/IMAP numbering when round-tripping MH -> v2 -> (NNTP|IMAP) -> MH.
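Numeric ordering matters because lexicographic order would sort
"10" before "2"; a C sketch of the comparator (the sorting
itself happens in Perl):

```c
#include <stdlib.h>

/* compare MH sequence filenames ("1", "2", "10", ...) numerically */
static int cmp_mh_seq(const void *a, const void *b)
{
	long x = atol(*(char *const *)a);
	long y = atol(*(char *const *)b);

	return (x > y) - (x < y);
}

static void sort_mh_names(char **names, size_t n)
{
	qsort(names, n, sizeof(*names), cmp_mh_seq);
}
```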
|
|
We don't need multiple `use PublicInbox::IO' statements to
import a subroutine.
|