Date Commit message
2024-03-18 INSTALL: try to be less confusing about optional modules (HEAD, master)
2024-03-16 Fix some typos and language nits in docs and comments
2024-03-14 doc: update release notes, marketing, and install
INSTALL now covers more of lei since I'm less uncomfortable about it for 2.0 and points users towards the install/ helpers if installing from source.
2024-03-12 codesearch: deduplicate $git->{nick} field
While PublicInbox::Config is responsible for some instances of setting $git->{nick}, more PublicInbox::Git objects may be created from loading the cindex and we should do our best to reuse that memory, too.
Followup-to: 84ed7ec1c887 (dedupe inbox names, coderepo nicks + git dirs, 2024-03-04)
2024-03-12 doc: tuning: note reduced fragmentation w/ jemalloc
I may be mistaken, but I suspect the reason jemalloc handles long-lived processes better than glibc is due to granularity reduction being scaled to larger size classes. This can waste 20% of an individual allocation, but increases the likelihood of reuse (without splitting/consolidating into other sizes). In other words, glibc seems to try too hard to make the best fit for initial allocations. This ends up being suboptimal over time as those allocations are freed and similar (but not identical) allocations come in. jemalloc sacrifices the best initial fit for better fits over a long process lifetime.
2024-03-12 codesearch: deduplicate {ibx_score} name pairs
With my current mirror of lore + gko, this saves over 300K allocations and brings the allocation count in this area down to under 5K. The reduction in AV refs saves around 45MB RAM according to measurements done live via Devel::Mwrap.
2024-03-12 www: use a dedicated limiter for blob solver
Wrap the entire solver command chain with a dedicated limiter. The normal limiter is designed for longer-lived commands or ones which serve a single HTTP request (e.g. git-http-backend or cgit) and is not effective for the short memory + CPU intensive commands used for solver. Each overall solver request is both memory + CPU intensive: it spawns several short-lived git processes(*) in addition to a longer-lived `git cat-file --batch' process. Thus running parallel solvers from a single -netd/-httpd worker (which have their own parallelization) results in excessive parallelism that is both memory and CPU-bound (not network-bound) and cascades into slowdowns for handling simpler memory/CPU-bound requests. Parallel solvers were also responsible for the increased lifetime and frequency of zombies since the event loop was too saturated to reap them. We'll also return 503 on excessive solver queueing, since these require an FD for the client HTTP(S) socket to be held onto.
(*) git (update-index|apply|ls-files) are all run by solver and short-lived
2024-03-12 listener: don't loop on errors
Fortunately, this only affects `--multi-accept=' users, with `--multi-accept=-1' users getting infinite loops. I noticed this when EMFILE was reached on my setup, but any error should cause us to give up accept(2) (at least temporarily) and allow work for other items in the event loop to be processed.
2024-03-10 import: fix handling of init.defaultBranch
We must chomp the newline in the branch name if it's set.
Reported-by: Rob Herring <robh@kernel.org>
Link: https://public-inbox.org/meta/CAL_JsqK7P4gjLPyvzxNEcYmxT4j6Ah5f3Pz1RqDHxmysTg3aEg@mail.gmail.com/
Fixes: 73830410e4336b77 (treewide: use run_qx where appropriate, 2023-10-27)
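A minimal sketch of the kind of fix described above, assuming the branch name arrives as the raw output of `git config init.defaultBranch'. The default_branch helper and the `master' fallback are hypothetical and are not the code in PublicInbox::Import:

```perl
use strict;
use warnings;

# Hypothetical helper: turn raw `git config init.defaultBranch` output
# into a usable branch name.
sub default_branch {
	my ($raw) = @_;
	# unset config: fall back to an assumed default
	return 'master' unless defined($raw) && $raw ne '';
	chomp $raw; # git prints a trailing "\n" after the value
	$raw;
}

printf "refs/heads/%s\n", default_branch("main\n");
printf "refs/heads/%s\n", default_branch(undef);
```

Without the chomp, the newline would end up inside the ref name passed on to later commands.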
2024-03-10 import: croak (instead of die) on write failures
This allows accurate reporting of the error location and can be made to dump a Perl backtrace via PERL5OPT='-MCarp=verbose'. Noticed while tracking down fast-import failures.
Link: https://public-inbox.org/meta/CAL_JsqK7P4gjLPyvzxNEcYmxT4j6Ah5f3Pz1RqDHxmysTg3aEg@mail.gmail.com/
2024-03-10 lei: prevent empty {bytes} field in saved search
Noticed while tracking down fast-import crash bug report.
Link: https://public-inbox.org/meta/CAL_JsqK7P4gjLPyvzxNEcYmxT4j6Ah5f3Pz1RqDHxmysTg3aEg@mail.gmail.com/
2024-03-08 dedupe inbox names, coderepo nicks + git dirs
Inbox names, coderepo nicks, git_dir values are used heavily as hash keys by the read-only coderepo WWW pieces. Relying on CoW for mutable scalars on newer Perl doesn't work well since CoW for those scalars is limited to 256 CoW references and we blow past that number when mapping thousands of coderepos and inboxes to each other. Instead, make the hash key up-front and get the resulting string to point directly to the pointer used by the hash key.
2024-02-14 eml: reuse ->decode buffer
It's not really relevant at the moment, but a sufficiently smart implementation could eventually save some memory here. Perl already optimizes in-place sort (@x = sort @x), so there's precedent for a potential future where a Perl implementation could generally optimize in-place operations for non-builtin subroutines, too.
2024-02-14 eml: avoid anonymous __WARN__ sub for encode/decode
Repeatedly allocating an anonymous sub is an expensive operation and a potential source of leaks in older Perl. Instead, `local'-ize a global and use a permanent sub to work around the old Encode 2.87..3.12 leak.
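The pattern described above can be sketched roughly as follows. All names here ($warn_ctx, collect_warning, run_quietly) are hypothetical illustrations and not the identifiers used in PublicInbox::Eml:

```perl
use strict;
use warnings;

our $warn_ctx; # hypothetical global, only set while a call is in flight

# Permanent named sub: compiled once, unlike an anonymous closure,
# which is allocated anew every time its enclosing code runs.
sub collect_warning {
	push @{$warn_ctx->{warnings}}, $_[0] if $warn_ctx;
}

sub run_quietly {
	my ($ctx, $cb) = @_;
	local $warn_ctx = $ctx; # restored automatically on scope exit
	local $SIG{__WARN__} = \&collect_warning;
	$cb->();
}

my $ctx = { warnings => [] };
my $res = run_quietly($ctx, sub { warn "noisy\n"; 42 });
print "result=$res warnings=", scalar(@{$ctx->{warnings}}), "\n";
```

The per-call state travels through the `local'-ized global, so the handler itself never needs to capture anything.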
2024-02-14 codesearch: generate_cxx: drop unused variables
We are just using the odd ref+deref (`${\...}') syntax and don't need to calculate line numbers ourselves, nowadays.
2024-02-14 xap_helper_cxx: -O2 optimize read-only files by default
While fast build times from -O0 are critical to my sanity when actively working on C++, the files installed via package managers or `make install' aren't likely to change frequently. In that case, expensive -O2 optimizations make sense since the 10-20s saved from a single large --join more than covers the cost of waiting on g++ to optimize.
2024-02-14 doc: fix formatting for CLI switch aliases
`=item' elements in Pod need to be surrounded by empty lines. It's an unfortunate waste of vertical space, but Pod is still better than *roff and usually available out-of-the-box.
2024-02-14 doc: config: cgit=rewrite isn't implemented, yet
It'll probably be done for another release, I doubt most cgit users are willing to completely replace it with our coderepo viewer just yet...
2024-02-14 www: cgit: support non-standard cgitrc locations
If publicinbox.cgitrc is set in the config file, we'll ensure cgit sees it as CGIT_CONFIG since the configured publicinbox.cgitrc knob may not be the default path the cgit.cgi binary was configured to use. Furthermore, we'll respect CGIT_CONFIG in the environment if publicinbox.cgitrc is unset in the config file at -httpd/-netd startup.
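The precedence described above can be sketched as a small function. The cgit_config_for helper and the example paths are hypothetical; only the publicinbox.cgitrc and CGIT_CONFIG names come from the commit message:

```perl
use strict;
use warnings;

# Hypothetical helper: pick the CGIT_CONFIG value to hand to cgit.cgi.
# $cfg_cgitrc is the publicinbox.cgitrc setting (undef if unset),
# $env is the daemon's environment at startup.
sub cgit_config_for {
	my ($cfg_cgitrc, $env) = @_;
	return $cfg_cgitrc if defined $cfg_cgitrc; # config file wins
	$env->{CGIT_CONFIG}; # may be undef: cgit then uses its built-in path
}

my %env = (CGIT_CONFIG => '/srv/example/cgitrc'); # made-up path
print cgit_config_for('/etc/example/cgitrc', \%env), "\n";
print cgit_config_for(undef, \%env), "\n";
print cgit_config_for(undef, {}) // '(cgit default)', "\n";
```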
2024-02-13 viewvcs: HTML fixes for commits
The "patch is too large to show" text is now broken by an <hr> to prevent it from being confused as part of a commit message (or having somebody intentionally insert that text in a commit message to confuse readers). The previously missing </pre> is also added before the <hr> tag for the related commit search form.
2024-02-13 viewvcs: parallelize commit display
Similar to commit cbe2548c91859dfb923548ea85d8531b90d53dc3 (www_coderepo: use OnDestroy to render summary view, 2023-04-09), we can rely on OnDestroy and Qspawn to run dependencies in a structured way and with some extra parallelism for SMP users. Perl (as opposed to POSIX sh) allows us to easily avoid expensive patch generation for large root commits, and also avoid needless `git patch-id' invocations for patches which are too big to show. Avoiding patch-id alone saved nearly 2s from the linux.git root commit[1] with patch generation enabled and brought response times down to ~6s (still slow). Avoiding patch generation for root commits brings it down to a few hundred milliseconds on a public-facing server (nobody wants a 355MB patch rendered as HTML, right?).
[1] torvalds/linux.git 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
2024-02-10 www: quiet errors for git-{archive,http-backend}
SIGPIPE (13) can be quite common with unreliable connections and impatient clients, so just ignore them.
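A rough sketch of how a wait status might be classified before logging, assuming the child's status is available as in Perl's `$?'. The quiet_status helper is hypothetical and not taken from the public-inbox source:

```perl
use strict;
use warnings;
use POSIX qw(SIGPIPE);

# Hypothetical helper: true if a child's wait status needs no log
# entry, i.e. it exited cleanly or was killed by SIGPIPE (expected
# when a client disconnects early).
sub quiet_status {
	my ($status) = @_;
	# the low 7 bits of a wait status hold the terminating signal
	$status == 0 || ($status & 127) == SIGPIPE;
}

for my $status (0, SIGPIPE, 256) {
	printf "status=%d %s\n", $status,
		quiet_status($status) ? 'quiet' : 'log it';
}
```

A status of 256 corresponds to exit code 1 with no signal, so it is still reported.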
2024-02-09 view: decode In-Reply-To comments added by some MUAs
Štěpán Němec <stepnem@smrk.net> wrote:
> Eric Wong wrote:
> > Subject: [PATCH] view: decode In-Reply-To comments added by Gnus
> Or just "some MUAs"? Who knows who else...
Yeah, I wouldn't be surprised if there were more...
---8<---
Subject: [PATCH] view: decode In-Reply-To comments added by some MUAs
Emacs-based MUAs (e.g. Gnus and rmail) can do it, and maybe some others, too. I noticed it in <https://yhbt.net/lore/git/xmqqr0ho9oi9.fsf@gitster.g/> while scanning for something else.
2024-02-08 daemon: quiet Email::Address::XS warnings properly
Setting $SIG{__WARN__} at the top-level no longer has any effect since we localize $SIG{__WARN__} when entering ->event_step on a per-listener basis.
Fixes: 60d262483a4d (daemon: use per-listener SIG{__WARN__} callbacks, 2022-08-08)
2024-02-06 pop3d: support fcntl locks on OpenBSD i386
The packaged Perl on OpenBSD i386 supports 64-bit file offsets but not 64-bit integer support for 'q' and 'Q' with `pack'. Since servers aren't likely to require lock files larger than 2 GB (we'd need an inbox with >2 billion messages), we can work around the Perl build limitation with explicit padding. File::FcntlLock isn't packaged for OpenBSD <= 7.4 (but should be in future releases), but I can test i386 OpenBSD on an extremely slow VM. Big endian support can be done, too, but I have no idea if there's 32-bit BE users around nowadays...
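The padding idea can be illustrated with a minimal sketch, assuming a little-endian 64-bit offset field and offsets below 2**32. The pack_off64_le helper is hypothetical, and this is not the actual struct flock template used by pop3d:

```perl
use strict;
use warnings;

# Hypothetical helper: encode a non-negative offset below 2**32 as a
# little-endian 64-bit field using only 32-bit pack templates.
sub pack_off64_le {
	my ($off) = @_;
	pack('Vx4', $off); # low 32 bits, then 4 NUL bytes for the high word
}

my $buf = pack_off64_le(12345);
print length($buf), " bytes: ", unpack('H*', $buf), "\n";

# On a Perl built with 64-bit integer support, this matches 'q<'.
if (eval { pack('q<', 1); 1 }) {
	print "matches q<\n" if $buf eq pack('q<', 12345);
}
```

A big-endian variant would put the padding first and use `N' instead of `V'.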
2024-02-01 lei: sort MH inputs sequentially by default
MH sequence numbers can be analogous to IMAP UIDs and NNTP article numbers (or more like IMAP MSNs with clients which pack). In any case, sort them numerically by default to avoid surprising users who treat NNTP spools and mlmmj archives as MH folders. This gives more coherent git history and resulting NNTP/IMAP numbering when round-tripping MH -> v2 -> (NNTP|IMAP) -> MH
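To illustrate why numeric sorting matters here, a small sketch with a made-up directory listing. The entry names are hypothetical, and real MH readers do more validation than this:

```perl
use strict;
use warnings;

# Hypothetical directory listing of an MH folder; only all-digit names
# are messages, other entries (e.g. .mh_sequences) are skipped.
my @entries = qw(10 2 .mh_sequences 1 100 9);
my @nums = grep { /\A[0-9]+\z/ } @entries;

my @numeric = sort { $a <=> $b } @nums; # sequence-number order
my @lexical = sort { $a cmp $b } @nums; # string order, for contrast

print "numeric: @numeric\n"; # numeric: 1 2 9 10 100
print "lexical: @lexical\n"; # lexical: 1 10 100 2 9
```

Importing in the lexical order would interleave old and new messages, which is what makes the resulting history and article numbering surprising.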
2024-02-01 scripts/import_*: update usage to include lei tips
These scripts probably don't offer anything useful now that lei has fleshed out read-only MH support and v2 outputs.
2024-02-01 scripts/slrnspool2maildir: use MHreader and LeiToMail
This contains gmane-specific header munging to unmunge the things gmane does to headers. While we're at it, document the generic `lei convert' invocation for users who don't need the gmane-specific header munging.
2024-02-01 import: drop redundant `use' statement
We don't need multiple `use PublicInbox::IO' statements to import a subroutine.
2024-02-01 lei convert: explicitly allow --sort for inputs
LeiToMail can't sort v2 output, but sorting MH input (and NNTP spool + mlmmj archives) numerically makes sense.
2024-01-31 lei_to_mail: improve SIGPIPE handling
I can't reproduce this in t/lei-sigpipe.t with GIANT_INBOX_DIR. In real-world usage, having a large `lei q -f text ...' output piped to a pager and killing the pager prematurely could trigger:
non-fatal error from PublicInbox::LeiToMail $?=256
messages in my terminal. This is because $self->{lei} was becoming undefined during process cleanup in git_to_mail. So flip the cleanup logic around and unconditionally check for Git::cleanup state to bail out early. With this change, the `non-fatal error ...' message no longer appears when I stop reading results early.
2024-01-30 spawn: support some rlimit uses via Inline::C
BSD::Resource isn't packaged for Alpine (as of 3.19), but we also have optional Inline::C support and already rely on calling setrlimit(2) directly from the Inline::C version of pi_fork_exec.
2024-01-30 doc/lei-mail-formats: update MH read-only status
I'm not looking forward to dealing with synchronization problems if we end up dealing with writes...
2024-01-30 watch: support incremental updates from MH
The good news (compared to lei) is we only have to worry about imports and don't care about the filename nor keywords, so it's immune to .mh_sequences writing inconsistencies across MH implementations and sequence number packing. We still assume the writer will write the mail file with one of:
* rename(2) to create the final sequence number filename
* a single write(2) if not relying on rename(2)
mlmmj and mutt satisfy these requirements. Python's Lib/mailbox.py may, I'm not sure...
2024-01-30 syscall: use pure Perl sendmsg/recvmsg on *BSD
While syscall symbols (e.g. SYS_*) have changed on us in FreeBSD during the history of Sys::Syscall and this project and did bite us in some cases, the actual numbers don't get recycled for new syscalls. We're also fortunate that sendmsg and recvmsg syscalls and associated msghdr and cmsg structs predate the BSD forks and are compatible across all the BSDs I've tried. OpenBSD routes Perl `syscall' through libc; while NetBSD + FreeBSD document procedures for maintaining backwards compatibility. It looks like Dragonfly follows FreeBSD, here. Tested on i386 OpenBSD, and amd64 {Free,Net,Open,Dragonfly}BSD. This enables *BSD users to use lei, -cindex and future SCM_RIGHTS-only features without needing Inline::C.
[1] https://cvsweb.openbsd.org/src/gnu/usr.bin/perl/gen_syscall_emulator.pl
[2] https://www.netbsd.org/docs/internals/en/chap-processes.html#syscall_versioning
[3] https://wiki.freebsd.org/AddingSyscalls#Backward_compatibily
2024-01-30 syscall: update formatting to match our codebase
Sys::Syscall needs separate patches anyways (if it ever gets updated), and having a mix of indentation styles in our codebase gets confusing. We'll also update cfarm-related comments for the current URL.
2024-01-24 view: /$INBOX/ links to topics_{new,active}.html
This makes the new endpoints easier to find. The navigation is still at the bottom of the page since I figured having it at the top is too cluttered for users on small terminals.
2024-01-24 www_topics: simplify date column mapping
We can rely on SQLite to map `MAX(ds)' to `ds' rather than doing it in Perl, reducing the size of our Perl optree at the (smaller) expense of SQLite bytecode.
2024-01-17 www: repolist: support globbing in URL
This can make it easier to find deeply-nested repositories on my mirror of git.kernel.org. It's not perfect, since projects like Linux use several completely different basenames (e.g. linux.git vs vfs.git vs net.git), but it can still help find significant matches further up a tree. I don't expect glob characters to conflict with actual git repositories used by reasonable people, but direct (non-glob) hits are still tried first.
2024-01-17 config: glob2re: fix over-matching /**/foo
Noticed while adding wildcard support to WwwCoderepo...
2024-01-17 config: don't vivify invalid fields for coderepos
We don't need 404s for non-existent coderepos creating fake (and invalid) entries. I noticed this while working on subsequent changes to support globbing in URLs.
2024-01-17 examples/unsubscribe-milter@.service: use KillMode=process
This can be a multi-process daemon, but systemd should only kill the top-level one. And also finish a comment about the User having access to the shared private key.
2024-01-17 tests: clarify Email::MIME is only for development
We moved to PublicInbox::Eml a while back and have no plans to go back to using Email::MIME, so don't tempt users and packagers to waste disk space on Email::MIME.
2024-01-11 lei_to_mail: show supported mbox formats on error
Users may accidentally or unknowingly write `mbox' and not know we support 4 incompatible mbox variants.
2024-01-11 lei+net_reader: show NNTP message in more failures
Showing absolutely nothing when hitting a server requiring authentication is a very bad user experience. While we're at it, use Net::Cmd->message in more places where we experience failure, too.
2024-01-11 net_reader: fix NNTP credential use
Clearly this was never tested until now, as passwords being retrieved by git-credential got completely ignored and unused. This enables users to connect to NNTP(S) servers requiring a password.
2024-01-10 www: use autodie in more coderepo places
This cuts down on code somewhat (before I add more :x)
2024-01-10 address: avoid [ undef, undef ] address pairs
For totally bogus things in address fields, we'll fall back to showing the original entry in the name column when using Email::Address::XS. The pure Perl version differs here, but we'll just let them be different when it comes to handling bogus data.
2024-01-10 www: linkify inbox addresses in To/Cc headers
This makes it easier to discover contemporary messages crossposted to other groups within the same WWW instance. The internal cache is necessary for giant threads, and the expiry mechanism is necessary to prevent attackers from trivially OOM-ing.
2024-01-10 git: lowercase host in host_prefix_url
This will make it more effective for use as a cache key. I'm not entirely happy with this sub being in the Git module since it's used by lei and command-line tools, but that's for another day to deal with...