Date | Commit message |
|
Xapian helper processes are disabled by default once again.
However, they can be enabled via the new `-X INTEGER' parameter.
One big positive of having the top-level daemon spawn the Xapian
helpers is that they can be shared freely across all workers for
improved load balancing and memory reduction.
|
|
The 131072 byte lower bound was the old default before the
sliding mmap window was introduced in modern glibc malloc.
While the sliding mmap window was intended to be faster by
reducing syscalls, zeroing and kernel overhead, it is also prone
to fragmentation from allocation patterns seen in evented Perl
servers.
Individual allocations over 128K are rare in our codebase since
there aren't many messages this large, making any performance
impact tiny. Furthermore, the reduction in fragmentation and
memory use will be a speedup for memory-constrained systems
since they can avoid swap and have more leftover for the page
cache.
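As a hedged illustration of pinning the threshold at that lower bound:
glibc reads the `MALLOC_MMAP_THRESHOLD_' environment variable at startup,
and setting it disables the dynamic adjustment. The sketch below only
builds such an environment for a child process. The helper name
`daemon_env' is hypothetical and not part of this codebase, and the
approach assumes glibc malloc is in use.

```python
import os

# 128 KiB, the lower bound discussed above (assumption: glibc malloc)
MMAP_THRESHOLD = 131072


def daemon_env(base=None):
    """Return a copy of the environment with the glibc mmap threshold pinned."""
    env = dict(os.environ if base is None else base)
    # glibc-specific tunable; other malloc implementations ignore it
    env["MALLOC_MMAP_THRESHOLD_"] = str(MMAP_THRESHOLD)
    return env
```

The result can be passed as `env=' to `subprocess.Popen' when starting a
long-lived server.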
|
|
Large string processing + concurrency + caching/memoization
really brings out the worst in glibc malloc :<
|
|
We need these values in the PSGI $env to generate the cache key,
even if we're not linkifying anything.
Fixes: 48cbe0c3 (www: linkify inbox addresses in To/Cc headers, 2024-01-09)
|
|
This adds support for the "POST /$INBOX/$MSGID/?x=m&q=..." endpoint
added last year to support per-thread searches in
764035c83 (www: support POST /$INBOX/$MSGID/?x=m&q=, 2023-03-30).
This only supports instances of public-inbox since 764035c83,
but unfortunately there hasn't been a release since then.
|
|
|
|
INSTALL now covers more of lei since I'm less uncomfortable
about it for 2.0, and it points users towards the install/ helpers
if installing from source.
|
|
I may be mistaken, but I suspect the reason jemalloc handles
long-lived processes better than glibc is due to granularity
reduction being scaled to larger size classes. This can waste
20% of an individual allocation, but increases the likelihood
of reuse (without splitting/consolidating into other sizes).
In other words, glibc seems to try too hard to make the best fit
for initial allocations. This ends up being suboptimal over
time as those allocations are freed and similar (but not
identical) allocations come in. jemalloc sacrifices the best
initial fit for better fits over a long process lifetime.
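To make the 20% figure concrete, here is a toy model of size-class
rounding. It assumes four evenly spaced classes per power-of-two
doubling. That spacing is an assumption chosen for this sketch, not
something read out of jemalloc, and the function names are hypothetical.

```python
STEPS_PER_DOUBLING = 4  # assumption for this model, not read from jemalloc


def round_to_size_class(n: int) -> int:
    """Round a request of n bytes up to the next size class in this model."""
    if n <= 0:
        raise ValueError("n must be positive")
    if n <= STEPS_PER_DOUBLING:
        return n  # too small to subdivide; kept exact in this toy model
    base = 1 << ((n - 1).bit_length() - 1)  # largest power of two below n
    spacing = base // STEPS_PER_DOUBLING
    steps = -(-(n - base) // spacing)  # ceiling division
    return base + steps * spacing


def waste_fraction(n: int) -> float:
    """Fraction of the rounded allocation that the request does not use."""
    size = round_to_size_class(n)
    return (size - n) / size
```

In this model, requests of 1025 and 1200 bytes both land in the 1280
byte class, so a freed block from one can be reused as-is by the other,
at the cost of wasting just under 20% in the worst case.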
|
|
`=item' elements in Pod need to be surrounded by empty lines.
It's an unfortunate waste of vertical space, but Pod is still better
than *roff and usually available out-of-the-box.
|
|
It'll probably be done for another release; I doubt most cgit
users are willing to completely replace it with our coderepo
viewer just yet...
|
|
I'm not looking forward to dealing with synchronization
problems if we end up dealing with writes...
|
|
The good news (compared to lei) is we only have to worry about
imports and don't care about the filename nor keywords, so it's
immune to .mh_sequences writing inconsistencies across MH
implementations and sequence number packing.
We still assume the writer will write the mail file with one of:
* rename(2) to create the final sequence number filename
* a single write(2) if not relying on rename(2)
mlmmj and mutt satisfy these requirements. Python's Lib/mailbox.py
may, I'm not sure...
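The rename(2) variant of that assumption can be sketched as follows.
This is a minimal illustration of a well-behaved writer, not code from
any of the programs named above. The function name `deliver_mh' and the
`,tmp.' prefix are hypothetical, and the sketch ignores the race between
concurrent writers picking the same sequence number.

```python
import os
import tempfile


def deliver_mh(folder: str, raw: bytes) -> str:
    """Write raw into an MH folder and return the final path."""
    # Assumption: message files are named by positive integers.
    used = [
        int(name)
        for name in os.listdir(folder)
        if name.isascii() and name.isdigit()
    ]
    final = os.path.join(folder, str(max(used, default=0) + 1))
    # Write everything under a temporary, non-numeric name first ...
    fd, tmp = tempfile.mkstemp(prefix=",tmp.", dir=folder)
    with os.fdopen(fd, "wb") as fh:
        fh.write(raw)
    # ... so the numeric name only ever refers to a complete message.
    os.rename(tmp, final)
    return final
```

A reader that only looks at numeric filenames never observes a
partially written message this way.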
|
|
I noticed the HTML manpages didn't have -extindex linkification
while checking over the docs. While adding it, I also noticed
-config(5) had two entries :x
|
|
But new ideas keep popping into muh brain :x
|
|
Kyle Meyer <kyle@kyleam.com> wrote:
> Eric Wong writes:
> > +Treat the name of the public inbox as it's unqualified URL when
>
> s/it's/its/
Thanks, will push this fix out:
-------8<------
Subject: [PATCH] doc: config: fix grammar for nameIsUrl
Reported-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87bkbazp5g.fsf@kyleam.com/
|
|
This is a convenient (and slightly memory-saving) alternative to
specifying a `publicinbox.*.url' entry for every single inbox
when using publicinbox.wwwListing.
|
|
|
|
Fixes: c76a20d75200 ("cindex: require `-g GIT_DIR' or `-r PROJECT_ROOT'")
|
|
--no-import-before skips importing entire messages, not just
keywords, so it can cause permanent data loss if -o is pointed
to precious data.
|
|
Accepting @ARGV without switches ends up being ambiguous with
optional parameters for --join and --show. Requiring users to
specify `--join=' or `--show=' is a bit awkward (as it is with
-clone --objstore= and the like, but that is historical baggage
we need to carry at this point...)
|
|
We've had it since v1.7.0 when -extindex was introduced,
but it was never documented outside of commit messages.
Reviewed-by: Štěpán Němec <stepnem@smrk.net>
|
|
For users hosting read-only mirrors (via clone|fetch) and feeding
inboxes via -watch
|
|
There's no point in duplicating --no-fsync documentation across
manpages. --dangerous can be useful for reducing SSD wear, so
add a pointer to it as well.
|
|
Stale entries from newsgroup name changes (including adding
a `publicinbox.<name>.newsgroup' entry when none existed
before) can wreak havoc during a --reindex. So give the
hint to users about running -extindex with --gc to clean
up stale entries.
|
|
Start lowercasing newsgroup names automatically since uppercase
names are incompatible with IMAP and POP3 and also cause
problems with both -extindex and -cindex.
We'll also warn on eidx_key and newsgroup conflicts to avoid
sometimes subtle breakage when using -extindex and -cindex.
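A simplified sketch of the idea follows. It only covers collisions
between newsgroup names after lowercasing (not the eidx_key check), and
the function name and the dict-based input are assumptions made for
illustration rather than the actual config handling.

```python
import warnings


def normalize_newsgroups(newsgroups):
    """Map inbox name -> lowercased newsgroup, warning on collisions."""
    seen = {}    # lowercased newsgroup -> first inbox that claimed it
    result = {}
    for inbox, group in newsgroups.items():
        lowered = group.lower()
        if lowered in seen:
            warnings.warn(
                f"newsgroup {lowered!r} used by both "
                f"{seen[lowered]!r} and {inbox!r}"
            )
        else:
            seen[lowered] = inbox
        result[inbox] = lowered
    return result
```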
|
|
List-Unsubscribe headers with unique identifiers (such as those
generated by our examples/unsubscribe.milter) should not
end up in public archives. Add a new config knob to strip
List-Unsubscribe headers if they have the
`List-Unsubscribe-Post: List-Unsubscribe=One-Click'
header.
Unfortunately, this breaks DKIM signatures if the signature
covers either of these List-Unsubscribe* headers. However,
breaking DKIM is the lesser evil compared to any archive reader
being able to stop archival by an independent archivist.
As much as I would like this to be the default, it probably
affects few users at the moment since very few mailing lists
use unique identifiers in List-Unsubscribe (but that number
has grown, recently).
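The stripping rule described above could look roughly like this. It is
a sketch using Python's standard `email' package rather than the actual
implementation. The function name is hypothetical, and removing the
List-Unsubscribe-Post header along with List-Unsubscribe is an
assumption.

```python
from email import message_from_bytes


def strip_one_click_unsubscribe(raw: bytes) -> bytes:
    """Drop List-Unsubscribe* headers when one-click unsubscribe is set."""
    msg = message_from_bytes(raw)
    post = msg.get("List-Unsubscribe-Post", "")
    if str(post).strip().lower() == "list-unsubscribe=one-click":
        # Deleting a header removes every occurrence and is a no-op
        # when the header is absent.
        del msg["List-Unsubscribe"]
        del msg["List-Unsubscribe-Post"]
    return msg.as_bytes()
```

Messages without the one-click header keep their List-Unsubscribe
header, so any DKIM signatures covering it are unaffected.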
|
|
Fixes: 1f1b1f0e22f7 ("doc: lei-q: document SEARCH TERMS prefixes")
|
|
[ew: leave install/README unchanged pending wording changes]
|
|
which(1) isn't in POSIX so it's perhaps less likely to be
available (although I don't think I've noticed a system
without it in decades). So replace it with the POSIX
`command -v', even though everyone seems to use which...
Add a note about `lexgrog', too, since I'm not sure if it's
packaged for various *BSDs.
|
|
This non-portable construct isn't needed for our own rules.
If I'm understanding them correctly, they have different
semantics between *BSDs and GNU make.
|
|
The Xapian SWIG bindings are favored by Xapian upstream for
ease-of-maintenance compared to the XS version. While Debian
lags on this front, the SWIG bindings are widely available
on all *BSDs.
|
|
Since -cindex uses the xapian-delve(1) command for `--prune'
functionality, we'll rename our `xapian-compact' dependency to
the Debian package name (xapian-tools) since `xapian-delve' is
in the same package.
|
|
Link: https://public-inbox.org/meta/20230901110903.M876537@dcvr/
Link: https://public-inbox.org/meta/20230902194407.M464597@dcvr/
Fixes: 88c7c7c26b44 ("lei: wire up pure Perl sendmsg/recvmsg for Linux users")
Fixes: acefd91b302d ("syscall: implement sendmsg+recvmsg in pure Perl")
|
|
Reported-by: Štěpán Němec <stepnem@smrk.net>
|
|
|
|
We'll also be using this for -cindex for associating inboxes
to coderepos.
|
|
Reported-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87leeovmig.fsf@kyleam.com/
|
|
It's unusable for large mirrors, otherwise.
|
|
It's similar to a combination of -index and -extindex but
perhaps more refined this time around...
|
|
When import hits blobs it's already seen, we'll add labels
regardless in order to match the behavior of other inexact
matches. This is useful when importing exact copies of
messages which exist in multiple mailboxes.
I noticed this when I had a message imported from my normal IMAP
`INBOX', but also copied it to a different folder for future
reference.
|
|
This has been supported in every lei release, actually.
|
|
While accepting a single connection at-a-time is likely best for
multi-worker and/or load-balanced deployments, accepting
multiple connections at once should be less bad on overloaded
single-worker systems.
We can't automatically pick the best value here since worker
counts are dynamic via SIGTTIN/SIGTTOU. Process managers
(e.g. systemd) can also spawn multiple instances sharing a
single listener with no knowledge sharing between listeners.
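The trade-off can be sketched as a per-wakeup accept limit on a
non-blocking listener. This is an illustration only. The names
`accept_batch' and `max_accept' are hypothetical and do not correspond
to an actual option or function here.

```python
import socket


def accept_batch(listener: socket.socket, max_accept: int = 1):
    """Accept up to max_accept pending connections from a non-blocking listener."""
    conns = []
    for _ in range(max_accept):
        try:
            conn, _addr = listener.accept()
        except BlockingIOError:
            break  # backlog drained; go back to the event loop
        conns.append(conn)
    return conns
```

With `max_accept=1', other workers sharing the listener get a chance at
the next connection. Larger values let a lone, overloaded worker drain
its backlog in fewer wakeups.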
|
|
This lets us clean up disk space when repos are removed
on the remote side.
|
|
It may not be immediately obvious to users unfamiliar with
grokmirror.
|
|
Reported-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87v8j4ql8k.fsf@kyleam.com/
|
|
|
|
Did some stuff, still a ton of stuff to do :x
|
|
I typically use --edit/-e to make changes and --list/-l with
git, and do the same with lei.
|
|
I'm setting up more imports and forgot about them :x
|
|
Hopefully this makes things less surprising to new hackers.
|