about summary refs log tree commit homepage
path: root/script
DateCommit message (Collapse)
2023-01-06clone: implement --exit-code
Since public-inbox-clone is now useful for incremental updates with manifest, --exit-code belongs here, too.
2022-12-30clone: support --post-update-hook= from grokmirror
This should be compatible with both grokmirror 1 and 2 behavior and serialized on a per-repo basis.
2022-11-28clone: support --project-list= for cgit
grokmirror supports it, and we also support cgit, so this should make running mirrors easier. This will be useful for scripting purposes, too.
2022-11-28clone: support --keep-going/-k like make(1)
This can be useful for intermittent network errors, and the required code changes makes it less dependent on global state.
2022-11-28clone|fetch: support passing --prune(-tags) to `git fetch'
We need to be able to get rid of removed branches and tags on the remote. --prune-tags is implied for non-objstore repos, and incompatible with objstore repos.
2022-11-28clone: canonicalize destination path from CLI
We'll probably save the destination path somewhere, so ensure the path doesn't have redundant slashes and such
2022-11-28clone: support loading manifest.js.gz from destination
This will allow us to quickly check fingerprints against remotes with a single HTTP(S) request, saving us numerous `git show-refs' invocations.
2022-11-28fetch: use v5.12
Another tiny step towards improved startup performance by avoiding one .pm file.
2022-11-28clone: require `--objstore=' for default location
Allowing just `--objstore' without `=' was confusing, since it could eat one of the required parameters (URL or DESTINATION).
2022-11-28clone: use v5.12
Another small step in what will probably a be a decades-long quest to reduce startup time by a few milliseconds.
2022-11-28clone: drop unnecessary requires
These packages are all require-ed elsewhere.
2022-11-28clone: move --dry-run handling to lei_mirror
lei will probably support dry-run in more places, too.
2022-11-28lei_mirror: support --objstore and forkgroups
The {forkgroup} directive of grokmirror 2.x manifest.js.gz can facilitate more space savings and improved pack performance with pack.islands.
2022-11-28clone: support --inbox-version
This is part of `lei add-external --mirror', and it makes sense to have for development and testing. We'll also add a fallback in case somebody tries --inbox-version and fails due to a newer remote instances of public-inbox.
2022-11-28clone: support --inbox-config option
This allows avoiding 404s when trying _/text/config/raw on code repositories.
2022-11-28clone: support --dry-run / -n flag
It still makes HTTP(S) requests to retrieve the manifest or scrape HTML, but doesn't make permanent changes to the FS (aside from modifying {acm}time of ${TMPDIR-/tmp}).
2022-11-28clone: parallelize v2 epoch clones
This is a first step in supporting completely parallelized clones. Eventually, everything will be parallelized and dependencies will be managed via callbacks.
2022-11-28clone: support --include and --exclude with multi-clone
These will be handy when someone is interested in a subset of inboxes on a large hosting site.
2022-10-24treewide: replace /^I: / prefix with /^# /
This is like more familiar to readers of TAP (Test Anywhere Protocol) output, as well as shell and Perl scripters which also use `#' for comments. AFAIK, nobody is parsing our stderr, and I'm not sure how standardized the `I:' prefix is (nor `W:' and `E:' are). It's already the prevailing style in Lei* code, too, so things have been moving in that direction for a bit.
2022-07-20public-inbox-pop3d - a mostly read-only POP3 server
Old account expiry has not been implemented, but it seems to work well with both mpop(1) and getmail(1). The strictness of mpop was particularly helpful in ironing out bugs in our implementation of (dreaded) message sequence numbers. "EXPIRE 0" (RFC 2449) can theoretically save numerous "DELE" commands, but that's untested by real-world clients. mpop supports PIPELINING which is effective in hiding latency, and the core networking functionality is already well-tested from our NNTP and IMAP implementations. Configuration requires "publicinbox.pop3state" to point to a directory writable by the otherwise read-only daemon. See public-inbox-pop3d(1) manpage for more usage details.
2022-05-05public-inbox-netd: a multi-protocol superserver
Since we'll be adding POP3 support as our 4th network protocol; asking admins to run yet another daemon on top of existing -httpd, -nntpd, -imapd is a maintenance burden and a waste of memory. The goal of public-inbox-netd is to be able to replace all existing read-only daemons with a single process to save memory and reduce administrative overhead; hopefully encouraging more users to self-host their own mirrors. It's barely-tested at the moment. Eventually, multiple PI_CONFIG and HOME directories will be supported, as are per-listener .psgi config files.
2022-04-26lei: move to v5.12 to avoid "use strict"
Socket.pm still loads strict.pm, unfortunately, which hurts startup time; but we'll save some LoC this way.
2022-03-23syscall: implement sendmsg+recvmsg in pure Perl
Socket::MsgHdr is only packaged for Debian and derivatives at the moment, and Inline::C pulling in gcc/clang is a huge amount of disk space and bandwidth for some users. This enables disk space and/or bandwidth-limited users to use lei. Only Linux guarantees a stable ABI and syscall numbers, but that's the majority of our userbase. FreeBSD users will still have to use Inline::C (or get Socket::MsgHdr packaged). x86, x32, and x86-64 are all currently supported, more to be added.
2022-03-08index|extindex: support --dangerous flag
This enables Xapian::DB_DANGEROUS to support in-place updates. This can speed up the initial index and reduce I/O at the cost of preventing concurrent readers and being unsafe in the face of any abnormal terminations. This is more dangerous than --no-fsync. --no-fsync is only unsafe in the event of a power loss or kernel crash; --dangerous is unsafe even on SIGKILL.
2021-11-02init: respect umask when creating description
I noticed a description for a new inbox had st_mode=0600.
2021-10-15lei: use send() perlop for signals
This may save us a small bit of startup time since there's fewer args and opcodes should be smaller.
2021-10-14lei add-external --mirror: respect client umask
While lei is intended for non-public mail and runs umask(077) by default, externals are one area which can safely defer to the user's umask. Instead of sending it unconditionally with every command, only have lei-daemon request it when necessary.
2021-10-13fetch: support --try-remote/-T for alternate remote names
This allows -fetch to work out-of-the-box on using the grokmirror 2.x default of "_grokmirror".
2021-10-09extindex: support --reindex --fast
This mode only checks history for missed/stale messages and doesn't attempt to reindex messages which are already indexed.
2021-10-05index: --reindex w/ --{since,until,before,after}
This lets administrators reindex specific time ranges according to git "approxidate" formats. These arguments are passed directly to underlying git-log(1) invocations and may still reach into old epochs. Since these options rely on git committer dates (which we infer from the most recent Received: header), they are not guaranteed to be strictly tied to git history and it's possible to over/under-reindex some messages. It's probably not a major problem in practice, though; reindexing a few extra messages is generally harmless aside from some extra device wear. Since this currently relies on git-log, these options do not affect -extindex, yet.
2021-10-01ds: simplify signalfd use
Since signalfd is often combined with our event loop, give it a convenient API and reduce the code duplication required to use it. EventLoop is replaced with ::event_loop to allow consistent parameter passing and avoid needlessly passing the package name on stack. We also avoid exporting SFD_NONBLOCK since it's the only flag we support. There's no sense in having the memory overhead of a constant function when it's in cold code.
2021-09-24clone|--mirror: support --epoch=RANGE for partial clones
Partial (v2) clones should be useful addition for users wanting to conserve storage while having fast access to recent messages. Continuing work started in 876e74283ff3 (fetch: ignore non-writable epoch dirs, 2021-09-17), this creates bare, read-only epoch git repos. These git repos have the remotes pre-configured, but does not fetch any objects. The goal is to allow users to set the writable bit on a previously-skipped epoch and start fetching it. Shell completion support may not be necessary given how short the epoch ranges are, here. Cc: Luis Chamberlain <mcgrof@kernel.org> Link: https://public-inbox.org/meta/20210917002204.GA13112@dcvr/T/#u
2021-09-22lei: drop redundant WQ EOF callbacks
Redundant code is noise and therefore confusing :<
2021-09-22script/lei: describe purpose of sleep loop
It looks dumb, but I'm not about to take a runtime penalty to use signalfd|EVFILT_SIGNAL, here, either.
2021-09-21script/lei: handle SIGTSTP and SIGCONT
Sometimes it's useful to pause an expensive query or refresh-mail-sync to do something else. While lei-daemon and lei/store can't be paused since they're shared across clients, per-invocation WQ workers can be paused safely using the unblockable SIGSTOP. While we're at it, drop the ETOOMANYREFS hint since it hasn't been a problem since we drastically reduced FD passing early in development.
2021-09-17script/lei: umask(077) before execve
While my MUA also runs umask(077) unconditionally, not all MUAs do. Additionally, pagers may support writing its buffer to disk, so ensure anything else we spawn has umask(077).
2021-09-15support -C (chdir) for most non-daemon commands
Because make(1), git(1), tar(1) all support -C in this form, as do our newer commands such as lei, public-inbox-{clone,fetch}.
2021-09-15fetch: support --exit-code switch
As noted in the new manpage entry, this is useful for avoiding public-inbox-index invocations when there's nothing to update. We use 127 to match "grok-pull", and also because it doesn't conflict with any of the current curl(1) exit codes.
2021-09-15multi_git: hoist out common epoch/alternates handling
IMHO, this greatly improves code sharing and organization between v2, extindex, and lei/store. Common git-related logic for these is lightly-refactored and easier to reason about. The impetus for this big change was to ensure inboxes created+managed by public-inbox-{clone,fetch} could have alternates and configs setup properly without depending on SQLite (via V2Writable). This change does that while making old code shorter and better factored.
2021-09-12init: set a useful description
"Unnamed repository" for v1 inboxes was misleading, and having a non-existent description for v2 was equally annoying, so set a short description based on the primary address. We remove descriptions when setting up new test inboxes to preserve the behavior of the t/lei-mirror.t test case.
2021-09-12new public-inbox-{clone,fetch} commands
Setting up and maintaining git-only mirrors of v2 inboxes is complex since multiple commands are required to clone and fetch into epochs. Unlike grokmirror, these commands do not require any configuration. Instead, they rely on existing git config files and work like "git clone --mirror" and "git fetch", respectively. Like grokmirror, they use manifest.js.gz, but only on a per-inbox basis so users won't have to clone every inbox of a large instance nor edit config files to include/exclude inboxes they're interested in.
2021-09-11lei: fix handling of broken lei.saved-search config files
lei shouldn't become unusable if a config file is invalid. Instead, show the "git config" stderr and attempt to continue gracefully. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210910141157.6u5adehpx7wftkor@meerkat.local/
2021-08-11treewide: use *nix-specific dirname regexps
None of our code elsewhere accounts for non-*nix pathnames and it's not worth our time to start. So stop wasting CPU cycles giving the illusion that we'd care about non-*nix pathnames.
2021-08-08httpd: set psgi.url_scheme to 'https' for TLS listeners
For users using the native TLS functionality of -httpd (instead of using nginx + Plack::Middleware::ReverseProxy), psgi.url_scheme=http was wrong and would lead to improper redirects.
2021-08-04extindex: fix boost with partial runs
Boost relies on knowledge of all inboxes in a given config file to work properly. So while we support indexing a subset of inboxes, we must still account for boost in inboxes we're not indexing. So split internal inbox groups into "known" and "active", where previously we only cared for inboxes which were being actively indexed. Furthermore, boost checks need to be applied when a message arrives in different inboxes across multiple invocations. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210802204058.vscbxs5q7xyolyu2@nitro.local/
2021-07-31extindex: -xcpdb and -compact support
Since extindex uses Xapian shards in a similar way to v2 inboxes, we'll support -xcpdb (reshard+upgrade) and -compact all the same to give admins tuning+upgrade options.
2021-07-28lei: die on ECONNRESET
ECONNRESET should be rare on a private local socket, and if we hit it, it's because we're hitting the listen() limit.
2021-07-28treewide: s/sequential_shard/sequential-shard/g
The underscore variant was never documented and maintaining the difference between the command-line and internal hash is not worth it.
2021-07-25init: support git <2.30 for "-c KEY=VALUE" args
It turns out `--fixed-value' is a relatively new git-config(1) feature in git 2.30+ (December 2020). So use the quotemeta perlop for now since it seems compatible-enough for POSIX ERE used by git.
2021-07-25extindex: support --dedupe[=MSGID]
Sometimes I just want to dedupe a single Message-ID to test something, and this lets me do it. This patch appears to do what its supposed to. But it also appears to be finding duplicates that were previously missed. That's a good thing, but I wish I understood what seems to be fixed :x I'm not sure why the previous ExtSearchIdx.pm (blob 357312b8) was causing messages to be missed, even, and why this patch seems to fix it... And it's not infinite looping, either. Anyways, before this patch, "-extindex --dedupe" was taking ~5 min to no-op every message (after the initial full --dedupe run which took over a day to run). No-op --dedupes now take just under 2 hours to scan every single cross-posted message for a no-op dedupe. The initial dedupe took nearly 44 hours on my system for <https://yhbt.net/lore/all/> due to SATA-2 TLC SSD latency on 3 gigantic Xapian shards. Running --dedupe with this change seems to prevent /BUG\?.*?not deduplicated properly/ stderr messages from being triggered by View.pm. Current versions of -extindex do not seem susceptible to introducing duplicates.