Date | Commit message (Collapse) |
|
This should help us deal with MH sequence number packing and
invalidating mail_sync.sqlite3.
|
|
The MH format is widely-supported and used by various MUAs such
as mutt and sylpheed, and a MH-like format is used by mlmmj for
archives, as well. Locking implementations for writes are
inconsistent, so this commit doesn't support writes, yet.
inotify|EVFILT_VNODE watches aren't supported, yet, but that'll
have to come since MH allows packing unused integers and
renaming files.
|
|
The IO package seems like a better home for I/O subs than the
Git package. We lose the 60 second read timeout for `git
cat-file --batch-*' processes since it's probably not necessary
given how reliable the code has proven and things would fall
over hard in other ways if the storage device were completely
hosed.
|
|
`readline' ops may not detect errors on partial reads.
This saves us some code to reduce cognitive overhead for
readers. We'll also support reusing a destination buffers so it
can work more nicely with existing code.
|
|
This will make it easier to switch in the far future while
making callers easier-to-read (and more callers will be added).
Anyways, Perl 5.26 is a long time away for enterprise users;
but isolating compatibility code away can improve readability
of code we actually care about in the meantime.
|
|
I'm not sure how combining SHA-1 and SHA-256 in a single git
repo will work, eventually. But this is an obvious place to do
the right thing if we ever see a 64-byte hex string (unless git
adds support for another hash which uses 64-byte hex string
representations, which would break many assumptions elsewhere,
too...).
|
|
I'm not sure how it happens or if/when it was fixed, but my
earliest lei installations have hit some
"E: fid=$fid for $oidhex unknown" messages on `lei import'
invocations.
This really should've enabled the foreign keys pragma to begin
with; but we'll probably start using that in the future. For
now, at least rely on a transaction to keep things consistent
in SQLite.
|
|
This avoids repeated work for incremental "lei import" runs when
users upgrade from 1.7 to current public-inbox.git (and eventually
1.8).
We need the explicit bind_param for fallback calls because
previous bind_param calls are "sticky" for a given statement
handle. The DBI(3pm) manpage states:
The data type is 'sticky' in that bind values passed to execute()
are bound with the data type specified by earlier bind_param()
calls, if any. Portable applications should not rely on being
able to change the data type after the first "bind_param" call.
|
|
This will make transparently upgrading from 1.7.0 -> 1.8.x
easier. Only a single user has access to mail_sync.sqlite3,
and R/W at the kernel-level is required for WAL, anyways.
|
|
DBD::SQLite doesn't seem to use SQL_BLOB automatically, which
can lead to ambiguity in some cases (especially interoperating
with other tools).
Downgrading to lei 1.7.0 will cause problems, but upgrading
appears transparent after weeks of tests.
|
|
Apparently leaving {sqlite_unicode} unset isn't enough, and
there's subtle differences where BLOBs are stored differently
than TEXT when dealing with binary data. We also want to avoid
odd cases where SQLite will attempt to treat a number-like value
as an integer.
This should avoid problems in case non-UTF-8 URLs and pathnames are
used. They'll automatically be upgraded if not, but downgrades
to older lei would cause duplicates to appear.
|
|
btrfs is Linux-only at the moment (and likely to remain that way
for practical purposes). So rely on Linux ABI stability and use
the `syscall' and `ioctl' perlops rather than relying on Inline::C.
Inline::C (and gcc||clang) are monstrous dependencies which we
can't expect users to have.
This makes supporting new architectures more difficult, but new
architectures come along rarely and this reduces the burden for
the majority of Linux users on popular architectures (while
still avoiding the distribution of pre-built binaries).
Link: https://public-inbox.org/meta/YbCPWGaJEkV6eWfo@codewreck.org/
|
|
We need a transaction across two SQL statements so readers
(which don't use flock) will see the result as atomic.
This may help against some occasional test failures I'm seeing
from t/lei-auto-watch.t and t/lei-watch.t, or make the problem
more apparent.
|
|
It could prove useful for diagnosing bugs (either on our
end or an MUA's), or storage device failures.
|
|
warn() is easier to augment with context information, and
frankly unavoidable in the presence of 3rd-party libraries
we don't control.
|
|
This covers v1 inboxes, as well. We also guard the execution
since "PRAGMA optimize" was only introduced in SQLite 3.18.0
(2017-03-30)
|
|
As recommended by SQLite documentation[1]:
To achieve the best long-term query performance without the need
to do a detailed engineering analysis of the application schema
and SQL, it is recommended that applications run "PRAGMA optimize"
(with no arguments) just before closing each database connection.
Hopefully that works for our use cases and can make things
faster for us.
[1] https://www.sqlite.org/pragma.html#pragma_optimize
|
|
"lei export-kw" no longer completes for anonymous sources.
More commands use "lei refresh-mail-sync" as a basis for their
completion work, as well.
";AUTH=ANONYMOUS@" is stripped from completions since it was
preventing bash completion from working on AUTH=ANONYMOUS IMAP
URLs. I'm not sure if there's a better way, but all of our code
works fine without specifying AUTH=ANONYMOUS as a command-line
arg.
Finally, we fallback to using more candidates if none can
be found, allowing multiple URLs to be completed.
|
|
NNTP URLs are probably more prevalent in public message archives
than IMAP URLs.
|
|
We can set opt->{quiet} for (internal) 'note-event' command
to quiet ->qerr, since we use ->qerr everywhere else. And
we'll just die() instead of setting a ->{fail} message, since
eval + die are more inline with the rest of our Perl code.
|
|
NNTP servers, IMAP servers, and various MUAs may recycle
"unique" identifiers due to software bugs or careless BOFHs.
Warn about them, but always be prepared to account for them.
|
|
No reason not to support them, since there's more
public-inbox-nntpd instances than -imapd instances,
currently.
|
|
As with other SQLite3 databases, copy-on-write with
files experiencing random writes leads to write amplification
and low performance.
|
|
Since 44917fdd24a8bec1 ("lei_mail_sync: do not use transactions"),
relying on lei/store to serialize access was a pointless endeavor.
Rely on flock(2) to serialize multiple writers since (in my
experience) it's the easiest way to deal with parallel writers
when using SQLite. This allows us to simplify existing callers
while speeding up 'lei refresh-mail-sync --all=local' by 5% or
so.
|
|
This can cause readers and writers to conflict since the
implicit transaction from SELECT in a LeiRefreshMailSync
worker would block the LeiStore process.
|
|
For lei-index to work in parallel with MUA access and upcoming
inotify-based updates, mail_sync.sqlite3 needs to always be
up-to-date to read-only worker processes (ahead of everything
else). So rely on the default auto-commit behavior and hope
SQLite WAL can reduce some of the overheads involved with
writes.
|
|
Another step towards moving more of our internals to use binary
OIDs to avoid needless conversions before hitting disk.
|
|
We need to account for past canonicalization errors and deal
with cases which violate uniqueness constraints in
mail_sync.sqlite3
|
|
No need to loop when we can rely on grep.
|
|
This still needs tests, but I noticed "--all" w/o "local" or
"remote" was not working correctly since split() returned
an empty array.
|
|
No need to bump refcounts of {dbh} nor declare extra variables
for a rarely-called function.
|
|
We can afford to be liberal in what messages we accept
internally, since LeiToMail uses a trailing slash internally.
|
|
This allows MUA-made flag changes to Maildirs to be instantly
read and acknowledged for future search results.
In the future, it may be used to speed up --augment and
--import-before (the default) with with "lei q".
|
|
Inotify updates may simultaneously remove or update the location
of a message, so ensure we at least have knowledge of the new
location if the old one cannot be updated.
|
|
Favor oidbin use internally to reduce internal memory traffic.
|
|
We'll be reusing it in other commands, too.
|
|
Sharing lms->{dbh} with eidx shards appears to be the cause of
the "Issuing rollback() due to DESTROY without explicit
disconnect() of DBD::SQLite::db handle" messages I've been
seeing from "lei up".
|
|
On a 4-core CPU, this speeds up "lei import" on a largish
Maildir inbox with 75K messages from ~8 minutes down to ~40s.
Parallelizing alone did not bring any improvement and may
even hurt performance slightly, depending on CPU availability.
However, creating the index on the "fid" and "name" columns in
blob2name yields us the same speedup we got.
Parallelizing IMAP makes more sense due to the fact most IMAP
stores are non-local and subject to network latency.
Followup-to: bdecd7ed8e0dcf0b45491b947cd737ba8cfe38a3 ("lei import: speed up kw updates for old IMAP messages")
|
|
On a 4-core CPU, this speeds up "lei import" on a largish IMAP
inbox with 75K messages from ~21 minutes down to 40s.
Parallelizing with the new LeiImportKw WQ worker class gives a
near-linear speedup and brought the runtime down to ~5:40.
The new idx_fid_uid index on the "fid" and "uid" columns of
blob2num in mail_sync.sqlite3 brought us the final speedup.
An additional index on over.sqlite3#xref3(oidbin) did not help,
since idx_nntp already exists and speeds up the new ->oidbin_exists
internal API.
I initially experimented with a separate "lei import-kw" command
but decided against it since it's useless outside of IMAP+JMAP
and would require extra cognitive overhead for both users and
hackers. So LeiImportKw is just a WQ worker used by "lei import"
and not its own user-visible command.
v2: fix ikw_done_wait arg handling (ugh, confusing API :x)
|
|
I'm not actually sure if I hit an uncommitted transaction just
now, it doesn't seem like it.
|
|
This makes "lei import" behavior with IMAP folders more
consistent with that with Maildir.
Opening IMAP folders read-write with "SELECT" (instead of
read-only with "EXAMINE") was necessary, since it lets an IMAP
server communicate to us as to whether or not it's worth
refetching IMAP flags of previously imported messages.
Fetching UID+FLAGS only is one of the fastest IMAP operations
with dovecot, our -imapd and presumably other common IMAP servers.
It is issued by common MUAs such as mutt after every SELECT.
Users may now rely on "lei import" exclusively to merge mail and
keywords into lei/store, and "lei export-kw" to propagate
keyword changes back to IMAP servers.
A sticks-and-stones workflow for personal mailboxes is currently:
lei import imaps://$MY_PERSONAL_INBOX
lei q --mua=$MUA -o /tmp/results SEARCH TERMS...
# do stuff from within $MUA to /tmp/results
lei import /tmp/results # read keyword changes from MUA
lei export-kw imaps://$MY_PERSONAL_INBOX
# repeat when new stuff shows up in personal inbox
The next goal is to automate repeated imports + export-kw
commands with with inotify and IMAP IDLE.
|
|
lcat can now dump the memoized contents of entire IMAP folders,
not just a single UID. It's now parallelized and pipelined for
multiple lei2mail workers.
Furthemore, various forms of JSON output work consistently
with blob-only output, now.
While working on this, I noticed NetReader was passing UID URLs
to imap_each callbacks, which was causing mail_sync.sqlite3 to
store UIDs in `folders' and clearly wrong so it's now fixed.
|
|
"lei import" can now import a single IMAP message via
<imaps://example.com/MAILBOX/;UID=$UID>
Likewise, "lei inspect" can show the blob information for UID
URLs and "lei lcat" can display the blob without network access
if imported.
"lei lcat" also gets rid of some unused code and supports
"blob:$OIDHEX" syntax as described in the comments (and used by
our "text" output format).
v2: enforce UID in URL, fail without
v3: fix error reporting (s/fail/child_error/)
|
|
I'm not 100% sure why, but "lei up" seems to cause uncommitted
transaction errors. LeiToMail calls sto->set_sync_info, but
LeiXSearch should call sto->done and lms_commit, so I'm not
sure where the uncommited transaction is coming from...
|
|
Sometimes a user stops caring to sync an IMAP or Maildir
folder, or wants to force a resync. Let them run this
command to have lei forget all the sync information about
the mail folder.
This won't delete any stored messages in git, but will
leave "lei index" users with dangling references.
|
|
This lets us have a more consistent UX for mapping
easily-typed command-line arguments to canonical
folder locations.
|
|
It's inappropriate to store sync information without
UIDVALIDITY, so add an assertion to prevent it.
|
|
Move match_imap_url into LeiMailSync so it can be used in more
places, such as "lei inspect". Upcoming commands such as
"lei forget-mail-sync" and {add,forget,pause,resume}-watch will
also support relaxed IMAP matching rules since there's
no reasonable way to expect users use ";UIDVALIDITY=" on the
command-line.
|
|
IMAP will eventually be supported.
|
|
mbsync and offlineimap both use ":2," suffixes for filenames in
"new/", however my interpretation of the Maildir spec at
<https://cr.yp.to/proto/maildir.html> is that ":2," is only for
files in "cur/". My interpretation also matches that of
doveecot, but we'll allow what mbsync and offlineimap do given
their popularity.
|