Date | Commit message |
|
The content dedupe logic was originally designed for v2 public
inboxes as a fallback for when the importer sees identical
Message-IDs. Thus it did not account for Message-ID(s) in
the message itself.
This change doesn't affect saved searches (the default when
writing to a pathname or IMAP). It affects --no-save, and
outputs to stdout (even if stdout is redirected to a file).
Prior to this change, lei reused the v2 logic as-is without
accounting for Message-IDs anywhere with `--dedupe=content'
(the default). This could cause messages to be skipped when
the content matches despite Message-IDs being different.
So with this change, `lei q --dedupe=content' will hash the
Message-ID(s) in the message to ensure messages with different
Message-IDs are NOT deduplicated.
Whether this change is a bug fix or introduces a regression
is actually debatable. In my mind, it is better to err on the
side of showing too many messages rather than too few, even if
the actual contents of the message are identical. Making saved
searches deduplicate without accounting for Message-IDs would be
more difficult, too.
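The effect of hashing Message-ID(s) can be sketched as follows. This is a minimal illustration in Python, not the project's code: the function names, the header selection, and the message representation (a dict of header lists plus a bytes body) are all assumptions made for the example.

```python
import hashlib

def content_hash(headers, body, include_mids=True):
    """Hash selected headers and the body; optionally mix in Message-ID(s)."""
    h = hashlib.sha256()
    # Hypothetical header selection; the real set of hashed headers may differ.
    for name in ("From", "To", "Cc", "Subject", "Date"):
        for value in headers.get(name, []):
            h.update(f"{name}\0{value}\0".encode())
    if include_mids:
        # With this step, identical content under different Message-IDs
        # yields different hashes, so neither message is skipped.
        for mid in headers.get("Message-ID", []):
            h.update(f"Message-ID\0{mid}\0".encode())
    h.update(body)
    return h.hexdigest()

def dedupe(messages, include_mids=True):
    """Keep the first message seen for each distinct hash."""
    seen = set()
    kept = []
    for headers, body in messages:
        key = content_hash(headers, body, include_mids)
        if key not in seen:
            seen.add(key)
            kept.append((headers, body))
    return kept
```

Calling dedupe() with include_mids=False mimics the old behavior, where two messages with matching content collapse into one regardless of their Message-IDs.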
|
|
We can just use the sha256() sub instead of dealing with the
OO interface for a small string.
|
|
On my x86-64 machine, OpenSSL SHA-256 is nearly twice as fast as
the Digest::SHA implementation from Perl, most likely due to an
optimized assembly implementation. SHA-1 is a few percent
faster, too.
|
|
An unconditional SQLite COUNT() is a slow operation since it does
a full table scan. There's no need for it, since lei dedupe
only needs to know if it's empty or not to decide between
new/ and cur/ for Maildir outputs.
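One way to test for emptiness without COUNT() is to ask for a single row and stop. The sketch below uses Python's sqlite3 module; the helper name and the table name passed to it are hypothetical, and the query the project actually uses may differ.

```python
import sqlite3

def is_empty(conn, table):
    """Return True if the table has no rows, without scanning it."""
    # LIMIT 1 lets SQLite stop at the first row instead of counting all rows.
    # The table name is interpolated because identifiers cannot be bound as
    # parameters; only pass trusted names here.
    row = conn.execute(f"SELECT 1 FROM {table} LIMIT 1").fetchone()
    return row is None
```

The boolean result is all a caller needs to pick between the two Maildir subdirectories.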
|
|
Despite JMAP not supporting the equivalent of the IMAP \Recent
flag, it is useful for "lei q --augment" and "lei up" users to
be able to distinguish new results from old-but-unread messages
in an mbox or Maildir.
For the mbox family, we'll drop the "O" status flag when
appending to mboxes. For Maildirs, we'll write to the "new"
subdirectory.
Behavior when writing to initially empty Maildirs and mboxes
remains unchanged since there's no need to distinguish between
new and old results in the initial case. Having users wait
for a rename(2) storm or complete mbox rewrite hurts UX.
With IMAP mailboxes, \Recent is already enforced by the IMAP
server and IMAP clients have no way of changing it(*).
(*) mutt uses the "Old" IMAP flag which isn't part of RFC 3501,
other MUAs may do similar things.
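The destination rules described above can be summarized in a short sketch. The function names are hypothetical, and two details are assumptions rather than facts from this message: that initially empty Maildirs are written to "cur", and that initially empty mboxes get the "O" flag.

```python
def maildir_subdir(initially_empty):
    """Pick the Maildir subdirectory for a newly written result."""
    # Assumption: the unchanged initial-case behavior is writing to "cur",
    # which avoids a later rename(2) storm from "new" to "cur".
    return "cur" if initially_empty else "new"

def mbox_status(initially_empty, seen=False):
    """Build the value of an mbox Status: header for a written result."""
    flags = ""
    if seen:
        flags += "R"  # message has been read
    if initially_empty:
        flags += "O"  # "old"; dropped when appending so new results stand out
    return flags
```

An MUA opening an augmented mbox would then see appended results without "O" and can present them as new.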
|
|
LeiSavedSearch will use a LeiDedupe-like internal API,
so we won't have to make as many changes to callsites
between saved and unsaved searches.
|
|
This will let us tie keywords from remote externals
to those which only exist in local externals.
|
|
This will make testing IMAP support for other commands easier, as
it doesn't write to lei/store at all. Like the pager and MUA,
"git credential" is always spawned by script/lei (and not
lei-daemon) so it has a controlling terminal for password
prompts.
v2: fix missing requires, correct test ordering
v3: ensure config exists for IMAP auth
|
|
The features we use for SharedKV could probably be implemented
with GDBM_File or SDBM_File, but that doesn't seem worth it at
the moment since we depend on SQLite elsewhere.
|
|
It may be possible for updates or changes to be uncommitted
until disconnect, so we'll use flock() as we do elsewhere
to avoid the polling retry behavior of SQLite.
We also need to clear CachedKids before disconnecting
to avoid warnings like:
->disconnect invalidates 1 active statement handle
(either destroy statement handles or call finish on
them before disconnecting)
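The locking pattern can be sketched in Python as below. This only illustrates holding flock() across the write and the disconnect. The file names, table layout, and function names are hypothetical, and Python's sqlite3 has no counterpart to DBI's CachedKids, so that part is not shown.

```python
import fcntl
import os
import sqlite3

def kv_set(dirpath, key, value):
    """Write one key under an exclusive flock() held until after close."""
    db_path = os.path.join(dirpath, "kv.sqlite3")      # hypothetical name
    lock_path = os.path.join(dirpath, "kv.lock")       # hypothetical name
    with open(lock_path, "a") as lock:
        # Block here instead of relying on SQLite's busy/retry polling.
        fcntl.flock(lock, fcntl.LOCK_EX)
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("CREATE TABLE IF NOT EXISTS kv (k PRIMARY KEY, v)")
            conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
                         (key, value))
            conn.commit()
        finally:
            # Disconnect before unlocking so nothing is left uncommitted
            # when the next lock holder opens the database.
            conn.close()
            fcntl.flock(lock, fcntl.LOCK_UN)

def kv_get(dirpath, key):
    """Read one key; returns None if it is absent."""
    conn = sqlite3.connect(os.path.join(dirpath, "kv.sqlite3"))
    try:
        row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    finally:
        conn.close()
    return row[0] if row else None
```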
|
|
While it's loaded by ContentHash, we use Digest::SHA directly in
this package for smsg and OID-only deduplication.
|
|
This fixes "--dedupe none" with Maildir where we don't
create the object at all.
|
|
All the augment and deduplication stuff seems to be working
based on unit tests. OpPipe is a nice general addition that
will probably make future state machines easier.
|
|
We'll be passing these objects via PublicInbox::IPC which uses
Storable (or Sereal), so ensure they're safe to use after
serialization.
|
|
We don't want duplicate messages in results overviews, either.
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
Maildir should be plenty fine for short-lived output folders.
|
|
For writing mboxes and Maildirs, users may wish to use
stricter or looser deduplication strategies. This
gives them more control.
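A selectable strategy could look roughly like the sketch below. It covers only the "content", "oid", and "none" names that appear elsewhere in this log. The message representation (a dict with "raw" bytes and an "oid" string) and all function names are hypothetical.

```python
import hashlib

def key_content(msg):
    """Key on a hash of the raw message bytes (a simplified content hash)."""
    return hashlib.sha256(msg["raw"]).hexdigest()

def key_oid(msg):
    """Key on the object ID only."""
    return msg["oid"]

# Hypothetical strategy table; None means deduplication is disabled.
STRATEGIES = {"content": key_content, "oid": key_oid, "none": None}

def make_is_dup(strategy):
    """Return a predicate reporting whether a message was already seen."""
    key_fn = STRATEGIES[strategy]
    seen = set()

    def is_dup(msg):
        if key_fn is None:
            return False  # "none": every message is written
        key = key_fn(msg)
        if key in seen:
            return True
        seen.add(key)
        return False

    return is_dup
```

A stricter strategy treats fewer messages as duplicates and writes more of them out; "none" is the extreme case and writes everything.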
|