Diffstat (limited to 'Documentation/public-inbox-extindex.pod')
-rw-r--r-- | Documentation/public-inbox-extindex.pod | 51 |
1 files changed, 40 insertions, 11 deletions
diff --git a/Documentation/public-inbox-extindex.pod b/Documentation/public-inbox-extindex.pod
index f71a90e5..b53e45ed 100644
--- a/Documentation/public-inbox-extindex.pod
+++ b/Documentation/public-inbox-extindex.pod
@@ -13,7 +13,7 @@ public-inbox-extindex [OPTIONS] [EXTINDEX_DIR] --all
 public-inbox-extindex creates and updates an external search and
 overview database used by the read-only public-inbox PSGI (HTTP),
 NNTP, and IMAP interfaces. This requires either the
-L<Search::Xapian> XS bindings OR the L<Xapian> SWIG bindings,
+L<Xapian> SWIG bindings OR or L<Search::Xapian> XS bindings
 along with L<DBD::SQLite> and L<DBI> Perl modules.
 
 =head1 OPTIONS
@@ -47,11 +47,26 @@ C<indexlevel> set to C<basic> and their respective Xapian
 public-inboxes where cross-posting is common, this allows
 significant space savings on Xapian indices.
 
+=item --dedupe=MSGID
+
+=item --dedupe
+
+Rerun deduplication on messages with the given Message-ID or
+all messages if no Message-ID is specified. Deduplication rules may
+change and evolve over time, especially if filters are involved.
+
+C<--dedupe=MSGID> may be specified multiple times to deduplicate
+multiple Message-IDs.
+
+Use this if you see C<W: BUG? $MSGID not deduplicated properly>
+warnings from WWW logs.
+
 =item --gc
 
 Perform garbage collection instead of indexing. Use this if
-inboxes are removed from the extindex, or if messages are
-purged or removed from some inboxes.
+inboxes are removed from the extindex, a newsgroup name is
+set or changed, or if messages are purged or removed from
+some inboxes.
 
 =item --reindex
 
@@ -60,10 +75,6 @@ used for in-place upgrades and bugfixes while read-only server
 processes are utilizing the index. Keep in mind this roughly
 doubles the size of the already-large Xapian database.
 
-The extindex locks will be released roughly every 10s to
-allow L<public-inbox-mda(1)> and L<public-inbox-watch(1)>
-processes to write to the extindex.
-
 =item --fast
 
 Used with C<--reindex>, it will only look for new and stale
@@ -77,9 +88,9 @@ L<public-inbox-extindex-format(5)>
 
 =head1 CONFIGURATION
 
-public-inbox-extindex does not currently write to the
-L<public-inbox-config(5)> file, configuration may be entered
-manually. The extindex name of C<all> is a special case which
+public-inbox-extindex does not write to the L<public-inbox-config(5)>
+file, it must be entered manually.
+The extindex name of C<all> is a special case which
 corresponds to indexing C<--all> inboxes. An example for
 C<--all> is as follows:
 
@@ -89,6 +100,16 @@ C<--all> is as follows:
 	coderepo = foo
 	coderepo = bar
 
+Putting an C<extindex> entry in the config allows L<PublicInbox::WWW>.
+You can have any number of C<extentry.$NAME> sections where C<$NAME>
+is something other than C<all> to display a union of several inboxes.
+
+It is strongly recommended any public inboxes indexed by this command
+have a stable C<publicinbox.$NAME.newsgroup> entry (regardless of
+the presence of an NNTP or IMAP server). Otherwise, public-inbox-extindex
+will use C<publicinbox.$NAME.inboxdir> as an internal key which can
+cause needless reindexing and require L<--gc> if inboxes are relocated.
+
 See L<public-inbox-config(5)> for more details.
 
 =head1 ENVIRONMENT
@@ -117,9 +138,17 @@ Default: none, uses C<publicinbox.indexBatchSize>
 
 =head1 UPGRADING
 
-Occasionally, public-inbox will update it's schema version and
+Occasionally, public-inbox will update its schema version and
 require a full index by running this command.
 
+=head1 LOCKING
+
+It is safe to use C<--dedupe>, C<--gc> and C<--reindex> while
+other processes are writing to covered inboxes or extindex.
+The extindex locks will be released roughly every 10s to
+allow L<public-inbox-mda(1)> and L<public-inbox-watch(1)>
+processes to write to the extindex.
+
 =head1 CONTACT
 
 Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>