It's extraordinarily expensive to add these terms for
each and every commit.
|
|
We need to flush Xapian more frequently to account for
gigantic commits which introduce lots of text, so check
whether to flush as each line is processed, and not
as each commit is processed.
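Here is a minimal sketch of the idea in Python rather than the project's Perl. It counts lines as they are indexed and flushes once a threshold is crossed, so one huge commit cannot grow the pending batch without bound. `BatchIndexer`, `flush_lines`, and the `flush` callback are hypothetical names, not public-inbox or Xapian APIs.

```python
class BatchIndexer:
    """Hypothetical indexer that flushes after N lines, not after N commits."""

    def __init__(self, flush, flush_lines=10000):
        self.flush = flush              # callback committing pending work (e.g. to Xapian)
        self.flush_lines = flush_lines  # threshold measured in lines, not commits
        self.pending = 0

    def index_line(self, line):
        # ... add terms for `line` to the pending document here ...
        self.pending += 1
        if self.pending >= self.flush_lines:
            self.flush()
            self.pending = 0

    def index_commit(self, lines):
        # A gigantic commit now triggers several small flushes instead of
        # accumulating everything until the commit is done.
        for line in lines:
            self.index_line(line)
```

With `flush_lines=3`, a 7-line commit flushes twice and leaves one line pending for the next flush.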
|
|
We'll still be keeping "repobrowse" for the public API
for use with .psgi files, but shortening the name means
less typing and we may have command-line tools, too.
|
|
* origin/master:
config: do not slurp lines into memory
TODO: several updates
search: schema version bump for empty References/In-Reply-To
Revert "searchidx: reindex clobbers old thread IDs"
searchidx: reindex clobbers old thread IDs
searchidx: deal with empty In-Reply-To and References headers
searchview: increase limit for displaying search results
searchview: clarify numeric summary at bottom
add filter for Subject: tags
watchmaildir: allow arguments for filters
watchmaildir: limit live importer processes
learn: implement "rm" only functionality
mime: avoid SUPER usage in Email::MIME subclass
inbox: reinstate periodic cleanup of Xapian and SQLite objects
introduce PublicInbox::MIME wrapper class
|
|
"foreach (<$fh>)" in Perl reads lines in list context,
so use "while" instead for lazy reading.
This follows ba4c50c20b95679580beba1ef290a4281d5285b7
in master ("config: do not slurp lines into memory").
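Python has the same pitfall under a different spelling. `fh.readlines()` builds the whole list up front, much like Perl's list context. Iterating over the file object yields one line at a time, like Perl's `while (<$fh>)`. The sketch below parses lines lazily. `parse_config` and its `key=value` input format are illustrative assumptions, not public-inbox's actual config parser.

```python
import io

def parse_config(fh):
    """Yield (key, value) pairs one line at a time instead of slurping."""
    for line in fh:  # lazy: file objects are iterators, unlike fh.readlines()
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            yield key.strip(), value.strip()

sample = io.StringIO("a=1\n# comment\nb = 2\n")
config = dict(parse_config(sample))
```

Because `parse_config` is a generator, memory use stays bounded by the longest line rather than the file size.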
|
|
There's no need to hold everything in memory here,
since "foreach" apparently reads everything at
once in list context.
(For some reason, I thought Perl 5 was smart enough
to avoid creating a temporary array here...)
|
|
Much more work on this will be needed, but at least explicit
flush points prevent OOMs on my system.
|
|
Always plenty to do while working on this...
|
|
We will be reusing this for indexing normal (code) repositories
using git and Xapian, too.
|
|
We cannot distinguish between legitimate ghosts and mis-threaded
messages before commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0
("searchidx: deal with empty In-Reply-To and References headers")
so we must rebuild the index in parallel to fix it.
|
|
Oops, that's broken, too. I guess the only way to reindex
after fixing the thread detection is to start from scratch.
This reverts commit 5d91adedf5f33ef1cb87df2a86306ddf370b4f8d.
|
|
We cannot always reuse thread IDs since our threading
logic may change as bugs are fixed.
|
|
In some messages, these headers exist but have empty values.
Do not let empty values throw off the search indexer when it ties
threads together, as that can group unrelated messages into a
nonsensical thread under a Message-Id of "" (empty string).
See
<https://public-inbox.org/git/11340844841342-git-send-email-mailing-lists.git@rawuncut.elitemail.org/raw>
for an example of such a message.
Thanks-to: Johannes Schindelin <Johannes.Schindelin@gmx.de>
<https://public-inbox.org/git/alpine.DEB.2.20.1702041206130.3496@virtualbox/>
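Here is a sketch of the defensive parsing in Python. It extracts `<...>` Message-IDs from the header values and drops anything empty, so a blank References or In-Reply-To can never produce a "" parent. `referenced_ids` is a hypothetical helper, and the regex is a simplification of RFC 5322 msg-id syntax.

```python
import re

# Simplified: a non-empty id with no whitespace or angle brackets inside.
MSGID_RE = re.compile(r"<([^<>\s]+)>")

def referenced_ids(references, in_reply_to):
    """Return parent Message-IDs in order, skipping empty or missing headers."""
    ids = []
    for value in (references, in_reply_to):
        if not value or not value.strip():
            continue  # header missing or present-but-empty: contributes nothing
        for mid in MSGID_RE.findall(value):
            if mid not in ids:
                ids.append(mid)
    return ids
```

Even a literal `<>` yields nothing, because the pattern requires at least one character between the brackets.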
|
|
We are in no danger of excessive buffering or OOM-ing:
the main page for every inbox already loads 200 results,
and thread page views even load 1000! Increase this to
200 for now.
|
|
Xapian can only give estimated results when a result limit is
given to it, so make it clear the count is an estimate, and avoid
showing nonsensical ranges when no results are returned.
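Here is a sketch of the wording change in Python. It treats the total from Xapian as an estimate (Xapian's `MSet` exposes one via `get_matches_estimated()`) and handles zero results separately, so no "1-0" style range is printed. `summary_line` and its exact text are hypothetical, not the real searchview output.

```python
def summary_line(estimated, offset, shown):
    """Summarize one page of results; `estimated` comes from Xapian."""
    if shown == 0:
        # Avoid printing a range like "1-0" when nothing matched.
        return "No results found"
    first, last = offset + 1, offset + shown
    # Defensive: never report a total smaller than what we displayed.
    total = max(estimated, last)
    return f"Results {first}-{last} of ~{total}"
```

The "~" prefix is one way to signal an approximate total. Spelling out "about" would work just as well.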
|
|
I'm unsure if this is even a good idea to support,
but we have it, for now.
|
|
We never ended up using it.
|
|
This shortens the code quite a bit at a negligible performance cost,
and the diffstat agrees.
|
|
Some mailing lists add annoying tags to the Subject line, which
discourage readers from doing proper mail organization on the
client side. They also waste precious screen space and
attention span.
Remove them from our archives to reduce clutter.
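Here is a minimal sketch of such a filter in Python. It assumes the tag is configured per list (here the hypothetical `"[foo-list]"`) and may follow any number of `Re:` prefixes. `strip_subject_tag` is illustrative, not public-inbox's filter API.

```python
import re

def strip_subject_tag(subject, tag):
    """Remove one configured list tag such as "[foo-list]" from a Subject."""
    # Keep any leading "Re:" prefixes (group 1), drop the literal tag and the
    # spaces after it; matching is case-insensitive.
    pattern = re.compile(r"^((?:re:\s*)*)" + re.escape(tag) + r"\s*", re.IGNORECASE)
    return pattern.sub(r"\1", subject, count=1)
```

`re.escape` matters here because list tags contain `[` and `]`, which would otherwise be read as a character class.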
|
|
We'll want to allow some degree of configuration for
various mailing lists.
|
|
This reduces one synchronous dependency from the hot path,
and psgi_return will be used in the future.
|
|
I've hit random test failures on this, so attempt to improve
diagnostics and documentation for this test.
|
|
Commit messages are assumed to be displayed in a terminal
with a fixed width font, so we must preserve newlines and
all whitespace as-is so ASCII art may be displayed properly.
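In HTML, this amounts to escaping the text and wrapping it in a `<pre>` element, which keeps newlines and runs of spaces intact in a monospace font. Below is a Python sketch using the standard `html.escape`. `commit_message_html` is a hypothetical name.

```python
import html

def commit_message_html(message):
    """Render a commit message verbatim: escape markup, keep all whitespace."""
    # <pre> preserves newlines and repeated spaces; escaping stops "<" in
    # ASCII art or email addresses from being parsed as tags.
    return "<pre>" + html.escape(message, quote=False) + "</pre>"
```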
|
|
Based on what was done for the Atom feed, this will allow us to
simplify state management through metaprogramming and avoid
placeholder characters ('D' for decoration) for empty fields.
|
|
We must not drop the leading slash in the URI. This
regression was introduced when we dropped the Plack::Request
dependency.
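Without a request helper class, the path has to be rebuilt from the environment by hand. Below is a Python sketch that uses the CGI-style keys (`SCRIPT_NAME`, `PATH_INFO`, `QUERY_STRING`) shared by PSGI and WSGI, ignoring percent-encoding. `request_uri` is a hypothetical helper, not the code this commit touched. The point is that `PATH_INFO`, when non-empty, already begins with `/`, so it must be appended as-is rather than trimmed.

```python
def request_uri(env):
    """Rebuild the path and query from CGI-style SCRIPT_NAME/PATH_INFO keys."""
    # PATH_INFO keeps its own leading "/"; concatenate without trimming it.
    path = env.get("SCRIPT_NAME", "") + env.get("PATH_INFO", "")
    if not path:
        path = "/"  # request for the application root
    query = env.get("QUERY_STRING", "")
    return path + ("?" + query if query else "")
```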
|
|
This should make the code somewhat easier to follow.
|
|
This allows us to wait on "git log" output in a non-blocking manner
while being able to throttle on backpressure from slow clients
when used with pi-httpd.
|
|
This allows pi-httpd to service other I/O while we wait on "git
symbolic-ref" to run. And psgi_return will be used in the next
commit...
|
|
Hopefully this makes the code easier to follow for random
readers. This requires a small amount of modification to
our one caller, but this is a new, unstable API (as is
nearly all of our code).
|
|
We don't want to be triggering OOM or swapping on weaker
systems when we have dozens of inboxes as potential targets.
|
|
Do not consider this interface stable, but I just needed a
way to remove mis-imported multipart messages so
public-inbox-watch could pick them up again from my Maildir.
|
|
We must call Email::Simple methods directly in our monkey patch
for Email::MIME to call the intended method. Using SUPER in our
subclass would instead hit a different, unintended method in
Email::MIME.
Reported-by: Junio C Hamano <gitster@pobox.com>
<xmqq4m0wb43w.fsf@gitster.mtv.corp.google.com>
|
|
Remove an outdated comment while we're at it, too.
|
|
We can more effectively nuke circular references by clearing
the entire PSGI $env, not just particular keys, when
there are self-referential fields such as "qspawn.response"
in our environment.
|
|
This new asynchronous API, psgi_qx, will allow us to take
advantage of non-blocking I/O even for small commands,
as those may still need to wait for slow operations.
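Here is the general shape of such an API, sketched with Python's `asyncio` rather than pi-httpd's event loop. The sketch starts the command and awaits its captured stdout, letting the loop service other work in the meantime. `qx` is a hypothetical name echoing Perl's `qx//`. It is not psgi_qx's real signature.

```python
import asyncio
import sys

async def qx(*cmd):
    """Run cmd without blocking the event loop; return its stdout as bytes."""
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE
    )
    out, _ = await proc.communicate()  # other tasks run while we wait
    return out

async def main():
    # Two small commands run concurrently instead of back to back.
    return await asyncio.gather(
        qx(sys.executable, "-c", "print('a')"),
        qx(sys.executable, "-c", "print('b')"),
    )

results = asyncio.run(main())
```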
|
|
We'll probably use this in a lot of places...
|
|
For now, qspawn provides resource management for dealing with
expensive "git ls-tree" processes.
|
|
We may eventually handle tree parsing ourselves (since we
already use git cat-file), but for now we can rely on ls-tree
to give good output and qspawn to manage resource allocation.
|
|
We don't need these legacy routines anymore; we use the
newer stream-friendly _sed interface instead.
|
|
If an HTTP client disconnects while we're piping the output of a
process to them, break the pipe of the process to reclaim
resources as soon as possible.
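Here is a Python sketch of the idea. When the consumer goes away, close our end of the child's stdout. The child's next write then fails (SIGPIPE, or EPIPE if the signal is ignored), and the child exits instead of producing output nobody will read. The child below is a stand-in writer, and `client_disconnected` is a hypothetical flag.

```python
import subprocess
import sys

# Stand-in for an expensive producer such as "git log": writes forever.
child = subprocess.Popen(
    [sys.executable, "-c",
     "import sys\nwhile True:\n    sys.stdout.write('x' * 4096)\n    sys.stdout.flush()"],
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL,  # hide the child's BrokenPipeError traceback
)

chunk = child.stdout.read(4096)  # relay a little output to the client...
client_disconnected = True       # hypothetical: the HTTP client went away

if client_disconnected:
    child.stdout.close()         # break the pipe; the child's next write fails
    status = child.wait()        # reap it promptly instead of letting it run
```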
|
|
expath is always defined, even if only as an empty string,
so simplify the conditional for checking it.
|
|
This wrapper class actually does both reading and
writing, and a shorter name is nicer.
|
|
D::S creates a reference for this anyway, so avoid
the extra work by doing it ourselves.
|
|
Metaprogramming can be difficult to read after several
months, so leave comments in place describing the results
of common usage.
|
|
This will prevent too many processes from being spawned at once
while also allowing us to respond to backpressure from slow
clients.
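Here is a minimal sketch of such a limiter in Python. At most `max_running` jobs run at once, and the rest wait in a FIFO queue, each starting as a running job finishes. `Limiter`, `start`, and `done` are hypothetical names. The commit also mentions responding to backpressure from slow clients, which this sketch omits.

```python
from collections import deque

class Limiter:
    """Cap concurrent jobs; queue the rest in arrival order."""

    def __init__(self, max_running):
        self.max_running = max_running
        self.running = 0
        self.pending = deque()

    def start(self, job):
        # job is a zero-argument callable that launches the work (e.g. a spawn).
        if self.running < self.max_running:
            self.running += 1
            job()
        else:
            self.pending.append(job)

    def done(self):
        # Called when a running job finishes; hand its slot to the next waiter.
        self.running -= 1
        if self.pending:
            self.running += 1
            self.pending.popleft()()
```

Queuing rather than refusing keeps every request eventually served while bounding the number of processes alive at once.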
|
|
We may need to do this even more aggressively, since the
Xapian database does not always give the latest results.
This time, we'll do it without relying on weak references,
and instead check refcounts.
|
|
This is a potentially expensive operation, so we may want to
give it its own limiter channel.
|
|
This prevents "git show" processes from monopolizing
the system and allows us to better handle backpressure
from gigantic commits.
|