Date Commit message
2017-02-10 repo: search index no longer indexes for --contains
It's extraordinarily expensive to add these terms for each and every commit.
2017-02-09 repo: increase search index flush granularity
We need to flush Xapian more frequently to account for gigantic commits which introduce lots of text, so do it when accounting for each line processed, and not for each commit processed.
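The per-line accounting described above can be sketched roughly as follows. This is a toy illustration in Python, not the project's Perl code: `BatchIndexer`, `flush_every`, and the in-memory `pending` buffer are hypothetical names standing in for the Xapian-backed indexer.

```python
class BatchIndexer:
    """Toy indexer that flushes based on lines processed, not commits."""

    def __init__(self, flush_every=3):
        self.flush_every = flush_every  # hypothetical threshold, in lines
        self.pending = []               # lines buffered since the last flush
        self.flushes = 0                # number of flushes performed so far

    def flush(self):
        # A real indexer would commit its buffered terms to the database here.
        self.pending.clear()
        self.flushes += 1

    def add_commit(self, lines):
        for line in lines:
            self.pending.append(line)
            # Account per line, so a single gigantic commit cannot grow
            # the buffer without bound before the next flush.
            if len(self.pending) >= self.flush_every:
                self.flush()


idx = BatchIndexer(flush_every=3)
idx.add_commit(["line %d" % i for i in range(10)])  # one large commit
```

With per-commit accounting, the ten-line commit above would have been buffered in full before any flush could happen.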
2017-02-09 repobrowse: shorten internal names
We'll still be keeping "repobrowse" for the public API for use with .psgi files, but shortening the name means less typing and we may have command-line tools, too.
2017-02-09 Merge remote-tracking branch 'origin/master' into repobrowse
* origin/master:
  config: do not slurp lines into memory
  TODO: several updates
  search: schema version bump for empty References/In-Reply-To
  Revert "searchidx: reindex clobbers old thread IDs"
  searchidx: reindex clobbers old thread IDs
  searchidx: deal with empty In-Reply-To and References headers
  searchview: increase limit for displaying search results
  searchview: clarify numeric summary at bottom
  add filter for Subject: tags
  watchmaildir: allow arguments for filters
  watchmaildir: limit live importer processes
  learn: implement "rm" only functionality
  mime: avoid SUPER usage in Email::MIME subclass
  inbox: reinstate periodic cleanup of Xapian and SQLite objects
  introduce PublicInbox::MIME wrapper class
2017-02-09 repobrowse: avoid slurping lines
"foreach (<$fh>)" in Perl requests lines in array context, so use "while" instead for lazy reading. This follows ba4c50c20b95679580beba1ef290a4281d5285b7 in master ("config: do not slurp lines into memory")
2017-02-09 config: do not slurp lines into memory
There's no need to hold everything in memory, here, since apparently "foreach" will read everything at once in array context (for some reason, I thought Perl5 was smart enough to avoid creating a temporary array, here...)
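The difference between slurping and lazy reading can be illustrated with a rough Python analogue. This is not the project's code: `count_keys_slurp` and `count_keys_lazy` are hypothetical helpers, and the comparison to Perl's "foreach" and "while" is only an approximation of the behavior described above.

```python
import io


def count_keys_slurp(fh):
    # readlines() builds a list of every line first, much like
    # Perl's "foreach (<$fh>)" reading the whole file up front.
    return sum(1 for line in fh.readlines() if "=" in line)


def count_keys_lazy(fh):
    # Iterating the file object yields one line at a time, much like
    # Perl's "while (<$fh>)".
    return sum(1 for line in fh if "=" in line)


sample = "a=1\n# comment\nb=2\n"
slurped = count_keys_slurp(io.StringIO(sample))
lazy = count_keys_lazy(io.StringIO(sample))
```

Both functions return the same count; only the peak memory use differs, which matters for large files.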
2017-02-08 repobrowse: start wiring up git search
Much more work on this will be needed, but at least explicit flush points prevent OOMs on my system.
2017-02-07 TODO: several updates
Always plenty to do while working on this...
2017-02-07 search: hoist out git directory search index helper
We will be reusing this for indexing normal (code) repositories using git and Xapian, too.
2017-02-06 search: schema version bump for empty References/In-Reply-To
We cannot distinguish between legitimate ghosts and mis-threaded messages before commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0 ("searchidx: deal with empty In-Reply-To and References headers") so we must rebuild the index in parallel to fix it.
2017-02-06 Revert "searchidx: reindex clobbers old thread IDs"
Oops, that's broken, too. I guess the only way to reindex after fixing the thread detection is to start from scratch. This reverts commit 5d91adedf5f33ef1cb87df2a86306ddf370b4f8d.
2017-02-06 searchidx: reindex clobbers old thread IDs
We cannot always reuse thread IDs since our threading logic may change as bugs are fixed.
2017-02-06 searchidx: deal with empty In-Reply-To and References headers
In some messages, these headers exist but have empty values. Do not let empty values throw off our search indexer when it ties threads together, as they can cause nonsensical threads to be grouped under a Message-Id of "" (empty string).
See <https://public-inbox.org/git/11340844841342-git-send-email-mailing-lists.git@rawuncut.elitemail.org/raw> for an example of such a message.
Thanks-to: Johannes Schindelin <Johannes.Schindelin@gmx.de>
<https://public-inbox.org/git/alpine.DEB.2.20.1702041206130.3496@virtualbox/>
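One way to ignore empty header values when collecting thread references is sketched below. It is a simplified Python illustration under stated assumptions, not the indexer's actual logic: `thread_refs` is a hypothetical helper, and headers are assumed to arrive as a plain dictionary of strings.

```python
import re


def thread_refs(headers):
    """Return Message-Ids referenced by a message, ignoring empty values."""
    refs = []
    for name in ("References", "In-Reply-To"):
        value = headers.get(name, "")
        # Message-Ids are written as <id>; a blank header value or an
        # empty "<>" yields no usable ID and is skipped.
        for mid in re.findall(r"<([^>]*)>", value):
            if mid and mid not in refs:
                refs.append(mid)
    return refs


empty = thread_refs({"In-Reply-To": "", "References": "<>"})
normal = thread_refs({"References": "<a@example> <b@example>",
                      "In-Reply-To": "<b@example>"})
```

A message whose headers are present but empty then contributes no references at all, rather than a reference to the empty string.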
2017-02-06 searchview: increase limit for displaying search results
We are in no danger of excessive buffering or OOM-ing, the main page for every inbox already loads 200 results; and thread page views even load 1000! Increase this to 200 for now.
2017-02-06 searchview: clarify numeric summary at bottom
Xapian can only give estimated results when a result limit is given to it, so make clear it is an estimate to avoid showing nonsensical ranges when no results are returned.
2017-02-05 repobrowse: git tag listing is now async
I'm unsure if this is even a good idea to support, but we have it, for now.
2017-01-26 repobrowse/git/atom: remove unused subroutine
We never ended up using it.
2017-01-26 repobrowse: simplify command generation for git commands
This shortens the code quite a bit at a negligible performance cost, and the diffstat agrees.
2017-01-26 add filter for Subject: tags
Some mailing lists add annoying tags into the Subject line which discourages readers from doing proper mail organization on the client side. They also waste precious screen space and attention span. Remove them from our archives to reduce clutter.
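A tag-stripping filter of this kind might look roughly like the sketch below. It is written in Python for illustration only and is not the filter shipped in the repository: `strip_subject_tags` and its `tags` argument are hypothetical, and the tag is assumed to appear in square brackets.

```python
import re


def strip_subject_tags(subject, tags):
    """Remove the given bracketed tags, e.g. "[foo-list]", from a Subject."""
    for tag in tags:
        # Match the literal tag plus any whitespace that follows it.
        subject = re.sub(r"\[" + re.escape(tag) + r"\]\s*", "", subject)
    return subject.strip()


cleaned = strip_subject_tags("Re: [foo-list] [PATCH] fix typo", ["foo-list"])
```

Only the tags named by the caller are removed, so meaningful markers such as "[PATCH]" survive.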
2017-01-26 watchmaildir: allow arguments for filters
We'll want to allow some degree of configuration for various mailing lists.
2017-01-22 repobrowse: git summary view uses psgi_qx
This reduces one synchronous dependency from the hot path, and psgi_return will be used in the future.
2017-01-22 t/httpd-unix: better diagnostics and comments for test
I've hit random test failures on this, so attempt to improve diagnostics and improve documentation for this test.
2017-01-21 repobrowse: preserve newlines in Atom feed
Commit messages are assumed to be displayed in a terminal with a fixed width font, so we must preserve newlines and all whitespace as-is so ASCII art may be displayed properly.
2017-01-21 repobrowse: simplify git log parsing implementation
Based on what was done for the Atom feed, this will allow us to simplify state management through metaprogramming and avoid placeholder characters ('D' for decoration) for empty fields.
2017-01-21 repobrowse: fix full URL generation in Atom feed
We must not drop the leading slash in the URI. This regression was introduced when we dropped Plack::Request dependency.
2017-01-21 repobrowse: avoid extra hash assignments for Atom feed
This should make the code somewhat easier-to-follow.
2017-01-21 repobrowse: git Atom feed uses Qspawn->psgi_return
This allows us to wait on "git log" output in a non-blocking manner while being able to throttle on backpressure from slow clients when used with pi-httpd.
2017-01-21 repobrowse: git Atom feed uses Qspawn->psgi_qx
This allows pi-httpd to service other I/O while we wait on "git symbolic-ref" to run. And psgi_return will be used in the next commit...
2017-01-21 qspawn: better annotate where $qx_cb is called
Hopefully this makes the code easier-to-follow for random readers. This requires a small amount of modification to our one caller, but this is a new, unstable API (as is nearly all of our code).
2017-01-19 watchmaildir: limit live importer processes
We don't want to be triggering OOM or swapping on weaker systems when we have dozens of inboxes as potential targets.
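The idea of capping concurrent importers can be sketched as follows. This is a Python illustration under loud assumptions: worker threads stand in for the importer processes, and `MAX_IMPORTERS` and `import_inbox` are hypothetical names rather than anything taken from the repository.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_IMPORTERS = 2        # hypothetical cap on simultaneous importers
lock = threading.Lock()
live = 0                 # importers currently running
peak = 0                 # highest number seen running at once


def import_inbox(name):
    global live, peak
    with lock:
        live += 1
        peak = max(peak, live)
    try:
        return "imported " + name   # stand-in for the real import work
    finally:
        with lock:
            live -= 1


inboxes = ["inbox%d" % i for i in range(6)]
# The pool never runs more than MAX_IMPORTERS jobs at a time, no matter
# how many inboxes are queued.
with ThreadPoolExecutor(max_workers=MAX_IMPORTERS) as pool:
    results = list(pool.map(import_inbox, inboxes))
```

Queued inboxes simply wait their turn, so memory use stays bounded by the cap rather than by the number of targets.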
2017-01-19 learn: implement "rm" only functionality
Do not consider this interface stable, but I just needed a way to remove mis-imported multipart messages so public-inbox-watch could pick them up again from my Maildir.
2017-01-18 mime: avoid SUPER usage in Email::MIME subclass
We must call Email::Simple methods directly in our monkey patch for Email::MIME to call the intended method. Using SUPER in our subclass would instead hit a different, unintended method in Email::MIME.
Reported-by: Junio C Hamano <gitster@pobox.com>
<xmqq4m0wb43w.fsf@gitster.mtv.corp.google.com>
2017-01-18 repobrowse: expath is always defined
Remove an outdated comment while we're at it, too.
2017-01-18 http: cast a wider net to prevent circular references
We can more effectively nuke circular references by clearing the entire PSGI $env, not just particular keys, when there are self-referential fields such as "qspawn.response" in our environment.
2017-01-18 repobrowse: git snapshot waits for all commands asynchronously
This new asynchronous API, psgi_qx, will allow us to take advantage of non-blocking I/O even for small commands, as those may still need to wait for slow operations.
2017-01-17 qspawn: better description
We'll probably use this in a lot of places...
2017-01-17 repobrowse: verbose git tree display uses qspawn for ls-tree
For now, qspawn provides resource management for dealing with expensive "git ls-tree" processes.
2017-01-15 repobrowse: use qspawn for plain tree views
We may eventually handle tree parsing ourselves (since we already git cat-file), but for now we can rely on ls-tree to give good output and qspawn to manage resource allocation.
2017-01-15 repobrowse: git: drop unused diff parsing routines
We don't need these legacy routines anymore and use the newer stream-friendly _sed interface.
2017-01-13 httpd/async: stop running command if client disconnects
If an HTTP client disconnects while we're piping the output of a process to them, break the pipe of the process to reclaim resources as soon as possible.
2017-01-13 repobrowse: simplify conditional for cat-file input
expath is always defined, even to an empty string, so simplify the conditional for checking it.
2017-01-13 rename "GitAsyncRd" to "GitAsync"
This wrapper class actually does both reading and writing, and a shorter name is nicer.
2017-01-13 gitasyncrd: pass a reference to Danga::Socket::write
D::S creates a reference for this, anyways, so avoid the extra work by doing it ourselves.
2017-01-13 repobrowse: comment describing Git wrapper creation
Metaprogramming can be difficult-to-read after several months, so leave comments in place to describe common usage results of.
2017-01-13 repobrowse: port git log view to qspawn streaming interface
This will prevent too many processes from being spawned at once while also allowing us to respond to backpressure from slow clients.
2017-01-11 inbox: reinstate periodic cleanup of Xapian and SQLite objects
We may need to do this even more aggressively, since the Xapian database does not always give the latest results. This time, we'll do it without relying on weak references, and instead check refcounts.
2017-01-11 repobrowse: make git diff output use qspawn
This is a potentially expensive operation, so we may want to give it its own limiter channel.
2017-01-11 diff: note the dangers of gigantic anchors hash
2017-01-11 async: improve and fix out-of-date comments
2017-01-11 repobrowse: qspawn + streaming for git commit display
This prevents "git show" processes from monopolizing the system and allows us to better handle backpressure from gigantic commits.