path: root/lib
Date Commit message
2017-02-24 repobrowse: git tree view checks object asynchronously
... when inside public-inbox-httpd. This allows the server to handle other requests and responses while waiting on "git cat-file --batch-check".
2017-02-24 git: move async detection to runtime
We don't actually know what context we'll be called under, so detecting the mere use-ability of Danga::Socket is not sufficient.
2017-02-22 repobrowse: eliminate unused query parameters
We will try to reduce the number of query parameters as much as possible to make URLs more amenable to caching at various levels.
2017-02-22 repobrowse: fixup revision handling
Revisions passed in the URL must not be ignored. This fixes some bugs introduced in commit f6244586ba4f5a5e7575e1254be8c9bbe303fce9 ("repobrowse: switch to new URL format to avoid query strings").
2017-02-21 repobrowse: stop abbreviating commit hashes
Abbreviations can become ambiguous over time, and it seems other tools are fine with displaying unabbreviated hashes for commits. This should reduce workload for the search engines, too.
2017-02-19 repobrowse: unconditionally remove trailing slash handling
We do not need specialized trailing slashes if we break URL compatibility with cgit, here. Removing trailing (and redundant) slashes improves our cache hit rates across both server-side (varnish, squid) and client-side (browser) layers.
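
To illustrate why redundant slashes matter for caching, here is a minimal Python sketch (not the project's Perl code). The helper name `normalize_path` and the example paths are assumptions made for illustration. It maps equivalent paths to one canonical form so that caches see a single key.

```python
import re


def normalize_path(path: str) -> str:
    """Collapse repeated slashes and drop trailing ones (hypothetical helper)."""
    collapsed = re.sub(r"/{2,}", "/", path)
    # Leave the root path "/" alone; strip trailing slashes everywhere else.
    if len(collapsed) > 1:
        collapsed = collapsed.rstrip("/")
    return collapsed
```

With this normalization, "/repo//tree/" and "/repo/tree" resolve to the same cache key instead of two separate entries.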
2017-02-19 repobrowse: return git errors as text/plain, for now
For now, this avoids an HTML injection vector. We'll try to have more consistent error reporting in the future.
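
The idea can be sketched in Python using a WSGI-style response triple (WSGI being the rough Python counterpart of PSGI). The function name `git_error_response` and the default status are assumptions for illustration, not the project's API. Because the body is declared as text/plain, any markup inside the error text is meant to be shown literally rather than rendered.

```python
def git_error_response(message: str, status: str = "500 Internal Server Error"):
    """Build a (status, headers, body) triple for an error (hypothetical helper)."""
    body = message.encode("utf-8")
    headers = [
        # text/plain tells the client not to treat the body as HTML.
        ("Content-Type", "text/plain; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    return status, headers, [body]
```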
2017-02-17 repobrowse: minor style cleanups
Avoid using '=>' arrow notation for arrays and array references; it is confusing and more verbose. Additionally, combine "use constant" statements when possible.
2017-02-17 repobrowse: remove unnecessary import
We do not need to escape URIs in this file.
2017-02-17 repobrowse: rename "plain" endpoint to "raw"
This name is shorter and matches terminology in gitweb and other popular git web viewers.
2017-02-16 repobrowse: memoize git symbolic-ref resolution
The "HEAD" symbolic ref is rarely changed, so memoize it for now and avoid exposing it in URLs.
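
A minimal Python sketch of this kind of memoization follows. The class `SymbolicRefCache` and the injected `resolve` callable are hypothetical; a real resolver might run "git symbolic-ref HEAD", which is left out here so the sketch stays self-contained.

```python
from typing import Callable, Dict


class SymbolicRefCache:
    """Remember the result of resolving a symbolic ref (hypothetical class)."""

    def __init__(self, resolve: Callable[[str], str]):
        self._resolve = resolve  # assumed to perform the expensive lookup
        self._cache: Dict[str, str] = {}

    def get(self, name: str = "HEAD") -> str:
        # Only call the resolver the first time a name is requested.
        if name not in self._cache:
            self._cache[name] = self._resolve(name)
        return self._cache[name]
```

The trade-off is staleness: if the symbolic ref changes, the cached value stays in use until the cache is discarded, which fits the "rarely changed" assumption in the commit message.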
2017-02-16 repobrowse: shorten "repo_info" to "-repo"
This makes it more consistent with how we use the Inbox objects for the main code.
2017-02-16 repo: only read description if git
Other VCSes have other means of providing the description.
2017-02-16 repobrowse: switch to new URL format to avoid query strings
Query strings make endpoint caching more difficult since they're order-independent. They are also more likely to be lost or truncated inadvertently when copy+pasting, so try to avoid them for default endpoints. There are still some things which are broken, and followup commits will be needed to fix them.
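
The caching problem can be shown with a small Python sketch. The parameter names `id` and `path` and the helper `same_query` are made up for illustration. Two query strings can carry the same parameters while differing as strings, so a cache keyed on the raw URL stores them separately.

```python
from urllib.parse import parse_qsl


def same_query(a: str, b: str) -> bool:
    """True when two query strings carry the same parameters in any order."""
    pairs_a = sorted(parse_qsl(a, keep_blank_values=True))
    pairs_b = sorted(parse_qsl(b, keep_blank_values=True))
    return pairs_a == pairs_b


# Hypothetical parameter names; both strings mean the same request.
first = "id=v1.0&path=lib"
second = "path=lib&id=v1.0"
equivalent_but_distinct = same_query(first, second) and first != second
```

Moving such values into the path gives each resource one spelling, so no reordering is possible.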
2017-02-15 config: avoid circular loading dependency
We must lazily load one of them, so load Inbox later since we need to parse the config, first.
2017-02-14 repobrowse: do not unescape PATH_INFO twice
PSGI specs already require PATH_INFO to be unescaped. Followup-to: commit 364de65f8a6b5729027cb70228312a141430122f ("www: do not unescape PATH_INFO twice")
2017-02-14 Merge remote-tracking branch 'origin/master' into repobrowse
* origin/master:
  www: do not unescape PATH_INFO twice
  t/mime: quiet warnings for old versions of Email::Simple
  handle repeated References and In-Reply-To headers
2017-02-14 searchidx: switch to accounting by message bytes
Xapian memory usage is tied to the size of the indexed text, so take the raw message size into account when deciding when to flush Xapian data. More importantly, we now flush Xapian before we have it buffer beyond our maximum; and we do it unconditionally to prevent even high priority processes from OOM-ing.
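
The accounting described above might look roughly like the following Python sketch. The class `ByteBudgetIndexer`, its `flush` callback, and the exact threshold check are assumptions; the real indexer's policy is not shown in this log. The key point is that the flush happens before the pending total would pass the maximum, not after.

```python
from typing import Callable


class ByteBudgetIndexer:
    """Flush pending work based on raw message bytes (hypothetical class)."""

    def __init__(self, max_bytes: int, flush: Callable[[], None]):
        self.max_bytes = max_bytes
        self.flush = flush  # stands in for flushing the Xapian database
        self.pending = 0    # bytes indexed since the last flush

    def add(self, raw_message: bytes) -> None:
        size = len(raw_message)
        # Flush first if this message would push us past the budget.
        if self.pending and self.pending + size > self.max_bytes:
            self.flush()
            self.pending = 0
        self.pending += size
```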
2017-02-14 www: do not unescape PATH_INFO twice
PSGI specs already require PATH_INFO to be unescaped; so our tests were wrong, too.
2017-02-11 handle repeated References and In-Reply-To headers
It seems possible for git-send-email(1) to generate repeated instances of References and In-Reply-To headers, as evidenced in:
https://public-inbox.org/git/20161111124541.8216-17-vascomalmeida@sapo.pt/raw
This causes a mismatch between how our search indexer threads and how our HTML view handles threading. In the future, View.pm will use the smsg-parsed {references} field and avoid redoing Email::MIME header parsing.
We will still need to figure out a way to deal with messages with repeated Message-IDs, at some point, too.
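
As an illustration of reading every instance of a repeated header, here is a Python sketch using the standard `email` package rather than the project's Email::MIME code. The function `all_references`, the ID-matching regex, and the de-duplication policy are assumptions for illustration only.

```python
import re
from email import message_from_string
from typing import List


def all_references(raw: str) -> List[str]:
    """Collect unique Message-IDs from every References/In-Reply-To header."""
    msg = message_from_string(raw)
    ids: List[str] = []
    for name in ("References", "In-Reply-To"):
        # get_all() returns every instance, not just the first one.
        for value in msg.get_all(name, []):
            for mid in re.findall(r"<([^<>\s]+)>", value):
                if mid not in ids:
                    ids.append(mid)
    return ids
```

Reading only the first instance of each header would miss IDs carried by the later copies, which is one way an indexer and a view can end up disagreeing about a thread.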
2017-02-11 repo: lazily read description and cloneurl
This improves startup speed at the cost of CoW-friendliness for long-lived daemons (which can be fixed, later).
2017-02-10 config: move try_cat function from inbox
This allows RepoConfig to be independent of the PublicInbox::Inbox class.
2017-02-10 repo: add class for representing a code repo
This should hopefully allow us to organize our code better.
2017-02-10 repogit: add prototypes for error checking
And add a note to remove git_commit_title.
2017-02-10 repo: search index flushes for excessive active refs
For certain repos, having too many active refs will cause memory usage problems. Mitigate the Xapian problems, for now, and consider a switch to GDBM_File or similar for repos with more refs.
2017-02-10 search: remove unnecessary abstractions and functionality
This simplifies the code a bit and reduces the translation overhead for looking directly at data from tools shipped with Xapian. While we're at it, fix thread-all.t :)
2017-02-10 repo: search index no longer indexes for --contains
It's extraordinarily expensive to add these terms for each and every commit.
2017-02-09 repo: increase search index flush granularity
We need to flush Xapian more frequently to account for gigantic commits which introduce lots of text, so do it when accounting for each line processed, and not for each commit processed.
2017-02-09 repobrowse: shorten internal names
We'll still be keeping "repobrowse" for the public API for use with .psgi files, but shortening the name means less typing and we may have command-line tools, too.
2017-02-09 Merge remote-tracking branch 'origin/master' into repobrowse
* origin/master:
  config: do not slurp lines into memory
  TODO: several updates
  search: schema version bump for empty References/In-Reply-To
  Revert "searchidx: reindex clobbers old thread IDs"
  searchidx: reindex clobbers old thread IDs
  searchidx: deal with empty In-Reply-To and References headers
  searchview: increase limit for displaying search results
  searchview: clarify numeric summary at bottom
  add filter for Subject: tags
  watchmaildir: allow arguments for filters
  watchmaildir: limit live importer processes
  learn: implement "rm" only functionality
  mime: avoid SUPER usage in Email::MIME subclass
  inbox: reinstate periodic cleanup of Xapian and SQLite objects
  introduce PublicInbox::MIME wrapper class
2017-02-09 repobrowse: avoid slurping lines
"foreach (<$fh>)" in Perl requests lines in array context, so use "while" instead for lazy reading. This follows ba4c50c20b95679580beba1ef290a4281d5285b7 in master ("config: do not slurp lines into memory").
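
The same distinction exists in other languages. The Python sketch below is only an analog of the Perl behavior described above, not the project's code, and the function names are made up. `readlines()` builds the full list of lines up front, much like reading a filehandle in list context, while iterating the file object yields one line at a time.

```python
import io


def count_lines_slurped(fh) -> int:
    """Reads every line into a list first, so memory grows with file size."""
    lines = fh.readlines()
    return len(lines)


def count_lines_lazily(fh) -> int:
    """Iterates the handle directly, holding only one line at a time."""
    count = 0
    for _ in fh:
        count += 1
    return count


# Small in-memory sample; both approaches agree on the result.
sample = "a=1\nb=2\nc=3\n"
slurped = count_lines_slurped(io.StringIO(sample))
lazy = count_lines_lazily(io.StringIO(sample))
```

Both functions return the same count; they differ only in peak memory use on large inputs.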
2017-02-09 config: do not slurp lines into memory
There's no need to hold everything in memory, here, since apparently "foreach" will read everything at once in array context (for some reason, I thought Perl5 was smart enough to avoid creating a temporary array, here...)
2017-02-08 repobrowse: start wiring up git search
Much more work on this will be needed, but at least explicit flush points prevent OOMs on my system.
2017-02-07 search: hoist out git directory search index helper
We will be reusing this for indexing normal (code) repositories using git and Xapian, too.
2017-02-06 search: schema version bump for empty References/In-Reply-To
We cannot distinguish between legitimate ghosts and mis-threaded messages before commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0 ("searchidx: deal with empty In-Reply-To and References headers") so we must rebuild the index in parallel to fix it.
2017-02-06 Revert "searchidx: reindex clobbers old thread IDs"
Oops, that's broken, too. I guess the only way to reindex after fixing the thread detection is to start from scratch. This reverts commit 5d91adedf5f33ef1cb87df2a86306ddf370b4f8d.
2017-02-06 searchidx: reindex clobbers old thread IDs
We cannot always reuse thread IDs since our threading logic may change as bugs are fixed.
2017-02-06 searchidx: deal with empty In-Reply-To and References headers
In some messages, these headers exist, but have empty values. Do not let empty values throw off our search indexer when it ties threads together, as it can group nonsensical threads under a Message-Id of "" (empty string).
See <https://public-inbox.org/git/11340844841342-git-send-email-mailing-lists.git@rawuncut.elitemail.org/raw> for an example of such a message.
Thanks-to: Johannes Schindelin <Johannes.Schindelin@gmx.de> <https://public-inbox.org/git/alpine.DEB.2.20.1702041206130.3496@virtualbox/>
2017-02-06 searchview: increase limit for displaying search results
We are in no danger of excessive buffering or OOM-ing: the main page for every inbox already loads 200 results, and thread page views even load 1000! Increase this to 200 for now.
2017-02-06 searchview: clarify numeric summary at bottom
Xapian can only give estimated results when a result limit is given to it, so make clear it is an estimate to avoid showing nonsensical ranges when no results are returned.
2017-02-05 repobrowse: git tag listing is now async
I'm unsure if this is even a good idea to support, but we have it, for now.
2017-01-26 repobrowse/git/atom: remove unused subroutine
We never ended up using it.
2017-01-26 repobrowse: simplify command generation for git commands
This shortens the code quite a bit at a negligible performance cost, and the diffstat agrees.
2017-01-26 add filter for Subject: tags
Some mailing lists add annoying tags to the Subject line, which discourages readers from doing proper mail organization on the client side. They also waste precious screen space and attention span. Remove them from our archives to reduce clutter.
2017-01-26 watchmaildir: allow arguments for filters
We'll want to allow some degree of configuration for various mailing lists.
2017-01-22 repobrowse: git summary view uses psgi_qx
This reduces one synchronous dependency from the hot path, and psgi_return will be used in the future.
2017-01-21 repobrowse: preserve newlines in Atom feed
Commit messages are assumed to be displayed in a terminal with a fixed width font, so we must preserve newlines and all whitespace as-is so ASCII art may be displayed properly.
2017-01-21 repobrowse: simplify git log parsing implementation
Based on what was done for the Atom feed, this will allow us to simplify state management through metaprogramming and avoid placeholder characters ('D' for decoration) for empty fields.
2017-01-21 repobrowse: fix full URL generation in Atom feed
We must not drop the leading slash in the URI. This regression was introduced when we dropped Plack::Request dependency.
2017-01-21 repobrowse: avoid extra hash assignments for Atom feed
This should make the code somewhat easier-to-follow.