Date | Commit message |
|
... when inside public-inbox-httpd. This will allow
the server to handle other requests/responses while
waiting on "git cat-file --batch-check"
|
|
We don't actually know what context we'll be called under,
so detecting the mere use-ability of Danga::Socket is not
sufficient.
|
|
We will try to reduce the amount of query parameters as
much as possible to make URLs more amenable to caching
at various levels.
|
|
Revisions passed in the URL must not be ignored.
This fixes some bugs introduced in commit
f6244586ba4f5a5e7575e1254be8c9bbe303fce9
("repobrowse: switch to new URL format to avoid query strings")
|
|
Abbreviations can become ambiguous over time, and it seems other
tools are fine with displaying unabbreviated hashes for commits.
This should reduce workload for the search engines, too.
|
|
We do not need specialized trailing slashes if we break URL
compatibility with cgit here. Removing trailing (and redundant)
slashes improves our hit rates across both server-side
(varnish, squid) and client-side (browser) caching layers.
|
|
For now, this avoids an HTML injection vector. We'll try to
have more consistent error reporting in the future.
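
A minimal sketch of the escaping idea, in Python rather than the
project's Perl. The render_error name and the <pre> wrapper are
assumptions made for illustration, not the actual repobrowse code:

```python
import html


def render_error(message: str) -> str:
    """Embed an error message in HTML (hypothetical helper).

    Anything derived from the request (path, revision, query) must be
    escaped before it is placed into markup.
    """
    # html.escape converts &, <, > and quotes to character references
    return "<pre>" + html.escape(message) + "</pre>"


print(render_error("no such ref: <script>alert(1)</script>"))
```

Escaping at the point of output keeps the error text readable while
preventing the browser from interpreting it as markup.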
|
|
Avoid using '=>' arrow notation for arrays and array references;
it is confusing and more verbose. Additionally, combine
"use constant" statements when possible.
|
|
We do not need to escape URIs in this file.
|
|
This name is shorter and matches terminology in gitweb and
other popular git web viewers.
|
|
The "HEAD" symbolic ref is rarely changed, so
memoize it for now and avoid exposing it in URLs.
|
|
This makes it more consistent with how we use the Inbox
objects for the main code.
|
|
Other VCSes have other means of providing the description.
|
|
Query strings make endpoint caching more difficult since
they're order-independent. They are also more likely to be lost
or truncated inadvertently when copy+pasting, so try to
avoid them for default endpoints.
There are still some things which are broken, and followup
commits will be needed to fix them.
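
To illustrate the caching problem, here is a small Python sketch. The
two URLs and the same_resource helper are made up for this example; a
cache keyed on the raw URL string treats them as different entries even
though they name the same resource:

```python
from urllib.parse import parse_qsl, urlsplit


def same_resource(a: str, b: str) -> bool:
    """Compare two URLs while ignoring query parameter order."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.path == pb.path and
            sorted(parse_qsl(pa.query)) == sorted(parse_qsl(pb.query)))


# Hypothetical URLs, only the parameter order differs
a = "/repo.git/tree?id=abc123&path=lib"
b = "/repo.git/tree?path=lib&id=abc123"

print(a == b)               # a naive string-keyed cache sees two entries
print(same_resource(a, b))  # yet both refer to the same content
```

Putting the same information into the path gives each resource one
spelling, so caches do not need any normalization step.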
|
|
We must lazily load one of them, so load Inbox later
since we need to parse the config first.
|
|
PSGI specs already require PATH_INFO to be unescaped.
Followup-to: commit 364de65f8a6b5729027cb70228312a141430122f
("www: do not unescape PATH_INFO twice")
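
A short Python illustration of why a second unescape is wrong. The path
is a made-up example; urllib.parse.unquote stands in for whatever
unescape routine the application would call:

```python
from urllib.parse import unquote

# Path as sent by the client: a literal "%41" is encoded as "%2541"
raw = "/a%2541b"

# The server decodes once before handing PATH_INFO to the application
path_info = unquote(raw)

# Decoding again in the application changes the meaning of the path
decoded_twice = unquote(path_info)

print(path_info)      # the name the client actually asked for
print(decoded_twice)  # a different name
```

Any name containing a literal percent sign becomes unreachable (or
resolves to the wrong object) once PATH_INFO is unescaped twice.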
|
|
* origin/master:
www: do not unescape PATH_INFO twice
t/mime: quiet warnings for old versions of Email::Simple
handle repeated References and In-Reply-To headers
|
|
Xapian memory usage is tied to the size of the indexed
text, so take the raw message size into account when
deciding when to flush Xapian data.
More importantly, we now flush Xapian before we have it
buffer beyond our maximum; and we do it unconditionally
to prevent even high priority processes from OOM-ing.
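
The accounting described above can be sketched roughly as follows, in
Python. BatchIndexer, its flush callback and the max_bytes budget are
hypothetical names for this sketch and not the searchidx code or the
Xapian API:

```python
class BatchIndexer:
    """Buffer documents and flush based on accumulated raw size."""

    def __init__(self, flush, max_bytes):
        self.flush = flush          # callable receiving a list of documents
        self.max_bytes = max_bytes  # byte budget per batch (assumed tunable)
        self.pending = 0            # raw bytes buffered since the last flush
        self.docs = []

    def add(self, raw: bytes) -> None:
        size = len(raw)
        # Flush *before* buffering would take us past the budget
        if self.docs and self.pending + size > self.max_bytes:
            self.commit()
        self.docs.append(raw)
        self.pending += size

    def commit(self) -> None:
        if self.docs:
            self.flush(self.docs)
        self.docs = []
        self.pending = 0


batches = []
idx = BatchIndexer(batches.append, max_bytes=10)
for msg in (b"aaaa", b"bbbb", b"cccc", b"d" * 20):
    idx.add(msg)
idx.commit()
print([len(b) for b in batches])
```

Counting bytes instead of messages means one oversized message ends up
in a batch of its own rather than riding along with many others.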
|
|
PSGI specs already require PATH_INFO to be unescaped;
so our tests were wrong, too.
|
|
This is fixed in the newest versions of Email::Simple,
but not the version in Debian jessie (2.203)
|
|
It seems possible for git-send-email(1) to generate
repeated instances of References and In-Reply-To headers,
as evidenced in:
https://public-inbox.org/git/20161111124541.8216-17-vascomalmeida@sapo.pt/raw
This causes a mismatch between how our search indexer threads
and how our HTML view handles threading. In the future, View.pm
will use the smsg-parsed {references} field and avoid redoing
Email::MIME header parsing.
We will still need to figure out a way to deal with messages
with repeated Message-IDs, at some point, too.
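
One way to treat repeated headers consistently, sketched in Python with
the standard email package. collect_references and the sample message
are made up for this example and do not reflect the actual View.pm or
indexer code:

```python
import re
from email import message_from_string

MID_RE = re.compile(r"<([^<>\s]+)>")


def collect_references(msg):
    """Return unique Message-IDs from every References/In-Reply-To header."""
    seen, out = set(), []
    for name in ("References", "In-Reply-To"):
        # get_all returns every instance of the header, not just the first
        for value in msg.get_all(name, []):
            for mid in MID_RE.findall(value):
                if mid not in seen:
                    seen.add(mid)
                    out.append(mid)
    return out


raw = (
    "References: <a@example>\n"
    "References: <b@example> <a@example>\n"
    "In-Reply-To: <b@example>\n"
    "In-Reply-To: <b@example>\n"
    "\n"
    "body\n"
)
refs = collect_references(message_from_string(raw))
print(refs)
```

If both the indexer and the HTML view derive threading from one such
list, they cannot disagree about which header instance wins.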
|
|
This improves startup speed at the cost of CoW-friendliness
for long-lived daemons (which can be fixed, later).
|
|
This allows RepoConfig to be independent of the
PublicInbox::Inbox class.
|
|
This should hopefully allow us to organize our code better.
|
|
And add a note to remove git_commit_title
|
|
For certain repos, having too many active refs will cause
memory usage problems. Mitigate the Xapian problems, for
now, and consider a switch to GDBM_File or similar for
repos with more refs.
|
|
This simplifies the code a bit and reduces the translation
overhead for looking directly at data from tools shipped
with Xapian.
While we're at it, fix thread-all.t :)
|
|
It's extraordinarily expensive to add these terms for
each and every commit.
|
|
We need to flush Xapian more frequently to account for
gigantic commits which introduce lots of text, so do
it when accounting for each line processed, and not
for each commit processed.
|
|
We'll still be keeping "repobrowse" for the public API
for use with .psgi files, but shortening the name means
less typing and we may have command-line tools, too.
|
|
* origin/master:
config: do not slurp lines into memory
TODO: several updates
search: schema version bump for empty References/In-Reply-To
Revert "searchidx: reindex clobbers old thread IDs"
searchidx: reindex clobbers old thread IDs
searchidx: deal with empty In-Reply-To and References headers
searchview: increase limit for displaying search results
searchview: clarify numeric summary at bottom
add filter for Subject: tags
watchmaildir: allow arguments for filters
watchmaildir: limit live importer processes
learn: implement "rm" only functionality
mime: avoid SUPER usage in Email::MIME subclass
inbox: reinstate periodic cleanup of Xapian and SQLite objects
introduce PublicInbox::MIME wrapper class
|
|
"foreach (<$fh>)" in Perl requests lines in array
context, so use "while" instead for lazy reading.
This follows ba4c50c20b95679580beba1ef290a4281d5285b7
in master ("config: do not slurp lines into memory")
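
Python has a comparable pitfall: fh.readlines() builds the whole list
up front, while iterating over the file object reads one line at a
time. A small sketch of the lazy form follows; parse_config_lines and
the sample input are made up for this example:

```python
import io


def parse_config_lines(fh):
    """Yield non-empty, non-comment lines without slurping the file."""
    for line in fh:  # lazy: one line per iteration
        line = line.strip()
        if line and not line.startswith("#"):
            yield line


fh = io.StringIO("[core]\n# comment\n\nbare = true\n")
gen = parse_config_lines(fh)
first = next(gen)  # only the start of the input has been consumed so far
rest = list(gen)
print(first, rest)
```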
|
|
There's no need to hold everything in memory, here,
since apparently "foreach" will read everything at
once in array context
(for some reason, I thought Perl5 was smart enough
to avoid creating a temporary array, here...)
|
|
Much more work on this will be needed, but at least explicit
flush points prevent OOMs on my system.
|
|
Always plenty to do while working on this...
|
|
We will be reusing this for indexing normal (code) repositories
using git and Xapian, too.
|
|
We cannot distinguish between legitimate ghosts and mis-threaded
messages before commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0
("searchidx: deal with empty In-Reply-To and References headers")
so we must rebuild the index in parallel to fix it.
|
|
Oops, that's broken, too. I guess the only way to reindex
after fixing the thread detection is to start from scratch.
This reverts commit 5d91adedf5f33ef1cb87df2a86306ddf370b4f8d.
|
|
We cannot always reuse thread IDs since our threading
logic may change as bugs are fixed.
|
|
In some messages, these headers exist, but have empty values.
Do not let empty values throw off our search indexer to tie
threads together, as it can make non-sensical threads grouped
to a Message-Id of "" (empty string).
See
<https://public-inbox.org/git/11340844841342-git-send-email-mailing-lists.git@rawuncut.elitemail.org/raw>
for an example of such a message.
Thanks-to: Johannes Schindelin <Johannes.Schindelin@gmx.de>
<https://public-inbox.org/git/alpine.DEB.2.20.1702041206130.3496@virtualbox/>
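
The intended behavior can be sketched in Python like this. parent_id
and group_by_parent are hypothetical helpers and a great
simplification of real threading; the point is only that an empty
header value must count as "no parent":

```python
def parent_id(in_reply_to):
    """Return the parent Message-ID, or None if the header is unusable."""
    if in_reply_to is None:
        return None
    mid = in_reply_to.strip().strip("<>").strip()
    return mid or None  # an empty value is treated like a missing header


def group_by_parent(messages):
    """Group (message_id, in_reply_to) pairs under their parent."""
    threads = {}
    for mid, irt in messages:
        parent = parent_id(irt)
        key = parent if parent is not None else mid
        threads.setdefault(key, []).append(mid)
    return threads


messages = [
    ("a@example", None),
    ("b@example", "<a@example>"),
    ("c@example", ""),    # header present, but empty
    ("d@example", "  "),  # header present, whitespace only
]
groups = group_by_parent(messages)
print(groups)
```

Without the "or None" step, the last two messages would be grouped
together under the empty string despite being unrelated.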
|
|
We are in no danger of excessive buffering or OOM-ing:
the main page for every inbox already loads 200 results,
and thread page views even load 1000! Increase this to
200 for now.
|
|
Xapian can only give estimated results when a result limit is
given to it, so make clear it is an estimate to avoid showing
non-sensical ranges when no results are returned.
|
|
I'm unsure if this is even a good idea to support,
but we have it, for now.
|
|
We never ended up using it.
|
|
This shortens the code quite a bit at a negligible performance cost,
and the diffstat agrees.
|
|
Some mailing lists add annoying tags to the Subject line which
discourage readers from doing proper mail organization on the
client side. They also waste precious screen space and
attention span.
Remove them from our archives to reduce clutter.
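
A rough Python sketch of such a filter. strip_subject_tag and the
"[foo-list]" tag are assumptions for this example and not the
interface of the actual filter; the tag is passed in explicitly so
that useful prefixes such as "[PATCH]" are left alone:

```python
import re


def strip_subject_tag(subject: str, tag: str) -> str:
    """Remove a configured list tag from a Subject line."""
    # Match the literal tag plus surrounding whitespace, case-insensitively
    pattern = re.compile(r"\s*" + re.escape(tag) + r"\s*", re.IGNORECASE)
    return pattern.sub(" ", subject).strip()


print(strip_subject_tag("Re: [foo-list] [PATCH] fix bug", "[foo-list]"))
```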
|
|
We'll want to allow some degree of configuration for
various mailing lists.
|
|
This reduces one synchronous dependency from the hot path,
and psgi_return will be used in the future.
|
|
I've hit random test failures on this, so attempt to improve
diagnostics and improve documentation for this test.
|
|
Commit messages are assumed to be displayed in a terminal
with a fixed width font, so we must preserve newlines and
all whitespace as-is so ASCII art may be displayed properly.
|