Date | Commit message |
|
We already escape the user-provided Message-IDs (so there's no
security problem AFAIK), but the URL templates which exist in
our source code were not escaped properly.
This quiets down tidy(1).
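
A minimal sketch of the idea, not the project's actual code: the Message-ID is URL-escaped first, and the finished URL is then HTML-escaped as a whole, so a literal `&` in the template becomes `&amp;`. The template string and the `render_link` helper are hypothetical.

```python
from html import escape

# Hypothetical URL template living in source code; note the bare '&'.
TEMPLATE = "//example.com/search?q=%s&type=mid"


def render_link(template, mid_url_escaped):
    """Fill in an already URL-escaped Message-ID, then HTML-escape the URL."""
    url = template.replace("%s", mid_url_escaped, 1)
    safe = escape(url, quote=True)  # '&' -> '&amp;', '"' -> '&quot;', ...
    return '<a href="%s">%s</a>' % (safe, safe)
```

Escaping the assembled URL, rather than only the user-provided part, covers characters that come from the template itself.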
|
|
It's not worth it, and attempts to wildcard off
single-character Message-IDs(*) cause Xapian to error
out in unpredictable ways:
something terrible happened at /usr/lib/x86_64-linux-gnu/perl5/5.24/Search/Xapian/Enquire.pm line 54.
...propagated at lib/PublicInbox/Search.pm line 209.
So don't bother.
(*) because people blindly hit 'y' or 'n' when git-send-email
prompted them for In-Reply-To.
|
|
"LIKE" in SQLite (and other SQL implementations I've seen) is
expensive with nearly 3 million messages in the archives.
This caused some partial Message-ID lookups to take over 600ms
on my workstation (~300ms on a faster Xeon). Cut that to
under 30ms on average on my workstation by relying exclusively
on Xapian for partial Message-ID lookups as we have in the past.
Unlike in the past when we tried using Xapian to match partial
Message-IDs, we now optimize our indexing of Message-IDs to
break apart "words" in Message-IDs for searching, yielding
(hopefully) "good enough" accuracy for folks who get long URLs
broken across lines when copy+pasting.
We'll also drop the (in retrospect) pointless stripping of
"/[tTf]" suffixes for the partial match, since anybody who
hits that codepath would be hitting an invalid message ID.
Finally, limit wildcard expansion to prevent easy DoS vectors
on short terms.
And blame Pine and alpine for generating Message-IDs with
low-entropy prefixes :P
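
A rough sketch of the word-splitting and wildcard-limiting ideas above, assuming "words" are runs of ASCII alphanumerics and that only a trailing, possibly truncated word is searched as a prefix. The `MIN_WILDCARD_LEN` threshold and both helper names are hypothetical and do not reflect the real indexer.

```python
import re
from typing import List

MIN_WILDCARD_LEN = 4  # hypothetical: shorter prefixes expand to too many terms


def mid_words(mid: str) -> List[str]:
    """Split a Message-ID into alphanumeric 'words'."""
    return [w for w in re.split(r"[^A-Za-z0-9]+", mid) if w]


def partial_query_terms(partial_mid: str) -> List[str]:
    """Build search terms; only the last word may carry a wildcard."""
    words = mid_words(partial_mid)
    terms = []
    for i, word in enumerate(words):
        if i < len(words) - 1:
            terms.append(word)          # complete word, match exactly
        elif len(word) >= MIN_WILDCARD_LEN:
            terms.append(word + "*")    # possibly truncated, prefix match
        # a short trailing word is dropped instead of being wildcarded
    return terms
```

For a URL cut off after `20150905091457.GA278`, this yields one exact term and one prefix term, while a one- or two-character tail never becomes a wildcard.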
|
|
* origin/master:
nntp: allow and ignore empty commands
mbox: do not barf on queries which return no results
nntp: fix NEWNEWS command
searchview: fix non-numeric comparison
Allow specification of the number of search results to return
githttpbackend: avoid infinite loop on generic PSGI servers
http: fix modification of read-only value
extmsg: use news.gmane.org for Message-ID lookups
extmsg: rework partial MID matching to favor current inbox
Update the installation instructions with Fedora package names
nntp: do not drain rbuf if there is a command pending
nntp: improve fairness during XOVER and similar commands
searchidx: do not modify Xapian DB while iterating
Don't use LIMIT in UPDATE statements
|
|
Searching across different inboxes is expensive without
SQLite (or Xapian) installed, so avoid doing expensive tree
lookups in git. Since SQLite is required for Xapian
support anyways, we won't need to check Xapian, either.
Sites without SQLite installed will simply 404 if somebody
requests a message which isn't in the current inbox.
|
|
http://mid.gmane.org/ has not worked for a while, but their NNTP
server continues to work. Use that and perhaps give NNTP more
exposure.
Reported-by: Jonathan Corbet <corbet@lwn.net>
|
|
The current inbox is more important for partial Message-ID
matching, so we try harder on that to fix common errors before
moving onto other inboxes. Then, prevent expensive scanning of
other inboxes by requiring a Message-ID length of at least 16
bytes.
Finally, we limit the overall partial responses to 200 when
scanning other inboxes to avoid excessive memory usage.
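
The two limits can be illustrated with a small sketch. The constants come from the message above, while the `scan_other_inboxes` helper and the shape of the `inboxes` mapping (inbox name to a lookup callable) are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List, Tuple

MIN_PARTIAL_LEN = 16       # bytes, per the commit message
MAX_PARTIAL_RESULTS = 200  # overall cap across other inboxes


def scan_other_inboxes(
    partial_mid: str,
    inboxes: Dict[str, Callable[[str], Iterable[str]]],
) -> List[Tuple[str, str]]:
    """Collect (inbox name, Message-ID) pairs, bounded in cost and size."""
    if len(partial_mid.encode("utf-8")) < MIN_PARTIAL_LEN:
        return []  # too short: skip the expensive cross-inbox scan
    found = []
    for name, lookup in inboxes.items():
        for mid in lookup(partial_mid):
            found.append((name, mid))
            if len(found) >= MAX_PARTIAL_RESULTS:
                return found  # stop early to bound memory usage
    return found
```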
|
|
'Q' is merely a convention in the Xapian world, and is close
enough to unique for practical purposes, so stop using XMID
and gain a little more term length as a result.
|
|
In general, they are, but there's no way for a general-purpose
mail server to enforce that. This is a step in allowing us
to handle more corner cases which existing lists throw at us.
|
|
This likely has no real world implications, though, as we
fall back to Msgmap lookups anyways.
Broken since commit 7eeadcb62729b0efbcb53cd9b7b181897c92cf9a
("search: remove unnecessary abstractions and functionality")
|
|
Using update-copyrights from gnulib
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
|
|
Apparently mid.mail-archive.com does not support HTTPS,
and the HTTP version redirects to the search query, anyways.
|
|
Based on reading RFC 3986, it seems '@', ':', '!', '$', '&',
"'", '(', ')', '*', '+', ',', ';', '=' are all allowed
in path-absolute where we have the Message-ID.
In any case, it seems '@' is fairly common in path components
nowadays and too common in Message-IDs.
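
As an illustration only (the project itself is written in Perl), the same character set can be passed to Python's `urllib.parse.quote` as the `safe` argument. The `mid_to_path` name is hypothetical.

```python
from urllib.parse import quote

# '@', ':' and the RFC 3986 sub-delims are left unescaped in the path.
PATH_SAFE = "@:!$&'()*+,;="


def mid_to_path(mid: str) -> str:
    """Percent-encode a Message-ID for use as a single path segment."""
    # '/' is deliberately not in PATH_SAFE, so it becomes %2F.
    return quote(mid, safe=PATH_SAFE)
```

Letters, digits, and `_.-~` are never escaped by `quote`, so a typical Message-ID passes through unchanged while `/`, `?`, `#`, and spaces are encoded.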
|
|
gmane is down at the moment, so lower that in priority
(hopefully it will be brought back up, again). Wikipedia also
lists a few more project-specific list providers, so include
those as well: https://en.wikipedia.org/wiki/Message-ID
|
|
While an inbox may have multiple URLs, we will favor
the existing URL for the current inbox on partial matches
to avoid confusing users or slowing them down by requiring
a new TCP connection.
|
|
Reduce the size of hashes a bit and drop some unneeded hash
lookups for uncommon paths.
|
|
Another step towards a consistent WWW UI...
|
|
Automatic inbox switching was a potentially deceptive pattern
and surprised readers who did not check the URL bar closely.
Furthermore, a message could be cross-posted to multiple lists,
too.
|
|
Exposing compressed Message-IDs in URLs was a mistake,
remove a remnant of it.
|
|
This fills in the internal lookup hashes and simplifies
callers.
|
|
This is less code and hopefully easier-to-understand.
|
|
A public-inbox is NOT necessarily a mailing list, but it
could serve as an input point for zero, one, or infinite
mailing lists :D
|
|
All URL generation in dynamic HTTP pages should be capable of
generating "https" or "http" URLs depending on the user's
preference.
|
|
We cannot modify elements in any data structures
shared between requests. Oops!
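
A minimal sketch of this class of bug fix, assuming a long-lived process that keeps a module-level structure alive across requests. `SHARED_CONFIG` and `urls_for_request` are made-up names; the point is only that each request works on its own copy.

```python
import copy

# Hypothetical structure that lives for the whole process lifetime.
SHARED_CONFIG = {"ext_urls": ["//archive.example/%s"]}


def urls_for_request(extra_url):
    """Return per-request URLs without touching the shared list."""
    urls = copy.deepcopy(SHARED_CONFIG["ext_urls"])  # private copy
    urls.append(extra_url)  # safe: mutates only this request's copy
    return urls
```

Appending to `SHARED_CONFIG["ext_urls"]` directly would leak one request's additions into every later request.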
|
|
We will be falling back and cascading to newsgroup lookups, later.
|
|
This allows users to avoid HTTPS -> HTTP downgrade warnings,
but we will also avoid encouraging them towards HTTPS, for now.
IMHO: the CA system gives a false sense of security,
TLS libraries (e.g. OpenSSL) can introduce new bugs and
problems (even to attack clients), and TLS libraries
also eat memory on cheap servers.
|
|
Relying on Plack::Handler::CGI is much easier for long-term
maintenance and development.
Nowadays, we even include our own httpd implementation to
facilitate easier deployment with PSGI/Plack.
|
|
Avoid unintentionally switching protocols if the external site
we're linking to supports both HTTP and HTTPS.
We do not want to force HTTPS everywhere because potential
bugs and performance problems in the TLS stack may outweigh
the privacy benefits. Leave it up to site authors and users
to decide whether they want HTTPS or plain old HTTP.
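
One common way to avoid switching protocols is a scheme-relative reference (RFC 3986, section 4.2), which inherits the scheme of the page it appears on. Whether this commit used that technique is an assumption; the sketch below only shows how such a reference resolves, using Python's `urljoin` and made-up host names.

```python
from urllib.parse import urljoin

# Hypothetical external archive link with no scheme of its own.
EXTERNAL_REF = "//archive.example/?i=some-message-id"


def resolve(page_url: str, ref: str = EXTERNAL_REF) -> str:
    """Resolve a reference the way a browser would for the given page."""
    return urljoin(page_url, ref)
```

A reader on an `http://` page stays on HTTP and a reader on an `https://` page stays on HTTPS, with no per-request logic on the server.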
|
|
Not needed, but this is good documentation. Some of these values
should never have newlines.
|
|
We'll probably want to continue supporting CGI for mod_perl
compatibility.
|
|
Fixes commit 4c2c2325d2948ec5340e2fcafbee798cf568f5fd
("rename 'GitCatFile' package to 'Git'")
|
|
We'll be using it for more than just cat-file.
Adding a `popen' API for internal use allows us to save a bunch
of code in other places.
|
|
Sometimes users (me :x) blindly append "raw" to a /t/ URL...
|
|
Hopefully this gives new hackers a better overview of
how the components relate to each other.
|
|
DBI + DBD::SQLite has much better handling of prefix lookups
than Xapian. While we're at it, avoid linking blatantly wrong
Message-IDs to external services.
|
|
ref: http://public-inbox.org/meta/20150905091457.GA27857@dcvr.yhbt.net/
|
|
In case a URL gets truncated (as is common with long URLs),
we can rely on Xapian for partial matches and bring the user
to their destination.
|
|
Oops, browsers normally render this fine, though.
|
|
Provide a fallback for legacy SHA-1 messages, but do not
advertise shorter URLs anymore for data portability concerns.
This fixes a regression introduced in
commit 81a9c1b476987d845b340ab9013d26cf4487cb9a
("search: disable Message-ID compression in Xapian")
which ended up breaking thread-related endpoints for
large Message-IDs, as lookups on the SHA-1 message no longer
worked.
|
|
Since cross-posting is inevitable, we shall link to external
message archives for interoperability.
|
|
We'll continue to compress long Message-IDs in URLs (which we know
about), but we will store entire Message-IDs in the Xapian database
to facilitate ease-of-lookups in external databases.
|
|
Currently, this looks at other public-inbox configurations
served in the same process. In the future, it will generate
links to other Message-ID lookup endpoints.
|