Date | Commit message (Collapse) |
|
This name is shorter and matches terminology in gitweb and
other popular git web viewers.
|
|
This should hopefully allow us to organize our code better
|
|
We'll still be keeping "repobrowse" for the public API
for use with .psgi files, but shortening the name means
less typing and we may have command-line tools, too.
|
|
* origin/master:
config: do not slurp lines into memory
TODO: several updates
search: schema version bump for empty References/In-Reply-To
Revert "searchidx: reindex clobbers old thread IDs"
searchidx: reindex clobbers old thread IDs
searchidx: deal with empty In-Reply-To and References headers
searchview: increase limit for displaying search results
searchview: clarify numeric summary at bottom
add filter for Subject: tags
watchmaildir: allow arguments for filters
watchmaildir: limit live importer processes
learn: implement "rm" only functionality
mime: avoid SUPER usage in Email::MIME subclass
inbox: reinstate periodic cleanup of Xapian and SQLite objects
introduce PublicInbox::MIME wrapper class
|
|
Much more work on this will be needed, but at least explicit
flush points prevents OOMs on my system.
|
|
We will be reusing this for indexing normal (code) repositories
using git and Xapian, too.
|
|
Some mailing lists add annoying tags into the Subject line which
discourages readers from doing proper mail organization on the
client side. They also waste precious screen space and
attention span.
Remove them from our archives to reduce clutter.
|
|
Based on what was done for the Atom feed, this will allow us to
simplify state management through metaprogramming and avoid
placeholder characters ('D' for decoration) for empty fields.
|
|
This wrapper class actually does both reading and
writing, and a shorter name is nicer.
|
|
This should fix problems with multipart messages where
text/plain parts lack a header.
cf. git clone --mirror https://github.com/rjbs/Email-MIME.git
refs/pull/28/head
In the future, we may still introduce as streaming
interface to reduce memory usage on large emails.
|
|
This will allow us to handle network operations while waiting
on "git cat-file" to seek and unpack things.
|
|
* origin/master: (25 commits)
evcleanup: ensure deferred close from timers are handled ASAP
httpd/async: improve variable naming
githttpbackend: minor cleanups to improve readability
githttpbackend: simplify compatibility code
githttpbackend: minor readability improvement
http: fix clobbering of $null_io
linkify: modify argument in place
view: do not modify array during iteration
view: stop chomping off whitespace at ends of messages
view: remove unused parameter
search: lookup_mail handles modified DBs
doc: various comments on async handling
searchthread: simplify API and remove needless OO
searchthread: update comment about loop prevention
searchmsg: remove ensure_metadata
tests: add thread-all testing for benchmarking
searchmsg: do not memoize {date} field
searchmsg: remove locale-dependency for ->date
t/config.t: fix feedmax default
wwwtext: link to RFC4685 (Atom Threading)
...
|
|
I'll be using this to improve message threading performance.
|
|
* origin/repobrowse: (98 commits)
t/repobrowse_git_httpd.t: ensure signature exists for split
t/repobrowse_git_tree.t: fix test for lack of bold
repobrowse: fix alignment of gitlink entries
repobrowse: show invalid type for tree views
repobrowse: do not bold directory names in tree view
repobrowse: reduce checks for response fh
repobrowse: larger, short-lived buffer for reading patches
repobrowse: reduce risk of callback reference cycles
repobrowse: snapshot support for cgit compatibility
test: disable warning for Plack::Test::Impl
repobrowse: avoid confusing linkification for "diff"
repobrowse: git commit view uses pi-httpd.async
repobrowse: more consistent variable naming for /commit/
repobrowse: show roughly equivalent "diff-tree" invocation
repobrowse: reduce local variables for state management
repobrowse: summary handles multiple README types
repobrowse: remove bold decorations from diff view
repobrowse: common git diff parsing code
repobrowse: implement diff view for compatibility
examples/repobrowse.psgi: disable Chunked response by default
...
|
|
This will let us stream larger Atom documents bodies without
wasting too much memory and reduce the amount of round-trip
requests needed to get necessary information.
Hopefully clients are using streaming (SAX) parsers, too.
This is the final transition in the core public-inbox
code to allow migrating to a "pull"-based body streaming
scheme which allows a HTTP server to respond appropriately
to backpressure from slow clients.
|
|
This starts to show noticeable performance improvements when
attempting to thread over 400 messages; but the improvement
may not be measurable with less.
However, the resulting code is much shorter and (IMHO)
much easier to understand.
|
|
Introduce our own SearchThread class for threading messages.
This should allow us to specialize and optimize away objects
in future commits.
|
|
Hopefully more folks can download and run public-inbox,
nowadays.
|
|
Begin documenting some basic help functionality.
I may tweak the anchor names of the various HTML endpoints
to be more consistent with each other (old ones will be
supported for a short while), so I'm not documenting
those, for now.
This may become part of a builtin key-value store for
basic texts, but this probably shouldn't become a wiki
engine, either.
|
|
Based on reading RFC 3986, it seems '@', ':', '!', '$', '&',
"'", '; '(', ')', '*', '+', ',', ';', '=' are all allowed
in path-absolute where we have the Message-ID.
In any case, it seems '@' is fairly common in path components
nowadays and too common in Message-IDs.
|
|
For some existing mailing list archives, messages are identified
by serial number (such as NNTP article numbers in gmane). Those
links may become inaccessible (as is the current case for
gmane), so ensure users can still search based on old serial
numbers.
Now, I run the following periodically to get article numbers
from gmane (while news.gmane.org remains):
NNTPSERVER=news.gmane.org
export NNTPSERVER
GROUP=gmane.comp.version-control.git
perl -I lib scripts/xhdr-num2mid $GROUP --msgmap=/path/to/gmane.sqlite3
(I might integrate this further with public-inbox-* scripts one day).
My ~/.public-inbox/config as an added "altid" snippet which now
looks like this:
[publicinbox "git"]
address = git@vger.kernel.org
mainrepo = /path/to/git.vger.git
newsgroup = inbox.comp.version-control.git
; relative pathnames expand to $mainrepo/public-inbox/$file
altid = serial:gmane:file=gmane.sqlite3
And run "public-inbox-index --reindex /path/to/git.vger.git"
periodically.
This ought to allow searching for "gmane:12345" to work for
Xapian-enabled instances.
Disclaimer: while public-inbox supports NNTP and stable article
serial numbers, use of those for public links is discouraged
since it encourages centralization.
|
|
This is used to quickly generate an article number to Message-ID
mapping.
Usage:
NNTPSERVER=news.example.org ./scripts/xhdr-num2mid GROUP >file
|
|
|
|
Currently only for git-http-backend use, this allows limiting
the number of spawned processes per-inbox or by group, if there
are multiple large inboxes amidst a sea of small ones.
For example, a "big" repo limiter could be used for big inboxes:
which would be shared between multiple repos:
[limiter "big"]
max = 4
[publicinbox "git"]
address = git@vger.kernel.org
mainrepo = /path/to/git.git
; shared limiter with giant:
httpbackendmax = big
[publicinbox "giant"]
address = giant@project.org
mainrepo = /path/to/giant.git
; shared limiter with git:
httpbackendmax = big
; This is a tiny inbox, use the default limiter with 32 slots:
[publicinbox "meta"]
address = meta@public-inbox.org
mainrepo = /path/to/meta.git
|
|
Same as nginx :>
|
|
|
|
We may remove from_name in the future.
...And disallow quotes in email addresses.
Technically I believe they're allowed, but they're definitely
uncommon and unlikely to show up in legitimate mail.
|
|
This should hopefully make it easier to try other anti-spam
systems (or none at all) in the future.
|
|
When running mailing list mirrors, one needs to be careful
spammers do not try to sidestep the list server we want to
mirror from and inject email into our mail directly by setting
the appropriate list headers (e.g. "X-Mailing-List" or
"List-Id"). We trust the top-most Received: header is
the one our own mail server got the mail from.
Bcc:-ing a public mailing list is a very likely indicator of
spam in my experience, so throw in an extra rule mark it.
While public-inbox-mda rejects Bcc: entirely, public-inbox-watch
needs to mirror lists which allow Bcc.
==> list_mirror.cf <==
loadplugin PublicInbox::SaPlugin::ListMirror
ifplugin PublicInbox::SaPlugin::ListMirror
header LIST_MIRROR_RECEIVED eval:check_list_mirror_received()
describe LIST_MIRROR_RECEIVED Received does not match trusted list server
score LIST_MIRROR_RECEIVED 10
header LIST_MIRROR_BCC eval:check_list_mirror_bcc()
describe LIST_MIRROR_BCC Mailing list was Bcc-ed
score LIST_MIRROR_BCC 1
endif
==> ~/.spamassassin/user_prefs <==
ifplugin PublicInbox::SaPlugin::ListMirror
list_mirror X-Mailing-List git@vger.kernel.org *.kernel.org git@vger.kernel.org
endif
|
|
And add a check-manifest target to the Makefile to
ensure we're up-to-date with git (but do not depend on
git).
|
|
This will allow users to run importers off existing mail
accounts where they may not have access to run -mda.
Currently, we only support Maildirs, but IMAP ought to be
doable.
|
|
Oops, maybe this could be auto-maintained somehow...
|
|
Most of its functionality is in the PublicInbox::Inbox class.
While we're at it, we no longer auto-create newsgroup names
based on the inbox name, since newsgroup names probably deserve
some thought when it comes to hierarchy.
|
|
It seems common for users to end statements with URLs,
while it is rare for a URL itself to end with a '.' or ';'.
So make a guess and assume the URL was intended to not
include the trailing '.' or ';'
|
|
This will allow us to more easily reuse it elsewhere.
|
|
Ugh, I wonder if we can/should generate this automatically...
|
|
It's been a while...
|
|
This project is currently implemented in Perl, and pod2man is
probably more common among potential users and developers of
this project.
|
|
We'll be using it for more than just cat-file.
Adding a `popen' API for internal use allows us to save a bunch
of code in other places.
|
|
Thanks to Ask for the patch in
https://rt.cpan.org/Public/Bug/Display.html?id=22817
|
|
This should allow us to more-easily test with Plack.
|
|
We use --git-dir=... instead of $ENV{GIT_DIR} because ENV changes
do not propagate easily with mod_perl.
|
|
This helps us keep track of escaping which needs to be done
for various levels.
|
|
We will be combining common code between -learn and -mda
|
|
We will be reusing the config parsing code for the CGI
script, too.
|
|
|