Diffstat (limited to 'Documentation/technical')
-rw-r--r-- | Documentation/technical/data_structures.txt | 30
-rw-r--r-- | Documentation/technical/ds.txt | 21
-rw-r--r-- | Documentation/technical/memory.txt | 10
-rw-r--r-- | Documentation/technical/weird-stuff.txt | 22
-rw-r--r-- | Documentation/technical/whyperl.txt | 20
5 files changed, 68 insertions, 35 deletions
diff --git a/Documentation/technical/data_structures.txt b/Documentation/technical/data_structures.txt
index 4dcf9ce6..11f78041 100644
--- a/Documentation/technical/data_structures.txt
+++ b/Documentation/technical/data_structures.txt
@@ -32,19 +32,19 @@ Per-message classes
 Common abbreviation: $mime, $eml Used by: PublicInbox::WWW, PublicInbox::SearchIdx
- An representation of an entire email, multipart or not.
+ A representation of an entire email, multipart or not.
 An option to use libgmime or libmailutils may be supported in the future for performance and memory use. This can be a memory hog with big messages and giant attachments, so our PublicInbox::WWW interface only keeps
- one object of this class in memory at-a-time.
+ one object of this class in memory at a time.
 In other words, this is the "meat" of the message, whereas $smsg (below) is just the "skeleton". Our PublicInbox::V2Writable class may have two objects of this
- type in memory at-a-time for deduplication.
+ type in memory at a time for deduplication.
 In public-inbox 1.4 and earlier, Email::MIME and its subclass, PublicInbox::MIME were used. Despite still slurping,
@@ -61,10 +61,10 @@ Per-message classes
 This is loaded from either the overview DB (over.sqlite3) or the Xapian DB (docdata.glass), though the Xapian docdata
- is won't hold NNTP-only fields (Cc:/To:)
+ won't hold NNTP-only fields (Cc:/To:).
 There may be hundreds or thousands of these objects in memory
- at-a-time, so fields are pruned if unneeded.
+ at a time, so fields are pruned if unneeded.
 * PublicInbox::SearchThread::Msg - subclass of Smsg Common abbreviation: $cont or $node
@@ -75,9 +75,9 @@ Per-message classes
 Nowadays, this is a re-blessed $smsg with additional fields. As with $smsg objects, there may be hundreds or thousands
- of these objects in memory at-a-time.
+ of these objects in memory at a time.
- We also do not use a linked-list for storing children as JWZ
+ We also do not use a linked list for storing children as JWZ
 describes, but instead a Perl hashref for {children} which becomes an arrayref upon sorting.
@@ -88,7 +88,7 @@ Per-inbox classes
 * PublicInbox::Inbox - represents a single public-inbox Common abbreviation: $ibx
- Used everywhere
+ Used everywhere.
 This represents a "publicinbox" section in the config file, see public-inbox-config(5) for details.
@@ -117,7 +117,7 @@ Per-inbox classes
 * PublicInbox::Search - Xapian read-only interface Common abbreviation: $srch, $ibx->search
- Used everywhere if Search::Xapian (or Xapian.pm) is available.
+ Used everywhere if Xapian is available.
 Each indexed inbox has one of these, see public-inbox-v1-format(5) and public-inbox-v2-format(5)
@@ -152,7 +152,7 @@ ad-hoc structures shared across packages
 This holds the PSGI $env as well as any internal variables used by various modules of PublicInbox::WWW.
- As with the PSGI $env, there is one per-active WWW
+ As with the PSGI $env, there is one per active WWW
 request+response cycle. It does not exist for idle HTTP clients.
@@ -174,8 +174,8 @@ daemon classes
 Common abbreviation: $http Used by: PublicInbox::DS, public-inbox-httpd
- Unlike PublicInbox::NNTP, this class no knowledge of any of
- the email or git-specific parts of public-inbox, only PSGI.
+ Unlike PublicInbox::NNTP, this class has no knowledge of any of
+ the email- or git-specific parts of public-inbox, only PSGI.
 However, it supports APIs and behaviors (e.g. streaming large responses) which PublicInbox::WWW may take advantage of.
@@ -188,7 +188,7 @@ daemon classes
 This class calls non-blocking accept(2) or accept4(2) on a listen socket to create new PublicInbox::HTTP and
- PublicInbox::HTTP instances.
+ PublicInbox::NNTP instances.
 * PublicInbox::HTTPD Common abbreviation: $httpd
@@ -197,9 +197,9 @@ daemon classes
 wrappers around client sockets accepted from PublicInbox::Listener.
- Since the SERVER_NAME and SERVER_PORT PSGI variables needs to be
+ Since the SERVER_NAME and SERVER_PORT PSGI variables need to be
 exposed for HTTP/1.0 requests when Host: headers are missing,
- this is per-Listener socket.
+ this is per Listener socket.
 * PublicInbox::HTTPD::Async Common abbreviation: $async
diff --git a/Documentation/technical/ds.txt b/Documentation/technical/ds.txt
index 5a1655a1..afead2f1 100644
--- a/Documentation/technical/ds.txt
+++ b/Documentation/technical/ds.txt
@@ -1,9 +1,14 @@
 PublicInbox::DS - event loop and async I/O base class
-Our PublicInbox::DS event loop which powers public-inbox-nntpd
-and public-inbox-httpd diverges significantly from the
-unmaintained Danga::Socket package we forked from. In fact,
-it's probably different from most other event loops out there.
+Our PublicInbox::DS event loop which powers most of our long-lived
+processes(*) diverges significantly from the unmaintained Danga::Socket
+package we forked from. In fact, it's probably different from most
+other event loops out there.
+
+Most notably, it uses one-shot, level-trigger, and edge-trigger mode
+modes of kqueue|epoll depending on the situation.
+
+(*) public-inbox-netd,(-httpd,-imapd,-nntpd,-pop3d,-watch) + lei-daemon
 Most notably:
@@ -14,7 +19,7 @@ Most notably:
 triggers a call. The lack of read/write callback distinction is driven by the
- fact TLS libraries (e.g. OpenSSL via IO::Socket::SSL) may
+ fact that TLS libraries (e.g. OpenSSL via IO::Socket::SSL) may
 declare SSL_WANT_READ on SSL_write(), and SSL_WANT_READ on SSL_read(). So we end up having to let each user object decide whether it wants to make read or write calls depending on its
@@ -30,7 +35,7 @@ Most notably:
 Reducing the user-supplied code down to a single callback allows subclasses to keep their logic self-contained. The combination of this change and one-shot wakeups (see below) for bidirectional
- data flows make asynchronous code easier to reason about.
+ data flows makes asynchronous code easier to reason about.
 Other divergences:
@@ -48,7 +53,7 @@ Other divergences:
 Augmented features:
-* obj->write(CODEREF) passes the object itself to the CODEREF
+* obj->write(CODEREF) passes the object itself to the CODEREF.
 Being able to enqueue subroutine calls is a powerful feature in Danga::Socket for keeping linear logic in an asynchronous environment. Unfortunately, each subroutine takes several kilobytes of memory.
@@ -81,7 +86,7 @@ New features
 * IO::Socket::SSL support (for NNTPS, STARTTLS+NNTP, HTTPS)
-* dwaitpid (waitpid wrapper) support for reaping dead children
+* awaitpid (waitpid wrapper) support for reaping dead children
 * reliable signal wakeups are supported via signalfd on Linux, EVFILT_SIGNAL on *BSDs via IO::KQueue.
diff --git a/Documentation/technical/memory.txt b/Documentation/technical/memory.txt
index bb1c92fd..039694c3 100644
--- a/Documentation/technical/memory.txt
+++ b/Documentation/technical/memory.txt
@@ -8,12 +8,12 @@ memory-efficient.
 We strive to keep processes small to improve locality, allow the kernel to cache more files, and to be a good neighbor to other processes running on the machine. Taking advantage of
-automatic reference counting (ARC) in Perl allows us
+automatic reference counting (ARC) in Perl allows us to
 deterministically release memory back to the heap. We start with a simple data model with few circular references. This both eases human understanding and reduces
-the likelyhood of bugs.
+the likelihood of bugs.
 Knowing the relative sizes and quantities of our data structures, we limit the scope of allocations as much as
@@ -48,3 +48,9 @@ In the future, our internal data model will be further
 flattened and simplified to reduce the overhead imposed by small objects. Large allocations may also be avoided by optionally using Inline::C.
+
+Finally, the mwrap-perl LD_PRELOAD wrapper was ported to Perl 5
+and enhanced to provide live memory usage tracking on 64-bit systems
+with minimal performance impact on production traffic:
+
+ git clone https://80x24.org/mwrap-perl.git
diff --git a/Documentation/technical/weird-stuff.txt b/Documentation/technical/weird-stuff.txt
new file mode 100644
index 00000000..0c8d6891
--- /dev/null
+++ b/Documentation/technical/weird-stuff.txt
@@ -0,0 +1,22 @@
+There's a lot of weird code in public-inbox which may be daunting
+to new hackers.
+
+* The event loop (PublicInbox::DS) is an evolution of a fairly standard
+ C10K event loop. See ds.txt in this directory for more.
+
+Things got weirder in 2021:
+
+* The lei command-line tool is backed by a daemon. This was done to
+ improve startup time for shell completion and manage git/SQLite/Xapian
+ single-writer during long, parallel imports. It may eventually become
+ a read-write IMAP/JMAP server.
+
+* SOCK_SEQPACKET is used extensively in lei, and will likely make its
+ way into more places, still.
+
+And even more so in 2022:
+
+* public-inbox-clone / PublicInbox::LeiMirror relies on ->DESTROY
+ for make-like dependency management while providing parallelism.
+
+More to come, lei will expose Maildirs via FUSE 3...
diff --git a/Documentation/technical/whyperl.txt b/Documentation/technical/whyperl.txt
index fbe2e1b1..db1d9793 100644
--- a/Documentation/technical/whyperl.txt
+++ b/Documentation/technical/whyperl.txt
@@ -21,7 +21,7 @@ Good Things
 Perl 5 is installed on many, if not most GNU/Linux and BSD-based servers and workstations. It is likely the most
- widely-installed programming environment that offers a
+ widely installed programming environment that offers a
 significant amount of POSIX functionality. Users won't have to waste bandwidth or space with giant toolchains or architecture-specific binaries.
@@ -47,8 +47,8 @@ Good Things
 * Predictable performance
- While Perl is neither fast or memory-efficient, its
- performance and memory use are predictable and does not
+ While Perl is neither fast nor memory-efficient, its
+ performance and memory use are predictable and do not
 require GC tuning by the user. public-inbox is developed for (and mostly on) old
@@ -56,7 +56,7 @@ Good Things
 late 1990s, and any cheap VPS today has more than enough RAM and CPU for handling plain-text email.
- Low hardware requirements increases the reach of our software
+ Low hardware requirements increase the reach of our software
 to more users, improving centralization resistance. * Compatibility
@@ -86,7 +86,7 @@ Good Things
 There should be no need to rely on language-specific package managers such as cpan(1), those systems increase
- the learning curve for users and systems administrators.
+ the learning curve for users and system administrators.
 * Compactness and terseness
@@ -98,7 +98,7 @@ Good Things
 * Performance ceiling and escape hatch With optional Inline::C, we can be "as fast as C" in some
- cases. Inline::C is widely-packaged by distros and it
+ cases. Inline::C is widely packaged by distros and it
 gives us an escape hatch for dealing with missing bindings or performance problems should they arise. Inline::C use (as opposed to XS) also preserves the software freedom and
@@ -135,7 +135,7 @@ Bad Things
 (m//, substr(), index(), etc.) still require memory copies into userspace, negating a benefit of zero-copy.
-* The XS/C API make it difficult to improve internals while
+* The XS/C API makes it difficult to improve internals while
 preserving compatibility. * Lack of optional type checking. This may be a blessing in
@@ -161,14 +161,14 @@ Red herrings to ignore when evaluating other runtimes
 ----------------------------------------------------- These don't discount a language or runtime from being
-being used, they're just not interesting.
+used, they're just not interesting.
 * Lightweight threading While lightweight threading implementations are
- convenient, they tend to be significantly heavier than a
+ convenient, they tend to be significantly heavier than
 pure event-loop systems (or multi-threaded event-loop
- systems)
+ systems).
 Lightweight threading implementations have stack overhead and growth typically measured in kilobytes. The userspace