Diffstat (limited to 'Documentation/technical')
-rw-r--r-- Documentation/technical/data_structures.txt 30
-rw-r--r-- Documentation/technical/ds.txt 21
-rw-r--r-- Documentation/technical/memory.txt 10
-rw-r--r-- Documentation/technical/weird-stuff.txt 22
-rw-r--r-- Documentation/technical/whyperl.txt 20
5 files changed, 68 insertions, 35 deletions
diff --git a/Documentation/technical/data_structures.txt b/Documentation/technical/data_structures.txt
index 4dcf9ce6..11f78041 100644
--- a/Documentation/technical/data_structures.txt
+++ b/Documentation/technical/data_structures.txt
@@ -32,19 +32,19 @@ Per-message classes
   Common abbreviation: $mime, $eml
   Used by: PublicInbox::WWW, PublicInbox::SearchIdx
 
-  An representation of an entire email, multipart or not.
+  A representation of an entire email, multipart or not.
   An option to use libgmime or libmailutils may be supported
   in the future for performance and memory use.
 
   This can be a memory hog with big messages and giant
   attachments, so our PublicInbox::WWW interface only keeps
-  one object of this class in memory at-a-time.
+  one object of this class in memory at a time.
 
   In other words, this is the "meat" of the message, whereas
   $smsg (below) is just the "skeleton".
 
   Our PublicInbox::V2Writable class may have two objects of this
-  type in memory at-a-time for deduplication.
+  type in memory at a time for deduplication.
 
   In public-inbox 1.4 and earlier, Email::MIME and its subclass,
   PublicInbox::MIME were used.  Despite still slurping,
@@ -61,10 +61,10 @@ Per-message classes
 
   This is loaded from either the overview DB (over.sqlite3) or
   the Xapian DB (docdata.glass), though the Xapian docdata
-  is won't hold NNTP-only fields (Cc:/To:)
+  won't hold NNTP-only fields (Cc:/To:).
 
   There may be hundreds or thousands of these objects in memory
-  at-a-time, so fields are pruned if unneeded.
+  at a time, so fields are pruned if unneeded.
 
 * PublicInbox::SearchThread::Msg - subclass of Smsg
   Common abbreviation: $cont or $node
@@ -75,9 +75,9 @@ Per-message classes
   Nowadays, this is a re-blessed $smsg with additional fields.
 
   As with $smsg objects, there may be hundreds or thousands
-  of these objects in memory at-a-time.
+  of these objects in memory at a time.
 
-  We also do not use a linked-list for storing children as JWZ
+  We also do not use a linked list for storing children as JWZ
   describes, but instead a Perl hashref for {children} which
   becomes an arrayref upon sorting.
 
@@ -88,7 +88,7 @@ Per-inbox classes
 
 * PublicInbox::Inbox - represents a single public-inbox
   Common abbreviation: $ibx
-  Used everywhere
+  Used everywhere.
 
   This represents a "publicinbox" section in the config
   file, see public-inbox-config(5) for details.
@@ -117,7 +117,7 @@ Per-inbox classes
 
 * PublicInbox::Search - Xapian read-only interface
   Common abbreviation: $srch, $ibx->search
-  Used everywhere if Search::Xapian (or Xapian.pm) is available.
+  Used everywhere if Xapian is available.
 
   Each indexed inbox has one of these, see
   public-inbox-v1-format(5) and public-inbox-v2-format(5)
@@ -152,7 +152,7 @@ ad-hoc structures shared across packages
   This holds the PSGI $env as well as any internal variables
   used by various modules of PublicInbox::WWW.
 
-  As with the PSGI $env, there is one per-active WWW
+  As with the PSGI $env, there is one per active WWW
   request+response cycle.  It does not exist for idle HTTP
   clients.
 
@@ -174,8 +174,8 @@ daemon classes
   Common abbreviation: $http
   Used by: PublicInbox::DS, public-inbox-httpd
 
-  Unlike PublicInbox::NNTP, this class no knowledge of any of
-  the email or git-specific parts of public-inbox, only PSGI.
+  Unlike PublicInbox::NNTP, this class has no knowledge of any of
+  the email- or git-specific parts of public-inbox, only PSGI.
   However, it supports APIs and behaviors (e.g. streaming large
   responses) which PublicInbox::WWW may take advantage of.
 
@@ -188,7 +188,7 @@ daemon classes
 
   This class calls non-blocking accept(2) or accept4(2) on a
   listen socket to create new PublicInbox::HTTP and
-  PublicInbox::HTTP instances.
+  PublicInbox::NNTP instances.
 
 * PublicInbox::HTTPD
   Common abbreviation: $httpd
@@ -197,9 +197,9 @@ daemon classes
   wrappers around client sockets accepted from
   PublicInbox::Listener.
 
-  Since the SERVER_NAME and SERVER_PORT PSGI variables needs to be
+  Since the SERVER_NAME and SERVER_PORT PSGI variables need to be
   exposed for HTTP/1.0 requests when Host: headers are missing,
-  this is per-Listener socket.
+  this is per Listener socket.
 
 * PublicInbox::HTTPD::Async
   Common abbreviation: $async
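The SearchThread::Msg entry above says children are kept in a Perl hashref that becomes an arrayref upon sorting, rather than in a JWZ-style linked list. The following is a rough Python analogue of that shape. The Node class, its mid/ds fields, and sorting by date are illustrative assumptions, not public-inbox's actual code.

```python
# Hypothetical Python analogue of a SearchThread::Msg-style node; the
# class, field names and date-based sort key are assumptions.


class Node:
    def __init__(self, mid, ds):
        self.mid = mid      # Message-ID of this message
        self.ds = ds        # date in seconds since the epoch (assumed sort key)
        self.children = {}  # Message-ID -> Node while threading (like a hashref)

    def add_child(self, child):
        # keying by Message-ID means re-adding the same child is a no-op
        self.children[child.mid] = child

    def order_children(self):
        # one-time conversion: the dict is replaced by a sorted list
        # (like a hashref becoming an arrayref); call only once per tree
        kids = sorted(self.children.values(), key=lambda n: n.ds)
        for kid in kids:
            kid.order_children()
        self.children = kids


root = Node("<root@example>", 0)
late = Node("<late@example>", 20)
early = Node("<early@example>", 10)
root.add_child(late)
root.add_child(early)
root.add_child(early)  # duplicate, ignored
root.order_children()
print([n.mid for n in root.children])  # ['<early@example>', '<late@example>']
```

In this sketch the dict gives cheap de-duplication while a thread is being assembled, and the list only matters once the display order is fixed.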
diff --git a/Documentation/technical/ds.txt b/Documentation/technical/ds.txt
index 5a1655a1..afead2f1 100644
--- a/Documentation/technical/ds.txt
+++ b/Documentation/technical/ds.txt
@@ -1,9 +1,14 @@
 PublicInbox::DS - event loop and async I/O base class
 
-Our PublicInbox::DS event loop which powers public-inbox-nntpd
-and public-inbox-httpd diverges significantly from the
-unmaintained Danga::Socket package we forked from.  In fact,
-it's probably different from most other event loops out there.
+Our PublicInbox::DS event loop, which powers most of our long-lived
+processes(*), diverges significantly from the unmaintained
+Danga::Socket package we forked from.  In fact, it's probably
+different from most other event loops out there.
+
+In particular, it uses one-shot, level-triggered, and edge-triggered
+modes of kqueue|epoll depending on the situation.
+
+(*) public-inbox-{netd,httpd,imapd,nntpd,pop3d,watch} and lei-daemon
 
 Most notably:
 
@@ -14,7 +19,7 @@ Most notably:
   triggers a call.
 
   The lack of read/write callback distinction is driven by the
-  fact TLS libraries (e.g. OpenSSL via IO::Socket::SSL) may
+  fact that TLS libraries (e.g. OpenSSL via IO::Socket::SSL) may
   declare SSL_WANT_READ on SSL_write(), and SSL_WANT_READ on
   SSL_read().  So we end up having to let each user object decide
   whether it wants to make read or write calls depending on its
@@ -30,7 +35,7 @@ Most notably:
   Reducing the user-supplied code down to a single callback allows
   subclasses to keep their logic self-contained.  The combination
   of this change and one-shot wakeups (see below) for bidirectional
-  data flows make asynchronous code easier to reason about.
+  data flows makes asynchronous code easier to reason about.
 
 Other divergences:
 
@@ -48,7 +53,7 @@ Other divergences:
 
 Augmented features:
 
-* obj->write(CODEREF) passes the object itself to the CODEREF
+* obj->write(CODEREF) passes the object itself to the CODEREF.
   Being able to enqueue subroutine calls is a powerful feature in
   Danga::Socket for keeping linear logic in an asynchronous environment.
   Unfortunately, each subroutine takes several kilobytes of memory.
@@ -81,7 +86,7 @@ New features
 
 * IO::Socket::SSL support (for NNTPS, STARTTLS+NNTP, HTTPS)
 
-* dwaitpid (waitpid wrapper) support for reaping dead children
+* awaitpid (waitpid wrapper) support for reaping dead children
 
 * reliable signal wakeups are supported via signalfd on Linux,
   EVFILT_SIGNAL on *BSDs via IO::KQueue.
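ds.txt above relies on one-shot wakeups from kqueue|epoll. The following Linux-only Python sketch illustrates only the one-shot part. It uses the standard select.epoll API, not PublicInbox::DS itself. After an EPOLLONESHOT event fires, the descriptor stays registered but is disarmed until it is explicitly re-armed with modify().

```python
import select
import socket


def oneshot_demo():
    """Show that an EPOLLONESHOT registration fires once until re-armed."""
    reader, writer = socket.socketpair()
    ep = select.epoll()
    try:
        flags = select.EPOLLIN | select.EPOLLONESHOT
        ep.register(reader.fileno(), flags)
        writer.send(b"x")  # make reader readable; the byte is never read

        first = ep.poll(0)   # one event: reader is readable
        second = ep.poll(0)  # no event: disarmed, despite unread data
        ep.modify(reader.fileno(), flags)  # re-arm, as an event loop would
        third = ep.poll(0)   # fires again since data is still pending
        return first, second, third
    finally:
        ep.close()
        reader.close()
        writer.close()


first, second, third = oneshot_demo()
print(len(first), len(second), len(third))  # 1 0 1
```

Because re-arming is explicit, no second wakeup for the same descriptor can arrive while the first is still being handled. That is one reason one-shot mode can simplify bidirectional data flows.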
diff --git a/Documentation/technical/memory.txt b/Documentation/technical/memory.txt
index bb1c92fd..039694c3 100644
--- a/Documentation/technical/memory.txt
+++ b/Documentation/technical/memory.txt
@@ -8,12 +8,12 @@ memory-efficient.
 We strive to keep processes small to improve locality, allow
 the kernel to cache more files, and to be a good neighbor to
 other processes running on the machine.  Taking advantage of
-automatic reference counting (ARC) in Perl allows us
+automatic reference counting (ARC) in Perl allows us to
 deterministically release memory back to the heap.
 
 We start with a simple data model with few circular
 references.  This both eases human understanding and reduces
-the likelyhood of bugs.
+the likelihood of bugs.
 
 Knowing the relative sizes and quantities of our data
 structures, we limit the scope of allocations as much as
@@ -48,3 +48,9 @@ In the future, our internal data model will be further
 flattened and simplified to reduce the overhead imposed by
 small objects.  Large allocations may also be avoided by
 optionally using Inline::C.
+
+Finally, the mwrap-perl LD_PRELOAD wrapper was ported to Perl 5
+and enhanced to provide live memory usage tracking on 64-bit systems
+with minimal performance impact on production traffic:
+
+        git clone https://80x24.org/mwrap-perl.git
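memory.txt relies on Perl's reference counting to release memory deterministically and on keeping the scope of large allocations small. CPython also uses reference counting, so the idea can be sketched in Python. The BigMessage class is hypothetical. The immediate release shown here depends on CPython; other Python implementations may defer it.

```python
import weakref

released = []  # sizes of objects whose memory has been released


class BigMessage:
    """Hypothetical stand-in for a large parsed email."""

    def __init__(self, raw):
        self.raw = raw


def count_bytes(raw):
    eml = BigMessage(raw)
    # record when eml is freed; the callback must not reference eml itself
    weakref.finalize(eml, released.append, len(raw))
    return len(eml.raw)  # only a small int escapes this scope


n = count_bytes(b"x" * 1_000_000)
# Under CPython, eml's refcount dropped to zero when count_bytes() returned,
# so the finalizer has already run; no garbage-collection pass was needed.
print(n, released)  # 1000000 [1000000]
```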
diff --git a/Documentation/technical/weird-stuff.txt b/Documentation/technical/weird-stuff.txt
new file mode 100644
index 00000000..0c8d6891
--- /dev/null
+++ b/Documentation/technical/weird-stuff.txt
@@ -0,0 +1,22 @@
+There's a lot of weird code in public-inbox which may be daunting
+to new hackers.
+
+* The event loop (PublicInbox::DS) is an evolution of a fairly standard
+  C10K event loop.  See ds.txt in this directory for more.
+
+Things got weirder in 2021:
+
+* The lei command-line tool is backed by a daemon.  This was done to
+  improve startup time for shell completion and to manage single-writer
+  git/SQLite/Xapian access during long, parallel imports.  It may
+  eventually become a read-write IMAP/JMAP server.
+
+* SOCK_SEQPACKET is used extensively in lei and will likely make its
+  way into more places.
+
+And even more so in 2022:
+
+* public-inbox-clone / PublicInbox::LeiMirror relies on ->DESTROY
+  for make-like dependency management while providing parallelism.
+
+More to come: lei will expose Maildirs via FUSE 3...
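weird-stuff.txt mentions that lei uses SOCK_SEQPACKET extensively. One property that likely makes it attractive is that, unlike SOCK_STREAM, each send() arrives as one distinct message, so no framing protocol is needed. The following is a minimal Python sketch on a Unix socketpair. It assumes Linux and is not lei's actual code.

```python
import socket


def seqpacket_roundtrip(messages):
    """Send each message as one packet and read them back one recv() each."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    with a, b:
        for msg in messages:
            a.send(msg)
        # each recv() returns exactly one message; boundaries are preserved
        return [b.recv(4096) for _ in messages]


print(seqpacket_roundtrip([b"one", b"two", b"three"]))
# [b'one', b'two', b'three']
```

With SOCK_STREAM, a single recv() could return the same three sends as b'onetwothree'. That is why stream sockets need length prefixes or delimiters.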
diff --git a/Documentation/technical/whyperl.txt b/Documentation/technical/whyperl.txt
index fbe2e1b1..db1d9793 100644
--- a/Documentation/technical/whyperl.txt
+++ b/Documentation/technical/whyperl.txt
@@ -21,7 +21,7 @@ Good Things
 
   Perl 5 is installed on many, if not most GNU/Linux and
   BSD-based servers and workstations.  It is likely the most
-  widely-installed programming environment that offers a
+  widely installed programming environment that offers a
   significant amount of POSIX functionality.  Users won't
   have to waste bandwidth or space with giant toolchains or
   architecture-specific binaries.
@@ -47,8 +47,8 @@ Good Things
 
 * Predictable performance
 
-  While Perl is neither fast or memory-efficient, its
-  performance and memory use are predictable and does not
+  While Perl is neither fast nor memory-efficient, its
+  performance and memory use are predictable and do not
   require GC tuning by the user.
 
   public-inbox is developed for (and mostly on) old
@@ -56,7 +56,7 @@ Good Things
   late 1990s, and any cheap VPS today has more than enough
   RAM and CPU for handling plain-text email.
 
-  Low hardware requirements increases the reach of our software
+  Low hardware requirements increase the reach of our software
   to more users, improving centralization resistance.
 
 * Compatibility
@@ -86,7 +86,7 @@ Good Things
 
   There should be no need to rely on language-specific
   package managers such as cpan(1), those systems increase
-  the learning curve for users and systems administrators.
+  the learning curve for users and system administrators.
 
 * Compactness and terseness
 
@@ -98,7 +98,7 @@ Good Things
 * Performance ceiling and escape hatch
 
   With optional Inline::C, we can be "as fast as C" in some
-  cases.  Inline::C is widely-packaged by distros and it
+  cases.  Inline::C is widely packaged by distros and it
   gives us an escape hatch for dealing with missing bindings
   or performance problems should they arise.  Inline::C use
   (as opposed to XS) also preserves the software freedom and
@@ -135,7 +135,7 @@ Bad Things
   (m//, substr(), index(), etc.) still require memory copies
   into userspace, negating a benefit of zero-copy.
 
-* The XS/C API make it difficult to improve internals while
+* The XS/C API makes it difficult to improve internals while
   preserving compatibility.
 
 * Lack of optional type checking.  This may be a blessing in
@@ -161,14 +161,14 @@ Red herrings to ignore when evaluating other runtimes
 -----------------------------------------------------
 
 These don't discount a language or runtime from being
-being used, they're just not interesting.
+used, they're just not interesting.
 
 * Lightweight threading
 
   While lightweight threading implementations are
-  convenient, they tend to be significantly heavier than a
+  convenient, they tend to be significantly heavier than
   pure event-loop systems (or multi-threaded event-loop
-  systems)
+  systems).
 
   Lightweight threading implementations have stack overhead
   and growth typically measured in kilobytes.  The userspace