dumping ground for random patches and texts
 help / color / mirror / Atom feed
* [WIP 0/13] mail threading speedups
@ 2016-10-05  1:46 Eric Wong
  2016-10-05  1:46 ` [PATCH 01/13] view: remove "subject dummy" references Eric Wong
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: Eric Wong @ 2016-10-05  1:46 UTC (permalink / raw)
  To: spew

Eric Wong (13):
      view: remove "subject dummy" references
      thread: remove Mail::Thread dependency
      thread: pass array refs instead of entire arrays
      thread: remove accessor usage in internals
      inbox: ghost smsg
      thread: remove Email::Abstract wrapping
      thread: remove rootset accessor method
      thread: simplify
      thread: remove iterate_down
      thread: avoid incrementing undefined value
      thread: order_children no longer cares about depth
      thread: inline and remove recurse_down logic
      thread: fix sorting without topmost

 INSTALL                         |   1 -
 MANIFEST                        |   2 +-
 Makefile.PL                     |   1 -
 lib/PublicInbox/Inbox.pm        |   2 +
 lib/PublicInbox/SearchIdx.pm    |   4 +-
 lib/PublicInbox/SearchMsg.pm    |  29 -----
 lib/PublicInbox/SearchThread.pm | 238 ++++++++++++++++++++++++++++++++++++++++
 lib/PublicInbox/SearchView.pm   |  15 +--
 lib/PublicInbox/Thread.pm       |  86 ---------------
 lib/PublicInbox/View.pm         | 110 ++++++++-----------
 lib/PublicInbox/WWW.pm          |   2 +-
 t/plack.t                       |   3 +-
 t/search.t                      |   7 +-
 13 files changed, 301 insertions(+), 199 deletions(-)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 01/13] view: remove "subject dummy" references
  2016-10-05  1:46 [WIP 0/13] mail threading speedups Eric Wong
@ 2016-10-05  1:46 ` Eric Wong
  2016-10-05  1:46 ` [PATCH 02/13] thread: remove Mail::Thread dependency Eric Wong
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2016-10-05  1:46 UTC (permalink / raw)
  To: spew

We will not care for inexact threading by subject or pruning.
---
 lib/PublicInbox/View.pm | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 9359209..a3b2681 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -723,8 +723,6 @@ sub anchor_for {
 
 sub ghost_parent {
 	my ($upfx, $mid) = @_;
-	# 'subject dummy' is used internally by Mail::Thread
-	return '[no common parent]' if ($mid eq 'subject dummy');
 
 	$mid = PublicInbox::Hval->new_msgid($mid);
 	my $href = $mid->{href};
@@ -838,12 +836,6 @@ sub skel_dump {
 		my $dst = $ctx->{dst};
 		my $mapping = $ctx->{mapping};
 		my $map = $mapping->{$mid} if $mapping;
-		if ($mid eq 'subject dummy') {
-			my $ncp = "\t[no common parent]\n";
-			$map->[1] = $ncp if $map;
-			$$dst .= $ncp;
-			return;
-		}
 		my $d = $ctx->{pct} ? '    [irrelevant] ' # search result
 				    : '     [not found] ';
 		$d .= indent_for($level) . th_pfx($level);
-- 
EW


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 02/13] thread: remove Mail::Thread dependency
  2016-10-05  1:46 [WIP 0/13] mail threading speedups Eric Wong
  2016-10-05  1:46 ` [PATCH 01/13] view: remove "subject dummy" references Eric Wong
@ 2016-10-05  1:46 ` Eric Wong
  2016-10-05  1:46 ` [PATCH 03/13] thread: pass array refs instead of entire arrays Eric Wong
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2016-10-05  1:46 UTC (permalink / raw)
  To: spew

Introduce our own SearchThread class for threading messages.
This should allow us to specialize and optimize away objects
in future commits.
---
 INSTALL                         |   1 -
 MANIFEST                        |   2 +-
 Makefile.PL                     |   1 -
 lib/PublicInbox/SearchIdx.pm    |   4 +-
 lib/PublicInbox/SearchThread.pm | 323 ++++++++++++++++++++++++++++++++++++++++
 lib/PublicInbox/SearchView.pm   |   4 +-
 lib/PublicInbox/Thread.pm       |  86 -----------
 lib/PublicInbox/View.pm         |   4 +-
 lib/PublicInbox/WWW.pm          |   2 +-
 t/plack.t                       |   3 +-
 10 files changed, 332 insertions(+), 98 deletions(-)
 create mode 100644 lib/PublicInbox/SearchThread.pm
 delete mode 100644 lib/PublicInbox/Thread.pm

diff --git a/INSTALL b/INSTALL
index 5851892..3a2f840 100644
--- a/INSTALL
+++ b/INSTALL
@@ -37,7 +37,6 @@ Optional components:
 Optional Perl modules:
 
   - Plack[1]                   libplack-perl
-  - Mail::Thread (2.5+)[1]     libmail-thread-perl
   - URI::Escape[1]             liburi-perl
   - Search::Xapian[2][3]       libsearch-xapian-perl
   - IO::Compress::Gzip[3]      perl-modules (or libio-compress-perl)
diff --git a/MANIFEST b/MANIFEST
index c39fa26..bcc4121 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -78,11 +78,11 @@ lib/PublicInbox/SaPlugin/ListMirror.pm
 lib/PublicInbox/Search.pm
 lib/PublicInbox/SearchIdx.pm
 lib/PublicInbox/SearchMsg.pm
+lib/PublicInbox/SearchThread.pm
 lib/PublicInbox/SearchView.pm
 lib/PublicInbox/Spamcheck/Spamc.pm
 lib/PublicInbox/Spawn.pm
 lib/PublicInbox/SpawnPP.pm
-lib/PublicInbox/Thread.pm
 lib/PublicInbox/Unsubscribe.pm
 lib/PublicInbox/View.pm
 lib/PublicInbox/WWW.pm
diff --git a/Makefile.PL b/Makefile.PL
index 4a91103..0bac7c9 100644
--- a/Makefile.PL
+++ b/Makefile.PL
@@ -22,7 +22,6 @@ WriteMakefile(
 		'Email::MIME::ContentType' => 0,
 		'Email::Simple' => 0,
 		'Encode::MIME::Header' => 0,
-		'Mail::Thread' => '2.5', # 2.5+ needed for Email::Simple compat
 		'Plack' => 0,
 		'URI::Escape' => 0,
 		# We have more test dependencies, but do not force
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 23aef9f..4aac028 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -4,8 +4,8 @@
 #
 # Indexes mail with Xapian and our (SQLite-based) ::Msgmap for use
 # with the web and NNTP interfaces.  This index maintains thread
-# relationships for use by Mail::Thread.  This writes to the search
-# index.
+# relationships for use by PublicInbox::SearchThread.
+# This writes to the search index.
 package PublicInbox::SearchIdx;
 use strict;
 use warnings;
diff --git a/lib/PublicInbox/SearchThread.pm b/lib/PublicInbox/SearchThread.pm
new file mode 100644
index 0000000..41fe859
--- /dev/null
+++ b/lib/PublicInbox/SearchThread.pm
@@ -0,0 +1,323 @@
+# This library is free software; you can redistribute it and/or modify
+# it under the same terms as Perl itself.
+#
+# This license differs from the rest of public-inbox
+#
+# Our own jwz-style threading class based on Mail::Thread from CPAN.
+# Mail::Thread is unmaintained and available on some distros.
+# We also do not want pruning or subject grouping, since we want
+# to encourage strict threading and hopefully encourage people
+# to use proper In-Reply-To.
+#
+# This includes fixes from several open bugs for Mail::Thread
+#
+# Avoid circular references
+# - https://rt.cpan.org/Public/Bug/Display.html?id=22817
+#
+# And avoid recursion in recurse_down:
+# - https://rt.cpan.org/Ticket/Display.html?id=116727
+# - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833479
+package PublicInbox::SearchThread;
+use strict;
+use warnings;
+use Email::Abstract;
+
+sub new {
+	return bless {
+		messages => $_[1],
+		id_table => {},
+		rootset  => []
+	}, $_[0];
+}
+
+sub _get_hdr {
+	my ($class, $msg, $hdr) = @_;
+	Email::Abstract->get_header($msg, $hdr) || '';
+}
+
+sub _uniq {
+	my %seen;
+	return grep { !$seen{$_}++ } @_;
+}
+
+sub _references {
+	my $class = shift;
+	my $msg = shift;
+	my @references = ($class->_get_hdr($msg, "References") =~ /<([^>]+)>/g);
+	my $foo = $class->_get_hdr($msg, "In-Reply-To");
+	chomp $foo;
+	$foo =~ s/.*?<([^>]+)>.*/$1/;
+	push @references, $foo
+	  if $foo =~ /^\S+\@\S+$/ && (!@references || $references[-1] ne $foo);
+	return _uniq(@references);
+}
+
+sub _msgid {
+	my ($class, $msg) = @_;
+	my $id = $class->_get_hdr($msg, "Message-ID");
+	die "attempt to thread message with no id" unless $id;
+	chomp $id;
+	$id =~ s/^<([^>]+)>.*/$1/; # We expect this not to have <>s
+	return $id;
+}
+
+sub rootset { @{$_[0]{rootset}} }
+
+sub thread {
+	my $self = shift;
+	$self->_setup();
+	$self->{rootset} = [ grep { !$_->parent } values %{$self->{id_table}} ];
+	$self->_finish();
+}
+
+sub _finish {
+	my $self = shift;
+	delete $self->{id_table};
+	delete $self->{seen};
+}
+
+sub _get_cont_for_id {
+	my $self = shift;
+	my $id = shift;
+	$self->{id_table}{$id} ||= $self->_container_class->new($id);
+}
+
+sub _container_class { 'PublicInbox::SearchThread::Container' }
+
+sub _setup {
+	my ($self) = @_;
+
+	_add_message($self, $_) foreach @{$self->{messages}};
+}
+
+sub _add_message ($$) {
+	my ($self, $message) = @_;
+
+	# A. if id_table...
+	my $this_container = $self->_get_cont_for_id($self->_msgid($message));
+	$this_container->message($message);
+
+	# B. For each element in the message's References field:
+	my @refs = $self->_references($message);
+
+	my $prev;
+	for my $ref (@refs) {
+		# Find a Container object for the given Message-ID
+		my $container = $self->_get_cont_for_id($ref);
+
+		# Link the References field's Containers together in the
+		# order implied by the References header
+		# * If they are already linked don't change the existing links
+		# * Do not add a link if adding that link would introduce
+		#   a loop...
+
+		if ($prev &&
+			!$container->parent &&  # already linked
+			!$container->has_descendent($prev) # would loop
+		   ) {
+			$prev->add_child($container);
+		}
+		$prev = $container;
+	}
+
+	# C. Set the parent of this message to be the last element in
+	# References...
+	if ($prev &&
+		!$this_container->has_descendent($prev) # would loop
+	   ) {
+		$prev->add_child($this_container)
+	}
+}
+
+sub order {
+	my $self = shift;
+	my $ordersub = shift;
+
+	# make a fake root
+	my $root = $self->_container_class->new( 'fakeroot' );
+	$root->add_child( $_ ) for @{ $self->{rootset} };
+
+	# sort it
+	$root->order_children( $ordersub );
+
+	# and untangle it
+	my @kids = $root->children;
+	$self->{rootset} = \@kids;
+	$root->remove_child($_) for @kids;
+}
+
+package PublicInbox::SearchThread::Container;
+use Carp qw(croak);
+use Scalar::Util qw(weaken);
+
+sub new { my $self = shift; bless { id => shift }, $self; }
+
+sub message { $_[0]->{message} = $_[1] if @_ == 2; $_[0]->{message} }
+sub parent { @_ == 2 ? weaken($_[0]->{parent} = $_[1]) : $_[0]->{parent} }
+sub child { $_[0]->{child} = $_[1] if @_ == 2; $_[0]->{child} }
+sub next { $_[0]->{next} = $_[1] if @_ == 2; $_[0]->{next} }
+sub messageid { $_[0]->{id} }
+
+sub add_child {
+	my ($self, $child) = @_;
+	croak "Cowardly refusing to become my own parent: $self"
+	  if $self == $child;
+
+	if (grep { $_ == $child } $self->children) {
+		# All is potentially correct with the world
+		$child->parent($self);
+		return;
+	}
+
+	$child->parent->remove_child($child) if $child->parent;
+
+	$child->next($self->child);
+	$self->child($child);
+	$child->parent($self);
+}
+
+sub remove_child {
+	my ($self, $child) = @_;
+	return unless $self->child;
+	if ($self->child == $child) {  # First one's easy.
+		$self->child($child->next);
+		$child->next(undef);
+		$child->parent(undef);
+		return;
+	}
+
+	my $x = $self->child;
+	my $prev = $x;
+	while ($x = $x->next) {
+		if ($x == $child) {
+			$prev->next($x->next); # Unlink x
+			$x->next(undef);
+			$x->parent(undef);	 # Deparent it
+			return;
+		}
+		$prev = $x;
+	}
+	# oddly, we can get here
+	$child->next(undef);
+	$child->parent(undef);
+}
+
+sub has_descendent {
+	my $self = shift;
+	my $child = shift;
+	die "Assertion failed: $child" unless eval {$child};
+	my $there = 0;
+	$self->recurse_down(sub { $there = 1 if $_[0] == $child });
+
+	return $there;
+}
+
+sub children {
+	my $self = shift;
+	my @children;
+	my $visitor = $self->child;
+	while ($visitor) {
+		push @children, $visitor;
+		$visitor = $visitor->next
+	}
+	return @children;
+}
+
+sub set_children {
+	my $self = shift;
+	my $walk = $self->child( shift );
+	while (@_) { $walk = $walk->next( shift ) }
+	$walk->next(undef) if $walk;
+}
+
+sub order_children {
+	my $self = shift;
+	my $ordersub = shift;
+
+	return unless $ordersub;
+
+	my $sub = sub {
+		my $cont = shift;
+		my @children = $cont->children;
+		return if @children < 2;
+		$cont->set_children( $ordersub->( @children ) );
+	};
+	$self->iterate_down( undef, $sub );
+	undef $sub;
+}
+
+# non-recursive version of recurse_down to avoid stack depth warnings
+sub recurse_down {
+	my ($self, $callback) = @_;
+	my %seen;
+	my @q = ($self);
+	while (my $cont = shift @q) {
+		$seen{$cont}++;
+		$callback->($cont);
+
+		if (my $next = $cont->next) {
+			if ($seen{$next}) {
+				$cont->next(undef);
+			} else {
+				push @q, $next;
+			}
+		}
+		if (my $child = $cont->child) {
+			if ($seen{$child}) {
+				$cont->child(undef);
+			} else {
+				push @q, $child;
+			}
+		}
+	}
+}
+
+sub iterate_down {
+	my $self = shift;
+	my ($before, $after) = @_;
+
+	my %seen;
+	my $walk = $self;
+	my $depth = 0;
+	my @visited;
+	while ($walk) {
+		push @visited, [ $walk, $depth ];
+		$before->($walk, $depth) if $before;
+
+		# spot/break loops
+		$seen{$walk}++;
+
+		my $child = $walk->child;
+		if ($child && $seen{$child}) {
+			$walk->child(undef);
+			$child = undef;
+		}
+
+		my $next = $walk->next;
+		if ($next && $seen{$next}) {
+			$walk->next(undef);
+			$next = undef;
+		}
+
+		# go down, or across
+		if ($child) {
+			$next = $child;
+			++$depth;
+		}
+
+		# no next?  look up
+		if (!$next) {
+			my $up = $walk;
+			while ($up && !$next) {
+				$up = $up->parent;
+				--$depth;
+				$next = $up->next if $up;
+			}
+		}
+		$walk = $next;
+	}
+	return unless $after;
+	while (@visited) { $after->(@{ pop @visited }) }
+}
+
+1;
diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index 4f0811a..da31109 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -11,7 +11,7 @@ use PublicInbox::View;
 use PublicInbox::MID qw(mid2path mid_mime mid_clean mid_escape);
 use Email::MIME;
 require PublicInbox::Git;
-require PublicInbox::Thread;
+require PublicInbox::SearchThread;
 our $LIM = 50;
 
 sub noop {}
@@ -152,7 +152,7 @@ sub mset_thread {
 		$m;
 	} ($mset->items);
 
-	my $th = PublicInbox::Thread->new(@m);
+	my $th = PublicInbox::SearchThread->new(\@m);
 	$th->thread;
 	if ($q->{r}) { # order by relevance
 		$th->order(sub {
diff --git a/lib/PublicInbox/Thread.pm b/lib/PublicInbox/Thread.pm
deleted file mode 100644
index 8af9461..0000000
--- a/lib/PublicInbox/Thread.pm
+++ /dev/null
@@ -1,86 +0,0 @@
-# subclass Mail::Thread and use this to workaround a memory leak
-# Based on the patch in: https://rt.cpan.org/Public/Bug/Display.html?id=22817
-#
-# Additionally, workaround for a bug where $walk->topmost returns undef:
-# - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=795913
-# - https://rt.cpan.org/Ticket/Display.html?id=106498
-#
-# And avoid recursion in recurse_down:
-# - https://rt.cpan.org/Ticket/Display.html?id=116727
-# - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833479
-#
-# License differs from the rest of public-inbox (but is compatible):
-# This library is free software; you can redistribute it and/or modify
-# it under the same terms as Perl itself.
-package PublicInbox::Thread;
-use strict;
-use warnings;
-use base qw(Mail::Thread);
-# WARNING! both these Mail::Thread knobs were found by inspecting
-# the Mail::Thread 2.55 source code, and we have some monkey patches
-# in PublicInbox::Thread to fix memory leaks.  Since Mail::Thread
-# appears unmaintained, I suppose it's safe to depend on these
-# variables for now:
-{
-	no warnings 'once';
-	# we want strict threads to expose (and hopefully discourage)
-	# use of broken email clients
-	$Mail::Thread::nosubject = 1;
-	# Keep ghosts with only a single direct child,
-	# don't hide that there may be missing messages.
-	$Mail::Thread::noprune = 1;
-}
-
-if ($Mail::Thread::VERSION <= 2.55) {
-	eval q(sub _container_class { 'PublicInbox::Thread::Container' });
-}
-
-package PublicInbox::Thread::Container;
-use strict;
-use warnings;
-use base qw(Mail::Thread::Container);
-use Scalar::Util qw(weaken);
-sub parent { @_ == 2 ? weaken($_[0]->{parent} = $_[1]) : $_[0]->{parent} }
-
-sub topmost {
-	$_[0]->SUPER::topmost || PublicInbox::Thread::CPANRTBug106498->new;
-}
-
-# non-recursive version of recurse_down to avoid stack depth warnings
-sub recurse_down {
-	my ($self, $callback) = @_;
-	my %seen;
-	my @q = ($self);
-	while (my $cont = shift @q) {
-		$seen{$cont}++;
-		$callback->($cont);
-
-		if (my $next = $cont->next) {
-			if ($seen{$next}) {
-				$cont->next(undef);
-			} else {
-				push @q, $next;
-			}
-		}
-		if (my $child = $cont->child) {
-			if ($seen{$child}) {
-				$cont->child(undef);
-			} else {
-				push @q, $child;
-			}
-		}
-	}
-}
-
-# ref:
-# - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=795913
-# - https://rt.cpan.org/Ticket/Display.html?id=106498
-package PublicInbox::Thread::CPANRTBug106498;
-use strict;
-use warnings;
-
-sub new { bless {}, $_[0] }
-
-sub simple_subject {}
-
-1;
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index a3b2681..9f1bf46 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -749,8 +749,8 @@ sub msg_timestamp {
 
 sub thread_results {
 	my ($msgs) = @_;
-	require PublicInbox::Thread;
-	my $th = PublicInbox::Thread->new(@$msgs);
+	require PublicInbox::SearchThread;
+	my $th = PublicInbox::SearchThread->new($msgs);
 	$th->thread;
 	$th->order(*sort_ts);
 	$th
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 4d599fc..11fc92e 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -112,7 +112,7 @@ sub call {
 sub preload {
 	require PublicInbox::Feed;
 	require PublicInbox::View;
-	require PublicInbox::Thread;
+	require PublicInbox::SearchThread;
 	require Email::MIME;
 	require Digest::SHA;
 	require POSIX;
diff --git a/t/plack.t b/t/plack.t
index 608afb9..1d62458 100644
--- a/t/plack.t
+++ b/t/plack.t
@@ -11,8 +11,7 @@ my $pi_config = "$tmpdir/config";
 my $maindir = "$tmpdir/main.git";
 my $addr = 'test-public@example.com';
 my $cfgpfx = "publicinbox.test";
-my @mods = qw(HTTP::Request::Common Plack::Test
-	Mail::Thread URI::Escape);
+my @mods = qw(HTTP::Request::Common Plack::Test URI::Escape);
 foreach my $mod (@mods) {
 	eval "require $mod";
 	plan skip_all => "$mod missing for plack.t" if $@;
-- 
EW


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 03/13] thread: pass array refs instead of entire arrays
  2016-10-05  1:46 [WIP 0/13] mail threading speedups Eric Wong
  2016-10-05  1:46 ` [PATCH 01/13] view: remove "subject dummy" references Eric Wong
  2016-10-05  1:46 ` [PATCH 02/13] thread: remove Mail::Thread dependency Eric Wong
@ 2016-10-05  1:46 ` Eric Wong
  2016-10-05  1:46 ` [PATCH 04/13] thread: remove accessor usage in internals Eric Wong
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2016-10-05  1:46 UTC (permalink / raw)
  To: spew

Copying large arrays is expensive, so avoid it.
This reduces /$INBOX/ time by around 1%.
---
 lib/PublicInbox/SearchThread.pm | 25 +++++++++++++------------
 lib/PublicInbox/SearchView.pm   |  4 ++--
 lib/PublicInbox/View.pm         |  4 ++--
 3 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/lib/PublicInbox/SearchThread.pm b/lib/PublicInbox/SearchThread.pm
index 41fe859..e086132 100644
--- a/lib/PublicInbox/SearchThread.pm
+++ b/lib/PublicInbox/SearchThread.pm
@@ -141,9 +141,9 @@ sub order {
 	$root->order_children( $ordersub );
 
 	# and untangle it
-	my @kids = $root->children;
-	$self->{rootset} = \@kids;
-	$root->remove_child($_) for @kids;
+	my $kids = $root->children;
+	$self->{rootset} = $kids;
+	$root->remove_child($_) for @$kids;
 }
 
 package PublicInbox::SearchThread::Container;
@@ -163,7 +163,7 @@ sub add_child {
 	croak "Cowardly refusing to become my own parent: $self"
 	  if $self == $child;
 
-	if (grep { $_ == $child } $self->children) {
+	if (grep { $_ == $child } @{$self->children}) {
 		# All is potentially correct with the world
 		$child->parent($self);
 		return;
@@ -220,14 +220,15 @@ sub children {
 		push @children, $visitor;
 		$visitor = $visitor->next
 	}
-	return @children;
+	\@children;
 }
 
 sub set_children {
-	my $self = shift;
-	my $walk = $self->child( shift );
-	while (@_) { $walk = $walk->next( shift ) }
-	$walk->next(undef) if $walk;
+	my ($self, $children) = @_;
+	my $walk = $self->{child} = shift @$children;
+	do {
+		$walk = $walk->{next} = shift @$children;
+	} while ($walk);
 }
 
 sub order_children {
@@ -238,9 +239,9 @@ sub order_children {
 
 	my $sub = sub {
 		my $cont = shift;
-		my @children = $cont->children;
-		return if @children < 2;
-		$cont->set_children( $ordersub->( @children ) );
+		my $children = $cont->children;
+		return if @$children < 2;
+		$cont->set_children( $ordersub->( $children ) );
 	};
 	$self->iterate_down( undef, $sub );
 	undef $sub;
diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index da31109..0d54c3d 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -156,10 +156,10 @@ sub mset_thread {
 	$th->thread;
 	if ($q->{r}) { # order by relevance
 		$th->order(sub {
-			sort { (eval { $pct{$b->topmost->messageid} } || 0)
+			[ sort { (eval { $pct{$b->topmost->messageid} } || 0)
 					<=>
 				(eval { $pct{$a->topmost->messageid} } || 0)
-			} @_;
+			} @{$_[0]} ];
 		});
 	} else { # order by time (default for threaded view)
 		$th->order(*PublicInbox::View::sort_ts);
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 9f1bf46..e90efda 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -856,10 +856,10 @@ sub skel_dump {
 }
 
 sub sort_ts {
-	sort {
+	[ sort {
 		(eval { $a->topmost->message->header('X-PI-TS') } || 0) <=>
 		(eval { $b->topmost->message->header('X-PI-TS') } || 0)
-	} @_;
+	} @{$_[0]} ];
 }
 
 sub _tryload_ghost ($$) {
-- 
EW


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 04/13] thread: remove accessor usage in internals
  2016-10-05  1:46 [WIP 0/13] mail threading speedups Eric Wong
                   ` (2 preceding siblings ...)
  2016-10-05  1:46 ` [PATCH 03/13] thread: pass array refs instead of entire arrays Eric Wong
@ 2016-10-05  1:46 ` Eric Wong
  2016-10-05  1:46 ` [PATCH 05/13] inbox: ghost smsg Eric Wong
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2016-10-05  1:46 UTC (permalink / raw)
  To: spew

This improves top-level index generation performance by 3-4%.
---
 lib/PublicInbox/SearchThread.pm | 72 +++++++++++++++++++----------------------
 1 file changed, 34 insertions(+), 38 deletions(-)

diff --git a/lib/PublicInbox/SearchThread.pm b/lib/PublicInbox/SearchThread.pm
index e086132..53fec97 100644
--- a/lib/PublicInbox/SearchThread.pm
+++ b/lib/PublicInbox/SearchThread.pm
@@ -66,7 +66,8 @@ sub rootset { @{$_[0]{rootset}} }
 sub thread {
 	my $self = shift;
 	$self->_setup();
-	$self->{rootset} = [ grep { !$_->parent } values %{$self->{id_table}} ];
+	$self->{rootset} = [
+			grep { !$_->{parent} } values %{$self->{id_table}} ];
 	$self->_finish();
 }
 
@@ -95,7 +96,7 @@ sub _add_message ($$) {
 
 	# A. if id_table...
 	my $this_container = $self->_get_cont_for_id($self->_msgid($message));
-	$this_container->message($message);
+	$this_container->{message} = $message;
 
 	# B. For each element in the message's References field:
 	my @refs = $self->_references($message);
@@ -112,7 +113,7 @@ sub _add_message ($$) {
 		#   a loop...
 
 		if ($prev &&
-			!$container->parent &&  # already linked
+			!$container->{parent} &&  # already linked
 			!$container->has_descendent($prev) # would loop
 		   ) {
 			$prev->add_child($container);
@@ -152,10 +153,9 @@ use Scalar::Util qw(weaken);
 
 sub new { my $self = shift; bless { id => shift }, $self; }
 
-sub message { $_[0]->{message} = $_[1] if @_ == 2; $_[0]->{message} }
-sub parent { @_ == 2 ? weaken($_[0]->{parent} = $_[1]) : $_[0]->{parent} }
-sub child { $_[0]->{child} = $_[1] if @_ == 2; $_[0]->{child} }
-sub next { $_[0]->{next} = $_[1] if @_ == 2; $_[0]->{next} }
+sub message { $_[0]->{message} }
+sub child { $_[0]->{child} }
+sub next { $_[0]->{next} }
 sub messageid { $_[0]->{id} }
 
 sub add_child {
@@ -165,41 +165,39 @@ sub add_child {
 
 	if (grep { $_ == $child } @{$self->children}) {
 		# All is potentially correct with the world
-		$child->parent($self);
+		weaken($child->{parent} = $self);
 		return;
 	}
 
-	$child->parent->remove_child($child) if $child->parent;
+	my $parent = $child->{parent};
+	remove_child($parent, $child) if $parent;
 
-	$child->next($self->child);
-	$self->child($child);
-	$child->parent($self);
+	$child->{next} = $self->{child};
+	$self->{child} = $child;
+	weaken($child->{parent} = $self);
 }
 
 sub remove_child {
 	my ($self, $child) = @_;
-	return unless $self->child;
-	if ($self->child == $child) {  # First one's easy.
-		$self->child($child->next);
-		$child->next(undef);
-		$child->parent(undef);
+
+	my $x = $self->{child} or return;
+	if ($x == $child) {  # First one's easy.
+		$self->{child} = $child->{next};
+		$child->{parent} = $child->{next} = undef;
 		return;
 	}
 
-	my $x = $self->child;
 	my $prev = $x;
-	while ($x = $x->next) {
+	while ($x = $x->{next}) {
 		if ($x == $child) {
-			$prev->next($x->next); # Unlink x
-			$x->next(undef);
-			$x->parent(undef);	 # Deparent it
+			$prev->{next} = $x->{next}; # Unlink x
+			$x->{next} = $x->{parent} = undef; # Deparent it
 			return;
 		}
 		$prev = $x;
 	}
 	# oddly, we can get here
-	$child->next(undef);
-	$child->parent(undef);
+	$child->{next} = $child->{parent} = undef;
 }
 
 sub has_descendent {
@@ -215,10 +213,10 @@ sub has_descendent {
 sub children {
 	my $self = shift;
 	my @children;
-	my $visitor = $self->child;
+	my $visitor = $self->{child};
 	while ($visitor) {
 		push @children, $visitor;
-		$visitor = $visitor->next
+		$visitor = $visitor->{next};
 	}
 	\@children;
 }
@@ -256,16 +254,16 @@ sub recurse_down {
 		$seen{$cont}++;
 		$callback->($cont);
 
-		if (my $next = $cont->next) {
+		if (my $next = $cont->{next}) {
 			if ($seen{$next}) {
-				$cont->next(undef);
+				$cont->{next} = undef;
 			} else {
 				push @q, $next;
 			}
 		}
-		if (my $child = $cont->child) {
+		if (my $child = $cont->{child}) {
 			if ($seen{$child}) {
-				$cont->child(undef);
+				$cont->{child} = undef;
 			} else {
 				push @q, $child;
 			}
@@ -288,16 +286,14 @@ sub iterate_down {
 		# spot/break loops
 		$seen{$walk}++;
 
-		my $child = $walk->child;
+		my $child = $walk->{child};
 		if ($child && $seen{$child}) {
-			$walk->child(undef);
-			$child = undef;
+			$walk->{child} = $child = undef;
 		}
 
-		my $next = $walk->next;
+		my $next = $walk->{next};
 		if ($next && $seen{$next}) {
-			$walk->next(undef);
-			$next = undef;
+			$walk->{next} = $next = undef;
 		}
 
 		# go down, or across
@@ -310,9 +306,9 @@ sub iterate_down {
 		if (!$next) {
 			my $up = $walk;
 			while ($up && !$next) {
-				$up = $up->parent;
+				$up = $up->{parent};
 				--$depth;
-				$next = $up->next if $up;
+				$next = $up->{next} if $up;
 			}
 		}
 		$walk = $next;
-- 
EW


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 05/13] inbox: ghost smsg
  2016-10-05  1:46 [WIP 0/13] mail threading speedups Eric Wong
                   ` (3 preceding siblings ...)
  2016-10-05  1:46 ` [PATCH 04/13] thread: remove accessor usage in internals Eric Wong
@ 2016-10-05  1:46 ` Eric Wong
  2016-10-05  1:46 ` [PATCH 06/13] thread: remove Email::Abstract wrapping Eric Wong
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2016-10-05  1:46 UTC (permalink / raw)
  To: spew

smsg will be undef for ghost messages in a subsequent commit
---
 lib/PublicInbox/Inbox.pm | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index 414973c..8c63908 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -211,6 +211,8 @@ sub msg_by_path ($$;$) {
 sub msg_by_smsg ($$;$) {
 	my ($self, $smsg, $ref) = @_;
 
+	return unless defined $smsg; # ghost
+
 	# backwards compat to fallback to msg_by_mid
 	# TODO: remove if we bump SCHEMA_VERSION in Search.pm:
 	defined(my $blob = $smsg->blob) or return msg_by_mid($self, $smsg->mid);
-- 
EW


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 06/13] thread: remove Email::Abstract wrapping
  2016-10-05  1:46 [WIP 0/13] mail threading speedups Eric Wong
                   ` (4 preceding siblings ...)
  2016-10-05  1:46 ` [PATCH 05/13] inbox: ghost smsg Eric Wong
@ 2016-10-05  1:46 ` Eric Wong
  2016-10-05  1:46 ` [PATCH 07/13] thread: remove rootset accessor method Eric Wong
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2016-10-05  1:46 UTC (permalink / raw)
  To: spew

This roughly doubles performance due to the reduction in
object creation and abstraction layers.
---
 lib/PublicInbox/SearchMsg.pm    | 29 ------------
 lib/PublicInbox/SearchThread.pm | 99 ++++++++++++-----------------------------
 lib/PublicInbox/SearchView.pm   |  8 ++--
 lib/PublicInbox/View.pm         | 90 +++++++++++++++++--------------------
 t/search.t                      |  7 +--
 5 files changed, 76 insertions(+), 157 deletions(-)

diff --git a/lib/PublicInbox/SearchMsg.pm b/lib/PublicInbox/SearchMsg.pm
index 9d873c4..9dcc1e6 100644
--- a/lib/PublicInbox/SearchMsg.pm
+++ b/lib/PublicInbox/SearchMsg.pm
@@ -144,35 +144,6 @@ sub ensure_metadata {
 	}
 }
 
-# for threading only
-sub mini_mime {
-	my ($self) = @_;
-	$self->ensure_metadata;
-	my @h = (
-		'Subject' => $self->subject,
-		'X-PI-From' => $self->from_name,
-		# prevent Email::Simple::Creator from running,
-		# this header is useless for threading as we use X-PI-TS
-		# for sorting and display:
-		'Date' => EPOCH_822,
-		'Message-ID' => "<$self->{mid}>",
-		'X-PI-TS' => $self->ts,
-	);
-	if (my $refs = $self->{references}) {
-		push @h, References => $refs;
-	}
-	my $mime = Email::MIME->create(header => \@h);
-	my $h = $mime->header_obj;
-
-	# set these headers manually since Encode::encode('MIME-Q', ...)
-	# will add spaces to long values when using header_str above.
-
-	# drop useless headers Email::MIME set for us
-	$h->header_set('Date');
-	$h->header_set('MIME-Version');
-	$mime;
-}
-
 sub mid ($;$) {
 	my ($self, $mid) = @_;
 
diff --git a/lib/PublicInbox/SearchThread.pm b/lib/PublicInbox/SearchThread.pm
index 53fec97..d39e9b6 100644
--- a/lib/PublicInbox/SearchThread.pm
+++ b/lib/PublicInbox/SearchThread.pm
@@ -20,7 +20,6 @@
 package PublicInbox::SearchThread;
 use strict;
 use warnings;
-use Email::Abstract;
 
 sub new {
 	return bless {
@@ -30,37 +29,6 @@ sub new {
 	}, $_[0];
 }
 
-sub _get_hdr {
-	my ($class, $msg, $hdr) = @_;
-	Email::Abstract->get_header($msg, $hdr) || '';
-}
-
-sub _uniq {
-	my %seen;
-	return grep { !$seen{$_}++ } @_;
-}
-
-sub _references {
-	my $class = shift;
-	my $msg = shift;
-	my @references = ($class->_get_hdr($msg, "References") =~ /<([^>]+)>/g);
-	my $foo = $class->_get_hdr($msg, "In-Reply-To");
-	chomp $foo;
-	$foo =~ s/.*?<([^>]+)>.*/$1/;
-	push @references, $foo
-	  if $foo =~ /^\S+\@\S+$/ && (!@references || $references[-1] ne $foo);
-	return _uniq(@references);
-}
-
-sub _msgid {
-	my ($class, $msg) = @_;
-	my $id = $class->_get_hdr($msg, "Message-ID");
-	die "attempt to thread message with no id" unless $id;
-	chomp $id;
-	$id =~ s/^<([^>]+)>.*/$1/; # We expect this not to have <>s
-	return $id;
-}
-
 sub rootset { @{$_[0]{rootset}} }
 
 sub thread {
@@ -77,14 +45,11 @@ sub _finish {
 	delete $self->{seen};
 }
 
-sub _get_cont_for_id {
-	my $self = shift;
-	my $id = shift;
-	$self->{id_table}{$id} ||= $self->_container_class->new($id);
+sub _get_cont_for_id ($$) {
+	my ($self, $mid) = @_;
+	$self->{id_table}{$mid} ||= PublicInbox::SearchThread::Msg->new($mid);
 }
 
-sub _container_class { 'PublicInbox::SearchThread::Container' }
-
 sub _setup {
 	my ($self) = @_;
 
@@ -92,41 +57,40 @@ sub _setup {
 }
 
 sub _add_message ($$) {
-	my ($self, $message) = @_;
+	my ($self, $smsg) = @_;
 
 	# A. if id_table...
-	my $this_container = $self->_get_cont_for_id($self->_msgid($message));
-	$this_container->{message} = $message;
+	my $this = _get_cont_for_id($self, $smsg->{mid});
+	$this->{smsg} = $smsg;
 
 	# B. For each element in the message's References field:
-	my @refs = $self->_references($message);
-
 	my $prev;
-	for my $ref (@refs) {
-		# Find a Container object for the given Message-ID
-		my $container = $self->_get_cont_for_id($ref);
-
-		# Link the References field's Containers together in the
-		# order implied by the References header
-		# * If they are already linked don't change the existing links
-		# * Do not add a link if adding that link would introduce
-		#   a loop...
-
-		if ($prev &&
-			!$container->{parent} &&  # already linked
-			!$container->has_descendent($prev) # would loop
-		   ) {
-			$prev->add_child($container);
+	if (defined(my $refs = $smsg->{references})) {
+		foreach my $ref ($refs =~ m/<([^>]+)>/g) {
+			# Find a Container object for the given Message-ID
+			my $cont = _get_cont_for_id($self, $ref);
+
+			# Link the References field's Containers together in
+			# the order implied by the References header
+			#
+			# * If they are already linked don't change the
+			#   existing links
+			# * Do not add a link if adding that link would
+			#   introduce a loop...
+			if ($prev &&
+				!$cont->{parent} &&  # already linked
+				!$cont->has_descendent($prev) # would loop
+			   ) {
+				$prev->add_child($cont);
+			}
+			$prev = $cont;
 		}
-		$prev = $container;
 	}
 
 	# C. Set the parent of this message to be the last element in
 	# References...
-	if ($prev &&
-		!$this_container->has_descendent($prev) # would loop
-	   ) {
-		$prev->add_child($this_container)
+	if ($prev && !$this->has_descendent($prev)) { # would loop
+		$prev->add_child($this)
 	}
 }
 
@@ -135,7 +99,7 @@ sub order {
 	my $ordersub = shift;
 
 	# make a fake root
-	my $root = $self->_container_class->new( 'fakeroot' );
+	my $root = _get_cont_for_id($self, 'fakeroot');
 	$root->add_child( $_ ) for @{ $self->{rootset} };
 
 	# sort it
@@ -147,17 +111,12 @@ sub order {
 	$root->remove_child($_) for @$kids;
 }
 
-package PublicInbox::SearchThread::Container;
+package PublicInbox::SearchThread::Msg;
 use Carp qw(croak);
 use Scalar::Util qw(weaken);
 
 sub new { my $self = shift; bless { id => shift }, $self; }
 
-sub message { $_[0]->{message} }
-sub child { $_[0]->{child} }
-sub next { $_[0]->{next} }
-sub messageid { $_[0]->{id} }
-
 sub add_child {
 	my ($self, $child) = @_;
 	croak "Cowardly refusing to become my own parent: $self"
diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index 0d54c3d..ebeb41f 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -148,7 +148,6 @@ sub mset_thread {
 		my $i = $_;
 		my $m = PublicInbox::SearchMsg->load_doc($i->get_document);
 		$pct{$m->mid} = $i->get_percent;
-		$m = $m->mini_mime;
 		$m;
 	} ($mset->items);
 
@@ -156,9 +155,9 @@ sub mset_thread {
 	$th->thread;
 	if ($q->{r}) { # order by relevance
 		$th->order(sub {
-			[ sort { (eval { $pct{$b->topmost->messageid} } || 0)
+			[ sort { (eval { $pct{$b->topmost->{id}} } || 0)
 					<=>
-				(eval { $pct{$a->topmost->messageid} } || 0)
+				(eval { $pct{$a->topmost->{id}} } || 0)
 			} @{$_[0]} ];
 		});
 	} else { # order by time (default for threaded view)
@@ -185,8 +184,7 @@ sub mset_thread {
 	sub {
 		return unless $msgs;
 		while ($mime = shift @$msgs) {
-			my $mid = mid_clean(mid_mime($mime));
-			$mime = $inbox->msg_by_mid($mid) and last;
+			$mime = $inbox->msg_by_smsg($mime) and last;
 		}
 		if ($mime) {
 			$mime = Email::MIME->new($mime);
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index e90efda..748e391 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -214,12 +214,12 @@ sub _th_index_lite {
 		$rv .= $pad . $irt_map->[1];
 		if ($idx > 0) {
 			my $prev = $siblings->[$idx - 1];
-			my $pmid = $prev->messageid;
+			my $pmid = $prev->{id};
 			if ($idx > 2) {
 				my $s = ($idx - 1). ' preceding siblings ...';
 				$rv .= pad_link($pmid, $level, $s);
 			} elsif ($idx == 2) {
-				my $ppmid = $siblings->[0]->messageid;
+				my $ppmid = $siblings->[0]->{id};
 				$rv .= $pad . $mapping->{$ppmid}->[1];
 			}
 			$rv .= $pad . $mapping->{$pmid}->[1];
@@ -233,25 +233,25 @@ sub _th_index_lite {
 	$this =~ s!<a\nhref=[^>]+>([^<]+)</a>!$1!s; # no point linking to self
 	$rv .= "<b>@ $this";
 	my $node = $map->[2];
-	if (my $child = $node->child) {
-		my $cmid = $child->messageid;
+	if (my $child = $node->{child}) {
+		my $cmid = $child->{id};
 		$rv .= $pad . $mapping->{$cmid}->[1];
 		if ($nr_c > 2) {
 			my $s = ($nr_c - 1). ' more replies';
 			$rv .= pad_link($cmid, $level + 1, $s);
-		} elsif (my $cn = $child->next) {
-			$rv .= $pad . $mapping->{$cn->messageid}->[1];
+		} elsif (my $cn = $child->{next}) {
+			$rv .= $pad . $mapping->{$cn->{id}}->[1];
 		}
 	}
-	if (my $next = $node->next) {
-		my $nmid = $next->messageid;
+	if (my $next = $node->{next}) {
+		my $nmid = $next->{id};
 		$rv .= $pad . $mapping->{$nmid}->[1];
 		my $nnext = $nr_s - $idx;
 		if ($nnext > 2) {
 			my $s = ($nnext - 1).' subsequent siblings';
 			$rv .= pad_link($nmid, $level, $s);
-		} elsif (my $nn = $next->next) {
-			$rv .= $pad . $mapping->{$nn->messageid}->[1];
+		} elsif (my $nn = $next->{next}) {
+			$rv .= $pad . $mapping->{$nn->{id}}->[1];
 		}
 	}
 	$rv .= $pad ."<a\nhref=#r$id>$s_s, $s_c; $ctx->{s_nr}</a>\n";
@@ -264,7 +264,7 @@ sub walk_thread {
 		my $level = shift @q;
 		my $node = shift @q or next;
 		$cb->($ctx, $level, $node);
-		unshift @q, $level+1, $node->child, $level, $node->next;
+		unshift @q, $level+1, $node->{child}, $level, $node->{next};
 	}
 }
 
@@ -272,12 +272,12 @@ sub pre_thread  {
 	my ($ctx, $level, $node) = @_;
 	my $mapping = $ctx->{mapping};
 	my $idx = -1;
-	if (my $parent = $node->parent) {
-		my $m = $mapping->{$parent->messageid}->[0];
+	if (my $parent = $node->{parent}) {
+		my $m = $mapping->{$parent->{id}}->[0];
 		$idx = scalar @$m;
 		push @$m, $node;
 	}
-	$mapping->{$node->messageid} = [ [], '', $node, $idx, $level ];
+	$mapping->{$node->{id}} = [ [], '', $node, $idx, $level ];
 	skel_dump($ctx, $level, $node);
 }
 
@@ -296,8 +296,8 @@ sub stream_thread ($$) {
 	while (@q) {
 		$level = shift @q;
 		my $node = shift @q or next;
-		unshift @q, $level+1, $node->child, $level, $node->next;
-		$mime = $inbox->msg_by_mid($node->messageid) and last;
+		unshift @q, $level+1, $node->{child}, $level, $node->{next};
+		$mime = $inbox->msg_by_smsg($node->{smsg}) and last;
 	}
 	return missing_thread($ctx) unless $mime;
 
@@ -309,13 +309,14 @@ sub stream_thread ($$) {
 		while (@q) {
 			$level = shift @q;
 			my $node = shift @q or next;
-			unshift @q, $level+1, $node->child, $level, $node->next;
-			my $mid = $node->messageid;
-			if ($mime = $inbox->msg_by_mid($mid)) {
+			unshift @q, $level+1, $node->{child},
+					$level, $node->{next};
+			my $mid = $node->{id};
+			if ($mime = $inbox->msg_by_smsg($node->{smsg})) {
 				$mime = Email::MIME->new($mime);
 				return thread_index_entry($ctx, $level, $mime);
 			} else {
-				return ghost_index_entry($ctx, $level, $mid);
+				return ghost_index_entry($ctx, $level, $node);
 			}
 		}
 		my $ret = join('', thread_adj_level($ctx, 0));
@@ -355,11 +356,11 @@ sub thread_html {
 	$skel .= '</pre>';
 	return stream_thread($th, $ctx) unless $ctx->{flat};
 
-	# flat display: lazy load the full message from mini_mime:
+	# flat display: lazy load the full message from smsg
 	my $inbox = $ctx->{-inbox};
 	my $mime;
 	while ($mime = shift @$msgs) {
-		$mime = $inbox->msg_by_mid(mid_clean(mid_mime($mime))) and last;
+		$mime = $inbox->msg_by_smsg($mime) and last;
 	}
 	return missing_thread($ctx) unless $mime;
 	$mime = Email::MIME->new($mime);
@@ -369,8 +370,7 @@ sub thread_html {
 	PublicInbox::WwwStream->response($ctx, 200, sub {
 		return unless $msgs;
 		while ($mime = shift @$msgs) {
-			$mid = mid_clean(mid_mime($mime));
-			$mime = $inbox->msg_by_mid($mid) and last;
+			$mime = $inbox->msg_by_smsg($mime) and last;
 		}
 		if ($mime) {
 			$mime = Email::MIME->new($mime);
@@ -738,7 +738,7 @@ sub indent_for {
 sub load_results {
 	my ($sres) = @_;
 
-	[ map { $_->mini_mime } @{delete $sres->{msgs}} ];
+	[ map { $_->ensure_metadata; $_ } @{delete $sres->{msgs}} ];
 }
 
 sub msg_timestamp {
@@ -771,13 +771,13 @@ sub _msg_date {
 sub fmt_ts { POSIX::strftime('%Y-%m-%d %k:%M', gmtime($_[0])) }
 
 sub _skel_header {
-	my ($ctx, $hdr, $level) = @_;
+	my ($ctx, $smsg, $level) = @_;
 
 	my $dst = $ctx->{dst};
 	my $cur = $ctx->{cur};
-	my $mid = mid_clean($hdr->header_raw('Message-ID'));
-	my $f = ascii_html($hdr->header('X-PI-From'));
-	my $d = _msg_date($hdr) . ' ' . indent_for($level) . th_pfx($level);
+	my $mid = $smsg->{mid};
+	my $f = ascii_html($smsg->from_name);
+	my $d = fmt_ts($smsg->{ts}) . ' ' . indent_for($level) . th_pfx($level);
 	my $attr = $f;
 	$ctx->{first_level} ||= $level;
 
@@ -802,7 +802,7 @@ sub _skel_header {
 	# Subject is never undef, this mail was loaded from
 	# our Xapian which would've resulted in '' if it were
 	# really missing (and Filter rejects empty subjects)
-	my $s = $hdr->header('Subject');
+	my $s = $smsg->subject;
 	my $h = $ctx->{srch}->subject_path($s);
 	if ($ctx->{seen}->{$h}) {
 		$s = undef;
@@ -829,10 +829,10 @@ sub _skel_header {
 
 sub skel_dump {
 	my ($ctx, $level, $node) = @_;
-	if (my $mime = $node->message) {
-		_skel_header($ctx, $mime->header_obj, $level);
+	if (my $smsg = $node->{smsg}) {
+		_skel_header($ctx, $smsg, $level);
 	} else {
-		my $mid = $node->messageid;
+		my $mid = $node->{id};
 		my $dst = $ctx->{dst};
 		my $mapping = $ctx->{mapping};
 		my $map = $mapping->{$mid} if $mapping;
@@ -857,31 +857,24 @@ sub skel_dump {
 
 sub sort_ts {
 	[ sort {
-		(eval { $a->topmost->message->header('X-PI-TS') } || 0) <=>
-		(eval { $b->topmost->message->header('X-PI-TS') } || 0)
+		(eval { $a->topmost->{smsg}->ts } || 0) <=>
+		(eval { $b->topmost->{smsg}->ts } || 0)
 	} @{$_[0]} ];
 }
 
-sub _tryload_ghost ($$) {
-	my ($srch, $mid) = @_;
-	my $smsg = $srch->lookup_mail($mid) or return;
-	$smsg->mini_mime;
-}
-
 # accumulate recent topics if search is supported
 # returns 200 if done, 404 if not
 sub acc_topic {
 	my ($ctx, $level, $node) = @_;
 	my $srch = $ctx->{srch};
-	my $mid = $node->messageid;
-	my $x = $node->message || _tryload_ghost($srch, $mid);
+	my $mid = $node->{id};
+	my $x = $node->{smsg} || $srch->lookup_mail($mid);
 	my ($subj, $ts);
 	my $topic;
 	if ($x) {
-		$x = $x->header_obj;
-		$subj = $x->header('Subject') || '';
+		$subj = $x->subject;
 		$subj = $srch->subject_normalized($subj);
-		$ts = $x->header('X-PI-TS');
+		$ts = $x->ts;
 		if ($level == 0) {
 			$topic = [ $ts, 1, { $subj => $mid }, $subj ];
 			$ctx->{-cur_topic} = $topic;
@@ -1023,9 +1016,10 @@ sub thread_adj_level {
 }
 
 sub ghost_index_entry {
-	my ($ctx, $level, $mid) = @_;
+	my ($ctx, $level, $node) = @_;
 	my ($beg, $end) = thread_adj_level($ctx,  $level);
-	$beg . '<pre>'. ghost_parent($ctx->{-upfx}, $mid) . '</pre>' . $end;
+	$beg . '<pre>'. ghost_parent($ctx->{-upfx}, $node->{id})
+		. '</pre>' . $end;
 }
 
 1;
diff --git a/t/search.t b/t/search.t
index cce3b9e..eed9c9b 100644
--- a/t/search.t
+++ b/t/search.t
@@ -293,7 +293,7 @@ sub filter_mids {
 	$smsg->ensure_metadata;
 	is($smsg->references, '', "no references created");
 	my $msg = PublicInbox::SearchMsg->load_doc($smsg->{doc});
-	is($s, $msg->mini_mime->header('Subject'), 'long subject not rewritten');
+	is($s, $msg->subject, 'long subject not rewritten');
 }
 
 {
@@ -310,10 +310,7 @@ sub filter_mids {
 	my $smsg = $rw->lookup_message('testmessage@example.com');
 	my $msg = PublicInbox::SearchMsg->load_doc($smsg->{doc});
 
-	# mini_mime technically not valid (I think),
-	# but good enough for displaying HTML:
-	is($mime->header('Subject'), $msg->mini_mime->header('Subject'),
-		'UTF-8 subject preserved');
+	is($mime->header('Subject'), $msg->subject, 'UTF-8 subject preserved');
 }
 
 {
-- 
EW


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 07/13] thread: remove rootset accessor method
  2016-10-05  1:46 [WIP 0/13] mail threading speedups Eric Wong
                   ` (5 preceding siblings ...)
  2016-10-05  1:46 ` [PATCH 06/13] thread: remove Email::Abstract wrapping Eric Wong
@ 2016-10-05  1:46 ` Eric Wong
  2016-10-05  1:46 ` [PATCH 08/13] thread: simplify Eric Wong
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2016-10-05  1:46 UTC (permalink / raw)
  To: spew

It doesn't buy us much and copying to a new array is slower;
but probably not measurable in real-world use.
---
 lib/PublicInbox/SearchThread.pm | 2 --
 lib/PublicInbox/View.pm         | 4 ++--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/SearchThread.pm b/lib/PublicInbox/SearchThread.pm
index d39e9b6..7e89946 100644
--- a/lib/PublicInbox/SearchThread.pm
+++ b/lib/PublicInbox/SearchThread.pm
@@ -29,8 +29,6 @@ sub new {
 	}, $_[0];
 }
 
-sub rootset { @{$_[0]{rootset}} }
-
 sub thread {
 	my $self = shift;
 	$self->_setup();
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 748e391..7554d54 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -259,7 +259,7 @@ sub _th_index_lite {
 
 sub walk_thread {
 	my ($th, $ctx, $cb) = @_;
-	my @q = map { (0, $_) } $th->rootset;
+	my @q = map { (0, $_) } @{$th->{rootset}};
 	while (@q) {
 		my $level = shift @q;
 		my $node = shift @q or next;
@@ -291,7 +291,7 @@ sub stream_thread ($$) {
 	my ($th, $ctx) = @_;
 	my $inbox = $ctx->{-inbox};
 	my $mime;
-	my @q = map { (0, $_) } $th->rootset;
+	my @q = map { (0, $_) } @{$th->{rootset}};
 	my $level;
 	while (@q) {
 		$level = shift @q;
-- 
EW


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 08/13] thread: simplify
  2016-10-05  1:46 [WIP 0/13] mail threading speedups Eric Wong
                   ` (6 preceding siblings ...)
  2016-10-05  1:46 ` [PATCH 07/13] thread: remove rootset accessor method Eric Wong
@ 2016-10-05  1:46 ` Eric Wong
  2016-10-05  1:46 ` [PATCH 09/13] thread: remove iterate_down Eric Wong
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2016-10-05  1:46 UTC (permalink / raw)
  To: spew

Single use subroutines actually make the code more complex in
this case, and there's never a {seen} field in $self.
---
 lib/PublicInbox/SearchThread.pm | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/lib/PublicInbox/SearchThread.pm b/lib/PublicInbox/SearchThread.pm
index 7e89946..a161662 100644
--- a/lib/PublicInbox/SearchThread.pm
+++ b/lib/PublicInbox/SearchThread.pm
@@ -31,16 +31,10 @@ sub new {
 
 sub thread {
 	my $self = shift;
-	$self->_setup();
+	_add_message($self, $_) foreach @{$self->{messages}};
 	$self->{rootset} = [
 			grep { !$_->{parent} } values %{$self->{id_table}} ];
-	$self->_finish();
-}
-
-sub _finish {
-	my $self = shift;
 	delete $self->{id_table};
-	delete $self->{seen};
 }
 
 sub _get_cont_for_id ($$) {
@@ -48,12 +42,6 @@ sub _get_cont_for_id ($$) {
 	$self->{id_table}{$mid} ||= PublicInbox::SearchThread::Msg->new($mid);
 }
 
-sub _setup {
-	my ($self) = @_;
-
-	_add_message($self, $_) foreach @{$self->{messages}};
-}
-
 sub _add_message ($$) {
 	my ($self, $smsg) = @_;
 
-- 
EW


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 09/13] thread: remove iterate_down
  2016-10-05  1:46 [WIP 0/13] mail threading speedups Eric Wong
                   ` (7 preceding siblings ...)
  2016-10-05  1:46 ` [PATCH 08/13] thread: simplify Eric Wong
@ 2016-10-05  1:46 ` Eric Wong
  2016-10-05  1:46 ` [PATCH 10/13] thread: avoid incrementing undefined value Eric Wong
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2016-10-05  1:46 UTC (permalink / raw)
  To: spew

Unnecessary subs and complexity.  This was hiding the fact
that $before is never used.
---
 lib/PublicInbox/SearchThread.pm | 39 +++++++++++++--------------------------
 1 file changed, 13 insertions(+), 26 deletions(-)

diff --git a/lib/PublicInbox/SearchThread.pm b/lib/PublicInbox/SearchThread.pm
index a161662..dad783e 100644
--- a/lib/PublicInbox/SearchThread.pm
+++ b/lib/PublicInbox/SearchThread.pm
@@ -81,8 +81,7 @@ sub _add_message ($$) {
 }
 
 sub order {
-	my $self = shift;
-	my $ordersub = shift;
+	my ($self, $ordersub) = @_;
 
 	# make a fake root
 	my $root = _get_cont_for_id($self, 'fakeroot');
@@ -174,22 +173,6 @@ sub set_children {
 	} while ($walk);
 }
 
-sub order_children {
-	my $self = shift;
-	my $ordersub = shift;
-
-	return unless $ordersub;
-
-	my $sub = sub {
-		my $cont = shift;
-		my $children = $cont->children;
-		return if @$children < 2;
-		$cont->set_children( $ordersub->( $children ) );
-	};
-	$self->iterate_down( undef, $sub );
-	undef $sub;
-}
-
 # non-recursive version of recurse_down to avoid stack depth warnings
 sub recurse_down {
 	my ($self, $callback) = @_;
@@ -216,17 +199,14 @@ sub recurse_down {
 	}
 }
 
-sub iterate_down {
-	my $self = shift;
-	my ($before, $after) = @_;
+sub order_children {
+	my ($walk, $ordersub) = @_;
 
 	my %seen;
-	my $walk = $self;
 	my $depth = 0;
 	my @visited;
 	while ($walk) {
-		push @visited, [ $walk, $depth ];
-		$before->($walk, $depth) if $before;
+		push @visited, $walk;
 
 		# spot/break loops
 		$seen{$walk}++;
@@ -258,8 +238,15 @@ sub iterate_down {
 		}
 		$walk = $next;
 	}
-	return unless $after;
-	while (@visited) { $after->(@{ pop @visited }) }
+	foreach my $cont (@visited) {
+		my $children = $cont->children;
+		next if @$children < 2;
+		$children = $ordersub->($children);
+		$cont = $cont->{child} = shift @$children;
+		do {
+			$cont = $cont->{next} = shift @$children;
+		} while ($cont);
+	}
 }
 
 1;
-- 
EW


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 10/13] thread: avoid incrementing undefined value
  2016-10-05  1:46 [WIP 0/13] mail threading speedups Eric Wong
                   ` (8 preceding siblings ...)
  2016-10-05  1:46 ` [PATCH 09/13] thread: remove iterate_down Eric Wong
@ 2016-10-05  1:46 ` Eric Wong
  2016-10-05  1:47 ` [PATCH 11/13] thread: order_children no longer cares about depth Eric Wong
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2016-10-05  1:46 UTC (permalink / raw)
  To: spew

It is pointless to increment when setting a true value is
simpler as there is no need to read before writing.
---
 lib/PublicInbox/SearchThread.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/SearchThread.pm b/lib/PublicInbox/SearchThread.pm
index dad783e..ba31f43 100644
--- a/lib/PublicInbox/SearchThread.pm
+++ b/lib/PublicInbox/SearchThread.pm
@@ -179,7 +179,7 @@ sub recurse_down {
 	my %seen;
 	my @q = ($self);
 	while (my $cont = shift @q) {
-		$seen{$cont}++;
+		$seen{$cont} = 1;
 		$callback->($cont);
 
 		if (my $next = $cont->{next}) {
@@ -209,7 +209,7 @@ sub order_children {
 		push @visited, $walk;
 
 		# spot/break loops
-		$seen{$walk}++;
+		$seen{$walk} = 1;
 
 		my $child = $walk->{child};
 		if ($child && $seen{$child}) {
-- 
EW


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 11/13] thread: order_children no longer cares about depth
  2016-10-05  1:46 [WIP 0/13] mail threading speedups Eric Wong
                   ` (9 preceding siblings ...)
  2016-10-05  1:46 ` [PATCH 10/13] thread: avoid incrementing undefined value Eric Wong
@ 2016-10-05  1:47 ` Eric Wong
  2016-10-05  1:47 ` [PATCH 12/13] thread: inline and remove recurse_down logic Eric Wong
  2016-10-05  1:47 ` [PATCH 13/13] thread: fix sorting without topmost Eric Wong
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2016-10-05  1:47 UTC (permalink / raw)
  To: spew

We never use the depth anywhere in this sub
---
 lib/PublicInbox/SearchThread.pm | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/lib/PublicInbox/SearchThread.pm b/lib/PublicInbox/SearchThread.pm
index ba31f43..2e7b79a 100644
--- a/lib/PublicInbox/SearchThread.pm
+++ b/lib/PublicInbox/SearchThread.pm
@@ -203,7 +203,6 @@ sub order_children {
 	my ($walk, $ordersub) = @_;
 
 	my %seen;
-	my $depth = 0;
 	my @visited;
 	while ($walk) {
 		push @visited, $walk;
@@ -222,17 +221,13 @@ sub order_children {
 		}
 
 		# go down, or across
-		if ($child) {
-			$next = $child;
-			++$depth;
-		}
+		$next = $child if $child;
 
 		# no next?  look up
 		if (!$next) {
 			my $up = $walk;
 			while ($up && !$next) {
 				$up = $up->{parent};
-				--$depth;
 				$next = $up->{next} if $up;
 			}
 		}
-- 
EW


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 12/13] thread: inline and remove recurse_down logic
  2016-10-05  1:46 [WIP 0/13] mail threading speedups Eric Wong
                   ` (10 preceding siblings ...)
  2016-10-05  1:47 ` [PATCH 11/13] thread: order_children no longer cares about depth Eric Wong
@ 2016-10-05  1:47 ` Eric Wong
  2016-10-05  1:47 ` [PATCH 13/13] thread: fix sorting without topmost Eric Wong
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2016-10-05  1:47 UTC (permalink / raw)
  To: spew

We no longer recurse, and it's too hard to come up with
a new name for a sub we will only use once.
---
 lib/PublicInbox/SearchThread.pm | 55 +++++++++++++++++------------------------
 1 file changed, 23 insertions(+), 32 deletions(-)

diff --git a/lib/PublicInbox/SearchThread.pm b/lib/PublicInbox/SearchThread.pm
index 2e7b79a..153eef2 100644
--- a/lib/PublicInbox/SearchThread.pm
+++ b/lib/PublicInbox/SearchThread.pm
@@ -145,42 +145,13 @@ sub remove_child {
 }
 
 sub has_descendent {
-	my $self = shift;
-	my $child = shift;
-	die "Assertion failed: $child" unless eval {$child};
-	my $there = 0;
-	$self->recurse_down(sub { $there = 1 if $_[0] == $child });
-
-	return $there;
-}
-
-sub children {
-	my $self = shift;
-	my @children;
-	my $visitor = $self->{child};
-	while ($visitor) {
-		push @children, $visitor;
-		$visitor = $visitor->{next};
-	}
-	\@children;
-}
-
-sub set_children {
-	my ($self, $children) = @_;
-	my $walk = $self->{child} = shift @$children;
-	do {
-		$walk = $walk->{next} = shift @$children;
-	} while ($walk);
-}
-
-# non-recursive version of recurse_down to avoid stack depth warnings
-sub recurse_down {
-	my ($self, $callback) = @_;
+	my ($self, $child) = @_;
 	my %seen;
 	my @q = ($self);
 	while (my $cont = shift @q) {
 		$seen{$cont} = 1;
-		$callback->($cont);
+
+		return 1 if $cont == $child;
 
 		if (my $next = $cont->{next}) {
 			if ($seen{$next}) {
@@ -197,6 +168,26 @@ sub recurse_down {
 			}
 		}
 	}
+	0;
+}
+
+sub children {
+	my $self = shift;
+	my @children;
+	my $visitor = $self->{child};
+	while ($visitor) {
+		push @children, $visitor;
+		$visitor = $visitor->{next};
+	}
+	\@children;
+}
+
+sub set_children {
+	my ($self, $children) = @_;
+	my $walk = $self->{child} = shift @$children;
+	do {
+		$walk = $walk->{next} = shift @$children;
+	} while ($walk);
 }
 
 sub order_children {
-- 
EW


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 13/13] thread: fix sorting without topmost
  2016-10-05  1:46 [WIP 0/13] mail threading speedups Eric Wong
                   ` (11 preceding siblings ...)
  2016-10-05  1:47 ` [PATCH 12/13] thread: inline and remove recurse_down logic Eric Wong
@ 2016-10-05  1:47 ` Eric Wong
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2016-10-05  1:47 UTC (permalink / raw)
  To: spew

This bug was hidden, and we may not be able to efficiently
implement a topmost subroutine with the hash-based (vs
linked-list) based container for threading in the next
commit.
---
 lib/PublicInbox/SearchView.pm | 5 ++---
 lib/PublicInbox/View.pm       | 4 ++--
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index ebeb41f..cfe6dff 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -155,9 +155,8 @@ sub mset_thread {
 	$th->thread;
 	if ($q->{r}) { # order by relevance
 		$th->order(sub {
-			[ sort { (eval { $pct{$b->topmost->{id}} } || 0)
-					<=>
-				(eval { $pct{$a->topmost->{id}} } || 0)
+			[ sort { ( $pct{$b->{id}} || 0) <=>
+				 ( $pct{$a->{id}} || 0)
 			} @{$_[0]} ];
 		});
 	} else { # order by time (default for threaded view)
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 7554d54..c09b4a2 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -857,8 +857,8 @@ sub skel_dump {
 
 sub sort_ts {
 	[ sort {
-		(eval { $a->topmost->{smsg}->ts } || 0) <=>
-		(eval { $b->topmost->{smsg}->ts } || 0)
+		(eval { $a->{smsg}->ts } || 0) <=>
+		(eval { $b->{smsg}->ts } || 0)
 	} @{$_[0]} ];
 }
 
-- 
EW


^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-10-05  1:48 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-05  1:46 [WIP 0/13] mail threading speedups Eric Wong
2016-10-05  1:46 ` [PATCH 01/13] view: remove "subject dummy" references Eric Wong
2016-10-05  1:46 ` [PATCH 02/13] thread: remove Mail::Thread dependency Eric Wong
2016-10-05  1:46 ` [PATCH 03/13] thread: pass array refs instead of entire arrays Eric Wong
2016-10-05  1:46 ` [PATCH 04/13] thread: remove accessor usage in internals Eric Wong
2016-10-05  1:46 ` [PATCH 05/13] inbox: ghost smsg Eric Wong
2016-10-05  1:46 ` [PATCH 06/13] thread: remove Email::Abstract wrapping Eric Wong
2016-10-05  1:46 ` [PATCH 07/13] thread: remove rootset accessor method Eric Wong
2016-10-05  1:46 ` [PATCH 08/13] thread: simplify Eric Wong
2016-10-05  1:46 ` [PATCH 09/13] thread: remove iterate_down Eric Wong
2016-10-05  1:46 ` [PATCH 10/13] thread: avoid incrementing undefined value Eric Wong
2016-10-05  1:47 ` [PATCH 11/13] thread: order_children no longer cares about depth Eric Wong
2016-10-05  1:47 ` [PATCH 12/13] thread: inline and remove recurse_down logic Eric Wong
2016-10-05  1:47 ` [PATCH 13/13] thread: fix sorting without topmost Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).