dumping ground for random patches and texts
 help / color / mirror / Atom feed
* [PATCH 01/18] lei q: delay worker spawn
@ 2021-02-05 12:07 Eric Wong
  2021-02-05 12:07 ` [PATCH 02/18] ipc: localize fields assignment to prevent circular refs Eric Wong
                   ` (16 more replies)
  0 siblings, 17 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
  To: spew

Now that --stdin support is sorted, we can delay spawning
workers until we know the query is ready-to-run.
---
 lib/PublicInbox/LeiQuery.pm   | 19 +++++--------------
 lib/PublicInbox/LeiXSearch.pm |  6 ++++++
 2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/lib/PublicInbox/LeiQuery.pm b/lib/PublicInbox/LeiQuery.pm
index 4fe40400..6b1aa40c 100644
--- a/lib/PublicInbox/LeiQuery.pm
+++ b/lib/PublicInbox/LeiQuery.pm
@@ -75,21 +75,12 @@ sub lei_q {
 	$xj ||= $lxs->concurrency($opt); # allow: "--jobs ,$WRITER_ONLY"
 	my $nproc = $lxs->detect_nproc; # don't memoize, schedtool(1) exists
 	$xj = $nproc if $xj > $nproc;
-	PublicInbox::LeiOverview->new($self) or return;
-	$self->atfork_prepare_wq($lxs);
-	$lxs->wq_workers_start('lei_xsearch', $xj, $self->oldset);
-	delete $lxs->{-ipc_atfork_child_close};
-	if (my $l2m = $self->{l2m}) {
-		if (defined($mj) && $mj !~ /\A[1-9][0-9]*\z/) {
-			return $self->fail("`$mj' writer jobs must be >= 1");
-		}
-		$mj //= $nproc;
-		$self->atfork_prepare_wq($l2m);
-		$l2m->wq_workers_start('lei2mail', $mj, $self->oldset);
-		delete $l2m->{-ipc_atfork_child_close};
+	$lxs->{jobs} = $xj;
+	if (defined($mj) && $mj !~ /\A[1-9][0-9]*\z/) {
+		return $self->fail("`$mj' writer jobs must be >= 1");
 	}
-
-	# no forking workers after this
+	$self->{l2m}->{jobs} = ($mj // $nproc) if $self->{l2m};
+	PublicInbox::LeiOverview->new($self) or return;
 
 	my %mset_opt = map { $_ => $opt->{$_} } qw(thread limit offset);
 	$mset_opt{asc} = $opt->{'reverse'} ? 1 : 0;
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index 965617b5..ab66717c 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -406,7 +406,13 @@ sub do_query {
 	$lei->{ovv}->ovv_begin($lei);
 	my ($au_done, $zpipe);
 	my $l2m = $lei->{l2m};
+	$lei->atfork_prepare_wq($self);
+	$self->wq_workers_start('lei_xsearch', $self->{jobs}, $lei->oldset);
+	delete $self->{-ipc_atfork_child_close};
 	if ($l2m) {
+		$lei->atfork_prepare_wq($l2m);
+		$l2m->wq_workers_start('lei2mail', $l2m->{jobs}, $lei->oldset);
+		delete $l2m->{-ipc_atfork_child_close};
 		pipe($lei->{startq}, $au_done) or die "pipe: $!";
 		# 1031: F_SETPIPE_SZ
 		fcntl($lei->{startq}, 1031, 4096) if $^O eq 'linux';

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 02/18] ipc: localize fields assignment to prevent circular refs
  2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
  2021-02-05 12:07 ` [PATCH 03/18] lei q: reorder internals to reduce FD passing Eric Wong
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
  To: spew

Circular references are bad and can lead to surprising behavior
during worker exit.
---
 lib/PublicInbox/IPC.pm | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/IPC.pm b/lib/PublicInbox/IPC.pm
index 3873649b..078aaa2c 100644
--- a/lib/PublicInbox/IPC.pm
+++ b/lib/PublicInbox/IPC.pm
@@ -338,7 +338,6 @@ sub _wq_worker_start ($$$) {
 		srand($seed);
 		eval { PublicInbox::DS->Reset };
 		delete @$self{qw(-wq_s1 -wq_workers -wq_ppid)};
-		@$self{keys %$fields} = values(%$fields) if $fields;
 		$SIG{$_} = 'IGNORE' for (qw(PIPE));
 		$SIG{$_} = 'DEFAULT' for (qw(TTOU TTIN TERM QUIT INT CHLD));
 		local $0 = $self->{-wq_ident};
@@ -346,6 +345,8 @@ sub _wq_worker_start ($$$) {
 		# ensure we properly exit even if warn() dies:
 		my $end = PublicInbox::OnDestroy->new($$, sub { exit(!!$@) });
 		eval {
+			$fields //= {};
+			local @$self{keys %$fields} = values(%$fields);
 			my $on_destroy = $self->ipc_atfork_child;
 			local %SIG = %SIG;
 			wq_worker_loop($self);

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 03/18] lei q: reorder internals to reduce FD passing
  2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
  2021-02-05 12:07 ` [PATCH 02/18] ipc: localize fields assignment to prevent circular refs Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
  2021-02-05 12:07 ` [PATCH 04/18] lei q: only start pager if output is to stdout Eric Wong
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
  To: spew

While FD passing is critical for script/lei <=> lei-daemon,
lei-daemon doesn't need to use it internally if FDs are
created in the proper order before forking.
---
 lib/PublicInbox/IPC.pm         |  3 --
 lib/PublicInbox/LEI.pm         | 99 +++++++---------------------------
 lib/PublicInbox/LeiOverview.pm | 28 +++-------
 lib/PublicInbox/LeiToMail.pm   | 28 ++++++----
 lib/PublicInbox/LeiXSearch.pm  | 97 ++++++++++++++++-----------------
 5 files changed, 92 insertions(+), 163 deletions(-)

diff --git a/lib/PublicInbox/IPC.pm b/lib/PublicInbox/IPC.pm
index 078aaa2c..7f5a3f6f 100644
--- a/lib/PublicInbox/IPC.pm
+++ b/lib/PublicInbox/IPC.pm
@@ -464,9 +464,6 @@ sub DESTROY {
 	ipc_worker_stop($self);
 }
 
-# Sereal doesn't have dclone
-sub deep_clone { ipc_thaw(ipc_freeze($_[-1])) }
-
 sub detect_nproc () {
 	# _SC_NPROCESSORS_ONLN = 84 on both Linux glibc and musl
 	return POSIX::sysconf(84) if $^O eq 'linux';
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 49deed13..0d4b1c11 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -286,7 +286,7 @@ sub x_it ($$) {
 	# make sure client sees stdout before exit
 	$self->{1}->autoflush(1) if $self->{1};
 	dump_and_clear_log();
-	if (my $s = $self->{pkt_op} // $self->{sock}) {
+	if (my $s = $self->{pkt_op_p} // $self->{sock}) {
 		send($s, "x_it $code", MSG_EOR);
 	} elsif ($self->{oneshot}) {
 		# don't want to end up using $? from child processes
@@ -322,7 +322,8 @@ sub qerr ($;@) { $_[0]->{opt}->{quiet} or err(shift, @_) }
 sub fail ($$;$) {
 	my ($self, $buf, $exit_code) = @_;
 	err($self, $buf) if defined $buf;
-	send($self->{pkt_op}, '!', MSG_EOR) if $self->{pkt_op}; # fail_handler
+	# calls fail_handler:
+	send($self->{pkt_op_p}, '!', MSG_EOR) if $self->{pkt_op_p};
 	x_it($self, ($exit_code // 1) << 8);
 	undef;
 }
@@ -340,7 +341,7 @@ sub puts ($;@) { out(shift, map { "$_\n" } @_) }
 
 sub child_error { # passes non-fatal curl exit codes to user
 	my ($self, $child_error) = @_; # child_error is $?
-	if (my $s = $self->{pkt_op} // $self->{sock}) {
+	if (my $s = $self->{pkt_op_p} // $self->{sock}) {
 		# send to the parent lei-daemon or to lei(1) client
 		send($s, "child_error $child_error", MSG_EOR);
 	} elsif (!$PublicInbox::DS::in_loop) {
@@ -348,94 +349,34 @@ sub child_error { # passes non-fatal curl exit codes to user
 	} # else noop if client disconnected
 }
 
-sub atfork_prepare_wq {
-	my ($self, $wq) = @_;
-	my $tcafc = $wq->{-ipc_atfork_child_close} //= [ $listener // () ];
-	if (my $sock = $self->{sock}) {
-		push @$tcafc, @$self{qw(0 1 2 3)}, $sock;
-	}
-	if (my $pgr = $self->{pgr}) {
-		push @$tcafc, @$pgr[1,2];
-	}
-	if (my $old_1 = $self->{old_1}) {
-		push @$tcafc, $old_1;
-	}
-	for my $f (qw(lxs l2m)) {
-		my $ipc = $self->{$f} or next;
-		push @$tcafc, grep { defined }
-				@$ipc{qw(-wq_s1 -wq_s2 -ipc_req -ipc_res)};
-	}
-}
-
-sub io_restore ($$) {
-	my ($dst, $src) = @_;
-	for my $i (0..2) { # standard FDs
-		my $io = delete $src->{$i} or next;
-		$dst->{$i} = $io;
-	}
-	for my $i (3..9) { # named (non-standard) FDs
-		my $io = $src->{$i} or next;
-		my @st = stat($io) or die "stat $src.$i ($io): $!";
-		my $f = delete $dst->{"dev=$st[0],ino=$st[1]"} // next;
-		$dst->{$f} = $io;
-		delete $src->{$i};
-	}
-}
-
 sub note_sigpipe { # triggers sigpipe_handler
 	my ($self, $fd) = @_;
 	close(delete($self->{$fd})); # explicit close silences Perl warning
-	send($self->{pkt_op}, '|', MSG_EOR) if $self->{pkt_op};
+	send($self->{pkt_op_p}, '|', MSG_EOR) if $self->{pkt_op_p};
 	x_it($self, 13);
 }
 
-sub atfork_child_wq {
-	my ($self, $wq) = @_;
-	io_restore($self, $wq);
-	-S $self->{pkt_op} or die 'BUG: {pkt_op} expected';
-	io_restore($self->{l2m}, $wq);
+sub lei_atfork_child {
+	my ($self) = @_;
+	# we need to explicitly close things which are on stack
+	delete $self->{0};
+	for (delete @$self{qw(3 sock old_1 au_done)}) {
+		close($_) if defined($_);
+	}
+	if (my $op_c = delete $self->{pkt_op_c}) {
+		close(delete $op_c->{sock});
+	}
+	if (my $pgr = delete $self->{pgr}) {
+		close($_) for (@$pgr[1,2]);
+	}
+	close $listener if $listener;
+	undef $listener;
 	%PATH2CFG = ();
 	undef $errors_log;
 	$quit = \&CORE::exit;
 	$current_lei = $self; # for SIG{__WARN__}
 }
 
-sub io_extract ($;@) {
-	my ($obj, @fields) = @_;
-	my @io;
-	for my $f (@fields) {
-		my $io = delete $obj->{$f} or next;
-		my @st = stat($io) or die "W: stat $obj.$f ($io): $!";
-		$obj->{"dev=$st[0],ino=$st[1]"} = $f;
-		push @io, $io;
-	}
-	@io
-}
-
-# usage: ($lei, @io) = $lei->atfork_parent_wq($wq);
-sub atfork_parent_wq {
-	my ($self, $wq) = @_;
-	my $env = delete $self->{env}; # env is inherited at fork
-	my $lei = bless { %$self }, ref($self);
-	for my $f (qw(dedupe ovv)) {
-		my $tmp = delete($lei->{$f}) or next;
-		$lei->{$f} = $wq->deep_clone($tmp);
-	}
-	$self->{env} = $env;
-	delete @$lei{qw(sock 3 -lei_store cfg old_1 pgr lxs)}; # keep l2m
-	my @io = (delete(@$lei{qw(0 1 2)}),
-			io_extract($lei, qw(pkt_op startq)));
-	my $l2m = $lei->{l2m};
-	if ($l2m && $l2m != $wq) { # $wq == lxs
-		if (my $wq_s1 = $l2m->{-wq_s1}) {
-			push @io, io_extract($l2m, '-wq_s1');
-			$l2m->{-wq_s1} = $wq_s1;
-		}
-		$l2m->wq_close(1);
-	}
-	($lei, @io);
-}
-
 sub _help ($;$) {
 	my ($self, $errmsg) = @_;
 	my $cmd = $self->{cmd} // 'COMMAND';
diff --git a/lib/PublicInbox/LeiOverview.pm b/lib/PublicInbox/LeiOverview.pm
index e33d63a2..e6bf4f2a 100644
--- a/lib/PublicInbox/LeiOverview.pm
+++ b/lib/PublicInbox/LeiOverview.pm
@@ -207,7 +207,6 @@ sub ovv_each_smsg_cb { # runs in wq worker usually
 	}
 	$lei->{ovv_buf} = \(my $buf = '') if !$l2m;
 	if ($l2m && !$ibxish) { # remote https?:// mboxrd
-		delete $l2m->{-wq_s1};
 		my $g2m = $l2m->can('git_to_mail');
 		my $wcb = $l2m->write_cb($lei);
 		sub {
@@ -215,33 +214,20 @@ sub ovv_each_smsg_cb { # runs in wq worker usually
 			$wcb->(undef, $smsg, $eml);
 		};
 	} elsif ($l2m && $l2m->{-wq_s1}) {
-		my ($lei_ipc, @io) = $lei->atfork_parent_wq($l2m);
-		# $io[0] becomes a notification pipe that triggers EOF
+		# $io->[0] becomes a notification pipe that triggers EOF
 		# in this wq worker when all outstanding ->write_mail
 		# calls are complete
-		$io[0] = undef;
-		pipe($l2m->{each_smsg_done}, $io[0]) or die "pipe: $!";
-		fcntl($io[0], 1031, 4096) if $^O eq 'linux'; # F_SETPIPE_SZ
-		delete @$lei_ipc{qw(l2m opt mset_opt cmd)};
+		my $io = [];
+		pipe($l2m->{each_smsg_done}, $io->[0]) or die "pipe: $!";
+		fcntl($io->[0], 1031, 4096) if $^O eq 'linux'; # F_SETPIPE_SZ
 		my $git = $ibxish->git; # (LeiXSearch|Inbox|ExtSearch)->git
 		$self->{git} = $git;
 		my $git_dir = $git->{git_dir};
 		sub {
 			my ($smsg, $mitem) = @_;
 			$smsg->{pct} = get_pct($mitem) if $mitem;
-			$l2m->wq_do('write_mail', \@io, $git_dir, $smsg,
-					$lei_ipc);
+			$l2m->wq_do('write_mail', $io, $git_dir, $smsg);
 		}
-	} elsif ($l2m) {
-		my $wcb = $l2m->write_cb($lei);
-		my $git = $ibxish->git; # (LeiXSearch|Inbox|ExtSearch)->git
-		$self->{git} = $git; # for ovv_atexit_child
-		my $g2m = $l2m->can('git_to_mail');
-		sub {
-			my ($smsg, $mitem) = @_;
-			$smsg->{pct} = get_pct($mitem) if $mitem;
-			$git->cat_async($smsg->{blob}, $g2m, [ $wcb, $smsg ]);
-		};
 	} elsif ($self->{fmt} =~ /\A(concat)?json\z/ && $lei->{opt}->{pretty}) {
 		my $EOR = ($1//'') eq 'concat' ? "\n}" : "\n},";
 		sub { # DIY prettiness :P
@@ -275,7 +261,9 @@ sub ovv_each_smsg_cb { # runs in wq worker usually
 			$lei->out($buf);
 			$buf = '';
 		}
-	} # else { ...
+	} else {
+		die "TODO: unhandled case $self->{fmt}"
+	}
 }
 
 no warnings 'once';
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
index c704dc2a..f9250860 100644
--- a/lib/PublicInbox/LeiToMail.pm
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -211,10 +211,10 @@ sub zsfx2cmd ($$$) {
 }
 
 sub _post_augment_mbox { # open a compressor process
-	my ($self, $lei, $zpipe) = @_;
+	my ($self, $lei) = @_;
 	my $zsfx = $self->{zsfx} or return;
 	my $cmd = zsfx2cmd($zsfx, undef, $lei);
-	my ($r, $w) = splice(@$zpipe, 0, 2);
+	my ($r, $w) = @{delete $lei->{zpipe}};
 	my $rdr = { 0 => $r, 1 => $lei->{1}, 2 => $lei->{2} };
 	my $pid = spawn($cmd, $lei->{env}, $rdr);
 	my $pp = gensym;
@@ -407,7 +407,7 @@ sub _pre_augment_mbox {
 			$! == ENOENT or die "unlink($dst): $!";
 		}
 		open my $out, $mode, $dst or die "open($dst): $!";
-		$lei->{old_1} = $lei->{1};
+		$lei->{old_1} = $lei->{1}; # keep for spawning MUA
 		$lei->{1} = $out;
 	}
 	# Perl does SEEK_END even with O_APPEND :<
@@ -418,7 +418,7 @@ sub _pre_augment_mbox {
 	state $zsfx_allow = join('|', keys %zsfx2cmd);
 	($self->{zsfx}) = ($dst =~ /\.($zsfx_allow)\z/) or return;
 	pipe(my ($r, $w)) or die "pipe: $!";
-	[ $r, $w ];
+	$lei->{zpipe} = [ $r, $w ];
 }
 
 sub _do_augment_mbox {
@@ -462,16 +462,24 @@ sub post_augment { # fast (spawn compressor or mkdir), runs in main daemon
 	$self->$m($lei, @args);
 }
 
+sub ipc_atfork_child {
+	my ($self) = @_;
+	my $lei = delete $self->{lei};
+	$lei->lei_atfork_child;
+	if (my $zpipe = delete $lei->{zpipe}) {
+		$lei->{1} = $zpipe->[1];
+		close $zpipe->[0];
+	}
+	$self->{wcb} = $self->write_cb($lei);
+	$self->SUPER::ipc_atfork_child;
+}
+
 sub write_mail { # via ->wq_do
-	my ($self, $git_dir, $smsg, $lei) = @_;
+	my ($self, $git_dir, $smsg) = @_;
 	my $not_done = delete $self->{0} // die 'BUG: $not_done missing';
-	my $wcb = $self->{wcb} //= do { # first message
-		$lei->atfork_child_wq($self);
-		$self->write_cb($lei);
-	};
 	my $git = $self->{"$$\0$git_dir"} //= PublicInbox::Git->new($git_dir);
 	git_async_cat($git, $smsg->{blob}, \&git_to_mail,
-				[$wcb, $smsg, $not_done]);
+				[$self->{wcb}, $smsg, $not_done]);
 }
 
 sub wq_atexit_child {
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index ab66717c..e41d899e 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -110,8 +110,8 @@ sub wait_startq ($) {
 sub mset_progress {
 	my $lei = shift;
 	return unless $lei->{-progress};
-	if ($lei->{pkt_op}) { # called via pkt_op/pkt_do from workers
-		pkt_do($lei->{pkt_op}, 'mset_progress', @_);
+	if ($lei->{pkt_op_p}) {
+		pkt_do($lei->{pkt_op_p}, 'mset_progress', @_);
 	} else { # single lei-daemon consumer
 		my ($desc, $mset_size, $mset_total_est) = @_;
 		$lei->{-mset_total} += $mset_size;
@@ -120,11 +120,10 @@ sub mset_progress {
 }
 
 sub query_thread_mset { # for --thread
-	my ($self, $lei, $ibxish) = @_;
+	my ($self, $ibxish) = @_;
 	local $0 = "$0 query_thread_mset";
-	$lei->atfork_child_wq($self);
+	my $lei = $self->{lei};
 	my $startq = delete $lei->{startq};
-
 	my ($srch, $over) = ($ibxish->search, $ibxish->over);
 	my $desc = $ibxish->{inboxdir} // $ibxish->{topdir};
 	return warn("$desc not indexed by Xapian\n") unless ($srch && $over);
@@ -154,9 +153,9 @@ sub query_thread_mset { # for --thread
 }
 
 sub query_mset { # non-parallel for non-"--thread" users
-	my ($self, $lei) = @_;
+	my ($self) = @_;
 	local $0 = "$0 query_mset";
-	$lei->atfork_child_wq($self);
+	my $lei = $self->{lei};
 	my $startq = delete $lei->{startq};
 	my $mo = { %{$lei->{mset_opt}} };
 	my $mset;
@@ -207,10 +206,10 @@ sub kill_reap {
 }
 
 sub query_remote_mboxrd {
-	my ($self, $lei, $uris) = @_;
+	my ($self, $uris) = @_;
 	local $0 = "$0 query_remote_mboxrd";
-	$lei->atfork_child_wq($self);
 	local $SIG{TERM} = sub { exit(0) }; # for DESTROY (File::Temp, $reap)
+	my $lei = $self->{lei};
 	my ($opt, $env) = @$lei{qw(opt env)};
 	my @qform = (q => $lei->{mset_opt}->{qstr}, x => 'm');
 	push(@qform, t => 1) if $opt->{thread};
@@ -307,7 +306,7 @@ sub git {
 	$git;
 }
 
-sub query_done { # EOF callback
+sub query_done { # EOF callback for main daemon
 	my ($lei) = @_;
 	my $has_l2m = exists $lei->{l2m};
 	for my $f (qw(lxs l2m)) {
@@ -332,9 +331,8 @@ Error closing $lei->{ovv}->{dst}: $!
 }
 
 sub do_post_augment {
-	my ($lei, $zpipe, $au_done) = @_;
-	my $l2m = $lei->{l2m} or die 'BUG: no {l2m}';
-	eval { $l2m->post_augment($lei, $zpipe) };
+	my ($lei) = @_;
+	eval { $lei->{l2m}->post_augment($lei) };
 	if (my $err = $@) {
 		if (my $lxs = delete $lei->{lxs}) {
 			$lxs->wq_kill;
@@ -342,7 +340,7 @@ sub do_post_augment {
 		}
 		$lei->fail("$err");
 	}
-	close $au_done; # triggers wait_startq
+	close(delete $lei->{au_done}); # triggers wait_startq
 }
 
 my $MAX_PER_HOST = 4;
@@ -356,13 +354,13 @@ sub concurrency {
 }
 
 sub start_query { # always runs in main (lei-daemon) process
-	my ($self, $io, $lei) = @_;
+	my ($self, $lei) = @_;
 	if ($lei->{opt}->{thread}) {
 		for my $ibxish (locals($self)) {
-			$self->wq_do('query_thread_mset', $io, $lei, $ibxish);
+			$self->wq_do('query_thread_mset', [], $ibxish);
 		}
 	} elsif (locals($self)) {
-		$self->wq_do('query_mset', $io, $lei);
+		$self->wq_do('query_mset', []);
 	}
 	my $i = 0;
 	my $q = [];
@@ -370,19 +368,23 @@ sub start_query { # always runs in main (lei-daemon) process
 		push @{$q->[$i++ % $MAX_PER_HOST]}, $uri;
 	}
 	for my $uris (@$q) {
-		$self->wq_do('query_remote_mboxrd', $io, $lei, $uris);
+		$self->wq_do('query_remote_mboxrd', [], $uris);
 	}
-	@$io = ();
+}
+
+sub ipc_atfork_child {
+	my ($self) = @_;
+	$self->{lei}->lei_atfork_child;
+	$self->SUPER::ipc_atfork_child;
 }
 
 sub query_prepare { # called by wq_do
-	my ($self, $lei) = @_;
+	my ($self) = @_;
 	local $0 = "$0 query_prepare";
-	$lei->atfork_child_wq($self);
-	delete $lei->{l2m}->{-wq_s1};
+	my $lei = $self->{lei};
 	eval { $lei->{l2m}->do_augment($lei) };
 	$lei->fail($@) if $@;
-	pkt_do($lei->{pkt_op}, '.') == 1 or die "do_post_augment trigger: $!"
+	pkt_do($lei->{pkt_op_p}, '.') == 1 or die "do_post_augment trigger: $!"
 }
 
 sub fail_handler ($;$$) {
@@ -401,45 +403,38 @@ sub sigpipe_handler { # handles SIGPIPE from l2m/lxs workers
 
 sub do_query {
 	my ($self, $lei) = @_;
-	$lei->{1}->autoflush(1);
-	$lei->start_pager if -t $lei->{1};
-	$lei->{ovv}->ovv_begin($lei);
-	my ($au_done, $zpipe);
-	my $l2m = $lei->{l2m};
-	$lei->atfork_prepare_wq($self);
-	$self->wq_workers_start('lei_xsearch', $self->{jobs}, $lei->oldset);
-	delete $self->{-ipc_atfork_child_close};
-	if ($l2m) {
-		$lei->atfork_prepare_wq($l2m);
-		$l2m->wq_workers_start('lei2mail', $l2m->{jobs}, $lei->oldset);
-		delete $l2m->{-ipc_atfork_child_close};
-		pipe($lei->{startq}, $au_done) or die "pipe: $!";
-		# 1031: F_SETPIPE_SZ
-		fcntl($lei->{startq}, 1031, 4096) if $^O eq 'linux';
-		$zpipe = $l2m->pre_augment($lei);
-	}
 	my $ops = {
 		'|' => [ \&sigpipe_handler, $lei ],
 		'!' => [ \&fail_handler, $lei ],
-		'.' => [ \&do_post_augment, $lei, $zpipe, $au_done ],
+		'.' => [ \&do_post_augment, $lei ],
 		'' => [ \&query_done, $lei ],
 		'mset_progress' => [ \&mset_progress, $lei ],
 		'x_it' => [ $lei->can('x_it'), $lei ],
 		'child_error' => [ $lei->can('child_error'), $lei ],
 	};
-	(my $op, $lei->{pkt_op}) = PublicInbox::PktOp->pair($ops);
-	my ($lei_ipc, @io) = $lei->atfork_parent_wq($self);
-	delete($lei->{pkt_op});
-
-	$lei->event_step_init; # wait for shutdowns
+	($lei->{pkt_op_c}, $lei->{pkt_op_p}) = PublicInbox::PktOp->pair($ops);
+	$lei->{1}->autoflush(1);
+	$lei->start_pager if -t $lei->{1};
+	$lei->{ovv}->ovv_begin($lei);
+	my $l2m = $lei->{l2m};
 	if ($l2m) {
-		$self->wq_do('query_prepare', \@io, $lei_ipc);
-		$io[1] = $zpipe->[1] if $zpipe;
+		$l2m->pre_augment($lei);
+		$l2m->wq_workers_start('lei2mail', $l2m->{jobs},
+					$lei->oldset, { lei => $lei });
+		pipe($lei->{startq}, $lei->{au_done}) or die "pipe: $!";
+		# 1031: F_SETPIPE_SZ
+		fcntl($lei->{startq}, 1031, 4096) if $^O eq 'linux';
 	}
-	start_query($self, \@io, $lei_ipc);
-	$self->wq_close(1);
+	$self->wq_workers_start('lei_xsearch', $self->{jobs},
+				$lei->oldset, { lei => $lei });
+	my $op = delete $lei->{pkt_op_c};
+	delete $lei->{pkt_op_p};
+	$l2m->wq_close(1) if $l2m;
+	$lei->event_step_init; # wait for shutdowns
+	$self->wq_do('query_prepare', []) if $l2m;
+	start_query($self, $lei);
+	$self->wq_close(1); # lei_xsearch workers stop when done
 	if ($lei->{oneshot}) {
-		# for the $lei_ipc->atfork_child_wq PIPE handler:
 		while ($op->{sock}) { $op->event_step }
 	}
 }

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 04/18] lei q: only start pager if output is to stdout
  2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
  2021-02-05 12:07 ` [PATCH 02/18] ipc: localize fields assignment to prevent circular refs Eric Wong
  2021-02-05 12:07 ` [PATCH 03/18] lei q: reorder internals to reduce FD passing Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
  2021-02-05 12:07 ` [PATCH 05/18] lei q: reinstate early MUA spawn for Maildir Eric Wong
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
  To: spew

No need to be starting a pager if we're writing to a regular file.
---
 lib/PublicInbox/LeiOverview.pm | 3 +--
 lib/PublicInbox/LeiXSearch.pm  | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/LeiOverview.pm b/lib/PublicInbox/LeiOverview.pm
index e6bf4f2a..3125f015 100644
--- a/lib/PublicInbox/LeiOverview.pm
+++ b/lib/PublicInbox/LeiOverview.pm
@@ -78,9 +78,8 @@ sub new {
 	if ($fmt =~ /\A($JSONL|(?:concat)?json)\z/) {
 		$json = $self->{json} = ref(PublicInbox::Config->json);
 	}
-	my ($isatty, $seekable);
 	if ($dst eq '/dev/stdout') {
-		$isatty = -t $lei->{1};
+		my $isatty = $lei->{need_pager} = -t $lei->{1};
 		$opt->{pretty} //= $isatty;
 		if (!$isatty && -f _) {
 			my $fl = fcntl($lei->{1}, F_GETFL, 0) //
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index e41d899e..0ca871ea 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -414,7 +414,7 @@ sub do_query {
 	};
 	($lei->{pkt_op_c}, $lei->{pkt_op_p}) = PublicInbox::PktOp->pair($ops);
 	$lei->{1}->autoflush(1);
-	$lei->start_pager if -t $lei->{1};
+	$lei->start_pager if delete $lei->{need_pager};
 	$lei->{ovv}->ovv_begin($lei);
 	my $l2m = $lei->{l2m};
 	if ($l2m) {

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 05/18] lei q: reinstate early MUA spawn for Maildir
  2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
                   ` (2 preceding siblings ...)
  2021-02-05 12:07 ` [PATCH 04/18] lei q: only start pager if output is to stdout Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
  2021-02-05 12:07 ` [PATCH 06/18] eml: handle warning ignores for lei Eric Wong
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
  To: spew

Once all files are written, we can use utime() to poke Maildirs
to wake up MUAs that fail to account for nanosecond timestamps
resolution.
---
 lib/PublicInbox/LEI.pm        |  1 +
 lib/PublicInbox/LeiToMail.pm  | 13 +++++++++++++
 lib/PublicInbox/LeiXSearch.pm | 15 +++++++++------
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 0d4b1c11..24efb494 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -739,6 +739,7 @@ sub start_mua {
 	} elsif ($self->{oneshot}) {
 		$self->{"mua.pid.$self.$$"} = spawn(\@cmd);
 	}
+	delete $self->{-progress};
 }
 
 # caller needs to "-t $self->{1}" to check if tty
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
index f9250860..5a6f18fb 100644
--- a/lib/PublicInbox/LeiToMail.pm
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -365,6 +365,7 @@ sub new {
 	} else {
 		die "bad mail --format=$fmt\n";
 	}
+	$self->{dst} = $dst;
 	$lei->{dedupe} = PublicInbox::LeiDedupe->new($lei);
 	$self;
 }
@@ -474,6 +475,18 @@ sub ipc_atfork_child {
 	$self->SUPER::ipc_atfork_child;
 }
 
+sub lock_free {
+	$_[0]->{base_type} =~ /\A(?:maildir|mh|imap|jmap)\z/ ? 1 : 0;
+}
+
+sub poke_dst {
+	my ($self) = @_;
+	if ($self->{base_type} eq 'maildir') {
+		my $t = time + 1;
+		utime($t, $t, "$self->{dst}/cur");
+	}
+}
+
 sub write_mail { # via ->wq_do
 	my ($self, $git_dir, $smsg) = @_;
 	my $not_done = delete $self->{0} // die 'BUG: $not_done missing';
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index 0ca871ea..e7f0ef63 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -308,13 +308,13 @@ sub git {
 
 sub query_done { # EOF callback for main daemon
 	my ($lei) = @_;
-	my $has_l2m = exists $lei->{l2m};
-	for my $f (qw(lxs l2m)) {
-		my $wq = delete $lei->{$f} or next;
-		$wq->wq_wait_old($lei);
+	my $l2m = delete $lei->{l2m};
+	$l2m->wq_wait_old($lei) if $l2m;
+	if (my $lxs = delete $lei->{lxs}) {
+		$lxs->wq_wait_old($lei);
 	}
 	$lei->{ovv}->ovv_end($lei);
-	if ($has_l2m) { # close() calls LeiToMail reap_compress
+	if ($l2m) { # close() calls LeiToMail reap_compress
 		if (my $out = delete $lei->{old_1}) {
 			if (my $mbout = $lei->{1}) {
 				close($mbout) or return $lei->fail(<<"");
@@ -323,7 +323,7 @@ Error closing $lei->{ovv}->{dst}: $!
 			}
 			$lei->{1} = $out;
 		}
-		$lei->start_mua;
+		$l2m->lock_free ? $l2m->poke_dst : $lei->start_mua;
 	}
 	$lei->{-progress} and
 		$lei->err('# ', $lei->{-mset_total} // 0, " matches");
@@ -355,6 +355,9 @@ sub concurrency {
 
 sub start_query { # always runs in main (lei-daemon) process
 	my ($self, $lei) = @_;
+	if (my $l2m = $lei->{l2m}) {
+		$lei->start_mua if $l2m->lock_free;
+	}
 	if ($lei->{opt}->{thread}) {
 		for my $ibxish (locals($self)) {
 			$self->wq_do('query_thread_mset', [], $ibxish);

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 06/18] eml: handle warning ignores for lei
  2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
                   ` (3 preceding siblings ...)
  2021-02-05 12:07 ` [PATCH 05/18] lei q: reinstate early MUA spawn for Maildir Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
  2021-02-05 12:07 ` [PATCH 07/18] lei q: eliminate $not_done temporary git dir hack Eric Wong
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
  To: spew

There's nothing we can do about bad emails in our search
results, so quiet things down and don't fight the MUA for
the terminal.
---
 lib/PublicInbox/Admin.pm         |  7 +++----
 lib/PublicInbox/Eml.pm           | 19 +++++++++++++++++++
 lib/PublicInbox/InboxWritable.pm | 24 +-----------------------
 lib/PublicInbox/LeiToMail.pm     |  1 +
 lib/PublicInbox/Watch.pm         | 14 ++++++--------
 5 files changed, 30 insertions(+), 35 deletions(-)

diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm
index f96397ea..3b38a5a3 100644
--- a/lib/PublicInbox/Admin.pm
+++ b/lib/PublicInbox/Admin.pm
@@ -10,6 +10,7 @@ our @EXPORT_OK = qw(setup_signals);
 use PublicInbox::Config;
 use PublicInbox::Inbox;
 use PublicInbox::Spawn qw(popen_rd);
+use PublicInbox::Eml;
 *rel2abs_collapsed = \&PublicInbox::Config::rel2abs_collapsed;
 
 sub setup_signals {
@@ -241,12 +242,10 @@ sub index_inbox {
 	}
 	local %SIG = %SIG;
 	setup_signals(\&index_terminate, $ibx);
-	my $warn_cb = $SIG{__WARN__} // \&CORE::warn;
 	my $idx = { current_info => $ibx->{inboxdir} };
-	my $warn_ignore = PublicInbox::InboxWritable->can('warn_ignore');
 	local $SIG{__WARN__} = sub {
-		return if $warn_ignore->(@_);
-		$warn_cb->($idx->{current_info}, ': ', @_);
+		return if PublicInbox::Eml::warn_ignore(@_);
+		warn($idx->{current_info}, ': ', @_);
 	};
 	if (ref($ibx) && $ibx->version == 2) {
 		eval { require PublicInbox::V2Writable };
diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm
index bd27f19b..f7f62e7b 100644
--- a/lib/PublicInbox/Eml.pm
+++ b/lib/PublicInbox/Eml.pm
@@ -477,6 +477,25 @@ sub charset_set {
 
 sub crlf { $_[0]->{crlf} // "\n" }
 
+# warnings to ignore when handling spam mailboxes and maybe other places
+sub warn_ignore {
+	my $s = "@_";
+	# Email::Address::XS warnings
+	$s =~ /^Argument contains empty address at /
+	|| $s =~ /^Element at index [0-9]+ contains /
+	# PublicInbox::MsgTime
+	|| $s =~ /^bogus TZ offset: .+?, ignoring and assuming \+0000/
+	|| $s =~ /^bad Date: .+? in /
+	# Encode::Unicode::UTF7
+	|| $s =~ /^Bad UTF7 data escape at /
+}
+
+# this expects to be RHS in this assignment: "local $SIG{__WARN__} = ..."
+sub warn_ignore_cb {
+	my $cb = $SIG{__WARN__} // \&CORE::warn;
+	sub { $cb->(@_) unless warn_ignore(@_) }
+}
+
 sub willneed { re_memo($_) for @_ }
 
 willneed(qw(From To Cc Date Subject Content-Type In-Reply-To References
diff --git a/lib/PublicInbox/InboxWritable.pm b/lib/PublicInbox/InboxWritable.pm
index 982ad6e5..3a4012cd 100644
--- a/lib/PublicInbox/InboxWritable.pm
+++ b/lib/PublicInbox/InboxWritable.pm
@@ -9,7 +9,7 @@ use parent qw(PublicInbox::Inbox Exporter);
 use PublicInbox::Import;
 use PublicInbox::Filter::Base qw(REJECT);
 use Errno qw(ENOENT);
-our @EXPORT_OK = qw(eml_from_path warn_ignore_cb);
+our @EXPORT_OK = qw(eml_from_path);
 
 use constant {
 	PERM_UMASK => 0,
@@ -277,28 +277,6 @@ sub cleanup ($) {
 	delete @{$_[0]}{qw(over mm git search)};
 }
 
-# warnings to ignore when handling spam mailboxes and maybe other places
-sub warn_ignore {
-	my $s = "@_";
-	# Email::Address::XS warnings
-	$s =~ /^Argument contains empty address at /
-	|| $s =~ /^Element at index [0-9]+ contains /
-	# PublicInbox::MsgTime
-	|| $s =~ /^bogus TZ offset: .+?, ignoring and assuming \+0000/
-	|| $s =~ /^bad Date: .+? in /
-	# Encode::Unicode::UTF7
-	|| $s =~ /^Bad UTF7 data escape at /
-}
-
-# this expects to be RHS in this assignment: "local $SIG{__WARN__} = ..."
-sub warn_ignore_cb {
-	my $cb = $SIG{__WARN__} // \&CORE::warn;
-	sub {
-		return if warn_ignore(@_);
-		$cb->(@_);
-	}
-}
-
 # v2+ only, XXX: maybe we can just rely on ->max_git_epoch and remove
 sub git_dir_latest {
 	my ($self, $max) = @_;
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
index 5a6f18fb..1f815e40 100644
--- a/lib/PublicInbox/LeiToMail.pm
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -472,6 +472,7 @@ sub ipc_atfork_child {
 		close $zpipe->[0];
 	}
 	$self->{wcb} = $self->write_cb($lei);
+	$SIG{__WARN__} = PublicInbox::Eml::warn_ignore_cb();
 	$self->SUPER::ipc_atfork_child;
 }
 
diff --git a/lib/PublicInbox/Watch.pm b/lib/PublicInbox/Watch.pm
index 2b44ba43..185e5da8 100644
--- a/lib/PublicInbox/Watch.pm
+++ b/lib/PublicInbox/Watch.pm
@@ -7,7 +7,7 @@ package PublicInbox::Watch;
 use strict;
 use v5.10.1;
 use PublicInbox::Eml;
-use PublicInbox::InboxWritable qw(eml_from_path warn_ignore_cb);
+use PublicInbox::InboxWritable qw(eml_from_path);
 use PublicInbox::Filter::Base qw(REJECT);
 use PublicInbox::Spamcheck;
 use PublicInbox::Sigfd;
@@ -174,7 +174,7 @@ sub _remove_spam {
 	# path must be marked as (S)een
 	$path =~ /:2,[A-R]*S[T-Za-z]*\z/ or return;
 	my $eml = eml_from_path($path) or return;
-	local $SIG{__WARN__} = warn_ignore_cb();
+	local $SIG{__WARN__} = PublicInbox::Eml::warn_ignore_cb();
 	$self->{pi_cfg}->each_inbox(\&remove_eml_i, $self, $eml, $path);
 }
 
@@ -414,13 +414,11 @@ sub imap_import_msg ($$$$$) {
 			import_eml($self, $ibx, $eml);
 		}
 	} elsif ($inboxes eq 'watchspam') {
-		# we don't remove unseen messages
-		if ($flags =~ /\\Seen\b/) {
-			local $SIG{__WARN__} = warn_ignore_cb();
-			my $eml = PublicInbox::Eml->new($raw);
-			$self->{pi_cfg}->each_inbox(\&remove_eml_i,
+		return if $flags !~ /\\Seen\b/; # don't remove unseen messages
+		local $SIG{__WARN__} = PublicInbox::Eml::warn_ignore_cb();
+		my $eml = PublicInbox::Eml->new($raw);
+		$self->{pi_cfg}->each_inbox(\&remove_eml_i,
 						$self, $eml, "$url UID:$uid");
-		}
 	} else {
 		die "BUG: destination unknown $inboxes";
 	}

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 07/18] lei q: eliminate $not_done temporary git dir hack
  2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
                   ` (4 preceding siblings ...)
  2021-02-05 12:07 ` [PATCH 06/18] eml: handle warning ignores for lei Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
  2021-02-05 12:07 ` [PATCH 08/18] lei_query: remove uneeded dwaitpid import Eric Wong
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
  To: spew

Another step towards simplifying lei internals.

None of our current uses of ->wq_do involve FD passing, and the
plan is only rely on FD passing between lei-daemon and lei(1).
Internally, it ought to be possible for lei-daemon internal bits
to be ordered properly to not need FD passing.
---
 lib/PublicInbox/LeiOverview.pm | 23 ++---------------------
 lib/PublicInbox/LeiToMail.pm   |  3 +--
 lib/PublicInbox/LeiXSearch.pm  | 16 ++++++++++++----
 3 files changed, 15 insertions(+), 27 deletions(-)

diff --git a/lib/PublicInbox/LeiOverview.pm b/lib/PublicInbox/LeiOverview.pm
index 3125f015..d3df4faa 100644
--- a/lib/PublicInbox/LeiOverview.pm
+++ b/lib/PublicInbox/LeiOverview.pm
@@ -147,17 +147,6 @@ sub _unbless_smsg {
 
 sub ovv_atexit_child {
 	my ($self, $lei) = @_;
-	if (my $l2m = $lei->{l2m}) {
-		# wait for ->write_mail work we submitted to lei2mail
-		if (my $rd = delete $l2m->{each_smsg_done}) {
-			read($rd, my $buf, 1); # wait for EOF
-		}
-	}
-	# order matters, git->{-tmp}->DESTROY must not fire until
-	# {each_smsg_done} hits EOF above
-	if (my $git = delete $self->{git}) {
-		$git->async_wait_all;
-	}
 	if (my $bref = delete $lei->{ovv_buf}) {
 		my $lk = $self->lock_for_scope;
 		$lei->out($$bref);
@@ -213,19 +202,11 @@ sub ovv_each_smsg_cb { # runs in wq worker usually
 			$wcb->(undef, $smsg, $eml);
 		};
 	} elsif ($l2m && $l2m->{-wq_s1}) {
-		# $io->[0] becomes a notification pipe that triggers EOF
-		# in this wq worker when all outstanding ->write_mail
-		# calls are complete
-		my $io = [];
-		pipe($l2m->{each_smsg_done}, $io->[0]) or die "pipe: $!";
-		fcntl($io->[0], 1031, 4096) if $^O eq 'linux'; # F_SETPIPE_SZ
-		my $git = $ibxish->git; # (LeiXSearch|Inbox|ExtSearch)->git
-		$self->{git} = $git;
-		my $git_dir = $git->{git_dir};
+		my $git_dir = $ibxish->git->{git_dir};
 		sub {
 			my ($smsg, $mitem) = @_;
 			$smsg->{pct} = get_pct($mitem) if $mitem;
-			$l2m->wq_do('write_mail', $io, $git_dir, $smsg);
+			$l2m->wq_do('write_mail', [], $git_dir, $smsg);
 		}
 	} elsif ($self->{fmt} =~ /\A(concat)?json\z/ && $lei->{opt}->{pretty}) {
 		my $EOR = ($1//'') eq 'concat' ? "\n}" : "\n},";
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
index 1f815e40..4f847221 100644
--- a/lib/PublicInbox/LeiToMail.pm
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -490,10 +490,9 @@ sub poke_dst {
 
 sub write_mail { # via ->wq_do
 	my ($self, $git_dir, $smsg) = @_;
-	my $not_done = delete $self->{0} // die 'BUG: $not_done missing';
 	my $git = $self->{"$$\0$git_dir"} //= PublicInbox::Git->new($git_dir);
 	git_async_cat($git, $smsg->{blob}, \&git_to_mail,
-				[$self->{wcb}, $smsg, $not_done]);
+				[$self->{wcb}, $smsg]);
 }
 
 sub wq_atexit_child {
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index e7f0ef63..2dc44414 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -287,12 +287,15 @@ sub query_remote_mboxrd {
 	$lei->{ovv}->ovv_atexit_child($lei);
 }
 
-sub git {
+# called by LeiOverview::each_smsg_cb
+sub git { $_[0]->{git_tmp} // die 'BUG: caller did not set {git_tmp}' }
+
+sub git_tmp ($) {
 	my ($self) = @_;
 	my (%seen, @dirs);
-	my $tmp = File::Temp->newdir('lei_xsrch_git-XXXXXXXX', TMPDIR => 1);
-	for my $ibx (@{$self->{shard2ibx} // []}) {
-		my $d = File::Spec->canonpath($ibx->git->{git_dir});
+	my $tmp = File::Temp->newdir("lei_xsearch_git.$$-XXXX", TMPDIR => 1);
+	for my $ibxish (locals($self)) {
+		my $d = File::Spec->canonpath($ibxish->git->{git_dir});
 		$seen{$d} //= push @dirs, "$d/objects\n"
 	}
 	my $git_dir = $tmp->dirname;
@@ -428,6 +431,11 @@ sub do_query {
 		# 1031: F_SETPIPE_SZ
 		fcntl($lei->{startq}, 1031, 4096) if $^O eq 'linux';
 	}
+	if (!$lei->{opt}->{thread} && locals($self)) { # for query_mset
+		# lei->{git_tmp} is set for wq_wait_old so we don't
+		# delete until all lei2mail + lei_xsearch workers are reaped
+		$lei->{git_tmp} = $self->{git_tmp} = git_tmp($self);
+	}
 	$self->wq_workers_start('lei_xsearch', $self->{jobs},
 				$lei->oldset, { lei => $lei });
 	my $op = delete $lei->{pkt_op_c};

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 08/18] lei_query: remove uneeded dwaitpid import
  2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
                   ` (5 preceding siblings ...)
  2021-02-05 12:07 ` [PATCH 07/18] lei q: eliminate $not_done temporary git dir hack Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
  2021-02-05 12:07 ` [PATCH 09/18] lei_xsearch: drop unused imports Eric Wong
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
  To: spew

All process management is handled elsewhere.
---
 lib/PublicInbox/LeiQuery.pm | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/PublicInbox/LeiQuery.pm b/lib/PublicInbox/LeiQuery.pm
index 6b1aa40c..56350386 100644
--- a/lib/PublicInbox/LeiQuery.pm
+++ b/lib/PublicInbox/LeiQuery.pm
@@ -5,7 +5,6 @@
 package PublicInbox::LeiQuery;
 use strict;
 use v5.10.1;
-use PublicInbox::DS qw(dwaitpid);
 
 sub prep_ext { # externals_each callback
 	my ($lxs, $exclude, $loc) = @_;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 09/18] lei_xsearch: drop unused imports
  2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
                   ` (6 preceding siblings ...)
  2021-02-05 12:07 ` [PATCH 08/18] lei_query: remove uneeded dwaitpid import Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
  2021-02-05 12:07 ` [PATCH 10/18] lei import: initial implementation Eric Wong
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
  To: spew

Reaping is handled by the parent PublicInbox::IPC, and we
have no business using PublicInbox::Import since LeiXSearch
won't write to git directly (it will write via LeiStore).
---
 lib/PublicInbox/LeiXSearch.pm | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index 2dc44414..daf42098 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -8,9 +8,8 @@ package PublicInbox::LeiXSearch;
 use strict;
 use v5.10.1;
 use parent qw(PublicInbox::LeiSearch PublicInbox::IPC);
-use PublicInbox::DS qw(dwaitpid now);
+use PublicInbox::DS qw(now);
 use PublicInbox::PktOp qw(pkt_do);
-use PublicInbox::Import;
 use File::Temp 0.19 (); # 0.19 for ->newdir
 use File::Spec ();
 use PublicInbox::Search qw(xap_terms);

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 10/18] lei import: initial implementation
  2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
                   ` (7 preceding siblings ...)
  2021-02-05 12:07 ` [PATCH 09/18] lei_xsearch: drop unused imports Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
  2021-02-05 12:07 ` [PATCH 11/18] lei: favor "keywords" over "flags", test --no-kw Eric Wong
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
  To: spew

Only tested with .eml files so far, but Maildir + IMAP
will be supported.
---
 MANIFEST                      |   1 +
 lib/PublicInbox/IPC.pm        |   4 +-
 lib/PublicInbox/LEI.pm        |  48 ++++++++++++---
 lib/PublicInbox/LeiImport.pm  | 106 ++++++++++++++++++++++++++++++++++
 lib/PublicInbox/LeiStore.pm   |  18 ++++++
 lib/PublicInbox/LeiXSearch.pm |  18 +-----
 t/lei.t                       |  15 +++++
 7 files changed, 184 insertions(+), 26 deletions(-)
 create mode 100644 lib/PublicInbox/LeiImport.pm

diff --git a/MANIFEST b/MANIFEST
index 6922f9b1..a11d4106 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -179,6 +179,7 @@ lib/PublicInbox/KQNotify.pm
 lib/PublicInbox/LEI.pm
 lib/PublicInbox/LeiDedupe.pm
 lib/PublicInbox/LeiExternal.pm
+lib/PublicInbox/LeiImport.pm
 lib/PublicInbox/LeiOverview.pm
 lib/PublicInbox/LeiQuery.pm
 lib/PublicInbox/LeiSearch.pm
diff --git a/lib/PublicInbox/IPC.pm b/lib/PublicInbox/IPC.pm
index 7f5a3f6f..a0e6bfee 100644
--- a/lib/PublicInbox/IPC.pm
+++ b/lib/PublicInbox/IPC.pm
@@ -101,7 +101,7 @@ sub ipc_worker_loop ($$$) {
 
 # starts a worker if Sereal or Storable is installed
 sub ipc_worker_spawn {
-	my ($self, $ident, $oldset) = @_;
+	my ($self, $ident, $oldset, $fields) = @_;
 	return unless $enc; # no Sereal or Storable
 	return if ($self->{-ipc_ppid} // -1) == $$; # idempotent
 	delete(@$self{qw(-ipc_req -ipc_res -ipc_ppid -ipc_pid)});
@@ -123,6 +123,8 @@ sub ipc_worker_spawn {
 		# ensure we properly exit even if warn() dies:
 		my $end = PublicInbox::OnDestroy->new($$, sub { exit(!!$@) });
 		eval {
+			$fields //= {};
+			local @$self{keys %$fields} = values(%$fields);
 			my $on_destroy = $self->ipc_atfork_child;
 			local %SIG = %SIG;
 			ipc_worker_loop($self, $r_req, $w_res);
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 24efb494..682d1bd1 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -160,9 +160,10 @@ our %CMD = ( # sorted in order of importance/use:
 'forget-watch' => [ '{WATCH_NUMBER|--prune}', 'stop and forget a watch',
 	qw(prune) ],
 
-'import' => [ 'URL_OR_PATHNAME|--stdin',
-	'one-shot import/update from URL or filesystem',
-	qw(stdin| offset=i recursive|r exclude=s include=s !flags),
+'import' => [ 'URLS_OR_PATHNAMES...|--stdin',
+	'one-time import/update from URL or filesystem',
+	qw(stdin| offset=i recursive|r exclude=s include|I=s
+	format|f=s flags!),
 	],
 
 'config' => [ '[...]', sub {
@@ -194,8 +195,8 @@ our %CMD = ( # sorted in order of importance/use:
 # $spec => [@ALLOWED_VALUES (default is first), $description],
 # $spec => $description
 # "$SUB_COMMAND TAB $spec" => as above
-my $stdin_formats = [ 'IN|auto|raw|mboxrd|mboxcl2|mboxcl|mboxo',
-		'specify message input format' ];
+my $stdin_formats = [ 'MAIL_FORMAT|eml|mboxrd|mboxcl2|mboxcl|mboxo',
+			'specify message input format' ];
 my $ls_format = [ 'OUT|plain|json|null', 'listing output format' ];
 
 my %OPTDESC = (
@@ -240,6 +241,8 @@ my %OPTDESC = (
 'q	jobs=s'	=> [ '[SEARCH_JOBS][,WRITER_JOBS]',
 		'control number of search and writer jobs' ],
 
+'import format|f=s' => $stdin_formats,
+
 'ls-query	format|f=s' => $ls_format,
 'ls-external	format|f=s' => $ls_format,
 
@@ -319,6 +322,20 @@ sub err ($;@) {
 
 sub qerr ($;@) { $_[0]->{opt}->{quiet} or err(shift, @_) }
 
+sub fail_handler ($;$$) {
+	my ($lei, $code, $io) = @_;
+	for my $f (qw(imp lxs l2m)) {
+		my $wq = delete $lei->{$f} or next;
+		$wq->wq_wait_old($lei) if $wq->wq_kill_old; # lei-daemon
+	}
+	close($io) if $io; # needed to avoid warnings on SIGPIPE
+	$lei->x_it($code // (1 >> 8));
+}
+
+sub sigpipe_handler { # handles SIGPIPE from l2m/lxs workers
+	fail_handler($_[0], 13, delete $_[0]->{1});
+}
+
 sub fail ($$;$) {
 	my ($self, $buf, $exit_code) = @_;
 	err($self, $buf) if defined $buf;
@@ -340,7 +357,8 @@ sub out ($;@) {
 sub puts ($;@) { out(shift, map { "$_\n" } @_) }
 
 sub child_error { # passes non-fatal curl exit codes to user
-	my ($self, $child_error) = @_; # child_error is $?
+	my ($self, $child_error, $msg) = @_; # child_error is $?
+	$self->err($msg) if $msg;
 	if (my $s = $self->{pkt_op_p} // $self->{sock}) {
 		# send to the parent lei-daemon or to lei(1) client
 		send($s, "child_error $child_error", MSG_EOR);
@@ -357,9 +375,16 @@ sub note_sigpipe { # triggers sigpipe_handler
 }
 
 sub lei_atfork_child {
-	my ($self) = @_;
+	my ($self, $persist) = @_;
 	# we need to explicitly close things which are on stack
-	delete $self->{0};
+	if ($persist) {
+		my @io = delete @$self{0,1,2};
+		unless ($self->{oneshot}) {
+			close($_) for @io;
+		}
+	} else {
+		delete $self->{0};
+	}
 	for (delete @$self{qw(3 sock old_1 au_done)}) {
 		close($_) if defined($_);
 	}
@@ -374,7 +399,7 @@ sub lei_atfork_child {
 	%PATH2CFG = ();
 	undef $errors_log;
 	$quit = \&CORE::exit;
-	$current_lei = $self; # for SIG{__WARN__}
+	$current_lei = $persist ? undef : $self; # for SIG{__WARN__}
 }
 
 sub _help ($;$) {
@@ -606,6 +631,11 @@ sub lei_config {
 	x_it($self, $?) if $?;
 }
 
+sub lei_import {
+	require PublicInbox::LeiImport;
+	PublicInbox::LeiImport->call(@_);
+}
+
 sub lei_init {
 	my ($self, $dir) = @_;
 	my $cfg = _lei_cfg($self, 1);
diff --git a/lib/PublicInbox/LeiImport.pm b/lib/PublicInbox/LeiImport.pm
new file mode 100644
index 00000000..4a9af8a7
--- /dev/null
+++ b/lib/PublicInbox/LeiImport.pm
@@ -0,0 +1,106 @@
+# Copyright (C) 2021 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# front-end for the "lei import" sub-command
+package PublicInbox::LeiImport;
+use strict;
+use v5.10.1;
+use parent qw(PublicInbox::IPC);
+use PublicInbox::MboxReader;
+use PublicInbox::Eml;
+
+sub _import_eml { # MboxReader callback
+	my ($eml, $sto, $set_kw) = @_;
+	$sto->ipc_do('set_eml', $eml, $set_kw ? $sto->mbox_keywords($eml) : ());
+}
+
+sub import_done { # EOF callback for main daemon
+	my ($lei) = @_;
+	my $imp = delete $lei->{imp};
+	$imp->wq_wait_old($lei) if $imp;
+	my $wait = $lei->{sto}->ipc_do('done');
+	$lei->dclose;
+}
+
+sub call { # the main "lei import" method
+	my ($cls, $lei, @argv) = @_;
+	my $sto = $lei->_lei_store(1);
+	$sto->write_prepare($lei);
+	$lei->{opt}->{flags} //= 1;
+	my $fmt = $lei->{opt}->{'format'};
+	my $self = $lei->{imp} = bless {}, $cls;
+	return $lei->fail('--format unspecified') if !$fmt;
+	$self->{0} = $lei->{0} if $lei->{opt}->{stdin};
+	my $ops = {
+		'!' => [ $lei->can('fail_handler'), $lei ],
+		'x_it' => [ $lei->can('x_it'), $lei ],
+		'child_error' => [ $lei->can('child_error'), $lei ],
+		'' => [ \&import_done, $lei ],
+	};
+	($lei->{pkt_op_c}, $lei->{pkt_op_p}) = PublicInbox::PktOp->pair($ops);
+	my $j = $lei->{opt}->{jobs} // scalar(@argv) || 1;
+	my $nproc = $self->detect_nproc;
+	$j = $nproc if $j > $nproc;
+	$self->wq_workers_start('lei_import', $j, $lei->oldset, {lei => $lei});
+	my $op = delete $lei->{pkt_op_c};
+	delete $lei->{pkt_op_p};
+	$self->wq_do('import_stdin', []) if $self->{0};
+	for my $x (@argv) {
+		$self->wq_do('import_path_url', [], $x);
+	}
+	$self->wq_close(1);
+	$lei->event_step_init; # wait for shutdowns
+	if ($lei->{oneshot}) {
+		while ($op->{sock}) { $op->event_step }
+	}
+}
+
+sub ipc_atfork_child {
+	my ($self) = @_;
+	$self->{lei}->lei_atfork_child;
+	$self->SUPER::ipc_atfork_child;
+}
+
+sub _import_fh {
+	my ($lei, $fh, $x) = @_;
+	my $set_kw = $lei->{opt}->{flags};
+	my $fmt = $lei->{opt}->{'format'};
+	eval {
+		if ($fmt eq 'eml') {
+			my $buf = do { local $/; <$fh> } //
+				return $lei->child_error(1 >> 8, <<"");
+		error reading $x: $!
+
+			my $eml = PublicInbox::Eml->new(\$buf);
+			_import_eml($eml, $lei->{sto}, $set_kw);
+		} else { # some mbox
+			my $cb = PublicInbox::MboxReader->can($fmt);
+			$cb or return $lei->child_error(1 >> 8, <<"");
+	--format $fmt unsupported for $x
+
+			$cb->(undef, $fh, \&_import_eml, $lei->{sto}, $set_kw);
+		}
+	};
+	$lei->child_error(1 >> 8, "<stdin>: $@") if $@;
+}
+
+sub import_path_url {
+	my ($self, $x) = @_;
+	my $lei = $self->{lei};
+	# TODO auto-detect?
+	if (-f $x) {
+		open my $fh, '<', $x or return $lei->child_error(1 >> 8, <<"");
+unable to open $x: $!
+
+		_import_fh($lei, $fh, $x);
+	} else {
+		$lei->fail("$x unsupported (TODO)");
+	}
+}
+
+sub import_stdin {
+	my ($self) = @_;
+	_import_fh($self->{lei}, $self->{0}, '<stdin>');
+}
+
+1;
diff --git a/lib/PublicInbox/LeiStore.pm b/lib/PublicInbox/LeiStore.pm
index a7d7d953..3a215973 100644
--- a/lib/PublicInbox/LeiStore.pm
+++ b/lib/PublicInbox/LeiStore.pm
@@ -17,6 +17,7 @@ use PublicInbox::V2Writable;
 use PublicInbox::ContentHash qw(content_hash content_digest);
 use PublicInbox::MID qw(mids mids_in);
 use PublicInbox::LeiSearch;
+use PublicInbox::MDA;
 use List::Util qw(max);
 
 sub new {
@@ -237,4 +238,21 @@ sub done {
 	die $err if $err;
 }
 
+sub ipc_atfork_child {
+	my ($self) = @_;
+	my $lei = delete $self->{lei};
+	$lei->lei_atfork_child(1) if $lei;
+	$self->SUPER::ipc_atfork_child;
+}
+
+sub write_prepare {
+	my ($self, $lei) = @_;
+	$self->ipc_lock_init;
+	# Mail we import into lei are private, so headers filtered out
+	# by -mda for public mail are not appropriate
+	local @PublicInbox::MDA::BAD_HEADERS = ();
+	$self->ipc_worker_spawn('lei_store', $lei->oldset, { lei => $lei });
+	$lei->{sto} = $self;
+}
+
 1;
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index daf42098..f8068362 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -392,25 +392,11 @@ sub query_prepare { # called by wq_do
 	pkt_do($lei->{pkt_op_p}, '.') == 1 or die "do_post_augment trigger: $!"
 }
 
-sub fail_handler ($;$$) {
-	my ($lei, $code, $io) = @_;
-	for my $f (qw(lxs l2m)) {
-		my $wq = delete $lei->{$f} or next;
-		$wq->wq_wait_old($lei) if $wq->wq_kill_old; # lei-daemon
-	}
-	close($io) if $io; # needed to avoid warnings on SIGPIPE
-	$lei->x_it($code // (1 >> 8));
-}
-
-sub sigpipe_handler { # handles SIGPIPE from l2m/lxs workers
-	fail_handler($_[0], 13, delete $_[0]->{1});
-}
-
 sub do_query {
 	my ($self, $lei) = @_;
 	my $ops = {
-		'|' => [ \&sigpipe_handler, $lei ],
-		'!' => [ \&fail_handler, $lei ],
+		'|' => [ $lei->can('sigpipe_handler'), $lei ],
+		'!' => [ $lei->can('fail_handler'), $lei ],
 		'.' => [ \&do_post_augment, $lei ],
 		'' => [ \&query_done, $lei ],
 		'mset_progress' => [ \&mset_progress, $lei ],
diff --git a/t/lei.t b/t/lei.t
index a08a6d0d..eb824a30 100644
--- a/t/lei.t
+++ b/t/lei.t
@@ -389,6 +389,20 @@ SKIP: {
 }; # /SKIP
 };
 
+my $test_import = sub {
+	$cleanup->();
+	ok($lei->(qw(q s:boolean)), 'search miss before import');
+	unlike($out, qr/boolean/i, 'no results, yet');
+	open my $fh, '<', 't/data/0001.patch' or BAIL_OUT $!;
+	ok($lei->([qw(import -f eml -)], undef, { %$opt, 0 => $fh }),
+		'import single file from stdin');
+	close $fh;
+	ok($lei->(qw(q s:boolean)), 'search hit after import');
+	ok($lei->(qw(import -f eml), 't/data/message_embed.eml'),
+		'import single file by path');
+	$cleanup->();
+};
+
 my $test_lei_common = sub {
 	$test_help->();
 	$test_config->();
@@ -396,6 +410,7 @@ my $test_lei_common = sub {
 	$test_external->();
 	$test_completion->();
 	$test_fail->();
+	$test_import->();
 };
 
 if ($ENV{TEST_LEI_ONESHOT}) {

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 11/18] lei: favor "keywords" over "flags", test --no-kw
  2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
                   ` (8 preceding siblings ...)
  2021-02-05 12:07 ` [PATCH 10/18] lei import: initial implementation Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
  2021-02-05 12:07 ` [PATCH 12/18] lei: fix completion of --no-kw / --no-keywords Eric Wong
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
  To: spew

JMAP brain says "keywords", IMAP brain says "flags";
JMAP brain wins today.

Since "keywords" is a bit long, support "kw" as a shortcut since
there's no conflict and "kw:" will be our search prefix for
looking up messages by keyword.
---
 lib/PublicInbox/LEI.pm       |  7 ++++---
 lib/PublicInbox/LeiImport.pm |  4 ++--
 t/lei.t                      | 21 ++++++++++++++++++++-
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 682d1bd1..b058b533 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -131,7 +131,7 @@ our %CMD = ( # sorted in order of importance/use:
 	'exclude mail matching From: or thread from non-Message-ID searches',
 	qw(stdin| thread|t from|f=s mid=s oid=s) ],
 'mark' => [ 'MESSAGE_FLAGS...',
-	'set/unset flags on message(s) from stdin',
+	'set/unset keywords on message(s) from stdin',
 	qw(stdin| oid=s exact by-mid|mid:s) ],
 'forget' => [ '[--stdin|--oid=OID|--by-mid=MID]',
 	"exclude message(s) on stdin from `q' search results",
@@ -152,7 +152,8 @@ our %CMD = ( # sorted in order of importance/use:
 
 'add-watch' => [ '[URL_OR_PATHNAME]',
 		'watch for new messages and flag changes',
-	qw(import! flags! interval=s recursive|r exclude=s include=s) ],
+	qw(import! kw|keywords|flags! interval=s recursive|r
+	exclude=s include=s) ],
 'ls-watch' => [ '[FILTER...]', 'list active watches with numbers and status',
 		qw(format|f=s z) ],
 'pause-watch' => [ '[WATCH_NUMBER_OR_FILTER]', qw(all local remote) ],
@@ -163,7 +164,7 @@ our %CMD = ( # sorted in order of importance/use:
 'import' => [ 'URLS_OR_PATHNAMES...|--stdin',
 	'one-time import/update from URL or filesystem',
 	qw(stdin| offset=i recursive|r exclude=s include|I=s
-	format|f=s flags!),
+	format|f=s kw|keywords|flags!),
 	],
 
 'config' => [ '[...]', sub {
diff --git a/lib/PublicInbox/LeiImport.pm b/lib/PublicInbox/LeiImport.pm
index 4a9af8a7..2c7cbf2b 100644
--- a/lib/PublicInbox/LeiImport.pm
+++ b/lib/PublicInbox/LeiImport.pm
@@ -26,7 +26,7 @@ sub call { # the main "lei import" method
 	my ($cls, $lei, @argv) = @_;
 	my $sto = $lei->_lei_store(1);
 	$sto->write_prepare($lei);
-	$lei->{opt}->{flags} //= 1;
+	$lei->{opt}->{kw} //= 1;
 	my $fmt = $lei->{opt}->{'format'};
 	my $self = $lei->{imp} = bless {}, $cls;
 	return $lei->fail('--format unspecified') if !$fmt;
@@ -63,7 +63,7 @@ sub ipc_atfork_child {
 
 sub _import_fh {
 	my ($lei, $fh, $x) = @_;
-	my $set_kw = $lei->{opt}->{flags};
+	my $set_kw = $lei->{opt}->{kw};
 	my $fmt = $lei->{opt}->{'format'};
 	eval {
 		if ($fmt eq 'eml') {
diff --git a/t/lei.t b/t/lei.t
index eb824a30..41d854e8 100644
--- a/t/lei.t
+++ b/t/lei.t
@@ -400,7 +400,26 @@ my $test_import = sub {
 	ok($lei->(qw(q s:boolean)), 'search hit after import');
 	ok($lei->(qw(import -f eml), 't/data/message_embed.eml'),
 		'import single file by path');
-	$cleanup->();
+
+	my $str = <<'';
+From: a@b
+Message-ID: <x@y>
+Status: RO
+
+	ok($lei->([qw(import -f eml -)], undef, { %$opt, 0 => \$str }),
+		'import single file with keywords from stdin');
+	$lei->(qw(q m:x@y));
+	my $res = $json->decode($out);
+	is($res->[1], undef, 'only one result');
+	is_deeply($res->[0]->{kw}, ['seen'], "message `seen' keyword set");
+
+	$str =~ tr/x/v/; # v@y
+	ok($lei->([qw(import --no-kw -f eml -)], undef, { %$opt, 0 => \$str }),
+		'import single file with --no-kw from stdin');
+	$lei->(qw(q m:v@y));
+	$res = $json->decode($out);
+	is($res->[1], undef, 'only one result');
+	is_deeply($res->[0]->{kw}, [], 'no keywords set');
 };
 
 my $test_lei_common = sub {

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 12/18] lei: fix completion of --no-kw / --no-keywords
  2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
                   ` (9 preceding siblings ...)
  2021-02-05 12:07 ` [PATCH 11/18] lei: favor "keywords" over "flags", test --no-kw Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
  2021-02-05 12:07 ` [PATCH 13/18] t/spawn: blocking examples for ProcessPipe Eric Wong
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
  To: spew

We did not complete --no-* flags properly when multiple options
are allowed.
---
 lib/PublicInbox/LEI.pm | 9 ++++++---
 t/lei.t                | 8 +++++++-
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index b058b533..8d5a921e 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -274,6 +274,8 @@ my %OPTDESC = (
 'by-mid|mid:s' => [ 'MID', 'match only by Message-ID, ignoring contents' ],
 'jobs:i' => 'set parallelism level',
 
+'kw|keywords|flags!' => 'disable/enable importing flags',
+
 # xargs, env, use "-0", git(1) uses "-z".  We support z|0 everywhere
 'z|0' => 'use NUL \\0 instead of newline (CR) to delimit lines',
 
@@ -425,7 +427,7 @@ sub _help ($;$) {
 		my (@vals, @s, @l);
 		my $x = $sw;
 		if ($x =~ s/!\z//) { # solve! => --no-solve
-			$x = "no-$x";
+			$x =~ s/(\A|\|)/$1no-/g
 		} elsif ($x =~ s/:.+//) { # optional args: $x = "mid:s"
 			@vals = (' [', undef, ']');
 		} elsif ($x =~ s/=.+//) { # required arg: $x = "type=s"
@@ -710,8 +712,9 @@ sub lei__complete {
 		}
 		puts $self, grep(/$re/, map { # generate short/long names
 			if (s/[:=].+\z//) { # req/optional args, e.g output|o=i
-			} else { # negation: solve! => no-solve|solve
-				s/\A(.+)!\z/no-$1|$1/;
+			} elsif (s/!\z//) {
+				# negation: solve! => no-solve|solve
+				s/([\w\-]+)/$1|no-$1/g
 			}
 			map {
 				my $x = length > 1 ? "--$_" : "-$_";
diff --git a/t/lei.t b/t/lei.t
index 41d854e8..df333957 100644
--- a/t/lei.t
+++ b/t/lei.t
@@ -363,7 +363,7 @@ my $test_completion = sub {
 			--mua --mua-cmd --no-local --local --verbose -v
 			--save-as --no-remote --remote --torsocks
 			--reverse -r )) {
-		ok($out{$sw}, "$sw offered as completion");
+		ok($out{$sw}, "$sw offered as `lei q' completion");
 	}
 
 	ok($lei->(qw(_complete lei q --form)), 'complete q --format');
@@ -376,6 +376,12 @@ my $test_completion = sub {
 			ok($out{$f}, "got $sw $f as output format");
 		}
 	}
+	ok($lei->(qw(_complete lei import)), 'complete import');
+	%out = map { $_ => 1 } split(/\s+/s, $out);
+	for my $sw (qw(--flags --no-flags --no-kw --kw --no-keywords
+			--keywords)) {
+		ok($out{$sw}, "$sw offered as `lei import' completion");
+	}
 };
 
 my $test_fail = sub {

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 13/18] t/spawn: blocking examples for ProcessPipe
  2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
                   ` (10 preceding siblings ...)
  2021-02-05 12:07 ` [PATCH 12/18] lei: fix completion of --no-kw / --no-keywords Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
  2021-02-05 12:07 ` [PATCH 14/18] lei: abort lei_import worker on client abort Eric Wong
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
  To: spew

---
 t/spawn.t | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/t/spawn.t b/t/spawn.t
index 0eed79bb..6f811ec1 100644
--- a/t/spawn.t
+++ b/t/spawn.t
@@ -77,6 +77,11 @@ EOF
 {
 	my $fh = popen_rd([qw(printf foo\nbar)]);
 	ok(fileno($fh) >= 0, 'tied fileno works');
+	my $tfh = (tied *$fh)->{fh};
+	is($tfh->blocking(0), 1, '->blocking was true');
+	is($tfh->blocking, 0, '->blocking is false');
+	is($tfh->blocking(1), 0, '->blocking was true');
+	is($tfh->blocking, 1, '->blocking is true');
 	my @line = <$fh>;
 	is_deeply(\@line, [ "foo\n", 'bar' ], 'wantarray works on readline');
 }

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 14/18] lei: abort lei_import worker on client abort
  2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
                   ` (11 preceding siblings ...)
  2021-02-05 12:07 ` [PATCH 13/18] t/spawn: blocking examples for ProcessPipe Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
  2021-02-05 12:07 ` [PATCH 15/18] lei: @WQ_KEYS Eric Wong
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
  To: spew

---
 lib/PublicInbox/LEI.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 8d5a921e..d10ab170 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -856,7 +856,7 @@ sub accept_dispatch { # Listener {post_accept} callback
 sub dclose {
 	my ($self) = @_;
 	delete $self->{-progress};
-	for my $f (qw(lxs l2m)) {
+	for my $f (qw(lxs l2m imp)) {
 		my $wq = delete $self->{$f} or next;
 		if ($wq->wq_kill) {
 			$wq->wq_close

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 15/18] lei: @WQ_KEYS
  2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
                   ` (12 preceding siblings ...)
  2021-02-05 12:07 ` [PATCH 14/18] lei: abort lei_import worker on client abort Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
  2021-02-05 12:07 ` [PATCH 16/18] init: lowercase -j for --jobs Eric Wong
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
  To: spew

---
 lib/PublicInbox/LEI.pm | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index d10ab170..28ad88e7 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -286,6 +286,8 @@ my %CONFIG_KEYS = (
 	'leistore.dir' => 'top-level storage location',
 );
 
+my @WQ_KEYS = qw(lxs l2m imp); # internal workers
+
 # pronounced "exit": x_it(1 << 8) => exit(1); x_it(13) => SIGPIPE
 sub x_it ($$) {
 	my ($self, $code) = @_;
@@ -296,7 +298,7 @@ sub x_it ($$) {
 		send($s, "x_it $code", MSG_EOR);
 	} elsif ($self->{oneshot}) {
 		# don't want to end up using $? from child processes
-		for my $f (qw(lxs l2m)) {
+		for my $f (@WQ_KEYS) {
 			my $wq = delete $self->{$f} or next;
 			$wq->DESTROY;
 		}
@@ -327,7 +329,7 @@ sub qerr ($;@) { $_[0]->{opt}->{quiet} or err(shift, @_) }
 
 sub fail_handler ($;$$) {
 	my ($lei, $code, $io) = @_;
-	for my $f (qw(imp lxs l2m)) {
+	for my $f (@WQ_KEYS) {
 		my $wq = delete $lei->{$f} or next;
 		$wq->wq_wait_old($lei) if $wq->wq_kill_old; # lei-daemon
 	}
@@ -335,7 +337,7 @@ sub fail_handler ($;$$) {
 	$lei->x_it($code // (1 >> 8));
 }
 
-sub sigpipe_handler { # handles SIGPIPE from l2m/lxs workers
+sub sigpipe_handler { # handles SIGPIPE from @WQ_KEYS workers
 	fail_handler($_[0], 13, delete $_[0]->{1});
 }
 
@@ -856,7 +858,7 @@ sub accept_dispatch { # Listener {post_accept} callback
 sub dclose {
 	my ($self) = @_;
 	delete $self->{-progress};
-	for my $f (qw(lxs l2m imp)) {
+	for my $f (@WQ_KEYS) {
 		my $wq = delete $self->{$f} or next;
 		if ($wq->wq_kill) {
 			$wq->wq_close

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 16/18] init: lowercase -j for --jobs
  2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
                   ` (13 preceding siblings ...)
  2021-02-05 12:07 ` [PATCH 15/18] lei: @WQ_KEYS Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
  2021-02-05 12:07 ` [PATCH 17/18] lei: add-external --mirror support Eric Wong
  2021-02-05 12:07 ` [PATCH 18/18] lei_query: trim curl options Eric Wong
  16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
  To: spew

This is taken from common implementations of make(1).
---
 script/public-inbox-init | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/script/public-inbox-init b/script/public-inbox-init
index 6a867a22..e93cab73 100755
--- a/script/public-inbox-init
+++ b/script/public-inbox-init
@@ -24,7 +24,7 @@ options:
   --ng NEWSGROUP      set NNTP newsgroup name
   --skip-artnum=NUM   NNTP article numbers to skip
   --skip-epoch=NUM    epochs to skip (-V2 only)
-  -J JOBS             number of indexing jobs (-V2 only), (default: 4)
+  -j JOBS             number of indexing jobs (-V2 only), (default: 4)
 
 See public-inbox-init(1) man page for full documentation.
 EOF

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 17/18] lei: add-external --mirror support
  2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
                   ` (14 preceding siblings ...)
  2021-02-05 12:07 ` [PATCH 16/18] init: lowercase -j for --jobs Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
  2021-02-05 12:07 ` [PATCH 18/18] lei_query: trim curl options Eric Wong
  16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
  To: spew

---
 MANIFEST                       |   2 +
 lib/PublicInbox/Admin.pm       |   7 +-
 lib/PublicInbox/LEI.pm         |   2 +-
 lib/PublicInbox/LeiCurl.pm     |  65 ++++++++
 lib/PublicInbox/LeiExternal.pm |  26 ++-
 lib/PublicInbox/LeiMirror.pm   | 283 +++++++++++++++++++++++++++++++++
 lib/PublicInbox/LeiXSearch.pm  |  32 +---
 7 files changed, 379 insertions(+), 38 deletions(-)
 create mode 100644 lib/PublicInbox/LeiCurl.pm
 create mode 100644 lib/PublicInbox/LeiMirror.pm

diff --git a/MANIFEST b/MANIFEST
index a11d4106..ab692d28 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -177,9 +177,11 @@ lib/PublicInbox/InputPipe.pm
 lib/PublicInbox/Isearch.pm
 lib/PublicInbox/KQNotify.pm
 lib/PublicInbox/LEI.pm
+lib/PublicInbox/LeiCurl.pm
 lib/PublicInbox/LeiDedupe.pm
 lib/PublicInbox/LeiExternal.pm
 lib/PublicInbox/LeiImport.pm
+lib/PublicInbox/LeiMirror.pm
 lib/PublicInbox/LeiOverview.pm
 lib/PublicInbox/LeiQuery.pm
 lib/PublicInbox/LeiSearch.pm
diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm
index 3b38a5a3..b21fb241 100644
--- a/lib/PublicInbox/Admin.pm
+++ b/lib/PublicInbox/Admin.pm
@@ -273,8 +273,8 @@ EOM
 	$idx->{nidx} // 0; # returns number processed
 }
 
-sub progress_prepare ($) {
-	my ($opt) = @_;
+sub progress_prepare ($;$) {
+	my ($opt, $dst) = @_;
 
 	# public-inbox-index defaults to quiet, -xcpdb and -compact do not
 	if (defined($opt->{quiet}) && $opt->{quiet} < 0) {
@@ -286,7 +286,8 @@ sub progress_prepare ($) {
 		$opt->{1} = $null; # suitable for spawn() redirect
 	} else {
 		$opt->{verbose} ||= 1;
-		$opt->{-progress} = sub { print STDERR @_ };
+		$dst //= *STDERR{GLOB};
+		$opt->{-progress} = sub { print $dst @_ };
 	}
 }
 
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 28ad88e7..3ef19918 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -115,7 +115,7 @@ our %CMD = ( # sorted in order of importance/use:
 
 'add-external' => [ 'URL_OR_PATHNAME',
 	'add/set priority of a publicinbox|extindex for extra matches',
-	qw(boost=i quiet|q) ],
+	qw(boost=i quiet|q verbose|v c=s@ mirror=s no-torsocks torsocks=s) ],
 'ls-external' => [ '[FILTER...]', 'list publicinbox|extindex locations',
 	qw(format|f=s z|0 local remote quiet|q) ],
 'forget-external' => [ 'URL_OR_PATHNAME...|--prune',
diff --git a/lib/PublicInbox/LeiCurl.pm b/lib/PublicInbox/LeiCurl.pm
new file mode 100644
index 00000000..c8747d4f
--- /dev/null
+++ b/lib/PublicInbox/LeiCurl.pm
@@ -0,0 +1,65 @@
+# Copyright (C) 2021 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# common option and torsocks(1) wrapping for curl(1)
+package PublicInbox::LeiCurl;
+use strict;
+use v5.10.1;
+use PublicInbox::Spawn qw(which);
+use PublicInbox::Config;
+
+# prepares a common command for curl(1) based on $lei command
+sub new {
+	my ($cls, $lei, $curl) = @_;
+	$curl //= which('curl') // return $lei->fail('curl not found');
+	my $opt = $lei->{opt};
+	my @cmd = ($curl, qw(-Sf));
+	$cmd[-1] .= 's' if $opt->{quiet}; # already the default for "lei q"
+	$cmd[-1] .= 'v' if $opt->{verbose}; # we use ourselves, too
+	for my $o ($lei->curl_opt) {
+		$o =~ s/\|[a-z0-9]\b//i; # remove single char short option
+		if ($o =~ s/=[is]@\z//) {
+			my $ary = $opt->{$o} or next;
+			push @cmd, map { ("--$o", $_) } @$ary;
+		} elsif ($o =~ s/=[is]\z//) {
+			my $val = $opt->{$o} // next;
+			push @cmd, "--$o", $val;
+		} elsif ($opt->{$o}) {
+			push @cmd, "--$o";
+		}
+	}
+	push @cmd, '-v' if $opt->{verbose}; # lei uses this itself
+	bless \@cmd, $cls;
+}
+
+sub torsocks { # useful for "git clone" and "git fetch", too
+	my ($self, $lei, $uri)= @_;
+	my $opt = $lei->{opt};
+	$opt->{torsocks} = 'false' if $opt->{'no-torsocks'};
+	my $torsocks = $opt->{torsocks} //= 'auto';
+	if ($torsocks eq 'auto' && substr($uri->host, -6) eq '.onion' &&
+			(($lei->{env}->{LD_PRELOAD}//'') !~ /torsocks/)) {
+		# "auto" continues anyways if torsocks is missing;
+		# a proxy may be specified via CLI, curlrc,
+		# environment variable, or even firewall rule
+		[ ($lei->{torsocks} //= which('torsocks')) // () ]
+	} elsif (PublicInbox::Config::git_bool($torsocks)) {
+		my $x = $lei->{torsocks} //= which('torsocks');
+		$x or return $lei->fail(<<EOM);
+--torsocks=yes specified but torsocks not found in PATH=$ENV{PATH}
+EOM
+		[ $x ];
+	} else { # the common case for current Internet :<
+		[];
+	}
+}
+
+# completes the result of cmd() for $uri
+sub for_uri {
+	my ($self, $lei, $uri) = @_;
+	my $pfx = torsocks($self, $lei, $uri) or return; # error
+	[ @$pfx, @$self, substr($uri->path, -3) eq '.gz' ? () : '--compressed',
+		$uri->as_string ]
+}
+
+1;
diff --git a/lib/PublicInbox/LeiExternal.pm b/lib/PublicInbox/LeiExternal.pm
index accacf1a..53c222b7 100644
--- a/lib/PublicInbox/LeiExternal.pm
+++ b/lib/PublicInbox/LeiExternal.pm
@@ -88,14 +88,10 @@ sub get_externals {
 	();
 }
 
-sub lei_add_external {
+sub add_external_finish {
 	my ($self, $location) = @_;
 	my $cfg = $self->_lei_cfg(1);
 	my $new_boost = $self->{opt}->{boost} // 0;
-	$location = ext_canonicalize($location);
-	if ($location !~ m!\Ahttps?://! && !-d $location) {
-		return $self->fail("$location not a directory");
-	}
 	my $key = "external.$location.boost";
 	my $cur_boost = $cfg->{$key};
 	return if defined($cur_boost) && $cur_boost == $new_boost; # idempotent
@@ -103,6 +99,26 @@ sub lei_add_external {
 	$self->_lei_store(1)->done; # just create the store
 }
 
+sub lei_add_external {
+	my ($self, $location) = @_;
+	my $new_boost = $self->{opt}->{boost} // 0;
+	$location = ext_canonicalize($location);
+	my $mirror = $self->{opt}->{mirror};
+	if (defined($mirror) && -d $location) {
+		$self->fail(<<""); # TODO: did you mean "update-external?"
+--mirror destination `$location' already exists
+
+	}
+	if ($location !~ m!\Ahttps?://! && !-d $location) {
+		$mirror // return $self->fail("$location not a directory");
+		$mirror = ext_canonicalize($mirror);
+		require PublicInbox::LeiMirror;
+		PublicInbox::LeiMirror->start($self, $mirror => $location);
+	} else {
+		add_external_finish($self, $location);
+	}
+}
+
 sub lei_forget_external {
 	my ($self, @locations) = @_;
 	my $cfg = $self->_lei_cfg(1);
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
new file mode 100644
index 00000000..21212657
--- /dev/null
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -0,0 +1,283 @@
+# Copyright (C) 2021 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# "lei add-external --mirror" support
+package PublicInbox::LeiMirror;
+use strict;
+use v5.10.1;
+use parent qw(PublicInbox::IPC);
+use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
+use PublicInbox::Spawn qw(popen_rd spawn);
+use PublicInbox::PktOp;
+
+sub mirror_done { # EOF callback for main daemon
+	my ($lei) = @_;
+	my $mrr = delete $lei->{mrr};
+	$mrr->wq_wait_old($lei) if $mrr;
+	$lei->dclose;
+}
+
+# for old installations without manifest.js.gz
+sub try_scrape {
+	my ($self) = @_;
+	my $uri = URI->new($self->{src});
+	my $lei = $self->{lei};
+	my $curl = $self->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
+	my $cmd = $curl->for_uri($lei, $uri);
+	my $opt = { 0 => $lei->{0}, 2 => $lei->{2} };
+	my $fh = popen_rd($cmd, $lei->{env}, $opt);
+	my $html = do { local $/; <$fh> } // die "read(curl $uri): $!";
+	close($fh) or return $lei->child_error($?, "@$cmd failed");
+
+	# we grep with URL below, we don't want Subject/From headers
+	# making us clone random URLs
+	my @urls = ($html =~ m!\bgit clone --mirror ([a-z\+]+://\S+)!g);
+	my $url = $uri->as_string;
+	chop($url) eq '/' or die "BUG: $uri not canonicalized";
+
+	# since this is for old instances w/o manifest.js.gz, try v1 first
+	return clone_v1($self) if grep(m!\A\Q$url\E/*\z!, @urls);
+	if (my @v2_urls = grep(m!\A\Q$url\E/[0-9]+\z!, @urls)) {
+		my %v2_uris = map { $_ => URI->new($_) } @v2_urls; # uniq
+		return clone_v2($self, [ values %v2_uris ]);
+	}
+
+	# filter out common URLs served by WWW (e.g /$MSGID/T/)
+	if (@urls && $url =~ s!/+[^/]+\@[^/]+/.*\z!! &&
+			grep(m!\A\Q$url\E/*\z!, @urls)) {
+		die <<"";
+E: confused by scraping <$uri>, did you mean <$url>?
+
+	}
+	@urls and die <<"";
+E: confused by scraping <$uri>, got ambiguous results:
+@urls
+
+	die "E: scraping <$uri> revealed nothing\n";
+}
+
+sub clone_cmd {
+	my ($lei) = @_;
+	my @cmd = qw(git);
+	# we support "-c $key=$val" for arbitrary git config options
+	# e.g.: git -c http.proxy=socks5h://127.0.0.1:9050
+	push(@cmd, '-c', $_) for @{$lei->{opt}->{c} // []};
+	push @cmd, qw(clone --mirror);
+	push @cmd, '-q' if $lei->{opt}->{quiet};
+	push @cmd, '-v' if $lei->{opt}->{verbose};
+	# XXX any other options to support?
+	# --reference is tricky with multiple epochs...
+	@cmd;
+}
+
+# tries the relatively new /$INBOX/_/text/config/raw endpoint
+sub _try_config {
+	my ($self) = @_;
+	my $dst = $self->{dst};
+	if (!-d $dst || !mkdir($dst)) {
+		require File::Path;
+		File::Path::mkpath($dst);
+		-d $dst or die "mkpath($dst): $!\n";
+	}
+	my $uri = URI->new($self->{src});
+	my $lei = $self->{lei};
+	my $path = $uri->path;
+	chop($path) eq '/' or die "BUG: $uri not canonicalized";
+	$uri->path($path . '/_/text/config/raw');
+	my $cmd = $self->{curl}->for_uri($lei, $uri);
+	push @$cmd, '--compressed'; # curl decompresses for us
+	my $ce = "$dst/inbox.config.example";
+	my $f = "$ce-$$.tmp";
+	open(my $fh, '+>', $f) or return $lei->err("open $f: $! (non-fatal)");
+	my $opt = { 0 => $lei->{0}, 1 => $fh, 2 => $lei->{2} };
+	$lei->err("# @$cmd") if $lei->{opt}->{verbose};
+	my $pid = spawn($cmd, $lei->{env}, $opt);
+	waitpid($pid, 0) == $pid or return $lei->err("waitpid @$cmd: $!");
+	if (($? >> 8) == 22) { # 404 missing
+		unlink($f) if -s $fh == 0;
+		return;
+	}
+	return $lei->err("# @$cmd failed (non-fatal)") if $?;
+	rename($f, $ce) or return $lei->err("link($f, $ce): $! (non-fatal)");
+	my $cfg = PublicInbox::Config::git_config_dump($f);
+	my $ibx = $self->{ibx} = {};
+	for my $sec (grep(/\Apublicinbox\./, @{$cfg->{-section_order}})) {
+		for (qw(address newsgroup nntpmirror)) {
+			$ibx->{$_} = $cfg->{"$sec.$_"};
+		}
+	}
+}
+
+sub index_cloned_inbox {
+	my ($self, $iv) = @_;
+	my $ibx = delete($self->{ibx}) // {
+		address => [ 'lei@example.com' ],
+		version => $iv,
+	};
+	$ibx->{inboxdir} = $self->{dst};
+	PublicInbox::Inbox->new($ibx);
+	PublicInbox::InboxWritable->new($ibx);
+	my $opt = {};
+	my $lei = $self->{lei};
+	for (qw(fsync jobs indexlevel compact max_size batch_size
+			quiet verbose sequential_shard skip-docdata)) {
+		$opt->{$_} = $lei->{opt}->{$_};
+	}
+	# force synchronous dwaitpid for v2:
+	local $PublicInbox::DS::in_loop = 0;
+	PublicInbox::Admin::progress_prepare($opt, $lei->{2});
+	PublicInbox::Admin::index_inbox($ibx, undef, $opt);
+}
+
+sub clone_v1 {
+	my ($self) = @_;
+	my $lei = $self->{lei};
+	my $curl = $self->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
+	my $uri = URI->new($self->{src});
+	my $pfx = $curl->torsocks($lei, $uri) or return;
+	my $cmd = [ @$pfx, clone_cmd($lei), $uri->as_string, $self->{dst} ];
+	$lei->err("# @$cmd") if $lei->{opt}->{verbose};
+	my $pid = spawn($cmd, $lei->{env}, $lei);
+	waitpid($pid, 0) == $pid or die "BUG: waitpid @$cmd: $!";
+	$? == 0 or return $lei->child_error($?, "@$cmd failed");
+	_try_config($self);
+	index_cloned_inbox($self, 1);
+}
+
+sub clone_v2 {
+	my ($self, $v2_uris) = @_;
+	my $lei = $self->{lei};
+	my $curl = $self->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
+	my $pfx //= $curl->torsocks($lei, $v2_uris->[0]) or return;
+	my @epochs;
+	my $dst = $self->{dst};
+	my @src_edst;
+	for my $uri (@$v2_uris) {
+		my $src = $uri->as_string;
+		my $edst = $dst;
+		$src =~ m!/([0-9]+)(?:\.git)?\z! or die <<"";
+failed to extract epoch number from $src
+
+		my $nr = $1 + 0;
+		$edst .= "/git/$nr.git";
+		push @src_edst, [ $src, $edst ];
+	}
+	my $lk = bless { lock_path => "$dst/inbox.lock" }, 'PublicInbox::Lock';
+	_try_config($self);
+	my $on_destroy = $lk->lock_for_scope($$);
+	my @cmd = clone_cmd($lei);
+	while (my $pair = shift(@src_edst)) {
+		my $cmd = [ @$pfx, @cmd, @$pair ];
+		$lei->err("# @$cmd") if $lei->{opt}->{verbose};
+		my $pid = spawn($cmd, $lei->{env}, $lei);
+		waitpid($pid, 0) == $pid or die "BUG: waitpid @$cmd: $!";
+		$? == 0 or return $lei->child_error($?, "@$cmd failed");
+	}
+	undef $on_destroy; # unlock
+	index_cloned_inbox($self, 2);
+}
+
+sub try_manifest {
+	my ($self) = @_;
+	my $uri = URI->new($self->{src});
+	my $lei = $self->{lei};
+	my $curl = $self->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
+	my $path = $uri->path;
+	chop($path) eq '/' or die "BUG: $uri not canonicalized";
+	$uri->path($path . '/manifest.js.gz');
+	my $cmd = $curl->for_uri($lei, $uri);
+	$lei->err("# @$cmd") if $lei->{opt}->{verbose};
+	my $opt = { 0 => $lei->{0}, 2 => $lei->{2} };
+	my $fh = popen_rd($cmd, $lei->{env}, $opt);
+	my $gz = do { local $/; <$fh> } // die "read(curl $uri): $!";
+	unless (close $fh) {
+		return try_scrape($self) if ($? >> 8) == 22; # 404 missing
+		return $lei->child_error($?, "@$cmd failed");
+	}
+	my $js;
+	gunzip(\$gz => \$js, MultiStream => 1) or
+		die "gunzip($uri): $GunzipError";
+	my $m = eval { PublicInbox::Config->json->decode($js) };
+	die "$uri: error decoding `$js': $@" if $@;
+	ref($m) eq 'HASH' or die "$uri unknown type: ".ref($m);
+
+	my $v1_bare = $m->{$path};
+	my @v2_epochs = grep(m!\A\Q$path\E/git/[0-9]+\.git\z!, keys %$m);
+	if (@v2_epochs) {
+		# It may be possible to have v1 + v2 in parallel someday:
+		$lei->err(<<EOM) if defined $v1_bare;
+# `$v1_bare' appears to be a v1 inbox while v2 epochs exist:
+# @v2_epochs
+# ignoring $v1_bare (use --inbox-version=1 to force v1 instead)
+EOM
+		@v2_epochs = map { $uri->path($_); $uri->clone } @v2_epochs;
+		clone_v2($self, \@v2_epochs);
+	} elsif ($v1_bare) {
+		clone_v1($self);
+	} elsif (my @maybe = grep(m!\Q$path\E!, keys %$m)) {
+		die "E: confused by <$uri>, possible matches:\n@maybe";
+	} else {
+		die "E: confused by <$uri>";
+	}
+}
+
+sub start_clone_url {
+	my ($self) = @_;
+	return try_manifest($self) if $self->{src} =~ m!\Ahttps?://!;
+	die "TODO: non-HTTP/HTTPS clone of $self->{src} not supported, yet";
+}
+
+sub do_mirror { # via wq_do
+	my ($self) = @_;
+	my $lei = $self->{lei};
+	eval {
+		my $iv = $lei->{opt}->{'inbox-version'};
+		if (defined $iv) {
+			return clone_v1($self) if $iv == 1;
+			return try_scrape($self) if $iv == 2;
+			die "bad --inbox-version=$iv\n";
+		}
+		return start_clone_url($self) if $self->{src} =~ m!://!;
+		die "TODO: cloning local directories not supported, yet";
+	};
+	return $lei->fail($@) if $@;
+	$lei->qerr("# mirrored $self->{src} => $self->{dst}");
+}
+
+sub start {
+	my ($cls, $lei, $src, $dst) = @_;
+	my $self = bless { lei => $lei, src => $src, dst => $dst }, $cls;
+	$lei->_lei_store(1)->write_prepare($lei);
+	if ($src =~ m!https?://!) {
+		require URI;
+		require PublicInbox::LeiCurl;
+	}
+	require PublicInbox::Lock;
+	require PublicInbox::Inbox;
+	require PublicInbox::Admin;
+	require PublicInbox::InboxWritable;
+	my $ops = {
+		'!' => [ $lei->can('fail_handler'), $lei ],
+		'x_it' => [ $lei->can('x_it'), $lei ],
+		'child_error' => [ $lei->can('child_error'), $lei ],
+		'' => [ \&mirror_done, $lei ],
+	};
+	($lei->{pkt_op_c}, $lei->{pkt_op_p}) = PublicInbox::PktOp->pair($ops);
+	$self->wq_workers_start('lei_mirror', 1, $lei->oldset, {lei => $lei});
+	my $op = delete $lei->{pkt_op_c};
+	delete $lei->{pkt_op_p};
+	$self->wq_do('do_mirror', []);
+	$self->wq_close(1);
+	$lei->event_step_init; # wait for shutdowns
+	if ($lei->{oneshot}) {
+		while ($op->{sock}) { $op->event_step }
+	}
+}
+
+sub ipc_atfork_child {
+	my ($self) = @_;
+	$self->{lei}->lei_atfork_child;
+	$self->SUPER::ipc_atfork_child;
+}
+
+1;
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index f8068362..7085d164 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -212,7 +212,6 @@ sub query_remote_mboxrd {
 	my ($opt, $env) = @$lei{qw(opt env)};
 	my @qform = (q => $lei->{mset_opt}->{qstr}, x => 'm');
 	push(@qform, t => 1) if $opt->{thread};
-	my @cmd = ($self->{curl}, qw(-sSf -d), '');
 	my $verbose = $opt->{verbose};
 	my $reap;
 	my $cerr = File::Temp->new(TEMPLATE => 'curl.err-XXXX', TMPDIR => 1);
@@ -223,43 +222,17 @@ sub query_remote_mboxrd {
 		# spawn a process to force line-buffering, otherwise curl
 		# will write 1 character at-a-time and parallel outputs
 		# mmmaaayyy llloookkk llliiikkkeee ttthhhiiisss
-		push @cmd, '-v';
 		my $o = { 1 => $lei->{2}, 2 => $lei->{2} };
 		my $pid = spawn(['tail', '-f', $cerr->filename], undef, $o);
 		$reap = PublicInbox::OnDestroy->new(\&kill_reap, $pid);
 	}
-	for my $o ($lei->curl_opt) {
-		$o =~ s/\|[a-z0-9]\b//i; # remove single char short option
-		if ($o =~ s/=[is]@\z//) {
-			my $ary = $opt->{$o} or next;
-			push @cmd, map { ("--$o", $_) } @$ary;
-		} elsif ($o =~ s/=[is]\z//) {
-			my $val = $opt->{$o} // next;
-			push @cmd, "--$o", $val;
-		} elsif ($opt->{$o}) {
-			push @cmd, "--$o";
-		}
-	}
-	$opt->{torsocks} = 'false' if $opt->{'no-torsocks'};
-	my $tor = $opt->{torsocks} //= 'auto';
+	my $curl = PublicInbox::LeiCurl->new($lei, $self->{curl}) or return;
 	my $each_smsg = $lei->{ovv}->ovv_each_smsg_cb($lei);
 	for my $uri (@$uris) {
 		$lei->{-current_url} = $uri->as_string;
 		$lei->{-nr_remote_eml} = 0;
 		$uri->query_form(@qform);
-		my $cmd = [ @cmd, $uri->as_string ];
-		if ($tor eq 'auto' && substr($uri->host, -6) eq '.onion' &&
-				(($env->{LD_PRELOAD}//'') !~ /torsocks/)) {
-			unshift @$cmd, which('torsocks');
-		} elsif (PublicInbox::Config::git_bool($tor)) {
-			unshift @$cmd, which('torsocks');
-		}
-
-		# continue anyways if torsocks is missing; a proxy may be
-		# specified via CLI, curlrc, environment variable, or even
-		# firewall rule
-		shift(@$cmd) if !$cmd->[0];
-
+		my $cmd = $curl->for_uri($lei, $uri);
 		$lei->err("# @$cmd") if $verbose;
 		my ($fh, $pid) = popen_rd($cmd, $env, $rdr);
 		$fh = IO::Uncompress::Gunzip->new($fh);
@@ -440,6 +413,7 @@ sub add_uri {
 	if (my $curl = $self->{curl} //= which('curl') // 0) {
 		require PublicInbox::MboxReader;
 		require IO::Uncompress::Gunzip;
+		require PublicInbox::LeiCurl;
 		push @{$self->{remotes}}, $uri;
 	} else {
 		warn "curl missing, ignoring $uri\n";

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 18/18] lei_query: trim curl options
  2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
                   ` (15 preceding siblings ...)
  2021-02-05 12:07 ` [PATCH 17/18] lei: add-external --mirror support Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
  16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
  To: spew

---
 lib/PublicInbox/LeiQuery.pm | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/LeiQuery.pm b/lib/PublicInbox/LeiQuery.pm
index 56350386..7c856032 100644
--- a/lib/PublicInbox/LeiQuery.pm
+++ b/lib/PublicInbox/LeiQuery.pm
@@ -152,18 +152,21 @@ sub _complete_q {
 # with other "lei q" switches.
 # FIXME: Getopt::Long doesn't easily let us support support options with
 # '.' in them (e.g. --http1.1)
+# TODO: should we depend on "-c http.*" options for things which have
+# analogues in git(1)? that would reduce likelyhood of conflicts with
+# our other CLI options
 sub curl_opt { qw(
 	abstract-unix-socket=s anyauth basic cacert=s capath=s
-	cert-status cert-type cert|E=s ciphers=s config|K=s@
-	connect-timeout=s connect-to=s cookie-jar|c=s cookie|b=s crlfile=s
+	cert-status cert-type cert=s ciphers=s config|K=s@
+	connect-timeout=s connect-to=s cookie-jar=s cookie=s crlfile=s
 	digest disable dns-interface=s dns-ipv4-addr=s dns-ipv6-addr=s
 	dns-servers=s doh-url=s egd-file=s engine=s false-start
 	happy-eyeballs-timeout-ms=s haproxy-protocol header|H=s@
-	http2-prior-knowledge http2 insecure|k
+	http2-prior-knowledge http2 insecure
 	interface=s ipv4 ipv6 junk-session-cookies
-	key-type=s key=s limit-rate=s local-port=s location-trusted location|L
+	key-type=s key=s limit-rate=s local-port=s location-trusted location
 	max-redirs=i max-time=s negotiate netrc-file=s netrc-optional netrc
-	no-alpn no-buffer|N no-npn no-sessionid noproxy=s ntlm-wb ntlm
+	no-alpn no-buffer no-npn no-sessionid noproxy=s ntlm-wb ntlm
 	pass=s pinnedpubkey=s post301 post302 post303 preproxy=s
 	proxy-anyauth proxy-basic proxy-cacert=s proxy-capath=s
 	proxy-cert-type=s proxy-cert=s proxy-ciphers=s proxy-crlfile=s
@@ -176,7 +179,7 @@ sub curl_opt { qw(
 	retry-connrefused retry-delay=s retry-max-time=s retry=i
 	sasl-ir service-name=s socks4=s socks4a=s socks5-basic
 	socks5-gssapi-service-name=s socks5-gssapi socks5-hostname=s socks5=s
-	speed-limit|Y speed-type|y ssl-allow-beast sslv2 sslv3
+	speed-limit speed-type ssl-allow-beast sslv2 sslv3
 	suppress-connect-headers tcp-fastopen tls-max=s
 	tls13-ciphers=s tlsauthtype=s tlspassword=s tlsuser=s
 	tlsv1 trace-ascii=s trace-time trace=s

^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-02-05 12:08 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
2021-02-05 12:07 ` [PATCH 02/18] ipc: localize fields assignment to prevent circular refs Eric Wong
2021-02-05 12:07 ` [PATCH 03/18] lei q: reorder internals to reduce FD passing Eric Wong
2021-02-05 12:07 ` [PATCH 04/18] lei q: only start pager if output is to stdout Eric Wong
2021-02-05 12:07 ` [PATCH 05/18] lei q: reinstate early MUA spawn for Maildir Eric Wong
2021-02-05 12:07 ` [PATCH 06/18] eml: handle warning ignores for lei Eric Wong
2021-02-05 12:07 ` [PATCH 07/18] lei q: eliminate $not_done temporary git dir hack Eric Wong
2021-02-05 12:07 ` [PATCH 08/18] lei_query: remove uneeded dwaitpid import Eric Wong
2021-02-05 12:07 ` [PATCH 09/18] lei_xsearch: drop unused imports Eric Wong
2021-02-05 12:07 ` [PATCH 10/18] lei import: initial implementation Eric Wong
2021-02-05 12:07 ` [PATCH 11/18] lei: favor "keywords" over "flags", test --no-kw Eric Wong
2021-02-05 12:07 ` [PATCH 12/18] lei: fix completion of --no-kw / --no-keywords Eric Wong
2021-02-05 12:07 ` [PATCH 13/18] t/spawn: blocking examples for ProcessPipe Eric Wong
2021-02-05 12:07 ` [PATCH 14/18] lei: abort lei_import worker on client abort Eric Wong
2021-02-05 12:07 ` [PATCH 15/18] lei: @WQ_KEYS Eric Wong
2021-02-05 12:07 ` [PATCH 16/18] init: lowercase -j for --jobs Eric Wong
2021-02-05 12:07 ` [PATCH 17/18] lei: add-external --mirror support Eric Wong
2021-02-05 12:07 ` [PATCH 18/18] lei_query: trim curl options Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).