* [PATCH 01/18] lei q: delay worker spawn
@ 2021-02-05 12:07 Eric Wong
2021-02-05 12:07 ` [PATCH 02/18] ipc: localize fields assignment to prevent circular refs Eric Wong
` (16 more replies)
0 siblings, 17 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
To: spew
Now that --stdin support is sorted, we can delay spawning
workers until we know the query is ready-to-run.
---
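The validate-first, spawn-later ordering can be sketched roughly as
below.  This is only an illustration: writer_jobs() is a hypothetical
helper, not a function in LeiQuery.pm.  Only the regexp and the
fallback to $nproc are taken from the patch.

```perl
use strict;
use warnings;

# Hypothetical helper: validate a user-supplied writer job count and
# fall back to the processor count.  Returns undef on bad input so the
# caller can fail before any worker has been forked.
sub writer_jobs {
	my ($mj, $nproc) = @_;
	if (defined($mj) && $mj !~ /\A[1-9][0-9]*\z/) {
		return; # caller reports "`$mj' writer jobs must be >= 1"
	}
	$mj // $nproc;
}

my $default = writer_jobs(undef, 4); # no --jobs given, use $nproc
my $explicit = writer_jobs('2', 4); # valid explicit value
my $rejected = writer_jobs('0', 4); # invalid, nothing gets spawned
print "default=$default explicit=$explicit rejected=",
	defined($rejected) ? $rejected : 'undef', "\n";
```

Storing the result (as {jobs} in the patch) instead of forking right
away means a bad value costs nothing more than an error message.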
lib/PublicInbox/LeiQuery.pm | 19 +++++--------------
lib/PublicInbox/LeiXSearch.pm | 6 ++++++
2 files changed, 11 insertions(+), 14 deletions(-)
diff --git a/lib/PublicInbox/LeiQuery.pm b/lib/PublicInbox/LeiQuery.pm
index 4fe40400..6b1aa40c 100644
--- a/lib/PublicInbox/LeiQuery.pm
+++ b/lib/PublicInbox/LeiQuery.pm
@@ -75,21 +75,12 @@ sub lei_q {
$xj ||= $lxs->concurrency($opt); # allow: "--jobs ,$WRITER_ONLY"
my $nproc = $lxs->detect_nproc; # don't memoize, schedtool(1) exists
$xj = $nproc if $xj > $nproc;
- PublicInbox::LeiOverview->new($self) or return;
- $self->atfork_prepare_wq($lxs);
- $lxs->wq_workers_start('lei_xsearch', $xj, $self->oldset);
- delete $lxs->{-ipc_atfork_child_close};
- if (my $l2m = $self->{l2m}) {
- if (defined($mj) && $mj !~ /\A[1-9][0-9]*\z/) {
- return $self->fail("`$mj' writer jobs must be >= 1");
- }
- $mj //= $nproc;
- $self->atfork_prepare_wq($l2m);
- $l2m->wq_workers_start('lei2mail', $mj, $self->oldset);
- delete $l2m->{-ipc_atfork_child_close};
+ $lxs->{jobs} = $xj;
+ if (defined($mj) && $mj !~ /\A[1-9][0-9]*\z/) {
+ return $self->fail("`$mj' writer jobs must be >= 1");
}
-
- # no forking workers after this
+ $self->{l2m}->{jobs} = ($mj // $nproc) if $self->{l2m};
+ PublicInbox::LeiOverview->new($self) or return;
my %mset_opt = map { $_ => $opt->{$_} } qw(thread limit offset);
$mset_opt{asc} = $opt->{'reverse'} ? 1 : 0;
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index 965617b5..ab66717c 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -406,7 +406,13 @@ sub do_query {
$lei->{ovv}->ovv_begin($lei);
my ($au_done, $zpipe);
my $l2m = $lei->{l2m};
+ $lei->atfork_prepare_wq($self);
+ $self->wq_workers_start('lei_xsearch', $self->{jobs}, $lei->oldset);
+ delete $self->{-ipc_atfork_child_close};
if ($l2m) {
+ $lei->atfork_prepare_wq($l2m);
+ $l2m->wq_workers_start('lei2mail', $l2m->{jobs}, $lei->oldset);
+ delete $l2m->{-ipc_atfork_child_close};
pipe($lei->{startq}, $au_done) or die "pipe: $!";
# 1031: F_SETPIPE_SZ
fcntl($lei->{startq}, 1031, 4096) if $^O eq 'linux';
* [PATCH 02/18] ipc: localize fields assignment to prevent circular refs
2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
2021-02-05 12:07 ` [PATCH 03/18] lei q: reorder internals to reduce FD passing Eric Wong
` (15 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
To: spew
Circular references can delay object destruction and lead to
surprising behavior during worker exit, so only assign the fields
for the duration of the worker loop.
---
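A minimal sketch of what "local" on a hash slice does, assuming a
plain hash in place of the real IPC object.  run_worker() and the
field contents are made up for illustration and are not part of
IPC.pm.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for $self and $fields in _wq_worker_start
my $self = { -wq_ident => 'demo' };
my $fields = { lei => 'placeholder for an object referencing $self' };

sub run_worker {
	my ($self, $fields) = @_;
	$fields //= {};
	# the slice is only populated until this sub returns
	local @$self{keys %$fields} = values(%$fields);
	return exists $self->{lei} ? 1 : 0;
}

my $seen_inside = run_worker($self, $fields);
my $left_behind = exists $self->{lei} ? 1 : 0;
print "inside=$seen_inside after=$left_behind\n";
```

Since the keys vanish again at scope exit, $self does not keep a
long-lived reference to anything in $fields, which is what avoids the
cycle.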
lib/PublicInbox/IPC.pm | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/lib/PublicInbox/IPC.pm b/lib/PublicInbox/IPC.pm
index 3873649b..078aaa2c 100644
--- a/lib/PublicInbox/IPC.pm
+++ b/lib/PublicInbox/IPC.pm
@@ -338,7 +338,6 @@ sub _wq_worker_start ($$$) {
srand($seed);
eval { PublicInbox::DS->Reset };
delete @$self{qw(-wq_s1 -wq_workers -wq_ppid)};
- @$self{keys %$fields} = values(%$fields) if $fields;
$SIG{$_} = 'IGNORE' for (qw(PIPE));
$SIG{$_} = 'DEFAULT' for (qw(TTOU TTIN TERM QUIT INT CHLD));
local $0 = $self->{-wq_ident};
@@ -346,6 +345,8 @@ sub _wq_worker_start ($$$) {
# ensure we properly exit even if warn() dies:
my $end = PublicInbox::OnDestroy->new($$, sub { exit(!!$@) });
eval {
+ $fields //= {};
+ local @$self{keys %$fields} = values(%$fields);
my $on_destroy = $self->ipc_atfork_child;
local %SIG = %SIG;
wq_worker_loop($self);
* [PATCH 03/18] lei q: reorder internals to reduce FD passing
2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
2021-02-05 12:07 ` [PATCH 02/18] ipc: localize fields assignment to prevent circular refs Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
2021-02-05 12:07 ` [PATCH 04/18] lei q: only start pager if output is to stdout Eric Wong
` (14 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
To: spew
While FD passing is critical for script/lei <=> lei-daemon,
lei-daemon doesn't need to use it internally if FDs are
created in the proper order before forking.
---
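The general idea of creating FDs before forking, rather than passing
them afterwards, can be sketched as follows.  This is a generic
pipe+fork illustration and does not use any lei internals.

```perl
use strict;
use warnings;

# create the pipe first so the child inherits both ends via fork
pipe(my $r, my $w) or die "pipe: $!";
defined(my $pid = fork) or die "fork: $!";
if ($pid == 0) { # child: keep only the write end
	close $r;
	syswrite($w, "hello from $$\n") // die "syswrite: $!";
	close $w;
	exit(0);
}
close $w; # parent: keep only the read end
my $line = <$r>; # no SCM_RIGHTS needed, the FD was inherited
waitpid($pid, 0);
my $child_ok = ($? == 0) ? 1 : 0;
print "parent read: $line";
```

Each side closes the end it does not need, which is roughly what
lei_atfork_child does for the FDs a worker must not hold open.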
lib/PublicInbox/IPC.pm | 3 --
lib/PublicInbox/LEI.pm | 99 +++++++---------------------------
lib/PublicInbox/LeiOverview.pm | 28 +++-------
lib/PublicInbox/LeiToMail.pm | 28 ++++++----
lib/PublicInbox/LeiXSearch.pm | 97 ++++++++++++++++-----------------
5 files changed, 92 insertions(+), 163 deletions(-)
diff --git a/lib/PublicInbox/IPC.pm b/lib/PublicInbox/IPC.pm
index 078aaa2c..7f5a3f6f 100644
--- a/lib/PublicInbox/IPC.pm
+++ b/lib/PublicInbox/IPC.pm
@@ -464,9 +464,6 @@ sub DESTROY {
ipc_worker_stop($self);
}
-# Sereal doesn't have dclone
-sub deep_clone { ipc_thaw(ipc_freeze($_[-1])) }
-
sub detect_nproc () {
# _SC_NPROCESSORS_ONLN = 84 on both Linux glibc and musl
return POSIX::sysconf(84) if $^O eq 'linux';
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 49deed13..0d4b1c11 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -286,7 +286,7 @@ sub x_it ($$) {
# make sure client sees stdout before exit
$self->{1}->autoflush(1) if $self->{1};
dump_and_clear_log();
- if (my $s = $self->{pkt_op} // $self->{sock}) {
+ if (my $s = $self->{pkt_op_p} // $self->{sock}) {
send($s, "x_it $code", MSG_EOR);
} elsif ($self->{oneshot}) {
# don't want to end up using $? from child processes
@@ -322,7 +322,8 @@ sub qerr ($;@) { $_[0]->{opt}->{quiet} or err(shift, @_) }
sub fail ($$;$) {
my ($self, $buf, $exit_code) = @_;
err($self, $buf) if defined $buf;
- send($self->{pkt_op}, '!', MSG_EOR) if $self->{pkt_op}; # fail_handler
+ # calls fail_handler:
+ send($self->{pkt_op_p}, '!', MSG_EOR) if $self->{pkt_op_p};
x_it($self, ($exit_code // 1) << 8);
undef;
}
@@ -340,7 +341,7 @@ sub puts ($;@) { out(shift, map { "$_\n" } @_) }
sub child_error { # passes non-fatal curl exit codes to user
my ($self, $child_error) = @_; # child_error is $?
- if (my $s = $self->{pkt_op} // $self->{sock}) {
+ if (my $s = $self->{pkt_op_p} // $self->{sock}) {
# send to the parent lei-daemon or to lei(1) client
send($s, "child_error $child_error", MSG_EOR);
} elsif (!$PublicInbox::DS::in_loop) {
@@ -348,94 +349,34 @@ sub child_error { # passes non-fatal curl exit codes to user
} # else noop if client disconnected
}
-sub atfork_prepare_wq {
- my ($self, $wq) = @_;
- my $tcafc = $wq->{-ipc_atfork_child_close} //= [ $listener // () ];
- if (my $sock = $self->{sock}) {
- push @$tcafc, @$self{qw(0 1 2 3)}, $sock;
- }
- if (my $pgr = $self->{pgr}) {
- push @$tcafc, @$pgr[1,2];
- }
- if (my $old_1 = $self->{old_1}) {
- push @$tcafc, $old_1;
- }
- for my $f (qw(lxs l2m)) {
- my $ipc = $self->{$f} or next;
- push @$tcafc, grep { defined }
- @$ipc{qw(-wq_s1 -wq_s2 -ipc_req -ipc_res)};
- }
-}
-
-sub io_restore ($$) {
- my ($dst, $src) = @_;
- for my $i (0..2) { # standard FDs
- my $io = delete $src->{$i} or next;
- $dst->{$i} = $io;
- }
- for my $i (3..9) { # named (non-standard) FDs
- my $io = $src->{$i} or next;
- my @st = stat($io) or die "stat $src.$i ($io): $!";
- my $f = delete $dst->{"dev=$st[0],ino=$st[1]"} // next;
- $dst->{$f} = $io;
- delete $src->{$i};
- }
-}
-
sub note_sigpipe { # triggers sigpipe_handler
my ($self, $fd) = @_;
close(delete($self->{$fd})); # explicit close silences Perl warning
- send($self->{pkt_op}, '|', MSG_EOR) if $self->{pkt_op};
+ send($self->{pkt_op_p}, '|', MSG_EOR) if $self->{pkt_op_p};
x_it($self, 13);
}
-sub atfork_child_wq {
- my ($self, $wq) = @_;
- io_restore($self, $wq);
- -S $self->{pkt_op} or die 'BUG: {pkt_op} expected';
- io_restore($self->{l2m}, $wq);
+sub lei_atfork_child {
+ my ($self) = @_;
+ # we need to explicitly close things which are on stack
+ delete $self->{0};
+ for (delete @$self{qw(3 sock old_1 au_done)}) {
+ close($_) if defined($_);
+ }
+ if (my $op_c = delete $self->{pkt_op_c}) {
+ close(delete $op_c->{sock});
+ }
+ if (my $pgr = delete $self->{pgr}) {
+ close($_) for (@$pgr[1,2]);
+ }
+ close $listener if $listener;
+ undef $listener;
%PATH2CFG = ();
undef $errors_log;
$quit = \&CORE::exit;
$current_lei = $self; # for SIG{__WARN__}
}
-sub io_extract ($;@) {
- my ($obj, @fields) = @_;
- my @io;
- for my $f (@fields) {
- my $io = delete $obj->{$f} or next;
- my @st = stat($io) or die "W: stat $obj.$f ($io): $!";
- $obj->{"dev=$st[0],ino=$st[1]"} = $f;
- push @io, $io;
- }
- @io
-}
-
-# usage: ($lei, @io) = $lei->atfork_parent_wq($wq);
-sub atfork_parent_wq {
- my ($self, $wq) = @_;
- my $env = delete $self->{env}; # env is inherited at fork
- my $lei = bless { %$self }, ref($self);
- for my $f (qw(dedupe ovv)) {
- my $tmp = delete($lei->{$f}) or next;
- $lei->{$f} = $wq->deep_clone($tmp);
- }
- $self->{env} = $env;
- delete @$lei{qw(sock 3 -lei_store cfg old_1 pgr lxs)}; # keep l2m
- my @io = (delete(@$lei{qw(0 1 2)}),
- io_extract($lei, qw(pkt_op startq)));
- my $l2m = $lei->{l2m};
- if ($l2m && $l2m != $wq) { # $wq == lxs
- if (my $wq_s1 = $l2m->{-wq_s1}) {
- push @io, io_extract($l2m, '-wq_s1');
- $l2m->{-wq_s1} = $wq_s1;
- }
- $l2m->wq_close(1);
- }
- ($lei, @io);
-}
-
sub _help ($;$) {
my ($self, $errmsg) = @_;
my $cmd = $self->{cmd} // 'COMMAND';
diff --git a/lib/PublicInbox/LeiOverview.pm b/lib/PublicInbox/LeiOverview.pm
index e33d63a2..e6bf4f2a 100644
--- a/lib/PublicInbox/LeiOverview.pm
+++ b/lib/PublicInbox/LeiOverview.pm
@@ -207,7 +207,6 @@ sub ovv_each_smsg_cb { # runs in wq worker usually
}
$lei->{ovv_buf} = \(my $buf = '') if !$l2m;
if ($l2m && !$ibxish) { # remote https?:// mboxrd
- delete $l2m->{-wq_s1};
my $g2m = $l2m->can('git_to_mail');
my $wcb = $l2m->write_cb($lei);
sub {
@@ -215,33 +214,20 @@ sub ovv_each_smsg_cb { # runs in wq worker usually
$wcb->(undef, $smsg, $eml);
};
} elsif ($l2m && $l2m->{-wq_s1}) {
- my ($lei_ipc, @io) = $lei->atfork_parent_wq($l2m);
- # $io[0] becomes a notification pipe that triggers EOF
+ # $io->[0] becomes a notification pipe that triggers EOF
# in this wq worker when all outstanding ->write_mail
# calls are complete
- $io[0] = undef;
- pipe($l2m->{each_smsg_done}, $io[0]) or die "pipe: $!";
- fcntl($io[0], 1031, 4096) if $^O eq 'linux'; # F_SETPIPE_SZ
- delete @$lei_ipc{qw(l2m opt mset_opt cmd)};
+ my $io = [];
+ pipe($l2m->{each_smsg_done}, $io->[0]) or die "pipe: $!";
+ fcntl($io->[0], 1031, 4096) if $^O eq 'linux'; # F_SETPIPE_SZ
my $git = $ibxish->git; # (LeiXSearch|Inbox|ExtSearch)->git
$self->{git} = $git;
my $git_dir = $git->{git_dir};
sub {
my ($smsg, $mitem) = @_;
$smsg->{pct} = get_pct($mitem) if $mitem;
- $l2m->wq_do('write_mail', \@io, $git_dir, $smsg,
- $lei_ipc);
+ $l2m->wq_do('write_mail', $io, $git_dir, $smsg);
}
- } elsif ($l2m) {
- my $wcb = $l2m->write_cb($lei);
- my $git = $ibxish->git; # (LeiXSearch|Inbox|ExtSearch)->git
- $self->{git} = $git; # for ovv_atexit_child
- my $g2m = $l2m->can('git_to_mail');
- sub {
- my ($smsg, $mitem) = @_;
- $smsg->{pct} = get_pct($mitem) if $mitem;
- $git->cat_async($smsg->{blob}, $g2m, [ $wcb, $smsg ]);
- };
} elsif ($self->{fmt} =~ /\A(concat)?json\z/ && $lei->{opt}->{pretty}) {
my $EOR = ($1//'') eq 'concat' ? "\n}" : "\n},";
sub { # DIY prettiness :P
@@ -275,7 +261,9 @@ sub ovv_each_smsg_cb { # runs in wq worker usually
$lei->out($buf);
$buf = '';
}
- } # else { ...
+ } else {
+ die "TODO: unhandled case $self->{fmt}"
+ }
}
no warnings 'once';
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
index c704dc2a..f9250860 100644
--- a/lib/PublicInbox/LeiToMail.pm
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -211,10 +211,10 @@ sub zsfx2cmd ($$$) {
}
sub _post_augment_mbox { # open a compressor process
- my ($self, $lei, $zpipe) = @_;
+ my ($self, $lei) = @_;
my $zsfx = $self->{zsfx} or return;
my $cmd = zsfx2cmd($zsfx, undef, $lei);
- my ($r, $w) = splice(@$zpipe, 0, 2);
+ my ($r, $w) = @{delete $lei->{zpipe}};
my $rdr = { 0 => $r, 1 => $lei->{1}, 2 => $lei->{2} };
my $pid = spawn($cmd, $lei->{env}, $rdr);
my $pp = gensym;
@@ -407,7 +407,7 @@ sub _pre_augment_mbox {
$! == ENOENT or die "unlink($dst): $!";
}
open my $out, $mode, $dst or die "open($dst): $!";
- $lei->{old_1} = $lei->{1};
+ $lei->{old_1} = $lei->{1}; # keep for spawning MUA
$lei->{1} = $out;
}
# Perl does SEEK_END even with O_APPEND :<
@@ -418,7 +418,7 @@ sub _pre_augment_mbox {
state $zsfx_allow = join('|', keys %zsfx2cmd);
($self->{zsfx}) = ($dst =~ /\.($zsfx_allow)\z/) or return;
pipe(my ($r, $w)) or die "pipe: $!";
- [ $r, $w ];
+ $lei->{zpipe} = [ $r, $w ];
}
sub _do_augment_mbox {
@@ -462,16 +462,24 @@ sub post_augment { # fast (spawn compressor or mkdir), runs in main daemon
$self->$m($lei, @args);
}
+sub ipc_atfork_child {
+ my ($self) = @_;
+ my $lei = delete $self->{lei};
+ $lei->lei_atfork_child;
+ if (my $zpipe = delete $lei->{zpipe}) {
+ $lei->{1} = $zpipe->[1];
+ close $zpipe->[0];
+ }
+ $self->{wcb} = $self->write_cb($lei);
+ $self->SUPER::ipc_atfork_child;
+}
+
sub write_mail { # via ->wq_do
- my ($self, $git_dir, $smsg, $lei) = @_;
+ my ($self, $git_dir, $smsg) = @_;
my $not_done = delete $self->{0} // die 'BUG: $not_done missing';
- my $wcb = $self->{wcb} //= do { # first message
- $lei->atfork_child_wq($self);
- $self->write_cb($lei);
- };
my $git = $self->{"$$\0$git_dir"} //= PublicInbox::Git->new($git_dir);
git_async_cat($git, $smsg->{blob}, \&git_to_mail,
- [$wcb, $smsg, $not_done]);
+ [$self->{wcb}, $smsg, $not_done]);
}
sub wq_atexit_child {
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index ab66717c..e41d899e 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -110,8 +110,8 @@ sub wait_startq ($) {
sub mset_progress {
my $lei = shift;
return unless $lei->{-progress};
- if ($lei->{pkt_op}) { # called via pkt_op/pkt_do from workers
- pkt_do($lei->{pkt_op}, 'mset_progress', @_);
+ if ($lei->{pkt_op_p}) {
+ pkt_do($lei->{pkt_op_p}, 'mset_progress', @_);
} else { # single lei-daemon consumer
my ($desc, $mset_size, $mset_total_est) = @_;
$lei->{-mset_total} += $mset_size;
@@ -120,11 +120,10 @@ sub mset_progress {
}
sub query_thread_mset { # for --thread
- my ($self, $lei, $ibxish) = @_;
+ my ($self, $ibxish) = @_;
local $0 = "$0 query_thread_mset";
- $lei->atfork_child_wq($self);
+ my $lei = $self->{lei};
my $startq = delete $lei->{startq};
-
my ($srch, $over) = ($ibxish->search, $ibxish->over);
my $desc = $ibxish->{inboxdir} // $ibxish->{topdir};
return warn("$desc not indexed by Xapian\n") unless ($srch && $over);
@@ -154,9 +153,9 @@ sub query_thread_mset { # for --thread
}
sub query_mset { # non-parallel for non-"--thread" users
- my ($self, $lei) = @_;
+ my ($self) = @_;
local $0 = "$0 query_mset";
- $lei->atfork_child_wq($self);
+ my $lei = $self->{lei};
my $startq = delete $lei->{startq};
my $mo = { %{$lei->{mset_opt}} };
my $mset;
@@ -207,10 +206,10 @@ sub kill_reap {
}
sub query_remote_mboxrd {
- my ($self, $lei, $uris) = @_;
+ my ($self, $uris) = @_;
local $0 = "$0 query_remote_mboxrd";
- $lei->atfork_child_wq($self);
local $SIG{TERM} = sub { exit(0) }; # for DESTROY (File::Temp, $reap)
+ my $lei = $self->{lei};
my ($opt, $env) = @$lei{qw(opt env)};
my @qform = (q => $lei->{mset_opt}->{qstr}, x => 'm');
push(@qform, t => 1) if $opt->{thread};
@@ -307,7 +306,7 @@ sub git {
$git;
}
-sub query_done { # EOF callback
+sub query_done { # EOF callback for main daemon
my ($lei) = @_;
my $has_l2m = exists $lei->{l2m};
for my $f (qw(lxs l2m)) {
@@ -332,9 +331,8 @@ Error closing $lei->{ovv}->{dst}: $!
}
sub do_post_augment {
- my ($lei, $zpipe, $au_done) = @_;
- my $l2m = $lei->{l2m} or die 'BUG: no {l2m}';
- eval { $l2m->post_augment($lei, $zpipe) };
+ my ($lei) = @_;
+ eval { $lei->{l2m}->post_augment($lei) };
if (my $err = $@) {
if (my $lxs = delete $lei->{lxs}) {
$lxs->wq_kill;
@@ -342,7 +340,7 @@ sub do_post_augment {
}
$lei->fail("$err");
}
- close $au_done; # triggers wait_startq
+ close(delete $lei->{au_done}); # triggers wait_startq
}
my $MAX_PER_HOST = 4;
@@ -356,13 +354,13 @@ sub concurrency {
}
sub start_query { # always runs in main (lei-daemon) process
- my ($self, $io, $lei) = @_;
+ my ($self, $lei) = @_;
if ($lei->{opt}->{thread}) {
for my $ibxish (locals($self)) {
- $self->wq_do('query_thread_mset', $io, $lei, $ibxish);
+ $self->wq_do('query_thread_mset', [], $ibxish);
}
} elsif (locals($self)) {
- $self->wq_do('query_mset', $io, $lei);
+ $self->wq_do('query_mset', []);
}
my $i = 0;
my $q = [];
@@ -370,19 +368,23 @@ sub start_query { # always runs in main (lei-daemon) process
push @{$q->[$i++ % $MAX_PER_HOST]}, $uri;
}
for my $uris (@$q) {
- $self->wq_do('query_remote_mboxrd', $io, $lei, $uris);
+ $self->wq_do('query_remote_mboxrd', [], $uris);
}
- @$io = ();
+}
+
+sub ipc_atfork_child {
+ my ($self) = @_;
+ $self->{lei}->lei_atfork_child;
+ $self->SUPER::ipc_atfork_child;
}
sub query_prepare { # called by wq_do
- my ($self, $lei) = @_;
+ my ($self) = @_;
local $0 = "$0 query_prepare";
- $lei->atfork_child_wq($self);
- delete $lei->{l2m}->{-wq_s1};
+ my $lei = $self->{lei};
eval { $lei->{l2m}->do_augment($lei) };
$lei->fail($@) if $@;
- pkt_do($lei->{pkt_op}, '.') == 1 or die "do_post_augment trigger: $!"
+ pkt_do($lei->{pkt_op_p}, '.') == 1 or die "do_post_augment trigger: $!"
}
sub fail_handler ($;$$) {
@@ -401,45 +403,38 @@ sub sigpipe_handler { # handles SIGPIPE from l2m/lxs workers
sub do_query {
my ($self, $lei) = @_;
- $lei->{1}->autoflush(1);
- $lei->start_pager if -t $lei->{1};
- $lei->{ovv}->ovv_begin($lei);
- my ($au_done, $zpipe);
- my $l2m = $lei->{l2m};
- $lei->atfork_prepare_wq($self);
- $self->wq_workers_start('lei_xsearch', $self->{jobs}, $lei->oldset);
- delete $self->{-ipc_atfork_child_close};
- if ($l2m) {
- $lei->atfork_prepare_wq($l2m);
- $l2m->wq_workers_start('lei2mail', $l2m->{jobs}, $lei->oldset);
- delete $l2m->{-ipc_atfork_child_close};
- pipe($lei->{startq}, $au_done) or die "pipe: $!";
- # 1031: F_SETPIPE_SZ
- fcntl($lei->{startq}, 1031, 4096) if $^O eq 'linux';
- $zpipe = $l2m->pre_augment($lei);
- }
my $ops = {
'|' => [ \&sigpipe_handler, $lei ],
'!' => [ \&fail_handler, $lei ],
- '.' => [ \&do_post_augment, $lei, $zpipe, $au_done ],
+ '.' => [ \&do_post_augment, $lei ],
'' => [ \&query_done, $lei ],
'mset_progress' => [ \&mset_progress, $lei ],
'x_it' => [ $lei->can('x_it'), $lei ],
'child_error' => [ $lei->can('child_error'), $lei ],
};
- (my $op, $lei->{pkt_op}) = PublicInbox::PktOp->pair($ops);
- my ($lei_ipc, @io) = $lei->atfork_parent_wq($self);
- delete($lei->{pkt_op});
-
- $lei->event_step_init; # wait for shutdowns
+ ($lei->{pkt_op_c}, $lei->{pkt_op_p}) = PublicInbox::PktOp->pair($ops);
+ $lei->{1}->autoflush(1);
+ $lei->start_pager if -t $lei->{1};
+ $lei->{ovv}->ovv_begin($lei);
+ my $l2m = $lei->{l2m};
if ($l2m) {
- $self->wq_do('query_prepare', \@io, $lei_ipc);
- $io[1] = $zpipe->[1] if $zpipe;
+ $l2m->pre_augment($lei);
+ $l2m->wq_workers_start('lei2mail', $l2m->{jobs},
+ $lei->oldset, { lei => $lei });
+ pipe($lei->{startq}, $lei->{au_done}) or die "pipe: $!";
+ # 1031: F_SETPIPE_SZ
+ fcntl($lei->{startq}, 1031, 4096) if $^O eq 'linux';
}
- start_query($self, \@io, $lei_ipc);
- $self->wq_close(1);
+ $self->wq_workers_start('lei_xsearch', $self->{jobs},
+ $lei->oldset, { lei => $lei });
+ my $op = delete $lei->{pkt_op_c};
+ delete $lei->{pkt_op_p};
+ $l2m->wq_close(1) if $l2m;
+ $lei->event_step_init; # wait for shutdowns
+ $self->wq_do('query_prepare', []) if $l2m;
+ start_query($self, $lei);
+ $self->wq_close(1); # lei_xsearch workers stop when done
if ($lei->{oneshot}) {
- # for the $lei_ipc->atfork_child_wq PIPE handler:
while ($op->{sock}) { $op->event_step }
}
}
* [PATCH 04/18] lei q: only start pager if output is to stdout
2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
2021-02-05 12:07 ` [PATCH 02/18] ipc: localize fields assignment to prevent circular refs Eric Wong
2021-02-05 12:07 ` [PATCH 03/18] lei q: reorder internals to reduce FD passing Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
2021-02-05 12:07 ` [PATCH 05/18] lei q: reinstate early MUA spawn for Maildir Eric Wong
` (13 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
To: spew
No need to be starting a pager if we're writing to a regular file.
---
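A small sketch of the underlying check, assuming a temporary file as
the output.  want_pager() is a hypothetical name, the patch itself
stores the result of "-t" in $lei->{need_pager}.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# hypothetical helper: page only when the handle is a terminal
sub want_pager {
	my ($fh) = @_;
	-t $fh ? 1 : 0;
}

my ($fh, $path) = tempfile(UNLINK => 1);
my $for_file = want_pager($fh);
print "regular file: pager=$for_file\n";
```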
lib/PublicInbox/LeiOverview.pm | 3 +--
lib/PublicInbox/LeiXSearch.pm | 2 +-
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/lib/PublicInbox/LeiOverview.pm b/lib/PublicInbox/LeiOverview.pm
index e6bf4f2a..3125f015 100644
--- a/lib/PublicInbox/LeiOverview.pm
+++ b/lib/PublicInbox/LeiOverview.pm
@@ -78,9 +78,8 @@ sub new {
if ($fmt =~ /\A($JSONL|(?:concat)?json)\z/) {
$json = $self->{json} = ref(PublicInbox::Config->json);
}
- my ($isatty, $seekable);
if ($dst eq '/dev/stdout') {
- $isatty = -t $lei->{1};
+ my $isatty = $lei->{need_pager} = -t $lei->{1};
$opt->{pretty} //= $isatty;
if (!$isatty && -f _) {
my $fl = fcntl($lei->{1}, F_GETFL, 0) //
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index e41d899e..0ca871ea 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -414,7 +414,7 @@ sub do_query {
};
($lei->{pkt_op_c}, $lei->{pkt_op_p}) = PublicInbox::PktOp->pair($ops);
$lei->{1}->autoflush(1);
- $lei->start_pager if -t $lei->{1};
+ $lei->start_pager if delete $lei->{need_pager};
$lei->{ovv}->ovv_begin($lei);
my $l2m = $lei->{l2m};
if ($l2m) {
* [PATCH 05/18] lei q: reinstate early MUA spawn for Maildir
2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
` (2 preceding siblings ...)
2021-02-05 12:07 ` [PATCH 04/18] lei q: only start pager if output is to stdout Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
2021-02-05 12:07 ` [PATCH 06/18] eml: handle warning ignores for lei Eric Wong
` (12 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
To: spew
Once all files are written, we can use utime() to poke Maildirs
to wake up MUAs that fail to account for nanosecond timestamp
resolution.
---
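The utime() poke can be tried in isolation with something like the
sketch below.  It assumes a throwaway Maildir created under a
temporary directory. Only the "time + 1" and the "cur" subdirectory
come from the patch.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $maildir = tempdir(CLEANUP => 1);
for my $d (qw(new cur tmp)) {
	mkdir("$maildir/$d") or die "mkdir $maildir/$d: $!";
}
my $before = (stat("$maildir/cur"))[9];

# bump atime and mtime one second into the future so an MUA which
# only compares whole seconds still notices a change
my $t = time + 1;
utime($t, $t, "$maildir/cur") or die "utime: $!";
my $after = (stat("$maildir/cur"))[9];
print "mtime moved from $before to $after\n";
```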
lib/PublicInbox/LEI.pm | 1 +
lib/PublicInbox/LeiToMail.pm | 13 +++++++++++++
lib/PublicInbox/LeiXSearch.pm | 15 +++++++++------
3 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 0d4b1c11..24efb494 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -739,6 +739,7 @@ sub start_mua {
} elsif ($self->{oneshot}) {
$self->{"mua.pid.$self.$$"} = spawn(\@cmd);
}
+ delete $self->{-progress};
}
# caller needs to "-t $self->{1}" to check if tty
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
index f9250860..5a6f18fb 100644
--- a/lib/PublicInbox/LeiToMail.pm
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -365,6 +365,7 @@ sub new {
} else {
die "bad mail --format=$fmt\n";
}
+ $self->{dst} = $dst;
$lei->{dedupe} = PublicInbox::LeiDedupe->new($lei);
$self;
}
@@ -474,6 +475,18 @@ sub ipc_atfork_child {
$self->SUPER::ipc_atfork_child;
}
+sub lock_free {
+ $_[0]->{base_type} =~ /\A(?:maildir|mh|imap|jmap)\z/ ? 1 : 0;
+}
+
+sub poke_dst {
+ my ($self) = @_;
+ if ($self->{base_type} eq 'maildir') {
+ my $t = time + 1;
+ utime($t, $t, "$self->{dst}/cur");
+ }
+}
+
sub write_mail { # via ->wq_do
my ($self, $git_dir, $smsg) = @_;
my $not_done = delete $self->{0} // die 'BUG: $not_done missing';
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index 0ca871ea..e7f0ef63 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -308,13 +308,13 @@ sub git {
sub query_done { # EOF callback for main daemon
my ($lei) = @_;
- my $has_l2m = exists $lei->{l2m};
- for my $f (qw(lxs l2m)) {
- my $wq = delete $lei->{$f} or next;
- $wq->wq_wait_old($lei);
+ my $l2m = delete $lei->{l2m};
+ $l2m->wq_wait_old($lei) if $l2m;
+ if (my $lxs = delete $lei->{lxs}) {
+ $lxs->wq_wait_old($lei);
}
$lei->{ovv}->ovv_end($lei);
- if ($has_l2m) { # close() calls LeiToMail reap_compress
+ if ($l2m) { # close() calls LeiToMail reap_compress
if (my $out = delete $lei->{old_1}) {
if (my $mbout = $lei->{1}) {
close($mbout) or return $lei->fail(<<"");
@@ -323,7 +323,7 @@ Error closing $lei->{ovv}->{dst}: $!
}
$lei->{1} = $out;
}
- $lei->start_mua;
+ $l2m->lock_free ? $l2m->poke_dst : $lei->start_mua;
}
$lei->{-progress} and
$lei->err('# ', $lei->{-mset_total} // 0, " matches");
@@ -355,6 +355,9 @@ sub concurrency {
sub start_query { # always runs in main (lei-daemon) process
my ($self, $lei) = @_;
+ if (my $l2m = $lei->{l2m}) {
+ $lei->start_mua if $l2m->lock_free;
+ }
if ($lei->{opt}->{thread}) {
for my $ibxish (locals($self)) {
$self->wq_do('query_thread_mset', [], $ibxish);
* [PATCH 06/18] eml: handle warning ignores for lei
2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
` (3 preceding siblings ...)
2021-02-05 12:07 ` [PATCH 05/18] lei q: reinstate early MUA spawn for Maildir Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
2021-02-05 12:07 ` [PATCH 07/18] lei q: eliminate $not_done temporary git dir hack Eric Wong
` (11 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
To: spew
There's nothing we can do about bad emails in our search
results, so quiet things down and don't fight the MUA for
the terminal.
---
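How such a filtering __WARN__ handler chains to the previous one can
be sketched as below.  The demo_* subs are hypothetical, shortened
stand-ins for warn_ignore and warn_ignore_cb, and the fallback to
STDERR is an assumption made to keep the example self-contained.

```perl
use strict;
use warnings;

# shortened pattern list, for illustration only
sub demo_warn_ignore {
	my $s = "@_";
	$s =~ /^bad Date: .+? in / || $s =~ /^Bad UTF7 data escape at /;
}

# wraps whichever handler is active when this is called
sub demo_warn_ignore_cb {
	my $cb = $SIG{__WARN__} // sub { print STDERR @_ };
	sub { $cb->(@_) unless demo_warn_ignore(@_) };
}

my @seen;
{
	local $SIG{__WARN__} = sub { push @seen, "@_" }; # outer handler
	{
		my $cb = demo_warn_ignore_cb(); # captures the outer handler
		local $SIG{__WARN__} = $cb;
		warn "bad Date: yesterday in some message\n"; # filtered
		warn "disk is full\n"; # passed to the outer handler
	}
}
print scalar(@seen), " warning(s) passed through\n";
```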
lib/PublicInbox/Admin.pm | 7 +++----
lib/PublicInbox/Eml.pm | 19 +++++++++++++++++++
lib/PublicInbox/InboxWritable.pm | 24 +-----------------------
lib/PublicInbox/LeiToMail.pm | 1 +
lib/PublicInbox/Watch.pm | 14 ++++++--------
5 files changed, 30 insertions(+), 35 deletions(-)
diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm
index f96397ea..3b38a5a3 100644
--- a/lib/PublicInbox/Admin.pm
+++ b/lib/PublicInbox/Admin.pm
@@ -10,6 +10,7 @@ our @EXPORT_OK = qw(setup_signals);
use PublicInbox::Config;
use PublicInbox::Inbox;
use PublicInbox::Spawn qw(popen_rd);
+use PublicInbox::Eml;
*rel2abs_collapsed = \&PublicInbox::Config::rel2abs_collapsed;
sub setup_signals {
@@ -241,12 +242,10 @@ sub index_inbox {
}
local %SIG = %SIG;
setup_signals(\&index_terminate, $ibx);
- my $warn_cb = $SIG{__WARN__} // \&CORE::warn;
my $idx = { current_info => $ibx->{inboxdir} };
- my $warn_ignore = PublicInbox::InboxWritable->can('warn_ignore');
local $SIG{__WARN__} = sub {
- return if $warn_ignore->(@_);
- $warn_cb->($idx->{current_info}, ': ', @_);
+ return if PublicInbox::Eml::warn_ignore(@_);
+ warn($idx->{current_info}, ': ', @_);
};
if (ref($ibx) && $ibx->version == 2) {
eval { require PublicInbox::V2Writable };
diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm
index bd27f19b..f7f62e7b 100644
--- a/lib/PublicInbox/Eml.pm
+++ b/lib/PublicInbox/Eml.pm
@@ -477,6 +477,25 @@ sub charset_set {
sub crlf { $_[0]->{crlf} // "\n" }
+# warnings to ignore when handling spam mailboxes and maybe other places
+sub warn_ignore {
+ my $s = "@_";
+ # Email::Address::XS warnings
+ $s =~ /^Argument contains empty address at /
+ || $s =~ /^Element at index [0-9]+ contains /
+ # PublicInbox::MsgTime
+ || $s =~ /^bogus TZ offset: .+?, ignoring and assuming \+0000/
+ || $s =~ /^bad Date: .+? in /
+ # Encode::Unicode::UTF7
+ || $s =~ /^Bad UTF7 data escape at /
+}
+
+# this expects to be RHS in this assignment: "local $SIG{__WARN__} = ..."
+sub warn_ignore_cb {
+ my $cb = $SIG{__WARN__} // \&CORE::warn;
+ sub { $cb->(@_) unless warn_ignore(@_) }
+}
+
sub willneed { re_memo($_) for @_ }
willneed(qw(From To Cc Date Subject Content-Type In-Reply-To References
diff --git a/lib/PublicInbox/InboxWritable.pm b/lib/PublicInbox/InboxWritable.pm
index 982ad6e5..3a4012cd 100644
--- a/lib/PublicInbox/InboxWritable.pm
+++ b/lib/PublicInbox/InboxWritable.pm
@@ -9,7 +9,7 @@ use parent qw(PublicInbox::Inbox Exporter);
use PublicInbox::Import;
use PublicInbox::Filter::Base qw(REJECT);
use Errno qw(ENOENT);
-our @EXPORT_OK = qw(eml_from_path warn_ignore_cb);
+our @EXPORT_OK = qw(eml_from_path);
use constant {
PERM_UMASK => 0,
@@ -277,28 +277,6 @@ sub cleanup ($) {
delete @{$_[0]}{qw(over mm git search)};
}
-# warnings to ignore when handling spam mailboxes and maybe other places
-sub warn_ignore {
- my $s = "@_";
- # Email::Address::XS warnings
- $s =~ /^Argument contains empty address at /
- || $s =~ /^Element at index [0-9]+ contains /
- # PublicInbox::MsgTime
- || $s =~ /^bogus TZ offset: .+?, ignoring and assuming \+0000/
- || $s =~ /^bad Date: .+? in /
- # Encode::Unicode::UTF7
- || $s =~ /^Bad UTF7 data escape at /
-}
-
-# this expects to be RHS in this assignment: "local $SIG{__WARN__} = ..."
-sub warn_ignore_cb {
- my $cb = $SIG{__WARN__} // \&CORE::warn;
- sub {
- return if warn_ignore(@_);
- $cb->(@_);
- }
-}
-
# v2+ only, XXX: maybe we can just rely on ->max_git_epoch and remove
sub git_dir_latest {
my ($self, $max) = @_;
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
index 5a6f18fb..1f815e40 100644
--- a/lib/PublicInbox/LeiToMail.pm
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -472,6 +472,7 @@ sub ipc_atfork_child {
close $zpipe->[0];
}
$self->{wcb} = $self->write_cb($lei);
+ $SIG{__WARN__} = PublicInbox::Eml::warn_ignore_cb();
$self->SUPER::ipc_atfork_child;
}
diff --git a/lib/PublicInbox/Watch.pm b/lib/PublicInbox/Watch.pm
index 2b44ba43..185e5da8 100644
--- a/lib/PublicInbox/Watch.pm
+++ b/lib/PublicInbox/Watch.pm
@@ -7,7 +7,7 @@ package PublicInbox::Watch;
use strict;
use v5.10.1;
use PublicInbox::Eml;
-use PublicInbox::InboxWritable qw(eml_from_path warn_ignore_cb);
+use PublicInbox::InboxWritable qw(eml_from_path);
use PublicInbox::Filter::Base qw(REJECT);
use PublicInbox::Spamcheck;
use PublicInbox::Sigfd;
@@ -174,7 +174,7 @@ sub _remove_spam {
# path must be marked as (S)een
$path =~ /:2,[A-R]*S[T-Za-z]*\z/ or return;
my $eml = eml_from_path($path) or return;
- local $SIG{__WARN__} = warn_ignore_cb();
+ local $SIG{__WARN__} = PublicInbox::Eml::warn_ignore_cb();
$self->{pi_cfg}->each_inbox(\&remove_eml_i, $self, $eml, $path);
}
@@ -414,13 +414,11 @@ sub imap_import_msg ($$$$$) {
import_eml($self, $ibx, $eml);
}
} elsif ($inboxes eq 'watchspam') {
- # we don't remove unseen messages
- if ($flags =~ /\\Seen\b/) {
- local $SIG{__WARN__} = warn_ignore_cb();
- my $eml = PublicInbox::Eml->new($raw);
- $self->{pi_cfg}->each_inbox(\&remove_eml_i,
+ return if $flags !~ /\\Seen\b/; # don't remove unseen messages
+ local $SIG{__WARN__} = PublicInbox::Eml::warn_ignore_cb();
+ my $eml = PublicInbox::Eml->new($raw);
+ $self->{pi_cfg}->each_inbox(\&remove_eml_i,
$self, $eml, "$url UID:$uid");
- }
} else {
die "BUG: destination unknown $inboxes";
}
* [PATCH 07/18] lei q: eliminate $not_done temporary git dir hack
2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
` (4 preceding siblings ...)
2021-02-05 12:07 ` [PATCH 06/18] eml: handle warning ignores for lei Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
2021-02-05 12:07 ` [PATCH 08/18] lei_query: remove uneeded dwaitpid import Eric Wong
` (10 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
To: spew
Another step towards simplifying lei internals.
None of our current uses of ->wq_do involve FD passing, and the
plan is to only rely on FD passing between lei-daemon and lei(1).
Internally, it ought to be possible for lei-daemon internal bits
to be ordered properly to not need FD passing.
---
lib/PublicInbox/LeiOverview.pm | 23 ++---------------------
lib/PublicInbox/LeiToMail.pm | 3 +--
lib/PublicInbox/LeiXSearch.pm | 16 ++++++++++++----
3 files changed, 15 insertions(+), 27 deletions(-)
diff --git a/lib/PublicInbox/LeiOverview.pm b/lib/PublicInbox/LeiOverview.pm
index 3125f015..d3df4faa 100644
--- a/lib/PublicInbox/LeiOverview.pm
+++ b/lib/PublicInbox/LeiOverview.pm
@@ -147,17 +147,6 @@ sub _unbless_smsg {
sub ovv_atexit_child {
my ($self, $lei) = @_;
- if (my $l2m = $lei->{l2m}) {
- # wait for ->write_mail work we submitted to lei2mail
- if (my $rd = delete $l2m->{each_smsg_done}) {
- read($rd, my $buf, 1); # wait for EOF
- }
- }
- # order matters, git->{-tmp}->DESTROY must not fire until
- # {each_smsg_done} hits EOF above
- if (my $git = delete $self->{git}) {
- $git->async_wait_all;
- }
if (my $bref = delete $lei->{ovv_buf}) {
my $lk = $self->lock_for_scope;
$lei->out($$bref);
@@ -213,19 +202,11 @@ sub ovv_each_smsg_cb { # runs in wq worker usually
$wcb->(undef, $smsg, $eml);
};
} elsif ($l2m && $l2m->{-wq_s1}) {
- # $io->[0] becomes a notification pipe that triggers EOF
- # in this wq worker when all outstanding ->write_mail
- # calls are complete
- my $io = [];
- pipe($l2m->{each_smsg_done}, $io->[0]) or die "pipe: $!";
- fcntl($io->[0], 1031, 4096) if $^O eq 'linux'; # F_SETPIPE_SZ
- my $git = $ibxish->git; # (LeiXSearch|Inbox|ExtSearch)->git
- $self->{git} = $git;
- my $git_dir = $git->{git_dir};
+ my $git_dir = $ibxish->git->{git_dir};
sub {
my ($smsg, $mitem) = @_;
$smsg->{pct} = get_pct($mitem) if $mitem;
- $l2m->wq_do('write_mail', $io, $git_dir, $smsg);
+ $l2m->wq_do('write_mail', [], $git_dir, $smsg);
}
} elsif ($self->{fmt} =~ /\A(concat)?json\z/ && $lei->{opt}->{pretty}) {
my $EOR = ($1//'') eq 'concat' ? "\n}" : "\n},";
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
index 1f815e40..4f847221 100644
--- a/lib/PublicInbox/LeiToMail.pm
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -490,10 +490,9 @@ sub poke_dst {
sub write_mail { # via ->wq_do
my ($self, $git_dir, $smsg) = @_;
- my $not_done = delete $self->{0} // die 'BUG: $not_done missing';
my $git = $self->{"$$\0$git_dir"} //= PublicInbox::Git->new($git_dir);
git_async_cat($git, $smsg->{blob}, \&git_to_mail,
- [$self->{wcb}, $smsg, $not_done]);
+ [$self->{wcb}, $smsg]);
}
sub wq_atexit_child {
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index e7f0ef63..2dc44414 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -287,12 +287,15 @@ sub query_remote_mboxrd {
$lei->{ovv}->ovv_atexit_child($lei);
}
-sub git {
+# called by LeiOverview::each_smsg_cb
+sub git { $_[0]->{git_tmp} // die 'BUG: caller did not set {git_tmp}' }
+
+sub git_tmp ($) {
my ($self) = @_;
my (%seen, @dirs);
- my $tmp = File::Temp->newdir('lei_xsrch_git-XXXXXXXX', TMPDIR => 1);
- for my $ibx (@{$self->{shard2ibx} // []}) {
- my $d = File::Spec->canonpath($ibx->git->{git_dir});
+ my $tmp = File::Temp->newdir("lei_xsearch_git.$$-XXXX", TMPDIR => 1);
+ for my $ibxish (locals($self)) {
+ my $d = File::Spec->canonpath($ibxish->git->{git_dir});
$seen{$d} //= push @dirs, "$d/objects\n"
}
my $git_dir = $tmp->dirname;
@@ -428,6 +431,11 @@ sub do_query {
# 1031: F_SETPIPE_SZ
fcntl($lei->{startq}, 1031, 4096) if $^O eq 'linux';
}
+ if (!$lei->{opt}->{thread} && locals($self)) { # for query_mset
+ # lei->{git_tmp} is set for wq_wait_old so we don't
+ # delete until all lei2mail + lei_xsearch workers are reaped
+ $lei->{git_tmp} = $self->{git_tmp} = git_tmp($self);
+ }
$self->wq_workers_start('lei_xsearch', $self->{jobs},
$lei->oldset, { lei => $lei });
my $op = delete $lei->{pkt_op_c};
* [PATCH 08/18] lei_query: remove unneeded dwaitpid import
2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
` (5 preceding siblings ...)
2021-02-05 12:07 ` [PATCH 07/18] lei q: eliminate $not_done temporary git dir hack Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
2021-02-05 12:07 ` [PATCH 09/18] lei_xsearch: drop unused imports Eric Wong
` (9 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
To: spew
All process management is handled elsewhere.
---
lib/PublicInbox/LeiQuery.pm | 1 -
1 file changed, 1 deletion(-)
diff --git a/lib/PublicInbox/LeiQuery.pm b/lib/PublicInbox/LeiQuery.pm
index 6b1aa40c..56350386 100644
--- a/lib/PublicInbox/LeiQuery.pm
+++ b/lib/PublicInbox/LeiQuery.pm
@@ -5,7 +5,6 @@
package PublicInbox::LeiQuery;
use strict;
use v5.10.1;
-use PublicInbox::DS qw(dwaitpid);
sub prep_ext { # externals_each callback
my ($lxs, $exclude, $loc) = @_;
* [PATCH 09/18] lei_xsearch: drop unused imports
2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
` (6 preceding siblings ...)
2021-02-05 12:07 ` [PATCH 08/18] lei_query: remove unneeded dwaitpid import Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
2021-02-05 12:07 ` [PATCH 10/18] lei import: initial implementation Eric Wong
` (8 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
To: spew
Reaping is handled by the parent PublicInbox::IPC, and we
have no business using PublicInbox::Import since LeiXSearch
won't write to git directly (it will write via LeiStore).
---
lib/PublicInbox/LeiXSearch.pm | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index 2dc44414..daf42098 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -8,9 +8,8 @@ package PublicInbox::LeiXSearch;
use strict;
use v5.10.1;
use parent qw(PublicInbox::LeiSearch PublicInbox::IPC);
-use PublicInbox::DS qw(dwaitpid now);
+use PublicInbox::DS qw(now);
use PublicInbox::PktOp qw(pkt_do);
-use PublicInbox::Import;
use File::Temp 0.19 (); # 0.19 for ->newdir
use File::Spec ();
use PublicInbox::Search qw(xap_terms);
* [PATCH 10/18] lei import: initial implementation
2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
` (7 preceding siblings ...)
2021-02-05 12:07 ` [PATCH 09/18] lei_xsearch: drop unused imports Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
2021-02-05 12:07 ` [PATCH 11/18] lei: favor "keywords" over "flags", test --no-kw Eric Wong
` (7 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
To: spew
Only tested with .eml files so far, but Maildir + IMAP
will be supported.
---
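A side note on the IPC.pm hunk below, not part of the commit:
"local @$self{keys %$fields} = values(%$fields);" assigns a hash
slice temporarily, and Perl restores (or deletes) those keys when the
enclosing scope exits. A minimal sketch, where with_fields is a
hypothetical helper and not part of PublicInbox::IPC:

```perl
use strict;
use warnings;

# Runs $cb with the key/value pairs in $fields temporarily set on $self.
# with_fields is a hypothetical helper for illustration only.
sub with_fields {
	my ($self, $fields, $cb) = @_;
	$fields //= {};
	# keys() and values() agree on order for an unmodified hash
	local @$self{keys %$fields} = values(%$fields);
	$cb->($self);
}

my $self = { ident => 'lei_store' };
my $seen = with_fields($self, { lei => 'LEI' }, sub { $_[0]->{lei} });
my $after = exists $self->{lei} ? 'kept' : 'gone';
print "inside: $seen, after: $after\n";
```

Because the assignment is undone on scope exit, the object does not
keep a lasting reference to whatever was passed in $fields.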
MANIFEST | 1 +
lib/PublicInbox/IPC.pm | 4 +-
lib/PublicInbox/LEI.pm | 48 ++++++++++++---
lib/PublicInbox/LeiImport.pm | 106 ++++++++++++++++++++++++++++++++++
lib/PublicInbox/LeiStore.pm | 18 ++++++
lib/PublicInbox/LeiXSearch.pm | 18 +-----
t/lei.t | 15 +++++
7 files changed, 184 insertions(+), 26 deletions(-)
create mode 100644 lib/PublicInbox/LeiImport.pm
diff --git a/MANIFEST b/MANIFEST
index 6922f9b1..a11d4106 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -179,6 +179,7 @@ lib/PublicInbox/KQNotify.pm
lib/PublicInbox/LEI.pm
lib/PublicInbox/LeiDedupe.pm
lib/PublicInbox/LeiExternal.pm
+lib/PublicInbox/LeiImport.pm
lib/PublicInbox/LeiOverview.pm
lib/PublicInbox/LeiQuery.pm
lib/PublicInbox/LeiSearch.pm
diff --git a/lib/PublicInbox/IPC.pm b/lib/PublicInbox/IPC.pm
index 7f5a3f6f..a0e6bfee 100644
--- a/lib/PublicInbox/IPC.pm
+++ b/lib/PublicInbox/IPC.pm
@@ -101,7 +101,7 @@ sub ipc_worker_loop ($$$) {
# starts a worker if Sereal or Storable is installed
sub ipc_worker_spawn {
- my ($self, $ident, $oldset) = @_;
+ my ($self, $ident, $oldset, $fields) = @_;
return unless $enc; # no Sereal or Storable
return if ($self->{-ipc_ppid} // -1) == $$; # idempotent
delete(@$self{qw(-ipc_req -ipc_res -ipc_ppid -ipc_pid)});
@@ -123,6 +123,8 @@ sub ipc_worker_spawn {
# ensure we properly exit even if warn() dies:
my $end = PublicInbox::OnDestroy->new($$, sub { exit(!!$@) });
eval {
+ $fields //= {};
+ local @$self{keys %$fields} = values(%$fields);
my $on_destroy = $self->ipc_atfork_child;
local %SIG = %SIG;
ipc_worker_loop($self, $r_req, $w_res);
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 24efb494..682d1bd1 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -160,9 +160,10 @@ our %CMD = ( # sorted in order of importance/use:
'forget-watch' => [ '{WATCH_NUMBER|--prune}', 'stop and forget a watch',
qw(prune) ],
-'import' => [ 'URL_OR_PATHNAME|--stdin',
- 'one-shot import/update from URL or filesystem',
- qw(stdin| offset=i recursive|r exclude=s include=s !flags),
+'import' => [ 'URLS_OR_PATHNAMES...|--stdin',
+ 'one-time import/update from URL or filesystem',
+ qw(stdin| offset=i recursive|r exclude=s include|I=s
+ format|f=s flags!),
],
'config' => [ '[...]', sub {
@@ -194,8 +195,8 @@ our %CMD = ( # sorted in order of importance/use:
# $spec => [@ALLOWED_VALUES (default is first), $description],
# $spec => $description
# "$SUB_COMMAND TAB $spec" => as above
-my $stdin_formats = [ 'IN|auto|raw|mboxrd|mboxcl2|mboxcl|mboxo',
- 'specify message input format' ];
+my $stdin_formats = [ 'MAIL_FORMAT|eml|mboxrd|mboxcl2|mboxcl|mboxo',
+ 'specify message input format' ];
my $ls_format = [ 'OUT|plain|json|null', 'listing output format' ];
my %OPTDESC = (
@@ -240,6 +241,8 @@ my %OPTDESC = (
'q jobs=s' => [ '[SEARCH_JOBS][,WRITER_JOBS]',
'control number of search and writer jobs' ],
+'import format|f=s' => $stdin_formats,
+
'ls-query format|f=s' => $ls_format,
'ls-external format|f=s' => $ls_format,
@@ -319,6 +322,20 @@ sub err ($;@) {
sub qerr ($;@) { $_[0]->{opt}->{quiet} or err(shift, @_) }
+sub fail_handler ($;$$) {
+ my ($lei, $code, $io) = @_;
+ for my $f (qw(imp lxs l2m)) {
+ my $wq = delete $lei->{$f} or next;
+ $wq->wq_wait_old($lei) if $wq->wq_kill_old; # lei-daemon
+ }
+ close($io) if $io; # needed to avoid warnings on SIGPIPE
+ $lei->x_it($code // (1 >> 8));
+}
+
+sub sigpipe_handler { # handles SIGPIPE from l2m/lxs workers
+ fail_handler($_[0], 13, delete $_[0]->{1});
+}
+
sub fail ($$;$) {
my ($self, $buf, $exit_code) = @_;
err($self, $buf) if defined $buf;
@@ -340,7 +357,8 @@ sub out ($;@) {
sub puts ($;@) { out(shift, map { "$_\n" } @_) }
sub child_error { # passes non-fatal curl exit codes to user
- my ($self, $child_error) = @_; # child_error is $?
+ my ($self, $child_error, $msg) = @_; # child_error is $?
+ $self->err($msg) if $msg;
if (my $s = $self->{pkt_op_p} // $self->{sock}) {
# send to the parent lei-daemon or to lei(1) client
send($s, "child_error $child_error", MSG_EOR);
@@ -357,9 +375,16 @@ sub note_sigpipe { # triggers sigpipe_handler
}
sub lei_atfork_child {
- my ($self) = @_;
+ my ($self, $persist) = @_;
# we need to explicitly close things which are on stack
- delete $self->{0};
+ if ($persist) {
+ my @io = delete @$self{0,1,2};
+ unless ($self->{oneshot}) {
+ close($_) for @io;
+ }
+ } else {
+ delete $self->{0};
+ }
for (delete @$self{qw(3 sock old_1 au_done)}) {
close($_) if defined($_);
}
@@ -374,7 +399,7 @@ sub lei_atfork_child {
%PATH2CFG = ();
undef $errors_log;
$quit = \&CORE::exit;
- $current_lei = $self; # for SIG{__WARN__}
+ $current_lei = $persist ? undef : $self; # for SIG{__WARN__}
}
sub _help ($;$) {
@@ -606,6 +631,11 @@ sub lei_config {
x_it($self, $?) if $?;
}
+sub lei_import {
+ require PublicInbox::LeiImport;
+ PublicInbox::LeiImport->call(@_);
+}
+
sub lei_init {
my ($self, $dir) = @_;
my $cfg = _lei_cfg($self, 1);
diff --git a/lib/PublicInbox/LeiImport.pm b/lib/PublicInbox/LeiImport.pm
new file mode 100644
index 00000000..4a9af8a7
--- /dev/null
+++ b/lib/PublicInbox/LeiImport.pm
@@ -0,0 +1,106 @@
+# Copyright (C) 2021 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# front-end for the "lei import" sub-command
+package PublicInbox::LeiImport;
+use strict;
+use v5.10.1;
+use parent qw(PublicInbox::IPC);
+use PublicInbox::MboxReader;
+use PublicInbox::Eml;
+
+sub _import_eml { # MboxReader callback
+ my ($eml, $sto, $set_kw) = @_;
+ $sto->ipc_do('set_eml', $eml, $set_kw ? $sto->mbox_keywords($eml) : ());
+}
+
+sub import_done { # EOF callback for main daemon
+ my ($lei) = @_;
+ my $imp = delete $lei->{imp};
+ $imp->wq_wait_old($lei) if $imp;
+ my $wait = $lei->{sto}->ipc_do('done');
+ $lei->dclose;
+}
+
+sub call { # the main "lei import" method
+ my ($cls, $lei, @argv) = @_;
+ my $sto = $lei->_lei_store(1);
+ $sto->write_prepare($lei);
+ $lei->{opt}->{flags} //= 1;
+ my $fmt = $lei->{opt}->{'format'};
+ my $self = $lei->{imp} = bless {}, $cls;
+ return $lei->fail('--format unspecified') if !$fmt;
+ $self->{0} = $lei->{0} if $lei->{opt}->{stdin};
+ my $ops = {
+ '!' => [ $lei->can('fail_handler'), $lei ],
+ 'x_it' => [ $lei->can('x_it'), $lei ],
+ 'child_error' => [ $lei->can('child_error'), $lei ],
+ '' => [ \&import_done, $lei ],
+ };
+ ($lei->{pkt_op_c}, $lei->{pkt_op_p}) = PublicInbox::PktOp->pair($ops);
+ my $j = $lei->{opt}->{jobs} // scalar(@argv) || 1;
+ my $nproc = $self->detect_nproc;
+ $j = $nproc if $j > $nproc;
+ $self->wq_workers_start('lei_import', $j, $lei->oldset, {lei => $lei});
+ my $op = delete $lei->{pkt_op_c};
+ delete $lei->{pkt_op_p};
+ $self->wq_do('import_stdin', []) if $self->{0};
+ for my $x (@argv) {
+ $self->wq_do('import_path_url', [], $x);
+ }
+ $self->wq_close(1);
+ $lei->event_step_init; # wait for shutdowns
+ if ($lei->{oneshot}) {
+ while ($op->{sock}) { $op->event_step }
+ }
+}
+
+sub ipc_atfork_child {
+ my ($self) = @_;
+ $self->{lei}->lei_atfork_child;
+ $self->SUPER::ipc_atfork_child;
+}
+
+sub _import_fh {
+ my ($lei, $fh, $x) = @_;
+ my $set_kw = $lei->{opt}->{flags};
+ my $fmt = $lei->{opt}->{'format'};
+ eval {
+ if ($fmt eq 'eml') {
+ my $buf = do { local $/; <$fh> } //
+ return $lei->child_error(1 << 8, <<"");
+ error reading $x: $!
+
+ my $eml = PublicInbox::Eml->new(\$buf);
+ _import_eml($eml, $lei->{sto}, $set_kw);
+ } else { # some mbox
+ my $cb = PublicInbox::MboxReader->can($fmt);
+ $cb or return $lei->child_error(1 << 8, <<"");
+ --format $fmt unsupported for $x
+
+ $cb->(undef, $fh, \&_import_eml, $lei->{sto}, $set_kw);
+ }
+ };
+ $lei->child_error(1 << 8, "$x: $@") if $@;
+}
+
+sub import_path_url {
+ my ($self, $x) = @_;
+ my $lei = $self->{lei};
+ # TODO auto-detect?
+ if (-f $x) {
+ open my $fh, '<', $x or return $lei->child_error(1 << 8, <<"");
+unable to open $x: $!
+
+ _import_fh($lei, $fh, $x);
+ } else {
+ $lei->fail("$x unsupported (TODO)");
+ }
+}
+
+sub import_stdin {
+ my ($self) = @_;
+ _import_fh($self->{lei}, $self->{0}, '<stdin>');
+}
+
+1;
diff --git a/lib/PublicInbox/LeiStore.pm b/lib/PublicInbox/LeiStore.pm
index a7d7d953..3a215973 100644
--- a/lib/PublicInbox/LeiStore.pm
+++ b/lib/PublicInbox/LeiStore.pm
@@ -17,6 +17,7 @@ use PublicInbox::V2Writable;
use PublicInbox::ContentHash qw(content_hash content_digest);
use PublicInbox::MID qw(mids mids_in);
use PublicInbox::LeiSearch;
+use PublicInbox::MDA;
use List::Util qw(max);
sub new {
@@ -237,4 +238,21 @@ sub done {
die $err if $err;
}
+sub ipc_atfork_child {
+ my ($self) = @_;
+ my $lei = delete $self->{lei};
+ $lei->lei_atfork_child(1) if $lei;
+ $self->SUPER::ipc_atfork_child;
+}
+
+sub write_prepare {
+ my ($self, $lei) = @_;
+ $self->ipc_lock_init;
+ # Mail we import into lei is private, so headers filtered out
+ # by -mda for public mail are not appropriate
+ local @PublicInbox::MDA::BAD_HEADERS = ();
+ $self->ipc_worker_spawn('lei_store', $lei->oldset, { lei => $lei });
+ $lei->{sto} = $self;
+}
+
1;
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index daf42098..f8068362 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -392,25 +392,11 @@ sub query_prepare { # called by wq_do
pkt_do($lei->{pkt_op_p}, '.') == 1 or die "do_post_augment trigger: $!"
}
-sub fail_handler ($;$$) {
- my ($lei, $code, $io) = @_;
- for my $f (qw(lxs l2m)) {
- my $wq = delete $lei->{$f} or next;
- $wq->wq_wait_old($lei) if $wq->wq_kill_old; # lei-daemon
- }
- close($io) if $io; # needed to avoid warnings on SIGPIPE
- $lei->x_it($code // (1 >> 8));
-}
-
-sub sigpipe_handler { # handles SIGPIPE from l2m/lxs workers
- fail_handler($_[0], 13, delete $_[0]->{1});
-}
-
sub do_query {
my ($self, $lei) = @_;
my $ops = {
- '|' => [ \&sigpipe_handler, $lei ],
- '!' => [ \&fail_handler, $lei ],
+ '|' => [ $lei->can('sigpipe_handler'), $lei ],
+ '!' => [ $lei->can('fail_handler'), $lei ],
'.' => [ \&do_post_augment, $lei ],
'' => [ \&query_done, $lei ],
'mset_progress' => [ \&mset_progress, $lei ],
diff --git a/t/lei.t b/t/lei.t
index a08a6d0d..eb824a30 100644
--- a/t/lei.t
+++ b/t/lei.t
@@ -389,6 +389,20 @@ SKIP: {
}; # /SKIP
};
+my $test_import = sub {
+ $cleanup->();
+ ok($lei->(qw(q s:boolean)), 'search miss before import');
+ unlike($out, qr/boolean/i, 'no results, yet');
+ open my $fh, '<', 't/data/0001.patch' or BAIL_OUT $!;
+ ok($lei->([qw(import -f eml -)], undef, { %$opt, 0 => $fh }),
+ 'import single file from stdin');
+ close $fh;
+ ok($lei->(qw(q s:boolean)), 'search hit after import');
+ ok($lei->(qw(import -f eml), 't/data/message_embed.eml'),
+ 'import single file by path');
+ $cleanup->();
+};
+
my $test_lei_common = sub {
$test_help->();
$test_config->();
@@ -396,6 +410,7 @@ my $test_lei_common = sub {
$test_external->();
$test_completion->();
$test_fail->();
+ $test_import->();
};
if ($ENV{TEST_LEI_ONESHOT}) {
* [PATCH 11/18] lei: favor "keywords" over "flags", test --no-kw
2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
` (8 preceding siblings ...)
2021-02-05 12:07 ` [PATCH 10/18] lei import: initial implementation Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
2021-02-05 12:07 ` [PATCH 12/18] lei: fix completion of --no-kw / --no-keywords Eric Wong
` (6 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
To: spew
JMAP brain says "keywords", IMAP brain says "flags";
JMAP brain wins today.
Since "keywords" is a bit long, support "kw" as a shortcut since
there's no conflict and "kw:" will be our search prefix for
looking up messages by keyword.
---
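Not part of the commit: a small sketch of how the new
"kw|keywords|flags!" spec behaves, assuming plain Getopt::Long
semantics (the spec strings in %CMD use that syntax). parse_kw is a
hypothetical helper; the default of 1 mirrors the "//= 1" in
LeiImport::call.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parses @argv against the option spec from this patch and returns the
# resulting "kw" value. parse_kw is hypothetical, for illustration only.
sub parse_kw {
	my @argv = @_;
	my %opt;
	GetOptionsFromArray(\@argv, \%opt, 'kw|keywords|flags!')
		or die "bad options\n";
	$opt{kw} //= 1; # the first name in the spec is the hash key
	$opt{kw};
}

for my $args ([], ['--no-kw'], ['--no-keywords'], ['--flags']) {
	print "[@$args] => ", parse_kw(@$args), "\n";
}
```

All three names and their negations land in the single "kw" slot,
which is why the LeiImport.pm hunks only need to read
$lei->{opt}->{kw}.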
lib/PublicInbox/LEI.pm | 7 ++++---
lib/PublicInbox/LeiImport.pm | 4 ++--
t/lei.t | 21 ++++++++++++++++++++-
3 files changed, 26 insertions(+), 6 deletions(-)
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 682d1bd1..b058b533 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -131,7 +131,7 @@ our %CMD = ( # sorted in order of importance/use:
'exclude mail matching From: or thread from non-Message-ID searches',
qw(stdin| thread|t from|f=s mid=s oid=s) ],
'mark' => [ 'MESSAGE_FLAGS...',
- 'set/unset flags on message(s) from stdin',
+ 'set/unset keywords on message(s) from stdin',
qw(stdin| oid=s exact by-mid|mid:s) ],
'forget' => [ '[--stdin|--oid=OID|--by-mid=MID]',
"exclude message(s) on stdin from `q' search results",
@@ -152,7 +152,8 @@ our %CMD = ( # sorted in order of importance/use:
'add-watch' => [ '[URL_OR_PATHNAME]',
'watch for new messages and flag changes',
- qw(import! flags! interval=s recursive|r exclude=s include=s) ],
+ qw(import! kw|keywords|flags! interval=s recursive|r
+ exclude=s include=s) ],
'ls-watch' => [ '[FILTER...]', 'list active watches with numbers and status',
qw(format|f=s z) ],
'pause-watch' => [ '[WATCH_NUMBER_OR_FILTER]', qw(all local remote) ],
@@ -163,7 +164,7 @@ our %CMD = ( # sorted in order of importance/use:
'import' => [ 'URLS_OR_PATHNAMES...|--stdin',
'one-time import/update from URL or filesystem',
qw(stdin| offset=i recursive|r exclude=s include|I=s
- format|f=s flags!),
+ format|f=s kw|keywords|flags!),
],
'config' => [ '[...]', sub {
diff --git a/lib/PublicInbox/LeiImport.pm b/lib/PublicInbox/LeiImport.pm
index 4a9af8a7..2c7cbf2b 100644
--- a/lib/PublicInbox/LeiImport.pm
+++ b/lib/PublicInbox/LeiImport.pm
@@ -26,7 +26,7 @@ sub call { # the main "lei import" method
my ($cls, $lei, @argv) = @_;
my $sto = $lei->_lei_store(1);
$sto->write_prepare($lei);
- $lei->{opt}->{flags} //= 1;
+ $lei->{opt}->{kw} //= 1;
my $fmt = $lei->{opt}->{'format'};
my $self = $lei->{imp} = bless {}, $cls;
return $lei->fail('--format unspecified') if !$fmt;
@@ -63,7 +63,7 @@ sub ipc_atfork_child {
sub _import_fh {
my ($lei, $fh, $x) = @_;
- my $set_kw = $lei->{opt}->{flags};
+ my $set_kw = $lei->{opt}->{kw};
my $fmt = $lei->{opt}->{'format'};
eval {
if ($fmt eq 'eml') {
diff --git a/t/lei.t b/t/lei.t
index eb824a30..41d854e8 100644
--- a/t/lei.t
+++ b/t/lei.t
@@ -400,7 +400,26 @@ my $test_import = sub {
ok($lei->(qw(q s:boolean)), 'search hit after import');
ok($lei->(qw(import -f eml), 't/data/message_embed.eml'),
'import single file by path');
- $cleanup->();
+
+ my $str = <<'';
+From: a@b
+Message-ID: <x@y>
+Status: RO
+
+ ok($lei->([qw(import -f eml -)], undef, { %$opt, 0 => \$str }),
+ 'import single file with keywords from stdin');
+ $lei->(qw(q m:x@y));
+ my $res = $json->decode($out);
+ is($res->[1], undef, 'only one result');
+ is_deeply($res->[0]->{kw}, ['seen'], "message `seen' keyword set");
+
+ $str =~ tr/x/v/; # v@y
+ ok($lei->([qw(import --no-kw -f eml -)], undef, { %$opt, 0 => \$str }),
+ 'import single file with --no-kw from stdin');
+ $lei->(qw(q m:v@y));
+ $res = $json->decode($out);
+ is($res->[1], undef, 'only one result');
+ is_deeply($res->[0]->{kw}, [], 'no keywords set');
};
my $test_lei_common = sub {
* [PATCH 12/18] lei: fix completion of --no-kw / --no-keywords
2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
` (9 preceding siblings ...)
2021-02-05 12:07 ` [PATCH 11/18] lei: favor "keywords" over "flags", test --no-kw Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
2021-02-05 12:07 ` [PATCH 13/18] t/spawn: blocking examples for ProcessPipe Eric Wong
` (5 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
To: spew
We did not complete --no-* flags properly when an option spec
has multiple names (e.g. "kw|keywords|flags!"); only the first
name got a --no-* form.
---
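Not part of the commit: the substitutions in the hunks below, applied
to the "kw|keywords|flags!" spec in isolation. The three function
names are hypothetical wrappers around the regexps taken from the
diff.

```perl
use strict;
use warnings;

# Old completion rewrite: only the first name gets a "no-" form
sub complete_old {
	my ($spec) = @_;
	$spec =~ s/\A(.+)!\z/no-$1|$1/;
	[ split(/\|/, $spec) ];
}

# New completion rewrite: every name is followed by its "no-" form
sub complete_new {
	my ($spec) = @_;
	$spec =~ s/([\w\-]+)/$1|no-$1/g if $spec =~ s/!\z//;
	[ split(/\|/, $spec) ];
}

# New --help rewrite: every name is prefixed with "no-"
sub help_new {
	my ($spec) = @_;
	$spec =~ s/(\A|\|)/$1no-/g if $spec =~ s/!\z//;
	$spec;
}

my $spec = 'kw|keywords|flags!';
print "old:  @{complete_old($spec)}\n";
print "new:  @{complete_new($spec)}\n";
print "help: ", help_new($spec), "\n";
```

The old rewrite treated the whole "kw|keywords|flags" string as one
name, so --no-keywords and --no-flags were never offered.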
lib/PublicInbox/LEI.pm | 9 ++++++---
t/lei.t | 8 +++++++-
2 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index b058b533..8d5a921e 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -274,6 +274,8 @@ my %OPTDESC = (
'by-mid|mid:s' => [ 'MID', 'match only by Message-ID, ignoring contents' ],
'jobs:i' => 'set parallelism level',
+'kw|keywords|flags!' => 'disable/enable importing flags',
+
# xargs, env, use "-0", git(1) uses "-z". We support z|0 everywhere
'z|0' => 'use NUL \\0 instead of newline (CR) to delimit lines',
@@ -425,7 +427,7 @@ sub _help ($;$) {
my (@vals, @s, @l);
my $x = $sw;
if ($x =~ s/!\z//) { # solve! => --no-solve
- $x = "no-$x";
+ $x =~ s/(\A|\|)/$1no-/g
} elsif ($x =~ s/:.+//) { # optional args: $x = "mid:s"
@vals = (' [', undef, ']');
} elsif ($x =~ s/=.+//) { # required arg: $x = "type=s"
@@ -710,8 +712,9 @@ sub lei__complete {
}
puts $self, grep(/$re/, map { # generate short/long names
if (s/[:=].+\z//) { # req/optional args, e.g output|o=i
- } else { # negation: solve! => no-solve|solve
- s/\A(.+)!\z/no-$1|$1/;
+ } elsif (s/!\z//) {
+ # negation: solve! => no-solve|solve
+ s/([\w\-]+)/$1|no-$1/g
}
map {
my $x = length > 1 ? "--$_" : "-$_";
diff --git a/t/lei.t b/t/lei.t
index 41d854e8..df333957 100644
--- a/t/lei.t
+++ b/t/lei.t
@@ -363,7 +363,7 @@ my $test_completion = sub {
--mua --mua-cmd --no-local --local --verbose -v
--save-as --no-remote --remote --torsocks
--reverse -r )) {
- ok($out{$sw}, "$sw offered as completion");
+ ok($out{$sw}, "$sw offered as `lei q' completion");
}
ok($lei->(qw(_complete lei q --form)), 'complete q --format');
@@ -376,6 +376,12 @@ my $test_completion = sub {
ok($out{$f}, "got $sw $f as output format");
}
}
+ ok($lei->(qw(_complete lei import)), 'complete import');
+ %out = map { $_ => 1 } split(/\s+/s, $out);
+ for my $sw (qw(--flags --no-flags --no-kw --kw --no-keywords
+ --keywords)) {
+ ok($out{$sw}, "$sw offered as `lei import' completion");
+ }
};
my $test_fail = sub {
* [PATCH 13/18] t/spawn: blocking examples for ProcessPipe
2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
` (10 preceding siblings ...)
2021-02-05 12:07 ` [PATCH 12/18] lei: fix completion of --no-kw / --no-keywords Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
2021-02-05 12:07 ` [PATCH 14/18] lei: abort lei_import worker on client abort Eric Wong
` (4 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
To: spew
---
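Not part of the commit: the new assertions depend on
IO::Handle->blocking returning the previous mode when given an
argument, and the current mode when called without one. A minimal
sketch on a plain pipe rather than a ProcessPipe; blocking_states is
a hypothetical name.

```perl
use strict;
use warnings;
use IO::Handle;

# Returns the four values the new test checks, using a plain pipe
sub blocking_states {
	pipe(my $r, my $w) or die "pipe: $!";
	my @states = (
		$r->blocking(0), # previous mode (blocking)
		$r->blocking,    # current mode (non-blocking)
		$r->blocking(1), # previous mode (non-blocking)
		$r->blocking,    # current mode (blocking)
	);
	close($_) for ($r, $w);
	\@states;
}

print join(',', @{blocking_states()}), "\n";
```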
t/spawn.t | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/t/spawn.t b/t/spawn.t
index 0eed79bb..6f811ec1 100644
--- a/t/spawn.t
+++ b/t/spawn.t
@@ -77,6 +77,11 @@ EOF
{
my $fh = popen_rd([qw(printf foo\nbar)]);
ok(fileno($fh) >= 0, 'tied fileno works');
+ my $tfh = (tied *$fh)->{fh};
+ is($tfh->blocking(0), 1, '->blocking was true');
+ is($tfh->blocking, 0, '->blocking is false');
+ is($tfh->blocking(1), 0, '->blocking was false');
+ is($tfh->blocking, 1, '->blocking is true');
my @line = <$fh>;
is_deeply(\@line, [ "foo\n", 'bar' ], 'wantarray works on readline');
}
* [PATCH 14/18] lei: abort lei_import worker on client abort
2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
` (11 preceding siblings ...)
2021-02-05 12:07 ` [PATCH 13/18] t/spawn: blocking examples for ProcessPipe Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
2021-02-05 12:07 ` [PATCH 15/18] lei: @WQ_KEYS Eric Wong
` (3 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
To: spew
---
lib/PublicInbox/LEI.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 8d5a921e..d10ab170 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -856,7 +856,7 @@ sub accept_dispatch { # Listener {post_accept} callback
sub dclose {
my ($self) = @_;
delete $self->{-progress};
- for my $f (qw(lxs l2m)) {
+ for my $f (qw(lxs l2m imp)) {
my $wq = delete $self->{$f} or next;
if ($wq->wq_kill) {
$wq->wq_close
* [PATCH 15/18] lei: @WQ_KEYS
2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
` (12 preceding siblings ...)
2021-02-05 12:07 ` [PATCH 14/18] lei: abort lei_import worker on client abort Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
2021-02-05 12:07 ` [PATCH 16/18] init: lowercase -j for --jobs Eric Wong
` (2 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
To: spew
---
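Not part of the commit: the x_it comment in the LEI.pm context below
documents its codes in the same layout as Perl's $?, with the exit
status in the high byte and the signal number in the low 7 bits. A
minimal sketch, where decode_code is a hypothetical helper:

```perl
use strict;
use warnings;

# Splits a $?-style code into [exit_status, signal_number].
# decode_code is a hypothetical helper for illustration only.
sub decode_code {
	my ($code) = @_;
	[ $code >> 8, $code & 127 ];
}

for my $code (1 << 8, 13) {
	my ($exit, $sig) = @{decode_code($code)};
	print "code=$code exit=$exit signal=$sig\n";
}
```

Under this layout an exit status of 1 is written as "1 << 8" (256),
while a bare 13 denotes death by signal 13.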
lib/PublicInbox/LEI.pm | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index d10ab170..28ad88e7 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -286,6 +286,8 @@ my %CONFIG_KEYS = (
'leistore.dir' => 'top-level storage location',
);
+my @WQ_KEYS = qw(lxs l2m imp); # internal workers
+
# pronounced "exit": x_it(1 << 8) => exit(1); x_it(13) => SIGPIPE
sub x_it ($$) {
my ($self, $code) = @_;
@@ -296,7 +298,7 @@ sub x_it ($$) {
send($s, "x_it $code", MSG_EOR);
} elsif ($self->{oneshot}) {
# don't want to end up using $? from child processes
- for my $f (qw(lxs l2m)) {
+ for my $f (@WQ_KEYS) {
my $wq = delete $self->{$f} or next;
$wq->DESTROY;
}
@@ -327,7 +329,7 @@ sub qerr ($;@) { $_[0]->{opt}->{quiet} or err(shift, @_) }
sub fail_handler ($;$$) {
my ($lei, $code, $io) = @_;
- for my $f (qw(imp lxs l2m)) {
+ for my $f (@WQ_KEYS) {
my $wq = delete $lei->{$f} or next;
$wq->wq_wait_old($lei) if $wq->wq_kill_old; # lei-daemon
}
@@ -335,7 +337,7 @@ sub fail_handler ($;$$) {
$lei->x_it($code // (1 >> 8));
}
-sub sigpipe_handler { # handles SIGPIPE from l2m/lxs workers
+sub sigpipe_handler { # handles SIGPIPE from @WQ_KEYS workers
fail_handler($_[0], 13, delete $_[0]->{1});
}
@@ -856,7 +858,7 @@ sub accept_dispatch { # Listener {post_accept} callback
sub dclose {
my ($self) = @_;
delete $self->{-progress};
- for my $f (qw(lxs l2m imp)) {
+ for my $f (@WQ_KEYS) {
my $wq = delete $self->{$f} or next;
if ($wq->wq_kill) {
$wq->wq_close
* [PATCH 16/18] init: lowercase -j for --jobs
2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
` (13 preceding siblings ...)
2021-02-05 12:07 ` [PATCH 15/18] lei: @WQ_KEYS Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
2021-02-05 12:07 ` [PATCH 17/18] lei: add-external --mirror support Eric Wong
2021-02-05 12:07 ` [PATCH 18/18] lei_query: trim curl options Eric Wong
16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
To: spew
This is taken from common implementations of make(1).
---
script/public-inbox-init | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/script/public-inbox-init b/script/public-inbox-init
index 6a867a22..e93cab73 100755
--- a/script/public-inbox-init
+++ b/script/public-inbox-init
@@ -24,7 +24,7 @@ options:
--ng NEWSGROUP set NNTP newsgroup name
--skip-artnum=NUM NNTP article numbers to skip
--skip-epoch=NUM epochs to skip (-V2 only)
- -J JOBS number of indexing jobs (-V2 only), (default: 4)
+ -j JOBS number of indexing jobs (-V2 only), (default: 4)
See public-inbox-init(1) man page for full documentation.
EOF
* [PATCH 17/18] lei: add-external --mirror support
2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
` (14 preceding siblings ...)
2021-02-05 12:07 ` [PATCH 16/18] init: lowercase -j for --jobs Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
2021-02-05 12:07 ` [PATCH 18/18] lei_query: trim curl options Eric Wong
16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
To: spew
---
MANIFEST | 2 +
lib/PublicInbox/Admin.pm | 7 +-
lib/PublicInbox/LEI.pm | 2 +-
lib/PublicInbox/LeiCurl.pm | 65 ++++++++
lib/PublicInbox/LeiExternal.pm | 26 ++-
lib/PublicInbox/LeiMirror.pm | 283 +++++++++++++++++++++++++++++++++
lib/PublicInbox/LeiXSearch.pm | 32 +---
7 files changed, 379 insertions(+), 38 deletions(-)
create mode 100644 lib/PublicInbox/LeiCurl.pm
create mode 100644 lib/PublicInbox/LeiMirror.pm
diff --git a/MANIFEST b/MANIFEST
index a11d4106..ab692d28 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -177,9 +177,11 @@ lib/PublicInbox/InputPipe.pm
lib/PublicInbox/Isearch.pm
lib/PublicInbox/KQNotify.pm
lib/PublicInbox/LEI.pm
+lib/PublicInbox/LeiCurl.pm
lib/PublicInbox/LeiDedupe.pm
lib/PublicInbox/LeiExternal.pm
lib/PublicInbox/LeiImport.pm
+lib/PublicInbox/LeiMirror.pm
lib/PublicInbox/LeiOverview.pm
lib/PublicInbox/LeiQuery.pm
lib/PublicInbox/LeiSearch.pm
diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm
index 3b38a5a3..b21fb241 100644
--- a/lib/PublicInbox/Admin.pm
+++ b/lib/PublicInbox/Admin.pm
@@ -273,8 +273,8 @@ EOM
$idx->{nidx} // 0; # returns number processed
}
-sub progress_prepare ($) {
- my ($opt) = @_;
+sub progress_prepare ($;$) {
+ my ($opt, $dst) = @_;
# public-inbox-index defaults to quiet, -xcpdb and -compact do not
if (defined($opt->{quiet}) && $opt->{quiet} < 0) {
@@ -286,7 +286,8 @@ sub progress_prepare ($) {
$opt->{1} = $null; # suitable for spawn() redirect
} else {
$opt->{verbose} ||= 1;
- $opt->{-progress} = sub { print STDERR @_ };
+ $dst //= *STDERR{GLOB};
+ $opt->{-progress} = sub { print $dst @_ };
}
}
diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 28ad88e7..3ef19918 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -115,7 +115,7 @@ our %CMD = ( # sorted in order of importance/use:
'add-external' => [ 'URL_OR_PATHNAME',
'add/set priority of a publicinbox|extindex for extra matches',
- qw(boost=i quiet|q) ],
+ qw(boost=i quiet|q verbose|v c=s@ mirror=s no-torsocks torsocks=s) ],
'ls-external' => [ '[FILTER...]', 'list publicinbox|extindex locations',
qw(format|f=s z|0 local remote quiet|q) ],
'forget-external' => [ 'URL_OR_PATHNAME...|--prune',
diff --git a/lib/PublicInbox/LeiCurl.pm b/lib/PublicInbox/LeiCurl.pm
new file mode 100644
index 00000000..c8747d4f
--- /dev/null
+++ b/lib/PublicInbox/LeiCurl.pm
@@ -0,0 +1,65 @@
+# Copyright (C) 2021 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# common option and torsocks(1) wrapping for curl(1)
+package PublicInbox::LeiCurl;
+use strict;
+use v5.10.1;
+use PublicInbox::Spawn qw(which);
+use PublicInbox::Config;
+
+# prepares a common command for curl(1) based on $lei command
+sub new {
+ my ($cls, $lei, $curl) = @_;
+ $curl //= which('curl') // return $lei->fail('curl not found');
+ my $opt = $lei->{opt};
+ my @cmd = ($curl, qw(-Sf));
+ $cmd[-1] .= 's' if $opt->{quiet}; # already the default for "lei q"
+ $cmd[-1] .= 'v' if $opt->{verbose}; # we use ourselves, too
+ for my $o ($lei->curl_opt) {
+ $o =~ s/\|[a-z0-9]\b//i; # remove single char short option
+ if ($o =~ s/=[is]@\z//) {
+ my $ary = $opt->{$o} or next;
+ push @cmd, map { ("--$o", $_) } @$ary;
+ } elsif ($o =~ s/=[is]\z//) {
+ my $val = $opt->{$o} // next;
+ push @cmd, "--$o", $val;
+ } elsif ($opt->{$o}) {
+ push @cmd, "--$o";
+ }
+ }
+ push @cmd, '-v' if $opt->{verbose}; # lei uses this itself
+ bless \@cmd, $cls;
+}
+
+sub torsocks { # useful for "git clone" and "git fetch", too
+ my ($self, $lei, $uri) = @_;
+ my $opt = $lei->{opt};
+ $opt->{torsocks} = 'false' if $opt->{'no-torsocks'};
+ my $torsocks = $opt->{torsocks} //= 'auto';
+ if ($torsocks eq 'auto' && substr($uri->host, -6) eq '.onion' &&
+ (($lei->{env}->{LD_PRELOAD}//'') !~ /torsocks/)) {
+ # "auto" continues anyways if torsocks is missing;
+ # a proxy may be specified via CLI, curlrc,
+ # environment variable, or even firewall rule
+ [ ($lei->{torsocks} //= which('torsocks')) // () ]
+ } elsif (PublicInbox::Config::git_bool($torsocks)) {
+ my $x = $lei->{torsocks} //= which('torsocks');
+ $x or return $lei->fail(<<EOM);
+--torsocks=yes specified but torsocks not found in PATH=$ENV{PATH}
+EOM
+ [ $x ];
+ } else { # the common case for current Internet :<
+ [];
+ }
+}
+
+# completes the result of new() for $uri
+sub for_uri {
+ my ($self, $lei, $uri) = @_;
+ my $pfx = torsocks($self, $lei, $uri) or return; # error
+ [ @$pfx, @$self, substr($uri->path, -3) eq '.gz' ? () : '--compressed',
+ $uri->as_string ]
+}
+
+1;
diff --git a/lib/PublicInbox/LeiExternal.pm b/lib/PublicInbox/LeiExternal.pm
index accacf1a..53c222b7 100644
--- a/lib/PublicInbox/LeiExternal.pm
+++ b/lib/PublicInbox/LeiExternal.pm
@@ -88,14 +88,10 @@ sub get_externals {
();
}
-sub lei_add_external {
+sub add_external_finish {
my ($self, $location) = @_;
my $cfg = $self->_lei_cfg(1);
my $new_boost = $self->{opt}->{boost} // 0;
- $location = ext_canonicalize($location);
- if ($location !~ m!\Ahttps?://! && !-d $location) {
- return $self->fail("$location not a directory");
- }
my $key = "external.$location.boost";
my $cur_boost = $cfg->{$key};
return if defined($cur_boost) && $cur_boost == $new_boost; # idempotent
@@ -103,6 +99,26 @@ sub lei_add_external {
$self->_lei_store(1)->done; # just create the store
}
+sub lei_add_external {
+ my ($self, $location) = @_;
+ my $new_boost = $self->{opt}->{boost} // 0;
+ $location = ext_canonicalize($location);
+ my $mirror = $self->{opt}->{mirror};
+ if (defined($mirror) && -d $location) {
+ return $self->fail(<<""); # TODO: did you mean "update-external?"
+--mirror destination `$location' already exists
+
+ }
+ if ($location !~ m!\Ahttps?://! && !-d $location) {
+ $mirror // return $self->fail("$location not a directory");
+ $mirror = ext_canonicalize($mirror);
+ require PublicInbox::LeiMirror;
+ PublicInbox::LeiMirror->start($self, $mirror => $location);
+ } else {
+ add_external_finish($self, $location);
+ }
+}
+
sub lei_forget_external {
my ($self, @locations) = @_;
my $cfg = $self->_lei_cfg(1);
diff --git a/lib/PublicInbox/LeiMirror.pm b/lib/PublicInbox/LeiMirror.pm
new file mode 100644
index 00000000..21212657
--- /dev/null
+++ b/lib/PublicInbox/LeiMirror.pm
@@ -0,0 +1,283 @@
+# Copyright (C) 2021 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# "lei add-external --mirror" support
+package PublicInbox::LeiMirror;
+use strict;
+use v5.10.1;
+use parent qw(PublicInbox::IPC);
+use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
+use PublicInbox::Spawn qw(popen_rd spawn);
+use PublicInbox::PktOp;
+
+sub mirror_done { # EOF callback for main daemon
+ my ($lei) = @_;
+ my $mrr = delete $lei->{mrr};
+ $mrr->wq_wait_old($lei) if $mrr;
+ $lei->dclose;
+}
+
+# for old installations without manifest.js.gz
+sub try_scrape {
+ my ($self) = @_;
+ my $uri = URI->new($self->{src});
+ my $lei = $self->{lei};
+ my $curl = $self->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
+ my $cmd = $curl->for_uri($lei, $uri);
+ my $opt = { 0 => $lei->{0}, 2 => $lei->{2} };
+ my $fh = popen_rd($cmd, $lei->{env}, $opt);
+ my $html = do { local $/; <$fh> } // die "read(curl $uri): $!";
+ close($fh) or return $lei->child_error($?, "@$cmd failed");
+
+ # we grep with URL below, we don't want Subject/From headers
+ # making us clone random URLs
+ my @urls = ($html =~ m!\bgit clone --mirror ([a-z\+]+://\S+)!g);
+ my $url = $uri->as_string;
+ chop($url) eq '/' or die "BUG: $uri not canonicalized";
+
+ # since this is for old instances w/o manifest.js.gz, try v1 first
+ return clone_v1($self) if grep(m!\A\Q$url\E/*\z!, @urls);
+ if (my @v2_urls = grep(m!\A\Q$url\E/[0-9]+\z!, @urls)) {
+ my %v2_uris = map { $_ => URI->new($_) } @v2_urls; # uniq
+ return clone_v2($self, [ values %v2_uris ]);
+ }
+
+ # filter out common URLs served by WWW (e.g /$MSGID/T/)
+ if (@urls && $url =~ s!/+[^/]+\@[^/]+/.*\z!! &&
+ grep(m!\A\Q$url\E/*\z!, @urls)) {
+ die <<"";
+E: confused by scraping <$uri>, did you mean <$url>?
+
+ }
+ @urls and die <<"";
+E: confused by scraping <$uri>, got ambiguous results:
+@urls
+
+ die "E: scraping <$uri> revealed nothing\n";
+}
+
+sub clone_cmd {
+ my ($lei) = @_;
+ my @cmd = qw(git);
+ # we support "-c $key=$val" for arbitrary git config options
+ # e.g.: git -c http.proxy=socks5h://127.0.0.1:9050
+ push(@cmd, '-c', $_) for @{$lei->{opt}->{c} // []};
+ push @cmd, qw(clone --mirror);
+ push @cmd, '-q' if $lei->{opt}->{quiet};
+ push @cmd, '-v' if $lei->{opt}->{verbose};
+ # XXX any other options to support?
+ # --reference is tricky with multiple epochs...
+ @cmd;
+}
+
+# tries the relatively new /$INBOX/_/text/config/raw endpoint
+sub _try_config {
+ my ($self) = @_;
+ my $dst = $self->{dst};
+ if (!-d $dst && !mkdir($dst)) {
+ require File::Path;
+ File::Path::mkpath($dst);
+ -d $dst or die "mkpath($dst): $!\n";
+ }
+ my $uri = URI->new($self->{src});
+ my $lei = $self->{lei};
+ my $path = $uri->path;
+ chop($path) eq '/' or die "BUG: $uri not canonicalized";
+ $uri->path($path . '/_/text/config/raw');
+ my $cmd = $self->{curl}->for_uri($lei, $uri);
+ push @$cmd, '--compressed'; # curl decompresses for us
+ my $ce = "$dst/inbox.config.example";
+ my $f = "$ce-$$.tmp";
+ open(my $fh, '+>', $f) or return $lei->err("open $f: $! (non-fatal)");
+ my $opt = { 0 => $lei->{0}, 1 => $fh, 2 => $lei->{2} };
+ $lei->err("# @$cmd") if $lei->{opt}->{verbose};
+ my $pid = spawn($cmd, $lei->{env}, $opt);
+ waitpid($pid, 0) == $pid or return $lei->err("waitpid @$cmd: $!");
+ if (($? >> 8) == 22) { # 404 missing
+ unlink($f) if -s $fh == 0;
+ return;
+ }
+ return $lei->err("# @$cmd failed (non-fatal)") if $?;
+ rename($f, $ce) or return $lei->err("rename($f, $ce): $! (non-fatal)");
+ my $cfg = PublicInbox::Config::git_config_dump($ce);
+ my $ibx = $self->{ibx} = {};
+ for my $sec (grep(/\Apublicinbox\./, @{$cfg->{-section_order}})) {
+ for (qw(address newsgroup nntpmirror)) {
+ $ibx->{$_} = $cfg->{"$sec.$_"};
+ }
+ }
+}
+
+sub index_cloned_inbox {
+ my ($self, $iv) = @_;
+ my $ibx = delete($self->{ibx}) // {
+ address => [ 'lei@example.com' ],
+ version => $iv,
+ };
+ $ibx->{inboxdir} = $self->{dst};
+ PublicInbox::Inbox->new($ibx);
+ PublicInbox::InboxWritable->new($ibx);
+ my $opt = {};
+ my $lei = $self->{lei};
+ for (qw(fsync jobs indexlevel compact max_size batch_size
+ quiet verbose sequential_shard skip-docdata)) {
+ $opt->{$_} = $lei->{opt}->{$_};
+ }
+ # force synchronous dwaitpid for v2:
+ local $PublicInbox::DS::in_loop = 0;
+ PublicInbox::Admin::progress_prepare($opt, $lei->{2});
+ PublicInbox::Admin::index_inbox($ibx, undef, $opt);
+}
+
+sub clone_v1 {
+ my ($self) = @_;
+ my $lei = $self->{lei};
+ my $curl = $self->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
+ my $uri = URI->new($self->{src});
+ my $pfx = $curl->torsocks($lei, $uri) or return;
+ my $cmd = [ @$pfx, clone_cmd($lei), $uri->as_string, $self->{dst} ];
+ $lei->err("# @$cmd") if $lei->{opt}->{verbose};
+ my $pid = spawn($cmd, $lei->{env}, $lei);
+ waitpid($pid, 0) == $pid or die "BUG: waitpid @$cmd: $!";
+ $? == 0 or return $lei->child_error($?, "@$cmd failed");
+ _try_config($self);
+ index_cloned_inbox($self, 1);
+}
+
+sub clone_v2 {
+ my ($self, $v2_uris) = @_;
+ my $lei = $self->{lei};
+ my $curl = $self->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
+ my $pfx = $curl->torsocks($lei, $v2_uris->[0]) or return;
+ my @epochs;
+ my $dst = $self->{dst};
+ my @src_edst;
+ for my $uri (@$v2_uris) {
+ my $src = $uri->as_string;
+ my $edst = $dst;
+ $src =~ m!/([0-9]+)(?:\.git)?\z! or die <<"";
+failed to extract epoch number from $src
+
+ my $nr = $1 + 0;
+ $edst .= "/git/$nr.git";
+ push @src_edst, [ $src, $edst ];
+ }
+ my $lk = bless { lock_path => "$dst/inbox.lock" }, 'PublicInbox::Lock';
+ _try_config($self);
+ my $on_destroy = $lk->lock_for_scope($$);
+ my @cmd = clone_cmd($lei);
+ while (my $pair = shift(@src_edst)) {
+ my $cmd = [ @$pfx, @cmd, @$pair ];
+ $lei->err("# @$cmd") if $lei->{opt}->{verbose};
+ my $pid = spawn($cmd, $lei->{env}, $lei);
+ waitpid($pid, 0) == $pid or die "BUG: waitpid @$cmd: $!";
+ $? == 0 or return $lei->child_error($?, "@$cmd failed");
+ }
+ undef $on_destroy; # unlock
+ index_cloned_inbox($self, 2);
+}
+
+sub try_manifest {
+ my ($self) = @_;
+ my $uri = URI->new($self->{src});
+ my $lei = $self->{lei};
+ my $curl = $self->{curl} //= PublicInbox::LeiCurl->new($lei) or return;
+ my $path = $uri->path;
+ chop($path) eq '/' or die "BUG: $uri not canonicalized";
+ $uri->path($path . '/manifest.js.gz');
+ my $cmd = $curl->for_uri($lei, $uri);
+ $lei->err("# @$cmd") if $lei->{opt}->{verbose};
+ my $opt = { 0 => $lei->{0}, 2 => $lei->{2} };
+ my $fh = popen_rd($cmd, $lei->{env}, $opt);
+ my $gz = do { local $/; <$fh> } // die "read(curl $uri): $!";
+ unless (close $fh) {
+ return try_scrape($self) if ($? >> 8) == 22; # 404 missing
+ return $lei->child_error($?, "@$cmd failed");
+ }
+ my $js;
+ gunzip(\$gz => \$js, MultiStream => 1) or
+ die "gunzip($uri): $GunzipError";
+ my $m = eval { PublicInbox::Config->json->decode($js) };
+ die "$uri: error decoding `$js': $@" if $@;
+ ref($m) eq 'HASH' or die "$uri unknown type: ".ref($m);
+
+ my $v1_bare = $m->{$path};
+ my @v2_epochs = grep(m!\A\Q$path\E/git/[0-9]+\.git\z!, keys %$m);
+ if (@v2_epochs) {
+ # It may be possible to have v1 + v2 in parallel someday:
+ $lei->err(<<EOM) if defined $v1_bare;
+# `$v1_bare' appears to be a v1 inbox while v2 epochs exist:
+# @v2_epochs
+# ignoring $v1_bare (use --inbox-version=1 to force v1 instead)
+EOM
+ @v2_epochs = map { $uri->path($_); $uri->clone } @v2_epochs;
+ clone_v2($self, \@v2_epochs);
+ } elsif ($v1_bare) {
+ clone_v1($self);
+ } elsif (my @maybe = grep(m!\Q$path\E!, keys %$m)) {
+ die "E: confused by <$uri>, possible matches:\n@maybe";
+ } else {
+ die "E: confused by <$uri>";
+ }
+}
+
+sub start_clone_url {
+ my ($self) = @_;
+ return try_manifest($self) if $self->{src} =~ m!\Ahttps?://!;
+ die "TODO: non-HTTP/HTTPS clone of $self->{src} not supported, yet";
+}
+
+sub do_mirror { # via wq_do
+ my ($self) = @_;
+ my $lei = $self->{lei};
+ eval {
+ my $iv = $lei->{opt}->{'inbox-version'};
+ if (defined $iv) {
+ return clone_v1($self) if $iv == 1;
+ return try_scrape($self) if $iv == 2;
+ die "bad --inbox-version=$iv\n";
+ }
+ return start_clone_url($self) if $self->{src} =~ m!://!;
+ die "TODO: cloning local directories not supported, yet";
+ };
+ return $lei->fail($@) if $@;
+ $lei->qerr("# mirrored $self->{src} => $self->{dst}");
+}
+
+sub start {
+ my ($cls, $lei, $src, $dst) = @_;
+ my $self = bless { lei => $lei, src => $src, dst => $dst }, $cls;
+ $lei->_lei_store(1)->write_prepare($lei);
+ if ($src =~ m!\Ahttps?://!) {
+ require URI;
+ require PublicInbox::LeiCurl;
+ }
+ require PublicInbox::Lock;
+ require PublicInbox::Inbox;
+ require PublicInbox::Admin;
+ require PublicInbox::InboxWritable;
+ my $ops = {
+ '!' => [ $lei->can('fail_handler'), $lei ],
+ 'x_it' => [ $lei->can('x_it'), $lei ],
+ 'child_error' => [ $lei->can('child_error'), $lei ],
+ '' => [ \&mirror_done, $lei ],
+ };
+ ($lei->{pkt_op_c}, $lei->{pkt_op_p}) = PublicInbox::PktOp->pair($ops);
+ $self->wq_workers_start('lei_mirror', 1, $lei->oldset, {lei => $lei});
+ my $op = delete $lei->{pkt_op_c};
+ delete $lei->{pkt_op_p};
+ $self->wq_do('do_mirror', []);
+ $self->wq_close(1);
+ $lei->event_step_init; # wait for shutdowns
+ if ($lei->{oneshot}) {
+ while ($op->{sock}) { $op->event_step }
+ }
+}
+
+sub ipc_atfork_child {
+ my ($self) = @_;
+ $self->{lei}->lei_atfork_child;
+ $self->SUPER::ipc_atfork_child;
+}
+
+1;
diff --git a/lib/PublicInbox/LeiXSearch.pm b/lib/PublicInbox/LeiXSearch.pm
index f8068362..7085d164 100644
--- a/lib/PublicInbox/LeiXSearch.pm
+++ b/lib/PublicInbox/LeiXSearch.pm
@@ -212,7 +212,6 @@ sub query_remote_mboxrd {
my ($opt, $env) = @$lei{qw(opt env)};
my @qform = (q => $lei->{mset_opt}->{qstr}, x => 'm');
push(@qform, t => 1) if $opt->{thread};
- my @cmd = ($self->{curl}, qw(-sSf -d), '');
my $verbose = $opt->{verbose};
my $reap;
my $cerr = File::Temp->new(TEMPLATE => 'curl.err-XXXX', TMPDIR => 1);
@@ -223,43 +222,17 @@ sub query_remote_mboxrd {
# spawn a process to force line-buffering, otherwise curl
# will write 1 character at-a-time and parallel outputs
# mmmaaayyy llloookkk llliiikkkeee ttthhhiiisss
- push @cmd, '-v';
my $o = { 1 => $lei->{2}, 2 => $lei->{2} };
my $pid = spawn(['tail', '-f', $cerr->filename], undef, $o);
$reap = PublicInbox::OnDestroy->new(\&kill_reap, $pid);
}
- for my $o ($lei->curl_opt) {
- $o =~ s/\|[a-z0-9]\b//i; # remove single char short option
- if ($o =~ s/=[is]@\z//) {
- my $ary = $opt->{$o} or next;
- push @cmd, map { ("--$o", $_) } @$ary;
- } elsif ($o =~ s/=[is]\z//) {
- my $val = $opt->{$o} // next;
- push @cmd, "--$o", $val;
- } elsif ($opt->{$o}) {
- push @cmd, "--$o";
- }
- }
- $opt->{torsocks} = 'false' if $opt->{'no-torsocks'};
- my $tor = $opt->{torsocks} //= 'auto';
+ my $curl = PublicInbox::LeiCurl->new($lei, $self->{curl}) or return;
my $each_smsg = $lei->{ovv}->ovv_each_smsg_cb($lei);
for my $uri (@$uris) {
$lei->{-current_url} = $uri->as_string;
$lei->{-nr_remote_eml} = 0;
$uri->query_form(@qform);
- my $cmd = [ @cmd, $uri->as_string ];
- if ($tor eq 'auto' && substr($uri->host, -6) eq '.onion' &&
- (($env->{LD_PRELOAD}//'') !~ /torsocks/)) {
- unshift @$cmd, which('torsocks');
- } elsif (PublicInbox::Config::git_bool($tor)) {
- unshift @$cmd, which('torsocks');
- }
-
- # continue anyways if torsocks is missing; a proxy may be
- # specified via CLI, curlrc, environment variable, or even
- # firewall rule
- shift(@$cmd) if !$cmd->[0];
-
+ my $cmd = $curl->for_uri($lei, $uri);
$lei->err("# @$cmd") if $verbose;
my ($fh, $pid) = popen_rd($cmd, $env, $rdr);
$fh = IO::Uncompress::Gunzip->new($fh);
@@ -440,6 +413,7 @@ sub add_uri {
if (my $curl = $self->{curl} //= which('curl') // 0) {
require PublicInbox::MboxReader;
require IO::Uncompress::Gunzip;
+ require PublicInbox::LeiCurl;
push @{$self->{remotes}}, $uri;
} else {
warn "curl missing, ignoring $uri\n";
* [PATCH 18/18] lei_query: trim curl options
2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
` (15 preceding siblings ...)
2021-02-05 12:07 ` [PATCH 17/18] lei: add-external --mirror support Eric Wong
@ 2021-02-05 12:07 ` Eric Wong
16 siblings, 0 replies; 18+ messages in thread
From: Eric Wong @ 2021-02-05 12:07 UTC (permalink / raw)
To: spew
---
lib/PublicInbox/LeiQuery.pm | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/lib/PublicInbox/LeiQuery.pm b/lib/PublicInbox/LeiQuery.pm
index 56350386..7c856032 100644
--- a/lib/PublicInbox/LeiQuery.pm
+++ b/lib/PublicInbox/LeiQuery.pm
@@ -152,18 +152,21 @@ sub _complete_q {
# with other "lei q" switches.
# FIXME: Getopt::Long doesn't easily let us support options with
# '.' in them (e.g. --http1.1)
+# TODO: should we depend on "-c http.*" options for things which have
+# analogues in git(1)? that would reduce likelihood of conflicts with
+# our other CLI options
sub curl_opt { qw(
abstract-unix-socket=s anyauth basic cacert=s capath=s
- cert-status cert-type cert|E=s ciphers=s config|K=s@
- connect-timeout=s connect-to=s cookie-jar|c=s cookie|b=s crlfile=s
+ cert-status cert-type cert=s ciphers=s config|K=s@
+ connect-timeout=s connect-to=s cookie-jar=s cookie=s crlfile=s
digest disable dns-interface=s dns-ipv4-addr=s dns-ipv6-addr=s
dns-servers=s doh-url=s egd-file=s engine=s false-start
happy-eyeballs-timeout-ms=s haproxy-protocol header|H=s@
- http2-prior-knowledge http2 insecure|k
+ http2-prior-knowledge http2 insecure
interface=s ipv4 ipv6 junk-session-cookies
- key-type=s key=s limit-rate=s local-port=s location-trusted location|L
+ key-type=s key=s limit-rate=s local-port=s location-trusted location
max-redirs=i max-time=s negotiate netrc-file=s netrc-optional netrc
- no-alpn no-buffer|N no-npn no-sessionid noproxy=s ntlm-wb ntlm
+ no-alpn no-buffer no-npn no-sessionid noproxy=s ntlm-wb ntlm
pass=s pinnedpubkey=s post301 post302 post303 preproxy=s
proxy-anyauth proxy-basic proxy-cacert=s proxy-capath=s
proxy-cert-type=s proxy-cert=s proxy-ciphers=s proxy-crlfile=s
@@ -176,7 +179,7 @@ sub curl_opt { qw(
retry-connrefused retry-delay=s retry-max-time=s retry=i
sasl-ir service-name=s socks4=s socks4a=s socks5-basic
socks5-gssapi-service-name=s socks5-gssapi socks5-hostname=s socks5=s
- speed-limit|Y speed-type|y ssl-allow-beast sslv2 sslv3
+ speed-limit speed-type ssl-allow-beast sslv2 sslv3
suppress-connect-headers tcp-fastopen tls-max=s
tls13-ciphers=s tlsauthtype=s tlspassword=s tlsuser=s
tlsv1 trace-ascii=s trace-time trace=s
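Reviewer note (not part of the patch): the TODO above mentions
conflicts between curl options and other CLI options. As a rough
illustration of what such a conflict looks like, the sketch below
reports option names that are claimed by more than one
Getopt::Long-style spec. spec_names() and find_collisions() are
hypothetical helpers, and the spec lists are abbreviated samples rather
than lei's real option tables.

```perl
use strict;
use warnings;

# Split a spec such as "cookie-jar|c=s" into its names
# ("cookie-jar", "c"), ignoring the type suffix.
sub spec_names {
	my ($spec) = @_;
	(my $names = $spec) =~ s/[=:!+].*\z//s;
	split(/\|/, $names);
}

# Return, sorted, every name claimed by more than one spec.
sub find_collisions {
	my %seen;
	for my $spec (@_) {
		$seen{$_}++ for spec_names($spec);
	}
	sort { $a cmp $b } grep { $seen{$_} > 1 } keys %seen;
}

# sample specs only: a "-c" switch next to curl-style options
my @own = qw(c=s@ quiet|q verbose|v);
my @before = qw(cookie-jar|c=s cookie|b=s insecure|k);
my @after = qw(cookie-jar=s cookie=s insecure);
print "before: @{[ find_collisions(@own, @before) ]}\n";
print "after: @{[ find_collisions(@own, @after) ]}\n";
```

With the short aliases removed, the sample lists no longer share a
name; whether this matches the intent behind the patch is an assumption.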
Thread overview: 18 messages
2021-02-05 12:07 [PATCH 01/18] lei q: delay worker spawn Eric Wong
2021-02-05 12:07 ` [PATCH 02/18] ipc: localize fields assignment to prevent circular refs Eric Wong
2021-02-05 12:07 ` [PATCH 03/18] lei q: reorder internals to reduce FD passing Eric Wong
2021-02-05 12:07 ` [PATCH 04/18] lei q: only start pager if output is to stdout Eric Wong
2021-02-05 12:07 ` [PATCH 05/18] lei q: reinstate early MUA spawn for Maildir Eric Wong
2021-02-05 12:07 ` [PATCH 06/18] eml: handle warning ignores for lei Eric Wong
2021-02-05 12:07 ` [PATCH 07/18] lei q: eliminate $not_done temporary git dir hack Eric Wong
2021-02-05 12:07 ` [PATCH 08/18] lei_query: remove uneeded dwaitpid import Eric Wong
2021-02-05 12:07 ` [PATCH 09/18] lei_xsearch: drop unused imports Eric Wong
2021-02-05 12:07 ` [PATCH 10/18] lei import: initial implementation Eric Wong
2021-02-05 12:07 ` [PATCH 11/18] lei: favor "keywords" over "flags", test --no-kw Eric Wong
2021-02-05 12:07 ` [PATCH 12/18] lei: fix completion of --no-kw / --no-keywords Eric Wong
2021-02-05 12:07 ` [PATCH 13/18] t/spawn: blocking examples for ProcessPipe Eric Wong
2021-02-05 12:07 ` [PATCH 14/18] lei: abort lei_import worker on client abort Eric Wong
2021-02-05 12:07 ` [PATCH 15/18] lei: @WQ_KEYS Eric Wong
2021-02-05 12:07 ` [PATCH 16/18] init: lowercase -j for --jobs Eric Wong
2021-02-05 12:07 ` [PATCH 17/18] lei: add-external --mirror support Eric Wong
2021-02-05 12:07 ` [PATCH 18/18] lei_query: trim curl options Eric Wong