From: Dimid Duchovny
Date: Wed, 24 Jan 2018 12:28:34 +0200
Subject: Re: Feature Request: thread grouping
To: Eric Wong
Cc: msgthr-public
In-Reply-To: <20180123220303.GA7222@80x24.org>
References: <20180121234911.GA29238@whir> <20180123220303.GA7222@80x24.org>

2018-01-24 0:03 GMT+02:00 Eric Wong:
> Dimid Duchovny wrote:
>> > You're right. In my case the flow was: read emails from storage ->
>> > group to threads -> add thread field to storage.
>> > However, I guess it's an edge-case.
>> > On second thought, maybe it'd be better to have a more general solution.
>> > E.g. let the client run an arbitrary callback after adding a child.
>
> OK, I guess you managed to fit skeletons of all your messages in memory?
>
>> > Here's a quick POC:
>> > https://github.com/dimidd/msgthr/commit/1c701717d10879d492d8b55fb8ca2f1c53d7e13f
>
> (truncated output of "git show 1c701717d10879d492d8b55fb8ca2f1c53d7e13f"
>
>> add callback to Msgthr#add
>>
>> The motivation is to allow the client to have a custom code executed,
>> whenever a child is added.
>>
>> --- a/lib/msgthr.rb
>> +++ b/lib/msgthr.rb
>> @@ -166,12 +166,16 @@ class Msgthr
>>        # but do not change existing links or loop
>>        if prev && !cont.parent && !cont.has_descendent(prev)
>>          prev.add_child(cont)
>> +        yield(prev, cont) if block_given?
>>        end
>>        prev = cont
>>      end
>>
>>      # set parent of this message to be the last element in refs
>> -    prev.add_child(cur) if prev
>> +    if prev
>> +      prev.add_child(cur)
>> +      yield(prev, cur) if block_given?
>> +    end
>>    end
>>  end
>
> OK, that seems generic enough and we can probably support it
> long-term, so I'm somewhat inclined to accept it...
>
> However, APIs encouraging/supporting folks to load their entire
> collection(*) of messages (even skeletons) into memory feels
> wrong to me.
>
> Can you come up with a use case where this is useful for
> a subset of messages?
>

Well, in my specific case there weren't many messages, so memory
wasn't an issue.
In general, I think the question of adding the add_child callback is
orthogonal to the question of using the entire collection or only
part of it.
I.e. one could use Msgthr as it is, with millions of emails, and one
could use the callback with only a few messages.

Consider this flow:
1. querying the storage backend according to some criteria (e.g. a
   time range, a particular sender, etc.)
2. grouping the messages in the response into threads

I'd rather show than tell, so here's a more elaborate example:
https://github.com/dimidd/msgthr/commit/3e38a4910e7a3c17c07f47c4f1b9d556a4a951fd.patch

BTW, note how we only needed one pointer per message and one string
*per thread*, by using an array with a single element and saving the
actual message only in the top level (the rootset).

>
> (*) I work with millions of emails
>
>> > P.S. I hope you don't mind I uploaded my fork to github.
>
> That's fine, I just add a new remote(*) to my .git/config, fetch
> and show.
>
> What I won't accept about GitHub is having it as a centralized
> and proprietary messaging system which forces participants to
> accept their ToS. I can't accept that; no single entity
> controls email, so that's what I stick with.
>
>
> (*) added this to my .git/config
> ==> .git/config <==
> [remote "dimidd"]
>         url = https://github.com/dimidd/msgthr
>         fetch = refs/heads/*:refs/remotes/dimidd/*
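
The flow described in the reply above (take a subset of messages, group
them into threads, and keep one shared label per thread) could be
sketched roughly as follows. This is a self-contained illustration, not
msgthr itself and not the code in the linked commits: SketchContainer,
SketchThreader and group_into_threads are hypothetical names, the add
loop only mirrors the shape of the quoted patch, and the sketch assumes
parents are added before their replies.

```ruby
# Hypothetical stand-in for a threading container (NOT msgthr's class).
class SketchContainer
  attr_reader :mid, :children
  attr_accessor :parent

  def initialize(mid)
    @mid = mid
    @children = []
    @parent = nil
  end

  # Attach +child+ under self, detaching it from any previous parent.
  def add_child(child)
    child.parent.children.delete(child) if child.parent
    child.parent = self
    @children << child
  end

  # True if +other+ is self or sits somewhere below self.
  def has_descendent(other)
    while other
      return true if other.equal?(self)
      other = other.parent
    end
    false
  end
end

# Hypothetical threader whose add loop mirrors the shape of the patch:
# it yields (parent, child) each time a new link is made.
class SketchThreader
  def initialize
    @id_table = {}
  end

  def add(mid, refs)
    cur = (@id_table[mid] ||= SketchContainer.new(mid))
    prev = nil
    (refs || []).each do |ref|
      cont = (@id_table[ref] ||= SketchContainer.new(ref))
      # link refs in the order implied, without changing existing links
      if prev && !cont.parent && !cont.has_descendent(prev)
        prev.add_child(cont)
        yield(prev, cont) if block_given?
      end
      prev = cont
    end
    if prev
      prev.add_child(cur)
      yield(prev, cur) if block_given?
    end
    cur
  end
end

# messages: array of [mid, refs] pairs, parents before replies (assumption).
# Returns mid => one-element Array; every message in a thread points at the
# same Array, so the label string is stored once per thread.
def group_into_threads(messages)
  thread_of = {}
  threader = SketchThreader.new
  messages.each do |mid, refs|
    threader.add(mid, refs) do |parent, child|
      slot = (thread_of[parent.mid] ||= [parent.mid])
      thread_of[child.mid] = slot
    end
    thread_of[mid] ||= [mid] # a message with no parent starts its own thread
  end
  thread_of
end

subset = [["a", nil], ["b", ["a"]], ["c", ["a", "b"]], ["x", nil], ["y", ["x"]]]
labels = group_into_threads(subset)
puts labels["c"].first # => "a"
```

Because the callback only runs when a link is made, the same block works
whether the subset holds five messages or a much larger query result. A
message that arrives before its parent would need extra handling that
this sketch leaves out.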