msgthr user+dev discussion/patches/pulls/bugs/help
From: Eric Wong <e@80x24.org>
To: Dimid Duchovny <dimidd@gmail.com>
Cc: msgthr-public@80x24.org
Subject: Re: Feature Request: thread grouping
Date: Sun, 21 Jan 2018 23:49:11 +0000
Message-ID: <20180121234911.GA29238@whir>
In-Reply-To: <CANKvuDf7esPfy3eQ0B8aQjg4sTYTcxR_LNNWeDBcENFwmyC_3g@mail.gmail.com>

Dimid Duchovny <dimidd@gmail.com> wrote:
> However, I realized that the last step (walking) is redundant,
> since that could be done by the library itself in the threading or
> ordering stages.

I think what you want is best done in the storage/indexing stage,
whereas msgthr is intended for displaying/rendering results that
were retrieved from some sort of search engine.

At least that's how notmuch does it, and I stole the logic for
public-inbox(*) as they both use Xapian.  I think mairix does
something similar, too; but it's been a while...

> E.g. keeping track of each container's thread,
> and when adding a message A as a child of message B, to point A's
> thread to B's one.
> We could use an array with a single element,
> or some other solution to have pass-by-reference semantics.
> Finally, all top-level containers should have their own msg_id as the thread,
> and all their descendants will point to it as well.
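
A minimal sketch of the idea quoted above, in Ruby. The Container
class and its method names are hypothetical and are not msgthr's
actual API. A one-element array stands in for the shared reference;
note that in this sketch, re-pointing an already-built subtree still
needs a walk over that subtree.

```ruby
# Hypothetical container, for illustration only (not msgthr's class).
class Container
  attr_reader :mid, :children, :thread_ref

  def initialize(mid)
    @mid = mid
    @children = []
    # A one-element array acts as a shared, mutable reference.
    # A top-level container starts out as its own thread.
    @thread_ref = [mid]
  end

  # The Message-ID of the top-level container of this thread.
  def thread_id
    @thread_ref[0]
  end

  # Attach +child+ and point its whole subtree at our thread.
  def add_child(child)
    @children << child
    child.adopt(@thread_ref)
  end

  protected

  # Share +ref+ with this container and all of its descendants.
  def adopt(ref)
    @thread_ref = ref
    @children.each { |c| c.adopt(ref) }
  end
end

b = Container.new('b@example')
a = Container.new('a@example')
c = Container.new('c@example')
a.add_child(c) # c joins a's thread first
b.add_child(a) # then a's whole subtree joins b's thread
puts [b, a, c].map(&:thread_id).inspect
```

Since every container in a thread shares one array object, checking
whether two messages belong to the same thread is an identity
comparison of their thread_ref.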

One advantage to doing this in the storage phase is this info is
persistent and you don't need to calculate it every time.  This
is great when you're dealing with more message skeletons than
can fit in memory.  git@vger has over 300k messages, LKML will
have several million messages, and they both use String
Message-IDs (being email), so it'll be many hundreds of MB just
in containers and Message-IDs.

Another huge advantage of doing this in the indexing phase is
that you can search for something in a single message and then
easily pull every message from the thread it belongs to with a
boolean thread_id search.  I also find the "-t" switch of mairix
useful for my private mail.
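
To illustrate the workflow described above, here is a rough Ruby
sketch with a toy in-memory index standing in for a persistent one
such as Xapian. ToyIndex and all of its methods are hypothetical;
this is not how public-inbox or notmuch implement it. It assumes
messages are indexed parent-first and does not merge threads when a
missing parent shows up later.

```ruby
# Toy in-memory index, for illustration only. A real indexer would
# persist the thread_id alongside each document.
class ToyIndex
  Doc = Struct.new(:mid, :thread_id, :body)

  def initialize
    @by_mid = {}                               # Message-ID => Doc
    @by_thread = Hash.new { |h, k| h[k] = [] } # thread_id => [Doc, ...]
    @next_thread_id = 0
  end

  # Index one message. +refs+ are the Message-IDs taken from its
  # References/In-Reply-To headers.
  def add(mid, refs, body)
    known = refs.map { |r| @by_mid[r] }.compact.first
    # Reuse the thread_id of an already-indexed ancestor, or
    # allocate a new one for a new thread.
    tid = known ? known.thread_id : (@next_thread_id += 1)
    doc = Doc.new(mid, tid, body)
    @by_mid[mid] = doc
    @by_thread[tid] << doc
    doc
  end

  # Naive substring search over message bodies.
  def search(term)
    @by_mid.values.select { |d| d.body.include?(term) }
  end

  # Every message sharing the thread_id of +doc+.
  def thread_of(doc)
    @by_thread[doc.thread_id]
  end
end

idx = ToyIndex.new
idx.add('1@example', [], 'Feature Request: thread grouping')
idx.add('2@example', ['1@example'], 'best done when indexing')
idx.add('3@example', [], 'an unrelated patch')
hit = idx.search('indexing').first
puts idx.thread_of(hit).map(&:mid).inspect
```

Because the thread_id is computed once at indexing time, fetching a
whole thread is a single lookup and no containers need to be rebuilt.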

I can help you understand how public-inbox does this in
SearchIdx.pm (indexer) and Search.pm (read-only queries) if
you're not familiar with Perl5, but for now you can grab the
code and try understanding it on your own:

	git clone https://public-inbox.org/public-inbox

http://repo.or.cz/public-inbox.git/blob/4f2f0eb94739edf:/lib/PublicInbox/SearchIdx.pm
http://repo.or.cz/public-inbox.git/blob/4f2f0eb94739edf:/lib/PublicInbox/Search.pm

I'll be happy to answer questions on meta@public-inbox.org
about it :)

> Would you consider adding such a feature? If so, I'll be happy to work
> out the details and submit a patch.

I'm not sure if it makes sense to add this without a stable
storage backend (Xapian or some other search indexer/DB).

Another potential problem with adding this to msgthr is that
msgthr is GPL-2+ (since it's a port of Mail::Thread from CPAN),
but the notmuch algorithm is GPL-3+, so I'm not allowed to put it
into a GPL-2+ project (AGPL-3+ is OK).

Maybe you can cite prior art from mairix (GPL-2+), but I haven't
looked at that code in many years and don't remember it.


Thread overview: 12+ messages
2018-01-21  9:40 Dimid Duchovny
2018-01-21 23:49 ` Eric Wong [this message]
2018-01-23 21:04   ` Dimid Duchovny
2018-01-23 21:12     ` Dimid Duchovny
2018-01-23 22:03       ` Eric Wong
2018-01-24 10:28         ` Dimid Duchovny
2018-01-24 19:18           ` Eric Wong
2018-01-24 21:14             ` Dimid Duchovny
2018-01-24 22:49               ` Eric Wong
2018-01-25  8:16                 ` Dimid Duchovny
2018-01-25  8:38                   ` Eric Wong
2018-02-08 13:06                     ` Dimid Duchovny
