author | Eric Wong <e@80x24.org> | 2023-08-24 01:22:34 +0000 |
---|---|---|
committer | Eric Wong <e@80x24.org> | 2023-08-24 07:47:52 +0000 |
commit | 1c8430a7fa407e476ef70a6a199983faf071d7a5 (patch) | |
tree | 610e37eb535d08b932f16ed02b82d03a01461bed /MANIFEST | |
parent | b18ecb7707e83cb8cb38c3736aecd984999ca0a7 (diff) | |
download | public-inbox-1c8430a7fa407e476ef70a6a199983faf071d7a5.tar.gz |
We can't rely on combining the `-u' and `-k1,1' switches of POSIX sort(1) to do what we want. So only rely on `sort -k1,1' while introducing a small Perl helper to fold identical prefixes into one line.

In other words, input such as:

    deadbeef 0
    deadbeef 1
    deadbeef 2

Was getting deduplicated into a single line:

    deadbeef 0

... with `sort -u -k1,1'.

This change puts the output into a more optimal form for eventual (not-fully-implemented-yet) parsing:

    deadbeef 0,1,2

ORS is currently the comma (`,') for inbox IDs, but it'll be a space (` ') for coderepo root IDs.

This implementation also combines identical IDs in the 2nd column. Thus:

    deadbeef 0
    deadbeef 0

Becomes a single `deadbeef 0' line thanks to the use of XS List::Util::uniq (which beats a pure Perl hash).

I attempted to implement this in awk, but Perl is close enough to gawk in performance while being shorter and easier to understand due to List::Util::uniq. mawk was faster, but still not by enough to matter, as the bottleneck is iterating through Xapian MSets.
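The folding step described above can be illustrated with a minimal sketch. It is written in Python rather than Perl, so it is an analogue of the idea and not the helper from this commit: the name `fold_prefixes`, its `ors` parameter, and the `f00dcafe` sample prefix are hypothetical, and `dict.fromkeys` stands in for List::Util::uniq. It assumes the input is already sorted on the first column, as `sort -k1,1' would leave it.

```python
from itertools import groupby


def fold_prefixes(lines, ors=","):
    """Fold sorted "<prefix> <id>" lines into one line per prefix.

    Hypothetical helper for illustration; not the Perl code in this commit.
    """
    folded = []
    # Each line is "<prefix> <id>"; split once on the first space.
    pairs = (line.split(" ", 1) for line in lines)
    # groupby only merges adjacent keys, so the input must be sorted by prefix.
    for prefix, group in groupby(pairs, key=lambda p: p[0]):
        # dict.fromkeys drops repeated IDs while keeping first-seen order.
        ids = dict.fromkeys(ident for _, ident in group)
        folded.append(prefix + " " + ors.join(ids))
    return folded


# Sample input, already sorted on column 1 ("f00dcafe" is made up).
sorted_input = ["deadbeef 0", "deadbeef 0", "deadbeef 1", "deadbeef 2", "f00dcafe 7"]
print("\n".join(fold_prefixes(sorted_input)))
```

Passing `ors=" "` would give the space-separated form mentioned for coderepo root IDs.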
Diffstat (limited to 'MANIFEST')
0 files changed, 0 insertions, 0 deletions