From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: X-Spam-Status: No, score=-4.2 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF shortcircuit=no autolearn=ham autolearn_force=no version=3.4.6 Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id 01E4E1F44D for ; Mon, 25 Mar 2024 23:03:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=80x24.org; s=selector1; t=1711407820; bh=Ad/xPDFQFGtgAoANh6CslS2XEepxeS4ut3mtoH/XWjA=; h=From:To:Subject:Date:From; b=clUj31CIA4YiiedCS/goe+s4+aCF2Xo15/1SCtPuUTxBP/apnXz3L1SbaEggczu41 977EYyddWfsAdU+0BSfsVreVT9tTXG7bxPMCBBBtlcCYCxYUTRVIMV8mK+Sblnxtbx D7/j68ONmkRkxu3/2IPpsaqsdMXk1hFG5WD8JpYk= From: Eric Wong To: spew@80x24.org Subject: [PATCH 0/3] switch to tombstone-free khashl table Date: Mon, 25 Mar 2024 23:03:35 +0000 Message-ID: <20240325230339.262200-1-e@80x24.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit List-Id: The memory improvement is minor, but any memory reduction at all is welcome at this point. Fortunately, this set of changes is unintrusive. I have some other ideas that I'll hopefully get to implement before swapping kills all my SSDs (see bottom). Eric Wong (3): list-objects-filter: use kh_size API treewide: switch to khashl for memory savings khashl: fix ensemble lookups on empty table builtin/fast-import.c | 2 +- builtin/fsmonitor--daemon.c | 4 +- delta-islands.c | 4 +- khash.h | 338 ----------------------- khashl.h | 522 ++++++++++++++++++++++++++++++++++++ list-objects-filter.c | 2 +- object-store-ll.h | 2 +- object-store.h | 7 +- oidset.h | 2 +- pack-bitmap.h | 2 +- 10 files changed, 535 insertions(+), 350 deletions(-) delete mode 100644 khash.h create mode 100644 khashl.h TODO (some ideas stolen from khashl): * obj_hash can probably use a 0.75 load factor (instead of 0.5) to save memory and not slow down too much. oid_* khash has always had 0.77 which was fine and now khashl has 0.75. 0.75 is cheaper to compute via (shifts + ORs) than 0.77. * obj_hash can use `i = (i + 1) & mask' like khashl does to avoid branches in linear probe loops * obj_hash can use realloc and copy in-place resize logic khashl does. khashl uses the ->used bitmap for this, but it should be possible to do for obj_hash w/o additional allocations via pointer tagging (not khashl inspired): * keep an identity pool of OIDs separately (similar to how Perl5 pools all hash keys) and only use tagged pointers for OIDs. Pointer tagging can be used to distinguish between 7 hash functions, leaving us room for 5 more in addition to SHA-(1|256). This change will be a large effort (with a hopefully large savings). If we ever need more than 7 hash functions, we can switch to storing hash type information in extents that can be looked up using address ranges (AFAIK, jemalloc does this).