Git Mailing List Archive mirror
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Subject: [PATCH 0/9] reftable: optimize write performance
Date: Tue, 2 Apr 2024 19:29:47 +0200
Message-ID: <cover.1712078736.git.ps@pks.im>

Hi,

this is my first patch series taking an actual look at write performance
for the reftable backend. This series addresses two major pain points:

  - Duplicate directory/file conflict checks when writing refs.

  - Allocation churn when compressing log blocks.
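
To make the first point concrete: a directory/file (D/F) conflict
arises when one ref name would have to be both a file and a directory,
as with "refs/heads/foo" and "refs/heads/foo/bar". The following is
only a minimal sketch of such a check, not the code from this series;
`df_conflict()` is a hypothetical helper invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not Git's implementation: report whether two ref
 * names form a D/F conflict, i.e. the shorter name is a prefix of the
 * longer one and the longer one continues with a '/'.
 */
static bool df_conflict(const char *a, const char *b)
{
	size_t alen, blen;

	assert(a && b);
	alen = strlen(a);
	blen = strlen(b);

	/* Names of equal length are either identical or unrelated. */
	if (alen == blen)
		return false;

	/* Make `a` the shorter name so that only one direction is checked. */
	if (alen > blen) {
		const char *tmp = a;
		size_t tmplen = alen;
		a = b;
		alen = blen;
		b = tmp;
		blen = tmplen;
	}

	return !strncmp(a, b, alen) && b[alen] == '/';
}
```

Running a check like this more than once for the same update is pure
overhead, which is the kind of duplicate work the first point refers to.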

Overall, though, I found that there is not much point in investigating
write performance in the reftable library itself, at least not right
now. This is mostly because write performance is heavily dominated by
random ref reads. And while past patch series have optimized scanning
through refs linearly, seeking random refs isn't well-optimized yet. So
once all in-flight series relating to reftable performance have landed I
will focus on random ref reads next.

For the bigger picture, the following benchmarks show performance
compared to the "files" backend after applying this patch series.

Writing many refs in a single transaction:

  Benchmark 1: update-ref: create many refs (refformat = files, refcount = 100000)
    Time (mean ± σ):     10.085 s ±  0.057 s    [User: 1.876 s, System: 8.161 s]
    Range (min … max):   10.013 s … 10.202 s    10 runs

  Benchmark 2: update-ref: create many refs (refformat = reftable, refcount = 100000)
    Time (mean ± σ):      2.768 s ±  0.018 s    [User: 1.381 s, System: 1.383 s]
    Range (min … max):    2.745 s …  2.804 s    10 runs

  Summary
    update-ref: create many refs (refformat = reftable, refcount = 100000) ran
      3.64 ± 0.03 times faster than update-ref: create many refs (refformat = files, refcount = 100000)

And for writing many refs sequentially in separate transactions:

  Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 10000)
    Time (mean ± σ):     40.286 s ±  0.086 s    [User: 22.241 s, System: 17.912 s]
    Range (min … max):   40.166 s … 40.410 s    10 runs

  Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 10000)
    Time (mean ± σ):     44.046 s ±  0.137 s    [User: 23.790 s, System: 20.146 s]
    Range (min … max):   43.813 s … 44.301 s    10 runs

  Summary
    update-ref: create refs sequentially (refformat = files, refcount = 10000) ran
      1.09 ± 0.00 times faster than update-ref: create refs sequentially (refformat = reftable, refcount = 10000)

This is, to the best of my knowledge, the last area where the "files"
backend outperforms the "reftable" backend. The gap is partly due to the
fact that writes perform auto-compaction with the "reftable" backend.

Patrick

Patrick Steinhardt (9):
  refs/reftable: fix D/F conflict error message on ref copy
  refs/reftable: perform explicit D/F check when writing symrefs
  refs/reftable: skip duplicate name checks
  refs/reftable: don't recompute committer ident
  reftable/writer: refactorings for `writer_add_record()`
  reftable/writer: refactorings for `writer_flush_nonempty_block()`
  reftable/block: reuse zstream when writing log blocks
  reftable/block: reuse compressed array
  reftable/writer: reset `last_key` instead of releasing it
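
The last three patches above share one idea: keep a buffer or stream
alive across blocks and reset it, rather than releasing and
reallocating it for every block. The following is a minimal sketch of
that pattern only. `struct scratch` and its functions are hypothetical
names for illustration and do not appear in the reftable library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scratch buffer; not a type from the reftable library. */
struct scratch {
	unsigned char *data;
	size_t len; /* bytes currently in use */
	size_t cap; /* bytes allocated */
};

/* Forget the contents but keep the allocation for the next block. */
static void scratch_reset(struct scratch *s)
{
	assert(s);
	s->len = 0;
}

/* Give the memory back; only needed once the writer is done. */
static void scratch_release(struct scratch *s)
{
	assert(s);
	free(s->data);
	s->data = NULL;
	s->len = s->cap = 0;
}

/*
 * Append `n` bytes, growing only when the existing capacity is too
 * small. Returns 0 on success and -1 if the allocation fails.
 */
static int scratch_append(struct scratch *s, const void *src, size_t n)
{
	assert(s);
	if (!n)
		return 0;
	if (n > s->cap - s->len) {
		size_t need = s->len + n;
		size_t cap = s->cap ? s->cap : 64;
		unsigned char *p;

		while (cap < need)
			cap *= 2;
		p = realloc(s->data, cap);
		if (!p)
			return -1;
		s->data = p;
		s->cap = cap;
	}
	memcpy(s->data + s->len, src, n);
	s->len += n;
	return 0;
}
```

With this shape, a writer that calls `scratch_reset()` between blocks
allocates only when a block is larger than any seen before, whereas
calling `scratch_release()` after every block forces a fresh allocation
each time.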

 refs/reftable-backend.c    |  80 ++++++++++++++++++-------
 reftable/block.c           |  83 ++++++++++++++++----------
 reftable/block.h           |   4 ++
 reftable/writer.c          | 119 ++++++++++++++++++++++++-------------
 t/t0610-reftable-basics.sh |  35 ++++++++++-
 5 files changed, 227 insertions(+), 94 deletions(-)

-- 
2.44.GIT


Thread overview: 49+ messages
2024-04-02 17:29 Patrick Steinhardt [this message]
2024-04-02 17:29 ` [PATCH 1/9] refs/reftable: fix D/F conflict error message on ref copy Patrick Steinhardt
2024-04-03 18:28   ` Junio C Hamano
2024-04-02 17:29 ` [PATCH 2/9] refs/reftable: perform explicit D/F check when writing symrefs Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 3/9] refs/reftable: skip duplicate name checks Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 4/9] refs/reftable: don't recompute committer ident Patrick Steinhardt
2024-04-03 18:58   ` Junio C Hamano
2024-04-04  5:36     ` Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 5/9] reftable/writer: refactorings for `writer_add_record()` Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 6/9] reftable/writer: refactorings for `writer_flush_nonempty_block()` Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 7/9] reftable/block: reuse zstream when writing log blocks Patrick Steinhardt
2024-04-03 19:35   ` Junio C Hamano
2024-04-04  5:36     ` Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 8/9] reftable/block: reuse compressed array Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 9/9] reftable/writer: reset `last_key` instead of releasing it Patrick Steinhardt
2024-04-04  5:48 ` [PATCH v2 00/11] reftable: optimize write performance Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 01/11] refs/reftable: fix D/F conflict error message on ref copy Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 02/11] refs/reftable: perform explicit D/F check when writing symrefs Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 03/11] refs/reftable: skip duplicate name checks Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 04/11] reftable: remove " Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 05/11] refs/reftable: don't recompute committer ident Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 06/11] reftable/writer: refactorings for `writer_add_record()` Patrick Steinhardt
2024-04-04  6:58     ` Han-Wen Nienhuys
2024-04-04  7:32       ` Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 07/11] reftable/writer: refactorings for `writer_flush_nonempty_block()` Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 08/11] reftable/writer: unify releasing memory Patrick Steinhardt
2024-04-04  7:08     ` Han-Wen Nienhuys
2024-04-04  7:32       ` Patrick Steinhardt
2024-04-04  9:00         ` Han-Wen Nienhuys
2024-04-04 11:43           ` Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 09/11] reftable/writer: reset `last_key` instead of releasing it Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 10/11] reftable/block: reuse zstream when writing log blocks Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 11/11] reftable/block: reuse compressed array Patrick Steinhardt
2024-04-04  7:09   ` [PATCH v2 00/11] reftable: optimize write performance Han-Wen Nienhuys
2024-04-04  7:32     ` Patrick Steinhardt
2024-04-08 12:23 ` [PATCH v3 " Patrick Steinhardt
2024-04-08 12:23   ` [PATCH v3 01/11] refs/reftable: fix D/F conflict error message on ref copy Patrick Steinhardt
2024-04-08 12:23   ` [PATCH v3 02/11] refs/reftable: perform explicit D/F check when writing symrefs Patrick Steinhardt
2024-04-08 12:24   ` [PATCH v3 03/11] refs/reftable: skip duplicate name checks Patrick Steinhardt
2024-04-08 12:24   ` [PATCH v3 04/11] reftable: remove " Patrick Steinhardt
2024-04-08 12:24   ` [PATCH v3 05/11] refs/reftable: don't recompute committer ident Patrick Steinhardt
2024-04-08 12:24   ` [PATCH v3 06/11] reftable/writer: refactorings for `writer_add_record()` Patrick Steinhardt
2024-04-08 12:24   ` [PATCH v3 07/11] reftable/writer: refactorings for `writer_flush_nonempty_block()` Patrick Steinhardt
2024-04-08 12:24   ` [PATCH v3 08/11] reftable/writer: unify releasing memory Patrick Steinhardt
2024-04-08 12:24   ` [PATCH v3 09/11] reftable/writer: reset `last_key` instead of releasing it Patrick Steinhardt
2024-04-08 12:24   ` [PATCH v3 10/11] reftable/block: reuse zstream when writing log blocks Patrick Steinhardt
2024-04-08 12:24   ` [PATCH v3 11/11] reftable/block: reuse compressed array Patrick Steinhardt
2024-04-09  0:09   ` [PATCH v3 00/11] reftable: optimize write performance Junio C Hamano
2024-04-09  3:16     ` Patrick Steinhardt
