Hi, this is the second version of my patch series that aims to optimize write performance in the reftable backend. I've made the following changes compared to v2: - Added a new patch to drop "reftable/refname.{c,h}" and related completely. - Reworded the commit message for the committer ident refactorings. - Added a new patch that unifies the code paths that release reftable writer resources. - Fixed a stale comment claiming that we retry deflating, which we don't. Thanks! Patrick Patrick Steinhardt (11): refs/reftable: fix D/F conflict error message on ref copy refs/reftable: perform explicit D/F check when writing symrefs refs/reftable: skip duplicate name checks reftable: remove name checks refs/reftable: don't recompute committer ident reftable/writer: refactorings for `writer_add_record()` reftable/writer: refactorings for `writer_flush_nonempty_block()` reftable/writer: unify releasing memory reftable/writer: reset `last_key` instead of releasing it reftable/block: reuse zstream when writing log blocks reftable/block: reuse compressed array Makefile | 2 - refs/reftable-backend.c | 75 +++++++++---- reftable/block.c | 80 ++++++++------ reftable/block.h | 4 + reftable/error.c | 2 - reftable/refname.c | 209 ------------------------------------- reftable/refname.h | 29 ----- reftable/refname_test.c | 101 ------------------ reftable/reftable-error.h | 3 - reftable/reftable-tests.h | 1 - reftable/reftable-writer.h | 4 - reftable/stack.c | 67 +----------- reftable/stack_test.c | 39 ------- reftable/writer.c | 137 +++++++++++++++--------- t/helper/test-reftable.c | 1 - t/t0610-reftable-basics.sh | 35 ++++++- 16 files changed, 230 insertions(+), 559 deletions(-) delete mode 100644 reftable/refname.c delete mode 100644 reftable/refname.h delete mode 100644 reftable/refname_test.c Range-diff against v1: 1: 14b4dacd73 = 1: 926e802395 refs/reftable: fix D/F conflict error message on ref copy 2: 55db366e61 = 2: 6190171906 refs/reftable: perform explicit D/F check when writing symrefs 3: ad8210ec65 = 3: 80008cc5e7 refs/reftable: skip duplicate name checks -: ---------- > 4: 3497a570b4 reftable: remove name checks 4: a9a6795c02 ! 5: f892a3007b refs/reftable: don't recompute committer ident @@ Commit message refs/reftable: don't recompute committer ident In order to write reflog entries we need to compute the committer's - identity as it becomes encoded in the log record itself. In the reftable - backend, computing the identity is repeated for every single reflog - entry which we are about to write in a transaction. Needless to say, - this can be quite a waste of effort when writing many refs with reflog - entries in a single transaction. + identity as it gets encoded in the log record itself. The reftable + backend does this via `git_committer_info()` and `split_ident_line()` in + `fill_reftable_log_record()`, which use the Git config as well as + environment variables to figure out the identity. + + While most callers would only call `fill_reftable_log_record()` once or + twice, `write_transaction_table()` will call it as many times as there + are queued ref updates. This can be quite a waste of effort when writing + many refs with reflog entries in a single transaction. Refactor the code to pre-compute the committer information. This results in a small speedup when writing 100000 refs in a single transaction: 5: 8e9d69e9e6 = 6: 4877ab3921 reftable/writer: refactorings for `writer_add_record()` 6: 1f903afdda = 7: 8f1c5b4169 reftable/writer: refactorings for `writer_flush_nonempty_block()` -: ---------- > 8: 41db7414e1 reftable/writer: unify releasing memory 9: 6950ae4ea7 ! 9: e5c7dbe417 reftable/writer: reset `last_key` instead of releasing it @@ reftable/writer.c: static void writer_reinit_block_writer(struct reftable_writer block_writer_init(&w->block_writer_data, typ, w->block, w->opts.block_size, block_start, hash_size(w->opts.hash_id)); -@@ reftable/writer.c: void reftable_writer_free(struct reftable_writer *w) - block_writer_release(w->block_writer); - w->block_writer = NULL; - } -+ strbuf_release(&w->last_key); - reftable_free(w->block); - reftable_free(w); - } @@ reftable/writer.c: static int writer_finish_section(struct reftable_writer *w) bstats->max_index_level = max_level; 7: 86dab54dfe ! 10: 26f422703f reftable/block: reuse zstream when writing log blocks @@ reftable/block.c: int block_writer_finish(struct block_writer *w) + w->zstream->avail_in = src_len; + + /* -+ * We want to perform all decompression in a single -+ * step, which is why we can pass Z_FINISH here. Note -+ * that both `Z_OK` and `Z_BUF_ERROR` indicate that we -+ * need to retry according to documentation. -+ * -+ * If the call fails we retry with a bigger output -+ * buffer. ++ * We want to perform all decompression in a single step, which ++ * is why we can pass Z_FINISH here. As we have precomputed the ++ * deflated buffer's size via `deflateBound()` this function is ++ * guaranteed to succeed according to the zlib documentation. + */ + ret = deflate(w->zstream, Z_FINISH); + if (ret != Z_STREAM_END) { @@ reftable/block.h: license that can be found in the LICENSE file or at uint8_t *buf; uint32_t block_size; - - ## reftable/writer.c ## -@@ reftable/writer.c: void reftable_writer_free(struct reftable_writer *w) - { - if (!w) - return; -+ if (w->block_writer) { -+ block_writer_release(w->block_writer); -+ w->block_writer = NULL; -+ } - reftable_free(w->block); - reftable_free(w); - } 8: 9899b58dcf ! 11: 4f9df714da reftable/block: reuse compressed array @@ reftable/block.c: int block_writer_finish(struct block_writer *w) w->zstream->next_in = w->buf + block_header_skip; w->zstream->avail_in = src_len; @@ reftable/block.c: int block_writer_finish(struct block_writer *w) - * buffer. + * guaranteed to succeed according to the zlib documentation. */ ret = deflate(w->zstream, Z_FINISH); - if (ret != Z_STREAM_END) { -- 2.44.GIT