Git Mailing List Archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Han-Wen Nienhuys <hanwenn@gmail.com>
Subject: [PATCH v3 07/11] reftable/writer: refactorings for `writer_flush_nonempty_block()`
Date: Mon, 8 Apr 2024 14:24:20 +0200	[thread overview]
Message-ID: <d47ad49d49916d02b8e62ee34404c025fa030845.1712578837.git.ps@pks.im> (raw)
In-Reply-To: <cover.1712578837.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 3749 bytes --]

Large parts of the reftable library do not conform to Git's typical code
style. Refactor `writer_flush_nonempty_block()` such that it conforms
better to it and add some documentation that explains some of its more
intricate behaviour.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/writer.c | 72 +++++++++++++++++++++++++++++------------------
 1 file changed, 44 insertions(+), 28 deletions(-)

diff --git a/reftable/writer.c b/reftable/writer.c
index 0ad5eb8887..d347ec4cc6 100644
--- a/reftable/writer.c
+++ b/reftable/writer.c
@@ -659,58 +659,74 @@ static void writer_clear_index(struct reftable_writer *w)
 	w->index_cap = 0;
 }
 
-static const int debug = 0;
-
 static int writer_flush_nonempty_block(struct reftable_writer *w)
 {
+	struct reftable_index_record index_record = {
+		.last_key = STRBUF_INIT,
+	};
 	uint8_t typ = block_writer_type(w->block_writer);
-	struct reftable_block_stats *bstats =
-		writer_reftable_block_stats(w, typ);
-	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
-	int raw_bytes = block_writer_finish(w->block_writer);
-	int padding = 0;
-	int err = 0;
-	struct reftable_index_record ir = { .last_key = STRBUF_INIT };
+	struct reftable_block_stats *bstats;
+	int raw_bytes, padding = 0, err;
+	uint64_t block_typ_off;
+
+	/*
+	 * Finish the current block. This will cause the block writer to emit
+	 * restart points and potentially compress records in case we are
+	 * writing a log block.
+	 *
+	 * Note that this is still happening in memory.
+	 */
+	raw_bytes = block_writer_finish(w->block_writer);
 	if (raw_bytes < 0)
 		return raw_bytes;
 
-	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+	/*
+	 * By default, all records except for log records are padded to the
+	 * block size.
+	 */
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG)
 		padding = w->opts.block_size - raw_bytes;
-	}
 
-	if (block_typ_off > 0) {
+	bstats = writer_reftable_block_stats(w, typ);
+	block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	if (block_typ_off > 0)
 		bstats->offset = block_typ_off;
-	}
-
 	bstats->entries += w->block_writer->entries;
 	bstats->restarts += w->block_writer->restart_len;
 	bstats->blocks++;
 	w->stats.blocks++;
 
-	if (debug) {
-		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
-			w->next, raw_bytes,
-			get_be24(w->block + w->block_writer->header_off + 1));
-	}
-
-	if (w->next == 0) {
+	/*
+	 * If this is the first block we're writing to the table then we need
+	 * to also write the reftable header.
+	 */
+	if (!w->next)
 		writer_write_header(w, w->block);
-	}
 
 	err = padded_write(w, w->block, raw_bytes, padding);
 	if (err < 0)
 		return err;
 
+	/*
+	 * Add an index record for every block that we're writing. If we end up
+	 * having more than a threshold of index records we will end up writing
+	 * an index section in `writer_finish_section()`. Each index record
+	 * contains the last record key of the block it is indexing as well as
+	 * the offset of that block.
+	 *
+	 * Note that this also applies when flushing index blocks, in which
+	 * case we will end up with a multi-level index.
+	 */
 	REFTABLE_ALLOC_GROW(w->index, w->index_len + 1, w->index_cap);
-
-	ir.offset = w->next;
-	strbuf_reset(&ir.last_key);
-	strbuf_addbuf(&ir.last_key, &w->block_writer->last_key);
-	w->index[w->index_len] = ir;
-
+	index_record.offset = w->next;
+	strbuf_reset(&index_record.last_key);
+	strbuf_addbuf(&index_record.last_key, &w->block_writer->last_key);
+	w->index[w->index_len] = index_record;
 	w->index_len++;
+
 	w->next += padding + raw_bytes;
 	w->block_writer = NULL;
+
 	return 0;
 }
 
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2024-04-08 12:24 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-02 17:29 [PATCH 0/9] reftable: optimize write performance Patrick Steinhardt
2024-04-02 17:29 ` [PATCH 1/9] refs/reftable: fix D/F conflict error message on ref copy Patrick Steinhardt
2024-04-03 18:28   ` Junio C Hamano
2024-04-02 17:29 ` [PATCH 2/9] refs/reftable: perform explicit D/F check when writing symrefs Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 3/9] refs/reftable: skip duplicate name checks Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 4/9] refs/reftable: don't recompute committer ident Patrick Steinhardt
2024-04-03 18:58   ` Junio C Hamano
2024-04-04  5:36     ` Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 5/9] reftable/writer: refactorings for `writer_add_record()` Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 6/9] reftable/writer: refactorings for `writer_flush_nonempty_block()` Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 7/9] reftable/block: reuse zstream when writing log blocks Patrick Steinhardt
2024-04-03 19:35   ` Junio C Hamano
2024-04-04  5:36     ` Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 8/9] reftable/block: reuse compressed array Patrick Steinhardt
2024-04-02 17:30 ` [PATCH 9/9] reftable/writer: reset `last_key` instead of releasing it Patrick Steinhardt
2024-04-04  5:48 ` [PATCH v2 00/11] reftable: optimize write performance Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 01/11] refs/reftable: fix D/F conflict error message on ref copy Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 02/11] refs/reftable: perform explicit D/F check when writing symrefs Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 03/11] refs/reftable: skip duplicate name checks Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 04/11] reftable: remove " Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 05/11] refs/reftable: don't recompute committer ident Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 06/11] reftable/writer: refactorings for `writer_add_record()` Patrick Steinhardt
2024-04-04  6:58     ` Han-Wen Nienhuys
2024-04-04  7:32       ` Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 07/11] reftable/writer: refactorings for `writer_flush_nonempty_block()` Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 08/11] reftable/writer: unify releasing memory Patrick Steinhardt
2024-04-04  7:08     ` Han-Wen Nienhuys
2024-04-04  7:32       ` Patrick Steinhardt
2024-04-04  9:00         ` Han-Wen Nienhuys
2024-04-04 11:43           ` Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 09/11] reftable/writer: reset `last_key` instead of releasing it Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 10/11] reftable/block: reuse zstream when writing log blocks Patrick Steinhardt
2024-04-04  5:48   ` [PATCH v2 11/11] reftable/block: reuse compressed array Patrick Steinhardt
2024-04-04  7:09   ` [PATCH v2 00/11] reftable: optimize write performance Han-Wen Nienhuys
2024-04-04  7:32     ` Patrick Steinhardt
2024-04-08 12:23 ` [PATCH v3 " Patrick Steinhardt
2024-04-08 12:23   ` [PATCH v3 01/11] refs/reftable: fix D/F conflict error message on ref copy Patrick Steinhardt
2024-04-08 12:23   ` [PATCH v3 02/11] refs/reftable: perform explicit D/F check when writing symrefs Patrick Steinhardt
2024-04-08 12:24   ` [PATCH v3 03/11] refs/reftable: skip duplicate name checks Patrick Steinhardt
2024-04-08 12:24   ` [PATCH v3 04/11] reftable: remove " Patrick Steinhardt
2024-04-08 12:24   ` [PATCH v3 05/11] refs/reftable: don't recompute committer ident Patrick Steinhardt
2024-04-08 12:24   ` [PATCH v3 06/11] reftable/writer: refactorings for `writer_add_record()` Patrick Steinhardt
2024-04-08 12:24   ` Patrick Steinhardt [this message]
2024-04-08 12:24   ` [PATCH v3 08/11] reftable/writer: unify releasing memory Patrick Steinhardt
2024-04-08 12:24   ` [PATCH v3 09/11] reftable/writer: reset `last_key` instead of releasing it Patrick Steinhardt
2024-04-08 12:24   ` [PATCH v3 10/11] reftable/block: reuse zstream when writing log blocks Patrick Steinhardt
2024-04-08 12:24   ` [PATCH v3 11/11] reftable/block: reuse compressed array Patrick Steinhardt
2024-04-09  0:09   ` [PATCH v3 00/11] reftable: optimize write performance Junio C Hamano
2024-04-09  3:16     ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d47ad49d49916d02b8e62ee34404c025fa030845.1712578837.git.ps@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hanwenn@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).