Git Mailing List Archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] refs: introduce reftable backend
@ 2024-01-30  8:05 Patrick Steinhardt
  2024-01-30  8:05 ` [PATCH 1/2] " Patrick Steinhardt
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2024-01-30  8:05 UTC (permalink / raw
  To: git; +Cc: Han-Wen Nienhuys

[-- Attachment #1: Type: text/plain, Size: 2649 bytes --]

Hi,

all prequisites for the reftable backend have now been merged or are
about to be merged. This patch series thus introduces the reftable
backend itself.

The patch series is built on top of c5b454771e (The eleventh batch,
2024-01-29). In addition it depends on ps/tests-with-ref-files-backend
at bbd6106967 (t: mark tests regarding git-pack-refs(1) to be backend
specific, 2024-01-29), which is about to be merged to "next".

The patch series is impacted by three in-flight patch series:

  - ps/reftable-compacted-tables-permission-fix, which fixes permissions
    on "tables.list" when compacting so that it correctly honors the
    "core.sharedRepository" setting.

  - jc/reftable-core-fsync, which starts to fsync reftable data to disk.

  - A follow-up to that patch series [1], which starts to fsync
    "tables.list" to disk during compaction.

The patch series does _not_ depend on those series. Instead, I have
added a couple of tests marked with `test_expect_failure` to t0610.
These tests will start to pass once those topics land. As I don't expect
this patch series to land on its first iteration I very much assume that
the in-flight patch series will land before the reftable backend does.

Patrick

[1]: <7bdafc9bd7f53f38a24d69a563615b6ad484e1ba.1706592127.git.ps@pks.im>

Patrick Steinhardt (2):
  refs: introduce reftable backend
  ci: add jobs to test with the reftable backend

 .github/workflows/main.yml                    |    9 +
 .gitlab-ci.yml                                |    9 +
 Documentation/ref-storage-format.txt          |    2 +
 .../technical/repository-version.txt          |    5 +-
 Makefile                                      |    1 +
 ci/lib.sh                                     |    2 +-
 ci/run-build-and-tests.sh                     |    3 +
 contrib/workdir/git-new-workdir               |    2 +-
 path.c                                        |    2 +-
 path.h                                        |    1 +
 refs.c                                        |    1 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 2286 +++++++++++++++++
 repository.h                                  |    5 +-
 t/t0610-reftable-basics.sh                    |  887 +++++++
 t/t0611-reftable-httpd.sh                     |   26 +
 t/test-lib.sh                                 |    2 +
 17 files changed, 3237 insertions(+), 7 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0610-reftable-basics.sh
 create mode 100755 t/t0611-reftable-httpd.sh

-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/2] refs: introduce reftable backend
  2024-01-30  8:05 [PATCH 0/2] refs: introduce reftable backend Patrick Steinhardt
@ 2024-01-30  8:05 ` Patrick Steinhardt
  2024-02-01 15:17   ` Karthik Nayak
  2024-01-30  8:05 ` [PATCH 2/2] ci: add jobs to test with the " Patrick Steinhardt
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Patrick Steinhardt @ 2024-01-30  8:05 UTC (permalink / raw
  To: git; +Cc: Han-Wen Nienhuys

[-- Attachment #1: Type: text/plain, Size: 117635 bytes --]

Due to scalability issues, Shawn Pearce has originally proposed a new
"reftable" format more than six years ago [1]. Initially, this new
format was implemented in JGit with promising results. Around two years
ago, we have then added the "reftable" library to the Git codebase via
a4bbd13be3 (Merge branch 'hn/reftable', 2021-12-15). With this we have
landed all the low-level code to read and write reftables. Notably
missing though was the integration of this low-level code into the Git
code base in the form of a new ref backend that ties all of this
together.

This gap is now finally closed by introducing a new "reftable" backend
into the Git codebase. This new backend promises to bring some notable
improvements to Git repositories:

  - It becomes possible to do truly atomic writes where either all refs
    are committed to disk or none are. This was not possible with the
    "files" backend because ref updates were split across multiple loose
    files.

  - The disk space required to store many refs is reduced, both compared
    to loose refs and packed-refs. This is enabled both by the reftable
    format being a binary format, which is more compact, and by prefix
    compression.

  - We can ignore filesystem-specific behaviour as ref names are not
    encoded via paths anymore. This means there is no need to handle
    case sensitivity on Windows systems or Unicode precomposition on
    macOS.

  - There is no need to rewrite the complete refdb anymore every time a
    ref is being deleted like it was the case for packed-refs. This
    means that ref deletions are now constant time instead of scaling
    linearly with the number of refs.

  - We can ignore file/directory conflicts so that it becomes possible
    to store both "refs/heads/foo" and "refs/heads/foo/bar".

  - Due to this property we can retain reflogs for deleted refs. We have
    previously been deleting reflogs together with their refs to avoid
    file/directory conflicts, which is not necessary anymore.

Not all of these improvements are realized with the current "reftable"
backend implementation. At this point, the new backend is supposed to be
a drop-in replacement for the "files" backend that is used by basically
all Git repositories nowadays. It strives for 1:1 compatibility, which
means that a user can expect the same behaviour regardless of whether
they use the "reftable" backend or the "files" backend for most of the
part.

Most notably, this means we artificially limit the capabilities of the
"reftable" backend to match the limits of the "files" backend. It is not
possible to create refs that would end up with file/directory conflicts,
we do not retain reflogs, we perform stricter-than-necessary checks.
This is done intentionally due to two main reasons:

  - It makes it significantly easier to land the "reftable" backend as
    tests behave the same. It would be tough to argue for each and every
    single test that doesn't pass with the "reftable" backend.

  - It ensures compatibility between repositories that use the "files"
    backend and repositories that use the "reftable" backend. Like this,
    hosters can migrate their repositories to use the "reftable" backend
    without causing issues for clients that use the "files" backend in
    their clones.

It is expected that these artificial limitations may eventually go away
in the long term.

Performance-wise things very much depend on the actual workload. The
following benchmarks compare the "files" and "reftable" backends in the
current version:

  - Creating N refs in separate transactions shows that the "files"
    backend is ~50% faster. This is not surprising given that creating a
    ref only requires us to create a single loose ref. The "reftable"
    backend will also perform auto compaction on updates. In real-world
    workloads we would likely also want to perform pack loose refs,
    which would likely change the picture.

        Benchmark 1: update-ref: create refs sequentially (refformat = files)
          Time (mean ± σ):       2.1 ms ±   0.3 ms    [User: 0.6 ms, System: 1.7 ms]
          Range (min … max):     1.8 ms …   4.3 ms    133 runs

        Benchmark 2: update-ref: create refs sequentially (refformat = reftable)
          Time (mean ± σ):       2.7 ms ±   0.1 ms    [User: 0.6 ms, System: 2.2 ms]
          Range (min … max):     2.4 ms …   2.9 ms    132 runs

        Benchmark 3: update-ref: create refs sequentially (refformat = files)
          Time (mean ± σ):      1.975 s ±  0.006 s    [User: 0.437 s, System: 1.535 s]
          Range (min … max):    1.969 s …  1.980 s    3 runs

        Benchmark 4: update-ref: create refs sequentially (refformat = reftable)
          Time (mean ± σ):      2.611 s ±  0.013 s    [User: 0.782 s, System: 1.825 s]
          Range (min … max):    2.597 s …  2.622 s    3 runs

        Benchmark 5: update-ref: create refs sequentially (refformat = files)
          Time (mean ± σ):     198.442 s ±  0.241 s    [User: 43.051 s, System: 155.250 s]
          Range (min … max):   198.189 s … 198.670 s    3 runs

        Benchmark 6: update-ref: create refs sequentially (refformat = reftable)
          Time (mean ± σ):     294.509 s ±  4.269 s    [User: 104.046 s, System: 190.326 s]
          Range (min … max):   290.223 s … 298.761 s    3 runs

  - Creating N refs in a single transaction shows that the "files"
    backend is significantly slower once we start to write many refs.
    The "reftable" backend only needs to update two files, whereas the
    "files" backend needs to write one file per ref.

        Benchmark 1: update-ref: create many refs (refformat = files, refcount = 1)
          Time (mean ± σ):       1.9 ms ±   0.1 ms    [User: 0.4 ms, System: 1.4 ms]
          Range (min … max):     1.8 ms …   2.6 ms    151 runs

        Benchmark 2: update-ref: create many refs (refformat = reftable, refcount = 1)
          Time (mean ± σ):       2.5 ms ±   0.1 ms    [User: 0.7 ms, System: 1.7 ms]
          Range (min … max):     2.4 ms …   3.4 ms    148 runs

        Benchmark 3: update-ref: create many refs (refformat = files, refcount = 1000)
          Time (mean ± σ):     152.5 ms ±   5.2 ms    [User: 19.1 ms, System: 133.1 ms]
          Range (min … max):   148.5 ms … 167.8 ms    15 runs

        Benchmark 4: update-ref: create many refs (refformat = reftable, refcount = 1000)
          Time (mean ± σ):      58.0 ms ±   2.5 ms    [User: 28.4 ms, System: 29.4 ms]
          Range (min … max):    56.3 ms …  72.9 ms    40 runs

        Benchmark 5: update-ref: create many refs (refformat = files, refcount = 1000000)
          Time (mean ± σ):     152.752 s ±  0.710 s    [User: 20.315 s, System: 131.310 s]
          Range (min … max):   152.165 s … 153.542 s    3 runs

        Benchmark 6: update-ref: create many refs (refformat = reftable, refcount = 1000000)
          Time (mean ± σ):     51.912 s ±  0.127 s    [User: 26.483 s, System: 25.424 s]
          Range (min … max):   51.769 s … 52.012 s    3 runs

  - Deleting a ref in a fully-packed repository shows that the "files"
    backend scales with the number of refs. The "reftable" backend has
    constant-time deletions.

        Benchmark 1: update-ref: delete ref (refformat = files, refcount = 1)
          Time (mean ± σ):       1.7 ms ±   0.1 ms    [User: 0.4 ms, System: 1.2 ms]
          Range (min … max):     1.6 ms …   2.1 ms    316 runs

        Benchmark 2: update-ref: delete ref (refformat = reftable, refcount = 1)
          Time (mean ± σ):       1.8 ms ±   0.1 ms    [User: 0.4 ms, System: 1.3 ms]
          Range (min … max):     1.7 ms …   2.1 ms    294 runs

        Benchmark 3: update-ref: delete ref (refformat = files, refcount = 1000)
          Time (mean ± σ):       2.0 ms ±   0.1 ms    [User: 0.5 ms, System: 1.4 ms]
          Range (min … max):     1.9 ms …   2.5 ms    287 runs

        Benchmark 4: update-ref: delete ref (refformat = reftable, refcount = 1000)
          Time (mean ± σ):       1.9 ms ±   0.1 ms    [User: 0.5 ms, System: 1.3 ms]
          Range (min … max):     1.8 ms …   2.1 ms    217 runs

        Benchmark 5: update-ref: delete ref (refformat = files, refcount = 1000000)
          Time (mean ± σ):     229.8 ms ±   7.9 ms    [User: 182.6 ms, System: 46.8 ms]
          Range (min … max):   224.6 ms … 245.2 ms    6 runs

        Benchmark 6: update-ref: delete ref (refformat = reftable, refcount = 1000000)
          Time (mean ± σ):       2.0 ms ±   0.0 ms    [User: 0.6 ms, System: 1.3 ms]
          Range (min … max):     2.0 ms …   2.1 ms    3 runs

  - Listing all refs shows no significant advantage for either of the
    backends. The "files" backend is a bit faster, but not by a
    significant margin. When repositories are not packed the "reftable"
    backend outperforms the "files" backend because the "reftable"
    backend performs auto-compaction.

        Benchmark 1: show-ref: print all refs (refformat = files, refcount = 1, packed = true)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.0 ms    1729 runs

        Benchmark 2: show-ref: print all refs (refformat = reftable, refcount = 1, packed = true)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   1.8 ms    1816 runs

        Benchmark 3: show-ref: print all refs (refformat = files, refcount = 1000, packed = true)
          Time (mean ± σ):       4.3 ms ±   0.1 ms    [User: 0.9 ms, System: 3.3 ms]
          Range (min … max):     4.1 ms …   4.6 ms    645 runs

        Benchmark 4: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = true)
          Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 1.0 ms, System: 3.3 ms]
          Range (min … max):     4.2 ms …   5.9 ms    643 runs

        Benchmark 5: show-ref: print all refs (refformat = files, refcount = 1000000, packed = true)
          Time (mean ± σ):      2.537 s ±  0.034 s    [User: 0.488 s, System: 2.048 s]
          Range (min … max):    2.511 s …  2.627 s    10 runs

        Benchmark 6: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = true)
          Time (mean ± σ):      2.712 s ±  0.017 s    [User: 0.653 s, System: 2.059 s]
          Range (min … max):    2.692 s …  2.752 s    10 runs

        Benchmark 7: show-ref: print all refs (refformat = files, refcount = 1, packed = false)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   1.9 ms    1834 runs

        Benchmark 8: show-ref: print all refs (refformat = reftable, refcount = 1, packed = false)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.4 ms …   2.0 ms    1840 runs

        Benchmark 9: show-ref: print all refs (refformat = files, refcount = 1000, packed = false)
          Time (mean ± σ):      13.8 ms ±   0.2 ms    [User: 2.8 ms, System: 10.8 ms]
          Range (min … max):    13.3 ms …  14.5 ms    208 runs

        Benchmark 10: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = false)
          Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 1.2 ms, System: 3.3 ms]
          Range (min … max):     4.3 ms …   6.2 ms    624 runs

        Benchmark 11: show-ref: print all refs (refformat = files, refcount = 1000000, packed = false)
          Time (mean ± σ):     12.127 s ±  0.129 s    [User: 2.675 s, System: 9.451 s]
          Range (min … max):   11.965 s … 12.370 s    10 runs

        Benchmark 12: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = false)
          Time (mean ± σ):      2.799 s ±  0.022 s    [User: 0.735 s, System: 2.063 s]
          Range (min … max):    2.769 s …  2.836 s    10 runs

  - Printing a single ref shows no real difference between the "files"
    and "reftable" backends.

        Benchmark 1: show-ref: print single ref (refformat = files, refcount = 1)
          Time (mean ± σ):       1.5 ms ±   0.1 ms    [User: 0.4 ms, System: 1.0 ms]
          Range (min … max):     1.4 ms …   1.8 ms    1779 runs

        Benchmark 2: show-ref: print single ref (refformat = reftable, refcount = 1)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.4 ms …   2.5 ms    1753 runs

        Benchmark 3: show-ref: print single ref (refformat = files, refcount = 1000)
          Time (mean ± σ):       1.5 ms ±   0.1 ms    [User: 0.3 ms, System: 1.1 ms]
          Range (min … max):     1.4 ms …   1.9 ms    1840 runs

        Benchmark 4: show-ref: print single ref (refformat = reftable, refcount = 1000)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.0 ms    1831 runs

        Benchmark 5: show-ref: print single ref (refformat = files, refcount = 1000000)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.1 ms    1848 runs

        Benchmark 6: show-ref: print single ref (refformat = reftable, refcount = 1000000)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.1 ms    1762 runs

So overall, performance depends on the usecases. Except for many
sequential writes the "reftable" backend is roughly on par or
significantly faster than the "files" backend though. Given that the
"files" backend has received 18 years of optimizations by now this can
be seen as a win. Furthermore, we can expect that the "reftable" backend
will grow faster over time when attention turns more towards
optimizations.

The complete test suite passes, except for those tests explicitly marked
to require the REFFILES prerequisite. Some tests in t0610 are marked as
failing because they depend on still-in-flight bug fixes. Tests can be
run with the new backend by setting the GIT_TEST_DEFAULT_REF_FORMAT
environment variable to "reftable".

There is a single known conceptual incompatibility with the dumb HTTP
transport. As "info/refs" SHOULD NOT contain the HEAD reference, and
because the "HEAD" file is not valid anymore, it is impossible for the
remote client to figure out the default branch without changing the
protocol. This shortcoming needs to be handled in a subsequent patch
series.

As the reftable library has already been introduced a while ago, this
commit message will not go into the details of how exactly the on-disk
format works. Please refer to our preexisting technical documentation at
Documentation/technical/reftable for this.

[1]: https://public-inbox.org/git/CAJo=hJtyof=HRy=2sLP0ng0uZ4=S-DpZ5dR1aF+VHVETKG20OQ@mail.gmail.com/

Original-idea-by: Shawn Pearce <spearce@spearce.org>
Based-on-patch-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/ref-storage-format.txt          |    2 +
 .../technical/repository-version.txt          |    5 +-
 Makefile                                      |    1 +
 contrib/workdir/git-new-workdir               |    2 +-
 path.c                                        |    2 +-
 path.h                                        |    1 +
 refs.c                                        |    1 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 2286 +++++++++++++++++
 repository.h                                  |    5 +-
 t/t0610-reftable-basics.sh                    |  887 +++++++
 t/t0611-reftable-httpd.sh                     |   26 +
 t/test-lib.sh                                 |    2 +
 13 files changed, 3215 insertions(+), 6 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0610-reftable-basics.sh
 create mode 100755 t/t0611-reftable-httpd.sh

diff --git a/Documentation/ref-storage-format.txt b/Documentation/ref-storage-format.txt
index 1a65cac468..14fff8a9c6 100644
--- a/Documentation/ref-storage-format.txt
+++ b/Documentation/ref-storage-format.txt
@@ -1 +1,3 @@
 * `files` for loose files with packed-refs. This is the default.
+* `reftable` for the reftable format. This format is experimental and its
+  internals are subject to change.
diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 27be3741e6..47281420fc 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -103,5 +103,6 @@ GIT_COMMON_DIR/worktrees/<id>/config.worktree)
 
 ==== `refStorage`
 
-Specifies the file format for the ref database. The only valid value
-is `files` (loose references with a packed-refs file).
+Specifies the file format for the ref database. The valid values are
+`files` (loose references with a packed-refs file) and `reftable` (see
+Documentation/technical/reftable.txt).
diff --git a/Makefile b/Makefile
index 0f748a52e6..b302516173 100644
--- a/Makefile
+++ b/Makefile
@@ -1127,6 +1127,7 @@ LIB_OBJS += reflog.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/debug.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
diff --git a/contrib/workdir/git-new-workdir b/contrib/workdir/git-new-workdir
index 888c34a521..989197aace 100755
--- a/contrib/workdir/git-new-workdir
+++ b/contrib/workdir/git-new-workdir
@@ -79,7 +79,7 @@ trap cleanup $siglist
 # create the links to the original repo.  explicitly exclude index, HEAD and
 # logs/HEAD from the list since they are purely related to the current working
 # directory, and should not be shared.
-for x in config refs logs/refs objects info hooks packed-refs remotes rr-cache svn
+for x in config refs logs/refs objects info hooks packed-refs remotes rr-cache svn reftable
 do
 	# create a containing directory if needed
 	case $x in
diff --git a/path.c b/path.c
index 0fb527918b..8bb223c92c 100644
--- a/path.c
+++ b/path.c
@@ -871,7 +871,7 @@ const char *enter_repo(const char *path, int strict)
 	return NULL;
 }
 
-static int calc_shared_perm(int mode)
+int calc_shared_perm(int mode)
 {
 	int tweak;
 
diff --git a/path.h b/path.h
index b3233c51fa..e053effef2 100644
--- a/path.h
+++ b/path.h
@@ -181,6 +181,7 @@ const char *git_path_shallow(struct repository *r);
 int ends_with_path_components(const char *path, const char *components);
 int validate_headref(const char *ref);
 
+int calc_shared_perm(int mode);
 int adjust_shared_perm(const char *path);
 
 char *interpolate_path(const char *path, int real_home);
diff --git a/refs.c b/refs.c
index c633abf284..fff343c256 100644
--- a/refs.c
+++ b/refs.c
@@ -35,6 +35,7 @@
  */
 static const struct ref_storage_be *refs_backends[] = {
 	[REF_STORAGE_FORMAT_FILES] = &refs_be_files,
+	[REF_STORAGE_FORMAT_REFTABLE] = &refs_be_reftable,
 };
 
 static const struct ref_storage_be *find_ref_storage_backend(unsigned int ref_storage_format)
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 82219829b0..83e0f0bba3 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -693,6 +693,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 0000000000..895de0b273
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,2286 @@
+#include "../git-compat-util.h"
+#include "../abspath.h"
+#include "../chdir-notify.h"
+#include "../config.h"
+#include "../environment.h"
+#include "../gettext.h"
+#include "../hash.h"
+#include "../hex.h"
+#include "../iterator.h"
+#include "../ident.h"
+#include "../lockfile.h"
+#include "../object.h"
+#include "../path.h"
+#include "../refs.h"
+#include "../reftable/reftable-stack.h"
+#include "../reftable/reftable-record.h"
+#include "../reftable/reftable-error.h"
+#include "../reftable/reftable-blocksource.h"
+#include "../reftable/reftable-reader.h"
+#include "../reftable/reftable-iterator.h"
+#include "../reftable/reftable-merged.h"
+#include "../reftable/reftable-generic.h"
+#include "../setup.h"
+#include "../strmap.h"
+#include "../worktree.h"
+#include "refs-internal.h"
+
+/*
+ * Used as a flag in ref_update::flags when the ref_update was via an
+ * update to HEAD.
+ */
+#define REF_UPDATE_VIA_HEAD (1 << 8)
+
+struct reftable_ref_store {
+	struct ref_store base;
+
+	struct reftable_stack *main_stack;
+	struct reftable_stack *worktree_stack;
+	struct strmap worktree_stacks;
+	struct reftable_write_options write_options;
+
+	unsigned int store_flags;
+	int err;
+};
+
+/*
+ * Downcast ref_store to reftable_ref_store. Die if ref_store is not a
+ * reftable_ref_store. required_flags is compared with ref_store's store_flags
+ * to ensure the ref_store has all required capabilities. "caller" is used in
+ * any necessary error messages.
+ */
+static struct reftable_ref_store *reftable_be_downcast(struct ref_store *ref_store,
+						       unsigned int required_flags,
+						       const char *caller)
+{
+	struct reftable_ref_store *refs;
+
+	if (ref_store->be != &refs_be_reftable)
+		BUG("ref_store is type \"%s\" not \"reftables\" in %s",
+		    ref_store->be->name, caller);
+
+	refs = (struct reftable_ref_store *)ref_store;
+
+	if ((refs->store_flags & required_flags) != required_flags)
+		BUG("operation %s requires abilities 0x%x, but only have 0x%x",
+		    caller, required_flags, refs->store_flags);
+
+	return refs;
+}
+
+/*
+ * Some refs are global to the repository (refs/heads/{*}), while others are
+ * local to the worktree (eg. HEAD, refs/bisect/{*}). We solve this by having
+ * multiple separate databases (ie. multiple reftable/ directories), one for
+ * the shared refs, one for the current worktree refs, and one for each
+ * additional worktree. For reading, we merge the view of both the shared and
+ * the current worktree's refs, when necessary.
+ *
+ * This function also optionally assigns the rewritten reference name that is
+ * local to the stack. This translation is required when using worktree refs
+ * like `worktrees/$worktree/refs/heads/foo` as worktree stacks will store
+ * those references in their normalized form.
+ */
+static struct reftable_stack *stack_for(struct reftable_ref_store *store,
+					const char *refname,
+					const char **rewritten_ref)
+{
+	const char *wtname;
+	int wtname_len;
+
+	if (!refname)
+		return store->main_stack;
+
+	switch (parse_worktree_ref(refname, &wtname, &wtname_len, rewritten_ref)) {
+	case REF_WORKTREE_OTHER: {
+		static struct strbuf wtname_buf = STRBUF_INIT;
+		struct strbuf wt_dir = STRBUF_INIT;
+		struct reftable_stack *stack;
+
+		/*
+		 * We're using a static buffer here so that we don't need to
+		 * allocate the worktree name whenever we look up a reference.
+		 * This could be avoided if the strmap interface knew how to
+		 * handle keys with a length.
+		 */
+		strbuf_reset(&wtname_buf);
+		strbuf_add(&wtname_buf, wtname, wtname_len);
+
+		/*
+		 * There is an edge case here: when the worktree references the
+		 * current worktree, then we set up the stack once via
+		 * `worktree_stacks` and once via `worktree_stack`. This is
+		 * wasteful, but in the reading case it shouldn't matter. And
+		 * in the writing case we would notice that the stack is locked
+		 * already and error out when trying to write a reference via
+		 * both stacks.
+		 */
+		stack = strmap_get(&store->worktree_stacks, wtname_buf.buf);
+		if (!stack) {
+			strbuf_addf(&wt_dir, "%s/worktrees/%s/reftable",
+				    store->base.repo->commondir, wtname_buf.buf);
+
+			store->err = reftable_new_stack(&stack, wt_dir.buf,
+							store->write_options);
+			assert(store->err != REFTABLE_API_ERROR);
+			strmap_put(&store->worktree_stacks, wtname_buf.buf, stack);
+		}
+
+		strbuf_release(&wt_dir);
+		return stack;
+	}
+	case REF_WORKTREE_CURRENT:
+		/*
+		 * If there is no worktree stack then we're currently in the
+		 * main worktree. We thus return the main stack in that case.
+		 */
+		if (!store->worktree_stack)
+			return store->main_stack;
+		return store->worktree_stack;
+	case REF_WORKTREE_MAIN:
+	case REF_WORKTREE_SHARED:
+		return store->main_stack;
+	default:
+		BUG("unhandled worktree reference type");
+	}
+}
+
+static int should_write_log(struct ref_store *refs, const char *refname)
+{
+	if (log_all_ref_updates == LOG_REFS_UNSET)
+		log_all_ref_updates = is_bare_repository() ? LOG_REFS_NONE : LOG_REFS_NORMAL;
+
+	switch (log_all_ref_updates) {
+	case LOG_REFS_NONE:
+		return refs_reflog_exists(refs, refname);
+	case LOG_REFS_ALWAYS:
+		return 1;
+	case LOG_REFS_NORMAL:
+		if (should_autocreate_reflog(refname))
+			return 1;
+		return refs_reflog_exists(refs, refname);
+	default:
+		BUG("unhandled core.logAllRefUpdates value %d", log_all_ref_updates);
+	}
+}
+
+static void clear_reftable_log_record(struct reftable_log_record *log)
+{
+	switch (log->value_type) {
+	case REFTABLE_LOG_UPDATE:
+		/* when we write log records, the hashes are owned by a struct
+		 * oid */
+		log->value.update.old_hash = NULL;
+		log->value.update.new_hash = NULL;
+		break;
+	case REFTABLE_LOG_DELETION:
+		break;
+	}
+	reftable_log_record_release(log);
+}
+
+static void fill_reftable_log_record(struct reftable_log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = {0};
+	int sign = 1;
+
+	if (split_ident_line(&split, info, strlen(info)))
+		BUG("failed splitting committer info");
+
+	reftable_log_record_release(log);
+	log->value_type = REFTABLE_LOG_UPDATE;
+	log->value.update.name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->value.update.email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->value.update.time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->value.update.tz_offset = sign * atoi(split.tz_begin);
+}
+
+static int read_ref_without_reload(struct reftable_stack *stack,
+				   const char *refname,
+				   struct object_id *oid,
+				   struct strbuf *referent,
+				   unsigned int *type)
+{
+	struct reftable_ref_record ref = {0};
+	int ret;
+
+	ret = reftable_stack_read_ref(stack, refname, &ref);
+	if (ret)
+		goto done;
+
+	if (ref.value_type == REFTABLE_REF_SYMREF) {
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.value.symref);
+		*type |= REF_ISSYMREF;
+	} else if (reftable_ref_record_val1(&ref)) {
+		oidread(oid, reftable_ref_record_val1(&ref));
+	} else {
+		/* We got a tombstone, which should not happen. */
+		BUG("unhandled reference value type %d", ref.value_type);
+	}
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	reftable_ref_record_release(&ref);
+	return ret;
+}
+
+static struct ref_store *reftable_be_init(struct repository *repo,
+					  const char *gitdir,
+					  unsigned int store_flags)
+{
+	struct reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct strbuf path = STRBUF_INIT;
+	int is_worktree;
+	mode_t mask;
+
+	mask = umask(0);
+	umask(mask);
+
+	base_ref_store_init(&refs->base, repo, gitdir, &refs_be_reftable);
+	strmap_init(&refs->worktree_stacks);
+	refs->store_flags = store_flags;
+	refs->write_options.block_size = 4096;
+	refs->write_options.hash_id = repo->hash_algo->format_id;
+	refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask);
+
+	/*
+	 * Set up the main reftable stack that is hosted in GIT_COMMON_DIR.
+	 * This stack contains both the shared and the main worktree refs.
+	 */
+	is_worktree = get_common_dir_noenv(&path, gitdir);
+	if (!is_worktree) {
+		strbuf_reset(&path);
+		strbuf_realpath(&path, gitdir, 0);
+	}
+	strbuf_addstr(&path, "/reftable");
+	refs->err = reftable_new_stack(&refs->main_stack, path.buf,
+				       refs->write_options);
+	if (refs->err)
+		goto done;
+
+	/*
+	 * If we're in a worktree we also need to set up the worktree reftable
+	 * stack that is contained in the per-worktree GIT_DIR.
+	 */
+	if (is_worktree) {
+		strbuf_reset(&path);
+		strbuf_addf(&path, "%s/reftable", gitdir);
+
+		refs->err = reftable_new_stack(&refs->worktree_stack, path.buf,
+					       refs->write_options);
+		if (refs->err)
+			goto done;
+	}
+
+	chdir_notify_reparent("reftables-backend $GIT_DIR", &refs->base.gitdir);
+
+done:
+	assert(refs->err != REFTABLE_API_ERROR);
+	strbuf_release(&path);
+	return &refs->base;
+}
+
+static int reftable_be_init_db(struct ref_store *ref_store,
+			       int flags UNUSED,
+			       struct strbuf *err UNUSED)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "init_db");
+	struct strbuf sb = STRBUF_INIT;
+
+	strbuf_addf(&sb, "%s/reftable", refs->base.gitdir);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/HEAD", refs->base.gitdir);
+	write_file(sb.buf, "ref: refs/heads/.invalid");
+	adjust_shared_perm(sb.buf);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", refs->base.gitdir);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", refs->base.gitdir);
+	write_file(sb.buf, "this repository uses the reftable format");
+	adjust_shared_perm(sb.buf);
+
+	strbuf_release(&sb);
+	return 0;
+}
+
+struct reftable_ref_iterator {
+	struct ref_iterator base;
+	struct reftable_ref_store *refs;
+	struct reftable_iterator iter;
+	struct reftable_ref_record ref;
+	struct object_id oid;
+
+	const char *prefix;
+	unsigned int flags;
+	int err;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_ref_iterator *iter =
+		(struct reftable_ref_iterator *)ref_iterator;
+	struct reftable_ref_store *refs = iter->refs;
+
+	while (!iter->err) {
+		int flags = 0;
+
+		iter->err = reftable_iterator_next_ref(&iter->iter, &iter->ref);
+		if (iter->err)
+			break;
+
+		/*
+		 * The files backend only lists references contained in
+		 * "refs/". We emulate the same behaviour here and thus skip
+		 * all references that don't start with this prefix.
+		 */
+		if (!starts_with(iter->ref.refname, "refs/"))
+			continue;
+
+		if (iter->prefix &&
+		    strncmp(iter->prefix, iter->ref.refname, strlen(iter->prefix))) {
+			iter->err = 1;
+			break;
+		}
+
+		if (iter->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    parse_worktree_ref(iter->ref.refname, NULL, NULL, NULL) !=
+			    REF_WORKTREE_CURRENT)
+			continue;
+
+		switch (iter->ref.value_type) {
+		case REFTABLE_REF_VAL1:
+			oidread(&iter->oid, iter->ref.value.val1);
+			break;
+		case REFTABLE_REF_VAL2:
+			oidread(&iter->oid, iter->ref.value.val2.value);
+			break;
+		case REFTABLE_REF_SYMREF:
+			if (!refs_resolve_ref_unsafe(&iter->refs->base, iter->ref.refname,
+						     RESOLVE_REF_READING, &iter->oid, &flags))
+				oidclr(&iter->oid);
+			break;
+		default:
+			BUG("unhandled reference value type %d", iter->ref.value_type);
+		}
+
+		if (is_null_oid(&iter->oid))
+			flags |= REF_ISBROKEN;
+
+		if (check_refname_format(iter->ref.refname, REFNAME_ALLOW_ONELEVEL)) {
+			if (!refname_is_safe(iter->ref.refname))
+				die(_("refname is dangerous: %s"), iter->ref.refname);
+			oidclr(&iter->oid);
+			flags |= REF_BAD_NAME | REF_ISBROKEN;
+		}
+
+		if (iter->flags & DO_FOR_EACH_OMIT_DANGLING_SYMREFS &&
+		    flags & REF_ISSYMREF &&
+		    flags & REF_ISBROKEN)
+			continue;
+
+		if (!(iter->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(iter->ref.refname, refs->base.repo,
+					    &iter->oid, flags))
+				continue;
+
+		iter->base.refname = iter->ref.refname;
+		iter->base.oid = &iter->oid;
+		iter->base.flags = flags;
+
+		break;
+	}
+
+	if (iter->err > 0) {
+		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
+			return ITER_ERROR;
+		return ITER_DONE;
+	}
+
+	if (iter->err < 0) {
+		ref_iterator_abort(ref_iterator);
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct reftable_ref_iterator *iter =
+		(struct reftable_ref_iterator *)ref_iterator;
+
+	if (iter->ref.value_type == REFTABLE_REF_VAL2) {
+		oidread(peeled, iter->ref.value.val2.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_ref_iterator *iter =
+		(struct reftable_ref_iterator *)ref_iterator;
+	reftable_ref_record_release(&iter->ref);
+	reftable_iterator_destroy(&iter->iter);
+	free(iter);
+	return ITER_DONE;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	.advance = reftable_ref_iterator_advance,
+	.peel = reftable_ref_iterator_peel,
+	.abort = reftable_ref_iterator_abort
+};
+
+static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_store *refs,
+							    struct reftable_stack *stack,
+							    const char *prefix,
+							    int flags)
+{
+	struct reftable_merged_table *merged_table;
+	struct reftable_ref_iterator *iter;
+	int ret;
+
+	iter = xcalloc(1, sizeof(*iter));
+	base_ref_iterator_init(&iter->base, &reftable_ref_iterator_vtable, 1);
+	iter->prefix = prefix;
+	iter->base.oid = &iter->oid;
+	iter->flags = flags;
+	iter->refs = refs;
+
+	ret = refs->err;
+	if (ret)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+
+	merged_table = reftable_stack_merged_table(stack);
+
+	ret = reftable_merged_table_seek_ref(merged_table, &iter->iter, prefix);
+	if (ret)
+		goto done;
+
+done:
+	iter->err = ret;
+	return iter;
+}
+
+static enum iterator_selection iterator_select(struct ref_iterator *iter_worktree,
+					       struct ref_iterator *iter_common,
+					       void *cb_data UNUSED)
+{
+	if (iter_worktree && !iter_common) {
+		/*
+		 * Return the worktree ref if there are no more common refs.
+		 */
+		return ITER_SELECT_0;
+	} else if (iter_common) {
+		/*
+		 * In case we have pending worktree and common refs we need to
+		 * yield them based on their lexicographical order. Worktree
+		 * refs that have the same name as common refs shadow the
+		 * latter.
+		 */
+		if (iter_worktree) {
+			int cmp = strcmp(iter_worktree->refname,
+					 iter_common->refname);
+			if (cmp < 0)
+				return ITER_SELECT_0;
+			else if (!cmp)
+				return ITER_SELECT_0_SKIP_1;
+		}
+
+		 /*
+		  * Otherwise, if we either have no worktree refs anymore or if
+		  * the common ref sorts before the next worktree ref, we need
+		  * to figure out whether the next common ref belongs to the
+		  * main worktree. In that case, it should be ignored.
+		  */
+		if (parse_worktree_ref(iter_common->refname, NULL, NULL,
+				       NULL) == REF_WORKTREE_SHARED)
+			return ITER_SELECT_1;
+
+		return ITER_SKIP_1;
+	} else {
+		return ITER_DONE;
+	}
+}
+
+static struct ref_iterator *reftable_be_iterator_begin(struct ref_store *ref_store,
+						       const char *prefix,
+						       const char **exclude_patterns,
+						       unsigned int flags)
+{
+	struct reftable_ref_iterator *main_iter, *worktree_iter;
+	struct reftable_ref_store *refs;
+	unsigned int required_flags = REF_STORE_READ;
+
+	if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN))
+		required_flags |= REF_STORE_ODB;
+	refs = reftable_be_downcast(ref_store, required_flags, "ref_iterator_begin");
+
+	main_iter = ref_iterator_for_stack(refs, refs->main_stack, prefix, flags);
+
+	/*
+	 * The worktree stack is only set when we're in an actual worktree
+	 * right now. If we aren't, then we return the common reftable
+	 * iterator, only.
+	 */
+	 if (!refs->worktree_stack)
+		return &main_iter->base;
+
+	/*
+	 * Otherwise we merge both the common and the per-worktree refs into a
+	 * single iterator.
+	 */
+	worktree_iter = ref_iterator_for_stack(refs, refs->worktree_stack, prefix, flags);
+	return merge_ref_iterator_begin(1, &worktree_iter->base, &main_iter->base,
+					iterator_select, NULL);
+}
+
+static int reftable_be_read_raw_ref(struct ref_store *ref_store,
+				    const char *refname,
+				    struct object_id *oid,
+				    struct strbuf *referent,
+				    unsigned int *type,
+				    int *failure_errno)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "read_raw_ref");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	int ret;
+
+	if (refs->err < 0)
+		return refs->err;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		return ret;
+
+	ret = read_ref_without_reload(stack, refname, oid, referent, type);
+	if (ret < 0)
+		return ret;
+	if (ret > 0) {
+		*failure_errno = ENOENT;
+		return -1;
+	}
+
+	return 0;
+}
+
+static int reftable_be_read_symbolic_ref(struct ref_store *ref_store,
+					 const char *refname,
+					 struct strbuf *referent)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "read_symbolic_ref");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_ref_record ref = {0};
+	int ret;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		return ret;
+
+	ret = reftable_stack_read_ref(stack, refname, &ref);
+	if (ret == 0 && ref.value_type == REFTABLE_REF_SYMREF)
+		strbuf_addstr(referent, ref.value.symref);
+	else
+		ret = -1;
+
+	reftable_ref_record_release(&ref);
+	return ret;
+}
+
+/*
+ * Return the refname under which update was originally requested.
+ */
+static const char *original_update_refname(struct ref_update *update)
+{
+	while (update->parent_update)
+		update = update->parent_update;
+	return update->refname;
+}
+
+struct reftable_transaction_update {
+	struct ref_update *update;
+	struct object_id current_oid;
+};
+
+struct write_transaction_table_arg {
+	struct reftable_ref_store *refs;
+	struct reftable_stack *stack;
+	struct reftable_addition *addition;
+	struct reftable_transaction_update *updates;
+	size_t updates_nr;
+	size_t updates_alloc;
+	size_t updates_expected;
+};
+
+struct reftable_transaction_data {
+	struct write_transaction_table_arg *args;
+	size_t args_nr, args_alloc;
+};
+
+static void free_transaction_data(struct reftable_transaction_data *tx_data)
+{
+	if (!tx_data)
+		return;
+	for (size_t i = 0; i < tx_data->args_nr; i++) {
+		reftable_addition_destroy(tx_data->args[i].addition);
+		free(tx_data->args[i].updates);
+	}
+	free(tx_data->args);
+	free(tx_data);
+}
+
+/*
+ * Prepare transaction update for the given reference update. This will cause
+ * us to lock the corresponding reftable stack for concurrent modification.
+ */
+static int prepare_transaction_update(struct write_transaction_table_arg **out,
+				      struct reftable_ref_store *refs,
+				      struct reftable_transaction_data *tx_data,
+				      struct ref_update *update,
+				      struct strbuf *err)
+{
+	struct reftable_stack *stack = stack_for(refs, update->refname, NULL);
+	struct write_transaction_table_arg *arg = NULL;
+	size_t i;
+	int ret;
+
+	/*
+	 * Search for a preexisting stack update. If there is one then we add
+	 * the update to it, otherwise we set up a new stack update.
+	 */
+	for (i = 0; !arg && i < tx_data->args_nr; i++)
+		if (tx_data->args[i].stack == stack)
+			arg = &tx_data->args[i];
+
+	if (!arg) {
+		struct reftable_addition *addition;
+
+		ret = reftable_stack_reload(stack);
+		if (ret)
+			return ret;
+
+		ret = reftable_stack_new_addition(&addition, stack);
+		if (ret) {
+			if (ret == REFTABLE_LOCK_ERROR)
+				strbuf_addstr(err, "cannot lock references");
+			return ret;
+		}
+
+		ALLOC_GROW(tx_data->args, tx_data->args_nr + 1,
+			   tx_data->args_alloc);
+		arg = &tx_data->args[tx_data->args_nr++];
+		arg->refs = refs;
+		arg->stack = stack;
+		arg->addition = addition;
+		arg->updates = NULL;
+		arg->updates_nr = 0;
+		arg->updates_alloc = 0;
+		arg->updates_expected = 0;
+	}
+
+	arg->updates_expected++;
+
+	if (out)
+		*out = arg;
+
+	return 0;
+}
+
+/*
+ * Queue a reference update for the correct stack. We potentially need to
+ * handle multiple stack updates in a single transaction when it spans across
+ * multiple worktrees.
+ */
+static int queue_transaction_update(struct reftable_ref_store *refs,
+				    struct reftable_transaction_data *tx_data,
+				    struct ref_update *update,
+				    struct object_id *current_oid,
+				    struct strbuf *err)
+{
+	struct write_transaction_table_arg *arg = NULL;
+	int ret;
+
+	if (update->backend_data)
+		BUG("reference update queued more than once");
+
+	ret = prepare_transaction_update(&arg, refs, tx_data, update, err);
+	if (ret < 0)
+		return ret;
+
+	ALLOC_GROW(arg->updates, arg->updates_nr + 1,
+		   arg->updates_alloc);
+	arg->updates[arg->updates_nr].update = update;
+	oidcpy(&arg->updates[arg->updates_nr].current_oid, current_oid);
+	update->backend_data = &arg->updates[arg->updates_nr++];
+
+	return 0;
+}
+
+static int reftable_be_transaction_prepare(struct ref_store *ref_store,
+					   struct ref_transaction *transaction,
+					   struct strbuf *err)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE|REF_STORE_MAIN, "ref_transaction_prepare");
+	struct strbuf referent = STRBUF_INIT, head_referent = STRBUF_INIT;
+	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct reftable_transaction_data *tx_data = NULL;
+	struct object_id head_oid;
+	unsigned int head_type = 0;
+	size_t i;
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	tx_data = xcalloc(1, sizeof(*tx_data));
+
+	/*
+	 * Preprocess all updates. For one we check that there are no duplicate
+	 * reference updates in this transaction. Second, we lock all stacks
+	 * that will be modified during the transaction.
+	 */
+	for (i = 0; i < transaction->nr; i++) {
+		ret = prepare_transaction_update(NULL, refs, tx_data,
+						 transaction->updates[i], err);
+		if (ret)
+			goto done;
+
+		string_list_append(&affected_refnames,
+				   transaction->updates[i]->refname);
+	}
+
+	/*
+	 * Now that we have counted updates per stack we can preallocate their
+	 * arrays. This avoids having to reallocate many times.
+	 */
+	for (i = 0; i < tx_data->args_nr; i++) {
+		CALLOC_ARRAY(tx_data->args[i].updates, tx_data->args[i].updates_expected);
+		tx_data->args[i].updates_alloc = tx_data->args[i].updates_expected;
+	}
+
+	/*
+	 * Fail if a refname appears more than once in the transaction.
+	 * This code is taken from the files backend and is a good candidate to
+	 * be moved into the generic layer.
+	 */
+	string_list_sort(&affected_refnames);
+	if (ref_update_reject_duplicates(&affected_refnames, err)) {
+		ret = TRANSACTION_GENERIC_ERROR;
+		goto done;
+	}
+
+	ret = read_ref_without_reload(stack_for(refs, "HEAD", NULL), "HEAD", &head_oid,
+				      &head_referent, &head_type);
+	if (ret < 0)
+		goto done;
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		struct object_id current_oid = {0};
+		struct reftable_stack *stack;
+		const char *rewritten_ref;
+
+		stack = stack_for(refs, u->refname, &rewritten_ref);
+
+		/* Verify that the new object ID is valid. */
+		if ((u->flags & REF_HAVE_NEW) && !is_null_oid(&u->new_oid) &&
+		    !(u->flags & REF_SKIP_OID_VERIFICATION) &&
+		    !(u->flags & REF_LOG_ONLY)) {
+			struct object *o = parse_object(refs->base.repo, &u->new_oid);
+			if (!o) {
+				strbuf_addf(err,
+					    _("trying to write ref '%s' with nonexistent object %s"),
+					    u->refname, oid_to_hex(&u->new_oid));
+				ret = -1;
+				goto done;
+			}
+
+			if (o->type != OBJ_COMMIT && is_branch(u->refname)) {
+				strbuf_addf(err, _("trying to write non-commit object %s to branch '%s'"),
+					    oid_to_hex(&u->new_oid), u->refname);
+				ret = -1;
+				goto done;
+			}
+		}
+
+		/*
+		 * When we update the reference that HEAD points to we enqueue
+		 * a second log-only update for HEAD so that its reflog is
+		 * updated accordingly.
+		 */
+		if (head_type == REF_ISSYMREF &&
+		    !(u->flags & REF_LOG_ONLY) &&
+		    !(u->flags & REF_UPDATE_VIA_HEAD) &&
+		    !strcmp(rewritten_ref, head_referent.buf)) {
+			struct ref_update *new_update;
+
+			/*
+			 * First make sure that HEAD is not already in the
+			 * transaction. This check is O(lg N) in the transaction
+			 * size, but it happens at most once per transaction.
+			 */
+			if (string_list_has_string(&affected_refnames, "HEAD")) {
+				/* An entry already existed */
+				strbuf_addf(err,
+					    _("multiple updates for 'HEAD' (including one "
+					    "via its referent '%s') are not allowed"),
+					    u->refname);
+				ret = TRANSACTION_NAME_CONFLICT;
+				goto done;
+			}
+
+			new_update = ref_transaction_add_update(
+					transaction, "HEAD",
+					u->flags | REF_LOG_ONLY | REF_NO_DEREF,
+					&u->new_oid, &u->old_oid, u->msg);
+			string_list_insert(&affected_refnames, new_update->refname);
+		}
+
+		ret = read_ref_without_reload(stack, rewritten_ref,
+					      &current_oid, &referent, &u->type);
+		if (ret < 0)
+			goto done;
+		if (ret > 0 && (!(u->flags & REF_HAVE_OLD) || is_null_oid(&u->old_oid))) {
+			/*
+			 * The reference does not exist, and we either have no
+			 * old object ID or expect the reference to not exist.
+			 * We can thus skip below safety checks as well as the
+			 * symref splitting. But we do want to verify that
+			 * there is no conflicting reference here so that we
+			 * can output a proper error message instead of failing
+			 * at a later point.
+			 */
+			ret = refs_verify_refname_available(ref_store, u->refname,
+							    &affected_refnames, NULL, err);
+			if (ret < 0)
+				goto done;
+
+			/*
+			 * There is no need to write the reference deletion
+			 * when the reference in question doesn't exist.
+			 */
+			 if (u->flags & REF_HAVE_NEW && !is_null_oid(&u->new_oid)) {
+				 ret = queue_transaction_update(refs, tx_data, u,
+								&current_oid, err);
+				 if (ret)
+					 goto done;
+			 }
+
+			continue;
+		}
+		if (ret > 0) {
+			/* The reference does not exist, but we expected it to. */
+			strbuf_addf(err, _("cannot lock ref '%s': "
+				    "unable to resolve reference '%s'"),
+				    original_update_refname(u), u->refname);
+			ret = -1;
+			goto done;
+		}
+
+		if (u->type & REF_ISSYMREF) {
+			const char *resolved = refs_resolve_ref_unsafe(&refs->base, u->refname, 0,
+								       &current_oid, NULL);
+
+			if (u->flags & REF_NO_DEREF) {
+				/*
+				 * The reftable stack is locked at this point
+				 * already, so it should be safe to call
+				 * `refs_resolve_ref_unsafe()` here.
+				 */
+				if (u->flags & REF_HAVE_OLD && !resolved) {
+					strbuf_addf(err, _("cannot lock ref '%s': "
+						    "error reading reference"), u->refname);
+					ret = -1;
+					goto done;
+				}
+			} else {
+				struct ref_update *new_update;
+				int new_flags;
+
+				new_flags = u->flags;
+				if (!strcmp(rewritten_ref, "HEAD"))
+					new_flags |= REF_UPDATE_VIA_HEAD;
+
+				/*
+				 * If we are updating a symref (eg. HEAD), we should also
+				 * update the branch that the symref points to.
+				 *
+				 * This is generic functionality, and would be better
+				 * done in refs.c, but the current implementation is
+				 * intertwined with the locking in files-backend.c.
+				 */
+				new_update = ref_transaction_add_update(
+						transaction, referent.buf, new_flags,
+						&u->new_oid, &u->old_oid, u->msg);
+				new_update->parent_update = u;
+
+				/*
+				 * Change the symbolic ref update to log only. Also, it
+				 * doesn't need to check its old OID value, as that will be
+				 * done when new_update is processed.
+				 */
+				u->flags |= REF_LOG_ONLY | REF_NO_DEREF;
+				u->flags &= ~REF_HAVE_OLD;
+
+				if (string_list_has_string(&affected_refnames, new_update->refname)) {
+					strbuf_addf(err,
+						    _("multiple updates for '%s' (including one "
+						    "via symref '%s') are not allowed"),
+						    referent.buf, u->refname);
+					ret = TRANSACTION_NAME_CONFLICT;
+					goto done;
+				}
+				string_list_insert(&affected_refnames, new_update->refname);
+			}
+		}
+
+		/*
+		 * Verify that the old object matches our expectations. Note
+		 * that the error messages here do not make a lot of sense in
+		 * the context of the reftable backend as we never lock
+		 * individual refs. But the error messages match what the files
+		 * backend returns, which keeps our tests happy.
+		 */
+		if (u->flags & REF_HAVE_OLD && !oideq(&current_oid, &u->old_oid)) {
+			if (is_null_oid(&u->old_oid))
+				strbuf_addf(err, _("cannot lock ref '%s': "
+					    "reference already exists"),
+					    original_update_refname(u));
+			else if (is_null_oid(&current_oid))
+				strbuf_addf(err, _("cannot lock ref '%s': "
+					    "reference is missing but expected %s"),
+					    original_update_refname(u),
+					    oid_to_hex(&u->old_oid));
+			else
+				strbuf_addf(err, _("cannot lock ref '%s': "
+					    "is at %s but expected %s"),
+					    original_update_refname(u),
+					    oid_to_hex(&current_oid),
+					    oid_to_hex(&u->old_oid));
+			ret = -1;
+			goto done;
+		}
+
+		/*
+		 * If all of the following conditions are true:
+		 *
+		 *   - We're not about to write a symref.
+		 *   - We're not about to write a log-only entry.
+		 *   - Old and new object ID are different.
+		 *
+		 * Then we're essentially doing a no-op update that can be
+		 * skipped. This is not only for the sake of efficiency, but
+		 * also skips writing unneeded reflog entries.
+		 */
+		if ((u->type & REF_ISSYMREF) ||
+		    (u->flags & REF_LOG_ONLY) ||
+		    (u->flags & REF_HAVE_NEW && !oideq(&current_oid, &u->new_oid))) {
+			ret = queue_transaction_update(refs, tx_data, u,
+						       &current_oid, err);
+			if (ret)
+				goto done;
+		}
+	}
+
+	transaction->backend_data = tx_data;
+	transaction->state = REF_TRANSACTION_PREPARED;
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	if (ret < 0) {
+		free_transaction_data(tx_data);
+		transaction->state = REF_TRANSACTION_CLOSED;
+		if (!err->len)
+			strbuf_addf(err, _("reftable: transaction prepare: %s"),
+				    reftable_error_str(ret));
+	}
+	string_list_clear(&affected_refnames, 0);
+	strbuf_release(&referent);
+	strbuf_release(&head_referent);
+
+	return ret;
+}
+
+static int reftable_be_transaction_abort(struct ref_store *ref_store,
+					 struct ref_transaction *transaction,
+					 struct strbuf *err)
+{
+	struct reftable_transaction_data *tx_data = transaction->backend_data;
+	free_transaction_data(tx_data);
+	transaction->state = REF_TRANSACTION_CLOSED;
+	return 0;
+}
+
+static int transaction_update_cmp(const void *a, const void *b)
+{
+	return strcmp(((struct reftable_transaction_update *)a)->update->refname,
+		      ((struct reftable_transaction_update *)b)->update->refname);
+}
+
+static int write_transaction_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct write_transaction_table_arg *arg = cb_data;
+	struct reftable_merged_table *mt =
+		reftable_stack_merged_table(arg->stack);
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_log_record *logs = NULL;
+	size_t logs_nr = 0, logs_alloc = 0, i;
+	int ret = 0;
+
+	QSORT(arg->updates, arg->updates_nr, transaction_update_cmp);
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	for (i = 0; i < arg->updates_nr; i++) {
+		struct reftable_transaction_update *tx_update = &arg->updates[i];
+		struct ref_update *u = tx_update->update;
+
+		/*
+		 * Write a reflog entry when updating a ref to point to
+		 * something new in either of the following cases:
+		 *
+		 * - The reference is about to be deleted. We always want to
+		 *   delete the reflog in that case.
+		 * - REF_FORCE_CREATE_REFLOG is set, asking us to always create
+		 *   the reflog entry.
+		 * - `core.logAllRefUpdates` tells us to create the reflog for
+		 *   the given ref.
+		 */
+		if (u->flags & REF_HAVE_NEW && !(u->type & REF_ISSYMREF) && is_null_oid(&u->new_oid)) {
+			struct reftable_log_record log = {0};
+			struct reftable_iterator it = {0};
+
+			/*
+			 * When deleting refs we also delete all reflog entries
+			 * with them. While it is not strictly required to
+			 * delete reflogs together with their refs, this
+			 * matches the behaviour of the files backend.
+			 *
+			 * Unfortunately, we have no better way than to delete
+			 * all reflog entries one by one.
+			 */
+			ret = reftable_merged_table_seek_log(mt, &it, u->refname);
+			while (ret == 0) {
+				struct reftable_log_record *tombstone;
+
+				ret = reftable_iterator_next_log(&it, &log);
+				if (ret < 0)
+					break;
+				if (ret > 0 || strcmp(log.refname, u->refname)) {
+					ret = 0;
+					break;
+				}
+
+				ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+				tombstone = &logs[logs_nr++];
+				tombstone->refname = xstrdup(u->refname);
+				tombstone->value_type = REFTABLE_LOG_DELETION,
+				tombstone->update_index = log.update_index;
+			}
+
+			reftable_log_record_release(&log);
+			reftable_iterator_destroy(&it);
+
+			if (ret)
+				goto done;
+		} else if (u->flags & REF_HAVE_NEW &&
+			   (u->flags & REF_FORCE_CREATE_REFLOG ||
+			    should_write_log(&arg->refs->base, u->refname))) {
+			struct reftable_log_record *log;
+
+			ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+			log = &logs[logs_nr++];
+			memset(log, 0, sizeof(*log));
+
+			fill_reftable_log_record(log);
+			log->update_index = ts;
+			log->refname = xstrdup(u->refname);
+			log->value.update.new_hash = u->new_oid.hash;
+			log->value.update.old_hash = tx_update->current_oid.hash;
+			log->value.update.message =
+				xstrndup(u->msg, arg->refs->write_options.block_size / 2);
+		}
+
+		if (u->flags & REF_LOG_ONLY)
+			continue;
+
+		if (u->flags & REF_HAVE_NEW && is_null_oid(&u->new_oid)) {
+			struct reftable_ref_record ref = {
+				.refname = (char *)u->refname,
+				.update_index = ts,
+				.value_type = REFTABLE_REF_DELETION,
+			};
+
+			ret = reftable_writer_add_ref(writer, &ref);
+			if (ret < 0)
+				goto done;
+		} else if (u->flags & REF_HAVE_NEW) {
+			struct reftable_ref_record ref = {0};
+			struct object_id peeled;
+			int peel_error;
+
+			ref.refname = (char *)u->refname;
+			ref.update_index = ts;
+
+			peel_error = peel_object(&u->new_oid, &peeled);
+			if (!peel_error) {
+				ref.value_type = REFTABLE_REF_VAL2;
+				memcpy(ref.value.val2.target_value, peeled.hash, GIT_MAX_RAWSZ);
+				memcpy(ref.value.val2.value, u->new_oid.hash, GIT_MAX_RAWSZ);
+			} else if (!is_null_oid(&u->new_oid)) {
+				ref.value_type = REFTABLE_REF_VAL1;
+				memcpy(ref.value.val1, u->new_oid.hash, GIT_MAX_RAWSZ);
+			}
+
+			ret = reftable_writer_add_ref(writer, &ref);
+			if (ret < 0)
+				goto done;
+		}
+	}
+
+	/*
+	 * Logs are written at the end so that we do not have intermixed ref
+	 * and log blocks.
+	 */
+	if (logs) {
+		ret = reftable_writer_add_logs(writer, logs, logs_nr);
+		if (ret < 0)
+			goto done;
+	}
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	for (i = 0; i < logs_nr; i++)
+		clear_reftable_log_record(&logs[i]);
+	free(logs);
+	return ret;
+}
+
+static int reftable_be_transaction_finish(struct ref_store *ref_store,
+					  struct ref_transaction *transaction,
+					  struct strbuf *err)
+{
+	struct reftable_transaction_data *tx_data = transaction->backend_data;
+	int ret = 0;
+
+	for (size_t i = 0; i < tx_data->args_nr; i++) {
+		ret = reftable_addition_add(tx_data->args[i].addition,
+					    write_transaction_table, &tx_data->args[i]);
+		if (ret < 0)
+			goto done;
+
+		ret = reftable_addition_commit(tx_data->args[i].addition);
+		if (ret < 0)
+			goto done;
+	}
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	free_transaction_data(tx_data);
+	transaction->state = REF_TRANSACTION_CLOSED;
+
+	if (ret) {
+		strbuf_addf(err, _("reftable: transaction failure: %s"),
+			    reftable_error_str(ret));
+		return -1;
+	}
+	return ret;
+}
+
+static int reftable_be_initial_transaction_commit(struct ref_store *ref_store UNUSED,
+						  struct ref_transaction *transaction,
+						  struct strbuf *err)
+{
+	return ref_transaction_commit(transaction, err);
+}
+
+static int reftable_be_pack_refs(struct ref_store *ref_store,
+				 struct pack_refs_opts *opts)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE | REF_STORE_ODB, "pack_refs");
+	struct reftable_stack *stack;
+	int ret;
+
+	if (refs->err)
+		return refs->err;
+
+	stack = refs->worktree_stack;
+	if (!stack)
+		stack = refs->main_stack;
+
+	ret = reftable_stack_compact_all(stack, NULL);
+	if (ret)
+		goto out;
+	ret = reftable_stack_clean(stack);
+	if (ret)
+		goto out;
+
+out:
+	return ret;
+}
+
+struct write_create_symref_arg {
+	struct reftable_ref_store *refs;
+	struct reftable_stack *stack;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct write_create_symref_arg *create = cb_data;
+	uint64_t ts = reftable_stack_next_update_index(create->stack);
+	struct reftable_ref_record ref = {
+		.refname = (char *)create->refname,
+		.value_type = REFTABLE_REF_SYMREF,
+		.value.symref = (char *)create->target,
+		.update_index = ts,
+	};
+	struct reftable_log_record log = {0};
+	struct object_id new_oid;
+	struct object_id old_oid;
+	int ret;
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	ret = reftable_writer_add_ref(writer, &ref);
+	if (ret)
+		return ret;
+
+	/*
+	 * Note that it is important to try and resolve the reference before we
+	 * write the log entry. This is because `should_write_log()` will munge
+	 * `core.logAllRefUpdates`, which is undesirable when we create a new
+	 * repository because it would be written into the config. As HEAD will
+	 * not resolve for new repositories this ordering will ensure that this
+	 * never happens.
+	 */
+	if (!create->logmsg ||
+	    !refs_resolve_ref_unsafe(&create->refs->base, create->target,
+				     RESOLVE_REF_READING, &new_oid, NULL) ||
+	    !should_write_log(&create->refs->base, create->refname))
+		return 0;
+
+	fill_reftable_log_record(&log);
+	log.refname = xstrdup(create->refname);
+	log.update_index = ts;
+	log.value.update.message = xstrndup(create->logmsg,
+					    create->refs->write_options.block_size / 2);
+	log.value.update.new_hash = new_oid.hash;
+	if (refs_resolve_ref_unsafe(&create->refs->base, create->refname,
+				    RESOLVE_REF_READING, &old_oid, NULL))
+		log.value.update.old_hash = old_oid.hash;
+
+	ret = reftable_writer_add_log(writer, &log);
+	clear_reftable_log_record(&log);
+	return ret;
+}
+
+static int reftable_be_create_symref(struct ref_store *ref_store,
+				     const char *refname,
+				     const char *target,
+				     const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "create_symref");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct write_create_symref_arg arg = {
+		.refs = refs,
+		.stack = stack,
+		.refname = refname,
+		.target = target,
+		.logmsg = logmsg,
+	};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+
+	ret = reftable_stack_add(stack, &write_create_symref_table, &arg);
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	if (ret)
+		error("unable to write symref for %s: %s", refname,
+		      reftable_error_str(ret));
+	return ret;
+}
+
+struct write_copy_arg {
+	struct reftable_ref_store *refs;
+	struct reftable_stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+	int delete_old;
+};
+
+static int write_copy_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct write_copy_arg *arg = cb_data;
+	uint64_t deletion_ts, creation_ts;
+	struct reftable_merged_table *mt = reftable_stack_merged_table(arg->stack);
+	struct reftable_ref_record old_ref = {0}, refs[2] = {0};
+	struct reftable_log_record old_log = {0}, *logs = NULL;
+	struct reftable_iterator it = {0};
+	struct string_list skip = STRING_LIST_INIT_NODUP;
+	struct strbuf errbuf = STRBUF_INIT;
+	size_t logs_nr = 0, logs_alloc = 0, i;
+	int ret;
+
+	if (reftable_stack_read_ref(arg->stack, arg->oldname, &old_ref)) {
+		ret = error(_("refname %s not found"), arg->oldname);
+		goto done;
+	}
+	if (old_ref.value_type == REFTABLE_REF_SYMREF) {
+		ret = error(_("refname %s is a symbolic ref, copying it is not supported"),
+			    arg->oldname);
+		goto done;
+	}
+
+	/*
+	 * There's nothing to do in case the old and new name are the same, so
+	 * we exit early in that case.
+	 */
+	if (!strcmp(arg->oldname, arg->newname)) {
+		ret = 0;
+		goto done;
+	}
+
+	/*
+	 * Verify that the new refname is available.
+	 */
+	string_list_insert(&skip, arg->oldname);
+	ret = refs_verify_refname_available(&arg->refs->base, arg->newname,
+					    NULL, &skip, &errbuf);
+	if (ret < 0) {
+		error("%s", errbuf.buf);
+		goto done;
+	}
+
+	/*
+	 * When deleting the old reference we have to use two update indices:
+	 * one to delete the old ref and its reflog, and once to create the new
+	 * ref and its reflog. They need to be staged with two separate indices
+	 * because the new reflog needs to encode both the deletion of the old
+	 * branch and the creation of the new branch, and we cannot do two
+	 * changes to a reflog in a single update.
+	 */
+	deletion_ts = creation_ts = reftable_stack_next_update_index(arg->stack);
+	if (arg->delete_old)
+		creation_ts++;
+	reftable_writer_set_limits(writer, deletion_ts, creation_ts);
+
+	/*
+	 * Add the new reference. If this is a rename then we also delete the
+	 * old reference.
+	 */
+	refs[0] = old_ref;
+	refs[0].refname = (char *)arg->newname;
+	refs[0].update_index = creation_ts;
+	if (arg->delete_old) {
+		refs[1].refname = (char *)arg->oldname;
+		refs[1].value_type = REFTABLE_REF_DELETION;
+		refs[1].update_index = deletion_ts;
+	}
+	ret = reftable_writer_add_refs(writer, refs, arg->delete_old ? 2 : 1);
+	if (ret < 0)
+		goto done;
+
+	/*
+	 * When deleting the old branch we need to create a reflog entry on the
+	 * new branch name that indicates that the old branch has been deleted
+	 * and then recreated. This is a tad weird, but matches what the files
+	 * backend does.
+	 */
+	if (arg->delete_old) {
+		struct strbuf head_referent = STRBUF_INIT;
+		struct object_id head_oid;
+		int append_head_reflog;
+		unsigned head_type = 0;
+
+		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+		memset(&logs[logs_nr], 0, sizeof(logs[logs_nr]));
+		fill_reftable_log_record(&logs[logs_nr]);
+		logs[logs_nr].refname = (char *)arg->newname;
+		logs[logs_nr].update_index = deletion_ts;
+		logs[logs_nr].value.update.message =
+			xstrndup(arg->logmsg, arg->refs->write_options.block_size / 2);
+		logs[logs_nr].value.update.old_hash = old_ref.value.val1;
+		logs_nr++;
+
+		ret = read_ref_without_reload(arg->stack, "HEAD", &head_oid, &head_referent, &head_type);
+		if (ret < 0)
+			goto done;
+		append_head_reflog = (head_type & REF_ISSYMREF) && !strcmp(head_referent.buf, arg->oldname);
+		strbuf_release(&head_referent);
+
+		/*
+		 * The files backend uses `refs_delete_ref()` to delete the old
+		 * branch name, which will append a reflog entry for HEAD in
+		 * case it points to the old branch.
+		 */
+		if (append_head_reflog) {
+			ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+			logs[logs_nr] = logs[logs_nr - 1];
+			logs[logs_nr].refname = "HEAD";
+			logs_nr++;
+		}
+	}
+
+	/*
+	 * Create the reflog entry for the newly created branch.
+	 */
+	ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+	memset(&logs[logs_nr], 0, sizeof(logs[logs_nr]));
+	fill_reftable_log_record(&logs[logs_nr]);
+	logs[logs_nr].refname = (char *)arg->newname;
+	logs[logs_nr].update_index = creation_ts;
+	logs[logs_nr].value.update.message =
+		xstrndup(arg->logmsg, arg->refs->write_options.block_size / 2);
+	logs[logs_nr].value.update.new_hash = old_ref.value.val1;
+	logs_nr++;
+
+	/*
+	 * In addition to writing the reflog entry for the new branch, we also
+	 * copy over all log entries from the old reflog. Last but not least,
+	 * when renaming we also have to delete all the old reflog entries.
+	 */
+	ret = reftable_merged_table_seek_log(mt, &it, arg->oldname);
+	if (ret < 0)
+		return ret;
+
+	while (1) {
+		ret = reftable_iterator_next_log(&it, &old_log);
+		if (ret < 0)
+			goto done;
+		if (ret > 0 || strcmp(old_log.refname, arg->oldname)) {
+			ret = 0;
+			break;
+		}
+
+		free(old_log.refname);
+
+		/*
+		 * Copy over the old reflog entry with the new refname.
+		 */
+		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+		logs[logs_nr] = old_log;
+		logs[logs_nr].refname = (char *)arg->newname;
+		logs_nr++;
+
+		/*
+		 * Delete the old reflog entry in case we are renaming.
+		 */
+		if (arg->delete_old) {
+			ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+			memset(&logs[logs_nr], 0, sizeof(logs[logs_nr]));
+			logs[logs_nr].refname = (char *)arg->oldname;
+			logs[logs_nr].value_type = REFTABLE_LOG_DELETION;
+			logs[logs_nr].update_index = old_log.update_index;
+			logs_nr++;
+		}
+
+		/*
+		 * Transfer ownership of the log record we're iterating over to
+		 * the array of log records. Otherwise, the pointers would get
+		 * free'd or reallocated by the iterator.
+		 */
+		memset(&old_log, 0, sizeof(old_log));
+	}
+
+	ret = reftable_writer_add_logs(writer, logs, logs_nr);
+	if (ret < 0)
+		goto done;
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	reftable_iterator_destroy(&it);
+	string_list_clear(&skip, 0);
+	strbuf_release(&errbuf);
+	for (i = 0; i < logs_nr; i++) {
+		if (!strcmp(logs[i].refname, "HEAD"))
+			continue;
+		if (logs[i].value.update.old_hash == old_ref.value.val1)
+			logs[i].value.update.old_hash = NULL;
+		if (logs[i].value.update.new_hash == old_ref.value.val1)
+			logs[i].value.update.new_hash = NULL;
+		logs[i].refname = NULL;
+		reftable_log_record_release(&logs[i]);
+	}
+	free(logs);
+	reftable_ref_record_release(&old_ref);
+	reftable_log_record_release(&old_log);
+	return ret;
+}
+
+static int reftable_be_rename_ref(struct ref_store *ref_store,
+				  const char *oldrefname,
+				  const char *newrefname,
+				  const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "rename_ref");
+	struct reftable_stack *stack = stack_for(refs, newrefname, &newrefname);
+	struct write_copy_arg arg = {
+		.refs = refs,
+		.stack = stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+		.delete_old = 1,
+	};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+	ret = reftable_stack_add(stack, &write_copy_table, &arg);
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	return ret;
+}
+
+static int reftable_be_copy_ref(struct ref_store *ref_store,
+				const char *oldrefname,
+				const char *newrefname,
+				const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "copy_ref");
+	struct reftable_stack *stack = stack_for(refs, newrefname, &newrefname);
+	struct write_copy_arg arg = {
+		.refs = refs,
+		.stack = stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+	ret = reftable_stack_add(stack, &write_copy_table, &arg);
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	return ret;
+}
+
+struct reftable_reflog_iterator {
+	struct ref_iterator base;
+	struct reftable_ref_store *refs;
+	struct reftable_iterator iter;
+	struct reftable_log_record log;
+	struct object_id oid;
+	char *last_name;
+	int err;
+};
+
+static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_iterator *iter =
+		(struct reftable_reflog_iterator *)ref_iterator;
+
+	if (iter->err) {
+		ref_iterator_abort(ref_iterator);
+		return iter->err;
+	}
+
+	while (1) {
+		int flags, ret;
+
+		ret = reftable_iterator_next_log(&iter->iter, &iter->log);
+		if (ret < 0) {
+			ref_iterator_abort(ref_iterator);
+			return ITER_ERROR;
+		}
+		if (ret > 0) {
+			if (ref_iterator_abort(ref_iterator) != ITER_DONE)
+				return ITER_ERROR;
+			return ITER_DONE;
+		}
+
+		/*
+		 * We want the refnames that we have reflogs for, so we skip if
+		 * we've already produced this name. This could be faster by
+		 * seeking directly to reflog@update_index==0.
+		 */
+		if (iter->last_name && !strcmp(iter->log.refname, iter->last_name))
+			continue;
+
+		if (!refs_resolve_ref_unsafe(&iter->refs->base, iter->log.refname,
+					     0, &iter->oid, &flags)) {
+			error(_("bad ref for %s"), iter->log.refname);
+			continue;
+		}
+
+		free(iter->last_name);
+		iter->last_name = xstrdup(iter->log.refname);
+		iter->base.refname = iter->log.refname;
+		iter->base.oid = &iter->oid;
+		iter->base.flags = flags;
+		return ITER_OK;
+	}
+}
+
+static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator,
+						 struct object_id *peeled)
+{
+	BUG("reftable reflog iterator cannot be peeled");
+	return -1;
+}
+
+static int reftable_reflog_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_iterator *iter =
+		(struct reftable_reflog_iterator *)ref_iterator;
+	reftable_log_record_release(&iter->log);
+	reftable_iterator_destroy(&iter->iter);
+	free(iter->last_name);
+	free(iter);
+	return ITER_DONE;
+}
+
+static struct ref_iterator_vtable reftable_reflog_iterator_vtable = {
+	.advance = reftable_reflog_iterator_advance,
+	.peel = reftable_reflog_iterator_peel,
+	.abort = reftable_reflog_iterator_abort
+};
+
+static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
+								  struct reftable_stack *stack)
+{
+	struct reftable_reflog_iterator *iter;
+	struct reftable_merged_table *mt;
+	int ret;
+
+	iter = xcalloc(1, sizeof(*iter));
+	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable, 1);
+	iter->refs = refs;
+	iter->base.oid = &iter->oid;
+
+	ret = reftable_stack_reload(refs->main_stack);
+	if (ret < 0)
+		goto done;
+
+	mt = reftable_stack_merged_table(stack);
+	ret = reftable_merged_table_seek_log(mt, &iter->iter, "");
+	if (ret < 0)
+		goto done;
+
+done:
+	iter->err = ret;
+	return iter;
+}
+
+static struct ref_iterator *reftable_be_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "reflog_iterator_begin");
+	struct reftable_reflog_iterator *main_iter, *worktree_iter;
+
+	main_iter = reflog_iterator_for_stack(refs, refs->main_stack);
+	if (!refs->worktree_stack)
+		return &main_iter->base;
+
+	worktree_iter = reflog_iterator_for_stack(refs, refs->worktree_stack);
+
+	return merge_ref_iterator_begin(1, &worktree_iter->base, &main_iter->base,
+					iterator_select, NULL);
+}
+
+static int reftable_be_for_each_reflog_ent_reverse(struct ref_store *ref_store,
+						   const char *refname,
+						   each_reflog_ent_fn fn,
+						   void *cb_data)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "for_each_reflog_ent_reverse");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record log = {0};
+	struct reftable_iterator it = {0};
+	int ret;
+
+	if (refs->err < 0)
+		return refs->err;
+
+	mt = reftable_stack_merged_table(stack);
+	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	while (!ret) {
+		struct object_id old_oid, new_oid;
+		const char *full_committer;
+
+		ret = reftable_iterator_next_log(&it, &log);
+		if (ret < 0)
+			break;
+		if (ret > 0 || strcmp(log.refname, refname)) {
+			ret = 0;
+			break;
+		}
+
+		oidread(&old_oid, log.value.update.old_hash);
+		oidread(&new_oid, log.value.update.new_hash);
+
+		/*
+		 * When both the old object ID and the new object ID are null
+		 * then this is the reflog existence marker. The caller must
+		 * not be aware of it.
+		 */
+		if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
+			continue;
+
+		full_committer = fmt_ident(log.value.update.name, log.value.update.email,
+					   WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
+		ret = fn(&old_oid, &new_oid, full_committer,
+			 log.value.update.time, log.value.update.tz_offset,
+			 log.value.update.message, cb_data);
+		if (ret)
+			break;
+	}
+
+	reftable_log_record_release(&log);
+	reftable_iterator_destroy(&it);
+	return ret;
+}
+
+static int reftable_be_for_each_reflog_ent(struct ref_store *ref_store,
+					   const char *refname,
+					   each_reflog_ent_fn fn,
+					   void *cb_data)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "for_each_reflog_ent");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record *logs = NULL;
+	struct reftable_iterator it = {0};
+	size_t logs_alloc = 0, logs_nr = 0, i;
+	int ret;
+
+	if (refs->err < 0)
+		return refs->err;
+
+	mt = reftable_stack_merged_table(stack);
+	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	while (!ret) {
+		struct reftable_log_record log = {0};
+
+		ret = reftable_iterator_next_log(&it, &log);
+		if (ret < 0)
+			break;
+		if (ret > 0 || strcmp(log.refname, refname)) {
+			reftable_log_record_release(&log);
+			ret = 0;
+			break;
+		}
+
+		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+		logs[logs_nr++] = log;
+	}
+
+	for (i = logs_nr; i--;) {
+		struct reftable_log_record *log = &logs[i];
+		struct object_id old_oid, new_oid;
+		const char *full_committer = "";
+
+		oidread(&old_oid, log->value.update.old_hash);
+		oidread(&new_oid, log->value.update.new_hash);
+
+		/*
+		 * When both the old object ID and the new object ID are null
+		 * then this is the reflog existence marker. The caller must
+		 * not be aware of it.
+		 */
+		if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
+			continue;
+
+		full_committer = fmt_ident(log->value.update.name, log->value.update.email,
+					   WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
+		ret = fn(&old_oid, &new_oid, full_committer,
+			 log->value.update.time, log->value.update.tz_offset,
+			 log->value.update.message, cb_data);
+		if (ret)
+			break;
+	}
+
+	reftable_iterator_destroy(&it);
+	for (i = 0; i < logs_nr; i++)
+		reftable_log_record_release(&logs[i]);
+	free(logs);
+	return ret;
+}
+
+static int reftable_be_reflog_exists(struct ref_store *ref_store,
+				     const char *refname)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "reflog_exists");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_merged_table *mt = reftable_stack_merged_table(stack);
+	struct reftable_log_record log = {0};
+	struct reftable_iterator it = {0};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+
+	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	if (ret)
+		goto done;
+
+	/*
+	 * Seek the reflog to see whether it contains any reflog entries which
+	 * aren't marked for deletion.
+	 */
+	while (1) {
+		ret = reftable_iterator_next_log(&it, &log);
+		if (ret < 0)
+			goto done;
+		if (ret > 0 || strcmp(log.refname, refname)) {
+			ret = 0;
+			goto done;
+		}
+
+		ret = 1;
+		break;
+	}
+
+done:
+	reftable_iterator_destroy(&it);
+	reftable_log_record_release(&log);
+	if (ret < 0)
+		ret = 0;
+	return ret;
+}
+
+struct write_reflog_existence_arg {
+	struct reftable_ref_store *refs;
+	const char *refname;
+	struct reftable_stack *stack;
+};
+
+static int write_reflog_existence_table(struct reftable_writer *writer,
+					void *cb_data)
+{
+	struct write_reflog_existence_arg *arg = cb_data;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_log_record log = {0};
+	int ret;
+
+	ret = reftable_stack_read_log(arg->stack, arg->refname, &log);
+	if (ret <= 0)
+		goto done;
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	/*
+	 * The existence entry has both old and new object ID set to the the
+	 * null object ID. Our iterators are aware of this and will not present
+	 * them to their callers.
+	 */
+	log.refname = xstrdup(arg->refname);
+	log.update_index = ts;
+	log.value_type = REFTABLE_LOG_UPDATE;
+	ret = reftable_writer_add_log(writer, &log);
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	reftable_log_record_release(&log);
+	return ret;
+}
+
+static int reftable_be_create_reflog(struct ref_store *ref_store,
+				     const char *refname,
+				     struct strbuf *errmsg)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "create_reflog");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct write_reflog_existence_arg arg = {
+		.refs = refs,
+		.stack = stack,
+		.refname = refname,
+	};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+
+	ret = reftable_stack_add(stack, &write_reflog_existence_table, &arg);
+
+done:
+	return ret;
+}
+
+struct write_reflog_delete_arg {
+	struct reftable_stack *stack;
+	const char *refname;
+};
+
+static int write_reflog_delete_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct write_reflog_delete_arg *arg = cb_data;
+	struct reftable_merged_table *mt =
+		reftable_stack_merged_table(arg->stack);
+	struct reftable_log_record log = {0}, tombstone = {0};
+	struct reftable_iterator it = {0};
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int ret;
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	/*
+	 * In order to delete a table we need to delete all reflog entries one
+	 * by one. This is inefficient, but the reftable format does not have a
+	 * better marker right now.
+	 */
+	ret = reftable_merged_table_seek_log(mt, &it, arg->refname);
+	while (ret == 0) {
+		ret = reftable_iterator_next_log(&it, &log);
+		if (ret < 0)
+			break;
+		if (ret > 0 || strcmp(log.refname, arg->refname)) {
+			ret = 0;
+			break;
+		}
+
+		tombstone.refname = (char *)arg->refname,
+		tombstone.value_type = REFTABLE_LOG_DELETION,
+		tombstone.update_index = log.update_index;
+
+		ret = reftable_writer_add_log(writer, &tombstone);
+	}
+
+	reftable_log_record_release(&log);
+	reftable_iterator_destroy(&it);
+	return ret;
+}
+
+static int reftable_be_delete_reflog(struct ref_store *ref_store,
+				     const char *refname)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "delete_reflog");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct write_reflog_delete_arg arg = {
+		.stack = stack,
+		.refname = refname,
+	};
+	int ret;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		return ret;
+	ret = reftable_stack_add(stack, &write_reflog_delete_table, &arg);
+
+	assert(ret != REFTABLE_API_ERROR);
+	return ret;
+}
+
+struct reflog_expiry_arg {
+	struct reftable_stack *stack;
+	struct reftable_log_record *records;
+	struct object_id update_oid;
+	const char *refname;
+	size_t len;
+};
+
+static int write_reflog_expiry_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct reflog_expiry_arg *arg = cb_data;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	uint64_t live_records = 0;
+	size_t i;
+	int ret;
+
+	for (i = 0; i < arg->len; i++)
+		if (arg->records[i].value_type == REFTABLE_LOG_UPDATE)
+			live_records++;
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	if (!is_null_oid(&arg->update_oid)) {
+		struct reftable_ref_record ref = {0};
+		struct object_id peeled;
+
+		ref.refname = (char *)arg->refname;
+		ref.update_index = ts;
+
+		if (!peel_object(&arg->update_oid, &peeled)) {
+			ref.value_type = REFTABLE_REF_VAL2;
+			memcpy(ref.value.val2.target_value, peeled.hash, GIT_MAX_RAWSZ);
+			memcpy(ref.value.val2.value, arg->update_oid.hash, GIT_MAX_RAWSZ);
+		} else {
+			ref.value_type = REFTABLE_REF_VAL1;
+			memcpy(ref.value.val1, arg->update_oid.hash, GIT_MAX_RAWSZ);
+		}
+
+		ret = reftable_writer_add_ref(writer, &ref);
+		if (ret < 0)
+			return ret;
+	}
+
+	/*
+	 * When there are no more entries left in the reflog we empty it
+	 * completely, but write a placeholder reflog entry that indicates that
+	 * the reflog still exists.
+	 */
+	if (!live_records) {
+		struct reftable_log_record log = {
+			.refname = (char *)arg->refname,
+			.value_type = REFTABLE_LOG_UPDATE,
+			.update_index = ts,
+		};
+
+		ret = reftable_writer_add_log(writer, &log);
+		if (ret)
+			return ret;
+	}
+
+	for (i = 0; i < arg->len; i++) {
+		ret = reftable_writer_add_log(writer, &arg->records[i]);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int reftable_be_reflog_expire(struct ref_store *ref_store,
+				     const char *refname,
+				     unsigned int flags,
+				     reflog_expiry_prepare_fn prepare_fn,
+				     reflog_expiry_should_prune_fn should_prune_fn,
+				     reflog_expiry_cleanup_fn cleanup_fn,
+				     void *policy_cb_data)
+{
+	/*
+	 * For log expiry, we write tombstones for every single reflog entry
+	 * that is to be expired. This means that the entries are still
+	 * retrievable by delving into the stack, and expiring entries
+	 * paradoxically takes extra memory. This memory is only reclaimed when
+	 * compacting the reftable stack.
+	 *
+	 * It would be better if the refs backend supported an API that sets a
+	 * criterion for all refs, passing the criterion to pack_refs().
+	 *
+	 * On the plus side, because we do the expiration per ref, we can easily
+	 * insert the reflog existence dummies.
+	 */
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "reflog_expire");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_merged_table *mt = reftable_stack_merged_table(stack);
+	struct reftable_log_record *logs = NULL;
+	struct reftable_log_record *rewritten = NULL;
+	struct reftable_ref_record ref_record = {0};
+	struct reftable_iterator it = {0};
+	struct reftable_addition *add = NULL;
+	struct reflog_expiry_arg arg = {0};
+	struct object_id oid = {0};
+	uint8_t *last_hash = NULL;
+	size_t logs_nr = 0, logs_alloc = 0, i;
+	int ret;
+
+	if (refs->err < 0)
+		return refs->err;
+
+	ret = reftable_stack_reload(stack);
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_new_addition(&add, stack);
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_read_ref(stack, refname, &ref_record);
+	if (ret < 0)
+		goto done;
+	if (reftable_ref_record_val1(&ref_record))
+		oidread(&oid, reftable_ref_record_val1(&ref_record));
+	prepare_fn(refname, &oid, policy_cb_data);
+
+	while (1) {
+		struct reftable_log_record log = {0};
+		struct object_id old_oid, new_oid;
+
+		ret = reftable_iterator_next_log(&it, &log);
+		if (ret < 0)
+			goto done;
+		if (ret > 0 || strcmp(log.refname, refname)) {
+			reftable_log_record_release(&log);
+			break;
+		}
+
+		oidread(&old_oid, log.value.update.old_hash);
+		oidread(&new_oid, log.value.update.new_hash);
+
+		/*
+		 * Skip over the reflog existence marker. We will add it back
+		 * in when there are no live reflog records.
+		 */
+		if (is_null_oid(&old_oid) && is_null_oid(&new_oid)) {
+			reftable_log_record_release(&log);
+			continue;
+		}
+
+		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+		logs[logs_nr++] = log;
+	}
+
+	/*
+	 * We need to rewrite all reflog entries according to the pruning
+	 * callback function:
+	 *
+	 *   - If a reflog entry shall be pruned we mark the record for
+	 *     deletion.
+	 *
+	 *   - Otherwise we may have to rewrite the chain of reflog entries so
+	 *     that gaps created by just-deleted records get backfilled.
+	 */
+	CALLOC_ARRAY(rewritten, logs_nr);
+	for (i = logs_nr; i--;) {
+		struct reftable_log_record *dest = &rewritten[i];
+		struct object_id old_oid, new_oid;
+
+		*dest = logs[i];
+		oidread(&old_oid, logs[i].value.update.old_hash);
+		oidread(&new_oid, logs[i].value.update.new_hash);
+
+		if (should_prune_fn(&old_oid, &new_oid, logs[i].value.update.email,
+				    (timestamp_t)logs[i].value.update.time,
+				    logs[i].value.update.tz_offset,
+				    logs[i].value.update.message,
+				    policy_cb_data)) {
+			dest->value_type = REFTABLE_LOG_DELETION;
+		} else {
+			if ((flags & EXPIRE_REFLOGS_REWRITE) && last_hash)
+				dest->value.update.old_hash = last_hash;
+			last_hash = logs[i].value.update.new_hash;
+		}
+	}
+
+	if (flags & EXPIRE_REFLOGS_UPDATE_REF && last_hash &&
+	    reftable_ref_record_val1(&ref_record))
+		oidread(&arg.update_oid, last_hash);
+
+	arg.records = rewritten;
+	arg.len = logs_nr;
+	arg.stack = stack,
+	arg.refname = refname,
+
+	ret = reftable_addition_add(add, &write_reflog_expiry_table, &arg);
+	if (ret < 0)
+		goto done;
+
+	/*
+	 * Future improvement: we could skip writing records that were
+	 * not changed.
+	 */
+	if (!(flags & EXPIRE_REFLOGS_DRY_RUN))
+		ret = reftable_addition_commit(add);
+
+done:
+	if (add)
+		cleanup_fn(policy_cb_data);
+	assert(ret != REFTABLE_API_ERROR);
+
+	reftable_ref_record_release(&ref_record);
+	reftable_iterator_destroy(&it);
+	reftable_addition_destroy(add);
+	for (i = 0; i < logs_nr; i++)
+		reftable_log_record_release(&logs[i]);
+	free(logs);
+	free(rewritten);
+	return ret;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	.name = "reftable",
+	.init = reftable_be_init,
+	.init_db = reftable_be_init_db,
+	.transaction_prepare = reftable_be_transaction_prepare,
+	.transaction_finish = reftable_be_transaction_finish,
+	.transaction_abort = reftable_be_transaction_abort,
+	.initial_transaction_commit = reftable_be_initial_transaction_commit,
+
+	.pack_refs = reftable_be_pack_refs,
+	.create_symref = reftable_be_create_symref,
+	.rename_ref = reftable_be_rename_ref,
+	.copy_ref = reftable_be_copy_ref,
+
+	.iterator_begin = reftable_be_iterator_begin,
+	.read_raw_ref = reftable_be_read_raw_ref,
+	.read_symbolic_ref = reftable_be_read_symbolic_ref,
+
+	.reflog_iterator_begin = reftable_be_reflog_iterator_begin,
+	.for_each_reflog_ent = reftable_be_for_each_reflog_ent,
+	.for_each_reflog_ent_reverse = reftable_be_for_each_reflog_ent_reverse,
+	.reflog_exists = reftable_be_reflog_exists,
+	.create_reflog = reftable_be_create_reflog,
+	.delete_reflog = reftable_be_delete_reflog,
+	.reflog_expire = reftable_be_reflog_expire,
+};
diff --git a/repository.h b/repository.h
index 7a250a6605..1645cef518 100644
--- a/repository.h
+++ b/repository.h
@@ -24,8 +24,9 @@ enum fetch_negotiation_setting {
 	FETCH_NEGOTIATION_NOOP,
 };
 
-#define REF_STORAGE_FORMAT_UNKNOWN 0
-#define REF_STORAGE_FORMAT_FILES   1
+#define REF_STORAGE_FORMAT_UNKNOWN  0
+#define REF_STORAGE_FORMAT_FILES    1
+#define REF_STORAGE_FORMAT_REFTABLE 2
 
 struct repo_settings {
 	int initialized;
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
new file mode 100755
index 0000000000..94ee46984a
--- /dev/null
+++ b/t/t0610-reftable-basics.sh
@@ -0,0 +1,887 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='reftable basics'
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+if ! test_have_prereq REFTABLE
+then
+	skip_all='skipping reftable tests; set GIT_TEST_DEFAULT_REF_FORMAT=reftable'
+	test_done
+fi
+
+INVALID_OID=$(test_oid 001)
+
+test_expect_success 'init: creates basic reftable structures' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_path_is_dir repo/.git/reftable &&
+	test_path_is_file repo/.git/reftable/tables.list &&
+	echo reftable >expect &&
+	git -C repo rev-parse --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'init: sha256 object format via environment variable' '
+	test_when_finished "rm -rf repo" &&
+	GIT_DEFAULT_HASH=sha256 git init repo &&
+	cat >expect <<-EOF &&
+	sha256
+	reftable
+	EOF
+	git -C repo rev-parse --show-object-format --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'init: sha256 object format via option' '
+	test_when_finished "rm -rf repo" &&
+	git init --object-format=sha256 repo &&
+	cat >expect <<-EOF &&
+	sha256
+	reftable
+	EOF
+	git -C repo rev-parse --show-object-format --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'init: reinitializing reftable backend succeeds' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo for-each-ref >expect &&
+	git init --ref-format=reftable repo &&
+	git -C repo for-each-ref >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'init: reinitializing files with reftable backend fails' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	test_commit -C repo file &&
+
+	cp repo/.git/HEAD expect &&
+	test_must_fail git init --ref-format=reftable repo &&
+	test_cmp expect repo/.git/HEAD
+'
+
+test_expect_success 'init: reinitializing reftable with files backend fails' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=reftable repo &&
+	test_commit -C repo file &&
+
+	cp repo/.git/HEAD expect &&
+	test_must_fail git init --ref-format=files repo &&
+	test_cmp expect repo/.git/HEAD
+'
+
+test_expect_perms () {
+	local perms="$1"
+	local file="$2"
+	local actual=$(ls -l "$file") &&
+
+	case "$actual" in
+	$perms*)
+		: happy
+		;;
+	*)
+		echo "$(basename $2) is not $perms but $actual"
+		false
+		;;
+	esac
+}
+
+for umask in 002 022
+do
+	test_expect_success POSIXPERM 'init: honors core.sharedRepository' '
+		test_when_finished "rm -rf repo" &&
+		(
+			umask $umask &&
+			git init --shared=true repo &&
+			test 1 = "$(git -C repo config core.sharedrepository)"
+		) &&
+		test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list &&
+		for table in repo/.git/reftable/*.ref
+		do
+			test_expect_perms "-rw-rw-r--" "$table" ||
+			return 1
+		done
+	'
+done
+
+test_expect_success 'clone: can clone reftable repository' '
+	test_when_finished "rm -rf repo clone" &&
+	git init repo &&
+	test_commit -C repo message1 file1 &&
+
+	git clone repo cloned &&
+	echo reftable >expect &&
+	git -C cloned rev-parse --show-ref-format >actual &&
+	test_cmp expect actual &&
+	test_path_is_file cloned/file1
+'
+
+test_expect_success 'clone: can clone reffiles into reftable repository' '
+	test_when_finished "rm -rf reffiles reftable" &&
+	git init --ref-format=files reffiles &&
+	test_commit -C reffiles A &&
+	git clone --ref-format=reftable ./reffiles reftable &&
+
+	git -C reffiles rev-parse HEAD >expect &&
+	git -C reftable rev-parse HEAD >actual &&
+	test_cmp expect actual &&
+
+	git -C reftable rev-parse --show-ref-format >actual &&
+	echo reftable >expect &&
+	test_cmp expect actual &&
+
+	git -C reffiles rev-parse --show-ref-format >actual &&
+	echo files >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone: can clone reftable into reffiles repository' '
+	test_when_finished "rm -rf reffiles reftable" &&
+	git init --ref-format=reftable reftable &&
+	test_commit -C reftable A &&
+	git clone --ref-format=files ./reftable reffiles &&
+
+	git -C reftable rev-parse HEAD >expect &&
+	git -C reffiles rev-parse HEAD >actual &&
+	test_cmp expect actual &&
+
+	git -C reftable rev-parse --show-ref-format >actual &&
+	echo reftable >expect &&
+	test_cmp expect actual &&
+
+	git -C reffiles rev-parse --show-ref-format >actual &&
+	echo files >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'ref transaction: corrupted tables cause failure' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file1 &&
+		for f in .git/reftable/*.ref
+		do
+			: >"$f" || return 1
+		done &&
+		test_must_fail git update-ref refs/heads/main HEAD
+	)
+'
+
+test_expect_success 'ref transaction: corrupted tables.list cause failure' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file1 &&
+		echo garbage >.git/reftable/tables.list &&
+		test_must_fail git update-ref refs/heads/main HEAD
+	)
+'
+
+test_expect_success 'ref transaction: refuses to write ref causing F/D conflict' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo file &&
+	test_must_fail git -C repo update-ref refs/heads/main/forbidden
+'
+
+test_expect_success 'ref transaction: deleting ref with invalid name fails' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo file &&
+	test_must_fail git -C repo update-ref -d ../../my-private-file
+'
+
+test_expect_success 'ref transaction: can skip object ID verification' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_must_fail test-tool -C repo ref-store main update-ref msg refs/heads/branch $INVALID_OID $ZERO_OID 0 &&
+	test-tool -C repo ref-store main update-ref msg refs/heads/branch $INVALID_OID $ZERO_OID REF_SKIP_OID_VERIFICATION
+'
+
+test_expect_success 'ref transaction: updating same ref multiple times fails' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo A &&
+	cat >updates <<-EOF &&
+	update refs/heads/main $A
+	update refs/heads/main $A
+	EOF
+	cat >expect <<-EOF &&
+	fatal: multiple updates for ref ${SQ}refs/heads/main${SQ} not allowed
+	EOF
+	test_must_fail git -C repo update-ref --stdin <updates 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success 'ref transaction: can delete symbolic self-reference with git-symbolic-ref(1)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	git -C repo symbolic-ref refs/heads/self refs/heads/self &&
+	git -C repo symbolic-ref -d refs/heads/self
+'
+
+test_expect_success 'ref transaction: deleting symbolic self-reference without --no-deref fails' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	git -C repo symbolic-ref refs/heads/self refs/heads/self &&
+	cat >expect <<-EOF &&
+	error: multiple updates for ${SQ}refs/heads/self${SQ} (including one via symref ${SQ}refs/heads/self${SQ}) are not allowed
+	EOF
+	test_must_fail git -C repo update-ref -d refs/heads/self 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success 'ref transaction: deleting symbolic self-reference with --no-deref succeeds' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	git -C repo symbolic-ref refs/heads/self refs/heads/self &&
+	git -C repo update-ref -d --no-deref refs/heads/self
+'
+
+test_expect_success 'ref transaction: creating symbolic ref fails with F/D conflict' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo A &&
+	cat >expect <<-EOF &&
+	error: unable to write symref for refs/heads: file/directory conflict
+	EOF
+	test_must_fail git -C repo symbolic-ref refs/heads refs/heads/foo 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success 'ref transaction: ref deletion' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file &&
+		HEAD_OID=$(git show-ref -s --verify HEAD) &&
+		cat >expect <<-EOF &&
+		$HEAD_OID refs/heads/main
+		$HEAD_OID refs/tags/file
+		EOF
+		git show-ref >actual &&
+		test_cmp expect actual &&
+
+		test_must_fail git update-ref -d refs/tags/file $INVALID_OID &&
+		git show-ref >actual &&
+		test_cmp expect actual &&
+
+		git update-ref -d refs/tags/file $HEAD_OID &&
+		echo "$HEAD_OID refs/heads/main" >expect &&
+		git show-ref >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'ref transaction: writes cause auto-compaction' '
+	test_when_finished "rm -rf repo" &&
+
+	git init repo &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	test_commit -C repo --no-tag A &&
+	test_line_count = 2 repo/.git/reftable/tables.list &&
+
+	test_commit -C repo --no-tag B &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+check_fsync_events () {
+	local trace="$1" &&
+	shift &&
+
+	cat >expect &&
+	sed -n \
+		-e '/^{"event":"counter",.*"category":"fsync",/ {
+			s/.*"category":"fsync",//;
+			s/}$//;
+			p;
+		}' \
+		<"$trace" >actual &&
+	test_cmp expect actual
+}
+
+# A fix for this is in flight via jc/reftable-core-fsync.
+test_expect_failure 'ref transaction: writes are synced' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo initial &&
+
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+	GIT_TEST_FSYNC=true \
+		git -C repo -c core.fsync=reference update-ref refs/heads/branch HEAD &&
+	check_fsync_events trace2.txt <<-EOF
+	"name":"hardware-flush","count":2
+	EOF
+'
+
+test_expect_success 'pack-refs: compacts tables' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+
+	test_commit -C repo A &&
+	ls -1 repo/.git/reftable >table-files &&
+	test_line_count = 4 table-files &&
+	test_line_count = 3 repo/.git/reftable/tables.list &&
+
+	git -C repo pack-refs &&
+	ls -1 repo/.git/reftable >table-files &&
+	test_line_count = 2 table-files &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'pack-refs: prunes stale tables' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	touch repo/.git/reftable/stale-table.ref &&
+	git -C repo pack-refs &&
+	test_path_is_missing repo/.git/reftable/stable-ref.ref
+'
+
+test_expect_success 'pack-refs: does not prune non-table files' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	touch repo/.git/reftable/garbage &&
+	git -C repo pack-refs &&
+	test_path_is_file repo/.git/reftable/garbage
+'
+
+for umask in 002 022
+do
+	# A fix for this is in flight via ps/reftable-compacted-tables-permission-fix.
+	test_expect_failure POSIXPERM 'pack-refs: honors core.sharedRepository' '
+		test_when_finished "rm -rf repo" &&
+		(
+			umask $umask &&
+			git init --shared=true repo &&
+			test_commit -C repo A &&
+			test_line_count = 3 repo/.git/reftable/tables.list
+		) &&
+		git -C repo pack-refs &&
+		test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list &&
+		for table in repo/.git/reftable/*.ref
+		do
+			test_expect_perms "-rw-rw-r--" "$table" ||
+			return 1
+		done
+	'
+done
+
+# A fix for this is in flight.
+test_expect_failure 'packed-refs: writes are synced' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo initial &&
+	test_line_count = 2 table-files &&
+
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+	GIT_TEST_FSYNC=true \
+		git -C repo -c core.fsync=reference pack-refs &&
+	check_fsync_events trace2.txt <<-EOF
+	"name":"hardware-flush","count":2
+	EOF
+'
+
+test_expect_success 'ref iterator: bogus names are flagged' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit --no-tag file &&
+		test-tool ref-store main update-ref msg "refs/heads/bogus..name" $(git rev-parse HEAD) $ZERO_OID REF_SKIP_REFNAME_VERIFICATION &&
+
+		cat >expect <<-EOF &&
+		$ZERO_OID refs/heads/bogus..name 0xc
+		$(git rev-parse HEAD) refs/heads/main 0x0
+		EOF
+		test-tool ref-store main for-each-ref "" >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'ref iterator: missing object IDs are not flagged' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test-tool ref-store main update-ref msg "refs/heads/broken-hash" $INVALID_OID $ZERO_OID REF_SKIP_OID_VERIFICATION &&
+
+		cat >expect <<-EOF &&
+		$INVALID_OID refs/heads/broken-hash 0x0
+		EOF
+		test-tool ref-store main for-each-ref "" >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'basic: commit and list refs' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo file &&
+	test_write_lines refs/heads/main refs/tags/file >expect &&
+	git -C repo for-each-ref --format="%(refname)" >actual &&
+	test_cmp actual expect
+'
+
+test_expect_success 'basic: can write large commit message' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	perl -e "
+		print \"this is a long commit message\" x 50000
+	" >commit-msg &&
+	git -C repo commit --allow-empty --file=../commit-msg
+'
+
+test_expect_success 'basic: show-ref fails with empty repository' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_must_fail git -C repo show-ref >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success 'basic: can check out unborn branch' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	git -C repo checkout -b main
+'
+
+test_expect_success 'basic: peeled tags are stored' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo file &&
+	git -C repo tag -m "annotated tag" test_tag HEAD &&
+	for ref in refs/heads/main refs/tags/file refs/tags/test_tag refs/tags/test_tag^{}
+	do
+		echo "$(git -C repo rev-parse "$ref") $ref" || return 1
+	done >expect &&
+	git -C repo show-ref -d >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'basic: for-each-ref can print symrefs' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file &&
+		git branch &&
+		git symbolic-ref refs/heads/sym refs/heads/main &&
+		cat >expected <<-EOF &&
+		refs/heads/main
+		EOF
+		git for-each-ref --format="%(symref)" refs/heads/sym >actual &&
+		test_cmp expected actual
+	)
+'
+
+test_expect_success 'basic: notes' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		write_script fake_editor <<-\EOF &&
+		echo "$MSG" >"$1"
+		echo "$MSG" >&2
+		EOF
+
+		test_commit 1st &&
+		test_commit 2nd &&
+		GIT_EDITOR=./fake_editor MSG=b4 git notes add &&
+		GIT_EDITOR=./fake_editor MSG=b3 git notes edit &&
+		echo b4 >expect &&
+		git notes --ref commits@{1} show >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'basic: stash' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file &&
+		git stash list >expect &&
+		test_line_count = 0 expect &&
+
+		echo hoi >>file.t &&
+		git stash push -m stashed &&
+		git stash list >expect &&
+		test_line_count = 1 expect &&
+
+		git stash clear &&
+		git stash list >expect &&
+		test_line_count = 0 expect
+	)
+'
+
+test_expect_success 'basic: cherry-pick' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit message1 file1 &&
+		test_commit message2 file2 &&
+		git branch source &&
+		git checkout HEAD^ &&
+		test_commit message3 file3 &&
+		git cherry-pick source &&
+		test_path_is_file file2
+	)
+'
+
+test_expect_success 'basic: rebase' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit message1 file1 &&
+		test_commit message2 file2 &&
+		git branch source &&
+		git checkout HEAD^ &&
+		test_commit message3 file3 &&
+		git rebase source &&
+		test_path_is_file file2
+	)
+'
+
+test_expect_success 'reflog: can delete separate reflog entries' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_commit file &&
+		test_commit file2 &&
+		test_commit file3 &&
+		test_commit file4 &&
+		git reflog >actual &&
+		grep file3 actual &&
+
+		git reflog delete HEAD@{1} &&
+		git reflog >actual &&
+		! grep file3 actual
+	)
+'
+
+test_expect_success 'reflog: can switch to previous branch' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file1 &&
+		git checkout -b branch1 &&
+		test_commit file2 &&
+		git checkout -b branch2 &&
+		git switch - &&
+		git rev-parse --symbolic-full-name HEAD >actual &&
+		echo refs/heads/branch1 >expect &&
+		test_cmp actual expect
+	)
+'
+
+test_expect_success 'reflog: copying branch writes reflog entry' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file1 &&
+		test_commit file2 &&
+		oid=$(git rev-parse --short HEAD) &&
+		git branch src &&
+		cat >expect <<-EOF &&
+		${oid} dst@{0}: Branch: copied refs/heads/src to refs/heads/dst
+		${oid} dst@{1}: branch: Created from main
+		EOF
+		git branch -c src dst &&
+		git reflog dst >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'reflog: renaming branch writes reflog entry' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git symbolic-ref HEAD refs/heads/before &&
+		test_commit file &&
+		git show-ref >expected.refs &&
+		sed s/before/after/g <expected.refs >expected &&
+		git branch -M after &&
+		git show-ref >actual &&
+		test_cmp expected actual &&
+		echo refs/heads/after >expected &&
+		git symbolic-ref HEAD >actual &&
+		test_cmp expected actual
+	)
+'
+
+test_expect_success 'reflog: can store empty logs' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_must_fail test-tool ref-store main reflog-exists refs/heads/branch &&
+		test-tool ref-store main create-reflog refs/heads/branch &&
+		test-tool ref-store main reflog-exists refs/heads/branch &&
+		test-tool ref-store main for-each-reflog-ent-reverse refs/heads/branch >actual &&
+		test_must_be_empty actual
+	)
+'
+
+test_expect_success 'reflog: expiry empties reflog' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_commit initial &&
+		git checkout -b branch &&
+		test_commit fileA &&
+		test_commit fileB &&
+
+		cat >expect <<-EOF &&
+		commit: fileB
+		commit: fileA
+		branch: Created from HEAD
+		EOF
+		git reflog show --format="%gs" refs/heads/branch >actual &&
+		test_cmp expect actual &&
+
+		git reflog expire branch --expire=all &&
+		git reflog show --format="%gs" refs/heads/branch >actual &&
+		test_must_be_empty actual &&
+		test-tool ref-store main reflog-exists refs/heads/branch
+	)
+'
+
+test_expect_success 'reflog: can be deleted' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		test-tool ref-store main reflog-exists refs/heads/main &&
+		test-tool ref-store main delete-reflog refs/heads/main &&
+		test_must_fail test-tool ref-store main reflog-exists refs/heads/main
+	)
+'
+
+test_expect_success 'reflog: garbage collection deletes reflog entries' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		for count in $(test_seq 1 10)
+		do
+			test_commit "number $count" file.t $count number-$count ||
+			return 1
+		done &&
+		git reflog refs/heads/main >actual &&
+		test_line_count = 10 actual &&
+		grep "commit (initial): number 1" actual &&
+		grep "commit: number 10" actual &&
+
+		git gc &&
+		git reflog refs/heads/main >actual &&
+		test_line_count = 0 actual
+	)
+'
+
+test_expect_success 'reflog: updates via HEAD update HEAD reflog' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit main-one &&
+		git checkout -b new-branch &&
+		test_commit new-one &&
+		test_commit new-two &&
+
+		echo new-one >expect &&
+		git log -1 --format=%s HEAD@{1} >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'worktree: adding worktree creates separate stack' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	test_path_is_file repo/.git/worktrees/worktree/refs/heads &&
+	echo "ref: refs/heads/.invalid" >expect &&
+	test_cmp expect repo/.git/worktrees/worktree/HEAD &&
+	test_path_is_dir repo/.git/worktrees/worktree/reftable &&
+	test_path_is_file repo/.git/worktrees/worktree/reftable/tables.list
+'
+
+test_expect_success 'worktree: pack-refs in main repo packs main refs' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+	git -C repo worktree add ../worktree &&
+
+	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 4 repo/.git/reftable/tables.list &&
+	git -C repo pack-refs &&
+	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: pack-refs in worktree packs worktree refs' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+	git -C repo worktree add ../worktree &&
+
+	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 4 repo/.git/reftable/tables.list &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 4 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: creating shared ref updates main stack' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	git -C repo pack-refs &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	git -C worktree update-ref refs/heads/shared HEAD &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 2 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: creating per-worktree ref updates worktree stack' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	git -C repo pack-refs &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	git -C worktree update-ref refs/bisect/per-worktree HEAD &&
+	test_line_count = 2 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: creating per-worktree ref from main repo' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	git -C repo pack-refs &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	git -C repo update-ref worktrees/worktree/refs/bisect/per-worktree HEAD &&
+	test_line_count = 2 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: creating per-worktree ref from second worktree' '
+	test_when_finished "rm -rf repo wt1 wt2" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../wt1 &&
+	git -C repo worktree add ../wt2 &&
+	git -C repo pack-refs &&
+	git -C wt1 pack-refs &&
+	git -C wt2 pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/wt1/reftable/tables.list &&
+	test_line_count = 1 repo/.git/worktrees/wt2/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	git -C wt1 update-ref worktrees/wt2/refs/bisect/per-worktree HEAD &&
+	test_line_count = 1 repo/.git/worktrees/wt1/reftable/tables.list &&
+	test_line_count = 2 repo/.git/worktrees/wt2/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: can create shared and per-worktree ref in one transaction' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	git -C repo pack-refs &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	cat >stdin <<-EOF &&
+	create worktrees/worktree/refs/bisect/per-worktree HEAD
+	create refs/branches/shared HEAD
+	EOF
+	git -C repo update-ref --stdin <stdin &&
+	test_line_count = 2 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 2 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: can access common refs' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo file1 &&
+	git -C repo branch branch1 &&
+	git -C repo worktree add ../worktree &&
+
+	echo refs/heads/worktree >expect &&
+	git -C worktree symbolic-ref HEAD >actual &&
+	test_cmp expect actual &&
+	git -C worktree checkout branch1
+'
+
+test_expect_success 'worktree: adds worktree with detached HEAD' '
+	test_when_finished "rm -rf repo worktree" &&
+
+	git init repo &&
+	test_commit -C repo A &&
+	git -C repo rev-parse main >expect &&
+
+	git -C repo worktree add --detach ../worktree main &&
+	git -C worktree rev-parse HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'fetch: accessing FETCH_HEAD special ref works' '
+	test_when_finished "rm -rf repo sub" &&
+
+	git init sub &&
+	test_commit -C sub two &&
+	git -C sub rev-parse HEAD >expect &&
+
+	git init repo &&
+	test_commit -C repo one &&
+	git -C repo fetch ../sub &&
+	git -C repo rev-parse FETCH_HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_done
diff --git a/t/t0611-reftable-httpd.sh b/t/t0611-reftable-httpd.sh
new file mode 100755
index 0000000000..5e05b9c1f2
--- /dev/null
+++ b/t/t0611-reftable-httpd.sh
@@ -0,0 +1,26 @@
+#!/bin/sh
+
+test_description='reftable HTTPD tests'
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd
+
+REPO="$HTTPD_DOCUMENT_ROOT_PATH/repo"
+
+test_expect_success 'serving ls-remote' '
+	git init --ref-format=reftable -b main "$REPO" &&
+	cd "$REPO" &&
+	test_commit m1 &&
+	>.git/git-daemon-export-ok &&
+	git ls-remote "http://127.0.0.1:$LIB_HTTPD_PORT/smart/repo" | cut -f 2-2 -d "	" >actual &&
+	cat >expect <<-EOF &&
+	HEAD
+	refs/heads/main
+	refs/tags/m1
+	EOF
+	test_cmp actual expect
+'
+
+test_done
diff --git a/t/test-lib.sh b/t/test-lib.sh
index fc93aa57e6..0fb8757b7d 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1750,6 +1750,8 @@ esac
 case "$GIT_DEFAULT_REF_FORMAT" in
 files)
 	test_set_prereq REFFILES;;
+reftable)
+	test_set_prereq REFTABLE;;
 *)
 	echo 2>&1 "error: unknown ref format $GIT_DEFAULT_REF_FORMAT"
 	exit 1
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/2] ci: add jobs to test with the reftable backend
  2024-01-30  8:05 [PATCH 0/2] refs: introduce reftable backend Patrick Steinhardt
  2024-01-30  8:05 ` [PATCH 1/2] " Patrick Steinhardt
@ 2024-01-30  8:05 ` Patrick Steinhardt
  2024-01-30 22:08 ` [PATCH 0/2] refs: introduce " Junio C Hamano
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2024-01-30  8:05 UTC (permalink / raw
  To: git; +Cc: Han-Wen Nienhuys

[-- Attachment #1: Type: text/plain, Size: 3294 bytes --]

Add CI jobs for both GitHub Workflows and GitLab CI to run Git with the
new reftable backend.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 .github/workflows/main.yml | 9 +++++++++
 .gitlab-ci.yml             | 9 +++++++++
 ci/lib.sh                  | 2 +-
 ci/run-build-and-tests.sh  | 3 +++
 4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 4d97da57ec..1b43e49dad 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -266,6 +266,9 @@ jobs:
           - jobname: linux-sha256
             cc: clang
             pool: ubuntu-latest
+          - jobname: linux-reftable
+            cc: clang
+            pool: ubuntu-latest
           - jobname: linux-gcc
             cc: gcc
             cc_package: gcc-8
@@ -277,6 +280,9 @@ jobs:
           - jobname: osx-clang
             cc: clang
             pool: macos-13
+          - jobname: osx-reftable
+            cc: clang
+            pool: macos-13
           - jobname: osx-gcc
             cc: gcc
             cc_package: gcc-13
@@ -287,6 +293,9 @@ jobs:
           - jobname: linux-leaks
             cc: gcc
             pool: ubuntu-latest
+          - jobname: linux-reftable-leaks
+            cc: gcc
+            pool: ubuntu-latest
           - jobname: linux-asan-ubsan
             cc: clang
             pool: ubuntu-latest
diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 43bfbd8834..c0fa2fe90b 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -26,6 +26,9 @@ test:linux:
       - jobname: linux-sha256
         image: ubuntu:latest
         CC: clang
+      - jobname: linux-reftable
+        image: ubuntu:latest
+        CC: clang
       - jobname: linux-gcc
         image: ubuntu:20.04
         CC: gcc
@@ -40,6 +43,9 @@ test:linux:
       - jobname: linux-leaks
         image: ubuntu:latest
         CC: gcc
+      - jobname: linux-reftable-leaks
+        image: ubuntu:latest
+        CC: gcc
       - jobname: linux-asan-ubsan
         image: ubuntu:latest
         CC: clang
@@ -79,6 +85,9 @@ test:osx:
       - jobname: osx-clang
         image: macos-13-xcode-14
         CC: clang
+      - jobname: osx-reftable
+        image: macos-13-xcode-14
+        CC: clang
   artifacts:
     paths:
       - t/failed-test-artifacts
diff --git a/ci/lib.sh b/ci/lib.sh
index d5dd2f2697..0a73fc7bd1 100755
--- a/ci/lib.sh
+++ b/ci/lib.sh
@@ -367,7 +367,7 @@ linux-musl)
 	MAKEFLAGS="$MAKEFLAGS NO_REGEX=Yes ICONV_OMITS_BOM=Yes"
 	MAKEFLAGS="$MAKEFLAGS GIT_TEST_UTF8_LOCALE=C.UTF-8"
 	;;
-linux-leaks)
+linux-leaks|linux-reftable-leaks)
 	export SANITIZE=leak
 	export GIT_TEST_PASSING_SANITIZE_LEAK=true
 	export GIT_TEST_SANITIZE_LEAK_LOG=true
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index 7a1466b868..c192bd613c 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -37,6 +37,9 @@ linux-clang)
 linux-sha256)
 	export GIT_TEST_DEFAULT_HASH=sha256
 	;;
+linux-reftable|linux-reftable-leaks|osx-reftable)
+	export GIT_TEST_DEFAULT_REF_FORMAT=reftable
+	;;
 pedantic)
 	# Don't run the tests; we only care about whether Git can be
 	# built.
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/2] refs: introduce reftable backend
  2024-01-30  8:05 [PATCH 0/2] refs: introduce reftable backend Patrick Steinhardt
  2024-01-30  8:05 ` [PATCH 1/2] " Patrick Steinhardt
  2024-01-30  8:05 ` [PATCH 2/2] ci: add jobs to test with the " Patrick Steinhardt
@ 2024-01-30 22:08 ` Junio C Hamano
  2024-02-02  8:38 ` [PATCH v2 " Patrick Steinhardt
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2024-01-30 22:08 UTC (permalink / raw
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys

Patrick Steinhardt <ps@pks.im> writes:

> Patrick Steinhardt (2):
>   refs: introduce reftable backend
>   ci: add jobs to test with the reftable backend

I haven't checked the code in refs/reftable-backend.c at all, but
judging from the fallout to other files, it issurprisingly cleanly
done.  I somehow expected that we would need more changes like
exposing the helper function to compute shared permission bits,
but I see that it was the only change needed.  Everything else done
to other files are natural requirement to plug the new thing as a
new ref backend to the system.  Very nicely done.

Thanks.


>  .github/workflows/main.yml                    |    9 +
>  .gitlab-ci.yml                                |    9 +
>  Documentation/ref-storage-format.txt          |    2 +
>  .../technical/repository-version.txt          |    5 +-
>  Makefile                                      |    1 +
>  ci/lib.sh                                     |    2 +-
>  ci/run-build-and-tests.sh                     |    3 +
>  contrib/workdir/git-new-workdir               |    2 +-
>  path.c                                        |    2 +-
>  path.h                                        |    1 +
>  refs.c                                        |    1 +
>  refs/refs-internal.h                          |    1 +
>  refs/reftable-backend.c                       | 2286 +++++++++++++++++
>  repository.h                                  |    5 +-
>  t/t0610-reftable-basics.sh                    |  887 +++++++
>  t/t0611-reftable-httpd.sh                     |   26 +
>  t/test-lib.sh                                 |    2 +
>  17 files changed, 3237 insertions(+), 7 deletions(-)
>  create mode 100644 refs/reftable-backend.c
>  create mode 100755 t/t0610-reftable-basics.sh
>  create mode 100755 t/t0611-reftable-httpd.sh

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/2] refs: introduce reftable backend
  2024-01-30  8:05 ` [PATCH 1/2] " Patrick Steinhardt
@ 2024-02-01 15:17   ` Karthik Nayak
  2024-02-02  8:30     ` Patrick Steinhardt
  0 siblings, 1 reply; 23+ messages in thread
From: Karthik Nayak @ 2024-02-01 15:17 UTC (permalink / raw
  To: Patrick Steinhardt, git; +Cc: Han-Wen Nienhuys

[-- Attachment #1: Type: text/plain, Size: 24080 bytes --]

Hello,

Patrick Steinhardt <ps@pks.im> writes:
>   - It becomes possible to do truly atomic writes where either all refs
>     are committed to disk or none are. This was not possible with the
>     "files" backend because ref updates were split across multiple loose
>     files.
>
>   - The disk space required to store many refs is reduced, both compared
>     to loose refs and packed-refs. This is enabled both by the reftable
>     format being a binary format, which is more compact, and by prefix
>     compression.
>
>   - We can ignore filesystem-specific behaviour as ref names are not
>     encoded via paths anymore. This means there is no need to handle
>     case sensitivity on Windows systems or Unicode precomposition on
>     macOS.
>
>   - There is no need to rewrite the complete refdb anymore every time a
>     ref is being deleted like it was the case for packed-refs. This
>     means that ref deletions are now constant time instead of scaling
>     linearly with the number of refs.
>
>   - We can ignore file/directory conflicts so that it becomes possible
>     to store both "refs/heads/foo" and "refs/heads/foo/bar".
>
>   - Due to this property we can retain reflogs for deleted refs. We have
>     previously been deleting reflogs together with their refs to avoid
>     file/directory conflicts, which is not necessary anymore.
>

Nit: Maybe also a point about how with the current files backend doesn't
have a good way to differentiate between regular files and refs. While
regular refs are in the 'refs/' folder, pseudorefs have no namespace.
While with the reftable implementation we know that everything within
the reftable is a ref and there are no other refs we need to consider.


> Performance-wise things very much depend on the actual workload. The
> following benchmarks compare the "files" and "reftable" backends in the
> current version:

Pretty nice to have these numbers here.

>
>   - Creating N refs in separate transactions shows that the "files"
>     backend is ~50% faster. This is not surprising given that creating a
>     ref only requires us to create a single loose ref. The "reftable"
>     backend will also perform auto compaction on updates. In real-world
>     workloads we would likely also want to perform pack loose refs,
>     which would likely change the picture.
>
>         Benchmark 1: update-ref: create refs sequentially (refformat = files)
>           Time (mean ± σ):       2.1 ms ±   0.3 ms    [User: 0.6 ms, System: 1.7 ms]
>           Range (min … max):     1.8 ms …   4.3 ms    133 runs
>
>         Benchmark 2: update-ref: create refs sequentially (refformat = reftable)
>           Time (mean ± σ):       2.7 ms ±   0.1 ms    [User: 0.6 ms, System: 2.2 ms]
>           Range (min … max):     2.4 ms …   2.9 ms    132 runs
>
>         Benchmark 3: update-ref: create refs sequentially (refformat = files)
>           Time (mean ± σ):      1.975 s ±  0.006 s    [User: 0.437 s, System: 1.535 s]
>           Range (min … max):    1.969 s …  1.980 s    3 runs
>
>         Benchmark 4: update-ref: create refs sequentially (refformat = reftable)
>           Time (mean ± σ):      2.611 s ±  0.013 s    [User: 0.782 s, System: 1.825 s]
>           Range (min … max):    2.597 s …  2.622 s    3 runs
>
>         Benchmark 5: update-ref: create refs sequentially (refformat = files)
>           Time (mean ± σ):     198.442 s ±  0.241 s    [User: 43.051 s, System: 155.250 s]
>           Range (min … max):   198.189 s … 198.670 s    3 runs
>
>         Benchmark 6: update-ref: create refs sequentially (refformat = reftable)
>           Time (mean ± σ):     294.509 s ±  4.269 s    [User: 104.046 s, System: 190.326 s]
>           Range (min … max):   290.223 s … 298.761 s    3 runs
>

Nit: The refcount is missing in these benchmarks.

> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> new file mode 100644
> index 0000000000..895de0b273
> --- /dev/null
> +++ b/refs/reftable-backend.c
> @@ -0,0 +1,2286 @@
> +#include "../git-compat-util.h"
> +#include "../abspath.h"
> +#include "../chdir-notify.h"
> +#include "../config.h"

I believe this header can be dropped.

Not sure about removing headers which aren't needed. The coding
guidelines mentions that we should include headers that declare the
functions and the types used. I found some headers here which are not
required considering this rule.

> +#include "../environment.h"
> +#include "../gettext.h"
> +#include "../hash.h"
> +#include "../hex.h"
> +#include "../iterator.h"
> +#include "../ident.h"
> +#include "../lockfile.h"
> +#include "../object.h"
> +#include "../path.h"
> +#include "../refs.h"
> +#include "../reftable/reftable-stack.h"
> +#include "../reftable/reftable-record.h"
> +#include "../reftable/reftable-error.h"
> +#include "../reftable/reftable-blocksource.h"

This header can be dropped too.

> +#include "../reftable/reftable-reader.h"

This also.

> +#include "../reftable/reftable-iterator.h"
> +#include "../reftable/reftable-merged.h"
> +#include "../reftable/reftable-generic.h"

This one as well.

> +#include "../setup.h"
> +#include "../strmap.h"
> +#include "../worktree.h"

This one too.

> +struct reftable_ref_store {
> +	struct ref_store base;
> +
> +	struct reftable_stack *main_stack;
> +	struct reftable_stack *worktree_stack;
> +	struct strmap worktree_stacks;
> +	struct reftable_write_options write_options;
> +
> +	unsigned int store_flags;
> +	int err;
> +};

So, I'm assuming that `main_stack` here would be the primary reference
db, even when inside the worktree. I'm wondering why we can't just have

    struct reftable_stack *current_stack;
    struct strmap worktree_stacks;

Reading further on, it becomes necessary to have both, because the user
could
    1. Request main-worktree/{ref}
    2. Request worktrees/{worktree}/{ref}
So it's important that we have access to both the main worktree, the
current worktree and also any other worktree.

Maybe we could just drop the `struct reftable_stack *worktree_stack` and
rely on the map entirely, but I guess `worktree_stack` acts as a caching
layer and avoids using the map unless necessary.

> +static struct ref_store *reftable_be_init(struct repository *repo,
> +					  const char *gitdir,
> +					  unsigned int store_flags)
> +{
> +	struct reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
> +	struct strbuf path = STRBUF_INIT;
> +	int is_worktree;
> +	mode_t mask;
> +
> +	mask = umask(0);
> +	umask(mask);

Took me more than a glance to understand this. For other readers who
didn't know: there is no single API to get the current umask. But
umask(2) sets the umask and returns the prev umask. So we set a random
value (here 0) to get the prev umask and reset it.

> +
> +	base_ref_store_init(&refs->base, repo, gitdir, &refs_be_reftable);
> +	strmap_init(&refs->worktree_stacks);
> +	refs->store_flags = store_flags;
> +	refs->write_options.block_size = 4096;
>

Perhaps use `DEFAULT_BLOCK_SIZE` from "reftable/constants.h" here.

> +	refs->write_options.hash_id = repo->hash_algo->format_id;
> +	refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask);
> +
> +	/*
> +	 * Set up the main reftable stack that is hosted in GIT_COMMON_DIR.
> +	 * This stack contains both the shared and the main worktree refs.
> +	 */
> +	is_worktree = get_common_dir_noenv(&path, gitdir);
> +	if (!is_worktree) {
> +		strbuf_reset(&path);
> +		strbuf_realpath(&path, gitdir, 0);
> +	}

Nit: would be nice to have a comment that for wortrees, `gitdir` would
already be an absolute path.

> +	strbuf_addstr(&path, "/reftable");
> +	refs->err = reftable_new_stack(&refs->main_stack, path.buf,
> +				       refs->write_options);
> +	if (refs->err)
> +		goto done;
> +
> +	/*
> +	 * If we're in a worktree we also need to set up the worktree reftable
> +	 * stack that is contained in the per-worktree GIT_DIR.
> +	 */
> +	if (is_worktree) {
> +		strbuf_reset(&path);
> +		strbuf_addf(&path, "%s/reftable", gitdir);
> +
> +		refs->err = reftable_new_stack(&refs->worktree_stack, path.buf,
> +					       refs->write_options);
> +		if (refs->err)
> +			goto done;
> +	}

Wondering if we should also add this to the `refs->worktree_stacks`.

> +struct reftable_ref_iterator {
> +	struct ref_iterator base;
> +	struct reftable_ref_store *refs;
> +	struct reftable_iterator iter;
> +	struct reftable_ref_record ref;
> +	struct object_id oid;
> +
> +	const char *prefix;
> +	unsigned int flags;
> +	int err;
> +};

So the `flags` in this structure are for the iterator itself but the
`flags` in `base` are flags related to the current ref stored in `base`.

> +static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
> +{
> +	struct reftable_ref_iterator *iter =
> +		(struct reftable_ref_iterator *)ref_iterator;
> +	struct reftable_ref_store *refs = iter->refs;
> +
> +	while (!iter->err) {
> +		int flags = 0;
> +

Nit: I think the syntax in `reftable_reflog_iterator_advance` is similar but
nicer to read, the usage of `while(1)` and returning within the while
loop is better than breaking here and returning outside, either ways, I
think it'd be nicer to make both of them consistent.

> +		iter->err = reftable_iterator_next_ref(&iter->iter, &iter->ref);
> +		if (iter->err)
> +			break;
> +
> +		/*
> +		 * The files backend only lists references contained in
> +		 * "refs/". We emulate the same behaviour here and thus skip
> +		 * all references that don't start with this prefix.
> +		 */
> +		if (!starts_with(iter->ref.refname, "refs/"))
> +			continue;
> +

Since my patch series [1] to print all refs is now merged to next, maybe
you could add this in?

    diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
    index 895de0b273..3f4f905292 100644
    --- a/refs/reftable-backend.c
    +++ b/refs/reftable-backend.c
    @@ -348,11 +348,10 @@ static int
reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
     			break;

     		/*
    -		 * The files backend only lists references contained in
    -		 * "refs/". We emulate the same behaviour here and thus skip
    -		 * all references that don't start with this prefix.
    +		 * Unless the `DO_FOR_EACH_INCLUDE_ALL_REFS` flag is use, we only
    +		 * list references contained in "refs/" to mimic the file-backend.
     		 */
    -		if (!starts_with(iter->ref.refname, "refs/"))
    +		if (!(iter->flags & DO_FOR_EACH_INCLUDE_ALL_REFS) &&
!starts_with(iter->ref.refname, "refs/"))
     			continue;

     		if (iter->prefix &&

> +static enum iterator_selection iterator_select(struct ref_iterator *iter_worktree,
> +					       struct ref_iterator *iter_common,
> +					       void *cb_data UNUSED)
> +{
> +	if (iter_worktree && !iter_common) {
> +		/*
> +		 * Return the worktree ref if there are no more common refs.
> +		 */
> +		return ITER_SELECT_0;
> +	} else if (iter_common) {
> +		/*
> +		 * In case we have pending worktree and common refs we need to
> +		 * yield them based on their lexicographical order. Worktree
> +		 * refs that have the same name as common refs shadow the
> +		 * latter.
> +		 */
> +		if (iter_worktree) {
> +			int cmp = strcmp(iter_worktree->refname,
> +					 iter_common->refname);
> +			if (cmp < 0)
> +				return ITER_SELECT_0;
> +			else if (!cmp)
> +				return ITER_SELECT_0_SKIP_1;
> +		}
> +
> +		 /*
> +		  * Otherwise, if we either have no worktree refs anymore or if
> +		  * the common ref sorts before the next worktree ref, we need
> +		  * to figure out whether the next common ref belongs to the
> +		  * main worktree. In that case, it should be ignored.
> +		  */
> +		if (parse_worktree_ref(iter_common->refname, NULL, NULL,
> +				       NULL) == REF_WORKTREE_SHARED)
> +			return ITER_SELECT_1;
> +

I'm not sure I understand this. When would this situation occur?

> +		return ITER_SKIP_1;
> +	} else {
> +		return ITER_DONE;
> +	}
> +}
> +
> +
> +static int reftable_be_transaction_prepare(struct ref_store *ref_store,
> +					   struct ref_transaction *transaction,
> +					   struct strbuf *err)
> +{
> +	struct reftable_ref_store *refs =
> +		reftable_be_downcast(ref_store, REF_STORE_WRITE|REF_STORE_MAIN, "ref_transaction_prepare");
> +	struct strbuf referent = STRBUF_INIT, head_referent = STRBUF_INIT;
> +	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
> +	struct reftable_transaction_data *tx_data = NULL;
> +	struct object_id head_oid;
> +	unsigned int head_type = 0;
> +	size_t i;
> +	int ret;
> +
> +	ret = refs->err;
> +	if (ret < 0)
> +		goto done;
> +
> +	tx_data = xcalloc(1, sizeof(*tx_data));
> +
> +	/*
> +	 * Preprocess all updates. For one we check that there are no duplicate
> +	 * reference updates in this transaction. Second, we lock all stacks
> +	 * that will be modified during the transaction.
> +	 */
> +	for (i = 0; i < transaction->nr; i++) {
> +		ret = prepare_transaction_update(NULL, refs, tx_data,
> +						 transaction->updates[i], err);
> +		if (ret)
> +			goto done;
> +
> +		string_list_append(&affected_refnames,
> +				   transaction->updates[i]->refname);
> +	}
> +
> +	/*
> +	 * Now that we have counted updates per stack we can preallocate their
> +	 * arrays. This avoids having to reallocate many times.
> +	 */
> +	for (i = 0; i < tx_data->args_nr; i++) {
> +		CALLOC_ARRAY(tx_data->args[i].updates, tx_data->args[i].updates_expected);
> +		tx_data->args[i].updates_alloc = tx_data->args[i].updates_expected;
> +	}
> +
> +	/*
> +	 * Fail if a refname appears more than once in the transaction.
> +	 * This code is taken from the files backend and is a good candidate to
> +	 * be moved into the generic layer.
> +	 */
> +	string_list_sort(&affected_refnames);
> +	if (ref_update_reject_duplicates(&affected_refnames, err)) {
> +		ret = TRANSACTION_GENERIC_ERROR;
> +		goto done;
> +	}
> +
> +	ret = read_ref_without_reload(stack_for(refs, "HEAD", NULL), "HEAD", &head_oid,
> +				      &head_referent, &head_type);
> +	if (ret < 0)
> +		goto done;
> +
> +	for (i = 0; i < transaction->nr; i++) {
> +		struct ref_update *u = transaction->updates[i];
> +		struct object_id current_oid = {0};
> +		struct reftable_stack *stack;
> +		const char *rewritten_ref;
> +
> +		stack = stack_for(refs, u->refname, &rewritten_ref);
>

Wondering why we didn't just iterate over `tx_data.args` and
`args[i].updates` since there we already have the required stack.

Update: we don't save rewritten_ref in tx_data, so that's one blocker
for this.

> +
> +		if (u->type & REF_ISSYMREF) {
> +			const char *resolved = refs_resolve_ref_unsafe(&refs->base, u->refname, 0,
> +								       &current_oid, NULL);
> +
> +			if (u->flags & REF_NO_DEREF) {
> +				/*
> +				 * The reftable stack is locked at this point
> +				 * already, so it should be safe to call
> +				 * `refs_resolve_ref_unsafe()` here.
> +				 */

Shouldn't this comment be a few lines before?

> +
> +static int write_transaction_table(struct reftable_writer *writer, void *cb_data)
> +{
> +	struct write_transaction_table_arg *arg = cb_data;
> +	struct reftable_merged_table *mt =
> +		reftable_stack_merged_table(arg->stack);
> +	uint64_t ts = reftable_stack_next_update_index(arg->stack);
> +	struct reftable_log_record *logs = NULL;
> +	size_t logs_nr = 0, logs_alloc = 0, i;
> +	int ret = 0;
> +
> +	QSORT(arg->updates, arg->updates_nr, transaction_update_cmp);
> +
> +	reftable_writer_set_limits(writer, ts, ts);
> +
> +	for (i = 0; i < arg->updates_nr; i++) {
> +		struct reftable_transaction_update *tx_update = &arg->updates[i];
> +		struct ref_update *u = tx_update->update;
> +
> +		/*
> +		 * Write a reflog entry when updating a ref to point to
> +		 * something new in either of the following cases:
> +		 *
> +		 * - The reference is about to be deleted. We always want to
> +		 *   delete the reflog in that case.
> +		 * - REF_FORCE_CREATE_REFLOG is set, asking us to always create
> +		 *   the reflog entry.
> +		 * - `core.logAllRefUpdates` tells us to create the reflog for
> +		 *   the given ref.
> +		 */
> +		if (u->flags & REF_HAVE_NEW && !(u->type & REF_ISSYMREF) && is_null_oid(&u->new_oid)) {
> +			struct reftable_log_record log = {0};
> +			struct reftable_iterator it = {0};
> +
> +			/*
> +			 * When deleting refs we also delete all reflog entries
> +			 * with them. While it is not strictly required to
> +			 * delete reflogs together with their refs, this
> +			 * matches the behaviour of the files backend.
> +			 *
> +			 * Unfortunately, we have no better way than to delete
> +			 * all reflog entries one by one.
> +			 */
> +			ret = reftable_merged_table_seek_log(mt, &it, u->refname);
> +			while (ret == 0) {
> +				struct reftable_log_record *tombstone;
> +
> +				ret = reftable_iterator_next_log(&it, &log);
> +				if (ret < 0)
> +					break;
> +				if (ret > 0 || strcmp(log.refname, u->refname)) {
> +					ret = 0;
> +					break;
> +				}
> +
> +				ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
> +				tombstone = &logs[logs_nr++];
> +				tombstone->refname = xstrdup(u->refname);
> +				tombstone->value_type = REFTABLE_LOG_DELETION,

Why is this a comma? one other such instance in this file.

> +static int write_copy_table(struct reftable_writer *writer, void *cb_data)
> +{
> +	struct write_copy_arg *arg = cb_data;
> +	uint64_t deletion_ts, creation_ts;
> +	struct reftable_merged_table *mt = reftable_stack_merged_table(arg->stack);
> +	struct reftable_ref_record old_ref = {0}, refs[2] = {0};
> +	struct reftable_log_record old_log = {0}, *logs = NULL;
> +	struct reftable_iterator it = {0};
> +	struct string_list skip = STRING_LIST_INIT_NODUP;
> +	struct strbuf errbuf = STRBUF_INIT;
> +	size_t logs_nr = 0, logs_alloc = 0, i;
> +	int ret;
> +
> +	if (reftable_stack_read_ref(arg->stack, arg->oldname, &old_ref)) {
> +		ret = error(_("refname %s not found"), arg->oldname);
> +		goto done;
> +	}
> +	if (old_ref.value_type == REFTABLE_REF_SYMREF) {
> +		ret = error(_("refname %s is a symbolic ref, copying it is not supported"),
> +			    arg->oldname);
> +		goto done;
> +	}
> +
> +	/*
> +	 * There's nothing to do in case the old and new name are the same, so
> +	 * we exit early in that case.
> +	 */
> +	if (!strcmp(arg->oldname, arg->newname)) {
> +		ret = 0;
> +		goto done;
> +	}
> +
> +	/*
> +	 * Verify that the new refname is available.
> +	 */
> +	string_list_insert(&skip, arg->oldname);
> +	ret = refs_verify_refname_available(&arg->refs->base, arg->newname,
> +					    NULL, &skip, &errbuf);
> +	if (ret < 0) {
> +		error("%s", errbuf.buf);
> +		goto done;
> +	}
> +
> +	/*
> +	 * When deleting the old reference we have to use two update indices:
> +	 * one to delete the old ref and its reflog, and once to create the new
>

s/once/one

> +	 * ref and its reflog. They need to be staged with two separate indices
> +	 * because the new reflog needs to encode both the deletion of the old
> +	 * branch and the creation of the new branch, and we cannot do two
> +	 * changes to a reflog in a single update.
> +	 */
> +	deletion_ts = creation_ts = reftable_stack_next_update_index(arg->stack);
> +	if (arg->delete_old)
> +		creation_ts++;
> +	reftable_writer_set_limits(writer, deletion_ts, creation_ts);
> +
> +	/*
> +	 * Add the new reference. If this is a rename then we also delete the
> +	 * old reference.
> +	 */
> +	refs[0] = old_ref;
> +	refs[0].refname = (char *)arg->newname;
> +	refs[0].update_index = creation_ts;
> +	if (arg->delete_old) {
> +		refs[1].refname = (char *)arg->oldname;
> +		refs[1].value_type = REFTABLE_REF_DELETION;
> +		refs[1].update_index = deletion_ts;
> +	}
> +	ret = reftable_writer_add_refs(writer, refs, arg->delete_old ? 2 : 1);
> +	if (ret < 0)
> +		goto done;
> +
> +	/*
> +	 * When deleting the old branch we need to create a reflog entry on the
> +	 * new branch name that indicates that the old branch has been deleted
> +	 * and then recreated. This is a tad weird, but matches what the files
> +	 * backend does.
> +	 */
> +	if (arg->delete_old) {
> +		struct strbuf head_referent = STRBUF_INIT;
> +		struct object_id head_oid;
> +		int append_head_reflog;
> +		unsigned head_type = 0;
> +
> +		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
> +		memset(&logs[logs_nr], 0, sizeof(logs[logs_nr]));
> +		fill_reftable_log_record(&logs[logs_nr]);
> +		logs[logs_nr].refname = (char *)arg->newname;
> +		logs[logs_nr].update_index = deletion_ts;
> +		logs[logs_nr].value.update.message =
> +			xstrndup(arg->logmsg, arg->refs->write_options.block_size / 2);

Question: here and other places in this file, shouldn't we free memory
allocated by `xstrndup`?

Also, why is it `block_size / 2` ?

> +static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
> +								  struct reftable_stack *stack)
> +{
> +	struct reftable_reflog_iterator *iter;
> +	struct reftable_merged_table *mt;
> +	int ret;
> +
> +	iter = xcalloc(1, sizeof(*iter));
> +	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable, 1);
> +	iter->refs = refs;
> +	iter->base.oid = &iter->oid;
> +
> +	ret = reftable_stack_reload(refs->main_stack);
> +	if (ret < 0)
> +		goto done;
> +
> +	mt = reftable_stack_merged_table(stack);
> +	ret = reftable_merged_table_seek_log(mt, &iter->iter, "");
> +	if (ret < 0)
> +		goto done;
> +

Keeping it similar to `ref_iterator_for_stack`, perhaps:

    ret = refs->err;
	if (ret)
		goto done;

	ret = reftable_stack_reload(stack);
	if (ret)
		goto done;

	merged_table = reftable_stack_merged_table(stack);

	ret = reftable_merged_table_seek_ref(merged_table, &iter->iter, prefix);
	if (ret)
		goto done;

> +static int reftable_be_for_each_reflog_ent_reverse(struct ref_store *ref_store,
> +						   const char *refname,
> +						   each_reflog_ent_fn fn,
> +						   void *cb_data)
> +{
> +	struct reftable_ref_store *refs =
> +		reftable_be_downcast(ref_store, REF_STORE_READ, "for_each_reflog_ent_reverse");
> +	struct reftable_stack *stack = stack_for(refs, refname, &refname);
> +	struct reftable_merged_table *mt = NULL;
> +	struct reftable_log_record log = {0};
> +	struct reftable_iterator it = {0};
> +	int ret;
> +
> +	if (refs->err < 0)
> +		return refs->err;

Nit: seems like this function and the one below
`reftable_be_for_each_reflog_ent`, share the same code apart from
iteration direction. Wonder if it'd be nicer to extract out the common
code from them.

> +static int reftable_be_reflog_exists(struct ref_store *ref_store,
> +				     const char *refname)
> +{
> +	struct reftable_ref_store *refs =
> +		reftable_be_downcast(ref_store, REF_STORE_READ, "reflog_exists");
> +	struct reftable_stack *stack = stack_for(refs, refname, &refname);
> +	struct reftable_merged_table *mt = reftable_stack_merged_table(stack);
> +	struct reftable_log_record log = {0};
> +	struct reftable_iterator it = {0};
> +	int ret;
> +
> +	ret = refs->err;
> +	if (ret < 0)
> +		goto done;
> +
> +	ret = reftable_stack_reload(stack);
> +	if (ret)
> +		goto done;
> +
> +	ret = reftable_merged_table_seek_log(mt, &it, refname);
> +	if (ret)
> +		goto done;
> +
> +	/*
> +	 * Seek the reflog to see whether it contains any reflog entries which
> +	 * aren't marked for deletion.
> +	 */

Shouldn't we be checking `log.value_type` in that case?

This email only reviews this file, I'll review the others in follow up
email[s].

[1]: https://git.kernel.org/pub/scm/git/git.git/commit/?h=next&id=e7a9234a8b39bd69de12b105731413ab8e62942e

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/2] refs: introduce reftable backend
  2024-02-01 15:17   ` Karthik Nayak
@ 2024-02-02  8:30     ` Patrick Steinhardt
  2024-02-02 10:52       ` Patrick Steinhardt
  0 siblings, 1 reply; 23+ messages in thread
From: Patrick Steinhardt @ 2024-02-02  8:30 UTC (permalink / raw
  To: Karthik Nayak; +Cc: git, Han-Wen Nienhuys

[-- Attachment #1: Type: text/plain, Size: 26308 bytes --]

On Thu, Feb 01, 2024 at 07:17:45AM -0800, Karthik Nayak wrote:
> Hello,
> 
> Patrick Steinhardt <ps@pks.im> writes:
> >   - It becomes possible to do truly atomic writes where either all refs
> >     are committed to disk or none are. This was not possible with the
> >     "files" backend because ref updates were split across multiple loose
> >     files.
> >
> >   - The disk space required to store many refs is reduced, both compared
> >     to loose refs and packed-refs. This is enabled both by the reftable
> >     format being a binary format, which is more compact, and by prefix
> >     compression.
> >
> >   - We can ignore filesystem-specific behaviour as ref names are not
> >     encoded via paths anymore. This means there is no need to handle
> >     case sensitivity on Windows systems or Unicode precomposition on
> >     macOS.
> >
> >   - There is no need to rewrite the complete refdb anymore every time a
> >     ref is being deleted like it was the case for packed-refs. This
> >     means that ref deletions are now constant time instead of scaling
> >     linearly with the number of refs.
> >
> >   - We can ignore file/directory conflicts so that it becomes possible
> >     to store both "refs/heads/foo" and "refs/heads/foo/bar".
> >
> >   - Due to this property we can retain reflogs for deleted refs. We have
> >     previously been deleting reflogs together with their refs to avoid
> >     file/directory conflicts, which is not necessary anymore.
> >
> 
> Nit: Maybe also a point about how with the current files backend doesn't
> have a good way to differentiate between regular files and refs. While
> regular refs are in the 'refs/' folder, pseudorefs have no namespace.
> While with the reftable implementation we know that everything within
> the reftable is a ref and there are no other refs we need to consider.

True, I'll add that.

> > Performance-wise things very much depend on the actual workload. The
> > following benchmarks compare the "files" and "reftable" backends in the
> > current version:
> 
> Pretty nice to have these numbers here.
> 
> >
> >   - Creating N refs in separate transactions shows that the "files"
> >     backend is ~50% faster. This is not surprising given that creating a
> >     ref only requires us to create a single loose ref. The "reftable"
> >     backend will also perform auto compaction on updates. In real-world
> >     workloads we would likely also want to perform pack loose refs,
> >     which would likely change the picture.
> >
> >         Benchmark 1: update-ref: create refs sequentially (refformat = files)
> >           Time (mean ± σ):       2.1 ms ±   0.3 ms    [User: 0.6 ms, System: 1.7 ms]
> >           Range (min … max):     1.8 ms …   4.3 ms    133 runs
> >
> >         Benchmark 2: update-ref: create refs sequentially (refformat = reftable)
> >           Time (mean ± σ):       2.7 ms ±   0.1 ms    [User: 0.6 ms, System: 2.2 ms]
> >           Range (min … max):     2.4 ms …   2.9 ms    132 runs
> >
> >         Benchmark 3: update-ref: create refs sequentially (refformat = files)
> >           Time (mean ± σ):      1.975 s ±  0.006 s    [User: 0.437 s, System: 1.535 s]
> >           Range (min … max):    1.969 s …  1.980 s    3 runs
> >
> >         Benchmark 4: update-ref: create refs sequentially (refformat = reftable)
> >           Time (mean ± σ):      2.611 s ±  0.013 s    [User: 0.782 s, System: 1.825 s]
> >           Range (min … max):    2.597 s …  2.622 s    3 runs
> >
> >         Benchmark 5: update-ref: create refs sequentially (refformat = files)
> >           Time (mean ± σ):     198.442 s ±  0.241 s    [User: 43.051 s, System: 155.250 s]
> >           Range (min … max):   198.189 s … 198.670 s    3 runs
> >
> >         Benchmark 6: update-ref: create refs sequentially (refformat = reftable)
> >           Time (mean ± σ):     294.509 s ±  4.269 s    [User: 104.046 s, System: 190.326 s]
> >           Range (min … max):   290.223 s … 298.761 s    3 runs
> >
> 
> Nit: The refcount is missing in these benchmarks.

Good catch, this is an edge case of hyperfine. Because the refcount is
used in the command line it's not shown anymore in its output. I'll just
massage these manually.

> > diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> > new file mode 100644
> > index 0000000000..895de0b273
> > --- /dev/null
> > +++ b/refs/reftable-backend.c
> > @@ -0,0 +1,2286 @@
> > +#include "../git-compat-util.h"
> > +#include "../abspath.h"
> > +#include "../chdir-notify.h"
> > +#include "../config.h"
> 
> I believe this header can be dropped.
> 
> Not sure about removing headers which aren't needed. The coding
> guidelines mentions that we should include headers that declare the
> functions and the types used. I found some headers here which are not
> required considering this rule.

It's always good hygiene to not include unneeded headers. Also helps to
keep build times lower and reduce unnecessary rebuilds.

> > +#include "../environment.h"
> > +#include "../gettext.h"
> > +#include "../hash.h"
> > +#include "../hex.h"
> > +#include "../iterator.h"
> > +#include "../ident.h"
> > +#include "../lockfile.h"
> > +#include "../object.h"
> > +#include "../path.h"
> > +#include "../refs.h"
> > +#include "../reftable/reftable-stack.h"
> > +#include "../reftable/reftable-record.h"
> > +#include "../reftable/reftable-error.h"
> > +#include "../reftable/reftable-blocksource.h"
> 
> This header can be dropped too.
> 
> > +#include "../reftable/reftable-reader.h"
> 
> This also.
> 
> > +#include "../reftable/reftable-iterator.h"
> > +#include "../reftable/reftable-merged.h"
> > +#include "../reftable/reftable-generic.h"
> 
> This one as well.
> 
> > +#include "../setup.h"
> > +#include "../strmap.h"
> > +#include "../worktree.h"
> 
> This one too.
> 
> > +struct reftable_ref_store {
> > +	struct ref_store base;
> > +
> > +	struct reftable_stack *main_stack;
> > +	struct reftable_stack *worktree_stack;
> > +	struct strmap worktree_stacks;
> > +	struct reftable_write_options write_options;
> > +
> > +	unsigned int store_flags;
> > +	int err;
> > +};
> 
> So, I'm assuming that `main_stack` here would be the primary reference
> db, even when inside the worktree. I'm wondering why we can't just have
> 
>     struct reftable_stack *current_stack;
>     struct strmap worktree_stacks;
> 
> Reading further on, it becomes necessary to have both, because the user
> could
>     1. Request main-worktree/{ref}
>     2. Request worktrees/{worktree}/{ref}
> So it's important that we have access to both the main worktree, the
> current worktree and also any other worktree.

Indeed. I'll add some comments to point out their respective uses.

> Maybe we could just drop the `struct reftable_stack *worktree_stack` and
> rely on the map entirely, but I guess `worktree_stack` acts as a caching
> layer and avoids using the map unless necessary.

I doubt that the map access would be all that expensive in practice. The
problem rather is that we don't know about the worktree name when we
init the backend, and looking it up feels fragile to me and may involve
parsing all worktree metadata. So I rather want to avoid that to the
extent possible.

> > +	base_ref_store_init(&refs->base, repo, gitdir, &refs_be_reftable);
> > +	strmap_init(&refs->worktree_stacks);
> > +	refs->store_flags = store_flags;
> > +	refs->write_options.block_size = 4096;
> >
> 
> Perhaps use `DEFAULT_BLOCK_SIZE` from "reftable/constants.h" here.

This isn't exposed given that only the "reftable/reftable-*.h" headers
are considered to be public. Ideally, we'd just leave it unset and let
the default block size kick in. But `reftable_new_stack()` doesn't set
up the block size at all in that case, and that leads to failures.

I'll leave this as-is for now. We can clean it up in the future.

> > +	strbuf_addstr(&path, "/reftable");
> > +	refs->err = reftable_new_stack(&refs->main_stack, path.buf,
> > +				       refs->write_options);
> > +	if (refs->err)
> > +		goto done;
> > +
> > +	/*
> > +	 * If we're in a worktree we also need to set up the worktree reftable
> > +	 * stack that is contained in the per-worktree GIT_DIR.
> > +	 */
> > +	if (is_worktree) {
> > +		strbuf_reset(&path);
> > +		strbuf_addf(&path, "%s/reftable", gitdir);
> > +
> > +		refs->err = reftable_new_stack(&refs->worktree_stack, path.buf,
> > +					       refs->write_options);
> > +		if (refs->err)
> > +			goto done;
> > +	}
> 
> Wondering if we should also add this to the `refs->worktree_stacks`.

Ideally we would, but at this point we have no easy way to figure out
the worktree name to the best of my knowledge.

> > +struct reftable_ref_iterator {
> > +	struct ref_iterator base;
> > +	struct reftable_ref_store *refs;
> > +	struct reftable_iterator iter;
> > +	struct reftable_ref_record ref;
> > +	struct object_id oid;
> > +
> > +	const char *prefix;
> > +	unsigned int flags;
> > +	int err;
> > +};
> 
> So the `flags` in this structure are for the iterator itself but the
> `flags` in `base` are flags related to the current ref stored in `base`.

Yup.

> > +static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
> > +{
> > +	struct reftable_ref_iterator *iter =
> > +		(struct reftable_ref_iterator *)ref_iterator;
> > +	struct reftable_ref_store *refs = iter->refs;
> > +
> > +	while (!iter->err) {
> > +		int flags = 0;
> > +
> 
> Nit: I think the syntax in `reftable_reflog_iterator_advance` is similar but
> nicer to read, the usage of `while(1)` and returning within the while
> loop is better than breaking here and returning outside, either ways, I
> think it'd be nicer to make both of them consistent.

The problem here is that the ref iterator has multiple exit paths that
need to be treated the same, so it is easier to have a common exit path.
I'll thus touch up the reflog iterator to look the same. I think that's
the better option anyway so that there is a single exit path, only.

Also shows that we didn't store the error into `iter->err` in the reflog
case. Not a huge issue because callers aren't supposed to continue
iterating after they've hit an error anyway. But still preferable to
make errors persistent.

> > +		iter->err = reftable_iterator_next_ref(&iter->iter, &iter->ref);
> > +		if (iter->err)
> > +			break;
> > +
> > +		/*
> > +		 * The files backend only lists references contained in
> > +		 * "refs/". We emulate the same behaviour here and thus skip
> > +		 * all references that don't start with this prefix.
> > +		 */
> > +		if (!starts_with(iter->ref.refname, "refs/"))
> > +			continue;
> > +
> 
> Since my patch series [1] to print all refs is now merged to next, maybe
> you could add this in?
> 
>     diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
>     index 895de0b273..3f4f905292 100644
>     --- a/refs/reftable-backend.c
>     +++ b/refs/reftable-backend.c
>     @@ -348,11 +348,10 @@ static int
> reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
>      			break;
> 
>      		/*
>     -		 * The files backend only lists references contained in
>     -		 * "refs/". We emulate the same behaviour here and thus skip
>     -		 * all references that don't start with this prefix.
>     +		 * Unless the `DO_FOR_EACH_INCLUDE_ALL_REFS` flag is use, we only
>     +		 * list references contained in "refs/" to mimic the file-backend.
>      		 */
>     -		if (!starts_with(iter->ref.refname, "refs/"))
>     +		if (!(iter->flags & DO_FOR_EACH_INCLUDE_ALL_REFS) &&
> !starts_with(iter->ref.refname, "refs/"))
>      			continue;
> 
>      		if (iter->prefix &&

I'm still waiting for your patch series to hit `next` before making it a
dependency of this series, but thanks for the patch!

> > +static enum iterator_selection iterator_select(struct ref_iterator *iter_worktree,
> > +					       struct ref_iterator *iter_common,
> > +					       void *cb_data UNUSED)
> > +{
> > +	if (iter_worktree && !iter_common) {
> > +		/*
> > +		 * Return the worktree ref if there are no more common refs.
> > +		 */
> > +		return ITER_SELECT_0;
> > +	} else if (iter_common) {
> > +		/*
> > +		 * In case we have pending worktree and common refs we need to
> > +		 * yield them based on their lexicographical order. Worktree
> > +		 * refs that have the same name as common refs shadow the
> > +		 * latter.
> > +		 */
> > +		if (iter_worktree) {
> > +			int cmp = strcmp(iter_worktree->refname,
> > +					 iter_common->refname);
> > +			if (cmp < 0)
> > +				return ITER_SELECT_0;
> > +			else if (!cmp)
> > +				return ITER_SELECT_0_SKIP_1;
> > +		}
> > +
> > +		 /*
> > +		  * Otherwise, if we either have no worktree refs anymore or if
> > +		  * the common ref sorts before the next worktree ref, we need
> > +		  * to figure out whether the next common ref belongs to the
> > +		  * main worktree. In that case, it should be ignored.
> > +		  */
> > +		if (parse_worktree_ref(iter_common->refname, NULL, NULL,
> > +				       NULL) == REF_WORKTREE_SHARED)
> > +			return ITER_SELECT_1;
> > +
> 
> I'm not sure I understand this. When would this situation occur?

There are two cases here:

  - The worktree iterator is exhausted because there are common refs
    which sort lexicographically after all worktree refs.

  - Both iterators still have refs, and the next lexicographic ref is
    from the common ref store.

In both case we want to return only those refs which are shared, but we
want to skip over all per-worktree refs from the common ref store. This
is because the per-worktree refs from the common ref store belong to the
main worktree, not to ours.

I'll rework the comments a bit.

> > +		return ITER_SKIP_1;
> > +	} else {
> > +		return ITER_DONE;
> > +	}
> > +}
> > +
> > +
> > +static int reftable_be_transaction_prepare(struct ref_store *ref_store,
> > +					   struct ref_transaction *transaction,
> > +					   struct strbuf *err)
> > +{
> > +	struct reftable_ref_store *refs =
> > +		reftable_be_downcast(ref_store, REF_STORE_WRITE|REF_STORE_MAIN, "ref_transaction_prepare");
> > +	struct strbuf referent = STRBUF_INIT, head_referent = STRBUF_INIT;
> > +	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
> > +	struct reftable_transaction_data *tx_data = NULL;
> > +	struct object_id head_oid;
> > +	unsigned int head_type = 0;
> > +	size_t i;
> > +	int ret;
> > +
> > +	ret = refs->err;
> > +	if (ret < 0)
> > +		goto done;
> > +
> > +	tx_data = xcalloc(1, sizeof(*tx_data));
> > +
> > +	/*
> > +	 * Preprocess all updates. For one we check that there are no duplicate
> > +	 * reference updates in this transaction. Second, we lock all stacks
> > +	 * that will be modified during the transaction.
> > +	 */
> > +	for (i = 0; i < transaction->nr; i++) {
> > +		ret = prepare_transaction_update(NULL, refs, tx_data,
> > +						 transaction->updates[i], err);
> > +		if (ret)
> > +			goto done;
> > +
> > +		string_list_append(&affected_refnames,
> > +				   transaction->updates[i]->refname);
> > +	}
> > +
> > +	/*
> > +	 * Now that we have counted updates per stack we can preallocate their
> > +	 * arrays. This avoids having to reallocate many times.
> > +	 */
> > +	for (i = 0; i < tx_data->args_nr; i++) {
> > +		CALLOC_ARRAY(tx_data->args[i].updates, tx_data->args[i].updates_expected);
> > +		tx_data->args[i].updates_alloc = tx_data->args[i].updates_expected;
> > +	}
> > +
> > +	/*
> > +	 * Fail if a refname appears more than once in the transaction.
> > +	 * This code is taken from the files backend and is a good candidate to
> > +	 * be moved into the generic layer.
> > +	 */
> > +	string_list_sort(&affected_refnames);
> > +	if (ref_update_reject_duplicates(&affected_refnames, err)) {
> > +		ret = TRANSACTION_GENERIC_ERROR;
> > +		goto done;
> > +	}
> > +
> > +	ret = read_ref_without_reload(stack_for(refs, "HEAD", NULL), "HEAD", &head_oid,
> > +				      &head_referent, &head_type);
> > +	if (ret < 0)
> > +		goto done;
> > +
> > +	for (i = 0; i < transaction->nr; i++) {
> > +		struct ref_update *u = transaction->updates[i];
> > +		struct object_id current_oid = {0};
> > +		struct reftable_stack *stack;
> > +		const char *rewritten_ref;
> > +
> > +		stack = stack_for(refs, u->refname, &rewritten_ref);
> >
> 
> Wondering why we didn't just iterate over `tx_data.args` and
> `args[i].updates` since there we already have the required stack.
> 
> Update: we don't save rewritten_ref in tx_data, so that's one blocker
> for this.

Right, that's the reason. I was trying to do that, but also ran into
some other problems caused by the ref splitting we do in the loop.

> > +
> > +		if (u->type & REF_ISSYMREF) {
> > +			const char *resolved = refs_resolve_ref_unsafe(&refs->base, u->refname, 0,
> > +								       &current_oid, NULL);
> > +
> > +			if (u->flags & REF_NO_DEREF) {
> > +				/*
> > +				 * The reftable stack is locked at this point
> > +				 * already, so it should be safe to call
> > +				 * `refs_resolve_ref_unsafe()` here.
> > +				 */
> 
> Shouldn't this comment be a few lines before?

Good catch. I moved the `refs_resolve_ref_unsafe()` call at one point
and forgot to also move the comment.

> > +static int write_transaction_table(struct reftable_writer *writer, void *cb_data)
> > +{
> > +	struct write_transaction_table_arg *arg = cb_data;
> > +	struct reftable_merged_table *mt =
> > +		reftable_stack_merged_table(arg->stack);
> > +	uint64_t ts = reftable_stack_next_update_index(arg->stack);
> > +	struct reftable_log_record *logs = NULL;
> > +	size_t logs_nr = 0, logs_alloc = 0, i;
> > +	int ret = 0;
> > +
> > +	QSORT(arg->updates, arg->updates_nr, transaction_update_cmp);
> > +
> > +	reftable_writer_set_limits(writer, ts, ts);
> > +
> > +	for (i = 0; i < arg->updates_nr; i++) {
> > +		struct reftable_transaction_update *tx_update = &arg->updates[i];
> > +		struct ref_update *u = tx_update->update;
> > +
> > +		/*
> > +		 * Write a reflog entry when updating a ref to point to
> > +		 * something new in either of the following cases:
> > +		 *
> > +		 * - The reference is about to be deleted. We always want to
> > +		 *   delete the reflog in that case.
> > +		 * - REF_FORCE_CREATE_REFLOG is set, asking us to always create
> > +		 *   the reflog entry.
> > +		 * - `core.logAllRefUpdates` tells us to create the reflog for
> > +		 *   the given ref.
> > +		 */
> > +		if (u->flags & REF_HAVE_NEW && !(u->type & REF_ISSYMREF) && is_null_oid(&u->new_oid)) {
> > +			struct reftable_log_record log = {0};
> > +			struct reftable_iterator it = {0};
> > +
> > +			/*
> > +			 * When deleting refs we also delete all reflog entries
> > +			 * with them. While it is not strictly required to
> > +			 * delete reflogs together with their refs, this
> > +			 * matches the behaviour of the files backend.
> > +			 *
> > +			 * Unfortunately, we have no better way than to delete
> > +			 * all reflog entries one by one.
> > +			 */
> > +			ret = reftable_merged_table_seek_log(mt, &it, u->refname);
> > +			while (ret == 0) {
> > +				struct reftable_log_record *tombstone;
> > +
> > +				ret = reftable_iterator_next_log(&it, &log);
> > +				if (ret < 0)
> > +					break;
> > +				if (ret > 0 || strcmp(log.refname, u->refname)) {
> > +					ret = 0;
> > +					break;
> > +				}
> > +
> > +				ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
> > +				tombstone = &logs[logs_nr++];
> > +				tombstone->refname = xstrdup(u->refname);
> > +				tombstone->value_type = REFTABLE_LOG_DELETION,
> 
> Why is this a comma? one other such instance in this file.

Typo :) Funny enough this is fine and doesn't make any difference. The
comma operator will simply discard the value of the first statement
(which is the assigned value) and instead return the value of the second
value. We don't use either though, so no problem.

I've also fixed the second instance.

[snip]
> > +	/*
> > +	 * When deleting the old branch we need to create a reflog entry on the
> > +	 * new branch name that indicates that the old branch has been deleted
> > +	 * and then recreated. This is a tad weird, but matches what the files
> > +	 * backend does.
> > +	 */
> > +	if (arg->delete_old) {
> > +		struct strbuf head_referent = STRBUF_INIT;
> > +		struct object_id head_oid;
> > +		int append_head_reflog;
> > +		unsigned head_type = 0;
> > +
> > +		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
> > +		memset(&logs[logs_nr], 0, sizeof(logs[logs_nr]));
> > +		fill_reftable_log_record(&logs[logs_nr]);
> > +		logs[logs_nr].refname = (char *)arg->newname;
> > +		logs[logs_nr].update_index = deletion_ts;
> > +		logs[logs_nr].value.update.message =
> > +			xstrndup(arg->logmsg, arg->refs->write_options.block_size / 2);
> 
> Question: here and other places in this file, shouldn't we free memory
> allocated by `xstrndup`?

We use `reftable_log_record_release()`, which takes care of this.

> Also, why is it `block_size / 2` ?

That's a very good question indeed. The problem we have here is that the
maximum length of a record is limited by the block size. Thus, we cannot
write entries into a block that exceed a certain length and we must
ensure that no record exceeds that size.

Using `block_size / 2` is thus a mechanism to cull the reflog message so
that we can at least write a _partial_ reflog without breaking the
format. The division by two is kind of arbitrary here and ensures that
we have enough headroom to also write the remaining data of the record.

So this could be improved to instead trim at the exact block boundary
minus the overhead of the remaining data. But I dobut that's really
worth it for now given that nobody will ever write a 2kB reflog entry
(famous last words).

> > +static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
> > +								  struct reftable_stack *stack)
> > +{
> > +	struct reftable_reflog_iterator *iter;
> > +	struct reftable_merged_table *mt;
> > +	int ret;
> > +
> > +	iter = xcalloc(1, sizeof(*iter));
> > +	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable, 1);
> > +	iter->refs = refs;
> > +	iter->base.oid = &iter->oid;
> > +
> > +	ret = reftable_stack_reload(refs->main_stack);
> > +	if (ret < 0)
> > +		goto done;
> > +
> > +	mt = reftable_stack_merged_table(stack);
> > +	ret = reftable_merged_table_seek_log(mt, &iter->iter, "");
> > +	if (ret < 0)
> > +		goto done;
> > +
> 
> Keeping it similar to `ref_iterator_for_stack`, perhaps:
> 
>     ret = refs->err;
> 	if (ret)
> 		goto done;
> 
> 	ret = reftable_stack_reload(stack);
> 	if (ret)
> 		goto done;
> 
> 	merged_table = reftable_stack_merged_table(stack);
> 
> 	ret = reftable_merged_table_seek_ref(merged_table, &iter->iter, prefix);
> 	if (ret)
> 		goto done;

Will do.

> > +static int reftable_be_for_each_reflog_ent_reverse(struct ref_store *ref_store,
> > +						   const char *refname,
> > +						   each_reflog_ent_fn fn,
> > +						   void *cb_data)
> > +{
> > +	struct reftable_ref_store *refs =
> > +		reftable_be_downcast(ref_store, REF_STORE_READ, "for_each_reflog_ent_reverse");
> > +	struct reftable_stack *stack = stack_for(refs, refname, &refname);
> > +	struct reftable_merged_table *mt = NULL;
> > +	struct reftable_log_record log = {0};
> > +	struct reftable_iterator it = {0};
> > +	int ret;
> > +
> > +	if (refs->err < 0)
> > +		return refs->err;
> 
> Nit: seems like this function and the one below
> `reftable_be_for_each_reflog_ent`, share the same code apart from
> iteration direction. Wonder if it'd be nicer to extract out the common
> code from them.

Hm. I tried to do this once a while ago, but didn't end up with a nice
solution because I basically tried to unify the iterator loops, which
didn't quite work out in a nice way. But coming back with fresh eyes we
can at least easily share the code that yields the log record to the
callback function to remove some code duplication.

> > +static int reftable_be_reflog_exists(struct ref_store *ref_store,
> > +				     const char *refname)
> > +{
> > +	struct reftable_ref_store *refs =
> > +		reftable_be_downcast(ref_store, REF_STORE_READ, "reflog_exists");
> > +	struct reftable_stack *stack = stack_for(refs, refname, &refname);
> > +	struct reftable_merged_table *mt = reftable_stack_merged_table(stack);
> > +	struct reftable_log_record log = {0};
> > +	struct reftable_iterator it = {0};
> > +	int ret;
> > +
> > +	ret = refs->err;
> > +	if (ret < 0)
> > +		goto done;
> > +
> > +	ret = reftable_stack_reload(stack);
> > +	if (ret)
> > +		goto done;
> > +
> > +	ret = reftable_merged_table_seek_log(mt, &it, refname);
> > +	if (ret)
> > +		goto done;
> > +
> > +	/*
> > +	 * Seek the reflog to see whether it contains any reflog entries which
> > +	 * aren't marked for deletion.
> > +	 */
> 
> Shouldn't we be checking `log.value_type` in that case?

I think this whole loop doesn't make any sense anymore. I initially
misunderstood how reflog deletion works [1]: I assumed that a reflog
entry with the zero object ID as new value is supposed to mark a reflog
as deleted, but that's not the case. Instead, deleting a reflog requires
us to mark each record as deleted individually.

Given that the reftable stack is marked to suppress deletions (see
`reftable_stack_reload_once()`, `new_merged->suppress_deletions = 1`),
we don't have to check for deletions at all. All we have to do is seek
the log once and check whether we get a log record matching the expected
name.

[1]: https://public-inbox.org/git/ZVy0zKcmc8tjmgzs@tanuki/

> This email only reviews this file, I'll review the others in follow up
> email[s].

Thanks a lot for your review! I know the code here is quite dense, so I
really appreciate it.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 0/2] refs: introduce reftable backend
  2024-01-30  8:05 [PATCH 0/2] refs: introduce reftable backend Patrick Steinhardt
                   ` (2 preceding siblings ...)
  2024-01-30 22:08 ` [PATCH 0/2] refs: introduce " Junio C Hamano
@ 2024-02-02  8:38 ` Patrick Steinhardt
  2024-02-02  8:38   ` [PATCH v2 1/2] " Patrick Steinhardt
                     ` (3 more replies)
  2024-02-05  6:02 ` [PATCH v3 " Patrick Steinhardt
  2024-02-07  7:20 ` [PATCH v4 " Patrick Steinhardt
  5 siblings, 4 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2024-02-02  8:38 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, Han-Wen Nienhuys, Karthik Nayak

[-- Attachment #1: Type: text/plain, Size: 18602 bytes --]

Hi,

this is the second version of my patch series that introduces the
reftable backend.

This version addresses the feedback by Karthik. I don't think it really
makes sense to spell out every change here -- the range diff should
likely be easier to digest.

Thanks!

Patrick

Patrick Steinhardt (2):
  refs: introduce reftable backend
  ci: add jobs to test with the reftable backend

 .github/workflows/main.yml                    |    9 +
 .gitlab-ci.yml                                |    9 +
 Documentation/ref-storage-format.txt          |    2 +
 .../technical/repository-version.txt          |    5 +-
 Makefile                                      |    1 +
 ci/lib.sh                                     |    2 +-
 ci/run-build-and-tests.sh                     |    3 +
 contrib/workdir/git-new-workdir               |    2 +-
 path.c                                        |    2 +-
 path.h                                        |    1 +
 refs.c                                        |    1 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 2295 +++++++++++++++++
 repository.h                                  |    5 +-
 t/t0610-reftable-basics.sh                    |  887 +++++++
 t/t0611-reftable-httpd.sh                     |   26 +
 t/test-lib.sh                                 |    2 +
 17 files changed, 3246 insertions(+), 7 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0610-reftable-basics.sh
 create mode 100755 t/t0611-reftable-httpd.sh

Range-diff against v1:
1:  5598cd1307 ! 1:  d6548dcfad refs: introduce reftable backend
    @@ Commit message
             previously been deleting reflogs together with their refs to avoid
             file/directory conflicts, which is not necessary anymore.
     
    +      - We can properly enumerate all refs. With the "files" backend it is
    +        not easily possible to distinguish between refs and non-refs because
    +        they may live side by side in the gitdir.
    +
         Not all of these improvements are realized with the current "reftable"
         backend implementation. At this point, the new backend is supposed to be
         a drop-in replacement for the "files" backend that is used by basically
    @@ Commit message
             workloads we would likely also want to perform pack loose refs,
             which would likely change the picture.
     
    -            Benchmark 1: update-ref: create refs sequentially (refformat = files)
    +            Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 1)
                   Time (mean ± σ):       2.1 ms ±   0.3 ms    [User: 0.6 ms, System: 1.7 ms]
                   Range (min … max):     1.8 ms …   4.3 ms    133 runs
     
    -            Benchmark 2: update-ref: create refs sequentially (refformat = reftable)
    +            Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 1)
                   Time (mean ± σ):       2.7 ms ±   0.1 ms    [User: 0.6 ms, System: 2.2 ms]
                   Range (min … max):     2.4 ms …   2.9 ms    132 runs
     
    -            Benchmark 3: update-ref: create refs sequentially (refformat = files)
    +            Benchmark 3: update-ref: create refs sequentially (refformat = files, refcount = 1000)
                   Time (mean ± σ):      1.975 s ±  0.006 s    [User: 0.437 s, System: 1.535 s]
                   Range (min … max):    1.969 s …  1.980 s    3 runs
     
    -            Benchmark 4: update-ref: create refs sequentially (refformat = reftable)
    +            Benchmark 4: update-ref: create refs sequentially (refformat = reftable, refcount = 1000)
                   Time (mean ± σ):      2.611 s ±  0.013 s    [User: 0.782 s, System: 1.825 s]
                   Range (min … max):    2.597 s …  2.622 s    3 runs
     
    -            Benchmark 5: update-ref: create refs sequentially (refformat = files)
    +            Benchmark 5: update-ref: create refs sequentially (refformat = files, refcount = 100000)
                   Time (mean ± σ):     198.442 s ±  0.241 s    [User: 43.051 s, System: 155.250 s]
                   Range (min … max):   198.189 s … 198.670 s    3 runs
     
    -            Benchmark 6: update-ref: create refs sequentially (refformat = reftable)
    +            Benchmark 6: update-ref: create refs sequentially (refformat = reftable, refcount = 100000)
                   Time (mean ± σ):     294.509 s ±  4.269 s    [User: 104.046 s, System: 190.326 s]
                   Range (min … max):   290.223 s … 298.761 s    3 runs
     
    @@ refs/reftable-backend.c (new)
     +#include "../git-compat-util.h"
     +#include "../abspath.h"
     +#include "../chdir-notify.h"
    -+#include "../config.h"
     +#include "../environment.h"
     +#include "../gettext.h"
     +#include "../hash.h"
    @@ refs/reftable-backend.c (new)
     +#include "../reftable/reftable-stack.h"
     +#include "../reftable/reftable-record.h"
     +#include "../reftable/reftable-error.h"
    -+#include "../reftable/reftable-blocksource.h"
    -+#include "../reftable/reftable-reader.h"
     +#include "../reftable/reftable-iterator.h"
     +#include "../reftable/reftable-merged.h"
    -+#include "../reftable/reftable-generic.h"
     +#include "../setup.h"
     +#include "../strmap.h"
    -+#include "../worktree.h"
     +#include "refs-internal.h"
     +
     +/*
    @@ refs/reftable-backend.c (new)
     +struct reftable_ref_store {
     +	struct ref_store base;
     +
    ++	/*
    ++	 * The main stack refers to the common dir and thus contains common
    ++	 * refs as well as refs of the main repository.
    ++	 */
     +	struct reftable_stack *main_stack;
    ++	/*
    ++	 * The worktree stack refers to the gitdir in case the refdb is opened
    ++	 * via a worktree. It thus contains the per-worktree refs.
    ++	 */
     +	struct reftable_stack *worktree_stack;
    ++	/*
    ++	 * Map of worktree stacks by their respective worktree names. The map
    ++	 * is populated lazily when we try to resolve `worktrees/$worktree` refs.
    ++	 */
     +	struct strmap worktree_stacks;
     +	struct reftable_write_options write_options;
     +
    @@ refs/reftable-backend.c (new)
     +	/*
     +	 * Set up the main reftable stack that is hosted in GIT_COMMON_DIR.
     +	 * This stack contains both the shared and the main worktree refs.
    ++	 *
    ++	 * Note that we don't try to resolve the path in case we have a
    ++	 * worktree because `get_common_dir_noenv()` already does it for us.
     +	 */
     +	is_worktree = get_common_dir_noenv(&path, gitdir);
     +	if (!is_worktree) {
    @@ refs/reftable-backend.c (new)
     +	/*
     +	 * If we're in a worktree we also need to set up the worktree reftable
     +	 * stack that is contained in the per-worktree GIT_DIR.
    ++	 *
    ++	 * Ideally, we would also add the stack to our worktree stack map. But
    ++	 * we have no way to figure out the worktree name here and thus can't
    ++	 * do it efficiently.
     +	 */
     +	if (is_worktree) {
     +		strbuf_reset(&path);
    @@ refs/reftable-backend.c (new)
     +		}
     +
     +		 /*
    -+		  * Otherwise, if we either have no worktree refs anymore or if
    -+		  * the common ref sorts before the next worktree ref, we need
    -+		  * to figure out whether the next common ref belongs to the
    -+		  * main worktree. In that case, it should be ignored.
    ++		  * We now know that the lexicographically-next ref is a common
    ++		  * ref. When the common ref is a shared one we return it.
     +		  */
     +		if (parse_worktree_ref(iter_common->refname, NULL, NULL,
     +				       NULL) == REF_WORKTREE_SHARED)
     +			return ITER_SELECT_1;
     +
    ++		/*
    ++		 * Otherwise, if the common ref is a per-worktree ref we skip
    ++		 * it because it would belong to the main worktree, not ours.
    ++		 */
     +		return ITER_SKIP_1;
     +	} else {
     +		return ITER_DONE;
    @@ refs/reftable-backend.c (new)
     +		}
     +
     +		if (u->type & REF_ISSYMREF) {
    ++			/*
    ++			 * The reftable stack is locked at this point already,
    ++			 * so it is safe to call `refs_resolve_ref_unsafe()`
    ++			 * here without causing races.
    ++			 */
     +			const char *resolved = refs_resolve_ref_unsafe(&refs->base, u->refname, 0,
     +								       &current_oid, NULL);
     +
     +			if (u->flags & REF_NO_DEREF) {
    -+				/*
    -+				 * The reftable stack is locked at this point
    -+				 * already, so it should be safe to call
    -+				 * `refs_resolve_ref_unsafe()` here.
    -+				 */
     +				if (u->flags & REF_HAVE_OLD && !resolved) {
     +					strbuf_addf(err, _("cannot lock ref '%s': "
     +						    "error reading reference"), u->refname);
    @@ refs/reftable-backend.c (new)
     +				ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
     +				tombstone = &logs[logs_nr++];
     +				tombstone->refname = xstrdup(u->refname);
    -+				tombstone->value_type = REFTABLE_LOG_DELETION,
    ++				tombstone->value_type = REFTABLE_LOG_DELETION;
     +				tombstone->update_index = log.update_index;
     +			}
     +
    @@ refs/reftable-backend.c (new)
     +
     +	/*
     +	 * When deleting the old reference we have to use two update indices:
    -+	 * one to delete the old ref and its reflog, and once to create the new
    -+	 * ref and its reflog. They need to be staged with two separate indices
    -+	 * because the new reflog needs to encode both the deletion of the old
    -+	 * branch and the creation of the new branch, and we cannot do two
    -+	 * changes to a reflog in a single update.
    ++	 * once to delete the old ref and its reflog, and once to create the
    ++	 * new ref and its reflog. They need to be staged with two separate
    ++	 * indices because the new reflog needs to encode both the deletion of
    ++	 * the old branch and the creation of the new branch, and we cannot do
    ++	 * two changes to a reflog in a single update.
     +	 */
     +	deletion_ts = creation_ts = reftable_stack_next_update_index(arg->stack);
     +	if (arg->delete_old)
    @@ refs/reftable-backend.c (new)
     +	struct reftable_reflog_iterator *iter =
     +		(struct reftable_reflog_iterator *)ref_iterator;
     +
    -+	if (iter->err) {
    -+		ref_iterator_abort(ref_iterator);
    -+		return iter->err;
    -+	}
    -+
    -+	while (1) {
    -+		int flags, ret;
    ++	while (!iter->err) {
    ++		int flags;
     +
    -+		ret = reftable_iterator_next_log(&iter->iter, &iter->log);
    -+		if (ret < 0) {
    -+			ref_iterator_abort(ref_iterator);
    -+			return ITER_ERROR;
    -+		}
    -+		if (ret > 0) {
    -+			if (ref_iterator_abort(ref_iterator) != ITER_DONE)
    -+				return ITER_ERROR;
    -+			return ITER_DONE;
    -+		}
    ++		iter->err = reftable_iterator_next_log(&iter->iter, &iter->log);
    ++		if (iter->err)
    ++			break;
     +
     +		/*
     +		 * We want the refnames that we have reflogs for, so we skip if
    @@ refs/reftable-backend.c (new)
     +		iter->base.refname = iter->log.refname;
     +		iter->base.oid = &iter->oid;
     +		iter->base.flags = flags;
    -+		return ITER_OK;
    ++
    ++		break;
     +	}
    ++
    ++	if (iter->err > 0) {
    ++		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
    ++			return ITER_ERROR;
    ++		return ITER_DONE;
    ++	}
    ++
    ++	if (iter->err < 0) {
    ++		ref_iterator_abort(ref_iterator);
    ++		return ITER_ERROR;
    ++	}
    ++
    ++	return ITER_OK;
     +}
     +
     +static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator,
    @@ refs/reftable-backend.c (new)
     +static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
     +								  struct reftable_stack *stack)
     +{
    ++	struct reftable_merged_table *merged_table;
     +	struct reftable_reflog_iterator *iter;
    -+	struct reftable_merged_table *mt;
     +	int ret;
     +
     +	iter = xcalloc(1, sizeof(*iter));
    @@ refs/reftable-backend.c (new)
     +	iter->refs = refs;
     +	iter->base.oid = &iter->oid;
     +
    ++	ret = refs->err;
    ++	if (ret)
    ++		goto done;
    ++
     +	ret = reftable_stack_reload(refs->main_stack);
     +	if (ret < 0)
     +		goto done;
     +
    -+	mt = reftable_stack_merged_table(stack);
    -+	ret = reftable_merged_table_seek_log(mt, &iter->iter, "");
    ++	merged_table = reftable_stack_merged_table(stack);
    ++
    ++	ret = reftable_merged_table_seek_log(merged_table, &iter->iter, "");
     +	if (ret < 0)
     +		goto done;
     +
    @@ refs/reftable-backend.c (new)
     +					iterator_select, NULL);
     +}
     +
    ++static int yield_log_record(struct reftable_log_record *log,
    ++			    each_reflog_ent_fn fn,
    ++			    void *cb_data)
    ++{
    ++	struct object_id old_oid, new_oid;
    ++	const char *full_committer;
    ++
    ++	oidread(&old_oid, log->value.update.old_hash);
    ++	oidread(&new_oid, log->value.update.new_hash);
    ++
    ++	/*
    ++	 * When both the old object ID and the new object ID are null
    ++	 * then this is the reflog existence marker. The caller must
    ++	 * not be aware of it.
    ++	 */
    ++	if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
    ++		return 0;
    ++
    ++	full_committer = fmt_ident(log->value.update.name, log->value.update.email,
    ++				   WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
    ++	return fn(&old_oid, &new_oid, full_committer,
    ++		  log->value.update.time, log->value.update.tz_offset,
    ++		  log->value.update.message, cb_data);
    ++}
    ++
     +static int reftable_be_for_each_reflog_ent_reverse(struct ref_store *ref_store,
     +						   const char *refname,
     +						   each_reflog_ent_fn fn,
    @@ refs/reftable-backend.c (new)
     +	mt = reftable_stack_merged_table(stack);
     +	ret = reftable_merged_table_seek_log(mt, &it, refname);
     +	while (!ret) {
    -+		struct object_id old_oid, new_oid;
    -+		const char *full_committer;
    -+
     +		ret = reftable_iterator_next_log(&it, &log);
     +		if (ret < 0)
     +			break;
    @@ refs/reftable-backend.c (new)
     +			break;
     +		}
     +
    -+		oidread(&old_oid, log.value.update.old_hash);
    -+		oidread(&new_oid, log.value.update.new_hash);
    -+
    -+		/*
    -+		 * When both the old object ID and the new object ID are null
    -+		 * then this is the reflog existence marker. The caller must
    -+		 * not be aware of it.
    -+		 */
    -+		if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
    -+			continue;
    -+
    -+		full_committer = fmt_ident(log.value.update.name, log.value.update.email,
    -+					   WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
    -+		ret = fn(&old_oid, &new_oid, full_committer,
    -+			 log.value.update.time, log.value.update.tz_offset,
    -+			 log.value.update.message, cb_data);
    ++		ret = yield_log_record(&log, fn, cb_data);
     +		if (ret)
     +			break;
     +	}
    @@ refs/reftable-backend.c (new)
     +
     +		ret = reftable_iterator_next_log(&it, &log);
     +		if (ret < 0)
    -+			break;
    ++			goto done;
     +		if (ret > 0 || strcmp(log.refname, refname)) {
     +			reftable_log_record_release(&log);
     +			ret = 0;
    @@ refs/reftable-backend.c (new)
     +	}
     +
     +	for (i = logs_nr; i--;) {
    -+		struct reftable_log_record *log = &logs[i];
    -+		struct object_id old_oid, new_oid;
    -+		const char *full_committer = "";
    -+
    -+		oidread(&old_oid, log->value.update.old_hash);
    -+		oidread(&new_oid, log->value.update.new_hash);
    -+
    -+		/*
    -+		 * When both the old object ID and the new object ID are null
    -+		 * then this is the reflog existence marker. The caller must
    -+		 * not be aware of it.
    -+		 */
    -+		if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
    -+			continue;
    -+
    -+		full_committer = fmt_ident(log->value.update.name, log->value.update.email,
    -+					   WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
    -+		ret = fn(&old_oid, &new_oid, full_committer,
    -+			 log->value.update.time, log->value.update.tz_offset,
    -+			 log->value.update.message, cb_data);
    ++		ret = yield_log_record(&logs[i], fn, cb_data);
     +		if (ret)
    -+			break;
    ++			goto done;
     +	}
     +
    ++done:
     +	reftable_iterator_destroy(&it);
     +	for (i = 0; i < logs_nr; i++)
     +		reftable_log_record_release(&logs[i]);
    @@ refs/reftable-backend.c (new)
     +		goto done;
     +
     +	ret = reftable_stack_reload(stack);
    -+	if (ret)
    ++	if (ret < 0)
     +		goto done;
     +
     +	ret = reftable_merged_table_seek_log(mt, &it, refname);
    -+	if (ret)
    ++	if (ret < 0)
     +		goto done;
     +
     +	/*
    -+	 * Seek the reflog to see whether it contains any reflog entries which
    -+	 * aren't marked for deletion.
    ++	 * Check whether we get at least one log record for the given ref name.
    ++	 * If so, the reflog exists, otherwise it doesn't.
     +	 */
    -+	while (1) {
    -+		ret = reftable_iterator_next_log(&it, &log);
    -+		if (ret < 0)
    -+			goto done;
    -+		if (ret > 0 || strcmp(log.refname, refname)) {
    -+			ret = 0;
    -+			goto done;
    -+		}
    -+
    -+		ret = 1;
    -+		break;
    ++	ret = reftable_iterator_next_log(&it, &log);
    ++	if (ret < 0)
    ++		goto done;
    ++	if (ret > 0) {
    ++		ret = 0;
    ++		goto done;
     +	}
     +
    ++	ret = strcmp(log.refname, refname) == 0;
    ++
     +done:
     +	reftable_iterator_destroy(&it);
     +	reftable_log_record_release(&log);
    @@ refs/reftable-backend.c (new)
     +			break;
     +		}
     +
    -+		tombstone.refname = (char *)arg->refname,
    -+		tombstone.value_type = REFTABLE_LOG_DELETION,
    ++		tombstone.refname = (char *)arg->refname;
    ++		tombstone.value_type = REFTABLE_LOG_DELETION;
     +		tombstone.update_index = log.update_index;
     +
     +		ret = reftable_writer_add_log(writer, &tombstone);
2:  bb694fa132 = 2:  63eafc9f5b ci: add jobs to test with the reftable backend
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 1/2] refs: introduce reftable backend
  2024-02-02  8:38 ` [PATCH v2 " Patrick Steinhardt
@ 2024-02-02  8:38   ` Patrick Steinhardt
  2024-02-02  8:38   ` [PATCH v2 2/2] ci: add jobs to test with the " Patrick Steinhardt
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2024-02-02  8:38 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, Han-Wen Nienhuys, Karthik Nayak

[-- Attachment #1: Type: text/plain, Size: 118068 bytes --]

Due to scalability issues, Shawn Pearce has originally proposed a new
"reftable" format more than six years ago [1]. Initially, this new
format was implemented in JGit with promising results. Around two years
ago, we have then added the "reftable" library to the Git codebase via
a4bbd13be3 (Merge branch 'hn/reftable', 2021-12-15). With this we have
landed all the low-level code to read and write reftables. Notably
missing though was the integration of this low-level code into the Git
code base in the form of a new ref backend that ties all of this
together.

This gap is now finally closed by introducing a new "reftable" backend
into the Git codebase. This new backend promises to bring some notable
improvements to Git repositories:

  - It becomes possible to do truly atomic writes where either all refs
    are committed to disk or none are. This was not possible with the
    "files" backend because ref updates were split across multiple loose
    files.

  - The disk space required to store many refs is reduced, both compared
    to loose refs and packed-refs. This is enabled both by the reftable
    format being a binary format, which is more compact, and by prefix
    compression.

  - We can ignore filesystem-specific behaviour as ref names are not
    encoded via paths anymore. This means there is no need to handle
    case sensitivity on Windows systems or Unicode precomposition on
    macOS.

  - There is no need to rewrite the complete refdb anymore every time a
    ref is being deleted like it was the case for packed-refs. This
    means that ref deletions are now constant time instead of scaling
    linearly with the number of refs.

  - We can ignore file/directory conflicts so that it becomes possible
    to store both "refs/heads/foo" and "refs/heads/foo/bar".

  - Due to this property we can retain reflogs for deleted refs. We have
    previously been deleting reflogs together with their refs to avoid
    file/directory conflicts, which is not necessary anymore.

  - We can properly enumerate all refs. With the "files" backend it is
    not easily possible to distinguish between refs and non-refs because
    they may live side by side in the gitdir.

Not all of these improvements are realized with the current "reftable"
backend implementation. At this point, the new backend is supposed to be
a drop-in replacement for the "files" backend that is used by basically
all Git repositories nowadays. It strives for 1:1 compatibility, which
means that a user can expect the same behaviour regardless of whether
they use the "reftable" backend or the "files" backend for most of the
part.

Most notably, this means we artificially limit the capabilities of the
"reftable" backend to match the limits of the "files" backend. It is not
possible to create refs that would end up with file/directory conflicts,
we do not retain reflogs, we perform stricter-than-necessary checks.
This is done intentionally due to two main reasons:

  - It makes it significantly easier to land the "reftable" backend as
    tests behave the same. It would be tough to argue for each and every
    single test that doesn't pass with the "reftable" backend.

  - It ensures compatibility between repositories that use the "files"
    backend and repositories that use the "reftable" backend. Like this,
    hosters can migrate their repositories to use the "reftable" backend
    without causing issues for clients that use the "files" backend in
    their clones.

It is expected that these artificial limitations may eventually go away
in the long term.

Performance-wise things very much depend on the actual workload. The
following benchmarks compare the "files" and "reftable" backends in the
current version:

  - Creating N refs in separate transactions shows that the "files"
    backend is ~50% faster. This is not surprising given that creating a
    ref only requires us to create a single loose ref. The "reftable"
    backend will also perform auto compaction on updates. In real-world
    workloads we would likely also want to perform pack loose refs,
    which would likely change the picture.

        Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 1)
          Time (mean ± σ):       2.1 ms ±   0.3 ms    [User: 0.6 ms, System: 1.7 ms]
          Range (min … max):     1.8 ms …   4.3 ms    133 runs

        Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 1)
          Time (mean ± σ):       2.7 ms ±   0.1 ms    [User: 0.6 ms, System: 2.2 ms]
          Range (min … max):     2.4 ms …   2.9 ms    132 runs

        Benchmark 3: update-ref: create refs sequentially (refformat = files, refcount = 1000)
          Time (mean ± σ):      1.975 s ±  0.006 s    [User: 0.437 s, System: 1.535 s]
          Range (min … max):    1.969 s …  1.980 s    3 runs

        Benchmark 4: update-ref: create refs sequentially (refformat = reftable, refcount = 1000)
          Time (mean ± σ):      2.611 s ±  0.013 s    [User: 0.782 s, System: 1.825 s]
          Range (min … max):    2.597 s …  2.622 s    3 runs

        Benchmark 5: update-ref: create refs sequentially (refformat = files, refcount = 100000)
          Time (mean ± σ):     198.442 s ±  0.241 s    [User: 43.051 s, System: 155.250 s]
          Range (min … max):   198.189 s … 198.670 s    3 runs

        Benchmark 6: update-ref: create refs sequentially (refformat = reftable, refcount = 100000)
          Time (mean ± σ):     294.509 s ±  4.269 s    [User: 104.046 s, System: 190.326 s]
          Range (min … max):   290.223 s … 298.761 s    3 runs

  - Creating N refs in a single transaction shows that the "files"
    backend is significantly slower once we start to write many refs.
    The "reftable" backend only needs to update two files, whereas the
    "files" backend needs to write one file per ref.

        Benchmark 1: update-ref: create many refs (refformat = files, refcount = 1)
          Time (mean ± σ):       1.9 ms ±   0.1 ms    [User: 0.4 ms, System: 1.4 ms]
          Range (min … max):     1.8 ms …   2.6 ms    151 runs

        Benchmark 2: update-ref: create many refs (refformat = reftable, refcount = 1)
          Time (mean ± σ):       2.5 ms ±   0.1 ms    [User: 0.7 ms, System: 1.7 ms]
          Range (min … max):     2.4 ms …   3.4 ms    148 runs

        Benchmark 3: update-ref: create many refs (refformat = files, refcount = 1000)
          Time (mean ± σ):     152.5 ms ±   5.2 ms    [User: 19.1 ms, System: 133.1 ms]
          Range (min … max):   148.5 ms … 167.8 ms    15 runs

        Benchmark 4: update-ref: create many refs (refformat = reftable, refcount = 1000)
          Time (mean ± σ):      58.0 ms ±   2.5 ms    [User: 28.4 ms, System: 29.4 ms]
          Range (min … max):    56.3 ms …  72.9 ms    40 runs

        Benchmark 5: update-ref: create many refs (refformat = files, refcount = 1000000)
          Time (mean ± σ):     152.752 s ±  0.710 s    [User: 20.315 s, System: 131.310 s]
          Range (min … max):   152.165 s … 153.542 s    3 runs

        Benchmark 6: update-ref: create many refs (refformat = reftable, refcount = 1000000)
          Time (mean ± σ):     51.912 s ±  0.127 s    [User: 26.483 s, System: 25.424 s]
          Range (min … max):   51.769 s … 52.012 s    3 runs

  - Deleting a ref in a fully-packed repository shows that the "files"
    backend scales with the number of refs. The "reftable" backend has
    constant-time deletions.

        Benchmark 1: update-ref: delete ref (refformat = files, refcount = 1)
          Time (mean ± σ):       1.7 ms ±   0.1 ms    [User: 0.4 ms, System: 1.2 ms]
          Range (min … max):     1.6 ms …   2.1 ms    316 runs

        Benchmark 2: update-ref: delete ref (refformat = reftable, refcount = 1)
          Time (mean ± σ):       1.8 ms ±   0.1 ms    [User: 0.4 ms, System: 1.3 ms]
          Range (min … max):     1.7 ms …   2.1 ms    294 runs

        Benchmark 3: update-ref: delete ref (refformat = files, refcount = 1000)
          Time (mean ± σ):       2.0 ms ±   0.1 ms    [User: 0.5 ms, System: 1.4 ms]
          Range (min … max):     1.9 ms …   2.5 ms    287 runs

        Benchmark 4: update-ref: delete ref (refformat = reftable, refcount = 1000)
          Time (mean ± σ):       1.9 ms ±   0.1 ms    [User: 0.5 ms, System: 1.3 ms]
          Range (min … max):     1.8 ms …   2.1 ms    217 runs

        Benchmark 5: update-ref: delete ref (refformat = files, refcount = 1000000)
          Time (mean ± σ):     229.8 ms ±   7.9 ms    [User: 182.6 ms, System: 46.8 ms]
          Range (min … max):   224.6 ms … 245.2 ms    6 runs

        Benchmark 6: update-ref: delete ref (refformat = reftable, refcount = 1000000)
          Time (mean ± σ):       2.0 ms ±   0.0 ms    [User: 0.6 ms, System: 1.3 ms]
          Range (min … max):     2.0 ms …   2.1 ms    3 runs

  - Listing all refs shows no significant advantage for either of the
    backends. The "files" backend is a bit faster, but not by a
    significant margin. When repositories are not packed the "reftable"
    backend outperforms the "files" backend because the "reftable"
    backend performs auto-compaction.

        Benchmark 1: show-ref: print all refs (refformat = files, refcount = 1, packed = true)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.0 ms    1729 runs

        Benchmark 2: show-ref: print all refs (refformat = reftable, refcount = 1, packed = true)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   1.8 ms    1816 runs

        Benchmark 3: show-ref: print all refs (refformat = files, refcount = 1000, packed = true)
          Time (mean ± σ):       4.3 ms ±   0.1 ms    [User: 0.9 ms, System: 3.3 ms]
          Range (min … max):     4.1 ms …   4.6 ms    645 runs

        Benchmark 4: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = true)
          Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 1.0 ms, System: 3.3 ms]
          Range (min … max):     4.2 ms …   5.9 ms    643 runs

        Benchmark 5: show-ref: print all refs (refformat = files, refcount = 1000000, packed = true)
          Time (mean ± σ):      2.537 s ±  0.034 s    [User: 0.488 s, System: 2.048 s]
          Range (min … max):    2.511 s …  2.627 s    10 runs

        Benchmark 6: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = true)
          Time (mean ± σ):      2.712 s ±  0.017 s    [User: 0.653 s, System: 2.059 s]
          Range (min … max):    2.692 s …  2.752 s    10 runs

        Benchmark 7: show-ref: print all refs (refformat = files, refcount = 1, packed = false)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   1.9 ms    1834 runs

        Benchmark 8: show-ref: print all refs (refformat = reftable, refcount = 1, packed = false)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.4 ms …   2.0 ms    1840 runs

        Benchmark 9: show-ref: print all refs (refformat = files, refcount = 1000, packed = false)
          Time (mean ± σ):      13.8 ms ±   0.2 ms    [User: 2.8 ms, System: 10.8 ms]
          Range (min … max):    13.3 ms …  14.5 ms    208 runs

        Benchmark 10: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = false)
          Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 1.2 ms, System: 3.3 ms]
          Range (min … max):     4.3 ms …   6.2 ms    624 runs

        Benchmark 11: show-ref: print all refs (refformat = files, refcount = 1000000, packed = false)
          Time (mean ± σ):     12.127 s ±  0.129 s    [User: 2.675 s, System: 9.451 s]
          Range (min … max):   11.965 s … 12.370 s    10 runs

        Benchmark 12: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = false)
          Time (mean ± σ):      2.799 s ±  0.022 s    [User: 0.735 s, System: 2.063 s]
          Range (min … max):    2.769 s …  2.836 s    10 runs

  - Printing a single ref shows no real difference between the "files"
    and "reftable" backends.

        Benchmark 1: show-ref: print single ref (refformat = files, refcount = 1)
          Time (mean ± σ):       1.5 ms ±   0.1 ms    [User: 0.4 ms, System: 1.0 ms]
          Range (min … max):     1.4 ms …   1.8 ms    1779 runs

        Benchmark 2: show-ref: print single ref (refformat = reftable, refcount = 1)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.4 ms …   2.5 ms    1753 runs

        Benchmark 3: show-ref: print single ref (refformat = files, refcount = 1000)
          Time (mean ± σ):       1.5 ms ±   0.1 ms    [User: 0.3 ms, System: 1.1 ms]
          Range (min … max):     1.4 ms …   1.9 ms    1840 runs

        Benchmark 4: show-ref: print single ref (refformat = reftable, refcount = 1000)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.0 ms    1831 runs

        Benchmark 5: show-ref: print single ref (refformat = files, refcount = 1000000)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.1 ms    1848 runs

        Benchmark 6: show-ref: print single ref (refformat = reftable, refcount = 1000000)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.1 ms    1762 runs

So overall, performance depends on the usecases. Except for many
sequential writes the "reftable" backend is roughly on par or
significantly faster than the "files" backend though. Given that the
"files" backend has received 18 years of optimizations by now this can
be seen as a win. Furthermore, we can expect that the "reftable" backend
will grow faster over time when attention turns more towards
optimizations.

The complete test suite passes, except for those tests explicitly marked
to require the REFFILES prerequisite. Some tests in t0610 are marked as
failing because they depend on still-in-flight bug fixes. Tests can be
run with the new backend by setting the GIT_TEST_DEFAULT_REF_FORMAT
environment variable to "reftable".

There is a single known conceptual incompatibility with the dumb HTTP
transport. As "info/refs" SHOULD NOT contain the HEAD reference, and
because the "HEAD" file is not valid anymore, it is impossible for the
remote client to figure out the default branch without changing the
protocol. This shortcoming needs to be handled in a subsequent patch
series.

As the reftable library has already been introduced a while ago, this
commit message will not go into the details of how exactly the on-disk
format works. Please refer to our preexisting technical documentation at
Documentation/technical/reftable for this.

[1]: https://public-inbox.org/git/CAJo=hJtyof=HRy=2sLP0ng0uZ4=S-DpZ5dR1aF+VHVETKG20OQ@mail.gmail.com/

Original-idea-by: Shawn Pearce <spearce@spearce.org>
Based-on-patch-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/ref-storage-format.txt          |    2 +
 .../technical/repository-version.txt          |    5 +-
 Makefile                                      |    1 +
 contrib/workdir/git-new-workdir               |    2 +-
 path.c                                        |    2 +-
 path.h                                        |    1 +
 refs.c                                        |    1 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 2295 +++++++++++++++++
 repository.h                                  |    5 +-
 t/t0610-reftable-basics.sh                    |  887 +++++++
 t/t0611-reftable-httpd.sh                     |   26 +
 t/test-lib.sh                                 |    2 +
 13 files changed, 3224 insertions(+), 6 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0610-reftable-basics.sh
 create mode 100755 t/t0611-reftable-httpd.sh

diff --git a/Documentation/ref-storage-format.txt b/Documentation/ref-storage-format.txt
index 1a65cac468..14fff8a9c6 100644
--- a/Documentation/ref-storage-format.txt
+++ b/Documentation/ref-storage-format.txt
@@ -1 +1,3 @@
 * `files` for loose files with packed-refs. This is the default.
+* `reftable` for the reftable format. This format is experimental and its
+  internals are subject to change.
diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 27be3741e6..47281420fc 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -103,5 +103,6 @@ GIT_COMMON_DIR/worktrees/<id>/config.worktree)
 
 ==== `refStorage`
 
-Specifies the file format for the ref database. The only valid value
-is `files` (loose references with a packed-refs file).
+Specifies the file format for the ref database. The valid values are
+`files` (loose references with a packed-refs file) and `reftable` (see
+Documentation/technical/reftable.txt).
diff --git a/Makefile b/Makefile
index 0f748a52e6..b302516173 100644
--- a/Makefile
+++ b/Makefile
@@ -1127,6 +1127,7 @@ LIB_OBJS += reflog.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/debug.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
diff --git a/contrib/workdir/git-new-workdir b/contrib/workdir/git-new-workdir
index 888c34a521..989197aace 100755
--- a/contrib/workdir/git-new-workdir
+++ b/contrib/workdir/git-new-workdir
@@ -79,7 +79,7 @@ trap cleanup $siglist
 # create the links to the original repo.  explicitly exclude index, HEAD and
 # logs/HEAD from the list since they are purely related to the current working
 # directory, and should not be shared.
-for x in config refs logs/refs objects info hooks packed-refs remotes rr-cache svn
+for x in config refs logs/refs objects info hooks packed-refs remotes rr-cache svn reftable
 do
 	# create a containing directory if needed
 	case $x in
diff --git a/path.c b/path.c
index 0fb527918b..8bb223c92c 100644
--- a/path.c
+++ b/path.c
@@ -871,7 +871,7 @@ const char *enter_repo(const char *path, int strict)
 	return NULL;
 }
 
-static int calc_shared_perm(int mode)
+int calc_shared_perm(int mode)
 {
 	int tweak;
 
diff --git a/path.h b/path.h
index b3233c51fa..e053effef2 100644
--- a/path.h
+++ b/path.h
@@ -181,6 +181,7 @@ const char *git_path_shallow(struct repository *r);
 int ends_with_path_components(const char *path, const char *components);
 int validate_headref(const char *ref);
 
+int calc_shared_perm(int mode);
 int adjust_shared_perm(const char *path);
 
 char *interpolate_path(const char *path, int real_home);
diff --git a/refs.c b/refs.c
index c633abf284..fff343c256 100644
--- a/refs.c
+++ b/refs.c
@@ -35,6 +35,7 @@
  */
 static const struct ref_storage_be *refs_backends[] = {
 	[REF_STORAGE_FORMAT_FILES] = &refs_be_files,
+	[REF_STORAGE_FORMAT_REFTABLE] = &refs_be_reftable,
 };
 
 static const struct ref_storage_be *find_ref_storage_backend(unsigned int ref_storage_format)
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 82219829b0..83e0f0bba3 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -693,6 +693,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 0000000000..f592afe9ff
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,2295 @@
+#include "../git-compat-util.h"
+#include "../abspath.h"
+#include "../chdir-notify.h"
+#include "../environment.h"
+#include "../gettext.h"
+#include "../hash.h"
+#include "../hex.h"
+#include "../iterator.h"
+#include "../ident.h"
+#include "../lockfile.h"
+#include "../object.h"
+#include "../path.h"
+#include "../refs.h"
+#include "../reftable/reftable-stack.h"
+#include "../reftable/reftable-record.h"
+#include "../reftable/reftable-error.h"
+#include "../reftable/reftable-iterator.h"
+#include "../reftable/reftable-merged.h"
+#include "../setup.h"
+#include "../strmap.h"
+#include "refs-internal.h"
+
+/*
+ * Used as a flag in ref_update::flags when the ref_update was via an
+ * update to HEAD.
+ */
+#define REF_UPDATE_VIA_HEAD (1 << 8)
+
+struct reftable_ref_store {
+	struct ref_store base;
+
+	/*
+	 * The main stack refers to the common dir and thus contains common
+	 * refs as well as refs of the main repository.
+	 */
+	struct reftable_stack *main_stack;
+	/*
+	 * The worktree stack refers to the gitdir in case the refdb is opened
+	 * via a worktree. It thus contains the per-worktree refs.
+	 */
+	struct reftable_stack *worktree_stack;
+	/*
+	 * Map of worktree stacks by their respective worktree names. The map
+	 * is populated lazily when we try to resolve `worktrees/$worktree` refs.
+	 */
+	struct strmap worktree_stacks;
+	struct reftable_write_options write_options;
+
+	unsigned int store_flags;
+	int err;
+};
+
+/*
+ * Downcast ref_store to reftable_ref_store. Die if ref_store is not a
+ * reftable_ref_store. required_flags is compared with ref_store's store_flags
+ * to ensure the ref_store has all required capabilities. "caller" is used in
+ * any necessary error messages.
+ */
+static struct reftable_ref_store *reftable_be_downcast(struct ref_store *ref_store,
+						       unsigned int required_flags,
+						       const char *caller)
+{
+	struct reftable_ref_store *refs;
+
+	if (ref_store->be != &refs_be_reftable)
+		BUG("ref_store is type \"%s\" not \"reftables\" in %s",
+		    ref_store->be->name, caller);
+
+	refs = (struct reftable_ref_store *)ref_store;
+
+	if ((refs->store_flags & required_flags) != required_flags)
+		BUG("operation %s requires abilities 0x%x, but only have 0x%x",
+		    caller, required_flags, refs->store_flags);
+
+	return refs;
+}
+
+/*
+ * Some refs are global to the repository (refs/heads/{*}), while others are
+ * local to the worktree (eg. HEAD, refs/bisect/{*}). We solve this by having
+ * multiple separate databases (ie. multiple reftable/ directories), one for
+ * the shared refs, one for the current worktree refs, and one for each
+ * additional worktree. For reading, we merge the view of both the shared and
+ * the current worktree's refs, when necessary.
+ *
+ * This function also optionally assigns the rewritten reference name that is
+ * local to the stack. This translation is required when using worktree refs
+ * like `worktrees/$worktree/refs/heads/foo` as worktree stacks will store
+ * those references in their normalized form.
+ */
+static struct reftable_stack *stack_for(struct reftable_ref_store *store,
+					const char *refname,
+					const char **rewritten_ref)
+{
+	const char *wtname;
+	int wtname_len;
+
+	if (!refname)
+		return store->main_stack;
+
+	switch (parse_worktree_ref(refname, &wtname, &wtname_len, rewritten_ref)) {
+	case REF_WORKTREE_OTHER: {
+		static struct strbuf wtname_buf = STRBUF_INIT;
+		struct strbuf wt_dir = STRBUF_INIT;
+		struct reftable_stack *stack;
+
+		/*
+		 * We're using a static buffer here so that we don't need to
+		 * allocate the worktree name whenever we look up a reference.
+		 * This could be avoided if the strmap interface knew how to
+		 * handle keys with a length.
+		 */
+		strbuf_reset(&wtname_buf);
+		strbuf_add(&wtname_buf, wtname, wtname_len);
+
+		/*
+		 * There is an edge case here: when the worktree references the
+		 * current worktree, then we set up the stack once via
+		 * `worktree_stacks` and once via `worktree_stack`. This is
+		 * wasteful, but in the reading case it shouldn't matter. And
+		 * in the writing case we would notice that the stack is locked
+		 * already and error out when trying to write a reference via
+		 * both stacks.
+		 */
+		stack = strmap_get(&store->worktree_stacks, wtname_buf.buf);
+		if (!stack) {
+			strbuf_addf(&wt_dir, "%s/worktrees/%s/reftable",
+				    store->base.repo->commondir, wtname_buf.buf);
+
+			store->err = reftable_new_stack(&stack, wt_dir.buf,
+							store->write_options);
+			assert(store->err != REFTABLE_API_ERROR);
+			strmap_put(&store->worktree_stacks, wtname_buf.buf, stack);
+		}
+
+		strbuf_release(&wt_dir);
+		return stack;
+	}
+	case REF_WORKTREE_CURRENT:
+		/*
+		 * If there is no worktree stack then we're currently in the
+		 * main worktree. We thus return the main stack in that case.
+		 */
+		if (!store->worktree_stack)
+			return store->main_stack;
+		return store->worktree_stack;
+	case REF_WORKTREE_MAIN:
+	case REF_WORKTREE_SHARED:
+		return store->main_stack;
+	default:
+		BUG("unhandled worktree reference type");
+	}
+}
+
+static int should_write_log(struct ref_store *refs, const char *refname)
+{
+	if (log_all_ref_updates == LOG_REFS_UNSET)
+		log_all_ref_updates = is_bare_repository() ? LOG_REFS_NONE : LOG_REFS_NORMAL;
+
+	switch (log_all_ref_updates) {
+	case LOG_REFS_NONE:
+		return refs_reflog_exists(refs, refname);
+	case LOG_REFS_ALWAYS:
+		return 1;
+	case LOG_REFS_NORMAL:
+		if (should_autocreate_reflog(refname))
+			return 1;
+		return refs_reflog_exists(refs, refname);
+	default:
+		BUG("unhandled core.logAllRefUpdates value %d", log_all_ref_updates);
+	}
+}
+
+static void clear_reftable_log_record(struct reftable_log_record *log)
+{
+	switch (log->value_type) {
+	case REFTABLE_LOG_UPDATE:
+		/* when we write log records, the hashes are owned by a struct
+		 * oid */
+		log->value.update.old_hash = NULL;
+		log->value.update.new_hash = NULL;
+		break;
+	case REFTABLE_LOG_DELETION:
+		break;
+	}
+	reftable_log_record_release(log);
+}
+
+static void fill_reftable_log_record(struct reftable_log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = {0};
+	int sign = 1;
+
+	if (split_ident_line(&split, info, strlen(info)))
+		BUG("failed splitting committer info");
+
+	reftable_log_record_release(log);
+	log->value_type = REFTABLE_LOG_UPDATE;
+	log->value.update.name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->value.update.email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->value.update.time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->value.update.tz_offset = sign * atoi(split.tz_begin);
+}
+
+static int read_ref_without_reload(struct reftable_stack *stack,
+				   const char *refname,
+				   struct object_id *oid,
+				   struct strbuf *referent,
+				   unsigned int *type)
+{
+	struct reftable_ref_record ref = {0};
+	int ret;
+
+	ret = reftable_stack_read_ref(stack, refname, &ref);
+	if (ret)
+		goto done;
+
+	if (ref.value_type == REFTABLE_REF_SYMREF) {
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.value.symref);
+		*type |= REF_ISSYMREF;
+	} else if (reftable_ref_record_val1(&ref)) {
+		oidread(oid, reftable_ref_record_val1(&ref));
+	} else {
+		/* We got a tombstone, which should not happen. */
+		BUG("unhandled reference value type %d", ref.value_type);
+	}
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	reftable_ref_record_release(&ref);
+	return ret;
+}
+
+static struct ref_store *reftable_be_init(struct repository *repo,
+					  const char *gitdir,
+					  unsigned int store_flags)
+{
+	struct reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct strbuf path = STRBUF_INIT;
+	int is_worktree;
+	mode_t mask;
+
+	mask = umask(0);
+	umask(mask);
+
+	base_ref_store_init(&refs->base, repo, gitdir, &refs_be_reftable);
+	strmap_init(&refs->worktree_stacks);
+	refs->store_flags = store_flags;
+	refs->write_options.block_size = 4096;
+	refs->write_options.hash_id = repo->hash_algo->format_id;
+	refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask);
+
+	/*
+	 * Set up the main reftable stack that is hosted in GIT_COMMON_DIR.
+	 * This stack contains both the shared and the main worktree refs.
+	 *
+	 * Note that we don't try to resolve the path in case we have a
+	 * worktree because `get_common_dir_noenv()` already does it for us.
+	 */
+	is_worktree = get_common_dir_noenv(&path, gitdir);
+	if (!is_worktree) {
+		strbuf_reset(&path);
+		strbuf_realpath(&path, gitdir, 0);
+	}
+	strbuf_addstr(&path, "/reftable");
+	refs->err = reftable_new_stack(&refs->main_stack, path.buf,
+				       refs->write_options);
+	if (refs->err)
+		goto done;
+
+	/*
+	 * If we're in a worktree we also need to set up the worktree reftable
+	 * stack that is contained in the per-worktree GIT_DIR.
+	 *
+	 * Ideally, we would also add the stack to our worktree stack map. But
+	 * we have no way to figure out the worktree name here and thus can't
+	 * do it efficiently.
+	 */
+	if (is_worktree) {
+		strbuf_reset(&path);
+		strbuf_addf(&path, "%s/reftable", gitdir);
+
+		refs->err = reftable_new_stack(&refs->worktree_stack, path.buf,
+					       refs->write_options);
+		if (refs->err)
+			goto done;
+	}
+
+	chdir_notify_reparent("reftables-backend $GIT_DIR", &refs->base.gitdir);
+
+done:
+	assert(refs->err != REFTABLE_API_ERROR);
+	strbuf_release(&path);
+	return &refs->base;
+}
+
+static int reftable_be_init_db(struct ref_store *ref_store,
+			       int flags UNUSED,
+			       struct strbuf *err UNUSED)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "init_db");
+	struct strbuf sb = STRBUF_INIT;
+
+	strbuf_addf(&sb, "%s/reftable", refs->base.gitdir);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/HEAD", refs->base.gitdir);
+	write_file(sb.buf, "ref: refs/heads/.invalid");
+	adjust_shared_perm(sb.buf);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", refs->base.gitdir);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", refs->base.gitdir);
+	write_file(sb.buf, "this repository uses the reftable format");
+	adjust_shared_perm(sb.buf);
+
+	strbuf_release(&sb);
+	return 0;
+}
+
+struct reftable_ref_iterator {
+	struct ref_iterator base;
+	struct reftable_ref_store *refs;
+	struct reftable_iterator iter;
+	struct reftable_ref_record ref;
+	struct object_id oid;
+
+	const char *prefix;
+	unsigned int flags;
+	int err;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_ref_iterator *iter =
+		(struct reftable_ref_iterator *)ref_iterator;
+	struct reftable_ref_store *refs = iter->refs;
+
+	while (!iter->err) {
+		int flags = 0;
+
+		iter->err = reftable_iterator_next_ref(&iter->iter, &iter->ref);
+		if (iter->err)
+			break;
+
+		/*
+		 * The files backend only lists references contained in
+		 * "refs/". We emulate the same behaviour here and thus skip
+		 * all references that don't start with this prefix.
+		 */
+		if (!starts_with(iter->ref.refname, "refs/"))
+			continue;
+
+		if (iter->prefix &&
+		    strncmp(iter->prefix, iter->ref.refname, strlen(iter->prefix))) {
+			iter->err = 1;
+			break;
+		}
+
+		if (iter->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    parse_worktree_ref(iter->ref.refname, NULL, NULL, NULL) !=
+			    REF_WORKTREE_CURRENT)
+			continue;
+
+		switch (iter->ref.value_type) {
+		case REFTABLE_REF_VAL1:
+			oidread(&iter->oid, iter->ref.value.val1);
+			break;
+		case REFTABLE_REF_VAL2:
+			oidread(&iter->oid, iter->ref.value.val2.value);
+			break;
+		case REFTABLE_REF_SYMREF:
+			if (!refs_resolve_ref_unsafe(&iter->refs->base, iter->ref.refname,
+						     RESOLVE_REF_READING, &iter->oid, &flags))
+				oidclr(&iter->oid);
+			break;
+		default:
+			BUG("unhandled reference value type %d", iter->ref.value_type);
+		}
+
+		if (is_null_oid(&iter->oid))
+			flags |= REF_ISBROKEN;
+
+		if (check_refname_format(iter->ref.refname, REFNAME_ALLOW_ONELEVEL)) {
+			if (!refname_is_safe(iter->ref.refname))
+				die(_("refname is dangerous: %s"), iter->ref.refname);
+			oidclr(&iter->oid);
+			flags |= REF_BAD_NAME | REF_ISBROKEN;
+		}
+
+		if (iter->flags & DO_FOR_EACH_OMIT_DANGLING_SYMREFS &&
+		    flags & REF_ISSYMREF &&
+		    flags & REF_ISBROKEN)
+			continue;
+
+		if (!(iter->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(iter->ref.refname, refs->base.repo,
+					    &iter->oid, flags))
+				continue;
+
+		iter->base.refname = iter->ref.refname;
+		iter->base.oid = &iter->oid;
+		iter->base.flags = flags;
+
+		break;
+	}
+
+	if (iter->err > 0) {
+		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
+			return ITER_ERROR;
+		return ITER_DONE;
+	}
+
+	if (iter->err < 0) {
+		ref_iterator_abort(ref_iterator);
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct reftable_ref_iterator *iter =
+		(struct reftable_ref_iterator *)ref_iterator;
+
+	if (iter->ref.value_type == REFTABLE_REF_VAL2) {
+		oidread(peeled, iter->ref.value.val2.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_ref_iterator *iter =
+		(struct reftable_ref_iterator *)ref_iterator;
+	reftable_ref_record_release(&iter->ref);
+	reftable_iterator_destroy(&iter->iter);
+	free(iter);
+	return ITER_DONE;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	.advance = reftable_ref_iterator_advance,
+	.peel = reftable_ref_iterator_peel,
+	.abort = reftable_ref_iterator_abort
+};
+
+static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_store *refs,
+							    struct reftable_stack *stack,
+							    const char *prefix,
+							    int flags)
+{
+	struct reftable_merged_table *merged_table;
+	struct reftable_ref_iterator *iter;
+	int ret;
+
+	iter = xcalloc(1, sizeof(*iter));
+	base_ref_iterator_init(&iter->base, &reftable_ref_iterator_vtable, 1);
+	iter->prefix = prefix;
+	iter->base.oid = &iter->oid;
+	iter->flags = flags;
+	iter->refs = refs;
+
+	ret = refs->err;
+	if (ret)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+
+	merged_table = reftable_stack_merged_table(stack);
+
+	ret = reftable_merged_table_seek_ref(merged_table, &iter->iter, prefix);
+	if (ret)
+		goto done;
+
+done:
+	iter->err = ret;
+	return iter;
+}
+
+static enum iterator_selection iterator_select(struct ref_iterator *iter_worktree,
+					       struct ref_iterator *iter_common,
+					       void *cb_data UNUSED)
+{
+	if (iter_worktree && !iter_common) {
+		/*
+		 * Return the worktree ref if there are no more common refs.
+		 */
+		return ITER_SELECT_0;
+	} else if (iter_common) {
+		/*
+		 * In case we have pending worktree and common refs we need to
+		 * yield them based on their lexicographical order. Worktree
+		 * refs that have the same name as common refs shadow the
+		 * latter.
+		 */
+		if (iter_worktree) {
+			int cmp = strcmp(iter_worktree->refname,
+					 iter_common->refname);
+			if (cmp < 0)
+				return ITER_SELECT_0;
+			else if (!cmp)
+				return ITER_SELECT_0_SKIP_1;
+		}
+
+		 /*
+		  * We now know that the lexicographically-next ref is a common
+		  * ref. When the common ref is a shared one we return it.
+		  */
+		if (parse_worktree_ref(iter_common->refname, NULL, NULL,
+				       NULL) == REF_WORKTREE_SHARED)
+			return ITER_SELECT_1;
+
+		/*
+		 * Otherwise, if the common ref is a per-worktree ref we skip
+		 * it because it would belong to the main worktree, not ours.
+		 */
+		return ITER_SKIP_1;
+	} else {
+		return ITER_DONE;
+	}
+}
+
+static struct ref_iterator *reftable_be_iterator_begin(struct ref_store *ref_store,
+						       const char *prefix,
+						       const char **exclude_patterns,
+						       unsigned int flags)
+{
+	struct reftable_ref_iterator *main_iter, *worktree_iter;
+	struct reftable_ref_store *refs;
+	unsigned int required_flags = REF_STORE_READ;
+
+	if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN))
+		required_flags |= REF_STORE_ODB;
+	refs = reftable_be_downcast(ref_store, required_flags, "ref_iterator_begin");
+
+	main_iter = ref_iterator_for_stack(refs, refs->main_stack, prefix, flags);
+
+	/*
+	 * The worktree stack is only set when we're in an actual worktree
+	 * right now. If we aren't, then we return the common reftable
+	 * iterator, only.
+	 */
+	 if (!refs->worktree_stack)
+		return &main_iter->base;
+
+	/*
+	 * Otherwise we merge both the common and the per-worktree refs into a
+	 * single iterator.
+	 */
+	worktree_iter = ref_iterator_for_stack(refs, refs->worktree_stack, prefix, flags);
+	return merge_ref_iterator_begin(1, &worktree_iter->base, &main_iter->base,
+					iterator_select, NULL);
+}
+
+static int reftable_be_read_raw_ref(struct ref_store *ref_store,
+				    const char *refname,
+				    struct object_id *oid,
+				    struct strbuf *referent,
+				    unsigned int *type,
+				    int *failure_errno)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "read_raw_ref");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	int ret;
+
+	if (refs->err < 0)
+		return refs->err;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		return ret;
+
+	ret = read_ref_without_reload(stack, refname, oid, referent, type);
+	if (ret < 0)
+		return ret;
+	if (ret > 0) {
+		*failure_errno = ENOENT;
+		return -1;
+	}
+
+	return 0;
+}
+
+static int reftable_be_read_symbolic_ref(struct ref_store *ref_store,
+					 const char *refname,
+					 struct strbuf *referent)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "read_symbolic_ref");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_ref_record ref = {0};
+	int ret;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		return ret;
+
+	ret = reftable_stack_read_ref(stack, refname, &ref);
+	if (ret == 0 && ref.value_type == REFTABLE_REF_SYMREF)
+		strbuf_addstr(referent, ref.value.symref);
+	else
+		ret = -1;
+
+	reftable_ref_record_release(&ref);
+	return ret;
+}
+
+/*
+ * Return the refname under which update was originally requested.
+ */
+static const char *original_update_refname(struct ref_update *update)
+{
+	while (update->parent_update)
+		update = update->parent_update;
+	return update->refname;
+}
+
+struct reftable_transaction_update {
+	struct ref_update *update;
+	struct object_id current_oid;
+};
+
+struct write_transaction_table_arg {
+	struct reftable_ref_store *refs;
+	struct reftable_stack *stack;
+	struct reftable_addition *addition;
+	struct reftable_transaction_update *updates;
+	size_t updates_nr;
+	size_t updates_alloc;
+	size_t updates_expected;
+};
+
+struct reftable_transaction_data {
+	struct write_transaction_table_arg *args;
+	size_t args_nr, args_alloc;
+};
+
+static void free_transaction_data(struct reftable_transaction_data *tx_data)
+{
+	if (!tx_data)
+		return;
+	for (size_t i = 0; i < tx_data->args_nr; i++) {
+		reftable_addition_destroy(tx_data->args[i].addition);
+		free(tx_data->args[i].updates);
+	}
+	free(tx_data->args);
+	free(tx_data);
+}
+
+/*
+ * Prepare transaction update for the given reference update. This will cause
+ * us to lock the corresponding reftable stack for concurrent modification.
+ */
+static int prepare_transaction_update(struct write_transaction_table_arg **out,
+				      struct reftable_ref_store *refs,
+				      struct reftable_transaction_data *tx_data,
+				      struct ref_update *update,
+				      struct strbuf *err)
+{
+	struct reftable_stack *stack = stack_for(refs, update->refname, NULL);
+	struct write_transaction_table_arg *arg = NULL;
+	size_t i;
+	int ret;
+
+	/*
+	 * Search for a preexisting stack update. If there is one then we add
+	 * the update to it, otherwise we set up a new stack update.
+	 */
+	for (i = 0; !arg && i < tx_data->args_nr; i++)
+		if (tx_data->args[i].stack == stack)
+			arg = &tx_data->args[i];
+
+	if (!arg) {
+		struct reftable_addition *addition;
+
+		ret = reftable_stack_reload(stack);
+		if (ret)
+			return ret;
+
+		ret = reftable_stack_new_addition(&addition, stack);
+		if (ret) {
+			if (ret == REFTABLE_LOCK_ERROR)
+				strbuf_addstr(err, "cannot lock references");
+			return ret;
+		}
+
+		ALLOC_GROW(tx_data->args, tx_data->args_nr + 1,
+			   tx_data->args_alloc);
+		arg = &tx_data->args[tx_data->args_nr++];
+		arg->refs = refs;
+		arg->stack = stack;
+		arg->addition = addition;
+		arg->updates = NULL;
+		arg->updates_nr = 0;
+		arg->updates_alloc = 0;
+		arg->updates_expected = 0;
+	}
+
+	arg->updates_expected++;
+
+	if (out)
+		*out = arg;
+
+	return 0;
+}
+
+/*
+ * Queue a reference update for the correct stack. We potentially need to
+ * handle multiple stack updates in a single transaction when it spans across
+ * multiple worktrees.
+ */
+static int queue_transaction_update(struct reftable_ref_store *refs,
+				    struct reftable_transaction_data *tx_data,
+				    struct ref_update *update,
+				    struct object_id *current_oid,
+				    struct strbuf *err)
+{
+	struct write_transaction_table_arg *arg = NULL;
+	int ret;
+
+	if (update->backend_data)
+		BUG("reference update queued more than once");
+
+	ret = prepare_transaction_update(&arg, refs, tx_data, update, err);
+	if (ret < 0)
+		return ret;
+
+	ALLOC_GROW(arg->updates, arg->updates_nr + 1,
+		   arg->updates_alloc);
+	arg->updates[arg->updates_nr].update = update;
+	oidcpy(&arg->updates[arg->updates_nr].current_oid, current_oid);
+	update->backend_data = &arg->updates[arg->updates_nr++];
+
+	return 0;
+}
+
+static int reftable_be_transaction_prepare(struct ref_store *ref_store,
+					   struct ref_transaction *transaction,
+					   struct strbuf *err)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE|REF_STORE_MAIN, "ref_transaction_prepare");
+	struct strbuf referent = STRBUF_INIT, head_referent = STRBUF_INIT;
+	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct reftable_transaction_data *tx_data = NULL;
+	struct object_id head_oid;
+	unsigned int head_type = 0;
+	size_t i;
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	tx_data = xcalloc(1, sizeof(*tx_data));
+
+	/*
+	 * Preprocess all updates. For one we check that there are no duplicate
+	 * reference updates in this transaction. Second, we lock all stacks
+	 * that will be modified during the transaction.
+	 */
+	for (i = 0; i < transaction->nr; i++) {
+		ret = prepare_transaction_update(NULL, refs, tx_data,
+						 transaction->updates[i], err);
+		if (ret)
+			goto done;
+
+		string_list_append(&affected_refnames,
+				   transaction->updates[i]->refname);
+	}
+
+	/*
+	 * Now that we have counted updates per stack we can preallocate their
+	 * arrays. This avoids having to reallocate many times.
+	 */
+	for (i = 0; i < tx_data->args_nr; i++) {
+		CALLOC_ARRAY(tx_data->args[i].updates, tx_data->args[i].updates_expected);
+		tx_data->args[i].updates_alloc = tx_data->args[i].updates_expected;
+	}
+
+	/*
+	 * Fail if a refname appears more than once in the transaction.
+	 * This code is taken from the files backend and is a good candidate to
+	 * be moved into the generic layer.
+	 */
+	string_list_sort(&affected_refnames);
+	if (ref_update_reject_duplicates(&affected_refnames, err)) {
+		ret = TRANSACTION_GENERIC_ERROR;
+		goto done;
+	}
+
+	ret = read_ref_without_reload(stack_for(refs, "HEAD", NULL), "HEAD", &head_oid,
+				      &head_referent, &head_type);
+	if (ret < 0)
+		goto done;
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		struct object_id current_oid = {0};
+		struct reftable_stack *stack;
+		const char *rewritten_ref;
+
+		stack = stack_for(refs, u->refname, &rewritten_ref);
+
+		/* Verify that the new object ID is valid. */
+		if ((u->flags & REF_HAVE_NEW) && !is_null_oid(&u->new_oid) &&
+		    !(u->flags & REF_SKIP_OID_VERIFICATION) &&
+		    !(u->flags & REF_LOG_ONLY)) {
+			struct object *o = parse_object(refs->base.repo, &u->new_oid);
+			if (!o) {
+				strbuf_addf(err,
+					    _("trying to write ref '%s' with nonexistent object %s"),
+					    u->refname, oid_to_hex(&u->new_oid));
+				ret = -1;
+				goto done;
+			}
+
+			if (o->type != OBJ_COMMIT && is_branch(u->refname)) {
+				strbuf_addf(err, _("trying to write non-commit object %s to branch '%s'"),
+					    oid_to_hex(&u->new_oid), u->refname);
+				ret = -1;
+				goto done;
+			}
+		}
+
+		/*
+		 * When we update the reference that HEAD points to we enqueue
+		 * a second log-only update for HEAD so that its reflog is
+		 * updated accordingly.
+		 */
+		if (head_type == REF_ISSYMREF &&
+		    !(u->flags & REF_LOG_ONLY) &&
+		    !(u->flags & REF_UPDATE_VIA_HEAD) &&
+		    !strcmp(rewritten_ref, head_referent.buf)) {
+			struct ref_update *new_update;
+
+			/*
+			 * First make sure that HEAD is not already in the
+			 * transaction. This check is O(lg N) in the transaction
+			 * size, but it happens at most once per transaction.
+			 */
+			if (string_list_has_string(&affected_refnames, "HEAD")) {
+				/* An entry already existed */
+				strbuf_addf(err,
+					    _("multiple updates for 'HEAD' (including one "
+					    "via its referent '%s') are not allowed"),
+					    u->refname);
+				ret = TRANSACTION_NAME_CONFLICT;
+				goto done;
+			}
+
+			new_update = ref_transaction_add_update(
+					transaction, "HEAD",
+					u->flags | REF_LOG_ONLY | REF_NO_DEREF,
+					&u->new_oid, &u->old_oid, u->msg);
+			string_list_insert(&affected_refnames, new_update->refname);
+		}
+
+		ret = read_ref_without_reload(stack, rewritten_ref,
+					      &current_oid, &referent, &u->type);
+		if (ret < 0)
+			goto done;
+		if (ret > 0 && (!(u->flags & REF_HAVE_OLD) || is_null_oid(&u->old_oid))) {
+			/*
+			 * The reference does not exist, and we either have no
+			 * old object ID or expect the reference to not exist.
+			 * We can thus skip below safety checks as well as the
+			 * symref splitting. But we do want to verify that
+			 * there is no conflicting reference here so that we
+			 * can output a proper error message instead of failing
+			 * at a later point.
+			 */
+			ret = refs_verify_refname_available(ref_store, u->refname,
+							    &affected_refnames, NULL, err);
+			if (ret < 0)
+				goto done;
+
+			/*
+			 * There is no need to write the reference deletion
+			 * when the reference in question doesn't exist.
+			 */
+			 if (u->flags & REF_HAVE_NEW && !is_null_oid(&u->new_oid)) {
+				 ret = queue_transaction_update(refs, tx_data, u,
+								&current_oid, err);
+				 if (ret)
+					 goto done;
+			 }
+
+			continue;
+		}
+		if (ret > 0) {
+			/* The reference does not exist, but we expected it to. */
+			strbuf_addf(err, _("cannot lock ref '%s': "
+				    "unable to resolve reference '%s'"),
+				    original_update_refname(u), u->refname);
+			ret = -1;
+			goto done;
+		}
+
+		if (u->type & REF_ISSYMREF) {
+			/*
+			 * The reftable stack is locked at this point already,
+			 * so it is safe to call `refs_resolve_ref_unsafe()`
+			 * here without causing races.
+			 */
+			const char *resolved = refs_resolve_ref_unsafe(&refs->base, u->refname, 0,
+								       &current_oid, NULL);
+
+			if (u->flags & REF_NO_DEREF) {
+				if (u->flags & REF_HAVE_OLD && !resolved) {
+					strbuf_addf(err, _("cannot lock ref '%s': "
+						    "error reading reference"), u->refname);
+					ret = -1;
+					goto done;
+				}
+			} else {
+				struct ref_update *new_update;
+				int new_flags;
+
+				new_flags = u->flags;
+				if (!strcmp(rewritten_ref, "HEAD"))
+					new_flags |= REF_UPDATE_VIA_HEAD;
+
+				/*
+				 * If we are updating a symref (eg. HEAD), we should also
+				 * update the branch that the symref points to.
+				 *
+				 * This is generic functionality, and would be better
+				 * done in refs.c, but the current implementation is
+				 * intertwined with the locking in files-backend.c.
+				 */
+				new_update = ref_transaction_add_update(
+						transaction, referent.buf, new_flags,
+						&u->new_oid, &u->old_oid, u->msg);
+				new_update->parent_update = u;
+
+				/*
+				 * Change the symbolic ref update to log only. Also, it
+				 * doesn't need to check its old OID value, as that will be
+				 * done when new_update is processed.
+				 */
+				u->flags |= REF_LOG_ONLY | REF_NO_DEREF;
+				u->flags &= ~REF_HAVE_OLD;
+
+				if (string_list_has_string(&affected_refnames, new_update->refname)) {
+					strbuf_addf(err,
+						    _("multiple updates for '%s' (including one "
+						    "via symref '%s') are not allowed"),
+						    referent.buf, u->refname);
+					ret = TRANSACTION_NAME_CONFLICT;
+					goto done;
+				}
+				string_list_insert(&affected_refnames, new_update->refname);
+			}
+		}
+
+		/*
+		 * Verify that the old object matches our expectations. Note
+		 * that the error messages here do not make a lot of sense in
+		 * the context of the reftable backend as we never lock
+		 * individual refs. But the error messages match what the files
+		 * backend returns, which keeps our tests happy.
+		 */
+		if (u->flags & REF_HAVE_OLD && !oideq(&current_oid, &u->old_oid)) {
+			if (is_null_oid(&u->old_oid))
+				strbuf_addf(err, _("cannot lock ref '%s': "
+					    "reference already exists"),
+					    original_update_refname(u));
+			else if (is_null_oid(&current_oid))
+				strbuf_addf(err, _("cannot lock ref '%s': "
+					    "reference is missing but expected %s"),
+					    original_update_refname(u),
+					    oid_to_hex(&u->old_oid));
+			else
+				strbuf_addf(err, _("cannot lock ref '%s': "
+					    "is at %s but expected %s"),
+					    original_update_refname(u),
+					    oid_to_hex(&current_oid),
+					    oid_to_hex(&u->old_oid));
+			ret = -1;
+			goto done;
+		}
+
+		/*
+		 * If all of the following conditions are true:
+		 *
+		 *   - We're not about to write a symref.
+		 *   - We're not about to write a log-only entry.
+		 *   - Old and new object ID are different.
+		 *
+		 * Then we're essentially doing a no-op update that can be
+		 * skipped. This is not only for the sake of efficiency, but
+		 * also skips writing unneeded reflog entries.
+		 */
+		if ((u->type & REF_ISSYMREF) ||
+		    (u->flags & REF_LOG_ONLY) ||
+		    (u->flags & REF_HAVE_NEW && !oideq(&current_oid, &u->new_oid))) {
+			ret = queue_transaction_update(refs, tx_data, u,
+						       &current_oid, err);
+			if (ret)
+				goto done;
+		}
+	}
+
+	transaction->backend_data = tx_data;
+	transaction->state = REF_TRANSACTION_PREPARED;
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	if (ret < 0) {
+		free_transaction_data(tx_data);
+		transaction->state = REF_TRANSACTION_CLOSED;
+		if (!err->len)
+			strbuf_addf(err, _("reftable: transaction prepare: %s"),
+				    reftable_error_str(ret));
+	}
+	string_list_clear(&affected_refnames, 0);
+	strbuf_release(&referent);
+	strbuf_release(&head_referent);
+
+	return ret;
+}
+
+static int reftable_be_transaction_abort(struct ref_store *ref_store,
+					 struct ref_transaction *transaction,
+					 struct strbuf *err)
+{
+	struct reftable_transaction_data *tx_data = transaction->backend_data;
+	free_transaction_data(tx_data);
+	transaction->state = REF_TRANSACTION_CLOSED;
+	return 0;
+}
+
+static int transaction_update_cmp(const void *a, const void *b)
+{
+	return strcmp(((struct reftable_transaction_update *)a)->update->refname,
+		      ((struct reftable_transaction_update *)b)->update->refname);
+}
+
+static int write_transaction_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct write_transaction_table_arg *arg = cb_data;
+	struct reftable_merged_table *mt =
+		reftable_stack_merged_table(arg->stack);
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_log_record *logs = NULL;
+	size_t logs_nr = 0, logs_alloc = 0, i;
+	int ret = 0;
+
+	QSORT(arg->updates, arg->updates_nr, transaction_update_cmp);
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	for (i = 0; i < arg->updates_nr; i++) {
+		struct reftable_transaction_update *tx_update = &arg->updates[i];
+		struct ref_update *u = tx_update->update;
+
+		/*
+		 * Write a reflog entry when updating a ref to point to
+		 * something new in either of the following cases:
+		 *
+		 * - The reference is about to be deleted. We always want to
+		 *   delete the reflog in that case.
+		 * - REF_FORCE_CREATE_REFLOG is set, asking us to always create
+		 *   the reflog entry.
+		 * - `core.logAllRefUpdates` tells us to create the reflog for
+		 *   the given ref.
+		 */
+		if (u->flags & REF_HAVE_NEW && !(u->type & REF_ISSYMREF) && is_null_oid(&u->new_oid)) {
+			struct reftable_log_record log = {0};
+			struct reftable_iterator it = {0};
+
+			/*
+			 * When deleting refs we also delete all reflog entries
+			 * with them. While it is not strictly required to
+			 * delete reflogs together with their refs, this
+			 * matches the behaviour of the files backend.
+			 *
+			 * Unfortunately, we have no better way than to delete
+			 * all reflog entries one by one.
+			 */
+			ret = reftable_merged_table_seek_log(mt, &it, u->refname);
+			while (ret == 0) {
+				struct reftable_log_record *tombstone;
+
+				ret = reftable_iterator_next_log(&it, &log);
+				if (ret < 0)
+					break;
+				if (ret > 0 || strcmp(log.refname, u->refname)) {
+					ret = 0;
+					break;
+				}
+
+				ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+				tombstone = &logs[logs_nr++];
+				tombstone->refname = xstrdup(u->refname);
+				tombstone->value_type = REFTABLE_LOG_DELETION;
+				tombstone->update_index = log.update_index;
+			}
+
+			reftable_log_record_release(&log);
+			reftable_iterator_destroy(&it);
+
+			if (ret)
+				goto done;
+		} else if (u->flags & REF_HAVE_NEW &&
+			   (u->flags & REF_FORCE_CREATE_REFLOG ||
+			    should_write_log(&arg->refs->base, u->refname))) {
+			struct reftable_log_record *log;
+
+			ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+			log = &logs[logs_nr++];
+			memset(log, 0, sizeof(*log));
+
+			fill_reftable_log_record(log);
+			log->update_index = ts;
+			log->refname = xstrdup(u->refname);
+			log->value.update.new_hash = u->new_oid.hash;
+			log->value.update.old_hash = tx_update->current_oid.hash;
+			log->value.update.message =
+				xstrndup(u->msg, arg->refs->write_options.block_size / 2);
+		}
+
+		if (u->flags & REF_LOG_ONLY)
+			continue;
+
+		if (u->flags & REF_HAVE_NEW && is_null_oid(&u->new_oid)) {
+			struct reftable_ref_record ref = {
+				.refname = (char *)u->refname,
+				.update_index = ts,
+				.value_type = REFTABLE_REF_DELETION,
+			};
+
+			ret = reftable_writer_add_ref(writer, &ref);
+			if (ret < 0)
+				goto done;
+		} else if (u->flags & REF_HAVE_NEW) {
+			struct reftable_ref_record ref = {0};
+			struct object_id peeled;
+			int peel_error;
+
+			ref.refname = (char *)u->refname;
+			ref.update_index = ts;
+
+			peel_error = peel_object(&u->new_oid, &peeled);
+			if (!peel_error) {
+				ref.value_type = REFTABLE_REF_VAL2;
+				memcpy(ref.value.val2.target_value, peeled.hash, GIT_MAX_RAWSZ);
+				memcpy(ref.value.val2.value, u->new_oid.hash, GIT_MAX_RAWSZ);
+			} else if (!is_null_oid(&u->new_oid)) {
+				ref.value_type = REFTABLE_REF_VAL1;
+				memcpy(ref.value.val1, u->new_oid.hash, GIT_MAX_RAWSZ);
+			}
+
+			ret = reftable_writer_add_ref(writer, &ref);
+			if (ret < 0)
+				goto done;
+		}
+	}
+
+	/*
+	 * Logs are written at the end so that we do not have intermixed ref
+	 * and log blocks.
+	 */
+	if (logs) {
+		ret = reftable_writer_add_logs(writer, logs, logs_nr);
+		if (ret < 0)
+			goto done;
+	}
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	for (i = 0; i < logs_nr; i++)
+		clear_reftable_log_record(&logs[i]);
+	free(logs);
+	return ret;
+}
+
+static int reftable_be_transaction_finish(struct ref_store *ref_store,
+					  struct ref_transaction *transaction,
+					  struct strbuf *err)
+{
+	struct reftable_transaction_data *tx_data = transaction->backend_data;
+	int ret = 0;
+
+	for (size_t i = 0; i < tx_data->args_nr; i++) {
+		ret = reftable_addition_add(tx_data->args[i].addition,
+					    write_transaction_table, &tx_data->args[i]);
+		if (ret < 0)
+			goto done;
+
+		ret = reftable_addition_commit(tx_data->args[i].addition);
+		if (ret < 0)
+			goto done;
+	}
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	free_transaction_data(tx_data);
+	transaction->state = REF_TRANSACTION_CLOSED;
+
+	if (ret) {
+		strbuf_addf(err, _("reftable: transaction failure: %s"),
+			    reftable_error_str(ret));
+		return -1;
+	}
+	return ret;
+}
+
+static int reftable_be_initial_transaction_commit(struct ref_store *ref_store UNUSED,
+						  struct ref_transaction *transaction,
+						  struct strbuf *err)
+{
+	return ref_transaction_commit(transaction, err);
+}
+
+static int reftable_be_pack_refs(struct ref_store *ref_store,
+				 struct pack_refs_opts *opts)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE | REF_STORE_ODB, "pack_refs");
+	struct reftable_stack *stack;
+	int ret;
+
+	if (refs->err)
+		return refs->err;
+
+	stack = refs->worktree_stack;
+	if (!stack)
+		stack = refs->main_stack;
+
+	ret = reftable_stack_compact_all(stack, NULL);
+	if (ret)
+		goto out;
+	ret = reftable_stack_clean(stack);
+	if (ret)
+		goto out;
+
+out:
+	return ret;
+}
+
+struct write_create_symref_arg {
+	struct reftable_ref_store *refs;
+	struct reftable_stack *stack;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct write_create_symref_arg *create = cb_data;
+	uint64_t ts = reftable_stack_next_update_index(create->stack);
+	struct reftable_ref_record ref = {
+		.refname = (char *)create->refname,
+		.value_type = REFTABLE_REF_SYMREF,
+		.value.symref = (char *)create->target,
+		.update_index = ts,
+	};
+	struct reftable_log_record log = {0};
+	struct object_id new_oid;
+	struct object_id old_oid;
+	int ret;
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	ret = reftable_writer_add_ref(writer, &ref);
+	if (ret)
+		return ret;
+
+	/*
+	 * Note that it is important to try and resolve the reference before we
+	 * write the log entry. This is because `should_write_log()` will munge
+	 * `core.logAllRefUpdates`, which is undesirable when we create a new
+	 * repository because it would be written into the config. As HEAD will
+	 * not resolve for new repositories this ordering will ensure that this
+	 * never happens.
+	 */
+	if (!create->logmsg ||
+	    !refs_resolve_ref_unsafe(&create->refs->base, create->target,
+				     RESOLVE_REF_READING, &new_oid, NULL) ||
+	    !should_write_log(&create->refs->base, create->refname))
+		return 0;
+
+	fill_reftable_log_record(&log);
+	log.refname = xstrdup(create->refname);
+	log.update_index = ts;
+	log.value.update.message = xstrndup(create->logmsg,
+					    create->refs->write_options.block_size / 2);
+	log.value.update.new_hash = new_oid.hash;
+	if (refs_resolve_ref_unsafe(&create->refs->base, create->refname,
+				    RESOLVE_REF_READING, &old_oid, NULL))
+		log.value.update.old_hash = old_oid.hash;
+
+	ret = reftable_writer_add_log(writer, &log);
+	clear_reftable_log_record(&log);
+	return ret;
+}
+
+static int reftable_be_create_symref(struct ref_store *ref_store,
+				     const char *refname,
+				     const char *target,
+				     const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "create_symref");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct write_create_symref_arg arg = {
+		.refs = refs,
+		.stack = stack,
+		.refname = refname,
+		.target = target,
+		.logmsg = logmsg,
+	};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+
+	ret = reftable_stack_add(stack, &write_create_symref_table, &arg);
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	if (ret)
+		error("unable to write symref for %s: %s", refname,
+		      reftable_error_str(ret));
+	return ret;
+}
+
+struct write_copy_arg {
+	struct reftable_ref_store *refs;
+	struct reftable_stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+	int delete_old;
+};
+
+static int write_copy_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct write_copy_arg *arg = cb_data;
+	uint64_t deletion_ts, creation_ts;
+	struct reftable_merged_table *mt = reftable_stack_merged_table(arg->stack);
+	struct reftable_ref_record old_ref = {0}, refs[2] = {0};
+	struct reftable_log_record old_log = {0}, *logs = NULL;
+	struct reftable_iterator it = {0};
+	struct string_list skip = STRING_LIST_INIT_NODUP;
+	struct strbuf errbuf = STRBUF_INIT;
+	size_t logs_nr = 0, logs_alloc = 0, i;
+	int ret;
+
+	if (reftable_stack_read_ref(arg->stack, arg->oldname, &old_ref)) {
+		ret = error(_("refname %s not found"), arg->oldname);
+		goto done;
+	}
+	if (old_ref.value_type == REFTABLE_REF_SYMREF) {
+		ret = error(_("refname %s is a symbolic ref, copying it is not supported"),
+			    arg->oldname);
+		goto done;
+	}
+
+	/*
+	 * There's nothing to do in case the old and new name are the same, so
+	 * we exit early in that case.
+	 */
+	if (!strcmp(arg->oldname, arg->newname)) {
+		ret = 0;
+		goto done;
+	}
+
+	/*
+	 * Verify that the new refname is available.
+	 */
+	string_list_insert(&skip, arg->oldname);
+	ret = refs_verify_refname_available(&arg->refs->base, arg->newname,
+					    NULL, &skip, &errbuf);
+	if (ret < 0) {
+		error("%s", errbuf.buf);
+		goto done;
+	}
+
+	/*
+	 * When deleting the old reference we have to use two update indices:
+	 * once to delete the old ref and its reflog, and once to create the
+	 * new ref and its reflog. They need to be staged with two separate
+	 * indices because the new reflog needs to encode both the deletion of
+	 * the old branch and the creation of the new branch, and we cannot do
+	 * two changes to a reflog in a single update.
+	 */
+	deletion_ts = creation_ts = reftable_stack_next_update_index(arg->stack);
+	if (arg->delete_old)
+		creation_ts++;
+	reftable_writer_set_limits(writer, deletion_ts, creation_ts);
+
+	/*
+	 * Add the new reference. If this is a rename then we also delete the
+	 * old reference.
+	 */
+	refs[0] = old_ref;
+	refs[0].refname = (char *)arg->newname;
+	refs[0].update_index = creation_ts;
+	if (arg->delete_old) {
+		refs[1].refname = (char *)arg->oldname;
+		refs[1].value_type = REFTABLE_REF_DELETION;
+		refs[1].update_index = deletion_ts;
+	}
+	ret = reftable_writer_add_refs(writer, refs, arg->delete_old ? 2 : 1);
+	if (ret < 0)
+		goto done;
+
+	/*
+	 * When deleting the old branch we need to create a reflog entry on the
+	 * new branch name that indicates that the old branch has been deleted
+	 * and then recreated. This is a tad weird, but matches what the files
+	 * backend does.
+	 */
+	if (arg->delete_old) {
+		struct strbuf head_referent = STRBUF_INIT;
+		struct object_id head_oid;
+		int append_head_reflog;
+		unsigned head_type = 0;
+
+		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+		memset(&logs[logs_nr], 0, sizeof(logs[logs_nr]));
+		fill_reftable_log_record(&logs[logs_nr]);
+		logs[logs_nr].refname = (char *)arg->newname;
+		logs[logs_nr].update_index = deletion_ts;
+		logs[logs_nr].value.update.message =
+			xstrndup(arg->logmsg, arg->refs->write_options.block_size / 2);
+		logs[logs_nr].value.update.old_hash = old_ref.value.val1;
+		logs_nr++;
+
+		ret = read_ref_without_reload(arg->stack, "HEAD", &head_oid, &head_referent, &head_type);
+		if (ret < 0)
+			goto done;
+		append_head_reflog = (head_type & REF_ISSYMREF) && !strcmp(head_referent.buf, arg->oldname);
+		strbuf_release(&head_referent);
+
+		/*
+		 * The files backend uses `refs_delete_ref()` to delete the old
+		 * branch name, which will append a reflog entry for HEAD in
+		 * case it points to the old branch.
+		 */
+		if (append_head_reflog) {
+			ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+			logs[logs_nr] = logs[logs_nr - 1];
+			logs[logs_nr].refname = "HEAD";
+			logs_nr++;
+		}
+	}
+
+	/*
+	 * Create the reflog entry for the newly created branch.
+	 */
+	ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+	memset(&logs[logs_nr], 0, sizeof(logs[logs_nr]));
+	fill_reftable_log_record(&logs[logs_nr]);
+	logs[logs_nr].refname = (char *)arg->newname;
+	logs[logs_nr].update_index = creation_ts;
+	logs[logs_nr].value.update.message =
+		xstrndup(arg->logmsg, arg->refs->write_options.block_size / 2);
+	logs[logs_nr].value.update.new_hash = old_ref.value.val1;
+	logs_nr++;
+
+	/*
+	 * In addition to writing the reflog entry for the new branch, we also
+	 * copy over all log entries from the old reflog. Last but not least,
+	 * when renaming we also have to delete all the old reflog entries.
+	 */
+	ret = reftable_merged_table_seek_log(mt, &it, arg->oldname);
+	if (ret < 0)
+		return ret;
+
+	while (1) {
+		ret = reftable_iterator_next_log(&it, &old_log);
+		if (ret < 0)
+			goto done;
+		if (ret > 0 || strcmp(old_log.refname, arg->oldname)) {
+			ret = 0;
+			break;
+		}
+
+		free(old_log.refname);
+
+		/*
+		 * Copy over the old reflog entry with the new refname.
+		 */
+		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+		logs[logs_nr] = old_log;
+		logs[logs_nr].refname = (char *)arg->newname;
+		logs_nr++;
+
+		/*
+		 * Delete the old reflog entry in case we are renaming.
+		 */
+		if (arg->delete_old) {
+			ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+			memset(&logs[logs_nr], 0, sizeof(logs[logs_nr]));
+			logs[logs_nr].refname = (char *)arg->oldname;
+			logs[logs_nr].value_type = REFTABLE_LOG_DELETION;
+			logs[logs_nr].update_index = old_log.update_index;
+			logs_nr++;
+		}
+
+		/*
+		 * Transfer ownership of the log record we're iterating over to
+		 * the array of log records. Otherwise, the pointers would get
+		 * free'd or reallocated by the iterator.
+		 */
+		memset(&old_log, 0, sizeof(old_log));
+	}
+
+	ret = reftable_writer_add_logs(writer, logs, logs_nr);
+	if (ret < 0)
+		goto done;
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	reftable_iterator_destroy(&it);
+	string_list_clear(&skip, 0);
+	strbuf_release(&errbuf);
+	for (i = 0; i < logs_nr; i++) {
+		if (!strcmp(logs[i].refname, "HEAD"))
+			continue;
+		if (logs[i].value.update.old_hash == old_ref.value.val1)
+			logs[i].value.update.old_hash = NULL;
+		if (logs[i].value.update.new_hash == old_ref.value.val1)
+			logs[i].value.update.new_hash = NULL;
+		logs[i].refname = NULL;
+		reftable_log_record_release(&logs[i]);
+	}
+	free(logs);
+	reftable_ref_record_release(&old_ref);
+	reftable_log_record_release(&old_log);
+	return ret;
+}
+
+static int reftable_be_rename_ref(struct ref_store *ref_store,
+				  const char *oldrefname,
+				  const char *newrefname,
+				  const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "rename_ref");
+	struct reftable_stack *stack = stack_for(refs, newrefname, &newrefname);
+	struct write_copy_arg arg = {
+		.refs = refs,
+		.stack = stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+		.delete_old = 1,
+	};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+	ret = reftable_stack_add(stack, &write_copy_table, &arg);
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	return ret;
+}
+
+static int reftable_be_copy_ref(struct ref_store *ref_store,
+				const char *oldrefname,
+				const char *newrefname,
+				const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "copy_ref");
+	struct reftable_stack *stack = stack_for(refs, newrefname, &newrefname);
+	struct write_copy_arg arg = {
+		.refs = refs,
+		.stack = stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+	ret = reftable_stack_add(stack, &write_copy_table, &arg);
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	return ret;
+}
+
+struct reftable_reflog_iterator {
+	struct ref_iterator base;
+	struct reftable_ref_store *refs;
+	struct reftable_iterator iter;
+	struct reftable_log_record log;
+	struct object_id oid;
+	char *last_name;
+	int err;
+};
+
+static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_iterator *iter =
+		(struct reftable_reflog_iterator *)ref_iterator;
+
+	while (!iter->err) {
+		int flags;
+
+		iter->err = reftable_iterator_next_log(&iter->iter, &iter->log);
+		if (iter->err)
+			break;
+
+		/*
+		 * We want the refnames that we have reflogs for, so we skip if
+		 * we've already produced this name. This could be faster by
+		 * seeking directly to reflog@update_index==0.
+		 */
+		if (iter->last_name && !strcmp(iter->log.refname, iter->last_name))
+			continue;
+
+		if (!refs_resolve_ref_unsafe(&iter->refs->base, iter->log.refname,
+					     0, &iter->oid, &flags)) {
+			error(_("bad ref for %s"), iter->log.refname);
+			continue;
+		}
+
+		free(iter->last_name);
+		iter->last_name = xstrdup(iter->log.refname);
+		iter->base.refname = iter->log.refname;
+		iter->base.oid = &iter->oid;
+		iter->base.flags = flags;
+
+		break;
+	}
+
+	if (iter->err > 0) {
+		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
+			return ITER_ERROR;
+		return ITER_DONE;
+	}
+
+	if (iter->err < 0) {
+		ref_iterator_abort(ref_iterator);
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator,
+						 struct object_id *peeled)
+{
+	BUG("reftable reflog iterator cannot be peeled");
+	return -1;
+}
+
+static int reftable_reflog_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_iterator *iter =
+		(struct reftable_reflog_iterator *)ref_iterator;
+	reftable_log_record_release(&iter->log);
+	reftable_iterator_destroy(&iter->iter);
+	free(iter->last_name);
+	free(iter);
+	return ITER_DONE;
+}
+
+static struct ref_iterator_vtable reftable_reflog_iterator_vtable = {
+	.advance = reftable_reflog_iterator_advance,
+	.peel = reftable_reflog_iterator_peel,
+	.abort = reftable_reflog_iterator_abort
+};
+
+static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
+								  struct reftable_stack *stack)
+{
+	struct reftable_merged_table *merged_table;
+	struct reftable_reflog_iterator *iter;
+	int ret;
+
+	iter = xcalloc(1, sizeof(*iter));
+	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable, 1);
+	iter->refs = refs;
+	iter->base.oid = &iter->oid;
+
+	ret = refs->err;
+	if (ret)
+		goto done;
+
+	ret = reftable_stack_reload(refs->main_stack);
+	if (ret < 0)
+		goto done;
+
+	merged_table = reftable_stack_merged_table(stack);
+
+	ret = reftable_merged_table_seek_log(merged_table, &iter->iter, "");
+	if (ret < 0)
+		goto done;
+
+done:
+	iter->err = ret;
+	return iter;
+}
+
+static struct ref_iterator *reftable_be_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "reflog_iterator_begin");
+	struct reftable_reflog_iterator *main_iter, *worktree_iter;
+
+	main_iter = reflog_iterator_for_stack(refs, refs->main_stack);
+	if (!refs->worktree_stack)
+		return &main_iter->base;
+
+	worktree_iter = reflog_iterator_for_stack(refs, refs->worktree_stack);
+
+	return merge_ref_iterator_begin(1, &worktree_iter->base, &main_iter->base,
+					iterator_select, NULL);
+}
+
+static int yield_log_record(struct reftable_log_record *log,
+			    each_reflog_ent_fn fn,
+			    void *cb_data)
+{
+	struct object_id old_oid, new_oid;
+	const char *full_committer;
+
+	oidread(&old_oid, log->value.update.old_hash);
+	oidread(&new_oid, log->value.update.new_hash);
+
+	/*
+	 * When both the old object ID and the new object ID are null
+	 * then this is the reflog existence marker. The caller must
+	 * not be aware of it.
+	 */
+	if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
+		return 0;
+
+	full_committer = fmt_ident(log->value.update.name, log->value.update.email,
+				   WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
+	return fn(&old_oid, &new_oid, full_committer,
+		  log->value.update.time, log->value.update.tz_offset,
+		  log->value.update.message, cb_data);
+}
+
+static int reftable_be_for_each_reflog_ent_reverse(struct ref_store *ref_store,
+						   const char *refname,
+						   each_reflog_ent_fn fn,
+						   void *cb_data)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "for_each_reflog_ent_reverse");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record log = {0};
+	struct reftable_iterator it = {0};
+	int ret;
+
+	if (refs->err < 0)
+		return refs->err;
+
+	mt = reftable_stack_merged_table(stack);
+	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	while (!ret) {
+		ret = reftable_iterator_next_log(&it, &log);
+		if (ret < 0)
+			break;
+		if (ret > 0 || strcmp(log.refname, refname)) {
+			ret = 0;
+			break;
+		}
+
+		ret = yield_log_record(&log, fn, cb_data);
+		if (ret)
+			break;
+	}
+
+	reftable_log_record_release(&log);
+	reftable_iterator_destroy(&it);
+	return ret;
+}
+
+static int reftable_be_for_each_reflog_ent(struct ref_store *ref_store,
+					   const char *refname,
+					   each_reflog_ent_fn fn,
+					   void *cb_data)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "for_each_reflog_ent");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record *logs = NULL;
+	struct reftable_iterator it = {0};
+	size_t logs_alloc = 0, logs_nr = 0, i;
+	int ret;
+
+	if (refs->err < 0)
+		return refs->err;
+
+	mt = reftable_stack_merged_table(stack);
+	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	while (!ret) {
+		struct reftable_log_record log = {0};
+
+		ret = reftable_iterator_next_log(&it, &log);
+		if (ret < 0)
+			goto done;
+		if (ret > 0 || strcmp(log.refname, refname)) {
+			reftable_log_record_release(&log);
+			ret = 0;
+			break;
+		}
+
+		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+		logs[logs_nr++] = log;
+	}
+
+	for (i = logs_nr; i--;) {
+		ret = yield_log_record(&logs[i], fn, cb_data);
+		if (ret)
+			goto done;
+	}
+
+done:
+	reftable_iterator_destroy(&it);
+	for (i = 0; i < logs_nr; i++)
+		reftable_log_record_release(&logs[i]);
+	free(logs);
+	return ret;
+}
+
+static int reftable_be_reflog_exists(struct ref_store *ref_store,
+				     const char *refname)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "reflog_exists");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_merged_table *mt = reftable_stack_merged_table(stack);
+	struct reftable_log_record log = {0};
+	struct reftable_iterator it = {0};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	if (ret < 0)
+		goto done;
+
+	/*
+	 * Check whether we get at least one log record for the given ref name.
+	 * If so, the reflog exists, otherwise it doesn't.
+	 */
+	ret = reftable_iterator_next_log(&it, &log);
+	if (ret < 0)
+		goto done;
+	if (ret > 0) {
+		ret = 0;
+		goto done;
+	}
+
+	ret = strcmp(log.refname, refname) == 0;
+
+done:
+	reftable_iterator_destroy(&it);
+	reftable_log_record_release(&log);
+	if (ret < 0)
+		ret = 0;
+	return ret;
+}
+
+struct write_reflog_existence_arg {
+	struct reftable_ref_store *refs;
+	const char *refname;
+	struct reftable_stack *stack;
+};
+
+static int write_reflog_existence_table(struct reftable_writer *writer,
+					void *cb_data)
+{
+	struct write_reflog_existence_arg *arg = cb_data;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_log_record log = {0};
+	int ret;
+
+	ret = reftable_stack_read_log(arg->stack, arg->refname, &log);
+	if (ret <= 0)
+		goto done;
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	/*
+	 * The existence entry has both old and new object ID set to the the
+	 * null object ID. Our iterators are aware of this and will not present
+	 * them to their callers.
+	 */
+	log.refname = xstrdup(arg->refname);
+	log.update_index = ts;
+	log.value_type = REFTABLE_LOG_UPDATE;
+	ret = reftable_writer_add_log(writer, &log);
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	reftable_log_record_release(&log);
+	return ret;
+}
+
+static int reftable_be_create_reflog(struct ref_store *ref_store,
+				     const char *refname,
+				     struct strbuf *errmsg)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "create_reflog");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct write_reflog_existence_arg arg = {
+		.refs = refs,
+		.stack = stack,
+		.refname = refname,
+	};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+
+	ret = reftable_stack_add(stack, &write_reflog_existence_table, &arg);
+
+done:
+	return ret;
+}
+
+struct write_reflog_delete_arg {
+	struct reftable_stack *stack;
+	const char *refname;
+};
+
+static int write_reflog_delete_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct write_reflog_delete_arg *arg = cb_data;
+	struct reftable_merged_table *mt =
+		reftable_stack_merged_table(arg->stack);
+	struct reftable_log_record log = {0}, tombstone = {0};
+	struct reftable_iterator it = {0};
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int ret;
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	/*
+	 * In order to delete a table we need to delete all reflog entries one
+	 * by one. This is inefficient, but the reftable format does not have a
+	 * better marker right now.
+	 */
+	ret = reftable_merged_table_seek_log(mt, &it, arg->refname);
+	while (ret == 0) {
+		ret = reftable_iterator_next_log(&it, &log);
+		if (ret < 0)
+			break;
+		if (ret > 0 || strcmp(log.refname, arg->refname)) {
+			ret = 0;
+			break;
+		}
+
+		tombstone.refname = (char *)arg->refname;
+		tombstone.value_type = REFTABLE_LOG_DELETION;
+		tombstone.update_index = log.update_index;
+
+		ret = reftable_writer_add_log(writer, &tombstone);
+	}
+
+	reftable_log_record_release(&log);
+	reftable_iterator_destroy(&it);
+	return ret;
+}
+
+static int reftable_be_delete_reflog(struct ref_store *ref_store,
+				     const char *refname)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "delete_reflog");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct write_reflog_delete_arg arg = {
+		.stack = stack,
+		.refname = refname,
+	};
+	int ret;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		return ret;
+	ret = reftable_stack_add(stack, &write_reflog_delete_table, &arg);
+
+	assert(ret != REFTABLE_API_ERROR);
+	return ret;
+}
+
+struct reflog_expiry_arg {
+	struct reftable_stack *stack;
+	struct reftable_log_record *records;
+	struct object_id update_oid;
+	const char *refname;
+	size_t len;
+};
+
+static int write_reflog_expiry_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct reflog_expiry_arg *arg = cb_data;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	uint64_t live_records = 0;
+	size_t i;
+	int ret;
+
+	for (i = 0; i < arg->len; i++)
+		if (arg->records[i].value_type == REFTABLE_LOG_UPDATE)
+			live_records++;
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	if (!is_null_oid(&arg->update_oid)) {
+		struct reftable_ref_record ref = {0};
+		struct object_id peeled;
+
+		ref.refname = (char *)arg->refname;
+		ref.update_index = ts;
+
+		if (!peel_object(&arg->update_oid, &peeled)) {
+			ref.value_type = REFTABLE_REF_VAL2;
+			memcpy(ref.value.val2.target_value, peeled.hash, GIT_MAX_RAWSZ);
+			memcpy(ref.value.val2.value, arg->update_oid.hash, GIT_MAX_RAWSZ);
+		} else {
+			ref.value_type = REFTABLE_REF_VAL1;
+			memcpy(ref.value.val1, arg->update_oid.hash, GIT_MAX_RAWSZ);
+		}
+
+		ret = reftable_writer_add_ref(writer, &ref);
+		if (ret < 0)
+			return ret;
+	}
+
+	/*
+	 * When there are no more entries left in the reflog we empty it
+	 * completely, but write a placeholder reflog entry that indicates that
+	 * the reflog still exists.
+	 */
+	if (!live_records) {
+		struct reftable_log_record log = {
+			.refname = (char *)arg->refname,
+			.value_type = REFTABLE_LOG_UPDATE,
+			.update_index = ts,
+		};
+
+		ret = reftable_writer_add_log(writer, &log);
+		if (ret)
+			return ret;
+	}
+
+	for (i = 0; i < arg->len; i++) {
+		ret = reftable_writer_add_log(writer, &arg->records[i]);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int reftable_be_reflog_expire(struct ref_store *ref_store,
+				     const char *refname,
+				     unsigned int flags,
+				     reflog_expiry_prepare_fn prepare_fn,
+				     reflog_expiry_should_prune_fn should_prune_fn,
+				     reflog_expiry_cleanup_fn cleanup_fn,
+				     void *policy_cb_data)
+{
+	/*
+	 * For log expiry, we write tombstones for every single reflog entry
+	 * that is to be expired. This means that the entries are still
+	 * retrievable by delving into the stack, and expiring entries
+	 * paradoxically takes extra memory. This memory is only reclaimed when
+	 * compacting the reftable stack.
+	 *
+	 * It would be better if the refs backend supported an API that sets a
+	 * criterion for all refs, passing the criterion to pack_refs().
+	 *
+	 * On the plus side, because we do the expiration per ref, we can easily
+	 * insert the reflog existence dummies.
+	 */
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "reflog_expire");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_merged_table *mt = reftable_stack_merged_table(stack);
+	struct reftable_log_record *logs = NULL;
+	struct reftable_log_record *rewritten = NULL;
+	struct reftable_ref_record ref_record = {0};
+	struct reftable_iterator it = {0};
+	struct reftable_addition *add = NULL;
+	struct reflog_expiry_arg arg = {0};
+	struct object_id oid = {0};
+	uint8_t *last_hash = NULL;
+	size_t logs_nr = 0, logs_alloc = 0, i;
+	int ret;
+
+	if (refs->err < 0)
+		return refs->err;
+
+	ret = reftable_stack_reload(stack);
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_new_addition(&add, stack);
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_read_ref(stack, refname, &ref_record);
+	if (ret < 0)
+		goto done;
+	if (reftable_ref_record_val1(&ref_record))
+		oidread(&oid, reftable_ref_record_val1(&ref_record));
+	prepare_fn(refname, &oid, policy_cb_data);
+
+	while (1) {
+		struct reftable_log_record log = {0};
+		struct object_id old_oid, new_oid;
+
+		ret = reftable_iterator_next_log(&it, &log);
+		if (ret < 0)
+			goto done;
+		if (ret > 0 || strcmp(log.refname, refname)) {
+			reftable_log_record_release(&log);
+			break;
+		}
+
+		oidread(&old_oid, log.value.update.old_hash);
+		oidread(&new_oid, log.value.update.new_hash);
+
+		/*
+		 * Skip over the reflog existence marker. We will add it back
+		 * in when there are no live reflog records.
+		 */
+		if (is_null_oid(&old_oid) && is_null_oid(&new_oid)) {
+			reftable_log_record_release(&log);
+			continue;
+		}
+
+		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+		logs[logs_nr++] = log;
+	}
+
+	/*
+	 * We need to rewrite all reflog entries according to the pruning
+	 * callback function:
+	 *
+	 *   - If a reflog entry shall be pruned we mark the record for
+	 *     deletion.
+	 *
+	 *   - Otherwise we may have to rewrite the chain of reflog entries so
+	 *     that gaps created by just-deleted records get backfilled.
+	 */
+	CALLOC_ARRAY(rewritten, logs_nr);
+	for (i = logs_nr; i--;) {
+		struct reftable_log_record *dest = &rewritten[i];
+		struct object_id old_oid, new_oid;
+
+		*dest = logs[i];
+		oidread(&old_oid, logs[i].value.update.old_hash);
+		oidread(&new_oid, logs[i].value.update.new_hash);
+
+		if (should_prune_fn(&old_oid, &new_oid, logs[i].value.update.email,
+				    (timestamp_t)logs[i].value.update.time,
+				    logs[i].value.update.tz_offset,
+				    logs[i].value.update.message,
+				    policy_cb_data)) {
+			dest->value_type = REFTABLE_LOG_DELETION;
+		} else {
+			if ((flags & EXPIRE_REFLOGS_REWRITE) && last_hash)
+				dest->value.update.old_hash = last_hash;
+			last_hash = logs[i].value.update.new_hash;
+		}
+	}
+
+	if (flags & EXPIRE_REFLOGS_UPDATE_REF && last_hash &&
+	    reftable_ref_record_val1(&ref_record))
+		oidread(&arg.update_oid, last_hash);
+
+	arg.records = rewritten;
+	arg.len = logs_nr;
+	arg.stack = stack,
+	arg.refname = refname,
+
+	ret = reftable_addition_add(add, &write_reflog_expiry_table, &arg);
+	if (ret < 0)
+		goto done;
+
+	/*
+	 * Future improvement: we could skip writing records that were
+	 * not changed.
+	 */
+	if (!(flags & EXPIRE_REFLOGS_DRY_RUN))
+		ret = reftable_addition_commit(add);
+
+done:
+	if (add)
+		cleanup_fn(policy_cb_data);
+	assert(ret != REFTABLE_API_ERROR);
+
+	reftable_ref_record_release(&ref_record);
+	reftable_iterator_destroy(&it);
+	reftable_addition_destroy(add);
+	for (i = 0; i < logs_nr; i++)
+		reftable_log_record_release(&logs[i]);
+	free(logs);
+	free(rewritten);
+	return ret;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	.name = "reftable",
+	.init = reftable_be_init,
+	.init_db = reftable_be_init_db,
+	.transaction_prepare = reftable_be_transaction_prepare,
+	.transaction_finish = reftable_be_transaction_finish,
+	.transaction_abort = reftable_be_transaction_abort,
+	.initial_transaction_commit = reftable_be_initial_transaction_commit,
+
+	.pack_refs = reftable_be_pack_refs,
+	.create_symref = reftable_be_create_symref,
+	.rename_ref = reftable_be_rename_ref,
+	.copy_ref = reftable_be_copy_ref,
+
+	.iterator_begin = reftable_be_iterator_begin,
+	.read_raw_ref = reftable_be_read_raw_ref,
+	.read_symbolic_ref = reftable_be_read_symbolic_ref,
+
+	.reflog_iterator_begin = reftable_be_reflog_iterator_begin,
+	.for_each_reflog_ent = reftable_be_for_each_reflog_ent,
+	.for_each_reflog_ent_reverse = reftable_be_for_each_reflog_ent_reverse,
+	.reflog_exists = reftable_be_reflog_exists,
+	.create_reflog = reftable_be_create_reflog,
+	.delete_reflog = reftable_be_delete_reflog,
+	.reflog_expire = reftable_be_reflog_expire,
+};
diff --git a/repository.h b/repository.h
index 7a250a6605..1645cef518 100644
--- a/repository.h
+++ b/repository.h
@@ -24,8 +24,9 @@ enum fetch_negotiation_setting {
 	FETCH_NEGOTIATION_NOOP,
 };
 
-#define REF_STORAGE_FORMAT_UNKNOWN 0
-#define REF_STORAGE_FORMAT_FILES   1
+#define REF_STORAGE_FORMAT_UNKNOWN  0
+#define REF_STORAGE_FORMAT_FILES    1
+#define REF_STORAGE_FORMAT_REFTABLE 2
 
 struct repo_settings {
 	int initialized;
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
new file mode 100755
index 0000000000..94ee46984a
--- /dev/null
+++ b/t/t0610-reftable-basics.sh
@@ -0,0 +1,887 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='reftable basics'
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+if ! test_have_prereq REFTABLE
+then
+	skip_all='skipping reftable tests; set GIT_TEST_DEFAULT_REF_FORMAT=reftable'
+	test_done
+fi
+
+INVALID_OID=$(test_oid 001)
+
+test_expect_success 'init: creates basic reftable structures' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_path_is_dir repo/.git/reftable &&
+	test_path_is_file repo/.git/reftable/tables.list &&
+	echo reftable >expect &&
+	git -C repo rev-parse --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'init: sha256 object format via environment variable' '
+	test_when_finished "rm -rf repo" &&
+	GIT_DEFAULT_HASH=sha256 git init repo &&
+	cat >expect <<-EOF &&
+	sha256
+	reftable
+	EOF
+	git -C repo rev-parse --show-object-format --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'init: sha256 object format via option' '
+	test_when_finished "rm -rf repo" &&
+	git init --object-format=sha256 repo &&
+	cat >expect <<-EOF &&
+	sha256
+	reftable
+	EOF
+	git -C repo rev-parse --show-object-format --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'init: reinitializing reftable backend succeeds' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo for-each-ref >expect &&
+	git init --ref-format=reftable repo &&
+	git -C repo for-each-ref >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'init: reinitializing files with reftable backend fails' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	test_commit -C repo file &&
+
+	cp repo/.git/HEAD expect &&
+	test_must_fail git init --ref-format=reftable repo &&
+	test_cmp expect repo/.git/HEAD
+'
+
+test_expect_success 'init: reinitializing reftable with files backend fails' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=reftable repo &&
+	test_commit -C repo file &&
+
+	cp repo/.git/HEAD expect &&
+	test_must_fail git init --ref-format=files repo &&
+	test_cmp expect repo/.git/HEAD
+'
+
+test_expect_perms () {
+	local perms="$1"
+	local file="$2"
+	local actual=$(ls -l "$file") &&
+
+	case "$actual" in
+	$perms*)
+		: happy
+		;;
+	*)
+		echo "$(basename $2) is not $perms but $actual"
+		false
+		;;
+	esac
+}
+
+for umask in 002 022
+do
+	test_expect_success POSIXPERM 'init: honors core.sharedRepository' '
+		test_when_finished "rm -rf repo" &&
+		(
+			umask $umask &&
+			git init --shared=true repo &&
+			test 1 = "$(git -C repo config core.sharedrepository)"
+		) &&
+		test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list &&
+		for table in repo/.git/reftable/*.ref
+		do
+			test_expect_perms "-rw-rw-r--" "$table" ||
+			return 1
+		done
+	'
+done
+
+test_expect_success 'clone: can clone reftable repository' '
+	test_when_finished "rm -rf repo clone" &&
+	git init repo &&
+	test_commit -C repo message1 file1 &&
+
+	git clone repo cloned &&
+	echo reftable >expect &&
+	git -C cloned rev-parse --show-ref-format >actual &&
+	test_cmp expect actual &&
+	test_path_is_file cloned/file1
+'
+
+test_expect_success 'clone: can clone reffiles into reftable repository' '
+	test_when_finished "rm -rf reffiles reftable" &&
+	git init --ref-format=files reffiles &&
+	test_commit -C reffiles A &&
+	git clone --ref-format=reftable ./reffiles reftable &&
+
+	git -C reffiles rev-parse HEAD >expect &&
+	git -C reftable rev-parse HEAD >actual &&
+	test_cmp expect actual &&
+
+	git -C reftable rev-parse --show-ref-format >actual &&
+	echo reftable >expect &&
+	test_cmp expect actual &&
+
+	git -C reffiles rev-parse --show-ref-format >actual &&
+	echo files >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone: can clone reftable into reffiles repository' '
+	test_when_finished "rm -rf reffiles reftable" &&
+	git init --ref-format=reftable reftable &&
+	test_commit -C reftable A &&
+	git clone --ref-format=files ./reftable reffiles &&
+
+	git -C reftable rev-parse HEAD >expect &&
+	git -C reffiles rev-parse HEAD >actual &&
+	test_cmp expect actual &&
+
+	git -C reftable rev-parse --show-ref-format >actual &&
+	echo reftable >expect &&
+	test_cmp expect actual &&
+
+	git -C reffiles rev-parse --show-ref-format >actual &&
+	echo files >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'ref transaction: corrupted tables cause failure' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file1 &&
+		for f in .git/reftable/*.ref
+		do
+			: >"$f" || return 1
+		done &&
+		test_must_fail git update-ref refs/heads/main HEAD
+	)
+'
+
+test_expect_success 'ref transaction: corrupted tables.list cause failure' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file1 &&
+		echo garbage >.git/reftable/tables.list &&
+		test_must_fail git update-ref refs/heads/main HEAD
+	)
+'
+
+test_expect_success 'ref transaction: refuses to write ref causing F/D conflict' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo file &&
+	test_must_fail git -C repo update-ref refs/heads/main/forbidden
+'
+
+test_expect_success 'ref transaction: deleting ref with invalid name fails' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo file &&
+	test_must_fail git -C repo update-ref -d ../../my-private-file
+'
+
+test_expect_success 'ref transaction: can skip object ID verification' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_must_fail test-tool -C repo ref-store main update-ref msg refs/heads/branch $INVALID_OID $ZERO_OID 0 &&
+	test-tool -C repo ref-store main update-ref msg refs/heads/branch $INVALID_OID $ZERO_OID REF_SKIP_OID_VERIFICATION
+'
+
+test_expect_success 'ref transaction: updating same ref multiple times fails' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo A &&
+	cat >updates <<-EOF &&
+	update refs/heads/main $A
+	update refs/heads/main $A
+	EOF
+	cat >expect <<-EOF &&
+	fatal: multiple updates for ref ${SQ}refs/heads/main${SQ} not allowed
+	EOF
+	test_must_fail git -C repo update-ref --stdin <updates 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success 'ref transaction: can delete symbolic self-reference with git-symbolic-ref(1)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	git -C repo symbolic-ref refs/heads/self refs/heads/self &&
+	git -C repo symbolic-ref -d refs/heads/self
+'
+
+test_expect_success 'ref transaction: deleting symbolic self-reference without --no-deref fails' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	git -C repo symbolic-ref refs/heads/self refs/heads/self &&
+	cat >expect <<-EOF &&
+	error: multiple updates for ${SQ}refs/heads/self${SQ} (including one via symref ${SQ}refs/heads/self${SQ}) are not allowed
+	EOF
+	test_must_fail git -C repo update-ref -d refs/heads/self 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success 'ref transaction: deleting symbolic self-reference with --no-deref succeeds' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	git -C repo symbolic-ref refs/heads/self refs/heads/self &&
+	git -C repo update-ref -d --no-deref refs/heads/self
+'
+
+test_expect_success 'ref transaction: creating symbolic ref fails with F/D conflict' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo A &&
+	cat >expect <<-EOF &&
+	error: unable to write symref for refs/heads: file/directory conflict
+	EOF
+	test_must_fail git -C repo symbolic-ref refs/heads refs/heads/foo 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success 'ref transaction: ref deletion' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file &&
+		HEAD_OID=$(git show-ref -s --verify HEAD) &&
+		cat >expect <<-EOF &&
+		$HEAD_OID refs/heads/main
+		$HEAD_OID refs/tags/file
+		EOF
+		git show-ref >actual &&
+		test_cmp expect actual &&
+
+		test_must_fail git update-ref -d refs/tags/file $INVALID_OID &&
+		git show-ref >actual &&
+		test_cmp expect actual &&
+
+		git update-ref -d refs/tags/file $HEAD_OID &&
+		echo "$HEAD_OID refs/heads/main" >expect &&
+		git show-ref >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'ref transaction: writes cause auto-compaction' '
+	test_when_finished "rm -rf repo" &&
+
+	git init repo &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	test_commit -C repo --no-tag A &&
+	test_line_count = 2 repo/.git/reftable/tables.list &&
+
+	test_commit -C repo --no-tag B &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+check_fsync_events () {
+	local trace="$1" &&
+	shift &&
+
+	cat >expect &&
+	sed -n \
+		-e '/^{"event":"counter",.*"category":"fsync",/ {
+			s/.*"category":"fsync",//;
+			s/}$//;
+			p;
+		}' \
+		<"$trace" >actual &&
+	test_cmp expect actual
+}
+
+# A fix for this is in flight via jc/reftable-core-fsync.
+test_expect_failure 'ref transaction: writes are synced' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo initial &&
+
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+	GIT_TEST_FSYNC=true \
+		git -C repo -c core.fsync=reference update-ref refs/heads/branch HEAD &&
+	check_fsync_events trace2.txt <<-EOF
+	"name":"hardware-flush","count":2
+	EOF
+'
+
+test_expect_success 'pack-refs: compacts tables' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+
+	test_commit -C repo A &&
+	ls -1 repo/.git/reftable >table-files &&
+	test_line_count = 4 table-files &&
+	test_line_count = 3 repo/.git/reftable/tables.list &&
+
+	git -C repo pack-refs &&
+	ls -1 repo/.git/reftable >table-files &&
+	test_line_count = 2 table-files &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'pack-refs: prunes stale tables' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	touch repo/.git/reftable/stale-table.ref &&
+	git -C repo pack-refs &&
+	test_path_is_missing repo/.git/reftable/stable-ref.ref
+'
+
+test_expect_success 'pack-refs: does not prune non-table files' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	touch repo/.git/reftable/garbage &&
+	git -C repo pack-refs &&
+	test_path_is_file repo/.git/reftable/garbage
+'
+
+for umask in 002 022
+do
+	# A fix for this is in flight via ps/reftable-compacted-tables-permission-fix.
+	test_expect_failure POSIXPERM 'pack-refs: honors core.sharedRepository' '
+		test_when_finished "rm -rf repo" &&
+		(
+			umask $umask &&
+			git init --shared=true repo &&
+			test_commit -C repo A &&
+			test_line_count = 3 repo/.git/reftable/tables.list
+		) &&
+		git -C repo pack-refs &&
+		test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list &&
+		for table in repo/.git/reftable/*.ref
+		do
+			test_expect_perms "-rw-rw-r--" "$table" ||
+			return 1
+		done
+	'
+done
+
+# A fix for this is in flight.
+test_expect_failure 'packed-refs: writes are synced' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo initial &&
+	test_line_count = 2 table-files &&
+
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+	GIT_TEST_FSYNC=true \
+		git -C repo -c core.fsync=reference pack-refs &&
+	check_fsync_events trace2.txt <<-EOF
+	"name":"hardware-flush","count":2
+	EOF
+'
+
+test_expect_success 'ref iterator: bogus names are flagged' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit --no-tag file &&
+		test-tool ref-store main update-ref msg "refs/heads/bogus..name" $(git rev-parse HEAD) $ZERO_OID REF_SKIP_REFNAME_VERIFICATION &&
+
+		cat >expect <<-EOF &&
+		$ZERO_OID refs/heads/bogus..name 0xc
+		$(git rev-parse HEAD) refs/heads/main 0x0
+		EOF
+		test-tool ref-store main for-each-ref "" >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'ref iterator: missing object IDs are not flagged' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test-tool ref-store main update-ref msg "refs/heads/broken-hash" $INVALID_OID $ZERO_OID REF_SKIP_OID_VERIFICATION &&
+
+		cat >expect <<-EOF &&
+		$INVALID_OID refs/heads/broken-hash 0x0
+		EOF
+		test-tool ref-store main for-each-ref "" >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'basic: commit and list refs' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo file &&
+	test_write_lines refs/heads/main refs/tags/file >expect &&
+	git -C repo for-each-ref --format="%(refname)" >actual &&
+	test_cmp actual expect
+'
+
+test_expect_success 'basic: can write large commit message' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	perl -e "
+		print \"this is a long commit message\" x 50000
+	" >commit-msg &&
+	git -C repo commit --allow-empty --file=../commit-msg
+'
+
+test_expect_success 'basic: show-ref fails with empty repository' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_must_fail git -C repo show-ref >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success 'basic: can check out unborn branch' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	git -C repo checkout -b main
+'
+
+test_expect_success 'basic: peeled tags are stored' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo file &&
+	git -C repo tag -m "annotated tag" test_tag HEAD &&
+	for ref in refs/heads/main refs/tags/file refs/tags/test_tag refs/tags/test_tag^{}
+	do
+		echo "$(git -C repo rev-parse "$ref") $ref" || return 1
+	done >expect &&
+	git -C repo show-ref -d >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'basic: for-each-ref can print symrefs' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file &&
+		git branch &&
+		git symbolic-ref refs/heads/sym refs/heads/main &&
+		cat >expected <<-EOF &&
+		refs/heads/main
+		EOF
+		git for-each-ref --format="%(symref)" refs/heads/sym >actual &&
+		test_cmp expected actual
+	)
+'
+
+test_expect_success 'basic: notes' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		write_script fake_editor <<-\EOF &&
+		echo "$MSG" >"$1"
+		echo "$MSG" >&2
+		EOF
+
+		test_commit 1st &&
+		test_commit 2nd &&
+		GIT_EDITOR=./fake_editor MSG=b4 git notes add &&
+		GIT_EDITOR=./fake_editor MSG=b3 git notes edit &&
+		echo b4 >expect &&
+		git notes --ref commits@{1} show >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'basic: stash' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file &&
+		git stash list >expect &&
+		test_line_count = 0 expect &&
+
+		echo hoi >>file.t &&
+		git stash push -m stashed &&
+		git stash list >expect &&
+		test_line_count = 1 expect &&
+
+		git stash clear &&
+		git stash list >expect &&
+		test_line_count = 0 expect
+	)
+'
+
+test_expect_success 'basic: cherry-pick' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit message1 file1 &&
+		test_commit message2 file2 &&
+		git branch source &&
+		git checkout HEAD^ &&
+		test_commit message3 file3 &&
+		git cherry-pick source &&
+		test_path_is_file file2
+	)
+'
+
+test_expect_success 'basic: rebase' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit message1 file1 &&
+		test_commit message2 file2 &&
+		git branch source &&
+		git checkout HEAD^ &&
+		test_commit message3 file3 &&
+		git rebase source &&
+		test_path_is_file file2
+	)
+'
+
+test_expect_success 'reflog: can delete separate reflog entries' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_commit file &&
+		test_commit file2 &&
+		test_commit file3 &&
+		test_commit file4 &&
+		git reflog >actual &&
+		grep file3 actual &&
+
+		git reflog delete HEAD@{1} &&
+		git reflog >actual &&
+		! grep file3 actual
+	)
+'
+
+test_expect_success 'reflog: can switch to previous branch' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file1 &&
+		git checkout -b branch1 &&
+		test_commit file2 &&
+		git checkout -b branch2 &&
+		git switch - &&
+		git rev-parse --symbolic-full-name HEAD >actual &&
+		echo refs/heads/branch1 >expect &&
+		test_cmp actual expect
+	)
+'
+
+test_expect_success 'reflog: copying branch writes reflog entry' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file1 &&
+		test_commit file2 &&
+		oid=$(git rev-parse --short HEAD) &&
+		git branch src &&
+		cat >expect <<-EOF &&
+		${oid} dst@{0}: Branch: copied refs/heads/src to refs/heads/dst
+		${oid} dst@{1}: branch: Created from main
+		EOF
+		git branch -c src dst &&
+		git reflog dst >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'reflog: renaming branch writes reflog entry' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git symbolic-ref HEAD refs/heads/before &&
+		test_commit file &&
+		git show-ref >expected.refs &&
+		sed s/before/after/g <expected.refs >expected &&
+		git branch -M after &&
+		git show-ref >actual &&
+		test_cmp expected actual &&
+		echo refs/heads/after >expected &&
+		git symbolic-ref HEAD >actual &&
+		test_cmp expected actual
+	)
+'
+
+test_expect_success 'reflog: can store empty logs' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_must_fail test-tool ref-store main reflog-exists refs/heads/branch &&
+		test-tool ref-store main create-reflog refs/heads/branch &&
+		test-tool ref-store main reflog-exists refs/heads/branch &&
+		test-tool ref-store main for-each-reflog-ent-reverse refs/heads/branch >actual &&
+		test_must_be_empty actual
+	)
+'
+
+test_expect_success 'reflog: expiry empties reflog' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_commit initial &&
+		git checkout -b branch &&
+		test_commit fileA &&
+		test_commit fileB &&
+
+		cat >expect <<-EOF &&
+		commit: fileB
+		commit: fileA
+		branch: Created from HEAD
+		EOF
+		git reflog show --format="%gs" refs/heads/branch >actual &&
+		test_cmp expect actual &&
+
+		git reflog expire branch --expire=all &&
+		git reflog show --format="%gs" refs/heads/branch >actual &&
+		test_must_be_empty actual &&
+		test-tool ref-store main reflog-exists refs/heads/branch
+	)
+'
+
+test_expect_success 'reflog: can be deleted' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		test-tool ref-store main reflog-exists refs/heads/main &&
+		test-tool ref-store main delete-reflog refs/heads/main &&
+		test_must_fail test-tool ref-store main reflog-exists refs/heads/main
+	)
+'
+
+test_expect_success 'reflog: garbage collection deletes reflog entries' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		for count in $(test_seq 1 10)
+		do
+			test_commit "number $count" file.t $count number-$count ||
+			return 1
+		done &&
+		git reflog refs/heads/main >actual &&
+		test_line_count = 10 actual &&
+		grep "commit (initial): number 1" actual &&
+		grep "commit: number 10" actual &&
+
+		git gc &&
+		git reflog refs/heads/main >actual &&
+		test_line_count = 0 actual
+	)
+'
+
+test_expect_success 'reflog: updates via HEAD update HEAD reflog' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit main-one &&
+		git checkout -b new-branch &&
+		test_commit new-one &&
+		test_commit new-two &&
+
+		echo new-one >expect &&
+		git log -1 --format=%s HEAD@{1} >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'worktree: adding worktree creates separate stack' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	test_path_is_file repo/.git/worktrees/worktree/refs/heads &&
+	echo "ref: refs/heads/.invalid" >expect &&
+	test_cmp expect repo/.git/worktrees/worktree/HEAD &&
+	test_path_is_dir repo/.git/worktrees/worktree/reftable &&
+	test_path_is_file repo/.git/worktrees/worktree/reftable/tables.list
+'
+
+test_expect_success 'worktree: pack-refs in main repo packs main refs' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+	git -C repo worktree add ../worktree &&
+
+	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 4 repo/.git/reftable/tables.list &&
+	git -C repo pack-refs &&
+	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: pack-refs in worktree packs worktree refs' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+	git -C repo worktree add ../worktree &&
+
+	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 4 repo/.git/reftable/tables.list &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 4 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: creating shared ref updates main stack' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	git -C repo pack-refs &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	git -C worktree update-ref refs/heads/shared HEAD &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 2 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: creating per-worktree ref updates worktree stack' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	git -C repo pack-refs &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	git -C worktree update-ref refs/bisect/per-worktree HEAD &&
+	test_line_count = 2 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: creating per-worktree ref from main repo' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	git -C repo pack-refs &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	git -C repo update-ref worktrees/worktree/refs/bisect/per-worktree HEAD &&
+	test_line_count = 2 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: creating per-worktree ref from second worktree' '
+	test_when_finished "rm -rf repo wt1 wt2" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../wt1 &&
+	git -C repo worktree add ../wt2 &&
+	git -C repo pack-refs &&
+	git -C wt1 pack-refs &&
+	git -C wt2 pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/wt1/reftable/tables.list &&
+	test_line_count = 1 repo/.git/worktrees/wt2/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	git -C wt1 update-ref worktrees/wt2/refs/bisect/per-worktree HEAD &&
+	test_line_count = 1 repo/.git/worktrees/wt1/reftable/tables.list &&
+	test_line_count = 2 repo/.git/worktrees/wt2/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: can create shared and per-worktree ref in one transaction' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	git -C repo pack-refs &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	cat >stdin <<-EOF &&
+	create worktrees/worktree/refs/bisect/per-worktree HEAD
+	create refs/branches/shared HEAD
+	EOF
+	git -C repo update-ref --stdin <stdin &&
+	test_line_count = 2 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 2 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: can access common refs' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo file1 &&
+	git -C repo branch branch1 &&
+	git -C repo worktree add ../worktree &&
+
+	echo refs/heads/worktree >expect &&
+	git -C worktree symbolic-ref HEAD >actual &&
+	test_cmp expect actual &&
+	git -C worktree checkout branch1
+'
+
+test_expect_success 'worktree: adds worktree with detached HEAD' '
+	test_when_finished "rm -rf repo worktree" &&
+
+	git init repo &&
+	test_commit -C repo A &&
+	git -C repo rev-parse main >expect &&
+
+	git -C repo worktree add --detach ../worktree main &&
+	git -C worktree rev-parse HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'fetch: accessing FETCH_HEAD special ref works' '
+	test_when_finished "rm -rf repo sub" &&
+
+	git init sub &&
+	test_commit -C sub two &&
+	git -C sub rev-parse HEAD >expect &&
+
+	git init repo &&
+	test_commit -C repo one &&
+	git -C repo fetch ../sub &&
+	git -C repo rev-parse FETCH_HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_done
diff --git a/t/t0611-reftable-httpd.sh b/t/t0611-reftable-httpd.sh
new file mode 100755
index 0000000000..5e05b9c1f2
--- /dev/null
+++ b/t/t0611-reftable-httpd.sh
@@ -0,0 +1,26 @@
+#!/bin/sh
+
+test_description='reftable HTTPD tests'
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd
+
+REPO="$HTTPD_DOCUMENT_ROOT_PATH/repo"
+
+test_expect_success 'serving ls-remote' '
+	git init --ref-format=reftable -b main "$REPO" &&
+	cd "$REPO" &&
+	test_commit m1 &&
+	>.git/git-daemon-export-ok &&
+	git ls-remote "http://127.0.0.1:$LIB_HTTPD_PORT/smart/repo" | cut -f 2-2 -d "	" >actual &&
+	cat >expect <<-EOF &&
+	HEAD
+	refs/heads/main
+	refs/tags/m1
+	EOF
+	test_cmp actual expect
+'
+
+test_done
diff --git a/t/test-lib.sh b/t/test-lib.sh
index fc93aa57e6..0fb8757b7d 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1750,6 +1750,8 @@ esac
 case "$GIT_DEFAULT_REF_FORMAT" in
 files)
 	test_set_prereq REFFILES;;
+reftable)
+	test_set_prereq REFTABLE;;
 *)
 	echo 2>&1 "error: unknown ref format $GIT_DEFAULT_REF_FORMAT"
 	exit 1
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 2/2] ci: add jobs to test with the reftable backend
  2024-02-02  8:38 ` [PATCH v2 " Patrick Steinhardt
  2024-02-02  8:38   ` [PATCH v2 1/2] " Patrick Steinhardt
@ 2024-02-02  8:38   ` Patrick Steinhardt
  2024-02-02 13:02   ` [PATCH v2 0/2] refs: introduce " Karthik Nayak
  2024-02-03 20:41   ` Junio C Hamano
  3 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2024-02-02  8:38 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, Han-Wen Nienhuys, Karthik Nayak

[-- Attachment #1: Type: text/plain, Size: 3294 bytes --]

Add CI jobs for both GitHub Workflows and GitLab CI to run Git with the
new reftable backend.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 .github/workflows/main.yml | 9 +++++++++
 .gitlab-ci.yml             | 9 +++++++++
 ci/lib.sh                  | 2 +-
 ci/run-build-and-tests.sh  | 3 +++
 4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 4d97da57ec..1b43e49dad 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -266,6 +266,9 @@ jobs:
           - jobname: linux-sha256
             cc: clang
             pool: ubuntu-latest
+          - jobname: linux-reftable
+            cc: clang
+            pool: ubuntu-latest
           - jobname: linux-gcc
             cc: gcc
             cc_package: gcc-8
@@ -277,6 +280,9 @@ jobs:
           - jobname: osx-clang
             cc: clang
             pool: macos-13
+          - jobname: osx-reftable
+            cc: clang
+            pool: macos-13
           - jobname: osx-gcc
             cc: gcc
             cc_package: gcc-13
@@ -287,6 +293,9 @@ jobs:
           - jobname: linux-leaks
             cc: gcc
             pool: ubuntu-latest
+          - jobname: linux-reftable-leaks
+            cc: gcc
+            pool: ubuntu-latest
           - jobname: linux-asan-ubsan
             cc: clang
             pool: ubuntu-latest
diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 43bfbd8834..c0fa2fe90b 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -26,6 +26,9 @@ test:linux:
       - jobname: linux-sha256
         image: ubuntu:latest
         CC: clang
+      - jobname: linux-reftable
+        image: ubuntu:latest
+        CC: clang
       - jobname: linux-gcc
         image: ubuntu:20.04
         CC: gcc
@@ -40,6 +43,9 @@ test:linux:
       - jobname: linux-leaks
         image: ubuntu:latest
         CC: gcc
+      - jobname: linux-reftable-leaks
+        image: ubuntu:latest
+        CC: gcc
       - jobname: linux-asan-ubsan
         image: ubuntu:latest
         CC: clang
@@ -79,6 +85,9 @@ test:osx:
       - jobname: osx-clang
         image: macos-13-xcode-14
         CC: clang
+      - jobname: osx-reftable
+        image: macos-13-xcode-14
+        CC: clang
   artifacts:
     paths:
       - t/failed-test-artifacts
diff --git a/ci/lib.sh b/ci/lib.sh
index d5dd2f2697..0a73fc7bd1 100755
--- a/ci/lib.sh
+++ b/ci/lib.sh
@@ -367,7 +367,7 @@ linux-musl)
 	MAKEFLAGS="$MAKEFLAGS NO_REGEX=Yes ICONV_OMITS_BOM=Yes"
 	MAKEFLAGS="$MAKEFLAGS GIT_TEST_UTF8_LOCALE=C.UTF-8"
 	;;
-linux-leaks)
+linux-leaks|linux-reftable-leaks)
 	export SANITIZE=leak
 	export GIT_TEST_PASSING_SANITIZE_LEAK=true
 	export GIT_TEST_SANITIZE_LEAK_LOG=true
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index 7a1466b868..c192bd613c 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -37,6 +37,9 @@ linux-clang)
 linux-sha256)
 	export GIT_TEST_DEFAULT_HASH=sha256
 	;;
+linux-reftable|linux-reftable-leaks|osx-reftable)
+	export GIT_TEST_DEFAULT_REF_FORMAT=reftable
+	;;
 pedantic)
 	# Don't run the tests; we only care about whether Git can be
 	# built.
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/2] refs: introduce reftable backend
  2024-02-02  8:30     ` Patrick Steinhardt
@ 2024-02-02 10:52       ` Patrick Steinhardt
  0 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2024-02-02 10:52 UTC (permalink / raw
  To: Karthik Nayak; +Cc: git, Han-Wen Nienhuys

[-- Attachment #1: Type: text/plain, Size: 1935 bytes --]

On Fri, Feb 02, 2024 at 09:30:39AM +0100, Patrick Steinhardt wrote:
> On Thu, Feb 01, 2024 at 07:17:45AM -0800, Karthik Nayak wrote:
> > > +		iter->err = reftable_iterator_next_ref(&iter->iter, &iter->ref);
> > > +		if (iter->err)
> > > +			break;
> > > +
> > > +		/*
> > > +		 * The files backend only lists references contained in
> > > +		 * "refs/". We emulate the same behaviour here and thus skip
> > > +		 * all references that don't start with this prefix.
> > > +		 */
> > > +		if (!starts_with(iter->ref.refname, "refs/"))
> > > +			continue;
> > > +
> > 
> > Since my patch series [1] to print all refs is now merged to next, maybe
> > you could add this in?
> > 
> >     diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> >     index 895de0b273..3f4f905292 100644
> >     --- a/refs/reftable-backend.c
> >     +++ b/refs/reftable-backend.c
> >     @@ -348,11 +348,10 @@ static int
> > reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
> >      			break;
> > 
> >      		/*
> >     -		 * The files backend only lists references contained in
> >     -		 * "refs/". We emulate the same behaviour here and thus skip
> >     -		 * all references that don't start with this prefix.
> >     +		 * Unless the `DO_FOR_EACH_INCLUDE_ALL_REFS` flag is use, we only
> >     +		 * list references contained in "refs/" to mimic the file-backend.
> >      		 */
> >     -		if (!starts_with(iter->ref.refname, "refs/"))
> >     +		if (!(iter->flags & DO_FOR_EACH_INCLUDE_ALL_REFS) &&
> > !starts_with(iter->ref.refname, "refs/"))
> >      			continue;
> > 
> >      		if (iter->prefix &&
> 
> I'm still waiting for your patch series to hit `next` before making it a
> dependency of this series, but thanks for the patch!

Oops, didn't read this carefully enough and missed that it already is in
`next`. I'll adapt accordingly in the next version, thanks!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] refs: introduce reftable backend
  2024-02-02  8:38 ` [PATCH v2 " Patrick Steinhardt
  2024-02-02  8:38   ` [PATCH v2 1/2] " Patrick Steinhardt
  2024-02-02  8:38   ` [PATCH v2 2/2] ci: add jobs to test with the " Patrick Steinhardt
@ 2024-02-02 13:02   ` Karthik Nayak
  2024-02-03 20:41   ` Junio C Hamano
  3 siblings, 0 replies; 23+ messages in thread
From: Karthik Nayak @ 2024-02-02 13:02 UTC (permalink / raw
  To: Patrick Steinhardt, git; +Cc: Junio C Hamano, Han-Wen Nienhuys

[-- Attachment #1: Type: text/plain, Size: 279 bytes --]

Hello,

Patrick Steinhardt <ps@pks.im> writes:
> Hi,
>
> this is the second version of my patch series that introduces the
> reftable backend.
>

The diff looks good to, apart from the tests (which I didn't get around
to reviewing), everything else looks great!

Thanks,
Karthik

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] refs: introduce reftable backend
  2024-02-02  8:38 ` [PATCH v2 " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2024-02-02 13:02   ` [PATCH v2 0/2] refs: introduce " Karthik Nayak
@ 2024-02-03 20:41   ` Junio C Hamano
  2024-02-04  6:00     ` Patrick Steinhardt
  3 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2024-02-03 20:41 UTC (permalink / raw
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys, Karthik Nayak

Patrick Steinhardt <ps@pks.im> writes:

> this is the second version of my patch series that introduces the
> reftable backend.
>
> This version addresses the feedback by Karthik. I don't think it really
> makes sense to spell out every change here -- the range diff should
> likely be easier to digest.

This, when merged to 'seen' (which also has "for-each-ref now thinks
an empty string is a signal to show ref-like things outside the
refs/ hierarchy" topic, kn/for-all-refs), seems to break *-reftable
CI tests, cf.  https://github.com/git/git/actions/runs/7765401528

I'll tentatively eject the topic from 'seen', even though I have a
suspicion that it byitself would pass all the tests.

Thanks.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] refs: introduce reftable backend
  2024-02-03 20:41   ` Junio C Hamano
@ 2024-02-04  6:00     ` Patrick Steinhardt
  0 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2024-02-04  6:00 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, Han-Wen Nienhuys, Karthik Nayak

[-- Attachment #1: Type: text/plain, Size: 1115 bytes --]

On Sat, Feb 03, 2024 at 12:41:01PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > this is the second version of my patch series that introduces the
> > reftable backend.
> >
> > This version addresses the feedback by Karthik. I don't think it really
> > makes sense to spell out every change here -- the range diff should
> > likely be easier to digest.
> 
> This, when merged to 'seen' (which also has "for-each-ref now thinks
> an empty string is a signal to show ref-like things outside the
> refs/ hierarchy" topic, kn/for-all-refs), seems to break *-reftable
> CI tests, cf.  https://github.com/git/git/actions/runs/7765401528
> 
> I'll tentatively eject the topic from 'seen', even though I have a
> suspicion that it byitself would pass all the tests.
> 
> Thanks.

Yup, this is known due to the semantic conflict with kn/for-all-refs, as
you point out. Karthik already sent a single-line change in this thread
that fixes the issue. I'll rebase the patch series tomorrow and pull in
the topic as a dependency and adapt accordingly.

Thanks!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 0/2] refs: introduce reftable backend
  2024-01-30  8:05 [PATCH 0/2] refs: introduce reftable backend Patrick Steinhardt
                   ` (3 preceding siblings ...)
  2024-02-02  8:38 ` [PATCH v2 " Patrick Steinhardt
@ 2024-02-05  6:02 ` Patrick Steinhardt
  2024-02-05  6:02   ` [PATCH v3 1/2] " Patrick Steinhardt
                     ` (2 more replies)
  2024-02-07  7:20 ` [PATCH v4 " Patrick Steinhardt
  5 siblings, 3 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2024-02-05  6:02 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, Han-Wen Nienhuys, Karthik Nayak

[-- Attachment #1: Type: text/plain, Size: 3034 bytes --]

Hi,

this is the third version of my patch series that introduces the
reftable backend.

The only real change in this version is that I've pulled in
kn/for-all-refs at 693e807311 (for-each-ref: avoid filtering on empty
pattern, 2024-01-29) as a dependency. This series introduces a new
DO_FOR_EACH_INCLUDE_ALL_REFS flag that we need to handle in the reftable
backend. With this change all tests should pass again.

Patrick

Patrick Steinhardt (2):
  refs: introduce reftable backend
  ci: add jobs to test with the reftable backend

 .github/workflows/main.yml                    |    9 +
 .gitlab-ci.yml                                |    9 +
 Documentation/ref-storage-format.txt          |    2 +
 .../technical/repository-version.txt          |    5 +-
 Makefile                                      |    1 +
 ci/lib.sh                                     |    2 +-
 ci/run-build-and-tests.sh                     |    3 +
 contrib/workdir/git-new-workdir               |    2 +-
 path.c                                        |    2 +-
 path.h                                        |    1 +
 refs.c                                        |    1 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 2297 +++++++++++++++++
 repository.h                                  |    5 +-
 t/t0610-reftable-basics.sh                    |  887 +++++++
 t/t0611-reftable-httpd.sh                     |   26 +
 t/test-lib.sh                                 |    2 +
 17 files changed, 3248 insertions(+), 7 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0610-reftable-basics.sh
 create mode 100755 t/t0611-reftable-httpd.sh

Range-diff against v2:
1:  d6548dcfad ! 1:  d83e66e980 refs: introduce reftable backend
    @@ refs/reftable-backend.c (new)
     +{
     +	switch (log->value_type) {
     +	case REFTABLE_LOG_UPDATE:
    -+		/* when we write log records, the hashes are owned by a struct
    -+		 * oid */
    ++		/*
    ++		 * When we write log records, the hashes are owned by the
    ++		 * caller and thus shouldn't be free'd.
    ++		 */
     +		log->value.update.old_hash = NULL;
     +		log->value.update.new_hash = NULL;
     +		break;
    @@ refs/reftable-backend.c (new)
     +			break;
     +
     +		/*
    -+		 * The files backend only lists references contained in
    -+		 * "refs/". We emulate the same behaviour here and thus skip
    -+		 * all references that don't start with this prefix.
    ++		 * Unless DO_FOR_EACH_INCLUDE_ALL_REFS is set, we only list
    ++		 * refs starting with "refs/" to mimic the "files" backend.
     +		 */
    -+		if (!starts_with(iter->ref.refname, "refs/"))
    ++		if (!(iter->flags & DO_FOR_EACH_INCLUDE_ALL_REFS) &&
    ++		    !starts_with(iter->ref.refname, "refs/"))
     +			continue;
     +
     +		if (iter->prefix &&
2:  63eafc9f5b = 2:  146bb95c03 ci: add jobs to test with the reftable backend
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 1/2] refs: introduce reftable backend
  2024-02-05  6:02 ` [PATCH v3 " Patrick Steinhardt
@ 2024-02-05  6:02   ` Patrick Steinhardt
  2024-02-05  6:02   ` [PATCH v3 2/2] ci: add jobs to test with the " Patrick Steinhardt
  2024-02-05 13:10   ` [PATCH v3 0/2] refs: introduce " Karthik Nayak
  2 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2024-02-05  6:02 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, Han-Wen Nienhuys, Karthik Nayak

[-- Attachment #1: Type: text/plain, Size: 118111 bytes --]

Due to scalability issues, Shawn Pearce has originally proposed a new
"reftable" format more than six years ago [1]. Initially, this new
format was implemented in JGit with promising results. Around two years
ago, we have then added the "reftable" library to the Git codebase via
a4bbd13be3 (Merge branch 'hn/reftable', 2021-12-15). With this we have
landed all the low-level code to read and write reftables. Notably
missing though was the integration of this low-level code into the Git
code base in the form of a new ref backend that ties all of this
together.

This gap is now finally closed by introducing a new "reftable" backend
into the Git codebase. This new backend promises to bring some notable
improvements to Git repositories:

  - It becomes possible to do truly atomic writes where either all refs
    are committed to disk or none are. This was not possible with the
    "files" backend because ref updates were split across multiple loose
    files.

  - The disk space required to store many refs is reduced, both compared
    to loose refs and packed-refs. This is enabled both by the reftable
    format being a binary format, which is more compact, and by prefix
    compression.

  - We can ignore filesystem-specific behaviour as ref names are not
    encoded via paths anymore. This means there is no need to handle
    case sensitivity on Windows systems or Unicode precomposition on
    macOS.

  - There is no need to rewrite the complete refdb anymore every time a
    ref is being deleted like it was the case for packed-refs. This
    means that ref deletions are now constant time instead of scaling
    linearly with the number of refs.

  - We can ignore file/directory conflicts so that it becomes possible
    to store both "refs/heads/foo" and "refs/heads/foo/bar".

  - Due to this property we can retain reflogs for deleted refs. We have
    previously been deleting reflogs together with their refs to avoid
    file/directory conflicts, which is not necessary anymore.

  - We can properly enumerate all refs. With the "files" backend it is
    not easily possible to distinguish between refs and non-refs because
    they may live side by side in the gitdir.

Not all of these improvements are realized with the current "reftable"
backend implementation. At this point, the new backend is supposed to be
a drop-in replacement for the "files" backend that is used by basically
all Git repositories nowadays. It strives for 1:1 compatibility, which
means that a user can expect the same behaviour regardless of whether
they use the "reftable" backend or the "files" backend for most of the
part.

Most notably, this means we artificially limit the capabilities of the
"reftable" backend to match the limits of the "files" backend. It is not
possible to create refs that would end up with file/directory conflicts,
we do not retain reflogs, we perform stricter-than-necessary checks.
This is done intentionally due to two main reasons:

  - It makes it significantly easier to land the "reftable" backend as
    tests behave the same. It would be tough to argue for each and every
    single test that doesn't pass with the "reftable" backend.

  - It ensures compatibility between repositories that use the "files"
    backend and repositories that use the "reftable" backend. Like this,
    hosters can migrate their repositories to use the "reftable" backend
    without causing issues for clients that use the "files" backend in
    their clones.

It is expected that these artificial limitations may eventually go away
in the long term.

Performance-wise things very much depend on the actual workload. The
following benchmarks compare the "files" and "reftable" backends in the
current version:

  - Creating N refs in separate transactions shows that the "files"
    backend is ~50% faster. This is not surprising given that creating a
    ref only requires us to create a single loose ref. The "reftable"
    backend will also perform auto compaction on updates. In real-world
    workloads we would likely also want to perform pack loose refs,
    which would likely change the picture.

        Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 1)
          Time (mean ± σ):       2.1 ms ±   0.3 ms    [User: 0.6 ms, System: 1.7 ms]
          Range (min … max):     1.8 ms …   4.3 ms    133 runs

        Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 1)
          Time (mean ± σ):       2.7 ms ±   0.1 ms    [User: 0.6 ms, System: 2.2 ms]
          Range (min … max):     2.4 ms …   2.9 ms    132 runs

        Benchmark 3: update-ref: create refs sequentially (refformat = files, refcount = 1000)
          Time (mean ± σ):      1.975 s ±  0.006 s    [User: 0.437 s, System: 1.535 s]
          Range (min … max):    1.969 s …  1.980 s    3 runs

        Benchmark 4: update-ref: create refs sequentially (refformat = reftable, refcount = 1000)
          Time (mean ± σ):      2.611 s ±  0.013 s    [User: 0.782 s, System: 1.825 s]
          Range (min … max):    2.597 s …  2.622 s    3 runs

        Benchmark 5: update-ref: create refs sequentially (refformat = files, refcount = 100000)
          Time (mean ± σ):     198.442 s ±  0.241 s    [User: 43.051 s, System: 155.250 s]
          Range (min … max):   198.189 s … 198.670 s    3 runs

        Benchmark 6: update-ref: create refs sequentially (refformat = reftable, refcount = 100000)
          Time (mean ± σ):     294.509 s ±  4.269 s    [User: 104.046 s, System: 190.326 s]
          Range (min … max):   290.223 s … 298.761 s    3 runs

  - Creating N refs in a single transaction shows that the "files"
    backend is significantly slower once we start to write many refs.
    The "reftable" backend only needs to update two files, whereas the
    "files" backend needs to write one file per ref.

        Benchmark 1: update-ref: create many refs (refformat = files, refcount = 1)
          Time (mean ± σ):       1.9 ms ±   0.1 ms    [User: 0.4 ms, System: 1.4 ms]
          Range (min … max):     1.8 ms …   2.6 ms    151 runs

        Benchmark 2: update-ref: create many refs (refformat = reftable, refcount = 1)
          Time (mean ± σ):       2.5 ms ±   0.1 ms    [User: 0.7 ms, System: 1.7 ms]
          Range (min … max):     2.4 ms …   3.4 ms    148 runs

        Benchmark 3: update-ref: create many refs (refformat = files, refcount = 1000)
          Time (mean ± σ):     152.5 ms ±   5.2 ms    [User: 19.1 ms, System: 133.1 ms]
          Range (min … max):   148.5 ms … 167.8 ms    15 runs

        Benchmark 4: update-ref: create many refs (refformat = reftable, refcount = 1000)
          Time (mean ± σ):      58.0 ms ±   2.5 ms    [User: 28.4 ms, System: 29.4 ms]
          Range (min … max):    56.3 ms …  72.9 ms    40 runs

        Benchmark 5: update-ref: create many refs (refformat = files, refcount = 1000000)
          Time (mean ± σ):     152.752 s ±  0.710 s    [User: 20.315 s, System: 131.310 s]
          Range (min … max):   152.165 s … 153.542 s    3 runs

        Benchmark 6: update-ref: create many refs (refformat = reftable, refcount = 1000000)
          Time (mean ± σ):     51.912 s ±  0.127 s    [User: 26.483 s, System: 25.424 s]
          Range (min … max):   51.769 s … 52.012 s    3 runs

  - Deleting a ref in a fully-packed repository shows that the "files"
    backend scales with the number of refs. The "reftable" backend has
    constant-time deletions.

        Benchmark 1: update-ref: delete ref (refformat = files, refcount = 1)
          Time (mean ± σ):       1.7 ms ±   0.1 ms    [User: 0.4 ms, System: 1.2 ms]
          Range (min … max):     1.6 ms …   2.1 ms    316 runs

        Benchmark 2: update-ref: delete ref (refformat = reftable, refcount = 1)
          Time (mean ± σ):       1.8 ms ±   0.1 ms    [User: 0.4 ms, System: 1.3 ms]
          Range (min … max):     1.7 ms …   2.1 ms    294 runs

        Benchmark 3: update-ref: delete ref (refformat = files, refcount = 1000)
          Time (mean ± σ):       2.0 ms ±   0.1 ms    [User: 0.5 ms, System: 1.4 ms]
          Range (min … max):     1.9 ms …   2.5 ms    287 runs

        Benchmark 4: update-ref: delete ref (refformat = reftable, refcount = 1000)
          Time (mean ± σ):       1.9 ms ±   0.1 ms    [User: 0.5 ms, System: 1.3 ms]
          Range (min … max):     1.8 ms …   2.1 ms    217 runs

        Benchmark 5: update-ref: delete ref (refformat = files, refcount = 1000000)
          Time (mean ± σ):     229.8 ms ±   7.9 ms    [User: 182.6 ms, System: 46.8 ms]
          Range (min … max):   224.6 ms … 245.2 ms    6 runs

        Benchmark 6: update-ref: delete ref (refformat = reftable, refcount = 1000000)
          Time (mean ± σ):       2.0 ms ±   0.0 ms    [User: 0.6 ms, System: 1.3 ms]
          Range (min … max):     2.0 ms …   2.1 ms    3 runs

  - Listing all refs shows no significant advantage for either of the
    backends. The "files" backend is a bit faster, but not by a
    significant margin. When repositories are not packed the "reftable"
    backend outperforms the "files" backend because the "reftable"
    backend performs auto-compaction.

        Benchmark 1: show-ref: print all refs (refformat = files, refcount = 1, packed = true)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.0 ms    1729 runs

        Benchmark 2: show-ref: print all refs (refformat = reftable, refcount = 1, packed = true)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   1.8 ms    1816 runs

        Benchmark 3: show-ref: print all refs (refformat = files, refcount = 1000, packed = true)
          Time (mean ± σ):       4.3 ms ±   0.1 ms    [User: 0.9 ms, System: 3.3 ms]
          Range (min … max):     4.1 ms …   4.6 ms    645 runs

        Benchmark 4: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = true)
          Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 1.0 ms, System: 3.3 ms]
          Range (min … max):     4.2 ms …   5.9 ms    643 runs

        Benchmark 5: show-ref: print all refs (refformat = files, refcount = 1000000, packed = true)
          Time (mean ± σ):      2.537 s ±  0.034 s    [User: 0.488 s, System: 2.048 s]
          Range (min … max):    2.511 s …  2.627 s    10 runs

        Benchmark 6: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = true)
          Time (mean ± σ):      2.712 s ±  0.017 s    [User: 0.653 s, System: 2.059 s]
          Range (min … max):    2.692 s …  2.752 s    10 runs

        Benchmark 7: show-ref: print all refs (refformat = files, refcount = 1, packed = false)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   1.9 ms    1834 runs

        Benchmark 8: show-ref: print all refs (refformat = reftable, refcount = 1, packed = false)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.4 ms …   2.0 ms    1840 runs

        Benchmark 9: show-ref: print all refs (refformat = files, refcount = 1000, packed = false)
          Time (mean ± σ):      13.8 ms ±   0.2 ms    [User: 2.8 ms, System: 10.8 ms]
          Range (min … max):    13.3 ms …  14.5 ms    208 runs

        Benchmark 10: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = false)
          Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 1.2 ms, System: 3.3 ms]
          Range (min … max):     4.3 ms …   6.2 ms    624 runs

        Benchmark 11: show-ref: print all refs (refformat = files, refcount = 1000000, packed = false)
          Time (mean ± σ):     12.127 s ±  0.129 s    [User: 2.675 s, System: 9.451 s]
          Range (min … max):   11.965 s … 12.370 s    10 runs

        Benchmark 12: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = false)
          Time (mean ± σ):      2.799 s ±  0.022 s    [User: 0.735 s, System: 2.063 s]
          Range (min … max):    2.769 s …  2.836 s    10 runs

  - Printing a single ref shows no real difference between the "files"
    and "reftable" backends.

        Benchmark 1: show-ref: print single ref (refformat = files, refcount = 1)
          Time (mean ± σ):       1.5 ms ±   0.1 ms    [User: 0.4 ms, System: 1.0 ms]
          Range (min … max):     1.4 ms …   1.8 ms    1779 runs

        Benchmark 2: show-ref: print single ref (refformat = reftable, refcount = 1)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.4 ms …   2.5 ms    1753 runs

        Benchmark 3: show-ref: print single ref (refformat = files, refcount = 1000)
          Time (mean ± σ):       1.5 ms ±   0.1 ms    [User: 0.3 ms, System: 1.1 ms]
          Range (min … max):     1.4 ms …   1.9 ms    1840 runs

        Benchmark 4: show-ref: print single ref (refformat = reftable, refcount = 1000)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.0 ms    1831 runs

        Benchmark 5: show-ref: print single ref (refformat = files, refcount = 1000000)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.1 ms    1848 runs

        Benchmark 6: show-ref: print single ref (refformat = reftable, refcount = 1000000)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.1 ms    1762 runs

So overall, performance depends on the usecases. Except for many
sequential writes the "reftable" backend is roughly on par or
significantly faster than the "files" backend though. Given that the
"files" backend has received 18 years of optimizations by now this can
be seen as a win. Furthermore, we can expect that the "reftable" backend
will grow faster over time when attention turns more towards
optimizations.

The complete test suite passes, except for those tests explicitly marked
to require the REFFILES prerequisite. Some tests in t0610 are marked as
failing because they depend on still-in-flight bug fixes. Tests can be
run with the new backend by setting the GIT_TEST_DEFAULT_REF_FORMAT
environment variable to "reftable".

There is a single known conceptual incompatibility with the dumb HTTP
transport. As "info/refs" SHOULD NOT contain the HEAD reference, and
because the "HEAD" file is not valid anymore, it is impossible for the
remote client to figure out the default branch without changing the
protocol. This shortcoming needs to be handled in a subsequent patch
series.

As the reftable library has already been introduced a while ago, this
commit message will not go into the details of how exactly the on-disk
format works. Please refer to our preexisting technical documentation at
Documentation/technical/reftable for this.

[1]: https://public-inbox.org/git/CAJo=hJtyof=HRy=2sLP0ng0uZ4=S-DpZ5dR1aF+VHVETKG20OQ@mail.gmail.com/

Original-idea-by: Shawn Pearce <spearce@spearce.org>
Based-on-patch-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/ref-storage-format.txt          |    2 +
 .../technical/repository-version.txt          |    5 +-
 Makefile                                      |    1 +
 contrib/workdir/git-new-workdir               |    2 +-
 path.c                                        |    2 +-
 path.h                                        |    1 +
 refs.c                                        |    1 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 2297 +++++++++++++++++
 repository.h                                  |    5 +-
 t/t0610-reftable-basics.sh                    |  887 +++++++
 t/t0611-reftable-httpd.sh                     |   26 +
 t/test-lib.sh                                 |    2 +
 13 files changed, 3226 insertions(+), 6 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0610-reftable-basics.sh
 create mode 100755 t/t0611-reftable-httpd.sh

diff --git a/Documentation/ref-storage-format.txt b/Documentation/ref-storage-format.txt
index 1a65cac468..14fff8a9c6 100644
--- a/Documentation/ref-storage-format.txt
+++ b/Documentation/ref-storage-format.txt
@@ -1 +1,3 @@
 * `files` for loose files with packed-refs. This is the default.
+* `reftable` for the reftable format. This format is experimental and its
+  internals are subject to change.
diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 27be3741e6..47281420fc 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -103,5 +103,6 @@ GIT_COMMON_DIR/worktrees/<id>/config.worktree)
 
 ==== `refStorage`
 
-Specifies the file format for the ref database. The only valid value
-is `files` (loose references with a packed-refs file).
+Specifies the file format for the ref database. The valid values are
+`files` (loose references with a packed-refs file) and `reftable` (see
+Documentation/technical/reftable.txt).
diff --git a/Makefile b/Makefile
index 0f748a52e6..b302516173 100644
--- a/Makefile
+++ b/Makefile
@@ -1127,6 +1127,7 @@ LIB_OBJS += reflog.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/debug.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
diff --git a/contrib/workdir/git-new-workdir b/contrib/workdir/git-new-workdir
index 888c34a521..989197aace 100755
--- a/contrib/workdir/git-new-workdir
+++ b/contrib/workdir/git-new-workdir
@@ -79,7 +79,7 @@ trap cleanup $siglist
 # create the links to the original repo.  explicitly exclude index, HEAD and
 # logs/HEAD from the list since they are purely related to the current working
 # directory, and should not be shared.
-for x in config refs logs/refs objects info hooks packed-refs remotes rr-cache svn
+for x in config refs logs/refs objects info hooks packed-refs remotes rr-cache svn reftable
 do
 	# create a containing directory if needed
 	case $x in
diff --git a/path.c b/path.c
index 0fb527918b..8bb223c92c 100644
--- a/path.c
+++ b/path.c
@@ -871,7 +871,7 @@ const char *enter_repo(const char *path, int strict)
 	return NULL;
 }
 
-static int calc_shared_perm(int mode)
+int calc_shared_perm(int mode)
 {
 	int tweak;
 
diff --git a/path.h b/path.h
index b3233c51fa..e053effef2 100644
--- a/path.h
+++ b/path.h
@@ -181,6 +181,7 @@ const char *git_path_shallow(struct repository *r);
 int ends_with_path_components(const char *path, const char *components);
 int validate_headref(const char *ref);
 
+int calc_shared_perm(int mode);
 int adjust_shared_perm(const char *path);
 
 char *interpolate_path(const char *path, int real_home);
diff --git a/refs.c b/refs.c
index 3190df8d81..7e6970ca1d 100644
--- a/refs.c
+++ b/refs.c
@@ -35,6 +35,7 @@
  */
 static const struct ref_storage_be *refs_backends[] = {
 	[REF_STORAGE_FORMAT_FILES] = &refs_be_files,
+	[REF_STORAGE_FORMAT_REFTABLE] = &refs_be_reftable,
 };
 
 static const struct ref_storage_be *find_ref_storage_backend(unsigned int ref_storage_format)
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index be15c19c02..2285abe1ec 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -700,6 +700,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 0000000000..207cf2f8e9
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,2297 @@
+#include "../git-compat-util.h"
+#include "../abspath.h"
+#include "../chdir-notify.h"
+#include "../environment.h"
+#include "../gettext.h"
+#include "../hash.h"
+#include "../hex.h"
+#include "../iterator.h"
+#include "../ident.h"
+#include "../lockfile.h"
+#include "../object.h"
+#include "../path.h"
+#include "../refs.h"
+#include "../reftable/reftable-stack.h"
+#include "../reftable/reftable-record.h"
+#include "../reftable/reftable-error.h"
+#include "../reftable/reftable-iterator.h"
+#include "../reftable/reftable-merged.h"
+#include "../setup.h"
+#include "../strmap.h"
+#include "refs-internal.h"
+
+/*
+ * Used as a flag in ref_update::flags when the ref_update was via an
+ * update to HEAD.
+ */
+#define REF_UPDATE_VIA_HEAD (1 << 8)
+
+struct reftable_ref_store {
+	struct ref_store base;
+
+	/*
+	 * The main stack refers to the common dir and thus contains common
+	 * refs as well as refs of the main repository.
+	 */
+	struct reftable_stack *main_stack;
+	/*
+	 * The worktree stack refers to the gitdir in case the refdb is opened
+	 * via a worktree. It thus contains the per-worktree refs.
+	 */
+	struct reftable_stack *worktree_stack;
+	/*
+	 * Map of worktree stacks by their respective worktree names. The map
+	 * is populated lazily when we try to resolve `worktrees/$worktree` refs.
+	 */
+	struct strmap worktree_stacks;
+	struct reftable_write_options write_options;
+
+	unsigned int store_flags;
+	int err;
+};
+
+/*
+ * Downcast ref_store to reftable_ref_store. Die if ref_store is not a
+ * reftable_ref_store. required_flags is compared with ref_store's store_flags
+ * to ensure the ref_store has all required capabilities. "caller" is used in
+ * any necessary error messages.
+ */
+static struct reftable_ref_store *reftable_be_downcast(struct ref_store *ref_store,
+						       unsigned int required_flags,
+						       const char *caller)
+{
+	struct reftable_ref_store *refs;
+
+	if (ref_store->be != &refs_be_reftable)
+		BUG("ref_store is type \"%s\" not \"reftables\" in %s",
+		    ref_store->be->name, caller);
+
+	refs = (struct reftable_ref_store *)ref_store;
+
+	if ((refs->store_flags & required_flags) != required_flags)
+		BUG("operation %s requires abilities 0x%x, but only have 0x%x",
+		    caller, required_flags, refs->store_flags);
+
+	return refs;
+}
+
+/*
+ * Some refs are global to the repository (refs/heads/{*}), while others are
+ * local to the worktree (eg. HEAD, refs/bisect/{*}). We solve this by having
+ * multiple separate databases (ie. multiple reftable/ directories), one for
+ * the shared refs, one for the current worktree refs, and one for each
+ * additional worktree. For reading, we merge the view of both the shared and
+ * the current worktree's refs, when necessary.
+ *
+ * This function also optionally assigns the rewritten reference name that is
+ * local to the stack. This translation is required when using worktree refs
+ * like `worktrees/$worktree/refs/heads/foo` as worktree stacks will store
+ * those references in their normalized form.
+ */
+static struct reftable_stack *stack_for(struct reftable_ref_store *store,
+					const char *refname,
+					const char **rewritten_ref)
+{
+	const char *wtname;
+	int wtname_len;
+
+	if (!refname)
+		return store->main_stack;
+
+	switch (parse_worktree_ref(refname, &wtname, &wtname_len, rewritten_ref)) {
+	case REF_WORKTREE_OTHER: {
+		static struct strbuf wtname_buf = STRBUF_INIT;
+		struct strbuf wt_dir = STRBUF_INIT;
+		struct reftable_stack *stack;
+
+		/*
+		 * We're using a static buffer here so that we don't need to
+		 * allocate the worktree name whenever we look up a reference.
+		 * This could be avoided if the strmap interface knew how to
+		 * handle keys with a length.
+		 */
+		strbuf_reset(&wtname_buf);
+		strbuf_add(&wtname_buf, wtname, wtname_len);
+
+		/*
+		 * There is an edge case here: when the worktree references the
+		 * current worktree, then we set up the stack once via
+		 * `worktree_stacks` and once via `worktree_stack`. This is
+		 * wasteful, but in the reading case it shouldn't matter. And
+		 * in the writing case we would notice that the stack is locked
+		 * already and error out when trying to write a reference via
+		 * both stacks.
+		 */
+		stack = strmap_get(&store->worktree_stacks, wtname_buf.buf);
+		if (!stack) {
+			strbuf_addf(&wt_dir, "%s/worktrees/%s/reftable",
+				    store->base.repo->commondir, wtname_buf.buf);
+
+			store->err = reftable_new_stack(&stack, wt_dir.buf,
+							store->write_options);
+			assert(store->err != REFTABLE_API_ERROR);
+			strmap_put(&store->worktree_stacks, wtname_buf.buf, stack);
+		}
+
+		strbuf_release(&wt_dir);
+		return stack;
+	}
+	case REF_WORKTREE_CURRENT:
+		/*
+		 * If there is no worktree stack then we're currently in the
+		 * main worktree. We thus return the main stack in that case.
+		 */
+		if (!store->worktree_stack)
+			return store->main_stack;
+		return store->worktree_stack;
+	case REF_WORKTREE_MAIN:
+	case REF_WORKTREE_SHARED:
+		return store->main_stack;
+	default:
+		BUG("unhandled worktree reference type");
+	}
+}
+
+static int should_write_log(struct ref_store *refs, const char *refname)
+{
+	if (log_all_ref_updates == LOG_REFS_UNSET)
+		log_all_ref_updates = is_bare_repository() ? LOG_REFS_NONE : LOG_REFS_NORMAL;
+
+	switch (log_all_ref_updates) {
+	case LOG_REFS_NONE:
+		return refs_reflog_exists(refs, refname);
+	case LOG_REFS_ALWAYS:
+		return 1;
+	case LOG_REFS_NORMAL:
+		if (should_autocreate_reflog(refname))
+			return 1;
+		return refs_reflog_exists(refs, refname);
+	default:
+		BUG("unhandled core.logAllRefUpdates value %d", log_all_ref_updates);
+	}
+}
+
+static void clear_reftable_log_record(struct reftable_log_record *log)
+{
+	switch (log->value_type) {
+	case REFTABLE_LOG_UPDATE:
+		/*
+		 * When we write log records, the hashes are owned by the
+		 * caller and thus shouldn't be free'd.
+		 */
+		log->value.update.old_hash = NULL;
+		log->value.update.new_hash = NULL;
+		break;
+	case REFTABLE_LOG_DELETION:
+		break;
+	}
+	reftable_log_record_release(log);
+}
+
+static void fill_reftable_log_record(struct reftable_log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = {0};
+	int sign = 1;
+
+	if (split_ident_line(&split, info, strlen(info)))
+		BUG("failed splitting committer info");
+
+	reftable_log_record_release(log);
+	log->value_type = REFTABLE_LOG_UPDATE;
+	log->value.update.name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->value.update.email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->value.update.time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->value.update.tz_offset = sign * atoi(split.tz_begin);
+}
+
+static int read_ref_without_reload(struct reftable_stack *stack,
+				   const char *refname,
+				   struct object_id *oid,
+				   struct strbuf *referent,
+				   unsigned int *type)
+{
+	struct reftable_ref_record ref = {0};
+	int ret;
+
+	ret = reftable_stack_read_ref(stack, refname, &ref);
+	if (ret)
+		goto done;
+
+	if (ref.value_type == REFTABLE_REF_SYMREF) {
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.value.symref);
+		*type |= REF_ISSYMREF;
+	} else if (reftable_ref_record_val1(&ref)) {
+		oidread(oid, reftable_ref_record_val1(&ref));
+	} else {
+		/* We got a tombstone, which should not happen. */
+		BUG("unhandled reference value type %d", ref.value_type);
+	}
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	reftable_ref_record_release(&ref);
+	return ret;
+}
+
+static struct ref_store *reftable_be_init(struct repository *repo,
+					  const char *gitdir,
+					  unsigned int store_flags)
+{
+	struct reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct strbuf path = STRBUF_INIT;
+	int is_worktree;
+	mode_t mask;
+
+	mask = umask(0);
+	umask(mask);
+
+	base_ref_store_init(&refs->base, repo, gitdir, &refs_be_reftable);
+	strmap_init(&refs->worktree_stacks);
+	refs->store_flags = store_flags;
+	refs->write_options.block_size = 4096;
+	refs->write_options.hash_id = repo->hash_algo->format_id;
+	refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask);
+
+	/*
+	 * Set up the main reftable stack that is hosted in GIT_COMMON_DIR.
+	 * This stack contains both the shared and the main worktree refs.
+	 *
+	 * Note that we don't try to resolve the path in case we have a
+	 * worktree because `get_common_dir_noenv()` already does it for us.
+	 */
+	is_worktree = get_common_dir_noenv(&path, gitdir);
+	if (!is_worktree) {
+		strbuf_reset(&path);
+		strbuf_realpath(&path, gitdir, 0);
+	}
+	strbuf_addstr(&path, "/reftable");
+	refs->err = reftable_new_stack(&refs->main_stack, path.buf,
+				       refs->write_options);
+	if (refs->err)
+		goto done;
+
+	/*
+	 * If we're in a worktree we also need to set up the worktree reftable
+	 * stack that is contained in the per-worktree GIT_DIR.
+	 *
+	 * Ideally, we would also add the stack to our worktree stack map. But
+	 * we have no way to figure out the worktree name here and thus can't
+	 * do it efficiently.
+	 */
+	if (is_worktree) {
+		strbuf_reset(&path);
+		strbuf_addf(&path, "%s/reftable", gitdir);
+
+		refs->err = reftable_new_stack(&refs->worktree_stack, path.buf,
+					       refs->write_options);
+		if (refs->err)
+			goto done;
+	}
+
+	chdir_notify_reparent("reftables-backend $GIT_DIR", &refs->base.gitdir);
+
+done:
+	assert(refs->err != REFTABLE_API_ERROR);
+	strbuf_release(&path);
+	return &refs->base;
+}
+
+static int reftable_be_init_db(struct ref_store *ref_store,
+			       int flags UNUSED,
+			       struct strbuf *err UNUSED)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "init_db");
+	struct strbuf sb = STRBUF_INIT;
+
+	strbuf_addf(&sb, "%s/reftable", refs->base.gitdir);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/HEAD", refs->base.gitdir);
+	write_file(sb.buf, "ref: refs/heads/.invalid");
+	adjust_shared_perm(sb.buf);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", refs->base.gitdir);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", refs->base.gitdir);
+	write_file(sb.buf, "this repository uses the reftable format");
+	adjust_shared_perm(sb.buf);
+
+	strbuf_release(&sb);
+	return 0;
+}
+
+struct reftable_ref_iterator {
+	struct ref_iterator base;
+	struct reftable_ref_store *refs;
+	struct reftable_iterator iter;
+	struct reftable_ref_record ref;
+	struct object_id oid;
+
+	const char *prefix;
+	unsigned int flags;
+	int err;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_ref_iterator *iter =
+		(struct reftable_ref_iterator *)ref_iterator;
+	struct reftable_ref_store *refs = iter->refs;
+
+	while (!iter->err) {
+		int flags = 0;
+
+		iter->err = reftable_iterator_next_ref(&iter->iter, &iter->ref);
+		if (iter->err)
+			break;
+
+		/*
+		 * Unless DO_FOR_EACH_INCLUDE_ALL_REFS is set, we only list
+		 * refs starting with "refs/" to mimic the "files" backend.
+		 */
+		if (!(iter->flags & DO_FOR_EACH_INCLUDE_ALL_REFS) &&
+		    !starts_with(iter->ref.refname, "refs/"))
+			continue;
+
+		if (iter->prefix &&
+		    strncmp(iter->prefix, iter->ref.refname, strlen(iter->prefix))) {
+			iter->err = 1;
+			break;
+		}
+
+		if (iter->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    parse_worktree_ref(iter->ref.refname, NULL, NULL, NULL) !=
+			    REF_WORKTREE_CURRENT)
+			continue;
+
+		switch (iter->ref.value_type) {
+		case REFTABLE_REF_VAL1:
+			oidread(&iter->oid, iter->ref.value.val1);
+			break;
+		case REFTABLE_REF_VAL2:
+			oidread(&iter->oid, iter->ref.value.val2.value);
+			break;
+		case REFTABLE_REF_SYMREF:
+			if (!refs_resolve_ref_unsafe(&iter->refs->base, iter->ref.refname,
+						     RESOLVE_REF_READING, &iter->oid, &flags))
+				oidclr(&iter->oid);
+			break;
+		default:
+			BUG("unhandled reference value type %d", iter->ref.value_type);
+		}
+
+		if (is_null_oid(&iter->oid))
+			flags |= REF_ISBROKEN;
+
+		if (check_refname_format(iter->ref.refname, REFNAME_ALLOW_ONELEVEL)) {
+			if (!refname_is_safe(iter->ref.refname))
+				die(_("refname is dangerous: %s"), iter->ref.refname);
+			oidclr(&iter->oid);
+			flags |= REF_BAD_NAME | REF_ISBROKEN;
+		}
+
+		if (iter->flags & DO_FOR_EACH_OMIT_DANGLING_SYMREFS &&
+		    flags & REF_ISSYMREF &&
+		    flags & REF_ISBROKEN)
+			continue;
+
+		if (!(iter->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(iter->ref.refname, refs->base.repo,
+					    &iter->oid, flags))
+				continue;
+
+		iter->base.refname = iter->ref.refname;
+		iter->base.oid = &iter->oid;
+		iter->base.flags = flags;
+
+		break;
+	}
+
+	if (iter->err > 0) {
+		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
+			return ITER_ERROR;
+		return ITER_DONE;
+	}
+
+	if (iter->err < 0) {
+		ref_iterator_abort(ref_iterator);
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct reftable_ref_iterator *iter =
+		(struct reftable_ref_iterator *)ref_iterator;
+
+	if (iter->ref.value_type == REFTABLE_REF_VAL2) {
+		oidread(peeled, iter->ref.value.val2.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_ref_iterator *iter =
+		(struct reftable_ref_iterator *)ref_iterator;
+	reftable_ref_record_release(&iter->ref);
+	reftable_iterator_destroy(&iter->iter);
+	free(iter);
+	return ITER_DONE;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	.advance = reftable_ref_iterator_advance,
+	.peel = reftable_ref_iterator_peel,
+	.abort = reftable_ref_iterator_abort
+};
+
+static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_store *refs,
+							    struct reftable_stack *stack,
+							    const char *prefix,
+							    int flags)
+{
+	struct reftable_merged_table *merged_table;
+	struct reftable_ref_iterator *iter;
+	int ret;
+
+	iter = xcalloc(1, sizeof(*iter));
+	base_ref_iterator_init(&iter->base, &reftable_ref_iterator_vtable, 1);
+	iter->prefix = prefix;
+	iter->base.oid = &iter->oid;
+	iter->flags = flags;
+	iter->refs = refs;
+
+	ret = refs->err;
+	if (ret)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+
+	merged_table = reftable_stack_merged_table(stack);
+
+	ret = reftable_merged_table_seek_ref(merged_table, &iter->iter, prefix);
+	if (ret)
+		goto done;
+
+done:
+	iter->err = ret;
+	return iter;
+}
+
+static enum iterator_selection iterator_select(struct ref_iterator *iter_worktree,
+					       struct ref_iterator *iter_common,
+					       void *cb_data UNUSED)
+{
+	if (iter_worktree && !iter_common) {
+		/*
+		 * Return the worktree ref if there are no more common refs.
+		 */
+		return ITER_SELECT_0;
+	} else if (iter_common) {
+		/*
+		 * In case we have pending worktree and common refs we need to
+		 * yield them based on their lexicographical order. Worktree
+		 * refs that have the same name as common refs shadow the
+		 * latter.
+		 */
+		if (iter_worktree) {
+			int cmp = strcmp(iter_worktree->refname,
+					 iter_common->refname);
+			if (cmp < 0)
+				return ITER_SELECT_0;
+			else if (!cmp)
+				return ITER_SELECT_0_SKIP_1;
+		}
+
+		 /*
+		  * We now know that the lexicographically-next ref is a common
+		  * ref. When the common ref is a shared one we return it.
+		  */
+		if (parse_worktree_ref(iter_common->refname, NULL, NULL,
+				       NULL) == REF_WORKTREE_SHARED)
+			return ITER_SELECT_1;
+
+		/*
+		 * Otherwise, if the common ref is a per-worktree ref we skip
+		 * it because it would belong to the main worktree, not ours.
+		 */
+		return ITER_SKIP_1;
+	} else {
+		return ITER_DONE;
+	}
+}
+
+static struct ref_iterator *reftable_be_iterator_begin(struct ref_store *ref_store,
+						       const char *prefix,
+						       const char **exclude_patterns,
+						       unsigned int flags)
+{
+	struct reftable_ref_iterator *main_iter, *worktree_iter;
+	struct reftable_ref_store *refs;
+	unsigned int required_flags = REF_STORE_READ;
+
+	if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN))
+		required_flags |= REF_STORE_ODB;
+	refs = reftable_be_downcast(ref_store, required_flags, "ref_iterator_begin");
+
+	main_iter = ref_iterator_for_stack(refs, refs->main_stack, prefix, flags);
+
+	/*
+	 * The worktree stack is only set when we're in an actual worktree
+	 * right now. If we aren't, then we return the common reftable
+	 * iterator, only.
+	 */
+	 if (!refs->worktree_stack)
+		return &main_iter->base;
+
+	/*
+	 * Otherwise we merge both the common and the per-worktree refs into a
+	 * single iterator.
+	 */
+	worktree_iter = ref_iterator_for_stack(refs, refs->worktree_stack, prefix, flags);
+	return merge_ref_iterator_begin(1, &worktree_iter->base, &main_iter->base,
+					iterator_select, NULL);
+}
+
+static int reftable_be_read_raw_ref(struct ref_store *ref_store,
+				    const char *refname,
+				    struct object_id *oid,
+				    struct strbuf *referent,
+				    unsigned int *type,
+				    int *failure_errno)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "read_raw_ref");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	int ret;
+
+	if (refs->err < 0)
+		return refs->err;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		return ret;
+
+	ret = read_ref_without_reload(stack, refname, oid, referent, type);
+	if (ret < 0)
+		return ret;
+	if (ret > 0) {
+		*failure_errno = ENOENT;
+		return -1;
+	}
+
+	return 0;
+}
+
+static int reftable_be_read_symbolic_ref(struct ref_store *ref_store,
+					 const char *refname,
+					 struct strbuf *referent)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "read_symbolic_ref");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_ref_record ref = {0};
+	int ret;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		return ret;
+
+	ret = reftable_stack_read_ref(stack, refname, &ref);
+	if (ret == 0 && ref.value_type == REFTABLE_REF_SYMREF)
+		strbuf_addstr(referent, ref.value.symref);
+	else
+		ret = -1;
+
+	reftable_ref_record_release(&ref);
+	return ret;
+}
+
+/*
+ * Return the refname under which update was originally requested.
+ */
+static const char *original_update_refname(struct ref_update *update)
+{
+	while (update->parent_update)
+		update = update->parent_update;
+	return update->refname;
+}
+
+struct reftable_transaction_update {
+	struct ref_update *update;
+	struct object_id current_oid;
+};
+
+struct write_transaction_table_arg {
+	struct reftable_ref_store *refs;
+	struct reftable_stack *stack;
+	struct reftable_addition *addition;
+	struct reftable_transaction_update *updates;
+	size_t updates_nr;
+	size_t updates_alloc;
+	size_t updates_expected;
+};
+
+struct reftable_transaction_data {
+	struct write_transaction_table_arg *args;
+	size_t args_nr, args_alloc;
+};
+
+static void free_transaction_data(struct reftable_transaction_data *tx_data)
+{
+	if (!tx_data)
+		return;
+	for (size_t i = 0; i < tx_data->args_nr; i++) {
+		reftable_addition_destroy(tx_data->args[i].addition);
+		free(tx_data->args[i].updates);
+	}
+	free(tx_data->args);
+	free(tx_data);
+}
+
+/*
+ * Prepare transaction update for the given reference update. This will cause
+ * us to lock the corresponding reftable stack for concurrent modification.
+ */
+static int prepare_transaction_update(struct write_transaction_table_arg **out,
+				      struct reftable_ref_store *refs,
+				      struct reftable_transaction_data *tx_data,
+				      struct ref_update *update,
+				      struct strbuf *err)
+{
+	struct reftable_stack *stack = stack_for(refs, update->refname, NULL);
+	struct write_transaction_table_arg *arg = NULL;
+	size_t i;
+	int ret;
+
+	/*
+	 * Search for a preexisting stack update. If there is one then we add
+	 * the update to it, otherwise we set up a new stack update.
+	 */
+	for (i = 0; !arg && i < tx_data->args_nr; i++)
+		if (tx_data->args[i].stack == stack)
+			arg = &tx_data->args[i];
+
+	if (!arg) {
+		struct reftable_addition *addition;
+
+		ret = reftable_stack_reload(stack);
+		if (ret)
+			return ret;
+
+		ret = reftable_stack_new_addition(&addition, stack);
+		if (ret) {
+			if (ret == REFTABLE_LOCK_ERROR)
+				strbuf_addstr(err, "cannot lock references");
+			return ret;
+		}
+
+		ALLOC_GROW(tx_data->args, tx_data->args_nr + 1,
+			   tx_data->args_alloc);
+		arg = &tx_data->args[tx_data->args_nr++];
+		arg->refs = refs;
+		arg->stack = stack;
+		arg->addition = addition;
+		arg->updates = NULL;
+		arg->updates_nr = 0;
+		arg->updates_alloc = 0;
+		arg->updates_expected = 0;
+	}
+
+	arg->updates_expected++;
+
+	if (out)
+		*out = arg;
+
+	return 0;
+}
+
+/*
+ * Queue a reference update for the correct stack. We potentially need to
+ * handle multiple stack updates in a single transaction when it spans across
+ * multiple worktrees.
+ */
+static int queue_transaction_update(struct reftable_ref_store *refs,
+				    struct reftable_transaction_data *tx_data,
+				    struct ref_update *update,
+				    struct object_id *current_oid,
+				    struct strbuf *err)
+{
+	struct write_transaction_table_arg *arg = NULL;
+	int ret;
+
+	if (update->backend_data)
+		BUG("reference update queued more than once");
+
+	ret = prepare_transaction_update(&arg, refs, tx_data, update, err);
+	if (ret < 0)
+		return ret;
+
+	ALLOC_GROW(arg->updates, arg->updates_nr + 1,
+		   arg->updates_alloc);
+	arg->updates[arg->updates_nr].update = update;
+	oidcpy(&arg->updates[arg->updates_nr].current_oid, current_oid);
+	update->backend_data = &arg->updates[arg->updates_nr++];
+
+	return 0;
+}
+
+static int reftable_be_transaction_prepare(struct ref_store *ref_store,
+					   struct ref_transaction *transaction,
+					   struct strbuf *err)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE|REF_STORE_MAIN, "ref_transaction_prepare");
+	struct strbuf referent = STRBUF_INIT, head_referent = STRBUF_INIT;
+	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct reftable_transaction_data *tx_data = NULL;
+	struct object_id head_oid;
+	unsigned int head_type = 0;
+	size_t i;
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	tx_data = xcalloc(1, sizeof(*tx_data));
+
+	/*
+	 * Preprocess all updates. For one we check that there are no duplicate
+	 * reference updates in this transaction. Second, we lock all stacks
+	 * that will be modified during the transaction.
+	 */
+	for (i = 0; i < transaction->nr; i++) {
+		ret = prepare_transaction_update(NULL, refs, tx_data,
+						 transaction->updates[i], err);
+		if (ret)
+			goto done;
+
+		string_list_append(&affected_refnames,
+				   transaction->updates[i]->refname);
+	}
+
+	/*
+	 * Now that we have counted updates per stack we can preallocate their
+	 * arrays. This avoids having to reallocate many times.
+	 */
+	for (i = 0; i < tx_data->args_nr; i++) {
+		CALLOC_ARRAY(tx_data->args[i].updates, tx_data->args[i].updates_expected);
+		tx_data->args[i].updates_alloc = tx_data->args[i].updates_expected;
+	}
+
+	/*
+	 * Fail if a refname appears more than once in the transaction.
+	 * This code is taken from the files backend and is a good candidate to
+	 * be moved into the generic layer.
+	 */
+	string_list_sort(&affected_refnames);
+	if (ref_update_reject_duplicates(&affected_refnames, err)) {
+		ret = TRANSACTION_GENERIC_ERROR;
+		goto done;
+	}
+
+	ret = read_ref_without_reload(stack_for(refs, "HEAD", NULL), "HEAD", &head_oid,
+				      &head_referent, &head_type);
+	if (ret < 0)
+		goto done;
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		struct object_id current_oid = {0};
+		struct reftable_stack *stack;
+		const char *rewritten_ref;
+
+		stack = stack_for(refs, u->refname, &rewritten_ref);
+
+		/* Verify that the new object ID is valid. */
+		if ((u->flags & REF_HAVE_NEW) && !is_null_oid(&u->new_oid) &&
+		    !(u->flags & REF_SKIP_OID_VERIFICATION) &&
+		    !(u->flags & REF_LOG_ONLY)) {
+			struct object *o = parse_object(refs->base.repo, &u->new_oid);
+			if (!o) {
+				strbuf_addf(err,
+					    _("trying to write ref '%s' with nonexistent object %s"),
+					    u->refname, oid_to_hex(&u->new_oid));
+				ret = -1;
+				goto done;
+			}
+
+			if (o->type != OBJ_COMMIT && is_branch(u->refname)) {
+				strbuf_addf(err, _("trying to write non-commit object %s to branch '%s'"),
+					    oid_to_hex(&u->new_oid), u->refname);
+				ret = -1;
+				goto done;
+			}
+		}
+
+		/*
+		 * When we update the reference that HEAD points to we enqueue
+		 * a second log-only update for HEAD so that its reflog is
+		 * updated accordingly.
+		 */
+		if (head_type == REF_ISSYMREF &&
+		    !(u->flags & REF_LOG_ONLY) &&
+		    !(u->flags & REF_UPDATE_VIA_HEAD) &&
+		    !strcmp(rewritten_ref, head_referent.buf)) {
+			struct ref_update *new_update;
+
+			/*
+			 * First make sure that HEAD is not already in the
+			 * transaction. This check is O(lg N) in the transaction
+			 * size, but it happens at most once per transaction.
+			 */
+			if (string_list_has_string(&affected_refnames, "HEAD")) {
+				/* An entry already existed */
+				strbuf_addf(err,
+					    _("multiple updates for 'HEAD' (including one "
+					    "via its referent '%s') are not allowed"),
+					    u->refname);
+				ret = TRANSACTION_NAME_CONFLICT;
+				goto done;
+			}
+
+			new_update = ref_transaction_add_update(
+					transaction, "HEAD",
+					u->flags | REF_LOG_ONLY | REF_NO_DEREF,
+					&u->new_oid, &u->old_oid, u->msg);
+			string_list_insert(&affected_refnames, new_update->refname);
+		}
+
+		ret = read_ref_without_reload(stack, rewritten_ref,
+					      &current_oid, &referent, &u->type);
+		if (ret < 0)
+			goto done;
+		if (ret > 0 && (!(u->flags & REF_HAVE_OLD) || is_null_oid(&u->old_oid))) {
+			/*
+			 * The reference does not exist, and we either have no
+			 * old object ID or expect the reference to not exist.
+			 * We can thus skip below safety checks as well as the
+			 * symref splitting. But we do want to verify that
+			 * there is no conflicting reference here so that we
+			 * can output a proper error message instead of failing
+			 * at a later point.
+			 */
+			ret = refs_verify_refname_available(ref_store, u->refname,
+							    &affected_refnames, NULL, err);
+			if (ret < 0)
+				goto done;
+
+			/*
+			 * There is no need to write the reference deletion
+			 * when the reference in question doesn't exist.
+			 */
+			 if (u->flags & REF_HAVE_NEW && !is_null_oid(&u->new_oid)) {
+				 ret = queue_transaction_update(refs, tx_data, u,
+								&current_oid, err);
+				 if (ret)
+					 goto done;
+			 }
+
+			continue;
+		}
+		if (ret > 0) {
+			/* The reference does not exist, but we expected it to. */
+			strbuf_addf(err, _("cannot lock ref '%s': "
+				    "unable to resolve reference '%s'"),
+				    original_update_refname(u), u->refname);
+			ret = -1;
+			goto done;
+		}
+
+		if (u->type & REF_ISSYMREF) {
+			/*
+			 * The reftable stack is locked at this point already,
+			 * so it is safe to call `refs_resolve_ref_unsafe()`
+			 * here without causing races.
+			 */
+			const char *resolved = refs_resolve_ref_unsafe(&refs->base, u->refname, 0,
+								       &current_oid, NULL);
+
+			if (u->flags & REF_NO_DEREF) {
+				if (u->flags & REF_HAVE_OLD && !resolved) {
+					strbuf_addf(err, _("cannot lock ref '%s': "
+						    "error reading reference"), u->refname);
+					ret = -1;
+					goto done;
+				}
+			} else {
+				struct ref_update *new_update;
+				int new_flags;
+
+				new_flags = u->flags;
+				if (!strcmp(rewritten_ref, "HEAD"))
+					new_flags |= REF_UPDATE_VIA_HEAD;
+
+				/*
+				 * If we are updating a symref (eg. HEAD), we should also
+				 * update the branch that the symref points to.
+				 *
+				 * This is generic functionality, and would be better
+				 * done in refs.c, but the current implementation is
+				 * intertwined with the locking in files-backend.c.
+				 */
+				new_update = ref_transaction_add_update(
+						transaction, referent.buf, new_flags,
+						&u->new_oid, &u->old_oid, u->msg);
+				new_update->parent_update = u;
+
+				/*
+				 * Change the symbolic ref update to log only. Also, it
+				 * doesn't need to check its old OID value, as that will be
+				 * done when new_update is processed.
+				 */
+				u->flags |= REF_LOG_ONLY | REF_NO_DEREF;
+				u->flags &= ~REF_HAVE_OLD;
+
+				if (string_list_has_string(&affected_refnames, new_update->refname)) {
+					strbuf_addf(err,
+						    _("multiple updates for '%s' (including one "
+						    "via symref '%s') are not allowed"),
+						    referent.buf, u->refname);
+					ret = TRANSACTION_NAME_CONFLICT;
+					goto done;
+				}
+				string_list_insert(&affected_refnames, new_update->refname);
+			}
+		}
+
+		/*
+		 * Verify that the old object matches our expectations. Note
+		 * that the error messages here do not make a lot of sense in
+		 * the context of the reftable backend as we never lock
+		 * individual refs. But the error messages match what the files
+		 * backend returns, which keeps our tests happy.
+		 */
+		if (u->flags & REF_HAVE_OLD && !oideq(&current_oid, &u->old_oid)) {
+			if (is_null_oid(&u->old_oid))
+				strbuf_addf(err, _("cannot lock ref '%s': "
+					    "reference already exists"),
+					    original_update_refname(u));
+			else if (is_null_oid(&current_oid))
+				strbuf_addf(err, _("cannot lock ref '%s': "
+					    "reference is missing but expected %s"),
+					    original_update_refname(u),
+					    oid_to_hex(&u->old_oid));
+			else
+				strbuf_addf(err, _("cannot lock ref '%s': "
+					    "is at %s but expected %s"),
+					    original_update_refname(u),
+					    oid_to_hex(&current_oid),
+					    oid_to_hex(&u->old_oid));
+			ret = -1;
+			goto done;
+		}
+
+		/*
+		 * If all of the following conditions are true:
+		 *
+		 *   - We're not about to write a symref.
+		 *   - We're not about to write a log-only entry.
+		 *   - Old and new object ID are different.
+		 *
+		 * Then we're essentially doing a no-op update that can be
+		 * skipped. This is not only for the sake of efficiency, but
+		 * also skips writing unneeded reflog entries.
+		 */
+		if ((u->type & REF_ISSYMREF) ||
+		    (u->flags & REF_LOG_ONLY) ||
+		    (u->flags & REF_HAVE_NEW && !oideq(&current_oid, &u->new_oid))) {
+			ret = queue_transaction_update(refs, tx_data, u,
+						       &current_oid, err);
+			if (ret)
+				goto done;
+		}
+	}
+
+	transaction->backend_data = tx_data;
+	transaction->state = REF_TRANSACTION_PREPARED;
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	if (ret < 0) {
+		free_transaction_data(tx_data);
+		transaction->state = REF_TRANSACTION_CLOSED;
+		if (!err->len)
+			strbuf_addf(err, _("reftable: transaction prepare: %s"),
+				    reftable_error_str(ret));
+	}
+	string_list_clear(&affected_refnames, 0);
+	strbuf_release(&referent);
+	strbuf_release(&head_referent);
+
+	return ret;
+}
+
+static int reftable_be_transaction_abort(struct ref_store *ref_store,
+					 struct ref_transaction *transaction,
+					 struct strbuf *err)
+{
+	struct reftable_transaction_data *tx_data = transaction->backend_data;
+	free_transaction_data(tx_data);
+	transaction->state = REF_TRANSACTION_CLOSED;
+	return 0;
+}
+
+static int transaction_update_cmp(const void *a, const void *b)
+{
+	return strcmp(((struct reftable_transaction_update *)a)->update->refname,
+		      ((struct reftable_transaction_update *)b)->update->refname);
+}
+
+static int write_transaction_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct write_transaction_table_arg *arg = cb_data;
+	struct reftable_merged_table *mt =
+		reftable_stack_merged_table(arg->stack);
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_log_record *logs = NULL;
+	size_t logs_nr = 0, logs_alloc = 0, i;
+	int ret = 0;
+
+	QSORT(arg->updates, arg->updates_nr, transaction_update_cmp);
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	for (i = 0; i < arg->updates_nr; i++) {
+		struct reftable_transaction_update *tx_update = &arg->updates[i];
+		struct ref_update *u = tx_update->update;
+
+		/*
+		 * Write a reflog entry when updating a ref to point to
+		 * something new in either of the following cases:
+		 *
+		 * - The reference is about to be deleted. We always want to
+		 *   delete the reflog in that case.
+		 * - REF_FORCE_CREATE_REFLOG is set, asking us to always create
+		 *   the reflog entry.
+		 * - `core.logAllRefUpdates` tells us to create the reflog for
+		 *   the given ref.
+		 */
+		if (u->flags & REF_HAVE_NEW && !(u->type & REF_ISSYMREF) && is_null_oid(&u->new_oid)) {
+			struct reftable_log_record log = {0};
+			struct reftable_iterator it = {0};
+
+			/*
+			 * When deleting refs we also delete all reflog entries
+			 * with them. While it is not strictly required to
+			 * delete reflogs together with their refs, this
+			 * matches the behaviour of the files backend.
+			 *
+			 * Unfortunately, we have no better way than to delete
+			 * all reflog entries one by one.
+			 */
+			ret = reftable_merged_table_seek_log(mt, &it, u->refname);
+			while (ret == 0) {
+				struct reftable_log_record *tombstone;
+
+				ret = reftable_iterator_next_log(&it, &log);
+				if (ret < 0)
+					break;
+				if (ret > 0 || strcmp(log.refname, u->refname)) {
+					ret = 0;
+					break;
+				}
+
+				ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+				tombstone = &logs[logs_nr++];
+				tombstone->refname = xstrdup(u->refname);
+				tombstone->value_type = REFTABLE_LOG_DELETION;
+				tombstone->update_index = log.update_index;
+			}
+
+			reftable_log_record_release(&log);
+			reftable_iterator_destroy(&it);
+
+			if (ret)
+				goto done;
+		} else if (u->flags & REF_HAVE_NEW &&
+			   (u->flags & REF_FORCE_CREATE_REFLOG ||
+			    should_write_log(&arg->refs->base, u->refname))) {
+			struct reftable_log_record *log;
+
+			ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+			log = &logs[logs_nr++];
+			memset(log, 0, sizeof(*log));
+
+			fill_reftable_log_record(log);
+			log->update_index = ts;
+			log->refname = xstrdup(u->refname);
+			log->value.update.new_hash = u->new_oid.hash;
+			log->value.update.old_hash = tx_update->current_oid.hash;
+			log->value.update.message =
+				xstrndup(u->msg, arg->refs->write_options.block_size / 2);
+		}
+
+		if (u->flags & REF_LOG_ONLY)
+			continue;
+
+		if (u->flags & REF_HAVE_NEW && is_null_oid(&u->new_oid)) {
+			struct reftable_ref_record ref = {
+				.refname = (char *)u->refname,
+				.update_index = ts,
+				.value_type = REFTABLE_REF_DELETION,
+			};
+
+			ret = reftable_writer_add_ref(writer, &ref);
+			if (ret < 0)
+				goto done;
+		} else if (u->flags & REF_HAVE_NEW) {
+			struct reftable_ref_record ref = {0};
+			struct object_id peeled;
+			int peel_error;
+
+			ref.refname = (char *)u->refname;
+			ref.update_index = ts;
+
+			peel_error = peel_object(&u->new_oid, &peeled);
+			if (!peel_error) {
+				ref.value_type = REFTABLE_REF_VAL2;
+				memcpy(ref.value.val2.target_value, peeled.hash, GIT_MAX_RAWSZ);
+				memcpy(ref.value.val2.value, u->new_oid.hash, GIT_MAX_RAWSZ);
+			} else if (!is_null_oid(&u->new_oid)) {
+				ref.value_type = REFTABLE_REF_VAL1;
+				memcpy(ref.value.val1, u->new_oid.hash, GIT_MAX_RAWSZ);
+			}
+
+			ret = reftable_writer_add_ref(writer, &ref);
+			if (ret < 0)
+				goto done;
+		}
+	}
+
+	/*
+	 * Logs are written at the end so that we do not have intermixed ref
+	 * and log blocks.
+	 */
+	if (logs) {
+		ret = reftable_writer_add_logs(writer, logs, logs_nr);
+		if (ret < 0)
+			goto done;
+	}
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	for (i = 0; i < logs_nr; i++)
+		clear_reftable_log_record(&logs[i]);
+	free(logs);
+	return ret;
+}
+
+static int reftable_be_transaction_finish(struct ref_store *ref_store,
+					  struct ref_transaction *transaction,
+					  struct strbuf *err)
+{
+	struct reftable_transaction_data *tx_data = transaction->backend_data;
+	int ret = 0;
+
+	for (size_t i = 0; i < tx_data->args_nr; i++) {
+		ret = reftable_addition_add(tx_data->args[i].addition,
+					    write_transaction_table, &tx_data->args[i]);
+		if (ret < 0)
+			goto done;
+
+		ret = reftable_addition_commit(tx_data->args[i].addition);
+		if (ret < 0)
+			goto done;
+	}
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	free_transaction_data(tx_data);
+	transaction->state = REF_TRANSACTION_CLOSED;
+
+	if (ret) {
+		strbuf_addf(err, _("reftable: transaction failure: %s"),
+			    reftable_error_str(ret));
+		return -1;
+	}
+	return ret;
+}
+
+static int reftable_be_initial_transaction_commit(struct ref_store *ref_store UNUSED,
+						  struct ref_transaction *transaction,
+						  struct strbuf *err)
+{
+	return ref_transaction_commit(transaction, err);
+}
+
+static int reftable_be_pack_refs(struct ref_store *ref_store,
+				 struct pack_refs_opts *opts)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE | REF_STORE_ODB, "pack_refs");
+	struct reftable_stack *stack;
+	int ret;
+
+	if (refs->err)
+		return refs->err;
+
+	stack = refs->worktree_stack;
+	if (!stack)
+		stack = refs->main_stack;
+
+	ret = reftable_stack_compact_all(stack, NULL);
+	if (ret)
+		goto out;
+	ret = reftable_stack_clean(stack);
+	if (ret)
+		goto out;
+
+out:
+	return ret;
+}
+
+struct write_create_symref_arg {
+	struct reftable_ref_store *refs;
+	struct reftable_stack *stack;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct write_create_symref_arg *create = cb_data;
+	uint64_t ts = reftable_stack_next_update_index(create->stack);
+	struct reftable_ref_record ref = {
+		.refname = (char *)create->refname,
+		.value_type = REFTABLE_REF_SYMREF,
+		.value.symref = (char *)create->target,
+		.update_index = ts,
+	};
+	struct reftable_log_record log = {0};
+	struct object_id new_oid;
+	struct object_id old_oid;
+	int ret;
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	ret = reftable_writer_add_ref(writer, &ref);
+	if (ret)
+		return ret;
+
+	/*
+	 * Note that it is important to try and resolve the reference before we
+	 * write the log entry. This is because `should_write_log()` will munge
+	 * `core.logAllRefUpdates`, which is undesirable when we create a new
+	 * repository because it would be written into the config. As HEAD will
+	 * not resolve for new repositories this ordering will ensure that this
+	 * never happens.
+	 */
+	if (!create->logmsg ||
+	    !refs_resolve_ref_unsafe(&create->refs->base, create->target,
+				     RESOLVE_REF_READING, &new_oid, NULL) ||
+	    !should_write_log(&create->refs->base, create->refname))
+		return 0;
+
+	fill_reftable_log_record(&log);
+	log.refname = xstrdup(create->refname);
+	log.update_index = ts;
+	log.value.update.message = xstrndup(create->logmsg,
+					    create->refs->write_options.block_size / 2);
+	log.value.update.new_hash = new_oid.hash;
+	if (refs_resolve_ref_unsafe(&create->refs->base, create->refname,
+				    RESOLVE_REF_READING, &old_oid, NULL))
+		log.value.update.old_hash = old_oid.hash;
+
+	ret = reftable_writer_add_log(writer, &log);
+	clear_reftable_log_record(&log);
+	return ret;
+}
+
+static int reftable_be_create_symref(struct ref_store *ref_store,
+				     const char *refname,
+				     const char *target,
+				     const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "create_symref");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct write_create_symref_arg arg = {
+		.refs = refs,
+		.stack = stack,
+		.refname = refname,
+		.target = target,
+		.logmsg = logmsg,
+	};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+
+	ret = reftable_stack_add(stack, &write_create_symref_table, &arg);
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	if (ret)
+		error("unable to write symref for %s: %s", refname,
+		      reftable_error_str(ret));
+	return ret;
+}
+
+struct write_copy_arg {
+	struct reftable_ref_store *refs;
+	struct reftable_stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+	int delete_old;
+};
+
+static int write_copy_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct write_copy_arg *arg = cb_data;
+	uint64_t deletion_ts, creation_ts;
+	struct reftable_merged_table *mt = reftable_stack_merged_table(arg->stack);
+	struct reftable_ref_record old_ref = {0}, refs[2] = {0};
+	struct reftable_log_record old_log = {0}, *logs = NULL;
+	struct reftable_iterator it = {0};
+	struct string_list skip = STRING_LIST_INIT_NODUP;
+	struct strbuf errbuf = STRBUF_INIT;
+	size_t logs_nr = 0, logs_alloc = 0, i;
+	int ret;
+
+	if (reftable_stack_read_ref(arg->stack, arg->oldname, &old_ref)) {
+		ret = error(_("refname %s not found"), arg->oldname);
+		goto done;
+	}
+	if (old_ref.value_type == REFTABLE_REF_SYMREF) {
+		ret = error(_("refname %s is a symbolic ref, copying it is not supported"),
+			    arg->oldname);
+		goto done;
+	}
+
+	/*
+	 * There's nothing to do in case the old and new name are the same, so
+	 * we exit early in that case.
+	 */
+	if (!strcmp(arg->oldname, arg->newname)) {
+		ret = 0;
+		goto done;
+	}
+
+	/*
+	 * Verify that the new refname is available.
+	 */
+	string_list_insert(&skip, arg->oldname);
+	ret = refs_verify_refname_available(&arg->refs->base, arg->newname,
+					    NULL, &skip, &errbuf);
+	if (ret < 0) {
+		error("%s", errbuf.buf);
+		goto done;
+	}
+
+	/*
+	 * When deleting the old reference we have to use two update indices:
+	 * once to delete the old ref and its reflog, and once to create the
+	 * new ref and its reflog. They need to be staged with two separate
+	 * indices because the new reflog needs to encode both the deletion of
+	 * the old branch and the creation of the new branch, and we cannot do
+	 * two changes to a reflog in a single update.
+	 */
+	deletion_ts = creation_ts = reftable_stack_next_update_index(arg->stack);
+	if (arg->delete_old)
+		creation_ts++;
+	reftable_writer_set_limits(writer, deletion_ts, creation_ts);
+
+	/*
+	 * Add the new reference. If this is a rename then we also delete the
+	 * old reference.
+	 */
+	refs[0] = old_ref;
+	refs[0].refname = (char *)arg->newname;
+	refs[0].update_index = creation_ts;
+	if (arg->delete_old) {
+		refs[1].refname = (char *)arg->oldname;
+		refs[1].value_type = REFTABLE_REF_DELETION;
+		refs[1].update_index = deletion_ts;
+	}
+	ret = reftable_writer_add_refs(writer, refs, arg->delete_old ? 2 : 1);
+	if (ret < 0)
+		goto done;
+
+	/*
+	 * When deleting the old branch we need to create a reflog entry on the
+	 * new branch name that indicates that the old branch has been deleted
+	 * and then recreated. This is a tad weird, but matches what the files
+	 * backend does.
+	 */
+	if (arg->delete_old) {
+		struct strbuf head_referent = STRBUF_INIT;
+		struct object_id head_oid;
+		int append_head_reflog;
+		unsigned head_type = 0;
+
+		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+		memset(&logs[logs_nr], 0, sizeof(logs[logs_nr]));
+		fill_reftable_log_record(&logs[logs_nr]);
+		logs[logs_nr].refname = (char *)arg->newname;
+		logs[logs_nr].update_index = deletion_ts;
+		logs[logs_nr].value.update.message =
+			xstrndup(arg->logmsg, arg->refs->write_options.block_size / 2);
+		logs[logs_nr].value.update.old_hash = old_ref.value.val1;
+		logs_nr++;
+
+		ret = read_ref_without_reload(arg->stack, "HEAD", &head_oid, &head_referent, &head_type);
+		if (ret < 0)
+			goto done;
+		append_head_reflog = (head_type & REF_ISSYMREF) && !strcmp(head_referent.buf, arg->oldname);
+		strbuf_release(&head_referent);
+
+		/*
+		 * The files backend uses `refs_delete_ref()` to delete the old
+		 * branch name, which will append a reflog entry for HEAD in
+		 * case it points to the old branch.
+		 */
+		if (append_head_reflog) {
+			ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+			logs[logs_nr] = logs[logs_nr - 1];
+			logs[logs_nr].refname = "HEAD";
+			logs_nr++;
+		}
+	}
+
+	/*
+	 * Create the reflog entry for the newly created branch.
+	 */
+	ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+	memset(&logs[logs_nr], 0, sizeof(logs[logs_nr]));
+	fill_reftable_log_record(&logs[logs_nr]);
+	logs[logs_nr].refname = (char *)arg->newname;
+	logs[logs_nr].update_index = creation_ts;
+	logs[logs_nr].value.update.message =
+		xstrndup(arg->logmsg, arg->refs->write_options.block_size / 2);
+	logs[logs_nr].value.update.new_hash = old_ref.value.val1;
+	logs_nr++;
+
+	/*
+	 * In addition to writing the reflog entry for the new branch, we also
+	 * copy over all log entries from the old reflog. Last but not least,
+	 * when renaming we also have to delete all the old reflog entries.
+	 */
+	ret = reftable_merged_table_seek_log(mt, &it, arg->oldname);
+	if (ret < 0)
+		return ret;
+
+	while (1) {
+		ret = reftable_iterator_next_log(&it, &old_log);
+		if (ret < 0)
+			goto done;
+		if (ret > 0 || strcmp(old_log.refname, arg->oldname)) {
+			ret = 0;
+			break;
+		}
+
+		free(old_log.refname);
+
+		/*
+		 * Copy over the old reflog entry with the new refname.
+		 */
+		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+		logs[logs_nr] = old_log;
+		logs[logs_nr].refname = (char *)arg->newname;
+		logs_nr++;
+
+		/*
+		 * Delete the old reflog entry in case we are renaming.
+		 */
+		if (arg->delete_old) {
+			ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+			memset(&logs[logs_nr], 0, sizeof(logs[logs_nr]));
+			logs[logs_nr].refname = (char *)arg->oldname;
+			logs[logs_nr].value_type = REFTABLE_LOG_DELETION;
+			logs[logs_nr].update_index = old_log.update_index;
+			logs_nr++;
+		}
+
+		/*
+		 * Transfer ownership of the log record we're iterating over to
+		 * the array of log records. Otherwise, the pointers would get
+		 * free'd or reallocated by the iterator.
+		 */
+		memset(&old_log, 0, sizeof(old_log));
+	}
+
+	ret = reftable_writer_add_logs(writer, logs, logs_nr);
+	if (ret < 0)
+		goto done;
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	reftable_iterator_destroy(&it);
+	string_list_clear(&skip, 0);
+	strbuf_release(&errbuf);
+	for (i = 0; i < logs_nr; i++) {
+		if (!strcmp(logs[i].refname, "HEAD"))
+			continue;
+		if (logs[i].value.update.old_hash == old_ref.value.val1)
+			logs[i].value.update.old_hash = NULL;
+		if (logs[i].value.update.new_hash == old_ref.value.val1)
+			logs[i].value.update.new_hash = NULL;
+		logs[i].refname = NULL;
+		reftable_log_record_release(&logs[i]);
+	}
+	free(logs);
+	reftable_ref_record_release(&old_ref);
+	reftable_log_record_release(&old_log);
+	return ret;
+}
+
+static int reftable_be_rename_ref(struct ref_store *ref_store,
+				  const char *oldrefname,
+				  const char *newrefname,
+				  const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "rename_ref");
+	struct reftable_stack *stack = stack_for(refs, newrefname, &newrefname);
+	struct write_copy_arg arg = {
+		.refs = refs,
+		.stack = stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+		.delete_old = 1,
+	};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+	ret = reftable_stack_add(stack, &write_copy_table, &arg);
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	return ret;
+}
+
+static int reftable_be_copy_ref(struct ref_store *ref_store,
+				const char *oldrefname,
+				const char *newrefname,
+				const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "copy_ref");
+	struct reftable_stack *stack = stack_for(refs, newrefname, &newrefname);
+	struct write_copy_arg arg = {
+		.refs = refs,
+		.stack = stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+	ret = reftable_stack_add(stack, &write_copy_table, &arg);
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	return ret;
+}
+
+struct reftable_reflog_iterator {
+	struct ref_iterator base;
+	struct reftable_ref_store *refs;
+	struct reftable_iterator iter;
+	struct reftable_log_record log;
+	struct object_id oid;
+	char *last_name;
+	int err;
+};
+
+static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_iterator *iter =
+		(struct reftable_reflog_iterator *)ref_iterator;
+
+	while (!iter->err) {
+		int flags;
+
+		iter->err = reftable_iterator_next_log(&iter->iter, &iter->log);
+		if (iter->err)
+			break;
+
+		/*
+		 * We want the refnames that we have reflogs for, so we skip if
+		 * we've already produced this name. This could be faster by
+		 * seeking directly to reflog@update_index==0.
+		 */
+		if (iter->last_name && !strcmp(iter->log.refname, iter->last_name))
+			continue;
+
+		if (!refs_resolve_ref_unsafe(&iter->refs->base, iter->log.refname,
+					     0, &iter->oid, &flags)) {
+			error(_("bad ref for %s"), iter->log.refname);
+			continue;
+		}
+
+		free(iter->last_name);
+		iter->last_name = xstrdup(iter->log.refname);
+		iter->base.refname = iter->log.refname;
+		iter->base.oid = &iter->oid;
+		iter->base.flags = flags;
+
+		break;
+	}
+
+	if (iter->err > 0) {
+		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
+			return ITER_ERROR;
+		return ITER_DONE;
+	}
+
+	if (iter->err < 0) {
+		ref_iterator_abort(ref_iterator);
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator,
+						 struct object_id *peeled)
+{
+	BUG("reftable reflog iterator cannot be peeled");
+	return -1;
+}
+
+static int reftable_reflog_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_iterator *iter =
+		(struct reftable_reflog_iterator *)ref_iterator;
+	reftable_log_record_release(&iter->log);
+	reftable_iterator_destroy(&iter->iter);
+	free(iter->last_name);
+	free(iter);
+	return ITER_DONE;
+}
+
+static struct ref_iterator_vtable reftable_reflog_iterator_vtable = {
+	.advance = reftable_reflog_iterator_advance,
+	.peel = reftable_reflog_iterator_peel,
+	.abort = reftable_reflog_iterator_abort
+};
+
+static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
+								  struct reftable_stack *stack)
+{
+	struct reftable_merged_table *merged_table;
+	struct reftable_reflog_iterator *iter;
+	int ret;
+
+	iter = xcalloc(1, sizeof(*iter));
+	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable, 1);
+	iter->refs = refs;
+	iter->base.oid = &iter->oid;
+
+	ret = refs->err;
+	if (ret)
+		goto done;
+
+	ret = reftable_stack_reload(refs->main_stack);
+	if (ret < 0)
+		goto done;
+
+	merged_table = reftable_stack_merged_table(stack);
+
+	ret = reftable_merged_table_seek_log(merged_table, &iter->iter, "");
+	if (ret < 0)
+		goto done;
+
+done:
+	iter->err = ret;
+	return iter;
+}
+
+static struct ref_iterator *reftable_be_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "reflog_iterator_begin");
+	struct reftable_reflog_iterator *main_iter, *worktree_iter;
+
+	main_iter = reflog_iterator_for_stack(refs, refs->main_stack);
+	if (!refs->worktree_stack)
+		return &main_iter->base;
+
+	worktree_iter = reflog_iterator_for_stack(refs, refs->worktree_stack);
+
+	return merge_ref_iterator_begin(1, &worktree_iter->base, &main_iter->base,
+					iterator_select, NULL);
+}
+
+static int yield_log_record(struct reftable_log_record *log,
+			    each_reflog_ent_fn fn,
+			    void *cb_data)
+{
+	struct object_id old_oid, new_oid;
+	const char *full_committer;
+
+	oidread(&old_oid, log->value.update.old_hash);
+	oidread(&new_oid, log->value.update.new_hash);
+
+	/*
+	 * When both the old object ID and the new object ID are null
+	 * then this is the reflog existence marker. The caller must
+	 * not be aware of it.
+	 */
+	if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
+		return 0;
+
+	full_committer = fmt_ident(log->value.update.name, log->value.update.email,
+				   WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
+	return fn(&old_oid, &new_oid, full_committer,
+		  log->value.update.time, log->value.update.tz_offset,
+		  log->value.update.message, cb_data);
+}
+
+static int reftable_be_for_each_reflog_ent_reverse(struct ref_store *ref_store,
+						   const char *refname,
+						   each_reflog_ent_fn fn,
+						   void *cb_data)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "for_each_reflog_ent_reverse");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record log = {0};
+	struct reftable_iterator it = {0};
+	int ret;
+
+	if (refs->err < 0)
+		return refs->err;
+
+	mt = reftable_stack_merged_table(stack);
+	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	while (!ret) {
+		ret = reftable_iterator_next_log(&it, &log);
+		if (ret < 0)
+			break;
+		if (ret > 0 || strcmp(log.refname, refname)) {
+			ret = 0;
+			break;
+		}
+
+		ret = yield_log_record(&log, fn, cb_data);
+		if (ret)
+			break;
+	}
+
+	reftable_log_record_release(&log);
+	reftable_iterator_destroy(&it);
+	return ret;
+}
+
+static int reftable_be_for_each_reflog_ent(struct ref_store *ref_store,
+					   const char *refname,
+					   each_reflog_ent_fn fn,
+					   void *cb_data)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "for_each_reflog_ent");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record *logs = NULL;
+	struct reftable_iterator it = {0};
+	size_t logs_alloc = 0, logs_nr = 0, i;
+	int ret;
+
+	if (refs->err < 0)
+		return refs->err;
+
+	mt = reftable_stack_merged_table(stack);
+	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	while (!ret) {
+		struct reftable_log_record log = {0};
+
+		ret = reftable_iterator_next_log(&it, &log);
+		if (ret < 0)
+			goto done;
+		if (ret > 0 || strcmp(log.refname, refname)) {
+			reftable_log_record_release(&log);
+			ret = 0;
+			break;
+		}
+
+		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+		logs[logs_nr++] = log;
+	}
+
+	for (i = logs_nr; i--;) {
+		ret = yield_log_record(&logs[i], fn, cb_data);
+		if (ret)
+			goto done;
+	}
+
+done:
+	reftable_iterator_destroy(&it);
+	for (i = 0; i < logs_nr; i++)
+		reftable_log_record_release(&logs[i]);
+	free(logs);
+	return ret;
+}
+
+static int reftable_be_reflog_exists(struct ref_store *ref_store,
+				     const char *refname)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "reflog_exists");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_merged_table *mt = reftable_stack_merged_table(stack);
+	struct reftable_log_record log = {0};
+	struct reftable_iterator it = {0};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	if (ret < 0)
+		goto done;
+
+	/*
+	 * Check whether we get at least one log record for the given ref name.
+	 * If so, the reflog exists, otherwise it doesn't.
+	 */
+	ret = reftable_iterator_next_log(&it, &log);
+	if (ret < 0)
+		goto done;
+	if (ret > 0) {
+		ret = 0;
+		goto done;
+	}
+
+	ret = strcmp(log.refname, refname) == 0;
+
+done:
+	reftable_iterator_destroy(&it);
+	reftable_log_record_release(&log);
+	if (ret < 0)
+		ret = 0;
+	return ret;
+}
+
+struct write_reflog_existence_arg {
+	struct reftable_ref_store *refs;
+	const char *refname;
+	struct reftable_stack *stack;
+};
+
+static int write_reflog_existence_table(struct reftable_writer *writer,
+					void *cb_data)
+{
+	struct write_reflog_existence_arg *arg = cb_data;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_log_record log = {0};
+	int ret;
+
+	ret = reftable_stack_read_log(arg->stack, arg->refname, &log);
+	if (ret <= 0)
+		goto done;
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	/*
+	 * The existence entry has both old and new object ID set to the the
+	 * null object ID. Our iterators are aware of this and will not present
+	 * them to their callers.
+	 */
+	log.refname = xstrdup(arg->refname);
+	log.update_index = ts;
+	log.value_type = REFTABLE_LOG_UPDATE;
+	ret = reftable_writer_add_log(writer, &log);
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	reftable_log_record_release(&log);
+	return ret;
+}
+
+static int reftable_be_create_reflog(struct ref_store *ref_store,
+				     const char *refname,
+				     struct strbuf *errmsg)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "create_reflog");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct write_reflog_existence_arg arg = {
+		.refs = refs,
+		.stack = stack,
+		.refname = refname,
+	};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+
+	ret = reftable_stack_add(stack, &write_reflog_existence_table, &arg);
+
+done:
+	return ret;
+}
+
+struct write_reflog_delete_arg {
+	struct reftable_stack *stack;
+	const char *refname;
+};
+
+static int write_reflog_delete_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct write_reflog_delete_arg *arg = cb_data;
+	struct reftable_merged_table *mt =
+		reftable_stack_merged_table(arg->stack);
+	struct reftable_log_record log = {0}, tombstone = {0};
+	struct reftable_iterator it = {0};
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int ret;
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	/*
+	 * In order to delete a table we need to delete all reflog entries one
+	 * by one. This is inefficient, but the reftable format does not have a
+	 * better marker right now.
+	 */
+	ret = reftable_merged_table_seek_log(mt, &it, arg->refname);
+	while (ret == 0) {
+		ret = reftable_iterator_next_log(&it, &log);
+		if (ret < 0)
+			break;
+		if (ret > 0 || strcmp(log.refname, arg->refname)) {
+			ret = 0;
+			break;
+		}
+
+		tombstone.refname = (char *)arg->refname;
+		tombstone.value_type = REFTABLE_LOG_DELETION;
+		tombstone.update_index = log.update_index;
+
+		ret = reftable_writer_add_log(writer, &tombstone);
+	}
+
+	reftable_log_record_release(&log);
+	reftable_iterator_destroy(&it);
+	return ret;
+}
+
+static int reftable_be_delete_reflog(struct ref_store *ref_store,
+				     const char *refname)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "delete_reflog");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct write_reflog_delete_arg arg = {
+		.stack = stack,
+		.refname = refname,
+	};
+	int ret;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		return ret;
+	ret = reftable_stack_add(stack, &write_reflog_delete_table, &arg);
+
+	assert(ret != REFTABLE_API_ERROR);
+	return ret;
+}
+
+struct reflog_expiry_arg {
+	struct reftable_stack *stack;
+	struct reftable_log_record *records;
+	struct object_id update_oid;
+	const char *refname;
+	size_t len;
+};
+
+static int write_reflog_expiry_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct reflog_expiry_arg *arg = cb_data;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	uint64_t live_records = 0;
+	size_t i;
+	int ret;
+
+	for (i = 0; i < arg->len; i++)
+		if (arg->records[i].value_type == REFTABLE_LOG_UPDATE)
+			live_records++;
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	if (!is_null_oid(&arg->update_oid)) {
+		struct reftable_ref_record ref = {0};
+		struct object_id peeled;
+
+		ref.refname = (char *)arg->refname;
+		ref.update_index = ts;
+
+		if (!peel_object(&arg->update_oid, &peeled)) {
+			ref.value_type = REFTABLE_REF_VAL2;
+			memcpy(ref.value.val2.target_value, peeled.hash, GIT_MAX_RAWSZ);
+			memcpy(ref.value.val2.value, arg->update_oid.hash, GIT_MAX_RAWSZ);
+		} else {
+			ref.value_type = REFTABLE_REF_VAL1;
+			memcpy(ref.value.val1, arg->update_oid.hash, GIT_MAX_RAWSZ);
+		}
+
+		ret = reftable_writer_add_ref(writer, &ref);
+		if (ret < 0)
+			return ret;
+	}
+
+	/*
+	 * When there are no more entries left in the reflog we empty it
+	 * completely, but write a placeholder reflog entry that indicates that
+	 * the reflog still exists.
+	 */
+	if (!live_records) {
+		struct reftable_log_record log = {
+			.refname = (char *)arg->refname,
+			.value_type = REFTABLE_LOG_UPDATE,
+			.update_index = ts,
+		};
+
+		ret = reftable_writer_add_log(writer, &log);
+		if (ret)
+			return ret;
+	}
+
+	for (i = 0; i < arg->len; i++) {
+		ret = reftable_writer_add_log(writer, &arg->records[i]);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int reftable_be_reflog_expire(struct ref_store *ref_store,
+				     const char *refname,
+				     unsigned int flags,
+				     reflog_expiry_prepare_fn prepare_fn,
+				     reflog_expiry_should_prune_fn should_prune_fn,
+				     reflog_expiry_cleanup_fn cleanup_fn,
+				     void *policy_cb_data)
+{
+	/*
+	 * For log expiry, we write tombstones for every single reflog entry
+	 * that is to be expired. This means that the entries are still
+	 * retrievable by delving into the stack, and expiring entries
+	 * paradoxically takes extra memory. This memory is only reclaimed when
+	 * compacting the reftable stack.
+	 *
+	 * It would be better if the refs backend supported an API that sets a
+	 * criterion for all refs, passing the criterion to pack_refs().
+	 *
+	 * On the plus side, because we do the expiration per ref, we can easily
+	 * insert the reflog existence dummies.
+	 */
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "reflog_expire");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_merged_table *mt = reftable_stack_merged_table(stack);
+	struct reftable_log_record *logs = NULL;
+	struct reftable_log_record *rewritten = NULL;
+	struct reftable_ref_record ref_record = {0};
+	struct reftable_iterator it = {0};
+	struct reftable_addition *add = NULL;
+	struct reflog_expiry_arg arg = {0};
+	struct object_id oid = {0};
+	uint8_t *last_hash = NULL;
+	size_t logs_nr = 0, logs_alloc = 0, i;
+	int ret;
+
+	if (refs->err < 0)
+		return refs->err;
+
+	ret = reftable_stack_reload(stack);
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_new_addition(&add, stack);
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_read_ref(stack, refname, &ref_record);
+	if (ret < 0)
+		goto done;
+	if (reftable_ref_record_val1(&ref_record))
+		oidread(&oid, reftable_ref_record_val1(&ref_record));
+	prepare_fn(refname, &oid, policy_cb_data);
+
+	while (1) {
+		struct reftable_log_record log = {0};
+		struct object_id old_oid, new_oid;
+
+		ret = reftable_iterator_next_log(&it, &log);
+		if (ret < 0)
+			goto done;
+		if (ret > 0 || strcmp(log.refname, refname)) {
+			reftable_log_record_release(&log);
+			break;
+		}
+
+		oidread(&old_oid, log.value.update.old_hash);
+		oidread(&new_oid, log.value.update.new_hash);
+
+		/*
+		 * Skip over the reflog existence marker. We will add it back
+		 * in when there are no live reflog records.
+		 */
+		if (is_null_oid(&old_oid) && is_null_oid(&new_oid)) {
+			reftable_log_record_release(&log);
+			continue;
+		}
+
+		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+		logs[logs_nr++] = log;
+	}
+
+	/*
+	 * We need to rewrite all reflog entries according to the pruning
+	 * callback function:
+	 *
+	 *   - If a reflog entry shall be pruned we mark the record for
+	 *     deletion.
+	 *
+	 *   - Otherwise we may have to rewrite the chain of reflog entries so
+	 *     that gaps created by just-deleted records get backfilled.
+	 */
+	CALLOC_ARRAY(rewritten, logs_nr);
+	for (i = logs_nr; i--;) {
+		struct reftable_log_record *dest = &rewritten[i];
+		struct object_id old_oid, new_oid;
+
+		*dest = logs[i];
+		oidread(&old_oid, logs[i].value.update.old_hash);
+		oidread(&new_oid, logs[i].value.update.new_hash);
+
+		if (should_prune_fn(&old_oid, &new_oid, logs[i].value.update.email,
+				    (timestamp_t)logs[i].value.update.time,
+				    logs[i].value.update.tz_offset,
+				    logs[i].value.update.message,
+				    policy_cb_data)) {
+			dest->value_type = REFTABLE_LOG_DELETION;
+		} else {
+			if ((flags & EXPIRE_REFLOGS_REWRITE) && last_hash)
+				dest->value.update.old_hash = last_hash;
+			last_hash = logs[i].value.update.new_hash;
+		}
+	}
+
+	if (flags & EXPIRE_REFLOGS_UPDATE_REF && last_hash &&
+	    reftable_ref_record_val1(&ref_record))
+		oidread(&arg.update_oid, last_hash);
+
+	arg.records = rewritten;
+	arg.len = logs_nr;
+	arg.stack = stack,
+	arg.refname = refname,
+
+	ret = reftable_addition_add(add, &write_reflog_expiry_table, &arg);
+	if (ret < 0)
+		goto done;
+
+	/*
+	 * Future improvement: we could skip writing records that were
+	 * not changed.
+	 */
+	if (!(flags & EXPIRE_REFLOGS_DRY_RUN))
+		ret = reftable_addition_commit(add);
+
+done:
+	if (add)
+		cleanup_fn(policy_cb_data);
+	assert(ret != REFTABLE_API_ERROR);
+
+	reftable_ref_record_release(&ref_record);
+	reftable_iterator_destroy(&it);
+	reftable_addition_destroy(add);
+	for (i = 0; i < logs_nr; i++)
+		reftable_log_record_release(&logs[i]);
+	free(logs);
+	free(rewritten);
+	return ret;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	.name = "reftable",
+	.init = reftable_be_init,
+	.init_db = reftable_be_init_db,
+	.transaction_prepare = reftable_be_transaction_prepare,
+	.transaction_finish = reftable_be_transaction_finish,
+	.transaction_abort = reftable_be_transaction_abort,
+	.initial_transaction_commit = reftable_be_initial_transaction_commit,
+
+	.pack_refs = reftable_be_pack_refs,
+	.create_symref = reftable_be_create_symref,
+	.rename_ref = reftable_be_rename_ref,
+	.copy_ref = reftable_be_copy_ref,
+
+	.iterator_begin = reftable_be_iterator_begin,
+	.read_raw_ref = reftable_be_read_raw_ref,
+	.read_symbolic_ref = reftable_be_read_symbolic_ref,
+
+	.reflog_iterator_begin = reftable_be_reflog_iterator_begin,
+	.for_each_reflog_ent = reftable_be_for_each_reflog_ent,
+	.for_each_reflog_ent_reverse = reftable_be_for_each_reflog_ent_reverse,
+	.reflog_exists = reftable_be_reflog_exists,
+	.create_reflog = reftable_be_create_reflog,
+	.delete_reflog = reftable_be_delete_reflog,
+	.reflog_expire = reftable_be_reflog_expire,
+};
diff --git a/repository.h b/repository.h
index 7a250a6605..1645cef518 100644
--- a/repository.h
+++ b/repository.h
@@ -24,8 +24,9 @@ enum fetch_negotiation_setting {
 	FETCH_NEGOTIATION_NOOP,
 };
 
-#define REF_STORAGE_FORMAT_UNKNOWN 0
-#define REF_STORAGE_FORMAT_FILES   1
+#define REF_STORAGE_FORMAT_UNKNOWN  0
+#define REF_STORAGE_FORMAT_FILES    1
+#define REF_STORAGE_FORMAT_REFTABLE 2
 
 struct repo_settings {
 	int initialized;
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
new file mode 100755
index 0000000000..94ee46984a
--- /dev/null
+++ b/t/t0610-reftable-basics.sh
@@ -0,0 +1,887 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='reftable basics'
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+if ! test_have_prereq REFTABLE
+then
+	skip_all='skipping reftable tests; set GIT_TEST_DEFAULT_REF_FORMAT=reftable'
+	test_done
+fi
+
+INVALID_OID=$(test_oid 001)
+
+test_expect_success 'init: creates basic reftable structures' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_path_is_dir repo/.git/reftable &&
+	test_path_is_file repo/.git/reftable/tables.list &&
+	echo reftable >expect &&
+	git -C repo rev-parse --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'init: sha256 object format via environment variable' '
+	test_when_finished "rm -rf repo" &&
+	GIT_DEFAULT_HASH=sha256 git init repo &&
+	cat >expect <<-EOF &&
+	sha256
+	reftable
+	EOF
+	git -C repo rev-parse --show-object-format --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'init: sha256 object format via option' '
+	test_when_finished "rm -rf repo" &&
+	git init --object-format=sha256 repo &&
+	cat >expect <<-EOF &&
+	sha256
+	reftable
+	EOF
+	git -C repo rev-parse --show-object-format --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'init: reinitializing reftable backend succeeds' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo for-each-ref >expect &&
+	git init --ref-format=reftable repo &&
+	git -C repo for-each-ref >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'init: reinitializing files with reftable backend fails' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	test_commit -C repo file &&
+
+	cp repo/.git/HEAD expect &&
+	test_must_fail git init --ref-format=reftable repo &&
+	test_cmp expect repo/.git/HEAD
+'
+
+test_expect_success 'init: reinitializing reftable with files backend fails' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=reftable repo &&
+	test_commit -C repo file &&
+
+	cp repo/.git/HEAD expect &&
+	test_must_fail git init --ref-format=files repo &&
+	test_cmp expect repo/.git/HEAD
+'
+
+test_expect_perms () {
+	local perms="$1"
+	local file="$2"
+	local actual=$(ls -l "$file") &&
+
+	case "$actual" in
+	$perms*)
+		: happy
+		;;
+	*)
+		echo "$(basename $2) is not $perms but $actual"
+		false
+		;;
+	esac
+}
+
+for umask in 002 022
+do
+	test_expect_success POSIXPERM 'init: honors core.sharedRepository' '
+		test_when_finished "rm -rf repo" &&
+		(
+			umask $umask &&
+			git init --shared=true repo &&
+			test 1 = "$(git -C repo config core.sharedrepository)"
+		) &&
+		test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list &&
+		for table in repo/.git/reftable/*.ref
+		do
+			test_expect_perms "-rw-rw-r--" "$table" ||
+			return 1
+		done
+	'
+done
+
+test_expect_success 'clone: can clone reftable repository' '
+	test_when_finished "rm -rf repo clone" &&
+	git init repo &&
+	test_commit -C repo message1 file1 &&
+
+	git clone repo cloned &&
+	echo reftable >expect &&
+	git -C cloned rev-parse --show-ref-format >actual &&
+	test_cmp expect actual &&
+	test_path_is_file cloned/file1
+'
+
+test_expect_success 'clone: can clone reffiles into reftable repository' '
+	test_when_finished "rm -rf reffiles reftable" &&
+	git init --ref-format=files reffiles &&
+	test_commit -C reffiles A &&
+	git clone --ref-format=reftable ./reffiles reftable &&
+
+	git -C reffiles rev-parse HEAD >expect &&
+	git -C reftable rev-parse HEAD >actual &&
+	test_cmp expect actual &&
+
+	git -C reftable rev-parse --show-ref-format >actual &&
+	echo reftable >expect &&
+	test_cmp expect actual &&
+
+	git -C reffiles rev-parse --show-ref-format >actual &&
+	echo files >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone: can clone reftable into reffiles repository' '
+	test_when_finished "rm -rf reffiles reftable" &&
+	git init --ref-format=reftable reftable &&
+	test_commit -C reftable A &&
+	git clone --ref-format=files ./reftable reffiles &&
+
+	git -C reftable rev-parse HEAD >expect &&
+	git -C reffiles rev-parse HEAD >actual &&
+	test_cmp expect actual &&
+
+	git -C reftable rev-parse --show-ref-format >actual &&
+	echo reftable >expect &&
+	test_cmp expect actual &&
+
+	git -C reffiles rev-parse --show-ref-format >actual &&
+	echo files >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'ref transaction: corrupted tables cause failure' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file1 &&
+		for f in .git/reftable/*.ref
+		do
+			: >"$f" || return 1
+		done &&
+		test_must_fail git update-ref refs/heads/main HEAD
+	)
+'
+
+test_expect_success 'ref transaction: corrupted tables.list cause failure' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file1 &&
+		echo garbage >.git/reftable/tables.list &&
+		test_must_fail git update-ref refs/heads/main HEAD
+	)
+'
+
+test_expect_success 'ref transaction: refuses to write ref causing F/D conflict' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo file &&
+	test_must_fail git -C repo update-ref refs/heads/main/forbidden
+'
+
+test_expect_success 'ref transaction: deleting ref with invalid name fails' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo file &&
+	test_must_fail git -C repo update-ref -d ../../my-private-file
+'
+
+test_expect_success 'ref transaction: can skip object ID verification' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_must_fail test-tool -C repo ref-store main update-ref msg refs/heads/branch $INVALID_OID $ZERO_OID 0 &&
+	test-tool -C repo ref-store main update-ref msg refs/heads/branch $INVALID_OID $ZERO_OID REF_SKIP_OID_VERIFICATION
+'
+
+test_expect_success 'ref transaction: updating same ref multiple times fails' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo A &&
+	cat >updates <<-EOF &&
+	update refs/heads/main $A
+	update refs/heads/main $A
+	EOF
+	cat >expect <<-EOF &&
+	fatal: multiple updates for ref ${SQ}refs/heads/main${SQ} not allowed
+	EOF
+	test_must_fail git -C repo update-ref --stdin <updates 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success 'ref transaction: can delete symbolic self-reference with git-symbolic-ref(1)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	git -C repo symbolic-ref refs/heads/self refs/heads/self &&
+	git -C repo symbolic-ref -d refs/heads/self
+'
+
+test_expect_success 'ref transaction: deleting symbolic self-reference without --no-deref fails' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	git -C repo symbolic-ref refs/heads/self refs/heads/self &&
+	cat >expect <<-EOF &&
+	error: multiple updates for ${SQ}refs/heads/self${SQ} (including one via symref ${SQ}refs/heads/self${SQ}) are not allowed
+	EOF
+	test_must_fail git -C repo update-ref -d refs/heads/self 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success 'ref transaction: deleting symbolic self-reference with --no-deref succeeds' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	git -C repo symbolic-ref refs/heads/self refs/heads/self &&
+	git -C repo update-ref -d --no-deref refs/heads/self
+'
+
+test_expect_success 'ref transaction: creating symbolic ref fails with F/D conflict' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo A &&
+	cat >expect <<-EOF &&
+	error: unable to write symref for refs/heads: file/directory conflict
+	EOF
+	test_must_fail git -C repo symbolic-ref refs/heads refs/heads/foo 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success 'ref transaction: ref deletion' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file &&
+		HEAD_OID=$(git show-ref -s --verify HEAD) &&
+		cat >expect <<-EOF &&
+		$HEAD_OID refs/heads/main
+		$HEAD_OID refs/tags/file
+		EOF
+		git show-ref >actual &&
+		test_cmp expect actual &&
+
+		test_must_fail git update-ref -d refs/tags/file $INVALID_OID &&
+		git show-ref >actual &&
+		test_cmp expect actual &&
+
+		git update-ref -d refs/tags/file $HEAD_OID &&
+		echo "$HEAD_OID refs/heads/main" >expect &&
+		git show-ref >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'ref transaction: writes cause auto-compaction' '
+	test_when_finished "rm -rf repo" &&
+
+	git init repo &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	test_commit -C repo --no-tag A &&
+	test_line_count = 2 repo/.git/reftable/tables.list &&
+
+	test_commit -C repo --no-tag B &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+check_fsync_events () {
+	local trace="$1" &&
+	shift &&
+
+	cat >expect &&
+	sed -n \
+		-e '/^{"event":"counter",.*"category":"fsync",/ {
+			s/.*"category":"fsync",//;
+			s/}$//;
+			p;
+		}' \
+		<"$trace" >actual &&
+	test_cmp expect actual
+}
+
+# A fix for this is in flight via jc/reftable-core-fsync.
+test_expect_failure 'ref transaction: writes are synced' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo initial &&
+
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+	GIT_TEST_FSYNC=true \
+		git -C repo -c core.fsync=reference update-ref refs/heads/branch HEAD &&
+	check_fsync_events trace2.txt <<-EOF
+	"name":"hardware-flush","count":2
+	EOF
+'
+
+test_expect_success 'pack-refs: compacts tables' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+
+	test_commit -C repo A &&
+	ls -1 repo/.git/reftable >table-files &&
+	test_line_count = 4 table-files &&
+	test_line_count = 3 repo/.git/reftable/tables.list &&
+
+	git -C repo pack-refs &&
+	ls -1 repo/.git/reftable >table-files &&
+	test_line_count = 2 table-files &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'pack-refs: prunes stale tables' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	touch repo/.git/reftable/stale-table.ref &&
+	git -C repo pack-refs &&
+	test_path_is_missing repo/.git/reftable/stable-ref.ref
+'
+
+test_expect_success 'pack-refs: does not prune non-table files' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	touch repo/.git/reftable/garbage &&
+	git -C repo pack-refs &&
+	test_path_is_file repo/.git/reftable/garbage
+'
+
+for umask in 002 022
+do
+	# A fix for this is in flight via ps/reftable-compacted-tables-permission-fix.
+	test_expect_failure POSIXPERM 'pack-refs: honors core.sharedRepository' '
+		test_when_finished "rm -rf repo" &&
+		(
+			umask $umask &&
+			git init --shared=true repo &&
+			test_commit -C repo A &&
+			test_line_count = 3 repo/.git/reftable/tables.list
+		) &&
+		git -C repo pack-refs &&
+		test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list &&
+		for table in repo/.git/reftable/*.ref
+		do
+			test_expect_perms "-rw-rw-r--" "$table" ||
+			return 1
+		done
+	'
+done
+
+# A fix for this is in flight.
+test_expect_failure 'packed-refs: writes are synced' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo initial &&
+	test_line_count = 2 table-files &&
+
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+	GIT_TEST_FSYNC=true \
+		git -C repo -c core.fsync=reference pack-refs &&
+	check_fsync_events trace2.txt <<-EOF
+	"name":"hardware-flush","count":2
+	EOF
+'
+
+test_expect_success 'ref iterator: bogus names are flagged' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit --no-tag file &&
+		test-tool ref-store main update-ref msg "refs/heads/bogus..name" $(git rev-parse HEAD) $ZERO_OID REF_SKIP_REFNAME_VERIFICATION &&
+
+		cat >expect <<-EOF &&
+		$ZERO_OID refs/heads/bogus..name 0xc
+		$(git rev-parse HEAD) refs/heads/main 0x0
+		EOF
+		test-tool ref-store main for-each-ref "" >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'ref iterator: missing object IDs are not flagged' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test-tool ref-store main update-ref msg "refs/heads/broken-hash" $INVALID_OID $ZERO_OID REF_SKIP_OID_VERIFICATION &&
+
+		cat >expect <<-EOF &&
+		$INVALID_OID refs/heads/broken-hash 0x0
+		EOF
+		test-tool ref-store main for-each-ref "" >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'basic: commit and list refs' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo file &&
+	test_write_lines refs/heads/main refs/tags/file >expect &&
+	git -C repo for-each-ref --format="%(refname)" >actual &&
+	test_cmp actual expect
+'
+
+test_expect_success 'basic: can write large commit message' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	perl -e "
+		print \"this is a long commit message\" x 50000
+	" >commit-msg &&
+	git -C repo commit --allow-empty --file=../commit-msg
+'
+
+test_expect_success 'basic: show-ref fails with empty repository' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_must_fail git -C repo show-ref >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success 'basic: can check out unborn branch' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	git -C repo checkout -b main
+'
+
+test_expect_success 'basic: peeled tags are stored' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo file &&
+	git -C repo tag -m "annotated tag" test_tag HEAD &&
+	for ref in refs/heads/main refs/tags/file refs/tags/test_tag refs/tags/test_tag^{}
+	do
+		echo "$(git -C repo rev-parse "$ref") $ref" || return 1
+	done >expect &&
+	git -C repo show-ref -d >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'basic: for-each-ref can print symrefs' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file &&
+		git branch &&
+		git symbolic-ref refs/heads/sym refs/heads/main &&
+		cat >expected <<-EOF &&
+		refs/heads/main
+		EOF
+		git for-each-ref --format="%(symref)" refs/heads/sym >actual &&
+		test_cmp expected actual
+	)
+'
+
+test_expect_success 'basic: notes' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		write_script fake_editor <<-\EOF &&
+		echo "$MSG" >"$1"
+		echo "$MSG" >&2
+		EOF
+
+		test_commit 1st &&
+		test_commit 2nd &&
+		GIT_EDITOR=./fake_editor MSG=b4 git notes add &&
+		GIT_EDITOR=./fake_editor MSG=b3 git notes edit &&
+		echo b4 >expect &&
+		git notes --ref commits@{1} show >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'basic: stash' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file &&
+		git stash list >expect &&
+		test_line_count = 0 expect &&
+
+		echo hoi >>file.t &&
+		git stash push -m stashed &&
+		git stash list >expect &&
+		test_line_count = 1 expect &&
+
+		git stash clear &&
+		git stash list >expect &&
+		test_line_count = 0 expect
+	)
+'
+
+test_expect_success 'basic: cherry-pick' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit message1 file1 &&
+		test_commit message2 file2 &&
+		git branch source &&
+		git checkout HEAD^ &&
+		test_commit message3 file3 &&
+		git cherry-pick source &&
+		test_path_is_file file2
+	)
+'
+
+test_expect_success 'basic: rebase' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit message1 file1 &&
+		test_commit message2 file2 &&
+		git branch source &&
+		git checkout HEAD^ &&
+		test_commit message3 file3 &&
+		git rebase source &&
+		test_path_is_file file2
+	)
+'
+
+test_expect_success 'reflog: can delete separate reflog entries' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_commit file &&
+		test_commit file2 &&
+		test_commit file3 &&
+		test_commit file4 &&
+		git reflog >actual &&
+		grep file3 actual &&
+
+		git reflog delete HEAD@{1} &&
+		git reflog >actual &&
+		! grep file3 actual
+	)
+'
+
+test_expect_success 'reflog: can switch to previous branch' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file1 &&
+		git checkout -b branch1 &&
+		test_commit file2 &&
+		git checkout -b branch2 &&
+		git switch - &&
+		git rev-parse --symbolic-full-name HEAD >actual &&
+		echo refs/heads/branch1 >expect &&
+		test_cmp actual expect
+	)
+'
+
+test_expect_success 'reflog: copying branch writes reflog entry' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file1 &&
+		test_commit file2 &&
+		oid=$(git rev-parse --short HEAD) &&
+		git branch src &&
+		cat >expect <<-EOF &&
+		${oid} dst@{0}: Branch: copied refs/heads/src to refs/heads/dst
+		${oid} dst@{1}: branch: Created from main
+		EOF
+		git branch -c src dst &&
+		git reflog dst >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'reflog: renaming branch writes reflog entry' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git symbolic-ref HEAD refs/heads/before &&
+		test_commit file &&
+		git show-ref >expected.refs &&
+		sed s/before/after/g <expected.refs >expected &&
+		git branch -M after &&
+		git show-ref >actual &&
+		test_cmp expected actual &&
+		echo refs/heads/after >expected &&
+		git symbolic-ref HEAD >actual &&
+		test_cmp expected actual
+	)
+'
+
+test_expect_success 'reflog: can store empty logs' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_must_fail test-tool ref-store main reflog-exists refs/heads/branch &&
+		test-tool ref-store main create-reflog refs/heads/branch &&
+		test-tool ref-store main reflog-exists refs/heads/branch &&
+		test-tool ref-store main for-each-reflog-ent-reverse refs/heads/branch >actual &&
+		test_must_be_empty actual
+	)
+'
+
+test_expect_success 'reflog: expiry empties reflog' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_commit initial &&
+		git checkout -b branch &&
+		test_commit fileA &&
+		test_commit fileB &&
+
+		cat >expect <<-EOF &&
+		commit: fileB
+		commit: fileA
+		branch: Created from HEAD
+		EOF
+		git reflog show --format="%gs" refs/heads/branch >actual &&
+		test_cmp expect actual &&
+
+		git reflog expire branch --expire=all &&
+		git reflog show --format="%gs" refs/heads/branch >actual &&
+		test_must_be_empty actual &&
+		test-tool ref-store main reflog-exists refs/heads/branch
+	)
+'
+
+test_expect_success 'reflog: can be deleted' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		test-tool ref-store main reflog-exists refs/heads/main &&
+		test-tool ref-store main delete-reflog refs/heads/main &&
+		test_must_fail test-tool ref-store main reflog-exists refs/heads/main
+	)
+'
+
+test_expect_success 'reflog: garbage collection deletes reflog entries' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		for count in $(test_seq 1 10)
+		do
+			test_commit "number $count" file.t $count number-$count ||
+			return 1
+		done &&
+		git reflog refs/heads/main >actual &&
+		test_line_count = 10 actual &&
+		grep "commit (initial): number 1" actual &&
+		grep "commit: number 10" actual &&
+
+		git gc &&
+		git reflog refs/heads/main >actual &&
+		test_line_count = 0 actual
+	)
+'
+
+test_expect_success 'reflog: updates via HEAD update HEAD reflog' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit main-one &&
+		git checkout -b new-branch &&
+		test_commit new-one &&
+		test_commit new-two &&
+
+		echo new-one >expect &&
+		git log -1 --format=%s HEAD@{1} >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'worktree: adding worktree creates separate stack' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	test_path_is_file repo/.git/worktrees/worktree/refs/heads &&
+	echo "ref: refs/heads/.invalid" >expect &&
+	test_cmp expect repo/.git/worktrees/worktree/HEAD &&
+	test_path_is_dir repo/.git/worktrees/worktree/reftable &&
+	test_path_is_file repo/.git/worktrees/worktree/reftable/tables.list
+'
+
+test_expect_success 'worktree: pack-refs in main repo packs main refs' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+	git -C repo worktree add ../worktree &&
+
+	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 4 repo/.git/reftable/tables.list &&
+	git -C repo pack-refs &&
+	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: pack-refs in worktree packs worktree refs' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+	git -C repo worktree add ../worktree &&
+
+	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 4 repo/.git/reftable/tables.list &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 4 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: creating shared ref updates main stack' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	git -C repo pack-refs &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	git -C worktree update-ref refs/heads/shared HEAD &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 2 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: creating per-worktree ref updates worktree stack' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	git -C repo pack-refs &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	git -C worktree update-ref refs/bisect/per-worktree HEAD &&
+	test_line_count = 2 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: creating per-worktree ref from main repo' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	git -C repo pack-refs &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	git -C repo update-ref worktrees/worktree/refs/bisect/per-worktree HEAD &&
+	test_line_count = 2 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: creating per-worktree ref from second worktree' '
+	test_when_finished "rm -rf repo wt1 wt2" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../wt1 &&
+	git -C repo worktree add ../wt2 &&
+	git -C repo pack-refs &&
+	git -C wt1 pack-refs &&
+	git -C wt2 pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/wt1/reftable/tables.list &&
+	test_line_count = 1 repo/.git/worktrees/wt2/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	git -C wt1 update-ref worktrees/wt2/refs/bisect/per-worktree HEAD &&
+	test_line_count = 1 repo/.git/worktrees/wt1/reftable/tables.list &&
+	test_line_count = 2 repo/.git/worktrees/wt2/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: can create shared and per-worktree ref in one transaction' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	git -C repo pack-refs &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	cat >stdin <<-EOF &&
+	create worktrees/worktree/refs/bisect/per-worktree HEAD
+	create refs/branches/shared HEAD
+	EOF
+	git -C repo update-ref --stdin <stdin &&
+	test_line_count = 2 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 2 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: can access common refs' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo file1 &&
+	git -C repo branch branch1 &&
+	git -C repo worktree add ../worktree &&
+
+	echo refs/heads/worktree >expect &&
+	git -C worktree symbolic-ref HEAD >actual &&
+	test_cmp expect actual &&
+	git -C worktree checkout branch1
+'
+
+test_expect_success 'worktree: adds worktree with detached HEAD' '
+	test_when_finished "rm -rf repo worktree" &&
+
+	git init repo &&
+	test_commit -C repo A &&
+	git -C repo rev-parse main >expect &&
+
+	git -C repo worktree add --detach ../worktree main &&
+	git -C worktree rev-parse HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'fetch: accessing FETCH_HEAD special ref works' '
+	test_when_finished "rm -rf repo sub" &&
+
+	git init sub &&
+	test_commit -C sub two &&
+	git -C sub rev-parse HEAD >expect &&
+
+	git init repo &&
+	test_commit -C repo one &&
+	git -C repo fetch ../sub &&
+	git -C repo rev-parse FETCH_HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_done
diff --git a/t/t0611-reftable-httpd.sh b/t/t0611-reftable-httpd.sh
new file mode 100755
index 0000000000..5e05b9c1f2
--- /dev/null
+++ b/t/t0611-reftable-httpd.sh
@@ -0,0 +1,26 @@
+#!/bin/sh
+
+test_description='reftable HTTPD tests'
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd
+
+REPO="$HTTPD_DOCUMENT_ROOT_PATH/repo"
+
+test_expect_success 'serving ls-remote' '
+	git init --ref-format=reftable -b main "$REPO" &&
+	cd "$REPO" &&
+	test_commit m1 &&
+	>.git/git-daemon-export-ok &&
+	git ls-remote "http://127.0.0.1:$LIB_HTTPD_PORT/smart/repo" | cut -f 2-2 -d "	" >actual &&
+	cat >expect <<-EOF &&
+	HEAD
+	refs/heads/main
+	refs/tags/m1
+	EOF
+	test_cmp actual expect
+'
+
+test_done
diff --git a/t/test-lib.sh b/t/test-lib.sh
index fc93aa57e6..0fb8757b7d 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1750,6 +1750,8 @@ esac
 case "$GIT_DEFAULT_REF_FORMAT" in
 files)
 	test_set_prereq REFFILES;;
+reftable)
+	test_set_prereq REFTABLE;;
 *)
 	echo 2>&1 "error: unknown ref format $GIT_DEFAULT_REF_FORMAT"
 	exit 1
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 2/2] ci: add jobs to test with the reftable backend
  2024-02-05  6:02 ` [PATCH v3 " Patrick Steinhardt
  2024-02-05  6:02   ` [PATCH v3 1/2] " Patrick Steinhardt
@ 2024-02-05  6:02   ` Patrick Steinhardt
  2024-02-05 13:10   ` [PATCH v3 0/2] refs: introduce " Karthik Nayak
  2 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2024-02-05  6:02 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, Han-Wen Nienhuys, Karthik Nayak

[-- Attachment #1: Type: text/plain, Size: 3294 bytes --]

Add CI jobs for both GitHub Workflows and GitLab CI to run Git with the
new reftable backend.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 .github/workflows/main.yml | 9 +++++++++
 .gitlab-ci.yml             | 9 +++++++++
 ci/lib.sh                  | 2 +-
 ci/run-build-and-tests.sh  | 3 +++
 4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 4d97da57ec..1b43e49dad 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -266,6 +266,9 @@ jobs:
           - jobname: linux-sha256
             cc: clang
             pool: ubuntu-latest
+          - jobname: linux-reftable
+            cc: clang
+            pool: ubuntu-latest
           - jobname: linux-gcc
             cc: gcc
             cc_package: gcc-8
@@ -277,6 +280,9 @@ jobs:
           - jobname: osx-clang
             cc: clang
             pool: macos-13
+          - jobname: osx-reftable
+            cc: clang
+            pool: macos-13
           - jobname: osx-gcc
             cc: gcc
             cc_package: gcc-13
@@ -287,6 +293,9 @@ jobs:
           - jobname: linux-leaks
             cc: gcc
             pool: ubuntu-latest
+          - jobname: linux-reftable-leaks
+            cc: gcc
+            pool: ubuntu-latest
           - jobname: linux-asan-ubsan
             cc: clang
             pool: ubuntu-latest
diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 43bfbd8834..c0fa2fe90b 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -26,6 +26,9 @@ test:linux:
       - jobname: linux-sha256
         image: ubuntu:latest
         CC: clang
+      - jobname: linux-reftable
+        image: ubuntu:latest
+        CC: clang
       - jobname: linux-gcc
         image: ubuntu:20.04
         CC: gcc
@@ -40,6 +43,9 @@ test:linux:
       - jobname: linux-leaks
         image: ubuntu:latest
         CC: gcc
+      - jobname: linux-reftable-leaks
+        image: ubuntu:latest
+        CC: gcc
       - jobname: linux-asan-ubsan
         image: ubuntu:latest
         CC: clang
@@ -79,6 +85,9 @@ test:osx:
       - jobname: osx-clang
         image: macos-13-xcode-14
         CC: clang
+      - jobname: osx-reftable
+        image: macos-13-xcode-14
+        CC: clang
   artifacts:
     paths:
       - t/failed-test-artifacts
diff --git a/ci/lib.sh b/ci/lib.sh
index d5dd2f2697..0a73fc7bd1 100755
--- a/ci/lib.sh
+++ b/ci/lib.sh
@@ -367,7 +367,7 @@ linux-musl)
 	MAKEFLAGS="$MAKEFLAGS NO_REGEX=Yes ICONV_OMITS_BOM=Yes"
 	MAKEFLAGS="$MAKEFLAGS GIT_TEST_UTF8_LOCALE=C.UTF-8"
 	;;
-linux-leaks)
+linux-leaks|linux-reftable-leaks)
 	export SANITIZE=leak
 	export GIT_TEST_PASSING_SANITIZE_LEAK=true
 	export GIT_TEST_SANITIZE_LEAK_LOG=true
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index 7a1466b868..c192bd613c 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -37,6 +37,9 @@ linux-clang)
 linux-sha256)
 	export GIT_TEST_DEFAULT_HASH=sha256
 	;;
+linux-reftable|linux-reftable-leaks|osx-reftable)
+	export GIT_TEST_DEFAULT_REF_FORMAT=reftable
+	;;
 pedantic)
 	# Don't run the tests; we only care about whether Git can be
 	# built.
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 0/2] refs: introduce reftable backend
  2024-02-05  6:02 ` [PATCH v3 " Patrick Steinhardt
  2024-02-05  6:02   ` [PATCH v3 1/2] " Patrick Steinhardt
  2024-02-05  6:02   ` [PATCH v3 2/2] ci: add jobs to test with the " Patrick Steinhardt
@ 2024-02-05 13:10   ` Karthik Nayak
  2 siblings, 0 replies; 23+ messages in thread
From: Karthik Nayak @ 2024-02-05 13:10 UTC (permalink / raw
  To: Patrick Steinhardt, git; +Cc: Junio C Hamano, Han-Wen Nienhuys

[-- Attachment #1: Type: text/plain, Size: 3234 bytes --]

Hello,

Patrick Steinhardt <ps@pks.im> writes:
> Hi,
>
> this is the third version of my patch series that introduces the
> reftable backend.
>
> The only real change in this version is that I've pulled in
> kn/for-all-refs at 693e807311 (for-each-ref: avoid filtering on empty
> pattern, 2024-01-29) as a dependency. This series introduces a new
> DO_FOR_EACH_INCLUDE_ALL_REFS flag that we need to handle in the reftable
> backend. With this change all tests should pass again.
>
> Patrick
>
> Patrick Steinhardt (2):
>   refs: introduce reftable backend
>   ci: add jobs to test with the reftable backend
>
>  .github/workflows/main.yml                    |    9 +
>  .gitlab-ci.yml                                |    9 +
>  Documentation/ref-storage-format.txt          |    2 +
>  .../technical/repository-version.txt          |    5 +-
>  Makefile                                      |    1 +
>  ci/lib.sh                                     |    2 +-
>  ci/run-build-and-tests.sh                     |    3 +
>  contrib/workdir/git-new-workdir               |    2 +-
>  path.c                                        |    2 +-
>  path.h                                        |    1 +
>  refs.c                                        |    1 +
>  refs/refs-internal.h                          |    1 +
>  refs/reftable-backend.c                       | 2297 +++++++++++++++++
>  repository.h                                  |    5 +-
>  t/t0610-reftable-basics.sh                    |  887 +++++++
>  t/t0611-reftable-httpd.sh                     |   26 +
>  t/test-lib.sh                                 |    2 +
>  17 files changed, 3248 insertions(+), 7 deletions(-)
>  create mode 100644 refs/reftable-backend.c
>  create mode 100755 t/t0610-reftable-basics.sh
>  create mode 100755 t/t0611-reftable-httpd.sh
>
> Range-diff against v2:
> 1:  d6548dcfad ! 1:  d83e66e980 refs: introduce reftable backend
>     @@ refs/reftable-backend.c (new)
>      +{
>      +	switch (log->value_type) {
>      +	case REFTABLE_LOG_UPDATE:
>     -+		/* when we write log records, the hashes are owned by a struct
>     -+		 * oid */
>     ++		/*
>     ++		 * When we write log records, the hashes are owned by the
>     ++		 * caller and thus shouldn't be free'd.
>     ++		 */
>      +		log->value.update.old_hash = NULL;
>      +		log->value.update.new_hash = NULL;
>      +		break;
>     @@ refs/reftable-backend.c (new)
>      +			break;
>      +
>      +		/*
>     -+		 * The files backend only lists references contained in
>     -+		 * "refs/". We emulate the same behaviour here and thus skip
>     -+		 * all references that don't start with this prefix.
>     ++		 * Unless DO_FOR_EACH_INCLUDE_ALL_REFS is set, we only list
>     ++		 * refs starting with "refs/" to mimic the "files" backend.
>      +		 */
>     -+		if (!starts_with(iter->ref.refname, "refs/"))
>     ++		if (!(iter->flags & DO_FOR_EACH_INCLUDE_ALL_REFS) &&
>     ++		    !starts_with(iter->ref.refname, "refs/"))
>      +			continue;
>      +
>      +		if (iter->prefix &&
> 2:  63eafc9f5b = 2:  146bb95c03 ci: add jobs to test with the reftable backend

This range-diff looks good to me. I'd already reviewed the previous
version, so overall looks good now.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 0/2] refs: introduce reftable backend
  2024-01-30  8:05 [PATCH 0/2] refs: introduce reftable backend Patrick Steinhardt
                   ` (4 preceding siblings ...)
  2024-02-05  6:02 ` [PATCH v3 " Patrick Steinhardt
@ 2024-02-07  7:20 ` Patrick Steinhardt
  2024-02-07  7:20   ` [PATCH v4 1/2] " Patrick Steinhardt
  2024-02-07  7:20   ` [PATCH v4 2/2] ci: add jobs to test with the " Patrick Steinhardt
  5 siblings, 2 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2024-02-07  7:20 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, Han-Wen Nienhuys, Karthik Nayak

[-- Attachment #1: Type: text/plain, Size: 5421 bytes --]

Hi,

this is the fourth version of my patch series that introduces the
reftable backend.

Changes compared to v3:

  - kn/for-all-refs has been reverted due to discussions around its user
    interface. I have thus evicted the patch series as a dependency and
    dropped handling of DO_FOR_EACH_INCLUDE_ALL_REFS.

  - ps/reftable-compacted-tables-permission-fix has been merged to
    "master" now, fixing permissions on "tables.list" when compacting
    reftables. I've marked the corresponding test as succeeding.

  - jc/reftable-core-fsync has been merged to "master" now, which adds
    fsyncing logic to reftable. I've marked the corresponding test as
    succeeding.

  - I noticed that the fsync tests fail on macOS because there we use a
    different fsync method by default. I fixed that by explicitly saying
    which fsync method should be used in the corresponding tests.

  - I also noticed that the second fsync test reused "trace2.txt" from
    the first fsync test because it only appends to the file. Thus, we
    saw two fsync events instead of one. I fixed that by truncating the
    file.

The patch series is based on the current "master" branch at 235986be82
(The fourteenth batch, 2024-02-06).

Thanks!

Patrick

Patrick Steinhardt (2):
  refs: introduce reftable backend
  ci: add jobs to test with the reftable backend

 .github/workflows/main.yml                    |    9 +
 .gitlab-ci.yml                                |    9 +
 Documentation/ref-storage-format.txt          |    2 +
 .../technical/repository-version.txt          |    5 +-
 Makefile                                      |    1 +
 ci/lib.sh                                     |    2 +-
 ci/run-build-and-tests.sh                     |    3 +
 contrib/workdir/git-new-workdir               |    2 +-
 path.c                                        |    2 +-
 path.h                                        |    1 +
 refs.c                                        |    1 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 2297 +++++++++++++++++
 repository.h                                  |    5 +-
 t/t0610-reftable-basics.sh                    |  887 +++++++
 t/t0611-reftable-httpd.sh                     |   26 +
 t/test-lib.sh                                 |    2 +
 17 files changed, 3248 insertions(+), 7 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0610-reftable-basics.sh
 create mode 100755 t/t0611-reftable-httpd.sh

Range-diff against v3:
1:  d83e66e980 ! 1:  5de60d46bd refs: introduce reftable backend
    @@ refs/reftable-backend.c (new)
     +			break;
     +
     +		/*
    -+		 * Unless DO_FOR_EACH_INCLUDE_ALL_REFS is set, we only list
    -+		 * refs starting with "refs/" to mimic the "files" backend.
    ++		 * The files backend only lists references contained in
    ++		 * "refs/". We emulate the same behaviour here and thus skip
    ++		 * all references that don't start with this prefix.
     +		 */
    -+		if (!(iter->flags & DO_FOR_EACH_INCLUDE_ALL_REFS) &&
    -+		    !starts_with(iter->ref.refname, "refs/"))
    ++		if (!starts_with(iter->ref.refname, "refs/"))
     +			continue;
     +
     +		if (iter->prefix &&
    @@ t/t0610-reftable-basics.sh (new)
     +	test_cmp expect actual
     +}
     +
    -+# A fix for this is in flight via jc/reftable-core-fsync.
    -+test_expect_failure 'ref transaction: writes are synced' '
    ++test_expect_success 'ref transaction: writes are synced' '
     +	test_when_finished "rm -rf repo" &&
     +	git init repo &&
     +	test_commit -C repo initial &&
     +
     +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
     +	GIT_TEST_FSYNC=true \
    -+		git -C repo -c core.fsync=reference update-ref refs/heads/branch HEAD &&
    ++		git -C repo -c core.fsync=reference \
    ++		-c core.fsyncMethod=fsync update-ref refs/heads/branch HEAD &&
     +	check_fsync_events trace2.txt <<-EOF
     +	"name":"hardware-flush","count":2
     +	EOF
    @@ t/t0610-reftable-basics.sh (new)
     +
     +for umask in 002 022
     +do
    -+	# A fix for this is in flight via ps/reftable-compacted-tables-permission-fix.
    -+	test_expect_failure POSIXPERM 'pack-refs: honors core.sharedRepository' '
    ++	test_expect_success POSIXPERM 'pack-refs: honors core.sharedRepository' '
     +		test_when_finished "rm -rf repo" &&
     +		(
     +			umask $umask &&
    @@ t/t0610-reftable-basics.sh (new)
     +	'
     +done
     +
    -+# A fix for this is in flight.
    -+test_expect_failure 'packed-refs: writes are synced' '
    ++test_expect_success 'packed-refs: writes are synced' '
     +	test_when_finished "rm -rf repo" &&
     +	git init repo &&
     +	test_commit -C repo initial &&
     +	test_line_count = 2 table-files &&
     +
    ++	: >trace2.txt &&
     +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
     +	GIT_TEST_FSYNC=true \
    -+		git -C repo -c core.fsync=reference pack-refs &&
    ++		git -C repo -c core.fsync=reference \
    ++		-c core.fsyncMethod=fsync pack-refs &&
     +	check_fsync_events trace2.txt <<-EOF
     +	"name":"hardware-flush","count":2
     +	EOF
2:  146bb95c03 = 2:  30e5feb28c ci: add jobs to test with the reftable backend

base-commit: 235986be822c9f8689be2e9a0b7804d0b1b6d821
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 1/2] refs: introduce reftable backend
  2024-02-07  7:20 ` [PATCH v4 " Patrick Steinhardt
@ 2024-02-07  7:20   ` Patrick Steinhardt
  2024-02-07 22:31     ` Jeff King
  2024-02-07  7:20   ` [PATCH v4 2/2] ci: add jobs to test with the " Patrick Steinhardt
  1 sibling, 1 reply; 23+ messages in thread
From: Patrick Steinhardt @ 2024-02-07  7:20 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, Han-Wen Nienhuys, Karthik Nayak

[-- Attachment #1: Type: text/plain, Size: 118017 bytes --]

Due to scalability issues, Shawn Pearce has originally proposed a new
"reftable" format more than six years ago [1]. Initially, this new
format was implemented in JGit with promising results. Around two years
ago, we have then added the "reftable" library to the Git codebase via
a4bbd13be3 (Merge branch 'hn/reftable', 2021-12-15). With this we have
landed all the low-level code to read and write reftables. Notably
missing though was the integration of this low-level code into the Git
code base in the form of a new ref backend that ties all of this
together.

This gap is now finally closed by introducing a new "reftable" backend
into the Git codebase. This new backend promises to bring some notable
improvements to Git repositories:

  - It becomes possible to do truly atomic writes where either all refs
    are committed to disk or none are. This was not possible with the
    "files" backend because ref updates were split across multiple loose
    files.

  - The disk space required to store many refs is reduced, both compared
    to loose refs and packed-refs. This is enabled both by the reftable
    format being a binary format, which is more compact, and by prefix
    compression.

  - We can ignore filesystem-specific behaviour as ref names are not
    encoded via paths anymore. This means there is no need to handle
    case sensitivity on Windows systems or Unicode precomposition on
    macOS.

  - There is no need to rewrite the complete refdb anymore every time a
    ref is being deleted like it was the case for packed-refs. This
    means that ref deletions are now constant time instead of scaling
    linearly with the number of refs.

  - We can ignore file/directory conflicts so that it becomes possible
    to store both "refs/heads/foo" and "refs/heads/foo/bar".

  - Due to this property we can retain reflogs for deleted refs. We have
    previously been deleting reflogs together with their refs to avoid
    file/directory conflicts, which is not necessary anymore.

  - We can properly enumerate all refs. With the "files" backend it is
    not easily possible to distinguish between refs and non-refs because
    they may live side by side in the gitdir.

Not all of these improvements are realized with the current "reftable"
backend implementation. At this point, the new backend is supposed to be
a drop-in replacement for the "files" backend that is used by basically
all Git repositories nowadays. It strives for 1:1 compatibility, which
means that a user can expect the same behaviour regardless of whether
they use the "reftable" backend or the "files" backend for most of the
part.

Most notably, this means we artificially limit the capabilities of the
"reftable" backend to match the limits of the "files" backend. It is not
possible to create refs that would end up with file/directory conflicts,
we do not retain reflogs, we perform stricter-than-necessary checks.
This is done intentionally due to two main reasons:

  - It makes it significantly easier to land the "reftable" backend as
    tests behave the same. It would be tough to argue for each and every
    single test that doesn't pass with the "reftable" backend.

  - It ensures compatibility between repositories that use the "files"
    backend and repositories that use the "reftable" backend. Like this,
    hosters can migrate their repositories to use the "reftable" backend
    without causing issues for clients that use the "files" backend in
    their clones.

It is expected that these artificial limitations may eventually go away
in the long term.

Performance-wise things very much depend on the actual workload. The
following benchmarks compare the "files" and "reftable" backends in the
current version:

  - Creating N refs in separate transactions shows that the "files"
    backend is ~50% faster. This is not surprising given that creating a
    ref only requires us to create a single loose ref. The "reftable"
    backend will also perform auto compaction on updates. In real-world
    workloads we would likely also want to perform pack loose refs,
    which would likely change the picture.

        Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 1)
          Time (mean ± σ):       2.1 ms ±   0.3 ms    [User: 0.6 ms, System: 1.7 ms]
          Range (min … max):     1.8 ms …   4.3 ms    133 runs

        Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 1)
          Time (mean ± σ):       2.7 ms ±   0.1 ms    [User: 0.6 ms, System: 2.2 ms]
          Range (min … max):     2.4 ms …   2.9 ms    132 runs

        Benchmark 3: update-ref: create refs sequentially (refformat = files, refcount = 1000)
          Time (mean ± σ):      1.975 s ±  0.006 s    [User: 0.437 s, System: 1.535 s]
          Range (min … max):    1.969 s …  1.980 s    3 runs

        Benchmark 4: update-ref: create refs sequentially (refformat = reftable, refcount = 1000)
          Time (mean ± σ):      2.611 s ±  0.013 s    [User: 0.782 s, System: 1.825 s]
          Range (min … max):    2.597 s …  2.622 s    3 runs

        Benchmark 5: update-ref: create refs sequentially (refformat = files, refcount = 100000)
          Time (mean ± σ):     198.442 s ±  0.241 s    [User: 43.051 s, System: 155.250 s]
          Range (min … max):   198.189 s … 198.670 s    3 runs

        Benchmark 6: update-ref: create refs sequentially (refformat = reftable, refcount = 100000)
          Time (mean ± σ):     294.509 s ±  4.269 s    [User: 104.046 s, System: 190.326 s]
          Range (min … max):   290.223 s … 298.761 s    3 runs

  - Creating N refs in a single transaction shows that the "files"
    backend is significantly slower once we start to write many refs.
    The "reftable" backend only needs to update two files, whereas the
    "files" backend needs to write one file per ref.

        Benchmark 1: update-ref: create many refs (refformat = files, refcount = 1)
          Time (mean ± σ):       1.9 ms ±   0.1 ms    [User: 0.4 ms, System: 1.4 ms]
          Range (min … max):     1.8 ms …   2.6 ms    151 runs

        Benchmark 2: update-ref: create many refs (refformat = reftable, refcount = 1)
          Time (mean ± σ):       2.5 ms ±   0.1 ms    [User: 0.7 ms, System: 1.7 ms]
          Range (min … max):     2.4 ms …   3.4 ms    148 runs

        Benchmark 3: update-ref: create many refs (refformat = files, refcount = 1000)
          Time (mean ± σ):     152.5 ms ±   5.2 ms    [User: 19.1 ms, System: 133.1 ms]
          Range (min … max):   148.5 ms … 167.8 ms    15 runs

        Benchmark 4: update-ref: create many refs (refformat = reftable, refcount = 1000)
          Time (mean ± σ):      58.0 ms ±   2.5 ms    [User: 28.4 ms, System: 29.4 ms]
          Range (min … max):    56.3 ms …  72.9 ms    40 runs

        Benchmark 5: update-ref: create many refs (refformat = files, refcount = 1000000)
          Time (mean ± σ):     152.752 s ±  0.710 s    [User: 20.315 s, System: 131.310 s]
          Range (min … max):   152.165 s … 153.542 s    3 runs

        Benchmark 6: update-ref: create many refs (refformat = reftable, refcount = 1000000)
          Time (mean ± σ):     51.912 s ±  0.127 s    [User: 26.483 s, System: 25.424 s]
          Range (min … max):   51.769 s … 52.012 s    3 runs

  - Deleting a ref in a fully-packed repository shows that the "files"
    backend scales with the number of refs. The "reftable" backend has
    constant-time deletions.

        Benchmark 1: update-ref: delete ref (refformat = files, refcount = 1)
          Time (mean ± σ):       1.7 ms ±   0.1 ms    [User: 0.4 ms, System: 1.2 ms]
          Range (min … max):     1.6 ms …   2.1 ms    316 runs

        Benchmark 2: update-ref: delete ref (refformat = reftable, refcount = 1)
          Time (mean ± σ):       1.8 ms ±   0.1 ms    [User: 0.4 ms, System: 1.3 ms]
          Range (min … max):     1.7 ms …   2.1 ms    294 runs

        Benchmark 3: update-ref: delete ref (refformat = files, refcount = 1000)
          Time (mean ± σ):       2.0 ms ±   0.1 ms    [User: 0.5 ms, System: 1.4 ms]
          Range (min … max):     1.9 ms …   2.5 ms    287 runs

        Benchmark 4: update-ref: delete ref (refformat = reftable, refcount = 1000)
          Time (mean ± σ):       1.9 ms ±   0.1 ms    [User: 0.5 ms, System: 1.3 ms]
          Range (min … max):     1.8 ms …   2.1 ms    217 runs

        Benchmark 5: update-ref: delete ref (refformat = files, refcount = 1000000)
          Time (mean ± σ):     229.8 ms ±   7.9 ms    [User: 182.6 ms, System: 46.8 ms]
          Range (min … max):   224.6 ms … 245.2 ms    6 runs

        Benchmark 6: update-ref: delete ref (refformat = reftable, refcount = 1000000)
          Time (mean ± σ):       2.0 ms ±   0.0 ms    [User: 0.6 ms, System: 1.3 ms]
          Range (min … max):     2.0 ms …   2.1 ms    3 runs

  - Listing all refs shows no significant advantage for either of the
    backends. The "files" backend is a bit faster, but not by a
    significant margin. When repositories are not packed the "reftable"
    backend outperforms the "files" backend because the "reftable"
    backend performs auto-compaction.

        Benchmark 1: show-ref: print all refs (refformat = files, refcount = 1, packed = true)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.0 ms    1729 runs

        Benchmark 2: show-ref: print all refs (refformat = reftable, refcount = 1, packed = true)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   1.8 ms    1816 runs

        Benchmark 3: show-ref: print all refs (refformat = files, refcount = 1000, packed = true)
          Time (mean ± σ):       4.3 ms ±   0.1 ms    [User: 0.9 ms, System: 3.3 ms]
          Range (min … max):     4.1 ms …   4.6 ms    645 runs

        Benchmark 4: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = true)
          Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 1.0 ms, System: 3.3 ms]
          Range (min … max):     4.2 ms …   5.9 ms    643 runs

        Benchmark 5: show-ref: print all refs (refformat = files, refcount = 1000000, packed = true)
          Time (mean ± σ):      2.537 s ±  0.034 s    [User: 0.488 s, System: 2.048 s]
          Range (min … max):    2.511 s …  2.627 s    10 runs

        Benchmark 6: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = true)
          Time (mean ± σ):      2.712 s ±  0.017 s    [User: 0.653 s, System: 2.059 s]
          Range (min … max):    2.692 s …  2.752 s    10 runs

        Benchmark 7: show-ref: print all refs (refformat = files, refcount = 1, packed = false)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   1.9 ms    1834 runs

        Benchmark 8: show-ref: print all refs (refformat = reftable, refcount = 1, packed = false)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.4 ms …   2.0 ms    1840 runs

        Benchmark 9: show-ref: print all refs (refformat = files, refcount = 1000, packed = false)
          Time (mean ± σ):      13.8 ms ±   0.2 ms    [User: 2.8 ms, System: 10.8 ms]
          Range (min … max):    13.3 ms …  14.5 ms    208 runs

        Benchmark 10: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = false)
          Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 1.2 ms, System: 3.3 ms]
          Range (min … max):     4.3 ms …   6.2 ms    624 runs

        Benchmark 11: show-ref: print all refs (refformat = files, refcount = 1000000, packed = false)
          Time (mean ± σ):     12.127 s ±  0.129 s    [User: 2.675 s, System: 9.451 s]
          Range (min … max):   11.965 s … 12.370 s    10 runs

        Benchmark 12: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = false)
          Time (mean ± σ):      2.799 s ±  0.022 s    [User: 0.735 s, System: 2.063 s]
          Range (min … max):    2.769 s …  2.836 s    10 runs

  - Printing a single ref shows no real difference between the "files"
    and "reftable" backends.

        Benchmark 1: show-ref: print single ref (refformat = files, refcount = 1)
          Time (mean ± σ):       1.5 ms ±   0.1 ms    [User: 0.4 ms, System: 1.0 ms]
          Range (min … max):     1.4 ms …   1.8 ms    1779 runs

        Benchmark 2: show-ref: print single ref (refformat = reftable, refcount = 1)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.4 ms …   2.5 ms    1753 runs

        Benchmark 3: show-ref: print single ref (refformat = files, refcount = 1000)
          Time (mean ± σ):       1.5 ms ±   0.1 ms    [User: 0.3 ms, System: 1.1 ms]
          Range (min … max):     1.4 ms …   1.9 ms    1840 runs

        Benchmark 4: show-ref: print single ref (refformat = reftable, refcount = 1000)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.0 ms    1831 runs

        Benchmark 5: show-ref: print single ref (refformat = files, refcount = 1000000)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.1 ms    1848 runs

        Benchmark 6: show-ref: print single ref (refformat = reftable, refcount = 1000000)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.1 ms    1762 runs

So overall, performance depends on the usecases. Except for many
sequential writes the "reftable" backend is roughly on par or
significantly faster than the "files" backend though. Given that the
"files" backend has received 18 years of optimizations by now this can
be seen as a win. Furthermore, we can expect that the "reftable" backend
will grow faster over time when attention turns more towards
optimizations.

The complete test suite passes, except for those tests explicitly marked
to require the REFFILES prerequisite. Some tests in t0610 are marked as
failing because they depend on still-in-flight bug fixes. Tests can be
run with the new backend by setting the GIT_TEST_DEFAULT_REF_FORMAT
environment variable to "reftable".

There is a single known conceptual incompatibility with the dumb HTTP
transport. As "info/refs" SHOULD NOT contain the HEAD reference, and
because the "HEAD" file is not valid anymore, it is impossible for the
remote client to figure out the default branch without changing the
protocol. This shortcoming needs to be handled in a subsequent patch
series.

As the reftable library has already been introduced a while ago, this
commit message will not go into the details of how exactly the on-disk
format works. Please refer to our preexisting technical documentation at
Documentation/technical/reftable for this.

[1]: https://public-inbox.org/git/CAJo=hJtyof=HRy=2sLP0ng0uZ4=S-DpZ5dR1aF+VHVETKG20OQ@mail.gmail.com/

Original-idea-by: Shawn Pearce <spearce@spearce.org>
Based-on-patch-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/ref-storage-format.txt          |    2 +
 .../technical/repository-version.txt          |    5 +-
 Makefile                                      |    1 +
 contrib/workdir/git-new-workdir               |    2 +-
 path.c                                        |    2 +-
 path.h                                        |    1 +
 refs.c                                        |    1 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 2297 +++++++++++++++++
 repository.h                                  |    5 +-
 t/t0610-reftable-basics.sh                    |  887 +++++++
 t/t0611-reftable-httpd.sh                     |   26 +
 t/test-lib.sh                                 |    2 +
 13 files changed, 3226 insertions(+), 6 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0610-reftable-basics.sh
 create mode 100755 t/t0611-reftable-httpd.sh

diff --git a/Documentation/ref-storage-format.txt b/Documentation/ref-storage-format.txt
index 1a65cac468..14fff8a9c6 100644
--- a/Documentation/ref-storage-format.txt
+++ b/Documentation/ref-storage-format.txt
@@ -1 +1,3 @@
 * `files` for loose files with packed-refs. This is the default.
+* `reftable` for the reftable format. This format is experimental and its
+  internals are subject to change.
diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 27be3741e6..47281420fc 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -103,5 +103,6 @@ GIT_COMMON_DIR/worktrees/<id>/config.worktree)
 
 ==== `refStorage`
 
-Specifies the file format for the ref database. The only valid value
-is `files` (loose references with a packed-refs file).
+Specifies the file format for the ref database. The valid values are
+`files` (loose references with a packed-refs file) and `reftable` (see
+Documentation/technical/reftable.txt).
diff --git a/Makefile b/Makefile
index 302b863548..dd7b1cf9dd 100644
--- a/Makefile
+++ b/Makefile
@@ -1127,6 +1127,7 @@ LIB_OBJS += reflog.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/debug.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
diff --git a/contrib/workdir/git-new-workdir b/contrib/workdir/git-new-workdir
index 888c34a521..989197aace 100755
--- a/contrib/workdir/git-new-workdir
+++ b/contrib/workdir/git-new-workdir
@@ -79,7 +79,7 @@ trap cleanup $siglist
 # create the links to the original repo.  explicitly exclude index, HEAD and
 # logs/HEAD from the list since they are purely related to the current working
 # directory, and should not be shared.
-for x in config refs logs/refs objects info hooks packed-refs remotes rr-cache svn
+for x in config refs logs/refs objects info hooks packed-refs remotes rr-cache svn reftable
 do
 	# create a containing directory if needed
 	case $x in
diff --git a/path.c b/path.c
index 0fb527918b..8bb223c92c 100644
--- a/path.c
+++ b/path.c
@@ -871,7 +871,7 @@ const char *enter_repo(const char *path, int strict)
 	return NULL;
 }
 
-static int calc_shared_perm(int mode)
+int calc_shared_perm(int mode)
 {
 	int tweak;
 
diff --git a/path.h b/path.h
index b3233c51fa..e053effef2 100644
--- a/path.h
+++ b/path.h
@@ -181,6 +181,7 @@ const char *git_path_shallow(struct repository *r);
 int ends_with_path_components(const char *path, const char *components);
 int validate_headref(const char *ref);
 
+int calc_shared_perm(int mode);
 int adjust_shared_perm(const char *path);
 
 char *interpolate_path(const char *path, int real_home);
diff --git a/refs.c b/refs.c
index c633abf284..fff343c256 100644
--- a/refs.c
+++ b/refs.c
@@ -35,6 +35,7 @@
  */
 static const struct ref_storage_be *refs_backends[] = {
 	[REF_STORAGE_FORMAT_FILES] = &refs_be_files,
+	[REF_STORAGE_FORMAT_REFTABLE] = &refs_be_reftable,
 };
 
 static const struct ref_storage_be *find_ref_storage_backend(unsigned int ref_storage_format)
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 82219829b0..83e0f0bba3 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -693,6 +693,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 0000000000..85214baa60
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,2297 @@
+#include "../git-compat-util.h"
+#include "../abspath.h"
+#include "../chdir-notify.h"
+#include "../environment.h"
+#include "../gettext.h"
+#include "../hash.h"
+#include "../hex.h"
+#include "../iterator.h"
+#include "../ident.h"
+#include "../lockfile.h"
+#include "../object.h"
+#include "../path.h"
+#include "../refs.h"
+#include "../reftable/reftable-stack.h"
+#include "../reftable/reftable-record.h"
+#include "../reftable/reftable-error.h"
+#include "../reftable/reftable-iterator.h"
+#include "../reftable/reftable-merged.h"
+#include "../setup.h"
+#include "../strmap.h"
+#include "refs-internal.h"
+
+/*
+ * Used as a flag in ref_update::flags when the ref_update was via an
+ * update to HEAD.
+ */
+#define REF_UPDATE_VIA_HEAD (1 << 8)
+
+struct reftable_ref_store {
+	struct ref_store base;
+
+	/*
+	 * The main stack refers to the common dir and thus contains common
+	 * refs as well as refs of the main repository.
+	 */
+	struct reftable_stack *main_stack;
+	/*
+	 * The worktree stack refers to the gitdir in case the refdb is opened
+	 * via a worktree. It thus contains the per-worktree refs.
+	 */
+	struct reftable_stack *worktree_stack;
+	/*
+	 * Map of worktree stacks by their respective worktree names. The map
+	 * is populated lazily when we try to resolve `worktrees/$worktree` refs.
+	 */
+	struct strmap worktree_stacks;
+	struct reftable_write_options write_options;
+
+	unsigned int store_flags;
+	int err;
+};
+
+/*
+ * Downcast ref_store to reftable_ref_store. Die if ref_store is not a
+ * reftable_ref_store. required_flags is compared with ref_store's store_flags
+ * to ensure the ref_store has all required capabilities. "caller" is used in
+ * any necessary error messages.
+ */
+static struct reftable_ref_store *reftable_be_downcast(struct ref_store *ref_store,
+						       unsigned int required_flags,
+						       const char *caller)
+{
+	struct reftable_ref_store *refs;
+
+	if (ref_store->be != &refs_be_reftable)
+		BUG("ref_store is type \"%s\" not \"reftables\" in %s",
+		    ref_store->be->name, caller);
+
+	refs = (struct reftable_ref_store *)ref_store;
+
+	if ((refs->store_flags & required_flags) != required_flags)
+		BUG("operation %s requires abilities 0x%x, but only have 0x%x",
+		    caller, required_flags, refs->store_flags);
+
+	return refs;
+}
+
+/*
+ * Some refs are global to the repository (refs/heads/{*}), while others are
+ * local to the worktree (eg. HEAD, refs/bisect/{*}). We solve this by having
+ * multiple separate databases (ie. multiple reftable/ directories), one for
+ * the shared refs, one for the current worktree refs, and one for each
+ * additional worktree. For reading, we merge the view of both the shared and
+ * the current worktree's refs, when necessary.
+ *
+ * This function also optionally assigns the rewritten reference name that is
+ * local to the stack. This translation is required when using worktree refs
+ * like `worktrees/$worktree/refs/heads/foo` as worktree stacks will store
+ * those references in their normalized form.
+ */
+static struct reftable_stack *stack_for(struct reftable_ref_store *store,
+					const char *refname,
+					const char **rewritten_ref)
+{
+	const char *wtname;
+	int wtname_len;
+
+	if (!refname)
+		return store->main_stack;
+
+	switch (parse_worktree_ref(refname, &wtname, &wtname_len, rewritten_ref)) {
+	case REF_WORKTREE_OTHER: {
+		static struct strbuf wtname_buf = STRBUF_INIT;
+		struct strbuf wt_dir = STRBUF_INIT;
+		struct reftable_stack *stack;
+
+		/*
+		 * We're using a static buffer here so that we don't need to
+		 * allocate the worktree name whenever we look up a reference.
+		 * This could be avoided if the strmap interface knew how to
+		 * handle keys with a length.
+		 */
+		strbuf_reset(&wtname_buf);
+		strbuf_add(&wtname_buf, wtname, wtname_len);
+
+		/*
+		 * There is an edge case here: when the worktree references the
+		 * current worktree, then we set up the stack once via
+		 * `worktree_stacks` and once via `worktree_stack`. This is
+		 * wasteful, but in the reading case it shouldn't matter. And
+		 * in the writing case we would notice that the stack is locked
+		 * already and error out when trying to write a reference via
+		 * both stacks.
+		 */
+		stack = strmap_get(&store->worktree_stacks, wtname_buf.buf);
+		if (!stack) {
+			strbuf_addf(&wt_dir, "%s/worktrees/%s/reftable",
+				    store->base.repo->commondir, wtname_buf.buf);
+
+			store->err = reftable_new_stack(&stack, wt_dir.buf,
+							store->write_options);
+			assert(store->err != REFTABLE_API_ERROR);
+			strmap_put(&store->worktree_stacks, wtname_buf.buf, stack);
+		}
+
+		strbuf_release(&wt_dir);
+		return stack;
+	}
+	case REF_WORKTREE_CURRENT:
+		/*
+		 * If there is no worktree stack then we're currently in the
+		 * main worktree. We thus return the main stack in that case.
+		 */
+		if (!store->worktree_stack)
+			return store->main_stack;
+		return store->worktree_stack;
+	case REF_WORKTREE_MAIN:
+	case REF_WORKTREE_SHARED:
+		return store->main_stack;
+	default:
+		BUG("unhandled worktree reference type");
+	}
+}
+
+static int should_write_log(struct ref_store *refs, const char *refname)
+{
+	if (log_all_ref_updates == LOG_REFS_UNSET)
+		log_all_ref_updates = is_bare_repository() ? LOG_REFS_NONE : LOG_REFS_NORMAL;
+
+	switch (log_all_ref_updates) {
+	case LOG_REFS_NONE:
+		return refs_reflog_exists(refs, refname);
+	case LOG_REFS_ALWAYS:
+		return 1;
+	case LOG_REFS_NORMAL:
+		if (should_autocreate_reflog(refname))
+			return 1;
+		return refs_reflog_exists(refs, refname);
+	default:
+		BUG("unhandled core.logAllRefUpdates value %d", log_all_ref_updates);
+	}
+}
+
+static void clear_reftable_log_record(struct reftable_log_record *log)
+{
+	switch (log->value_type) {
+	case REFTABLE_LOG_UPDATE:
+		/*
+		 * When we write log records, the hashes are owned by the
+		 * caller and thus shouldn't be free'd.
+		 */
+		log->value.update.old_hash = NULL;
+		log->value.update.new_hash = NULL;
+		break;
+	case REFTABLE_LOG_DELETION:
+		break;
+	}
+	reftable_log_record_release(log);
+}
+
+static void fill_reftable_log_record(struct reftable_log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = {0};
+	int sign = 1;
+
+	if (split_ident_line(&split, info, strlen(info)))
+		BUG("failed splitting committer info");
+
+	reftable_log_record_release(log);
+	log->value_type = REFTABLE_LOG_UPDATE;
+	log->value.update.name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->value.update.email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->value.update.time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->value.update.tz_offset = sign * atoi(split.tz_begin);
+}
+
+static int read_ref_without_reload(struct reftable_stack *stack,
+				   const char *refname,
+				   struct object_id *oid,
+				   struct strbuf *referent,
+				   unsigned int *type)
+{
+	struct reftable_ref_record ref = {0};
+	int ret;
+
+	ret = reftable_stack_read_ref(stack, refname, &ref);
+	if (ret)
+		goto done;
+
+	if (ref.value_type == REFTABLE_REF_SYMREF) {
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.value.symref);
+		*type |= REF_ISSYMREF;
+	} else if (reftable_ref_record_val1(&ref)) {
+		oidread(oid, reftable_ref_record_val1(&ref));
+	} else {
+		/* We got a tombstone, which should not happen. */
+		BUG("unhandled reference value type %d", ref.value_type);
+	}
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	reftable_ref_record_release(&ref);
+	return ret;
+}
+
+static struct ref_store *reftable_be_init(struct repository *repo,
+					  const char *gitdir,
+					  unsigned int store_flags)
+{
+	struct reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct strbuf path = STRBUF_INIT;
+	int is_worktree;
+	mode_t mask;
+
+	mask = umask(0);
+	umask(mask);
+
+	base_ref_store_init(&refs->base, repo, gitdir, &refs_be_reftable);
+	strmap_init(&refs->worktree_stacks);
+	refs->store_flags = store_flags;
+	refs->write_options.block_size = 4096;
+	refs->write_options.hash_id = repo->hash_algo->format_id;
+	refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask);
+
+	/*
+	 * Set up the main reftable stack that is hosted in GIT_COMMON_DIR.
+	 * This stack contains both the shared and the main worktree refs.
+	 *
+	 * Note that we don't try to resolve the path in case we have a
+	 * worktree because `get_common_dir_noenv()` already does it for us.
+	 */
+	is_worktree = get_common_dir_noenv(&path, gitdir);
+	if (!is_worktree) {
+		strbuf_reset(&path);
+		strbuf_realpath(&path, gitdir, 0);
+	}
+	strbuf_addstr(&path, "/reftable");
+	refs->err = reftable_new_stack(&refs->main_stack, path.buf,
+				       refs->write_options);
+	if (refs->err)
+		goto done;
+
+	/*
+	 * If we're in a worktree we also need to set up the worktree reftable
+	 * stack that is contained in the per-worktree GIT_DIR.
+	 *
+	 * Ideally, we would also add the stack to our worktree stack map. But
+	 * we have no way to figure out the worktree name here and thus can't
+	 * do it efficiently.
+	 */
+	if (is_worktree) {
+		strbuf_reset(&path);
+		strbuf_addf(&path, "%s/reftable", gitdir);
+
+		refs->err = reftable_new_stack(&refs->worktree_stack, path.buf,
+					       refs->write_options);
+		if (refs->err)
+			goto done;
+	}
+
+	chdir_notify_reparent("reftables-backend $GIT_DIR", &refs->base.gitdir);
+
+done:
+	assert(refs->err != REFTABLE_API_ERROR);
+	strbuf_release(&path);
+	return &refs->base;
+}
+
+static int reftable_be_init_db(struct ref_store *ref_store,
+			       int flags UNUSED,
+			       struct strbuf *err UNUSED)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "init_db");
+	struct strbuf sb = STRBUF_INIT;
+
+	strbuf_addf(&sb, "%s/reftable", refs->base.gitdir);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/HEAD", refs->base.gitdir);
+	write_file(sb.buf, "ref: refs/heads/.invalid");
+	adjust_shared_perm(sb.buf);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", refs->base.gitdir);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", refs->base.gitdir);
+	write_file(sb.buf, "this repository uses the reftable format");
+	adjust_shared_perm(sb.buf);
+
+	strbuf_release(&sb);
+	return 0;
+}
+
+struct reftable_ref_iterator {
+	struct ref_iterator base;
+	struct reftable_ref_store *refs;
+	struct reftable_iterator iter;
+	struct reftable_ref_record ref;
+	struct object_id oid;
+
+	const char *prefix;
+	unsigned int flags;
+	int err;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_ref_iterator *iter =
+		(struct reftable_ref_iterator *)ref_iterator;
+	struct reftable_ref_store *refs = iter->refs;
+
+	while (!iter->err) {
+		int flags = 0;
+
+		iter->err = reftable_iterator_next_ref(&iter->iter, &iter->ref);
+		if (iter->err)
+			break;
+
+		/*
+		 * The files backend only lists references contained in
+		 * "refs/". We emulate the same behaviour here and thus skip
+		 * all references that don't start with this prefix.
+		 */
+		if (!starts_with(iter->ref.refname, "refs/"))
+			continue;
+
+		if (iter->prefix &&
+		    strncmp(iter->prefix, iter->ref.refname, strlen(iter->prefix))) {
+			iter->err = 1;
+			break;
+		}
+
+		if (iter->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    parse_worktree_ref(iter->ref.refname, NULL, NULL, NULL) !=
+			    REF_WORKTREE_CURRENT)
+			continue;
+
+		switch (iter->ref.value_type) {
+		case REFTABLE_REF_VAL1:
+			oidread(&iter->oid, iter->ref.value.val1);
+			break;
+		case REFTABLE_REF_VAL2:
+			oidread(&iter->oid, iter->ref.value.val2.value);
+			break;
+		case REFTABLE_REF_SYMREF:
+			if (!refs_resolve_ref_unsafe(&iter->refs->base, iter->ref.refname,
+						     RESOLVE_REF_READING, &iter->oid, &flags))
+				oidclr(&iter->oid);
+			break;
+		default:
+			BUG("unhandled reference value type %d", iter->ref.value_type);
+		}
+
+		if (is_null_oid(&iter->oid))
+			flags |= REF_ISBROKEN;
+
+		if (check_refname_format(iter->ref.refname, REFNAME_ALLOW_ONELEVEL)) {
+			if (!refname_is_safe(iter->ref.refname))
+				die(_("refname is dangerous: %s"), iter->ref.refname);
+			oidclr(&iter->oid);
+			flags |= REF_BAD_NAME | REF_ISBROKEN;
+		}
+
+		if (iter->flags & DO_FOR_EACH_OMIT_DANGLING_SYMREFS &&
+		    flags & REF_ISSYMREF &&
+		    flags & REF_ISBROKEN)
+			continue;
+
+		if (!(iter->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(iter->ref.refname, refs->base.repo,
+					    &iter->oid, flags))
+				continue;
+
+		iter->base.refname = iter->ref.refname;
+		iter->base.oid = &iter->oid;
+		iter->base.flags = flags;
+
+		break;
+	}
+
+	if (iter->err > 0) {
+		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
+			return ITER_ERROR;
+		return ITER_DONE;
+	}
+
+	if (iter->err < 0) {
+		ref_iterator_abort(ref_iterator);
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct reftable_ref_iterator *iter =
+		(struct reftable_ref_iterator *)ref_iterator;
+
+	if (iter->ref.value_type == REFTABLE_REF_VAL2) {
+		oidread(peeled, iter->ref.value.val2.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_ref_iterator *iter =
+		(struct reftable_ref_iterator *)ref_iterator;
+	reftable_ref_record_release(&iter->ref);
+	reftable_iterator_destroy(&iter->iter);
+	free(iter);
+	return ITER_DONE;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	.advance = reftable_ref_iterator_advance,
+	.peel = reftable_ref_iterator_peel,
+	.abort = reftable_ref_iterator_abort
+};
+
+static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_store *refs,
+							    struct reftable_stack *stack,
+							    const char *prefix,
+							    int flags)
+{
+	struct reftable_merged_table *merged_table;
+	struct reftable_ref_iterator *iter;
+	int ret;
+
+	iter = xcalloc(1, sizeof(*iter));
+	base_ref_iterator_init(&iter->base, &reftable_ref_iterator_vtable, 1);
+	iter->prefix = prefix;
+	iter->base.oid = &iter->oid;
+	iter->flags = flags;
+	iter->refs = refs;
+
+	ret = refs->err;
+	if (ret)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+
+	merged_table = reftable_stack_merged_table(stack);
+
+	ret = reftable_merged_table_seek_ref(merged_table, &iter->iter, prefix);
+	if (ret)
+		goto done;
+
+done:
+	iter->err = ret;
+	return iter;
+}
+
+static enum iterator_selection iterator_select(struct ref_iterator *iter_worktree,
+					       struct ref_iterator *iter_common,
+					       void *cb_data UNUSED)
+{
+	if (iter_worktree && !iter_common) {
+		/*
+		 * Return the worktree ref if there are no more common refs.
+		 */
+		return ITER_SELECT_0;
+	} else if (iter_common) {
+		/*
+		 * In case we have pending worktree and common refs we need to
+		 * yield them based on their lexicographical order. Worktree
+		 * refs that have the same name as common refs shadow the
+		 * latter.
+		 */
+		if (iter_worktree) {
+			int cmp = strcmp(iter_worktree->refname,
+					 iter_common->refname);
+			if (cmp < 0)
+				return ITER_SELECT_0;
+			else if (!cmp)
+				return ITER_SELECT_0_SKIP_1;
+		}
+
+		 /*
+		  * We now know that the lexicographically-next ref is a common
+		  * ref. When the common ref is a shared one we return it.
+		  */
+		if (parse_worktree_ref(iter_common->refname, NULL, NULL,
+				       NULL) == REF_WORKTREE_SHARED)
+			return ITER_SELECT_1;
+
+		/*
+		 * Otherwise, if the common ref is a per-worktree ref we skip
+		 * it because it would belong to the main worktree, not ours.
+		 */
+		return ITER_SKIP_1;
+	} else {
+		return ITER_DONE;
+	}
+}
+
+static struct ref_iterator *reftable_be_iterator_begin(struct ref_store *ref_store,
+						       const char *prefix,
+						       const char **exclude_patterns,
+						       unsigned int flags)
+{
+	struct reftable_ref_iterator *main_iter, *worktree_iter;
+	struct reftable_ref_store *refs;
+	unsigned int required_flags = REF_STORE_READ;
+
+	if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN))
+		required_flags |= REF_STORE_ODB;
+	refs = reftable_be_downcast(ref_store, required_flags, "ref_iterator_begin");
+
+	main_iter = ref_iterator_for_stack(refs, refs->main_stack, prefix, flags);
+
+	/*
+	 * The worktree stack is only set when we're in an actual worktree
+	 * right now. If we aren't, then we return the common reftable
+	 * iterator, only.
+	 */
+	 if (!refs->worktree_stack)
+		return &main_iter->base;
+
+	/*
+	 * Otherwise we merge both the common and the per-worktree refs into a
+	 * single iterator.
+	 */
+	worktree_iter = ref_iterator_for_stack(refs, refs->worktree_stack, prefix, flags);
+	return merge_ref_iterator_begin(1, &worktree_iter->base, &main_iter->base,
+					iterator_select, NULL);
+}
+
+static int reftable_be_read_raw_ref(struct ref_store *ref_store,
+				    const char *refname,
+				    struct object_id *oid,
+				    struct strbuf *referent,
+				    unsigned int *type,
+				    int *failure_errno)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "read_raw_ref");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	int ret;
+
+	if (refs->err < 0)
+		return refs->err;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		return ret;
+
+	ret = read_ref_without_reload(stack, refname, oid, referent, type);
+	if (ret < 0)
+		return ret;
+	if (ret > 0) {
+		*failure_errno = ENOENT;
+		return -1;
+	}
+
+	return 0;
+}
+
+static int reftable_be_read_symbolic_ref(struct ref_store *ref_store,
+					 const char *refname,
+					 struct strbuf *referent)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "read_symbolic_ref");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_ref_record ref = {0};
+	int ret;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		return ret;
+
+	ret = reftable_stack_read_ref(stack, refname, &ref);
+	if (ret == 0 && ref.value_type == REFTABLE_REF_SYMREF)
+		strbuf_addstr(referent, ref.value.symref);
+	else
+		ret = -1;
+
+	reftable_ref_record_release(&ref);
+	return ret;
+}
+
+/*
+ * Return the refname under which update was originally requested.
+ */
+static const char *original_update_refname(struct ref_update *update)
+{
+	while (update->parent_update)
+		update = update->parent_update;
+	return update->refname;
+}
+
+struct reftable_transaction_update {
+	struct ref_update *update;
+	struct object_id current_oid;
+};
+
+struct write_transaction_table_arg {
+	struct reftable_ref_store *refs;
+	struct reftable_stack *stack;
+	struct reftable_addition *addition;
+	struct reftable_transaction_update *updates;
+	size_t updates_nr;
+	size_t updates_alloc;
+	size_t updates_expected;
+};
+
+struct reftable_transaction_data {
+	struct write_transaction_table_arg *args;
+	size_t args_nr, args_alloc;
+};
+
+static void free_transaction_data(struct reftable_transaction_data *tx_data)
+{
+	if (!tx_data)
+		return;
+	for (size_t i = 0; i < tx_data->args_nr; i++) {
+		reftable_addition_destroy(tx_data->args[i].addition);
+		free(tx_data->args[i].updates);
+	}
+	free(tx_data->args);
+	free(tx_data);
+}
+
+/*
+ * Prepare transaction update for the given reference update. This will cause
+ * us to lock the corresponding reftable stack for concurrent modification.
+ */
+static int prepare_transaction_update(struct write_transaction_table_arg **out,
+				      struct reftable_ref_store *refs,
+				      struct reftable_transaction_data *tx_data,
+				      struct ref_update *update,
+				      struct strbuf *err)
+{
+	struct reftable_stack *stack = stack_for(refs, update->refname, NULL);
+	struct write_transaction_table_arg *arg = NULL;
+	size_t i;
+	int ret;
+
+	/*
+	 * Search for a preexisting stack update. If there is one then we add
+	 * the update to it, otherwise we set up a new stack update.
+	 */
+	for (i = 0; !arg && i < tx_data->args_nr; i++)
+		if (tx_data->args[i].stack == stack)
+			arg = &tx_data->args[i];
+
+	if (!arg) {
+		struct reftable_addition *addition;
+
+		ret = reftable_stack_reload(stack);
+		if (ret)
+			return ret;
+
+		ret = reftable_stack_new_addition(&addition, stack);
+		if (ret) {
+			if (ret == REFTABLE_LOCK_ERROR)
+				strbuf_addstr(err, "cannot lock references");
+			return ret;
+		}
+
+		ALLOC_GROW(tx_data->args, tx_data->args_nr + 1,
+			   tx_data->args_alloc);
+		arg = &tx_data->args[tx_data->args_nr++];
+		arg->refs = refs;
+		arg->stack = stack;
+		arg->addition = addition;
+		arg->updates = NULL;
+		arg->updates_nr = 0;
+		arg->updates_alloc = 0;
+		arg->updates_expected = 0;
+	}
+
+	arg->updates_expected++;
+
+	if (out)
+		*out = arg;
+
+	return 0;
+}
+
+/*
+ * Queue a reference update for the correct stack. We potentially need to
+ * handle multiple stack updates in a single transaction when it spans across
+ * multiple worktrees.
+ */
+static int queue_transaction_update(struct reftable_ref_store *refs,
+				    struct reftable_transaction_data *tx_data,
+				    struct ref_update *update,
+				    struct object_id *current_oid,
+				    struct strbuf *err)
+{
+	struct write_transaction_table_arg *arg = NULL;
+	int ret;
+
+	if (update->backend_data)
+		BUG("reference update queued more than once");
+
+	ret = prepare_transaction_update(&arg, refs, tx_data, update, err);
+	if (ret < 0)
+		return ret;
+
+	ALLOC_GROW(arg->updates, arg->updates_nr + 1,
+		   arg->updates_alloc);
+	arg->updates[arg->updates_nr].update = update;
+	oidcpy(&arg->updates[arg->updates_nr].current_oid, current_oid);
+	update->backend_data = &arg->updates[arg->updates_nr++];
+
+	return 0;
+}
+
+static int reftable_be_transaction_prepare(struct ref_store *ref_store,
+					   struct ref_transaction *transaction,
+					   struct strbuf *err)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE|REF_STORE_MAIN, "ref_transaction_prepare");
+	struct strbuf referent = STRBUF_INIT, head_referent = STRBUF_INIT;
+	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
+	struct reftable_transaction_data *tx_data = NULL;
+	struct object_id head_oid;
+	unsigned int head_type = 0;
+	size_t i;
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	tx_data = xcalloc(1, sizeof(*tx_data));
+
+	/*
+	 * Preprocess all updates. For one we check that there are no duplicate
+	 * reference updates in this transaction. Second, we lock all stacks
+	 * that will be modified during the transaction.
+	 */
+	for (i = 0; i < transaction->nr; i++) {
+		ret = prepare_transaction_update(NULL, refs, tx_data,
+						 transaction->updates[i], err);
+		if (ret)
+			goto done;
+
+		string_list_append(&affected_refnames,
+				   transaction->updates[i]->refname);
+	}
+
+	/*
+	 * Now that we have counted updates per stack we can preallocate their
+	 * arrays. This avoids having to reallocate many times.
+	 */
+	for (i = 0; i < tx_data->args_nr; i++) {
+		CALLOC_ARRAY(tx_data->args[i].updates, tx_data->args[i].updates_expected);
+		tx_data->args[i].updates_alloc = tx_data->args[i].updates_expected;
+	}
+
+	/*
+	 * Fail if a refname appears more than once in the transaction.
+	 * This code is taken from the files backend and is a good candidate to
+	 * be moved into the generic layer.
+	 */
+	string_list_sort(&affected_refnames);
+	if (ref_update_reject_duplicates(&affected_refnames, err)) {
+		ret = TRANSACTION_GENERIC_ERROR;
+		goto done;
+	}
+
+	ret = read_ref_without_reload(stack_for(refs, "HEAD", NULL), "HEAD", &head_oid,
+				      &head_referent, &head_type);
+	if (ret < 0)
+		goto done;
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		struct object_id current_oid = {0};
+		struct reftable_stack *stack;
+		const char *rewritten_ref;
+
+		stack = stack_for(refs, u->refname, &rewritten_ref);
+
+		/* Verify that the new object ID is valid. */
+		if ((u->flags & REF_HAVE_NEW) && !is_null_oid(&u->new_oid) &&
+		    !(u->flags & REF_SKIP_OID_VERIFICATION) &&
+		    !(u->flags & REF_LOG_ONLY)) {
+			struct object *o = parse_object(refs->base.repo, &u->new_oid);
+			if (!o) {
+				strbuf_addf(err,
+					    _("trying to write ref '%s' with nonexistent object %s"),
+					    u->refname, oid_to_hex(&u->new_oid));
+				ret = -1;
+				goto done;
+			}
+
+			if (o->type != OBJ_COMMIT && is_branch(u->refname)) {
+				strbuf_addf(err, _("trying to write non-commit object %s to branch '%s'"),
+					    oid_to_hex(&u->new_oid), u->refname);
+				ret = -1;
+				goto done;
+			}
+		}
+
+		/*
+		 * When we update the reference that HEAD points to we enqueue
+		 * a second log-only update for HEAD so that its reflog is
+		 * updated accordingly.
+		 */
+		if (head_type == REF_ISSYMREF &&
+		    !(u->flags & REF_LOG_ONLY) &&
+		    !(u->flags & REF_UPDATE_VIA_HEAD) &&
+		    !strcmp(rewritten_ref, head_referent.buf)) {
+			struct ref_update *new_update;
+
+			/*
+			 * First make sure that HEAD is not already in the
+			 * transaction. This check is O(lg N) in the transaction
+			 * size, but it happens at most once per transaction.
+			 */
+			if (string_list_has_string(&affected_refnames, "HEAD")) {
+				/* An entry already existed */
+				strbuf_addf(err,
+					    _("multiple updates for 'HEAD' (including one "
+					    "via its referent '%s') are not allowed"),
+					    u->refname);
+				ret = TRANSACTION_NAME_CONFLICT;
+				goto done;
+			}
+
+			new_update = ref_transaction_add_update(
+					transaction, "HEAD",
+					u->flags | REF_LOG_ONLY | REF_NO_DEREF,
+					&u->new_oid, &u->old_oid, u->msg);
+			string_list_insert(&affected_refnames, new_update->refname);
+		}
+
+		ret = read_ref_without_reload(stack, rewritten_ref,
+					      &current_oid, &referent, &u->type);
+		if (ret < 0)
+			goto done;
+		if (ret > 0 && (!(u->flags & REF_HAVE_OLD) || is_null_oid(&u->old_oid))) {
+			/*
+			 * The reference does not exist, and we either have no
+			 * old object ID or expect the reference to not exist.
+			 * We can thus skip below safety checks as well as the
+			 * symref splitting. But we do want to verify that
+			 * there is no conflicting reference here so that we
+			 * can output a proper error message instead of failing
+			 * at a later point.
+			 */
+			ret = refs_verify_refname_available(ref_store, u->refname,
+							    &affected_refnames, NULL, err);
+			if (ret < 0)
+				goto done;
+
+			/*
+			 * There is no need to write the reference deletion
+			 * when the reference in question doesn't exist.
+			 */
+			 if (u->flags & REF_HAVE_NEW && !is_null_oid(&u->new_oid)) {
+				 ret = queue_transaction_update(refs, tx_data, u,
+								&current_oid, err);
+				 if (ret)
+					 goto done;
+			 }
+
+			continue;
+		}
+		if (ret > 0) {
+			/* The reference does not exist, but we expected it to. */
+			strbuf_addf(err, _("cannot lock ref '%s': "
+				    "unable to resolve reference '%s'"),
+				    original_update_refname(u), u->refname);
+			ret = -1;
+			goto done;
+		}
+
+		if (u->type & REF_ISSYMREF) {
+			/*
+			 * The reftable stack is locked at this point already,
+			 * so it is safe to call `refs_resolve_ref_unsafe()`
+			 * here without causing races.
+			 */
+			const char *resolved = refs_resolve_ref_unsafe(&refs->base, u->refname, 0,
+								       &current_oid, NULL);
+
+			if (u->flags & REF_NO_DEREF) {
+				if (u->flags & REF_HAVE_OLD && !resolved) {
+					strbuf_addf(err, _("cannot lock ref '%s': "
+						    "error reading reference"), u->refname);
+					ret = -1;
+					goto done;
+				}
+			} else {
+				struct ref_update *new_update;
+				int new_flags;
+
+				new_flags = u->flags;
+				if (!strcmp(rewritten_ref, "HEAD"))
+					new_flags |= REF_UPDATE_VIA_HEAD;
+
+				/*
+				 * If we are updating a symref (eg. HEAD), we should also
+				 * update the branch that the symref points to.
+				 *
+				 * This is generic functionality, and would be better
+				 * done in refs.c, but the current implementation is
+				 * intertwined with the locking in files-backend.c.
+				 */
+				new_update = ref_transaction_add_update(
+						transaction, referent.buf, new_flags,
+						&u->new_oid, &u->old_oid, u->msg);
+				new_update->parent_update = u;
+
+				/*
+				 * Change the symbolic ref update to log only. Also, it
+				 * doesn't need to check its old OID value, as that will be
+				 * done when new_update is processed.
+				 */
+				u->flags |= REF_LOG_ONLY | REF_NO_DEREF;
+				u->flags &= ~REF_HAVE_OLD;
+
+				if (string_list_has_string(&affected_refnames, new_update->refname)) {
+					strbuf_addf(err,
+						    _("multiple updates for '%s' (including one "
+						    "via symref '%s') are not allowed"),
+						    referent.buf, u->refname);
+					ret = TRANSACTION_NAME_CONFLICT;
+					goto done;
+				}
+				string_list_insert(&affected_refnames, new_update->refname);
+			}
+		}
+
+		/*
+		 * Verify that the old object matches our expectations. Note
+		 * that the error messages here do not make a lot of sense in
+		 * the context of the reftable backend as we never lock
+		 * individual refs. But the error messages match what the files
+		 * backend returns, which keeps our tests happy.
+		 */
+		if (u->flags & REF_HAVE_OLD && !oideq(&current_oid, &u->old_oid)) {
+			if (is_null_oid(&u->old_oid))
+				strbuf_addf(err, _("cannot lock ref '%s': "
+					    "reference already exists"),
+					    original_update_refname(u));
+			else if (is_null_oid(&current_oid))
+				strbuf_addf(err, _("cannot lock ref '%s': "
+					    "reference is missing but expected %s"),
+					    original_update_refname(u),
+					    oid_to_hex(&u->old_oid));
+			else
+				strbuf_addf(err, _("cannot lock ref '%s': "
+					    "is at %s but expected %s"),
+					    original_update_refname(u),
+					    oid_to_hex(&current_oid),
+					    oid_to_hex(&u->old_oid));
+			ret = -1;
+			goto done;
+		}
+
+		/*
+		 * If all of the following conditions are true:
+		 *
+		 *   - We're not about to write a symref.
+		 *   - We're not about to write a log-only entry.
+		 *   - Old and new object ID are different.
+		 *
+		 * Then we're essentially doing a no-op update that can be
+		 * skipped. This is not only for the sake of efficiency, but
+		 * also skips writing unneeded reflog entries.
+		 */
+		if ((u->type & REF_ISSYMREF) ||
+		    (u->flags & REF_LOG_ONLY) ||
+		    (u->flags & REF_HAVE_NEW && !oideq(&current_oid, &u->new_oid))) {
+			ret = queue_transaction_update(refs, tx_data, u,
+						       &current_oid, err);
+			if (ret)
+				goto done;
+		}
+	}
+
+	transaction->backend_data = tx_data;
+	transaction->state = REF_TRANSACTION_PREPARED;
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	if (ret < 0) {
+		free_transaction_data(tx_data);
+		transaction->state = REF_TRANSACTION_CLOSED;
+		if (!err->len)
+			strbuf_addf(err, _("reftable: transaction prepare: %s"),
+				    reftable_error_str(ret));
+	}
+	string_list_clear(&affected_refnames, 0);
+	strbuf_release(&referent);
+	strbuf_release(&head_referent);
+
+	return ret;
+}
+
+static int reftable_be_transaction_abort(struct ref_store *ref_store,
+					 struct ref_transaction *transaction,
+					 struct strbuf *err)
+{
+	struct reftable_transaction_data *tx_data = transaction->backend_data;
+	free_transaction_data(tx_data);
+	transaction->state = REF_TRANSACTION_CLOSED;
+	return 0;
+}
+
+static int transaction_update_cmp(const void *a, const void *b)
+{
+	return strcmp(((struct reftable_transaction_update *)a)->update->refname,
+		      ((struct reftable_transaction_update *)b)->update->refname);
+}
+
+static int write_transaction_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct write_transaction_table_arg *arg = cb_data;
+	struct reftable_merged_table *mt =
+		reftable_stack_merged_table(arg->stack);
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_log_record *logs = NULL;
+	size_t logs_nr = 0, logs_alloc = 0, i;
+	int ret = 0;
+
+	QSORT(arg->updates, arg->updates_nr, transaction_update_cmp);
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	for (i = 0; i < arg->updates_nr; i++) {
+		struct reftable_transaction_update *tx_update = &arg->updates[i];
+		struct ref_update *u = tx_update->update;
+
+		/*
+		 * Write a reflog entry when updating a ref to point to
+		 * something new in either of the following cases:
+		 *
+		 * - The reference is about to be deleted. We always want to
+		 *   delete the reflog in that case.
+		 * - REF_FORCE_CREATE_REFLOG is set, asking us to always create
+		 *   the reflog entry.
+		 * - `core.logAllRefUpdates` tells us to create the reflog for
+		 *   the given ref.
+		 */
+		if (u->flags & REF_HAVE_NEW && !(u->type & REF_ISSYMREF) && is_null_oid(&u->new_oid)) {
+			struct reftable_log_record log = {0};
+			struct reftable_iterator it = {0};
+
+			/*
+			 * When deleting refs we also delete all reflog entries
+			 * with them. While it is not strictly required to
+			 * delete reflogs together with their refs, this
+			 * matches the behaviour of the files backend.
+			 *
+			 * Unfortunately, we have no better way than to delete
+			 * all reflog entries one by one.
+			 */
+			ret = reftable_merged_table_seek_log(mt, &it, u->refname);
+			while (ret == 0) {
+				struct reftable_log_record *tombstone;
+
+				ret = reftable_iterator_next_log(&it, &log);
+				if (ret < 0)
+					break;
+				if (ret > 0 || strcmp(log.refname, u->refname)) {
+					ret = 0;
+					break;
+				}
+
+				ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+				tombstone = &logs[logs_nr++];
+				tombstone->refname = xstrdup(u->refname);
+				tombstone->value_type = REFTABLE_LOG_DELETION;
+				tombstone->update_index = log.update_index;
+			}
+
+			reftable_log_record_release(&log);
+			reftable_iterator_destroy(&it);
+
+			if (ret)
+				goto done;
+		} else if (u->flags & REF_HAVE_NEW &&
+			   (u->flags & REF_FORCE_CREATE_REFLOG ||
+			    should_write_log(&arg->refs->base, u->refname))) {
+			struct reftable_log_record *log;
+
+			ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+			log = &logs[logs_nr++];
+			memset(log, 0, sizeof(*log));
+
+			fill_reftable_log_record(log);
+			log->update_index = ts;
+			log->refname = xstrdup(u->refname);
+			log->value.update.new_hash = u->new_oid.hash;
+			log->value.update.old_hash = tx_update->current_oid.hash;
+			log->value.update.message =
+				xstrndup(u->msg, arg->refs->write_options.block_size / 2);
+		}
+
+		if (u->flags & REF_LOG_ONLY)
+			continue;
+
+		if (u->flags & REF_HAVE_NEW && is_null_oid(&u->new_oid)) {
+			struct reftable_ref_record ref = {
+				.refname = (char *)u->refname,
+				.update_index = ts,
+				.value_type = REFTABLE_REF_DELETION,
+			};
+
+			ret = reftable_writer_add_ref(writer, &ref);
+			if (ret < 0)
+				goto done;
+		} else if (u->flags & REF_HAVE_NEW) {
+			struct reftable_ref_record ref = {0};
+			struct object_id peeled;
+			int peel_error;
+
+			ref.refname = (char *)u->refname;
+			ref.update_index = ts;
+
+			peel_error = peel_object(&u->new_oid, &peeled);
+			if (!peel_error) {
+				ref.value_type = REFTABLE_REF_VAL2;
+				memcpy(ref.value.val2.target_value, peeled.hash, GIT_MAX_RAWSZ);
+				memcpy(ref.value.val2.value, u->new_oid.hash, GIT_MAX_RAWSZ);
+			} else if (!is_null_oid(&u->new_oid)) {
+				ref.value_type = REFTABLE_REF_VAL1;
+				memcpy(ref.value.val1, u->new_oid.hash, GIT_MAX_RAWSZ);
+			}
+
+			ret = reftable_writer_add_ref(writer, &ref);
+			if (ret < 0)
+				goto done;
+		}
+	}
+
+	/*
+	 * Logs are written at the end so that we do not have intermixed ref
+	 * and log blocks.
+	 */
+	if (logs) {
+		ret = reftable_writer_add_logs(writer, logs, logs_nr);
+		if (ret < 0)
+			goto done;
+	}
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	for (i = 0; i < logs_nr; i++)
+		clear_reftable_log_record(&logs[i]);
+	free(logs);
+	return ret;
+}
+
+static int reftable_be_transaction_finish(struct ref_store *ref_store,
+					  struct ref_transaction *transaction,
+					  struct strbuf *err)
+{
+	struct reftable_transaction_data *tx_data = transaction->backend_data;
+	int ret = 0;
+
+	for (size_t i = 0; i < tx_data->args_nr; i++) {
+		ret = reftable_addition_add(tx_data->args[i].addition,
+					    write_transaction_table, &tx_data->args[i]);
+		if (ret < 0)
+			goto done;
+
+		ret = reftable_addition_commit(tx_data->args[i].addition);
+		if (ret < 0)
+			goto done;
+	}
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	free_transaction_data(tx_data);
+	transaction->state = REF_TRANSACTION_CLOSED;
+
+	if (ret) {
+		strbuf_addf(err, _("reftable: transaction failure: %s"),
+			    reftable_error_str(ret));
+		return -1;
+	}
+	return ret;
+}
+
+static int reftable_be_initial_transaction_commit(struct ref_store *ref_store UNUSED,
+						  struct ref_transaction *transaction,
+						  struct strbuf *err)
+{
+	return ref_transaction_commit(transaction, err);
+}
+
+static int reftable_be_pack_refs(struct ref_store *ref_store,
+				 struct pack_refs_opts *opts)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE | REF_STORE_ODB, "pack_refs");
+	struct reftable_stack *stack;
+	int ret;
+
+	if (refs->err)
+		return refs->err;
+
+	stack = refs->worktree_stack;
+	if (!stack)
+		stack = refs->main_stack;
+
+	ret = reftable_stack_compact_all(stack, NULL);
+	if (ret)
+		goto out;
+	ret = reftable_stack_clean(stack);
+	if (ret)
+		goto out;
+
+out:
+	return ret;
+}
+
+struct write_create_symref_arg {
+	struct reftable_ref_store *refs;
+	struct reftable_stack *stack;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct write_create_symref_arg *create = cb_data;
+	uint64_t ts = reftable_stack_next_update_index(create->stack);
+	struct reftable_ref_record ref = {
+		.refname = (char *)create->refname,
+		.value_type = REFTABLE_REF_SYMREF,
+		.value.symref = (char *)create->target,
+		.update_index = ts,
+	};
+	struct reftable_log_record log = {0};
+	struct object_id new_oid;
+	struct object_id old_oid;
+	int ret;
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	ret = reftable_writer_add_ref(writer, &ref);
+	if (ret)
+		return ret;
+
+	/*
+	 * Note that it is important to try and resolve the reference before we
+	 * write the log entry. This is because `should_write_log()` will munge
+	 * `core.logAllRefUpdates`, which is undesirable when we create a new
+	 * repository because it would be written into the config. As HEAD will
+	 * not resolve for new repositories this ordering will ensure that this
+	 * never happens.
+	 */
+	if (!create->logmsg ||
+	    !refs_resolve_ref_unsafe(&create->refs->base, create->target,
+				     RESOLVE_REF_READING, &new_oid, NULL) ||
+	    !should_write_log(&create->refs->base, create->refname))
+		return 0;
+
+	fill_reftable_log_record(&log);
+	log.refname = xstrdup(create->refname);
+	log.update_index = ts;
+	log.value.update.message = xstrndup(create->logmsg,
+					    create->refs->write_options.block_size / 2);
+	log.value.update.new_hash = new_oid.hash;
+	if (refs_resolve_ref_unsafe(&create->refs->base, create->refname,
+				    RESOLVE_REF_READING, &old_oid, NULL))
+		log.value.update.old_hash = old_oid.hash;
+
+	ret = reftable_writer_add_log(writer, &log);
+	clear_reftable_log_record(&log);
+	return ret;
+}
+
+static int reftable_be_create_symref(struct ref_store *ref_store,
+				     const char *refname,
+				     const char *target,
+				     const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "create_symref");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct write_create_symref_arg arg = {
+		.refs = refs,
+		.stack = stack,
+		.refname = refname,
+		.target = target,
+		.logmsg = logmsg,
+	};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+
+	ret = reftable_stack_add(stack, &write_create_symref_table, &arg);
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	if (ret)
+		error("unable to write symref for %s: %s", refname,
+		      reftable_error_str(ret));
+	return ret;
+}
+
+struct write_copy_arg {
+	struct reftable_ref_store *refs;
+	struct reftable_stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+	int delete_old;
+};
+
+static int write_copy_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct write_copy_arg *arg = cb_data;
+	uint64_t deletion_ts, creation_ts;
+	struct reftable_merged_table *mt = reftable_stack_merged_table(arg->stack);
+	struct reftable_ref_record old_ref = {0}, refs[2] = {0};
+	struct reftable_log_record old_log = {0}, *logs = NULL;
+	struct reftable_iterator it = {0};
+	struct string_list skip = STRING_LIST_INIT_NODUP;
+	struct strbuf errbuf = STRBUF_INIT;
+	size_t logs_nr = 0, logs_alloc = 0, i;
+	int ret;
+
+	if (reftable_stack_read_ref(arg->stack, arg->oldname, &old_ref)) {
+		ret = error(_("refname %s not found"), arg->oldname);
+		goto done;
+	}
+	if (old_ref.value_type == REFTABLE_REF_SYMREF) {
+		ret = error(_("refname %s is a symbolic ref, copying it is not supported"),
+			    arg->oldname);
+		goto done;
+	}
+
+	/*
+	 * There's nothing to do in case the old and new name are the same, so
+	 * we exit early in that case.
+	 */
+	if (!strcmp(arg->oldname, arg->newname)) {
+		ret = 0;
+		goto done;
+	}
+
+	/*
+	 * Verify that the new refname is available.
+	 */
+	string_list_insert(&skip, arg->oldname);
+	ret = refs_verify_refname_available(&arg->refs->base, arg->newname,
+					    NULL, &skip, &errbuf);
+	if (ret < 0) {
+		error("%s", errbuf.buf);
+		goto done;
+	}
+
+	/*
+	 * When deleting the old reference we have to use two update indices:
+	 * once to delete the old ref and its reflog, and once to create the
+	 * new ref and its reflog. They need to be staged with two separate
+	 * indices because the new reflog needs to encode both the deletion of
+	 * the old branch and the creation of the new branch, and we cannot do
+	 * two changes to a reflog in a single update.
+	 */
+	deletion_ts = creation_ts = reftable_stack_next_update_index(arg->stack);
+	if (arg->delete_old)
+		creation_ts++;
+	reftable_writer_set_limits(writer, deletion_ts, creation_ts);
+
+	/*
+	 * Add the new reference. If this is a rename then we also delete the
+	 * old reference.
+	 */
+	refs[0] = old_ref;
+	refs[0].refname = (char *)arg->newname;
+	refs[0].update_index = creation_ts;
+	if (arg->delete_old) {
+		refs[1].refname = (char *)arg->oldname;
+		refs[1].value_type = REFTABLE_REF_DELETION;
+		refs[1].update_index = deletion_ts;
+	}
+	ret = reftable_writer_add_refs(writer, refs, arg->delete_old ? 2 : 1);
+	if (ret < 0)
+		goto done;
+
+	/*
+	 * When deleting the old branch we need to create a reflog entry on the
+	 * new branch name that indicates that the old branch has been deleted
+	 * and then recreated. This is a tad weird, but matches what the files
+	 * backend does.
+	 */
+	if (arg->delete_old) {
+		struct strbuf head_referent = STRBUF_INIT;
+		struct object_id head_oid;
+		int append_head_reflog;
+		unsigned head_type = 0;
+
+		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+		memset(&logs[logs_nr], 0, sizeof(logs[logs_nr]));
+		fill_reftable_log_record(&logs[logs_nr]);
+		logs[logs_nr].refname = (char *)arg->newname;
+		logs[logs_nr].update_index = deletion_ts;
+		logs[logs_nr].value.update.message =
+			xstrndup(arg->logmsg, arg->refs->write_options.block_size / 2);
+		logs[logs_nr].value.update.old_hash = old_ref.value.val1;
+		logs_nr++;
+
+		ret = read_ref_without_reload(arg->stack, "HEAD", &head_oid, &head_referent, &head_type);
+		if (ret < 0)
+			goto done;
+		append_head_reflog = (head_type & REF_ISSYMREF) && !strcmp(head_referent.buf, arg->oldname);
+		strbuf_release(&head_referent);
+
+		/*
+		 * The files backend uses `refs_delete_ref()` to delete the old
+		 * branch name, which will append a reflog entry for HEAD in
+		 * case it points to the old branch.
+		 */
+		if (append_head_reflog) {
+			ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+			logs[logs_nr] = logs[logs_nr - 1];
+			logs[logs_nr].refname = "HEAD";
+			logs_nr++;
+		}
+	}
+
+	/*
+	 * Create the reflog entry for the newly created branch.
+	 */
+	ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+	memset(&logs[logs_nr], 0, sizeof(logs[logs_nr]));
+	fill_reftable_log_record(&logs[logs_nr]);
+	logs[logs_nr].refname = (char *)arg->newname;
+	logs[logs_nr].update_index = creation_ts;
+	logs[logs_nr].value.update.message =
+		xstrndup(arg->logmsg, arg->refs->write_options.block_size / 2);
+	logs[logs_nr].value.update.new_hash = old_ref.value.val1;
+	logs_nr++;
+
+	/*
+	 * In addition to writing the reflog entry for the new branch, we also
+	 * copy over all log entries from the old reflog. Last but not least,
+	 * when renaming we also have to delete all the old reflog entries.
+	 */
+	ret = reftable_merged_table_seek_log(mt, &it, arg->oldname);
+	if (ret < 0)
+		return ret;
+
+	while (1) {
+		ret = reftable_iterator_next_log(&it, &old_log);
+		if (ret < 0)
+			goto done;
+		if (ret > 0 || strcmp(old_log.refname, arg->oldname)) {
+			ret = 0;
+			break;
+		}
+
+		free(old_log.refname);
+
+		/*
+		 * Copy over the old reflog entry with the new refname.
+		 */
+		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+		logs[logs_nr] = old_log;
+		logs[logs_nr].refname = (char *)arg->newname;
+		logs_nr++;
+
+		/*
+		 * Delete the old reflog entry in case we are renaming.
+		 */
+		if (arg->delete_old) {
+			ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+			memset(&logs[logs_nr], 0, sizeof(logs[logs_nr]));
+			logs[logs_nr].refname = (char *)arg->oldname;
+			logs[logs_nr].value_type = REFTABLE_LOG_DELETION;
+			logs[logs_nr].update_index = old_log.update_index;
+			logs_nr++;
+		}
+
+		/*
+		 * Transfer ownership of the log record we're iterating over to
+		 * the array of log records. Otherwise, the pointers would get
+		 * free'd or reallocated by the iterator.
+		 */
+		memset(&old_log, 0, sizeof(old_log));
+	}
+
+	ret = reftable_writer_add_logs(writer, logs, logs_nr);
+	if (ret < 0)
+		goto done;
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	reftable_iterator_destroy(&it);
+	string_list_clear(&skip, 0);
+	strbuf_release(&errbuf);
+	for (i = 0; i < logs_nr; i++) {
+		if (!strcmp(logs[i].refname, "HEAD"))
+			continue;
+		if (logs[i].value.update.old_hash == old_ref.value.val1)
+			logs[i].value.update.old_hash = NULL;
+		if (logs[i].value.update.new_hash == old_ref.value.val1)
+			logs[i].value.update.new_hash = NULL;
+		logs[i].refname = NULL;
+		reftable_log_record_release(&logs[i]);
+	}
+	free(logs);
+	reftable_ref_record_release(&old_ref);
+	reftable_log_record_release(&old_log);
+	return ret;
+}
+
+static int reftable_be_rename_ref(struct ref_store *ref_store,
+				  const char *oldrefname,
+				  const char *newrefname,
+				  const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "rename_ref");
+	struct reftable_stack *stack = stack_for(refs, newrefname, &newrefname);
+	struct write_copy_arg arg = {
+		.refs = refs,
+		.stack = stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+		.delete_old = 1,
+	};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+	ret = reftable_stack_add(stack, &write_copy_table, &arg);
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	return ret;
+}
+
+static int reftable_be_copy_ref(struct ref_store *ref_store,
+				const char *oldrefname,
+				const char *newrefname,
+				const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "copy_ref");
+	struct reftable_stack *stack = stack_for(refs, newrefname, &newrefname);
+	struct write_copy_arg arg = {
+		.refs = refs,
+		.stack = stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+	ret = reftable_stack_add(stack, &write_copy_table, &arg);
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	return ret;
+}
+
+struct reftable_reflog_iterator {
+	struct ref_iterator base;
+	struct reftable_ref_store *refs;
+	struct reftable_iterator iter;
+	struct reftable_log_record log;
+	struct object_id oid;
+	char *last_name;
+	int err;
+};
+
+static int reftable_reflog_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_iterator *iter =
+		(struct reftable_reflog_iterator *)ref_iterator;
+
+	while (!iter->err) {
+		int flags;
+
+		iter->err = reftable_iterator_next_log(&iter->iter, &iter->log);
+		if (iter->err)
+			break;
+
+		/*
+		 * We want the refnames that we have reflogs for, so we skip if
+		 * we've already produced this name. This could be faster by
+		 * seeking directly to reflog@update_index==0.
+		 */
+		if (iter->last_name && !strcmp(iter->log.refname, iter->last_name))
+			continue;
+
+		if (!refs_resolve_ref_unsafe(&iter->refs->base, iter->log.refname,
+					     0, &iter->oid, &flags)) {
+			error(_("bad ref for %s"), iter->log.refname);
+			continue;
+		}
+
+		free(iter->last_name);
+		iter->last_name = xstrdup(iter->log.refname);
+		iter->base.refname = iter->log.refname;
+		iter->base.oid = &iter->oid;
+		iter->base.flags = flags;
+
+		break;
+	}
+
+	if (iter->err > 0) {
+		if (ref_iterator_abort(ref_iterator) != ITER_DONE)
+			return ITER_ERROR;
+		return ITER_DONE;
+	}
+
+	if (iter->err < 0) {
+		ref_iterator_abort(ref_iterator);
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_reflog_iterator_peel(struct ref_iterator *ref_iterator,
+						 struct object_id *peeled)
+{
+	BUG("reftable reflog iterator cannot be peeled");
+	return -1;
+}
+
+static int reftable_reflog_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_iterator *iter =
+		(struct reftable_reflog_iterator *)ref_iterator;
+	reftable_log_record_release(&iter->log);
+	reftable_iterator_destroy(&iter->iter);
+	free(iter->last_name);
+	free(iter);
+	return ITER_DONE;
+}
+
+static struct ref_iterator_vtable reftable_reflog_iterator_vtable = {
+	.advance = reftable_reflog_iterator_advance,
+	.peel = reftable_reflog_iterator_peel,
+	.abort = reftable_reflog_iterator_abort
+};
+
+static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
+								  struct reftable_stack *stack)
+{
+	struct reftable_merged_table *merged_table;
+	struct reftable_reflog_iterator *iter;
+	int ret;
+
+	iter = xcalloc(1, sizeof(*iter));
+	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable, 1);
+	iter->refs = refs;
+	iter->base.oid = &iter->oid;
+
+	ret = refs->err;
+	if (ret)
+		goto done;
+
+	ret = reftable_stack_reload(refs->main_stack);
+	if (ret < 0)
+		goto done;
+
+	merged_table = reftable_stack_merged_table(stack);
+
+	ret = reftable_merged_table_seek_log(merged_table, &iter->iter, "");
+	if (ret < 0)
+		goto done;
+
+done:
+	iter->err = ret;
+	return iter;
+}
+
+static struct ref_iterator *reftable_be_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "reflog_iterator_begin");
+	struct reftable_reflog_iterator *main_iter, *worktree_iter;
+
+	main_iter = reflog_iterator_for_stack(refs, refs->main_stack);
+	if (!refs->worktree_stack)
+		return &main_iter->base;
+
+	worktree_iter = reflog_iterator_for_stack(refs, refs->worktree_stack);
+
+	return merge_ref_iterator_begin(1, &worktree_iter->base, &main_iter->base,
+					iterator_select, NULL);
+}
+
+static int yield_log_record(struct reftable_log_record *log,
+			    each_reflog_ent_fn fn,
+			    void *cb_data)
+{
+	struct object_id old_oid, new_oid;
+	const char *full_committer;
+
+	oidread(&old_oid, log->value.update.old_hash);
+	oidread(&new_oid, log->value.update.new_hash);
+
+	/*
+	 * When both the old object ID and the new object ID are null
+	 * then this is the reflog existence marker. The caller must
+	 * not be aware of it.
+	 */
+	if (is_null_oid(&old_oid) && is_null_oid(&new_oid))
+		return 0;
+
+	full_committer = fmt_ident(log->value.update.name, log->value.update.email,
+				   WANT_COMMITTER_IDENT, NULL, IDENT_NO_DATE);
+	return fn(&old_oid, &new_oid, full_committer,
+		  log->value.update.time, log->value.update.tz_offset,
+		  log->value.update.message, cb_data);
+}
+
+static int reftable_be_for_each_reflog_ent_reverse(struct ref_store *ref_store,
+						   const char *refname,
+						   each_reflog_ent_fn fn,
+						   void *cb_data)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "for_each_reflog_ent_reverse");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record log = {0};
+	struct reftable_iterator it = {0};
+	int ret;
+
+	if (refs->err < 0)
+		return refs->err;
+
+	mt = reftable_stack_merged_table(stack);
+	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	while (!ret) {
+		ret = reftable_iterator_next_log(&it, &log);
+		if (ret < 0)
+			break;
+		if (ret > 0 || strcmp(log.refname, refname)) {
+			ret = 0;
+			break;
+		}
+
+		ret = yield_log_record(&log, fn, cb_data);
+		if (ret)
+			break;
+	}
+
+	reftable_log_record_release(&log);
+	reftable_iterator_destroy(&it);
+	return ret;
+}
+
+static int reftable_be_for_each_reflog_ent(struct ref_store *ref_store,
+					   const char *refname,
+					   each_reflog_ent_fn fn,
+					   void *cb_data)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "for_each_reflog_ent");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record *logs = NULL;
+	struct reftable_iterator it = {0};
+	size_t logs_alloc = 0, logs_nr = 0, i;
+	int ret;
+
+	if (refs->err < 0)
+		return refs->err;
+
+	mt = reftable_stack_merged_table(stack);
+	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	while (!ret) {
+		struct reftable_log_record log = {0};
+
+		ret = reftable_iterator_next_log(&it, &log);
+		if (ret < 0)
+			goto done;
+		if (ret > 0 || strcmp(log.refname, refname)) {
+			reftable_log_record_release(&log);
+			ret = 0;
+			break;
+		}
+
+		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+		logs[logs_nr++] = log;
+	}
+
+	for (i = logs_nr; i--;) {
+		ret = yield_log_record(&logs[i], fn, cb_data);
+		if (ret)
+			goto done;
+	}
+
+done:
+	reftable_iterator_destroy(&it);
+	for (i = 0; i < logs_nr; i++)
+		reftable_log_record_release(&logs[i]);
+	free(logs);
+	return ret;
+}
+
+static int reftable_be_reflog_exists(struct ref_store *ref_store,
+				     const char *refname)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_READ, "reflog_exists");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_merged_table *mt = reftable_stack_merged_table(stack);
+	struct reftable_log_record log = {0};
+	struct reftable_iterator it = {0};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	if (ret < 0)
+		goto done;
+
+	/*
+	 * Check whether we get at least one log record for the given ref name.
+	 * If so, the reflog exists, otherwise it doesn't.
+	 */
+	ret = reftable_iterator_next_log(&it, &log);
+	if (ret < 0)
+		goto done;
+	if (ret > 0) {
+		ret = 0;
+		goto done;
+	}
+
+	ret = strcmp(log.refname, refname) == 0;
+
+done:
+	reftable_iterator_destroy(&it);
+	reftable_log_record_release(&log);
+	if (ret < 0)
+		ret = 0;
+	return ret;
+}
+
+struct write_reflog_existence_arg {
+	struct reftable_ref_store *refs;
+	const char *refname;
+	struct reftable_stack *stack;
+};
+
+static int write_reflog_existence_table(struct reftable_writer *writer,
+					void *cb_data)
+{
+	struct write_reflog_existence_arg *arg = cb_data;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_log_record log = {0};
+	int ret;
+
+	ret = reftable_stack_read_log(arg->stack, arg->refname, &log);
+	if (ret <= 0)
+		goto done;
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	/*
+	 * The existence entry has both old and new object ID set to the the
+	 * null object ID. Our iterators are aware of this and will not present
+	 * them to their callers.
+	 */
+	log.refname = xstrdup(arg->refname);
+	log.update_index = ts;
+	log.value_type = REFTABLE_LOG_UPDATE;
+	ret = reftable_writer_add_log(writer, &log);
+
+done:
+	assert(ret != REFTABLE_API_ERROR);
+	reftable_log_record_release(&log);
+	return ret;
+}
+
+static int reftable_be_create_reflog(struct ref_store *ref_store,
+				     const char *refname,
+				     struct strbuf *errmsg)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "create_reflog");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct write_reflog_existence_arg arg = {
+		.refs = refs,
+		.stack = stack,
+		.refname = refname,
+	};
+	int ret;
+
+	ret = refs->err;
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		goto done;
+
+	ret = reftable_stack_add(stack, &write_reflog_existence_table, &arg);
+
+done:
+	return ret;
+}
+
+struct write_reflog_delete_arg {
+	struct reftable_stack *stack;
+	const char *refname;
+};
+
+static int write_reflog_delete_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct write_reflog_delete_arg *arg = cb_data;
+	struct reftable_merged_table *mt =
+		reftable_stack_merged_table(arg->stack);
+	struct reftable_log_record log = {0}, tombstone = {0};
+	struct reftable_iterator it = {0};
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int ret;
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	/*
+	 * In order to delete a table we need to delete all reflog entries one
+	 * by one. This is inefficient, but the reftable format does not have a
+	 * better marker right now.
+	 */
+	ret = reftable_merged_table_seek_log(mt, &it, arg->refname);
+	while (ret == 0) {
+		ret = reftable_iterator_next_log(&it, &log);
+		if (ret < 0)
+			break;
+		if (ret > 0 || strcmp(log.refname, arg->refname)) {
+			ret = 0;
+			break;
+		}
+
+		tombstone.refname = (char *)arg->refname;
+		tombstone.value_type = REFTABLE_LOG_DELETION;
+		tombstone.update_index = log.update_index;
+
+		ret = reftable_writer_add_log(writer, &tombstone);
+	}
+
+	reftable_log_record_release(&log);
+	reftable_iterator_destroy(&it);
+	return ret;
+}
+
+static int reftable_be_delete_reflog(struct ref_store *ref_store,
+				     const char *refname)
+{
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "delete_reflog");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct write_reflog_delete_arg arg = {
+		.stack = stack,
+		.refname = refname,
+	};
+	int ret;
+
+	ret = reftable_stack_reload(stack);
+	if (ret)
+		return ret;
+	ret = reftable_stack_add(stack, &write_reflog_delete_table, &arg);
+
+	assert(ret != REFTABLE_API_ERROR);
+	return ret;
+}
+
+struct reflog_expiry_arg {
+	struct reftable_stack *stack;
+	struct reftable_log_record *records;
+	struct object_id update_oid;
+	const char *refname;
+	size_t len;
+};
+
+static int write_reflog_expiry_table(struct reftable_writer *writer, void *cb_data)
+{
+	struct reflog_expiry_arg *arg = cb_data;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	uint64_t live_records = 0;
+	size_t i;
+	int ret;
+
+	for (i = 0; i < arg->len; i++)
+		if (arg->records[i].value_type == REFTABLE_LOG_UPDATE)
+			live_records++;
+
+	reftable_writer_set_limits(writer, ts, ts);
+
+	if (!is_null_oid(&arg->update_oid)) {
+		struct reftable_ref_record ref = {0};
+		struct object_id peeled;
+
+		ref.refname = (char *)arg->refname;
+		ref.update_index = ts;
+
+		if (!peel_object(&arg->update_oid, &peeled)) {
+			ref.value_type = REFTABLE_REF_VAL2;
+			memcpy(ref.value.val2.target_value, peeled.hash, GIT_MAX_RAWSZ);
+			memcpy(ref.value.val2.value, arg->update_oid.hash, GIT_MAX_RAWSZ);
+		} else {
+			ref.value_type = REFTABLE_REF_VAL1;
+			memcpy(ref.value.val1, arg->update_oid.hash, GIT_MAX_RAWSZ);
+		}
+
+		ret = reftable_writer_add_ref(writer, &ref);
+		if (ret < 0)
+			return ret;
+	}
+
+	/*
+	 * When there are no more entries left in the reflog we empty it
+	 * completely, but write a placeholder reflog entry that indicates that
+	 * the reflog still exists.
+	 */
+	if (!live_records) {
+		struct reftable_log_record log = {
+			.refname = (char *)arg->refname,
+			.value_type = REFTABLE_LOG_UPDATE,
+			.update_index = ts,
+		};
+
+		ret = reftable_writer_add_log(writer, &log);
+		if (ret)
+			return ret;
+	}
+
+	for (i = 0; i < arg->len; i++) {
+		ret = reftable_writer_add_log(writer, &arg->records[i]);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int reftable_be_reflog_expire(struct ref_store *ref_store,
+				     const char *refname,
+				     unsigned int flags,
+				     reflog_expiry_prepare_fn prepare_fn,
+				     reflog_expiry_should_prune_fn should_prune_fn,
+				     reflog_expiry_cleanup_fn cleanup_fn,
+				     void *policy_cb_data)
+{
+	/*
+	 * For log expiry, we write tombstones for every single reflog entry
+	 * that is to be expired. This means that the entries are still
+	 * retrievable by delving into the stack, and expiring entries
+	 * paradoxically takes extra memory. This memory is only reclaimed when
+	 * compacting the reftable stack.
+	 *
+	 * It would be better if the refs backend supported an API that sets a
+	 * criterion for all refs, passing the criterion to pack_refs().
+	 *
+	 * On the plus side, because we do the expiration per ref, we can easily
+	 * insert the reflog existence dummies.
+	 */
+	struct reftable_ref_store *refs =
+		reftable_be_downcast(ref_store, REF_STORE_WRITE, "reflog_expire");
+	struct reftable_stack *stack = stack_for(refs, refname, &refname);
+	struct reftable_merged_table *mt = reftable_stack_merged_table(stack);
+	struct reftable_log_record *logs = NULL;
+	struct reftable_log_record *rewritten = NULL;
+	struct reftable_ref_record ref_record = {0};
+	struct reftable_iterator it = {0};
+	struct reftable_addition *add = NULL;
+	struct reflog_expiry_arg arg = {0};
+	struct object_id oid = {0};
+	uint8_t *last_hash = NULL;
+	size_t logs_nr = 0, logs_alloc = 0, i;
+	int ret;
+
+	if (refs->err < 0)
+		return refs->err;
+
+	ret = reftable_stack_reload(stack);
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_new_addition(&add, stack);
+	if (ret < 0)
+		goto done;
+
+	ret = reftable_stack_read_ref(stack, refname, &ref_record);
+	if (ret < 0)
+		goto done;
+	if (reftable_ref_record_val1(&ref_record))
+		oidread(&oid, reftable_ref_record_val1(&ref_record));
+	prepare_fn(refname, &oid, policy_cb_data);
+
+	while (1) {
+		struct reftable_log_record log = {0};
+		struct object_id old_oid, new_oid;
+
+		ret = reftable_iterator_next_log(&it, &log);
+		if (ret < 0)
+			goto done;
+		if (ret > 0 || strcmp(log.refname, refname)) {
+			reftable_log_record_release(&log);
+			break;
+		}
+
+		oidread(&old_oid, log.value.update.old_hash);
+		oidread(&new_oid, log.value.update.new_hash);
+
+		/*
+		 * Skip over the reflog existence marker. We will add it back
+		 * in when there are no live reflog records.
+		 */
+		if (is_null_oid(&old_oid) && is_null_oid(&new_oid)) {
+			reftable_log_record_release(&log);
+			continue;
+		}
+
+		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
+		logs[logs_nr++] = log;
+	}
+
+	/*
+	 * We need to rewrite all reflog entries according to the pruning
+	 * callback function:
+	 *
+	 *   - If a reflog entry shall be pruned we mark the record for
+	 *     deletion.
+	 *
+	 *   - Otherwise we may have to rewrite the chain of reflog entries so
+	 *     that gaps created by just-deleted records get backfilled.
+	 */
+	CALLOC_ARRAY(rewritten, logs_nr);
+	for (i = logs_nr; i--;) {
+		struct reftable_log_record *dest = &rewritten[i];
+		struct object_id old_oid, new_oid;
+
+		*dest = logs[i];
+		oidread(&old_oid, logs[i].value.update.old_hash);
+		oidread(&new_oid, logs[i].value.update.new_hash);
+
+		if (should_prune_fn(&old_oid, &new_oid, logs[i].value.update.email,
+				    (timestamp_t)logs[i].value.update.time,
+				    logs[i].value.update.tz_offset,
+				    logs[i].value.update.message,
+				    policy_cb_data)) {
+			dest->value_type = REFTABLE_LOG_DELETION;
+		} else {
+			if ((flags & EXPIRE_REFLOGS_REWRITE) && last_hash)
+				dest->value.update.old_hash = last_hash;
+			last_hash = logs[i].value.update.new_hash;
+		}
+	}
+
+	if (flags & EXPIRE_REFLOGS_UPDATE_REF && last_hash &&
+	    reftable_ref_record_val1(&ref_record))
+		oidread(&arg.update_oid, last_hash);
+
+	arg.records = rewritten;
+	arg.len = logs_nr;
+	arg.stack = stack,
+	arg.refname = refname,
+
+	ret = reftable_addition_add(add, &write_reflog_expiry_table, &arg);
+	if (ret < 0)
+		goto done;
+
+	/*
+	 * Future improvement: we could skip writing records that were
+	 * not changed.
+	 */
+	if (!(flags & EXPIRE_REFLOGS_DRY_RUN))
+		ret = reftable_addition_commit(add);
+
+done:
+	if (add)
+		cleanup_fn(policy_cb_data);
+	assert(ret != REFTABLE_API_ERROR);
+
+	reftable_ref_record_release(&ref_record);
+	reftable_iterator_destroy(&it);
+	reftable_addition_destroy(add);
+	for (i = 0; i < logs_nr; i++)
+		reftable_log_record_release(&logs[i]);
+	free(logs);
+	free(rewritten);
+	return ret;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	.name = "reftable",
+	.init = reftable_be_init,
+	.init_db = reftable_be_init_db,
+	.transaction_prepare = reftable_be_transaction_prepare,
+	.transaction_finish = reftable_be_transaction_finish,
+	.transaction_abort = reftable_be_transaction_abort,
+	.initial_transaction_commit = reftable_be_initial_transaction_commit,
+
+	.pack_refs = reftable_be_pack_refs,
+	.create_symref = reftable_be_create_symref,
+	.rename_ref = reftable_be_rename_ref,
+	.copy_ref = reftable_be_copy_ref,
+
+	.iterator_begin = reftable_be_iterator_begin,
+	.read_raw_ref = reftable_be_read_raw_ref,
+	.read_symbolic_ref = reftable_be_read_symbolic_ref,
+
+	.reflog_iterator_begin = reftable_be_reflog_iterator_begin,
+	.for_each_reflog_ent = reftable_be_for_each_reflog_ent,
+	.for_each_reflog_ent_reverse = reftable_be_for_each_reflog_ent_reverse,
+	.reflog_exists = reftable_be_reflog_exists,
+	.create_reflog = reftable_be_create_reflog,
+	.delete_reflog = reftable_be_delete_reflog,
+	.reflog_expire = reftable_be_reflog_expire,
+};
diff --git a/repository.h b/repository.h
index 7a250a6605..1645cef518 100644
--- a/repository.h
+++ b/repository.h
@@ -24,8 +24,9 @@ enum fetch_negotiation_setting {
 	FETCH_NEGOTIATION_NOOP,
 };
 
-#define REF_STORAGE_FORMAT_UNKNOWN 0
-#define REF_STORAGE_FORMAT_FILES   1
+#define REF_STORAGE_FORMAT_UNKNOWN  0
+#define REF_STORAGE_FORMAT_FILES    1
+#define REF_STORAGE_FORMAT_REFTABLE 2
 
 struct repo_settings {
 	int initialized;
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
new file mode 100755
index 0000000000..6a131e40b8
--- /dev/null
+++ b/t/t0610-reftable-basics.sh
@@ -0,0 +1,887 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='reftable basics'
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+if ! test_have_prereq REFTABLE
+then
+	skip_all='skipping reftable tests; set GIT_TEST_DEFAULT_REF_FORMAT=reftable'
+	test_done
+fi
+
+INVALID_OID=$(test_oid 001)
+
+test_expect_success 'init: creates basic reftable structures' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_path_is_dir repo/.git/reftable &&
+	test_path_is_file repo/.git/reftable/tables.list &&
+	echo reftable >expect &&
+	git -C repo rev-parse --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'init: sha256 object format via environment variable' '
+	test_when_finished "rm -rf repo" &&
+	GIT_DEFAULT_HASH=sha256 git init repo &&
+	cat >expect <<-EOF &&
+	sha256
+	reftable
+	EOF
+	git -C repo rev-parse --show-object-format --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'init: sha256 object format via option' '
+	test_when_finished "rm -rf repo" &&
+	git init --object-format=sha256 repo &&
+	cat >expect <<-EOF &&
+	sha256
+	reftable
+	EOF
+	git -C repo rev-parse --show-object-format --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'init: reinitializing reftable backend succeeds' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo for-each-ref >expect &&
+	git init --ref-format=reftable repo &&
+	git -C repo for-each-ref >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'init: reinitializing files with reftable backend fails' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	test_commit -C repo file &&
+
+	cp repo/.git/HEAD expect &&
+	test_must_fail git init --ref-format=reftable repo &&
+	test_cmp expect repo/.git/HEAD
+'
+
+test_expect_success 'init: reinitializing reftable with files backend fails' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=reftable repo &&
+	test_commit -C repo file &&
+
+	cp repo/.git/HEAD expect &&
+	test_must_fail git init --ref-format=files repo &&
+	test_cmp expect repo/.git/HEAD
+'
+
+test_expect_perms () {
+	local perms="$1"
+	local file="$2"
+	local actual=$(ls -l "$file") &&
+
+	case "$actual" in
+	$perms*)
+		: happy
+		;;
+	*)
+		echo "$(basename $2) is not $perms but $actual"
+		false
+		;;
+	esac
+}
+
+for umask in 002 022
+do
+	test_expect_success POSIXPERM 'init: honors core.sharedRepository' '
+		test_when_finished "rm -rf repo" &&
+		(
+			umask $umask &&
+			git init --shared=true repo &&
+			test 1 = "$(git -C repo config core.sharedrepository)"
+		) &&
+		test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list &&
+		for table in repo/.git/reftable/*.ref
+		do
+			test_expect_perms "-rw-rw-r--" "$table" ||
+			return 1
+		done
+	'
+done
+
+test_expect_success 'clone: can clone reftable repository' '
+	test_when_finished "rm -rf repo clone" &&
+	git init repo &&
+	test_commit -C repo message1 file1 &&
+
+	git clone repo cloned &&
+	echo reftable >expect &&
+	git -C cloned rev-parse --show-ref-format >actual &&
+	test_cmp expect actual &&
+	test_path_is_file cloned/file1
+'
+
+test_expect_success 'clone: can clone reffiles into reftable repository' '
+	test_when_finished "rm -rf reffiles reftable" &&
+	git init --ref-format=files reffiles &&
+	test_commit -C reffiles A &&
+	git clone --ref-format=reftable ./reffiles reftable &&
+
+	git -C reffiles rev-parse HEAD >expect &&
+	git -C reftable rev-parse HEAD >actual &&
+	test_cmp expect actual &&
+
+	git -C reftable rev-parse --show-ref-format >actual &&
+	echo reftable >expect &&
+	test_cmp expect actual &&
+
+	git -C reffiles rev-parse --show-ref-format >actual &&
+	echo files >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone: can clone reftable into reffiles repository' '
+	test_when_finished "rm -rf reffiles reftable" &&
+	git init --ref-format=reftable reftable &&
+	test_commit -C reftable A &&
+	git clone --ref-format=files ./reftable reffiles &&
+
+	git -C reftable rev-parse HEAD >expect &&
+	git -C reffiles rev-parse HEAD >actual &&
+	test_cmp expect actual &&
+
+	git -C reftable rev-parse --show-ref-format >actual &&
+	echo reftable >expect &&
+	test_cmp expect actual &&
+
+	git -C reffiles rev-parse --show-ref-format >actual &&
+	echo files >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'ref transaction: corrupted tables cause failure' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file1 &&
+		for f in .git/reftable/*.ref
+		do
+			: >"$f" || return 1
+		done &&
+		test_must_fail git update-ref refs/heads/main HEAD
+	)
+'
+
+test_expect_success 'ref transaction: corrupted tables.list cause failure' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file1 &&
+		echo garbage >.git/reftable/tables.list &&
+		test_must_fail git update-ref refs/heads/main HEAD
+	)
+'
+
+test_expect_success 'ref transaction: refuses to write ref causing F/D conflict' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo file &&
+	test_must_fail git -C repo update-ref refs/heads/main/forbidden
+'
+
+test_expect_success 'ref transaction: deleting ref with invalid name fails' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo file &&
+	test_must_fail git -C repo update-ref -d ../../my-private-file
+'
+
+test_expect_success 'ref transaction: can skip object ID verification' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_must_fail test-tool -C repo ref-store main update-ref msg refs/heads/branch $INVALID_OID $ZERO_OID 0 &&
+	test-tool -C repo ref-store main update-ref msg refs/heads/branch $INVALID_OID $ZERO_OID REF_SKIP_OID_VERIFICATION
+'
+
+test_expect_success 'ref transaction: updating same ref multiple times fails' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo A &&
+	cat >updates <<-EOF &&
+	update refs/heads/main $A
+	update refs/heads/main $A
+	EOF
+	cat >expect <<-EOF &&
+	fatal: multiple updates for ref ${SQ}refs/heads/main${SQ} not allowed
+	EOF
+	test_must_fail git -C repo update-ref --stdin <updates 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success 'ref transaction: can delete symbolic self-reference with git-symbolic-ref(1)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	git -C repo symbolic-ref refs/heads/self refs/heads/self &&
+	git -C repo symbolic-ref -d refs/heads/self
+'
+
+test_expect_success 'ref transaction: deleting symbolic self-reference without --no-deref fails' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	git -C repo symbolic-ref refs/heads/self refs/heads/self &&
+	cat >expect <<-EOF &&
+	error: multiple updates for ${SQ}refs/heads/self${SQ} (including one via symref ${SQ}refs/heads/self${SQ}) are not allowed
+	EOF
+	test_must_fail git -C repo update-ref -d refs/heads/self 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success 'ref transaction: deleting symbolic self-reference with --no-deref succeeds' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	git -C repo symbolic-ref refs/heads/self refs/heads/self &&
+	git -C repo update-ref -d --no-deref refs/heads/self
+'
+
+test_expect_success 'ref transaction: creating symbolic ref fails with F/D conflict' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo A &&
+	cat >expect <<-EOF &&
+	error: unable to write symref for refs/heads: file/directory conflict
+	EOF
+	test_must_fail git -C repo symbolic-ref refs/heads refs/heads/foo 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success 'ref transaction: ref deletion' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file &&
+		HEAD_OID=$(git show-ref -s --verify HEAD) &&
+		cat >expect <<-EOF &&
+		$HEAD_OID refs/heads/main
+		$HEAD_OID refs/tags/file
+		EOF
+		git show-ref >actual &&
+		test_cmp expect actual &&
+
+		test_must_fail git update-ref -d refs/tags/file $INVALID_OID &&
+		git show-ref >actual &&
+		test_cmp expect actual &&
+
+		git update-ref -d refs/tags/file $HEAD_OID &&
+		echo "$HEAD_OID refs/heads/main" >expect &&
+		git show-ref >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'ref transaction: writes cause auto-compaction' '
+	test_when_finished "rm -rf repo" &&
+
+	git init repo &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	test_commit -C repo --no-tag A &&
+	test_line_count = 2 repo/.git/reftable/tables.list &&
+
+	test_commit -C repo --no-tag B &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+check_fsync_events () {
+	local trace="$1" &&
+	shift &&
+
+	cat >expect &&
+	sed -n \
+		-e '/^{"event":"counter",.*"category":"fsync",/ {
+			s/.*"category":"fsync",//;
+			s/}$//;
+			p;
+		}' \
+		<"$trace" >actual &&
+	test_cmp expect actual
+}
+
+test_expect_success 'ref transaction: writes are synced' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo initial &&
+
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+	GIT_TEST_FSYNC=true \
+		git -C repo -c core.fsync=reference \
+		-c core.fsyncMethod=fsync update-ref refs/heads/branch HEAD &&
+	check_fsync_events trace2.txt <<-EOF
+	"name":"hardware-flush","count":2
+	EOF
+'
+
+test_expect_success 'pack-refs: compacts tables' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+
+	test_commit -C repo A &&
+	ls -1 repo/.git/reftable >table-files &&
+	test_line_count = 4 table-files &&
+	test_line_count = 3 repo/.git/reftable/tables.list &&
+
+	git -C repo pack-refs &&
+	ls -1 repo/.git/reftable >table-files &&
+	test_line_count = 2 table-files &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'pack-refs: prunes stale tables' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	touch repo/.git/reftable/stale-table.ref &&
+	git -C repo pack-refs &&
+	test_path_is_missing repo/.git/reftable/stable-ref.ref
+'
+
+test_expect_success 'pack-refs: does not prune non-table files' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	touch repo/.git/reftable/garbage &&
+	git -C repo pack-refs &&
+	test_path_is_file repo/.git/reftable/garbage
+'
+
+for umask in 002 022
+do
+	test_expect_success POSIXPERM 'pack-refs: honors core.sharedRepository' '
+		test_when_finished "rm -rf repo" &&
+		(
+			umask $umask &&
+			git init --shared=true repo &&
+			test_commit -C repo A &&
+			test_line_count = 3 repo/.git/reftable/tables.list
+		) &&
+		git -C repo pack-refs &&
+		test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list &&
+		for table in repo/.git/reftable/*.ref
+		do
+			test_expect_perms "-rw-rw-r--" "$table" ||
+			return 1
+		done
+	'
+done
+
+test_expect_success 'packed-refs: writes are synced' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo initial &&
+	test_line_count = 2 table-files &&
+
+	: >trace2.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+	GIT_TEST_FSYNC=true \
+		git -C repo -c core.fsync=reference \
+		-c core.fsyncMethod=fsync pack-refs &&
+	check_fsync_events trace2.txt <<-EOF
+	"name":"hardware-flush","count":2
+	EOF
+'
+
+test_expect_success 'ref iterator: bogus names are flagged' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit --no-tag file &&
+		test-tool ref-store main update-ref msg "refs/heads/bogus..name" $(git rev-parse HEAD) $ZERO_OID REF_SKIP_REFNAME_VERIFICATION &&
+
+		cat >expect <<-EOF &&
+		$ZERO_OID refs/heads/bogus..name 0xc
+		$(git rev-parse HEAD) refs/heads/main 0x0
+		EOF
+		test-tool ref-store main for-each-ref "" >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'ref iterator: missing object IDs are not flagged' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test-tool ref-store main update-ref msg "refs/heads/broken-hash" $INVALID_OID $ZERO_OID REF_SKIP_OID_VERIFICATION &&
+
+		cat >expect <<-EOF &&
+		$INVALID_OID refs/heads/broken-hash 0x0
+		EOF
+		test-tool ref-store main for-each-ref "" >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'basic: commit and list refs' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo file &&
+	test_write_lines refs/heads/main refs/tags/file >expect &&
+	git -C repo for-each-ref --format="%(refname)" >actual &&
+	test_cmp actual expect
+'
+
+test_expect_success 'basic: can write large commit message' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	perl -e "
+		print \"this is a long commit message\" x 50000
+	" >commit-msg &&
+	git -C repo commit --allow-empty --file=../commit-msg
+'
+
+test_expect_success 'basic: show-ref fails with empty repository' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_must_fail git -C repo show-ref >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success 'basic: can check out unborn branch' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	git -C repo checkout -b main
+'
+
+test_expect_success 'basic: peeled tags are stored' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo file &&
+	git -C repo tag -m "annotated tag" test_tag HEAD &&
+	for ref in refs/heads/main refs/tags/file refs/tags/test_tag refs/tags/test_tag^{}
+	do
+		echo "$(git -C repo rev-parse "$ref") $ref" || return 1
+	done >expect &&
+	git -C repo show-ref -d >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'basic: for-each-ref can print symrefs' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file &&
+		git branch &&
+		git symbolic-ref refs/heads/sym refs/heads/main &&
+		cat >expected <<-EOF &&
+		refs/heads/main
+		EOF
+		git for-each-ref --format="%(symref)" refs/heads/sym >actual &&
+		test_cmp expected actual
+	)
+'
+
+test_expect_success 'basic: notes' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		write_script fake_editor <<-\EOF &&
+		echo "$MSG" >"$1"
+		echo "$MSG" >&2
+		EOF
+
+		test_commit 1st &&
+		test_commit 2nd &&
+		GIT_EDITOR=./fake_editor MSG=b4 git notes add &&
+		GIT_EDITOR=./fake_editor MSG=b3 git notes edit &&
+		echo b4 >expect &&
+		git notes --ref commits@{1} show >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'basic: stash' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file &&
+		git stash list >expect &&
+		test_line_count = 0 expect &&
+
+		echo hoi >>file.t &&
+		git stash push -m stashed &&
+		git stash list >expect &&
+		test_line_count = 1 expect &&
+
+		git stash clear &&
+		git stash list >expect &&
+		test_line_count = 0 expect
+	)
+'
+
+test_expect_success 'basic: cherry-pick' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit message1 file1 &&
+		test_commit message2 file2 &&
+		git branch source &&
+		git checkout HEAD^ &&
+		test_commit message3 file3 &&
+		git cherry-pick source &&
+		test_path_is_file file2
+	)
+'
+
+test_expect_success 'basic: rebase' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit message1 file1 &&
+		test_commit message2 file2 &&
+		git branch source &&
+		git checkout HEAD^ &&
+		test_commit message3 file3 &&
+		git rebase source &&
+		test_path_is_file file2
+	)
+'
+
+test_expect_success 'reflog: can delete separate reflog entries' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_commit file &&
+		test_commit file2 &&
+		test_commit file3 &&
+		test_commit file4 &&
+		git reflog >actual &&
+		grep file3 actual &&
+
+		git reflog delete HEAD@{1} &&
+		git reflog >actual &&
+		! grep file3 actual
+	)
+'
+
+test_expect_success 'reflog: can switch to previous branch' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file1 &&
+		git checkout -b branch1 &&
+		test_commit file2 &&
+		git checkout -b branch2 &&
+		git switch - &&
+		git rev-parse --symbolic-full-name HEAD >actual &&
+		echo refs/heads/branch1 >expect &&
+		test_cmp actual expect
+	)
+'
+
+test_expect_success 'reflog: copying branch writes reflog entry' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit file1 &&
+		test_commit file2 &&
+		oid=$(git rev-parse --short HEAD) &&
+		git branch src &&
+		cat >expect <<-EOF &&
+		${oid} dst@{0}: Branch: copied refs/heads/src to refs/heads/dst
+		${oid} dst@{1}: branch: Created from main
+		EOF
+		git branch -c src dst &&
+		git reflog dst >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'reflog: renaming branch writes reflog entry' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git symbolic-ref HEAD refs/heads/before &&
+		test_commit file &&
+		git show-ref >expected.refs &&
+		sed s/before/after/g <expected.refs >expected &&
+		git branch -M after &&
+		git show-ref >actual &&
+		test_cmp expected actual &&
+		echo refs/heads/after >expected &&
+		git symbolic-ref HEAD >actual &&
+		test_cmp expected actual
+	)
+'
+
+test_expect_success 'reflog: can store empty logs' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_must_fail test-tool ref-store main reflog-exists refs/heads/branch &&
+		test-tool ref-store main create-reflog refs/heads/branch &&
+		test-tool ref-store main reflog-exists refs/heads/branch &&
+		test-tool ref-store main for-each-reflog-ent-reverse refs/heads/branch >actual &&
+		test_must_be_empty actual
+	)
+'
+
+test_expect_success 'reflog: expiry empties reflog' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_commit initial &&
+		git checkout -b branch &&
+		test_commit fileA &&
+		test_commit fileB &&
+
+		cat >expect <<-EOF &&
+		commit: fileB
+		commit: fileA
+		branch: Created from HEAD
+		EOF
+		git reflog show --format="%gs" refs/heads/branch >actual &&
+		test_cmp expect actual &&
+
+		git reflog expire branch --expire=all &&
+		git reflog show --format="%gs" refs/heads/branch >actual &&
+		test_must_be_empty actual &&
+		test-tool ref-store main reflog-exists refs/heads/branch
+	)
+'
+
+test_expect_success 'reflog: can be deleted' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		test-tool ref-store main reflog-exists refs/heads/main &&
+		test-tool ref-store main delete-reflog refs/heads/main &&
+		test_must_fail test-tool ref-store main reflog-exists refs/heads/main
+	)
+'
+
+test_expect_success 'reflog: garbage collection deletes reflog entries' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		for count in $(test_seq 1 10)
+		do
+			test_commit "number $count" file.t $count number-$count ||
+			return 1
+		done &&
+		git reflog refs/heads/main >actual &&
+		test_line_count = 10 actual &&
+		grep "commit (initial): number 1" actual &&
+		grep "commit: number 10" actual &&
+
+		git gc &&
+		git reflog refs/heads/main >actual &&
+		test_line_count = 0 actual
+	)
+'
+
+test_expect_success 'reflog: updates via HEAD update HEAD reflog' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit main-one &&
+		git checkout -b new-branch &&
+		test_commit new-one &&
+		test_commit new-two &&
+
+		echo new-one >expect &&
+		git log -1 --format=%s HEAD@{1} >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'worktree: adding worktree creates separate stack' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	test_path_is_file repo/.git/worktrees/worktree/refs/heads &&
+	echo "ref: refs/heads/.invalid" >expect &&
+	test_cmp expect repo/.git/worktrees/worktree/HEAD &&
+	test_path_is_dir repo/.git/worktrees/worktree/reftable &&
+	test_path_is_file repo/.git/worktrees/worktree/reftable/tables.list
+'
+
+test_expect_success 'worktree: pack-refs in main repo packs main refs' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+	git -C repo worktree add ../worktree &&
+
+	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 4 repo/.git/reftable/tables.list &&
+	git -C repo pack-refs &&
+	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: pack-refs in worktree packs worktree refs' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+	git -C repo worktree add ../worktree &&
+
+	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 4 repo/.git/reftable/tables.list &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 4 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: creating shared ref updates main stack' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	git -C repo pack-refs &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	git -C worktree update-ref refs/heads/shared HEAD &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 2 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: creating per-worktree ref updates worktree stack' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	git -C repo pack-refs &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	git -C worktree update-ref refs/bisect/per-worktree HEAD &&
+	test_line_count = 2 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: creating per-worktree ref from main repo' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	git -C repo pack-refs &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	git -C repo update-ref worktrees/worktree/refs/bisect/per-worktree HEAD &&
+	test_line_count = 2 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: creating per-worktree ref from second worktree' '
+	test_when_finished "rm -rf repo wt1 wt2" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../wt1 &&
+	git -C repo worktree add ../wt2 &&
+	git -C repo pack-refs &&
+	git -C wt1 pack-refs &&
+	git -C wt2 pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/wt1/reftable/tables.list &&
+	test_line_count = 1 repo/.git/worktrees/wt2/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	git -C wt1 update-ref worktrees/wt2/refs/bisect/per-worktree HEAD &&
+	test_line_count = 1 repo/.git/worktrees/wt1/reftable/tables.list &&
+	test_line_count = 2 repo/.git/worktrees/wt2/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: can create shared and per-worktree ref in one transaction' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo A &&
+
+	git -C repo worktree add ../worktree &&
+	git -C repo pack-refs &&
+	git -C worktree pack-refs &&
+	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 1 repo/.git/reftable/tables.list &&
+
+	cat >stdin <<-EOF &&
+	create worktrees/worktree/refs/bisect/per-worktree HEAD
+	create refs/branches/shared HEAD
+	EOF
+	git -C repo update-ref --stdin <stdin &&
+	test_line_count = 2 repo/.git/worktrees/worktree/reftable/tables.list &&
+	test_line_count = 2 repo/.git/reftable/tables.list
+'
+
+test_expect_success 'worktree: can access common refs' '
+	test_when_finished "rm -rf repo worktree" &&
+	git init repo &&
+	test_commit -C repo file1 &&
+	git -C repo branch branch1 &&
+	git -C repo worktree add ../worktree &&
+
+	echo refs/heads/worktree >expect &&
+	git -C worktree symbolic-ref HEAD >actual &&
+	test_cmp expect actual &&
+	git -C worktree checkout branch1
+'
+
+test_expect_success 'worktree: adds worktree with detached HEAD' '
+	test_when_finished "rm -rf repo worktree" &&
+
+	git init repo &&
+	test_commit -C repo A &&
+	git -C repo rev-parse main >expect &&
+
+	git -C repo worktree add --detach ../worktree main &&
+	git -C worktree rev-parse HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'fetch: accessing FETCH_HEAD special ref works' '
+	test_when_finished "rm -rf repo sub" &&
+
+	git init sub &&
+	test_commit -C sub two &&
+	git -C sub rev-parse HEAD >expect &&
+
+	git init repo &&
+	test_commit -C repo one &&
+	git -C repo fetch ../sub &&
+	git -C repo rev-parse FETCH_HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_done
diff --git a/t/t0611-reftable-httpd.sh b/t/t0611-reftable-httpd.sh
new file mode 100755
index 0000000000..5e05b9c1f2
--- /dev/null
+++ b/t/t0611-reftable-httpd.sh
@@ -0,0 +1,26 @@
+#!/bin/sh
+
+test_description='reftable HTTPD tests'
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd
+
+REPO="$HTTPD_DOCUMENT_ROOT_PATH/repo"
+
+test_expect_success 'serving ls-remote' '
+	git init --ref-format=reftable -b main "$REPO" &&
+	cd "$REPO" &&
+	test_commit m1 &&
+	>.git/git-daemon-export-ok &&
+	git ls-remote "http://127.0.0.1:$LIB_HTTPD_PORT/smart/repo" | cut -f 2-2 -d "	" >actual &&
+	cat >expect <<-EOF &&
+	HEAD
+	refs/heads/main
+	refs/tags/m1
+	EOF
+	test_cmp actual expect
+'
+
+test_done
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 042f557a6f..c8af8dab79 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1755,6 +1755,8 @@ esac
 case "$GIT_DEFAULT_REF_FORMAT" in
 files)
 	test_set_prereq REFFILES;;
+reftable)
+	test_set_prereq REFTABLE;;
 *)
 	echo 2>&1 "error: unknown ref format $GIT_DEFAULT_REF_FORMAT"
 	exit 1
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 2/2] ci: add jobs to test with the reftable backend
  2024-02-07  7:20 ` [PATCH v4 " Patrick Steinhardt
  2024-02-07  7:20   ` [PATCH v4 1/2] " Patrick Steinhardt
@ 2024-02-07  7:20   ` Patrick Steinhardt
  1 sibling, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2024-02-07  7:20 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, Han-Wen Nienhuys, Karthik Nayak

[-- Attachment #1: Type: text/plain, Size: 3294 bytes --]

Add CI jobs for both GitHub Workflows and GitLab CI to run Git with the
new reftable backend.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 .github/workflows/main.yml | 9 +++++++++
 .gitlab-ci.yml             | 9 +++++++++
 ci/lib.sh                  | 2 +-
 ci/run-build-and-tests.sh  | 3 +++
 4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 4d97da57ec..1b43e49dad 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -266,6 +266,9 @@ jobs:
           - jobname: linux-sha256
             cc: clang
             pool: ubuntu-latest
+          - jobname: linux-reftable
+            cc: clang
+            pool: ubuntu-latest
           - jobname: linux-gcc
             cc: gcc
             cc_package: gcc-8
@@ -277,6 +280,9 @@ jobs:
           - jobname: osx-clang
             cc: clang
             pool: macos-13
+          - jobname: osx-reftable
+            cc: clang
+            pool: macos-13
           - jobname: osx-gcc
             cc: gcc
             cc_package: gcc-13
@@ -287,6 +293,9 @@ jobs:
           - jobname: linux-leaks
             cc: gcc
             pool: ubuntu-latest
+          - jobname: linux-reftable-leaks
+            cc: gcc
+            pool: ubuntu-latest
           - jobname: linux-asan-ubsan
             cc: clang
             pool: ubuntu-latest
diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 43bfbd8834..c0fa2fe90b 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -26,6 +26,9 @@ test:linux:
       - jobname: linux-sha256
         image: ubuntu:latest
         CC: clang
+      - jobname: linux-reftable
+        image: ubuntu:latest
+        CC: clang
       - jobname: linux-gcc
         image: ubuntu:20.04
         CC: gcc
@@ -40,6 +43,9 @@ test:linux:
       - jobname: linux-leaks
         image: ubuntu:latest
         CC: gcc
+      - jobname: linux-reftable-leaks
+        image: ubuntu:latest
+        CC: gcc
       - jobname: linux-asan-ubsan
         image: ubuntu:latest
         CC: clang
@@ -79,6 +85,9 @@ test:osx:
       - jobname: osx-clang
         image: macos-13-xcode-14
         CC: clang
+      - jobname: osx-reftable
+        image: macos-13-xcode-14
+        CC: clang
   artifacts:
     paths:
       - t/failed-test-artifacts
diff --git a/ci/lib.sh b/ci/lib.sh
index d5dd2f2697..0a73fc7bd1 100755
--- a/ci/lib.sh
+++ b/ci/lib.sh
@@ -367,7 +367,7 @@ linux-musl)
 	MAKEFLAGS="$MAKEFLAGS NO_REGEX=Yes ICONV_OMITS_BOM=Yes"
 	MAKEFLAGS="$MAKEFLAGS GIT_TEST_UTF8_LOCALE=C.UTF-8"
 	;;
-linux-leaks)
+linux-leaks|linux-reftable-leaks)
 	export SANITIZE=leak
 	export GIT_TEST_PASSING_SANITIZE_LEAK=true
 	export GIT_TEST_SANITIZE_LEAK_LOG=true
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index 7a1466b868..c192bd613c 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -37,6 +37,9 @@ linux-clang)
 linux-sha256)
 	export GIT_TEST_DEFAULT_HASH=sha256
 	;;
+linux-reftable|linux-reftable-leaks|osx-reftable)
+	export GIT_TEST_DEFAULT_REF_FORMAT=reftable
+	;;
 pedantic)
 	# Don't run the tests; we only care about whether Git can be
 	# built.
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 1/2] refs: introduce reftable backend
  2024-02-07  7:20   ` [PATCH v4 1/2] " Patrick Steinhardt
@ 2024-02-07 22:31     ` Jeff King
  2024-02-08  5:11       ` Patrick Steinhardt
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2024-02-07 22:31 UTC (permalink / raw
  To: Patrick Steinhardt; +Cc: git, Junio C Hamano, Han-Wen Nienhuys, Karthik Nayak

On Wed, Feb 07, 2024 at 08:20:31AM +0100, Patrick Steinhardt wrote:

> +static int write_copy_table(struct reftable_writer *writer, void *cb_data)
> +{
> [...]
> +	/*
> +	 * Create the reflog entry for the newly created branch.
> +	 */
> +	ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
> +	memset(&logs[logs_nr], 0, sizeof(logs[logs_nr]));
> +	fill_reftable_log_record(&logs[logs_nr]);
> +	logs[logs_nr].refname = (char *)arg->newname;
> +	logs[logs_nr].update_index = creation_ts;
> +	logs[logs_nr].value.update.message =
> +		xstrndup(arg->logmsg, arg->refs->write_options.block_size / 2);
> +	logs[logs_nr].value.update.new_hash = old_ref.value.val1;
> +	logs_nr++;
> +
> +	/*
> +	 * In addition to writing the reflog entry for the new branch, we also
> +	 * copy over all log entries from the old reflog. Last but not least,
> +	 * when renaming we also have to delete all the old reflog entries.
> +	 */
> +	ret = reftable_merged_table_seek_log(mt, &it, arg->oldname);
> +	if (ret < 0)
> +		return ret;

Should this last line be "goto done" as is used elsewhere in the
function? Otherwise we are at least leaking the "logs" array (and
possibly some of the other cleanup is important, too).

> +	while (1) {
> +		ret = reftable_iterator_next_log(&it, &old_log);
> +		if (ret < 0)
> +			goto done;
> +		if (ret > 0 || strcmp(old_log.refname, arg->oldname)) {
> +			ret = 0;
> +			break;
> +		}

This "ret = 0" doesn't have any effect. We break out of the loop, and
then...

> +	}
> +
> +	ret = reftable_writer_add_logs(writer, logs, logs_nr);
> +	if (ret < 0)
> +		goto done;

...the first thing we do is write over it. I dunno if it's worth keeping
as a maintenance precaution, though (if the code after the loop changed
to omit that assignment, then setting "ret" would become important).

Both were noticed by Coverity (along with several other false
positives).

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 1/2] refs: introduce reftable backend
  2024-02-07 22:31     ` Jeff King
@ 2024-02-08  5:11       ` Patrick Steinhardt
  2024-02-15  5:39         ` Jeff King
  0 siblings, 1 reply; 23+ messages in thread
From: Patrick Steinhardt @ 2024-02-08  5:11 UTC (permalink / raw
  To: Jeff King; +Cc: git, Junio C Hamano, Han-Wen Nienhuys, Karthik Nayak

[-- Attachment #1: Type: text/plain, Size: 2495 bytes --]

On Wed, Feb 07, 2024 at 05:31:20PM -0500, Jeff King wrote:
> On Wed, Feb 07, 2024 at 08:20:31AM +0100, Patrick Steinhardt wrote:
> 
> > +static int write_copy_table(struct reftable_writer *writer, void *cb_data)
> > +{
> > [...]
> > +	/*
> > +	 * Create the reflog entry for the newly created branch.
> > +	 */
> > +	ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
> > +	memset(&logs[logs_nr], 0, sizeof(logs[logs_nr]));
> > +	fill_reftable_log_record(&logs[logs_nr]);
> > +	logs[logs_nr].refname = (char *)arg->newname;
> > +	logs[logs_nr].update_index = creation_ts;
> > +	logs[logs_nr].value.update.message =
> > +		xstrndup(arg->logmsg, arg->refs->write_options.block_size / 2);
> > +	logs[logs_nr].value.update.new_hash = old_ref.value.val1;
> > +	logs_nr++;
> > +
> > +	/*
> > +	 * In addition to writing the reflog entry for the new branch, we also
> > +	 * copy over all log entries from the old reflog. Last but not least,
> > +	 * when renaming we also have to delete all the old reflog entries.
> > +	 */
> > +	ret = reftable_merged_table_seek_log(mt, &it, arg->oldname);
> > +	if (ret < 0)
> > +		return ret;
> 
> Should this last line be "goto done" as is used elsewhere in the
> function? Otherwise we are at least leaking the "logs" array (and
> possibly some of the other cleanup is important, too).

Indeed we should.

> > +	while (1) {
> > +		ret = reftable_iterator_next_log(&it, &old_log);
> > +		if (ret < 0)
> > +			goto done;
> > +		if (ret > 0 || strcmp(old_log.refname, arg->oldname)) {
> > +			ret = 0;
> > +			break;
> > +		}
> 
> This "ret = 0" doesn't have any effect. We break out of the loop, and
> then...
> 
> > +	}
> > +
> > +	ret = reftable_writer_add_logs(writer, logs, logs_nr);
> > +	if (ret < 0)
> > +		goto done;
> 
> ...the first thing we do is write over it. I dunno if it's worth keeping
> as a maintenance precaution, though (if the code after the loop changed
> to omit that assignment, then setting "ret" would become important).

Yeah, I think this one we should keep. It's also a repeating pattern
that we have in many other places, so it helps lower the mental burden
when it looks similar to all the others.

> Both were noticed by Coverity (along with several other false
> positives).

Is the Coverity instance publicly accessible? If not, can you maybe
invite me so that I get a better feeling for them?

I'll send a patch to fix the missing `goto done` above, thanks!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 1/2] refs: introduce reftable backend
  2024-02-08  5:11       ` Patrick Steinhardt
@ 2024-02-15  5:39         ` Jeff King
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2024-02-15  5:39 UTC (permalink / raw
  To: Patrick Steinhardt; +Cc: git, Junio C Hamano, Han-Wen Nienhuys, Karthik Nayak

On Thu, Feb 08, 2024 at 06:11:57AM +0100, Patrick Steinhardt wrote:

> > > +	ret = reftable_writer_add_logs(writer, logs, logs_nr);
> > > +	if (ret < 0)
> > > +		goto done;
> > 
> > ...the first thing we do is write over it. I dunno if it's worth keeping
> > as a maintenance precaution, though (if the code after the loop changed
> > to omit that assignment, then setting "ret" would become important).
> 
> Yeah, I think this one we should keep. It's also a repeating pattern
> that we have in many other places, so it helps lower the mental burden
> when it looks similar to all the others.

That sounds quite reasonable to me.

> > Both were noticed by Coverity (along with several other false
> > positives).
> 
> Is the Coverity instance publicly accessible? If not, can you maybe
> invite me so that I get a better feeling for them?

I don't think there's a way to make it publicly accessible. Probably I
could invite you to the project if you sign up for a Coverity account.
But it's a build of next plus my personal in-progress topics, which is
probably not ideal for other people to look at.

There are instructions for setting up your own scan in a56b6230d0 (ci:
add a GitHub workflow to submit Coverity scans, 2023-09-25). The unsaid
part there is signing up for Coverity and registering a project to get
the necessary tokens. Probably start here:

  https://scan.coverity.com/users/sign_up

but I don't remember the details.

The Actions workflow is in "master", so in theory we could register a
project for git.git and get automatic builds when Junio pushes to
'next', etc. In practice I have found that it requires a human looking
over the results, but if people at least had access they could poke
around.

(I suspect it would also be easy to port the workflow over to run on
GitLab CI, as well, if you wanted to).

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2024-02-15  5:39 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-30  8:05 [PATCH 0/2] refs: introduce reftable backend Patrick Steinhardt
2024-01-30  8:05 ` [PATCH 1/2] " Patrick Steinhardt
2024-02-01 15:17   ` Karthik Nayak
2024-02-02  8:30     ` Patrick Steinhardt
2024-02-02 10:52       ` Patrick Steinhardt
2024-01-30  8:05 ` [PATCH 2/2] ci: add jobs to test with the " Patrick Steinhardt
2024-01-30 22:08 ` [PATCH 0/2] refs: introduce " Junio C Hamano
2024-02-02  8:38 ` [PATCH v2 " Patrick Steinhardt
2024-02-02  8:38   ` [PATCH v2 1/2] " Patrick Steinhardt
2024-02-02  8:38   ` [PATCH v2 2/2] ci: add jobs to test with the " Patrick Steinhardt
2024-02-02 13:02   ` [PATCH v2 0/2] refs: introduce " Karthik Nayak
2024-02-03 20:41   ` Junio C Hamano
2024-02-04  6:00     ` Patrick Steinhardt
2024-02-05  6:02 ` [PATCH v3 " Patrick Steinhardt
2024-02-05  6:02   ` [PATCH v3 1/2] " Patrick Steinhardt
2024-02-05  6:02   ` [PATCH v3 2/2] ci: add jobs to test with the " Patrick Steinhardt
2024-02-05 13:10   ` [PATCH v3 0/2] refs: introduce " Karthik Nayak
2024-02-07  7:20 ` [PATCH v4 " Patrick Steinhardt
2024-02-07  7:20   ` [PATCH v4 1/2] " Patrick Steinhardt
2024-02-07 22:31     ` Jeff King
2024-02-08  5:11       ` Patrick Steinhardt
2024-02-15  5:39         ` Jeff King
2024-02-07  7:20   ` [PATCH v4 2/2] ci: add jobs to test with the " Patrick Steinhardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).