Linux-EROFS Archive mirror
From: Mike Baynton <mike@mbaynton.com>
To: linux-erofs@lists.ozlabs.org
Subject: Feature request: erofs-utils mkfs: Efficient way to pipe only file metadata
Date: Sun, 18 Feb 2024 21:37:06 -0600
Message-ID: <CAM56kJTupW_WZapYM6YzFLPtriYb5+FU-Y8-mYY8ETGYfQmG6g@mail.gmail.com>

Hello erofs developers,
I am integrating erofs with overlayfs in a manner similar to what
composefs is doing. So, I am interested in making erofs images
containing only file metadata and extended attributes, but no file
data, as in $ mkfs.erofs --tar=i (thanks for that!)

However, I would like to construct the erofs image from a set of files
selected dynamically by another program. I would therefore prefer to
send an unseekable stream to mkfs.erofs so that file selection and
image generation can run concurrently, instead of first making a
complete tarball and then making the erofs image. In this case, each
file's data still has to be sent through the stream after its header,
solely so that the tarball reader in tar.c does not become
desynchronized from the expected offset of the next tar header.
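
To illustrate the synchronization issue, here is a minimal sketch,
assuming the standard ustar layout in which every member is a 512-byte
header followed by its data padded up to a multiple of 512 bytes. The
names tar_padded_size and tar_next_header_offset are hypothetical and
are not taken from erofs-utils; the sketch only shows the arithmetic a
sequential reader has to follow on a conventional tarball.

```c
#include <stdint.h>

#define TAR_BLOCK_SIZE 512u

/* Round a member's data size up to whole 512-byte tar blocks. */
static uint64_t tar_padded_size(uint64_t size)
{
	return (size + TAR_BLOCK_SIZE - 1) / TAR_BLOCK_SIZE * TAR_BLOCK_SIZE;
}

/*
 * Offset of the next header in a conventional tarball, given the offset
 * of the current header and the value of its size field. On an
 * unseekable stream the reader can only get there by consuming every
 * byte in between, which is why the file data has to be piped through.
 */
static uint64_t tar_next_header_offset(uint64_t header_offset, uint64_t size)
{
	return header_offset + TAR_BLOCK_SIZE + tar_padded_size(size);
}
```

For a 1-byte file whose header sits at offset 0, the next header is at
offset 1024, so 512 bytes of data and padding travel through the pipe
only to be discarded.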

A very straightforward solution that seems to be working just fine for
me is to introduce a new optarg for --tar indicating that the input is
a series of tar headers / metadata without any actual file data. This
implies index mode and additionally prevents the skipping of
inode->i_size bytes after each header:

diff --git a/include/erofs/tar.h b/include/erofs/tar.h
index a76f740..3d40a0f 100644
--- a/include/erofs/tar.h
+++ b/include/erofs/tar.h
@@ -46,7 +46,7 @@ struct erofs_tarfile {

  int fd;
  u64 offset;
- bool index_mode, aufs;
+ bool index_mode, headeronly_mode, aufs;
 };

 void erofs_iostream_close(struct erofs_iostream *ios);
diff --git a/lib/tar.c b/lib/tar.c
index 8204939..e916395 100644
--- a/lib/tar.c
+++ b/lib/tar.c
@@ -584,7 +584,7 @@ static int tarerofs_write_file_index(struct erofs_inode *inode,
  ret = tarerofs_write_chunkes(inode, data_offset);
  if (ret)
  return ret;
- if (erofs_iostream_lskip(&tar->ios, inode->i_size))
+ if (!tar->headeronly_mode && erofs_iostream_lskip(&tar->ios, inode->i_size))
  return -EIO;
  return 0;
 }
diff --git a/mkfs/main.c b/mkfs/main.c
index 6d2b700..a72d30e 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -122,7 +122,7 @@ static void usage(void)
        " --max-extent-bytes=#  set maximum decompressed extent size # in bytes\n"
        " --preserve-mtime      keep per-file modification time strictly\n"
        " --aufs                replace aufs special files with overlayfs metadata\n"
-       " --tar=[fi]            generate an image from tarball(s)\n"
+       " --tar=[fih]           generate an image from tarball(s) or tarball header data\n"
        " --ovlfs-strip=[01]    strip overlayfs metadata in the target image (e.g. whiteouts)\n"
        " --quiet               quiet execution (do not write anything to standard output.)\n"
 #ifndef NDEBUG
@@ -514,11 +514,13 @@ static int mkfs_parse_options_cfg(int argc, char *argv[])
  cfg.c_extra_ea_name_prefixes = true;
  break;
  case 20:
- if (optarg && (!strcmp(optarg, "i") ||
- !strcmp(optarg, "0") || !memcmp(optarg, "0,", 2))) {
+ if (optarg && (!strcmp(optarg, "i") || !strcmp(optarg, "h") ||
+ !strcmp(optarg, "0") || !memcmp(optarg, "0,", 2))) {
  erofstar.index_mode = true;
  if (!memcmp(optarg, "0,", 2))
  erofstar.mapfile = strdup(optarg + 2);
+ if (!strcmp(optarg, "h"))
+ erofstar.headeronly_mode = true;
  }
  tar_mode = true;
  break;

Using this requires generating tarball-ish streams, which tar libraries
can be slightly difficult to cajole into creating, but it does work. I
can imagine much more complex alternatives too, such as supporting
sparse tar files or supporting a whole new input format.
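
For illustration, a header-only stream can also be produced without a
tar library. The following is a minimal sketch, assuming plain ustar
headers for regular files with names of at most 100 bytes and no PAX
extended headers (so no xattrs); make_ustar_header is a hypothetical
helper, not part of erofs-utils, and whether such minimal headers are
sufficient for a given reader is an assumption.

```c
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK_SIZE 512

/*
 * Fill buf with a ustar header describing a regular file of `size`
 * bytes. No file data is produced. Returns 0 on success, or -1 if the
 * name or a numeric field does not fit the fixed-width ustar fields.
 */
static int make_ustar_header(unsigned char buf[TAR_BLOCK_SIZE],
			     const char *name, unsigned int mode,
			     unsigned long long size,
			     unsigned long long mtime)
{
	size_t i, len = strlen(name);
	unsigned int sum = 0;

	if (len == 0 || len > 100 ||
	    size > 077777777777ULL || mtime > 077777777777ULL)
		return -1;

	memset(buf, 0, TAR_BLOCK_SIZE);
	memcpy(buf, name, len);					/* name */
	snprintf((char *)buf + 100, 8, "%07o", mode & 07777u);	/* mode */
	snprintf((char *)buf + 108, 8, "%07o", 0u);		/* uid */
	snprintf((char *)buf + 116, 8, "%07o", 0u);		/* gid */
	snprintf((char *)buf + 124, 12, "%011llo", size);	/* size */
	snprintf((char *)buf + 136, 12, "%011llo", mtime);	/* mtime */
	memset(buf + 148, ' ', 8);	/* chksum counts as spaces while summing */
	buf[156] = '0';			/* typeflag: regular file */
	memcpy(buf + 257, "ustar", 6);	/* magic, including the trailing NUL */
	memcpy(buf + 263, "00", 2);	/* version */

	for (i = 0; i < TAR_BLOCK_SIZE; i++)
		sum += buf[i];
	/* chksum: six octal digits, a NUL, then a space */
	snprintf((char *)buf + 148, 7, "%06o", sum);
	buf[155] = ' ';
	return 0;
}
```

A producer would write one such 512-byte block per selected file, back
to back, into the pipe read by a mkfs.erofs built with the patch above
and invoked with the proposed --tar=h.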

Would some version of this feature be interesting and useful? If so,
is the simple way good enough? It wouldn't preclude future addition of
things like a sparse tar reader.

Regards,
Mike
