All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: Tom Yan <tom.ty89@gmail.com>,
	48833@debbugs.gnu.org, linux-btrfs@vger.kernel.org
Subject: Re: bug#48833: reflink copying does not check/set No_COW attribute and fail
Date: Mon, 7 Jun 2021 22:41:44 -0400	[thread overview]
Message-ID: <20210608024144.GB11713@hungrycats.org> (raw)
In-Reply-To: <4354c814-9f9f-0c26-fb11-36567da6f530@cs.ucla.edu>

On Sun, Jun 06, 2021 at 10:47:05PM -0700, Paul Eggert wrote:
> On 6/5/21 10:42 PM, Zygo Blaxell wrote:
> > If cp -a implements the inode attribute propagation (or inheritance), then
> > only users of cp -a are impacted.  They are more likely to be aware that
> > they may be creating new files with reduced-integrity storage attributes.
> 
> True, although I think this aspect of attribute-copying will typically come
> as a surprise even to "cp -a" users.

Existing users might be surprised when "cp -a" starts replicating storage
attributes when it did not do so before, but I suspect most future cp
users would expect "cp -a" to preserve storage-policy attributes the same
way it currently preserves ownership, permissions, timestamps, extended
attributes, and security context--a list that initially contained only
the ownership, permissions, and timestamps in the past, the others were
added over time.  If not by default, then at least have the ability to
do it when requested with a "--preserve=datacow" switch.

We could say "cp --reflink=always implies --preserve=datacow because it
might not work otherwise", which solves the immediate issue as presented,
but I don't think we _want_ to say that because it has a potentially
bad surprising case when attribute changes are unexpected (it's the same
reason that we would not want to implement it that way in the kernel).
As a cp user, I would prefer to make the choice to copy the storage
attributes with --preserve or -a, and after that choice is made, then
--reflink=always will work because cp is setting attributes in dst to
match src as I requested it to do.  If I had made the opposite choice
(didn't use -a or --preserve, or did use --no-preserve=datacow), then
cp shall not copy the storage attributes, the dst inodes have attributes
inherited from dst's parent, --reflink=always fails when there are
mismatches as it does now, and --reflink=auto or =none copies may have
different storage attributes from the src (with possibly stronger or
weaker integrity).

We could say "'cp -a --reflink=always' implies --preserve=datacow, but
'--reflink=always' and '-a' by themselves do not."  It seems complex
to describe, but maybe it does surprise the fewest people in the most
beneficial way:  plain 'cp -a' users keep exactly the old behavior,
while 'cp -a --reflink=always' users get successful copies in one case
where they currently get unexpected failures.  'cp -r' doesn't imply
--preserve=all, so 'cp -r --reflink=always' would need to be accompanied
by '--preserve=datacow' or '--preserve=all' to get the attribute-copying.

The cp doc could be clearer that filesystems that support reflink
don't guarantee every file can be reflinked to every other file.
reflink is expected to fail in a growing number of cases over time,
as more filesystem features are created that are incompatible with it
(e.g. encryption, where reflinks between files with different owners could
be unimplementable).  I've seen a number of users get burned by making big
--reflink=always copies and not checking the results for errors, assuming
that only lack of space for metadata could cause a reflink copy to fail.
There are good reasons why --reflink=auto exists and is the default,
and users ignore them at their peril.

The really awful thing about all this is that it's not datacow, the
thing visible with chattr +/-C and implemented by other filesystems,
that is the problem.  The problem is the datasum bit hidden behind the
datacow bit on btrfs.  datasum can still be disabled even when datacow
is enabled, and datasum requires privileges to detect (unprivileged
users can only observe the datasum bit's effect on reflink copies to
files with the opposite datasum setting).  The proper option name should
be --preserve=datasum, but cp can only examine or change the datacow bit.

> > If the file is empty, you can chattr +C or -C.  If the file is not
> > empty, chattr fails with an error.
> 
> Although coreutils 'cp -a' currently truncates any already-existing output
> file (by opening it with O_TRUNC), it then calls copy_file_range before
> calling fsetxattr on the destination. Presumably cp should do the equivalent
> of chattr +C before doing the copy_file_range stuff. (Perhaps you've already
> mentioned this point; if so, my apologies for the duplication.)

The important thing is that got across, whether you extracted it from what
I wrote, or reconstructed it from context.

  reply	other threads:[~2021-06-08  2:41 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-04 14:33 reflink copying does not check/set No_COW attribute and fail Tom Yan
2021-06-04 14:37 ` Tom Yan
2021-06-04 20:16   ` Zygo Blaxell
2021-06-05  5:56     ` Tom Yan
2021-06-05 10:35       ` Forza
2021-06-06  5:42       ` Zygo Blaxell
2021-06-07  5:47         ` bug#48833: " Paul Eggert
2021-06-08  2:41           ` Zygo Blaxell [this message]
2021-06-27 10:56             ` A L

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210608024144.GB11713@hungrycats.org \
    --to=ce3g8jdj@umail.furryterror.org \
    --cc=48833@debbugs.gnu.org \
    --cc=eggert@cs.ucla.edu \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=tom.ty89@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.