Linux-BTRFS Archive mirror
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: dsterba@suse.cz, Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org, stable@vger.kernel.org,
	Filipe Manana <fdmanana@suse.com>
Subject: Re: [PATCH v2 2/7] btrfs: reduce the log level for btrfs_dev_stat_inc_and_print()
Date: Fri, 5 Apr 2024 08:32:11 +1030	[thread overview]
Message-ID: <d51b8a18-0b61-4f8b-ac41-01f5f7ddfa1f@gmx.com> (raw)
In-Reply-To: <20240404202644.GN14596@twin.jikos.cz>



在 2024/4/5 06:56, David Sterba 写道:
> On Wed, Mar 20, 2024 at 02:24:52PM +1030, Qu Wenruo wrote:
>> Currently when we increase the device statistics, it would always lead
>> to an error message in the kernel log.
>>
>> However this output is mostly duplicated with the existing ones:
>>
>> - For scrub operations
>>    We always have the following messages:
>>    * "fixed up error at logical %llu"
>>    * "unable to fixup (regular) error at logical %llu"
>>
>>    So no matter if the corruption is repaired or not, it scrub would
>>    output an error message to indicate the problem.
>>
>> - For non-scrub read operations
>>    We also have the following messages:
>>    * "csum failed root %lld inode %llu off %llu" for data csum mismatch
>>    * "bad (tree block start|fsid|tree block level)" for metadata
>>    * "read error corrected: ino %llu off %llu" for repaired data/metadata
>>
>> So the error message from btrfs_dev_stat_inc_and_print() is duplicated.
>>
>> The real usage for the btrfs device statistics is for some user space
>> daemon to check if there is any new errors, acting like some checks on
>> SMART, thus we don't really need/want those messages in dmesg.
>>
>> This patch would reduce the log level to debug (disabled by default) for
>> btrfs_dev_stat_inc_and_print().
>> For users really want to utilize btrfs devices statistics, they should
>> go check "btrfs device stats" periodically, and we should focus the
>> kernel error messages to more important things.
>
> I kind if disagree with each point.
>
> The message is meant to be logged as it will happen in production and
> outside of development, so the debug level does not make sense.
>
> The stats message is not duplicated for the individual causes, it
> additionally tracks the whole state.

I'd disagree with this.

We already have mount time output, and detailed causes are way more
useful and just a duplicated message repeating itself.

>
> Logging important messages to system log is a common thing and we do that
> a lot, this makes debugging and anlyzing things easier. We can't
> expect that there would always be a daemon collecting the stats, there's
> not standardized or recommended tool for that. A quick look to dmesg can
> show that something is wrong.

Then try supporting cases with all these duplicated and useless
messages, you'll hardly agree that they provide any usefulness.

>
> What we can do: reduce the number messages so the whole stats are
> printed once per transaction if there is a change.
>
> We can also tune which events also print the stats, for example flush
> errors are more interesting than read/write, comparing the number of
> events that can happen in a batch.

My another point is, if it's an important error, we should output it
with detailed reason/cause/extra info immediately.

And that's already the case for regular/scrub read errors.

For critical operations like flush, we should output extra error
messages, other than relying on that generic and vague error.

Thanks,
Qu

  reply	other threads:[~2024-04-04 22:02 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-20  3:54 [PATCH v2 0/7] btrfs: scrub: refine the error messages output frequency Qu Wenruo
2024-03-20  3:54 ` [PATCH v2 1/7] btrfs: scrub: fix incorrectly reported logical/physical address Qu Wenruo
2024-03-20  3:54 ` [PATCH v2 2/7] btrfs: reduce the log level for btrfs_dev_stat_inc_and_print() Qu Wenruo
2024-04-04 20:26   ` David Sterba
2024-04-04 22:02     ` Qu Wenruo [this message]
2024-03-20  3:54 ` [PATCH v2 3/7] btrfs: scrub: remove unused is_super parameter from scrub_print_common_warning() Qu Wenruo
2024-03-20  3:54 ` [PATCH v2 4/7] btrfs: scrub: remove unnecessary dev/physical lookup for scrub_stripe_report_errors() Qu Wenruo
2024-03-20 13:09   ` Filipe Manana
2024-03-20  3:54 ` [PATCH v2 5/7] btrfs: scrub: simplify the inode iteration output Qu Wenruo
2024-03-20  3:54 ` [PATCH v2 6/7] btrfs: scrub: ensure we output at least one error message for unrepaired corruption Qu Wenruo
2024-03-20 13:30   ` Filipe Manana
2024-03-20  3:54 ` [PATCH v2 7/7] btrfs: scrub: use generic ratelimit helpers to output error messages Qu Wenruo
2024-03-20 13:33   ` Filipe Manana

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d51b8a18-0b61-4f8b-ac41-01f5f7ddfa1f@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=dsterba@suse.cz \
    --cc=fdmanana@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).