Re: [PATCH v2 2/7] btrfs: reduce the log level for btrfs_dev_stat_inc_and_print()

Linux-BTRFS Archive mirror
 help / color / mirror / Atom feed

From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: dsterba@suse.cz, Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org, stable@vger.kernel.org,
	Filipe Manana <fdmanana@suse.com>
Subject: Re: [PATCH v2 2/7] btrfs: reduce the log level for btrfs_dev_stat_inc_and_print()
Date: Fri, 5 Apr 2024 08:32:11 +1030	[thread overview]
Message-ID: <d51b8a18-0b61-4f8b-ac41-01f5f7ddfa1f@gmx.com> (raw)
In-Reply-To: <20240404202644.GN14596@twin.jikos.cz>



在 2024/4/5 06:56, David Sterba 写道:
> On Wed, Mar 20, 2024 at 02:24:52PM +1030, Qu Wenruo wrote:
>> Currently when we increase the device statistics, it would always lead
>> to an error message in the kernel log.
>>
>> However this output is mostly duplicated with the existing ones:
>>
>> - For scrub operations
>>    We always have the following messages:
>>    * "fixed up error at logical %llu"
>>    * "unable to fixup (regular) error at logical %llu"
>>
>>    So no matter if the corruption is repaired or not, it scrub would
>>    output an error message to indicate the problem.
>>
>> - For non-scrub read operations
>>    We also have the following messages:
>>    * "csum failed root %lld inode %llu off %llu" for data csum mismatch
>>    * "bad (tree block start|fsid|tree block level)" for metadata
>>    * "read error corrected: ino %llu off %llu" for repaired data/metadata
>>
>> So the error message from btrfs_dev_stat_inc_and_print() is duplicated.
>>
>> The real usage for the btrfs device statistics is for some user space
>> daemon to check if there is any new errors, acting like some checks on
>> SMART, thus we don't really need/want those messages in dmesg.
>>
>> This patch would reduce the log level to debug (disabled by default) for
>> btrfs_dev_stat_inc_and_print().
>> For users really want to utilize btrfs devices statistics, they should
>> go check "btrfs device stats" periodically, and we should focus the
>> kernel error messages to more important things.
>
> I kind if disagree with each point.
>
> The message is meant to be logged as it will happen in production and
> outside of development, so the debug level does not make sense.
>
> The stats message is not duplicated for the individual causes, it
> additionally tracks the whole state.

I'd disagree with this.

We already have mount time output, and detailed causes are way more
useful and just a duplicated message repeating itself.

>
> Logging important messages to system log is a common thing and we do that
> a lot, this makes debugging and anlyzing things easier. We can't
> expect that there would always be a daemon collecting the stats, there's
> not standardized or recommended tool for that. A quick look to dmesg can
> show that something is wrong.

Then try supporting cases with all these duplicated and useless
messages, you'll hardly agree that they provide any usefulness.

>
> What we can do: reduce the number messages so the whole stats are
> printed once per transaction if there is a change.
>
> We can also tune which events also print the stats, for example flush
> errors are more interesting than read/write, comparing the number of
> events that can happen in a batch.

My another point is, if it's an important error, we should output it
with detailed reason/cause/extra info immediately.

And that's already the case for regular/scrub read errors.

For critical operations like flush, we should output extra error
messages, other than relying on that generic and vague error.

Thanks,
Qu

next prev parent reply	other threads:[~2024-04-04 22:02 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-20  3:54 [PATCH v2 0/7] btrfs: scrub: refine the error messages output frequency Qu Wenruo
2024-03-20  3:54 ` [PATCH v2 1/7] btrfs: scrub: fix incorrectly reported logical/physical address Qu Wenruo
2024-03-20  3:54 ` [PATCH v2 2/7] btrfs: reduce the log level for btrfs_dev_stat_inc_and_print() Qu Wenruo
2024-04-04 20:26   ` David Sterba
2024-04-04 22:02     ` Qu Wenruo [this message]
2024-03-20  3:54 ` [PATCH v2 3/7] btrfs: scrub: remove unused is_super parameter from scrub_print_common_warning() Qu Wenruo
2024-03-20  3:54 ` [PATCH v2 4/7] btrfs: scrub: remove unnecessary dev/physical lookup for scrub_stripe_report_errors() Qu Wenruo
2024-03-20 13:09   ` Filipe Manana
2024-03-20  3:54 ` [PATCH v2 5/7] btrfs: scrub: simplify the inode iteration output Qu Wenruo
2024-03-20  3:54 ` [PATCH v2 6/7] btrfs: scrub: ensure we output at least one error message for unrepaired corruption Qu Wenruo
2024-03-20 13:30   ` Filipe Manana
2024-03-20  3:54 ` [PATCH v2 7/7] btrfs: scrub: use generic ratelimit helpers to output error messages Qu Wenruo
2024-03-20 13:33   ` Filipe Manana

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d51b8a18-0b61-4f8b-ac41-01f5f7ddfa1f@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=dsterba@suse.cz \
    --cc=fdmanana@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).