* RISC-V PMU driver issue
@ 2024-02-28 3:49 雷博涵
2024-03-13 0:02 ` Atish Patra
0 siblings, 1 reply; 7+ messages in thread
From: 雷博涵 @ 2024-02-28 3:49 UTC (permalink / raw
To: Atish Patra; +Cc: linux-riscv
Hi all,
I am having problems with the RISC-V PMU driver. The overflow handler of my custom perf event kernel counter seems to read an incorrect value from the event.
It seems that the issue lies in the function `riscv_pmu_event_set_period`, which sets `prev_count` to a new value but does not modify the underlying hardware counter. When `perf_event_read_value` gets called later in the user-defined overflow handler, `riscv_pmu_event_update` will update the `count` field again based on the unmodified hardware counter value and the modified `prev_count` field, which causes an incorrect reading.
I noticed that other PMU drivers, such as the ARM one, write to the underlying counter in their set_period functions, which prevents the problem. However, the RISC-V SBI specification does not have such an API to write to a counter without starting it. Using `local64_read(&hw_evt->period_left) <= 0` directly as the guard condition to place `riscv_pmu_event_set_period(event)` after it seems to work, but I am not sure whether it can cause other issues.
Thank you,
Bohan Lei
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: RISC-V PMU driver issue
2024-02-28 3:49 RISC-V PMU driver issue 雷博涵
@ 2024-03-13 0:02 ` Atish Patra
2024-03-13 4:29 ` 雷博涵
0 siblings, 1 reply; 7+ messages in thread
From: Atish Patra @ 2024-03-13 0:02 UTC (permalink / raw
To: 雷博涵; +Cc: linux-riscv, Atish Patra
On Tue, Feb 27, 2024 at 7:49 PM 雷博涵 <garthlei@pku.edu.cn> wrote:
>
> Hi all,
>
> I am having problems with the RISC-V PMU driver. The overflow handler of my custom perf event kernel counter seems to read an incorrect value from the event.
If I understand you correctly, you have a custom kernel counter for
the perf event ? Can you explain the use case for that ?
> It seems that the issue lies in the function `riscv_pmu_event_set_period`, which sets `prev_count` to a new value but does not modify the underlying hardware counter. When `perf_event_read_value` gets called later in the user-defined overflow handler, `riscv_pmu_event_update` will update the `count` field again based on the unmodified hardware counter value and the modified `prev_count` field, which causes an incorrect reading.
What do you mean by user defined overflow handler ? The overflow
handler registered via the custom perf event kernel counter which gets
invoked from perf_event_overflow ?
> I noticed that other PMU drivers, such as the ARM one, write to the underlying counter in their set_period functions, which prevents the problem. However, the RISC-V SBI specification does not have such an API to write to a counter without starting it. Using `local64_read(&hw_evt->period_left) <= 0` directly as the guard condition to place `riscv_pmu_event_set_period(event)` after it seems to work, but I am not sure whether it can cause other issues.
Not sure the exact code path you are referring to. You don't want to
invoke riscv_pmu_event_set_period if period_left <= 0 ?
>
> Thank you,
> Bohan Lei
--
Regards,
Atish
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: RISC-V PMU driver issue
2024-03-13 0:02 ` Atish Patra
@ 2024-03-13 4:29 ` 雷博涵
2024-03-18 22:18 ` Atish Patra
0 siblings, 1 reply; 7+ messages in thread
From: 雷博涵 @ 2024-03-13 4:29 UTC (permalink / raw
To: Atish Patra; +Cc: linux-riscv, Atish Patra
Sorry for the duplicated message. The last one was an HTML e-mail so I resend this.
> 2024年3月13日 08:02,Atish Patra <atishp@atishpatra.org> 写道:
>
> On Tue, Feb 27, 2024 at 7:49 PM 雷博涵 <garthlei@pku.edu.cn> wrote:
>>
>> Hi all,
>>
>> I am having problems with the RISC-V PMU driver. The overflow handler of my custom perf event kernel counter seems to read an incorrect value from the event.
>
> If I understand you correctly, you have a custom kernel counter for
> the perf event ? Can you explain the use case for that ?
Actually it was the perf-event backend of an open-source tool called PMCTrack (https://github.com/jcsaezal/pmctrack), which I was trying to port to RISC-V. It creates a kernel counter using `perf_event_create_kernel_counter`, registering an overflow handler. The registered overflow handler calls `perf_event_read_value` to read the counter value.
>
>> It seems that the issue lies in the function `riscv_pmu_event_set_period`, which sets `prev_count` to a new value but does not modify the underlying hardware counter. When `perf_event_read_value` gets called later in the user-defined overflow handler, `riscv_pmu_event_update` will update the `count` field again based on the unmodified hardware counter value and the modified `prev_count` field, which causes an incorrect reading.
>
> What do you mean by user defined overflow handler ? The overflow
> handler registered via the custom perf event kernel counter which gets
> invoked from perf_event_overflow ?
Yes.
>
>> I noticed that other PMU drivers, such as the ARM one, write to the underlying counter in their set_period functions, which prevents the problem. However, the RISC-V SBI specification does not have such an API to write to a counter without starting it. Using `local64_read(&hw_evt->period_left) <= 0` directly as the guard condition to place `riscv_pmu_event_set_period(event)` after it seems to work, but I am not sure whether it can cause other issues.
>
> Not sure the exact code path you are referring to. You don't want to
> invoke riscv_pmu_event_set_period if period_left <= 0 ?
First, `pmu_sbi_ovf_handler` calls `riscv_pmu_event_update`, so the perf event counter value (`event->count`) is updated. It calls `riscv_pmu_event_set_period` later, which calls `local64_set(&hwc->prev_count, (u64)-left)`. When `perf_event_read_value` gets invoked in the registered overflow handler, `riscv_pmu_event_update` will be invoked eventually. Because `prev_count` has been set to `(u64)-left` and the underlying hardware counter value (the value read via `rvpmu->ctr_read(event)`) has not been changed, `event->count` will be updated again, and the value this time will be incorrect.
I have found that the ARM PMU does not have the problem. The function `armpmu_event_set_period` in `arm_pmu.c` calls `armpmu->write_counter(event, (u64)(-left) & max_period)` after `local64_set(&hwc->prev_count, (u64)-left)`, so that the underlying hardware counter is also set to a new value. Thus, when `armpmu_event_update` gets called later in the registered handler, it will find that the underlying counter has the same value as `prev_count`, so `event->count` can keep the correct value.
However, as I mentioned before, the RISC-V SBI specification does not have such an API as “write_counter,” so the ARM solution could not be adapted easily. I was wondering if we could postpone `riscv_pmu_event_set_period` after the if-block, like this:
```
if (local64_read(&hw_evt->period_left) <= 0) {
perf_event_overflow(event, &data, regs);
}
riscv_pmu_event_set_period(event);
```
This way `perf_event_read_value` can return the correct counter value. `riscv_pmu_event_set_period` gets invoked after perf_event_overflow. However, I am not sure if it can cause other problems.
>
>>
>> Thank you,
>> Bohan Lei
>
>
> --
> Regards,
> Atish
--
Regards,
Bohan Lei
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: RISC-V PMU driver issue
2024-03-13 4:29 ` 雷博涵
@ 2024-03-18 22:18 ` Atish Patra
2024-04-27 0:07 ` Atish Patra
0 siblings, 1 reply; 7+ messages in thread
From: Atish Patra @ 2024-03-18 22:18 UTC (permalink / raw
To: 雷博涵, Atish Patra; +Cc: linux-riscv
On 3/12/24 21:29, 雷博涵 wrote:
> Sorry for the duplicated message. The last one was an HTML e-mail so I resend this.
>
>> 2024年3月13日 08:02,Atish Patra <atishp@atishpatra.org> 写道:
>>
>> On Tue, Feb 27, 2024 at 7:49 PM 雷博涵 <garthlei@pku.edu.cn> wrote:
>>>
>>> Hi all,
>>>
>>> I am having problems with the RISC-V PMU driver. The overflow handler of my custom perf event kernel counter seems to read an incorrect value from the event.
>>
>> If I understand you correctly, you have a custom kernel counter for
>> the perf event ? Can you explain the use case for that ?
>
> Actually it was the perf-event backend of an open-source tool called PMCTrack (https://github.com/jcsaezal/pmctrack), which I was trying to port to RISC-V. It creates a kernel counter using `perf_event_create_kernel_counter`, registering an overflow handler. The registered overflow handler calls `perf_event_read_value` to read the counter value.
>
Thanks. It makes sense now.
>>
>>> It seems that the issue lies in the function `riscv_pmu_event_set_period`, which sets `prev_count` to a new value but does not modify the underlying hardware counter. When `perf_event_read_value` gets called later in the user-defined overflow handler, `riscv_pmu_event_update` will update the `count` field again based on the unmodified hardware counter value and the modified `prev_count` field, which causes an incorrect reading.
>>
>> What do you mean by user defined overflow handler ? The overflow
>> handler registered via the custom perf event kernel counter which gets
>> invoked from perf_event_overflow ?
>
> Yes.
>
>>
>>> I noticed that other PMU drivers, such as the ARM one, write to the underlying counter in their set_period functions, which prevents the problem. However, the RISC-V SBI specification does not have such an API to write to a counter without starting it. Using `local64_read(&hw_evt->period_left) <= 0` directly as the guard condition to place `riscv_pmu_event_set_period(event)` after it seems to work, but I am not sure whether it can cause other issues.
>>
>> Not sure the exact code path you are referring to. You don't want to
>> invoke riscv_pmu_event_set_period if period_left <= 0 ?
>
> First, `pmu_sbi_ovf_handler` calls `riscv_pmu_event_update`, so the perf event counter value (`event->count`) is updated. It calls `riscv_pmu_event_set_period` later, which calls `local64_set(&hwc->prev_count, (u64)-left)`. When `perf_event_read_value` gets invoked in the registered overflow handler, `riscv_pmu_event_update` will be invoked eventually. Because `prev_count` has been set to `(u64)-left` and the underlying hardware counter value (the value read via `rvpmu->ctr_read(event)`) has not been changed, `event->count` will be updated again, and the value this time will be incorrect.
Got it. This is a genuine issue. I can reproduce it as well with kvm pmu
support as it creates a kernel perf event as well.
>
> I have found that the ARM PMU does not have the problem. The function `armpmu_event_set_period` in `arm_pmu.c` calls `armpmu->write_counter(event, (u64)(-left) & max_period)` after `local64_set(&hwc->prev_count, (u64)-left)`, so that the underlying hardware counter is also set to a new value. Thus, when `armpmu_event_update` gets called later in the registered handler, it will find that the underlying counter has the same value as `prev_count`, so `event->count` can keep the correct value.
>
> However, as I mentioned before, the RISC-V SBI specification does not have such an API as “write_counter,” so the ARM solution could not be adapted easily. I was wondering if we could postpone `riscv_pmu_event_set_period` after the if-block, like this:
>
> ```
> if (local64_read(&hw_evt->period_left) <= 0) {
> perf_event_overflow(event, &data, regs);
> }
> riscv_pmu_event_set_period(event);
> ```
>
> This way `perf_event_read_value` can return the correct counter value. `riscv_pmu_event_set_period` gets invoked after perf_event_overflow. However, I am not sure if it can cause other problems.
>
That seems like a hack. I think we can avoid updating again based on
perf state which can indicate that event->count is already updated.
Thus, no need to update it again. I will toy around with this idea and
get back to you.
>>
>>>
>>> Thank you,
>>> Bohan Lei
>>
>>
>> --
>> Regards,
>> Atish
>
> --
> Regards,
> Bohan Lei
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: RISC-V PMU driver issue
2024-03-18 22:18 ` Atish Patra
@ 2024-04-27 0:07 ` Atish Patra
[not found] ` <2451aea.567f.18f3c8baa06.Coremail.garthlei@pku.edu.cn>
0 siblings, 1 reply; 7+ messages in thread
From: Atish Patra @ 2024-04-27 0:07 UTC (permalink / raw
To: 雷博涵, Atish Patra; +Cc: linux-riscv
On 3/18/24 15:18, Atish Patra wrote:
> On 3/12/24 21:29, 雷博涵 wrote:
>> Sorry for the duplicated message. The last one was an HTML e-mail so I
>> resend this.
>>
>>> 2024年3月13日 08:02,Atish Patra <atishp@atishpatra.org> 写道:
>>>
>>> On Tue, Feb 27, 2024 at 7:49 PM 雷博涵 <garthlei@pku.edu.cn> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I am having problems with the RISC-V PMU driver. The overflow
>>>> handler of my custom perf event kernel counter seems to read an
>>>> incorrect value from the event.
>>>
>>> If I understand you correctly, you have a custom kernel counter for
>>> the perf event ? Can you explain the use case for that ?
>>
>> Actually it was the perf-event backend of an open-source tool called
>> PMCTrack (https://github.com/jcsaezal/pmctrack), which I was trying to
>> port to RISC-V. It creates a kernel counter using
>> `perf_event_create_kernel_counter`, registering an overflow handler.
>> The registered overflow handler calls `perf_event_read_value` to read
>> the counter value.
>>
>
> Thanks. It makes sense now.
>
>>>
>>>> It seems that the issue lies in the function
>>>> `riscv_pmu_event_set_period`, which sets `prev_count` to a new value
>>>> but does not modify the underlying hardware counter. When
>>>> `perf_event_read_value` gets called later in the user-defined
>>>> overflow handler, `riscv_pmu_event_update` will update the `count`
>>>> field again based on the unmodified hardware counter value and the
>>>> modified `prev_count` field, which causes an incorrect reading.
>>>
>>> What do you mean by user defined overflow handler ? The overflow
>>> handler registered via the custom perf event kernel counter which gets
>>> invoked from perf_event_overflow ?
>>
>> Yes.
>>
>>>
>>>> I noticed that other PMU drivers, such as the ARM one, write to the
>>>> underlying counter in their set_period functions, which prevents the
>>>> problem. However, the RISC-V SBI specification does not have such an
>>>> API to write to a counter without starting it. Using
>>>> `local64_read(&hw_evt->period_left) <= 0` directly as the guard
>>>> condition to place `riscv_pmu_event_set_period(event)` after it
>>>> seems to work, but I am not sure whether it can cause other issues.
>>>
>>> Not sure the exact code path you are referring to. You don't want to
>>> invoke riscv_pmu_event_set_period if period_left <= 0 ?
>>
>> First, `pmu_sbi_ovf_handler` calls `riscv_pmu_event_update`, so the
>> perf event counter value (`event->count`) is updated. It calls
>> `riscv_pmu_event_set_period` later, which calls
>> `local64_set(&hwc->prev_count, (u64)-left)`. When
>> `perf_event_read_value` gets invoked in the registered overflow
>> handler, `riscv_pmu_event_update` will be invoked eventually. Because
>> `prev_count` has been set to `(u64)-left` and the underlying hardware
>> counter value (the value read via `rvpmu->ctr_read(event)`) has not
>> been changed, `event->count` will be updated again, and the value this
>> time will be incorrect.
>
> Got it. This is a genuine issue. I can reproduce it as well with kvm pmu
> support as it creates a kernel perf event as well.
>
>>
>> I have found that the ARM PMU does not have the problem. The function
>> `armpmu_event_set_period` in `arm_pmu.c` calls
>> `armpmu->write_counter(event, (u64)(-left) & max_period)` after
>> `local64_set(&hwc->prev_count, (u64)-left)`, so that the underlying
>> hardware counter is also set to a new value. Thus, when
>> `armpmu_event_update` gets called later in the registered handler, it
>> will find that the underlying counter has the same value as
>> `prev_count`, so `event->count` can keep the correct value.
>>
>> However, as I mentioned before, the RISC-V SBI specification does not
>> have such an API as “write_counter,” so the ARM solution could not be
>> adapted easily. I was wondering if we could postpone
>> `riscv_pmu_event_set_period` after the if-block, like this:
>>
>> ```
>> if (local64_read(&hw_evt->period_left) <= 0) {
>> perf_event_overflow(event, &data, regs);
>> }
>> riscv_pmu_event_set_period(event);
>> ```
>>
>> This way `perf_event_read_value` can return the correct counter value.
>> `riscv_pmu_event_set_period` gets invoked after perf_event_overflow.
>> However, I am not sure if it can cause other problems.
>>
>
> That seems like a hack. I think we can avoid updating again based on
> perf state which can indicate that event->count is already updated.
> Thus, no need to update it again. I will toy around with this idea and
> get back to you.
>
Can you try this diff on latest upstream ?
diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
index b4efdddb2ad9..e08529c45c50 100644
--- a/drivers/perf/riscv_pmu.c
+++ b/drivers/perf/riscv_pmu.c
@@ -167,7 +167,7 @@ u64 riscv_pmu_event_update(struct perf_event *event)
unsigned long cmask;
u64 oldval, delta;
- if (!rvpmu->ctr_read)
+ if (!rvpmu->ctr_read || (hwc->state == PERF_HES_UPTODATE))
return 0;
cmask = riscv_pmu_ctr_get_width_mask(event);
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 8cbe6e5f9c39..6ba825a38619 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -763,7 +763,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq,
void *dev)
*/
overflowed_ctrs |= BIT(lidx);
hw_evt = &event->hw;
+ /* Update the event states here so that we know the
state while reading */
+ hw_evt->state |= PERF_HES_STOPPED;
riscv_pmu_event_update(event);
+ hw_evt->state |= PERF_HES_UPTODATE;
perf_sample_data_init(&data, 0, hw_evt->last_period);
if (riscv_pmu_event_set_period(event)) {
/*
@@ -776,6 +779,8 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void
*dev)
*/
perf_event_overflow(event, &data, regs);
}
+ /* Reset the state as we are going to start the counter
after the loop */
+ hw_evt->state = 0;
}
All the changes in drivers/perf/riscv_pmu_sbi.c are already part of
ongoing series with snapshot support[1].
I will send the changes in drivers/perf/riscv_pmu.c as a separate fix if
this works for you.
[1] https://lore.kernel.org/kvm/20240420151741.962500-1-atishp@rivosinc.com/
>>>
>>>>
>>>> Thank you,
>>>> Bohan Lei
>>>
>>>
>>> --
>>> Regards,
>>> Atish
>>
>> --
>> Regards,
>> Bohan Lei
>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: Re: Re: RISC-V PMU driver issue
[not found] ` <CAHBxVyEotFmCfc+Ym-p7f=WAocAXtNA_z4MvOr7Ky8fXxPtVXA@mail.gmail.com>
@ 2024-05-08 3:01 ` 雷博涵
2024-05-08 6:57 ` Atish Kumar Patra
0 siblings, 1 reply; 7+ messages in thread
From: 雷博涵 @ 2024-05-08 3:01 UTC (permalink / raw
To: Atish Kumar Patra; +Cc: linux-riscv
I found the issue. `if (!rvpmu->ctr_read || (hwc->state == PERF_HES_UPTODATE))`
should be `if (!rvpmu->ctr_read || (hwc->state & PERF_HES_UPTODATE))`. Now it
works.
-----Original Messages-----
From: "Atish Kumar Patra" <atishp@rivosinc.com>
Send time: Tuesday, 05/07/2024 05:03:33
To: 雷博涵 <garthlei@pku.edu.cn>
Subject: Re: Re: RISC-V PMU driver issue
Can you provide some more details about your platform and how are you testing it?
I have verified the change in qemu with kvm implementation and the riscv_pmu_event_update the value as PERF_HES_UPTODATE is already set.
Can you print some more debug messages in riscv_pmu_event_update to see why it gets updated twice after this change?
On Thu, May 2, 2024 at 8:42 PM 雷博涵 <garthlei@pku.edu.cn> wrote:
I tried the diff on my platform (upstream kernel commit version 2c81593),
but it did not seem to work. The problem persists.
> -----Original Messages-----
> From: "Atish Patra" <atishp@rivosinc.com>
> Send time:Saturday, 04/27/2024 08:07:07
> To: 雷博涵 <garthlei@pku.edu.cn>, "Atish Patra" <atishp@atishpatra.org>
> Cc: linux-riscv@lists.infradead.org
> Subject: Re: RISC-V PMU driver issue
>
> On 3/18/24 15:18, Atish Patra wrote:
> > On 3/12/24 21:29, 雷博涵 wrote:
> >> Sorry for the duplicated message. The last one was an HTML e-mail so I
> >> resend this.
> >>
> >>> 2024年3月13日 08:02,Atish Patra <atishp@atishpatra.org> 写道:
> >>>
> >>> On Tue, Feb 27, 2024 at 7:49 PM 雷博涵 <garthlei@pku.edu.cn> wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I am having problems with the RISC-V PMU driver. The overflow
> >>>> handler of my custom perf event kernel counter seems to read an
> >>>> incorrect value from the event.
> >>>
> >>> If I understand you correctly, you have a custom kernel counter for
> >>> the perf event ? Can you explain the use case for that ?
> >>
> >> Actually it was the perf-event backend of an open-source tool called
> >> PMCTrack (https://github.com/jcsaezal/pmctrack), which I was trying to
> >> port to RISC-V. It creates a kernel counter using
> >> `perf_event_create_kernel_counter`, registering an overflow handler.
> >> The registered overflow handler calls `perf_event_read_value` to read
> >> the counter value.
> >>
> >
> > Thanks. It makes sense now.
> >
> >>>
> >>>> It seems that the issue lies in the function
> >>>> `riscv_pmu_event_set_period`, which sets `prev_count` to a new value
> >>>> but does not modify the underlying hardware counter. When
> >>>> `perf_event_read_value` gets called later in the user-defined
> >>>> overflow handler, `riscv_pmu_event_update` will update the `count`
> >>>> field again based on the unmodified hardware counter value and the
> >>>> modified `prev_count` field, which causes an incorrect reading.
> >>>
> >>> What do you mean by user defined overflow handler ? The overflow
> >>> handler registered via the custom perf event kernel counter which gets
> >>> invoked from perf_event_overflow ?
> >>
> >> Yes.
> >>
> >>>
> >>>> I noticed that other PMU drivers, such as the ARM one, write to the
> >>>> underlying counter in their set_period functions, which prevents the
> >>>> problem. However, the RISC-V SBI specification does not have such an
> >>>> API to write to a counter without starting it. Using
> >>>> `local64_read(&hw_evt->period_left) <= 0` directly as the guard
> >>>> condition to place `riscv_pmu_event_set_period(event)` after it
> >>>> seems to work, but I am not sure whether it can cause other issues.
> >>>
> >>> Not sure the exact code path you are referring to. You don't want to
> >>> invoke riscv_pmu_event_set_period if period_left <= 0 ?
> >>
> >> First, `pmu_sbi_ovf_handler` calls `riscv_pmu_event_update`, so the
> >> perf event counter value (`event->count`) is updated. It calls
> >> `riscv_pmu_event_set_period` later, which calls
> >> `local64_set(&hwc->prev_count, (u64)-left)`. When
> >> `perf_event_read_value` gets invoked in the registered overflow
> >> handler, `riscv_pmu_event_update` will be invoked eventually. Because
> >> `prev_count` has been set to `(u64)-left` and the underlying hardware
> >> counter value (the value read via `rvpmu->ctr_read(event)`) has not
> >> been changed, `event->count` will be updated again, and the value this
> >> time will be incorrect.
> >
> > Got it. This is a genuine issue. I can reproduce it as well with kvm pmu
> > support as it creates a kernel perf event as well.
> >
> >>
> >> I have found that the ARM PMU does not have the problem. The function
> >> `armpmu_event_set_period` in `arm_pmu.c` calls
> >> `armpmu->write_counter(event, (u64)(-left) & max_period)` after
> >> `local64_set(&hwc->prev_count, (u64)-left)`, so that the underlying
> >> hardware counter is also set to a new value. Thus, when
> >> `armpmu_event_update` gets called later in the registered handler, it
> >> will find that the underlying counter has the same value as
> >> `prev_count`, so `event->count` can keep the correct value.
> >>
> >> However, as I mentioned before, the RISC-V SBI specification does not
> >> have such an API as “write_counter,” so the ARM solution could not be
> >> adapted easily. I was wondering if we could postpone
> >> `riscv_pmu_event_set_period` after the if-block, like this:
> >>
> >> ```
> >> if (local64_read(&hw_evt->period_left) <= 0) {
> >> perf_event_overflow(event, &data, regs);
> >> }
> >> riscv_pmu_event_set_period(event);
> >> ```
> >>
> >> This way `perf_event_read_value` can return the correct counter value.
> >> `riscv_pmu_event_set_period` gets invoked after perf_event_overflow.
> >> However, I am not sure if it can cause other problems.
> >>
> >
> > That seems like a hack. I think we can avoid updating again based on
> > perf state which can indicate that event->count is already updated.
> > Thus, no need to update it again. I will toy around with this idea and
> > get back to you.
> >
>
> Can you try this diff on latest upstream ?
>
>
> diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
> index b4efdddb2ad9..e08529c45c50 100644
> --- a/drivers/perf/riscv_pmu.c
> +++ b/drivers/perf/riscv_pmu.c
> @@ -167,7 +167,7 @@ u64 riscv_pmu_event_update(struct perf_event *event)
> unsigned long cmask;
> u64 oldval, delta;
>
> - if (!rvpmu->ctr_read)
> + if (!rvpmu->ctr_read || (hwc->state == PERF_HES_UPTODATE))
> return 0;
>
> cmask = riscv_pmu_ctr_get_width_mask(event);
> diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> index 8cbe6e5f9c39..6ba825a38619 100644
> --- a/drivers/perf/riscv_pmu_sbi.c
> +++ b/drivers/perf/riscv_pmu_sbi.c
> @@ -763,7 +763,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq,
> void *dev)
> */
> overflowed_ctrs |= BIT(lidx);
> hw_evt = &event->hw;
> + /* Update the event states here so that we know the
> state while reading */
> + hw_evt->state |= PERF_HES_STOPPED;
> riscv_pmu_event_update(event);
> + hw_evt->state |= PERF_HES_UPTODATE;
> perf_sample_data_init(&data, 0, hw_evt->last_period);
> if (riscv_pmu_event_set_period(event)) {
> /*
> @@ -776,6 +779,8 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void
> *dev)
> */
> perf_event_overflow(event, &data, regs);
> }
> + /* Reset the state as we are going to start the counter
> after the loop */
> + hw_evt->state = 0;
> }
>
>
>
> All the changes in drivers/perf/riscv_pmu_sbi.c are already part of
> ongoing series with snapshot support[1].
>
> I will send the changes in drivers/perf/riscv_pmu.c as a separate fix if
> this works for you.
>
> [1] https://lore.kernel.org/kvm/20240420151741.962500-1-atishp@rivosinc.com/
>
> >>>
> >>>>
> >>>> Thank you,
> >>>> Bohan Lei
> >>>
> >>>
> >>> --
> >>> Regards,
> >>> Atish
> >>
> >> --
> >> Regards,
> >> Bohan Lei
> >>
> >> _______________________________________________
> >> linux-riscv mailing list
> >> linux-riscv@lists.infradead.org
> >> http://lists.infradead.org/mailman/listinfo/linux-riscv
> >
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Re: Re: RISC-V PMU driver issue
2024-05-08 3:01 ` Re: " 雷博涵
@ 2024-05-08 6:57 ` Atish Kumar Patra
0 siblings, 0 replies; 7+ messages in thread
From: Atish Kumar Patra @ 2024-05-08 6:57 UTC (permalink / raw
To: 雷博涵; +Cc: linux-riscv
On Tue, May 7, 2024 at 8:01 PM 雷博涵 <garthlei@pku.edu.cn> wrote:
>
> I found the issue. `if (!rvpmu->ctr_read || (hwc->state == PERF_HES_UPTODATE))`
> should be `if (!rvpmu->ctr_read || (hwc->state & PERF_HES_UPTODATE))`. Now it
> works.
>
Ahh I had a typo there. Thanks for verifying the fix. I will send the
patch soon.
>
> -----Original Messages-----
> From: "Atish Kumar Patra" <atishp@rivosinc.com>
> Send time: Tuesday, 05/07/2024 05:03:33
> To: 雷博涵 <garthlei@pku.edu.cn>
> Subject: Re: Re: RISC-V PMU driver issue
>
> Can you provide some more details about your platform and how are you testing it?
> I have verified the change in qemu with kvm implementation and the riscv_pmu_event_update the value as PERF_HES_UPTODATE is already set.
> Can you print some more debug messages in riscv_pmu_event_update to see why it gets updated twice after this change?
>
> On Thu, May 2, 2024 at 8:42 PM 雷博涵 <garthlei@pku.edu.cn> wrote:
> I tried the diff on my platform (upstream kernel commit version 2c81593),
> but it did not seem to work. The problem persists.
>
>
> > -----Original Messages-----
> > From: "Atish Patra" <atishp@rivosinc.com>
> > Send time:Saturday, 04/27/2024 08:07:07
> > To: 雷博涵 <garthlei@pku.edu.cn>, "Atish Patra" <atishp@atishpatra.org>
> > Cc: linux-riscv@lists.infradead.org
> > Subject: Re: RISC-V PMU driver issue
> >
> > On 3/18/24 15:18, Atish Patra wrote:
> > > On 3/12/24 21:29, 雷博涵 wrote:
> > >> Sorry for the duplicated message. The last one was an HTML e-mail so I
> > >> resend this.
> > >>
> > >>> 2024年3月13日 08:02,Atish Patra <atishp@atishpatra.org> 写道:
> > >>>
> > >>> On Tue, Feb 27, 2024 at 7:49 PM 雷博涵 <garthlei@pku.edu.cn> wrote:
> > >>>>
> > >>>> Hi all,
> > >>>>
> > >>>> I am having problems with the RISC-V PMU driver. The overflow
> > >>>> handler of my custom perf event kernel counter seems to read an
> > >>>> incorrect value from the event.
> > >>>
> > >>> If I understand you correctly, you have a custom kernel counter for
> > >>> the perf event ? Can you explain the use case for that ?
> > >>
> > >> Actually it was the perf-event backend of an open-source tool called
> > >> PMCTrack (https://github.com/jcsaezal/pmctrack), which I was trying to
> > >> port to RISC-V. It creates a kernel counter using
> > >> `perf_event_create_kernel_counter`, registering an overflow handler.
> > >> The registered overflow handler calls `perf_event_read_value` to read
> > >> the counter value.
> > >>
> > >
> > > Thanks. It makes sense now.
> > >
> > >>>
> > >>>> It seems that the issue lies in the function
> > >>>> `riscv_pmu_event_set_period`, which sets `prev_count` to a new value
> > >>>> but does not modify the underlying hardware counter. When
> > >>>> `perf_event_read_value` gets called later in the user-defined
> > >>>> overflow handler, `riscv_pmu_event_update` will update the `count`
> > >>>> field again based on the unmodified hardware counter value and the
> > >>>> modified `prev_count` field, which causes an incorrect reading.
> > >>>
> > >>> What do you mean by user defined overflow handler ? The overflow
> > >>> handler registered via the custom perf event kernel counter which gets
> > >>> invoked from perf_event_overflow ?
> > >>
> > >> Yes.
> > >>
> > >>>
> > >>>> I noticed that other PMU drivers, such as the ARM one, write to the
> > >>>> underlying counter in their set_period functions, which prevents the
> > >>>> problem. However, the RISC-V SBI specification does not have such an
> > >>>> API to write to a counter without starting it. Using
> > >>>> `local64_read(&hw_evt->period_left) <= 0` directly as the guard
> > >>>> condition to place `riscv_pmu_event_set_period(event)` after it
> > >>>> seems to work, but I am not sure whether it can cause other issues.
> > >>>
> > >>> Not sure the exact code path you are referring to. You don't want to
> > >>> invoke riscv_pmu_event_set_period if period_left <= 0 ?
> > >>
> > >> First, `pmu_sbi_ovf_handler` calls `riscv_pmu_event_update`, so the
> > >> perf event counter value (`event->count`) is updated. It calls
> > >> `riscv_pmu_event_set_period` later, which calls
> > >> `local64_set(&hwc->prev_count, (u64)-left)`. When
> > >> `perf_event_read_value` gets invoked in the registered overflow
> > >> handler, `riscv_pmu_event_update` will be invoked eventually. Because
> > >> `prev_count` has been set to `(u64)-left` and the underlying hardware
> > >> counter value (the value read via `rvpmu->ctr_read(event)`) has not
> > >> been changed, `event->count` will be updated again, and the value this
> > >> time will be incorrect.
> > >
> > > Got it. This is a genuine issue. I can reproduce it as well with kvm pmu
> > > support as it creates a kernel perf event as well.
> > >
> > >>
> > >> I have found that the ARM PMU does not have the problem. The function
> > >> `armpmu_event_set_period` in `arm_pmu.c` calls
> > >> `armpmu->write_counter(event, (u64)(-left) & max_period)` after
> > >> `local64_set(&hwc->prev_count, (u64)-left)`, so that the underlying
> > >> hardware counter is also set to a new value. Thus, when
> > >> `armpmu_event_update` gets called later in the registered handler, it
> > >> will find that the underlying counter has the same value as
> > >> `prev_count`, so `event->count` can keep the correct value.
> > >>
> > >> However, as I mentioned before, the RISC-V SBI specification does not
> > >> have such an API as “write_counter,” so the ARM solution could not be
> > >> adapted easily. I was wondering if we could postpone
> > >> `riscv_pmu_event_set_period` after the if-block, like this:
> > >>
> > >> ```
> > >> if (local64_read(&hw_evt->period_left) <= 0) {
> > >> perf_event_overflow(event, &data, regs);
> > >> }
> > >> riscv_pmu_event_set_period(event);
> > >> ```
> > >>
> > >> This way `perf_event_read_value` can return the correct counter value.
> > >> `riscv_pmu_event_set_period` gets invoked after perf_event_overflow.
> > >> However, I am not sure if it can cause other problems.
> > >>
> > >
> > > That seems like a hack. I think we can avoid updating again based on
> > > perf state which can indicate that event->count is already updated.
> > > Thus, no need to update it again. I will toy around with this idea and
> > > get back to you.
> > >
> >
> > Can you try this diff on latest upstream ?
> >
> >
> > diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
> > index b4efdddb2ad9..e08529c45c50 100644
> > --- a/drivers/perf/riscv_pmu.c
> > +++ b/drivers/perf/riscv_pmu.c
> > @@ -167,7 +167,7 @@ u64 riscv_pmu_event_update(struct perf_event *event)
> > unsigned long cmask;
> > u64 oldval, delta;
> >
> > - if (!rvpmu->ctr_read)
> > + if (!rvpmu->ctr_read || (hwc->state == PERF_HES_UPTODATE))
> > return 0;
> >
> > cmask = riscv_pmu_ctr_get_width_mask(event);
> > diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> > index 8cbe6e5f9c39..6ba825a38619 100644
> > --- a/drivers/perf/riscv_pmu_sbi.c
> > +++ b/drivers/perf/riscv_pmu_sbi.c
> > @@ -763,7 +763,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq,
> > void *dev)
> > */
> > overflowed_ctrs |= BIT(lidx);
> > hw_evt = &event->hw;
> > + /* Update the event states here so that we know the
> > state while reading */
> > + hw_evt->state |= PERF_HES_STOPPED;
> > riscv_pmu_event_update(event);
> > + hw_evt->state |= PERF_HES_UPTODATE;
> > perf_sample_data_init(&data, 0, hw_evt->last_period);
> > if (riscv_pmu_event_set_period(event)) {
> > /*
> > @@ -776,6 +779,8 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void
> > *dev)
> > */
> > perf_event_overflow(event, &data, regs);
> > }
> > + /* Reset the state as we are going to start the counter
> > after the loop */
> > + hw_evt->state = 0;
> > }
> >
> >
> >
> > All the changes in drivers/perf/riscv_pmu_sbi.c are already part of
> > ongoing series with snapshot support[1].
> >
> > I will send the changes in drivers/perf/riscv_pmu.c as a separate fix if
> > this works for you.
> >
> > [1] https://lore.kernel.org/kvm/20240420151741.962500-1-atishp@rivosinc.com/
> >
> > >>>
> > >>>>
> > >>>> Thank you,
> > >>>> Bohan Lei
> > >>>
> > >>>
> > >>> --
> > >>> Regards,
> > >>> Atish
> > >>
> > >> --
> > >> Regards,
> > >> Bohan Lei
> > >>
> > >> _______________________________________________
> > >> linux-riscv mailing list
> > >> linux-riscv@lists.infradead.org
> > >> http://lists.infradead.org/mailman/listinfo/linux-riscv
> > >
> > >
> > > _______________________________________________
> > > linux-riscv mailing list
> > > linux-riscv@lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/linux-riscv
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-05-08 6:57 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-28 3:49 RISC-V PMU driver issue 雷博涵
2024-03-13 0:02 ` Atish Patra
2024-03-13 4:29 ` 雷博涵
2024-03-18 22:18 ` Atish Patra
2024-04-27 0:07 ` Atish Patra
[not found] ` <2451aea.567f.18f3c8baa06.Coremail.garthlei@pku.edu.cn>
[not found] ` <CAHBxVyEotFmCfc+Ym-p7f=WAocAXtNA_z4MvOr7Ky8fXxPtVXA@mail.gmail.com>
2024-05-08 3:01 ` Re: " 雷博涵
2024-05-08 6:57 ` Atish Kumar Patra
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).