Linux-RISC-V Archive mirror
 help / color / mirror / Atom feed
* RISC-V PMU driver issue
@ 2024-02-28  3:49 雷博涵
  2024-03-13  0:02 ` Atish Patra
  0 siblings, 1 reply; 7+ messages in thread
From: 雷博涵 @ 2024-02-28  3:49 UTC (permalink / raw
  To: Atish Patra; +Cc: linux-riscv

Hi all,

I am having problems with the RISC-V PMU driver. The overflow handler of my custom perf event kernel counter seems to read an incorrect value from the event.
It seems that the issue lies in the function `riscv_pmu_event_set_period`, which sets `prev_count` to a new value but does not modify the underlying hardware counter. When `perf_event_read_value` gets called later in the user-defined overflow handler, `riscv_pmu_event_update` will update the `count` field again based on the unmodified hardware counter value and the modified `prev_count` field, which causes an incorrect reading.
I noticed that other PMU drivers, such as the ARM one, write to the underlying counter in their set_period functions, which prevents the problem. However, the RISC-V SBI specification does not have such an API to write to a counter without starting it. Using `local64_read(&hw_evt->period_left) <= 0` directly as the guard condition to place `riscv_pmu_event_set_period(event)` after it seems to work, but I am not sure whether it can cause other issues.

Thank you,
Bohan Lei

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RISC-V PMU driver issue
  2024-02-28  3:49 RISC-V PMU driver issue 雷博涵
@ 2024-03-13  0:02 ` Atish Patra
  2024-03-13  4:29   ` 雷博涵
  0 siblings, 1 reply; 7+ messages in thread
From: Atish Patra @ 2024-03-13  0:02 UTC (permalink / raw
  To: 雷博涵; +Cc: linux-riscv, Atish Patra

On Tue, Feb 27, 2024 at 7:49 PM 雷博涵 <garthlei@pku.edu.cn> wrote:
>
> Hi all,
>
> I am having problems with the RISC-V PMU driver. The overflow handler of my custom perf event kernel counter seems to read an incorrect value from the event.

If I understand you correctly, you have a custom kernel counter for
the perf event ? Can you explain the use case for that ?

> It seems that the issue lies in the function `riscv_pmu_event_set_period`, which sets `prev_count` to a new value but does not modify the underlying hardware counter. When `perf_event_read_value` gets called later in the user-defined overflow handler, `riscv_pmu_event_update` will update the `count` field again based on the unmodified hardware counter value and the modified `prev_count` field, which causes an incorrect reading.

What do you mean by user defined overflow handler ? The overflow
handler registered via the custom perf event kernel counter which gets
invoked from perf_event_overflow ?

> I noticed that other PMU drivers, such as the ARM one, write to the underlying counter in their set_period functions, which prevents the problem. However, the RISC-V SBI specification does not have such an API to write to a counter without starting it. Using `local64_read(&hw_evt->period_left) <= 0` directly as the guard condition to place `riscv_pmu_event_set_period(event)` after it seems to work, but I am not sure whether it can cause other issues.

Not sure the exact code path you are referring to. You don't want to
invoke  riscv_pmu_event_set_period if period_left <= 0 ?

>
> Thank you,
> Bohan Lei


-- 
Regards,
Atish

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RISC-V PMU driver issue
  2024-03-13  0:02 ` Atish Patra
@ 2024-03-13  4:29   ` 雷博涵
  2024-03-18 22:18     ` Atish Patra
  0 siblings, 1 reply; 7+ messages in thread
From: 雷博涵 @ 2024-03-13  4:29 UTC (permalink / raw
  To: Atish Patra; +Cc: linux-riscv, Atish Patra

Sorry for the duplicated message. The last one was an HTML e-mail so I resend this.

> 2024年3月13日 08:02,Atish Patra <atishp@atishpatra.org> 写道:
> 
> On Tue, Feb 27, 2024 at 7:49 PM 雷博涵 <garthlei@pku.edu.cn> wrote:
>> 
>> Hi all,
>> 
>> I am having problems with the RISC-V PMU driver. The overflow handler of my custom perf event kernel counter seems to read an incorrect value from the event.
> 
> If I understand you correctly, you have a custom kernel counter for
> the perf event ? Can you explain the use case for that ?

Actually it was the perf-event backend of an open-source tool called PMCTrack (https://github.com/jcsaezal/pmctrack), which I was trying to port to RISC-V. It creates a kernel counter using `perf_event_create_kernel_counter`, registering an overflow handler. The registered overflow handler calls `perf_event_read_value` to read the counter value.

> 
>> It seems that the issue lies in the function `riscv_pmu_event_set_period`, which sets `prev_count` to a new value but does not modify the underlying hardware counter. When `perf_event_read_value` gets called later in the user-defined overflow handler, `riscv_pmu_event_update` will update the `count` field again based on the unmodified hardware counter value and the modified `prev_count` field, which causes an incorrect reading.
> 
> What do you mean by user defined overflow handler ? The overflow
> handler registered via the custom perf event kernel counter which gets
> invoked from perf_event_overflow ?

Yes.

> 
>> I noticed that other PMU drivers, such as the ARM one, write to the underlying counter in their set_period functions, which prevents the problem. However, the RISC-V SBI specification does not have such an API to write to a counter without starting it. Using `local64_read(&hw_evt->period_left) <= 0` directly as the guard condition to place `riscv_pmu_event_set_period(event)` after it seems to work, but I am not sure whether it can cause other issues.
> 
> Not sure the exact code path you are referring to. You don't want to
> invoke  riscv_pmu_event_set_period if period_left <= 0 ?

First, `pmu_sbi_ovf_handler` calls `riscv_pmu_event_update`, so the perf event counter value (`event->count`) is updated. It calls `riscv_pmu_event_set_period` later, which calls `local64_set(&hwc->prev_count, (u64)-left)`. When `perf_event_read_value` gets invoked in the registered overflow handler, `riscv_pmu_event_update` will be invoked eventually. Because `prev_count` has been set to `(u64)-left` and the underlying hardware counter value (the value read via `rvpmu->ctr_read(event)`) has not been changed, `event->count` will be updated again, and the value this time will be incorrect.

I have found that the ARM PMU does not have the problem. The function `armpmu_event_set_period` in `arm_pmu.c` calls `armpmu->write_counter(event, (u64)(-left) & max_period)` after `local64_set(&hwc->prev_count, (u64)-left)`, so that the underlying hardware counter is also set to a new value. Thus, when `armpmu_event_update` gets called later in the registered handler, it will find that the underlying counter has the same value as `prev_count`, so `event->count` can keep the correct value.

However, as I mentioned before, the RISC-V SBI specification does not have such an API as “write_counter,” so the ARM solution could not be adapted easily. I was wondering if we could postpone `riscv_pmu_event_set_period` after the if-block, like this:

```
if (local64_read(&hw_evt->period_left) <= 0) {
  perf_event_overflow(event, &data, regs);
}
riscv_pmu_event_set_period(event);
```

This way `perf_event_read_value` can return the correct counter value. `riscv_pmu_event_set_period` gets invoked after perf_event_overflow. However, I am not sure if it can cause other problems.

> 
>> 
>> Thank you,
>> Bohan Lei
> 
> 
> -- 
> Regards,
> Atish

--
Regards,
Bohan Lei

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RISC-V PMU driver issue
  2024-03-13  4:29   ` 雷博涵
@ 2024-03-18 22:18     ` Atish Patra
  2024-04-27  0:07       ` Atish Patra
  0 siblings, 1 reply; 7+ messages in thread
From: Atish Patra @ 2024-03-18 22:18 UTC (permalink / raw
  To: 雷博涵, Atish Patra; +Cc: linux-riscv

On 3/12/24 21:29, 雷博涵 wrote:
> Sorry for the duplicated message. The last one was an HTML e-mail so I resend this.
> 
>> 2024年3月13日 08:02,Atish Patra <atishp@atishpatra.org> 写道:
>>
>> On Tue, Feb 27, 2024 at 7:49 PM 雷博涵 <garthlei@pku.edu.cn> wrote:
>>>
>>> Hi all,
>>>
>>> I am having problems with the RISC-V PMU driver. The overflow handler of my custom perf event kernel counter seems to read an incorrect value from the event.
>>
>> If I understand you correctly, you have a custom kernel counter for
>> the perf event ? Can you explain the use case for that ?
> 
> Actually it was the perf-event backend of an open-source tool called PMCTrack (https://github.com/jcsaezal/pmctrack), which I was trying to port to RISC-V. It creates a kernel counter using `perf_event_create_kernel_counter`, registering an overflow handler. The registered overflow handler calls `perf_event_read_value` to read the counter value.
> 

Thanks. It makes sense now.

>>
>>> It seems that the issue lies in the function `riscv_pmu_event_set_period`, which sets `prev_count` to a new value but does not modify the underlying hardware counter. When `perf_event_read_value` gets called later in the user-defined overflow handler, `riscv_pmu_event_update` will update the `count` field again based on the unmodified hardware counter value and the modified `prev_count` field, which causes an incorrect reading.
>>
>> What do you mean by user defined overflow handler ? The overflow
>> handler registered via the custom perf event kernel counter which gets
>> invoked from perf_event_overflow ?
> 
> Yes.
> 
>>
>>> I noticed that other PMU drivers, such as the ARM one, write to the underlying counter in their set_period functions, which prevents the problem. However, the RISC-V SBI specification does not have such an API to write to a counter without starting it. Using `local64_read(&hw_evt->period_left) <= 0` directly as the guard condition to place `riscv_pmu_event_set_period(event)` after it seems to work, but I am not sure whether it can cause other issues.
>>
>> Not sure the exact code path you are referring to. You don't want to
>> invoke  riscv_pmu_event_set_period if period_left <= 0 ?
> 
> First, `pmu_sbi_ovf_handler` calls `riscv_pmu_event_update`, so the perf event counter value (`event->count`) is updated. It calls `riscv_pmu_event_set_period` later, which calls `local64_set(&hwc->prev_count, (u64)-left)`. When `perf_event_read_value` gets invoked in the registered overflow handler, `riscv_pmu_event_update` will be invoked eventually. Because `prev_count` has been set to `(u64)-left` and the underlying hardware counter value (the value read via `rvpmu->ctr_read(event)`) has not been changed, `event->count` will be updated again, and the value this time will be incorrect.

Got it. This is a genuine issue. I can reproduce it as well with kvm pmu 
support as it creates a kernel perf event as well.

> 
> I have found that the ARM PMU does not have the problem. The function `armpmu_event_set_period` in `arm_pmu.c` calls `armpmu->write_counter(event, (u64)(-left) & max_period)` after `local64_set(&hwc->prev_count, (u64)-left)`, so that the underlying hardware counter is also set to a new value. Thus, when `armpmu_event_update` gets called later in the registered handler, it will find that the underlying counter has the same value as `prev_count`, so `event->count` can keep the correct value.
> 
> However, as I mentioned before, the RISC-V SBI specification does not have such an API as “write_counter,” so the ARM solution could not be adapted easily. I was wondering if we could postpone `riscv_pmu_event_set_period` after the if-block, like this:
> 
> ```
> if (local64_read(&hw_evt->period_left) <= 0) {
>    perf_event_overflow(event, &data, regs);
> }
> riscv_pmu_event_set_period(event);
> ```
> 
> This way `perf_event_read_value` can return the correct counter value. `riscv_pmu_event_set_period` gets invoked after perf_event_overflow. However, I am not sure if it can cause other problems.
> 

That seems like a hack. I think we can avoid updating again based on 
perf state which can indicate that event->count is already updated. 
Thus, no need to update it again. I will toy around with this idea and 
get back to you.

>>
>>>
>>> Thank you,
>>> Bohan Lei
>>
>>
>> -- 
>> Regards,
>> Atish
> 
> --
> Regards,
> Bohan Lei
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RISC-V PMU driver issue
  2024-03-18 22:18     ` Atish Patra
@ 2024-04-27  0:07       ` Atish Patra
       [not found]         ` <2451aea.567f.18f3c8baa06.Coremail.garthlei@pku.edu.cn>
  0 siblings, 1 reply; 7+ messages in thread
From: Atish Patra @ 2024-04-27  0:07 UTC (permalink / raw
  To: 雷博涵, Atish Patra; +Cc: linux-riscv

On 3/18/24 15:18, Atish Patra wrote:
> On 3/12/24 21:29, 雷博涵 wrote:
>> Sorry for the duplicated message. The last one was an HTML e-mail so I 
>> resend this.
>>
>>> 2024年3月13日 08:02,Atish Patra <atishp@atishpatra.org> 写道:
>>>
>>> On Tue, Feb 27, 2024 at 7:49 PM 雷博涵 <garthlei@pku.edu.cn> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I am having problems with the RISC-V PMU driver. The overflow 
>>>> handler of my custom perf event kernel counter seems to read an 
>>>> incorrect value from the event.
>>>
>>> If I understand you correctly, you have a custom kernel counter for
>>> the perf event ? Can you explain the use case for that ?
>>
>> Actually it was the perf-event backend of an open-source tool called 
>> PMCTrack (https://github.com/jcsaezal/pmctrack), which I was trying to 
>> port to RISC-V. It creates a kernel counter using 
>> `perf_event_create_kernel_counter`, registering an overflow handler. 
>> The registered overflow handler calls `perf_event_read_value` to read 
>> the counter value.
>>
> 
> Thanks. It makes sense now.
> 
>>>
>>>> It seems that the issue lies in the function 
>>>> `riscv_pmu_event_set_period`, which sets `prev_count` to a new value 
>>>> but does not modify the underlying hardware counter. When 
>>>> `perf_event_read_value` gets called later in the user-defined 
>>>> overflow handler, `riscv_pmu_event_update` will update the `count` 
>>>> field again based on the unmodified hardware counter value and the 
>>>> modified `prev_count` field, which causes an incorrect reading.
>>>
>>> What do you mean by user defined overflow handler ? The overflow
>>> handler registered via the custom perf event kernel counter which gets
>>> invoked from perf_event_overflow ?
>>
>> Yes.
>>
>>>
>>>> I noticed that other PMU drivers, such as the ARM one, write to the 
>>>> underlying counter in their set_period functions, which prevents the 
>>>> problem. However, the RISC-V SBI specification does not have such an 
>>>> API to write to a counter without starting it. Using 
>>>> `local64_read(&hw_evt->period_left) <= 0` directly as the guard 
>>>> condition to place `riscv_pmu_event_set_period(event)` after it 
>>>> seems to work, but I am not sure whether it can cause other issues.
>>>
>>> Not sure the exact code path you are referring to. You don't want to
>>> invoke  riscv_pmu_event_set_period if period_left <= 0 ?
>>
>> First, `pmu_sbi_ovf_handler` calls `riscv_pmu_event_update`, so the 
>> perf event counter value (`event->count`) is updated. It calls 
>> `riscv_pmu_event_set_period` later, which calls 
>> `local64_set(&hwc->prev_count, (u64)-left)`. When 
>> `perf_event_read_value` gets invoked in the registered overflow 
>> handler, `riscv_pmu_event_update` will be invoked eventually. Because 
>> `prev_count` has been set to `(u64)-left` and the underlying hardware 
>> counter value (the value read via `rvpmu->ctr_read(event)`) has not 
>> been changed, `event->count` will be updated again, and the value this 
>> time will be incorrect.
> 
> Got it. This is a genuine issue. I can reproduce it as well with kvm pmu 
> support as it creates a kernel perf event as well.
> 
>>
>> I have found that the ARM PMU does not have the problem. The function 
>> `armpmu_event_set_period` in `arm_pmu.c` calls 
>> `armpmu->write_counter(event, (u64)(-left) & max_period)` after 
>> `local64_set(&hwc->prev_count, (u64)-left)`, so that the underlying 
>> hardware counter is also set to a new value. Thus, when 
>> `armpmu_event_update` gets called later in the registered handler, it 
>> will find that the underlying counter has the same value as 
>> `prev_count`, so `event->count` can keep the correct value.
>>
>> However, as I mentioned before, the RISC-V SBI specification does not 
>> have such an API as “write_counter,” so the ARM solution could not be 
>> adapted easily. I was wondering if we could postpone 
>> `riscv_pmu_event_set_period` after the if-block, like this:
>>
>> ```
>> if (local64_read(&hw_evt->period_left) <= 0) {
>>    perf_event_overflow(event, &data, regs);
>> }
>> riscv_pmu_event_set_period(event);
>> ```
>>
>> This way `perf_event_read_value` can return the correct counter value. 
>> `riscv_pmu_event_set_period` gets invoked after perf_event_overflow. 
>> However, I am not sure if it can cause other problems.
>>
> 
> That seems like a hack. I think we can avoid updating again based on 
> perf state which can indicate that event->count is already updated. 
> Thus, no need to update it again. I will toy around with this idea and 
> get back to you.
> 

Can you try this diff on latest upstream ?


diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
index b4efdddb2ad9..e08529c45c50 100644
--- a/drivers/perf/riscv_pmu.c
+++ b/drivers/perf/riscv_pmu.c
@@ -167,7 +167,7 @@ u64 riscv_pmu_event_update(struct perf_event *event)
         unsigned long cmask;
         u64 oldval, delta;

-       if (!rvpmu->ctr_read)
+       if (!rvpmu->ctr_read || (hwc->state == PERF_HES_UPTODATE))
                 return 0;

         cmask = riscv_pmu_ctr_get_width_mask(event);
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 8cbe6e5f9c39..6ba825a38619 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -763,7 +763,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, 
void *dev)
                  */
                 overflowed_ctrs |= BIT(lidx);
                 hw_evt = &event->hw;
+               /* Update the event states here so that we know the 
state while reading */
+               hw_evt->state |= PERF_HES_STOPPED;
                 riscv_pmu_event_update(event);
+               hw_evt->state |= PERF_HES_UPTODATE;
                 perf_sample_data_init(&data, 0, hw_evt->last_period);
                 if (riscv_pmu_event_set_period(event)) {
                         /*
@@ -776,6 +779,8 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void 
*dev)
                          */
                         perf_event_overflow(event, &data, regs);
                 }
+               /* Reset the state as we are going to start the counter 
after the loop */
+               hw_evt->state = 0;
         }



All the changes in drivers/perf/riscv_pmu_sbi.c are already part of 
ongoing series with snapshot support[1].

I will send the changes in drivers/perf/riscv_pmu.c as a separate fix if 
this works for you.

[1] https://lore.kernel.org/kvm/20240420151741.962500-1-atishp@rivosinc.com/

>>>
>>>>
>>>> Thank you,
>>>> Bohan Lei
>>>
>>>
>>> -- 
>>> Regards,
>>> Atish
>>
>> -- 
>> Regards,
>> Bohan Lei
>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv
> 
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Re: Re: RISC-V PMU driver issue
       [not found]           ` <CAHBxVyEotFmCfc+Ym-p7f=WAocAXtNA_z4MvOr7Ky8fXxPtVXA@mail.gmail.com>
@ 2024-05-08  3:01             ` 雷博涵
  2024-05-08  6:57               ` Atish Kumar Patra
  0 siblings, 1 reply; 7+ messages in thread
From: 雷博涵 @ 2024-05-08  3:01 UTC (permalink / raw
  To: Atish Kumar Patra; +Cc: linux-riscv

I found the issue. `if (!rvpmu->ctr_read || (hwc->state == PERF_HES_UPTODATE))`
should be `if (!rvpmu->ctr_read || (hwc->state & PERF_HES_UPTODATE))`. Now it
works.


-----Original Messages-----
From: "Atish Kumar Patra" <atishp@rivosinc.com>
Send time: Tuesday, 05/07/2024 05:03:33
To: 雷博涵 <garthlei@pku.edu.cn>
Subject: Re: Re: RISC-V PMU driver issue

Can you provide some more details about your platform and how are you testing it? 
I have verified the change in qemu with kvm implementation and the riscv_pmu_event_update the value as PERF_HES_UPTODATE is already set.
Can you print some more debug messages in riscv_pmu_event_update to see why it gets updated twice after this change?

On Thu, May 2, 2024 at 8:42 PM 雷博涵 <garthlei@pku.edu.cn> wrote:
I tried the diff on my platform (upstream kernel commit version 2c81593),
but it did not seem to work. The problem persists.


> -----Original Messages-----
> From: "Atish Patra" <atishp@rivosinc.com>
> Send time:Saturday, 04/27/2024 08:07:07
> To: 雷博涵 <garthlei@pku.edu.cn>, "Atish Patra" <atishp@atishpatra.org>
> Cc: linux-riscv@lists.infradead.org
> Subject: Re: RISC-V PMU driver issue
>
> On 3/18/24 15:18, Atish Patra wrote:
> > On 3/12/24 21:29, 雷博涵 wrote:
> >> Sorry for the duplicated message. The last one was an HTML e-mail so I
> >> resend this.
> >>
> >>> 2024年3月13日 08:02,Atish Patra <atishp@atishpatra.org> 写道:
> >>>
> >>> On Tue, Feb 27, 2024 at 7:49 PM 雷博涵 <garthlei@pku.edu.cn> wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I am having problems with the RISC-V PMU driver. The overflow
> >>>> handler of my custom perf event kernel counter seems to read an
> >>>> incorrect value from the event.
> >>>
> >>> If I understand you correctly, you have a custom kernel counter for
> >>> the perf event ? Can you explain the use case for that ?
> >>
> >> Actually it was the perf-event backend of an open-source tool called
> >> PMCTrack (https://github.com/jcsaezal/pmctrack), which I was trying to
> >> port to RISC-V. It creates a kernel counter using
> >> `perf_event_create_kernel_counter`, registering an overflow handler.
> >> The registered overflow handler calls `perf_event_read_value` to read
> >> the counter value.
> >>
> >
> > Thanks. It makes sense now.
> >
> >>>
> >>>> It seems that the issue lies in the function
> >>>> `riscv_pmu_event_set_period`, which sets `prev_count` to a new value
> >>>> but does not modify the underlying hardware counter. When
> >>>> `perf_event_read_value` gets called later in the user-defined
> >>>> overflow handler, `riscv_pmu_event_update` will update the `count`
> >>>> field again based on the unmodified hardware counter value and the
> >>>> modified `prev_count` field, which causes an incorrect reading.
> >>>
> >>> What do you mean by user defined overflow handler ? The overflow
> >>> handler registered via the custom perf event kernel counter which gets
> >>> invoked from perf_event_overflow ?
> >>
> >> Yes.
> >>
> >>>
> >>>> I noticed that other PMU drivers, such as the ARM one, write to the
> >>>> underlying counter in their set_period functions, which prevents the
> >>>> problem. However, the RISC-V SBI specification does not have such an
> >>>> API to write to a counter without starting it. Using
> >>>> `local64_read(&hw_evt->period_left) <= 0` directly as the guard
> >>>> condition to place `riscv_pmu_event_set_period(event)` after it
> >>>> seems to work, but I am not sure whether it can cause other issues.
> >>>
> >>> Not sure the exact code path you are referring to. You don't want to
> >>> invoke  riscv_pmu_event_set_period if period_left <= 0 ?
> >>
> >> First, `pmu_sbi_ovf_handler` calls `riscv_pmu_event_update`, so the
> >> perf event counter value (`event->count`) is updated. It calls
> >> `riscv_pmu_event_set_period` later, which calls
> >> `local64_set(&hwc->prev_count, (u64)-left)`. When
> >> `perf_event_read_value` gets invoked in the registered overflow
> >> handler, `riscv_pmu_event_update` will be invoked eventually. Because
> >> `prev_count` has been set to `(u64)-left` and the underlying hardware
> >> counter value (the value read via `rvpmu->ctr_read(event)`) has not
> >> been changed, `event->count` will be updated again, and the value this
> >> time will be incorrect.
> >
> > Got it. This is a genuine issue. I can reproduce it as well with kvm pmu
> > support as it creates a kernel perf event as well.
> >
> >>
> >> I have found that the ARM PMU does not have the problem. The function
> >> `armpmu_event_set_period` in `arm_pmu.c` calls
> >> `armpmu->write_counter(event, (u64)(-left) & max_period)` after
> >> `local64_set(&hwc->prev_count, (u64)-left)`, so that the underlying
> >> hardware counter is also set to a new value. Thus, when
> >> `armpmu_event_update` gets called later in the registered handler, it
> >> will find that the underlying counter has the same value as
> >> `prev_count`, so `event->count` can keep the correct value.
> >>
> >> However, as I mentioned before, the RISC-V SBI specification does not
> >> have such an API as “write_counter,” so the ARM solution could not be
> >> adapted easily. I was wondering if we could postpone
> >> `riscv_pmu_event_set_period` after the if-block, like this:
> >>
> >> ```
> >> if (local64_read(&hw_evt->period_left) <= 0) {
> >>    perf_event_overflow(event, &data, regs);
> >> }
> >> riscv_pmu_event_set_period(event);
> >> ```
> >>
> >> This way `perf_event_read_value` can return the correct counter value.
> >> `riscv_pmu_event_set_period` gets invoked after perf_event_overflow.
> >> However, I am not sure if it can cause other problems.
> >>
> >
> > That seems like a hack. I think we can avoid updating again based on
> > perf state which can indicate that event->count is already updated.
> > Thus, no need to update it again. I will toy around with this idea and
> > get back to you.
> >
>
> Can you try this diff on latest upstream ?
>
>
> diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
> index b4efdddb2ad9..e08529c45c50 100644
> --- a/drivers/perf/riscv_pmu.c
> +++ b/drivers/perf/riscv_pmu.c
> @@ -167,7 +167,7 @@ u64 riscv_pmu_event_update(struct perf_event *event)
>          unsigned long cmask;
>          u64 oldval, delta;
>
> -       if (!rvpmu->ctr_read)
> +       if (!rvpmu->ctr_read || (hwc->state == PERF_HES_UPTODATE))
>                  return 0;
>
>          cmask = riscv_pmu_ctr_get_width_mask(event);
> diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> index 8cbe6e5f9c39..6ba825a38619 100644
> --- a/drivers/perf/riscv_pmu_sbi.c
> +++ b/drivers/perf/riscv_pmu_sbi.c
> @@ -763,7 +763,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq,
> void *dev)
>                   */
>                  overflowed_ctrs |= BIT(lidx);
>                  hw_evt = &event->hw;
> +               /* Update the event states here so that we know the
> state while reading */
> +               hw_evt->state |= PERF_HES_STOPPED;
>                  riscv_pmu_event_update(event);
> +               hw_evt->state |= PERF_HES_UPTODATE;
>                  perf_sample_data_init(&data, 0, hw_evt->last_period);
>                  if (riscv_pmu_event_set_period(event)) {
>                          /*
> @@ -776,6 +779,8 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void
> *dev)
>                           */
>                          perf_event_overflow(event, &data, regs);
>                  }
> +               /* Reset the state as we are going to start the counter
> after the loop */
> +               hw_evt->state = 0;
>          }
>
>
>
> All the changes in drivers/perf/riscv_pmu_sbi.c are already part of
> ongoing series with snapshot support[1].
>
> I will send the changes in drivers/perf/riscv_pmu.c as a separate fix if
> this works for you.
>
> [1] https://lore.kernel.org/kvm/20240420151741.962500-1-atishp@rivosinc.com/
>
> >>>
> >>>>
> >>>> Thank you,
> >>>> Bohan Lei
> >>>
> >>>
> >>> --
> >>> Regards,
> >>> Atish
> >>
> >> --
> >> Regards,
> >> Bohan Lei
> >>
> >> _______________________________________________
> >> linux-riscv mailing list
> >> linux-riscv@lists.infradead.org
> >> http://lists.infradead.org/mailman/listinfo/linux-riscv
> >
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Re: Re: RISC-V PMU driver issue
  2024-05-08  3:01             ` Re: " 雷博涵
@ 2024-05-08  6:57               ` Atish Kumar Patra
  0 siblings, 0 replies; 7+ messages in thread
From: Atish Kumar Patra @ 2024-05-08  6:57 UTC (permalink / raw
  To: 雷博涵; +Cc: linux-riscv

On Tue, May 7, 2024 at 8:01 PM 雷博涵 <garthlei@pku.edu.cn> wrote:
>
> I found the issue. `if (!rvpmu->ctr_read || (hwc->state == PERF_HES_UPTODATE))`
> should be `if (!rvpmu->ctr_read || (hwc->state & PERF_HES_UPTODATE))`. Now it
> works.
>

Ahh I had a typo there. Thanks for verifying the fix. I will send the
patch soon.

>
> -----Original Messages-----
> From: "Atish Kumar Patra" <atishp@rivosinc.com>
> Send time: Tuesday, 05/07/2024 05:03:33
> To: 雷博涵 <garthlei@pku.edu.cn>
> Subject: Re: Re: RISC-V PMU driver issue
>
> Can you provide some more details about your platform and how are you testing it?
> I have verified the change in qemu with kvm implementation and the riscv_pmu_event_update the value as PERF_HES_UPTODATE is already set.
> Can you print some more debug messages in riscv_pmu_event_update to see why it gets updated twice after this change?
>
> On Thu, May 2, 2024 at 8:42 PM 雷博涵 <garthlei@pku.edu.cn> wrote:
> I tried the diff on my platform (upstream kernel commit version 2c81593),
> but it did not seem to work. The problem persists.
>
>
> > -----Original Messages-----
> > From: "Atish Patra" <atishp@rivosinc.com>
> > Send time:Saturday, 04/27/2024 08:07:07
> > To: 雷博涵 <garthlei@pku.edu.cn>, "Atish Patra" <atishp@atishpatra.org>
> > Cc: linux-riscv@lists.infradead.org
> > Subject: Re: RISC-V PMU driver issue
> >
> > On 3/18/24 15:18, Atish Patra wrote:
> > > On 3/12/24 21:29, 雷博涵 wrote:
> > >> Sorry for the duplicated message. The last one was an HTML e-mail so I
> > >> resend this.
> > >>
> > >>> 2024年3月13日 08:02,Atish Patra <atishp@atishpatra.org> 写道:
> > >>>
> > >>> On Tue, Feb 27, 2024 at 7:49 PM 雷博涵 <garthlei@pku.edu.cn> wrote:
> > >>>>
> > >>>> Hi all,
> > >>>>
> > >>>> I am having problems with the RISC-V PMU driver. The overflow
> > >>>> handler of my custom perf event kernel counter seems to read an
> > >>>> incorrect value from the event.
> > >>>
> > >>> If I understand you correctly, you have a custom kernel counter for
> > >>> the perf event ? Can you explain the use case for that ?
> > >>
> > >> Actually it was the perf-event backend of an open-source tool called
> > >> PMCTrack (https://github.com/jcsaezal/pmctrack), which I was trying to
> > >> port to RISC-V. It creates a kernel counter using
> > >> `perf_event_create_kernel_counter`, registering an overflow handler.
> > >> The registered overflow handler calls `perf_event_read_value` to read
> > >> the counter value.
> > >>
> > >
> > > Thanks. It makes sense now.
> > >
> > >>>
> > >>>> It seems that the issue lies in the function
> > >>>> `riscv_pmu_event_set_period`, which sets `prev_count` to a new value
> > >>>> but does not modify the underlying hardware counter. When
> > >>>> `perf_event_read_value` gets called later in the user-defined
> > >>>> overflow handler, `riscv_pmu_event_update` will update the `count`
> > >>>> field again based on the unmodified hardware counter value and the
> > >>>> modified `prev_count` field, which causes an incorrect reading.
> > >>>
> > >>> What do you mean by user defined overflow handler ? The overflow
> > >>> handler registered via the custom perf event kernel counter which gets
> > >>> invoked from perf_event_overflow ?
> > >>
> > >> Yes.
> > >>
> > >>>
> > >>>> I noticed that other PMU drivers, such as the ARM one, write to the
> > >>>> underlying counter in their set_period functions, which prevents the
> > >>>> problem. However, the RISC-V SBI specification does not have such an
> > >>>> API to write to a counter without starting it. Using
> > >>>> `local64_read(&hw_evt->period_left) <= 0` directly as the guard
> > >>>> condition to place `riscv_pmu_event_set_period(event)` after it
> > >>>> seems to work, but I am not sure whether it can cause other issues.
> > >>>
> > >>> Not sure the exact code path you are referring to. You don't want to
> > >>> invoke  riscv_pmu_event_set_period if period_left <= 0 ?
> > >>
> > >> First, `pmu_sbi_ovf_handler` calls `riscv_pmu_event_update`, so the
> > >> perf event counter value (`event->count`) is updated. It calls
> > >> `riscv_pmu_event_set_period` later, which calls
> > >> `local64_set(&hwc->prev_count, (u64)-left)`. When
> > >> `perf_event_read_value` gets invoked in the registered overflow
> > >> handler, `riscv_pmu_event_update` will be invoked eventually. Because
> > >> `prev_count` has been set to `(u64)-left` and the underlying hardware
> > >> counter value (the value read via `rvpmu->ctr_read(event)`) has not
> > >> been changed, `event->count` will be updated again, and the value this
> > >> time will be incorrect.
> > >
> > > Got it. This is a genuine issue. I can reproduce it as well with kvm pmu
> > > support as it creates a kernel perf event as well.
> > >
> > >>
> > >> I have found that the ARM PMU does not have the problem. The function
> > >> `armpmu_event_set_period` in `arm_pmu.c` calls
> > >> `armpmu->write_counter(event, (u64)(-left) & max_period)` after
> > >> `local64_set(&hwc->prev_count, (u64)-left)`, so that the underlying
> > >> hardware counter is also set to a new value. Thus, when
> > >> `armpmu_event_update` gets called later in the registered handler, it
> > >> will find that the underlying counter has the same value as
> > >> `prev_count`, so `event->count` can keep the correct value.
> > >>
> > >> However, as I mentioned before, the RISC-V SBI specification does not
> > >> have such an API as “write_counter,” so the ARM solution could not be
> > >> adapted easily. I was wondering if we could postpone
> > >> `riscv_pmu_event_set_period` after the if-block, like this:
> > >>
> > >> ```
> > >> if (local64_read(&hw_evt->period_left) <= 0) {
> > >>    perf_event_overflow(event, &data, regs);
> > >> }
> > >> riscv_pmu_event_set_period(event);
> > >> ```
> > >>
> > >> This way `perf_event_read_value` can return the correct counter value.
> > >> `riscv_pmu_event_set_period` gets invoked after perf_event_overflow.
> > >> However, I am not sure if it can cause other problems.
> > >>
> > >
> > > That seems like a hack. I think we can avoid updating again based on
> > > perf state which can indicate that event->count is already updated.
> > > Thus, no need to update it again. I will toy around with this idea and
> > > get back to you.
> > >
> >
> > Can you try this diff on latest upstream ?
> >
> >
> > diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
> > index b4efdddb2ad9..e08529c45c50 100644
> > --- a/drivers/perf/riscv_pmu.c
> > +++ b/drivers/perf/riscv_pmu.c
> > @@ -167,7 +167,7 @@ u64 riscv_pmu_event_update(struct perf_event *event)
> >          unsigned long cmask;
> >          u64 oldval, delta;
> >
> > -       if (!rvpmu->ctr_read)
> > +       if (!rvpmu->ctr_read || (hwc->state == PERF_HES_UPTODATE))
> >                  return 0;
> >
> >          cmask = riscv_pmu_ctr_get_width_mask(event);
> > diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> > index 8cbe6e5f9c39..6ba825a38619 100644
> > --- a/drivers/perf/riscv_pmu_sbi.c
> > +++ b/drivers/perf/riscv_pmu_sbi.c
> > @@ -763,7 +763,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq,
> > void *dev)
> >                   */
> >                  overflowed_ctrs |= BIT(lidx);
> >                  hw_evt = &event->hw;
> > +               /* Update the event states here so that we know the
> > state while reading */
> > +               hw_evt->state |= PERF_HES_STOPPED;
> >                  riscv_pmu_event_update(event);
> > +               hw_evt->state |= PERF_HES_UPTODATE;
> >                  perf_sample_data_init(&data, 0, hw_evt->last_period);
> >                  if (riscv_pmu_event_set_period(event)) {
> >                          /*
> > @@ -776,6 +779,8 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void
> > *dev)
> >                           */
> >                          perf_event_overflow(event, &data, regs);
> >                  }
> > +               /* Reset the state as we are going to start the counter
> > after the loop */
> > +               hw_evt->state = 0;
> >          }
> >
> >
> >
> > All the changes in drivers/perf/riscv_pmu_sbi.c are already part of
> > ongoing series with snapshot support[1].
> >
> > I will send the changes in drivers/perf/riscv_pmu.c as a separate fix if
> > this works for you.
> >
> > [1] https://lore.kernel.org/kvm/20240420151741.962500-1-atishp@rivosinc.com/
> >
> > >>>
> > >>>>
> > >>>> Thank you,
> > >>>> Bohan Lei
> > >>>
> > >>>
> > >>> --
> > >>> Regards,
> > >>> Atish
> > >>
> > >> --
> > >> Regards,
> > >> Bohan Lei
> > >>
> > >> _______________________________________________
> > >> linux-riscv mailing list
> > >> linux-riscv@lists.infradead.org
> > >> http://lists.infradead.org/mailman/listinfo/linux-riscv
> > >
> > >
> > > _______________________________________________
> > > linux-riscv mailing list
> > > linux-riscv@lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/linux-riscv

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-05-08  6:57 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-28  3:49 RISC-V PMU driver issue 雷博涵
2024-03-13  0:02 ` Atish Patra
2024-03-13  4:29   ` 雷博涵
2024-03-18 22:18     ` Atish Patra
2024-04-27  0:07       ` Atish Patra
     [not found]         ` <2451aea.567f.18f3c8baa06.Coremail.garthlei@pku.edu.cn>
     [not found]           ` <CAHBxVyEotFmCfc+Ym-p7f=WAocAXtNA_z4MvOr7Ky8fXxPtVXA@mail.gmail.com>
2024-05-08  3:01             ` Re: " 雷博涵
2024-05-08  6:57               ` Atish Kumar Patra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).