All the mail mirrored from lore.kernel.org
* [PATCH net v1] octeon_ep: initialise control mbox tasks before using APIs
@ 2023-12-02 15:08 Shinas Rasheed
  2023-12-04 22:10 ` Michal Schmidt
  0 siblings, 1 reply; 5+ messages in thread
From: Shinas Rasheed @ 2023-12-02 15:08 UTC
  To: netdev, linux-kernel
  Cc: hgani, vimleshk, egallen, mschmidt, pabeni, horms, kuba, davem,
	wizhao, konguyen, Shinas Rasheed, Veerasenareddy Burru,
	Sathesh Edara, Eric Dumazet

Do INIT_WORK for the various workqueue tasks before the first
invocation of any control net APIs. Since octep_ctrl_net_get_info
was called before the control net receive work task was even
initialised, the function call wasn't returning actual firmware
info queried from Octeon.

Fixes: 8d6198a14e2b ("octeon_ep: support to fetch firmware info")
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
---
 .../net/ethernet/marvell/octeon_ep/octep_main.c    | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
index 552970c7dec0..3e7bfd3e0f56 100644
--- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
+++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
@@ -1193,6 +1193,13 @@ int octep_device_setup(struct octep_device *oct)
 	if (ret)
 		return ret;
 
+	INIT_WORK(&oct->tx_timeout_task, octep_tx_timeout_task);
+	INIT_WORK(&oct->ctrl_mbox_task, octep_ctrl_mbox_task);
+	INIT_DELAYED_WORK(&oct->intr_poll_task, octep_intr_poll_task);
+	oct->poll_non_ioq_intr = true;
+	queue_delayed_work(octep_wq, &oct->intr_poll_task,
+			   msecs_to_jiffies(OCTEP_INTR_POLL_TIME_MSECS));
+
 	atomic_set(&oct->hb_miss_cnt, 0);
 	INIT_DELAYED_WORK(&oct->hb_task, octep_hb_timeout_task);
 
@@ -1333,13 +1340,6 @@ static int octep_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	queue_delayed_work(octep_wq, &octep_dev->hb_task,
 			   msecs_to_jiffies(octep_dev->conf->fw_info.hb_interval));
 
-	INIT_WORK(&octep_dev->tx_timeout_task, octep_tx_timeout_task);
-	INIT_WORK(&octep_dev->ctrl_mbox_task, octep_ctrl_mbox_task);
-	INIT_DELAYED_WORK(&octep_dev->intr_poll_task, octep_intr_poll_task);
-	octep_dev->poll_non_ioq_intr = true;
-	queue_delayed_work(octep_wq, &octep_dev->intr_poll_task,
-			   msecs_to_jiffies(OCTEP_INTR_POLL_TIME_MSECS));
-
 	netdev->netdev_ops = &octep_netdev_ops;
 	octep_set_ethtool_ops(netdev);
 	netif_carrier_off(netdev);
-- 
2.25.1
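
For readers following the thread: the ordering constraint this patch enforces at probe time can be sketched roughly as below. This is a paraphrase, not the literal octep_main.c code; the exact call site and argument list of octep_ctrl_net_get_info() are assumed for illustration (see the discussion that follows), and the driver's own headers are taken as already included.

/* Rough sketch of the ordering constraint at probe time (a paraphrase,
 * not the literal driver code; octep_ctrl_net_get_info()'s call site
 * and argument list are assumed for illustration).
 */
static int octep_probe_order_sketch(struct octep_device *oct)
{
        /* 1. Work items must be initialised before anything can queue them. */
        INIT_WORK(&oct->ctrl_mbox_task, octep_ctrl_mbox_task);
        INIT_DELAYED_WORK(&oct->intr_poll_task, octep_intr_poll_task);

        /* 2. No IRQs are requested yet at this point, so mbox responses are
         *    only noticed by the poller, which in turn queues ctrl_mbox_task.
         */
        oct->poll_non_ioq_intr = true;
        queue_delayed_work(octep_wq, &oct->intr_poll_task,
                           msecs_to_jiffies(OCTEP_INTR_POLL_TIME_MSECS));

        /* 3. Only now can a control-net request complete; before this patch
         *    it ran first and silently timed out, leaving fw_info zeroed.
         */
        return octep_ctrl_net_get_info(oct, 0 /* vfid, assumed */,
                                       &oct->conf->fw_info);
}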



* Re: [PATCH net v1] octeon_ep: initialise control mbox tasks before using APIs
  2023-12-02 15:08 [PATCH net v1] octeon_ep: initialise control mbox tasks before using APIs Shinas Rasheed
@ 2023-12-04 22:10 ` Michal Schmidt
  2023-12-04 22:13   ` Michal Schmidt
  0 siblings, 1 reply; 5+ messages in thread
From: Michal Schmidt @ 2023-12-04 22:10 UTC
  To: Shinas Rasheed
  Cc: netdev, linux-kernel, hgani, vimleshk, egallen, pabeni, horms,
	kuba, davem, wizhao, konguyen, Veerasenareddy Burru,
	Sathesh Edara, Eric Dumazet

On Sat, Dec 2, 2023 at 4:08 PM Shinas Rasheed <srasheed@marvell.com> wrote:
> Do INIT_WORK for the various workqueue tasks before the first
> invocation of any control net APIs. Since octep_ctrl_net_get_info
> was called before the control net receive work task was even
> initialised, the function call wasn't returning actual firmware
> info queried from Octeon.

It might be more accurate to say that octep_ctrl_net_get_info depends
on the processing of OEI events. This happens in intr_poll_task.
That's why intr_poll_task needs to be queued earlier.
Did octep_send_mbox_req previously always fail with EAGAIN after
running into the 500 ms timeout in octep_send_mbox_req?

Apropos octep_send_mbox_req... I think it has a race. "d" is put on
the ctrl_req_wait_list after sending the request to the hardware. If
the response arrives quickly, "d" might not yet be on the list when
process_mbox_resp looks for it.
Also, what protects ctrl_req_wait_list from concurrent access?
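
To make the window concrete, the flow in question looks roughly like the paraphrase below; struct, field and helper names are assumed rather than taken verbatim from octep_ctrl_net.c, and cleanup paths are omitted.

/* Paraphrase of the flow being questioned (not the literal driver code;
 * struct, field and helper names are assumed, cleanup paths omitted).
 */
static int octep_send_mbox_req_as_described(struct octep_device *oct,
                                            struct octep_ctrl_net_wait_data *d)
{
        int err;

        err = octep_ctrl_mbox_send(&oct->ctrl_mbox, &d->data.msg);
        if (err)
                return err;

        /* Window: if the firmware answers between the send above and the
         * list_add below, process_mbox_resp() walks ctrl_req_wait_list,
         * finds no matching entry, and this waiter later times out.
         */
        list_add_tail(&d->list, &oct->ctrl_req_wait_list);

        if (wait_event_interruptible_timeout(oct->ctrl_req_wait_q, d->done,
                                             msecs_to_jiffies(500)) <= 0)
                return -EAGAIN;

        return 0;
}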

Michal

> Fixes: 8d6198a14e2b ("octeon_ep: support to fetch firmware info")
> Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
> ---
>  .../net/ethernet/marvell/octeon_ep/octep_main.c    | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> index 552970c7dec0..3e7bfd3e0f56 100644
> --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> @@ -1193,6 +1193,13 @@ int octep_device_setup(struct octep_device *oct)
>         if (ret)
>                 return ret;
>
> +       INIT_WORK(&oct->tx_timeout_task, octep_tx_timeout_task);
> +       INIT_WORK(&oct->ctrl_mbox_task, octep_ctrl_mbox_task);
> +       INIT_DELAYED_WORK(&oct->intr_poll_task, octep_intr_poll_task);
> +       oct->poll_non_ioq_intr = true;
> +       queue_delayed_work(octep_wq, &oct->intr_poll_task,
> +                          msecs_to_jiffies(OCTEP_INTR_POLL_TIME_MSECS));
> +
>         atomic_set(&oct->hb_miss_cnt, 0);
>         INIT_DELAYED_WORK(&oct->hb_task, octep_hb_timeout_task);
>
> @@ -1333,13 +1340,6 @@ static int octep_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>         queue_delayed_work(octep_wq, &octep_dev->hb_task,
>                            msecs_to_jiffies(octep_dev->conf->fw_info.hb_interval));
>
> -       INIT_WORK(&octep_dev->tx_timeout_task, octep_tx_timeout_task);
> -       INIT_WORK(&octep_dev->ctrl_mbox_task, octep_ctrl_mbox_task);
> -       INIT_DELAYED_WORK(&octep_dev->intr_poll_task, octep_intr_poll_task);
> -       octep_dev->poll_non_ioq_intr = true;
> -       queue_delayed_work(octep_wq, &octep_dev->intr_poll_task,
> -                          msecs_to_jiffies(OCTEP_INTR_POLL_TIME_MSECS));
> -
>         netdev->netdev_ops = &octep_netdev_ops;
>         octep_set_ethtool_ops(netdev);
>         netif_carrier_off(netdev);
> --
> 2.25.1
>



* Re: [PATCH net v1] octeon_ep: initialise control mbox tasks before using APIs
  2023-12-04 22:10 ` Michal Schmidt
@ 2023-12-04 22:13   ` Michal Schmidt
  2023-12-05  6:50     ` [EXT] " Shinas Rasheed
  0 siblings, 1 reply; 5+ messages in thread
From: Michal Schmidt @ 2023-12-04 22:13 UTC
  To: Shinas Rasheed
  Cc: netdev, linux-kernel, hgani, vimleshk, egallen, pabeni, horms,
	kuba, davem, wizhao, konguyen, Veerasenareddy Burru,
	Sathesh Edara, Eric Dumazet

On Mon, Dec 4, 2023 at 11:10 PM Michal Schmidt <mschmidt@redhat.com> wrote:
>
> On Sat, Dec 2, 2023 at 4:08 PM Shinas Rasheed <srasheed@marvell.com> wrote:
> > Do INIT_WORK for the various workqueue tasks before the first
> > invocation of any control net APIs. Since octep_ctrl_net_get_info
> > was called before the control net receive work task was even
> > initialised, the function call wasn't returning actual firmware
> > info queried from Octeon.
>
> It might be more accurate to say that octep_ctrl_net_get_info depends
> on the processing of OEI events. This happens in intr_poll_task.
> That's why intr_poll_task needs to be queued earlier.
> Did octep_send_mbox_req previously always fail with EAGAIN after
          ^^^^^^^^^^^^^^^^^^^^^
I meant octep_ctrl_net_get_info here.

> running into the 500 ms timeout in octep_send_mbox_req?
>
> Apropos octep_send_mbox_req... I think it has a race. "d" is put on
> the ctrl_req_wait_list after sending the request to the hardware. If
> the response arrives quickly, "d" might not yet be on the list when
> process_mbox_resp looks for it.
> Also, what protects ctrl_req_wait_list from concurrent access?
>
> Michal
>
> > Fixes: 8d6198a14e2b ("octeon_ep: support to fetch firmware info")
> > Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
> > ---
> >  .../net/ethernet/marvell/octeon_ep/octep_main.c    | 14 +++++++-------
> >  1 file changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > index 552970c7dec0..3e7bfd3e0f56 100644
> > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > @@ -1193,6 +1193,13 @@ int octep_device_setup(struct octep_device *oct)
> >         if (ret)
> >                 return ret;
> >
> > +       INIT_WORK(&oct->tx_timeout_task, octep_tx_timeout_task);
> > +       INIT_WORK(&oct->ctrl_mbox_task, octep_ctrl_mbox_task);
> > +       INIT_DELAYED_WORK(&oct->intr_poll_task, octep_intr_poll_task);
> > +       oct->poll_non_ioq_intr = true;
> > +       queue_delayed_work(octep_wq, &oct->intr_poll_task,
> > +                          msecs_to_jiffies(OCTEP_INTR_POLL_TIME_MSECS));
> > +
> >         atomic_set(&oct->hb_miss_cnt, 0);
> >         INIT_DELAYED_WORK(&oct->hb_task, octep_hb_timeout_task);
> >
> > @@ -1333,13 +1340,6 @@ static int octep_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> >         queue_delayed_work(octep_wq, &octep_dev->hb_task,
> >                            msecs_to_jiffies(octep_dev->conf->fw_info.hb_interval));
> >
> > -       INIT_WORK(&octep_dev->tx_timeout_task, octep_tx_timeout_task);
> > -       INIT_WORK(&octep_dev->ctrl_mbox_task, octep_ctrl_mbox_task);
> > -       INIT_DELAYED_WORK(&octep_dev->intr_poll_task, octep_intr_poll_task);
> > -       octep_dev->poll_non_ioq_intr = true;
> > -       queue_delayed_work(octep_wq, &octep_dev->intr_poll_task,
> > -                          msecs_to_jiffies(OCTEP_INTR_POLL_TIME_MSECS));
> > -
> >         netdev->netdev_ops = &octep_netdev_ops;
> >         octep_set_ethtool_ops(netdev);
> >         netif_carrier_off(netdev);
> > --
> > 2.25.1
> >



* RE: [EXT] Re: [PATCH net v1] octeon_ep: initialise control mbox tasks before using APIs
  2023-12-04 22:13   ` Michal Schmidt
@ 2023-12-05  6:50     ` Shinas Rasheed
  2023-12-05  9:00       ` Michal Schmidt
  0 siblings, 1 reply; 5+ messages in thread
From: Shinas Rasheed @ 2023-12-05  6:50 UTC
  To: Michal Schmidt
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Haseeb Gani,
	Vimlesh Kumar, egallen@redhat.com, pabeni@redhat.com,
	horms@kernel.org, kuba@kernel.org, davem@davemloft.net,
	wizhao@redhat.com, konguyen@redhat.com, Veerasenareddy Burru,
	Sathesh B Edara, Eric Dumazet



> -----Original Message-----
> > On Sat, Dec 2, 2023 at 4:08 PM Shinas Rasheed <srasheed@marvell.com>
> wrote:
> > > Do INIT_WORK for the various workqueue tasks before the first
> > > invocation of any control net APIs. Since octep_ctrl_net_get_info
> > > was called before the control net receive work task was even
> > > initialised, the function call wasn't returning actual firmware
> > > info queried from Octeon.
> >
> > It might be more accurate to say that octep_ctrl_net_get_info depends
> > on the processing of OEI events. This happens in intr_poll_task.
> > That's why intr_poll_task needs to be queued earlier.


intr_poll_task is queued only when the interface is down and the PF cannot catch IRQs, because they have been torn down.
Otherwise, OEI events trigger the OEI IRQ and, consequently, its handler. Your point that intr_poll_task needs to be queued
earlier is correct, but since that path in turn schedules the control mbox task, initialising ctrl_mbox_task is the more
relevant and necessary part: if it is not initialised, it cannot be scheduled even when OEI interrupts have been caught.
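
Roughly sketched, the two delivery paths look like this (handler names and dispatch details are assumed for illustration; only the work items, octep_wq and the poll_non_ioq_intr flag come from the driver):

/* Path 1: interface up, OEI interrupt vector armed. */
static irqreturn_t octep_oei_irq_sketch(int irq, void *data)
{
        struct octep_device *oct = data;

        queue_work(octep_wq, &oct->ctrl_mbox_task);
        return IRQ_HANDLED;
}

/* Path 2: IRQs not set up (e.g. during probe, or interface down), so a
 * delayed work polls the non-IOQ interrupt causes instead.
 */
static void octep_intr_poll_sketch(struct work_struct *work)
{
        struct octep_device *oct = container_of(to_delayed_work(work),
                                                struct octep_device,
                                                intr_poll_task);

        /* ...read the non-IOQ cause registers; on an OEI event:... */
        queue_work(octep_wq, &oct->ctrl_mbox_task);

        if (oct->poll_non_ioq_intr)
                queue_delayed_work(octep_wq, &oct->intr_poll_task,
                                   msecs_to_jiffies(OCTEP_INTR_POLL_TIME_MSECS));
}

/* Either way the event lands in ctrl_mbox_task, so INIT_WORK() on it must
 * precede the first control-net request.
 */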

> > Did octep_send_mbox_req previously always fail with EAGAIN after
>           ^^^^^^^^^^^^^^^^^^^^^
> I meant octep_ctrl_net_get_info here.
> 
> > running into the 500 ms timeout in octep_send_mbox_req?

Yes, it did, but since the failure was silent (note that we're not checking any error value), it didn't stop operation. I think I might have to
update this patch to check the error value as well. (Not checking it is a relic of the original code, which spawned an extra thread to set up the
device and hence couldn't return an error. That implementation was discouraged; in the upstreamed code we set things up in probe itself and can
check error values.)
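
For reference, catching it in the probe path might look roughly like the sketch below; the argument list of octep_ctrl_net_get_info() and the unwind label are assumptions, not the driver's actual code.

        /* Hypothetical follow-up, not part of this patch: fail probe instead
         * of continuing silently when the firmware-info query times out.
         */
        err = octep_ctrl_net_get_info(octep_dev, 0 /* vfid, assumed */,
                                      &octep_dev->conf->fw_info);
        if (err) {
                dev_err(&pdev->dev,
                        "Failed to get firmware info from Octeon: %d\n", err);
                goto err_ctrl_net;      /* hypothetical unwind label */
        }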

> > Apropos octep_send_mbox_req... I think it has a race. "d" is put on
> > the ctrl_req_wait_list after sending the request to the hardware. If
> > the response arrives quickly, "d" might not yet be on the list when
> > process_mbox_resp looks for it.
> > Also, what protects ctrl_req_wait_list from concurrent access?

I agree such a race is theoretically valid, but it is not currently occurring: the response, after due processing by the Octeon application,
doesn't arrive that quickly. Regarding concurrent access, there is currently no protection for ctrl_req_wait_list. Concurrent access here can
only happen if either two requests reach ctrl_req_wait_list at the same time, or a request and a response do (the case you stated above).

In the first case, since locking is implemented on the control mbox itself, requests effectively have to wait for their turn to queue their
wait data "d" onto ctrl_req_wait_list, so concurrent access is avoided.

The second case is valid, but as I said, it wouldn't happen in practice. That said, we do have to handle the theoretical case, and locking
could be added; a separate patch for that is probably better.
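
One possible shape for that separate patch, purely as a sketch: queue the waiter before sending and protect the list with a new spinlock. The lock itself is an assumption (it does not exist in the driver today), as are the struct, field and helper names below.

static int octep_send_mbox_req_sketch(struct octep_device *oct,
                                      struct octep_ctrl_net_wait_data *d)
{
        unsigned long flags;
        long ret;
        int err;

        d->done = 0;
        /* Queue the waiter first, under the (hypothetical) lock, so a fast
         * response can always find it in process_mbox_resp().
         */
        spin_lock_irqsave(&oct->ctrl_req_lock, flags);
        list_add_tail(&d->list, &oct->ctrl_req_wait_list);
        spin_unlock_irqrestore(&oct->ctrl_req_lock, flags);

        err = octep_ctrl_mbox_send(&oct->ctrl_mbox, &d->data.msg);
        if (err) {
                spin_lock_irqsave(&oct->ctrl_req_lock, flags);
                list_del(&d->list);
                spin_unlock_irqrestore(&oct->ctrl_req_lock, flags);
                return err;
        }

        ret = wait_event_interruptible_timeout(oct->ctrl_req_wait_q, d->done,
                                               msecs_to_jiffies(500));
        if (ret <= 0) {         /* timed out or interrupted */
                spin_lock_irqsave(&oct->ctrl_req_lock, flags);
                list_del(&d->list);
                spin_unlock_irqrestore(&oct->ctrl_req_lock, flags);
                return ret ? (int)ret : -EAGAIN;
        }

        return 0;
}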



* Re: [EXT] Re: [PATCH net v1] octeon_ep: initialise control mbox tasks before using APIs
  2023-12-05  6:50     ` [EXT] " Shinas Rasheed
@ 2023-12-05  9:00       ` Michal Schmidt
  0 siblings, 0 replies; 5+ messages in thread
From: Michal Schmidt @ 2023-12-05  9:00 UTC
  To: Shinas Rasheed
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Haseeb Gani,
	Vimlesh Kumar, egallen@redhat.com, pabeni@redhat.com,
	horms@kernel.org, kuba@kernel.org, davem@davemloft.net,
	wizhao@redhat.com, konguyen@redhat.com, Veerasenareddy Burru,
	Sathesh B Edara, Eric Dumazet

On Tue, Dec 5, 2023 at 7:50 AM Shinas Rasheed <srasheed@marvell.com> wrote:
> > -----Original Message-----
> > > On Sat, Dec 2, 2023 at 4:08 PM Shinas Rasheed <srasheed@marvell.com>
> > wrote:
> > > > Do INIT_WORK for the various workqueue tasks before the first
> > > > invocation of any control net APIs. Since octep_ctrl_net_get_info
> > > > was called before the control net receive work task was even
> > > > initialised, the function call wasn't returning actual firmware
> > > > info queried from Octeon.
> > >
> > > It might be more accurate to say that octep_ctrl_net_get_info depends
> > > on the processing of OEI events. This happens in intr_poll_task.
> > > That's why intr_poll_task needs to be queued earlier.
>
> Intr_poll_task is queued only when the interface is down and the PF cannot catch IRQs as they have been torn down.
> Elsewise, OEI events will trigger the OEI IRQ and consequently its handler.

Right. octep_ctrl_net_get_info is called from the probe function, and
at this point the netdev is not even registered yet. Hence the need
for intr_poll_task.
The reason I started wondering about intr_poll_task is that the commit
message talks about the INIT_WORK calls, but the patch also moves the
queue_delayed_work call, and the reasoning for that move is missing from
the message.
I think the move is correct, but please expand the description.

> Nevertheless, your point is correct in that it
> needs to be queued earlier, but I think subsequently since it calls the control mbox task, that is more relevant and necessary as if it
> is not initialized, it cannot be scheduled even if OEI interrupts have been caught.

OK.

> > > Did octep_send_mbox_req previously always fail with EAGAIN after
> >           ^^^^^^^^^^^^^^^^^^^^^
> > I meant octep_ctrl_net_get_info here.
> >
> > > running into the 500 ms timeout in octep_send_mbox_req?
>
> Yes it did, but as it was silent (note that we're not checking any error value), it didn't stop operation. I think I might have to update this patch
> to catch the error values as well (This was a relic from the original code which spawned an extra thread to setup device and hence couldn't give back
> an error value. That implementation was discouraged and we setup things at probe itself in the upstreamed code and can check error values)

Yes, please, catch that error value.

> > > Apropos octep_send_mbox_req... I think it has a race. "d" is put on
> > > the ctrl_req_wait_list after sending the request to the hardware. If
> > > the response arrives quickly, "d" might not yet be on the list when
> > > process_mbox_resp looks for it.
> > > Also, what protects ctrl_req_wait_list from concurrent access?
>
> Such a race condition is, I also think, valid, but is not currently occurring as response, after due processing from Octeon application,
> wouldn't arrive that quickly. Regarding concurrent access, there is currently no protection for ctrl_req_wait_list. Concurrent access here,
> can only happen if either two requests manage to get hold of the ctrl_req_wait_list or a request and a response manages to get hold of the
> ctrl_req_wait_list (the case you stated above).
>
> In the first case, since locks are implemented atop the control mbox itself, requests would have to in effect wait for their chance to
> queue their wait data "d" to ctrl_req_wait_list, avoiding concurrent access.
>
> The second case is valid, but as I stated, wouldn't happen practically. But I suppose we do have to handle all theoretical cases and perhaps
> locking can be done. I suppose a separate patch for it might be better.

Yes, fixing this should be a separate patch.

Thanks,
Michal


