All the mail mirrored from lore.kernel.org
* [Bug Report] RDMA/core: test_qpex.py attempts invalid MW bind operation
@ 2021-06-07 21:50 Pearson, Robert B
  2021-06-08  4:41 ` Leon Romanovsky
  0 siblings, 1 reply; 12+ messages in thread
From: Pearson, Robert B @ 2021-06-07 21:50 UTC (permalink / raw)
  To: Jason Gunthorpe, RDMA mailing list

Sorry, this time without the HTML.

======================================================================
ERROR: test_qp_ex_rc_bind_mw (tests.test_qpex.QpExTestCase)
Verify bind memory window operation using the new post_send API.
----------------------------------------------------------------------
Traceback (most recent call last):
   File "/home/rpearson/src/rdma-core/tests/test_qpex.py", line 292, in test_qp_ex_rc_bind_mw
     u.poll_cq(server.cq)
   File "/home/rpearson/src/rdma-core/tests/utils.py", line 538, in poll_cq
     raise PyverbsRDMAError('Completion status is {s}'.
pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is Memory window bind error. Errno: 6, No such device or address

This test attempts to bind a type 2 MW to an MR that was not registered
with MW bind access, yet it expects the bind to succeed.

Bob Pearson



* Re: [Bug Report] RDMA/core: test_qpex.py attempts invalid MW bind operation
  2021-06-07 21:50 [Bug Report] RDMA/core: test_qpex.py attempts invalid MW bind operation Pearson, Robert B
@ 2021-06-08  4:41 ` Leon Romanovsky
  2021-06-08  4:54   ` Pearson, Robert B
  0 siblings, 1 reply; 12+ messages in thread
From: Leon Romanovsky @ 2021-06-08  4:41 UTC (permalink / raw)
  To: Pearson, Robert B; +Cc: Jason Gunthorpe, RDMA mailing list

On Mon, Jun 07, 2021 at 04:50:20PM -0500, Pearson, Robert B wrote:
> sorry/this time without the HTML.
> 
> ======================================================================
> ERROR: test_qp_ex_rc_bind_mw (tests.test_qpex.QpExTestCase)
> Verify bind memory window operation using the new post_send API.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/rpearson/src/rdma-core/tests/test_qpex.py", line 292, in
> test_qp_ex_rc_bind_mw
>     u.poll_cq(server.cq)
>   File "/home/rpearson/src/rdma-core/tests/utils.py", line 538, in poll_cq
>     raise PyverbsRDMAError('Completion status is {s}'.
> pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is Memory window
> bind error. Errno: 6, No such device or address
> 
> This test attempts to bind a type 2 MW to an MR that does not have bind mw
> access set and expects the test to succeed.

Does the test break after your MW series? Or does it break only with
code that has not been merged yet?

Generally speaking, we expect developers to run the rdma-core tests and
fix/extend them prior to submission.

Thanks

> 
> Bob Pearson
> 


* Re: [Bug Report] RDMA/core: test_qpex.py attempts invalid MW bind operation
  2021-06-08  4:41 ` Leon Romanovsky
@ 2021-06-08  4:54   ` Pearson, Robert B
  2021-06-08  6:47     ` Leon Romanovsky
  0 siblings, 1 reply; 12+ messages in thread
From: Pearson, Robert B @ 2021-06-08  4:54 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Jason Gunthorpe, RDMA mailing list


On 6/7/2021 11:41 PM, Leon Romanovsky wrote:
> On Mon, Jun 07, 2021 at 04:50:20PM -0500, Pearson, Robert B wrote:
>> sorry/this time without the HTML.
>>
>> ======================================================================
>> ERROR: test_qp_ex_rc_bind_mw (tests.test_qpex.QpExTestCase)
>> Verify bind memory window operation using the new post_send API.
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>    File "/home/rpearson/src/rdma-core/tests/test_qpex.py", line 292, in
>> test_qp_ex_rc_bind_mw
>>      u.poll_cq(server.cq)
>>    File "/home/rpearson/src/rdma-core/tests/utils.py", line 538, in poll_cq
>>      raise PyverbsRDMAError('Completion status is {s}'.
>> pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is Memory window
>> bind error. Errno: 6, No such device or address
>>
>> This test attempts to bind a type 2 MW to an MR that does not have bind mw
>> access set and expects the test to succeed.
> Does the test break after your MW series? Or will it break not-merged
> code yet?
>
> Generally speaking, we expect that developers run rdma-core tests and
> fixed/extend them prior to the submission.
>
> Thanks
>
>> Bob Pearson

Nope. I don't have real RNICs at home to test. But (see my note to Zhu)
the non-extended APIs do set the access flags correctly and the extended
test case does not. The wr_bind_mw() function can't fix this for the
test case. The test has to set the access flags when it creates the MR,
and it didn't. It is possible that mlx5 doesn't check the bind access
flag, but that seems unlikely.
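
To make the rule concrete, here is a minimal Python sketch of what I mean: the access flags are fixed when the MR is created, and a bind can only succeed if they include MW bind permission. The names `Access`, `ModelMR` and `bind_mw`, the status strings, and the flag bit values are all hypothetical, chosen for illustration. This is a model of the check, not pyverbs or provider code.

```python
from dataclasses import dataclass
from enum import IntFlag


class Access(IntFlag):
    # Illustrative bit values only; real tests should use e.IBV_ACCESS_*.
    LOCAL_WRITE = 1
    REMOTE_WRITE = 2
    MW_BIND = 4


@dataclass(frozen=True)
class ModelMR:
    """Hypothetical MR whose access flags are frozen at creation time."""
    access: Access


def bind_mw(mr: ModelMR) -> str:
    """Return a completion status string for binding a window to `mr`."""
    # The bind operation cannot add permissions the MR was not created with.
    if not (mr.access & Access.MW_BIND):
        return "MW_BIND_ERR"
    return "SUCCESS"


# MR created the way the extended test did it: no MW bind access.
print(bind_mw(ModelMR(Access.REMOTE_WRITE)))
# MR created with MW bind access as well.
print(bind_mw(ModelMR(Access.REMOTE_WRITE | Access.MW_BIND)))
```

The first call models the failing test and the second models an MR registered with both flags.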

Bob



* Re: [Bug Report] RDMA/core: test_qpex.py attempts invalid MW bind operation
  2021-06-08  4:54   ` Pearson, Robert B
@ 2021-06-08  6:47     ` Leon Romanovsky
  2021-06-08 11:53       ` Edward Srouji
  2021-06-08 16:46       ` Pearson, Robert B
  0 siblings, 2 replies; 12+ messages in thread
From: Leon Romanovsky @ 2021-06-08  6:47 UTC (permalink / raw)
  To: Pearson, Robert B; +Cc: Jason Gunthorpe, RDMA mailing list

On Mon, Jun 07, 2021 at 11:54:29PM -0500, Pearson, Robert B wrote:
> 
> On 6/7/2021 11:41 PM, Leon Romanovsky wrote:
> > On Mon, Jun 07, 2021 at 04:50:20PM -0500, Pearson, Robert B wrote:
> > > sorry/this time without the HTML.
> > > 
> > > ======================================================================
> > > ERROR: test_qp_ex_rc_bind_mw (tests.test_qpex.QpExTestCase)
> > > Verify bind memory window operation using the new post_send API.
> > > ----------------------------------------------------------------------
> > > Traceback (most recent call last):
> > >    File "/home/rpearson/src/rdma-core/tests/test_qpex.py", line 292, in
> > > test_qp_ex_rc_bind_mw
> > >      u.poll_cq(server.cq)
> > >    File "/home/rpearson/src/rdma-core/tests/utils.py", line 538, in poll_cq
> > >      raise PyverbsRDMAError('Completion status is {s}'.
> > > pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is Memory window
> > > bind error. Errno: 6, No such device or address
> > > 
> > > This test attempts to bind a type 2 MW to an MR that does not have bind mw
> > > access set and expects the test to succeed.
> > Does the test break after your MW series? Or will it break not-merged
> > code yet?
> > 
> > Generally speaking, we expect that developers run rdma-core tests and
> > fixed/extend them prior to the submission.
> > 
> > Thanks
> > 
> > > Bob Pearson
> 
> Nope. I don't have real RNICs at home to test. But (see my note to Zhu) the
> non extended APIs do set the access flags correctly and the extended test
> case does not. The wr_bind_mw() function can't fix this for the test case.
> It has to set the access flags when it creates the MR and it didn't. It is
> possible that mlx5 doesn't check the bind access flag but that seems
> unlikely.

mlx5 devices support MW types 1 and 2, and the kernel checks that only
these types are accepted from user space. This is why mlx5 doesn't need
to check access flags again.

   static int ib_uverbs_alloc_mw(struct uverbs_attr_bundle *attrs)
   {

....

           if (cmd.mw_type != IB_MW_TYPE_1 && cmd.mw_type != IB_MW_TYPE_2) {
                   ret = -EINVAL;
                   goto err_put;
           }


Thanks

> 
> Bob
> 


* Re: [Bug Report] RDMA/core: test_qpex.py attempts invalid MW bind operation
  2021-06-08  6:47     ` Leon Romanovsky
@ 2021-06-08 11:53       ` Edward Srouji
  2021-06-08 15:54         ` Pearson, Robert B
  2021-06-08 16:46       ` Pearson, Robert B
  1 sibling, 1 reply; 12+ messages in thread
From: Edward Srouji @ 2021-06-08 11:53 UTC (permalink / raw)
  To: Leon Romanovsky, Pearson, Robert B; +Cc: Jason Gunthorpe, RDMA mailing list


On 6/8/2021 9:47 AM, Leon Romanovsky wrote:
> On Mon, Jun 07, 2021 at 11:54:29PM -0500, Pearson, Robert B wrote:
>> On 6/7/2021 11:41 PM, Leon Romanovsky wrote:
>>> On Mon, Jun 07, 2021 at 04:50:20PM -0500, Pearson, Robert B wrote:
>>>> sorry/this time without the HTML.
>>>>
>>>> ======================================================================
>>>> ERROR: test_qp_ex_rc_bind_mw (tests.test_qpex.QpExTestCase)
>>>> Verify bind memory window operation using the new post_send API.
>>>> ----------------------------------------------------------------------
>>>> Traceback (most recent call last):
>>>>     File "/home/rpearson/src/rdma-core/tests/test_qpex.py", line 292, in
>>>> test_qp_ex_rc_bind_mw
>>>>       u.poll_cq(server.cq)
>>>>     File "/home/rpearson/src/rdma-core/tests/utils.py", line 538, in poll_cq
>>>>       raise PyverbsRDMAError('Completion status is {s}'.
>>>> pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is Memory window
>>>> bind error. Errno: 6, No such device or address
>>>>
>>>> This test attempts to bind a type 2 MW to an MR that does not have bind mw
>>>> access set and expects the test to succeed.

You're right, looks like a test bug. I'll send a fix upstream.

Can you please confirm that this solves your issue:

diff --git a/tests/test_qpex.py b/tests/test_qpex.py
index 4b58260f..c2d67ee8 100644
--- a/tests/test_qpex.py
+++ b/tests/test_qpex.py
@@ -149,7 +149,7 @@ class QpExRCBindMw(RCResources):
          create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW)

      def create_mr(self):
-        self.mr = u.create_custom_mr(self, e.IBV_ACCESS_REMOTE_WRITE)
+        self.mr = u.create_custom_mr(self, e.IBV_ACCESS_REMOTE_WRITE | e.IBV_ACCESS_MW_BIND)

>>> Does the test break after your MW series? Or will it break not-merged
>>> code yet?
>>>
>>> Generally speaking, we expect that developers run rdma-core tests and
>>> fixed/extend them prior to the submission.
>>>
>>> Thanks
>>>
>>>> Bob Pearson
>> Nope. I don't have real RNICs at home to test. But (see my note to Zhu) the
>> non extended APIs do set the access flags correctly and the extended test
>> case does not. The wr_bind_mw() function can't fix this for the test case.
>> It has to set the access flags when it creates the MR and it didn't. It is
>> possible that mlx5 doesn't check the bind access flag but that seems
>> unlikely.
> mlx5 devices support MW 1 & 2 and kernel checks that only these types
> can be accepted from the user space. This is why mlx5 doesn't need to
> check access flags again.
>
>     903 static int ib_uverbs_alloc_mw(struct uverbs_attr_bundle *attrs)
>     904 {
>
> ....
>
>     927         if (cmd.mw_type != IB_MW_TYPE_1 && cmd.mw_type != IB_MW_TYPE_2) {
>     928                 ret = -EINVAL;
>     929                 goto err_put;
>     930         }
>
>
> Thanks

I see that mlx5 checks the access flags in userspace only if MW_DEBUG is 
turned on (in set_bind_wr()).

I guess that's for the sake of performance, as it's part of the data path.

>> Bob
>>


* Re: [Bug Report] RDMA/core: test_qpex.py attempts invalid MW bind operation
  2021-06-08 11:53       ` Edward Srouji
@ 2021-06-08 15:54         ` Pearson, Robert B
  2021-06-08 16:10           ` Edward Srouji
  2021-06-08 16:12           ` Pearson, Robert B
  0 siblings, 2 replies; 12+ messages in thread
From: Pearson, Robert B @ 2021-06-08 15:54 UTC (permalink / raw)
  To: Edward Srouji, Leon Romanovsky; +Cc: Jason Gunthorpe, RDMA mailing list


On 6/8/2021 6:53 AM, Edward Srouji wrote:
>
> On 6/8/2021 9:47 AM, Leon Romanovsky wrote:
>> On Mon, Jun 07, 2021 at 11:54:29PM -0500, Pearson, Robert B wrote:
>>> On 6/7/2021 11:41 PM, Leon Romanovsky wrote:
>>>> On Mon, Jun 07, 2021 at 04:50:20PM -0500, Pearson, Robert B wrote:
>>>>> sorry/this time without the HTML.
>>>>>
>>>>> ====================================================================== 
>>>>>
>>>>> ERROR: test_qp_ex_rc_bind_mw (tests.test_qpex.QpExTestCase)
>>>>> Verify bind memory window operation using the new post_send API.
>>>>> ---------------------------------------------------------------------- 
>>>>>
>>>>> Traceback (most recent call last):
>>>>>     File "/home/rpearson/src/rdma-core/tests/test_qpex.py", line 
>>>>> 292, in
>>>>> test_qp_ex_rc_bind_mw
>>>>>       u.poll_cq(server.cq)
>>>>>     File "/home/rpearson/src/rdma-core/tests/utils.py", line 538, 
>>>>> in poll_cq
>>>>>       raise PyverbsRDMAError('Completion status is {s}'.
>>>>> pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is 
>>>>> Memory window
>>>>> bind error. Errno: 6, No such device or address
>>>>>
>>>>> This test attempts to bind a type 2 MW to an MR that does not have 
>>>>> bind mw
>>>>> access set and expects the test to succeed.
>
> You're right, looks like a test bug. I'll send a fix upstream.
>
> Can you please confirm that this solves your issue:
Well I get further. I am hitting a seg fault in python at

         client.qp.wr_rdma_write(new_key, server.mr.buf)

in test_qp_ex_rc_bind_mw.

I'm trying to track it down. I'm not very familiar with python and don't 
know how to run the test under gdb.

Thanks for the fix.

Bob

>
> diff --git a/tests/test_qpex.py b/tests/test_qpex.py
> index 4b58260f..c2d67ee8 100644
> --- a/tests/test_qpex.py
> +++ b/tests/test_qpex.py
> @@ -149,7 +149,7 @@ class QpExRCBindMw(RCResources):
>          create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW)
>
>      def create_mr(self):
> -        self.mr = u.create_custom_mr(self, e.IBV_ACCESS_REMOTE_WRITE)
> +        self.mr = u.create_custom_mr(self, e.IBV_ACCESS_REMOTE_WRITE 
> | e.IBV_ACCESS_MW_BIND)
>
>>>> Does the test break after your MW series? Or will it break not-merged
>>>> code yet?
>>>>
>>>> Generally speaking, we expect that developers run rdma-core tests and
>>>> fixed/extend them prior to the submission.
>>>>
>>>> Thanks
>>>>
>>>>> Bob Pearson
>>> Nope. I don't have real RNICs at home to test. But (see my note to 
>>> Zhu) the
>>> non extended APIs do set the access flags correctly and the extended 
>>> test
>>> case does not. The wr_bind_mw() function can't fix this for the test 
>>> case.
>>> It has to set the access flags when it creates the MR and it didn't. 
>>> It is
>>> possible that mlx5 doesn't check the bind access flag but that seems
>>> unlikely.
>> mlx5 devices support MW 1 & 2 and kernel checks that only these types
>> can be accepted from the user space. This is why mlx5 doesn't need to
>> check access flags again.
>>
>>     903 static int ib_uverbs_alloc_mw(struct uverbs_attr_bundle *attrs)
>>     904 {
>>
>> ....
>>
>>     927         if (cmd.mw_type != IB_MW_TYPE_1 && cmd.mw_type != 
>> IB_MW_TYPE_2) {
>>     928                 ret = -EINVAL;
>>     929                 goto err_put;
>>     930         }
>>
>>
>> Thanks
>
> I see that mlx5 checks the access flags in userspace only if MW_DEBUG 
> is turned on (in set_bind_wr()).
>
> I guess that's for the sake of performance, as it's part of the data 
> path.
>
>>> Bob
>>>


* Re: [Bug Report] RDMA/core: test_qpex.py attempts invalid MW bind operation
  2021-06-08 15:54         ` Pearson, Robert B
@ 2021-06-08 16:10           ` Edward Srouji
  2021-06-08 16:12           ` Pearson, Robert B
  1 sibling, 0 replies; 12+ messages in thread
From: Edward Srouji @ 2021-06-08 16:10 UTC (permalink / raw)
  To: Pearson, Robert B, Leon Romanovsky; +Cc: Jason Gunthorpe, RDMA mailing list


On 6/8/2021 6:54 PM, Pearson, Robert B wrote:
> On 6/8/2021 6:53 AM, Edward Srouji wrote:
>>
>> On 6/8/2021 9:47 AM, Leon Romanovsky wrote:
>>> On Mon, Jun 07, 2021 at 11:54:29PM -0500, Pearson, Robert B wrote:
>>>> On 6/7/2021 11:41 PM, Leon Romanovsky wrote:
>>>>> On Mon, Jun 07, 2021 at 04:50:20PM -0500, Pearson, Robert B wrote:
>>>>>> sorry/this time without the HTML.
>>>>>>
>>>>>> ====================================================================== 
>>>>>>
>>>>>>
>>>>>> ERROR: test_qp_ex_rc_bind_mw (tests.test_qpex.QpExTestCase)
>>>>>> Verify bind memory window operation using the new post_send API.
>>>>>> ---------------------------------------------------------------------- 
>>>>>>
>>>>>>
>>>>>> Traceback (most recent call last):
>>>>>>     File "/home/rpearson/src/rdma-core/tests/test_qpex.py", line
>>>>>> 292, in
>>>>>> test_qp_ex_rc_bind_mw
>>>>>>       u.poll_cq(server.cq)
>>>>>>     File "/home/rpearson/src/rdma-core/tests/utils.py", line 538,
>>>>>> in poll_cq
>>>>>>       raise PyverbsRDMAError('Completion status is {s}'.
>>>>>> pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is
>>>>>> Memory window
>>>>>> bind error. Errno: 6, No such device or address
>>>>>>
>>>>>> This test attempts to bind a type 2 MW to an MR that does not have
>>>>>> bind mw
>>>>>> access set and expects the test to succeed.
>>
>> You're right, looks like a test bug. I'll send a fix upstream.
>>
>> Can you please confirm that this solves your issue:
> Well I get further. I am hitting a seg fault in python at
>
>         client.qp.wr_rdma_write(new_key, server.mr.buf)
>
> in test_qp_ex_rc_bind_mw.
>
> I'm trying to track it down. I'm not very familiar with python and don't
> know how to run the test under gdb.
>
I don't see the issue on mlx devices / rxe.

You can run Python under gdb using: "gdb python"

Then: "run <program>", e.g. "run tests/run_tests.py -k test_qp_ex_rc_bind_mw"
(if you ran the command from the rdma-core root dir).

You can set breakpoints for C functions as you regularly do. If you want
to add a breakpoint to the Python test, I suggest you add
"import pdb; pdb.set_trace()" at the line where you want the breakpoint.
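
If it helps, the same invocation can be assembled as a single command line. The sketch below is only a convenience wrapper: the helper name `gdb_command` and the default interpreter name `python3` are assumptions, and it relies on gdb's `--args` option to pass the test runner and its arguments as the program to debug. You still type "run" at the gdb prompt.

```python
import shlex


def gdb_command(test_name: str, python: str = "python3",
                runner: str = "tests/run_tests.py") -> str:
    """Build a shell command that starts gdb with the test runner as its program."""
    # Everything after --args is treated by gdb as the program and its arguments.
    argv = ["gdb", "--args", python, runner, "-k", test_name]
    return shlex.join(argv)


# Run the printed command from the rdma-core root dir.
print(gdb_command("test_qp_ex_rc_bind_mw"))
```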

Good luck.

> Thanks for the fix.
>
> Bob
>
>>
>> diff --git a/tests/test_qpex.py b/tests/test_qpex.py
>> index 4b58260f..c2d67ee8 100644
>> --- a/tests/test_qpex.py
>> +++ b/tests/test_qpex.py
>> @@ -149,7 +149,7 @@ class QpExRCBindMw(RCResources):
>>          create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW)
>>
>>      def create_mr(self):
>> -        self.mr = u.create_custom_mr(self, e.IBV_ACCESS_REMOTE_WRITE)
>> +        self.mr = u.create_custom_mr(self, e.IBV_ACCESS_REMOTE_WRITE
>> | e.IBV_ACCESS_MW_BIND)
>>
>>>>> Does the test break after your MW series? Or will it break not-merged
>>>>> code yet?
>>>>>
>>>>> Generally speaking, we expect that developers run rdma-core tests and
>>>>> fixed/extend them prior to the submission.
>>>>>
>>>>> Thanks
>>>>>
>>>>>> Bob Pearson
>>>> Nope. I don't have real RNICs at home to test. But (see my note to
>>>> Zhu) the
>>>> non extended APIs do set the access flags correctly and the extended
>>>> test
>>>> case does not. The wr_bind_mw() function can't fix this for the test
>>>> case.
>>>> It has to set the access flags when it creates the MR and it didn't.
>>>> It is
>>>> possible that mlx5 doesn't check the bind access flag but that seems
>>>> unlikely.
>>> mlx5 devices support MW 1 & 2 and kernel checks that only these types
>>> can be accepted from the user space. This is why mlx5 doesn't need to
>>> check access flags again.
>>>
>>>     903 static int ib_uverbs_alloc_mw(struct uverbs_attr_bundle *attrs)
>>>     904 {
>>>
>>> ....
>>>
>>>     927         if (cmd.mw_type != IB_MW_TYPE_1 && cmd.mw_type !=
>>> IB_MW_TYPE_2) {
>>>     928                 ret = -EINVAL;
>>>     929                 goto err_put;
>>>     930         }
>>>
>>>
>>> Thanks
>>
>> I see that mlx5 checks the access flags in userspace only if MW_DEBUG
>> is turned on (in set_bind_wr()).
>>
>> I guess that's for the sake of performance, as it's part of the data
>> path.
>>
>>>> Bob
>>>>


* Re: [Bug Report] RDMA/core: test_qpex.py attempts invalid MW bind operation
  2021-06-08 15:54         ` Pearson, Robert B
  2021-06-08 16:10           ` Edward Srouji
@ 2021-06-08 16:12           ` Pearson, Robert B
  2021-06-08 16:22             ` Pearson, Robert B
  1 sibling, 1 reply; 12+ messages in thread
From: Pearson, Robert B @ 2021-06-08 16:12 UTC (permalink / raw)
  To: Edward Srouji, Leon Romanovsky; +Cc: Jason Gunthorpe, RDMA mailing list


On 6/8/2021 10:54 AM, Pearson, Robert B wrote:
>
> On 6/8/2021 6:53 AM, Edward Srouji wrote:
>>
>> On 6/8/2021 9:47 AM, Leon Romanovsky wrote:
>>> On Mon, Jun 07, 2021 at 11:54:29PM -0500, Pearson, Robert B wrote:
>>>> On 6/7/2021 11:41 PM, Leon Romanovsky wrote:
>>>>> On Mon, Jun 07, 2021 at 04:50:20PM -0500, Pearson, Robert B wrote:
>>>>>> sorry/this time without the HTML.
>>>>>>
>>>>>> ====================================================================== 
>>>>>>
>>>>>> ERROR: test_qp_ex_rc_bind_mw (tests.test_qpex.QpExTestCase)
>>>>>> Verify bind memory window operation using the new post_send API.
>>>>>> ---------------------------------------------------------------------- 
>>>>>>
>>>>>> Traceback (most recent call last):
>>>>>>     File "/home/rpearson/src/rdma-core/tests/test_qpex.py", line 
>>>>>> 292, in
>>>>>> test_qp_ex_rc_bind_mw
>>>>>>       u.poll_cq(server.cq)
>>>>>>     File "/home/rpearson/src/rdma-core/tests/utils.py", line 538, 
>>>>>> in poll_cq
>>>>>>       raise PyverbsRDMAError('Completion status is {s}'.
>>>>>> pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is 
>>>>>> Memory window
>>>>>> bind error. Errno: 6, No such device or address
>>>>>>
>>>>>> This test attempts to bind a type 2 MW to an MR that does not 
>>>>>> have bind mw
>>>>>> access set and expects the test to succeed.
>>
>> You're right, looks like a test bug. I'll send a fix upstream.
>>
>> Can you please confirm that this solves your issue:
> Well I get further. I am hitting a seg fault in python at
>
>         client.qp.wr_rdma_write(new_key, server.mr.buf)
>
> in test_qp_ex_rc_bind_mw.
>
> I'm trying to track it down. I'm not very familiar with python and 
> don't know how to run the test under gdb.
>
> Thanks for the fix.
>
> Bob

OK got it. In the setup for the test you write

     class QpExRCBindMw(RCResources):
         def create_qps(self):
             create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW)

         def create_mr(self):
             self.mr = u.create_custom_mr(self, e.IBV_ACCESS_REMOTE_WRITE |
                             e.IBV_ACCESS_MW_BIND)

which asks for qp_ex->wr_bind_mw() to be set but later in the test you write

     client.qp.wr_rdma_write(new_key, server.mr.buf)

which calls qp_ex->wr_rdma_write(), which is not set, causing the seg
fault. I think you should have written

             create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW |
                          e.IBV_QP_EX_WITH_RDMA_WRITE)

since you need both extended QP operations.
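
As a rough illustration of why the missing flag matters, here is a small Python model. It assumes, as described above, that only the operations requested at creation time get a handler. The names `SendOps`, `ModelQPEx` and `post_rdma_write` and the flag bit values are hypothetical. In C, the call through an unset function pointer is what crashes; the model raises an exception in its place.

```python
from enum import IntFlag


class SendOps(IntFlag):
    # Illustrative bit values only; real tests should use e.IBV_QP_EX_WITH_*.
    RDMA_WRITE = 1
    BIND_MW = 2


class ModelQPEx:
    """Hypothetical extended QP: only requested operations get a handler."""

    def __init__(self, send_ops: SendOps):
        self.wr_bind_mw = self._bind_mw if send_ops & SendOps.BIND_MW else None
        self.wr_rdma_write = self._rdma_write if send_ops & SendOps.RDMA_WRITE else None

    def _bind_mw(self) -> str:
        return "bind_mw posted"

    def _rdma_write(self) -> str:
        return "rdma_write posted"


def post_rdma_write(qp: ModelQPEx) -> str:
    # Stand-in for the C call through qp_ex->wr_rdma_write.
    if qp.wr_rdma_write is None:
        raise RuntimeError("wr_rdma_write was not requested at QP creation")
    return qp.wr_rdma_write()


# Both operations requested, as in the suggested create_qp_ex call.
print(post_rdma_write(ModelQPEx(SendOps.BIND_MW | SendOps.RDMA_WRITE)))
```

Creating the model QP with only `SendOps.BIND_MW` and then calling `post_rdma_write` raises, which corresponds to the crash seen in the test.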

Bob

>
>>
>> diff --git a/tests/test_qpex.py b/tests/test_qpex.py
>> index 4b58260f..c2d67ee8 100644
>> --- a/tests/test_qpex.py
>> +++ b/tests/test_qpex.py
>> @@ -149,7 +149,7 @@ class QpExRCBindMw(RCResources):
>>          create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW)
>>
>>      def create_mr(self):
>> -        self.mr = u.create_custom_mr(self, e.IBV_ACCESS_REMOTE_WRITE)
>> +        self.mr = u.create_custom_mr(self, e.IBV_ACCESS_REMOTE_WRITE 
>> | e.IBV_ACCESS_MW_BIND)
>>
>>>>> Does the test break after your MW series? Or will it break not-merged
>>>>> code yet?
>>>>>
>>>>> Generally speaking, we expect that developers run rdma-core tests and
>>>>> fixed/extend them prior to the submission.
>>>>>
>>>>> Thanks
>>>>>
>>>>>> Bob Pearson
>>>> Nope. I don't have real RNICs at home to test. But (see my note to 
>>>> Zhu) the
>>>> non extended APIs do set the access flags correctly and the 
>>>> extended test
>>>> case does not. The wr_bind_mw() function can't fix this for the 
>>>> test case.
>>>> It has to set the access flags when it creates the MR and it 
>>>> didn't. It is
>>>> possible that mlx5 doesn't check the bind access flag but that seems
>>>> unlikely.
>>> mlx5 devices support MW 1 & 2 and kernel checks that only these types
>>> can be accepted from the user space. This is why mlx5 doesn't need to
>>> check access flags again.
>>>
>>>     903 static int ib_uverbs_alloc_mw(struct uverbs_attr_bundle *attrs)
>>>     904 {
>>>
>>> ....
>>>
>>>     927         if (cmd.mw_type != IB_MW_TYPE_1 && cmd.mw_type != 
>>> IB_MW_TYPE_2) {
>>>     928                 ret = -EINVAL;
>>>     929                 goto err_put;
>>>     930         }
>>>
>>>
>>> Thanks
>>
>> I see that mlx5 checks the access flags in userspace only if MW_DEBUG 
>> is turned on (in set_bind_wr()).
>>
>> I guess that's for the sake of performance, as it's part of the data 
>> path.
>>
>>>> Bob
>>>>


* Re: [Bug Report] RDMA/core: test_qpex.py attempts invalid MW bind operation
  2021-06-08 16:12           ` Pearson, Robert B
@ 2021-06-08 16:22             ` Pearson, Robert B
  2021-06-08 17:14               ` Edward Srouji
  0 siblings, 1 reply; 12+ messages in thread
From: Pearson, Robert B @ 2021-06-08 16:22 UTC (permalink / raw)
  To: Edward Srouji, Leon Romanovsky; +Cc: Jason Gunthorpe, RDMA mailing list


On 6/8/2021 11:12 AM, Pearson, Robert B wrote:
>
> On 6/8/2021 10:54 AM, Pearson, Robert B wrote:
>>
>> On 6/8/2021 6:53 AM, Edward Srouji wrote:
>>>
>>> On 6/8/2021 9:47 AM, Leon Romanovsky wrote:
>>>> On Mon, Jun 07, 2021 at 11:54:29PM -0500, Pearson, Robert B wrote:
>>>>> On 6/7/2021 11:41 PM, Leon Romanovsky wrote:
>>>>>> On Mon, Jun 07, 2021 at 04:50:20PM -0500, Pearson, Robert B wrote:
>>>>>>> sorry/this time without the HTML.
>>>>>>>
>>>>>>> ====================================================================== 
>>>>>>>
>>>>>>> ERROR: test_qp_ex_rc_bind_mw (tests.test_qpex.QpExTestCase)
>>>>>>> Verify bind memory window operation using the new post_send API.
>>>>>>> ---------------------------------------------------------------------- 
>>>>>>>
>>>>>>> Traceback (most recent call last):
>>>>>>>     File "/home/rpearson/src/rdma-core/tests/test_qpex.py", line 
>>>>>>> 292, in
>>>>>>> test_qp_ex_rc_bind_mw
>>>>>>>       u.poll_cq(server.cq)
>>>>>>>     File "/home/rpearson/src/rdma-core/tests/utils.py", line 
>>>>>>> 538, in poll_cq
>>>>>>>       raise PyverbsRDMAError('Completion status is {s}'.
>>>>>>> pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is 
>>>>>>> Memory window
>>>>>>> bind error. Errno: 6, No such device or address
>>>>>>>
>>>>>>> This test attempts to bind a type 2 MW to an MR that does not 
>>>>>>> have bind mw
>>>>>>> access set and expects the test to succeed.
>>>
>>> You're right, looks like a test bug. I'll send a fix upstream.
>>>
>>> Can you please confirm that this solves your issue:
>> Well I get further. I am hitting a seg fault in python at
>>
>>         client.qp.wr_rdma_write(new_key, server.mr.buf)
>>
>> in test_qp_ex_rc_bind_mw.
>>
>> I'm trying to track it down. I'm not very familiar with python and 
>> don't know how to run the test under gdb.
>>
>> Thanks for the fix.
>>
>> Bob
>
> OK got it. In the setup for the test you write
>
>     class QpExRCBindMw(RCResources):
>         def create_qps(self):
>             create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW)
>
>         def create_mr(self):
>             self.mr = u.create_custom_mr(self, 
> e.IBV_ACCESS_REMOTE_WRITE |
>                             e.IBV_ACCESS_MW_BIND)
>
> which asks for qp_ex->wr_bind_mw() to be set but later in the test you 
> write
>
>     client.qp.wr_rdma_write(new_key, server.mr.buf)
>
> which calls qp_ex->wr_rdma_write() which is not set causing the seg 
> fault. I think you should have written
>
>             create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW 
> | e.IBV_QP_EX_WITH_RDMA_WRITE)
>
> since you need both extended QP operations.
>
> Bob

With this patch the test now runs correctly:

diff --git a/tests/test_qpex.py b/tests/test_qpex.py
index 20288d45..0316bfcb 100644
--- a/tests/test_qpex.py
+++ b/tests/test_qpex.py
@@ -146,10 +146,12 @@ class QpExRCAtomicFetchAdd(RCResources):

  class QpExRCBindMw(RCResources):
      def create_qps(self):
-        create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW)
+        create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW |
+                       e.IBV_QP_EX_WITH_RDMA_WRITE)

      def create_mr(self):
-        self.mr = u.create_custom_mr(self, e.IBV_ACCESS_REMOTE_WRITE)
+        self.mr = u.create_custom_mr(self, e.IBV_ACCESS_REMOTE_WRITE |
+                       e.IBV_ACCESS_MW_BIND)


  class QpExTestCase(RDMATestCase):

>
>>
>>>
>>> diff --git a/tests/test_qpex.py b/tests/test_qpex.py
>>> index 4b58260f..c2d67ee8 100644
>>> --- a/tests/test_qpex.py
>>> +++ b/tests/test_qpex.py
>>> @@ -149,7 +149,7 @@ class QpExRCBindMw(RCResources):
>>>          create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW)
>>>
>>>      def create_mr(self):
>>> -        self.mr = u.create_custom_mr(self, e.IBV_ACCESS_REMOTE_WRITE)
>>> +        self.mr = u.create_custom_mr(self, 
>>> e.IBV_ACCESS_REMOTE_WRITE | e.IBV_ACCESS_MW_BIND)
>>>
>>>>>> Does the test break after your MW series? Or will it break 
>>>>>> not-merged
>>>>>> code yet?
>>>>>>
>>>>>> Generally speaking, we expect that developers run rdma-core tests 
>>>>>> and
>>>>>> fixed/extend them prior to the submission.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>> Bob Pearson
>>>>> Nope. I don't have real RNICs at home to test. But (see my note to 
>>>>> Zhu) the
>>>>> non extended APIs do set the access flags correctly and the 
>>>>> extended test
>>>>> case does not. The wr_bind_mw() function can't fix this for the 
>>>>> test case.
>>>>> It has to set the access flags when it creates the MR and it 
>>>>> didn't. It is
>>>>> possible that mlx5 doesn't check the bind access flag but that seems
>>>>> unlikely.
>>>> mlx5 devices support MW 1 & 2 and kernel checks that only these types
>>>> can be accepted from the user space. This is why mlx5 doesn't need to
>>>> check access flags again.
>>>>
>>>>     903 static int ib_uverbs_alloc_mw(struct uverbs_attr_bundle 
>>>> *attrs)
>>>>     904 {
>>>>
>>>> ....
>>>>
>>>>     927         if (cmd.mw_type != IB_MW_TYPE_1 && cmd.mw_type != 
>>>> IB_MW_TYPE_2) {
>>>>     928                 ret = -EINVAL;
>>>>     929                 goto err_put;
>>>>     930         }
>>>>
>>>>
>>>> Thanks
>>>
>>> I see that mlx5 checks the access flags in userspace only if 
>>> MW_DEBUG is turned on (in set_bind_wr()).
>>>
>>> I guess that's for the sake of performance, as it's part of the data 
>>> path.
>>>
>>>>> Bob
>>>>>


* Re: [Bug Report] RDMA/core: test_qpex.py attempts invalid MW bind operation
  2021-06-08  6:47     ` Leon Romanovsky
  2021-06-08 11:53       ` Edward Srouji
@ 2021-06-08 16:46       ` Pearson, Robert B
  1 sibling, 0 replies; 12+ messages in thread
From: Pearson, Robert B @ 2021-06-08 16:46 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Jason Gunthorpe, RDMA mailing list


On 6/8/2021 1:47 AM, Leon Romanovsky wrote:
> On Mon, Jun 07, 2021 at 11:54:29PM -0500, Pearson, Robert B wrote:
>> On 6/7/2021 11:41 PM, Leon Romanovsky wrote:
>>> On Mon, Jun 07, 2021 at 04:50:20PM -0500, Pearson, Robert B wrote:
>>>> sorry/this time without the HTML.
>>>>
>>>> ======================================================================
>>>> ERROR: test_qp_ex_rc_bind_mw (tests.test_qpex.QpExTestCase)
>>>> Verify bind memory window operation using the new post_send API.
>>>> ----------------------------------------------------------------------
>>>> Traceback (most recent call last):
>>>>     File "/home/rpearson/src/rdma-core/tests/test_qpex.py", line 292, in
>>>> test_qp_ex_rc_bind_mw
>>>>       u.poll_cq(server.cq)
>>>>     File "/home/rpearson/src/rdma-core/tests/utils.py", line 538, in poll_cq
>>>>       raise PyverbsRDMAError('Completion status is {s}'.
>>>> pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is Memory window
>>>> bind error. Errno: 6, No such device or address
>>>>
>>>> This test attempts to bind a type 2 MW to an MR that does not have bind mw
>>>> access set and expects the test to succeed.
>>> Does the test break after your MW series? Or will it break not-merged
>>> code yet?
>>>
>>> Generally speaking, we expect that developers run rdma-core tests and
>>> fixed/extend them prior to the submission.
>>>
>>> Thanks
>>>
>>>> Bob Pearson
>> Nope. I don't have real RNICs at home to test. But (see my note to Zhu) the
>> non extended APIs do set the access flags correctly and the extended test
>> case does not. The wr_bind_mw() function can't fix this for the test case.
>> It has to set the access flags when it creates the MR and it didn't. It is
>> possible that mlx5 doesn't check the bind access flag but that seems
>> unlikely.
> mlx5 devices support MW 1 & 2 and kernel checks that only these types
> can be accepted from the user space. This is why mlx5 doesn't need to
> check access flags again.
>
>     903 static int ib_uverbs_alloc_mw(struct uverbs_attr_bundle *attrs)
>     904 {
>
> ....
>
>     927         if (cmd.mw_type != IB_MW_TYPE_1 && cmd.mw_type != IB_MW_TYPE_2) {
>     928                 ret = -EINVAL;
>     929                 goto err_put;
>     930         }
>
>
> Thanks

You check the type in alloc_mw, but you only check the MR access flags if 
MW_DEBUG is set, which it is not by default. So you would fail a negative 
test, which is sort of what we currently have. The second bug in the test, 
which we found this morning, is that it does not correctly set the op flags 
in the create_qp_ex call. According to the man page, only the extended 
operations set in the op flags are 'implemented' for that QP. Apparently 
mlx5 goes ahead and populates them anyway. That makes more sense to me, 
since the API as described is kind of overwrought and dumb.
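
For anyone following along, the two test bugs can be sketched with a toy 
Python model. This does not use pyverbs: the names ToyQpEx, ToyMr, SendOps 
and Access and their bit values are made up for illustration, and a real 
provider reports these failures differently (a bind-error completion and a 
seg fault in this thread, not Python exceptions).

```python
from enum import IntFlag


class SendOps(IntFlag):
    # Hypothetical stand-ins for the IBV_QP_EX_WITH_* op flags.
    # Bit values are illustrative only, not copied from verbs.h.
    RDMA_WRITE = 1 << 0
    BIND_MW = 1 << 1


class Access(IntFlag):
    # Hypothetical stand-ins for the IBV_ACCESS_* MR access flags.
    REMOTE_WRITE = 1 << 0
    MW_BIND = 1 << 1


class ToyMr:
    """Toy memory region that only remembers its access flags."""

    def __init__(self, access):
        self.access = access


class ToyQpEx:
    """Toy extended QP: only ops requested at creation are usable."""

    def __init__(self, send_ops):
        self.send_ops = send_ops

    def _require(self, op):
        # Models "only the ops set in the op flags are implemented".
        if not self.send_ops & op:
            raise NotImplementedError(f"{op.name} not requested at QP creation")

    def wr_bind_mw(self, mr):
        self._require(SendOps.BIND_MW)
        # Models the check that the MR allows memory window binding.
        if not mr.access & Access.MW_BIND:
            raise PermissionError("MR was not created with MW_BIND access")
        return "bound"

    def wr_rdma_write(self):
        self._require(SendOps.RDMA_WRITE)
        return "written"


# Setup corresponding to the fixed test: both op flags, both access flags.
fixed_qp = ToyQpEx(SendOps.BIND_MW | SendOps.RDMA_WRITE)
fixed_mr = ToyMr(Access.REMOTE_WRITE | Access.MW_BIND)
fixed_qp.wr_bind_mw(fixed_mr)
fixed_qp.wr_rdma_write()
```

With the original test setup (only the bind op flag and only remote write 
access) both calls are rejected in this model, which corresponds to the 
two failures seen with the test.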

Bob



* Re: [Bug Report] RDMA/core: test_qpex.py attempts invalid MW bind operation
  2021-06-08 16:22             ` Pearson, Robert B
@ 2021-06-08 17:14               ` Edward Srouji
  2021-06-08 17:36                 ` Pearson, Robert B
  0 siblings, 1 reply; 12+ messages in thread
From: Edward Srouji @ 2021-06-08 17:14 UTC (permalink / raw)
  To: Pearson, Robert B, Leon Romanovsky; +Cc: Jason Gunthorpe, RDMA mailing list


On 6/8/2021 7:22 PM, Pearson, Robert B wrote:
>
> On 6/8/2021 11:12 AM, Pearson, Robert B wrote:
>>
>> On 6/8/2021 10:54 AM, Pearson, Robert B wrote:
>>>
>>> On 6/8/2021 6:53 AM, Edward Srouji wrote:
>>>>
>>>> On 6/8/2021 9:47 AM, Leon Romanovsky wrote:
>>>>> On Mon, Jun 07, 2021 at 11:54:29PM -0500, Pearson, Robert B wrote:
>>>>>> On 6/7/2021 11:41 PM, Leon Romanovsky wrote:
>>>>>>> On Mon, Jun 07, 2021 at 04:50:20PM -0500, Pearson, Robert B wrote:
>>>>>>>> sorry/this time without the HTML.
>>>>>>>>
>>>>>>>> ======================================================================
>>>>>>>> ERROR: test_qp_ex_rc_bind_mw (tests.test_qpex.QpExTestCase)
>>>>>>>> Verify bind memory window operation using the new post_send API.
>>>>>>>> ----------------------------------------------------------------------
>>>>>>>> Traceback (most recent call last):
>>>>>>>>     File "/home/rpearson/src/rdma-core/tests/test_qpex.py", line
>>>>>>>> 292, in
>>>>>>>> test_qp_ex_rc_bind_mw
>>>>>>>>       u.poll_cq(server.cq)
>>>>>>>>     File "/home/rpearson/src/rdma-core/tests/utils.py", line
>>>>>>>> 538, in poll_cq
>>>>>>>>       raise PyverbsRDMAError('Completion status is {s}'.
>>>>>>>> pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is
>>>>>>>> Memory window
>>>>>>>> bind error. Errno: 6, No such device or address
>>>>>>>>
>>>>>>>> This test attempts to bind a type 2 MW to an MR that does not
>>>>>>>> have bind mw
>>>>>>>> access set and expects the test to succeed.
>>>>
>>>> You're right, looks like a test bug. I'll send a fix upstream.
>>>>
>>>> Can you please confirm that this solves your issue:
>>> Well I get further. I am hitting a seg fault in python at
>>>
>>>         client.qp.wr_rdma_write(new_key, server.mr.buf)
>>>
>>> in test_qp_ex_rc_bind_mw.
>>>
>>> I'm trying to track it down. I'm not very familiar with python and
>>> don't know how to run the test under gdb.
>>>
>>> Thanks for the fix.
>>>
>>> Bob
>>
>> OK got it. In the setup for the test you write
>>
>>     class QpExRCBindMw(RCResources):
>>         def create_qps(self):
>>             create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW)
>>
>>         def create_mr(self):
>>             self.mr = u.create_custom_mr(self,
>> e.IBV_ACCESS_REMOTE_WRITE |
>>                             e.IBV_ACCESS_MW_BIND)
>>
>> which asks for qp_ex->wr_bind_mw() to be set but later in the test you
>> write
>>
>>     client.qp.wr_rdma_write(new_key, server.mr.buf)
>>
>> which calls qp_ex->wr_rdma_write() which is not set causing the seg
>> fault. I think you should have written
>>
>>             create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW
>> | e.IBV_QP_EX_WITH_RDMA_WRITE)
>>
>> since you need both extended QP operations.
>>
>> Bob
>
> With this patch the test is now running correctly
>
> diff --git a/tests/test_qpex.py b/tests/test_qpex.py
> index 20288d45..0316bfcb 100644
> --- a/tests/test_qpex.py
> +++ b/tests/test_qpex.py
> @@ -146,10 +146,12 @@ class QpExRCAtomicFetchAdd(RCResources):
>
>  class QpExRCBindMw(RCResources):
>      def create_qps(self):
> -        create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW)
> +        create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW |
> +                       e.IBV_QP_EX_WITH_RDMA_WRITE)
>
>      def create_mr(self):
> -        self.mr = u.create_custom_mr(self, e.IBV_ACCESS_REMOTE_WRITE)
> +        self.mr = u.create_custom_mr(self, e.IBV_ACCESS_REMOTE_WRITE |
> +                       e.IBV_ACCESS_MW_BIND)
>
>
I've sent a fix patch upstream (you can see it on GitHub).
>  class QpExTestCase(RDMATestCase):
>
>>
>>>
>>>>
>>>> diff --git a/tests/test_qpex.py b/tests/test_qpex.py
>>>> index 4b58260f..c2d67ee8 100644
>>>> --- a/tests/test_qpex.py
>>>> +++ b/tests/test_qpex.py
>>>> @@ -149,7 +149,7 @@ class QpExRCBindMw(RCResources):
>>>>          create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW)
>>>>
>>>>      def create_mr(self):
>>>> -        self.mr = u.create_custom_mr(self, e.IBV_ACCESS_REMOTE_WRITE)
>>>> +        self.mr = u.create_custom_mr(self,
>>>> e.IBV_ACCESS_REMOTE_WRITE | e.IBV_ACCESS_MW_BIND)
>>>>
>>>>>>> Does the test break after your MW series? Or will it break
>>>>>>> not-merged
>>>>>>> code yet?
>>>>>>>
>>>>>>> Generally speaking, we expect that developers run rdma-core tests
>>>>>>> and
>>>>>>> fixed/extend them prior to the submission.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>> Bob Pearson
>>>>>> Nope. I don't have real RNICs at home to test. But (see my note to
>>>>>> Zhu) the
>>>>>> non extended APIs do set the access flags correctly and the
>>>>>> extended test
>>>>>> case does not. The wr_bind_mw() function can't fix this for the
>>>>>> test case.
>>>>>> It has to set the access flags when it creates the MR and it
>>>>>> didn't. It is
>>>>>> possible that mlx5 doesn't check the bind access flag but that seems
>>>>>> unlikely.
>>>>> mlx5 devices support MW 1 & 2 and kernel checks that only these types
>>>>> can be accepted from the user space. This is why mlx5 doesn't need to
>>>>> check access flags again.
>>>>>
>>>>>     903 static int ib_uverbs_alloc_mw(struct uverbs_attr_bundle
>>>>> *attrs)
>>>>>     904 {
>>>>>
>>>>> ....
>>>>>
>>>>>     927         if (cmd.mw_type != IB_MW_TYPE_1 && cmd.mw_type !=
>>>>> IB_MW_TYPE_2) {
>>>>>     928                 ret = -EINVAL;
>>>>>     929                 goto err_put;
>>>>>     930         }
>>>>>
>>>>>
>>>>> Thanks
>>>>
>>>> I see that mlx5 checks the access flags in userspace only if
>>>> MW_DEBUG is turned on (in set_bind_wr()).
>>>>
>>>> I guess that's for the sake of performance, as it's part of the data
>>>> path.
>>>>
>>>>>> Bob
>>>>>>


* Re: [Bug Report] RDMA/core: test_qpex.py attempts invalid MW bind operation
  2021-06-08 17:14               ` Edward Srouji
@ 2021-06-08 17:36                 ` Pearson, Robert B
  0 siblings, 0 replies; 12+ messages in thread
From: Pearson, Robert B @ 2021-06-08 17:36 UTC (permalink / raw)
  To: Edward Srouji, Leon Romanovsky; +Cc: Jason Gunthorpe, RDMA mailing list


On 6/8/2021 12:14 PM, Edward Srouji wrote:
>
> On 6/8/2021 7:22 PM, Pearson, Robert B wrote:
>>
>> On 6/8/2021 11:12 AM, Pearson, Robert B wrote:
>>>
>>> On 6/8/2021 10:54 AM, Pearson, Robert B wrote:
>>>>
>>>> On 6/8/2021 6:53 AM, Edward Srouji wrote:
>>>>>
>>>>> On 6/8/2021 9:47 AM, Leon Romanovsky wrote:
>>>>>> On Mon, Jun 07, 2021 at 11:54:29PM -0500, Pearson, Robert B wrote:
>>>>>>> On 6/7/2021 11:41 PM, Leon Romanovsky wrote:
>>>>>>>> On Mon, Jun 07, 2021 at 04:50:20PM -0500, Pearson, Robert B wrote:
>>>>>>>>> sorry/this time without the HTML.
>>>>>>>>>
>>>>>>>>> ======================================================================
>>>>>>>>> ERROR: test_qp_ex_rc_bind_mw (tests.test_qpex.QpExTestCase)
>>>>>>>>> Verify bind memory window operation using the new post_send API.
>>>>>>>>> ----------------------------------------------------------------------
>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>     File "/home/rpearson/src/rdma-core/tests/test_qpex.py", line
>>>>>>>>> 292, in
>>>>>>>>> test_qp_ex_rc_bind_mw
>>>>>>>>>       u.poll_cq(server.cq)
>>>>>>>>>     File "/home/rpearson/src/rdma-core/tests/utils.py", line
>>>>>>>>> 538, in poll_cq
>>>>>>>>>       raise PyverbsRDMAError('Completion status is {s}'.
>>>>>>>>> pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is
>>>>>>>>> Memory window
>>>>>>>>> bind error. Errno: 6, No such device or address
>>>>>>>>>
>>>>>>>>> This test attempts to bind a type 2 MW to an MR that does not
>>>>>>>>> have bind mw
>>>>>>>>> access set and expects the test to succeed.
>>>>>
>>>>> You're right, looks like a test bug. I'll send a fix upstream.
>>>>>
>>>>> Can you please confirm that this solves your issue:
>>>> Well I get further. I am hitting a seg fault in python at
>>>>
>>>>         client.qp.wr_rdma_write(new_key, server.mr.buf)
>>>>
>>>> in test_qp_ex_rc_bind_mw.
>>>>
>>>> I'm trying to track it down. I'm not very familiar with python and
>>>> don't know how to run the test under gdb.
>>>>
>>>> Thanks for the fix.
>>>>
>>>> Bob
>>>
>>> OK got it. In the setup for the test you write
>>>
>>>     class QpExRCBindMw(RCResources):
>>>         def create_qps(self):
>>>             create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW)
>>>
>>>         def create_mr(self):
>>>             self.mr = u.create_custom_mr(self,
>>> e.IBV_ACCESS_REMOTE_WRITE |
>>>                             e.IBV_ACCESS_MW_BIND)
>>>
>>> which asks for qp_ex->wr_bind_mw() to be set but later in the test you
>>> write
>>>
>>>     client.qp.wr_rdma_write(new_key, server.mr.buf)
>>>
>>> which calls qp_ex->wr_rdma_write() which is not set causing the seg
>>> fault. I think you should have written
>>>
>>>             create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW
>>> | e.IBV_QP_EX_WITH_RDMA_WRITE)
>>>
>>> since you need both extended QP operations.
>>>
>>> Bob
>>
>> With this patch the test is now running correctly
>>
>> diff --git a/tests/test_qpex.py b/tests/test_qpex.py
>> index 20288d45..0316bfcb 100644
>> --- a/tests/test_qpex.py
>> +++ b/tests/test_qpex.py
>> @@ -146,10 +146,12 @@ class QpExRCAtomicFetchAdd(RCResources):
>>
>>  class QpExRCBindMw(RCResources):
>>      def create_qps(self):
>> -        create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW)
>> +        create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW |
>> +                       e.IBV_QP_EX_WITH_RDMA_WRITE)
>>
>>      def create_mr(self):
>> -        self.mr = u.create_custom_mr(self, e.IBV_ACCESS_REMOTE_WRITE)
>> +        self.mr = u.create_custom_mr(self, e.IBV_ACCESS_REMOTE_WRITE |
>> +                       e.IBV_ACCESS_MW_BIND)
>>
>>
> I've sent a fix patch upstream (you can see it on GitHub).
Thanks!
>>  class QpExTestCase(RDMATestCase):
>>
>>>
>>>>
>>>>>
>>>>> diff --git a/tests/test_qpex.py b/tests/test_qpex.py
>>>>> index 4b58260f..c2d67ee8 100644
>>>>> --- a/tests/test_qpex.py
>>>>> +++ b/tests/test_qpex.py
>>>>> @@ -149,7 +149,7 @@ class QpExRCBindMw(RCResources):
>>>>>          create_qp_ex(self, e.IBV_QPT_RC, e.IBV_QP_EX_WITH_BIND_MW)
>>>>>
>>>>>      def create_mr(self):
>>>>> -        self.mr = u.create_custom_mr(self, 
>>>>> e.IBV_ACCESS_REMOTE_WRITE)
>>>>> +        self.mr = u.create_custom_mr(self,
>>>>> e.IBV_ACCESS_REMOTE_WRITE | e.IBV_ACCESS_MW_BIND)
>>>>>
>>>>>>>> Does the test break after your MW series? Or will it break
>>>>>>>> not-merged
>>>>>>>> code yet?
>>>>>>>>
>>>>>>>> Generally speaking, we expect that developers run rdma-core tests
>>>>>>>> and
>>>>>>>> fixed/extend them prior to the submission.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>> Bob Pearson
>>>>>>> Nope. I don't have real RNICs at home to test. But (see my note to
>>>>>>> Zhu) the
>>>>>>> non extended APIs do set the access flags correctly and the
>>>>>>> extended test
>>>>>>> case does not. The wr_bind_mw() function can't fix this for the
>>>>>>> test case.
>>>>>>> It has to set the access flags when it creates the MR and it
>>>>>>> didn't. It is
>>>>>>> possible that mlx5 doesn't check the bind access flag but that 
>>>>>>> seems
>>>>>>> unlikely.
>>>>>> mlx5 devices support MW 1 & 2 and kernel checks that only these 
>>>>>> types
>>>>>> can be accepted from the user space. This is why mlx5 doesn't 
>>>>>> need to
>>>>>> check access flags again.
>>>>>>
>>>>>>     903 static int ib_uverbs_alloc_mw(struct uverbs_attr_bundle
>>>>>> *attrs)
>>>>>>     904 {
>>>>>>
>>>>>> ....
>>>>>>
>>>>>>     927         if (cmd.mw_type != IB_MW_TYPE_1 && cmd.mw_type !=
>>>>>> IB_MW_TYPE_2) {
>>>>>>     928                 ret = -EINVAL;
>>>>>>     929                 goto err_put;
>>>>>>     930         }
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>
>>>>> I see that mlx5 checks the access flags in userspace only if
>>>>> MW_DEBUG is turned on (in set_bind_wr()).
>>>>>
>>>>> I guess that's for the sake of performance, as it's part of the data
>>>>> path.
>>>>>
>>>>>>> Bob
>>>>>>>


end of thread, other threads:[~2021-06-08 17:36 UTC | newest]

Thread overview: 12+ messages
2021-06-07 21:50 [Bug Report] RDMA/core: test_qpex.py attempts invalid MW bind operation Pearson, Robert B
2021-06-08  4:41 ` Leon Romanovsky
2021-06-08  4:54   ` Pearson, Robert B
2021-06-08  6:47     ` Leon Romanovsky
2021-06-08 11:53       ` Edward Srouji
2021-06-08 15:54         ` Pearson, Robert B
2021-06-08 16:10           ` Edward Srouji
2021-06-08 16:12           ` Pearson, Robert B
2021-06-08 16:22             ` Pearson, Robert B
2021-06-08 17:14               ` Edward Srouji
2021-06-08 17:36                 ` Pearson, Robert B
2021-06-08 16:46       ` Pearson, Robert B
