* [LSF/VM TOPIC] Handling of invalid requests in virtual HBAs
@ 2010-04-01  8:15 Hannes Reinecke
  2010-04-02  5:33 ` Nicholas A. Bellinger
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Hannes Reinecke @ 2010-04-01  8:15 UTC (permalink / raw
  To: lsf10-pc; +Cc: SCSI Mailing List

Hi all,

[Topic]
Handling of invalid requests in virtual HBAs

[Abstract]
This discussion will focus on the problem of correct request handling with virtual HBAs.
For KVM I have implemented a 'megasas' HBA emulation which serves as a backend for the
megaraid_sas Linux driver.
It is now possible to connect several disks from different (physical) HBAs to that
HBA emulation, each having different logical capabilities wrt transfer size,
sgl size, sgl length, etc.

The goal of this discussion is how to determine the 'best' capability setting for the
virtual HBA and how to handle hotplug scenarios, where a disk might be plugged in
whose settings are incompatible with those the virtual HBA is currently using.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

* Re: [LSF/VM TOPIC] Handling of invalid requests in virtual HBAs
  2010-04-01  8:15 [LSF/VM TOPIC] Handling of invalid requests in virtual HBAs Hannes Reinecke
@ 2010-04-02  5:33 ` Nicholas A. Bellinger
  2010-04-08 13:44   ` Hannes Reinecke
  2010-04-10 15:31 ` Vladislav Bolkhovitin
  2010-05-10  3:16 ` FUJITA Tomonori
  2 siblings, 1 reply; 13+ messages in thread
From: Nicholas A. Bellinger @ 2010-04-02  5:33 UTC (permalink / raw
  To: Hannes Reinecke; +Cc: lsf10-pc, SCSI Mailing List

On Thu, 2010-04-01 at 10:15 +0200, Hannes Reinecke wrote:
> Hi all,
> 

Greetings Hannes,

Just a few comments on your proposal..

> [Topic]
> Handling of invalid requests in virtual HBAs
> 
> [Abstract]
> This discussion will focus on the problem of correct request handling with virtual HBAs.
> For KVM I have implemented a 'megasas' HBA emulation which serves as a backend for the
> megaraid_sas linux driver.
> It is now possible to connect several disks from different (physical) HBAs to that
> HBA emulation, each having different logical capabilities wrt transfersize,
> sgl size, sgl length etc.
> 
> The goal of this discussion is how to determine the 'best' capability setting for the
> virtual HBA and how to handle hotplug scenarios, where a disk might be plugged in
> which has incompatible settings from the one the virtual HBA is using currently.
> 

Most of what you are describing here in terms of having a kernel target
enforce underlying LLD limitations for LUNs is already available in TCM
v3.x.  Current TCM code will automatically handle the processing of a
single DATA_SG_IO CDB generated by KVM Guest + megasas emulation that
exceeds the underlying LLD max_sectors, and generate the multiple
internal se_task_t's needed to complete the original I/O generated by
KVM Guest + megasas.
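
For illustration, the arithmetic behind that split is roughly the following
(a minimal user-space sketch, not the actual TCM code; the names are made up):

#include <stdio.h>

/* Sketch: split one large I/O into chunks the LLD can accept;
 * each chunk would correspond to one internal task. */
static void split_io(unsigned long lba, unsigned int sectors,
                     unsigned int lld_max_sectors)
{
        while (sectors) {
                unsigned int chunk = (sectors < lld_max_sectors) ?
                                     sectors : lld_max_sectors;
                printf("task: lba %lu, %u sectors\n", lba, chunk);
                lba += chunk;
                sectors -= chunk;
        }
}

int main(void)
{
        /* e.g. a 1024-sector CDB from the guest against an LLD
         * limited to max_sectors=128 becomes 8 internal tasks */
        split_io(0, 1024, 128);
        return 0;
}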

This is one example, but the main underlying question wrt TCM and
interaction with Linux subsystems has historically been:

What values should be enforced by TCM based on metadata presented by TCM
subsystem plugins (pSCSI, IBLOCK, FILEIO) for struct block_device, and
what is expected to be enforced by underlying Linux subsystems
presenting struct block_device..?

For the virtual TCM subsystem plugin cases (IBLOCK, FILEIO, RAMDISK) the
can_queue is a completely arbitrary value and is enforced by the
underlying Linux subsystem.  There are a couple of special cases:

*) For TCM/pSCSI, can_queue is enforced from struct scsi_device->queue_depth
   and max_sectors from the smaller of struct Scsi_Host->max_sectors
   and struct scsi_device->request_queue->limits.max_sectors (see the sketch below).

*) For TCM/IBLOCK, max_sectors is enforced based on struct request_queue->limits.max_sectors.

*) For TCM/FILEIO and TCM/RAMDISK, both can_queue and max_sectors are
   set to arbitrarily high values.
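
The TCM/pSCSI rule above amounts to roughly the following (an illustrative
sketch only, not the actual TCM code):

#include <linux/kernel.h>
#include <linux/blkdev.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>

/* Sketch: derive the limits TCM enforces for a pSCSI backstore
 * from the referenced scsi_device and its parent Scsi_Host. */
static void tcm_pscsi_limits_sketch(struct scsi_device *sdev,
                                    unsigned int *depth,
                                    unsigned int *max_sectors)
{
        *depth = sdev->queue_depth;
        *max_sectors = min_t(unsigned int, sdev->host->max_sectors,
                             queue_max_sectors(sdev->request_queue));
}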

Also I should mention that TCM_Loop code currently uses a hardcoded
struct scsi_host_template->can_queue=1 and ->max_sectors=128, but will
work fine with larger values.  Being able to change the Linux/SCSI
queue depth on the fly for TCM_Loop virtual SAS ports being used in a KVM
guest could be quite useful for managing KVM Guest megasas emulation I/O
traffic on a larger scale..
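
In scsi_host_template terms, those hardcoded values look roughly like this
(sketch only; everything beyond the two values mentioned above is illustrative):

#include <scsi/scsi_host.h>

static struct scsi_host_template tcm_loop_sht_sketch = {
        .name           = "TCM_Loop sketch",
        .can_queue      = 1,    /* hardcoded today, as noted above */
        .max_sectors    = 128,  /* ditto; larger values work fine */
        .sg_tablesize   = 256,  /* illustrative */
        .this_id        = -1,   /* illustrative */
};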

The other big advantage of using TCM_Loop with your megasas guest
emulation is that existing TCM logic for >= SPC-3 T10 NAA naming, PR,
and ALUA emulation is immediately available to the KVM guest, and does not
have to be reproduced in QEMU code.

Who knows, it might be interesting to be able to control KVM Guest disks
using ALUA primary and secondary access states, or even share a single
TCM_Loop virtual SAS Port across multiple KVM Guests for cluster
purposes using persistent reservations..!

Best,

--nab



* Re: [LSF/VM TOPIC] Handling of invalid requests in virtual HBAs
  2010-04-02  5:33 ` Nicholas A. Bellinger
@ 2010-04-08 13:44   ` Hannes Reinecke
  2010-04-10 23:50     ` Nicholas A. Bellinger
  0 siblings, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2010-04-08 13:44 UTC (permalink / raw
  To: linux-iscsi-target-dev; +Cc: lsf10-pc, SCSI Mailing List

Nicholas A. Bellinger wrote:
> On Thu, 2010-04-01 at 10:15 +0200, Hannes Reinecke wrote:
>> Hi all,
>>
> 
> Greetings Hannes,
> 
> Just a few comments on your proposal..
> 
>> [Topic]
>> Handling of invalid requests in virtual HBAs
>>
>> [Abstract]
>> This discussion will focus on the problem of correct request handling with virtual HBAs.
>> For KVM I have implemented a 'megasas' HBA emulation which serves as a backend for the
>> megaraid_sas linux driver.
>> It is now possible to connect several disks from different (physical) HBAs to that
>> HBA emulation, each having different logical capabilities wrt transfersize,
>> sgl size, sgl length etc.
>>
>> The goal of this discussion is how to determine the 'best' capability setting for the
>> virtual HBA and how to handle hotplug scenarios, where a disk might be plugged in
>> which has incompatible settings from the one the virtual HBA is using currently.
>>
> 
> Most of what you are describing here in terms of having a kernel target
> enforce underlying LLD limitiations for LUNs is already available in TCM
> v3.x.  Current TCM code will automatically handle the processing of a
> single DATA_SG_IO CDB generated by KVM Guest + megasas emulation that
> exceeds the underlying LLD max_sectors, and generate the multiple
> internal se_task_t's in order to complete the original I/O generated by
> KVM Guest + megasas.
> 

Hmm, yes.

> This is one example but the main underlying question wrt to TCM and
> interaction with Linux subsystems has historically been:
> 
> What values should be enforced by TCM based on metadata presented by TCM
> subsystem plugins (pSCSI, IBLOCK, FILEIO) for struct block_device, and
> what is expected to be enforced by underlying Linux subsystems
> presenting struct block_device..?
> 
> For the virtual TCM subsystem plugin cases (IBLOCK, FILEIO, RAMDISK) the
> can_queue is a competely arbitary value and is enforced by the
> underyling Linux subsystem.  There are a couple of special cases:
> 
> *) For TCM/pSCSI, can_queue is enforced from struct scsi_device->queue_depth
>    and max_sectors from the smaller of the two values from struct Scsi_Host->max_sectors
>    and struct scsi_device->request_queue->limits.max_sectors.
> 
> *) For TCM/IBLOCK, max_sectors is enforced based on struct request_queue->limits.max_sectors.
> 
> *) For TCM/FILEIO and TCM/RAMDISK, both can_queue and max_sectors are
>    set to arbitrarly high values.
> 
> Also I should mention that TCM_Loop code currently uses a hardcoded
> struct scsi_host_template->can_queue=1 and ->max_sectors=128, but will
> work fine with larger values.   Being able to change the Linux/SCSI
> queue depth on the fly for TCM_Loop virtual SAS ports being used in KVM
> guest could be quite useful for managing KVM Guest megasas emulation I/O
> traffic on a larger scale..
> 
And my question / topic here is how to handle hotplug in these
cases: What happens if a device / HBA is plugged in with different / lower
capabilities than those announced?
Can we change the announced settings for the HBA on the fly?

> The other big advantage of using TCM_Loop with your megasas guest
> emulation means that existing TCM logic for >= SPC-3 T10 NAA naming, PR,
> and ALUA emulation is immediately available to KVM guest, and does not
> have to be reproduced in QEMU code.
> 
I'm not doubting that using TCM_Loop here would be advantageous.
But I have to find a solution for folks just wanting to run on plain /dev/sdX.

And I need to find common ground here to argue with the KVM folks,
whose main objection to the megasas emulation is this issue.

Either way would be fine by me, I just think we should come to a common
understanding.

My initial idea here was to just pass the request back as partially completed;
that would solve the issue nicely.

Sadly, the SCSI midlayer always interprets partial completion as an error :-(
Would've been really neat.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

* Re: [LSF/VM TOPIC] Handling of invalid requests in virtual HBAs
  2010-04-01  8:15 [LSF/VM TOPIC] Handling of invalid requests in virtual HBAs Hannes Reinecke
  2010-04-02  5:33 ` Nicholas A. Bellinger
@ 2010-04-10 15:31 ` Vladislav Bolkhovitin
  2010-04-13  8:56   ` Hannes Reinecke
  2010-05-10  3:16 ` FUJITA Tomonori
  2 siblings, 1 reply; 13+ messages in thread
From: Vladislav Bolkhovitin @ 2010-04-10 15:31 UTC (permalink / raw
  To: Hannes Reinecke; +Cc: lsf10-pc, SCSI Mailing List

Hello Hannes,

Hannes Reinecke, on 04/01/2010 12:15 PM wrote:
> Hi all,
> 
> [Topic]
> Handling of invalid requests in virtual HBAs
> 
> [Abstract]
> This discussion will focus on the problem of correct request handling with virtual HBAs.
> For KVM I have implemented a 'megasas' HBA emulation which serves as a backend for the
> megaraid_sas linux driver.
> It is now possible to connect several disks from different (physical) HBAs to that
> HBA emulation, each having different logical capabilities wrt transfersize,
> sgl size, sgl length etc.
> 
> The goal of this discussion is how to determine the 'best' capability setting for the
> virtual HBA and how to handle hotplug scenarios, where a disk might be plugged in
> which has incompatible settings from the one the virtual HBA is using currently.

If I understand correctly, you need to allow several KVM guests to share 
the same physical disks?

If so, then it's a bit more complicated than just matching capabilities
between physical and virtual disks, because to safely share disks some
advanced SCSI functionality has to be emulated by the mid-level, like
reservations and Unit Attentions. Otherwise, the sharing very much violates
the SCSI specs, so it is unsafe and can lead to data corruption. You can
find more info here: http://thread.gmane.org/gmane.linux.scsi/31288.

In my opinion, the simplest way would be:

1. In the host OS, use a SCSI target mid-layer capable of supporting
multi-initiator SCSI pass-through, like SCST (http://scst.sourceforge.net).

2. Write a simple two-part guest/host driver: one part would be a simple
virtual HBA driver for the guest, which would pass all requests from the
guest OS to the host OS, and the second part would be a simple target
driver for SCST, which would receive requests from the virtual HBA in the
guest and pass them to SCST on the host OS. Most likely, you have already
implemented most of the guest part of the driver in your 'megasas' HBA
emulator. On the host side, most of the functionality is already
implemented for SCST by the scst_local driver. All that is necessary is to
couple them together using some KVM transport for data between guest and host.
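
For illustration, the guest-side part could be as thin as packing the CDB and
the sg list into a message for the host (a purely hypothetical sketch; the
structure, names and transport are made up):

#include <linux/string.h>
#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_device.h>

/* Hypothetical guest->host message; layout and names are made up. */
struct vhba_msg {
        unsigned char   cdb[16];
        unsigned int    lun;
        unsigned int    sg_count;
        /* followed by sg_count guest-physical address/length pairs */
};

static int vhba_queuecommand(struct scsi_cmnd *cmd,
                             void (*done)(struct scsi_cmnd *))
{
        struct vhba_msg msg;

        memset(&msg, 0, sizeof(msg));
        memcpy(msg.cdb, cmd->cmnd, cmd->cmd_len);
        msg.lun = cmd->device->lun;
        msg.sg_count = scsi_sg_count(cmd);
        /* Hand msg plus scsi_sglist(cmd) to the KVM transport here and
         * remember 'done' to call on completion; on the host side the
         * scst_local-like target driver unpacks it and submits to SCST. */
        return 0;
}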

Thus, in this approach you need to connect the guest OSes to the already
existing SCST (we hope SCST will someday be accepted into the mainline
kernel; I'm currently preparing patches for the second review
iteration), which you need to do anyway to implement correct device
sharing. Then SCST would do the rest.

This approach doesn't create any tight coupling between physical and
virtual HBAs, so it can work with any SCSI-speaking HBAs (and even
non-SCSI-speaking ones, if a small layer to convert their requests into
SCSI commands is implemented). The exact SCSI transport (SAS, SCSI, FC, etc.)
the guest driver reports to the OS wouldn't matter much, because changing
it is a matter of changing a few constants. Hotplug would be handled
automatically.

As a bonus, KVM guests would be able to use SCST-emulated virtual
devices (files as devices, virtual CDROMs, VTLs, etc.), including ones
implemented in user space.

Obviously, this approach can be implemented with minimal overhead
(SCST is internally zero-copy).

Vlad


* Re: [LSF/VM TOPIC] Handling of invalid requests in virtual HBAs
  2010-04-08 13:44   ` Hannes Reinecke
@ 2010-04-10 23:50     ` Nicholas A. Bellinger
  0 siblings, 0 replies; 13+ messages in thread
From: Nicholas A. Bellinger @ 2010-04-10 23:50 UTC (permalink / raw
  To: linux-iscsi-target-dev; +Cc: lsf10-pc, SCSI Mailing List

On Thu, 2010-04-08 at 15:44 +0200, Hannes Reinecke wrote:
> Nicholas A. Bellinger wrote:
> > On Thu, 2010-04-01 at 10:15 +0200, Hannes Reinecke wrote:
> >> Hi all,
> >>
> > 
> > Greetings Hannes,
> > 
> > Just a few comments on your proposal..
> > 
> >> [Topic]
> >> Handling of invalid requests in virtual HBAs
> >>
> >> [Abstract]
> >> This discussion will focus on the problem of correct request handling with virtual HBAs.
> >> For KVM I have implemented a 'megasas' HBA emulation which serves as a backend for the
> >> megaraid_sas linux driver.
> >> It is now possible to connect several disks from different (physical) HBAs to that
> >> HBA emulation, each having different logical capabilities wrt transfersize,
> >> sgl size, sgl length etc.
> >>
> >> The goal of this discussion is how to determine the 'best' capability setting for the
> >> virtual HBA and how to handle hotplug scenarios, where a disk might be plugged in
> >> which has incompatible settings from the one the virtual HBA is using currently.
> >>

<SNIP>

> > What values should be enforced by TCM based on metadata presented by TCM
> > subsystem plugins (pSCSI, IBLOCK, FILEIO) for struct block_device, and
> > what is expected to be enforced by underlying Linux subsystems
> > presenting struct block_device..?
> > 
> > For the virtual TCM subsystem plugin cases (IBLOCK, FILEIO, RAMDISK) the
> > can_queue is a competely arbitary value and is enforced by the
> > underyling Linux subsystem.  There are a couple of special cases:
> > 
> > *) For TCM/pSCSI, can_queue is enforced from struct scsi_device->queue_depth
> >    and max_sectors from the smaller of the two values from struct Scsi_Host->max_sectors
> >    and struct scsi_device->request_queue->limits.max_sectors.
> > 
> > *) For TCM/IBLOCK, max_sectors is enforced based on struct request_queue->limits.max_sectors.
> > 
> > *) For TCM/FILEIO and TCM/RAMDISK, both can_queue and max_sectors are
> >    set to arbitrarly high values.
> > 
> > Also I should mention that TCM_Loop code currently uses a hardcoded
> > struct scsi_host_template->can_queue=1 and ->max_sectors=128, but will
> > work fine with larger values.   Being able to change the Linux/SCSI
> > queue depth on the fly for TCM_Loop virtual SAS ports being used in KVM
> > guest could be quite useful for managing KVM Guest megasas emulation I/O
> > traffic on a larger scale..
> > 
> And my question / topic here is how to handle a hotplug capability in these
> cases: What happens if a device / HBA is plugged in with different / lower
> capabilities than those announced?

I think this question depends a great deal upon the coupling of the
virtual HBA queue depth and the per-device queue depth reported by the
physical Linux/SCSI side.  Using the TCM/pSCSI subsystem plugin as an
example here to reference plain /dev/sdX backstores, there are two
possible modes of operation using the referenced struct scsi_device's and
their parent struct Scsi_Host's:

Virtual HBA Mode: Present an arbitrarily high virtual HBA queue depth and
allow individual struct scsi_device's from different underlying struct
Scsi_Host's to hang from a single TCM HBA.  TCM will enforce the per
device queue depth presented by struct scsi_device->queue_depth.

Physical HBA Mode: Enforce a physical LLD queue_depth from each
underlying struct Scsi_Host and all struct scsi_device's attached to it.
This is required for SCSI LLDs that report a higher struct
scsi_device->queue_depth than what the underlying hardware for the struct
Scsi_Host is capable of.  TCM will enforce the per HBA and per device queue
depths presented by the SCSI LLD.
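
In rough pseudo-code, the queue depth that ends up being enforced in the two
modes looks something like this (a loose sketch only, with made-up names, not
TCM's actual logic):

#include <linux/kernel.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>

enum tcm_hba_mode_sketch {              /* names are made up */
        VIRTUAL_HBA_MODE,
        PHYSICAL_HBA_MODE,
};

static unsigned int enforced_queue_depth(enum tcm_hba_mode_sketch mode,
                                         struct scsi_device *sdev)
{
        if (mode == VIRTUAL_HBA_MODE)
                /* HBA depth is arbitrarily high; only the per-device
                 * depth reported by the LLD is enforced */
                return sdev->queue_depth;

        /* physical mode: additionally bound by the parent host */
        return min_t(unsigned int, sdev->queue_depth,
                     sdev->host->can_queue);
}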

The main requirement for the first mode to function properly is that the
underlying Linux/SCSI LLD must present the proper struct
scsi_device->queue_depth, and the sum total of queue slots exposed by the
struct scsi_device's cannot exceed what the parent struct Scsi_Host is
capable of (this can also change based on the number of LUNs presented by
the SCSI LLD).

I ran into some buggy SCSI LLDs back in the v2.4 kernel days that reported
their queue depths improperly, but do not recall personally coming across
this issue recently on modern v2.6 drivers/scsi/ (not sure if they
are completely gone now).  So with this in mind, I added support for
virtual HBA mode (called PHV_VIRUTAL_HOST_ID, the default) while leaving
the legacy physical HBA mode available (called PHV_LLD_SCSI_HOST_NO) for
broken SCSI LLDs.  The commit for doing this with TCM/pSCSI is here:

[Target_Core_Mod/pSCSI]: Decouple subsystem plugin from struct Scsi_Host

http://git.kernel.org/?p=linux/kernel/git/nab/lio-core-2.6.git;a=commitdiff;h=da5ed2625e7690c33f776dd1a907a2739fe7f779

> Can we change the announced settings for the HBA on the fly?

In the existing TCM v3.x code, the HBA queue depth is not exposed as a
configfs attribute, so unfortunately this cannot be changed just yet..
However, the per TCM device virtual and physical queue_depth is available
at:

/sys/kernel/config/target/core/$HBA/$DEV/attrib/[hw_]queue_depth

The 'queue_depth' attribute here is what is being actively enforced by TCM
for the backstore device, and the 'hw_queue_depth' attribute is what has
been reported by TCM/pSCSI via struct scsi_device->queue_depth.

Changing 'queue_depth' for the backstore currently requires that no
fabric module port symlinks exist, but this is something that will be
changing for TCM 4.0.

Also, changing 'hw_queue_depth' from the underlying struct scsi_device for
plain /dev/sdX currently requires that the device be re-registered
with TCM.  However, it would be easy enough to do this on the fly if
there were a target mode callback present in
drivers/scsi/scsi.c:scsi_adjust_queue_depth() to tell me when the change
is happening within the LLD.  :-)
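
Something along these lines is what I mean (purely hypothetical sketch of a
hook that does not exist today; names are made up):

#include <scsi/scsi_device.h>

/* Hypothetical: a callback target mode could register to learn about
 * queue depth changes made by the LLD.  Nothing like this exists in
 * drivers/scsi/scsi.c today. */
static void (*target_qd_notify)(struct scsi_device *sdev, int new_depth);

static void scsi_adjust_queue_depth_sketch(struct scsi_device *sdev,
                                           int tagged, int tags)
{
        /* ... the existing queue depth adjustment would happen here ... */
        if (target_qd_notify)
                target_qd_notify(sdev, tags);   /* tell target mode */
}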

> 
> > The other big advantage of using TCM_Loop with your megasas guest
> > emulation means that existing TCM logic for >= SPC-3 T10 NAA naming, PR,
> > and ALUA emulation is immediately available to KVM guest, and does not
> > have to be reproduced in QEMU code.
> > 
> I'm not doubting that using TCM_loop here would be advantageous.
> But I have to find a solution for folks just wanting to run on plain /dev/sdX.
> 

Well, I think that using a scsi-debug-esque model like TCM_Loop + SG_IO
on top of a target infrastructure enforcing underlying HBA and device
requirements would give KVM Guests a lot of flexibility with existing
code, even for the plain /dev/sdX case.

> And I need to find a common ground here to argue with the KVM folks,
> whose main objection against the megasas emulation is this issue.
>
> Either way would be fine by me, I just think we should come to a common
> understanding.
> 

Completely understood.  I will give SG_IO + TCM_Loop a shot with megasas
emulation in a KVM Guest and see how things look using backstores
configured with the two HBA modes for TCM/pSCSI (plain /dev/sdX)
discussed above.

Best,

--nab




* Re: [LSF/VM TOPIC] Handling of invalid requests in virtual HBAs
  2010-04-10 15:31 ` Vladislav Bolkhovitin
@ 2010-04-13  8:56   ` Hannes Reinecke
  2010-04-13 17:09     ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2010-04-13  8:56 UTC (permalink / raw
  To: Vladislav Bolkhovitin; +Cc: lsf10-pc, SCSI Mailing List

Vladislav Bolkhovitin wrote:
> Hello Hannes,
> 
> Hannes Reinecke, on 04/01/2010 12:15 PM wrote:
>> Hi all,
>>
>> [Topic]
>> Handling of invalid requests in virtual HBAs
>>
>> [Abstract]
>> This discussion will focus on the problem of correct request handling
>> with virtual HBAs.
>> For KVM I have implemented a 'megasas' HBA emulation which serves as a
>> backend for the
>> megaraid_sas linux driver.
>> It is now possible to connect several disks from different (physical)
>> HBAs to that
>> HBA emulation, each having different logical capabilities wrt
>> transfersize,
>> sgl size, sgl length etc.
>>
>> The goal of this discussion is how to determine the 'best' capability
>> setting for the
>> virtual HBA and how to handle hotplug scenarios, where a disk might be
>> plugged in
>> which has incompatible settings from the one the virtual HBA is using
>> currently.
> 
> If I understand correctly, you need to allow several KVM guests to share
> the same physical disks?
> 
No, the other way round: a KVM guest is using several physical disks,
each of which comes via a different HBA (eg sda from libata, sdb from lpfc
and the like).
So each request queue for the physical disks could have different
capabilities, while being routed through the same virtual HBA in the
KVM guest.

The general idea for the virtual HBA is that scatter-gather lists
could be passed directly from the guest to the host (as opposed to
passing only abstract single I/O blocks, like virtio does).
But the size and shape of the sg lists differ for devices
coming from different HBAs, so we have two options here (this is
all done on the host side; the guest will only see one HBA):

a) Adjust the sg list to match the underlying capabilities of
   the device. This has the drawback that we defeat the elevator
   mechanism on the guest side, as the announced capabilities
   there do _not_ match the capabilities on the host :-(
b) Adjust the HBA capabilities to the lowest common denominator
   of all physical devices presented to the guest.
   While this would save us from adjusting the sg lists,
   it still has the drawback that disk hotplugging won't
   work, as we can't readjust the HBA parameters in the
   guest after it's been created.

Neither of which is really appealing.

My idea here would be to move all required capabilities
to the device/request queue.
That would neatly solve this issue once and for all.
And even TGT, LIO-target, and SCST would benefit from this
methinks.
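
Roughly what I mean, sketched with the current block-layer helpers (field and
helper names may differ between kernel versions; this is only an illustration):

#include <linux/blkdev.h>

/* Sketch: derive the per-device capabilities the virtual HBA would
 * honour from the underlying request queue, rather than announcing
 * one fixed set of limits per HBA. */
static void fill_vdisk_caps(struct request_queue *q,
                            unsigned int *max_sectors,
                            unsigned int *max_segments,
                            unsigned int *max_segment_size)
{
        *max_sectors      = queue_max_sectors(q);
        *max_segments     = queue_max_segments(q);
        *max_segment_size = queue_max_segment_size(q);
}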

But this is exactly the discussion I'd like to have at LSF,
to see which approach is best or favoured.

And yes, I am perfectly aware that for a 'production'
system one would be using a proper target emulator
like LIO-target or SCST for this kind of setup.
But first I have to convince the KVM/Qemu folks to
actually include the megasas emulation.
Which they won't until the above problem is solved.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

* Re: [LSF/VM TOPIC] Handling of invalid requests in virtual HBAs
  2010-04-13  8:56   ` Hannes Reinecke
@ 2010-04-13 17:09     ` Vladislav Bolkhovitin
  2010-04-13 18:37       ` Nicholas A. Bellinger
  0 siblings, 1 reply; 13+ messages in thread
From: Vladislav Bolkhovitin @ 2010-04-13 17:09 UTC (permalink / raw
  To: Hannes Reinecke; +Cc: lsf10-pc, SCSI Mailing List

Hannes Reinecke, on 04/13/2010 12:56 PM wrote:
> Vladislav Bolkhovitin wrote:
>> Hello Hannes,
>>
>> Hannes Reinecke, on 04/01/2010 12:15 PM wrote:
>>> Hi all,
>>>
>>> [Topic]
>>> Handling of invalid requests in virtual HBAs
>>>
>>> [Abstract]
>>> This discussion will focus on the problem of correct request handling
>>> with virtual HBAs.
>>> For KVM I have implemented a 'megasas' HBA emulation which serves as a
>>> backend for the
>>> megaraid_sas linux driver.
>>> It is now possible to connect several disks from different (physical)
>>> HBAs to that
>>> HBA emulation, each having different logical capabilities wrt
>>> transfersize,
>>> sgl size, sgl length etc.
>>>
>>> The goal of this discussion is how to determine the 'best' capability
>>> setting for the
>>> virtual HBA and how to handle hotplug scenarios, where a disk might be
>>> plugged in
>>> which has incompatible settings from the one the virtual HBA is using
>>> currently.
>> If I understand correctly, you need to allow several KVM guests to share
>> the same physical disks?
>>
> No, the other way round: A KVM guest is using several physical disks,
> each of which coming via a different HBA (eg sda from libata, sdb from lpfc
> and the like).
> So each request queue for the physical disks could be having different
> capabilities, while being routed through the same virtual HBA in the
> KVM guest.
> 
> The general idea for the virtual HBA is that scatter-gather lists
> could be passed directly from the guest to the host (as opposed to
> abstract single I/O blocks only like virtio).
> But the size and shape of the sg lists is different for devices
> coming from different HBAs, so we have two options here (this is
> all done on the host side; the guest will only see one HBA):
> 
> a) Adjust the sg list to match the underlying capabilities of
>    the device. Has the drawback that we defeat the elevator
>    mechanism in the guest side as the announced capabilities
>    there do _not_ match the capabilities on the host :-(
> b) Adjust the HBA capabilities to the lowest common denominator
>    of all physical devices presented to the guest.
>    While this would save us from adjusting the sg lists,
>    it still has the drawback the disk hotplugging won't
>    work, as we can't readjust the HBA parameters in the
>    guest after it's been created.
> 
> Neither of which is really appealing.

Why should only a single virtual HBA be used? Why not have a
dedicated virtual HBA for each physical HBA? That way you wouldn't
have problems with capabilities and the need to find the lowest common
denominator. Basically, it's a matter of another struct
scsi_host_template, possibly with the same shared callback functions..
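
For example, something like this (just a sketch; names are made up):

#include <scsi/scsi_host.h>

/* Sketch: one virtual HBA per physical HBA, with the template limits
 * copied from the corresponding physical host and the callbacks shared. */
static struct Scsi_Host *vhba_add_for_phys_host(struct Scsi_Host *phys,
                                                struct scsi_host_template *sht)
{
        sht->can_queue    = phys->can_queue;
        sht->sg_tablesize = phys->sg_tablesize;
        sht->max_sectors  = phys->max_sectors;

        return scsi_host_alloc(sht, 0);
}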

> My idea here would be to move all required capabilities
> to the device/request queue.
> That would neatly solve this issue once and for all.
> And even TGT, LIO-target, and SCST would benefit from this
> methinks.
> 
> But this is exactly the discussion I'd like to have at LSF,
> to see which approach is best or favoured.
> 
> And yes, I am perfectly aware that for a 'production'
> system one would be using a proper target emulator
> like LIO-target or SCST for this kind of setup.
> But first I have to convince the KVM/Qemu folks to
> actually include the megasas emulation.
> Which they won't until the above problem is solved.

LIO doesn't support 1-to-many pass-through device sharing, so SCST is
the only option.

Vlad


* Re: [LSF/VM TOPIC] Handling of invalid requests in virtual HBAs
  2010-04-13 17:09     ` Vladislav Bolkhovitin
@ 2010-04-13 18:37       ` Nicholas A. Bellinger
  2010-04-13 19:23         ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 13+ messages in thread
From: Nicholas A. Bellinger @ 2010-04-13 18:37 UTC (permalink / raw
  To: Vladislav Bolkhovitin; +Cc: Hannes Reinecke, lsf10-pc, SCSI Mailing List

On Tue, 2010-04-13 at 21:09 +0400, Vladislav Bolkhovitin wrote:
> Hannes Reinecke, on 04/13/2010 12:56 PM wrote:
> > The general idea for the virtual HBA is that scatter-gather lists
> > could be passed directly from the guest to the host (as opposed to
> > abstract single I/O blocks only like virtio).
> > But the size and shape of the sg lists is different for devices
> > coming from different HBAs, so we have two options here (this is
> > all done on the host side; the guest will only see one HBA):
> > 
> > a) Adjust the sg list to match the underlying capabilities of
> >    the device. Has the drawback that we defeat the elevator
> >    mechanism in the guest side as the announced capabilities
> >    there do _not_ match the capabilities on the host :-(
> > b) Adjust the HBA capabilities to the lowest common denominator
> >    of all physical devices presented to the guest.
> >    While this would save us from adjusting the sg lists,
> >    it still has the drawback the disk hotplugging won't
> >    work, as we can't readjust the HBA parameters in the
> >    guest after it's been created.
> > 
> > Neither of which is really appealing.
> 
> Why only a single virtual HBA should be used? Why not to have a 
> dedicated virtual HBA for each physical HBA? This way you wouldn't
> have problems with capabilities and the need to have lowest common 
> denominator. Basically, it's a matter of another struct 
> scsi_host_template with possibly the same shared callback functions..
> 
> > My idea here would be to move all required capabilities
> > to the device/request queue.
> > That would neatly solve this issue once and for all.
> > And even TGT, LIO-target, and SCST would benefit from this
> > methinks.
> > 
> > But this is exactly the discussion I'd like to have at LSF,
> > to see which approach is best or favoured.
> > 
> > And yes, I am perfectly aware that for a 'production'
> > system one would be using a proper target emulator
> > like LIO-target or SCST for this kind of setup.
> > But first I have to convince the KVM/Qemu folks to
> > actually include the megasas emulation.
> > Which they won't until the above problem is solved.
> 
> LIO doesn't support 1 to many pass-through devices sharing, so SCST in 
> the only option.

Sorry, but this statement about your perceived limitations wrt TCM/LIO
is completely incorrect.

Using a single passthrough backstore device (eg: plain /dev/sdX) with
TCM/pSCSI (or any TCM subsystem plugin) has been supported since the
dawn of time to allow for any number of TCM_Loop Virtual SAS Ports with
SG_IO going into KVM Guest.  The same is also true for LIO-Target
(iSCSI) and TCM_FC (FCoE) ports as well regardless of TCM subsystem
backstore.

Perhaps you would be so kind as to provide a TCM/LIO source code reference
showing where you came up with this make-believe notion..?

--nab



* Re: [LSF/VM TOPIC] Handling of invalid requests in virtual HBAs
  2010-04-13 18:37       ` Nicholas A. Bellinger
@ 2010-04-13 19:23         ` Vladislav Bolkhovitin
  2010-04-13 20:45           ` Nicholas A. Bellinger
  0 siblings, 1 reply; 13+ messages in thread
From: Vladislav Bolkhovitin @ 2010-04-13 19:23 UTC (permalink / raw
  To: Nicholas A. Bellinger; +Cc: Hannes Reinecke, lsf10-pc, SCSI Mailing List

Nicholas A. Bellinger, on 04/13/2010 10:37 PM wrote:
> On Tue, 2010-04-13 at 21:09 +0400, Vladislav Bolkhovitin wrote:
>> LIO doesn't support 1 to many pass-through devices sharing, so SCST in 
>> the only option.
> 
> Sorry, but this statement about your perceived limitiations wrt TCM/LIO
> is completely incorrect.
> 
> Using a single passthrough backstore device (eg: plain /dev/sdX) with
> TCM/pSCSI (or any TCM subsystem plugin) has been supported since the
> dawn of time to allow for any number of TCM_Loop Virtual SAS Ports with
> SG_IO going into KVM Guest.  The same is also true for LIO-Target
> (iSCSI) and TCM_FC (FCoE) ports as well regardless of TCM subsystem
> backstore.
> 
> Perhaps you would be so kind to provide a TCM/LIO source code reference
> from where you came up with this make-believe notion..?

I've just rechecked with the latest
git://git.kernel.org/pub/scm/linux/kernel/git/nab/lio-core-2.6.git and
still wasn't able to find in LIO the required pieces of functionality to
support 1-to-many pass-through. I see only code for non-enforced 1-to-1
pass-through (single initiator only). You can use it with more
initiators only in violation of the SCSI specs, which is the point on
which I started my participation in this discussion.

In particular, I can't see the code which, in pass-through mode, upon
receipt of a RESERVE command also sends it to the backend device, if
necessary (i.e. only the first time), and then sends a RELEASE command to
the device when the reservation holder is removed. Could you point me to
the *exact* code which implements that?

Vlad


* Re: [LSF/VM TOPIC] Handling of invalid requests in virtual HBAs
  2010-04-13 19:23         ` Vladislav Bolkhovitin
@ 2010-04-13 20:45           ` Nicholas A. Bellinger
  2010-04-14 12:59             ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 13+ messages in thread
From: Nicholas A. Bellinger @ 2010-04-13 20:45 UTC (permalink / raw
  To: Vladislav Bolkhovitin; +Cc: Hannes Reinecke, lsf10-pc, SCSI Mailing List

On Tue, 2010-04-13 at 23:23 +0400, Vladislav Bolkhovitin wrote:
> Nicholas A. Bellinger, on 04/13/2010 10:37 PM wrote:
> > On Tue, 2010-04-13 at 21:09 +0400, Vladislav Bolkhovitin wrote:
> >> LIO doesn't support 1 to many pass-through devices sharing, so SCST in 
> >> the only option.
> > 
> > Sorry, but this statement about your perceived limitiations wrt TCM/LIO
> > is completely incorrect.
> > 
> > Using a single passthrough backstore device (eg: plain /dev/sdX) with
> > TCM/pSCSI (or any TCM subsystem plugin) has been supported since the
> > dawn of time to allow for any number of TCM_Loop Virtual SAS Ports with
> > SG_IO going into KVM Guest.  The same is also true for LIO-Target
> > (iSCSI) and TCM_FC (FCoE) ports as well regardless of TCM subsystem
> > backstore.
> > 
> > Perhaps you would be so kind to provide a TCM/LIO source code reference
> > from where you came up with this make-believe notion..?
> 
> I've just rechecked with the latest 
> git://git.kernel.org/pub/scm/linux/kernel/git/nab/lio-core-2.6.git and 
> still wasn't able to find in LIO required pieces of functionality to 
> support 1 to many pass-through. I see only code for non-enforced 1 to 1 
> pass-through (single initiator only). You can use with it more 
> initiators only as the SCSI violation, from pointing on which I started 
> my participation in this discussion.
> 

Sorry, but listing the path to my git tree on k.o is not the same as
citing an actual TCM/LIO source+line reference.

> Particularly, I can't see the code, which in pass-through mode upon 
> receive of RESERVE command sends it also to the backend device, if 
> necessary (i.e. only the first time), and then sends RELEASE command to 
> the device upon the reservation holder removal. Could you point me on 
> the *exact* code which implements that?
> 

Again, since *you* are the one making a claim, *you* are the one
expected to back it up with a concrete code reference and example
scenario.  Please do the leg-work yourself wrt TCM SPC-3 persistent
reservations and legacy SPC-2 reservations instead of expecting me to do
the actual work for you to verify your own fanciful claims about target
mode, seriously..

So, short of you being able to produce a response that is concrete and
human-readable, your claim will once again be dismissed as generic
hand-waving.

--nab



* Re: [LSF/VM TOPIC] Handling of invalid requests in virtual HBAs
  2010-04-13 20:45           ` Nicholas A. Bellinger
@ 2010-04-14 12:59             ` Vladislav Bolkhovitin
  2010-04-14 13:49               ` Nicholas A. Bellinger
  0 siblings, 1 reply; 13+ messages in thread
From: Vladislav Bolkhovitin @ 2010-04-14 12:59 UTC (permalink / raw
  To: Nicholas A. Bellinger; +Cc: Hannes Reinecke, lsf10-pc, SCSI Mailing List

Nicholas A. Bellinger, on 04/14/2010 12:45 AM wrote:
> On Tue, 2010-04-13 at 23:23 +0400, Vladislav Bolkhovitin wrote:
>> Nicholas A. Bellinger, on 04/13/2010 10:37 PM wrote:
>>> On Tue, 2010-04-13 at 21:09 +0400, Vladislav Bolkhovitin wrote:
>>>> LIO doesn't support 1 to many pass-through devices sharing, so SCST in 
>>>> the only option.
>>> Sorry, but this statement about your perceived limitiations wrt TCM/LIO
>>> is completely incorrect.
>>>
>>> Using a single passthrough backstore device (eg: plain /dev/sdX) with
>>> TCM/pSCSI (or any TCM subsystem plugin) has been supported since the
>>> dawn of time to allow for any number of TCM_Loop Virtual SAS Ports with
>>> SG_IO going into KVM Guest.  The same is also true for LIO-Target
>>> (iSCSI) and TCM_FC (FCoE) ports as well regardless of TCM subsystem
>>> backstore.
>>>
>>> Perhaps you would be so kind to provide a TCM/LIO source code reference
>>> from where you came up with this make-believe notion..?
>> I've just rechecked with the latest 
>> git://git.kernel.org/pub/scm/linux/kernel/git/nab/lio-core-2.6.git and 
>> still wasn't able to find in LIO required pieces of functionality to 
>> support 1 to many pass-through. I see only code for non-enforced 1 to 1 
>> pass-through (single initiator only). You can use with it more 
>> initiators only as the SCSI violation, from pointing on which I started 
>> my participation in this discussion.
>>
> 
> Sorry, but listing the path to my git tree on k.o is not the same as
> citing a actual TCM/LIO source+line reference.

That's interesting. How can I point to the code which doesn't exist?

>> Particularly, I can't see the code, which in pass-through mode upon 
>> receive of RESERVE command sends it also to the backend device, if 
>> necessary (i.e. only the first time), and then sends RELEASE command to 
>> the device upon the reservation holder removal. Could you point me on 
>> the *exact* code which implements that?
>>
> 
> Again, since *you* are the one making a claim, *you* are the one
> expected to back it up with a concrete code reference and example
> scenario.   Please, do the leg-work yourself wrt TCM SPC-3 persisetent
> reservations and legacy SPC-2 reservations instead of expecting me to do
> the actual work for you to verify your own fanciful claims about target
> mode, seriously..
> 
> So, short of you being able to produce a response that is concrete and
> human-readable, your claim will once again be dismissed as generic
> hand-waving.

Well, this is Open Source, and in Open Source code auditing is the
most important way to verify an implementation. I reviewed TCM/LIO and
my conclusion is pretty clear: it is missing too many important bits to be
a 1-to-many pass-through implementation, so it implements only
non-enforced 1-to-1 pass-through. You implemented reservations
emulation, but that's _NOT_ enough. I gave one example of the many
missing pieces of functionality.

I'm not going to go any deeper than I've already gone doing your homework
for you and explaining the SCSI basics to you (although it's really hard
for me to believe that a person who implemented something as big as
TCM/LIO can't see such obvious things), because the last time I did it,
when you asserted that if a pass-through device supports Persistent
Reservations then LIO in pass-through mode with this device automatically
supports them too, which is obviously wrong (see the end of
http://lkml.org/lkml/2008/7/14/273), I didn't get even a bit of
appreciation for my effort. Instead I privately received insulting and
threatening e-mails from you.

If you do not agree with my conclusion about the pass-through
implementation in TCM/LIO, you can either point us to the code
implementing the necessary functionality which I missed, or argue why
it isn't needed. But if you continue attacking me personally to try to
discredit me, everybody will see that my conclusions are definitely
correct, as well as what your preferred way of cooperation is.

Vlad


* Re: [LSF/VM TOPIC] Handling of invalid requests in virtual HBAs
  2010-04-14 12:59             ` Vladislav Bolkhovitin
@ 2010-04-14 13:49               ` Nicholas A. Bellinger
  0 siblings, 0 replies; 13+ messages in thread
From: Nicholas A. Bellinger @ 2010-04-14 13:49 UTC (permalink / raw
  To: Vladislav Bolkhovitin; +Cc: Hannes Reinecke, lsf10-pc, SCSI Mailing List

On Wed, 2010-04-14 at 16:59 +0400, Vladislav Bolkhovitin wrote:
> Nicholas A. Bellinger, on 04/14/2010 12:45 AM wrote:
> > On Tue, 2010-04-13 at 23:23 +0400, Vladislav Bolkhovitin wrote:
> >> Nicholas A. Bellinger, on 04/13/2010 10:37 PM wrote:
> >>> On Tue, 2010-04-13 at 21:09 +0400, Vladislav Bolkhovitin wrote:
> >>>> LIO doesn't support 1 to many pass-through devices sharing, so SCST in 
> >>>> the only option.
> >>> Sorry, but this statement about your perceived limitiations wrt TCM/LIO
> >>> is completely incorrect.
> >>>
> >>> Using a single passthrough backstore device (eg: plain /dev/sdX) with
> >>> TCM/pSCSI (or any TCM subsystem plugin) has been supported since the
> >>> dawn of time to allow for any number of TCM_Loop Virtual SAS Ports with
> >>> SG_IO going into KVM Guest.  The same is also true for LIO-Target
> >>> (iSCSI) and TCM_FC (FCoE) ports as well regardless of TCM subsystem
> >>> backstore.
> >>>
> >>> Perhaps you would be so kind to provide a TCM/LIO source code reference
> >>> from where you came up with this make-believe notion..?
> >> I've just rechecked with the latest 
> >> git://git.kernel.org/pub/scm/linux/kernel/git/nab/lio-core-2.6.git and 
> >> still wasn't able to find in LIO required pieces of functionality to 
> >> support 1 to many pass-through. I see only code for non-enforced 1 to 1 
> >> pass-through (single initiator only). You can use with it more 
> >> initiators only as the SCSI violation, from pointing on which I started 
> >> my participation in this discussion.
> >>
> > 
> > Sorry, but listing the path to my git tree on k.o is not the same as
> > citing a actual TCM/LIO source+line reference.
> 
> That's interesting. How can I point to the code which doesn't exist?
> 
> >> Particularly, I can't see the code, which in pass-through mode upon 
> >> receive of RESERVE command sends it also to the backend device, if 
> >> necessary (i.e. only the first time), and then sends RELEASE command to 
> >> the device upon the reservation holder removal. Could you point me on 
> >> the *exact* code which implements that?
> >>
> > 
> > Again, since *you* are the one making a claim, *you* are the one
> > expected to back it up with a concrete code reference and example
> > scenario.   Please, do the leg-work yourself wrt TCM SPC-3 persisetent
> > reservations and legacy SPC-2 reservations instead of expecting me to do
> > the actual work for you to verify your own fanciful claims about target
> > mode, seriously..
> > 
> > So, short of you being able to produce a response that is concrete and
> > human-readable, your claim will once again be dismissed as generic
> > hand-waving.
> 
> Well, this is an Open Source and in Open Source code auditing is the 
> most important way to verify an implementation. I reviewed TCM/LIO and 
> my conclusion is pretty clear:

Actually no, you have not reviewed or commented on a *single* one of the
'one patch per feature' commit series implementing >= SPC-3 Persistent
Reservations and implicit/explicit ALUA that I have posted to linux-scsi
in the last 18 months.  Instead you pick out this LSF thread about
Virtual HBAs + megasas + KVM Guest to start airing your persistent
reservations concerns with me; why on earth would that possibly be..?

See, the way that we here in the real world review patches is by posting
digestible individual feature bits so that they can be understood and
reviewed by actual humans, instead of randomly posting tens of thousands
of lines of code with *zero* context for anyone to care about or understand.

>  it misses too many important bits to be a 
> 1 to many pass-through implementation, so it implements only 
> not-enforced 1 to 1 pass-through. You implemented reservations 
> emulation, but that's _NOT_ enough. I gave an example of the one among 
> many missed functionality.

Sorry, but again you are completely incorrect.  Why don't you have a
look at what TCM supports for iSCSI, SAS and FC target ports from the
link below and get specific about the 'many missing pieces of functionality'..?

http://linux-iscsi.org/index.php/Persistent_Reservations

> 
> I'm not going to go any deeper than I've already gone doing your home 
> work for you and explaining you the SCSI basics (although for me it's 
> really hard to believe that a person implemented something as big as 
> TCM/LIO can't see such obvious things), because when I did it before, 
> when you asserted that if a pass-trough device supports Persistent 
> Reservations, LIO in pass-through mode with this device automatically 
> supports them too, which is obviously wrong (see the end of 
> http://lkml.org/lkml/2008/7/14/273), I didn't have even a bit of 
> appreciation for my effort. Instead I received privately from you 
> insulting and threatening e-mails.
> 

Wow, this is where you are getting your ideas from..?  An email from
2008 talking about LIO 2.x code..?  Seriously, your attempt to ignore
all of the work that has gone into TCM/ConfigFS v3.x -> v4.0 means that
you are living in a world of fantasy.

> If you are not agree with my conclusion about pass-through 
> implementation in TCM/LIO, you can either point us on the code 
> implementing the necessary functionality, which I missed, or argue why 
> it isn't needed. But if you continue attacking me personally trying to 
> discredit me, everybody will see that my conclusions are definitely 
> correct as well as which is your preferred way of cooperation.
> 

Sorry, but I cannot make it any clearer for you.  Either you provide
a code reference and scenario for your claim against TCM/LIO code from
this decade for lio-core-2.6.git, or I will once again ignore your
hand-waving until you can learn how to have an on-topic discussion of
meaning and substance.

--nab




* Re: [LSF/VM TOPIC] Handling of invalid requests in virtual HBAs
  2010-04-01  8:15 [LSF/VM TOPIC] Handling of invalid requests in virtual HBAs Hannes Reinecke
  2010-04-02  5:33 ` Nicholas A. Bellinger
  2010-04-10 15:31 ` Vladislav Bolkhovitin
@ 2010-05-10  3:16 ` FUJITA Tomonori
  2 siblings, 0 replies; 13+ messages in thread
From: FUJITA Tomonori @ 2010-05-10  3:16 UTC (permalink / raw
  To: hare; +Cc: lsf10-pc, linux-scsi

On Thu, 01 Apr 2010 10:15:46 +0200
Hannes Reinecke <hare@suse.de> wrote:

> [Abstract]
> This discussion will focus on the problem of correct request handling with virtual HBAs.
> For KVM I have implemented a 'megasas' HBA emulation which serves as a backend for the
> megaraid_sas linux driver.
> It is now possible to connect several disks from different (physical) HBAs to that
> HBA emulation, each having different logical capabilities wrt transfersize,
> sgl size, sgl length etc.

The 'megaraid_sas' HBA emulation tells the guests about such
capabilities via its registers, right?

If so, the problem is how to do this when hotplug happens on the host?


> The goal of this discussion is how to determine the 'best' capability setting for the
> virtual HBA and how to handle hotplug scenarios, where a disk might be plugged in
> which has incompatible settings from the one the virtual HBA is using currently.

