* status of block-integrity
@ 2013-12-22 19:21 Christoph Hellwig
  2013-12-22 20:45 ` Nicholas A. Bellinger
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Christoph Hellwig @ 2013-12-22 19:21 UTC
  To: Martin K. Petersen, Jens Axboe; +Cc: linux-kernel, linux-scsi

We have the block integrity code to support DIF/DIX in the tree for
about 5 and a half years, and we still don't have a single consumer of
it.  By normal kernel rules it should never have been merged, or at
least the bitrot long removed.

Given that we'll have a lot of work to do in this area with block
multiqueue I think it's time to either kill it off for good or make sure
we can actually use and test it.


* Re: status of block-integrity
  2013-12-22 19:21 status of block-integrity Christoph Hellwig
@ 2013-12-22 20:45 ` Nicholas A. Bellinger
  2013-12-23 13:35 ` Martin K. Petersen
  2014-01-03 15:01 ` Hannes Reinecke
  2 siblings, 0 replies; 24+ messages in thread
From: Nicholas A. Bellinger @ 2013-12-22 20:45 UTC
  To: Christoph Hellwig
  Cc: Martin K. Petersen, Jens Axboe, linux-kernel, linux-scsi

On Sun, 2013-12-22 at 11:21 -0800, Christoph Hellwig wrote:
> We have the block integrity code to support DIF/DIX in the tree for
> about 5 and a half years, and we still don't have a single consumer of
> it.  By normal kernel rules it should never have been merged, or at
> least the bitrot long removed.

Is not sd_dif.c a consumer of blk_integrity.c logic..?

> Given that we'll have a lot of work to do in this area with block
> multiqueue I think it's time to either kill it off for good or make sure
> we can actually use and test it.

Speak of the devil..

I've been working on enabling DIF support in scsi-mq recently, and
AFAICT the only part that is required in blk-mq for DIF emulation to
function with scsi-debug is the following patch.

commit 1428a390cc16025f93905852777d4afd8aeba05d
Author: Nicholas Bellinger <nab@linux-iscsi.org>
Date:   Sun Dec 22 11:58:49 2013 +0000

    blk-mq: Add bio_integrity setup to blk_mq_make_request
    
    This patch adds the missing bio_integrity_enabled() +
    bio_integrity_prep() setup into blk_mq_make_request()
    in order to use DIF protection with scsi-mq.
    
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c79126e..a520c39 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -916,6 +916,11 @@ static void blk_mq_make_request(struct request_queue *q, struct bio *bio)
 
        blk_queue_bounce(q, &bio);
 
+       if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
+               bio_endio(bio, -EIO);
+               return;
+       }
+
        if (use_plug && blk_attempt_plug_merge(q, bio, &request_count))
                return;



* Re: status of block-integrity
  2013-12-22 19:21 status of block-integrity Christoph Hellwig
  2013-12-22 20:45 ` Nicholas A. Bellinger
@ 2013-12-23 13:35 ` Martin K. Petersen
  2013-12-23 13:48   ` Christoph Hellwig
                     ` (2 more replies)
  2014-01-03 15:01 ` Hannes Reinecke
  2 siblings, 3 replies; 24+ messages in thread
From: Martin K. Petersen @ 2013-12-23 13:35 UTC
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-kernel, linux-scsi

>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:

Christoph> We have the block integrity code to support DIF/DIX in the
Christoph> tree for about 5 and a half years, and we still don't
Christoph> have a single consumer of it.  

What do you mean? If you have a DIX-capable HBA (lpfc, qla2xxx, zfcp)
then integrity protection is active from the block layer down. The only
code that's not currently being exercised are the tag interleaving
functions.  I was hoping the FS people would use them for back pointers
but nobody seemed to bite.
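
To make "tag interleaving" concrete, here is a minimal userspace sketch modeled on the sd_dif_type1_set_tag()/sd_dif_type1_get_tag() helpers in drivers/scsi/sd_dif.c. The struct dif_tuple layout, its host-endian fields, and the dif_type1_* names are assumptions made for illustration; they are not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's struct sd_dif_tuple:
 * one 8-byte protection tuple per sector. Host-endian fields are an
 * assumption that keeps the sketch simple.
 */
struct dif_tuple {
	uint16_t guard_tag;	/* checksum over the sector data */
	uint16_t app_tag;	/* opaque storage for the upper layers */
	uint32_t ref_tag;	/* for Type 1, tied to the target sector */
};

/* Scatter a flat tag buffer into the tuples, two bytes per sector. */
void dif_type1_set_tag(struct dif_tuple *t, const uint8_t *tag_buf,
		       unsigned int sectors)
{
	unsigned int i, j;

	for (i = 0, j = 0; i < sectors; i++, j += 2)
		t[i].app_tag = (uint16_t)(tag_buf[j] << 8 | tag_buf[j + 1]);
}

/* Gather the per-sector tags back into a flat buffer. */
void dif_type1_get_tag(const struct dif_tuple *t, uint8_t *tag_buf,
		       unsigned int sectors)
{
	unsigned int i, j;

	for (i = 0, j = 0; i < sectors; i++, j += 2) {
		tag_buf[j] = (uint8_t)(t[i].app_tag >> 8);
		tag_buf[j + 1] = (uint8_t)(t[i].app_tag & 0xff);
	}
}
```

With Type 1 protection only the two application tag bytes per sector are available this way, so a filesystem back pointer would have to be spread over several sectors of an I/O.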


Christoph> Given that we'll have a lot of work to do in this area with
Christoph> block multiqueue I think it's time to either kill it off for
Christoph> good or make sure we can actually use and test it.

I don't understand why multiqueue would require a lot of work? It's just
an extra scatterlist per request.

And obviously, if there's anything that needs to be done in this area
I'll be happy to do so...

-- 
Martin K. Petersen	Oracle Linux Engineering



* Re: status of block-integrity
  2013-12-23 13:35 ` Martin K. Petersen
@ 2013-12-23 13:48   ` Christoph Hellwig
  2013-12-31 19:41   ` berthiaume, wayne
  2014-01-07  8:28   ` Ric Wheeler
  2 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2013-12-23 13:48 UTC
  To: Martin K. Petersen; +Cc: Jens Axboe, linux-kernel, linux-scsi

On Mon, Dec 23, 2013 at 08:35:22AM -0500, Martin K. Petersen wrote:
> >>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:
> 
> Christoph> We have the block integrity code to support DIF/DIX in the
> Christoph> tree for about 5 and a half years, and we still don't
> Christoph> have a single consumer of it.  
> 
> What do you mean? If you have a DIX-capable HBA (lpfc, qla2xxx, zfcp)
> then integrity protection is active from the block layer down. The only
> code that's not currently being exercised are the tag interleaving
> functions.  I was hoping the FS people would use them for back pointers
> but nobody seemed to bite.

By "single consumer of it" I obviously meant the various symbols for the
consumer side as well as the application tag support.

Patch to remove the dead code below:

---
From: Christoph Hellwig <hch@lst.de>
Subject: [PATCH] block: remove dead on arrival integrity code

Signed-off-by: Christoph Hellwig <hch@lst.de>

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index 03cf717..0b14db7 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -194,7 +194,6 @@ int blk_integrity_merge_rq(struct request_queue *q, struct request *req,
 
 	return 0;
 }
-EXPORT_SYMBOL(blk_integrity_merge_rq);
 
 int blk_integrity_merge_bio(struct request_queue *q, struct request *req,
 			    struct bio *bio)
@@ -214,7 +213,6 @@ int blk_integrity_merge_bio(struct request_queue *q, struct request *req,
 
 	return 0;
 }
-EXPORT_SYMBOL(blk_integrity_merge_bio);
 
 struct integrity_sysfs_entry {
 	struct attribute attr;
@@ -414,8 +412,6 @@ int blk_integrity_register(struct gendisk *disk, struct blk_integrity *template)
 		bi->generate_fn = template->generate_fn;
 		bi->verify_fn = template->verify_fn;
 		bi->tuple_size = template->tuple_size;
-		bi->set_tag_fn = template->set_tag_fn;
-		bi->get_tag_fn = template->get_tag_fn;
 		bi->tag_size = template->tag_size;
 	} else
 		bi->name = bi_unsupported_name;
diff --git a/drivers/scsi/sd_dif.c b/drivers/scsi/sd_dif.c
index 6174ca4..e32035a 100644
--- a/drivers/scsi/sd_dif.c
+++ b/drivers/scsi/sd_dif.c
@@ -128,39 +128,10 @@ static int sd_dif_type1_verify_ip(struct blk_integrity_exchg *bix)
 	return sd_dif_type1_verify(bix, sd_dif_ip_fn);
 }
 
-/*
- * Functions for interleaving and deinterleaving application tags
- */
-static void sd_dif_type1_set_tag(void *prot, void *tag_buf, unsigned int sectors)
-{
-	struct sd_dif_tuple *sdt = prot;
-	u8 *tag = tag_buf;
-	unsigned int i, j;
-
-	for (i = 0, j = 0 ; i < sectors ; i++, j += 2, sdt++) {
-		sdt->app_tag = tag[j] << 8 | tag[j+1];
-		BUG_ON(sdt->app_tag == 0xffff);
-	}
-}
-
-static void sd_dif_type1_get_tag(void *prot, void *tag_buf, unsigned int sectors)
-{
-	struct sd_dif_tuple *sdt = prot;
-	u8 *tag = tag_buf;
-	unsigned int i, j;
-
-	for (i = 0, j = 0 ; i < sectors ; i++, j += 2, sdt++) {
-		tag[j] = (sdt->app_tag & 0xff00) >> 8;
-		tag[j+1] = sdt->app_tag & 0xff;
-	}
-}
-
 static struct blk_integrity dif_type1_integrity_crc = {
 	.name			= "T10-DIF-TYPE1-CRC",
 	.generate_fn		= sd_dif_type1_generate_crc,
 	.verify_fn		= sd_dif_type1_verify_crc,
-	.get_tag_fn		= sd_dif_type1_get_tag,
-	.set_tag_fn		= sd_dif_type1_set_tag,
 	.tuple_size		= sizeof(struct sd_dif_tuple),
 	.tag_size		= 0,
 };
@@ -169,8 +140,6 @@ static struct blk_integrity dif_type1_integrity_ip = {
 	.name			= "T10-DIF-TYPE1-IP",
 	.generate_fn		= sd_dif_type1_generate_ip,
 	.verify_fn		= sd_dif_type1_verify_ip,
-	.get_tag_fn		= sd_dif_type1_get_tag,
-	.set_tag_fn		= sd_dif_type1_set_tag,
 	.tuple_size		= sizeof(struct sd_dif_tuple),
 	.tag_size		= 0,
 };
@@ -245,42 +214,10 @@ static int sd_dif_type3_verify_ip(struct blk_integrity_exchg *bix)
 	return sd_dif_type3_verify(bix, sd_dif_ip_fn);
 }
 
-static void sd_dif_type3_set_tag(void *prot, void *tag_buf, unsigned int sectors)
-{
-	struct sd_dif_tuple *sdt = prot;
-	u8 *tag = tag_buf;
-	unsigned int i, j;
-
-	for (i = 0, j = 0 ; i < sectors ; i++, j += 6, sdt++) {
-		sdt->app_tag = tag[j] << 8 | tag[j+1];
-		sdt->ref_tag = tag[j+2] << 24 | tag[j+3] << 16 |
-			tag[j+4] << 8 | tag[j+5];
-	}
-}
-
-static void sd_dif_type3_get_tag(void *prot, void *tag_buf, unsigned int sectors)
-{
-	struct sd_dif_tuple *sdt = prot;
-	u8 *tag = tag_buf;
-	unsigned int i, j;
-
-	for (i = 0, j = 0 ; i < sectors ; i++, j += 2, sdt++) {
-		tag[j] = (sdt->app_tag & 0xff00) >> 8;
-		tag[j+1] = sdt->app_tag & 0xff;
-		tag[j+2] = (sdt->ref_tag & 0xff000000) >> 24;
-		tag[j+3] = (sdt->ref_tag & 0xff0000) >> 16;
-		tag[j+4] = (sdt->ref_tag & 0xff00) >> 8;
-		tag[j+5] = sdt->ref_tag & 0xff;
-		BUG_ON(sdt->app_tag == 0xffff || sdt->ref_tag == 0xffffffff);
-	}
-}
-
 static struct blk_integrity dif_type3_integrity_crc = {
 	.name			= "T10-DIF-TYPE3-CRC",
 	.generate_fn		= sd_dif_type3_generate_crc,
 	.verify_fn		= sd_dif_type3_verify_crc,
-	.get_tag_fn		= sd_dif_type3_get_tag,
-	.set_tag_fn		= sd_dif_type3_set_tag,
 	.tuple_size		= sizeof(struct sd_dif_tuple),
 	.tag_size		= 0,
 };
@@ -289,8 +226,6 @@ static struct blk_integrity dif_type3_integrity_ip = {
 	.name			= "T10-DIF-TYPE3-IP",
 	.generate_fn		= sd_dif_type3_generate_ip,
 	.verify_fn		= sd_dif_type3_verify_ip,
-	.get_tag_fn		= sd_dif_type3_get_tag,
-	.set_tag_fn		= sd_dif_type3_set_tag,
 	.tuple_size		= sizeof(struct sd_dif_tuple),
 	.tag_size		= 0,
 };
diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index fc60b31..793eaa4 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -32,6 +32,8 @@
 static struct kmem_cache *bip_slab;
 static struct workqueue_struct *kintegrityd_wq;
 
+static void bio_integrity_endio(struct bio *bio, int error);
+
 /**
  * bio_integrity_alloc - Allocate integrity payload and attach it to bio
  * @bio:	bio to attach integrity metadata to
@@ -42,7 +44,7 @@ static struct workqueue_struct *kintegrityd_wq;
  * metadata.  nr_vecs specifies the maximum number of pages containing
  * integrity metadata that can be attached.
  */
-struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
+static struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
 						  gfp_t gfp_mask,
 						  unsigned int nr_vecs)
 {
@@ -83,7 +85,6 @@ err:
 	mempool_free(bip, bs->bio_integrity_pool);
 	return NULL;
 }
-EXPORT_SYMBOL(bio_integrity_alloc);
 
 /**
  * bio_integrity_free - Free bio integrity payload
@@ -112,7 +113,6 @@ void bio_integrity_free(struct bio *bio)
 
 	bio->bi_integrity = NULL;
 }
-EXPORT_SYMBOL(bio_integrity_free);
 
 /**
  * bio_integrity_add_page - Attach integrity metadata
@@ -123,7 +123,7 @@ EXPORT_SYMBOL(bio_integrity_free);
  *
  * Description: Attach a page containing integrity metadata to bio.
  */
-int bio_integrity_add_page(struct bio *bio, struct page *page,
+static int bio_integrity_add_page(struct bio *bio, struct page *page,
 			   unsigned int len, unsigned int offset)
 {
 	struct bio_integrity_payload *bip = bio->bi_integrity;
@@ -144,7 +144,6 @@ int bio_integrity_add_page(struct bio *bio, struct page *page,
 
 	return len;
 }
-EXPORT_SYMBOL(bio_integrity_add_page);
 
 static int bdev_integrity_enabled(struct block_device *bdev, int rw)
 {
@@ -181,7 +180,6 @@ int bio_integrity_enabled(struct bio *bio)
 
 	return bdev_integrity_enabled(bio->bi_bdev, bio_data_dir(bio));
 }
-EXPORT_SYMBOL(bio_integrity_enabled);
 
 /**
  * bio_integrity_hw_sectors - Convert 512b sectors to hardware ditto
@@ -204,89 +202,6 @@ static inline unsigned int bio_integrity_hw_sectors(struct blk_integrity *bi,
 }
 
 /**
- * bio_integrity_tag_size - Retrieve integrity tag space
- * @bio:	bio to inspect
- *
- * Description: Returns the maximum number of tag bytes that can be
- * attached to this bio. Filesystems can use this to determine how
- * much metadata to attach to an I/O.
- */
-unsigned int bio_integrity_tag_size(struct bio *bio)
-{
-	struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
-
-	BUG_ON(bio->bi_size == 0);
-
-	return bi->tag_size * (bio->bi_size / bi->sector_size);
-}
-EXPORT_SYMBOL(bio_integrity_tag_size);
-
-int bio_integrity_tag(struct bio *bio, void *tag_buf, unsigned int len, int set)
-{
-	struct bio_integrity_payload *bip = bio->bi_integrity;
-	struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
-	unsigned int nr_sectors;
-
-	BUG_ON(bip->bip_buf == NULL);
-
-	if (bi->tag_size == 0)
-		return -1;
-
-	nr_sectors = bio_integrity_hw_sectors(bi,
-					DIV_ROUND_UP(len, bi->tag_size));
-
-	if (nr_sectors * bi->tuple_size > bip->bip_size) {
-		printk(KERN_ERR "%s: tag too big for bio: %u > %u\n",
-		       __func__, nr_sectors * bi->tuple_size, bip->bip_size);
-		return -1;
-	}
-
-	if (set)
-		bi->set_tag_fn(bip->bip_buf, tag_buf, nr_sectors);
-	else
-		bi->get_tag_fn(bip->bip_buf, tag_buf, nr_sectors);
-
-	return 0;
-}
-
-/**
- * bio_integrity_set_tag - Attach a tag buffer to a bio
- * @bio:	bio to attach buffer to
- * @tag_buf:	Pointer to a buffer containing tag data
- * @len:	Length of the included buffer
- *
- * Description: Use this function to tag a bio by leveraging the extra
- * space provided by devices formatted with integrity protection.  The
- * size of the integrity buffer must be <= to the size reported by
- * bio_integrity_tag_size().
- */
-int bio_integrity_set_tag(struct bio *bio, void *tag_buf, unsigned int len)
-{
-	BUG_ON(bio_data_dir(bio) != WRITE);
-
-	return bio_integrity_tag(bio, tag_buf, len, 1);
-}
-EXPORT_SYMBOL(bio_integrity_set_tag);
-
-/**
- * bio_integrity_get_tag - Retrieve a tag buffer from a bio
- * @bio:	bio to retrieve buffer from
- * @tag_buf:	Pointer to a buffer for the tag data
- * @len:	Length of the target buffer
- *
- * Description: Use this function to retrieve the tag buffer from a
- * completed I/O. The size of the integrity buffer must be <= to the
- * size reported by bio_integrity_tag_size().
- */
-int bio_integrity_get_tag(struct bio *bio, void *tag_buf, unsigned int len)
-{
-	BUG_ON(bio_data_dir(bio) != READ);
-
-	return bio_integrity_tag(bio, tag_buf, len, 0);
-}
-EXPORT_SYMBOL(bio_integrity_get_tag);
-
-/**
  * bio_integrity_generate - Generate integrity metadata for a bio
  * @bio:	bio to generate integrity metadata for
  *
@@ -427,7 +342,6 @@ int bio_integrity_prep(struct bio *bio)
 
 	return 0;
 }
-EXPORT_SYMBOL(bio_integrity_prep);
 
 /**
  * bio_integrity_verify - Verify integrity metadata for a bio
@@ -510,7 +424,7 @@ static void bio_integrity_verify_fn(struct work_struct *work)
  * in process context.	This function postpones completion
  * accordingly.
  */
-void bio_integrity_endio(struct bio *bio, int error)
+static void bio_integrity_endio(struct bio *bio, int error)
 {
 	struct bio_integrity_payload *bip = bio->bi_integrity;
 
@@ -530,14 +444,13 @@ void bio_integrity_endio(struct bio *bio, int error)
 	INIT_WORK(&bip->bip_work, bio_integrity_verify_fn);
 	queue_work(kintegrityd_wq, &bip->bip_work);
 }
-EXPORT_SYMBOL(bio_integrity_endio);
 
 /**
  * bio_integrity_mark_head - Advance bip_vec skip bytes
  * @bip:	Integrity vector to advance
  * @skip:	Number of bytes to advance it
  */
-void bio_integrity_mark_head(struct bio_integrity_payload *bip,
+static void bio_integrity_mark_head(struct bio_integrity_payload *bip,
 			     unsigned int skip)
 {
 	struct bio_vec *iv;
@@ -563,7 +476,7 @@ void bio_integrity_mark_head(struct bio_integrity_payload *bip,
  * @bip:	Integrity vector to truncate
  * @len:	New length of integrity vector
  */
-void bio_integrity_mark_tail(struct bio_integrity_payload *bip,
+static void bio_integrity_mark_tail(struct bio_integrity_payload *bip,
 			     unsigned int len)
 {
 	struct bio_vec *iv;
@@ -603,7 +516,6 @@ void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
 	nr_sectors = bio_integrity_hw_sectors(bi, bytes_done >> 9);
 	bio_integrity_mark_head(bip, nr_sectors * bi->tuple_size);
 }
-EXPORT_SYMBOL(bio_integrity_advance);
 
 /**
  * bio_integrity_trim - Trim integrity vector
@@ -676,7 +588,6 @@ void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
 	bp->bip1.bip_vcnt = bp->bip2.bip_vcnt = 1;
 	bp->bip1.bip_idx = bp->bip2.bip_idx = 0;
 }
-EXPORT_SYMBOL(bio_integrity_split);
 
 /**
  * bio_integrity_clone - Callback for cloning bios with integrity metadata
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 060ff69..4ff2d82 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -576,14 +576,9 @@ struct biovec_slab {
 
 #define bio_integrity(bio) (bio->bi_integrity != NULL)
 
-extern struct bio_integrity_payload *bio_integrity_alloc(struct bio *, gfp_t, unsigned int);
 extern void bio_integrity_free(struct bio *);
-extern int bio_integrity_add_page(struct bio *, struct page *, unsigned int, unsigned int);
 extern int bio_integrity_enabled(struct bio *bio);
-extern int bio_integrity_set_tag(struct bio *, void *, unsigned int);
-extern int bio_integrity_get_tag(struct bio *, void *, unsigned int);
 extern int bio_integrity_prep(struct bio *);
-extern void bio_integrity_endio(struct bio *, int);
 extern void bio_integrity_advance(struct bio *, unsigned int);
 extern void bio_integrity_trim(struct bio *, unsigned int, unsigned int);
 extern void bio_integrity_split(struct bio *, struct bio_pair *, int);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1b135d4..d6db54e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1430,14 +1430,10 @@ struct blk_integrity_exchg {
 
 typedef void (integrity_gen_fn) (struct blk_integrity_exchg *);
 typedef int (integrity_vrfy_fn) (struct blk_integrity_exchg *);
-typedef void (integrity_set_tag_fn) (void *, void *, unsigned int);
-typedef void (integrity_get_tag_fn) (void *, void *, unsigned int);
 
 struct blk_integrity {
 	integrity_gen_fn	*generate_fn;
 	integrity_vrfy_fn	*verify_fn;
-	integrity_set_tag_fn	*set_tag_fn;
-	integrity_get_tag_fn	*get_tag_fn;
 
 	unsigned short		flags;
 	unsigned short		tuple_size;


* RE: status of block-integrity
  2013-12-23 13:35 ` Martin K. Petersen
  2013-12-23 13:48   ` Christoph Hellwig
@ 2013-12-31 19:41   ` berthiaume, wayne
  2014-01-07  8:28   ` Ric Wheeler
  2 siblings, 0 replies; 24+ messages in thread
From: berthiaume, wayne @ 2013-12-31 19:41 UTC
  To: Martin K. Petersen
  Cc: Jens Axboe, linux-kernel@vger.kernel.org, Christoph Hellwig,
	linux-scsi@vger.kernel.org

You can add that there are SCSI targets out there employing this feature as well. =;^)

-----Original Message-----
From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Martin K. Petersen
Sent: Monday, December 23, 2013 8:35 AM
To: Christoph Hellwig
Cc: Jens Axboe; linux-kernel@vger.kernel.org; linux-scsi@vger.kernel.org
Subject: Re: status of block-integrity

>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:

Christoph> We have the block integrity code to support DIF/DIX in the
Christoph> tree for about 5 and a half years, and we still don't
Christoph> have a single consumer of it.

What do you mean? If you have a DIX-capable HBA (lpfc, qla2xxx, zfcp) then integrity protection is active from the block layer down. The only code that's not currently being exercised are the tag interleaving functions.  I was hoping the FS people would use them for back pointers but nobody seemed to bite.


Christoph> Given that we'll have a lot of work to do in this area with 
Christoph> block multiqueue I think it's time to either kill it off for 
Christoph> good or make sure we can actually use and test it.

I don't understand why multiqueue would require a lot of work? It's just an extra scatterlist per request.

And obviously, if there's anything that needs to be done in this area I'll be happy to do so...

-- 
Martin K. Petersen	Oracle Linux Engineering




* Re: status of block-integrity
  2013-12-22 19:21 status of block-integrity Christoph Hellwig
  2013-12-22 20:45 ` Nicholas A. Bellinger
  2013-12-23 13:35 ` Martin K. Petersen
@ 2014-01-03 15:01 ` Hannes Reinecke
  2014-01-03 20:03   ` Martin K. Petersen
  2 siblings, 1 reply; 24+ messages in thread
From: Hannes Reinecke @ 2014-01-03 15:01 UTC
  To: Christoph Hellwig, Martin K. Petersen, Jens Axboe
  Cc: linux-kernel, linux-scsi

On 12/22/2013 08:21 PM, Christoph Hellwig wrote:
> We have the block integrity code to support DIF/DIX in the tree for
> about 5 and a half years, and we still don't have a single consumer of
> it.  By normal kernel rules it should never have been merged, or at
> least the bitrot long removed.
>
> Given that we'll have a lot of work to do in this area with block
> multiqueue I think it's time to either kill it off for good or make sure
> we can actually use and test it.

Which would make an ideal topic for LSF, wouldn't it?

Personally, I doubt it's a good idea to kill it off, but
a proper (userland) API for it has been a long time missing.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


* Re: status of block-integrity
  2014-01-03 15:01 ` Hannes Reinecke
@ 2014-01-03 20:03   ` Martin K. Petersen
  2014-01-07  1:36     ` Darrick J. Wong
  0 siblings, 1 reply; 24+ messages in thread
From: Martin K. Petersen @ 2014-01-03 20:03 UTC
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Martin K. Petersen, Jens Axboe, linux-kernel,
	linux-scsi, Darrick J. Wong

>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes:

Hannes> Personally, I doubt it's a good idea to kill it off, but a
Hannes> proper (userland) API for it has been a long time missing.

Before we throw the baby out with the bath water, maybe Darrick can fill
us in on the progress of the aio passthrough interface?

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: status of block-integrity
  2014-01-03 20:03   ` Martin K. Petersen
@ 2014-01-07  1:36     ` Darrick J. Wong
  2014-01-07  7:17         ` Hannes Reinecke
  2014-01-07 15:06       ` Chuck Lever
  0 siblings, 2 replies; 24+ messages in thread
From: Darrick J. Wong @ 2014-01-07  1:36 UTC
  To: Martin K. Petersen, chuck.lever
  Cc: Hannes Reinecke, Christoph Hellwig, Jens Axboe, linux-kernel,
	linux-scsi

On Fri, Jan 03, 2014 at 03:03:42PM -0500, Martin K. Petersen wrote:
> >>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes:
> 
> Hannes> Personally, I doubt it's a good idea to kill it off, but a
> Hannes> proper (userland) API for it has been a long time missing.
> 
> Before we throw the baby out with the bath water, maybe Darrick can fill
> us in on the progress of the aio passthrough interface?

I haven't made much progress on it -- I haven't seen any earnest demand for it.

Last year Chuck Lever said that some NFS working group was looking at defining an
interface for it... has there been any progress?  It doesn't sound like there has
been.

--D
> 
> -- 
> Martin K. Petersen	Oracle Linux Engineering


* Re: status of block-integrity
  2014-01-07  1:36     ` Darrick J. Wong
@ 2014-01-07  7:17         ` Hannes Reinecke
  2014-01-07 15:06       ` Chuck Lever
  1 sibling, 0 replies; 24+ messages in thread
From: Hannes Reinecke @ 2014-01-07  7:17 UTC
  To: Darrick J. Wong, Martin K. Petersen, chuck.lever
  Cc: Christoph Hellwig, Jens Axboe, linux-kernel, linux-scsi

On 01/07/2014 02:36 AM, Darrick J. Wong wrote:
> On Fri, Jan 03, 2014 at 03:03:42PM -0500, Martin K. Petersen wrote:
>>>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes:
>>
>> Hannes> Personally, I doubt it's a good idea to kill it off, but a
>> Hannes> proper (userland) API for it has been a long time missing.
>>
>> Before we throw the baby out with the bath water, maybe Darrick can fill
>> us in on the progress of the aio passthrough interface?
> 
> I haven't made much progress on it -- I haven't seen any earnest demand for it.
> 
Of course not. Who should be demanding it? Application developers
tend to code to existing interfaces.

> Last year Chuck Lever said that some NFS working group was looking at defining an
> interface for it... has there been any progress?  It doesn't sound like there has
> been.
> 
Well, the point is that without any defined (userland) interface
it's quite hard to persuade any userland application developer to
use it.

Plus (as hch rightly pointed out) as there is no defined userland
interface the question is why we bother with all the DIX stuff
in the block layer.
DIF support would be perfectly sufficient to cover any connectivity
issues.

And one feels _really_ silly trying to convince customers of the
benefits of DIX they can't even use.
I've tried that several times, and it doesn't get better over time...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


* Re: status of block-integrity
  2013-12-23 13:35 ` Martin K. Petersen
  2013-12-23 13:48   ` Christoph Hellwig
  2013-12-31 19:41   ` berthiaume, wayne
@ 2014-01-07  8:28   ` Ric Wheeler
  2014-01-07 13:33       ` Hannes Reinecke
  2 siblings, 1 reply; 24+ messages in thread
From: Ric Wheeler @ 2014-01-07  8:28 UTC
  To: Martin K. Petersen, Christoph Hellwig
  Cc: Jens Axboe, linux-kernel, linux-scsi, Linux FS Devel

On 12/23/2013 09:35 PM, Martin K. Petersen wrote:
>>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:
> Christoph> We have the block integrity code to support DIF/DIX in the
> Christoph> tree for about 5 and a half years, and we still don't
> Christoph> have a single consumer of it.
>
> What do you mean? If you have a DIX-capable HBA (lpfc, qla2xxx, zfcp)
> then integrity protection is active from the block layer down. The only
> code that's not currently being exercised are the tag interleaving
> functions.  I was hoping the FS people would use them for back pointers
> but nobody seemed to bite.
>
>
> Christoph> Given that we'll have a lot of work to do in this area with
> Christoph> block multiqueue I think it's time to either kill it off for
> Christoph> good or make sure we can actually use and test it.
>
> I don't understand why multiqueue would require a lot of work? It's just
> an extra scatterlist per request.
>
> And obviously, if there's anything that needs to be done in this area
> I'll be happy to do so...
>

One of the major knocks on Linux file systems (except for btrfs) that I hear is
the lack of full data path checksums. DIF/DIX + xfs or ext4 done right will give
us another answer here.  I don't think it will be common; it is a request that
comes in most commonly from very large storage customers.

We do have devices that support this and are working to get more vendor testing 
done, so I would hate to see us throw out the code instead of fixing it up for 
the end users that see value here.

I think that we can get this working & agree with the call to continue this 
discussion (here and at LSF :))

Ric



* Re: status of block-integrity
  2014-01-07  8:28   ` Ric Wheeler
@ 2014-01-07 13:33       ` Hannes Reinecke
  0 siblings, 0 replies; 24+ messages in thread
From: Hannes Reinecke @ 2014-01-07 13:33 UTC
  To: Ric Wheeler, Martin K. Petersen, Christoph Hellwig
  Cc: Jens Axboe, linux-kernel, linux-scsi, Linux FS Devel

On 01/07/2014 09:28 AM, Ric Wheeler wrote:
> On 12/23/2013 09:35 PM, Martin K. Petersen wrote:
>>>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:
>> Christoph> We have the block integrity code to support DIF/DIX in the
>> Christoph> tree for about 5 and a half years, and we still don't
>> Christoph> have a single consumer of it.
>>
>> What do you mean? If you have a DIX-capable HBA (lpfc, qla2xxx, zfcp)
>> then integrity protection is active from the block layer down. The
>> only
>> code that's not currently being exercised are the tag interleaving
>> functions.  I was hoping the FS people would use them for back
>> pointers
>> but nobody seemed to bite.
>>
>>
>> Christoph> Given that we'll have a lot of work to do in this area
>> with
>> Christoph> block multiqueue I think it's time to either kill it
>> off for
>> Christoph> good or make sure we can actually use and test it.
>>
>> I don't understand why multiqueue would require a lot of work?
>> It's just
>> an extra scatterlist per request.
>>
>> And obviously, if there's anything that needs to be done in this area
>> I'll be happy to do so...
>>
> 
> One of the major knocks on Linux file systems (except for btrfs)
> that I hear is the lack of full data path checksums. DIF/DIX + xfs
> or ext4 done right will give us another answer here.  I don't think
> it will be common; the request comes mostly from very large
> storage customers.
> 
> We do have devices that support this and are working to get more
> vendor testing done, so I would hate to see us throw out the code
> instead of fixing it up for the end users that see value here.
> 
> I think that we can get this working & agree with the call to
> continue this discussion (here and at LSF :))
> 
I would indeed like to have a discussion at LSF about the future of
DIX. DIF is not an issue, as most HBAs support it already and we
actually need it for proper connectivity.

DIX, OTOH, has been left dormant since time immemorial, with the
only known (supposed) user being Oracle.
(I actually talked to the DB/2 folks about it, and the response
was a polite feigned interest ...)

We need to come up with a concise story here (either integrate with
filesystems or have a userland interface), otherwise it's just dead
code and indeed should be removed.

Plus so far I've had exactly _one_ request for DIX, and even that
came from a company which has its own custom storage array firmware.
That makes me wonder whether DIF/DIX is really that important or more
of a tick-mark during procurement ...

Even so, it would warrant a discussion at LSF.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: status of block-integrity
  2014-01-07  1:36     ` Darrick J. Wong
  2014-01-07  7:17         ` Hannes Reinecke
@ 2014-01-07 15:06       ` Chuck Lever
  1 sibling, 0 replies; 24+ messages in thread
From: Chuck Lever @ 2014-01-07 15:06 UTC (permalink / raw
  To: Darrick J. Wong
  Cc: Martin K. Petersen, Hannes Reinecke, Christoph Hellwig,
	Jens Axboe, LKML Kernel, linux-scsi


On Jan 6, 2014, at 8:36 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:

> On Fri, Jan 03, 2014 at 03:03:42PM -0500, Martin K. Petersen wrote:
>>>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes:
>> 
>> Hannes> Personally, I doubt it's a good idea to kill it off, but a
>> Hannes> proper (userland) API for it has been a long time missing.
>> 
>> Before we throw the baby out with the bath water, maybe Darrick can fill
>> us in on the progress of the aio passthrough interface?
> 
> I haven't made much progress on it -- I haven't seen any earnest demand for it.
> 
> Last year Chuck Lever said that some NFS working group was looking at defining an
> interface for it... has there been any progress?  It doesn't sound like there has
> been.

You must be thinking of some other Chuck Lever ;-)

What I promised to deliver was plumbing in the NFS protocol to support end-to-end data integrity.  That's here:

  https://datatracker.ietf.org/doc/draft-cel-nfsv4-end2end-data-protection/

The issue of a system call API is as sticky for NFS as it is for other e2e implementations.  Without an API, other projects have been allowed to take up the time I would have spent on an NFS prototype.

What's more, some of the fields in the T10 tag are meaningless for a byte stream: an application, for example, will know nothing of block addresses, since those are chosen by the underlying filesystem.  NFS (or any byte stream e2e integrity API) will have to define a different protection envelope with perhaps different tag contents.
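
To make the block-address point concrete, here is a minimal Python sketch of an 8-byte protection tuple. It assumes the common Type 1 layout (16-bit guard tag, 16-bit application tag, 32-bit reference tag carrying the low 32 bits of the LBA); the names PITuple, pack_pi and unpack_pi are hypothetical and not taken from any kernel or NFS code.

```python
import struct
from collections import namedtuple

# Hypothetical names. Assumed layout: big-endian guard(16) | app(16) | ref(32).
PITuple = namedtuple("PITuple", "guard app ref")
PI_FORMAT = ">HHI"  # 8 bytes of protection information per sector


def pack_pi(guard, app, lba):
    """Build one tuple; the reference tag is the low 32 bits of the LBA."""
    return struct.pack(PI_FORMAT, guard, app, lba & 0xFFFFFFFF)


def unpack_pi(raw):
    """Split an 8-byte tuple back into its three fields."""
    return PITuple(*struct.unpack(PI_FORMAT, raw))


# Same guard and application tag, but placed at two different block addresses:
a = unpack_pi(pack_pi(0x1234, 0, lba=100))
b = unpack_pi(pack_pi(0x1234, 0, lba=101))
print(a.ref, b.ref)  # only the block-dependent field differs
```

The reference tag can only be filled in by whoever knows the final block address, which is why a byte-stream interface would need a different envelope.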

But we do have a range of potential use cases for NFS: hypervisors that emulate block devices using NFS files, strong cryptographic data checksums, and of course the Oracle database.  No real demand, as others have said.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: status of block-integrity
  2014-01-07  7:17         ` Hannes Reinecke
  (?)
@ 2014-01-07 21:43         ` Martin K. Petersen
  2014-01-08  7:14             ` Hannes Reinecke
  -1 siblings, 1 reply; 24+ messages in thread
From: Martin K. Petersen @ 2014-01-07 21:43 UTC (permalink / raw
  To: Hannes Reinecke
  Cc: Darrick J. Wong, Martin K. Petersen, chuck.lever,
	Christoph Hellwig, Jens Axboe, linux-kernel, linux-scsi

>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes:

Hannes> Plus (as hch rightly pointed out) as there is no defined
Hannes> userland interface the question is why we bother with all the
Hannes> DIX stuff in the block layer.  

Because it catches problems in the path between block layer and HBA
ASIC? FWIW, we find more issues there than we do between initiator and
target.

API issues aside, another reason adoption has been slow is that very few
applications truly care about this stuff. The current approach in which
data is protected when the I/O is submitted by the filesystem is good
enough for most things. Saves the filesystem people the trouble of
dealing with it too.

In reality there are only a handful of applications that would actually
benefit from an explicit userland API. Mostly in the database
department. All the potential consumers of an interface I talked to
wanted to use aio so that's why we've focused our efforts there.

Both Darrick and I have been busy with other projects the last little
while. I'll start looking at this again when I'm done with copy
offload...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: status of block-integrity
  2014-01-07 13:33       ` Hannes Reinecke
  (?)
@ 2014-01-07 23:34       ` Matthew Wilcox
  2014-01-08  0:05         ` James Bottomley
  -1 siblings, 1 reply; 24+ messages in thread
From: Matthew Wilcox @ 2014-01-07 23:34 UTC (permalink / raw
  To: Hannes Reinecke
  Cc: Ric Wheeler, Martin K. Petersen, Christoph Hellwig, Jens Axboe,
	linux-kernel, linux-scsi, Linux FS Devel

On Tue, Jan 07, 2014 at 02:33:10PM +0100, Hannes Reinecke wrote:
> I would indeed like to have a discussion at LSF about the future of
> DIX. DIF is not an issue, as most HBAs support it already and we
> actually need it for proper connectivity.
> 
> DIX, OTOH, has been left dormant since time immemorial, with the
> only known (supposed) user being Oracle.
> (I actually talked to the DB/2 folks about it, and the response
> was a polite feigned interest ...)

I think there's a terminology confusion here; you seem to be using DIX
to mean the TCP CRC and DIF to mean T10DIF.  I've seen other people use
DIX to mean separate SGLs for metadata and DIF to mean interleaved data.
Can you confirm which thing you mean here?

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: status of block-integrity
  2014-01-07 23:34       ` Matthew Wilcox
@ 2014-01-08  0:05         ` James Bottomley
  2014-01-08 15:43           ` Martin K. Petersen
  0 siblings, 1 reply; 24+ messages in thread
From: James Bottomley @ 2014-01-08  0:05 UTC (permalink / raw
  To: Matthew Wilcox
  Cc: Hannes Reinecke, Ric Wheeler, Martin K. Petersen,
	Christoph Hellwig, Jens Axboe, linux-kernel, linux-scsi,
	Linux FS Devel

On Tue, 2014-01-07 at 16:34 -0700, Matthew Wilcox wrote:
> On Tue, Jan 07, 2014 at 02:33:10PM +0100, Hannes Reinecke wrote:
> > I would indeed like to have a discussion at LSF about the future of
> > DIX. DIF is not an issue, as most HBAs support it already and we
> > actually need it for proper connectivity.
> > 
> > DIX, OTOH, has been left dormant since time immemorial, with the
> > only known (supposed) user being Oracle.
> > (I actually talked to the DB/2 folks about it, and the response
> > was a polite feigned interest ...)
> 
> I think there's a terminology confusion here; you seem to be using DIX
> to mean the TCP CRC and DIF to mean T10DIF.  I've seen other people use
> DIX to mean separate SGLs for metadata and DIF to mean interleaved data.
> Can you confirm which thing you mean here?

No, I think you're confusing algorithms with protocols.  DIF and DIX are
two names for protection envelopes.  DIF verifies integrity from the HBA
to the device surface.  DIX verifies integrity from an application to
the HBA.  Both DIF and DIX have pluggable checksum algorithms (and, in
theory, as long as the HBA does the conversion, they don't have to use
the same one, although the confusion over protection types and
algorithms is so dense already that the only way not to go insane is to
use the same end to end one).  Oracle has the best data sources to
explain this, including Martin's slides:

https://oss.oracle.com/projects/data-integrity/documentation/
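
The "pluggable checksum" idea can be sketched in a few lines of Python. This is an illustration only: it assumes the host side uses an IP-style ones' complement checksum and the wire side uses the T10 CRC (polynomial 0x8BB7), and convert_guard is a hypothetical stand-in for whatever the HBA does internally.

```python
def crc16_t10dif(data):
    """Bitwise CRC-16, polynomial 0x8BB7, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def ip_checksum(data):
    """16-bit ones' complement checksum in the style of RFC 1071."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


GUARDS = {"crc": crc16_t10dif, "ip": ip_checksum}


def convert_guard(sector, guard, src, dst):
    """Hypothetical HBA step: verify with one algorithm, re-emit with another."""
    if GUARDS[src](sector) != guard:
        raise ValueError("guard mismatch before conversion")
    return GUARDS[dst](sector)


sector = bytes(range(256)) * 2        # one 512-byte sector
host_guard = GUARDS["ip"](sector)     # cheap checksum computed on the host
wire_guard = convert_guard(sector, host_guard, "ip", "crc")
print(hex(host_guard), hex(wire_guard))
```

The conversion step only works because the data is verified under the first algorithm before being re-protected under the second, so there is no window in which it travels unchecked.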

The specific problem is that there's no defined interface for any
application to use DIX easily: it has to supply additional protection
information when it reads or writes data, and there's no agreed way to
extend read/write to do this.  As Martin has said, thinking about trying
to do this with mmap leads to a "bonghit bonanza".

So, the question is do we need to bother with DIX at all?  No filesystem
uses it and there seems to be weak user demand at best.  We could just
strip DIX, losing the protection envelope from the application to the
HBA but keeping DIF, which is the protection envelope from the HBA to
the device.  

James



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: status of block-integrity
  2014-01-07 21:43         ` Martin K. Petersen
@ 2014-01-08  7:14             ` Hannes Reinecke
  0 siblings, 0 replies; 24+ messages in thread
From: Hannes Reinecke @ 2014-01-08  7:14 UTC (permalink / raw
  To: Martin K. Petersen
  Cc: Darrick J. Wong, chuck.lever, Christoph Hellwig, Jens Axboe,
	linux-kernel, linux-scsi

On 01/07/2014 10:43 PM, Martin K. Petersen wrote:
>>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes:
> 
> Hannes> Plus (as hch rightly pointed out) as there is no defined
> Hannes> userland interface the question is why we bother with all the
> Hannes> DIX stuff in the block layer.  
> 
> Because it catches problems in the path between block layer and HBA
> ASIC? FWIW, we find more issues there than we do between initiator and
> target.
> 
But how should it do that exactly?
As there is no user (apart from oracleasm) no-one can attach
protection information to any data, so even the most dedicated admin
cannot exercise this path, let alone find issues here.

> API issues aside, another reason adoption has been slow is that very few
> applications truly care about this stuff. The current approach in which
> data is protected when the I/O is submitted by the filesystem is good
> enough for most things. Saves the filesystem people the trouble of
> dealing with it too.
> 
> In reality there are only a handful of applications that would actually
> benefit from an explicit userland API. Mostly in the database
> department. All the potential consumers of an interface I talked to
> wanted to use aio so that's why we've focused our efforts there.
> 
aio is perfectly fine; all I care about is having _any_ way of feeding
protection information into the kernel.

> Both Darrick and I have been busy with other projects the last little
> while. I'll start looking at this again when I'm done with copy
> offload...
> 
Speaking of which, are there any patches?
Doug Gilbert and I are currently discussing LID4 / ROD Token copy
for sg3_utils and the block layer, so any patches would be very
helpful here.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: status of block-integrity
  2014-01-08  7:14             ` Hannes Reinecke
  (?)
@ 2014-01-08 15:23             ` Martin K. Petersen
  2014-01-09 11:19                 ` Hannes Reinecke
  -1 siblings, 1 reply; 24+ messages in thread
From: Martin K. Petersen @ 2014-01-08 15:23 UTC (permalink / raw
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Darrick J. Wong, chuck.lever,
	Christoph Hellwig, Jens Axboe, linux-kernel, linux-scsi

>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes:

Hannes,

Hannes> As there is no user (apart from oracleasm) no-one can attach
Hannes> protection information to any data, so even the most dedicated
Hannes> admin cannot exercise this path, let alone find issues here.

That's not how it works!

If the filesystem has not attached protection information to a bio the
block layer will do it for you. The block layer generates protection
information for writes and verifies it for reads. That's how it's worked
since day one. The code is there, it is used by everyone with a
DIX-capable HBA. See Documentation/block/data-integrity.txt.
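
As a rough illustration of the generate-on-write, verify-on-read behaviour described above, here is a minimal Python sketch. It is not the kernel code: it assumes 512-byte sectors, a Type 1 style tuple (CRC guard, zero application tag, reference tag equal to the low 32 bits of the LBA), and the helper names generate_pi and verify_pi are made up for this example.

```python
import struct

SECTOR = 512


def crc16_t10dif(data):
    """Bitwise CRC-16, polynomial 0x8BB7, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def generate_pi(data, start_lba):
    """Write path: one 8-byte tuple (guard, app tag, ref tag) per sector."""
    if len(data) % SECTOR:
        raise ValueError("data must be a whole number of sectors")
    pi = bytearray()
    for off in range(0, len(data), SECTOR):
        lba = start_lba + off // SECTOR
        guard = crc16_t10dif(data[off:off + SECTOR])
        pi += struct.pack(">HHI", guard, 0, lba & 0xFFFFFFFF)
    return bytes(pi)


def verify_pi(data, pi, start_lba):
    """Read path: return the index of the first bad sector, or None."""
    for n in range(len(data) // SECTOR):
        guard, _app, ref = struct.unpack_from(">HHI", pi, n * 8)
        chunk = data[n * SECTOR:(n + 1) * SECTOR]
        if guard != crc16_t10dif(chunk) or ref != (start_lba + n) & 0xFFFFFFFF:
            return n
    return None


payload = bytes(1024)                                  # two zeroed sectors
pi = generate_pi(payload, start_lba=2048)
corrupted = payload[:600] + b"\x01" + payload[601:]    # flip a byte in sector 1
print(verify_pi(payload, pi, 2048), verify_pi(corrupted, pi, 2048))
```

Verifying the same payload against the wrong starting LBA also fails, at sector 0, which is how a misdirected read or write would show up.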

Normal applications do not want to have to deal with generating
protection information, using an async I/O model, keeping completion
state around for extended periods of time to figure out whether the I/O
actually completed or not and so on. So the kernel-to-platter protection
scheme we have in place now is good enough.

That doesn't mean that I'm not interested in augmenting libaio. I
am. Very. And I know of several applications that are keen to use
it. But getting page cache passthrough and filesystem interaction
working is non-trivial. That's what has inhibited progress, not
extending the libaio API.

Hannes> Doug Gilbert and I are currently discussing LID4 / ROD Token
Hannes> copy for sg3_utils and the block layer, so any patches would be
Hannes> very helpful here.

I'm only doing LID1 right now. Any particular reason you are exploring
LID4 and ROD?

I resumed my efforts before Christmas but I keep running into issues.
I'm guessing I'm a week or two from having something that is suitable
for consumption.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: status of block-integrity
  2014-01-08  0:05         ` James Bottomley
@ 2014-01-08 15:43           ` Martin K. Petersen
  0 siblings, 0 replies; 24+ messages in thread
From: Martin K. Petersen @ 2014-01-08 15:43 UTC (permalink / raw
  To: James Bottomley
  Cc: Matthew Wilcox, Hannes Reinecke, Ric Wheeler, Christoph Hellwig,
	Jens Axboe, linux-kernel, linux-scsi, Linux FS Devel

>>>>> "James" == James Bottomley <James.Bottomley@HansenPartnership.com> writes:

James> No, I think you're confusing algorithms with protocols.  DIF and
James> DIX are two names for protection envelopes.  DIF verifies
James> integrity from the HBA to the device surface.  DIX verifies
James> integrity from an application to the HBA. 

Actually, DIX is a data integrity-aware HBA programming interface. We
have an implementation of that interface in the SCSI layer and in some
of the initiator drivers (lpfc, qla2xxx, mptNsas).
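
One way to picture the interface is as a pair of buffers: earlier in the thread the integrity data is described as "just an extra scatterlist per request". The Python sketch below assumes 512-byte sectors with 8 bytes of protection information each, interleaved into 520-byte units on the device side; interleave and deinterleave are hypothetical helper names, not driver functions.

```python
SECTOR, PI = 512, 8


def interleave(data, pi):
    """Separate data and PI buffers -> 520-byte protected sectors."""
    n = len(data) // SECTOR
    if len(data) != n * SECTOR or len(pi) != n * PI:
        raise ValueError("need one 8-byte tuple per 512-byte sector")
    out = bytearray()
    for i in range(n):
        out += data[i * SECTOR:(i + 1) * SECTOR]
        out += pi[i * PI:(i + 1) * PI]
    return bytes(out)


def deinterleave(wire):
    """520-byte protected sectors -> separate data and PI buffers."""
    step = SECTOR + PI
    if len(wire) % step:
        raise ValueError("length is not a multiple of 520 bytes")
    data, pi = bytearray(), bytearray()
    for off in range(0, len(wire), step):
        data += wire[off:off + SECTOR]
        pi += wire[off + SECTOR:off + step]
    return bytes(data), bytes(pi)


data = bytes([1]) * SECTOR + bytes([2]) * SECTOR
pi = bytes([0xAA]) * PI + bytes([0xBB]) * PI
wire = interleave(data, pi)
print(len(wire), deinterleave(wire) == (data, pi))
```

Keeping the two buffers separate on the host side means the data pages stay sector-aligned and untouched, while the HBA does the merging and splitting.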

There is no single name for stuff above DIX. Other than "block layer
data integrity goo", "page cache black magic" and "let's add a few
fields to struct iocb".

James> So, the question is do we need to bother with DIX at all?  No
James> filesystem uses it 

...explicitly. Every filesystem uses it implicitly. There are only two
reasons for filesystems to want to be explicitly "block layer data
integrity goo"-aware:

     1. To be able to use the application tag space for back pointers or
        other metadata without requiring disk format changes.

     2. To facilitate passthrough of protection information submitted
        via the $TBD application programming interface.

I was hoping the extN folks would be interested in (1) but there were no
takers. (2) is hard but not forgotten. In any case the status quo is
that there is no point in filesystems manually generating protection
information when the block layer is going to do it for them when the bio
is submitted.
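
For reason (1), a minimal sketch of what "using the application tag space for back pointers" could look like, in Python. Everything here is an assumption made for illustration: a 4 KiB filesystem block made of eight 512-byte sectors, 2 bytes of application tag per sector, and a made-up 12-byte record of (inode number, logical block index). It is not how any existing filesystem lays out its metadata.

```python
import struct

TAG_BYTES = 2          # application tag space per 512-byte sector
RECORD_FORMAT = ">QI"  # hypothetical record: 64-bit inode, 32-bit block index


def spread_back_pointer(inode, block_index, sectors_per_block=8):
    """Split the record across the per-sector application tags of one block."""
    space = sectors_per_block * TAG_BYTES
    record = struct.pack(RECORD_FORMAT, inode, block_index)
    if len(record) > space:
        raise ValueError("record does not fit in the application tag space")
    record = record.ljust(space, b"\x00")
    return [int.from_bytes(record[i:i + TAG_BYTES], "big")
            for i in range(0, space, TAG_BYTES)]


def gather_back_pointer(tags):
    """Reassemble the record from the application tags read back from disk."""
    record = b"".join(tag.to_bytes(TAG_BYTES, "big") for tag in tags)
    size = struct.calcsize(RECORD_FORMAT)
    return struct.unpack(RECORD_FORMAT, record[:size])


tags = spread_back_pointer(inode=42, block_index=7)
print(len(tags), gather_back_pointer(tags))
```

Because the record rides in the protection information rather than in the data sectors, no on-disk format change is needed, which is the attraction mentioned in (1).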

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: status of block-integrity
  2014-01-08 15:23             ` Martin K. Petersen
@ 2014-01-09 11:19                 ` Hannes Reinecke
  0 siblings, 0 replies; 24+ messages in thread
From: Hannes Reinecke @ 2014-01-09 11:19 UTC (permalink / raw
  To: Martin K. Petersen
  Cc: Darrick J. Wong, chuck.lever, Christoph Hellwig, Jens Axboe,
	linux-kernel, linux-scsi, Doug Gilbert

On 01/08/2014 04:23 PM, Martin K. Petersen wrote:
>>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes:
>
> Hannes,
>
> Hannes> As there is no user (apart from oracleasm) no-one can attach
> Hannes> protection information to any data, so even the most dedicated
> Hannes> admin cannot exercise this path, let alone find issues here.
>
> That's not how it works!
>
> If the filesystem has not attached protection information to a bio the
> block layer will do it for you. The block layer generates protection
> information for writes and verifies it for reads. That's how it's worked
> since day one. The code is there, it is used by everyone with a
> DIX-capable HBA. See Documentation/block/data-integrity.txt.
>
> Normal applications do not want to have to deal with generating
> protection information, using an async I/O model, keeping completion
> state around for extended periods of time to figure out whether the I/O
> actually completed or not and so on. So the kernel-to-platter protection
> scheme we have in place now is good enough.
>
Ah. I stand corrected.
Sorry.

> That doesn't mean that I'm not interested in augmenting libaio. I
> am. Very. And I know of several applications that are keen to use
> it. But getting page cache passthrough and filesystem interaction
> working is non-trivial. That's what has inhibited progress, not
> extending the libaio API.
>
Same here. Actually I _like_ DIX, but the missing libaio / userland API 
makes it very hard to utilize it.

> Hannes> Doug Gilbert and I are currently discussing LID4 / ROD Token
> Hannes> copy for sg3_utils and the block layer, so any patches would be
> Hannes> very helpful here.
>
> I'm only doing LID1 right now. Any particular reason you are exploring
> LID4 and ROD?
>
Yes. LID4/ROD token is far easier to use (conceptually).
With LID1/XCopy there is ambiguity about where to actually send the
command; the spec is silent in this area.
Also for LID1/XCopy you have three steps:
- Query the source disk
- Query the target disk
- Send the command to either source or target
Which is very awkward, and one has to think really carefully about how
to implement this without all sorts of layering violations.
And you have to mix-and-match between all the various xcopy descriptors;
if there's only one it's easy enough, but with several, things get
interesting.

With LID4/ROD Token copy you basically have _two_ steps:
- Get the ROD Token from the source device
- Send the ROD Token to the target device

Which is far easier (conceptually).
Or that's the hope, anyway.
Also, the ROD Token in principle has an independent lifetime, so
you could take an arbitrary amount of time between those steps.
It might expire, though, but then failure is always an option when
working with copy offload.
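
The two-step flow and the token lifetime can be modelled with a toy in Python. This is purely conceptual: ToyDevice and ToyCopyManager are invented names, the token here snapshots data in memory, and nothing below corresponds to real SCSI commands or to sg3_utils.

```python
import secrets
import time


class ToyDevice:
    """In-memory stand-in for a disk; not a real SCSI target."""

    def __init__(self, blocks):
        self.blocks = list(blocks)


class ToyCopyManager:
    """Hypothetical model of the array-side state behind a token."""

    def __init__(self, lifetime_s=30.0):
        self.lifetime_s = lifetime_s
        self._tokens = {}  # token -> (snapshot of blocks, expiry time)

    def get_token(self, source, lba, count, now=None):
        """Step 1: ask the source for a token representing a block range."""
        now = time.monotonic() if now is None else now
        token = secrets.token_hex(16)
        snapshot = source.blocks[lba:lba + count]
        self._tokens[token] = (snapshot, now + self.lifetime_s)
        return token

    def write_token(self, token, target, lba, now=None):
        """Step 2: hand the token to the target; it may have expired."""
        now = time.monotonic() if now is None else now
        data, expires = self._tokens[token]
        if now > expires:
            raise TimeoutError("token expired; fall back to a normal copy")
        target.blocks[lba:lba + len(data)] = data


mgr = ToyCopyManager(lifetime_s=30.0)
src = ToyDevice([b"A", b"B", b"C", b"D"])
dst = ToyDevice([b"-"] * 4)

token = mgr.get_token(src, lba=1, count=2, now=0.0)
mgr.write_token(token, dst, lba=0, now=10.0)  # within the token lifetime
print(dst.blocks)
```

The explicit now argument only exists to make expiry easy to exercise; a caller that waits past the lifetime gets an error and has to fall back to an ordinary read/write copy.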

As said, Doug and I are working on putting this into sg3_utils, then
we'll have a better idea on the actual workings.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: status of block-integrity
  2014-01-09 11:19                 ` Hannes Reinecke
  (?)
@ 2014-01-10  1:49                 ` Martin K. Petersen
  -1 siblings, 0 replies; 24+ messages in thread
From: Martin K. Petersen @ 2014-01-10  1:49 UTC (permalink / raw
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Darrick J. Wong, chuck.lever,
	Christoph Hellwig, Jens Axboe, linux-kernel, linux-scsi,
	Doug Gilbert

>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes:

Hannes> With LID1/XCopy there is ambiguity about where to actually
Hannes> send the command; the spec is silent in this area.

Yeah, right now it's a coin toss.

However, thanks to VAAI most arrays support LID1. I'm trying to leverage
that. Doesn't in any way preclude LID4 being supported as well.

Hannes> As said, Doug and I are working on putting this into sg3_utils,
Hannes> then we'll have a better idea on the actual workings.

Cool! That'll save me some headaches when we start seeing LID4 devices.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2014-01-10  1:50 UTC | newest]

Thread overview: 24+ messages
2013-12-22 19:21 status of block-integrity Christoph Hellwig
2013-12-22 20:45 ` Nicholas A. Bellinger
2013-12-23 13:35 ` Martin K. Petersen
2013-12-23 13:48   ` Christoph Hellwig
2013-12-31 19:41   ` berthiaume, wayne
2014-01-07  8:28   ` Ric Wheeler
2014-01-07 13:33     ` Hannes Reinecke
2014-01-07 13:33       ` Hannes Reinecke
2014-01-07 23:34       ` Matthew Wilcox
2014-01-08  0:05         ` James Bottomley
2014-01-08 15:43           ` Martin K. Petersen
2014-01-03 15:01 ` Hannes Reinecke
2014-01-03 20:03   ` Martin K. Petersen
2014-01-07  1:36     ` Darrick J. Wong
2014-01-07  7:17       ` Hannes Reinecke
2014-01-07  7:17         ` Hannes Reinecke
2014-01-07 21:43         ` Martin K. Petersen
2014-01-08  7:14           ` Hannes Reinecke
2014-01-08  7:14             ` Hannes Reinecke
2014-01-08 15:23             ` Martin K. Petersen
2014-01-09 11:19               ` Hannes Reinecke
2014-01-09 11:19                 ` Hannes Reinecke
2014-01-10  1:49                 ` Martin K. Petersen
2014-01-07 15:06       ` Chuck Lever
