All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] nvme: temporary fix for Apple controller reset
@ 2015-11-22  2:28 Stephan Günther
  2015-11-22  8:56 ` Christoph Hellwig
  2015-12-01 19:46 ` Stephan Günther
  0 siblings, 2 replies; 8+ messages in thread
From: Stephan Günther @ 2015-11-22  2:28 UTC (permalink / raw


Recent patches added basic support for the Apple NVMe controller but
still cause resets and data corruption on that particular controller
when a specific pattern of read/flush commands occurs. Limiting the
queue depth to 2 works around that issue.

This patch enforces that limit only for the Apple controller and is
considered a temporary fix until we find the root source of that
problem.

Signed-off-by: Stephan G?nther <guenther at tum.de>
Signed-off-by: Maurice Leclaire <leclaire at in.tum.de>
---
 drivers/nvme/host/pci.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 8187df2..d3c2b03 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2701,6 +2701,18 @@ static int nvme_dev_map(struct nvme_dev *dev)
 	dev->q_depth = min_t(int, NVME_CAP_MQES(cap) + 1, NVME_Q_DEPTH);
 	dev->db_stride = 1 << NVME_CAP_STRIDE(cap);
 	dev->dbs = ((void __iomem *)dev->bar) + 4096;
+
+	/*
+ 	 * Temporary fix for the Apple controller found in the MacBook8,1 and
+	 * some MacBook7,1 to avoid controller resets and data loss.
+ 	 */
+	if (pdev->vendor == PCI_VENDOR_ID_APPLE && pdev->device == 0x2001) {
+ 		dev->q_depth = 2;
+		dev_warn(dev->dev, "detected Apple NVMe controller, set "
+			"queue depth=%u to work around controller resets\n",
+			dev->q_depth);
+	}
+
 	if (readl(&dev->bar->vs) >= NVME_VS(1, 2))
 		dev->cmb = nvme_map_cmb(dev);

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2] nvme: temporary fix for Apple controller reset
  2015-11-22  2:28 [PATCH v2] nvme: temporary fix for Apple controller reset Stephan Günther
@ 2015-11-22  8:56 ` Christoph Hellwig
  2015-11-25  0:21   ` Wes Cilldhaire
  2015-12-01 19:46 ` Stephan Günther
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2015-11-22  8:56 UTC (permalink / raw


Looks good,

Reviewed-by: Christoph Hellwig <hch at lst.de>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2] nvme: temporary fix for Apple controller reset
  2015-11-22  8:56 ` Christoph Hellwig
@ 2015-11-25  0:21   ` Wes Cilldhaire
  0 siblings, 0 replies; 8+ messages in thread
From: Wes Cilldhaire @ 2015-11-25  0:21 UTC (permalink / raw



I can confirm this works around the controller reset issue for me on macbook8,1 , and doesn't clobber my linux or other file systems (I have a triple boot setup with osx, win10 and linux).
Previously I was seeing the reset almost immediately due to the high amount of IO systemd seems to create during boot, or after a couple minutes when booting with openrc.  Now both boot without issue all the way to KDE init.  I've tried an fstrim, partprobes, and dd'ing to generate some writes for an extended time all without issue and all file systems appear healthy.

Cheers,
Wes Cilldhaire



----- Original Message -----
From: "Christoph Hellwig" <hch@infradead.org>
To: "Stephan G??nther" <guenther at tum.de>
Cc: "Matthew Wilcox" <willy at linux.intel.com>, linux-nvme at lists.infradead.org, "Jon Derrick" <jonathan.derrick at intel.com>, "Maurice Leclaire" <leclaire at in.tum.de>
Sent: Sunday, 22 November, 2015 7:56:56 PM
Subject: Re: [PATCH v2] nvme: temporary fix for Apple controller reset

Looks good,

Reviewed-by: Christoph Hellwig <hch at lst.de>

_______________________________________________
Linux-nvme mailing list
Linux-nvme at lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2] nvme: temporary fix for Apple controller reset
  2015-11-22  2:28 [PATCH v2] nvme: temporary fix for Apple controller reset Stephan Günther
  2015-11-22  8:56 ` Christoph Hellwig
@ 2015-12-01 19:46 ` Stephan Günther
  2015-12-01 19:56   ` Jens Axboe
  1 sibling, 1 reply; 8+ messages in thread
From: Stephan Günther @ 2015-12-01 19:46 UTC (permalink / raw


The patch below has been reviewed by Christoph and reported to work.
However, there is still no sign that it will be applied to linux-4.4.

Please either undo commit c74dc7801d515d01847fd5cf2b472489fa5717b1,
which added the PCI ID of the Apple controller, or merge the patch below
asap.


Currently, the driver will make that controller destroy data!


Best,
Stephan


On 2015/November/22 03:28, Stephan G?nther wrote:
> Recent patches added basic support for the Apple NVMe controller but
> still cause resets and data corruption on that particular controller
> when a specific pattern of read/flush commands occurs. Limiting the
> queue depth to 2 works around that issue.
> 
> This patch enforces that limit only for the Apple controller and is
> considered a temporary fix until we find the root source of that
> problem.
> 
> Signed-off-by: Stephan G?nther <guenther at tum.de>
> Signed-off-by: Maurice Leclaire <leclaire at in.tum.de>
> ---
>  drivers/nvme/host/pci.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 8187df2..d3c2b03 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2701,6 +2701,18 @@ static int nvme_dev_map(struct nvme_dev *dev)
>  	dev->q_depth = min_t(int, NVME_CAP_MQES(cap) + 1, NVME_Q_DEPTH);
>  	dev->db_stride = 1 << NVME_CAP_STRIDE(cap);
>  	dev->dbs = ((void __iomem *)dev->bar) + 4096;
> +
> +	/*
> + 	 * Temporary fix for the Apple controller found in the MacBook8,1 and
> +	 * some MacBook7,1 to avoid controller resets and data loss.
> + 	 */
> +	if (pdev->vendor == PCI_VENDOR_ID_APPLE && pdev->device == 0x2001) {
> + 		dev->q_depth = 2;
> +		dev_warn(dev->dev, "detected Apple NVMe controller, set "
> +			"queue depth=%u to work around controller resets\n",
> +			dev->q_depth);
> +	}
> +
>  	if (readl(&dev->bar->vs) >= NVME_VS(1, 2))
>  		dev->cmb = nvme_map_cmb(dev);
> 
> 
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2] nvme: temporary fix for Apple controller reset
  2015-12-01 19:46 ` Stephan Günther
@ 2015-12-01 19:56   ` Jens Axboe
  2015-12-01 20:05     ` Stephan Günther
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2015-12-01 19:56 UTC (permalink / raw


On 12/01/2015 12:46 PM, Stephan G?nther wrote:
> The patch below has been reviewed by Christoph and reported to work.
> However, there is still no sign that it will be applied to linux-4.4.
>
> Please either undo commit c74dc7801d515d01847fd5cf2b472489fa5717b1,
> which added the PCI ID of the Apple controller, or merge the patch below
> asap.
>
>
> Currently, the driver will make that controller destroy data!

Honestly, I'd rather revert the pci id addition, unless there's 
conclusive evidence that limiting the (per queue) depth to 2 really does 
fix the issue. Is this what the OSX driver does? What testing was done 
to ascertain that 2 is the magic number? Does it just make it harder to 
hit, or does it really fix it?


-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2] nvme: temporary fix for Apple controller reset
  2015-12-01 19:56   ` Jens Axboe
@ 2015-12-01 20:05     ` Stephan Günther
  2015-12-01 20:10       ` Stephan Günther
  2015-12-01 20:21       ` Jens Axboe
  0 siblings, 2 replies; 8+ messages in thread
From: Stephan Günther @ 2015-12-01 20:05 UTC (permalink / raw


On 2015/December/01 12:56, Jens Axboe wrote:
> On 12/01/2015 12:46 PM, Stephan G?nther wrote:
> >The patch below has been reviewed by Christoph and reported to work.
> >However, there is still no sign that it will be applied to linux-4.4.
> >
> >Please either undo commit c74dc7801d515d01847fd5cf2b472489fa5717b1,
> >which added the PCI ID of the Apple controller, or merge the patch below
> >asap.
> >
> >
> >Currently, the driver will make that controller destroy data!
> 
> Honestly, I'd rather revert the pci id addition, unless there's conclusive
> evidence that limiting the (per queue) depth to 2 really does fix the issue.

Wes Cilldhaire tested the patch (his answer is in that thread) - as I
did the past 3 weeks ago as I'm running solely Linux on that broken 
MacBook.

> Is this what the OSX driver does? What testing was done to ascertain that 2
> is the magic number? Does it just make it harder to hit, or does it really
> fix it?

I do not know what the OSX driver does. Apple is of absolutely no help
whatsoever.

But without that patch `mkfs btrfs` immediately fails. Even `partprobe
/dev/nvme0n1` will reset the controller (the latter one without data
loss).

As I am running a btrfs on a luks container for 3 weeks with that patch,
it *does* work. Performance is, obviously, not the best one I have seen.
But given the circumstances I cannot complain at all.


So again: please either revert the previous patch and leave it to the
more interested individuals to try it (probably slowing down Linux
support for that MacBook as not detecting the hard-soldered disk is
quite a blocker), or merge that hotfix until we find a better solution.

With 4.5 that may even become a quirk.


Best,
Stephan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2] nvme: temporary fix for Apple controller reset
  2015-12-01 20:05     ` Stephan Günther
@ 2015-12-01 20:10       ` Stephan Günther
  2015-12-01 20:21       ` Jens Axboe
  1 sibling, 0 replies; 8+ messages in thread
From: Stephan Günther @ 2015-12-01 20:10 UTC (permalink / raw


Sorry for the addition regarding testing (see below).

On 2015/December/01 09:05, Stephan G?nther wrote:
> On 2015/December/01 12:56, Jens Axboe wrote:
> > On 12/01/2015 12:46 PM, Stephan G?nther wrote:
> > >The patch below has been reviewed by Christoph and reported to work.
> > >However, there is still no sign that it will be applied to linux-4.4.
> > >
> > >Please either undo commit c74dc7801d515d01847fd5cf2b472489fa5717b1,
> > >which added the PCI ID of the Apple controller, or merge the patch below
> > >asap.
> > >
> > >
> > >Currently, the driver will make that controller destroy data!
> > 
> > Honestly, I'd rather revert the pci id addition, unless there's conclusive
> > evidence that limiting the (per queue) depth to 2 really does fix the issue.
> 
> Wes Cilldhaire tested the patch (his answer is in that thread) - as I
> did the past 3 weeks ago as I'm running solely Linux on that broken 
> MacBook.
> 
> > Is this what the OSX driver does? What testing was done to ascertain that 2
> > is the magic number? Does it just make it harder to hit, or does it really
> > fix it?

Any number larger than 2 immediately triggered the problem for me. With 
per-queue depth = 2 it did not happen for 3 weeks.

Moreover, that Apple controller seems to have a single queue...

Best,
Stephan

> 
> I do not know what the OSX driver does. Apple is of absolutely no help
> whatsoever.
> 
> But without that patch `mkfs btrfs` immediately fails. Even `partprobe
> /dev/nvme0n1` will reset the controller (the latter one without data
> loss).
> 
> As I am running a btrfs on a luks container for 3 weeks with that patch,
> it *does* work. Performance is, obviously, not the best one I have seen.
> But given the circumstances I cannot complain at all.
> 
> 
> So again: please either revert the previous patch and leave it to the
> more interested individuals to try it (probably slowing down Linux
> support for that MacBook as not detecting the hard-soldered disk is
> quite a blocker), or merge that hotfix until we find a better solution.
> 
> With 4.5 that may even become a quirk.
> 
> 
> Best,
> Stephan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2] nvme: temporary fix for Apple controller reset
  2015-12-01 20:05     ` Stephan Günther
  2015-12-01 20:10       ` Stephan Günther
@ 2015-12-01 20:21       ` Jens Axboe
  1 sibling, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2015-12-01 20:21 UTC (permalink / raw


On 12/01/2015 01:05 PM, Stephan G?nther wrote:
> On 2015/December/01 12:56, Jens Axboe wrote:
>> On 12/01/2015 12:46 PM, Stephan G?nther wrote:
>>> The patch below has been reviewed by Christoph and reported to work.
>>> However, there is still no sign that it will be applied to linux-4.4.
>>>
>>> Please either undo commit c74dc7801d515d01847fd5cf2b472489fa5717b1,
>>> which added the PCI ID of the Apple controller, or merge the patch below
>>> asap.
>>>
>>>
>>> Currently, the driver will make that controller destroy data!
>>
>> Honestly, I'd rather revert the pci id addition, unless there's conclusive
>> evidence that limiting the (per queue) depth to 2 really does fix the issue.
>
> Wes Cilldhaire tested the patch (his answer is in that thread) - as I
> did the past 3 weeks ago as I'm running solely Linux on that broken
> MacBook.
>
>> Is this what the OSX driver does? What testing was done to ascertain that 2
>> is the magic number? Does it just make it harder to hit, or does it really
>> fix it?
>
> I do not know what the OSX driver does. Apple is of absolutely no help
> whatsoever.

That's unfortunately not news... I used to run linux on a macbook, and 
some of the ahci based SSDs would randomly hang if msi interrupts were 
used. We had to quirk around that too, and as far as I know, nobody got 
any response from Apple on that issue either.

> But without that patch `mkfs btrfs` immediately fails. Even `partprobe
> /dev/nvme0n1` will reset the controller (the latter one without data
> loss).
>
> As I am running a btrfs on a luks container for 3 weeks with that patch,
> it *does* work. Performance is, obviously, not the best one I have seen.
> But given the circumstances I cannot complain at all.
>
>
> So again: please either revert the previous patch and leave it to the
> more interested individuals to try it (probably slowing down Linux
> support for that MacBook as not detecting the hard-soldered disk is
> quite a blocker), or merge that hotfix until we find a better solution.
>
> With 4.5 that may even become a quirk.

As per your other message on the device being single queue, then that 
does make the case more believable. I guess if you guys are fine with 
it, it's better to have it in.


-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-12-01 20:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-11-22  2:28 [PATCH v2] nvme: temporary fix for Apple controller reset Stephan Günther
2015-11-22  8:56 ` Christoph Hellwig
2015-11-25  0:21   ` Wes Cilldhaire
2015-12-01 19:46 ` Stephan Günther
2015-12-01 19:56   ` Jens Axboe
2015-12-01 20:05     ` Stephan Günther
2015-12-01 20:10       ` Stephan Günther
2015-12-01 20:21       ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.