All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] MMC: error handling improvements
@ 2011-02-15 23:03 ` Russell King - ARM Linux
  0 siblings, 0 replies; 32+ messages in thread
From: Russell King - ARM Linux @ 2011-02-15 23:03 UTC (permalink / raw
  To: linux-arm-kernel

This patch is for RFC only; it needs splitting up somewhat.  However, I
wanted to get it out there for some comment.

The patch basically improves all round the error handling with MMC block
transfers.  I've considered several scenarios, from lost or bad CRCs on
the initial commands (both causing no response from the card), lost stop
commands (which I've seen after a FIFO overrun), failed status commands
(also seen after a FIFO overrun), to data FIFO errors.

I've considered the case where the card becomes inaccessible, which
causes us to give up trying fairly quickly - this should avoid problems
with error recovery taking a very long time should the card disappear.

My motivation is primerily dealing with FIFO overruns on data transfers,
to allow the ARM MMCI primecell to run at a higher clock rate, while
allowing it to survive various interrupt loads (which invariably results
in not being able to service the FIFO quickly enough.)

The one obvious solution to this is the old SA_INTERRUPT stuff which used
to work.  As this has been ripped out of the kernel, we have no option but
to invent more weird and wacky solutions to work-around other interrupt
handlers taking their time.

With this solution below, I can increase the card clocking rate from
around 512Kbps to 4Mbps, which is about an 8 fold increase in throughput
which can not sanely be ignored.

The comments may not entirely reflect what the code is doing; they may
have aged a little bit since the code was written.  The adaptive clock
rate algorithm can probably do with a lot more work to avoid it up-
clocking to a rate which has proven to never work.  I'd actually go as
far as to say that the algorithm probably has a lot to be desired - but
it seems to work for my test scenarios.

Last couple of points:
1. it relies on hosts returning the number of bytes equivalent to the
number of blocks _successfully_ transferred and no more.  Returning zero
bytes on error is permitted, but untested, and may result in FIFO overruns
causing a faster reduction in clock rate.
2. it relies on MMC host drivers returning the right error codes, which
historically hasn't really mattered.  With this code diagnosing the reason
for the failure, it now matters that the correct error codes are used.

 drivers/mmc/card/block.c                       |  402 ++++++++++++++++++------
 drivers/mmc/core/core.c                        |   11 +
 drivers/mmc/core/core.h                        |    1 +
 drivers/mmc/core/mmc.c                         |    2 +-
 drivers/mmc/core/sd.c                          |    2 +-
 drivers/mmc/core/sdio.c                        |    4 +-
 drivers/mmc/host/mmci.c                        |    2 +-
 include/linux/mmc/host.h                       |    1 +
 include/linux/mmc/mmc.h                        |   10 +
 9 files changed, 336 insertions(+), 9 deletions(-)

diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index bfc8a8a..8daf340 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -75,6 +75,9 @@ struct mmc_blk_data {
 
 	unsigned int	usage;
 	unsigned int	read_only;
+	unsigned int	clk_div;
+	unsigned int	fifo_ok[10];
+	unsigned int	fifo_fail[10];
 };
 
 static DEFINE_MUTEX(open_lock);
@@ -245,7 +248,21 @@ static u32 mmc_sd_num_wr_blocks(struct mmc_card *card)
 	return result;
 }
 
-static u32 get_card_status(struct mmc_card *card, struct request *req)
+static int send_stop(struct mmc_card *card, u32 *status)
+{
+	struct mmc_command cmd;
+	int err;
+
+	memset(&cmd, 0, sizeof(struct mmc_command));
+	cmd.opcode = MMC_STOP_TRANSMISSION;
+	cmd.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
+	err = mmc_wait_for_cmd(card->host, &cmd, 5);
+	if (err == 0)
+		*status = cmd.resp[0];
+	return err;
+}
+
+static int get_card_status(struct mmc_card *card, u32 *status)
 {
 	struct mmc_command cmd;
 	int err;
@@ -256,10 +273,134 @@ static u32 get_card_status(struct mmc_card *card, struct request *req)
 		cmd.arg = card->rca << 16;
 	cmd.flags = MMC_RSP_SPI_R2 | MMC_RSP_R1 | MMC_CMD_AC;
 	err = mmc_wait_for_cmd(card->host, &cmd, 0);
+	if (err == 0)
+		*status = cmd.resp[0];
+	return err;
+}
+
+#define ERR_RETRY	2
+#define ERR_ABORT	1
+#define ERR_CONTINUE	0
+
+/*
+ * Initial r/w and stop cmd error recovery.
+ * We don't know whether the card received the r/w cmd or not, so try to
+ * restore things back to a sane state.  Essentially, we do this as follows:
+ * - Obtain card status.  If the first attempt to obtain card status fails,
+ *   the status word will reflect the failed status cmd, not the failed
+ *   r/w cmd.  If we fail to obtain card status, it suggests we can no
+ *   longer communicate with the card.
+ * - Check the card state.  If the card received the cmd but there was a
+ *   transient problem with the response, it might still be in a data transfer
+ *   mode.  Try to send it a stop command.  If this fails, we can't recover.
+ * - If the r/w cmd failed due to a response CRC error, it was probably
+ *   transient, so retry the cmd.
+ * - If the r/w cmd timed out, but we didn't get the r/w cmd status, retry.
+ * - If the r/w cmd timed out, and the r/w cmd failed due to CRC error or
+ *   illegal cmd, retry.
+ * Otherwise we don't understand what happened, so abort.
+ */
+static int mmc_blk_cmd_recovery(struct mmc_card *card, struct mmc_blk_request *brq,
+	struct request *req)
+{
+	bool rwcmd_status = true;
+	u32 status, stop_status = 0;
+	int err, retry;
+
+	/*
+	 * Try to get card status which indicates both the card state
+	 * and why there was no response.  If the first attempt fails,
+	 * we can't be sure the returned status is for the r/w command.
+	 */
+	for (retry = 2; retry >= 0; retry--) {
+		err = get_card_status(card, &status);
+		if (!err)
+			break;
+
+		rwcmd_status = false;
+		pr_err("%s: error %d sending status command, %sing\n",
+		       req->rq_disk->disk_name, err, retry ? "retry" : "abort");
+	}
+
+	/* We couldn't get a response from the card.  Give up. */
 	if (err)
-		printk(KERN_ERR "%s: error %d sending status command",
-		       req->rq_disk->disk_name, err);
-	return cmd.resp[0];
+		return ERR_ABORT;
+
+	/*
+	 * Check the current card state.  If it is in some data transfer
+	 * mode, tell it to stop (and hopefully transition back to TRAN.)
+	 */
+	if (R1_CURRENT_STATE(status) == R1_STATE_DATA ||
+	    R1_CURRENT_STATE(status) == R1_STATE_RCV) {
+		err = send_stop(card, &stop_status);
+		if (err)
+			pr_err("%s: error %d sending stop command\n",
+			       req->rq_disk->disk_name, err);
+
+		/*
+		 * If the stop cmd also timed out, the card is
+		 * probably not present, so abort.
+		 */
+		if (err == -ETIMEDOUT)
+			return ERR_ABORT;
+	}
+
+	/* Check for initial command errors */
+	switch (brq->cmd.error) {
+	case 0:
+		/* no error */
+		break;
+
+	case -EILSEQ:
+		/* response crc error, retry the r/w cmd */
+		pr_err("%s: %s sending %s command, card status %#x\n",
+			req->rq_disk->disk_name,
+			"response CRC error", "read/write",
+			status);
+		return ERR_RETRY;
+
+	case -ETIMEDOUT:
+		pr_err("%s: %s sending %s command, card status %#x\n",
+			req->rq_disk->disk_name,
+			"timed out", "read/write",
+			status);
+
+		if (!rwcmd_status) {
+			/* Status cmd initially failed, retry the r/w cmd */
+			return ERR_RETRY;
+		}
+		if (status & (R1_COM_CRC_ERROR | R1_ILLEGAL_COMMAND)) {
+			/*
+			 * if it was a r/w cmd crc error, or illegal command
+			 * (eg, issued in wrong state) then retry - we should
+			 * have corrected the state problem above.
+			 */
+			return ERR_RETRY;
+		}
+
+	default:
+		/* we don't understand the error code the driver gave us */
+		pr_err("%s: unknown error %d sending read/write command, card status %#x\n",
+		       req->rq_disk->disk_name, brq->cmd.error, status);
+		return ERR_ABORT;
+	}
+
+	/*
+	 * Now for stop errors.  These aren't fatal to the transfer.
+	 */
+	pr_err("%s: error %d sending stop command, original cmd response %#x, card status %#x\n",
+	       req->rq_disk->disk_name, brq->stop.error,
+	       brq->cmd.resp[0], status);
+
+	/*
+	 * Subsitute in our own stop status as this will give the error
+	 * state which happened during the execution of the r/w command.
+	 */
+	if (stop_status) {
+		brq->stop.resp[0] = stop_status;
+		brq->stop.error = 0;
+	}
+	return ERR_CONTINUE;
 }
 
 static int mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
@@ -331,18 +472,29 @@ out:
 	return err ? 0 : 1;
 }
 
+extern void mmc_set_clock(struct mmc_host *host, unsigned int hz);
+
+#define CMD_ERRORS	\
+	(R1_OUT_OF_RANGE	/* Command argument out of range */	|\
+	 R1_ADDRESS_ERROR	/* Misaligned address */		|\
+	 R1_BLOCK_LEN_ERROR	/* Transferred block length incorrect */ |\
+	 R1_WP_VIOLATION	/* Tried to write to protected block */	|\
+	 R1_CC_ERROR		/* Card controller error */		|\
+	 R1_ERROR		/* General/unknown error */		)
+
 static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 {
 	struct mmc_blk_data *md = mq->data;
 	struct mmc_card *card = md->queue.card;
+	struct mmc_host *host = card->host;
 	struct mmc_blk_request brq;
-	int ret = 1, disable_multi = 0;
+	int ret = 1, disable_multi = 0, retry = 0;
 
-	mmc_claim_host(card->host);
+	mmc_claim_host(host);
 
 	do {
-		struct mmc_command cmd;
-		u32 readcmd, writecmd, status = 0;
+		u32 readcmd, writecmd;
+		unsigned int bytes_xfered;
 
 		memset(&brq, 0, sizeof(struct mmc_blk_request));
 		brq.mrq.cmd = &brq.cmd;
@@ -426,55 +578,46 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 
 		mmc_queue_bounce_post(mq);
 
-		/*
-		 * Check for errors here, but don't jump to cmd_err
-		 * until later as we need to wait for the card to leave
-		 * programming mode even when things go wrong.
-		 */
-		if (brq.cmd.error || brq.data.error || brq.stop.error) {
-			if (brq.data.blocks > 1 && rq_data_dir(req) == READ) {
-				/* Redo read one sector at a time */
-				printk(KERN_WARNING "%s: retrying using single "
-				       "block read\n", req->rq_disk->disk_name);
-				disable_multi = 1;
-				continue;
+		if (brq.cmd.error || brq.stop.error) {
+			/*
+			 * cmd.error indicates a problem with the r/w
+			 * command.  No data will have been transferred.
+			 *
+			 * stop.error indicates a problem with the stop
+			 * command.  Data may have been transferred, or
+			 * may still be transferring.
+			 */
+			switch (mmc_blk_cmd_recovery(card, &brq, req)) {
+			case ERR_RETRY:
+				if (retry++ < 5)
+					goto cmd_err;
+			case ERR_ABORT:
+				goto cmd_err;
+			case ERR_CONTINUE:
+				break;
 			}
-			status = get_card_status(card, req);
 		}
 
-		if (brq.cmd.error) {
-			printk(KERN_ERR "%s: error %d sending read/write "
-			       "command, response %#x, card status %#x\n",
-			       req->rq_disk->disk_name, brq.cmd.error,
-			       brq.cmd.resp[0], status);
-		}
-
-		if (brq.data.error) {
-			if (brq.data.error == -ETIMEDOUT && brq.mrq.stop)
-				/* 'Stop' response contains card status */
-				status = brq.mrq.stop->resp[0];
-			printk(KERN_ERR "%s: error %d transferring data,"
-			       " sector %u, nr %u, card status %#x\n",
-			       req->rq_disk->disk_name, brq.data.error,
-			       (unsigned)blk_rq_pos(req),
-			       (unsigned)blk_rq_sectors(req), status);
-		}
-
-		if (brq.stop.error) {
-			printk(KERN_ERR "%s: error %d sending stop command, "
-			       "response %#x, card status %#x\n",
-			       req->rq_disk->disk_name, brq.stop.error,
-			       brq.stop.resp[0], status);
-		}
+		/*
+		 * Check for errors relating to the execution of
+		 * the initial command - such as address errors.
+		 */
+		if (brq.cmd.resp[0] & CMD_ERRORS) {
+			pr_err("%s: r/w command failed, status = %#x\n",
+				req->rq_disk->disk_name, brq.cmd.resp[0]);
+			goto cmd_err;
+		}			
 
+		/*
+		 * Everthing else is either success, or a data error of some
+		 * kind.  If it was a write, we may have transitioned to
+		 * program mode, which we have to wait for it to complete.
+		 */
 		if (!mmc_host_is_spi(card->host) && rq_data_dir(req) != READ) {
+			u32 status;
 			do {
-				int err;
-
-				cmd.opcode = MMC_SEND_STATUS;
-				cmd.arg = card->rca << 16;
-				cmd.flags = MMC_RSP_R1 | MMC_CMD_AC;
-				err = mmc_wait_for_cmd(card->host, &cmd, 5);
+				int err = get_card_status(card, &status);
+				/* If -EILSEQ, retry? */
 				if (err) {
 					printk(KERN_ERR "%s: error %d requesting status\n",
 					       req->rq_disk->disk_name, err);
@@ -485,20 +628,121 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 				 * so make sure to check both the busy
 				 * indication and the card state.
 				 */
-			} while (!(cmd.resp[0] & R1_READY_FOR_DATA) ||
-				(R1_CURRENT_STATE(cmd.resp[0]) == 7));
-
-#if 0
-			if (cmd.resp[0] & ~0x00000900)
-				printk(KERN_ERR "%s: status = %08x\n",
-				       req->rq_disk->disk_name, cmd.resp[0]);
-			if (mmc_decode_status(cmd.resp))
-				goto cmd_err;
-#endif
+			} while (!(status & R1_READY_FOR_DATA) ||
+			    (R1_CURRENT_STATE(status) == R1_STATE_PRG));
+		}
+
+		/*
+		 * If this is an SD card and we're writing, we can first mark
+		 * the known good sectors as ok.
+		 *
+		 * If the card is not SD, we can still ok written sectors as
+		 * reported by the controller (which may be less than the real
+		 * number of written sectors, but never more).
+		 */
+		bytes_xfered = brq.data.bytes_xfered;
+		if (mmc_card_sd(card) && rq_data_dir(req) != READ) {
+			u32 blocks = mmc_sd_num_wr_blocks(card);
+			if (blocks != (u32)-1)
+				bytes_xfered = blocks << 9;
 		}
 
-		if (brq.cmd.error || brq.stop.error || brq.data.error) {
+		if (bytes_xfered) {
+			/*
+			 * Data was transferred.  Let the block layer know
+			 * how much was successfully transferred, so we
+			 * don't retry this.  Hosts which can't report the
+			 * number of successful bytes better return zero on
+			 * error.
+			 */
+			spin_lock_irq(&md->lock);
+			ret = __blk_end_request(req, 0, bytes_xfered);
+			spin_unlock_irq(&md->lock);
+		} else {
+			ret = 1;
+		}
+
+		if (brq.data.error)
+			pr_err("%s: error %d transferring data, sector %u, nr %u, cmd response %#x, card status %#x\n",
+			       req->rq_disk->disk_name, brq.data.error,
+			       (unsigned)blk_rq_pos(req), (unsigned)blk_rq_sectors(req),
+			       brq.cmd.resp[0], brq.stop.resp[0]);
+
+		if (brq.data.error == -EIO) {
+			/* fifo underrun/overrun */
+
+			/* increment error count for this rate */
+			md->fifo_fail[md->clk_div]++;
+
+			/* if we progressed, just retry. */
+			if (bytes_xfered && retry++ < 5)
+				continue;
+
+			/* retry once in case of transients */
+			if (retry++ < 1)
+				continue;
+
+			/* drop the error rate */
+			if (host->ios.clock > host->f_init && md->clk_div < 9) {
+				unsigned int clock;
+
+				md->clk_div++;
+				md->fifo_fail[md->clk_div] = 0;
+				md->fifo_ok[md->clk_div] = 0;
+
+				pr_err("%s: retrying with slower /%u clock rate\n",
+					req->rq_disk->disk_name,
+					1 << md->clk_div);
+
+				clock = host->clock_max >> md->clk_div;
+				mmc_set_clock(host, clock);
+				continue;
+			}
+		} else {
+			unsigned this_err_rate, next_err_rate = 100;
+			unsigned idx = md->clk_div;
+
+			md->fifo_ok[idx]++;
+
+			if (md->fifo_ok[idx] + md->fifo_fail[idx] > 100) {
+
+			if (idx > 0) {
+				unsigned total = md->fifo_fail[idx - 1] + md->fifo_ok[idx - 1];
+
+				next_err_rate = total ?
+					md->fifo_fail[idx - 1] / total : 0;
+			}
+
+			this_err_rate = md->fifo_fail[idx] /
+					(md->fifo_fail[idx] + md->fifo_ok[idx]);
+
+			if (this_err_rate < 5 && next_err_rate < 30) {
+				unsigned clock;
+				md->clk_div--;
+
+				md->fifo_ok[md->clk_div] = 0;
+				md->fifo_fail[md->clk_div] = 0;
+
+				pr_err("%s: increasing to /%u clock rate\n",
+					req->rq_disk->disk_name,
+					1 << md->clk_div);
+
+				clock = host->clock_max >> md->clk_div;
+				mmc_set_clock(host, clock);
+			}
+			}
+		}
+
+		if (brq.data.error) {
 			if (rq_data_dir(req) == READ) {
+				if (brq.data.blocks > 1) {
+					/* Redo read one sector at a time */
+					pr_warning("%s: retrying using single "
+						"block read\n", req->rq_disk->disk_name);
+					disable_multi = 1;
+					continue;
+				}
+
 				/*
 				 * After an error, we redo I/O one sector at a
 				 * time, so we only reach here after trying to
@@ -507,17 +751,10 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 				spin_lock_irq(&md->lock);
 				ret = __blk_end_request(req, -EIO, brq.data.blksz);
 				spin_unlock_irq(&md->lock);
-				continue;
+			} else {
+				goto cmd_err;
 			}
-			goto cmd_err;
 		}
-
-		/*
-		 * A block was successfully transferred.
-		 */
-		spin_lock_irq(&md->lock);
-		ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
-		spin_unlock_irq(&md->lock);
 	} while (ret);
 
 	mmc_release_host(card->host);
@@ -525,29 +762,6 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 	return 1;
 
  cmd_err:
- 	/*
- 	 * If this is an SD card and we're writing, we can first
- 	 * mark the known good sectors as ok.
- 	 *
-	 * If the card is not SD, we can still ok written sectors
-	 * as reported by the controller (which might be less than
-	 * the real number of written sectors, but never more).
-	 */
-	if (mmc_card_sd(card)) {
-		u32 blocks;
-
-		blocks = mmc_sd_num_wr_blocks(card);
-		if (blocks != (u32)-1) {
-			spin_lock_irq(&md->lock);
-			ret = __blk_end_request(req, 0, blocks << 9);
-			spin_unlock_irq(&md->lock);
-		}
-	} else {
-		spin_lock_irq(&md->lock);
-		ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
-		spin_unlock_irq(&md->lock);
-	}
-
 	mmc_release_host(card->host);
 
 	spin_lock_irq(&md->lock);
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 6625c05..b3ffd99 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -648,6 +648,17 @@ void mmc_set_clock(struct mmc_host *host, unsigned int hz)
 	mmc_set_ios(host);
 }
 
+void mmc_set_clock_max(struct mmc_host *host, unsigned int hz)
+{
+	WARN_ON(hz < host->f_min);
+
+	if (hz > host->f_max)
+		hz = host->f_max;
+
+	host->ios.clock = host->clock_max = hz;
+	mmc_set_ios(host);
+}
+
 #ifdef CONFIG_MMC_CLKGATE
 /*
  * This gates the clock by setting it to 0 Hz.
diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
index ca1fdde..412113c 100644
--- a/drivers/mmc/core/core.h
+++ b/drivers/mmc/core/core.h
@@ -33,6 +33,7 @@ void mmc_init_erase(struct mmc_card *card);
 
 void mmc_set_chip_select(struct mmc_host *host, int mode);
 void mmc_set_clock(struct mmc_host *host, unsigned int hz);
+void mmc_set_clock_max(struct mmc_host *host, unsigned int hz);
 void mmc_gate_clock(struct mmc_host *host);
 void mmc_ungate_clock(struct mmc_host *host);
 void mmc_set_ungated(struct mmc_host *host);
diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index 16006ef..4fe515a 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -515,7 +515,7 @@ static int mmc_init_card(struct mmc_host *host, u32 ocr,
 		max_dtr = card->csd.max_dtr;
 	}
 
-	mmc_set_clock(host, max_dtr);
+	mmc_set_clock_max(host, max_dtr);
 
 	/*
 	 * Indicate DDR mode (if supported).
diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
index d18c32b..576b320 100644
--- a/drivers/mmc/core/sd.c
+++ b/drivers/mmc/core/sd.c
@@ -621,7 +621,7 @@ static int mmc_sd_init_card(struct mmc_host *host, u32 ocr,
 	/*
 	 * Set bus speed.
 	 */
-	mmc_set_clock(host, mmc_sd_get_max_clock(card));
+	mmc_set_clock_max(host, mmc_sd_get_max_clock(card));
 
 	/*
 	 * Switch to wider bus (if supported).
diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c
index 5c4a54d..fb41db6 100644
--- a/drivers/mmc/core/sdio.c
+++ b/drivers/mmc/core/sdio.c
@@ -425,7 +425,7 @@ static int mmc_sdio_init_card(struct mmc_host *host, u32 ocr,
 		 * It's host's responsibility to fill cccr and cis
 		 * structures in init_card().
 		 */
-		mmc_set_clock(host, card->cis.max_dtr);
+		mmc_set_clock_max(host, card->cis.max_dtr);
 
 		if (card->cccr.high_speed) {
 			mmc_card_set_highspeed(card);
@@ -491,7 +491,7 @@ static int mmc_sdio_init_card(struct mmc_host *host, u32 ocr,
 	/*
 	 * Change to the card's maximum speed.
 	 */
-	mmc_set_clock(host, mmc_sdio_get_max_clock(card));
+	mmc_set_clock_max(host, mmc_sdio_get_max_clock(card));
 
 	/*
 	 * Switch to wider bus (if supported).
diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
index 8a29c9f..dcbe9c4 100644
--- a/drivers/mmc/host/mmci.c
+++ b/drivers/mmc/host/mmci.c
@@ -38,7 +38,7 @@
 
 #define DRIVER_NAME "mmci-pl18x"
 
-static unsigned int fmax = 515633;
+static unsigned int fmax = 4000000;
 
 /**
  * struct variant_data - MMCI variant-specific quirks
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index bcb793e..76df335 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -199,6 +199,7 @@ struct mmc_host {
 
 	struct mmc_ios		ios;		/* current io bus settings */
 	u32			ocr;		/* the current OCR setting */
+	unsigned int		clock_max;	/* maximum clock rate */
 
 	/* group bitfields together to minimize padding */
 	unsigned int		use_spi_crc:1;
diff --git a/include/linux/mmc/mmc.h b/include/linux/mmc/mmc.h
index 612301f..9d067ee 100644
--- a/include/linux/mmc/mmc.h
+++ b/include/linux/mmc/mmc.h
@@ -133,6 +133,16 @@
 #define R1_SWITCH_ERROR		(1 << 7)	/* sx, c */
 #define R1_APP_CMD		(1 << 5)	/* sr, c */
 
+#define R1_STATE_IDLE	0
+#define R1_STATE_READY	1
+#define R1_STATE_IDENT	2
+#define R1_STATE_STBY	3
+#define R1_STATE_TRAN	4
+#define R1_STATE_DATA	5
+#define R1_STATE_RCV	6
+#define R1_STATE_PRG	7
+#define R1_STATE_DIS	8
+
 /*
  * MMC/SD in SPI mode reports R1 status always, and R2 for SEND_STATUS
  * R1 is the low order byte; R2 is the next highest byte, when present.

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC] MMC: error handling improvements
@ 2011-02-15 23:03 ` Russell King - ARM Linux
  0 siblings, 0 replies; 32+ messages in thread
From: Russell King - ARM Linux @ 2011-02-15 23:03 UTC (permalink / raw
  To: linux-arm-kernel, linux-mmc, Chris Ball

This patch is for RFC only; it needs splitting up somewhat.  However, I
wanted to get it out there for some comment.

The patch basically improves all round the error handling with MMC block
transfers.  I've considered several scenarios, from lost or bad CRCs on
the initial commands (both causing no response from the card), lost stop
commands (which I've seen after a FIFO overrun), failed status commands
(also seen after a FIFO overrun), to data FIFO errors.

I've considered the case where the card becomes inaccessible, which
causes us to give up trying fairly quickly - this should avoid problems
with error recovery taking a very long time should the card disappear.

My motivation is primerily dealing with FIFO overruns on data transfers,
to allow the ARM MMCI primecell to run at a higher clock rate, while
allowing it to survive various interrupt loads (which invariably results
in not being able to service the FIFO quickly enough.)

The one obvious solution to this is the old SA_INTERRUPT stuff which used
to work.  As this has been ripped out of the kernel, we have no option but
to invent more weird and wacky solutions to work-around other interrupt
handlers taking their time.

With this solution below, I can increase the card clocking rate from
around 512Kbps to 4Mbps, which is about an 8 fold increase in throughput
which can not sanely be ignored.

The comments may not entirely reflect what the code is doing; they may
have aged a little bit since the code was written.  The adaptive clock
rate algorithm can probably do with a lot more work to avoid it up-
clocking to a rate which has proven to never work.  I'd actually go as
far as to say that the algorithm probably has a lot to be desired - but
it seems to work for my test scenarios.

Last couple of points:
1. it relies on hosts returning the number of bytes equivalent to the
number of blocks _successfully_ transferred and no more.  Returning zero
bytes on error is permitted, but untested, and may result in FIFO overruns
causing a faster reduction in clock rate.
2. it relies on MMC host drivers returning the right error codes, which
historically hasn't really mattered.  With this code diagnosing the reason
for the failure, it now matters that the correct error codes are used.

 drivers/mmc/card/block.c                       |  402 ++++++++++++++++++------
 drivers/mmc/core/core.c                        |   11 +
 drivers/mmc/core/core.h                        |    1 +
 drivers/mmc/core/mmc.c                         |    2 +-
 drivers/mmc/core/sd.c                          |    2 +-
 drivers/mmc/core/sdio.c                        |    4 +-
 drivers/mmc/host/mmci.c                        |    2 +-
 include/linux/mmc/host.h                       |    1 +
 include/linux/mmc/mmc.h                        |   10 +
 9 files changed, 336 insertions(+), 9 deletions(-)

diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index bfc8a8a..8daf340 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -75,6 +75,9 @@ struct mmc_blk_data {
 
 	unsigned int	usage;
 	unsigned int	read_only;
+	unsigned int	clk_div;
+	unsigned int	fifo_ok[10];
+	unsigned int	fifo_fail[10];
 };
 
 static DEFINE_MUTEX(open_lock);
@@ -245,7 +248,21 @@ static u32 mmc_sd_num_wr_blocks(struct mmc_card *card)
 	return result;
 }
 
-static u32 get_card_status(struct mmc_card *card, struct request *req)
+static int send_stop(struct mmc_card *card, u32 *status)
+{
+	struct mmc_command cmd;
+	int err;
+
+	memset(&cmd, 0, sizeof(struct mmc_command));
+	cmd.opcode = MMC_STOP_TRANSMISSION;
+	cmd.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
+	err = mmc_wait_for_cmd(card->host, &cmd, 5);
+	if (err == 0)
+		*status = cmd.resp[0];
+	return err;
+}
+
+static int get_card_status(struct mmc_card *card, u32 *status)
 {
 	struct mmc_command cmd;
 	int err;
@@ -256,10 +273,134 @@ static u32 get_card_status(struct mmc_card *card, struct request *req)
 		cmd.arg = card->rca << 16;
 	cmd.flags = MMC_RSP_SPI_R2 | MMC_RSP_R1 | MMC_CMD_AC;
 	err = mmc_wait_for_cmd(card->host, &cmd, 0);
+	if (err == 0)
+		*status = cmd.resp[0];
+	return err;
+}
+
+#define ERR_RETRY	2
+#define ERR_ABORT	1
+#define ERR_CONTINUE	0
+
+/*
+ * Initial r/w and stop cmd error recovery.
+ * We don't know whether the card received the r/w cmd or not, so try to
+ * restore things back to a sane state.  Essentially, we do this as follows:
+ * - Obtain card status.  If the first attempt to obtain card status fails,
+ *   the status word will reflect the failed status cmd, not the failed
+ *   r/w cmd.  If we fail to obtain card status, it suggests we can no
+ *   longer communicate with the card.
+ * - Check the card state.  If the card received the cmd but there was a
+ *   transient problem with the response, it might still be in a data transfer
+ *   mode.  Try to send it a stop command.  If this fails, we can't recover.
+ * - If the r/w cmd failed due to a response CRC error, it was probably
+ *   transient, so retry the cmd.
+ * - If the r/w cmd timed out, but we didn't get the r/w cmd status, retry.
+ * - If the r/w cmd timed out, and the r/w cmd failed due to CRC error or
+ *   illegal cmd, retry.
+ * Otherwise we don't understand what happened, so abort.
+ */
+static int mmc_blk_cmd_recovery(struct mmc_card *card, struct mmc_blk_request *brq,
+	struct request *req)
+{
+	bool rwcmd_status = true;
+	u32 status, stop_status = 0;
+	int err, retry;
+
+	/*
+	 * Try to get card status which indicates both the card state
+	 * and why there was no response.  If the first attempt fails,
+	 * we can't be sure the returned status is for the r/w command.
+	 */
+	for (retry = 2; retry >= 0; retry--) {
+		err = get_card_status(card, &status);
+		if (!err)
+			break;
+
+		rwcmd_status = false;
+		pr_err("%s: error %d sending status command, %sing\n",
+		       req->rq_disk->disk_name, err, retry ? "retry" : "abort");
+	}
+
+	/* We couldn't get a response from the card.  Give up. */
 	if (err)
-		printk(KERN_ERR "%s: error %d sending status command",
-		       req->rq_disk->disk_name, err);
-	return cmd.resp[0];
+		return ERR_ABORT;
+
+	/*
+	 * Check the current card state.  If it is in some data transfer
+	 * mode, tell it to stop (and hopefully transition back to TRAN.)
+	 */
+	if (R1_CURRENT_STATE(status) == R1_STATE_DATA ||
+	    R1_CURRENT_STATE(status) == R1_STATE_RCV) {
+		err = send_stop(card, &stop_status);
+		if (err)
+			pr_err("%s: error %d sending stop command\n",
+			       req->rq_disk->disk_name, err);
+
+		/*
+		 * If the stop cmd also timed out, the card is
+		 * probably not present, so abort.
+		 */
+		if (err == -ETIMEDOUT)
+			return ERR_ABORT;
+	}
+
+	/* Check for initial command errors */
+	switch (brq->cmd.error) {
+	case 0:
+		/* no error */
+		break;
+
+	case -EILSEQ:
+		/* response crc error, retry the r/w cmd */
+		pr_err("%s: %s sending %s command, card status %#x\n",
+			req->rq_disk->disk_name,
+			"response CRC error", "read/write",
+			status);
+		return ERR_RETRY;
+
+	case -ETIMEDOUT:
+		pr_err("%s: %s sending %s command, card status %#x\n",
+			req->rq_disk->disk_name,
+			"timed out", "read/write",
+			status);
+
+		if (!rwcmd_status) {
+			/* Status cmd initially failed, retry the r/w cmd */
+			return ERR_RETRY;
+		}
+		if (status & (R1_COM_CRC_ERROR | R1_ILLEGAL_COMMAND)) {
+			/*
+			 * if it was a r/w cmd crc error, or illegal command
+			 * (eg, issued in wrong state) then retry - we should
+			 * have corrected the state problem above.
+			 */
+			return ERR_RETRY;
+		}
+
+	default:
+		/* we don't understand the error code the driver gave us */
+		pr_err("%s: unknown error %d sending read/write command, card status %#x\n",
+		       req->rq_disk->disk_name, brq->cmd.error, status);
+		return ERR_ABORT;
+	}
+
+	/*
+	 * Now for stop errors.  These aren't fatal to the transfer.
+	 */
+	pr_err("%s: error %d sending stop command, original cmd response %#x, card status %#x\n",
+	       req->rq_disk->disk_name, brq->stop.error,
+	       brq->cmd.resp[0], status);
+
+	/*
+	 * Subsitute in our own stop status as this will give the error
+	 * state which happened during the execution of the r/w command.
+	 */
+	if (stop_status) {
+		brq->stop.resp[0] = stop_status;
+		brq->stop.error = 0;
+	}
+	return ERR_CONTINUE;
 }
 
 static int mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
@@ -331,18 +472,29 @@ out:
 	return err ? 0 : 1;
 }
 
+extern void mmc_set_clock(struct mmc_host *host, unsigned int hz);
+
+#define CMD_ERRORS	\
+	(R1_OUT_OF_RANGE	/* Command argument out of range */	|\
+	 R1_ADDRESS_ERROR	/* Misaligned address */		|\
+	 R1_BLOCK_LEN_ERROR	/* Transferred block length incorrect */ |\
+	 R1_WP_VIOLATION	/* Tried to write to protected block */	|\
+	 R1_CC_ERROR		/* Card controller error */		|\
+	 R1_ERROR		/* General/unknown error */		)
+
 static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 {
 	struct mmc_blk_data *md = mq->data;
 	struct mmc_card *card = md->queue.card;
+	struct mmc_host *host = card->host;
 	struct mmc_blk_request brq;
-	int ret = 1, disable_multi = 0;
+	int ret = 1, disable_multi = 0, retry = 0;
 
-	mmc_claim_host(card->host);
+	mmc_claim_host(host);
 
 	do {
-		struct mmc_command cmd;
-		u32 readcmd, writecmd, status = 0;
+		u32 readcmd, writecmd;
+		unsigned int bytes_xfered;
 
 		memset(&brq, 0, sizeof(struct mmc_blk_request));
 		brq.mrq.cmd = &brq.cmd;
@@ -426,55 +578,46 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 
 		mmc_queue_bounce_post(mq);
 
-		/*
-		 * Check for errors here, but don't jump to cmd_err
-		 * until later as we need to wait for the card to leave
-		 * programming mode even when things go wrong.
-		 */
-		if (brq.cmd.error || brq.data.error || brq.stop.error) {
-			if (brq.data.blocks > 1 && rq_data_dir(req) == READ) {
-				/* Redo read one sector at a time */
-				printk(KERN_WARNING "%s: retrying using single "
-				       "block read\n", req->rq_disk->disk_name);
-				disable_multi = 1;
-				continue;
+		if (brq.cmd.error || brq.stop.error) {
+			/*
+			 * cmd.error indicates a problem with the r/w
+			 * command.  No data will have been transferred.
+			 *
+			 * stop.error indicates a problem with the stop
+			 * command.  Data may have been transferred, or
+			 * may still be transferring.
+			 */
+			switch (mmc_blk_cmd_recovery(card, &brq, req)) {
+			case ERR_RETRY:
+				if (retry++ < 5)
+					goto cmd_err;
+			case ERR_ABORT:
+				goto cmd_err;
+			case ERR_CONTINUE:
+				break;
 			}
-			status = get_card_status(card, req);
 		}
 
-		if (brq.cmd.error) {
-			printk(KERN_ERR "%s: error %d sending read/write "
-			       "command, response %#x, card status %#x\n",
-			       req->rq_disk->disk_name, brq.cmd.error,
-			       brq.cmd.resp[0], status);
-		}
-
-		if (brq.data.error) {
-			if (brq.data.error == -ETIMEDOUT && brq.mrq.stop)
-				/* 'Stop' response contains card status */
-				status = brq.mrq.stop->resp[0];
-			printk(KERN_ERR "%s: error %d transferring data,"
-			       " sector %u, nr %u, card status %#x\n",
-			       req->rq_disk->disk_name, brq.data.error,
-			       (unsigned)blk_rq_pos(req),
-			       (unsigned)blk_rq_sectors(req), status);
-		}
-
-		if (brq.stop.error) {
-			printk(KERN_ERR "%s: error %d sending stop command, "
-			       "response %#x, card status %#x\n",
-			       req->rq_disk->disk_name, brq.stop.error,
-			       brq.stop.resp[0], status);
-		}
+		/*
+		 * Check for errors relating to the execution of
+		 * the initial command - such as address errors.
+		 */
+		if (brq.cmd.resp[0] & CMD_ERRORS) {
+			pr_err("%s: r/w command failed, status = %#x\n",
+				req->rq_disk->disk_name, brq.cmd.resp[0]);
+			goto cmd_err;
+		}			
 
+		/*
+		 * Everthing else is either success, or a data error of some
+		 * kind.  If it was a write, we may have transitioned to
+		 * program mode, which we have to wait for it to complete.
+		 */
 		if (!mmc_host_is_spi(card->host) && rq_data_dir(req) != READ) {
+			u32 status;
 			do {
-				int err;
-
-				cmd.opcode = MMC_SEND_STATUS;
-				cmd.arg = card->rca << 16;
-				cmd.flags = MMC_RSP_R1 | MMC_CMD_AC;
-				err = mmc_wait_for_cmd(card->host, &cmd, 5);
+				int err = get_card_status(card, &status);
+				/* If -EILSEQ, retry? */
 				if (err) {
 					printk(KERN_ERR "%s: error %d requesting status\n",
 					       req->rq_disk->disk_name, err);
@@ -485,20 +628,121 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 				 * so make sure to check both the busy
 				 * indication and the card state.
 				 */
-			} while (!(cmd.resp[0] & R1_READY_FOR_DATA) ||
-				(R1_CURRENT_STATE(cmd.resp[0]) == 7));
-
-#if 0
-			if (cmd.resp[0] & ~0x00000900)
-				printk(KERN_ERR "%s: status = %08x\n",
-				       req->rq_disk->disk_name, cmd.resp[0]);
-			if (mmc_decode_status(cmd.resp))
-				goto cmd_err;
-#endif
+			} while (!(status & R1_READY_FOR_DATA) ||
+			    (R1_CURRENT_STATE(status) == R1_STATE_PRG));
+		}
+
+		/*
+		 * If this is an SD card and we're writing, we can first mark
+		 * the known good sectors as ok.
+		 *
+		 * If the card is not SD, we can still ok written sectors as
+		 * reported by the controller (which may be less than the real
+		 * number of written sectors, but never more).
+		 */
+		bytes_xfered = brq.data.bytes_xfered;
+		if (mmc_card_sd(card) && rq_data_dir(req) != READ) {
+			u32 blocks = mmc_sd_num_wr_blocks(card);
+			if (blocks != (u32)-1)
+				bytes_xfered = blocks << 9;
 		}
 
-		if (brq.cmd.error || brq.stop.error || brq.data.error) {
+		if (bytes_xfered) {
+			/*
+			 * Data was transferred.  Let the block layer know
+			 * how much was successfully transferred, so we
+			 * don't retry this.  Hosts which can't report the
+			 * number of successful bytes better return zero on
+			 * error.
+			 */
+			spin_lock_irq(&md->lock);
+			ret = __blk_end_request(req, 0, bytes_xfered);
+			spin_unlock_irq(&md->lock);
+		} else {
+			ret = 1;
+		}
+
+		if (brq.data.error)
+			pr_err("%s: error %d transferring data, sector %u, nr %u, cmd response %#x, card status %#x\n",
+			       req->rq_disk->disk_name, brq.data.error,
+			       (unsigned)blk_rq_pos(req), (unsigned)blk_rq_sectors(req),
+			       brq.cmd.resp[0], brq.stop.resp[0]);
+
+		if (brq.data.error == -EIO) {
+			/* fifo underrun/overrun */
+
+			/* increment error count for this rate */
+			md->fifo_fail[md->clk_div]++;
+
+			/* if we progressed, just retry. */
+			if (bytes_xfered && retry++ < 5)
+				continue;
+
+			/* retry once in case of transients */
+			if (retry++ < 1)
+				continue;
+
+			/* drop the error rate */
+			if (host->ios.clock > host->f_init && md->clk_div < 9) {
+				unsigned int clock;
+
+				md->clk_div++;
+				md->fifo_fail[md->clk_div] = 0;
+				md->fifo_ok[md->clk_div] = 0;
+
+				pr_err("%s: retrying with slower /%u clock rate\n",
+					req->rq_disk->disk_name,
+					1 << md->clk_div);
+
+				clock = host->clock_max >> md->clk_div;
+				mmc_set_clock(host, clock);
+				continue;
+			}
+		} else {
+			unsigned this_err_rate, next_err_rate = 100;
+			unsigned idx = md->clk_div;
+
+			md->fifo_ok[idx]++;
+
+			if (md->fifo_ok[idx] + md->fifo_fail[idx] > 100) {
+
+			if (idx > 0) {
+				unsigned total = md->fifo_fail[idx - 1] + md->fifo_ok[idx - 1];
+
+				next_err_rate = total ?
+					md->fifo_fail[idx - 1] / total : 0;
+			}
+
+			this_err_rate = md->fifo_fail[idx] /
+					(md->fifo_fail[idx] + md->fifo_ok[idx]);
+
+			if (this_err_rate < 5 && next_err_rate < 30) {
+				unsigned clock;
+				md->clk_div--;
+
+				md->fifo_ok[md->clk_div] = 0;
+				md->fifo_fail[md->clk_div] = 0;
+
+				pr_err("%s: increasing to /%u clock rate\n",
+					req->rq_disk->disk_name,
+					1 << md->clk_div);
+
+				clock = host->clock_max >> md->clk_div;
+				mmc_set_clock(host, clock);
+			}
+			}
+		}
+
+		if (brq.data.error) {
 			if (rq_data_dir(req) == READ) {
+				if (brq.data.blocks > 1) {
+					/* Redo read one sector at a time */
+					pr_warning("%s: retrying using single "
+						"block read\n", req->rq_disk->disk_name);
+					disable_multi = 1;
+					continue;
+				}
+
 				/*
 				 * After an error, we redo I/O one sector at a
 				 * time, so we only reach here after trying to
@@ -507,17 +751,10 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 				spin_lock_irq(&md->lock);
 				ret = __blk_end_request(req, -EIO, brq.data.blksz);
 				spin_unlock_irq(&md->lock);
-				continue;
+			} else {
+				goto cmd_err;
 			}
-			goto cmd_err;
 		}
-
-		/*
-		 * A block was successfully transferred.
-		 */
-		spin_lock_irq(&md->lock);
-		ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
-		spin_unlock_irq(&md->lock);
 	} while (ret);
 
 	mmc_release_host(card->host);
@@ -525,29 +762,6 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 	return 1;
 
  cmd_err:
- 	/*
- 	 * If this is an SD card and we're writing, we can first
- 	 * mark the known good sectors as ok.
- 	 *
-	 * If the card is not SD, we can still ok written sectors
-	 * as reported by the controller (which might be less than
-	 * the real number of written sectors, but never more).
-	 */
-	if (mmc_card_sd(card)) {
-		u32 blocks;
-
-		blocks = mmc_sd_num_wr_blocks(card);
-		if (blocks != (u32)-1) {
-			spin_lock_irq(&md->lock);
-			ret = __blk_end_request(req, 0, blocks << 9);
-			spin_unlock_irq(&md->lock);
-		}
-	} else {
-		spin_lock_irq(&md->lock);
-		ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
-		spin_unlock_irq(&md->lock);
-	}
-
 	mmc_release_host(card->host);
 
 	spin_lock_irq(&md->lock);
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 6625c05..b3ffd99 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -648,6 +648,17 @@ void mmc_set_clock(struct mmc_host *host, unsigned int hz)
 	mmc_set_ios(host);
 }
 
+void mmc_set_clock_max(struct mmc_host *host, unsigned int hz)
+{
+	WARN_ON(hz < host->f_min);
+
+	if (hz > host->f_max)
+		hz = host->f_max;
+
+	host->ios.clock = host->clock_max = hz;
+	mmc_set_ios(host);
+}
+
 #ifdef CONFIG_MMC_CLKGATE
 /*
  * This gates the clock by setting it to 0 Hz.
diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
index ca1fdde..412113c 100644
--- a/drivers/mmc/core/core.h
+++ b/drivers/mmc/core/core.h
@@ -33,6 +33,7 @@ void mmc_init_erase(struct mmc_card *card);
 
 void mmc_set_chip_select(struct mmc_host *host, int mode);
 void mmc_set_clock(struct mmc_host *host, unsigned int hz);
+void mmc_set_clock_max(struct mmc_host *host, unsigned int hz);
 void mmc_gate_clock(struct mmc_host *host);
 void mmc_ungate_clock(struct mmc_host *host);
 void mmc_set_ungated(struct mmc_host *host);
diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index 16006ef..4fe515a 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -515,7 +515,7 @@ static int mmc_init_card(struct mmc_host *host, u32 ocr,
 		max_dtr = card->csd.max_dtr;
 	}
 
-	mmc_set_clock(host, max_dtr);
+	mmc_set_clock_max(host, max_dtr);
 
 	/*
 	 * Indicate DDR mode (if supported).
diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
index d18c32b..576b320 100644
--- a/drivers/mmc/core/sd.c
+++ b/drivers/mmc/core/sd.c
@@ -621,7 +621,7 @@ static int mmc_sd_init_card(struct mmc_host *host, u32 ocr,
 	/*
 	 * Set bus speed.
 	 */
-	mmc_set_clock(host, mmc_sd_get_max_clock(card));
+	mmc_set_clock_max(host, mmc_sd_get_max_clock(card));
 
 	/*
 	 * Switch to wider bus (if supported).
diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c
index 5c4a54d..fb41db6 100644
--- a/drivers/mmc/core/sdio.c
+++ b/drivers/mmc/core/sdio.c
@@ -425,7 +425,7 @@ static int mmc_sdio_init_card(struct mmc_host *host, u32 ocr,
 		 * It's host's responsibility to fill cccr and cis
 		 * structures in init_card().
 		 */
-		mmc_set_clock(host, card->cis.max_dtr);
+		mmc_set_clock_max(host, card->cis.max_dtr);
 
 		if (card->cccr.high_speed) {
 			mmc_card_set_highspeed(card);
@@ -491,7 +491,7 @@ static int mmc_sdio_init_card(struct mmc_host *host, u32 ocr,
 	/*
 	 * Change to the card's maximum speed.
 	 */
-	mmc_set_clock(host, mmc_sdio_get_max_clock(card));
+	mmc_set_clock_max(host, mmc_sdio_get_max_clock(card));
 
 	/*
 	 * Switch to wider bus (if supported).
diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
index 8a29c9f..dcbe9c4 100644
--- a/drivers/mmc/host/mmci.c
+++ b/drivers/mmc/host/mmci.c
@@ -38,7 +38,7 @@
 
 #define DRIVER_NAME "mmci-pl18x"
 
-static unsigned int fmax = 515633;
+static unsigned int fmax = 4000000;
 
 /**
  * struct variant_data - MMCI variant-specific quirks
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index bcb793e..76df335 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -199,6 +199,7 @@ struct mmc_host {
 
 	struct mmc_ios		ios;		/* current io bus settings */
 	u32			ocr;		/* the current OCR setting */
+	unsigned int		clock_max;	/* maximum clock rate */
 
 	/* group bitfields together to minimize padding */
 	unsigned int		use_spi_crc:1;
diff --git a/include/linux/mmc/mmc.h b/include/linux/mmc/mmc.h
index 612301f..9d067ee 100644
--- a/include/linux/mmc/mmc.h
+++ b/include/linux/mmc/mmc.h
@@ -133,6 +133,16 @@
 #define R1_SWITCH_ERROR		(1 << 7)	/* sx, c */
 #define R1_APP_CMD		(1 << 5)	/* sr, c */
 
+#define R1_STATE_IDLE	0
+#define R1_STATE_READY	1
+#define R1_STATE_IDENT	2
+#define R1_STATE_STBY	3
+#define R1_STATE_TRAN	4
+#define R1_STATE_DATA	5
+#define R1_STATE_RCV	6
+#define R1_STATE_PRG	7
+#define R1_STATE_DIS	8
+
 /*
  * MMC/SD in SPI mode reports R1 status always, and R2 for SEND_STATUS
  * R1 is the low order byte; R2 is the next highest byte, when present.

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC] MMC: error handling improvements
  2011-02-15 23:03 ` Russell King - ARM Linux
@ 2011-02-15 23:49   ` David Brown
  -1 siblings, 0 replies; 32+ messages in thread
From: David Brown @ 2011-02-15 23:49 UTC (permalink / raw
  To: linux-arm-kernel

On Tue, Feb 15 2011, Russell King - ARM Linux wrote:

> This patch is for RFC only; it needs splitting up somewhat.  However, I
> wanted to get it out there for some comment.

Just for kicks, I applied this and ran it on an MSM target (8x50).  It
seems to cause lots of:

  mmc0: Data timeout
  mmc0: Controller has been re-initialized
  ...
  mmc0: Data CRC error

and I can post more if you would find them interesting.  Eventually the
MSM mmc driver derefernces a null pointer in the interrupt code.

It's also possible this is finding problems in our SDCC driver.

David

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] MMC: error handling improvements
@ 2011-02-15 23:49   ` David Brown
  0 siblings, 0 replies; 32+ messages in thread
From: David Brown @ 2011-02-15 23:49 UTC (permalink / raw
  To: Russell King - ARM Linux; +Cc: linux-arm-kernel, linux-mmc, Chris Ball

On Tue, Feb 15 2011, Russell King - ARM Linux wrote:

> This patch is for RFC only; it needs splitting up somewhat.  However, I
> wanted to get it out there for some comment.

Just for kicks, I applied this and ran it on an MSM target (8x50).  It
seems to cause lots of:

  mmc0: Data timeout
  mmc0: Controller has been re-initialized
  ...
  mmc0: Data CRC error

and I can post more if you would find them interesting.  Eventually the
MSM mmc driver derefernces a null pointer in the interrupt code.

It's also possible this is finding problems in our SDCC driver.

David

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC] MMC: error handling improvements
  2011-02-15 23:49   ` David Brown
@ 2011-02-16 18:41     ` Linus Walleij
  -1 siblings, 0 replies; 32+ messages in thread
From: Linus Walleij @ 2011-02-16 18:41 UTC (permalink / raw
  To: linux-arm-kernel

2011/2/16 David Brown <davidb@codeaurora.org>:
> On Tue, Feb 15 2011, Russell King - ARM Linux wrote:
>
>> This patch is for RFC only; it needs splitting up somewhat. ?However, I
>> wanted to get it out there for some comment.
>
> Just for kicks, I applied this and ran it on an MSM target (8x50). ?It
> seems to cause lots of:
>
> ?mmc0: Data timeout
> ?mmc0: Controller has been re-initialized
> ?...
> ?mmc0: Data CRC error
>
> and I can post more if you would find them interesting. ?Eventually the
> MSM mmc driver derefernces a null pointer in the interrupt code.
>
> It's also possible this is finding problems in our SDCC driver.

The SDCC is obviously an MMCI derivate, VHDL hacking
on top of ARMs source code for PL180/PL181.

Why do you insist on maintaining a forked driver?

Please consider switching to using mmci.c like everyone else.
The quirks we have in place for U300, Nomadik and Ux500
should show you the way for how to do this (yes we did the
same thing, hacking the ARM VHDL).

If I remember correctly I could even see that some early
Android sources were using Russells mmci.c driver before this
fork was created.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] MMC: error handling improvements
@ 2011-02-16 18:41     ` Linus Walleij
  0 siblings, 0 replies; 32+ messages in thread
From: Linus Walleij @ 2011-02-16 18:41 UTC (permalink / raw
  To: David Brown
  Cc: Russell King - ARM Linux, linux-arm-kernel, linux-mmc, Chris Ball

2011/2/16 David Brown <davidb@codeaurora.org>:
> On Tue, Feb 15 2011, Russell King - ARM Linux wrote:
>
>> This patch is for RFC only; it needs splitting up somewhat.  However, I
>> wanted to get it out there for some comment.
>
> Just for kicks, I applied this and ran it on an MSM target (8x50).  It
> seems to cause lots of:
>
>  mmc0: Data timeout
>  mmc0: Controller has been re-initialized
>  ...
>  mmc0: Data CRC error
>
> and I can post more if you would find them interesting.  Eventually the
> MSM mmc driver derefernces a null pointer in the interrupt code.
>
> It's also possible this is finding problems in our SDCC driver.

The SDCC is obviously an MMCI derivate, VHDL hacking
on top of ARMs source code for PL180/PL181.

Why do you insist on maintaining a forked driver?

Please consider switching to using mmci.c like everyone else.
The quirks we have in place for U300, Nomadik and Ux500
should show you the way for how to do this (yes we did the
same thing, hacking the ARM VHDL).

If I remember correctly I could even see that some early
Android sources were using Russells mmci.c driver before this
fork was created.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC] MMC: error handling improvements
  2011-02-15 23:03 ` Russell King - ARM Linux
@ 2011-02-16 19:01   ` Pawel Moll
  -1 siblings, 0 replies; 32+ messages in thread
From: Pawel Moll @ 2011-02-16 19:01 UTC (permalink / raw
  To: linux-arm-kernel

Hi,

> The adaptive clock
> rate algorithm can probably do with a lot more work to avoid it up-
> clocking to a rate which has proven to never work.  I'd actually go as
> far as to say that the algorithm probably has a lot to be desired - but
> it seems to work for my test scenarios.

I've just gave it a try (on top of a clean 2.6.38-rc5):


/ # dd if=/dev/mmcblk0 of=/dev/null bs=128k count=10
10+0 records in
10+0 records out
1310720 bytes (1.3MB) copied, 2.922722 seconds, 437.9KB/s
/ # cat /dev/sda > /dev/null &
/ # dd if=/dev/mmcblk0 of=/dev/null bs=128k count=10
mmcblk0: error -5 transferring data, sector 0, nr 120, cmd response 0x900, card status 0xb00
mmcblk0: error -5 transferring data, sector 0, nr 120, cmd response 0x900, card status 0xb00
mmcblk0: retrying with slower /2 clock rate
mmcblk0: error -5 transferring data, sector 0, nr 120, cmd response 0x900, card status 0xb00
mmcblk0: retrying with slower /4 clock rate
mmcblk0: error -5 transferring data, sector 0, nr 120, cmd response 0x900, card status 0xb00
mmcblk0: retrying with slower /8 clock rate
mmcblk0: error -5 transferring data, sector 0, nr 120, cmd response 0x900, card status 0xb00
mmcblk0: retrying with slower /16 clock rate
10+0 records in
10+0 records out
1310720 bytes (1.3MB) copied, 46.763456 seconds, 27.4KB/s
/ # kill %1
/ # 
[1]+  Terminated                 cat /dev/sda 1>/dev/null
/ # dd if=/dev/mmcblk0 of=/dev/null bs=128k count=10
10+0 records in
10+0 records out
1310720 bytes (1.3MB) copied, 46.539866 seconds, 27.5KB/s
/ # sleep 30
/ # dd if=/dev/mmcblk0 of=/dev/null bs=128k count=10
10+0 records in
10+0 records out
1310720 bytes (1.3MB) copied, 46.540215 seconds, 27.5KB/s


So it does the right thing with decreasing the clock rate in face of
problems, I just can't see it clocking it back up...

Cheers!

Pawe?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] MMC: error handling improvements
@ 2011-02-16 19:01   ` Pawel Moll
  0 siblings, 0 replies; 32+ messages in thread
From: Pawel Moll @ 2011-02-16 19:01 UTC (permalink / raw
  To: Russell King - ARM Linux; +Cc: linux-arm-kernel, linux-mmc, Chris Ball

Hi,

> The adaptive clock
> rate algorithm can probably do with a lot more work to avoid it up-
> clocking to a rate which has proven to never work.  I'd actually go as
> far as to say that the algorithm probably has a lot to be desired - but
> it seems to work for my test scenarios.

I've just gave it a try (on top of a clean 2.6.38-rc5):


/ # dd if=/dev/mmcblk0 of=/dev/null bs=128k count=10
10+0 records in
10+0 records out
1310720 bytes (1.3MB) copied, 2.922722 seconds, 437.9KB/s
/ # cat /dev/sda > /dev/null &
/ # dd if=/dev/mmcblk0 of=/dev/null bs=128k count=10
mmcblk0: error -5 transferring data, sector 0, nr 120, cmd response 0x900, card status 0xb00
mmcblk0: error -5 transferring data, sector 0, nr 120, cmd response 0x900, card status 0xb00
mmcblk0: retrying with slower /2 clock rate
mmcblk0: error -5 transferring data, sector 0, nr 120, cmd response 0x900, card status 0xb00
mmcblk0: retrying with slower /4 clock rate
mmcblk0: error -5 transferring data, sector 0, nr 120, cmd response 0x900, card status 0xb00
mmcblk0: retrying with slower /8 clock rate
mmcblk0: error -5 transferring data, sector 0, nr 120, cmd response 0x900, card status 0xb00
mmcblk0: retrying with slower /16 clock rate
10+0 records in
10+0 records out
1310720 bytes (1.3MB) copied, 46.763456 seconds, 27.4KB/s
/ # kill %1
/ # 
[1]+  Terminated                 cat /dev/sda 1>/dev/null
/ # dd if=/dev/mmcblk0 of=/dev/null bs=128k count=10
10+0 records in
10+0 records out
1310720 bytes (1.3MB) copied, 46.539866 seconds, 27.5KB/s
/ # sleep 30
/ # dd if=/dev/mmcblk0 of=/dev/null bs=128k count=10
10+0 records in
10+0 records out
1310720 bytes (1.3MB) copied, 46.540215 seconds, 27.5KB/s


So it does the right thing with decreasing the clock rate in face of
problems, I just can't see it clocking it back up...

Cheers!

Paweł




^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC] MMC: error handling improvements
  2011-02-16 18:41     ` Linus Walleij
@ 2011-02-16 19:28       ` David Brown
  -1 siblings, 0 replies; 32+ messages in thread
From: David Brown @ 2011-02-16 19:28 UTC (permalink / raw
  To: linux-arm-kernel

On Wed, Feb 16 2011, Linus Walleij wrote:

> 2011/2/16 David Brown <davidb@codeaurora.org>:

>> It's also possible this is finding problems in our SDCC driver.
>
> The SDCC is obviously an MMCI derivate, VHDL hacking
> on top of ARMs source code for PL180/PL181.
>
> Why do you insist on maintaining a forked driver?

Well, it's not me insisting on it.  I'll let the maintainers of the
driver chime in.

The changes we made to the block are significant, but even beyond that
we changed how the block is even accessed.  The driver doesn't directly
access the registers of the controller, but all accesses go through a
custom DMA engine.

> Please consider switching to using mmci.c like everyone else.
> The quirks we have in place for U300, Nomadik and Ux500
> should show you the way for how to do this (yes we did the
> same thing, hacking the ARM VHDL).

I suspect the changes to mmci would be fairly drastic.

> If I remember correctly I could even see that some early
> Android sources were using Russells mmci.c driver before this
> fork was created.

These old drivers are also not usable.  The SDCC block is shared between
the modem processor and the processor running Linux.  If the driver
doesn't go through the DMA engine, which coordinates this, the registers
will be stomped on by the other CPU whenever it decides to access it's
parts of the flash device.

David

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] MMC: error handling improvements
@ 2011-02-16 19:28       ` David Brown
  0 siblings, 0 replies; 32+ messages in thread
From: David Brown @ 2011-02-16 19:28 UTC (permalink / raw
  To: Linus Walleij
  Cc: Russell King - ARM Linux, linux-arm-kernel, linux-mmc, Chris Ball

On Wed, Feb 16 2011, Linus Walleij wrote:

> 2011/2/16 David Brown <davidb@codeaurora.org>:

>> It's also possible this is finding problems in our SDCC driver.
>
> The SDCC is obviously an MMCI derivate, VHDL hacking
> on top of ARMs source code for PL180/PL181.
>
> Why do you insist on maintaining a forked driver?

Well, it's not me insisting on it.  I'll let the maintainers of the
driver chime in.

The changes we made to the block are significant, but even beyond that
we changed how the block is even accessed.  The driver doesn't directly
access the registers of the controller, but all accesses go through a
custom DMA engine.

> Please consider switching to using mmci.c like everyone else.
> The quirks we have in place for U300, Nomadik and Ux500
> should show you the way for how to do this (yes we did the
> same thing, hacking the ARM VHDL).

I suspect the changes to mmci would be fairly drastic.

> If I remember correctly I could even see that some early
> Android sources were using Russells mmci.c driver before this
> fork was created.

These old drivers are also not usable.  The SDCC block is shared between
the modem processor and the processor running Linux.  If the driver
doesn't go through the DMA engine, which coordinates this, the registers
will be stomped on by the other CPU whenever it decides to access it's
parts of the flash device.

David

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC] MMC: error handling improvements
  2011-02-16 19:28       ` David Brown
@ 2011-02-16 21:01         ` Linus Walleij
  -1 siblings, 0 replies; 32+ messages in thread
From: Linus Walleij @ 2011-02-16 21:01 UTC (permalink / raw
  To: linux-arm-kernel

2011/2/16 David Brown <davidb@codeaurora.org>:
> On Wed, Feb 16 2011, Linus Walleij wrote:
>> 2011/2/16 David Brown <davidb@codeaurora.org>:
>
>>> It's also possible this is finding problems in our SDCC driver.
>>
>> The SDCC is obviously an MMCI derivate, VHDL hacking
>> on top of ARMs source code for PL180/PL181.
>>
>> Why do you insist on maintaining a forked driver?
>
> Well, it's not me insisting on it. ?I'll let the maintainers of the
> driver chime in.

Yeah OK. I tried writing them last week on linux-arm-kernel
with more or less the same question.

>?The driver doesn't directly
> access the registers of the controller, but all accesses go through a
> custom DMA engine.
> (...)
> The SDCC block is shared between
> the modem processor and the processor running Linux.  If the driver
> doesn't go through the DMA engine, which coordinates this, the registers
> will be stomped on by the other CPU whenever it decides to access it's
> parts of the flash device.

That's significant, I agree. That the DMA engine is custom
instead of using the <linux/dmaengine.h> interface is not
making things easier, but it's another issue. If it did, I think it
could quite easily use mmci.c.

At the same time what you're saying sounds very weird:
the ios handler in mmc_sdcc does not request any DMA
channel before messing with the hardware, it simply just write
into registers very much in the style of mmci.c. Wouldn't that
disturb any simultaneous access to the MMC from another
CPU?

The DMA code path doesn't look one bit different from
what we currently do for the generic DMA engine in
mmci.c, it sets up a DMA job from the sglist in the datapath,
but maybe I'm not looking close enough?

> I suspect the changes to mmci would be fairly drastic.

I don't think so, but the changes to the DMA engine
(I guess mach-msm/dma.c) would potentially be pretty drastic,
apart from just moving the thing to drivers/dma.

Actually when I look at the code in msm_sdcc.c it looks
like some of the code we usually centralize into the
DMA engine (like the thing iterating over a sglist and
packing it into some custom struct called "box") is instead
spread out in the client drivers.

I just wanted to raise the issue because I see that the
msm_sdcc driver is trying to e.g. synchronize against
dataend signals and such stuff that we've worked with
recently in mmci.c, and I really think it would be in the
MSM platforms best interest to use this driver rather than
its own.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] MMC: error handling improvements
@ 2011-02-16 21:01         ` Linus Walleij
  0 siblings, 0 replies; 32+ messages in thread
From: Linus Walleij @ 2011-02-16 21:01 UTC (permalink / raw
  To: David Brown
  Cc: Russell King - ARM Linux, linux-arm-kernel, linux-mmc, Chris Ball

2011/2/16 David Brown <davidb@codeaurora.org>:
> On Wed, Feb 16 2011, Linus Walleij wrote:
>> 2011/2/16 David Brown <davidb@codeaurora.org>:
>
>>> It's also possible this is finding problems in our SDCC driver.
>>
>> The SDCC is obviously an MMCI derivate, VHDL hacking
>> on top of ARMs source code for PL180/PL181.
>>
>> Why do you insist on maintaining a forked driver?
>
> Well, it's not me insisting on it.  I'll let the maintainers of the
> driver chime in.

Yeah OK. I tried writing them last week on linux-arm-kernel
with more or less the same question.

> The driver doesn't directly
> access the registers of the controller, but all accesses go through a
> custom DMA engine.
> (...)
> The SDCC block is shared between
> the modem processor and the processor running Linux.  If the driver
> doesn't go through the DMA engine, which coordinates this, the registers
> will be stomped on by the other CPU whenever it decides to access it's
> parts of the flash device.

That's significant, I agree. That the DMA engine is custom
instead of using the <linux/dmaengine.h> interface is not
making things easier, but it's another issue. If it did, I think it
could quite easily use mmci.c.

At the same time what you're saying sounds very weird:
the ios handler in mmc_sdcc does not request any DMA
channel before messing with the hardware, it simply just write
into registers very much in the style of mmci.c. Wouldn't that
disturb any simultaneous access to the MMC from another
CPU?

The DMA code path doesn't look one bit different from
what we currently do for the generic DMA engine in
mmci.c, it sets up a DMA job from the sglist in the datapath,
but maybe I'm not looking close enough?

> I suspect the changes to mmci would be fairly drastic.

I don't think so, but the changes to the DMA engine
(I guess mach-msm/dma.c) would potentially be pretty drastic,
apart from just moving the thing to drivers/dma.

Actually when I look at the code in msm_sdcc.c it looks
like some of the code we usually centralize into the
DMA engine (like the thing iterating over a sglist and
packing it into some custom struct called "box") is instead
spread out in the client drivers.

I just wanted to raise the issue because I see that the
msm_sdcc driver is trying to e.g. synchronize against
dataend signals and such stuff that we've worked with
recently in mmci.c, and I really think it would be in the
MSM platforms best interest to use this driver rather than
its own.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC] MMC: error handling improvements
  2011-02-16 21:01         ` Linus Walleij
@ 2011-02-16 23:06           ` Brian Swetland
  -1 siblings, 0 replies; 32+ messages in thread
From: Brian Swetland @ 2011-02-16 23:06 UTC (permalink / raw
  To: linux-arm-kernel

On Wed, Feb 16, 2011 at 1:01 PM, Linus Walleij <linus.walleij@linaro.org> wrote:
> 2011/2/16 David Brown <davidb@codeaurora.org>:
>>?The driver doesn't directly
>> access the registers of the controller, but all accesses go through a
>> custom DMA engine.
>> (...)
>> The SDCC block is shared between
>> the modem processor and the processor running Linux. ?If the driver
>> doesn't go through the DMA engine, which coordinates this, the registers
>> will be stomped on by the other CPU whenever it decides to access it's
>> parts of the flash device.
>
> That's significant, I agree. That the DMA engine is custom
> instead of using the <linux/dmaengine.h> interface is not
> making things easier, but it's another issue. If it did, I think it
> could quite easily use mmci.c.
>
> At the same time what you're saying sounds very weird:
> the ios handler in mmc_sdcc does not request any DMA
> channel before messing with the hardware, it simply just write
> into registers very much in the style of mmci.c. Wouldn't that
> disturb any simultaneous access to the MMC from another
> CPU?

On MSM7x01,7x2x,8x50,8x55,7x30 the overall design involves both the
baseband cpu (running AMSS) and the apps cpu (running Linux) sharing a
single SDCC to read/write different partitions on NAND.

The way register access for the SDCC is synchronized between these two
cores is by using a locking primitive built into the MSM DMA
controller.  If you don't use this indirect access model (via DMA
transactions to the registers), you end up tripping over the baseband
or the other way 'round.  It's not fun.

Later Qualcomm chipsets move away from this model (yay) and that
should simplify things quite a bit.

Brian

>
> The DMA code path doesn't look one bit different from
> what we currently do for the generic DMA engine in
> mmci.c, it sets up a DMA job from the sglist in the datapath,
> but maybe I'm not looking close enough?
>
>> I suspect the changes to mmci would be fairly drastic.
>
> I don't think so, but the changes to the DMA engine
> (I guess mach-msm/dma.c) would potentially be pretty drastic,
> apart from just moving the thing to drivers/dma.
>
> Actually when I look at the code in msm_sdcc.c it looks
> like some of the code we usually centralize into the
> DMA engine (like the thing iterating over a sglist and
> packing it into some custom struct called "box") is instead
> spread out in the client drivers.
>
> I just wanted to raise the issue because I see that the
> msm_sdcc driver is trying to e.g. synchronize against
> dataend signals and such stuff that we've worked with
> recently in mmci.c, and I really think it would be in the
> MSM platforms best interest to use this driver rather than
> its own.
>
> Yours,
> Linus Walleij
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] MMC: error handling improvements
@ 2011-02-16 23:06           ` Brian Swetland
  0 siblings, 0 replies; 32+ messages in thread
From: Brian Swetland @ 2011-02-16 23:06 UTC (permalink / raw
  To: Linus Walleij
  Cc: David Brown, linux-mmc, Chris Ball, Russell King - ARM Linux,
	linux-arm-kernel

On Wed, Feb 16, 2011 at 1:01 PM, Linus Walleij <linus.walleij@linaro.org> wrote:
> 2011/2/16 David Brown <davidb@codeaurora.org>:
>> The driver doesn't directly
>> access the registers of the controller, but all accesses go through a
>> custom DMA engine.
>> (...)
>> The SDCC block is shared between
>> the modem processor and the processor running Linux.  If the driver
>> doesn't go through the DMA engine, which coordinates this, the registers
>> will be stomped on by the other CPU whenever it decides to access it's
>> parts of the flash device.
>
> That's significant, I agree. That the DMA engine is custom
> instead of using the <linux/dmaengine.h> interface is not
> making things easier, but it's another issue. If it did, I think it
> could quite easily use mmci.c.
>
> At the same time what you're saying sounds very weird:
> the ios handler in mmc_sdcc does not request any DMA
> channel before messing with the hardware, it simply just write
> into registers very much in the style of mmci.c. Wouldn't that
> disturb any simultaneous access to the MMC from another
> CPU?

On MSM7x01,7x2x,8x50,8x55,7x30 the overall design involves both the
baseband cpu (running AMSS) and the apps cpu (running Linux) sharing a
single SDCC to read/write different partitions on NAND.

The way register access for the SDCC is synchronized between these two
cores is by using a locking primitive built into the MSM DMA
controller.  If you don't use this indirect access model (via DMA
transactions to the registers), you end up tripping over the baseband
or the other way 'round.  It's not fun.

Later Qualcomm chipsets move away from this model (yay) and that
should simplify things quite a bit.

Brian

>
> The DMA code path doesn't look one bit different from
> what we currently do for the generic DMA engine in
> mmci.c, it sets up a DMA job from the sglist in the datapath,
> but maybe I'm not looking close enough?
>
>> I suspect the changes to mmci would be fairly drastic.
>
> I don't think so, but the changes to the DMA engine
> (I guess mach-msm/dma.c) would potentially be pretty drastic,
> apart from just moving the thing to drivers/dma.
>
> Actually when I look at the code in msm_sdcc.c it looks
> like some of the code we usually centralize into the
> DMA engine (like the thing iterating over a sglist and
> packing it into some custom struct called "box") is instead
> spread out in the client drivers.
>
> I just wanted to raise the issue because I see that the
> msm_sdcc driver is trying to e.g. synchronize against
> dataend signals and such stuff that we've worked with
> recently in mmci.c, and I really think it would be in the
> MSM platforms best interest to use this driver rather than
> its own.
>
> Yours,
> Linus Walleij
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC] MMC: error handling improvements
  2011-02-16 19:01   ` Pawel Moll
@ 2011-02-17 10:40     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 32+ messages in thread
From: Russell King - ARM Linux @ 2011-02-17 10:40 UTC (permalink / raw
  To: linux-arm-kernel

On Wed, Feb 16, 2011 at 07:01:06PM +0000, Pawel Moll wrote:
> / # dd if=/dev/mmcblk0 of=/dev/null bs=128k count=10
> 10+0 records in
> 10+0 records out
> 1310720 bytes (1.3MB) copied, 46.539866 seconds, 27.5KB/s
> / # sleep 30
> / # dd if=/dev/mmcblk0 of=/dev/null bs=128k count=10
> 10+0 records in
> 10+0 records out
> 1310720 bytes (1.3MB) copied, 46.540215 seconds, 27.5KB/s
> 
> 
> So it does the right thing with decreasing the clock rate in face of
> problems, I just can't see it clocking it back up...

You need at least 100 requests before it'll consider clocking back up.
Once it has 100 requests, no more than 5% of them at the current clock
rate must have failed for it to consider clocking up.  I found with
fewer requests, it was forever clocking down, up, down very frequently.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] MMC: error handling improvements
@ 2011-02-17 10:40     ` Russell King - ARM Linux
  0 siblings, 0 replies; 32+ messages in thread
From: Russell King - ARM Linux @ 2011-02-17 10:40 UTC (permalink / raw
  To: Pawel Moll; +Cc: linux-arm-kernel, linux-mmc, Chris Ball

On Wed, Feb 16, 2011 at 07:01:06PM +0000, Pawel Moll wrote:
> / # dd if=/dev/mmcblk0 of=/dev/null bs=128k count=10
> 10+0 records in
> 10+0 records out
> 1310720 bytes (1.3MB) copied, 46.539866 seconds, 27.5KB/s
> / # sleep 30
> / # dd if=/dev/mmcblk0 of=/dev/null bs=128k count=10
> 10+0 records in
> 10+0 records out
> 1310720 bytes (1.3MB) copied, 46.540215 seconds, 27.5KB/s
> 
> 
> So it does the right thing with decreasing the clock rate in face of
> problems, I just can't see it clocking it back up...

You need at least 100 requests before it'll consider clocking back up.
Once it has 100 requests, no more than 5% of them at the current clock
rate must have failed for it to consider clocking up.  I found with
fewer requests, it was forever clocking down, up, down very frequently.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC] MMC: error handling improvements
  2011-02-16 23:06           ` Brian Swetland
@ 2011-02-18 13:03             ` Linus Walleij
  -1 siblings, 0 replies; 32+ messages in thread
From: Linus Walleij @ 2011-02-18 13:03 UTC (permalink / raw
  To: linux-arm-kernel

2011/2/17 Brian Swetland <swetland@google.com>:
> On Wed, Feb 16, 2011 at 1:01 PM, Linus Walleij <linus.walleij@linaro.org> wrote:
>> 2011/2/16 David Brown <davidb@codeaurora.org>:

>>> The SDCC block is shared between
>>> the modem processor and the processor running Linux. ?If the driver
>>> doesn't go through the DMA engine, which coordinates this, the registers
>>> will be stomped on by the other CPU whenever it decides to access it's
>>> parts of the flash device.
>> (...)
>>
>> At the same time what you're saying sounds very weird:
>> the ios handler in mmc_sdcc does not request any DMA
>> channel before messing with the hardware, it simply just write
>> into registers very much in the style of mmci.c. Wouldn't that
>> disturb any simultaneous access to the MMC from another
>> CPU?
>
> (...)
> The way register access for the SDCC is synchronized between these two
> cores is by using a locking primitive built into the MSM DMA
> controller. ?If you don't use this indirect access model (via DMA
> transactions to the registers), you end up tripping over the baseband
> or the other way 'round. ?It's not fun.

Wherever is that synchronized in the DMA controller?
I don't get it because the code is so very similar.

When the ios does this:
msmsdcc_writel(host, clk, MMCICLOCK);

This magic macro does the trick?

static inline void
msmsdcc_writel(struct msmsdcc_host *host, u32 data, unsigned int reg)
{
        writel(data, host->base + reg);
        /* 3 clk delay required! */
        udelay(1 + ((3 * USEC_PER_SEC) /
               (host->clk_rate ? host->clk_rate : msmsdcc_fmin)));
}

What is so hard about renaming this mmci_writel() and putting
the udelay() weirdness inside #ifdef ARCH_MSM? Wrapping
all the writes in the mmci driver inside an inline function is hardly
a big problem.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] MMC: error handling improvements
@ 2011-02-18 13:03             ` Linus Walleij
  0 siblings, 0 replies; 32+ messages in thread
From: Linus Walleij @ 2011-02-18 13:03 UTC (permalink / raw
  To: Brian Swetland
  Cc: David Brown, linux-mmc, Chris Ball, Russell King - ARM Linux,
	linux-arm-kernel

2011/2/17 Brian Swetland <swetland@google.com>:
> On Wed, Feb 16, 2011 at 1:01 PM, Linus Walleij <linus.walleij@linaro.org> wrote:
>> 2011/2/16 David Brown <davidb@codeaurora.org>:

>>> The SDCC block is shared between
>>> the modem processor and the processor running Linux.  If the driver
>>> doesn't go through the DMA engine, which coordinates this, the registers
>>> will be stomped on by the other CPU whenever it decides to access it's
>>> parts of the flash device.
>> (...)
>>
>> At the same time what you're saying sounds very weird:
>> the ios handler in mmc_sdcc does not request any DMA
>> channel before messing with the hardware, it simply just write
>> into registers very much in the style of mmci.c. Wouldn't that
>> disturb any simultaneous access to the MMC from another
>> CPU?
>
> (...)
> The way register access for the SDCC is synchronized between these two
> cores is by using a locking primitive built into the MSM DMA
> controller.  If you don't use this indirect access model (via DMA
> transactions to the registers), you end up tripping over the baseband
> or the other way 'round.  It's not fun.

Wherever is that synchronized in the DMA controller?
I don't get it because the code is so very similar.

When the ios does this:
msmsdcc_writel(host, clk, MMCICLOCK);

This magic macro does the trick?

static inline void
msmsdcc_writel(struct msmsdcc_host *host, u32 data, unsigned int reg)
{
        writel(data, host->base + reg);
        /* 3 clk delay required! */
        udelay(1 + ((3 * USEC_PER_SEC) /
               (host->clk_rate ? host->clk_rate : msmsdcc_fmin)));
}

What is so hard about renaming this mmci_writel() and putting
the udelay() weirdness inside #ifdef ARCH_MSM? Wrapping
all the writes in the mmci driver inside an inline function is hardly
a big problem.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC] MMC: error handling improvements
  2011-02-18 13:03             ` Linus Walleij
@ 2011-02-18 16:04               ` Brian Swetland
  -1 siblings, 0 replies; 32+ messages in thread
From: Brian Swetland @ 2011-02-18 16:04 UTC (permalink / raw
  To: linux-arm-kernel

On Fri, Feb 18, 2011 at 5:03 AM, Linus Walleij <linus.walleij@linaro.org> wrote:
> 2011/2/17 Brian Swetland <swetland@google.com>:
>> On Wed, Feb 16, 2011 at 1:01 PM, Linus Walleij <linus.walleij@linaro.org> wrote:
>>> 2011/2/16 David Brown <davidb@codeaurora.org>:
>
>>>> The SDCC block is shared between
>>>> the modem processor and the processor running Linux. ?If the driver
>>>> doesn't go through the DMA engine, which coordinates this, the registers
>>>> will be stomped on by the other CPU whenever it decides to access it's
>>>> parts of the flash device.
>>> (...)
>>>
>>> At the same time what you're saying sounds very weird:
>>> the ios handler in mmc_sdcc does not request any DMA
>>> channel before messing with the hardware, it simply just write
>>> into registers very much in the style of mmci.c. Wouldn't that
>>> disturb any simultaneous access to the MMC from another
>>> CPU?
>>
>> (...)
>> The way register access for the SDCC is synchronized between these two
>> cores is by using a locking primitive built into the MSM DMA
>> controller. ?If you don't use this indirect access model (via DMA
>> transactions to the registers), you end up tripping over the baseband
>> or the other way 'round. ?It's not fun.
>
> Wherever is that synchronized in the DMA controller?
> I don't get it because the code is so very similar.
>
> When the ios does this:
> msmsdcc_writel(host, clk, MMCICLOCK);

Sorry, I'm wrong.  I was thinking of the MSM NAND driver which does
have a funky DMA interlock access control scheme where the DMA engine
is use to arbitrate register access between the apps and modem cores.
You are correct -- there's no special magic in the SDCC driver.

If this can be handled by a common driver with a quirks flag for
msm-specific oddities, that sounds quite sane to me.

Brian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] MMC: error handling improvements
@ 2011-02-18 16:04               ` Brian Swetland
  0 siblings, 0 replies; 32+ messages in thread
From: Brian Swetland @ 2011-02-18 16:04 UTC (permalink / raw
  To: Linus Walleij
  Cc: David Brown, linux-mmc, Chris Ball, Russell King - ARM Linux,
	linux-arm-kernel

On Fri, Feb 18, 2011 at 5:03 AM, Linus Walleij <linus.walleij@linaro.org> wrote:
> 2011/2/17 Brian Swetland <swetland@google.com>:
>> On Wed, Feb 16, 2011 at 1:01 PM, Linus Walleij <linus.walleij@linaro.org> wrote:
>>> 2011/2/16 David Brown <davidb@codeaurora.org>:
>
>>>> The SDCC block is shared between
>>>> the modem processor and the processor running Linux.  If the driver
>>>> doesn't go through the DMA engine, which coordinates this, the registers
>>>> will be stomped on by the other CPU whenever it decides to access it's
>>>> parts of the flash device.
>>> (...)
>>>
>>> At the same time what you're saying sounds very weird:
>>> the ios handler in mmc_sdcc does not request any DMA
>>> channel before messing with the hardware, it simply just write
>>> into registers very much in the style of mmci.c. Wouldn't that
>>> disturb any simultaneous access to the MMC from another
>>> CPU?
>>
>> (...)
>> The way register access for the SDCC is synchronized between these two
>> cores is by using a locking primitive built into the MSM DMA
>> controller.  If you don't use this indirect access model (via DMA
>> transactions to the registers), you end up tripping over the baseband
>> or the other way 'round.  It's not fun.
>
> Wherever is that synchronized in the DMA controller?
> I don't get it because the code is so very similar.
>
> When the ios does this:
> msmsdcc_writel(host, clk, MMCICLOCK);

Sorry, I'm wrong.  I was thinking of the MSM NAND driver which does
have a funky DMA interlock access control scheme where the DMA engine
is use to arbitrate register access between the apps and modem cores.
You are correct -- there's no special magic in the SDCC driver.

If this can be handled by a common driver with a quirks flag for
msm-specific oddities, that sounds quite sane to me.

Brian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC] MMC: error handling improvements
  2011-02-15 23:49   ` David Brown
@ 2011-02-19 15:00     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 32+ messages in thread
From: Russell King - ARM Linux @ 2011-02-19 15:00 UTC (permalink / raw
  To: linux-arm-kernel

On Tue, Feb 15, 2011 at 03:49:04PM -0800, David Brown wrote:
> On Tue, Feb 15 2011, Russell King - ARM Linux wrote:
> 
> > This patch is for RFC only; it needs splitting up somewhat.  However, I
> > wanted to get it out there for some comment.
> 
> Just for kicks, I applied this and ran it on an MSM target (8x50).  It
> seems to cause lots of:
> 
>   mmc0: Data timeout
>   mmc0: Controller has been re-initialized
>   ...
>   mmc0: Data CRC error
> 
> and I can post more if you would find them interesting.  Eventually the
> MSM mmc driver derefernces a null pointer in the interrupt code.

That doesn't look right.  If resetting the MMC controller results in all
state being lost, it presumably means the card gets powered down too.
If that happens, the card needs to be reinitialized from scratch again,
and I don't see how that happens in this driver.

Nothing in this patch should cause data timeouts to appear where they
weren't already appearing before the patch - the patch only changes
what happens after an error occurs, so you must already be encountering
errors on the interface before you applied this patch.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] MMC: error handling improvements
@ 2011-02-19 15:00     ` Russell King - ARM Linux
  0 siblings, 0 replies; 32+ messages in thread
From: Russell King - ARM Linux @ 2011-02-19 15:00 UTC (permalink / raw
  To: David Brown; +Cc: linux-arm-kernel, linux-mmc, Chris Ball

On Tue, Feb 15, 2011 at 03:49:04PM -0800, David Brown wrote:
> On Tue, Feb 15 2011, Russell King - ARM Linux wrote:
> 
> > This patch is for RFC only; it needs splitting up somewhat.  However, I
> > wanted to get it out there for some comment.
> 
> Just for kicks, I applied this and ran it on an MSM target (8x50).  It
> seems to cause lots of:
> 
>   mmc0: Data timeout
>   mmc0: Controller has been re-initialized
>   ...
>   mmc0: Data CRC error
> 
> and I can post more if you would find them interesting.  Eventually the
> MSM mmc driver derefernces a null pointer in the interrupt code.

That doesn't look right.  If resetting the MMC controller results in all
state being lost, it presumably means the card gets powered down too.
If that happens, the card needs to be reinitialized from scratch again,
and I don't see how that happens in this driver.

Nothing in this patch should cause data timeouts to appear where they
weren't already appearing before the patch - the patch only changes
what happens after an error occurs, so you must already be encountering
errors on the interface before you applied this patch.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC] MMC: error handling improvements
  2011-02-18 16:04               ` Brian Swetland
@ 2011-02-21 20:49                 ` Linus Walleij
  -1 siblings, 0 replies; 32+ messages in thread
From: Linus Walleij @ 2011-02-21 20:49 UTC (permalink / raw
  To: linux-arm-kernel

Out of sheer curiosity, Qualcomm/CodeAurora guys:

if you dump out the registers at offset 0xfe0-0xfff on the
SDCC, does it contain a valid PrimeCell ID? Anyone who
can make a hexdump of it?

It is possible to hardcode an AMBA ID to use this driver
even if this area is just 0x0000... but I'm just curious
about how different silicon vendors treat these registers,
e.g. if they contain something sensible.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] MMC: error handling improvements
@ 2011-02-21 20:49                 ` Linus Walleij
  0 siblings, 0 replies; 32+ messages in thread
From: Linus Walleij @ 2011-02-21 20:49 UTC (permalink / raw
  To: Brian Swetland, David Brown
  Cc: Russell King - ARM Linux, Chris Ball, linux-mmc, linux-arm-kernel

Out of sheer curiosity, Qualcomm/CodeAurora guys:

if you dump out the registers at offset 0xfe0-0xfff on the
SDCC, does it contain a valid PrimeCell ID? Anyone who
can make a hexdump of it?

It is possible to hardcode an AMBA ID to use this driver
even if this area is just 0x0000... but I'm just curious
about how different silicon vendors treat these registers,
e.g. if they contain something sensible.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC] MMC: error handling improvements
  2011-02-19 15:00     ` Russell King - ARM Linux
@ 2011-03-01 13:28       ` Sahitya Tummala
  -1 siblings, 0 replies; 32+ messages in thread
From: Sahitya Tummala @ 2011-03-01 13:28 UTC (permalink / raw
  To: linux-arm-kernel

Hi Russell King,

On Sat, 2011-02-19 at 15:00 +0000, Russell King - ARM Linux wrote:
> On Tue, Feb 15, 2011 at 03:49:04PM -0800, David Brown wrote:
> > On Tue, Feb 15 2011, Russell King - ARM Linux wrote:
> > 
> > > This patch is for RFC only; it needs splitting up somewhat.  However, I
> > > wanted to get it out there for some comment.
> > 
> > Just for kicks, I applied this and ran it on an MSM target (8x50).  It
> > seems to cause lots of:
> > 
> >   mmc0: Data timeout
> >   mmc0: Controller has been re-initialized
> >   ...
> >   mmc0: Data CRC error
> > 
> > and I can post more if you would find them interesting.  Eventually the
> > MSM mmc driver derefernces a null pointer in the interrupt code.
> 
> That doesn't look right.  If resetting the MMC controller results in all
> state being lost, it presumably means the card gets powered down too.
> If that happens, the card needs to be reinitialized from scratch again,
> and I don't see how that happens in this driver.
> 
> Nothing in this patch should cause data timeouts to appear where they
> weren't already appearing before the patch - the patch only changes
> what happens after an error occurs, so you must already be encountering
> errors on the interface before you applied this patch.

This patch sends CMD22 (mmc_sd_num_wr_blocks()) for SD cards after a
write command. This function relies on csd->tacc_ns and csd->tacc_clks
for setting timeout values. But for SDHC cards, these CSD values are not
read and will always be 0. The host driver will only use the timeout
values sent by upper layers and thus in this case we are writing 0 to
MMCIDATATIMER. This is the reason for getting data timeout error for
CMD22. Before sending CMD22 if I add mmc_set_data_timeout(), then it is
working fine for me because mmc_set_data_timeout() uses hardcoded
timeout values for SDHC cards.

All other SD read commands are also using mmc_set_data_timeout() to set
the timeout values - mmc_sd_switch(), mmc_app_sd_status()  and
mmc_app_send_scr(). I think it is a problem in mmc_sd_num_wr_blocks to
use CSD contents for timeout values for all types of SD cards. 

-- 
Thanks,
Sahitya.
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC] MMC: error handling improvements
@ 2011-03-01 13:28       ` Sahitya Tummala
  0 siblings, 0 replies; 32+ messages in thread
From: Sahitya Tummala @ 2011-03-01 13:28 UTC (permalink / raw
  To: Russell King - ARM Linux
  Cc: David Brown, linux-arm-kernel, linux-mmc, Chris Ball

Hi Russell King,

On Sat, 2011-02-19 at 15:00 +0000, Russell King - ARM Linux wrote:
> On Tue, Feb 15, 2011 at 03:49:04PM -0800, David Brown wrote:
> > On Tue, Feb 15 2011, Russell King - ARM Linux wrote:
> > 
> > > This patch is for RFC only; it needs splitting up somewhat.  However, I
> > > wanted to get it out there for some comment.
> > 
> > Just for kicks, I applied this and ran it on an MSM target (8x50).  It
> > seems to cause lots of:
> > 
> >   mmc0: Data timeout
> >   mmc0: Controller has been re-initialized
> >   ...
> >   mmc0: Data CRC error
> > 
> > and I can post more if you would find them interesting.  Eventually the
> > MSM mmc driver derefernces a null pointer in the interrupt code.
> 
> That doesn't look right.  If resetting the MMC controller results in all
> state being lost, it presumably means the card gets powered down too.
> If that happens, the card needs to be reinitialized from scratch again,
> and I don't see how that happens in this driver.
> 
> Nothing in this patch should cause data timeouts to appear where they
> weren't already appearing before the patch - the patch only changes
> what happens after an error occurs, so you must already be encountering
> errors on the interface before you applied this patch.

This patch sends CMD22 (mmc_sd_num_wr_blocks()) for SD cards after a
write command. This function relies on csd->tacc_ns and csd->tacc_clks
for setting timeout values. But for SDHC cards, these CSD values are not
read and will always be 0. The host driver will only use the timeout
values sent by upper layers and thus in this case we are writing 0 to
MMCIDATATIMER. This is the reason for getting data timeout error for
CMD22. Before sending CMD22 if I add mmc_set_data_timeout(), then it is
working fine for me because mmc_set_data_timeout() uses hardcoded
timeout values for SDHC cards.

All other SD read commands are also using mmc_set_data_timeout() to set
the timeout values - mmc_sd_switch(), mmc_app_sd_status()  and
mmc_app_send_scr(). I think it is a problem in mmc_sd_num_wr_blocks to
use CSD contents for timeout values for all types of SD cards. 

-- 
Thanks,
Sahitya.
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Fwd: [RFC] MMC: error handling improvements
       [not found]                 ` <AANLkTimpJUoc64py_jCrvsrXscawsR2c7JBtr1e0U+e8@mail.gmail.com>
@ 2011-03-01 18:39                     ` Murali Krishna Palnati
  0 siblings, 0 replies; 32+ messages in thread
From: Murali Krishna Palnati @ 2011-03-01 18:39 UTC (permalink / raw
  To: Linus Walleij
  Cc: Brian Swetland, David Brown, Russell King - ARM Linux, Chris Ball,
	linux-mmc, linux-arm-kernel, linux-arm-msm

> ---------- Forwarded message ----------
> From: Linus Walleij <linus.walleij@linaro.org>
> Date: Tue, Feb 22, 2011 at 2:19 AM
> Subject: Re: [RFC] MMC: error handling improvements
> To: Brian Swetland <swetland@google.com>, David Brown
> <davidb@codeaurora.org
>>
> Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>, Chris Ball <
> cjb@laptop.org>, linux-mmc@vger.kernel.org,
> linux-arm-kernel@lists.infradead.org
>
>
> Out of sheer curiosity, Qualcomm/CodeAurora guys:
>
> if you dump out the registers at offset 0xfe0-0xfff on the
> SDCC, does it contain a valid PrimeCell ID? Anyone who
> can make a hexdump of it?
>
> It is possible to hardcode an AMBA ID to use this driver
> even if this area is just 0x0000... but I'm just curious
> about how different silicon vendors treat these registers,
> e.g. if they contain something sensible.
>

Linus,

These Prime cell ID registers are removed in the newer versions of the MSM
SDCC controller and are replaced with a different register that indicates
the controller version number. So, using these prime cell ID registers may
not be an option for us. Nevertheless, as you pointed out we can hard code
the AMBA ID in the platform device data and use that in the mmci driver to
handle stuff specific to this controller similar to what is done for ST
variant in mmci driver.

Though the bulk of the MSM SDCC controller is designed around PL180, there
are still some modifications to include flow control, SDIO support, Data
Mover interface (DMA engine on Qualcomm MSMs) etc. Support/ability to talk
to the Data Mover is one of the main differences when compared to original
PL180 prime cell.

Our main concern in moving to mmci is support for DMA that needs to be
added to this driver. I believe PL180 controller does have the ability to
talk to DMA engines, but i dont see any relevant support for that in mmci
driver, unless i grossly missed something.

Regards

P Murali Krishna



Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Fwd: [RFC] MMC: error handling improvements
@ 2011-03-01 18:39                     ` Murali Krishna Palnati
  0 siblings, 0 replies; 32+ messages in thread
From: Murali Krishna Palnati @ 2011-03-01 18:39 UTC (permalink / raw
  To: linux-arm-kernel

> ---------- Forwarded message ----------
> From: Linus Walleij <linus.walleij@linaro.org>
> Date: Tue, Feb 22, 2011 at 2:19 AM
> Subject: Re: [RFC] MMC: error handling improvements
> To: Brian Swetland <swetland@google.com>, David Brown
> <davidb@codeaurora.org
>>
> Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>, Chris Ball <
> cjb at laptop.org>, linux-mmc at vger.kernel.org,
> linux-arm-kernel at lists.infradead.org
>
>
> Out of sheer curiosity, Qualcomm/CodeAurora guys:
>
> if you dump out the registers at offset 0xfe0-0xfff on the
> SDCC, does it contain a valid PrimeCell ID? Anyone who
> can make a hexdump of it?
>
> It is possible to hardcode an AMBA ID to use this driver
> even if this area is just 0x0000... but I'm just curious
> about how different silicon vendors treat these registers,
> e.g. if they contain something sensible.
>

Linus,

These Prime cell ID registers are removed in the newer versions of the MSM
SDCC controller and are replaced with a different register that indicates
the controller version number. So, using these prime cell ID registers may
not be an option for us. Nevertheless, as you pointed out we can hard code
the AMBA ID in the platform device data and use that in the mmci driver to
handle stuff specific to this controller similar to what is done for ST
variant in mmci driver.

Though the bulk of the MSM SDCC controller is designed around PL180, there
are still some modifications to include flow control, SDIO support, Data
Mover interface (DMA engine on Qualcomm MSMs) etc. Support/ability to talk
to the Data Mover is one of the main differences when compared to original
PL180 prime cell.

Our main concern in moving to mmci is support for DMA that needs to be
added to this driver. I believe PL180 controller does have the ability to
talk to DMA engines, but i dont see any relevant support for that in mmci
driver, unless i grossly missed something.

Regards

P Murali Krishna



Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Fwd: [RFC] MMC: error handling improvements
  2011-03-01 18:39                     ` Murali Krishna Palnati
@ 2011-03-01 20:22                       ` Russell King - ARM Linux
  -1 siblings, 0 replies; 32+ messages in thread
From: Russell King - ARM Linux @ 2011-03-01 20:22 UTC (permalink / raw
  To: Murali Krishna Palnati
  Cc: Linus Walleij, Brian Swetland, David Brown, Chris Ball, linux-mmc,
	linux-arm-kernel, linux-arm-msm

On Tue, Mar 01, 2011 at 10:39:59AM -0800, Murali Krishna Palnati wrote:
> Our main concern in moving to mmci is support for DMA that needs to be
> added to this driver. I believe PL180 controller does have the ability to
> talk to DMA engines, but i dont see any relevant support for that in mmci
> driver, unless i grossly missed something.

DMA engine support will be added at the next merge window.  It's already
in linux-next.

Note that as ARM evaluation boards all have entirely broken DMA, it's
something I can't test here, so I'm entirely reliant on other people
using the driver with working DMA engines, testing it, and reporting
back.  This is why the ARM primecell drivers traditionally have had no
DMA support - I just don't have any working hardware to develop such
facilities.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Fwd: [RFC] MMC: error handling improvements
@ 2011-03-01 20:22                       ` Russell King - ARM Linux
  0 siblings, 0 replies; 32+ messages in thread
From: Russell King - ARM Linux @ 2011-03-01 20:22 UTC (permalink / raw
  To: linux-arm-kernel

On Tue, Mar 01, 2011 at 10:39:59AM -0800, Murali Krishna Palnati wrote:
> Our main concern in moving to mmci is support for DMA that needs to be
> added to this driver. I believe PL180 controller does have the ability to
> talk to DMA engines, but i dont see any relevant support for that in mmci
> driver, unless i grossly missed something.

DMA engine support will be added at the next merge window.  It's already
in linux-next.

Note that as ARM evaluation boards all have entirely broken DMA, it's
something I can't test here, so I'm entirely reliant on other people
using the driver with working DMA engines, testing it, and reporting
back.  This is why the ARM primecell drivers traditionally have had no
DMA support - I just don't have any working hardware to develop such
facilities.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Fwd: [RFC] MMC: error handling improvements
  2011-03-01 18:39                     ` Murali Krishna Palnati
@ 2011-03-02 10:12                       ` Linus Walleij
  -1 siblings, 0 replies; 32+ messages in thread
From: Linus Walleij @ 2011-03-02 10:12 UTC (permalink / raw
  To: Murali Krishna Palnati
  Cc: Brian Swetland, David Brown, Russell King - ARM Linux, Chris Ball,
	linux-mmc, linux-arm-kernel, linux-arm-msm

Thanks for writing back Murali!

2011/3/1 Murali Krishna Palnati <palnatim@codeaurora.org>:

> Though the bulk of the MSM SDCC controller is designed around PL180, there
> are still some modifications to include flow control, SDIO support,

We have these fixes in the U300/Ux500 version of MMCI as well, I suspect
it's nothing more than altering which bits are to be poked into which
registers.

> Data
> Mover interface (DMA engine on Qualcomm MSMs) etc. Support/ability to talk
> to the Data Mover is one of the main differences when compared to original
> PL180 prime cell.

The main work would be to switch the DMA engine in MSM to using
the DMA engine in drivers/dma. This is one thing we do to avoid the
proliferation of DMA engine code in arch/arm/*, and we very early
started to use that framework for U300 and Ux500 and I must say that
it has payed off well for us, now we have much nicer interfaces and more
review than any of the custom engines under arch/arm/*.

It's some upfront work but I think it would benefit you if you could
do this.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Fwd: [RFC] MMC: error handling improvements
@ 2011-03-02 10:12                       ` Linus Walleij
  0 siblings, 0 replies; 32+ messages in thread
From: Linus Walleij @ 2011-03-02 10:12 UTC (permalink / raw
  To: linux-arm-kernel

Thanks for writing back Murali!

2011/3/1 Murali Krishna Palnati <palnatim@codeaurora.org>:

> Though the bulk of the MSM SDCC controller is designed around PL180, there
> are still some modifications to include flow control, SDIO support,

We have these fixes in the U300/Ux500 version of MMCI as well, I suspect
it's nothing more than altering which bits are to be poked into which
registers.

> Data
> Mover interface (DMA engine on Qualcomm MSMs) etc. Support/ability to talk
> to the Data Mover is one of the main differences when compared to original
> PL180 prime cell.

The main work would be to switch the DMA engine in MSM to using
the DMA engine in drivers/dma. This is one thing we do to avoid the
proliferation of DMA engine code in arch/arm/*, and we very early
started to use that framework for U300 and Ux500 and I must say that
it has payed off well for us, now we have much nicer interfaces and more
review than any of the custom engines under arch/arm/*.

It's some upfront work but I think it would benefit you if you could
do this.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2011-03-02 10:12 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-15 23:03 [RFC] MMC: error handling improvements Russell King - ARM Linux
2011-02-15 23:03 ` Russell King - ARM Linux
2011-02-15 23:49 ` David Brown
2011-02-15 23:49   ` David Brown
2011-02-16 18:41   ` Linus Walleij
2011-02-16 18:41     ` Linus Walleij
2011-02-16 19:28     ` David Brown
2011-02-16 19:28       ` David Brown
2011-02-16 21:01       ` Linus Walleij
2011-02-16 21:01         ` Linus Walleij
2011-02-16 23:06         ` Brian Swetland
2011-02-16 23:06           ` Brian Swetland
2011-02-18 13:03           ` Linus Walleij
2011-02-18 13:03             ` Linus Walleij
2011-02-18 16:04             ` Brian Swetland
2011-02-18 16:04               ` Brian Swetland
2011-02-21 20:49               ` Linus Walleij
2011-02-21 20:49                 ` Linus Walleij
     [not found]                 ` <AANLkTimpJUoc64py_jCrvsrXscawsR2c7JBtr1e0U+e8@mail.gmail.com>
2011-03-01 18:39                   ` Fwd: " Murali Krishna Palnati
2011-03-01 18:39                     ` Murali Krishna Palnati
2011-03-01 20:22                     ` Russell King - ARM Linux
2011-03-01 20:22                       ` Russell King - ARM Linux
2011-03-02 10:12                     ` Linus Walleij
2011-03-02 10:12                       ` Linus Walleij
2011-02-19 15:00   ` Russell King - ARM Linux
2011-02-19 15:00     ` Russell King - ARM Linux
2011-03-01 13:28     ` Sahitya Tummala
2011-03-01 13:28       ` Sahitya Tummala
2011-02-16 19:01 ` Pawel Moll
2011-02-16 19:01   ` Pawel Moll
2011-02-17 10:40   ` Russell King - ARM Linux
2011-02-17 10:40     ` Russell King - ARM Linux

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.