LKML Archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 1/4] devcoredump: Add dev_coredump_put()
@ 2024-03-04 14:39 José Roberto de Souza
  2024-03-04 14:39 ` [PATCH v3 2/4] devcoredump: Add dev_coredumpm_timeout() José Roberto de Souza
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: José Roberto de Souza @ 2024-03-04 14:39 UTC (permalink / raw
  To: linux-kernel, intel-xe
  Cc: Rodrigo Vivi, Mukesh Ojha, Johannes Berg, Jonathan Cavitt,
	José Roberto de Souza

It is useful for modules that do not want to keep coredump available
after its unload.
Otherwise, the coredump would only be removed after DEVCD_TIMEOUT
seconds.

v2:
- dev_coredump_put() documentation updated (Mukesh)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Mukesh Ojha <quic_mojha@quicinc.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Acked-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
---
 drivers/base/devcoredump.c  | 23 +++++++++++++++++++++++
 include/linux/devcoredump.h |  5 +++++
 2 files changed, 28 insertions(+)

diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
index 7e2d1f0d903a6..82aeb09b3d1b5 100644
--- a/drivers/base/devcoredump.c
+++ b/drivers/base/devcoredump.c
@@ -304,6 +304,29 @@ static ssize_t devcd_read_from_sgtable(char *buffer, loff_t offset,
 				  offset);
 }
 
+/**
+ * dev_coredump_put - remove device coredump
+ * @dev: the struct device for the crashed device
+ *
+ * dev_coredump_put() removes coredump, if exists, for a given device from
+ * the file system and free its associated data otherwise, does nothing.
+ *
+ * It is useful for modules that do not want to keep coredump
+ * available after its unload.
+ */
+void dev_coredump_put(struct device *dev)
+{
+	struct device *existing;
+
+	existing = class_find_device(&devcd_class, NULL, dev,
+				     devcd_match_failing);
+	if (existing) {
+		devcd_free(existing, NULL);
+		put_device(existing);
+	}
+}
+EXPORT_SYMBOL_GPL(dev_coredump_put);
+
 /**
  * dev_coredumpm - create device coredump with read/free methods
  * @dev: the struct device for the crashed device
diff --git a/include/linux/devcoredump.h b/include/linux/devcoredump.h
index c008169ed2c6f..c8f7eb6cc1915 100644
--- a/include/linux/devcoredump.h
+++ b/include/linux/devcoredump.h
@@ -63,6 +63,8 @@ void dev_coredumpm(struct device *dev, struct module *owner,
 
 void dev_coredumpsg(struct device *dev, struct scatterlist *table,
 		    size_t datalen, gfp_t gfp);
+
+void dev_coredump_put(struct device *dev);
 #else
 static inline void dev_coredumpv(struct device *dev, void *data,
 				 size_t datalen, gfp_t gfp)
@@ -85,6 +87,9 @@ static inline void dev_coredumpsg(struct device *dev, struct scatterlist *table,
 {
 	_devcd_free_sgtable(table);
 }
+static inline void dev_coredump_put(struct device *dev)
+{
+}
 #endif /* CONFIG_DEV_COREDUMP */
 
 #endif /* __DEVCOREDUMP_H */
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 2/4] devcoredump: Add dev_coredumpm_timeout()
  2024-03-04 14:39 [PATCH v3 1/4] devcoredump: Add dev_coredump_put() José Roberto de Souza
@ 2024-03-04 14:39 ` José Roberto de Souza
  2024-03-04 23:56   ` Lucas De Marchi
  2024-03-25 17:00   ` Johannes Berg
  2024-03-04 14:39 ` [PATCH v3 3/4] drm/xe: Remove devcoredump during driver release José Roberto de Souza
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: José Roberto de Souza @ 2024-03-04 14:39 UTC (permalink / raw
  To: linux-kernel, intel-xe
  Cc: Rodrigo Vivi, Mukesh Ojha, Johannes Berg, Jonathan Cavitt,
	José Roberto de Souza

Add function to set a custom coredump timeout.

Current 5-minute timeout may be too short for users to search and
understand what needs to be done to capture coredump to report bugs.

v2:
- replace dev_coredump_timeout_set() by dev_coredumpm_timeout() (Mukesh)

v3:
- make dev_coredumpm() static inline (Johannes)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Mukesh Ojha <quic_mojha@quicinc.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Acked-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
---
 drivers/base/devcoredump.c  | 23 ++++++++--------
 include/linux/devcoredump.h | 54 ++++++++++++++++++++++++++++---------
 2 files changed, 54 insertions(+), 23 deletions(-)

diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
index 82aeb09b3d1b5..c795edad1b969 100644
--- a/drivers/base/devcoredump.c
+++ b/drivers/base/devcoredump.c
@@ -18,9 +18,6 @@ static struct class devcd_class;
 /* global disable flag, for security purposes */
 static bool devcd_disabled;
 
-/* if data isn't read by userspace after 5 minutes then delete it */
-#define DEVCD_TIMEOUT	(HZ * 60 * 5)
-
 struct devcd_entry {
 	struct device devcd_dev;
 	void *data;
@@ -328,7 +325,8 @@ void dev_coredump_put(struct device *dev)
 EXPORT_SYMBOL_GPL(dev_coredump_put);
 
 /**
- * dev_coredumpm - create device coredump with read/free methods
+ * dev_coredumpm_timeout - create device coredump with read/free methods with a
+ * custom timeout.
  * @dev: the struct device for the crashed device
  * @owner: the module that contains the read/free functions, use %THIS_MODULE
  * @data: data cookie for the @read/@free functions
@@ -336,17 +334,20 @@ EXPORT_SYMBOL_GPL(dev_coredump_put);
  * @gfp: allocation flags
  * @read: function to read from the given buffer
  * @free: function to free the given buffer
+ * @timeout: time in jiffies to remove coredump
  *
  * Creates a new device coredump for the given device. If a previous one hasn't
  * been read yet, the new coredump is discarded. The data lifetime is determined
  * by the device coredump framework and when it is no longer needed the @free
  * function will be called to free the data.
  */
-void dev_coredumpm(struct device *dev, struct module *owner,
-		   void *data, size_t datalen, gfp_t gfp,
-		   ssize_t (*read)(char *buffer, loff_t offset, size_t count,
-				   void *data, size_t datalen),
-		   void (*free)(void *data))
+void dev_coredumpm_timeout(struct device *dev, struct module *owner,
+			   void *data, size_t datalen, gfp_t gfp,
+			   ssize_t (*read)(char *buffer, loff_t offset,
+					   size_t count, void *data,
+					   size_t datalen),
+			   void (*free)(void *data),
+			   unsigned long timeout)
 {
 	static atomic_t devcd_count = ATOMIC_INIT(0);
 	struct devcd_entry *devcd;
@@ -403,7 +404,7 @@ void dev_coredumpm(struct device *dev, struct module *owner,
 	dev_set_uevent_suppress(&devcd->devcd_dev, false);
 	kobject_uevent(&devcd->devcd_dev.kobj, KOBJ_ADD);
 	INIT_DELAYED_WORK(&devcd->del_wk, devcd_del);
-	schedule_delayed_work(&devcd->del_wk, DEVCD_TIMEOUT);
+	schedule_delayed_work(&devcd->del_wk, timeout);
 	mutex_unlock(&devcd->mutex);
 	return;
  put_device:
@@ -414,7 +415,7 @@ void dev_coredumpm(struct device *dev, struct module *owner,
  free:
 	free(data);
 }
-EXPORT_SYMBOL_GPL(dev_coredumpm);
+EXPORT_SYMBOL_GPL(dev_coredumpm_timeout);
 
 /**
  * dev_coredumpsg - create device coredump that uses scatterlist as data
diff --git a/include/linux/devcoredump.h b/include/linux/devcoredump.h
index c8f7eb6cc1915..56e606eb4640b 100644
--- a/include/linux/devcoredump.h
+++ b/include/linux/devcoredump.h
@@ -12,6 +12,9 @@
 #include <linux/scatterlist.h>
 #include <linux/slab.h>
 
+/* if data isn't read by userspace after 5 minutes then delete it */
+#define DEVCOREDUMP_TIMEOUT	(HZ * 60 * 5)
+
 /*
  * _devcd_free_sgtable - free all the memory of the given scatterlist table
  * (i.e. both pages and scatterlist instances)
@@ -50,16 +53,17 @@ static inline void _devcd_free_sgtable(struct scatterlist *table)
 	kfree(delete_iter);
 }
 
-
 #ifdef CONFIG_DEV_COREDUMP
 void dev_coredumpv(struct device *dev, void *data, size_t datalen,
 		   gfp_t gfp);
 
-void dev_coredumpm(struct device *dev, struct module *owner,
-		   void *data, size_t datalen, gfp_t gfp,
-		   ssize_t (*read)(char *buffer, loff_t offset, size_t count,
-				   void *data, size_t datalen),
-		   void (*free)(void *data));
+void dev_coredumpm_timeout(struct device *dev, struct module *owner,
+			   void *data, size_t datalen, gfp_t gfp,
+			   ssize_t (*read)(char *buffer, loff_t offset,
+					   size_t count, void *data,
+					   size_t datalen),
+			   void (*free)(void *data),
+			   unsigned long timeout);
 
 void dev_coredumpsg(struct device *dev, struct scatterlist *table,
 		    size_t datalen, gfp_t gfp);
@@ -72,12 +76,13 @@ static inline void dev_coredumpv(struct device *dev, void *data,
 	vfree(data);
 }
 
-static inline void
-dev_coredumpm(struct device *dev, struct module *owner,
-	      void *data, size_t datalen, gfp_t gfp,
-	      ssize_t (*read)(char *buffer, loff_t offset, size_t count,
-			      void *data, size_t datalen),
-	      void (*free)(void *data))
+void dev_coredumpm_timeout(struct device *dev, struct module *owner,
+			   void *data, size_t datalen, gfp_t gfp,
+			   ssize_t (*read)(char *buffer, loff_t offset,
+					   size_t count, void *data,
+					   size_t datalen),
+			   void (*free)(void *data),
+			   unsigned long timeout)
 {
 	free(data);
 }
@@ -92,4 +97,29 @@ static inline void dev_coredump_put(struct device *dev)
 }
 #endif /* CONFIG_DEV_COREDUMP */
 
+/**
+ * dev_coredumpm - create device coredump with read/free methods
+ * @dev: the struct device for the crashed device
+ * @owner: the module that contains the read/free functions, use %THIS_MODULE
+ * @data: data cookie for the @read/@free functions
+ * @datalen: length of the data
+ * @gfp: allocation flags
+ * @read: function to read from the given buffer
+ * @free: function to free the given buffer
+ *
+ * Creates a new device coredump for the given device. If a previous one hasn't
+ * been read yet, the new coredump is discarded. The data lifetime is determined
+ * by the device coredump framework and when it is no longer needed the @free
+ * function will be called to free the data.
+ */
+static inline void dev_coredumpm(struct device *dev, struct module *owner,
+				 void *data, size_t datalen, gfp_t gfp,
+				 ssize_t (*read)(char *buffer, loff_t offset, size_t count,
+						 void *data, size_t datalen),
+				void (*free)(void *data))
+{
+	dev_coredumpm_timeout(dev, owner, data, datalen, gfp, read, free,
+			      DEVCOREDUMP_TIMEOUT);
+}
+
 #endif /* __DEVCOREDUMP_H */
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 3/4] drm/xe: Remove devcoredump during driver release
  2024-03-04 14:39 [PATCH v3 1/4] devcoredump: Add dev_coredump_put() José Roberto de Souza
  2024-03-04 14:39 ` [PATCH v3 2/4] devcoredump: Add dev_coredumpm_timeout() José Roberto de Souza
@ 2024-03-04 14:39 ` José Roberto de Souza
  2024-03-04 14:39 ` [PATCH v3 4/4] drm/xe: Increase devcoredump timeout José Roberto de Souza
  2024-03-25 16:48 ` [PATCH v3 1/4] devcoredump: Add dev_coredump_put() Johannes Berg
  3 siblings, 0 replies; 11+ messages in thread
From: José Roberto de Souza @ 2024-03-04 14:39 UTC (permalink / raw
  To: linux-kernel, intel-xe
  Cc: Rodrigo Vivi, Jonathan Cavitt, José Roberto de Souza

This will remove devcoredump from file system and free its resources
during driver unload.

This fix the driver unload after gpu hang happened, otherwise this
it would report that Xe KMD is still in use and it would leave the
kernel in a state that Xe KMD can't be unload without a reboot.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Acked-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
---
 drivers/gpu/drm/xe/xe_devcoredump.c | 13 ++++++++++++-
 drivers/gpu/drm/xe/xe_devcoredump.h |  5 +++++
 drivers/gpu/drm/xe/xe_device.c      |  4 ++++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
index 0fcd306803236..109c027188bd9 100644
--- a/drivers/gpu/drm/xe/xe_devcoredump.c
+++ b/drivers/gpu/drm/xe/xe_devcoredump.c
@@ -9,6 +9,8 @@
 #include <linux/devcoredump.h>
 #include <generated/utsrelease.h>
 
+#include <drm/drm_managed.h>
+
 #include "xe_device.h"
 #include "xe_exec_queue.h"
 #include "xe_force_wake.h"
@@ -230,5 +232,14 @@ void xe_devcoredump(struct xe_sched_job *job)
 	dev_coredumpm(xe->drm.dev, THIS_MODULE, coredump, 0, GFP_KERNEL,
 		      xe_devcoredump_read, xe_devcoredump_free);
 }
-#endif
 
+static void xe_driver_devcoredump_fini(struct drm_device *drm, void *arg)
+{
+	dev_coredump_put(drm->dev);
+}
+
+int xe_devcoredump_init(struct xe_device *xe)
+{
+	return drmm_add_action_or_reset(&xe->drm, xe_driver_devcoredump_fini, xe);
+}
+#endif
diff --git a/drivers/gpu/drm/xe/xe_devcoredump.h b/drivers/gpu/drm/xe/xe_devcoredump.h
index df8671f0b5eb2..9eba67f37234f 100644
--- a/drivers/gpu/drm/xe/xe_devcoredump.h
+++ b/drivers/gpu/drm/xe/xe_devcoredump.h
@@ -11,10 +11,15 @@ struct xe_sched_job;
 
 #ifdef CONFIG_DEV_COREDUMP
 void xe_devcoredump(struct xe_sched_job *job);
+int xe_devcoredump_init(struct xe_device *xe);
 #else
 static inline void xe_devcoredump(struct xe_sched_job *job)
 {
 }
+static inline int xe_devcoredump_init(struct xe_device *xe)
+{
+	return 0;
+}
 #endif
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 919ad88f0495a..22be76537c7da 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -20,6 +20,7 @@
 #include "regs/xe_regs.h"
 #include "xe_bo.h"
 #include "xe_debugfs.h"
+#include "xe_devcoredump.h"
 #include "xe_dma_buf.h"
 #include "xe_drm_client.h"
 #include "xe_drv.h"
@@ -502,6 +503,9 @@ int xe_device_probe(struct xe_device *xe)
 			return err;
 	}
 
+	err = xe_devcoredump_init(xe);
+	if (err)
+		return err;
 	err = drmm_add_action_or_reset(&xe->drm, xe_driver_flr_fini, xe);
 	if (err)
 		return err;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 4/4] drm/xe: Increase devcoredump timeout
  2024-03-04 14:39 [PATCH v3 1/4] devcoredump: Add dev_coredump_put() José Roberto de Souza
  2024-03-04 14:39 ` [PATCH v3 2/4] devcoredump: Add dev_coredumpm_timeout() José Roberto de Souza
  2024-03-04 14:39 ` [PATCH v3 3/4] drm/xe: Remove devcoredump during driver release José Roberto de Souza
@ 2024-03-04 14:39 ` José Roberto de Souza
  2024-03-25 16:48 ` [PATCH v3 1/4] devcoredump: Add dev_coredump_put() Johannes Berg
  3 siblings, 0 replies; 11+ messages in thread
From: José Roberto de Souza @ 2024-03-04 14:39 UTC (permalink / raw
  To: linux-kernel, intel-xe
  Cc: Rodrigo Vivi, Jonathan Cavitt, José Roberto de Souza

5 minutes is too short for a regular user to search and understand
what he needs to do to report capture devcoredump and report a bug to
us, so here increasing this timeout to 1 hour.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Acked-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
---
 drivers/gpu/drm/xe/xe_devcoredump.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
index 109c027188bd9..08c60531e0db4 100644
--- a/drivers/gpu/drm/xe/xe_devcoredump.c
+++ b/drivers/gpu/drm/xe/xe_devcoredump.c
@@ -52,6 +52,9 @@
 
 #ifdef CONFIG_DEV_COREDUMP
 
+/* 1 hour timeout */
+#define XE_COREDUMP_TIMEOUT_JIFFIES (60 * 60 * HZ)
+
 static struct xe_device *coredump_to_xe(const struct xe_devcoredump *coredump)
 {
 	return container_of(coredump, struct xe_device, devcoredump);
@@ -229,8 +232,9 @@ void xe_devcoredump(struct xe_sched_job *job)
 	drm_info(&xe->drm, "Check your /sys/class/drm/card%d/device/devcoredump/data\n",
 		 xe->drm.primary->index);
 
-	dev_coredumpm(xe->drm.dev, THIS_MODULE, coredump, 0, GFP_KERNEL,
-		      xe_devcoredump_read, xe_devcoredump_free);
+	dev_coredumpm_timeout(xe->drm.dev, THIS_MODULE, coredump, 0, GFP_KERNEL,
+			      xe_devcoredump_read, xe_devcoredump_free,
+			      XE_COREDUMP_TIMEOUT_JIFFIES);
 }
 
 static void xe_driver_devcoredump_fini(struct drm_device *drm, void *arg)
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/4] devcoredump: Add dev_coredumpm_timeout()
  2024-03-04 14:39 ` [PATCH v3 2/4] devcoredump: Add dev_coredumpm_timeout() José Roberto de Souza
@ 2024-03-04 23:56   ` Lucas De Marchi
  2024-03-11 20:56     ` Souza, Jose
  2024-03-25 17:00   ` Johannes Berg
  1 sibling, 1 reply; 11+ messages in thread
From: Lucas De Marchi @ 2024-03-04 23:56 UTC (permalink / raw
  To: José Roberto de Souza
  Cc: linux-kernel, intel-xe, Rodrigo Vivi, Mukesh Ojha, Johannes Berg,
	Jonathan Cavitt

On Mon, Mar 04, 2024 at 06:39:03AM -0800, Jose Souza wrote:
>Add function to set a custom coredump timeout.
>
>Current 5-minute timeout may be too short for users to search and
>understand what needs to be done to capture coredump to report bugs.
>
>v2:
>- replace dev_coredump_timeout_set() by dev_coredumpm_timeout() (Mukesh)
>
>v3:
>- make dev_coredumpm() static inline (Johannes)


but let's wait the discussion settle on v2.

Lucas De Marchi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/4] devcoredump: Add dev_coredumpm_timeout()
  2024-03-04 23:56   ` Lucas De Marchi
@ 2024-03-11 20:56     ` Souza, Jose
  0 siblings, 0 replies; 11+ messages in thread
From: Souza, Jose @ 2024-03-11 20:56 UTC (permalink / raw
  To: quic_mojha@quicinc.com, johannes@sipsolutions.net,
	De Marchi, Lucas
  Cc: intel-xe@lists.freedesktop.org, Vivi, Rodrigo,
	linux-kernel@vger.kernel.org, Cavitt, Jonathan

Hi Johannes and Mukesh

On Mon, 2024-03-04 at 17:56 -0600, Lucas De Marchi wrote:
> On Mon, Mar 04, 2024 at 06:39:03AM -0800, Jose Souza wrote:
> > Add function to set a custom coredump timeout.
> > 
> > Current 5-minute timeout may be too short for users to search and
> > understand what needs to be done to capture coredump to report bugs.
> > 
> > v2:
> > - replace dev_coredump_timeout_set() by dev_coredumpm_timeout() (Mukesh)
> > 
> > v3:
> > - make dev_coredumpm() static inline (Johannes)
> 
> 
> but let's wait the discussion settle on v2.

Lucas is fine with this(see v2 thread).
Can you please take another look at this series?

thank you


> 
> Lucas De Marchi


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/4] devcoredump: Add dev_coredump_put()
  2024-03-04 14:39 [PATCH v3 1/4] devcoredump: Add dev_coredump_put() José Roberto de Souza
                   ` (2 preceding siblings ...)
  2024-03-04 14:39 ` [PATCH v3 4/4] drm/xe: Increase devcoredump timeout José Roberto de Souza
@ 2024-03-25 16:48 ` Johannes Berg
  2024-03-25 18:36   ` Souza, Jose
  3 siblings, 1 reply; 11+ messages in thread
From: Johannes Berg @ 2024-03-25 16:48 UTC (permalink / raw
  To: José Roberto de Souza, linux-kernel, intel-xe
  Cc: Rodrigo Vivi, Mukesh Ojha, Jonathan Cavitt

On Mon, 2024-03-04 at 06:39 -0800, José Roberto de Souza wrote:
> It is useful for modules that do not want to keep coredump available
> after its unload.

Why not though? Maybe if this is a common case we should have devm_...
coredump functions? Dunno.

Anyway, I'm fine with this, even though I'd like to see a bit more
discussion than "do not want". Code looks good.

Reviewed-by: Johannes Berg <johannes@sipsolutions.net>

johannes

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/4] devcoredump: Add dev_coredumpm_timeout()
  2024-03-04 14:39 ` [PATCH v3 2/4] devcoredump: Add dev_coredumpm_timeout() José Roberto de Souza
  2024-03-04 23:56   ` Lucas De Marchi
@ 2024-03-25 17:00   ` Johannes Berg
  2024-03-25 18:28     ` Souza, Jose
  1 sibling, 1 reply; 11+ messages in thread
From: Johannes Berg @ 2024-03-25 17:00 UTC (permalink / raw
  To: José Roberto de Souza, linux-kernel, intel-xe
  Cc: Rodrigo Vivi, Mukesh Ojha, Jonathan Cavitt, Lucas De Marchi

On Mon, 2024-03-04 at 06:39 -0800, José Roberto de Souza wrote:
> Add function to set a custom coredump timeout.
> 
> Current 5-minute timeout may be too short for users to search and
> understand what needs to be done to capture coredump to report bugs.

> + */
> +static inline void dev_coredumpm(struct device *dev, struct module *owner,
> +				 void *data, size_t datalen, gfp_t gfp,
> +				 ssize_t (*read)(char *buffer, loff_t offset, size_t count,
> +						 void *data, size_t datalen),
> +				void (*free)(void *data))

nit: looks like you missed a space on the 'free' line

I don't think we'll actually _solve_ the discussion of whether or not
this makes sense.

I still think it's a bad idea to hang on to the dumps in core kernel
memory since they can be big (I would've said ath12k is big with 55MB,
but Rodrigo said graphics could be up to 2GB!), without being able to
page it out, etc. That's just a waste of memory, for what I don't think
is even a good reason.

So dunno.

However, I also don't like to exercise any power that I might randomly
hold just because I happened to write the code in the first place... And
if you want to shoot yourselves in the foot with any of this, should I
really disagree? I feel I've voiced my objections enough, and Lucas has
also tried to find ways of making a userspace implementation work for
you.

I'd perhaps argue that the documentation for the functions should be
more opinionated and actually recommend against using a large timeout
(like you want) for all these reasons, but other than that, the code
looks fine to me.

johannes

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/4] devcoredump: Add dev_coredumpm_timeout()
  2024-03-25 17:00   ` Johannes Berg
@ 2024-03-25 18:28     ` Souza, Jose
  0 siblings, 0 replies; 11+ messages in thread
From: Souza, Jose @ 2024-03-25 18:28 UTC (permalink / raw
  To: intel-xe@lists.freedesktop.org, johannes@sipsolutions.net,
	linux-kernel@vger.kernel.org
  Cc: Vivi, Rodrigo, quic_mojha@quicinc.com, Cavitt, Jonathan,
	De Marchi, Lucas

On Mon, 2024-03-25 at 18:00 +0100, Johannes Berg wrote:
> On Mon, 2024-03-04 at 06:39 -0800, José Roberto de Souza wrote:
> > Add function to set a custom coredump timeout.
> > 
> > Current 5-minute timeout may be too short for users to search and
> > understand what needs to be done to capture coredump to report bugs.
> 
> > + */
> > +static inline void dev_coredumpm(struct device *dev, struct module *owner,
> > +				 void *data, size_t datalen, gfp_t gfp,
> > +				 ssize_t (*read)(char *buffer, loff_t offset, size_t count,
> > +						 void *data, size_t datalen),
> > +				void (*free)(void *data))
> 
> nit: looks like you missed a space on the 'free' line
> 
> I don't think we'll actually _solve_ the discussion of whether or not
> this makes sense.
> 
> I still think it's a bad idea to hang on to the dumps in core kernel
> memory since they can be big (I would've said ath12k is big with 55MB,
> but Rodrigo said graphics could be up to 2GB!), without being able to
> page it out, etc. That's just a waste of memory, for what I don't think
> is even a good reason.
> 

There is a discussion to have a udev script to copy dump from memory to disk and then the script can write 0 to coredump/data and erase it from
memory. But if distro don't have this udev script it is still good to have larger timeout to allow user to capture it.

The 2GB usage are for cases when UMD developers enables the capture of every single buffer, regular capture size depends on the complexity of the
application but it is generally round 2MB.

> So dunno.
> 
> However, I also don't like to exercise any power that I might randomly
> hold just because I happened to write the code in the first place... And
> if you want to shoot yourselves in the foot with any of this, should I
> really disagree? I feel I've voiced my objections enough, and Lucas has
> also tried to find ways of making a userspace implementation work for
> you.
> 
> I'd perhaps argue that the documentation for the functions should be
> more opinionated and actually recommend against using a large timeout
> (like you want) for all these reasons, but other than that, the code
> looks fine to me.

@timeout: time in jiffies to remove coredump. DEVCOREDUMP_TIMEOUT is the value for dev_coredumpm() and it should be used unless it is absolutely
necessary a larger timeout.

Or do you have a better suggestion?

> 
> johannes


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/4] devcoredump: Add dev_coredump_put()
  2024-03-25 16:48 ` [PATCH v3 1/4] devcoredump: Add dev_coredump_put() Johannes Berg
@ 2024-03-25 18:36   ` Souza, Jose
  2024-04-05 15:47     ` Souza, Jose
  0 siblings, 1 reply; 11+ messages in thread
From: Souza, Jose @ 2024-03-25 18:36 UTC (permalink / raw
  To: intel-xe@lists.freedesktop.org, johannes@sipsolutions.net,
	linux-kernel@vger.kernel.org
  Cc: Vivi, Rodrigo, quic_mojha@quicinc.com, Cavitt, Jonathan

On Mon, 2024-03-25 at 17:48 +0100, Johannes Berg wrote:
> On Mon, 2024-03-04 at 06:39 -0800, José Roberto de Souza wrote:
> > It is useful for modules that do not want to keep coredump available
> > after its unload.
> 
> Why not though? Maybe if this is a common case we should have devm_...
> coredump functions? Dunno.
> 
> Anyway, I'm fine with this, even though I'd like to see a bit more
> discussion than "do not want". Code looks good.
> 
> Reviewed-by: Johannes Berg <johannes@sipsolutions.net>

Thank you, can you please add this patch to your next pull request? Or should us add it to the next drm/intel-drm pull request?

Again thank you, just this patch will already unblock some work for us.

> 
> johannes


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/4] devcoredump: Add dev_coredump_put()
  2024-03-25 18:36   ` Souza, Jose
@ 2024-04-05 15:47     ` Souza, Jose
  0 siblings, 0 replies; 11+ messages in thread
From: Souza, Jose @ 2024-04-05 15:47 UTC (permalink / raw
  To: intel-xe@lists.freedesktop.org, johannes@sipsolutions.net,
	linux-kernel@vger.kernel.org
  Cc: Vivi, Rodrigo, quic_mojha@quicinc.com, Cavitt, Jonathan

On Mon, 2024-03-25 at 11:36 -0700, José Roberto de Souza wrote:
> On Mon, 2024-03-25 at 17:48 +0100, Johannes Berg wrote:
> > On Mon, 2024-03-04 at 06:39 -0800, José Roberto de Souza wrote:
> > > It is useful for modules that do not want to keep coredump available
> > > after its unload.
> > 
> > Why not though? Maybe if this is a common case we should have devm_...
> > coredump functions? Dunno.
> > 
> > Anyway, I'm fine with this, even though I'd like to see a bit more
> > discussion than "do not want". Code looks good.
> > 
> > Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
> 
> Thank you, can you please add this patch to your next pull request? Or should us add it to the next drm/intel-drm pull request?

ping

> 
> Again thank you, just this patch will already unblock some work for us.
> 
> > 
> > johannes
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-04-05 15:48 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-04 14:39 [PATCH v3 1/4] devcoredump: Add dev_coredump_put() José Roberto de Souza
2024-03-04 14:39 ` [PATCH v3 2/4] devcoredump: Add dev_coredumpm_timeout() José Roberto de Souza
2024-03-04 23:56   ` Lucas De Marchi
2024-03-11 20:56     ` Souza, Jose
2024-03-25 17:00   ` Johannes Berg
2024-03-25 18:28     ` Souza, Jose
2024-03-04 14:39 ` [PATCH v3 3/4] drm/xe: Remove devcoredump during driver release José Roberto de Souza
2024-03-04 14:39 ` [PATCH v3 4/4] drm/xe: Increase devcoredump timeout José Roberto de Souza
2024-03-25 16:48 ` [PATCH v3 1/4] devcoredump: Add dev_coredump_put() Johannes Berg
2024-03-25 18:36   ` Souza, Jose
2024-04-05 15:47     ` Souza, Jose

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).