* [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
@ 2024-01-22 23:06 ` Steven Rostedt
0 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2024-01-22 23:06 UTC (permalink / raw)
To: LKML
Cc: Linus Torvalds, Rajneesh Bhardwaj, Felix Kuehling,
Christian König, dri-devel
I just kicked off testing some patches on top of 6.8-rc1 and triggered this
immediately:
[ note this happened on both my 32 bit and 64 bit test machines, this is
just the 32 bit output ]
BUG: kernel NULL pointer dereference, address: 00000238
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
*pdpt = 0000000000000000 *pde = f000ff53f000ff53
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.8.0-rc1-test-00001-g2b44760609e9-dirty #1056
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Workqueue: events work_for_cpu_fn
EIP: ttm_device_init+0xb4/0x274
Code: 86 10 09 00 00 83 c4 0c 85 c0 0f 84 96 01 00 00 8b 45 ac 8d 9e 94 00 00 00 89 46 08 89 f0 e8 27 05 00 00 8b 55 a8 0f b6 45 98 <8b> 8a 38 02 00 00 50 0f b6 45 9c 50 89 d8 e8 95 ee ff ff 8b 45 a0
EAX: 00000000 EBX: c135a7e4 ECX: c135a7b0 EDX: 00000000
ESI: c135a750 EDI: 0007bc1d EBP: c11d7e4c ESP: c11d7de4
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010246
CR0: 80050033 CR2: 00000238 CR3: 145c4000 CR4: 000006f0
Call Trace:
? show_regs+0x4f/0x58
? __die+0x1d/0x58
? page_fault_oops+0x171/0x330
? lock_acquire+0xa4/0x280
? kernelmode_fixup_or_oops.constprop.0+0x7c/0xcc
? __bad_area_nosemaphore.constprop.0+0x124/0x1b4
? __mutex_lock+0x17f/0xb00
? bad_area_nosemaphore+0xf/0x14
? do_user_addr_fault+0x140/0x3e4
? exc_page_fault+0x5b/0x1d8
? pvclock_clocksource_read_nowd+0x130/0x130
? handle_exception+0x133/0x133
? pvclock_clocksource_read_nowd+0x130/0x130
? ttm_device_init+0xb4/0x274
? pvclock_clocksource_read_nowd+0x130/0x130
? ttm_device_init+0xb4/0x274
qxl_ttm_init+0x34/0x130
qxl_bo_init+0xd/0x10
qxl_device_init+0x52a/0x92c
qxl_pci_probe+0x91/0x1ac
local_pci_probe+0x3d/0x84
work_for_cpu_fn+0x16/0x20
process_one_work+0x1bc/0x4a0
worker_thread+0x310/0x3a8
kthread+0xea/0x110
? rescuer_thread+0x2f0/0x2f0
? kthread_complete_and_exit+0x1c/0x1c
ret_from_fork+0x34/0x4c
? kthread_complete_and_exit+0x1c/0x1c
ret_from_fork_asm+0x12/0x18
entry_INT80_32+0xf0/0xf0
Modules linked in:
CR2: 0000000000000238
---[ end trace 0000000000000000 ]---
The crash happened here:
int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *funcs,
		    struct device *dev, struct address_space *mapping,
		    struct drm_vma_offset_manager *vma_manager,
		    bool use_dma_alloc, bool use_dma32)
{
	struct ttm_global *glob = &ttm_glob;
	int ret;

	if (WARN_ON(vma_manager == NULL))
		return -EINVAL;

	ret = ttm_global_init();
	if (ret)
		return ret;

	bdev->wq = alloc_workqueue("ttm",
				   WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 16);
	if (!bdev->wq) {
		ttm_global_release();
		return -ENOMEM;
	}

	bdev->funcs = funcs;

	ttm_sys_man_init(bdev);

	ttm_pool_init(&bdev->pool, dev, dev_to_node(dev), use_dma_alloc, use_dma32); <<<------- BUG!
Specifically, it appears that dev is NULL and dev_to_node() doesn't like
having a NULL pointer passed to it.
I currently "fixed" this with a:
	if (!dev)
		return -EINVAL;
at the start of this function just so that I can continue running my tests,
but that is obviously incorrect.
-- Steve
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
2024-01-22 23:06 ` Steven Rostedt
@ 2024-01-22 23:15 ` Steven Rostedt
-1 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2024-01-22 23:15 UTC (permalink / raw)
To: LKML
Cc: Linus Torvalds, Rajneesh Bhardwaj, Felix Kuehling,
Christian König, dri-devel
On Mon, 22 Jan 2024 18:06:05 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> qxl_ttm_init+0x34/0x130
>
> int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *funcs,
> struct device *dev, struct address_space *mapping,
> struct drm_vma_offset_manager *vma_manager,
> bool use_dma_alloc, bool use_dma32)
> {
> struct ttm_global *glob = &ttm_glob;
> int ret;
>
> if (WARN_ON(vma_manager == NULL))
> return -EINVAL;
>
> ret = ttm_global_init();
> if (ret)
> return ret;
>
> bdev->wq = alloc_workqueue("ttm",
> WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 16);
> if (!bdev->wq) {
> ttm_global_release();
> return -ENOMEM;
> }
>
> bdev->funcs = funcs;
>
> ttm_sys_man_init(bdev);
>
> ttm_pool_init(&bdev->pool, dev, dev_to_node(dev), use_dma_alloc, use_dma32); <<<------- BUG!
>
> Specifically, it appears that dev is NULL and dev_to_node() doesn't like
> having a NULL pointer passed to it.
>
Yeah, that qxl_ttm_init() has:
	/* No others user of address space so set it to 0 */
	r = ttm_device_init(&qdev->mman.bdev, &qxl_bo_driver, NULL,
			    qdev->ddev.anon_inode->i_mapping,
			    qdev->ddev.vma_offset_manager,
			    false, false);
Where that NULL is "dev"!
Thus that will never work here.
-- Steve
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
2024-01-22 23:15 ` Steven Rostedt
@ 2024-01-22 23:19 ` Steven Rostedt
-1 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2024-01-22 23:19 UTC (permalink / raw)
To: LKML
Cc: Linus Torvalds, Rajneesh Bhardwaj, Felix Kuehling,
Christian König, dri-devel
On Mon, 22 Jan 2024 18:15:47 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> > ttm_pool_init(&bdev->pool, dev, dev_to_node(dev), use_dma_alloc, use_dma32); <<<------- BUG!
> >
> > Specifically, it appears that dev is NULL and dev_to_node() doesn't like
> > having a NULL pointer passed to it.
> >
>
> Yeah, that qxl_ttm_init() has:
>
> /* No others user of address space so set it to 0 */
> r = ttm_device_init(&qdev->mman.bdev, &qxl_bo_driver, NULL,
> qdev->ddev.anon_inode->i_mapping,
> qdev->ddev.vma_offset_manager,
> false, false);
>
> Where that NULL is "dev"!
>
> Thus that will never work here.
Perhaps this is the real fix?
-- Steve
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index f5187b384ae9..bc217b4d6b04 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -215,7 +215,8 @@ int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *func
ttm_sys_man_init(bdev);
- ttm_pool_init(&bdev->pool, dev, dev_to_node(dev), use_dma_alloc, use_dma32);
+ ttm_pool_init(&bdev->pool, dev, dev ? dev_to_node(dev) : NUMA_NO_NODE,
+ use_dma_alloc, use_dma32);
bdev->vma_manager = vma_manager;
spin_lock_init(&bdev->lru_lock);
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
2024-01-22 23:06 ` Steven Rostedt
(?)
(?)
@ 2024-01-23 0:29 ` Bhardwaj, Rajneesh
2024-01-23 0:34 ` Steven Rostedt
-1 siblings, 1 reply; 30+ messages in thread
From: Bhardwaj, Rajneesh @ 2024-01-23 0:29 UTC (permalink / raw)
To: Steven Rostedt, LKML
Cc: Felix Kuehling, Linus Torvalds, Christian König, dri-devel
On 1/22/2024 6:06 PM, Steven Rostedt wrote:
> I just kicked off testing some patches on top of 6.8-rc1 and triggered this
> immediately:
>
> [ note this happened on both my 32 bit and 64 bit test machines, this is
> just the 32 bit output ]
>
> BUG: kernel NULL pointer dereference, address: 00000238
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> *pdpt = 0000000000000000 *pde = f000ff53f000ff53
> Oops: 0000 [#1] PREEMPT SMP PTI
> CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.8.0-rc1-test-00001-g2b44760609e9-dirty #1056
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> Workqueue: events work_for_cpu_fn
> EIP: ttm_device_init+0xb4/0x274
> Code: 86 10 09 00 00 83 c4 0c 85 c0 0f 84 96 01 00 00 8b 45 ac 8d 9e 94 00 00 00 89 46 08 89 f0 e8 27 05 00 00 8b 55 a8 0f b6 45 98 <8b> 8a 38 02 00 00 50 0f b6 45 9c 50 89 d8 e8 95 ee ff ff 8b 45 a0
> EAX: 00000000 EBX: c135a7e4 ECX: c135a7b0 EDX: 00000000
> ESI: c135a750 EDI: 0007bc1d EBP: c11d7e4c ESP: c11d7de4
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010246
> CR0: 80050033 CR2: 00000238 CR3: 145c4000 CR4: 000006f0
> Call Trace:
> ? show_regs+0x4f/0x58
> ? __die+0x1d/0x58
> ? page_fault_oops+0x171/0x330
> ? lock_acquire+0xa4/0x280
> ? kernelmode_fixup_or_oops.constprop.0+0x7c/0xcc
> ? __bad_area_nosemaphore.constprop.0+0x124/0x1b4
> ? __mutex_lock+0x17f/0xb00
> ? bad_area_nosemaphore+0xf/0x14
> ? do_user_addr_fault+0x140/0x3e4
> ? exc_page_fault+0x5b/0x1d8
> ? pvclock_clocksource_read_nowd+0x130/0x130
> ? handle_exception+0x133/0x133
> ? pvclock_clocksource_read_nowd+0x130/0x130
> ? ttm_device_init+0xb4/0x274
> ? pvclock_clocksource_read_nowd+0x130/0x130
> ? ttm_device_init+0xb4/0x274
> qxl_ttm_init+0x34/0x130
> qxl_bo_init+0xd/0x10
> qxl_device_init+0x52a/0x92c
> qxl_pci_probe+0x91/0x1ac
> local_pci_probe+0x3d/0x84
> work_for_cpu_fn+0x16/0x20
> process_one_work+0x1bc/0x4a0
> worker_thread+0x310/0x3a8
> kthread+0xea/0x110
> ? rescuer_thread+0x2f0/0x2f0
> ? kthread_complete_and_exit+0x1c/0x1c
> ret_from_fork+0x34/0x4c
> ? kthread_complete_and_exit+0x1c/0x1c
> ret_from_fork_asm+0x12/0x18
> entry_INT80_32+0xf0/0xf0
> Modules linked in:
> CR2: 0000000000000238
> ---[ end trace 0000000000000000 ]---
>
> The crash happened here:
>
> int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *funcs,
> struct device *dev, struct address_space *mapping,
> struct drm_vma_offset_manager *vma_manager,
> bool use_dma_alloc, bool use_dma32)
> {
> struct ttm_global *glob = &ttm_glob;
> int ret;
>
> if (WARN_ON(vma_manager == NULL))
> return -EINVAL;
>
> ret = ttm_global_init();
> if (ret)
> return ret;
>
> bdev->wq = alloc_workqueue("ttm",
> WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 16);
> if (!bdev->wq) {
> ttm_global_release();
> return -ENOMEM;
> }
>
> bdev->funcs = funcs;
>
> ttm_sys_man_init(bdev);
>
> ttm_pool_init(&bdev->pool, dev, dev_to_node(dev), use_dma_alloc, use_dma32); <<<------- BUG!
>
> Specifically, it appears that dev is NULL and dev_to_node() doesn't like
> having a NULL pointer passed to it.
>
> I currently "fixed" this with a:
>
> if (!dev)
> return -EINVAL;
>
> at the start of this function just so that I can continue running my tests,
> but that is obviously incorrect.
In one of my previous revisions of this patch when I was experimenting,
I used something like below. Wonder if that could work in your case
and/or in general.
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 43e27ab77f95..4c3902b94be4 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -195,6 +195,7 @@ int ttm_device_init(struct ttm_device *bdev, struct ttm_device_funcs *funcs,
 		    bool use_dma_alloc, bool use_dma32)
 {
 	struct ttm_global *glob = &ttm_glob;
+	bool node_has_cpu = false;
 	int ret;
 	if (WARN_ON(vma_manager == NULL))
@@ -213,7 +214,12 @@ int ttm_device_init(struct ttm_device *bdev, struct ttm_device_funcs *funcs,
 	bdev->funcs = funcs;
 	ttm_sys_man_init(bdev);
-	ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc, use_dma32);
+
+	node_has_cpu = node_state(dev->numa_node, N_CPU);
+	if (node_has_cpu)
+		ttm_pool_init(&bdev->pool, dev, dev->numa_node, use_dma_alloc, use_dma32);
+	else
+		ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc,
+			      use_dma32);
 	bdev->vma_manager = vma_manager;
 	spin_lock_init(&bdev->lru_lock);
>
> -- Steve
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
2024-01-23 0:29 ` Bhardwaj, Rajneesh
@ 2024-01-23 0:34 ` Steven Rostedt
0 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2024-01-23 0:34 UTC (permalink / raw)
To: Bhardwaj, Rajneesh
Cc: Felix Kuehling, Linus Torvalds, LKML, dri-devel,
Christian König
On Mon, 22 Jan 2024 19:29:41 -0500
"Bhardwaj, Rajneesh" <rajneesh.bhardwaj@amd.com> wrote:
>
> In one of my previous revisions of this patch when I was experimenting,
> I used something like below. Wonder if that could work in your case
> and/or in general.
>
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index 43e27ab77f95..4c3902b94be4 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -195,6 +195,7 @@ int ttm_device_init(struct ttm_device *bdev, struct ttm_device_funcs *funcs,
>  		    bool use_dma_alloc, bool use_dma32)
>  {
>  	struct ttm_global *glob = &ttm_glob;
> +	bool node_has_cpu = false;
>  	int ret;
>  	if (WARN_ON(vma_manager == NULL))
> @@ -213,7 +214,12 @@ int ttm_device_init(struct ttm_device *bdev, struct ttm_device_funcs *funcs,
>  	bdev->funcs = funcs;
>  	ttm_sys_man_init(bdev);
> -	ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc, use_dma32);
> +
> +	node_has_cpu = node_state(dev->numa_node, N_CPU);
Considering that qxl_ttm_init() passes in dev = NULL, the above would blow
up just the same.
-- Steve
> +	if (node_has_cpu)
> +		ttm_pool_init(&bdev->pool, dev, dev->numa_node, use_dma_alloc, use_dma32);
> +	else
> +		ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc,
> +			      use_dma32);
>  	bdev->vma_manager = vma_manager;
>  	spin_lock_init(&bdev->lru_lock);
>
> > -- Steve
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
2024-01-23 0:34 ` Steven Rostedt
@ 2024-01-23 0:40 ` Bhardwaj, Rajneesh
-1 siblings, 0 replies; 30+ messages in thread
From: Bhardwaj, Rajneesh @ 2024-01-23 0:40 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Linus Torvalds, Felix Kuehling, Christian König,
dri-devel
On 1/22/2024 7:34 PM, Steven Rostedt wrote:
> On Mon, 22 Jan 2024 19:29:41 -0500
> "Bhardwaj, Rajneesh" <rajneesh.bhardwaj@amd.com> wrote:
>
>> In one of my previous revisions of this patch when I was experimenting,
>> I used something like below. Wonder if that could work in your case
>> and/or in general.
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
>> index 43e27ab77f95..4c3902b94be4 100644
>> --- a/drivers/gpu/drm/ttm/ttm_device.c
>> +++ b/drivers/gpu/drm/ttm/ttm_device.c
>> @@ -195,6 +195,7 @@ int ttm_device_init(struct ttm_device *bdev, struct ttm_device_funcs *funcs,
>>  		    bool use_dma_alloc, bool use_dma32)
>>  {
>>  	struct ttm_global *glob = &ttm_glob;
>> +	bool node_has_cpu = false;
>>  	int ret;
>>  	if (WARN_ON(vma_manager == NULL))
>> @@ -213,7 +214,12 @@ int ttm_device_init(struct ttm_device *bdev, struct ttm_device_funcs *funcs,
>>  	bdev->funcs = funcs;
>>  	ttm_sys_man_init(bdev);
>> -	ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc, use_dma32);
>> +
>> +	node_has_cpu = node_state(dev->numa_node, N_CPU);
> Considering that qxl_ttm_init() passes in dev = NULL, the above would blow
> up just the same.
I agree. I think we need something like what you suggested, i.e.:
+	ttm_pool_init(&bdev->pool, dev, dev ? dev_to_node(dev) : NUMA_NO_NODE,
+		      use_dma_alloc, use_dma32);
I am not sure whether the node_has_cpu change above, combined with the NULL
pointer check you suggested, would be a better solution in general. If you
prefer that, I can send a fix; otherwise, your fix looks good to me.
>
> -- Steve
>
>
>> +	if (node_has_cpu)
>> +		ttm_pool_init(&bdev->pool, dev, dev->numa_node, use_dma_alloc, use_dma32);
>> +	else
>> +		ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc,
>> +			      use_dma32);
>>  	bdev->vma_manager = vma_manager;
>>  	spin_lock_init(&bdev->lru_lock);
>>
>>> -- Steve
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
2024-01-22 23:19 ` Steven Rostedt
@ 2024-01-23 0:43 ` Linus Torvalds
-1 siblings, 0 replies; 30+ messages in thread
From: Linus Torvalds @ 2024-01-23 0:43 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Rajneesh Bhardwaj, Felix Kuehling, Christian König,
dri-devel
On Mon, 22 Jan 2024 at 15:17, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Perhaps this is the real fix?
If you send a signed-off version, I'll apply it asap.
Thanks,
Linus
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
2024-01-23 0:43 ` Linus Torvalds
(?)
@ 2024-01-23 0:56 ` Bhardwaj, Rajneesh
2024-01-23 1:25 ` Linus Torvalds
2024-01-23 1:35 ` Steven Rostedt
-1 siblings, 2 replies; 30+ messages in thread
From: Bhardwaj, Rajneesh @ 2024-01-23 0:56 UTC (permalink / raw)
To: Linus Torvalds, Steven Rostedt
Cc: Felix Kuehling, LKML, dri-devel, Christian König
On 1/22/2024 7:43 PM, Linus Torvalds wrote:
> On Mon, 22 Jan 2024 at 15:17, Steven Rostedt<rostedt@goodmis.org> wrote:
>> Perhaps this is the real fix?
> If you send a signed-off version, I'll apply it asap.
I think a fix might already be in flight. Please see "Re: [PATCH] drm/ttm:
fix ttm pool initialization for no-dma-device drivers" in the Linux-Kernel
Archive:
<https://lkml.iu.edu/hypermail/linux/kernel/2401.1/06778.html>
>
> Thanks,
> Linus
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
2024-01-22 23:19 ` Steven Rostedt
@ 2024-01-23 1:06 ` Bhardwaj, Rajneesh
-1 siblings, 0 replies; 30+ messages in thread
From: Bhardwaj, Rajneesh @ 2024-01-23 1:06 UTC (permalink / raw)
To: Steven Rostedt, LKML
Cc: Kuehling, Felix, Linus Torvalds, Koenig, Christian,
dri-devel@lists.freedesktop.org
[AMD Official Use Only - General]
-----Original Message-----
From: Steven Rostedt <rostedt@goodmis.org>
Sent: Monday, January 22, 2024 6:19 PM
To: LKML <linux-kernel@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>; Bhardwaj, Rajneesh <Rajneesh.Bhardwaj@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; dri-devel@lists.freedesktop.org
Subject: Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
On Mon, 22 Jan 2024 18:15:47 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> > ttm_pool_init(&bdev->pool, dev, dev_to_node(dev), use_dma_alloc, use_dma32); <<<------- BUG!
> >
> > Specifically, it appears that dev is NULL and dev_to_node() doesn't
> > like having a NULL pointer passed to it.
> >
>
> Yeah, that qxl_ttm_init() has:
>
> /* No others user of address space so set it to 0 */
> r = ttm_device_init(&qdev->mman.bdev, &qxl_bo_driver, NULL,
> qdev->ddev.anon_inode->i_mapping,
> qdev->ddev.vma_offset_manager,
> false, false);
>
> Where that NULL is "dev"!
>
> Thus that will never work here.
Perhaps this is the real fix?
I think the fix might already be applied to drm-misc. Please see https://lkml.iu.edu/hypermail/linux/kernel/2401.1/06778.html
-- Steve
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index f5187b384ae9..bc217b4d6b04 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -215,7 +215,8 @@ int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *func
ttm_sys_man_init(bdev);
- ttm_pool_init(&bdev->pool, dev, dev_to_node(dev), use_dma_alloc, use_dma32);
+ ttm_pool_init(&bdev->pool, dev, dev ? dev_to_node(dev) : NUMA_NO_NODE,
+ use_dma_alloc, use_dma32);
bdev->vma_manager = vma_manager;
spin_lock_init(&bdev->lru_lock);
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
2024-01-23 0:56 ` Bhardwaj, Rajneesh
@ 2024-01-23 1:25 ` Linus Torvalds
2024-01-23 1:35 ` Steven Rostedt
1 sibling, 0 replies; 30+ messages in thread
From: Linus Torvalds @ 2024-01-23 1:25 UTC (permalink / raw)
To: Bhardwaj, Rajneesh
Cc: Steven Rostedt, LKML, Felix Kuehling, Christian König,
dri-devel
On Mon, 22 Jan 2024 at 16:56, Bhardwaj, Rajneesh
<rajneesh.bhardwaj@amd.com> wrote:
>
> I think a fix might already be in flight. Please see Linux-Kernel Archive: Re: [PATCH] drm/ttm: fix ttm pool initialization for no-dma-device drivers (iu.edu)
Please use lore.kernel.org, which doesn't corrupt whitespace in patches
or lose header information:
https://lore.kernel.org/lkml/20240113213347.9562-1-pchelkin@ispras.ru/
although that seems to be a strange definition of "in flight". It was
sent out 8 days ago, and apparently nobody thought to include it in
the drm fixes pile that came in last Friday.
So the bug made it into rc1, even though it was reported a week before.
It also looks like some mailing list there is mangling emails - if you
use 'all' instead of 'lkml', lore reports multiple emails with the
same message-id, and it all looks messier as a result.
I assume it's dri-devel@lists.freedesktop.org that messes up, mainly
because I don't tend to see this behaviour when only the usual
kernel.org mailing lists are involved.
Linus
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
2024-01-23 0:56 ` Bhardwaj, Rajneesh
@ 2024-01-23 1:35 ` Steven Rostedt
2024-01-23 1:35 ` Steven Rostedt
1 sibling, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2024-01-23 1:35 UTC (permalink / raw)
To: Bhardwaj, Rajneesh
Cc: Linus Torvalds, LKML, Felix Kuehling, Christian König,
dri-devel, Fedor Pchelkin
On Mon, 22 Jan 2024 19:56:08 -0500
"Bhardwaj, Rajneesh" <rajneesh.bhardwaj@amd.com> wrote:
>
> On 1/22/2024 7:43 PM, Linus Torvalds wrote:
> > On Mon, 22 Jan 2024 at 15:17, Steven Rostedt<rostedt@goodmis.org> wrote:
> >> Perhaps this is the real fix?
> > If you send a signed-off version, I'll apply it asap.
>
>
> I think a fix might already be in flight. Please see Linux-Kernel
> Archive: Re: [PATCH] drm/ttm: fix ttm pool initialization for
> no-dma-device drivers (iu.edu)
> <https://lkml.iu.edu/hypermail/linux/kernel/2401.1/06778.html>
Please use lore links. They are much easier to follow and use.
https://lore.kernel.org/lkml/20240113213347.9562-1-pchelkin@ispras.ru/
is the patch I believe you are referencing.
The fix doesn't need to be mine, but this should be in Linus's tree ASAP.
-- Steve
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
2024-01-23 1:35 ` Steven Rostedt
@ 2024-01-23 2:21 ` Dave Airlie
-1 siblings, 0 replies; 30+ messages in thread
From: Dave Airlie @ 2024-01-23 2:21 UTC (permalink / raw)
To: Steven Rostedt
Cc: Bhardwaj, Rajneesh, Linus Torvalds, LKML, Felix Kuehling,
Christian König, dri-devel, Fedor Pchelkin
On Tue, 23 Jan 2024 at 12:15, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 22 Jan 2024 19:56:08 -0500
> "Bhardwaj, Rajneesh" <rajneesh.bhardwaj@amd.com> wrote:
>
> >
> > On 1/22/2024 7:43 PM, Linus Torvalds wrote:
> > > On Mon, 22 Jan 2024 at 15:17, Steven Rostedt<rostedt@goodmis.org> wrote:
> > >> Perhaps this is the real fix?
> > > If you send a signed-off version, I'll apply it asap.
> >
> >
> > I think a fix might already be in flight. Please see Linux-Kernel
> > Archive: Re: [PATCH] drm/ttm: fix ttm pool initialization for
> > no-dma-device drivers (iu.edu)
> > <https://lkml.iu.edu/hypermail/linux/kernel/2401.1/06778.html>
>
> Please use lore links. They are much easier to follow and use.
https://lore.kernel.org/dri-devel/20240123022015.1288588-1-airlied@gmail.com/T/#u
should also fix it. Linus, please apply it directly if Steven has a
chance to give it a run.
Thanks,
Dave.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
2024-01-23 2:21 ` Dave Airlie
@ 2024-01-23 2:32 ` Dave Airlie
-1 siblings, 0 replies; 30+ messages in thread
From: Dave Airlie @ 2024-01-23 2:32 UTC (permalink / raw)
To: Steven Rostedt
Cc: Bhardwaj, Rajneesh, Linus Torvalds, LKML, Felix Kuehling,
Christian König, dri-devel, Fedor Pchelkin
On Tue, 23 Jan 2024 at 12:21, Dave Airlie <airlied@gmail.com> wrote:
>
> On Tue, 23 Jan 2024 at 12:15, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Mon, 22 Jan 2024 19:56:08 -0500
> > "Bhardwaj, Rajneesh" <rajneesh.bhardwaj@amd.com> wrote:
> >
> > >
> > > On 1/22/2024 7:43 PM, Linus Torvalds wrote:
> > > > On Mon, 22 Jan 2024 at 15:17, Steven Rostedt<rostedt@goodmis.org> wrote:
> > > >> Perhaps this is the real fix?
> > > > If you send a signed-off version, I'll apply it asap.
> > >
> > >
> > > I think a fix might already be in flight. Please see Linux-Kernel
> > > Archive: Re: [PATCH] drm/ttm: fix ttm pool initialization for
> > > no-dma-device drivers (iu.edu)
> > > <https://lkml.iu.edu/hypermail/linux/kernel/2401.1/06778.html>
> >
> > Please use lore links. They are much easier to follow and use.
>
> https://lore.kernel.org/dri-devel/20240123022015.1288588-1-airlied@gmail.com/T/#u
>
> should also fix it, Linus please apply it directly if Steven has a
> chance to give it a run.
I see Linus applied the other one, that's fine too.
Dave.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
2024-01-23 2:32 ` Dave Airlie
@ 2024-01-23 2:52 ` Steven Rostedt
-1 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2024-01-23 2:52 UTC (permalink / raw)
To: Dave Airlie
Cc: Bhardwaj, Rajneesh, Linus Torvalds, LKML, Felix Kuehling,
Christian König, dri-devel, Fedor Pchelkin
On Tue, 23 Jan 2024 12:32:39 +1000
Dave Airlie <airlied@gmail.com> wrote:
> On Tue, 23 Jan 2024 at 12:21, Dave Airlie <airlied@gmail.com> wrote:
> >
> > On Tue, 23 Jan 2024 at 12:15, Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > On Mon, 22 Jan 2024 19:56:08 -0500
> > > "Bhardwaj, Rajneesh" <rajneesh.bhardwaj@amd.com> wrote:
> > >
> > > >
> > > > On 1/22/2024 7:43 PM, Linus Torvalds wrote:
> > > > > On Mon, 22 Jan 2024 at 15:17, Steven Rostedt<rostedt@goodmis.org> wrote:
> > > > >> Perhaps this is the real fix?
> > > > > If you send a signed-off version, I'll apply it asap.
> > > >
> > > >
> > > > I think a fix might already be in flight. Please see Linux-Kernel
> > > > Archive: Re: [PATCH] drm/ttm: fix ttm pool initialization for
> > > > no-dma-device drivers (iu.edu)
> > > > <https://lkml.iu.edu/hypermail/linux/kernel/2401.1/06778.html>
> > >
> > > Please use lore links. They are much easier to follow and use.
> >
> > https://lore.kernel.org/dri-devel/20240123022015.1288588-1-airlied@gmail.com/T/#u
> >
> > should also fix it, Linus please apply it directly if Steven has a
> > chance to give it a run.
>
> I see Linus applied the other one, that's fine too.
>
They don't look mutually exclusive. I can test the other one as well.
-- Steve
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
2024-01-23 2:52 ` Steven Rostedt
@ 2024-01-23 9:43 ` Christian König
-1 siblings, 0 replies; 30+ messages in thread
From: Christian König @ 2024-01-23 9:43 UTC (permalink / raw)
To: Steven Rostedt, Dave Airlie
Cc: Bhardwaj, Rajneesh, Linus Torvalds, LKML, Felix Kuehling,
dri-devel, Fedor Pchelkin
On 23.01.24 at 03:52, Steven Rostedt wrote:
> On Tue, 23 Jan 2024 12:32:39 +1000
> Dave Airlie <airlied@gmail.com> wrote:
>
>> On Tue, 23 Jan 2024 at 12:21, Dave Airlie <airlied@gmail.com> wrote:
>>> On Tue, 23 Jan 2024 at 12:15, Steven Rostedt <rostedt@goodmis.org> wrote:
>>>> On Mon, 22 Jan 2024 19:56:08 -0500
>>>> "Bhardwaj, Rajneesh" <rajneesh.bhardwaj@amd.com> wrote:
>>>>
>>>>> On 1/22/2024 7:43 PM, Linus Torvalds wrote:
>>>>>> On Mon, 22 Jan 2024 at 15:17, Steven Rostedt<rostedt@goodmis.org> wrote:
>>>>>>> Perhaps this is the real fix?
>>>>>> If you send a signed-off version, I'll apply it asap.
>>>>>
>>>>> I think a fix might already be in flight. Please see Linux-Kernel
>>>>> Archive: Re: [PATCH] drm/ttm: fix ttm pool initialization for
>>>>> no-dma-device drivers (iu.edu)
>>>>> <https://lkml.iu.edu/hypermail/linux/kernel/2401.1/06778.html>
>>>> Please use lore links. They are much easier to follow and use.
>>> https://lore.kernel.org/dri-devel/20240123022015.1288588-1-airlied@gmail.com/T/#u
>>>
>>> should also fix it, Linus please apply it directly if Steven has a
>>> chance to give it a run.
>> I see Linus applied the other one, that's fine too.
>>
> They don't look mutually exclusive. I can test the other one as well.
While applying the fix a week ago I was under the impression that QXL
doesn't use a device structure because it doesn't have one, and so can't
give anything meaningful for this parameter.
If QXL does have a device structure and can provide it, I would rather
go down this route and make the device, and with it the NUMA node,
mandatory for drivers to specify.
Regards,
Christian.
>
> -- Steve
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
2024-01-23 9:43 ` Christian König
@ 2024-01-23 14:35 ` Steven Rostedt
-1 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2024-01-23 14:35 UTC (permalink / raw)
To: Christian König
Cc: Dave Airlie, Bhardwaj, Rajneesh, Linus Torvalds, LKML,
Felix Kuehling, dri-devel, Fedor Pchelkin
On Tue, 23 Jan 2024 10:43:04 +0100
Christian König <christian.koenig@amd.com> wrote:
> While applying the fix a week ago I was under the impression that QXL
> doesn't use a device structure because it doesn't have one and so can't
> give anything meaningful for this parameter.
>
> If QXL does have a device structure and can provide it I would rather
> like to go down this route and make the device and with it the numa node
> mandatory for drivers to specify.
Then at a minimum my original fix should be applied. Perhaps with a warning
too. That is, I added at the beginning of that function:
	if (!dev)
		return -EINVAL;
Could have that be:
	if (WARN_ON_ONCE(!dev))
		return -EINVAL;
In any case, it should not cause the system to crash.
-- Steve
^ permalink raw reply [flat|nested] 30+ messages in thread
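The guard proposed in the message above can be modelled in userspace to show the intended behaviour: reject a NULL device with -EINVAL and warn only the first time. This is a sketch under assumptions, not kernel code. `warn_on_once_model()` is a simplified stand-in for `WARN_ON_ONCE()`, while `device_init_model()` and `warn_count` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in; not the kernel's struct device. */
struct device {
	int numa_node;
};

/* Counts emitted warnings so the "once" behaviour can be observed. */
static int warn_count;

/*
 * Stand-in for WARN_ON_ONCE(): prints at most one warning per call site
 * (tracked through *warned) and always returns the condition.
 */
static bool warn_on_once_model(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Hypothetical init function that fails cleanly instead of crashing. */
static int device_init_model(const struct device *dev)
{
	static bool warned;

	if (warn_on_once_model(!dev, &warned, "device_init_model: dev is NULL"))
		return -EINVAL;

	/* Past the guard, dev is known to be non-NULL. */
	assert(dev != NULL);
	return 0;
}
```

Calling `device_init_model(NULL)` repeatedly returns -EINVAL each time but logs a single warning, so a misbehaving driver is reported without flooding the log or taking the system down.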
Thread overview: 30+ messages
2024-01-22 23:06 [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4 Steven Rostedt
2024-01-22 23:06 ` Steven Rostedt
2024-01-22 23:15 ` Steven Rostedt
2024-01-22 23:15 ` Steven Rostedt
2024-01-22 23:19 ` Steven Rostedt
2024-01-22 23:19 ` Steven Rostedt
2024-01-23 0:43 ` Linus Torvalds
2024-01-23 0:43 ` Linus Torvalds
2024-01-23 0:56 ` Bhardwaj, Rajneesh
2024-01-23 1:25 ` Linus Torvalds
2024-01-23 1:25 ` Linus Torvalds
2024-01-23 1:35 ` Steven Rostedt
2024-01-23 1:35 ` Steven Rostedt
2024-01-23 2:21 ` Dave Airlie
2024-01-23 2:21 ` Dave Airlie
2024-01-23 2:32 ` Dave Airlie
2024-01-23 2:32 ` Dave Airlie
2024-01-23 2:52 ` Steven Rostedt
2024-01-23 2:52 ` Steven Rostedt
2024-01-23 9:43 ` Christian König
2024-01-23 9:43 ` Christian König
2024-01-23 14:35 ` Steven Rostedt
2024-01-23 14:35 ` Steven Rostedt
2024-01-23 1:06 ` Bhardwaj, Rajneesh
2024-01-23 1:06 ` Bhardwaj, Rajneesh
2024-01-23 0:29 ` Bhardwaj, Rajneesh
2024-01-23 0:34 ` Steven Rostedt
2024-01-23 0:34 ` Steven Rostedt
2024-01-23 0:40 ` Bhardwaj, Rajneesh
2024-01-23 0:40 ` Bhardwaj, Rajneesh