* [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
From: Steven Rostedt @ 2024-01-22 23:06 UTC
To: LKML
Cc: Linus Torvalds, Rajneesh Bhardwaj, Felix Kuehling, Christian König, dri-devel

I just kicked off testing some patches on top of 6.8-rc1 and triggered this
immediately:

[ note this happened on both my 32 bit and 64 bit test machines, this is
  just the 32 bit output ]

BUG: kernel NULL pointer dereference, address: 00000238
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
*pdpt = 0000000000000000 *pde = f000ff53f000ff53
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.8.0-rc1-test-00001-g2b44760609e9-dirty #1056
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Workqueue: events work_for_cpu_fn
EIP: ttm_device_init+0xb4/0x274
Code: 86 10 09 00 00 83 c4 0c 85 c0 0f 84 96 01 00 00 8b 45 ac 8d 9e 94 00 00 00 89 46 08 89 f0 e8 27 05 00 00 8b 55 a8 0f b6 45 98 <8b> 8a 38 02 00 00 50 0f b6 45 9c 50 89 d8 e8 95 ee ff ff 8b 45 a0
EAX: 00000000 EBX: c135a7e4 ECX: c135a7b0 EDX: 00000000
ESI: c135a750 EDI: 0007bc1d EBP: c11d7e4c ESP: c11d7de4
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010246
CR0: 80050033 CR2: 00000238 CR3: 145c4000 CR4: 000006f0
Call Trace:
 ? show_regs+0x4f/0x58
 ? __die+0x1d/0x58
 ? page_fault_oops+0x171/0x330
 ? lock_acquire+0xa4/0x280
 ? kernelmode_fixup_or_oops.constprop.0+0x7c/0xcc
 ? __bad_area_nosemaphore.constprop.0+0x124/0x1b4
 ? __mutex_lock+0x17f/0xb00
 ? bad_area_nosemaphore+0xf/0x14
 ? do_user_addr_fault+0x140/0x3e4
 ? exc_page_fault+0x5b/0x1d8
 ? pvclock_clocksource_read_nowd+0x130/0x130
 ? handle_exception+0x133/0x133
 ? pvclock_clocksource_read_nowd+0x130/0x130
 ? ttm_device_init+0xb4/0x274
 ? pvclock_clocksource_read_nowd+0x130/0x130
 ? ttm_device_init+0xb4/0x274
 qxl_ttm_init+0x34/0x130
 qxl_bo_init+0xd/0x10
 qxl_device_init+0x52a/0x92c
 qxl_pci_probe+0x91/0x1ac
 local_pci_probe+0x3d/0x84
 work_for_cpu_fn+0x16/0x20
 process_one_work+0x1bc/0x4a0
 worker_thread+0x310/0x3a8
 kthread+0xea/0x110
 ? rescuer_thread+0x2f0/0x2f0
 ? kthread_complete_and_exit+0x1c/0x1c
 ret_from_fork+0x34/0x4c
 ? kthread_complete_and_exit+0x1c/0x1c
 ret_from_fork_asm+0x12/0x18
 entry_INT80_32+0xf0/0xf0
Modules linked in:
CR2: 0000000000000238
---[ end trace 0000000000000000 ]---

The crash happened here:

int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *funcs,
		    struct device *dev, struct address_space *mapping,
		    struct drm_vma_offset_manager *vma_manager,
		    bool use_dma_alloc, bool use_dma32)
{
	struct ttm_global *glob = &ttm_glob;
	int ret;

	if (WARN_ON(vma_manager == NULL))
		return -EINVAL;

	ret = ttm_global_init();
	if (ret)
		return ret;

	bdev->wq = alloc_workqueue("ttm",
				   WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 16);
	if (!bdev->wq) {
		ttm_global_release();
		return -ENOMEM;
	}

	bdev->funcs = funcs;

	ttm_sys_man_init(bdev);

	ttm_pool_init(&bdev->pool, dev, dev_to_node(dev), use_dma_alloc, use_dma32);  <<<------- BUG!

Specifically, it appears that dev is NULL and dev_to_node() doesn't like
having a NULL pointer passed to it.

I currently "fixed" this with a:

	if (!dev)
		return -EINVAL;

at the start of this function just so that I can continue running my tests,
but that is obviously incorrect.

-- Steve
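[ Editorial sketch, not part of the original message. ] The faulting address in the oops can be read straight from the register dump: the marked bytes `<8b> 8a 38 02 00 00` decode to a load from EDX + 0x238, and EDX is 00000000, which gives CR2 = 00000238. Assuming dev_to_node() simply reads a numa_node member of struct device, the userspace model below shows why a NULL dev produces a small nonzero fault address. The struct name, the padding array, and the helper are hypothetical; the 0x238 offset is taken from this particular oops and will differ with kernel config and architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct device. Only the layout matters:
 * the padding is sized so that numa_node sits at offset 0x238, the
 * value reported in CR2 above. This is an assumption for illustration.
 */
struct mock_device_layout {
	unsigned char other_fields[0x238];
	int numa_node;
};

static_assert(offsetof(struct mock_device_layout, numa_node) == 0x238,
	      "mock layout must match the offset seen in the oops");

/* Address the CPU would touch when reading dev->numa_node. */
static uintptr_t numa_node_fault_address(const struct mock_device_layout *dev)
{
	return (uintptr_t)dev + offsetof(struct mock_device_layout, numa_node);
}
```

With dev == NULL the helper returns 0x238, matching the "address: 00000238" line in the report.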
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
From: Steven Rostedt @ 2024-01-22 23:15 UTC
To: LKML
Cc: Linus Torvalds, Rajneesh Bhardwaj, Felix Kuehling, Christian König, dri-devel

On Mon, 22 Jan 2024 18:06:05 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

>  qxl_ttm_init+0x34/0x130
>
> int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *funcs,
> 		    struct device *dev, struct address_space *mapping,
> 		    struct drm_vma_offset_manager *vma_manager,
> 		    bool use_dma_alloc, bool use_dma32)
> {
> 	struct ttm_global *glob = &ttm_glob;
> 	int ret;
>
> 	if (WARN_ON(vma_manager == NULL))
> 		return -EINVAL;
>
> 	ret = ttm_global_init();
> 	if (ret)
> 		return ret;
>
> 	bdev->wq = alloc_workqueue("ttm",
> 				   WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 16);
> 	if (!bdev->wq) {
> 		ttm_global_release();
> 		return -ENOMEM;
> 	}
>
> 	bdev->funcs = funcs;
>
> 	ttm_sys_man_init(bdev);
>
> 	ttm_pool_init(&bdev->pool, dev, dev_to_node(dev), use_dma_alloc, use_dma32);  <<<------- BUG!
>
> Specifically, it appears that dev is NULL and dev_to_node() doesn't like
> having a NULL pointer passed to it.
>

Yeah, that qxl_ttm_init() has:

	/* No others user of address space so set it to 0 */
	r = ttm_device_init(&qdev->mman.bdev, &qxl_bo_driver, NULL,
			    qdev->ddev.anon_inode->i_mapping,
			    qdev->ddev.vma_offset_manager,
			    false, false);

Where that NULL is "dev"!

Thus that will never work here.

-- Steve
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
From: Steven Rostedt @ 2024-01-22 23:19 UTC
To: LKML
Cc: Linus Torvalds, Rajneesh Bhardwaj, Felix Kuehling, Christian König, dri-devel

On Mon, 22 Jan 2024 18:15:47 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> > 	ttm_pool_init(&bdev->pool, dev, dev_to_node(dev), use_dma_alloc, use_dma32);  <<<------- BUG!
> >
> > Specifically, it appears that dev is NULL and dev_to_node() doesn't like
> > having a NULL pointer passed to it.
> >
>
> Yeah, that qxl_ttm_init() has:
>
> 	/* No others user of address space so set it to 0 */
> 	r = ttm_device_init(&qdev->mman.bdev, &qxl_bo_driver, NULL,
> 			    qdev->ddev.anon_inode->i_mapping,
> 			    qdev->ddev.vma_offset_manager,
> 			    false, false);
>
> Where that NULL is "dev"!
>
> Thus that will never work here.

Perhaps this is the real fix?

-- Steve

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index f5187b384ae9..bc217b4d6b04 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -215,7 +215,8 @@ int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *func
 
 	ttm_sys_man_init(bdev);
 
-	ttm_pool_init(&bdev->pool, dev, dev_to_node(dev), use_dma_alloc, use_dma32);
+	ttm_pool_init(&bdev->pool, dev, dev ? dev_to_node(dev) : NUMA_NO_NODE,
+		      use_dma_alloc, use_dma32);
 
 	bdev->vma_manager = vma_manager;
 	spin_lock_init(&bdev->lru_lock);
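[ Editorial sketch, not part of the original message. ] The ternary in the patch above can be exercised outside the kernel with a small model. Everything prefixed mock_ and the pool_node_for() helper are hypothetical names for illustration; NUMA_NO_NODE is defined as -1 here, the same value the kernel uses. The assert() in mock_dev_to_node() stands in for the NULL dereference that the unpatched call hits.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)	/* same value as the kernel's definition */

/* Hypothetical stand-in for struct device, reduced to the one field used. */
struct mock_device {
	int numa_node;
};

/* Models dev_to_node(): dereferences dev unconditionally. */
static int mock_dev_to_node(const struct mock_device *dev)
{
	assert(dev != NULL);	/* the kernel would oops here instead */
	return dev->numa_node;
}

/* Mirrors the patched argument: fall back to "no node" for a NULL device. */
static int pool_node_for(const struct mock_device *dev)
{
	return dev ? mock_dev_to_node(dev) : NUMA_NO_NODE;
}
```

A driver such as qxl that passes NULL for dev would get NUMA_NO_NODE, while drivers that pass a real device keep their node.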
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
From: Linus Torvalds @ 2024-01-23 0:43 UTC
To: Steven Rostedt
Cc: LKML, Rajneesh Bhardwaj, Felix Kuehling, Christian König, dri-devel

On Mon, 22 Jan 2024 at 15:17, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Perhaps this is the real fix?

If you send a signed-off version, I'll apply it asap.

Thanks,
          Linus
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
From: Bhardwaj, Rajneesh @ 2024-01-23 0:56 UTC
To: Linus Torvalds, Steven Rostedt
Cc: Felix Kuehling, LKML, dri-devel, Christian König

On 1/22/2024 7:43 PM, Linus Torvalds wrote:
> On Mon, 22 Jan 2024 at 15:17, Steven Rostedt <rostedt@goodmis.org> wrote:
>> Perhaps this is the real fix?
> If you send a signed-off version, I'll apply it asap.

I think a fix might already be in flight. Please see Linux-Kernel
Archive: Re: [PATCH] drm/ttm: fix ttm pool initialization for
no-dma-device drivers (iu.edu)
<https://lkml.iu.edu/hypermail/linux/kernel/2401.1/06778.html>

>
> Thanks,
>           Linus
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
From: Linus Torvalds @ 2024-01-23 1:25 UTC
To: Bhardwaj, Rajneesh
Cc: Steven Rostedt, LKML, Felix Kuehling, Christian König, dri-devel

On Mon, 22 Jan 2024 at 16:56, Bhardwaj, Rajneesh
<rajneesh.bhardwaj@amd.com> wrote:
>
> I think a fix might already be in flight. Please see Linux-Kernel Archive: Re: [PATCH] drm/ttm: fix ttm pool initialization for no-dma-device drivers (iu.edu)

Please use lore.kernel.org that doesn't corrupt whitespace in patches
or lose header information:

   https://lore.kernel.org/lkml/20240113213347.9562-1-pchelkin@ispras.ru/

although that seems to be a strange definition of "in flight". It was
sent out 8 days ago, and apparently nobody thought to include it in
the drm fixes pile that came in last Friday.

So it made it into rc1, even though it was reported a week before.

It also looks like some mailing list there is mangling emails - if you
use 'all' instead of 'lkml', lore reports multiple emails with the
same message-id, and it all looks messier as a result.

I assume it's dri-devel@lists.freedesktop.org that messes up, mainly
because I don't tend to see this behaviour when only the usual
kernel.org mailing lists are involved.

            Linus
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
From: Steven Rostedt @ 2024-01-23 1:35 UTC
To: Bhardwaj, Rajneesh
Cc: Linus Torvalds, LKML, Felix Kuehling, Christian König, dri-devel, Fedor Pchelkin

On Mon, 22 Jan 2024 19:56:08 -0500
"Bhardwaj, Rajneesh" <rajneesh.bhardwaj@amd.com> wrote:

>
> On 1/22/2024 7:43 PM, Linus Torvalds wrote:
> > On Mon, 22 Jan 2024 at 15:17, Steven Rostedt<rostedt@goodmis.org> wrote:
> >> Perhaps this is the real fix?
> > If you send a signed-off version, I'll apply it asap.
>
>
> I think a fix might already be in flight. Please see Linux-Kernel
> Archive: Re: [PATCH] drm/ttm: fix ttm pool initialization for
> no-dma-device drivers (iu.edu)
> <https://lkml.iu.edu/hypermail/linux/kernel/2401.1/06778.html>

Please use lore links. They are much easier to follow and use.

  https://lore.kernel.org/lkml/20240113213347.9562-1-pchelkin@ispras.ru/

is the patch I believe you are referencing.

The fix doesn't need to be mine, but this should be in Linus's tree ASAP.

-- Steve
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
From: Dave Airlie @ 2024-01-23 2:21 UTC
To: Steven Rostedt
Cc: Bhardwaj, Rajneesh, Linus Torvalds, LKML, Felix Kuehling, Christian König, dri-devel, Fedor Pchelkin

On Tue, 23 Jan 2024 at 12:15, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 22 Jan 2024 19:56:08 -0500
> "Bhardwaj, Rajneesh" <rajneesh.bhardwaj@amd.com> wrote:
>
> >
> > On 1/22/2024 7:43 PM, Linus Torvalds wrote:
> > > On Mon, 22 Jan 2024 at 15:17, Steven Rostedt<rostedt@goodmis.org> wrote:
> > >> Perhaps this is the real fix?
> > > If you send a signed-off version, I'll apply it asap.
> >
> >
> > I think a fix might already be in flight. Please see Linux-Kernel
> > Archive: Re: [PATCH] drm/ttm: fix ttm pool initialization for
> > no-dma-device drivers (iu.edu)
> > <https://lkml.iu.edu/hypermail/linux/kernel/2401.1/06778.html>
>
> Please use lore links. They are much easier to follow and use.

https://lore.kernel.org/dri-devel/20240123022015.1288588-1-airlied@gmail.com/T/#u

should also fix it, Linus please apply it directly if Steven has a
chance to give it a run.

Thanks,
Dave.
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
From: Dave Airlie @ 2024-01-23 2:32 UTC
To: Steven Rostedt
Cc: Bhardwaj, Rajneesh, Linus Torvalds, LKML, Felix Kuehling, Christian König, dri-devel, Fedor Pchelkin

On Tue, 23 Jan 2024 at 12:21, Dave Airlie <airlied@gmail.com> wrote:
>
> On Tue, 23 Jan 2024 at 12:15, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Mon, 22 Jan 2024 19:56:08 -0500
> > "Bhardwaj, Rajneesh" <rajneesh.bhardwaj@amd.com> wrote:
> >
> > >
> > > On 1/22/2024 7:43 PM, Linus Torvalds wrote:
> > > > On Mon, 22 Jan 2024 at 15:17, Steven Rostedt<rostedt@goodmis.org> wrote:
> > > >> Perhaps this is the real fix?
> > > > If you send a signed-off version, I'll apply it asap.
> > >
> > >
> > > I think a fix might already be in flight. Please see Linux-Kernel
> > > Archive: Re: [PATCH] drm/ttm: fix ttm pool initialization for
> > > no-dma-device drivers (iu.edu)
> > > <https://lkml.iu.edu/hypermail/linux/kernel/2401.1/06778.html>
> >
> > Please use lore links. They are much easier to follow and use.
>
> https://lore.kernel.org/dri-devel/20240123022015.1288588-1-airlied@gmail.com/T/#u
>
> should also fix it, Linus please apply it directly if Steven has a
> chance to give it a run.

I see Linus applied the other one, that's fine too.

Dave.
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
From: Steven Rostedt @ 2024-01-23 2:52 UTC
To: Dave Airlie
Cc: Bhardwaj, Rajneesh, Linus Torvalds, LKML, Felix Kuehling, Christian König, dri-devel, Fedor Pchelkin

On Tue, 23 Jan 2024 12:32:39 +1000
Dave Airlie <airlied@gmail.com> wrote:

> On Tue, 23 Jan 2024 at 12:21, Dave Airlie <airlied@gmail.com> wrote:
> >
> > On Tue, 23 Jan 2024 at 12:15, Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > On Mon, 22 Jan 2024 19:56:08 -0500
> > > "Bhardwaj, Rajneesh" <rajneesh.bhardwaj@amd.com> wrote:
> > >
> > > >
> > > > On 1/22/2024 7:43 PM, Linus Torvalds wrote:
> > > > > On Mon, 22 Jan 2024 at 15:17, Steven Rostedt<rostedt@goodmis.org> wrote:
> > > > >> Perhaps this is the real fix?
> > > > > If you send a signed-off version, I'll apply it asap.
> > > >
> > > >
> > > > I think a fix might already be in flight. Please see Linux-Kernel
> > > > Archive: Re: [PATCH] drm/ttm: fix ttm pool initialization for
> > > > no-dma-device drivers (iu.edu)
> > > > <https://lkml.iu.edu/hypermail/linux/kernel/2401.1/06778.html>
> > >
> > > Please use lore links. They are much easier to follow and use.
> >
> > https://lore.kernel.org/dri-devel/20240123022015.1288588-1-airlied@gmail.com/T/#u
> >
> > should also fix it, Linus please apply it directly if Steven has a
> > chance to give it a run.
>
> I see Linus applied the other one, that's fine too.
>

They don't look mutually exclusive. I can test the other one as well.

-- Steve
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
From: Christian König @ 2024-01-23 9:43 UTC
To: Steven Rostedt, Dave Airlie
Cc: Bhardwaj, Rajneesh, Linus Torvalds, LKML, Felix Kuehling, dri-devel, Fedor Pchelkin

On 23.01.24 at 03:52, Steven Rostedt wrote:
> On Tue, 23 Jan 2024 12:32:39 +1000
> Dave Airlie <airlied@gmail.com> wrote:
>
>> On Tue, 23 Jan 2024 at 12:21, Dave Airlie <airlied@gmail.com> wrote:
>>> On Tue, 23 Jan 2024 at 12:15, Steven Rostedt <rostedt@goodmis.org> wrote:
>>>> On Mon, 22 Jan 2024 19:56:08 -0500
>>>> "Bhardwaj, Rajneesh" <rajneesh.bhardwaj@amd.com> wrote:
>>>>
>>>>> On 1/22/2024 7:43 PM, Linus Torvalds wrote:
>>>>>> On Mon, 22 Jan 2024 at 15:17, Steven Rostedt<rostedt@goodmis.org> wrote:
>>>>>>> Perhaps this is the real fix?
>>>>>> If you send a signed-off version, I'll apply it asap.
>>>>>
>>>>> I think a fix might already be in flight. Please see Linux-Kernel
>>>>> Archive: Re: [PATCH] drm/ttm: fix ttm pool initialization for
>>>>> no-dma-device drivers (iu.edu)
>>>>> <https://lkml.iu.edu/hypermail/linux/kernel/2401.1/06778.html>
>>>> Please use lore links. They are much easier to follow and use.
>>> https://lore.kernel.org/dri-devel/20240123022015.1288588-1-airlied@gmail.com/T/#u
>>>
>>> should also fix it, Linus please apply it directly if Steven has a
>>> chance to give it a run.
>> I see Linus applied the other one, that's fine too.
>>
> They don't look mutually exclusive. I can test the other one as well.

While applying the fix a week ago I was under the impression that QXL
doesn't use a device structure because it doesn't have one and so can't
give anything meaningful for this parameter.

If QXL does have a device structure and can provide it I would rather
like to go down this route and make the device and with it the numa node
mandatory for drivers to specify.

Regards,
Christian.

>
> -- Steve
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4 2024-01-23 9:43 ` Christian König @ 2024-01-23 14:35 ` Steven Rostedt -1 siblings, 0 replies; 30+ messages in thread From: Steven Rostedt @ 2024-01-23 14:35 UTC (permalink / raw) To: Christian König Cc: Dave Airlie, Bhardwaj, Rajneesh, Linus Torvalds, LKML, Felix Kuehling, dri-devel, Fedor Pchelkin On Tue, 23 Jan 2024 10:43:04 +0100 Christian König <christian.koenig@amd.com> wrote: > While applying the fix a week ago I was under the impression that QXL > doesn't use a device structure because it doesn't have one and so can't > give anything meaningful for this parameter. > > If QXL does have a device structure and can provide it I would rather > like to go down this route and make the device and with it the numa node > mandatory for drivers to specify. Then at a minimum my original fix should be applied. Perhaps with a warning too. That is, I added at the beginning of that function: if (!dev) return -EINVAL; Could have that be: if (WARN_ON_ONCE(!dev)) return -EINVAL; In any case, it should not cause the system to crash. -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
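[ Note: the early-return guard proposed above can be sketched in userspace roughly as follows. This is only an illustration, not kernel code: `struct device` is reduced to one field, `warn_on_once()` is a hypothetical stand-in for the kernel's WARN_ON_ONCE() macro, and `mock_ttm_device_init()` models just the entry check, not the real ttm_device_init(). ]

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily reduced stand-in for the kernel's struct device. */
struct device {
	int numa_node;
};

/*
 * Userspace stand-in for WARN_ON_ONCE(): prints a message the first time
 * the condition is true and always returns the condition's truth value.
 * Unlike the real macro, the "once" state here is shared by all callers
 * and no stack trace is produced.
 */
static bool warn_on_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Models only the proposed entry check; the rest of the init path is omitted. */
static int mock_ttm_device_init(struct device *dev)
{
	if (warn_on_once(dev == NULL, "ttm_device_init: dev is NULL"))
		return -EINVAL;

	return 0;
}
```

With this shape a driver that passes NULL gets an error code and one warning instead of an oops, which matches the "should not cause the system to crash" point. The driver would still fail to initialize, so this is a safety net rather than a fix for QXL.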
* RE: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4 2024-01-22 23:19 ` Steven Rostedt @ 2024-01-23 1:06 ` Bhardwaj, Rajneesh -1 siblings, 0 replies; 30+ messages in thread From: Bhardwaj, Rajneesh @ 2024-01-23 1:06 UTC (permalink / raw) To: Steven Rostedt, LKML Cc: Kuehling, Felix, Linus Torvalds, Koenig, Christian, dri-devel@lists.freedesktop.org [AMD Official Use Only - General] -----Original Message----- From: Steven Rostedt <rostedt@goodmis.org> Sent: Monday, January 22, 2024 6:19 PM To: LKML <linux-kernel@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org>; Bhardwaj, Rajneesh <Rajneesh.Bhardwaj@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; dri-devel@lists.freedesktop.org Subject: Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4 On Mon, 22 Jan 2024 18:15:47 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > > ttm_pool_init(&bdev->pool, dev, dev_to_node(dev), use_dma_alloc, use_dma32); <<<------- BUG! > > > > Specifically, it appears that dev is NULL and dev_to_node() doesn't > > like having a NULL pointer passed to it. > > > > Yeah, that qxl_ttm_init() has: > > /* No others user of address space so set it to 0 */ > r = ttm_device_init(&qdev->mman.bdev, &qxl_bo_driver, NULL, > qdev->ddev.anon_inode->i_mapping, > qdev->ddev.vma_offset_manager, > false, false); > > Where that NULL is "dev"! > > Thus that will never work here. Perhaps this is the real fix? I think the fix might be already applied to drm misc. 
Please see, https://lkml.iu.edu/hypermail/linux/kernel/2401.1/06778.html -- Steve diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c index f5187b384ae9..bc217b4d6b04 100644 --- a/drivers/gpu/drm/ttm/ttm_device.c +++ b/drivers/gpu/drm/ttm/ttm_device.c @@ -215,7 +215,8 @@ int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *func ttm_sys_man_init(bdev); - ttm_pool_init(&bdev->pool, dev, dev_to_node(dev), use_dma_alloc, use_dma32); + ttm_pool_init(&bdev->pool, dev, dev ? dev_to_node(dev) : NUMA_NO_NODE, + use_dma_alloc, use_dma32); bdev->vma_manager = vma_manager; spin_lock_init(&bdev->lru_lock); ^ permalink raw reply related [flat|nested] 30+ messages in thread
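[ Note: the node selection in the patch above can be sketched in isolation as below. This is a userspace illustration under stated assumptions: `struct device` is reduced to a single field, `dev_to_node()` is a simplified mock of the kernel helper that reads through the pointer unconditionally, and `pool_node_for()` is a hypothetical name for the expression passed to ttm_pool_init(). NUMA_NO_NODE is defined as -1, the value the kernel uses. The faulting address in the oops (00000238, with EDX = 0) is consistent with a field read through a NULL `dev`. ]

```c
#include <stddef.h>

#define NUMA_NO_NODE (-1)	/* same value as the kernel's definition */

/* Hypothetical, heavily reduced stand-in for the kernel's struct device. */
struct device {
	int numa_node;
};

/* Simplified mock: dereferences dev without checking it, so NULL would fault. */
static int dev_to_node(const struct device *dev)
{
	return dev->numa_node;
}

/*
 * The expression from the patch: only call dev_to_node() when a device
 * was supplied, otherwise fall back to "no particular node".
 */
static int pool_node_for(const struct device *dev)
{
	return dev ? dev_to_node(dev) : NUMA_NO_NODE;
}
```

The fallback keeps drivers such as QXL, which pass NULL for `dev`, working as they did before the NUMA node was derived from the device.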
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4 2024-01-22 23:06 ` Steven Rostedt (?) (?) @ 2024-01-23 0:29 ` Bhardwaj, Rajneesh 2024-01-23 0:34 ` Steven Rostedt -1 siblings, 1 reply; 30+ messages in thread From: Bhardwaj, Rajneesh @ 2024-01-23 0:29 UTC (permalink / raw) To: Steven Rostedt, LKML Cc: Felix Kuehling, Linus Torvalds, Christian König, dri-devel [-- Attachment #1: Type: text/plain, Size: 4627 bytes --] On 1/22/2024 6:06 PM, Steven Rostedt wrote: > I just kicked off testing some patches on top of 6.8-rc1 and triggered this > immediately: > > [ note this happened on both my 32 bit an 64 bit test machines, this is > just the 32 bit output ] > > BUG: kernel NULL pointer dereference, address: 00000238 > #PF: supervisor read access in kernel mode > #PF: error_code(0x0000) - not-present page > *pdpt = 0000000000000000 *pde = f000ff53f000ff53 > Oops: 0000 [#1] PREEMPT SMP PTI > CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.8.0-rc1-test-00001-g2b44760609e9-dirty #1056 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 > Workqueue: events work_for_cpu_fn > EIP: ttm_device_init+0xb4/0x274 > Code: 86 10 09 00 00 83 c4 0c 85 c0 0f 84 96 01 00 00 8b 45 ac 8d 9e 94 00 00 00 89 46 08 89 f0 e8 27 05 00 00 8b 55 a8 0f b6 45 98 <8b> 8a 38 02 00 00 50 0f b6 45 9c 50 89 d8 e8 95 ee ff ff 8b 45 a0 > EAX: 00000000 EBX: c135a7e4 ECX: c135a7b0 EDX: 00000000 > ESI: c135a750 EDI: 0007bc1d EBP: c11d7e4c ESP: c11d7de4 > DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010246 > CR0: 80050033 CR2: 00000238 CR3: 145c4000 CR4: 000006f0 > Call Trace: > ? show_regs+0x4f/0x58 > ? __die+0x1d/0x58 > ? page_fault_oops+0x171/0x330 > ? lock_acquire+0xa4/0x280 > ? kernelmode_fixup_or_oops.constprop.0+0x7c/0xcc > ? __bad_area_nosemaphore.constprop.0+0x124/0x1b4 > ? __mutex_lock+0x17f/0xb00 > ? bad_area_nosemaphore+0xf/0x14 > ? do_user_addr_fault+0x140/0x3e4 > ? exc_page_fault+0x5b/0x1d8 > ? 
pvclock_clocksource_read_nowd+0x130/0x130 > ? handle_exception+0x133/0x133 > ? pvclock_clocksource_read_nowd+0x130/0x130 > ? ttm_device_init+0xb4/0x274 > ? pvclock_clocksource_read_nowd+0x130/0x130 > ? ttm_device_init+0xb4/0x274 > qxl_ttm_init+0x34/0x130 > qxl_bo_init+0xd/0x10 > qxl_device_init+0x52a/0x92c > qxl_pci_probe+0x91/0x1ac > local_pci_probe+0x3d/0x84 > work_for_cpu_fn+0x16/0x20 > process_one_work+0x1bc/0x4a0 > worker_thread+0x310/0x3a8 > kthread+0xea/0x110 > ? rescuer_thread+0x2f0/0x2f0 > ? kthread_complete_and_exit+0x1c/0x1c > ret_from_fork+0x34/0x4c > ? kthread_complete_and_exit+0x1c/0x1c > ret_from_fork_asm+0x12/0x18 > entry_INT80_32+0xf0/0xf0 > Modules linked in: > CR2: 0000000000000238 > ---[ end trace 0000000000000000 ]--- > > The crash happened here: > > int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *funcs, > struct device *dev, struct address_space *mapping, > struct drm_vma_offset_manager *vma_manager, > bool use_dma_alloc, bool use_dma32) > { > struct ttm_global *glob = &ttm_glob; > int ret; > > if (WARN_ON(vma_manager == NULL)) > return -EINVAL; > > ret = ttm_global_init(); > if (ret) > return ret; > > bdev->wq = alloc_workqueue("ttm", > WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 16); > if (!bdev->wq) { > ttm_global_release(); > return -ENOMEM; > } > > bdev->funcs = funcs; > > ttm_sys_man_init(bdev); > > ttm_pool_init(&bdev->pool, dev, dev_to_node(dev), use_dma_alloc, use_dma32); <<<------- BUG! > > Specifically, it appears that dev is NULL and dev_to_node() doesn't like > having a NULL pointer passed to it. > > I currently "fixed" this with a: > > if (!dev) > return -EINVAL; > > at the start of this function just so that I can continue running my tests, > but that is obviously incorrect. In one of my previous revisions of this patch when I was experimenting, I used something like below. Wonder if that could work in your case and/or in general. 
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 43e27ab77f95..4c3902b94be4 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -195,6 +195,7 @@ int ttm_device_init(struct ttm_device *bdev, struct ttm_device_funcs *funcs,
 		    bool use_dma_alloc, bool use_dma32)
 {
 	struct ttm_global *glob = &ttm_glob;
+	bool node_has_cpu = false;
 	int ret;

 	if (WARN_ON(vma_manager == NULL))
@@ -213,7 +214,12 @@ int ttm_device_init(struct ttm_device *bdev, struct ttm_device_funcs *funcs,
 	bdev->funcs = funcs;

 	ttm_sys_man_init(bdev);
-	ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc, use_dma32);
+
+	node_has_cpu = node_state(dev->numa_node, N_CPU);
+	if (node_has_cpu)
+		ttm_pool_init(&bdev->pool, dev, dev->numa_node, use_dma_alloc, use_dma32);
+	else
+		ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc, use_dma32);

 	bdev->vma_manager = vma_manager;
 	spin_lock_init(&bdev->lru_lock);

>
> -- Steve

[-- Attachment #2: Type: text/html, Size: 44220 bytes --]

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4 2024-01-23 0:29 ` Bhardwaj, Rajneesh @ 2024-01-23 0:34 ` Steven Rostedt 0 siblings, 0 replies; 30+ messages in thread From: Steven Rostedt @ 2024-01-23 0:34 UTC (permalink / raw) To: Bhardwaj, Rajneesh Cc: Felix Kuehling, Linus Torvalds, LKML, dri-devel, Christian König On Mon, 22 Jan 2024 19:29:41 -0500 "Bhardwaj, Rajneesh" <rajneesh.bhardwaj@amd.com> wrote: > > In one of my previous revisions of this patch when I was experimenting, > I used something like below. Wonder if that could work in your case > and/or in general. > > > diff --git a/drivers/gpu/drm/ttm/ttm_device.c > b/drivers/gpu/drm/ttm/ttm_device.c > > index 43e27ab77f95..4c3902b94be4 100644 > > --- a/drivers/gpu/drm/ttm/ttm_device.c > > +++ b/drivers/gpu/drm/ttm/ttm_device.c > > @@ -195,6 +195,7 @@ int ttm_device_init(struct ttm_device *bdev, struct > ttm_device_funcs *funcs, > > bool use_dma_alloc, bool use_dma32){ > > struct ttm_global *glob = &ttm_glob; > > +bool node_has_cpu = false; > > int ret; > > if (WARN_ON(vma_manager == NULL)) > > @@ -213,7 +214,12 @@ int ttm_device_init(struct ttm_device *bdev, struct > ttm_device_funcs *funcs, > > bdev->funcs = funcs; > > ttm_sys_man_init(bdev); > > -ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc, use_dma32); > > + > > +node_has_cpu = node_state(dev->numa_node, N_CPU); Considering that qxl_ttm_init() passes in dev = NULL, the above would blow up just the same. -- Steve > > +if (node_has_cpu) > > +ttm_pool_init(&bdev->pool, dev, dev->numa_node, use_dma_alloc, use_dma32); > > +else > > +ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc, > > +use_dma32); > > bdev->vma_manager = vma_manager; > > spin_lock_init(&bdev->lru_lock); > > > > > > -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4 2024-01-23 0:34 ` Steven Rostedt @ 2024-01-23 0:40 ` Bhardwaj, Rajneesh -1 siblings, 0 replies; 30+ messages in thread From: Bhardwaj, Rajneesh @ 2024-01-23 0:40 UTC (permalink / raw) To: Steven Rostedt Cc: LKML, Linus Torvalds, Felix Kuehling, Christian König, dri-devel On 1/22/2024 7:34 PM, Steven Rostedt wrote: > On Mon, 22 Jan 2024 19:29:41 -0500 > "Bhardwaj, Rajneesh" <rajneesh.bhardwaj@amd.com> wrote: > >> In one of my previous revisions of this patch when I was experimenting, >> I used something like below. Wonder if that could work in your case >> and/or in general. >> >> >> diff --git a/drivers/gpu/drm/ttm/ttm_device.c >> b/drivers/gpu/drm/ttm/ttm_device.c >> >> index 43e27ab77f95..4c3902b94be4 100644 >> >> --- a/drivers/gpu/drm/ttm/ttm_device.c >> >> +++ b/drivers/gpu/drm/ttm/ttm_device.c >> >> @@ -195,6 +195,7 @@ int ttm_device_init(struct ttm_device *bdev, struct >> ttm_device_funcs *funcs, >> >> bool use_dma_alloc, bool use_dma32){ >> >> struct ttm_global *glob = &ttm_glob; >> >> +bool node_has_cpu = false; >> >> int ret; >> >> if (WARN_ON(vma_manager == NULL)) >> >> @@ -213,7 +214,12 @@ int ttm_device_init(struct ttm_device *bdev, struct >> ttm_device_funcs *funcs, >> >> bdev->funcs = funcs; >> >> ttm_sys_man_init(bdev); >> >> -ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc, use_dma32); >> >> + >> >> +node_has_cpu = node_state(dev->numa_node, N_CPU); > Considering that qxl_ttm_init() passes in dev = NULL, the above would blow > up just the same. I agree, I think we need something like you suggested i.e. + ttm_pool_init(&bdev->pool, dev, dev ? dev_to_node(dev) : NUMA_NO_NODE, + use_dma_alloc, use_dma32); I am not quite sure if the above node_has_cpu change will be a better solution in general, along with the NULL pointer check as you suggested. If you prefer that, then I can send a fix otherwise, your fix looks good to me. 
> > -- Steve > > >> +if (node_has_cpu) >> >> +ttm_pool_init(&bdev->pool, dev, dev->numa_node, use_dma_alloc, use_dma32); >> >> +else >> >> +ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc, >> >> +use_dma32); >> >> bdev->vma_manager = vma_manager; >> >> spin_lock_init(&bdev->lru_lock); >> >> >>> -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
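[ Note: combining the NULL check with the node_has_cpu idea discussed above might look roughly like the sketch below. It is a userspace illustration only, not a proposed patch. `struct device` is reduced to one field, `node_has_cpu()` is a hypothetical stand-in for node_state(node, N_CPU) backed by a plain bitmask, and MAX_NODES and `nodes_with_cpu` are invented for the example. ]

```c
#include <stdbool.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)	/* same value as the kernel's definition */
#define MAX_NODES 8		/* hypothetical limit for this sketch */

/* Hypothetical, heavily reduced stand-in for the kernel's struct device. */
struct device {
	int numa_node;
};

/* Hypothetical topology: bit n set means node n has at least one CPU. */
static unsigned int nodes_with_cpu = 0x1;

/* Stand-in for node_state(node, N_CPU); out-of-range nodes count as CPU-less. */
static bool node_has_cpu(int node)
{
	if (node < 0 || node >= MAX_NODES)
		return false;
	return (nodes_with_cpu >> node) & 1u;
}

/*
 * Check dev before touching dev->numa_node, then keep the device's node
 * only if that node has CPUs; everything else maps to NUMA_NO_NODE.
 */
static int pool_node_for(const struct device *dev)
{
	if (!dev)
		return NUMA_NO_NODE;

	return node_has_cpu(dev->numa_node) ? dev->numa_node : NUMA_NO_NODE;
}
```

The ordering is the point: the NULL test has to come before any read of `dev->numa_node`, otherwise the QXL case faults exactly as the original code did.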