"no cuda-capable device" when running as non-root

Hi all,

We installed two Tesla C1060 cards on an Ubuntu 10.04.1 machine (server version), and are not able to run any programs.
When running CUDA example programs we are getting “no cuda-capable device” as non-root, and the deviceQuery program gets stuck (not outputting anything).

With sudo everything works fine. The problem seems like it should be easy to solve, but we are clueless.
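Since the same programs work under sudo, one way to narrow things down is to ask the CUDA driver directly how many devices it sees, once as the normal user and once with sudo. Below is a minimal Python sketch, assuming the driver library is installed as libcuda.so.1. It calls the driver API functions cuInit and cuDeviceGetCount through ctypes. The helper name cuda_device_count is hypothetical.

```python
import ctypes
import ctypes.util
from typing import Optional, Tuple

# CUresult values from the CUDA driver API (cuda.h).
CUDA_SUCCESS = 0
CUDA_ERROR_NO_DEVICE = 100


def cuda_device_count() -> Tuple[Optional[int], Optional[int]]:
    """Hypothetical helper: return (status, device_count) from the driver API.

    status is None if libcuda could not be loaded at all.
    device_count is None unless cuDeviceGetCount succeeded.
    """
    # Assumption: the driver library is findable as "cuda" or libcuda.so.1.
    name = ctypes.util.find_library("cuda") or "libcuda.so.1"
    try:
        lib = ctypes.CDLL(name)
        cu_init = lib.cuInit
        cu_device_get_count = lib.cuDeviceGetCount
    except (OSError, AttributeError):
        return None, None  # driver library missing or not the CUDA driver

    status = cu_init(0)  # must be called before any other driver API call
    if status != CUDA_SUCCESS:
        return status, None  # e.g. 100 == CUDA_ERROR_NO_DEVICE

    count = ctypes.c_int(0)
    status = cu_device_get_count(ctypes.byref(count))
    if status != CUDA_SUCCESS:
        return status, None
    return status, count.value


if __name__ == "__main__":
    status, count = cuda_device_count()
    if status is None:
        print("libcuda could not be loaded")
    elif count is None:
        print(f"cuInit/cuDeviceGetCount failed with CUresult {status}")
    else:
        print(f"{count} CUDA device(s) visible")
```

Run it as `python3 check_cuda.py` and then as `sudo python3 check_cuda.py`. If you get CUresult 100 as the user and a device count under sudo, the problem is in how the driver sees the devices for that user. The sample programs themselves are not the cause.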

The permissions of /dev/nv* are crw-rw-rw-, which seems right to us.
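To compare the device nodes between machines, it helps to print mode, owner and group in a form you can diff. Here is a small Python sketch. It assumes the NVIDIA nodes match /dev/nvidia*, which is narrower than /dev/nv* because on newer systems /dev/nv* also matches NVMe devices. The function name describe_device_nodes is hypothetical.

```python
import glob
import grp
import os
import pwd
import stat
from typing import List, Tuple


def describe_device_nodes(pattern: str = "/dev/nvidia*") -> List[Tuple[str, str, str, str]]:
    """Hypothetical helper: (path, mode string, owner, group) per matching node."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        st = os.stat(path)
        # stat.filemode renders st_mode the way `ls -l` does, e.g. "crw-rw-rw-".
        mode = stat.filemode(st.st_mode)
        try:
            owner = pwd.getpwuid(st.st_uid).pw_name
        except KeyError:
            owner = str(st.st_uid)  # uid with no passwd entry
        try:
            group = grp.getgrgid(st.st_gid).gr_name
        except KeyError:
            group = str(st.st_gid)  # gid with no group entry
        rows.append((path, mode, owner, group))
    return rows


if __name__ == "__main__":
    for path, mode, owner, group in describe_device_nodes():
        print(f"{mode} {owner}:{group} {path}")
```

Saving this output on a working machine and on the problem machine, then diffing the two files, does the same check the thread describes below.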

Does anyone have an idea?
Thanks!

Those permissions are probably incorrect. If your users are not members of the group of the /dev/nv* files, they will have no read or write access, which seems to be the symptom you are seeing.
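Whether group membership matters depends on the mode bits. For a node that is crw-rw-rw-, the "other" bits already grant read and write to users outside the owning group. The Python sketch below reports both facts, plus what the kernel says via os.access. The function name access_report is hypothetical, and os.access checks with the real uid/gid, which is the relevant one for a plain (non-setuid) program.

```python
import os
import stat
from typing import Dict


def access_report(path: str) -> Dict[str, bool]:
    """Hypothetical helper: how the current process relates to a node's permissions."""
    st = os.stat(path)
    # Supplementary groups of this process plus its effective gid.
    my_groups = set(os.getgroups()) | {os.getegid()}
    other_rw = stat.S_IROTH | stat.S_IWOTH
    return {
        "in_owning_group": st.st_gid in my_groups,
        # True when the "other" read and write bits are both set (rw- for others).
        "world_rw": (st.st_mode & other_rw) == other_rw,
        # os.access asks the kernel, using the real uid/gid.
        "can_read": os.access(path, os.R_OK),
        "can_write": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    for node in ("/dev/nvidiactl", "/dev/nvidia0"):
        if os.path.exists(node):
            print(node, access_report(node))
```

If can_read and can_write are both True for the normal user, file permissions are unlikely to explain a "no CUDA-capable device" error.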

Just remember that if you only change these files’ permissions, the machine will probably run /etc/init.d/nvidia-kernel on reboot and change the permissions back. So you will need to edit /etc/init.d/nvidia-kernel too.

Thanks for replying! We checked the permissions on machines that work properly. All /dev/nv* show identical ownership and permissions to the ones on the problem machine.

On our problematic machine we still tried what you suggested and made the user the owner of the /dev/nv* files. This didn’t help, so permissions are not the issue.

Are there any other suggestions?

Thanks again!

Is SELinux enabled? Try disabling it.
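A quick way to answer the SELinux question is to read the enforce flag from selinuxfs. Here is a minimal Python sketch. It assumes selinuxfs is mounted at /sys/fs/selinux, which is where current kernels put it. Older systems used /selinux, so the path is a parameter. The function name selinux_mode is hypothetical.

```python
import os
from typing import Optional

# Assumed selinuxfs location on current systems; older ones used /selinux.
SELINUX_ENFORCE = "/sys/fs/selinux/enforce"


def selinux_mode(enforce_path: str = SELINUX_ENFORCE) -> Optional[str]:
    """Hypothetical helper: "enforcing", "permissive", or None if not mounted."""
    if not os.path.exists(enforce_path):
        return None  # SELinux not enabled (or selinuxfs mounted elsewhere)
    with open(enforce_path) as f:
        value = f.read().strip()
    # The file contains "1" in enforcing mode and "0" in permissive mode.
    return "enforcing" if value == "1" else "permissive"


if __name__ == "__main__":
    print("SELinux:", selinux_mode() or "not enabled")
```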

SELinux is not installed.

I’m having the same issue. I’m on a Dell XPS 15 9510, which comes with an RTX 3050 Ti laptop GPU.
I am not running SELinux, and the permissions are the same as described here. I have tried explicitly resetting the permissions, with no effect.

Has anyone been able to resolve this since?

The following dump appears in the journal logs on boot:

Jan 12 13:33:03 dev kernel: nvidia: loading out-of-tree module taints kernel.
Jan 12 13:33:03 dev kernel: nvidia: module license 'NVIDIA' taints kernel.
Jan 12 13:33:03 dev kernel: Disabling lock debugging due to kernel taint
Jan 12 13:33:03 dev kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Jan 12 13:33:03 dev kernel: checking generic (4000000000 8ca000) vs hw (6162000000 1000000)
Jan 12 13:33:03 dev kernel: checking generic (4000000000 8ca000) vs hw (4000000000 10000000)
Jan 12 13:33:03 dev kernel: fb0: switching to inteldrmfb from EFI VGA
Jan 12 13:33:03 dev kernel: i915 0000:00:02.0: vgaarb: deactivate vga console
Jan 12 13:33:03 dev kernel: ------------[ cut here ]------------
Jan 12 13:33:03 dev kernel: Missing case (val == 65535)
Jan 12 13:33:03 dev kernel: WARNING: CPU: 13 PID: 260 at drivers/gpu/drm/i915/intel_dram.c:96 skl_dram_get_dimm_info+0x79/0x1b0 [i915]
Jan 12 13:33:03 dev kernel: Modules linked in: fjes(-) i915(+) rtsx_pci_sdmmc i2c_algo_bit crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel aesni_intel syscopyarea crypto_simd sysfillrect cryptd sysimgblt glue_helper fb_sys_fops psmouse cec rc_core nvme intel_lpss_pci rtsx_pci drm i2c_i801 intel_lpss thunderbolt(+) intel_ish_ipc(+) idma64 i2c_smbus xhci_pci intel_ishtp nvme_core virt_dma xhci_pci_renesas intel_pmt i2c_hid wmi hid video pinctrl_tigerlake
Jan 12 13:33:03 dev kernel: CPU: 13 PID: 260 Comm: systemd-udevd Tainted: P           OE     5.11.0-46-generic #51~20.04.1-Ubuntu
Jan 12 13:33:03 dev kernel: Hardware name: Dell Inc. XPS 15 9510/01V4T3, BIOS 1.6.2 11/13/2021
Jan 12 13:33:03 dev kernel: RIP: 0010:skl_dram_get_dimm_info+0x79/0x1b0 [i915]
Jan 12 13:33:03 dev kernel: Code: 66 3d 00 01 0f 84 13 01 00 00 41 81 e0 00 01 00 00 0f 84 06 01 00 00 48 c7 c6 11 09 a4 c0 48 c7 c7 15 09 a4 c0 e8 dd 2e 10 d9 <0f> 0b 45 0f b7 0c 24 41 b8 01 00 00 00 31 c0 31 ff 41 88 44 24 02
Jan 12 13:33:03 dev kernel: RSP: 0018:ffffa99040dbf8e0 EFLAGS: 00010282
Jan 12 13:33:03 dev kernel: RAX: 0000000000000000 RBX: 000000000000ffff RCX: c0000000ffffdfff
Jan 12 13:33:03 dev kernel: RDX: ffffa99040dbf6a8 RSI: 00000000ffffdfff RDI: 0000000000000247
Jan 12 13:33:03 dev kernel: RBP: ffffa99040dbf908 R08: 0000000000000000 R09: ffffa99040dbf6a0
Jan 12 13:33:03 dev kernel: R10: 0000000000000001 R11: 0000000000000001 R12: ffffa99040dbf94c
Jan 12 13:33:03 dev kernel: R13: 0000000000000000 R14: 000000000000004c R15: ffff8de79ef20000
Jan 12 13:33:03 dev kernel: FS:  00007f65f20b3880(0000) GS:ffff8deeef740000(0000) knlGS:0000000000000000
Jan 12 13:33:03 dev kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 12 13:33:03 dev kernel: CR2: 0000558bfc8d6e48 CR3: 0000000119d74005 CR4: 0000000000770ee0
Jan 12 13:33:03 dev kernel: PKRU: 55555554
Jan 12 13:33:03 dev kernel: Call Trace:
Jan 12 13:33:03 dev kernel:  skl_dram_get_channel_info+0x2e/0x150 [i915]
Jan 12 13:33:03 dev kernel:  intel_dram_detect+0x102/0x320 [i915]
Jan 12 13:33:03 dev kernel:  i915_driver_hw_probe+0x293/0x2e0 [i915]
Jan 12 13:33:03 dev kernel:  i915_driver_probe+0x3af/0x6f0 [i915]
Jan 12 13:33:03 dev kernel:  ? mutex_lock+0x13/0x40
Jan 12 13:33:03 dev kernel:  i915_pci_probe+0x5a/0x140 [i915]
Jan 12 13:33:03 dev kernel:  local_pci_probe+0x48/0x80
Jan 12 13:33:03 dev kernel:  pci_device_probe+0x10f/0x1c0
Jan 12 13:33:03 dev kernel:  really_probe+0x1fa/0x460
Jan 12 13:33:03 dev kernel:  driver_probe_device+0xe9/0x160
Jan 12 13:33:03 dev kernel:  device_driver_attach+0x5d/0x70
Jan 12 13:33:03 dev kernel:  __driver_attach+0x8f/0x150
Jan 12 13:33:03 dev kernel:  ? device_driver_attach+0x70/0x70
Jan 12 13:33:03 dev kernel:  bus_for_each_dev+0x7e/0xc0
Jan 12 13:33:03 dev kernel:  driver_attach+0x1e/0x20
Jan 12 13:33:03 dev kernel:  bus_add_driver+0x152/0x1f0
Jan 12 13:33:03 dev kernel:  driver_register+0x74/0xd0
Jan 12 13:33:03 dev kernel:  __pci_register_driver+0x54/0x60
Jan 12 13:33:03 dev kernel:  i915_init+0x66/0x86 [i915]
Jan 12 13:33:03 dev kernel:  ? 0xffffffffc0ab9000
Jan 12 13:33:03 dev kernel:  do_one_initcall+0x48/0x1d0
Jan 12 13:33:03 dev kernel:  ? _cond_resched+0x19/0x30
Jan 12 13:33:03 dev kernel:  ? kmem_cache_alloc_trace+0x37a/0x430
Jan 12 13:33:03 dev kernel:  ? do_init_module+0x28/0x260
Jan 12 13:33:03 dev kernel:  do_init_module+0x62/0x260
Jan 12 13:33:03 dev kernel:  load_module+0x11aa/0x1370
Jan 12 13:33:03 dev kernel:  ? security_kernel_post_read_file+0x5c/0x70
Jan 12 13:33:03 dev kernel:  ? security_kernel_post_read_file+0x5c/0x70
Jan 12 13:33:03 dev kernel:  __do_sys_finit_module+0xc2/0x120
Jan 12 13:33:03 dev kernel:  ? __do_sys_finit_module+0xc2/0x120
Jan 12 13:33:03 dev kernel:  __x64_sys_finit_module+0x1a/0x20
Jan 12 13:33:03 dev kernel:  do_syscall_64+0x38/0x90
Jan 12 13:33:03 dev kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jan 12 13:33:03 dev kernel: RIP: 0033:0x7f65f263589d
Jan 12 13:33:03 dev kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 f5 0c 00 f7 d8 64 89 01 48
Jan 12 13:33:03 dev kernel: RSP: 002b:00007fff4fe96de8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Jan 12 13:33:03 dev kernel: RAX: ffffffffffffffda RBX: 0000558bfc91fef0 RCX: 00007f65f263589d
Jan 12 13:33:03 dev kernel: RDX: 0000000000000000 RSI: 00007f65f2512ded RDI: 0000000000000011
Jan 12 13:33:03 dev kernel: RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
Jan 12 13:33:03 dev kernel: R10: 0000000000000011 R11: 0000000000000246 R12: 00007f65f2512ded
Jan 12 13:33:03 dev kernel: R13: 0000000000000000 R14: 0000558bfc923680 R15: 0000558bfc91fef0
Jan 12 13:33:03 dev kernel: ---[ end trace 24e096d976c9e04e ]---
Jan 12 13:33:03 dev kernel: ------------[ cut here ]------------
Jan 12 13:33:03 dev kernel: Missing case (val == 65535)
Jan 12 13:33:03 dev kernel: WARNING: CPU: 13 PID: 260 at drivers/gpu/drm/i915/intel_dram.c:96 skl_dram_get_dimm_info+0x79/0x1b0 [i915]
Jan 12 13:33:03 dev kernel: Modules linked in: fjes(-) i915(+) rtsx_pci_sdmmc i2c_algo_bit crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel aesni_intel syscopyarea crypto_simd sysfillrect cryptd sysimgblt glue_helper fb_sys_fops psmouse cec rc_core nvme intel_lpss_pci rtsx_pci drm i2c_i801 intel_lpss thunderbolt(+) intel_ish_ipc(+) idma64 i2c_smbus xhci_pci intel_ishtp nvme_core virt_dma xhci_pci_renesas intel_pmt i2c_hid wmi hid video pinctrl_tigerlake
Jan 12 13:33:03 dev kernel: CPU: 13 PID: 260 Comm: systemd-udevd Tainted: P        W  OE     5.11.0-46-generic #51~20.04.1-Ubuntu
Jan 12 13:33:03 dev kernel: Hardware name: Dell Inc. XPS 15 9510/01V4T3, BIOS 1.6.2 11/13/2021
Jan 12 13:33:03 dev kernel: RIP: 0010:skl_dram_get_dimm_info+0x79/0x1b0 [i915]
Jan 12 13:33:03 dev kernel: Code: 66 3d 00 01 0f 84 13 01 00 00 41 81 e0 00 01 00 00 0f 84 06 01 00 00 48 c7 c6 11 09 a4 c0 48 c7 c7 15 09 a4 c0 e8 dd 2e 10 d9 <0f> 0b 45 0f b7 0c 24 41 b8 01 00 00 00 31 c0 31 ff 41 88 44 24 02
Jan 12 13:33:03 dev kernel: RSP: 0018:ffffa99040dbf8e0 EFLAGS: 00010282
Jan 12 13:33:03 dev kernel: RAX: 0000000000000000 RBX: 000000000000ffff RCX: c0000000ffffdfff
Jan 12 13:33:03 dev kernel: RDX: ffffa99040dbf6a8 RSI: 00000000ffffdfff RDI: 0000000000000247
Jan 12 13:33:03 dev kernel: RBP: ffffa99040dbf908 R08: 0000000000000000 R09: ffffa99040dbf6a0
Jan 12 13:33:03 dev kernel: R10: 0000000000000001 R11: 0000000000000001 R12: ffffa99040dbf950
Jan 12 13:33:03 dev kernel: R13: 0000000000000000 R14: 0000000000000053 R15: ffff8de79ef20000
Jan 12 13:33:03 dev kernel: FS:  00007f65f20b3880(0000) GS:ffff8deeef740000(0000) knlGS:0000000000000000
Jan 12 13:33:03 dev kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 12 13:33:03 dev kernel: CR2: 0000558bfc8d6e48 CR3: 0000000119d74005 CR4: 0000000000770ee0
Jan 12 13:33:03 dev kernel: PKRU: 55555554
Jan 12 13:33:03 dev kernel: Call Trace:
Jan 12 13:33:03 dev kernel:  skl_dram_get_channel_info+0x45/0x150 [i915]
Jan 12 13:33:03 dev kernel:  intel_dram_detect+0x102/0x320 [i915]
Jan 12 13:33:03 dev kernel:  i915_driver_hw_probe+0x293/0x2e0 [i915]
Jan 12 13:33:03 dev kernel:  i915_driver_probe+0x3af/0x6f0 [i915]
Jan 12 13:33:03 dev kernel:  ? mutex_lock+0x13/0x40
Jan 12 13:33:03 dev kernel:  i915_pci_probe+0x5a/0x140 [i915]
Jan 12 13:33:03 dev kernel:  local_pci_probe+0x48/0x80
Jan 12 13:33:03 dev kernel:  pci_device_probe+0x10f/0x1c0
Jan 12 13:33:03 dev kernel:  really_probe+0x1fa/0x460
Jan 12 13:33:03 dev kernel:  driver_probe_device+0xe9/0x160
Jan 12 13:33:03 dev kernel:  device_driver_attach+0x5d/0x70
Jan 12 13:33:03 dev kernel:  __driver_attach+0x8f/0x150
Jan 12 13:33:03 dev kernel:  ? device_driver_attach+0x70/0x70
Jan 12 13:33:03 dev kernel:  bus_for_each_dev+0x7e/0xc0
Jan 12 13:33:03 dev kernel:  driver_attach+0x1e/0x20
Jan 12 13:33:03 dev kernel:  bus_add_driver+0x152/0x1f0
Jan 12 13:33:03 dev kernel:  driver_register+0x74/0xd0
Jan 12 13:33:03 dev kernel:  __pci_register_driver+0x54/0x60
Jan 12 13:33:03 dev kernel:  i915_init+0x66/0x86 [i915]
Jan 12 13:33:03 dev kernel:  ? 0xffffffffc0ab9000
Jan 12 13:33:03 dev kernel:  do_one_initcall+0x48/0x1d0
Jan 12 13:33:03 dev kernel:  ? _cond_resched+0x19/0x30
Jan 12 13:33:03 dev kernel:  ? kmem_cache_alloc_trace+0x37a/0x430
Jan 12 13:33:03 dev kernel:  ? do_init_module+0x28/0x260
Jan 12 13:33:03 dev kernel:  do_init_module+0x62/0x260
Jan 12 13:33:03 dev kernel:  load_module+0x11aa/0x1370
Jan 12 13:33:03 dev kernel:  ? security_kernel_post_read_file+0x5c/0x70
Jan 12 13:33:03 dev kernel:  ? security_kernel_post_read_file+0x5c/0x70
Jan 12 13:33:03 dev kernel:  __do_sys_finit_module+0xc2/0x120
Jan 12 13:33:03 dev kernel:  ? __do_sys_finit_module+0xc2/0x120
Jan 12 13:33:03 dev kernel:  __x64_sys_finit_module+0x1a/0x20
Jan 12 13:33:03 dev kernel:  do_syscall_64+0x38/0x90
Jan 12 13:33:03 dev kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jan 12 13:33:03 dev kernel: RIP: 0033:0x7f65f263589d
Jan 12 13:33:03 dev kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 f5 0c 00 f7 d8 64 89 01 48
Jan 12 13:33:03 dev kernel: RSP: 002b:00007fff4fe96de8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Jan 12 13:33:03 dev kernel: RAX: ffffffffffffffda RBX: 0000558bfc91fef0 RCX: 00007f65f263589d
Jan 12 13:33:03 dev kernel: RDX: 0000000000000000 RSI: 00007f65f2512ded RDI: 0000000000000011
Jan 12 13:33:03 dev kernel: RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
Jan 12 13:33:03 dev kernel: R10: 0000000000000011 R11: 0000000000000246 R12: 00007f65f2512ded
Jan 12 13:33:03 dev kernel: R13: 0000000000000000 R14: 0000558bfc923680 R15: 0000558bfc91fef0
Jan 12 13:33:03 dev kernel: ---[ end trace 24e096d976c9e04f ]---
Jan 12 13:33:03 dev kernel: ------------[ cut here ]------------
Jan 12 13:33:03 dev kernel: Missing case (val == 65535)
Jan 12 13:33:03 dev kernel: WARNING: CPU: 13 PID: 260 at drivers/gpu/drm/i915/intel_dram.c:96 skl_dram_get_dimm_info+0x79/0x1b0 [i915]
Jan 12 13:33:03 dev kernel: Modules linked in: fjes(-) i915(+) rtsx_pci_sdmmc i2c_algo_bit crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel aesni_intel syscopyarea crypto_simd sysfillrect cryptd sysimgblt glue_helper fb_sys_fops psmouse cec rc_core nvme intel_lpss_pci rtsx_pci drm i2c_i801 intel_lpss thunderbolt(+) intel_ish_ipc(+) idma64 i2c_smbus xhci_pci intel_ishtp nvme_core virt_dma xhci_pci_renesas intel_pmt i2c_hid wmi hid video pinctrl_tigerlake
Jan 12 13:33:03 dev kernel: CPU: 13 PID: 260 Comm: systemd-udevd Tainted: P        W  OE     5.11.0-46-generic #51~20.04.1-Ubuntu
Jan 12 13:33:03 dev kernel: Hardware name: Dell Inc. XPS 15 9510/01V4T3, BIOS 1.6.2 11/13/2021
Jan 12 13:33:03 dev kernel: RIP: 0010:skl_dram_get_dimm_info+0x79/0x1b0 [i915]
Jan 12 13:33:03 dev kernel: Code: 66 3d 00 01 0f 84 13 01 00 00 41 81 e0 00 01 00 00 0f 84 06 01 00 00 48 c7 c6 11 09 a4 c0 48 c7 c7 15 09 a4 c0 e8 dd 2e 10 d9 <0f> 0b 45 0f b7 0c 24 41 b8 01 00 00 00 31 c0 31 ff 41 88 44 24 02
Jan 12 13:33:03 dev kernel: RSP: 0018:ffffa99040dbf8e0 EFLAGS: 00010282
Jan 12 13:33:03 dev kernel: RAX: 0000000000000000 RBX: 000000000000ffff RCX: c0000000ffffdfff
Jan 12 13:33:03 dev kernel: RDX: ffffa99040dbf6a8 RSI: 00000000ffffdfff RDI: 0000000000000247
Jan 12 13:33:03 dev kernel: RBP: ffffa99040dbf908 R08: 0000000000000000 R09: ffffa99040dbf6a0
Jan 12 13:33:03 dev kernel: R10: 0000000000000001 R11: 0000000000000001 R12: ffffa99040dbf956
Jan 12 13:33:03 dev kernel: R13: 0000000000000001 R14: 000000000000004c R15: ffff8de79ef20000
Jan 12 13:33:03 dev kernel: FS:  00007f65f20b3880(0000) GS:ffff8deeef740000(0000) knlGS:0000000000000000
Jan 12 13:33:03 dev kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 12 13:33:03 dev kernel: CR2: 0000558bfc8d6e48 CR3: 0000000119d74005 CR4: 0000000000770ee0
Jan 12 13:33:03 dev kernel: PKRU: 55555554
Jan 12 13:33:03 dev kernel: Call Trace:
Jan 12 13:33:03 dev kernel:  skl_dram_get_channel_info+0x2e/0x150 [i915]
Jan 12 13:33:03 dev kernel:  intel_dram_detect+0x139/0x320 [i915]
Jan 12 13:33:03 dev kernel:  i915_driver_hw_probe+0x293/0x2e0 [i915]
Jan 12 13:33:03 dev kernel:  i915_driver_probe+0x3af/0x6f0 [i915]
Jan 12 13:33:03 dev kernel:  ? mutex_lock+0x13/0x40
Jan 12 13:33:03 dev kernel:  i915_pci_probe+0x5a/0x140 [i915]
Jan 12 13:33:03 dev kernel:  local_pci_probe+0x48/0x80
Jan 12 13:33:03 dev kernel:  pci_device_probe+0x10f/0x1c0
Jan 12 13:33:03 dev kernel:  really_probe+0x1fa/0x460
Jan 12 13:33:03 dev kernel:  driver_probe_device+0xe9/0x160
Jan 12 13:33:03 dev kernel:  device_driver_attach+0x5d/0x70
Jan 12 13:33:03 dev kernel:  __driver_attach+0x8f/0x150
Jan 12 13:33:03 dev kernel:  ? device_driver_attach+0x70/0x70
Jan 12 13:33:03 dev kernel:  bus_for_each_dev+0x7e/0xc0
Jan 12 13:33:03 dev kernel:  driver_attach+0x1e/0x20
Jan 12 13:33:03 dev kernel:  bus_add_driver+0x152/0x1f0
Jan 12 13:33:03 dev kernel:  driver_register+0x74/0xd0
Jan 12 13:33:03 dev kernel:  __pci_register_driver+0x54/0x60
Jan 12 13:33:03 dev kernel:  i915_init+0x66/0x86 [i915]
Jan 12 13:33:03 dev kernel:  ? 0xffffffffc0ab9000
Jan 12 13:33:03 dev kernel:  do_one_initcall+0x48/0x1d0
Jan 12 13:33:03 dev kernel:  ? _cond_resched+0x19/0x30
Jan 12 13:33:03 dev kernel:  ? kmem_cache_alloc_trace+0x37a/0x430
Jan 12 13:33:03 dev kernel:  ? do_init_module+0x28/0x260
Jan 12 13:33:03 dev kernel:  do_init_module+0x62/0x260
Jan 12 13:33:03 dev kernel:  load_module+0x11aa/0x1370
Jan 12 13:33:03 dev kernel:  ? security_kernel_post_read_file+0x5c/0x70
Jan 12 13:33:03 dev kernel:  ? security_kernel_post_read_file+0x5c/0x70
Jan 12 13:33:03 dev kernel:  __do_sys_finit_module+0xc2/0x120
Jan 12 13:33:03 dev kernel:  ? __do_sys_finit_module+0xc2/0x120
Jan 12 13:33:03 dev kernel:  __x64_sys_finit_module+0x1a/0x20
Jan 12 13:33:03 dev kernel:  do_syscall_64+0x38/0x90
Jan 12 13:33:03 dev kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jan 12 13:33:03 dev kernel: RIP: 0033:0x7f65f263589d
Jan 12 13:33:03 dev kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 f5 0c 00 f7 d8 64 89 01 48
Jan 12 13:33:03 dev kernel: RSP: 002b:00007fff4fe96de8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Jan 12 13:33:03 dev kernel: RAX: ffffffffffffffda RBX: 0000558bfc91fef0 RCX: 00007f65f263589d
Jan 12 13:33:03 dev kernel: RDX: 0000000000000000 RSI: 00007f65f2512ded RDI: 0000000000000011
Jan 12 13:33:03 dev kernel: RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
Jan 12 13:33:03 dev kernel: R10: 0000000000000011 R11: 0000000000000246 R12: 00007f65f2512ded
Jan 12 13:33:03 dev kernel: R13: 0000000000000000 R14: 0000558bfc923680 R15: 0000558bfc91fef0
Jan 12 13:33:03 dev kernel: ---[ end trace 24e096d976c9e050 ]---
Jan 12 13:33:03 dev kernel: ------------[ cut here ]------------
Jan 12 13:33:03 dev kernel: Missing case (val == 65535)
Jan 12 13:33:03 dev kernel: WARNING: CPU: 13 PID: 260 at drivers/gpu/drm/i915/intel_dram.c:96 skl_dram_get_dimm_info+0x79/0x1b0 [i915]
Jan 12 13:33:03 dev kernel: Modules linked in: fjes(-) i915(+) rtsx_pci_sdmmc i2c_algo_bit crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel aesni_intel syscopyarea crypto_simd sysfillrect cryptd sysimgblt glue_helper fb_sys_fops psmouse cec rc_core nvme intel_lpss_pci rtsx_pci drm i2c_i801 intel_lpss thunderbolt(+) intel_ish_ipc(+) idma64 i2c_smbus xhci_pci intel_ishtp nvme_core virt_dma xhci_pci_renesas intel_pmt i2c_hid wmi hid video pinctrl_tigerlake
Jan 12 13:33:03 dev kernel: CPU: 13 PID: 260 Comm: systemd-udevd Tainted: P        W  OE     5.11.0-46-generic #51~20.04.1-Ubuntu
Jan 12 13:33:03 dev kernel: Hardware name: Dell Inc. XPS 15 9510/01V4T3, BIOS 1.6.2 11/13/2021
Jan 12 13:33:03 dev kernel: RIP: 0010:skl_dram_get_dimm_info+0x79/0x1b0 [i915]
Jan 12 13:33:03 dev kernel: Code: 66 3d 00 01 0f 84 13 01 00 00 41 81 e0 00 01 00 00 0f 84 06 01 00 00 48 c7 c6 11 09 a4 c0 48 c7 c7 15 09 a4 c0 e8 dd 2e 10 d9 <0f> 0b 45 0f b7 0c 24 41 b8 01 00 00 00 31 c0 31 ff 41 88 44 24 02
Jan 12 13:33:03 dev kernel: RSP: 0018:ffffa99040dbf8e0 EFLAGS: 00010282
Jan 12 13:33:03 dev kernel: RAX: 0000000000000000 RBX: 000000000000ffff RCX: c0000000ffffdfff
Jan 12 13:33:03 dev kernel: RDX: ffffa99040dbf6a8 RSI: 00000000ffffdfff RDI: 0000000000000247
Jan 12 13:33:03 dev kernel: RBP: ffffa99040dbf908 R08: 0000000000000000 R09: ffffa99040dbf6a0
Jan 12 13:33:03 dev kernel: R10: 0000000000000001 R11: 0000000000000001 R12: ffffa99040dbf95a
Jan 12 13:33:03 dev kernel: R13: 0000000000000001 R14: 0000000000000053 R15: ffff8de79ef20000
Jan 12 13:33:03 dev kernel: FS:  00007f65f20b3880(0000) GS:ffff8deeef740000(0000) knlGS:0000000000000000
Jan 12 13:33:03 dev kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 12 13:33:03 dev kernel: CR2: 0000558bfc8d6e48 CR3: 0000000119d74005 CR4: 0000000000770ee0
Jan 12 13:33:03 dev kernel: PKRU: 55555554
Jan 12 13:33:03 dev kernel: Call Trace:
Jan 12 13:33:03 dev kernel:  skl_dram_get_channel_info+0x45/0x150 [i915]
Jan 12 13:33:03 dev kernel:  intel_dram_detect+0x139/0x320 [i915]
Jan 12 13:33:03 dev kernel:  i915_driver_hw_probe+0x293/0x2e0 [i915]
Jan 12 13:33:03 dev kernel:  i915_driver_probe+0x3af/0x6f0 [i915]
Jan 12 13:33:03 dev kernel:  ? mutex_lock+0x13/0x40
Jan 12 13:33:03 dev kernel:  i915_pci_probe+0x5a/0x140 [i915]
Jan 12 13:33:03 dev kernel:  local_pci_probe+0x48/0x80
Jan 12 13:33:03 dev kernel:  pci_device_probe+0x10f/0x1c0
Jan 12 13:33:03 dev kernel:  really_probe+0x1fa/0x460
Jan 12 13:33:03 dev kernel:  driver_probe_device+0xe9/0x160
Jan 12 13:33:03 dev kernel:  device_driver_attach+0x5d/0x70
Jan 12 13:33:03 dev kernel:  __driver_attach+0x8f/0x150
Jan 12 13:33:03 dev kernel:  ? device_driver_attach+0x70/0x70
Jan 12 13:33:03 dev kernel:  bus_for_each_dev+0x7e/0xc0
Jan 12 13:33:03 dev kernel:  driver_attach+0x1e/0x20
Jan 12 13:33:03 dev kernel:  bus_add_driver+0x152/0x1f0
Jan 12 13:33:03 dev kernel:  driver_register+0x74/0xd0
Jan 12 13:33:03 dev kernel:  __pci_register_driver+0x54/0x60
Jan 12 13:33:03 dev kernel:  i915_init+0x66/0x86 [i915]
Jan 12 13:33:03 dev kernel:  ? 0xffffffffc0ab9000
Jan 12 13:33:03 dev kernel:  do_one_initcall+0x48/0x1d0
Jan 12 13:33:03 dev kernel:  ? _cond_resched+0x19/0x30
Jan 12 13:33:03 dev kernel:  ? kmem_cache_alloc_trace+0x37a/0x430
Jan 12 13:33:03 dev kernel:  ? do_init_module+0x28/0x260
Jan 12 13:33:03 dev kernel:  do_init_module+0x62/0x260
Jan 12 13:33:03 dev kernel:  load_module+0x11aa/0x1370
Jan 12 13:33:03 dev kernel:  ? security_kernel_post_read_file+0x5c/0x70
Jan 12 13:33:03 dev kernel:  ? security_kernel_post_read_file+0x5c/0x70
Jan 12 13:33:03 dev kernel:  __do_sys_finit_module+0xc2/0x120
Jan 12 13:33:03 dev kernel:  ? __do_sys_finit_module+0xc2/0x120
Jan 12 13:33:03 dev kernel:  __x64_sys_finit_module+0x1a/0x20
Jan 12 13:33:03 dev kernel:  do_syscall_64+0x38/0x90
Jan 12 13:33:03 dev kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jan 12 13:33:03 dev kernel: RIP: 0033:0x7f65f263589d
Jan 12 13:33:03 dev kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 f5 0c 00 f7 d8 64 89 01 48
Jan 12 13:33:03 dev kernel: RSP: 002b:00007fff4fe96de8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Jan 12 13:33:03 dev kernel: RAX: ffffffffffffffda RBX: 0000558bfc91fef0 RCX: 00007f65f263589d
Jan 12 13:33:03 dev kernel: RDX: 0000000000000000 RSI: 00007f65f2512ded RDI: 0000000000000011
Jan 12 13:33:03 dev kernel: RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
Jan 12 13:33:03 dev kernel: R10: 0000000000000011 R11: 0000000000000246 R12: 00007f65f2512ded
Jan 12 13:33:03 dev kernel: R13: 0000000000000000 R14: 0000558bfc923680 R15: 0000558bfc91fef0
Jan 12 13:33:03 dev kernel: ---[ end trace 24e096d976c9e051 ]---
Jan 12 13:33:03 dev kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 511
Jan 12 13:33:03 dev kernel: 
Jan 12 13:33:03 dev kernel: nvidia 0000:01:00.0: enabling device (0006 -> 0007)
Jan 12 13:33:03 dev kernel: i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem

OK, so I figured it out. In my .bashrc, left over from a previous project, I had set CUDA_VISIBLE_DEVICES, so the device was not visible to the application. When running as root the variable was not set, so it worked.

Unsetting it for the user account made everything work correctly.
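CUDA_VISIBLE_DEVICES controls which GPUs CUDA exposes to a process. As we understand the CUDA documentation, the list is cut off at the first invalid entry. That means an empty value, or an index such as -1, hides every device and produces exactly this "no CUDA-capable device" error. sudo on a default Ubuntu setup resets the environment, which is why the root run did not see the variable. The Python sketch below models that rule for plain integer indices only. UUID and MIG forms, and duplicate indices, are not modelled. The function name visible_devices and the "2 physical GPUs" in the demo are hypothetical.

```python
import os
from typing import List, Optional


def visible_devices(env_value: Optional[str], physical_count: int) -> List[int]:
    """Hypothetical model of which physical GPU indices CUDA exposes for a
    given CUDA_VISIBLE_DEVICES value (integer indices only)."""
    if env_value is None:
        return list(range(physical_count))  # unset: every device is visible
    visible = []
    for token in env_value.split(","):
        token = token.strip()
        try:
            index = int(token)
        except ValueError:
            break  # empty or non-integer token: treated as invalid here
        if not 0 <= index < physical_count:
            break  # invalid index: it and everything after it are hidden
        visible.append(index)
    return visible


if __name__ == "__main__":
    value = os.environ.get("CUDA_VISIBLE_DEVICES")
    print(f"CUDA_VISIBLE_DEVICES={value!r}")
    # Assumption for the demo: a machine with 2 physical GPUs.
    print("visible with 2 physical GPUs:", visible_devices(value, 2))
```

For example, visible_devices("", 2) and visible_devices("-1", 2) both return an empty list, while visible_devices(None, 2) returns [0, 1]. In the shell, `unset CUDA_VISIBLE_DEVICES` plus removing the export from ~/.bashrc restores the default of exposing all devices.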