Graphics driver 470 works fine, but CUDA 10.1 fails with 'cudaGetDeviceCount returned 802'

2022-05-01T22:00:00Z

After a good deal of investigation, reading, and testing of different hardware and software combinations, I am still blocked and would welcome your advice and help in resolving the issue.

The goal is to use a Quadro K420 (or a later card, if one works) in a dual Opteron 2435 system running openSUSE 15.3 x86_64 with the respective NVIDIA repositories and packages, for the CUDA samples/demo apps and for self-study of machine learning (with Python) with CUDA acceleration.

Short summary: moving the very same (SSD) disk and Quadro K420 card to a Xeon/Intel-based system works fine, for both graphics and CUDA support.

  • Different OS: Lubuntu 20.04 + driver 470 + cuda-10-1 fails on the AMD boards and works on the Intel board.
  • Other AMD boards with an MCP55-type chipset exhibit the same issue (fail).
  • Later CUDA versions show the same issue (fail on the AMD board).

My assumption is that cuInit() (CUDA_ERROR_SYSTEM_NOT_READY = 802, from /include/cuda.h) runs into a hardware-configuration or driver/firmware-related issue that prevents initialization of the NVIDIA cards on the Opteron board (NVIDIA MCP55 chipset).
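To test that assumption without the runtime library (libcudart) in between, cuInit() can be called directly from the driver library. The following is only a minimal sketch: it assumes the driver library is installed under the name libcuda.so.1, and the helper name probe_cuinit is made up for illustration.

```python
import ctypes


def probe_cuinit(libname="libcuda.so.1"):
    """Call cuInit(0) from the CUDA driver library.

    Returns (code, name), or None if the library cannot be loaded.
    The default library name is an assumption about the installation.
    """
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None

    # CUresult cuInit(unsigned int Flags)
    lib.cuInit.argtypes = [ctypes.c_uint]
    lib.cuInit.restype = ctypes.c_int
    code = lib.cuInit(0)

    # CUresult cuGetErrorName(CUresult error, const char **pStr)
    lib.cuGetErrorName.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)]
    lib.cuGetErrorName.restype = ctypes.c_int
    name = ctypes.c_char_p()
    if lib.cuGetErrorName(code, ctypes.byref(name)) == 0 and name.value:
        return code, name.value.decode()
    return code, "unknown"


if __name__ == "__main__":
    result = probe_cuinit()
    if result is None:
        print("driver library could not be loaded")
    else:
        print("cuInit returned %d (%s)" % result)
```

If this also reports 802 on the AMD board, the failure is already in the driver API and not specific to the runtime or to deviceQuery.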

The graphics driver version 470 (x11-video-nvidiaG05 470.103.01*) installs and works fine for X11 display (from the nvidia repo). cuda-10-1 supports the Quadro K420 and installs and compiles fine (from the cuda repo), and the kernel header files match the installed kernel version (with the latest updates). (nvidia repo: /download.nvidia.com/opensuse/leap/15.3 , cuda repo: Index of /compute/cuda/repos/opensuse15/x86_64)

The Nvidia devices load and initialize correctly in all cases:

# l /dev/nvi*
crw-rw----+ 1 root video 195,   0  2. Mai 14:16 /dev/nvidia0
crw-rw----+ 1 root video 195, 255  2. Mai 14:16 /dev/nvidiactl
crw-rw----+ 1 root video 195, 254  2. Mai 14:16 /dev/nvidia-modeset
crw-rw-rw-+ 1 root root  238,   0  2. Mai 14:16 /dev/nvidia-uvm
crw-rw-rw-+ 1 root root  238,   1  2. Mai 14:16 /dev/nvidia-uvm-tools

/dev/nvidia-caps:
total 0
drwxr-xr-x  2 root root     80  2. Mai 14:53 ./
drwxr-xr-x 21 root root   4280  2. Mai 14:53 ../
cr--------  1 root root 241, 1  2. Mai 14:53 nvidia-cap1
cr--r--r--  1 root root 241, 2  2. Mai 14:53 nvidia-cap2

# l /proc/driver/
nvidia/  nvidia-caps/ nvidia-nvlink/ nvidia-nvswitch/ nvidia-uvm/

nvidia-smi and nvidia-settings both work fine in all configurations.

deviceQuery (same for precompiled version) returns:
---
/usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 802
-> system not yet initialized
Result = FAIL
---

Running deviceQuery under strace shows a difference between the AMD and Intel boards. On the AMD board, deviceQuery fails during the ioctl() calls on /dev/nvidiactl (it never goes on to open /dev/nvidia0), which then causes cudaGetDeviceCount to fail:

[... lines removed ...]
openat(AT_FDCWD, "/dev/nvidiactl", O_RDWR) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd2, 0x48), 0x7ffc9ad6b540) = 0
openat(AT_FDCWD, "/sys/devices/system/memory/block_size_bytes", O_RDONLY) = 4
read(4, "80000000\n", 99)               = 9
close(4)                                = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd6, 0x8), 0x7ffc9ad6b540) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xc8, 0x900), 0x7fa257eec220) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2b, 0x20), 0x7ffc9ad6b540) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffc9ad6b470) = 0
openat(AT_FDCWD, "/proc/self/status", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "Name:\tdeviceQuery\nUmask:\t0022\nSt"..., 1024) = 1024
read(4, ",00000000,00000000,00000000,0000"..., 1024) = 297
close(4)                                = 0
openat(AT_FDCWD, "/sys/devices/system/node", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents64(4, /* 11 entries */, 32768)  = 360
openat(AT_FDCWD, "/sys/devices/system/node/node0/cpumap", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
read(5, "fff\n", 4096)                  = 4
close(5)                                = 0
getdents64(4, /* 0 entries */, 32768)   = 0
close(4)                                = 0
futex(0x7fa257edb890, FUTEX_WAKE_PRIVATE, 2147483647) = 0
get_mempolicy([MPOL_DEFAULT], [000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000], 1024, NULL, 0) = 0
openat(AT_FDCWD, "/proc/modules", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "af_packet 53248 2 - Live 0xfffff"..., 1024) = 1024
read(4, "e 0xffffffffc0c59000\nnf_conntrac"..., 1024) = 1024
read(4, "ables 53248 11 ip6table_mangle,i"..., 1024) = 1024
close(4)                                = 0
openat(AT_FDCWD, "/proc/devices", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "Character devices:\n  1 mem\n  4 /"..., 1024) = 583
close(4)                                = 0
stat("/dev/nvidia-uvm", {st_mode=S_IFCHR|0666, st_rdev=makedev(0xef, 0), ...}) = 0
stat("/dev/nvidia-uvm-tools", {st_mode=S_IFCHR|0666, st_rdev=makedev(0xef, 0x1), ...}) = 0

          openat(AT_FDCWD, "/dev/nvidia-uvm", O_RDWR|O_CLOEXEC) = 4
          fcntl(4, F_GETFD)                       = 0x1 (flags FD_CLOEXEC)
          ioctl(4, _IOC(_IOC_NONE, 0, 0x1, 0x3000), 0x7ffc9ad6b590) = 0
          ioctl(4, _IOC(_IOC_NONE, 0, 0x27, 0), 0x7ffc9ad6b5a0) = 0
          ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffc9ad6a310) = 0
          ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x29, 0x10), 0x7ffc9ad6b580) = 0
          close(3)                                = 0
          ioctl(4, _IOC(_IOC_NONE, 0, 0x2, 0x3000), 0) = 0
          close(4)                                = 0

munmap(0x7fa2565c3000, 26448520)        = 0
futex(0x673c90, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(1, "./deviceQuery Starting...\n\n CUDA"..., 169./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 802
-> system not yet initialized
Result = FAIL
) = 169
exit_group(1)                           = ?
+++ exited with 1 +++
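For anyone who wants to match these ioctl lines against driver sources: strace prints unknown requests as _IOC(dir, type, nr, size), and the raw 32-bit request number can be rebuilt from those fields. The sketch below assumes the usual x86_64 Linux layout (nr in bits 0-7, type in bits 8-15, size in bits 16-29, direction in bits 30-31); the function name decode_ioctl is made up for illustration.

```python
import re

# Matches e.g.: ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffc...) = 0
IOC_RE = re.compile(
    r"ioctl\((\d+), _IOC\(([A-Z_|]+), "
    r"(0x[0-9a-f]+|\d+), (0x[0-9a-f]+|\d+), (0x[0-9a-f]+|\d+)\)"
)

# Direction bit values on x86_64 Linux.
DIR_BITS = {"_IOC_NONE": 0, "_IOC_WRITE": 1, "_IOC_READ": 2}


def decode_ioctl(line):
    """Return (fd, request_number) for one strace ioctl line, or None."""
    m = IOC_RE.search(line)
    if m is None:
        return None
    fd = int(m.group(1))
    direction = 0
    for flag in m.group(2).split("|"):
        direction |= DIR_BITS[flag]
    ioc_type, nr, size = (int(m.group(i), 0) for i in (3, 4, 5))
    request = (direction << 30) | (size << 16) | (ioc_type << 8) | nr
    return fd, request


if __name__ == "__main__":
    sample = "ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffc9ad6a310) = 0"
    fd, request = decode_ioctl(sample)
    print("fd=%d request=0x%08X" % (fd, request))
```

Note that a return value of 0 here only means the system call itself succeeded; any driver-level status is passed back inside the argument structure, which strace does not decode.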

The strace on the Intel board shows a successful run:

[... lines removed ...]
openat(AT_FDCWD, "/proc/driver/nvidia/params", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(3, "ResmanDebugLevel: 4294967295\nRmL"..., 1024) = 805
close(3)                                = 0
stat("/dev/nvidiactl", {st_mode=S_IFCHR|0660, st_rdev=makedev(0xc3, 0xff), ...}) = 0
openat(AT_FDCWD, "/dev/nvidiactl", O_RDWR) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd2, 0x48), 0x7ffe62133b90) = 0
openat(AT_FDCWD, "/sys/devices/system/memory/block_size_bytes", O_RDONLY) = 4
read(4, "80000000\n", 99)               = 9
close(4)                                = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd6, 0x8), 0x7ffe62133b90) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xc8, 0x900), 0x7f6b4c9ab220) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2b, 0x20), 0x7ffe62133b90) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ac0) = 0
openat(AT_FDCWD, "/proc/self/status", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "Name:\tdeviceQuery\nUmask:\t0022\nSt"..., 1024) = 1024
read(4, "0000,00000000,00000000,00000000,"..., 1024) = 312
close(4)                                = 0
openat(AT_FDCWD, "/sys/devices/system/node", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents64(4, /* 12 entries */, 32768)  = 392
openat(AT_FDCWD, "/sys/devices/system/node/node0/cpumap", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
read(5, "003ff\n", 4096)                = 6
close(5)                                = 0
openat(AT_FDCWD, "/sys/devices/system/node/node1/cpumap", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
read(5, "ffc00\n", 4096)                = 6
close(5)                                = 0
getdents64(4, /* 0 entries */, 32768)   = 0
close(4)                                = 0
futex(0x7f6b4c99a890, FUTEX_WAKE_PRIVATE, 2147483647) = 0
get_mempolicy([MPOL_DEFAULT], [000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000], 1024, NULL, 0) = 0
openat(AT_FDCWD, "/proc/modules", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "af_packet 53248 2 - Live 0xfffff"..., 1024) = 1024
read(4, "ve 0xffffffffc0d3e000\nnf_conntra"..., 1024) = 1024
read(4, "bles 53248 11 ip6table_mangle,ip"..., 1024) = 1024
read(4, "ve 0xffffffffc0bde000\nkvm_intel "..., 1024) = 1024
close(4)                                = 0
openat(AT_FDCWD, "/proc/devices", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "Character devices:\n  1 mem\n  4 /"..., 1024) = 611
close(4)                                = 0
stat("/dev/nvidia-uvm", {st_mode=S_IFCHR|0666, st_rdev=makedev(0xed, 0), ...}) = 0
stat("/dev/nvidia-uvm-tools", {st_mode=S_IFCHR|0666, st_rdev=makedev(0xed, 0x1), ...}) = 0

    openat(AT_FDCWD, "/dev/nvidia-uvm", O_RDWR|O_CLOEXEC) = 4
    fcntl(4, F_GETFD)                       = 0x1 (flags FD_CLOEXEC)
    ioctl(4, _IOC(_IOC_NONE, 0, 0x1, 0x3000), 0x7ffe62133be0) = 0
    ioctl(4, _IOC(_IOC_NONE, 0, 0x27, 0), 0x7ffe62133bf0) = 0
    ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132960) = 0
    ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132860) = 0
    ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621320d0) = 0
    ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132960) = 0

openat(AT_FDCWD, "/proc/driver/nvidia/params", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(5, "ResmanDebugLevel: 4294967295\nRmL"..., 1024) = 805
close(5)                                = 0
stat("/dev/nvidia0", {st_mode=S_IFCHR|0660, st_rdev=makedev(0xc3, 0), ...}) = 0
openat(AT_FDCWD, "/dev/nvidia0", O_RDWR|O_CLOEXEC) = 5
fcntl(5, F_GETFD)                       = 0x1 (flags FD_CLOEXEC)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132950) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132770) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132910) = 0
openat(AT_FDCWD, "/proc/driver/nvidia/params", O_RDONLY) = 6
fstat(6, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(6, "ResmanDebugLevel: 4294967295\nRmL"..., 1024) = 805
close(6)                                = 0
stat("/dev/nvidia0", {st_mode=S_IFCHR|0660, st_rdev=makedev(0xc3, 0), ...}) = 0
openat(AT_FDCWD, "/dev/nvidia0", O_RDWR|O_CLOEXEC) = 6
fcntl(6, F_GETFD)                       = 0x1 (flags FD_CLOEXEC)
ioctl(6, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xc9, 0x4), 0x7ffe62132960) = 0
ioctl(6, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd7, 0x228), 0x7ffe621326f0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2b, 0x28), 0x7ffe62132a50) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132960) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132960) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132960) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132960) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132820) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621326d0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132870) = 0
openat(AT_FDCWD, "/proc/driver/nvidia/params", O_RDONLY) = 7
fstat(7, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(7, "ResmanDebugLevel: 4294967295\nRmL"..., 1024) = 805
close(7)                                = 0
stat("/dev/nvidia0", {st_mode=S_IFCHR|0660, st_rdev=makedev(0xc3, 0), ...}) = 0
openat(AT_FDCWD, "/dev/nvidia0", O_RDWR|O_CLOEXEC) = 7
fcntl(7, F_GETFD)                       = 0x1 (flags FD_CLOEXEC)
ioctl(7, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xc9, 0x4), 0x7ffe621328c0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2b, 0x28), 0x7ffe62132980) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132940) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132860) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621324e0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132890) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327f0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621326b0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621326b0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621326b0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621326b0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132470) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327f0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327f0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327f0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327f0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132770) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327f0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327f0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132620) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327d0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327d0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327c0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327d0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327d0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62130f80) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132860) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132840) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132850) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2b, 0x28), 0x7ffe62132970) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2b, 0x28), 0x7ffe62132970) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132880) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327e0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327c0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621328b0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327b0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133590) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133590) = 0
mmap(NULL, 172032, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6b4df32000
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62131c90) = 0
mmap(NULL, 659456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6b4de91000
sysinfo({uptime=12427, loads=[0, 0, 0], totalram=135146749952, freeram=133223190528, sharedram=20606976, bufferram=56446976, totalswap=4294959104, freeswap=4294959104, procs=368, totalhigh=0, freehigh=0, mem_unit=1}) = 0
uname({sysname="Linux", nodename="dx", ...}) = 0
ioctl(4, _IOC(_IOC_NONE, 0, 0x25, 0), 0x7ffe62133bb0) = 0
ioctl(4, _IOC(_IOC_NONE, 0, 0x17, 0), 0x7ffe62133c50) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x4a, 0xb8), 0x7ffe621339c0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x4a, 0xb8), 0x7ffe62133b00) = 0
sysinfo({uptime=12427, loads=[0, 0, 0], totalram=135146749952, freeram=133196439552, sharedram=20606976, bufferram=56446976, totalswap=4294959104, freeswap=4294959104, procs=372, totalhigh=0, freehigh=0, mem_unit=1}) = 0
prlimit64(0, RLIMIT_AS, NULL, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}) = 0
mmap(0x200000000, 141733920768, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x200000000
mmap(0x2300000000, 8589934592, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2300000000
mmap(NULL, 536866816, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6b2b083000
munmap(0x7f6b2b083000, 83349504)        = 0
munmap(0x7f6b40000000, 185081856)       = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2b, 0x28), 0x7ffe62133b90) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133aa0) = 0
eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK)   = 8
fcntl(8, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6b4a881000
mprotect(0x7f6b4a882000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x7f6b4b080cf0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[16136], tls=0x7f6b4b081700, child_tidptr=0x7f6b4b0819d0) = 16136
openat(AT_FDCWD, "/proc/self/task/16136/comm", O_RDWR) = 9
write(9, "cuda-EvtHandlr", 14)          = 14
close(9)                                = 0
futex(0xa51798, FUTEX_WAKE_PRIVATE, 1)  = 1
getpid()                                = 16131
stat("/proc/16131/ns/pid", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
stat("/proc/16131/ns/pid", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
socket(AF_UNIX, SOCK_SEQPACKET|SOCK_CLOEXEC, 0) = 9
unlink("")                              = -1 ENOENT (No such file or directory)
bind(9, {sa_family=AF_UNIX, sun_path=@"cuda-uvmfd-4026531836-16131\0"}, 31) = 0
listen(9, 128)                          = 0
write(8, "\1\0\0\0\0\0\0\0", 8)         = 8
getpid()                                = 16131
futex(0x673c90, FUTEX_WAKE_PRIVATE, 2147483647) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b80) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x4a, 0xb8), 0x7ffe62133d20) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ba0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133c10) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133c10) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
write(1, "./deviceQuery Starting...\n\n CUDA"..., 2376./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Quadro K420"
  CUDA Driver Version / Runtime Version          11.4 / 10.1
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 980 MBytes (1027604480 bytes)
  ( 1) Multiprocessors, (192) CUDA Cores/MP:     192 CUDA Cores
  GPU Max Clock rate:                            876 MHz (0.88 GHz)
  Memory Clock rate:                             891 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 132 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 10.1, NumDevs = 1, Device0 = Quadro K420 Result = PASS) = 2376
exit_group(0)                           = ?
+++ exited with 0 +++

So initialization seems to fail in the /lib/libcuda.* or /lib/libcudart.* part (cuInit() or cudaGetDeviceCount()), which is beyond what I can trace further.
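To narrow down where the two runs part ways, the ioctl sequences of both traces can be compared mechanically. This is a rough sketch only: the function names are made up, the two excerpts are the indented lines from the traces above, and it assumes that file descriptor numbers line up between the two runs (they do in these excerpts, but need not in general).

```python
import re

# Captures the fd and the "nr" field of each ioctl line printed by strace.
CALL_RE = re.compile(r"ioctl\((\d+), _IOC\([^,]+, [^,]+, (0x[0-9a-f]+|\d+), ")


def ioctl_sequence(trace_text):
    """Return the (fd, nr) pairs of all ioctl calls, in order of appearance."""
    return [(int(fd), int(nr, 0)) for fd, nr in CALL_RE.findall(trace_text)]


def first_divergence(seq_a, seq_b):
    """Return (index, item_a, item_b) at the first difference, or None."""
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a != b:
            return i, a, b
    shorter = min(len(seq_a), len(seq_b))
    if len(seq_a) != len(seq_b):
        a = seq_a[shorter] if shorter < len(seq_a) else None
        b = seq_b[shorter] if shorter < len(seq_b) else None
        return shorter, a, b
    return None


AMD_EXCERPT = """
ioctl(4, _IOC(_IOC_NONE, 0, 0x1, 0x3000), 0x7ffc9ad6b590) = 0
ioctl(4, _IOC(_IOC_NONE, 0, 0x27, 0), 0x7ffc9ad6b5a0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffc9ad6a310) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x29, 0x10), 0x7ffc9ad6b580) = 0
"""

INTEL_EXCERPT = """
ioctl(4, _IOC(_IOC_NONE, 0, 0x1, 0x3000), 0x7ffe62133be0) = 0
ioctl(4, _IOC(_IOC_NONE, 0, 0x27, 0), 0x7ffe62133bf0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132960) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132860) = 0
"""

if __name__ == "__main__":
    print(first_divergence(ioctl_sequence(AMD_EXCERPT), ioctl_sequence(INTEL_EXCERPT)))
```

On these excerpts the first difference is the fourth ioctl after /dev/nvidia-uvm is opened: the AMD run issues nr 0x29 on /dev/nvidiactl and then closes it, while the Intel run continues with nr 0x2a.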

I changed the slot of the card: no effect, same issue. I have also tested different cards (Quadro P2000 and GTX 1660, both supported by NVIDIA driver 470 and cuda-10-1): same issue (returns 802 on the AMD board).

Is there any further advice or hint you can provide to solve or work around the issue?

Regards M

Update: the problem persists with the latest updates (18.05.2022) for openSUSE 15.3 to:

  • kernel-default-5.3.18-150300.59.63.1
  • kernel-firmware-nvidia 20210208-150300.4.7.1
  • nvidia-computeG05 470.129.06-lp153.50.1
  • nvidia-gfxG05-kmp-default 470.129.06_k5.3.18_57-lp153.50.1
  • nvidia-glG05 470.129.06-lp153.50.1
  • x11-video-nvidiaG05 470.129.06-lp153.50.1

and an upgrade to cuda-10.2. Same behavior: moving the same disk from the Opteron/AMD system to the Xeon/Intel system works fine.