2022-05-01T22:00:00Z
After considerable investigation, reading, and testing of different hardware and software combinations, I am still blocked and would welcome your advice and help in resolving the issue.
The intent is to use a Quadro K420 (or a later card, if one works) in a dual Opteron 2435 system running openSUSE 15.3 x86_64, with the respective NVIDIA repositories and packages, for the CUDA samples/demo apps and for self-education in machine learning (with Python) using CUDA acceleration.
Short summary: moving the very same (SSD) disk and Quadro K420 card to a Xeon/Intel-based system works fine, both for graphics and for CUDA support.
- Different OS: Lubuntu 20.04 + driver 470 + cuda-10-1 fails on the AMD boards and works on the Intel board.
- Other AMD / MCP55 chipset boards exhibit the same issue (fail).
- Later CUDA versions show the same issue (fail on the AMD board).
My assumption is that cuInit() (which returns CUDA_ERROR_SYSTEM_NOT_READY = 802, as defined in /include/cuda.h) runs into a hardware configuration or driver/firmware issue that prevents initialization of the NVIDIA cards on the Opteron board (NVIDIA MCP55 chipset).
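To test that assumption without the CUDA runtime in between, cuInit() can be called directly through the driver library. The following is a minimal sketch, assuming the driver library is installed under the name libcuda.so.1; the helper name cu_init_status is made up for illustration.

```python
import ctypes


def cu_init_status(lib_name="libcuda.so.1"):
    """Call cuInit(0) through the driver API, bypassing libcudart.

    Returns (code, name), or None if the driver library cannot be loaded.
    The library name is an assumption; adjust it to the local install.
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None

    lib.cuInit.argtypes = [ctypes.c_uint]
    lib.cuInit.restype = ctypes.c_int
    code = lib.cuInit(0)

    # cuGetErrorName stores a pointer to a static string such as
    # "CUDA_ERROR_SYSTEM_NOT_READY"; the pointer stays NULL for unknown codes.
    name = ctypes.c_char_p()
    lib.cuGetErrorName.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)]
    lib.cuGetErrorName.restype = ctypes.c_int
    lib.cuGetErrorName(code, ctypes.byref(name))
    return code, (name.value.decode() if name.value else "unknown")


if __name__ == "__main__":
    status = cu_init_status()
    if status is None:
        print("libcuda.so.1 could not be loaded")
    else:
        print("cuInit(0) returned %d (%s)" % status)
```

If this also reports 802 on the AMD board, the failure is in the driver library itself and not in the statically linked runtime of deviceQuery.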
The graphics driver version 470 (x11-video-nvidiaG05 470.103.01*, from the nvidia repo) installs and works fine for the X11 display. cuda-10-1 supports the Quadro K420 and installs and compiles fine (from the cuda repo); the kernel header files match the installed kernel version (with the latest updates). (nvidia repo: /download.nvidia.com/opensuse/leap/15.3 , cuda repo: https://developer.download.nvidia.com/compute/cuda/repos/opensuse15/x86_64/)
The Nvidia devices load and initialize correctly in all cases:
# l /dev/nvi*
crw-rw----+ 1 root video 195, 0 2. Mai 14:16 /dev/nvidia0
crw-rw----+ 1 root video 195, 255 2. Mai 14:16 /dev/nvidiactl
crw-rw----+ 1 root video 195, 254 2. Mai 14:16 /dev/nvidia-modeset
crw-rw-rw-+ 1 root root 238, 0 2. Mai 14:16 /dev/nvidia-uvm
crw-rw-rw-+ 1 root root 238, 1 2. Mai 14:16 /dev/nvidia-uvm-tools
/dev/nvidia-caps:
insgesamt 0
drwxr-xr-x 2 root root 80 2. Mai 14:53 ./
drwxr-xr-x 21 root root 4280 2. Mai 14:53 ../
cr-------- 1 root root 241, 1 2. Mai 14:53 nvidia-cap1
cr--r--r-- 1 root root 241, 2 2. Mai 14:53 nvidia-cap2
# l /proc/driver/
nvidia/ nvidia-caps/ nvidia-nvlink/ nvidia-nvswitch/ nvidia-uvm/
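The listing above was taken as root. As a small cross-check, the sketch below reports whether each node is a character device that the current user can open read/write, which matters if deviceQuery is run as a non-root user. The helper name check_device_node is hypothetical, and the three paths are simply taken from the listing.

```python
import os
import stat


def check_device_node(path):
    """Describe whether path exists, is a character device, and is
    readable and writable by the current user."""
    try:
        st = os.stat(path)
    except OSError:
        return {"path": path, "exists": False,
                "char_device": False, "rw_access": False}
    return {
        "path": path,
        "exists": True,
        "char_device": stat.S_ISCHR(st.st_mode),
        "rw_access": os.access(path, os.R_OK | os.W_OK),
    }


if __name__ == "__main__":
    for node in ("/dev/nvidiactl", "/dev/nvidia0", "/dev/nvidia-uvm"):
        print(check_device_node(node))
```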
nvidia-smi and nvidia-settings both work fine in all configurations.
deviceQuery (same for precompiled version) returns:
---
/usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 802
-> system not yet initialized
Result = FAIL
---
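Running deviceQuery under strace shows a difference between the AMD and the Intel board. On the AMD board, the ioctl() sequence on /dev/nvidiactl ends early (each call returns 0 at the syscall level, but the descriptor is closed without /dev/nvidia0 ever being opened), which then causes cudaGetDeviceCount to fail: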
[... lines removed ...]
openat(AT_FDCWD, "/dev/nvidiactl", O_RDWR) = 3
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd2, 0x48), 0x7ffc9ad6b540) = 0
openat(AT_FDCWD, "/sys/devices/system/memory/block_size_bytes", O_RDONLY) = 4
read(4, "80000000\n", 99) = 9
close(4) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd6, 0x8), 0x7ffc9ad6b540) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xc8, 0x900), 0x7fa257eec220) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2b, 0x20), 0x7ffc9ad6b540) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffc9ad6b470) = 0
openat(AT_FDCWD, "/proc/self/status", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "Name:\tdeviceQuery\nUmask:\t0022\nSt"..., 1024) = 1024
read(4, ",00000000,00000000,00000000,0000"..., 1024) = 297
close(4) = 0
openat(AT_FDCWD, "/sys/devices/system/node", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents64(4, /* 11 entries */, 32768) = 360
openat(AT_FDCWD, "/sys/devices/system/node/node0/cpumap", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
read(5, "fff\n", 4096) = 4
close(5) = 0
getdents64(4, /* 0 entries */, 32768) = 0
close(4) = 0
futex(0x7fa257edb890, FUTEX_WAKE_PRIVATE, 2147483647) = 0
get_mempolicy([MPOL_DEFAULT], [000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000], 1024, NULL, 0) = 0
openat(AT_FDCWD, "/proc/modules", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "af_packet 53248 2 - Live 0xfffff"..., 1024) = 1024
read(4, "e 0xffffffffc0c59000\nnf_conntrac"..., 1024) = 1024
read(4, "ables 53248 11 ip6table_mangle,i"..., 1024) = 1024
close(4) = 0
openat(AT_FDCWD, "/proc/devices", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "Character devices:\n 1 mem\n 4 /"..., 1024) = 583
close(4) = 0
stat("/dev/nvidia-uvm", {st_mode=S_IFCHR|0666, st_rdev=makedev(0xef, 0), ...}) = 0
stat("/dev/nvidia-uvm-tools", {st_mode=S_IFCHR|0666, st_rdev=makedev(0xef, 0x1), ...}) = 0
openat(AT_FDCWD, "/dev/nvidia-uvm", O_RDWR|O_CLOEXEC) = 4
fcntl(4, F_GETFD) = 0x1 (flags FD_CLOEXEC)
ioctl(4, _IOC(_IOC_NONE, 0, 0x1, 0x3000), 0x7ffc9ad6b590) = 0
ioctl(4, _IOC(_IOC_NONE, 0, 0x27, 0), 0x7ffc9ad6b5a0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffc9ad6a310) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x29, 0x10), 0x7ffc9ad6b580) = 0
close(3) = 0
ioctl(4, _IOC(_IOC_NONE, 0, 0x2, 0x3000), 0) = 0
close(4) = 0
munmap(0x7fa2565c3000, 26448520) = 0
futex(0x673c90, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(1, "./deviceQuery Starting...\n\n CUDA"..., 169./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 802
-> system not yet initialized
Result = FAIL
) = 169
exit_group(1) = ?
+++ exited with 1 +++
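One detail in this trace is that only node0/cpumap is read, with the value "fff", while the Intel trace below reads node0 ("003ff") and node1 ("ffc00"). Whether that is related to the 802 error is not established here. The sketch below only decodes such masks so the two layouts can be compared; the function name cpus_in_mask is made up for illustration.

```python
def cpus_in_mask(cpumap):
    """Decode a sysfs cpumap string (a hex bitmask, optionally split into
    comma-separated groups) into a sorted list of CPU indices."""
    value = int(cpumap.strip().replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# Values copied from the two traces in this post.
print("AMD   node0:", cpus_in_mask("fff"))
print("Intel node0:", cpus_in_mask("003ff"))
print("Intel node1:", cpus_in_mask("ffc00"))
```

Decoded, the AMD board reports CPUs 0-11 on a single node, while the Intel board reports CPUs 0-9 on node0 and 10-19 on node1.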
The strace on the Intel board shows a successful run:
[... lines removed ...]
openat(AT_FDCWD, "/proc/driver/nvidia/params", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(3, "ResmanDebugLevel: 4294967295\nRmL"..., 1024) = 805
close(3) = 0
stat("/dev/nvidiactl", {st_mode=S_IFCHR|0660, st_rdev=makedev(0xc3, 0xff), ...}) = 0
openat(AT_FDCWD, "/dev/nvidiactl", O_RDWR) = 3
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd2, 0x48), 0x7ffe62133b90) = 0
openat(AT_FDCWD, "/sys/devices/system/memory/block_size_bytes", O_RDONLY) = 4
read(4, "80000000\n", 99) = 9
close(4) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd6, 0x8), 0x7ffe62133b90) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xc8, 0x900), 0x7f6b4c9ab220) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2b, 0x20), 0x7ffe62133b90) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ac0) = 0
openat(AT_FDCWD, "/proc/self/status", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "Name:\tdeviceQuery\nUmask:\t0022\nSt"..., 1024) = 1024
read(4, "0000,00000000,00000000,00000000,"..., 1024) = 312
close(4) = 0
openat(AT_FDCWD, "/sys/devices/system/node", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents64(4, /* 12 entries */, 32768) = 392
openat(AT_FDCWD, "/sys/devices/system/node/node0/cpumap", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
read(5, "003ff\n", 4096) = 6
close(5) = 0
openat(AT_FDCWD, "/sys/devices/system/node/node1/cpumap", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
read(5, "ffc00\n", 4096) = 6
close(5) = 0
getdents64(4, /* 0 entries */, 32768) = 0
close(4) = 0
futex(0x7f6b4c99a890, FUTEX_WAKE_PRIVATE, 2147483647) = 0
get_mempolicy([MPOL_DEFAULT], [000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000], 1024, NULL, 0) = 0
openat(AT_FDCWD, "/proc/modules", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "af_packet 53248 2 - Live 0xfffff"..., 1024) = 1024
read(4, "ve 0xffffffffc0d3e000\nnf_conntra"..., 1024) = 1024
read(4, "bles 53248 11 ip6table_mangle,ip"..., 1024) = 1024
read(4, "ve 0xffffffffc0bde000\nkvm_intel "..., 1024) = 1024
close(4) = 0
openat(AT_FDCWD, "/proc/devices", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "Character devices:\n 1 mem\n 4 /"..., 1024) = 611
close(4) = 0
stat("/dev/nvidia-uvm", {st_mode=S_IFCHR|0666, st_rdev=makedev(0xed, 0), ...}) = 0
stat("/dev/nvidia-uvm-tools", {st_mode=S_IFCHR|0666, st_rdev=makedev(0xed, 0x1), ...}) = 0
openat(AT_FDCWD, "/dev/nvidia-uvm", O_RDWR|O_CLOEXEC) = 4
fcntl(4, F_GETFD) = 0x1 (flags FD_CLOEXEC)
ioctl(4, _IOC(_IOC_NONE, 0, 0x1, 0x3000), 0x7ffe62133be0) = 0
ioctl(4, _IOC(_IOC_NONE, 0, 0x27, 0), 0x7ffe62133bf0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132960) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132860) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621320d0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132960) = 0
openat(AT_FDCWD, "/proc/driver/nvidia/params", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(5, "ResmanDebugLevel: 4294967295\nRmL"..., 1024) = 805
close(5) = 0
stat("/dev/nvidia0", {st_mode=S_IFCHR|0660, st_rdev=makedev(0xc3, 0), ...}) = 0
openat(AT_FDCWD, "/dev/nvidia0", O_RDWR|O_CLOEXEC) = 5
fcntl(5, F_GETFD) = 0x1 (flags FD_CLOEXEC)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132950) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132770) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132910) = 0
openat(AT_FDCWD, "/proc/driver/nvidia/params", O_RDONLY) = 6
fstat(6, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(6, "ResmanDebugLevel: 4294967295\nRmL"..., 1024) = 805
close(6) = 0
stat("/dev/nvidia0", {st_mode=S_IFCHR|0660, st_rdev=makedev(0xc3, 0), ...}) = 0
openat(AT_FDCWD, "/dev/nvidia0", O_RDWR|O_CLOEXEC) = 6
fcntl(6, F_GETFD) = 0x1 (flags FD_CLOEXEC)
ioctl(6, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xc9, 0x4), 0x7ffe62132960) = 0
ioctl(6, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd7, 0x228), 0x7ffe621326f0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2b, 0x28), 0x7ffe62132a50) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132960) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132960) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132960) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132960) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132820) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621326d0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132870) = 0
openat(AT_FDCWD, "/proc/driver/nvidia/params", O_RDONLY) = 7
fstat(7, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(7, "ResmanDebugLevel: 4294967295\nRmL"..., 1024) = 805
close(7) = 0
stat("/dev/nvidia0", {st_mode=S_IFCHR|0660, st_rdev=makedev(0xc3, 0), ...}) = 0
openat(AT_FDCWD, "/dev/nvidia0", O_RDWR|O_CLOEXEC) = 7
fcntl(7, F_GETFD) = 0x1 (flags FD_CLOEXEC)
ioctl(7, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xc9, 0x4), 0x7ffe621328c0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2b, 0x28), 0x7ffe62132980) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132940) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132860) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621324e0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132890) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327f0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621326b0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621326b0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621326b0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621326b0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132470) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327f0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327f0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327f0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327f0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132770) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327f0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327f0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132620) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327d0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327d0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327c0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327d0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327d0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62130f80) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132860) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132840) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132850) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2b, 0x28), 0x7ffe62132970) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2b, 0x28), 0x7ffe62132970) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62132880) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327e0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327c0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621328b0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe621327b0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133590) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133590) = 0
mmap(NULL, 172032, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6b4df32000
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62131c90) = 0
mmap(NULL, 659456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6b4de91000
sysinfo({uptime=12427, loads=[0, 0, 0], totalram=135146749952, freeram=133223190528, sharedram=20606976, bufferram=56446976, totalswap=4294959104, freeswap=4294959104, procs=368, totalhigh=0, freehigh=0, mem_unit=1}) = 0
uname({sysname="Linux", nodename="dx", ...}) = 0
ioctl(4, _IOC(_IOC_NONE, 0, 0x25, 0), 0x7ffe62133bb0) = 0
ioctl(4, _IOC(_IOC_NONE, 0, 0x17, 0), 0x7ffe62133c50) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x4a, 0xb8), 0x7ffe621339c0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x4a, 0xb8), 0x7ffe62133b00) = 0
sysinfo({uptime=12427, loads=[0, 0, 0], totalram=135146749952, freeram=133196439552, sharedram=20606976, bufferram=56446976, totalswap=4294959104, freeswap=4294959104, procs=372, totalhigh=0, freehigh=0, mem_unit=1}) = 0
prlimit64(0, RLIMIT_AS, NULL, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}) = 0
mmap(0x200000000, 141733920768, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x200000000
mmap(0x2300000000, 8589934592, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2300000000
mmap(NULL, 536866816, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6b2b083000
munmap(0x7f6b2b083000, 83349504) = 0
munmap(0x7f6b40000000, 185081856) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2b, 0x28), 0x7ffe62133b90) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133aa0) = 0
eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK) = 8
fcntl(8, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6b4a881000
mprotect(0x7f6b4a882000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x7f6b4b080cf0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[16136], tls=0x7f6b4b081700, child_tidptr=0x7f6b4b0819d0) = 16136
openat(AT_FDCWD, "/proc/self/task/16136/comm", O_RDWR) = 9
write(9, "cuda-EvtHandlr", 14) = 14
close(9) = 0
futex(0xa51798, FUTEX_WAKE_PRIVATE, 1) = 1
getpid() = 16131
stat("/proc/16131/ns/pid", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
stat("/proc/16131/ns/pid", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
socket(AF_UNIX, SOCK_SEQPACKET|SOCK_CLOEXEC, 0) = 9
unlink("") = -1 ENOENT (Datei oder Verzeichnis nicht gefunden)
bind(9, {sa_family=AF_UNIX, sun_path=@"cuda-uvmfd-4026531836-16131\0"}, 31) = 0
listen(9, 128) = 0
write(8, "\1\0\0\0\0\0\0\0", 8) = 8
getpid() = 16131
futex(0x673c90, FUTEX_WAKE_PRIVATE, 2147483647) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b80) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x4a, 0xb8), 0x7ffe62133d20) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ba0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133a60) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133c10) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133c10) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133ad0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffe62133b40) = 0
write(1, "./deviceQuery Starting...\n\n CUDA"..., 2376./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Quadro K420"
CUDA Driver Version / Runtime Version 11.4 / 10.1
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 980 MBytes (1027604480 bytes)
( 1) Multiprocessors, (192) CUDA Cores/MP: 192 CUDA Cores
GPU Max Clock rate: 876 MHz (0.88 GHz)
Memory Clock rate: 891 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 132 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 10.1, NumDevs = 1, Device0 = Quadro K420 Result = PASS) = 2376
exit_group(0) = ?
+++ exited with 0 +++
So initialization seems to fail somewhere inside /lib/libcuda.* or /lib/libcudart.* (cuInit() or cudaGetDeviceCount()), which is beyond what I can trace further.
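To see where the two runs diverge without reading the logs side by side, the files each run opens can be compared. This is a rough sketch with made-up helper names (opened_paths, only_in_second); the two sample strings are abridged excerpts of the traces above, not the full logs.

```python
import re

# Matches lines such as: openat(AT_FDCWD, "/dev/nvidia0", O_RDWR|O_CLOEXEC) = 5
OPENAT_RE = re.compile(r'^openat\(AT_FDCWD, "([^"]+)"')


def opened_paths(trace_text):
    """Return the paths passed to openat() in an strace log, in first-seen order."""
    seen = []
    for line in trace_text.splitlines():
        match = OPENAT_RE.match(line.strip())
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def only_in_second(first_trace, second_trace):
    """Paths opened in the second trace but never in the first."""
    first = set(opened_paths(first_trace))
    return [p for p in opened_paths(second_trace) if p not in first]


# Abridged excerpts from the AMD and Intel traces in this post.
amd = '''openat(AT_FDCWD, "/dev/nvidiactl", O_RDWR) = 3
openat(AT_FDCWD, "/dev/nvidia-uvm", O_RDWR|O_CLOEXEC) = 4
'''
intel = '''openat(AT_FDCWD, "/dev/nvidiactl", O_RDWR) = 3
openat(AT_FDCWD, "/dev/nvidia-uvm", O_RDWR|O_CLOEXEC) = 4
openat(AT_FDCWD, "/dev/nvidia0", O_RDWR|O_CLOEXEC) = 5
'''
print(only_in_second(amd, intel))  # prints ['/dev/nvidia0']
```

Run against the complete logs (for example saved with strace -o), the same comparison lists every file that only the working run touches.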
I moved the card to a different slot, with no effect: same issue. I have also tested different cards (Quadro P2000 and GTX 1660, both supported by NVIDIA driver 470 and cuda-10-1), with the same result (802 returned on the AMD board).
Is there any further advice or hint you can provide to solve or work around the issue?
Regards M