I have a freshly installed Ubuntu 22.04 system with two RTX Titans (with the NVLink bridge installed), one of which is also used as the display GPU. Any call to software that uses CUDA (deviceQuery, bandwidthTest, a cuDNN example, or loading TensorFlow in Python) causes the system to freeze for 1-2 minutes, and often the display driver seems to reset. Is this expected behavior? Can I avoid it? I'd like to be able to just load TensorFlow, of course. It happens with both CUDA 12 and 11.8.
Thanks a lot!
nvidia kernel module 520.61.05
Nvlink Core is being initialized, major device number 234
Initialized nvidia-drm 0.0.0 20160202 for 0000:1a:00.0 on minor 0
Initialized nvidia-drm 0.0.0 20160202 for 0000:68:00.0 on minor 1
Loaded the UVM driver, major device number 510
Some messages in dmesg around the times of the freezes:
[ 1466.575694] NVRM: GPU at PCI:0000:68:00: GPU-05840d2e-94e5-7d79-fa44-67c47673d015
[ 1466.575703] NVRM: GPU Board Serial Number: <removed>
[ 1466.575706] NVRM: Xid (PCI:0000:68:00): 109, pid=1391, name=Xorg, Ch 00000008, errorString CTX SWITCH TIMEOUT, Info 0x4c002
[ 1559.557808] NVRM: Xid (PCI:0000:68:00): 109, pid=1391, name=Xorg, Ch 0000000a, errorString CTX SWITCH TIMEOUT, Info 0x4c002
[ 1754.619915] NVRM: Xid (PCI:0000:68:00): 109, pid=1626, name=gnome-shell, Ch 00000010, errorString CTX SWITCH TIMEOUT, Info 0x2c004
[ 3600.075508] NVRM: Xid (PCI:0000:68:00): 109, pid=1626, name=gnome-shell, Ch 00000010, errorString CTX SWITCH TIMEOUT, Info 0x7c004
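To make it easier to see which GPU and which process the Xid events are tied to, here is a minimal Python sketch that pulls the Xid lines out of dmesg text and counts them per PCI address, Xid code and process name. The names `parse_xid_lines` and `summarize` are just helpers I made up for this, and the regex only assumes the line format shown in the excerpt above, so it may need adjusting for other driver versions.

```python
import re
from collections import Counter

# Matches NVRM Xid lines in the format seen in the dmesg excerpt above.
# Assumption: other driver versions may format these lines differently.
XID_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+NVRM: Xid \(PCI:(?P<bus>[0-9a-fA-F:.]+)\): "
    r"(?P<xid>\d+), pid=(?P<pid>\d+), name=(?P<name>[^,]+),"
)

def parse_xid_lines(text):
    """Return one dict per Xid line found in the given dmesg text."""
    events = []
    for line in text.splitlines():
        m = XID_RE.search(line)
        if m:
            events.append({
                "time": float(m.group("time")),  # seconds since boot
                "bus": m.group("bus"),           # PCI address of the GPU
                "xid": int(m.group("xid")),      # Xid error code
                "pid": int(m.group("pid")),
                "name": m.group("name"),         # process that owned the channel
            })
    return events

def summarize(events):
    """Count events per (PCI address, Xid code, process name)."""
    return Counter((e["bus"], e["xid"], e["name"]) for e in events)

# The Xid lines from the excerpt above, plus one non-Xid line that is skipped.
SAMPLE = """\
[ 1466.575694] NVRM: GPU at PCI:0000:68:00: GPU-05840d2e-94e5-7d79-fa44-67c47673d015
[ 1466.575706] NVRM: Xid (PCI:0000:68:00): 109, pid=1391, name=Xorg, Ch 00000008, errorString CTX SWITCH TIMEOUT, Info 0x4c002
[ 1559.557808] NVRM: Xid (PCI:0000:68:00): 109, pid=1391, name=Xorg, Ch 0000000a, errorString CTX SWITCH TIMEOUT, Info 0x4c002
[ 1754.619915] NVRM: Xid (PCI:0000:68:00): 109, pid=1626, name=gnome-shell, Ch 00000010, errorString CTX SWITCH TIMEOUT, Info 0x2c004
[ 3600.075508] NVRM: Xid (PCI:0000:68:00): 109, pid=1626, name=gnome-shell, Ch 00000010, errorString CTX SWITCH TIMEOUT, Info 0x7c004
"""

if __name__ == "__main__":
    for (bus, xid, name), count in summarize(parse_xid_lines(SAMPLE)).items():
        print(f"PCI:{bus} Xid {xid} from {name}: {count} event(s)")
```

On the lines above, every event is Xid 109 on 0000:68:00, which is the GPU that nvidia-smi shows with the display attached, and the processes involved are Xorg and gnome-shell. To run it on a live system, the output of `dmesg` can be passed to `parse_xid_lines` in place of `SAMPLE`.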
kernel:
5.19.0-38-generic #39~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 17 21:16:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA TITAN RTX    On   | 00000000:1A:00.0 Off |                  N/A |
| 40%   40C    P8    36W / 280W |      5MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA TITAN RTX    On   | 00000000:68:00.0  On |                  N/A |
| 41%   42C    P8    27W / 280W |    324MiB / 24576MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+