CUDA program hangs for a long time on Linux

My very simple CUDA program takes a long time to launch. I have upgraded to the latest driver and enabled persistence mode. I see the same issue when I use some VS Code extensions that may utilize the GPU. The hang seems to happen only at launch time, so I used strace to check which operation takes so long, and found that a single ioctl took 24 s!
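For anyone who wants to repeat the measurement, here is a minimal sketch of how the slowest calls can be pulled out of an strace log. It assumes the log was captured with the -T option, which makes strace append the time spent in each syscall as a trailing <seconds> field. The function name slowest_syscalls is only illustrative, and calls that strace splits into "unfinished"/"resumed" pairs under -f are not handled.

```python
import re

# Matches a completed syscall line that ends in the "<seconds>" suffix added by strace -T.
# An optional leading PID (from -f) and an optional timestamp (from -t/-tt/-ttt) are skipped.
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(?:[\d:.]+\s+)?(\w+)\(.*<(\d+\.\d+)>\s*$"
)


def slowest_syscalls(lines, top=5):
    """Return up to `top` tuples (seconds, syscall_name, line), slowest first."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:
            hits.append((float(m.group(2)), m.group(1), line.strip()))
    # Sort by elapsed time only, largest first.
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits[:top]
```

Passing an open log file works because the function only iterates over lines, for example slowest_syscalls(open("testrun.log")).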

My GPU configuration is shown below:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03              Driver Version: 530.41.03    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090         On | 00000000:01:00.0  On |                  N/A |
| 31%   44C    P5               39W / 350W|   1021MiB / 24576MiB |     29%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090         On | 00000000:07:00.0 Off |                  N/A |
| 30%   31C    P8                7W / 350W|      6MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1062      G   /usr/lib/Xorg                               275MiB |
|    0   N/A  N/A      1156      G   /usr/bin/gnome-shell                         62MiB |
|    1   N/A  N/A      1062      G   /usr/lib/Xorg                                 4MiB |
+---------------------------------------------------------------------------------------+

And here is the strace log:
testrun.log (105.3 KB)

Could you please help me figure out what the issue is?

I found that when I remove the NVLink bridge, the hang disappears. This is very strange to me. Can anyone help me here?

This is the output of the NVLink capability query:

$ nvidia-smi nvlink -c
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-af2d97b2-c063-6609-d833-a19a4492048a)
	 Link 0, P2P is supported: true
	 Link 0, Access to system memory supported: true
	 Link 0, P2P atomics supported: true
	 Link 0, System memory atomics supported: true
	 Link 0, SLI is supported: true
	 Link 0, Link is supported: false
	 Link 1, P2P is supported: true
	 Link 1, Access to system memory supported: true
	 Link 1, P2P atomics supported: true
	 Link 1, System memory atomics supported: true
	 Link 1, SLI is supported: true
	 Link 1, Link is supported: false
	 Link 3, P2P is supported: true
	 Link 3, Access to system memory supported: true
	 Link 3, P2P atomics supported: true
	 Link 3, System memory atomics supported: true
	 Link 3, SLI is supported: true
	 Link 3, Link is supported: false
GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-8cd77855-a0ad-17e4-6f37-94c3eed8fcaf)
	 Link 0, P2P is supported: true
	 Link 0, Access to system memory supported: true
	 Link 0, P2P atomics supported: true
	 Link 0, System memory atomics supported: true
	 Link 0, SLI is supported: true
	 Link 0, Link is supported: false
	 Link 1, P2P is supported: true
	 Link 1, Access to system memory supported: true
	 Link 1, P2P atomics supported: true
	 Link 1, System memory atomics supported: true
	 Link 1, SLI is supported: true
	 Link 1, Link is supported: false
	 Link 3, P2P is supported: true
	 Link 3, Access to system memory supported: true
	 Link 3, P2P atomics supported: true
	 Link 3, System memory atomics supported: true
	 Link 3, SLI is supported: true
	 Link 3, Link is supported: false
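
Since the output above is long and repetitive, here is a small sketch for condensing it. It assumes the exact line format shown above ("GPU N: ..." headers followed by "Link N, <capability>: true|false" lines); the function names parse_nvlink_caps and unsupported are only illustrative.

```python
import re
from collections import defaultdict

GPU_RE = re.compile(r"^GPU (\d+):")
CAP_RE = re.compile(r"^\s*Link (\d+), (.+?): (true|false)\s*$")


def parse_nvlink_caps(text):
    """Return {gpu_index: {link_index: {capability: bool}}}."""
    caps = defaultdict(lambda: defaultdict(dict))
    gpu = None
    for line in text.splitlines():
        g = GPU_RE.match(line)
        if g:
            # A new "GPU N:" header starts a new section.
            gpu = int(g.group(1))
            continue
        c = CAP_RE.match(line)
        if c and gpu is not None:
            caps[gpu][int(c.group(1))][c.group(2)] = (c.group(3) == "true")
    return caps


def unsupported(caps, capability="Link is supported"):
    """List (gpu, link) pairs where `capability` is reported as false."""
    return sorted(
        (gpu, link)
        for gpu, links in caps.items()
        for link, flags in links.items()
        if flags.get(capability) is False
    )
```

Applied to the output above, unsupported() returns every listed link on both GPUs, (0, 0), (0, 1), (0, 3), (1, 0), (1, 1), (1, 3), because each one reports "Link is supported: false" even though the other capabilities are true.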