I am profiling a CUDA program on a Linux workstation. The nsys command hangs when profiling any CUDA program. For example, the following command hangs:
nsys profile python3 -c "import torch; a = torch.empty((1, ), dtype=torch.int32, device='cuda')"
Nsight Systems version:
➜ ~/nsight-systems-2024.1.1/bin/nsys --version
NVIDIA Nsight Systems version 2024.1.1.59-241133802077v0
CUDA toolkit version: 12.2
CUDA driver version: 535.86.10
torch version: 2.1.2
How can I debug this issue? After running the above command, I can see the following nsys processes on my system:
2846 1.0 0.0 5116820 41752 pts/0 Sl+ 23:02 0:00 ~/nsight-systems-2024.1.1/bin/nsys profile python3 -c import torch; a = torch.empty((1,),dtype=torch.float16, device='cuda')
2853 5.5 0.0 5438964 114580 ? Ssl 23:02 0:00 ~/nsight-systems-2024.1.1/target-linux-x64/nsys --start-agent --session-name=profile-2846 --shm-name=NSys-create-agent-1630e3b5-efe2-4ccd-9c65-ec2b17ecb88d-4026531836-2024.1.1
2867 0.0 0.0 11004 3840 pts/0 S+ 23:02 0:00 ~/nsight-systems-2024.1.1/target-linux-x64/nsys-launcher /tmp/2d3c-2d7e-a209-2394 28
2869 0.0 0.0 11140 3072 pts/0 S+ 23:02 0:00 nsys-tee 3 4 5 6
They all seem to hang and never finish.
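In case it helps anyone reproduce this without blocking a terminal, here is a minimal sketch that runs a command under a timeout and reports whether it hung. The helper name run_with_timeout and the timeout value are my own choices and are not part of nsys. It assumes a POSIX system, and it only kills processes that stay in the child's process group, so a helper that detaches into its own session (as the agent process above appears to) may need to be cleaned up separately.

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (hung, returncode).

    hung is True if the command did not finish within timeout_s seconds.
    """
    # start_new_session=True makes the child the leader of a new process
    # group, so the children it spawns can be signalled together.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return False, proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        try:
            # Kill the whole process group led by the child.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group disappeared between the timeout and the kill
        proc.wait()  # reap the child so it does not linger as a zombie
        return True, proc.returncode


# Hypothetical usage with the command from this post (assumes nsys is on PATH):
# code = "import torch; a = torch.empty((1, ), dtype=torch.int32, device='cuda')"
# hung, rc = run_with_timeout(["nsys", "profile", sys.executable, "-c", code], 120)
# print("hung" if hung else f"finished with exit code {rc}")
```

When the timeout expires, the returned code is the negative signal number, since the child was killed with SIGKILL.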