Nsys hangs when profiling any CUDA process

I am profiling a CUDA program on a Linux workstation. The nsys command hangs when profiling any CUDA program. For example, the following command hangs:

nsys profile python3 -c "import torch; a = torch.empty((1, ), dtype=torch.int32, device='cuda')"

Nsight Systems version:

➜   ~/nsight-systems-2024.1.1/bin/nsys --version
NVIDIA Nsight Systems version 2024.1.1.59-241133802077v0

CUDA toolkit version: 12.2
CUDA driver version: 535.86.10
torch version: 2.1.2

How can I debug this issue? After I run the above command, I can see these nsys processes on my system:

        2846  1.0  0.0 5116820 41752 pts/0   Sl+  23:02   0:00 ~/nsight-systems-2024.1.1/bin/nsys profile python3 -c import torch; a = torch.empty((1,),dtype=torch.float16, device='cuda')
        2853  5.5  0.0 5438964 114580 ?      Ssl  23:02   0:00 ~/nsight-systems-2024.1.1/target-linux-x64/nsys --start-agent --session-name=profile-2846 --shm-name=NSys-create-agent-1630e3b5-efe2-4ccd-9c65-ec2b17ecb88d-4026531836-2024.1.1
        2867  0.0  0.0  11004  3840 pts/0    S+   23:02   0:00 ~/nsight-systems-2024.1.1/target-linux-x64/nsys-launcher /tmp/2d3c-2d7e-a209-2394 28
        2869  0.0  0.0  11140  3072 pts/0    S+   23:02   0:00 nsys-tee 3 4 5 6

It seems that they all hang and never finish.
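
One way to narrow this down might be to check whether the same Python snippet finishes on its own, and only hangs under nsys. Below is a minimal sketch of that comparison. The helper names (run_with_timeout, compare_bare_vs_nsys), the 60-second timeout, and the assumption that nsys is on PATH are mine for illustration; they are not part of Nsight Systems. It only uses the plain "nsys profile <command>" form shown above.

```python
import subprocess
import sys

# The same one-liner used in the nsys command above.
SNIPPET = "import torch; a = torch.empty((1, ), dtype=torch.int32, device='cuda')"


def run_with_timeout(cmd, timeout_s):
    """Run cmd and report whether it finished within timeout_s seconds.

    Returns ("finished", returncode), ("hung", None) on timeout,
    or ("not found", None) if the executable does not exist.
    """
    try:
        proc = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child on timeout. Helper processes
        # it spawned (e.g. a detached agent) may still be left running.
        return ("hung", None)
    except FileNotFoundError:
        return ("not found", None)
    return ("finished", proc.returncode)


def compare_bare_vs_nsys(nsys_path="nsys", timeout_s=60):
    """Run the snippet without and with nsys and return both outcomes."""
    bare = run_with_timeout([sys.executable, "-c", SNIPPET], timeout_s)
    profiled = run_with_timeout(
        [nsys_path, "profile", sys.executable, "-c", SNIPPET], timeout_s
    )
    return {"bare": bare, "nsys": profiled}
```

Calling compare_bare_vs_nsys() and printing the result would show, for example, whether the bare run reports "finished" while the nsys run reports "hung". If the bare run also hangs, the problem is probably not in nsys itself.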