Nsys launch julia hangs on Linux

Hello, I am trying to profile my Julia code with Nsight Systems, but no report or error is generated; the execution simply freezes. My guess is that the issue is related to this topic, but since that one was closed and was about Windows, I opened a new topic.

This problem happens whenever I try to profile Julia code. For example, the following code freezes:

using CUDA

a = CUDA.rand(1024,1024,1024)
sin.(a)
CUDA.@profile sin.(a) 

I’m using the following command to profile the code:

nsys launch --trace=cuda,cublas julia --threads 32 test.jl

The output is the following:

application launched
[ Info: Running under Nsight Systems, CUDA.@profile will automatically start the profiler

WARNING: CUDA tracing is required for cudaProfilerStart/Stop API support. Turning it on by default.
waiting for capture range to start the collection
Capture range started in the application

Any idea how to solve this issue? I don’t have sudo permissions on this system. I can ask the maintainers to apply your suggestions, but that is not ideal.

System Information

Some important information about the system:

  • I’m running my code on an HPC system with CentOS Linux release 7.7.1908
  • CUDA version: 11.1.1-GCC-10.2.0
  • GPU: Tesla V100 16GB
  • GPU Drivers: 510.47.03
  • Julia version: 1.7.2
  • CUDA.jl version: 3.11.0

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:3B:00.0 Off |                    0 |
| N/A   29C    P0    25W / 250W |      0MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

@skottapalli

@mjpc13 - what version of nsys are you using?

Are you running on a headless system? Can you please run the nvidia-persistenced program before retrying to profile Julia? nvidia-persistenced is a daemon process that prevents the kernel driver from tearing down device state when no other process is using the device node.

$ nsys --version
NVIDIA Nsight Systems version 2020.3.4.32-52657a0

I’m running on a headless system, and nvidia-persistenced fails with the following message:

$ nvidia-persistenced --verbose
nvidia-persistenced failed to initialize. Check syslog for more details.

I don’t have sudo permissions on the system. Can I run nvidia-persistenced without root permissions?

Thanks for trying. Could you try the latest version, 2022.2.1, from the NVIDIA Nsight Systems page on NVIDIA Developer? We have made a few fixes for profiling Julia code since the version you are using. The latest version should not run into any issue even if nvidia-persistenced isn’t running before you profile.

Updating the version fixed the issue, thank you.
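
For anyone landing here with the same hang, a quick way to check whether the installed nsys predates 2022.2.1 (the version that fixed it in this thread) could look like the sketch below. The helper names nsys_version and version_ge are made up for illustration, the parsing assumes the `nsys --version` output format shown above, and `sort -V` assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helpers (not part of nsys): compare the installed Nsight Systems
# version against 2022.2.1, the release reported in this thread to fix the hang.

# Read `nsys --version`-style text on stdin and print MAJOR.MINOR.PATCH.
nsys_version() {
    sed -n 's/.*version \([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed if version $1 >= version $2 (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Only run the check when nsys is actually on the PATH.
if command -v nsys >/dev/null 2>&1; then
    installed="$(nsys --version | nsys_version)"
    if [ -z "$installed" ]; then
        echo "could not parse the nsys version"
    elif version_ge "$installed" "2022.2.1"; then
        echo "nsys $installed is at least 2022.2.1"
    else
        echo "nsys $installed is older than 2022.2.1; consider upgrading"
    fi
fi
```

With the 2020.3.4 build from the original post, this would take the "older than" branch.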