Nsys doesn't show CUDA kernel and memory data

I’m following the OpenACC guide to optimize the Jacobi function, and I'm using nsys to profile it.
However, I added kernels directives, profiled from the CLI with nsys profile, and opened the report in the NVIDIA Nsight Systems GUI. In the Timeline View I can’t see any kernel details in the CUDA section.


Here is the Diagnostics Summary:

Messages
Source	Process ID	Time	Description
Information	Daemon		-00:00.078	Dwarf backtraces collected.
Information	Daemon		-00:00.078	Event 'CPU Clock (sw)', with sampling period 2000000, used to trigger process-tree CPU IP sample collection.
Information	Daemon		-00:00.000	4 CPU IP samples collected for every CPU IP backtrace collected.
Information	Analysis		00:00.000	Profiling has started.
Information	Daemon	11714	00:00.000	Process was launched by the profiler, see /tmp/nvidia/nsight_systems/quadd_session_1011695/streams/pid_11714_stdout.log and stderr.log for program output
Information	Injection	11714	00:00.016	Common injection library initialized successfully.
Information	Injection	11714	00:00.021	OS runtime libraries injection initialized successfully.
Information	Injection	11714	00:00.025	OpenGL injection initialized successfully.
Information	Injection	11714	00:00.124	Buffers holding CUDA trace data will be flushed on CudaProfilerStop() call. See --flush-on-cudaprofilerstop to control this behavior.
Information	Injection	11714	00:00.148	Loaded CUPTI library: /opt/nvidia/hpc_sdk/Linux_x86_64/24.11/profilers/Nsight_Systems/target-linux-x64/libcupti.so.12.6
Information	Injection	11714	00:00.247	OpenACC injection initialized successfully.
Information	Injection	11714	00:00.249	Enabling trace for device graph launch
Information	Injection	11714	00:00.249	CUDA injection initialized successfully.
Error	Injection	11714	00:00.279	CUDA device 0: Unified Memory cannot be traced on devices that don't support peer-to-peer transfers.Please verify that SLI/NVLink is functioning properly.
Warning	Analysis	11714	00:00.431	No NVTX events collected. Does the process use NVTX?
Warning	Analysis	11714	00:00.431	No OpenGL events collected. Does the process use OpenGL?
Information	Analysis	11714	00:00.431	Number of CUDA events collected: 11,010.
Information	Analysis	11714	00:00.431	Number of OS runtime libraries events collected: 578,127.
Information	Injection	11714	00:56.822	Number of CUPTI events produced: 55,053, CUPTI buffers: 50.
Information	Analysis		00:56.920	Profiling has stopped.
Information	Daemon		00:57.262	Number of IP samples collected: 28,464.

My CUDA driver version is shown below:

nvidia-smi
Tue Dec  3 20:24:29 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.134                Driver Version: 553.35         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A4500               On  |   00000000:AC:00.0 Off |                  Off |
| 30%   48C    P2             58W /  200W |    2264MiB /  20470MiB |     34%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     18653      C   /jacobi_acc_kernels                         N/A      |
+-----------------------------------------------------------------------------------------+
The Nsys GUI version is 2024.7.1.
The Nsys CLI version is 2024.6.1, installed with the NVIDIA HPC SDK.

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:18:05_PDT_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0

From other threads on the same topic, I suspected a version-combination issue, but the driver seems to be compatible with the CUDA version.
Could anyone offer some help? Thanks a lot.

It looks to me like you just need to zoom in on the timeline (it is a little hard to tell from a screen shot). There appears to be a large quantity of information in the CUDA API row, but it is indistinguishable at this level of detail.

What I don’t see are the GPU rows, but that might be because that section is off the screen.

@hwilper Thanks, you are correct. When I zoom in, I can see the memory copies and CUDA driver API calls, as in the picture below:


However, is there any way to put memory operations and kernels on separate rows, so that I can see the breakdown of each without zooming in?
Also, as you said, I still can’t find the GPU rows in the Timeline View. Could you give any advice?

When you open your timeline, you should see a row for every GPU. Here is an example from a result I have on hand:

If you expand that “CUDA HW” row you would see:

What concerns me is that we are not seeing that row at all.

Can you tell me what command you ran?

Sure, I ran

nvfortran -acc -Minfo=accel -o executableA A.f90
nsys profile executableA
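For reference, here is a sketch of the same two steps with the trace domains and report name spelled out explicitly. The function name profile_jacobi and the report name jacobi_report are hypothetical. --trace, --output and --force-overwrite are nsys profile options, but check nsys profile --help for your version. On WSL2, this alone does not make the GPU rows appear (see the workaround later in this thread).

```shell
#!/usr/bin/env bash
# Sketch: build the OpenACC code and profile it with explicit trace domains.
# Hypothetical names: profile_jacobi and the default report name jacobi_report.
profile_jacobi() {
  local src="${1:-A.f90}" exe="${2:-executableA}" report="${3:-jacobi_report}"
  # -acc enables OpenACC; -Minfo=accel prints what the compiler offloaded.
  # Trace domains: cuda (CUDA API plus GPU kernels/memcpys), openacc (OpenACC
  # regions), osrt (OS runtime calls). The report is written to "$report.nsys-rep".
  nvfortran -acc -Minfo=accel -o "$exe" "$src" &&
    nsys profile --trace=cuda,openacc,osrt --output="$report" \
      --force-overwrite=true "./$exe"
}
```

Usage: profile_jacobi, or profile_jacobi A.f90 executableA jacobi_report to pass the names explicitly.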

Then I opened the report in the Nsys GUI.
By the way, what I want to see is exactly the row you showed, with the memory and kernel data, so that I can compare their costs in a program.
Note that I’m using WSL2. Could that be why I can’t see the GPU row?
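Once GPU-side activity is actually captured (on WSL2 that requires the workaround described later in this thread), nsys stats can print kernel and memory-transfer totals as tables. This gives a quick way to compare their costs without zooming in. This is a minimal sketch. It assumes the report names cuda_gpu_kern_sum and cuda_gpu_mem_time_sum, which recent Nsight Systems versions list under nsys stats --help-reports. The function name compare_gpu_costs and the default report file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: summarize GPU kernel time vs. memory-transfer time from a report.
# Hypothetical: function name compare_gpu_costs, default file jacobi_report.nsys-rep.
# Assumed report names (verify with `nsys stats --help-reports`):
#   cuda_gpu_kern_sum     -> time spent per GPU kernel
#   cuda_gpu_mem_time_sum -> time spent per memory operation kind (memcpy/memset)
compare_gpu_costs() {
  local rep="${1:-jacobi_report.nsys-rep}"
  nsys stats --report cuda_gpu_kern_sum,cuda_gpu_mem_time_sum "$rep"
}
```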

Are you using WSL within a docker?

No, without Docker.
I installed it from PowerShell.

@liuyis can you add some insight here?

In WSL2, Nsys still has an issue with GPU-to-CPU timestamp conversion, so GPU-side CUDA activities are not processed correctly and do not show up on the timeline by default.
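Because the problem is specific to WSL2, it helps to confirm that a shell is really running under WSL2. One way is to inspect the kernel version string. This is a heuristic sketch. It assumes WSL2 kernels report a release containing microsoft-standard, while WSL1 reports a different Microsoft string. The function name is_wsl2 is hypothetical.

```shell
#!/usr/bin/env bash
# Heuristic sketch (hypothetical helper): succeed if the kernel version string
# looks like a WSL2 kernel, e.g. "...-microsoft-standard-WSL2".
# The optional argument lets you point it at a file other than /proc/version.
is_wsl2() {
  local version_file="${1:-/proc/version}"
  grep -qi 'microsoft-standard' "$version_file" 2>/dev/null
}
```

Usage: is_wsl2 && echo "WSL2: GPU-side CUDA rows may need the CUPTI timestamp workaround"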

There is a workaround (WAR) that lets CUPTI, the underlying library Nsys uses for CUDA tracing, handle the timestamp conversion. It is not as accurate as Nsys’ normal mechanism, but the GPU-side activities will show up on the timeline.

First, you may need to download the latest 2024.7 release from Nsight Systems - Get Started | NVIDIA Developer. The CUPTI version used in older releases has a bug that prevents the WAR from working.
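In this thread, the CLI is 2024.6.1 while the GUI is 2024.7.1. The diagnostics show CUPTI being loaded from the HPC SDK's bundled Nsight Systems, so the command-line nsys used for collection is likely the one that needs to be 2024.7 or newer. Here is a minimal sketch for checking that. It assumes the first dotted number printed by nsys --version is the release version and relies on GNU sort -V. The helper name nsys_at_least is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): succeed if the nsys found on PATH is at least
# the given version. Assumes the first dotted number printed by
# `nsys --version` is the release version; uses GNU `sort -V` for comparison.
nsys_at_least() {
  local min="$1" have
  have="$(nsys --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n1)"
  [ -n "$have" ] || return 1
  # Version-sort both; min <= have exactly when min sorts first.
  [ "$(printf '%s\n' "$min" "$have" | sort -V | head -n1)" = "$min" ]
}
```

Usage: nsys_at_least 2024.7 || echo "nsys CLI is older than 2024.7; the WAR may not work"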

After that, here are the steps for the WAR:

  1. Find the Nsys config.ini file path with nsys -z. For example, on my system:
$ nsys -z
/home/liuyis/.config/NVIDIA Corporation/nsys-config.ini
  2. Create the config.ini file if it does not already exist. Note that the path may contain a space, so it must be wrapped in quotes:
mkdir -p "/home/liuyis/.config/NVIDIA Corporation"
touch "/home/liuyis/.config/NVIDIA Corporation/nsys-config.ini"
  3. Add the following line to the config file: CuptiUseRawGpuTimestamps=false
echo "CuptiUseRawGpuTimestamps=false" > "/home/liuyis/.config/NVIDIA Corporation/nsys-config.ini"
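The three steps can be combined into one re-runnable script. This is a sketch, not an official tool. It takes the path from nsys -z, as in step 1, or from an optional argument. It appends the setting with >> only if it is missing. By contrast, the echo ... > in step 3 overwrites any other settings already in the file. The function name apply_cupti_war is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): apply the CuptiUseRawGpuTimestamps=false
# workaround without clobbering other settings. Safe to run more than once.
apply_cupti_war() {
  # Config path from the optional argument, else from `nsys -z` (step 1).
  local cfg="${1:-$(nsys -z)}"
  local setting='CuptiUseRawGpuTimestamps=false'
  [ -n "$cfg" ] || return 1
  # Step 2: create the directory and file; quoting handles the space in the path.
  mkdir -p "$(dirname "$cfg")" && touch "$cfg" || return 1
  # Step 3: append the line only if it is not already present as a whole line.
  if ! grep -qxF "$setting" "$cfg"; then
    # If the file does not end with a newline, add one so the setting gets its own line.
    if [ -s "$cfg" ] && [ -n "$(tail -c1 "$cfg")" ]; then
      echo >> "$cfg"
    fi
    echo "$setting" >> "$cfg"
  fi
}
```

Usage: run apply_cupti_war once, then re-run nsys profile and open the new report.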

Let me know if this works. Thanks!

@liuyis @hwilper
Amazing!
Thank you for all your help! After following your instructions, the GPU timeline appears as follows:
