Nsys doesn't show CUDA kernel and memory data

baihdong · December 3, 2024, 11:35am

I’m following the OpenACC guide to optimize a Jacobi function, using nsys to profile it.
However, after adding kernels directives, running the program under nsys profile from the CLI, and opening the report in the NVIDIA Nsight Systems GUI, I can’t see any kernel details in the CUDA row of the Timeline View.

Here is the Diagnostics Summary:

Messages
Severity     Source     PID    Time        Description
Information  Daemon            -00:00.078  Dwarf backtraces collected.
Information  Daemon            -00:00.078  Event 'CPU Clock (sw)', with sampling period 2000000, used to trigger process-tree CPU IP sample collection.
Information  Daemon            -00:00.000  4 CPU IP samples collected for every CPU IP backtrace collected.
Information  Analysis          00:00.000   Profiling has started.
Information  Daemon     11714  00:00.000   Process was launched by the profiler, see /tmp/nvidia/nsight_systems/quadd_session_1011695/streams/pid_11714_stdout.log and stderr.log for program output
Information  Injection  11714  00:00.016   Common injection library initialized successfully.
Information  Injection  11714  00:00.021   OS runtime libraries injection initialized successfully.
Information  Injection  11714  00:00.025   OpenGL injection initialized successfully.
Information  Injection  11714  00:00.124   Buffers holding CUDA trace data will be flushed on CudaProfilerStop() call. See --flush-on-cudaprofilerstop to control this behavior.
Information  Injection  11714  00:00.148   Loaded CUPTI library: /opt/nvidia/hpc_sdk/Linux_x86_64/24.11/profilers/Nsight_Systems/target-linux-x64/libcupti.so.12.6
Information  Injection  11714  00:00.247   OpenACC injection initialized successfully.
Information  Injection  11714  00:00.249   Enabling trace for device graph launch
Information  Injection  11714  00:00.249   CUDA injection initialized successfully.
Error        Injection  11714  00:00.279   CUDA device 0: Unified Memory cannot be traced on devices that don't support peer-to-peer transfers.Please verify that SLI/NVLink is functioning properly.
Warning      Analysis   11714  00:00.431   No NVTX events collected. Does the process use NVTX?
Warning      Analysis   11714  00:00.431   No OpenGL events collected. Does the process use OpenGL?
Information  Analysis   11714  00:00.431   Number of CUDA events collected: 11,010.
Information  Analysis   11714  00:00.431   Number of OS runtime libraries events collected: 578,127.
Information  Injection  11714  00:56.822   Number of CUPTI events produced: 55,053, CUPTI buffers: 50.
Information  Analysis          00:56.920   Profiling has stopped.
Information  Daemon            00:57.262   Number of IP samples collected: 28,464.

My CUDA driver version is below:

nvidia-smi
Tue Dec  3 20:24:29 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.134                Driver Version: 553.35         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A4500               On  |   00000000:AC:00.0 Off |                  Off |
| 30%   48C    P2             58W /  200W |    2264MiB /  20470MiB |     34%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     18653      C   /jacobi_acc_kernels                         N/A      |
+-----------------------------------------------------------------------------------------+

The Nsys GUI version is 2024.7.1.
The Nsys CLI version is 2024.6.1, installed with the NVIDIA HPC SDK.

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:18:05_PDT_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0

From other threads on the same topic, I suspected a driver/CUDA version mismatch, but the driver seems to be compatible with my CUDA version.
Could anyone offer some help? Thanks a lot.

hwilper · December 3, 2024, 5:15pm

It looks to me like you just need to zoom in on the timeline (it is a little hard to tell from a screenshot). There appears to be a large amount of information in the CUDA API row, but it is indistinguishable at this level of detail.

What I don’t see are the GPU rows, but that might be because that section is off the screen.

baihdong · December 4, 2024, 1:17am

@hwilper Thanks, you are correct. When I zoom in, it shows the memory copies and the CUDA driver API calls, like in the picture below:

However, is there any way to put them (memory and kernels) on separate rows, so that I can see how time is split between them without zooming in?
Also, as you said, I still can’t find the GPU rows in the Timeline View. Could you please give any advice?

hwilper · December 4, 2024, 4:39pm

When you open your timeline, you should see a row for every GPU. Here is an example from a result I have on hand:

If you expand that “CUDA HW” row, you would see:

What concerns me is that we are not seeing that row at all.

Can you tell me what command you ran?

baihdong · December 5, 2024, 4:27am

Sure, I ran

nvfortran -acc -Minfo=accel -o executableA A.f90
nsys profile executableA

Then I opened the report in the Nsys GUI.
By the way, what I want to show up is exactly the row you showed, with memory and kernel data, so that I can compare their costs in the program.
Note that I’m using WSL2; could that be the reason why I can’t see the GPU row?
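For reference, the same build-and-profile steps can be wrapped in a small script that also prints kernel and memory-transfer summaries on the command line, which is one way to compare their costs without zooming in the GUI. This is a sketch, assuming nvfortran and nsys are on PATH and an nsys recent enough to use the cuda_gpu_kern_sum and cuda_gpu_mem_time_sum report names; profile_acc and jacobi_report are hypothetical names. On WSL2 the GPU tables may come out empty until the timestamp workaround described later in this thread is applied.

```shell
#!/usr/bin/env bash
# Sketch: build an OpenACC Fortran program, profile it with Nsight Systems,
# and print GPU kernel and GPU memory-transfer summaries as separate tables.
# profile_acc is a hypothetical helper; nvfortran and nsys must be on PATH.
profile_acc() {
  local src="$1"                      # Fortran source, e.g. A.f90
  local exe="$2"                      # executable name, e.g. executableA
  local report="${3:-jacobi_report}"  # nsys adds the .nsys-rep extension

  # Compile for the GPU and show the compiler's accelerator feedback.
  nvfortran -acc -Minfo=accel -o "$exe" "$src" || return 1

  # Trace CUDA, OpenACC and OS runtime activity; overwrite an older report.
  nsys profile --trace=cuda,openacc,osrt --force-overwrite=true \
    -o "$report" "./$exe" || return 1

  # Kernel time and memcpy/memset time, each summarized in its own table.
  nsys stats --report cuda_gpu_kern_sum,cuda_gpu_mem_time_sum \
    "$report.nsys-rep"
}
```

Usage: profile_acc A.f90 executableA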

hwilper · December 5, 2024, 8:58pm

Are you using WSL within Docker?

baihdong · December 6, 2024, 1:12am

No, without Docker.
I installed it from PowerShell.

hwilper · December 6, 2024, 2:41pm

@liuyis can you add some insight here?

liuyis · December 6, 2024, 3:56pm

In WSL2, Nsys still has an issue with GPU-to-CPU timestamp conversion, so GPU-side CUDA activities are not processed correctly and do not show up on the timeline by default.
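Since the problem is specific to WSL2, it may help to confirm which environment a script is running in before applying the workaround. The sketch below relies on a heuristic, not an official API: WSL2 kernels usually report a version string containing "microsoft-standard" (and, on recent kernels, "WSL2") in /proc/version, while WSL1 reports "Microsoft" without "-standard". is_wsl2 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Heuristic WSL2 check based on the kernel version string (an assumption
# about Microsoft's kernel naming, not a documented interface).
# is_wsl2 is a hypothetical helper; the optional argument is a file to read
# instead of /proc/version, which makes the check easy to test.
is_wsl2() {
  local version_file="${1:-/proc/version}"
  [ -r "$version_file" ] || return 1
  # WSL2 kernels look like "...-microsoft-standard-WSL2"; WSL1 "...-Microsoft".
  grep -Eqi 'microsoft-standard|wsl2' "$version_file"
}

# Usage:
#   if is_wsl2; then echo "WSL2: GPU rows may need the CUPTI timestamp WAR"; fi
```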

There is a workaround (WAR) that lets CUPTI, the underlying library Nsys uses for CUDA tracing, handle the timestamp conversion instead. It is not as accurate as Nsys’ normal mechanism, but the GPU-side activities will show up on the timeline.

First, you may need to download the latest 2024.7 release from Nsight Systems - Get Started | NVIDIA Developer. The CUPTI shipped with older versions has a bug that prevents the WAR from working.
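Because the WAR depends on the nsys version, it can be worth checking which nsys will actually collect the report. In this thread the GUI was 2024.7.1 while the CLI installed with the HPC SDK was 2024.6.1, and presumably it is the collecting CLI (which loads CUPTI) that needs the newer release. The sketch below assumes nsys --version prints a line like "NVIDIA Nsight Systems version 2024.7.1..."; the exact format may vary between releases, and nsys_version_at_least is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: succeed if the nsys version is at least MAJOR.MINOR.
# nsys_version_at_least is a hypothetical helper. The optional third argument
# is a version string to parse instead of running `nsys --version`
# (assumed to print e.g. "NVIDIA Nsight Systems version 2024.7.1...").
nsys_version_at_least() {
  local want_major="$1" want_minor="$2" text="${3:-}"
  if [ -z "$text" ]; then
    text="$(nsys --version 2>/dev/null)" || return 1
  fi

  # Extract "YYYY N" from a "version YYYY.N" match in the text.
  local ver major minor
  ver="$(printf '%s\n' "$text" \
    | sed -n 's/.*version \([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p' \
    | head -n 1)"
  [ -n "$ver" ] || return 1
  read -r major minor <<< "$ver"

  # Compare the year first, then the minor release number.
  [ "$major" -gt "$want_major" ] ||
    { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}
```

Usage: nsys_version_at_least 2024 7 || echo "nsys is older than 2024.7". Running command -v nsys shows which installation is first on PATH, which matters when both an HPC SDK copy and a standalone copy are installed.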

After that, here are the steps for the WAR:

Find the Nsys config.ini file path with nsys -z. For example, on my system:

$ nsys -z
/home/liuyis/.config/NVIDIA Corporation/nsys-config.ini

Create the config.ini file if it does not already exist. Note that the path may contain a space, so it needs to be wrapped in quotes:

mkdir -p "/home/liuyis/.config/NVIDIA Corporation"
touch "/home/liuyis/.config/NVIDIA Corporation/nsys-config.ini"

Add the line CuptiUseRawGpuTimestamps=false to the config file. Appending with >> keeps any settings already in the file:

echo "CuptiUseRawGpuTimestamps=false" >> "/home/liuyis/.config/NVIDIA Corporation/nsys-config.ini"
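The steps above can be folded into one idempotent script. It keeps any other settings already in the file, repairs a missing trailing newline, and does not add a duplicate line when run twice. This is a sketch: apply_cupti_war is a hypothetical name, it assumes nsys -z prints only the config path, and the in-place edit uses GNU sed -i syntax, as shipped with typical WSL2 distributions such as Ubuntu.

```shell
#!/usr/bin/env bash
# Sketch: set CuptiUseRawGpuTimestamps=false in the Nsys config file.
# apply_cupti_war is a hypothetical helper. With no argument it asks
# `nsys -z` for the config path (assumed to print just the path).
apply_cupti_war() {
  local cfg="${1:-}"
  if [ -z "$cfg" ]; then
    cfg="$(nsys -z)" || return 1
  fi

  # The path may contain spaces ("NVIDIA Corporation"), so quote it everywhere.
  mkdir -p "$(dirname "$cfg")" || return 1
  touch "$cfg" || return 1

  # If the last line lacks a newline, add one so the append starts a new line.
  if [ -s "$cfg" ] && [ -n "$(tail -c 1 "$cfg")" ]; then
    echo >> "$cfg"
  fi

  if grep -q '^CuptiUseRawGpuTimestamps=' "$cfg"; then
    # Update an existing entry in place (GNU sed -i syntax).
    sed -i 's/^CuptiUseRawGpuTimestamps=.*/CuptiUseRawGpuTimestamps=false/' "$cfg"
  else
    echo "CuptiUseRawGpuTimestamps=false" >> "$cfg"
  fi
}
```

Usage: apply_cupti_war (path taken from nsys -z), or apply_cupti_war "/home/liuyis/.config/NVIDIA Corporation/nsys-config.ini" to target a specific file.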

Let me know if this works, thanks

baihdong · December 7, 2024, 5:13am

@liuyis @hwilper
Amazing!
Thank you for all your help. After following your instructions, the GPU timeline appears as follows:

system · December 21, 2024, 5:14am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
