Updated Nsight Systems and lost CUDA API trace

I am profiling my Python CUDA application with Nsight Systems, which I installed inside the NVIDIA l4t-ml Docker container (nvcr.io/nvidia/l4t-ml:r32.5.0-py3).
Both the Python application and Nsight Systems run inside the same Docker container on a Jetson AGX Xavier.
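
For reference, a capture is launched roughly like this inside the container (the script name and output name are placeholders rather than my exact command):

nsys profile -t cuda,nvtx,osrt -o report python3 my_app.py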

After updating the Nsight Systems CLI from version 2021.1.3 to 2021.5.2, I am getting errors and have lost the CUDA API calls and GPU sampling in my reports.

Diagnostics Summary reports:
Warning Injection CUDA injection initialization failed.
Warning Analysis CUDA profiling stopped unexpectedly: Cannot initialize CUDA event collection.
Warning Analysis No CUDA events collected. Does the process use CUDA?

Versions
Old: NVIDIA Nsight Systems version 2021.1.3.14-b695ea9
New: NVIDIA Nsight Systems version 2021.5.2.53-28d0e6e

Cuda version (output of nvcc --version):
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_28_22:34:44_PST_2021
Cuda compilation tools, release 10.2, V10.2.300
Build cuda_10.2_r440.TC440_70.29663091_0

What could be causing this new issue? Any help would be appreciated.

@liuyis can you add this to your list?

Hi @nchang, could you share the reports you collected from 2021.1 and 2021.5 respectively? Thanks

Hi @liuyis

Attached are some sample reports for both versions. Thank you
Nvidia Reports.zip (7.0 MB)

Hi @nchang, where did you get the 2021.5 version of Nsight Systems?

Note that for the L4T platform, Nsight Systems is bundled with the JetPack SDK (Jetson Developer Tools | NVIDIA Developer). It seems the latest version of the JetPack SDK is 4.6, which carries Nsys 2021.2 (i.e. the latest officially released version of Nsys for the L4T platform is 2021.2).

If you downloaded Nsight Systems from our website or the Developer Zone, it is meant for desktop/server platforms only. The Linux SBSA version may be able to run on the L4T platform since both are Arm-based, but there is no guarantee that all features will work as expected.

I see, thank you for that clarification, @liuyis. I was hoping to make use of the latest version of Nsight Systems, which has the newly added “analyze” command.
Is there any significant difference between Nsight Systems 2021.2 and the 2021.1 version I was previously using?

On another note, I am noticing that when I include CUDA API tracing (-t nvtx,osrt,cuda), the NVTX blocks appearing in my report are significantly slower. Is this expected when profiling with the added cuda trace option? I expected the Nsight Systems profiler to add very little overhead. What is the best way to profile a CUDA application with realistic timings?

I was hoping to make use of the latest version of Nsight Systems, which has the newly added “analyze” command.

You can copy the reports collected on L4T with 2021.1/2021.2 to a desktop or server, and use the 2021.5 version there for the nsys analyze command.
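
For example, something like this (the report filename and path are just placeholders; use whatever names your collected reports have):

scp jetson:/path/to/report1.qdrep .
nsys analyze report1.qdrep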

Is there any significant difference between Nsight Systems 2021.2 and the 2021.1 version I was previously using?

This link provides some information: https://developer.nvidia.com/blog/latest-nsight-developer-tools-releases-nsight-systems-2021-1-nsight-compute-2021-2-nsight-visual-studio-code-edition/#:~:text=Nsight%20Systems%202021.2%2C%20introduces%20support,%2C%20OpenSHMEM%2C%20and%20MPI%20fortran

On another note, I am noticing that when I include CUDA API tracing (-t nvtx,osrt,cuda), the NVTX blocks appearing in my report are significantly slower. Is this expected when profiling with the added cuda trace option? I expected the Nsight Systems profiler to add very little overhead. What is the best way to profile a CUDA application with realistic timings?

The first CUDA API call will have significant overhead due to profiler initialization. The rest of the calls will also have some overhead, but it should not be very significant. How much slow-down are you observing? Could you share reports with and without CUDA trace using the same Nsys version?

To minimize overhead, you can disable unnecessary features. For example, if you are interested in CUDA and NVTX only, use something like nsys profile -t cuda,nvtx -s none <app>.
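
Applied to a Python application, that would look roughly like this (the script name and output name are placeholders):

nsys profile -t cuda,nvtx -s none -o report python3 my_app.py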

Hi @liuyis, thanks for the clarifications and suggestions concerning the Nsight Systems versions. I will do as you suggest and use the earlier Nsight Systems version to generate the reports and the later version to run “analyze”.

As for the CUDA tracing overhead, I have included reports (in two posts) and the generated stats with and without CUDA tracing. You will see that the timings are significantly increased over the whole duration of the profiling capture, not simply during initialization.

Thanks for investigating the issue, as these increased timings make it difficult to evaluate optimizations accurately.

NoCuda.zip (91.0 MB)

@liuyis, here is the second set of logs for CUDA tracing (unfortunately the SQLite file is too large to attach; please let me know if you wish to see it):
Cuda.zip (8.2 MB)

@nchang Thanks for uploading the reports. I do see that some of the ranges are 2X slower when CUDA trace is enabled. Is it possible to share the application, or a simple reproducer, so that we can investigate on our end?

Hi @liuyis, yes, exactly: CUDA tracing seems to cause 2-3X slowdowns. As for sharing the application, is there a process available for sharing sensitive code under an NDA?

Hi @nchang, which company/organization do you (or does the code) belong to? Does the company/organization have an existing SA (solution architect) or DevTech contact with NVIDIA?