Nsight Systems doesn't profile kernels

Piane_Ramso · August 22, 2019, 10:34am

I have an application where multiple CUDA streams are used to achieve more concurrency.
Nsight Systems doesn’t show any information about kernels running in those streams (or in the default stream), and reports that memory operations (memset in my case) take 100% of the time, which is obviously wrong.

Nsight Systems shows the following (note that there are no warnings or errors):
External Media
The output is the same whether I profile through the Nsight Systems GUI or run “nsys profile -t cuda ./myapp” and then import the report file into the GUI.

Versions, hardware:
Ubuntu 18.04, GeForce RTX 2070 (the same situation is on Tesla V100), Driver Version: 418.67, CUDA Version: 10.1.

What’s wrong?

UPDATE: the same thing happens with an older version of the app that uses the default stream for all calculations. So multiple streams are not the cause; kernels are simply not traced.

Piane_Ramso · August 23, 2019, 7:07am

Found the reason for this behavior.

The app used dynamic parallelism; after I removed it, everything worked fine.
Question 1: Is it a bug?

Still, there were no warnings, errors, or other messages indicating any issues or restrictions on profiling my app.
Question 2: How can I suggest an improvement, or file a bug in case it is a bug, to Nsight Systems?

Andrey_Trachenko · August 29, 2019, 5:16pm

Thank you for drawing our attention to this! It looks like Nsight Systems currently doesn’t trace CDP (CUDA Dynamic Parallelism) kernels correctly. We’ll get a bug filed internally and will update this thread once we have more information.

Piane_Ramso · August 30, 2019, 8:31am

Thank you! Wish you all the best with fixing it!

AKravets · October 24, 2019, 12:17pm

Hi!

This issue should be fixed in the latest Nsight Systems release 2019.5: NVIDIA Nsight Systems | NVIDIA Developer

Unfortunately, due to another issue, you need to create a file named config.ini containing the line “HandleInvalidEvents=true” in the directory from which you launch the nsys command.

For example:

% cat config.ini
HandleInvalidEvents=true
% nsys profile -t cuda ./yourApp
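
The same workaround can be scripted so the file always ends up in the launch directory. This is a minimal sketch, not an official procedure: the `APP` variable and its `./yourApp` default are placeholders, and the only nsys options used are the ones quoted in this thread. Note that it overwrites any existing config.ini in the current directory.

```shell
#!/bin/sh
# Sketch of the config.ini workaround described above.
# APP is a hypothetical placeholder; point it at your own binary.
APP="${APP:-./yourApp}"

# The workaround expects config.ini in the directory where nsys is launched,
# so write it into the current working directory (overwrites an existing file).
printf 'HandleInvalidEvents=true\n' > config.ini

# Profile only when both nsys and the application are actually available.
if command -v nsys >/dev/null 2>&1 && [ -x "$APP" ]; then
    nsys profile -t cuda "$APP"
else
    echo "skipping profile: nsys or $APP not found" >&2
fi
```

Run it from the directory in which you would normally launch nsys, for example with `APP=./myapp sh profile.sh` (the script name is arbitrary).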

Piane_Ramso · November 6, 2019, 8:31am

Hi!

I downloaded the new version and created config.ini with the content you posted. Several other issues came up:

When I run nsys on the app with dynamic parallelism present in the code (even though that code is never actually executed), I get an “an illegal memory access was encountered” error. Without nsys, the app finishes without any errors.
After that, the Nsight Compute GUI doesn’t show anything meaningful. See the screenshot. External Media
When I comment out the dynamic parallelism code, everything works as expected.
I can’t say that a workaround requiring an extra file is convenient or user-friendly. How would people know to create this file without reading this post?

Andrey_Trachenko · November 7, 2019, 2:40pm

Hi Timofei, unfortunately the screenshot doesn’t seem to be available to us (410 Gone). Could you please attach it directly to your reply so that we can see it? Thanks!

Piane_Ramso · November 8, 2019, 3:05pm

Sorry about that.
I don’t see any way to attach it directly to the message, so here is a public link to it: https://yadi.sk/i/UTd6T-nJViaeJQ

jaideep777 · December 20, 2019, 5:56pm

I am facing the same issue with kernels with dynamic parallelism.

I am using Nsight Systems 2019.6.1

Running the following command does not list my kernels at all.
nsys profile --stats=true ./app

I followed the advice regarding config.ini, but still cannot see the kernels.

Any help would be appreciated.

hwilper · January 6, 2020, 3:18pm

I’m going to ask Andrey to have a look at this, jaideep. However, he is out of office until 1/9.

jaideep777 · January 7, 2020, 11:12am

Thank you very much! Looking forward to Andrey’s comments.

Andrey_Trachenko · January 9, 2020, 12:46pm

jaideep777:

We looked into the issue with stats not being printed correctly, and the upcoming release will contain a fix for it.

Piane_Ramso:

Unfortunately, support for CDP has been lower on our priority list, and we haven’t been able to get back to it yet.

hsa · January 26, 2022, 7:40pm

Is the CDP support in yet or close to ready?

Andrey_Trachenko · January 27, 2022, 3:54pm

Unfortunately, the answer is still no: tracing CDP kernels is not supported. Thank you for asking.
