Nsight Systems doesn't profile kernels

I have an application that uses multiple CUDA streams to achieve more concurrency.
Nsight Systems doesn’t provide any information about kernels running in those streams (or in the default stream), and it reports that memory operations (memset in my case) take 100% of the time, which is obviously wrong.

Nsight Systems shows the following (note that there are no warnings or errors):

The output is the same whether I profile via the Nsight Systems GUI or run the “nsys profile -t cuda ./myapp” command and then import the report file into the GUI.

Versions, hardware:
Ubuntu 18.04, GeForce RTX 2070 (the same situation is on Tesla V100), Driver Version: 418.67, CUDA Version: 10.1.

What’s wrong?

UPDATE: the same thing happens with an app that uses the default stream for all calculations (one of the older versions of the app). So multiple streams are not the cause; kernels are simply not traced.

Found the reason for this behavior.

The app used dynamic parallelism. After I removed it, everything worked fine.
Question 1: Is it a bug?

Still, there were no warnings, errors, or any other messages indicating any issues or restrictions on profiling my app.
Question 2: How can I suggest an improvement, or file a bug in case it is a bug, to Nsight Systems?

Thank you for drawing our attention to this! It looks like Nsight Systems currently doesn’t trace CDP (CUDA Dynamic Parallelism) kernels correctly. We’ll get a bug filed internally and will update this thread once we have more information.

Thank you! Wish you all the best with fixing it!


This issue should be fixed in the latest Nsight Systems release 2019.5: https://developer.nvidia.com/nsight-systems

Unfortunately, due to another issue, you need to create a file named config.ini containing the line “HandleInvalidEvents=true” in the directory from which you launch the nsys command.

For example:

% cat config.ini
HandleInvalidEvents=true
% nsys profile -t cuda ./yourApp
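
For anyone landing here later, the workaround can be sketched as a few shell commands. This is only a rough sketch: `./yourApp` is the placeholder name from the post above, and the check that `nsys` and the binary exist before profiling is an addition for illustration, not part of the original advice.

```shell
# Create config.ini in the current directory with the single workaround line.
printf 'HandleInvalidEvents=true\n' > config.ini

# Show the file to confirm its content.
cat config.ini

# Launch nsys from this same directory, as the workaround requires.
# The guard below is an assumption added for illustration: profile only
# if nsys is on PATH and the placeholder binary ./yourApp actually exists.
if command -v nsys >/dev/null 2>&1 && [ -x ./yourApp ]; then
    nsys profile -t cuda ./yourApp
else
    echo "nsys or ./yourApp not found; skipping the profiling step"
fi
```

Note that the file has to sit in the directory you launch nsys from, so running the command from a different directory would need its own copy of config.ini.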


I downloaded the new version and created config.ini with the content you posted. Several other issues came up:

  1. When I run nsys on the app with dynamic parallelism present in the code (even though that code is never actually executed), I get an “an illegal memory access was encountered” error. Without nsys the app finishes without any errors.
  2. After that, the Nsight Compute GUI doesn’t show anything that makes sense. See the screenshot.
  3. When I comment out the code with dynamic parallelism, everything works as expected.
  4. I can’t say that a solution requiring an additional file is convenient or user-friendly. How would people know that the file has to be created without reading this post?

Hi Timofei, unfortunately the screenshot doesn’t seem to be available to us (410 Gone). Could you please attach it directly to your reply so that we can see it? Thanks!

Sorry about that.
I don’t see any way to attach it directly to the message, so here is a public link to it: https://yadi.sk/i/UTd6T-nJViaeJQ

I am facing the same issue with kernels with dynamic parallelism.

I am using Nsight Systems 2019.6.1

Running the following command does not list my kernels at all.
nsys profile --stats=true ./app

I followed the advice regarding config.ini, but still cannot see the kernels.

Any help would be appreciated.

I’m going to ask Andrey to have a look at this, jaideep. However, he is OOO until 1/9.

Thank you very much! Looking forward to Andrey’s comments.


We looked into the issue with stats not being printed correctly, and the upcoming release will contain a fix for it.


Unfortunately, support for CDP has been lower on our priority list, and we haven’t been able to get back to it yet.