NSIGHT Visual Studio Edition not working with Quadro RTX 5000

The NSIGHT Visual Studio plugin detects the two RTX 5000 cards we have installed, but when we run the application with Trace Application and CUDA selected, it does not record any events from either GPU.

Windows 10 Pro 1809
CUDA 10
Driver 412.16
Visual Studio 2017 15.9.4 using x64 compiler

Given this information, is the compiler OK? The Windows version? Or is RTX 5000 support simply not mature yet?

We plan to sell thousands of these GPUs, but right now we can’t work with them.

Thanks in advance.

OK, I found out that Turing support for NSIGHT Visual Studio Edition is never going to happen.

My first look at NSIGHT Systems makes it much more difficult to understand what’s happening.

The user interface is much less clear.

PLEASE, correct me if I’m wrong, but:

When looking at the CUDA kernel execution row in the timeline, I can no longer see the kernel name directly on the bar that represents the kernel’s duration. To find the name, I now have to click on the runtime call that launched it. This is much less clear, and finding small kernels turns into a nightmare of zooming in and out.

It looks robust, and I understand that it is now the same API for all OSes. Unified features across OSes are nice for maintainability. BUT I only wish you had used the NSIGHT Visual Studio interface.

I may be one of the few intensive users of NSIGHT Visual Studio. Our Windows application uses multiple CPU threads and many streams, and exposes a lot of potential parallelism for kernel/kernel and kernel/transfer overlap. We execute many kernels at the same time with different execution times and different occupancy levels, which allows for different amounts of possible overlap.

All this was so clear with NSIGHT Visual Studio edition. Now with NSIGHT Systems it is so unclear.

Additionally, I don’t see a way to analyze CUDA kernels in NSIGHT Systems. I understand I have to use NSIGHT Compute for that. I still haven’t managed to make it work.

Is there a place besides this one to ask for improvements?

Anyway, I’m going through the tutorials. Thanks for making them! I will definitely need them.

I’ve taken the liberty of moving this thread over to the Nsight Systems forum for Windows X86 targets.

May I ask what version of Nsight Systems you have been trying?

Hi hwilper,

First of all, thanks for taking care of my post. Appreciate it sincerely.

Let me update you with my latest findings:

1 Version 2018.3.4.3 (is there a newer version? This is the one I find on the official downloads page.)
2 I found how to see actual kernel executions individually and also memory transfers, and how to distinguish DtoH, HtoD and DtoD.
3 Still, the view is “per stream”, which makes it quite slow to see interactions between kernels and transfers on different streams:

  • I wish I could see all the kernel calls collapsed in a single line in the timeline, each with slightly different colors, so it is much easier to see if and how much they overlap.
  • The same goes for the memory transfers, in a row close to the previous "kernel line", with a different color for each type of memory transfer: for instance, red for DtoH, green for HtoD, blue for DtoD. All this would make it much easier to see how transfers overlap with each other and with computation. Specifically, it would show which calls do not overlap, for instance cudaMemsetAsync on WDDM for small arrays.
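
The single-row view described in the two bullets above can be sketched outside the tool. This is only a minimal illustration in Python, assuming the trace is already available as a list of per-stream records; the `Event` layout, the `collapse` helper, and the sample timings are all hypothetical and are not an Nsight Systems export format or API.

```python
from collections import namedtuple

# Hypothetical record layout (assumption); not an Nsight Systems format.
Event = namedtuple("Event", "stream kind name start end")

# Colors proposed in the post for each transfer type.
TRANSFER_COLORS = {"DtoH": "red", "HtoD": "green", "DtoD": "blue"}


def collapse(events):
    """Merge events from all streams into two rows, sorted by start time."""
    rows = {"Compute": [], "Memory": []}
    for ev in sorted(events, key=lambda e: e.start):
        if ev.kind == "kernel":
            rows["Compute"].append((ev.name, ev.start, ev.end))
        else:
            color = TRANSFER_COLORS[ev.kind]
            rows["Memory"].append((ev.kind, color, ev.start, ev.end))
    return rows


# Made-up sample data: four events spread over three streams.
events = [
    Event(1, "kernel", "bigKernel", 0, 100),
    Event(2, "HtoD", "copyIn", 10, 30),
    Event(3, "kernel", "smallKernel", 95, 110),
    Event(2, "DtoH", "copyOut", 105, 120),
]
rows = collapse(events)
for row_name, entries in rows.items():
    print(row_name, entries)
```

With the stream dimension dropped, the end of `bigKernel` and the start of `smallKernel` sit next to each other in the same row, which is the property the request is after.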

A clear example of this is NSIGHT Visual Studio Edition, which is slow and buggy but has this nice visualization mode, which I think makes it much faster to see everything.

Thanks a lot for reading.

I’ve also pinged Doron Ofek on this forum; he knows the Windows target better than I do (I usually focus on the Linux target).

Have you found the correlations yet? If you click on a CUDA call on the CPU or on a CUDA kernel on the GPU, the related calls become highlighted. If the correlation is off screen, you will even get teal arrows to help you find the related calls.

You can also pin individual rows and scroll the screen to bring things you are interested in closer together.

OK, pinning everything would remove some unnecessary rows that separate the different streams, but we still have around 27 streams on 1, 2 or 3 GPUs (depending on configuration).

The code is very large and there are many developers. Among many other things, I need to make sure no thread is switching contexts, which is easier if I have a compact view of kernel execution and memory transfers for all GPUs.

The “CUDA Kernel running” and “Memory operation in progress” rows are the ones I would like to use, if only I could tell individual kernels and transfers apart without having to look at the individual streams.

Let me include a screenshot of a tiny and ugly example (with a few bad practices like DeviceSynchronize):


https://ibb.co/LnFtzNC

Here you can see that I didn’t expand the streams section. For a first review, I don’t need to, mainly because I can see all kernel executions in the “Compute” row. I can even see how the end of one kernel overlaps the beginning of the next, because they were enqueued in different streams, so there is no waiting time between kernels.

I can also see that the first three memory transfers in the “Memory” row are HtoD because they are green, and that they overlap with a large kernel execution that was enqueued in a different stream and uses data other than the data being transferred.

Finally, I can see that the execution ends with three DtoD transfers (in light blue) that overlap slightly with each other and with the last kernel.

If there were DtoH transfers, they would appear in the “Memory” row in purple, and they could overlap with HtoD transfers, DtoD transfers, and kernels.

With this, in a very quick overview, I can spot bad behaviors and start looking for the guilty code.
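
The quick check described above (are the kernels back to back, and are the transfers hidden behind them?) can also be expressed as numbers. The following is a rough Python sketch under stated assumptions: intervals are plain `(start, end)` pairs in arbitrary time units, and the helper names `merge_intervals`, `idle_gaps`, and `covered_time`, as well as the sample timings, are hypothetical and not taken from any NVIDIA tool.

```python
def merge_intervals(intervals):
    """Union of possibly overlapping (start, end) intervals, sorted by start."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous busy span: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def idle_gaps(busy):
    """Gaps between consecutive busy spans, i.e. time with no kernel running."""
    return [(a[1], b[0]) for a, b in zip(busy, busy[1:])]


def covered_time(transfer, busy):
    """Portion of one transfer that runs while at least one kernel is active."""
    t_start, t_end = transfer
    return sum(max(0, min(t_end, b_end) - max(t_start, b_start))
               for b_start, b_end in busy)


# Made-up timings for illustration only.
kernels = [(0, 100), (95, 110), (150, 200)]
transfers = [(10, 30), (105, 120)]

busy = merge_intervals(kernels)
gaps = idle_gaps(busy)
hidden = [covered_time(t, busy) for t in transfers]
print("busy spans:", busy)
print("idle gaps:", gaps)
print("hidden transfer time:", hidden)
```

In this made-up data, the gap between the second and third kernel and the mostly exposed second transfer are the kind of "bad behavior" that a collapsed row makes visible at a glance.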

Just wanted to let you know that you have spawned some interesting discussions. I’ll let you know more (like what we decided and when we might be able to fit it in on the roadmap) when I know more.

Thanks for the update.

This topic is very interesting to me. I’m available for a call if you want more feedback from an intensive user.

If we use a tool that shares screens, I can show more timeline examples and explain more details on things that I found.

Just send me an email at my forum account address.

Thanks!