Next-Gen debug on V100 - how to view data values inside kernel?

Hi all,

I have a server with two NVIDIA Volta V100 GPUs, running Windows Server 2016 and Visual Studio 2015, with GPU driver 441.22, CUDA toolkit 10.2 and hence Nsight Visual Studio Integration 2019.4.0.19274.

I’d like to debug my application by inspecting the values of the variables for each thread. As a first step, I tried to debug the sample “addKernel” program that comes with a new .cu file added to the current project.

I read that on V100 Legacy debugging is not available, only Next-Gen (as described here: https://developer.nvidia.com/nsight-visual-studio-edition-supported-gpus-full-list), hence there is no CUDA Warp Watch for inspecting the values of the variables inside the kernel.

Inside the kernel, in the Local Variables pane, only one thread is shown (usually thread 0). The only way I have found to inspect all the c[i] array values is to select Debug -> Windows -> Watch -> Watch1 and enter each element of the array separately, knowing that it is a 4-byte float data type (so c for the first element, c+4 for the second one, up to c+16 for the last one, counting in bytes).
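
To make the offsets above concrete: with 4-byte elements, the values 0, 4, ..., 16 are byte offsets, which correspond to element indices 0 to 4. Below is a minimal host-side sketch in plain C++ (not the kernel itself). The helper names are hypothetical and only for illustration, and the array length of 5 is my assumption based on the offsets listed above.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: float is 4 bytes, as on the platforms discussed here.
static_assert(sizeof(float) == 4, "this sketch assumes 4-byte floats");

// Hypothetical helper: distance in bytes between element `index` and the
// start of the array. Pointer arithmetic on float* counts elements, so
// base + index is index * sizeof(float) bytes past base.
std::size_t byteOffsetOfElement(const float* base, std::size_t index) {
    const char* start = reinterpret_cast<const char*>(base);
    const char* elem = reinterpret_cast<const char*>(base + index);
    return static_cast<std::size_t>(elem - start);
}

// Hypothetical helper: element index that a given byte offset refers to.
std::size_t elementIndexFromByteOffset(std::size_t byteOffset) {
    assert(byteOffset % sizeof(float) == 0);  // must fall on an element boundary
    return byteOffset / sizeof(float);
}
```

So in pointer-arithmetic terms, c + 1 (or c[1]) is the element located 4 bytes after c, and the last of the five elements is c[4], at byte offset 16.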

This is tedious. It is not like the CUDA Warp Watch tool (available just in the Legacy Debug if I understood correctly) where all the values are shown together.

The Next-Gen State Inspection Views for Nsight Visual Studio Edition 2019.4 are described here (https://docs.nvidia.com/nsight-visual-studio-edition/Content/CUDA_Info_View_NextGen.htm), but there is no mention of how to inspect values inside the kernel. How can it be done? Is it possible using only the CUDA Next-Gen debug tool? Are there other tools that work with the NVIDIA V100 and Visual Studio to check the values of the variables after computation inside the kernel?

Thank you for your attention.

Marco