Nsight VSE 5.3 cannot debug in VS2015

Hello, I have
Win7 Home SP1 x64
Visual Studio Community 2015 (14.0.25431.01 Update 3)
CUDA 8.0
Nsight VSE 5.3 (build 5.3.0.17162)
CPU: i5-3570K @3.4GHz
Prim. GPU: Intel HD 4000 (on CPU)
Sec. GPU : GeForce GTX 660 (2GB)
Driver: NVIDIA 384.76 = (22.21.13.8476)

I’m trying the CUDA Debugger tutorial:
http://docs.nvidia.com/nsight-visual-studio-edition/5.3/Content/Using_CUDA_Debugger.htm
=> matrixMul_vc100.vcxproj

I did all the steps, including:

  • Rebuild matrixMul (Debug, win32)
  • set breakpoints
  • Ex3: Start Nsight Monitor
  • Start CUDA Debugging

Output from: Nsight

CUDA context created : 0053e8d8
CUDA module loaded: 058d0e08 matrixMul.cu

Output in CMD window:

[Matrix Multiply Using CUDA] - Starting…
GPU Device 0: “GeForce GTX 660” with compute capability 3.0

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel…

and then nothing happened for 10 minutes.

If I run it without the debugger, it immediately shows:

done

and after a few seconds it shows the rest:

Performance= 42.83 GFlop/s, Time= 3.060 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: OK

Note: For peak performance, please refer to the matrixMulCUBLAS example.
Press any key to continue . . .

Any hint on what to do to be able to debug the CUDA code, and to see Locals etc.?

Hi,

Sorry for the delay. Does your issue still exist? matrixMul is fully tested on our systems.

Best Regards
Harry

Hi,
The issue is with Nsight. When I chose “Start CUDA Debugging”, it started the app, but something seemed wrong with the debugging: after 10 minutes no breakpoint had been reached and no debug info such as Locals was shown in VS.
After 14-16 minutes the GPU was reset (like a TDR), even though the TDR delay is set to 3600 s.
What should the screen look like?
Is there another example for Nsight debugging? Some screenshots would be appreciated.
Thanks.
Martin

It looks like you are using the CUDA samples in the Nsight folder. Could you try the CUDA samples from the CUDA SDK, which are located at “c:\programdata\nvidia corporation\cuda samples\v8.0”?

I will try to repro it with the samples in the Nsight folder tomorrow.

Here is what it should look like. I can debug matrixMul from the Nsight samples on my GTX 660.

I’m not sure, but could you try disabling the Intel GPU in the BIOS? The Intel GPU may interfere with the debugging.

I tried matrixMul_vs2015.sln from C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\0_Simple\matrixMul but it’s the same as before. :(

Regarding disabling the Intel GPU: I set the Intel GPU as primary because when the Nvidia GPU was primary (with an LCD connected), the picture on the LCD flickered and Win7 was unresponsive while CUDA code was running (until a TDR reset the GPU).
Currently the Nvidia GPU has no LCD connected. Both LCDs are on the Intel GPU.

Also ref. “Setup Local Headless GPU Debugging” :
On Windows 7, it’s recommended that users run their CUDA applications on a headless GPU.
That’s my case. Nvidia is headless.

What about Nsight VSE User setting?
Launch - Launch Project or External program?

Right click your project and start CUDA debugging, it should work.

You can use 1_Utilities\deviceQuery from the CUDA samples to find out how many CUDA devices you have.
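
For a quick check without building the whole sample, the device count and the properties most relevant to this thread can be read through the CUDA runtime API. The following is only a minimal sketch, not the actual deviceQuery source; the helper name driverModel is made up for illustration, and the TCC/WDDM distinction is only meaningful on Windows.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: maps cudaDeviceProp::tccDriver to a readable name.
// On Windows a non-TCC device runs under WDDM.
static const char *driverModel(int tccDriver) {
    return tccDriver ? "TCC" : "WDDM";
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    std::printf("Detected %d CUDA capable device(s)\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                         dev, cudaGetErrorString(err));
            continue;
        }
        std::printf("Device %d: \"%s\", compute capability %d.%d\n",
                    dev, prop.name, prop.major, prop.minor);
        std::printf("  Driver mode: %s\n", driverModel(prop.tccDriver));
        // "Yes" usually means a display watchdog (TDR) applies to this GPU.
        std::printf("  Run time limit on kernels: %s\n",
                    prop.kernelExecTimeoutEnabled ? "Yes" : "No");
    }
    return 0;
}
```

If the Nvidia GPU is really headless, "Run time limit on kernels" would be expected to read "No", as in the full deviceQuery output.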

The “right click” and the Nsight menu do the same thing: a cmd window opens and Nsight shows as connected, but nothing more.

Output from device query:

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\1_Utilities\deviceQuery…/.
./bin/win64/Debug/deviceQuery.exe Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: “GeForce GTX 660”
CUDA Driver Version / Runtime Version 9.0 / 8.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147483648 bytes)
( 5) Multiprocessors, (192) CUDA Cores/MP: 960 CUDA Cores
GPU Max Clock rate: 1098 MHz (1.10 GHz)
Memory Clock rate: 3004 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 393216 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 660
Result = PASS
Press any key to continue . . .

Here are pics from CUDA Info 1:

Yeah, according to your picture, the debugger should work. Did you build your app in debug mode? Also, only breakpoints in __global__ and __device__ functions can be hit; you cannot debug CPU code in Nsight.
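
To make the breakpoint rule concrete, here is a small sketch (not taken from matrixMul) that marks where the CUDA debugger can and cannot stop. All names in it (scale, scaleKernel, runScale) are hypothetical. It assumes the file is built with device debug information, for example with nvcc -G -g.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __device__ function: a breakpoint here can be hit by the CUDA debugger.
__device__ float scale(float x, float factor) {
    return x * factor;
}

// __global__ kernel: a breakpoint here can be hit as well.
__global__ void scaleKernel(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = scale(data[i], factor);  // good place for a breakpoint
    }
}

// Host (CPU) code: breakpoints here are not hit by the CUDA debugger.
// Copies n floats to the GPU, scales them in place, and copies them back.
bool runScale(float *host, int n, float factor) {
    float *dev = nullptr;
    const size_t bytes = n * sizeof(float);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 128;
        const int blocks = (n + threads - 1) / threads;
        scaleKernel<<<blocks, threads>>>(dev, n, factor);
        ok = cudaGetLastError() == cudaSuccess
             && cudaDeviceSynchronize() == cudaSuccess
             && cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok;
}

int main() {
    float data[8];
    for (int i = 0; i < 8; ++i) data[i] = static_cast<float>(i);

    if (!runScale(data, 8, 2.0f)) {
        std::fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    std::printf("data[3] = %f\n", data[3]);
    return 0;
}
```

A breakpoint placed inside runScale or main is ignored when the program is started with “Start CUDA Debugging”, while one inside scaleKernel or scale should be hit once the kernel launches.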

Yes, I built it as Debug, x64 - see pic.

Should I try to reinstall Nsight or CUDA Toolkit?
I installed it when the Nvidia GPU was primary; then I found out that it must be headless, so I changed the BIOS setting to make the Intel GPU primary and the Nvidia GPU secondary.

No need to reinstall anything. Actually, I really have no idea what’s going on. Could you try running Nsight Monitor as administrator? If it still doesn’t work, could you try the latest Nsight 5.4? Thank you.

I found some time, uninstalled Nsight 5.3 and installed 5.4.
The examples in Nsight 5.4 (Debugging/Matrix Multiply) are somehow broken; I got an error when loading them into VS2015.
So I tried the examples from Nsight 5.3.

EDITED:

At first I made a mistake and had SimpleStreams as the startup project in the solution, so the debugger ran SimpleStreams.
I changed it and I’m back to my previous output:

– cut –
[Matrix Multiply Using CUDA] - Starting…
GPU Device 0: “GeForce GTX 660” with compute capability 3.0

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel…
– /cut –

I changed the settings of both the project and matrixMul.cu to have “Generate GPU Debug Information” set to “Yes (-G)” [it’s -G, not -G0]. Here is my output from the rebuild:

– cut –
1>------ Rebuild All started: Project: matrixMul, Configuration: Debug Win32 ------
1>
1> D:\Data\visual-c-wrk\NsightSamples\CUDA\Debugging\Matrix Multiply>“C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\nvcc.exe” -ccbin “C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin” -I…\Common -I…\Common\C99 -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" -G --keep-dir “D:\Data\visual-c-wrk\NsightSamples\CUDA\Debugging\Matrix Multiply\obj\Win32_Debug_vc100” -maxrregcount=0 --machine 32 --compile -g -D_DEBUG -DWIN32 -D_CONSOLE -Xcompiler “/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd " -o “D:\Data\visual-c-wrk\NsightSamples\CUDA\Debugging\Matrix Multiply\obj\Win32_Debug_vc100\matrixMul.cu.obj” “D:\Data\visual-c-wrk\NsightSamples\CUDA\Debugging\Matrix Multiply\matrixMul.cu” -clean
1>CUDACOMPILE : nvcc warning : The ‘compute_20’, ‘sm_20’, and ‘sm_21’ architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
1> matrixMul.cu
1> Compiling CUDA source file matrixMul.cu…
1>
1> D:\Data\visual-c-wrk\NsightSamples\CUDA\Debugging\Matrix Multiply>“C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\nvcc.exe” -gencode=arch=compute_20,code=“sm_20,compute_20” -gencode=arch=compute_30,code=“sm_30,compute_30” -gencode=arch=compute_35,code=“sm_35,compute_35” --use-local-env --cl-version 2015 -ccbin “C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin” -I…\Common -I…\Common\C99 -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include” -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" -G --keep-dir “D:\Data\visual-c-wrk\NsightSamples\CUDA\Debugging\Matrix Multiply\obj\Win32_Debug_vc100” -maxrregcount=0 --machine 32 --compile -cudart static -g -D_DEBUG -DWIN32 -D_CONSOLE -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd " -o “D:\Data\visual-c-wrk\NsightSamples\CUDA\Debugging\Matrix Multiply\obj\Win32_Debug_vc100\matrixMul.cu.obj” “D:\Data\visual-c-wrk\NsightSamples\CUDA\Debugging\Matrix Multiply\matrixMul.cu”
1>CUDACOMPILE : nvcc warning : The ‘compute_20’, ‘sm_20’, and ‘sm_21’ architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
1> matrixMul.cu
1> matrixMul_vc100.vcxproj -> D:\Data\visual-c-wrk\NsightSamples\CUDA\Debugging\Matrix Multiply\bin\Win32_Debug_vc100\matrixMul.exe
1> matrixMul_vc100.vcxproj -> D:\Data\visual-c-wrk\NsightSamples\CUDA\Debugging\Matrix Multiply\bin\Win32_Debug_vc100\matrixMul.pdb (Full PDB)
========== Rebuild All: 1 succeeded, 0 failed, 0 skipped ==========
– /cut –

Note it’s compiled as “Debug” “Win32”. I’m not sure whether “x64” would have any benefit.

I have breakpoints on lines 78, 84, 115. All are “full red points”, so they are active, but the debugger doesn’t reach them.

Note: my driver is old now
Driver: NVIDIA 384.76 = (22.21.13.8476)
But the recommended driver for Nsight 5.4 is NVIDIA Display Driver version 384.94 or newer.
Installing a new one: 387.92 (newest)

EDIT:
There was some issue with 387.92, so I installed 385.69.
But no progress. After a rebuild, still the same issue: the CUDA debugger does nothing.

Hi,

I really have no idea what’s going on. Did you set the breakpoints in CPU code? Nsight can only debug the GPU code.

Best Regards
Harry