Hello,
My setup:
I’m a CUDA beginner and I have started using Nsight (Eclipse Edition) for its
debugging capabilities. I currently have the CUDA 7.5 toolkit installed on Ubuntu 14.04. I’ve also
tried some programs involving the Thrust library and some other examples to be sure that
my device is CUDA capable and works correctly. Furthermore, I’ve updated GDB to version 7.8.
My project:
Currently I’m working on a pretty large project written in CUDA C++, involving a lot of classes.
I’ve just imported the project into Nsight and built it without problems, using the project’s makefile.
The project runs correctly and displays the correct output. The only problem that arises is
the pretty common "method … cannot be resolved" error, but I have read about it and will fix it soon.
So, I can say that my project runs correctly.
My goal:
I want to learn how to use the Nsight debugger and use it while developing my project.
To do so, I read the help in the Nsight integrated guide and learned some basic concepts.
My problem:
I set a simple breakpoint on a line that executes on the CPU and then
launched the debugger on my project. In the Breakpoints view of the debugger I
can see the breakpoint I’ve just added, and it is enabled.
When I launch the debugger, it stops at the beginning of the main function (I noticed
this option in the settings and I’m OK with that), but when I click Step Into or Resume,
it runs the program to the end without stopping at the breakpoint. I’m also sure
that the line on which the breakpoint is set is executed. The debugger simply doesn’t stop at
the breakpoint.
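A minimal sketch of the situation, for illustration only: the program below is hypothetical and not taken from the project. The kernel `increment` and the helper `run_increment` are made-up names. It assumes the file is compiled with host and device debug information (for example `nvcc -g -G`), and the comment marks the kind of CPU-side line on which a breakpoint would be expected to stop.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Trivial kernel: each thread adds 1 to one element.
__global__ void increment(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1;
    }
}

// Hypothetical helper: fills n elements with 0..n-1, increments them on the
// GPU, and returns how many elements hold the expected value (-1 on error).
int run_increment(int n)
{
    size_t bytes = n * sizeof(int);
    int *host = (int *)malloc(bytes);
    if (host == NULL) return -1;
    for (int i = 0; i < n; ++i) {
        host[i] = i;               // CPU line: a host breakpoint would go here
    }

    int *dev = NULL;
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) {
        free(host);
        return -1;
    }
    bool ok = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        increment<<<blocks, threads>>>(dev, n);
        ok = cudaGetLastError() == cudaSuccess;
    }
    // cudaMemcpy on the default stream waits for the kernel to finish.
    ok = ok && cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dev);

    int correct = 0;
    if (ok) {
        for (int i = 0; i < n; ++i) {
            if (host[i] == i + 1) ++correct;
        }
    }
    free(host);
    return ok ? correct : -1;
}

int main()
{
    int n = 8;
    int correct = run_increment(n);
    printf("%d of %d elements correct\n", correct, n);
    return correct == n ? 0 : 1;
}
```

If a breakpoint on the marked line is hit in a small program like this but not in the large project, that would point towards the project's build settings (its makefile) rather than the debugger setup.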
My question:
What should I do to make the debugger stop at breakpoints?
Further information:
I’m running the Nsight debugger on a SINGLE GPU, I’ve activated the option
“Enable CUDA software preemption debugging”, and my GPU is enabled.
Instruction stepping mode works, and I can see the registers changing.
If you need more info on my GPU, this is the output I get from running the deviceQuery sample:
./deviceQuery Starting…
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: “GeForce GTX 750 Ti”
CUDA Driver Version / Runtime Version 7.5 / 7.5
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 2047 MBytes (2146762752 bytes)
( 5) Multiprocessors, (128) CUDA Cores/MP: 640 CUDA Cores
GPU Max Clock rate: 1110 MHz (1.11 GHz)
Memory Clock rate: 2700 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = GeForce GTX 750 Ti
Result = PASS
Best Regards.
Max