Hi,
I have a debugging problem and I cannot understand what is happening. Take as an example the trivial code
attached in the file testkernel.cu.
I compile this source file with “nvcc -arch=sm_13 -g -G testkernel.cu -o testkernel”. When I run it I get the
expected result, but when I try to debug it with cuda-gdb, the debugger hangs when stepping over the call to
cudaMalloc. Here is what I get in my terminal:
[codebox]
user@server:~$ ./testkernel
0 0
1 3
2 6
3 9
4 12
5 15
6 18
7 21
8 24
9 27
user@server:~$ cuda-gdb testkernel
NVIDIA (R) CUDA Debugger
BETA release
Portions Copyright (C) 2008,2009 NVIDIA Corporation
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
(cuda-gdb) break main
Breakpoint 1 at 0x417c97: file testkernel.cu, line 13.
(cuda-gdb) run
Starting program: /home/vincenzi/testkernel
Breakpoint 1 at 0x417c8b: file testkernel.cu, line 11.
Breakpoint 1 at 0x417c97: file testkernel.cu, line 13.
[Thread debugging using libthread_db enabled]
[New process 13431]
[New Thread 140190674859776 (LWP 13431)]
[Switching to Thread 140190674859776 (LWP 13431)]
Breakpoint 1, main () at testkernel.cu:13
13 cudaError error = cudaSetDevice(0);
Current language: auto; currently c++
(cuda-gdb) next
Warning: a GPU was made unavailable to the application due to debugging
constraints. This may change the application behaviour!
15 if (error != cudaSuccess)
(cuda-gdb) next
22 if ( cudaMalloc ((void **) &device_x, N * sizeof(double)) != cudaSuccess)
(cuda-gdb) next
^C
Program received signal SIGINT, Interrupt.
0x00007f80adc27e48 in ?? () from /usr/lib/libcuda.so
(cuda-gdb)
[/codebox]
It stays blocked and nothing happens until I interrupt it with Ctrl-C. What is happening?
Why does normal execution work fine while debugging does not? I am running the code on a Linux server
with two GeForce GTX 295:
[codebox]
user@server:~$ ./NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)
There are 2 devices supporting CUDA
Device 0: “GeForce GTX 295”
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 938803200 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.24 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Device 1: “GeForce GTX 295”
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 939261952 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.24 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Test PASSED
Press ENTER to exit…
user@server:~$
[/codebox]
Thank you in advance,
Alessandro Vincenzi