Cannot debug CUDA application

Hi,

I’m having a debugging problem and I cannot understand what is happening. Take as an example the trivial code attached in the file testkernel.cu.

I compile this source file with “nvcc -arch=sm_13 -g -G testkernel.cu -o testkernel”. If I execute it, I get the expected result, but when I try to debug it with cuda-gdb, the debugger hangs on the call to cudaMalloc. Here is what I get from my terminal:

[codebox]

user@server:~$ ./testkernel
0 0
1 3
2 6
3 9
4 12
5 15
6 18
7 21
8 24
9 27
user@server:~$ cuda-gdb testkernel
NVIDIA (R) CUDA Debugger
BETA release
Portions Copyright (C) 2008,2009 NVIDIA Corporation
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
(cuda-gdb) break main
Breakpoint 1 at 0x417c97: file testkernel.cu, line 13.
(cuda-gdb) run
Starting program: /home/vincenzi/testkernel
Breakpoint 1 at 0x417c8b: file testkernel.cu, line 11.
Breakpoint 1 at 0x417c97: file testkernel.cu, line 13.
[Thread debugging using libthread_db enabled]
[New process 13431]
[New Thread 140190674859776 (LWP 13431)]
[Switching to Thread 140190674859776 (LWP 13431)]
Breakpoint 1, main () at testkernel.cu:13
13 cudaError error = cudaSetDevice(0);
Current language: auto; currently c++
(cuda-gdb) next
Warning: a GPU was made unavailable to the application due to debugging
constraints. This may change the application behaviour!
15 if (error != cudaSuccess)
(cuda-gdb) next
22 if ( cudaMalloc ((void **) &device_x, N * sizeof(double)) != cudaSuccess)
(cuda-gdb) next
^C
Program received signal SIGINT, Interrupt.
0x00007f80adc27e48 in ?? () from /usr/lib/libcuda.so
(cuda-gdb)

[/codebox]
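
The attachment is not reproduced here. The session above quotes only three host lines of testkernel.cu (the `cudaSetDevice`, the error check, and the `cudaMalloc`), so the following is just a minimal sketch of a file that would be consistent with those lines and with the printed pairs. The kernel name `fill_triples`, the value `N = 10`, the launch configuration and the print loop are all assumptions, not the original source, and the line numbers will not match the ones shown by cuda-gdb.

```cuda
// Hypothetical stand-in for testkernel.cu. Only the cudaSetDevice, error
// check and cudaMalloc lines are taken from the debugger session; the rest
// is assumed for illustration.
#include <cstdio>
#include <cuda_runtime.h>

#define N 10  // assumed: the program prints ten pairs

// Assumed kernel: stores 3*i in x[i], which matches the printed output.
__global__ void fill_triples(double *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] = 3.0 * i;
}

int main()
{
    double host_x[N];
    double *device_x = NULL;

    cudaError error = cudaSetDevice(0);
    if (error != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice: %s\n", cudaGetErrorString(error));
        return 1;
    }

    // This is the call on which the debugger session above stops responding.
    if (cudaMalloc((void **) &device_x, N * sizeof(double)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // One block of N threads is enough for this tiny array.
    fill_triples<<<1, N>>>(device_x, N);

    // cudaMemcpy waits for the kernel to finish before copying.
    error = cudaMemcpy(host_x, device_x, N * sizeof(double),
                       cudaMemcpyDeviceToHost);
    cudaFree(device_x);
    if (error != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy: %s\n", cudaGetErrorString(error));
        return 1;
    }

    for (int i = 0; i < N; ++i)
        printf("%d %g\n", i, host_x[i]);
    return 0;
}
```

The kernel uses `double`, which is why a compute capability 1.3 target such as `-arch=sm_13` would be needed on this hardware.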

Under cuda-gdb the call never returns: nothing happens until I interrupt it with Ctrl-C. What is happening? Why does normal execution work fine while debugging does not? I’m running the code on a Linux server with two GeForce GTX 295 cards:

[codebox]

user@server:~$ ./NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)
There are 2 devices supporting CUDA

Device 0: "GeForce GTX 295"
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 938803200 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.24 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)

Device 1: "GeForce GTX 295"
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 939261952 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.24 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)

Test PASSED
Press ENTER to exit...
user@server:~$

[/codebox]
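
The cuda-gdb warning in the first session says that a GPU was made unavailable to the application, so the device numbered 0 under the debugger may not be the same GPU as Device 0 in the listing above. As a rough diagnostic sketch, a small program like the one below could be run once directly and once under cuda-gdb to compare what the runtime reports in each case. It relies only on `cudaGetDeviceCount` and `cudaGetDeviceProperties`; the helper `yes_no` and the output format are made up for illustration.

```cuda
// Diagnostic sketch: list the devices the CUDA runtime exposes to this process.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper, used only to format a boolean flag.
static const char *yes_no(int flag) { return flag ? "Yes" : "No"; }

int main()
{
    int count = 0;
    cudaError_t error = cudaGetDeviceCount(&count);
    if (error != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(error));
        return 1;
    }
    printf("%d device(s) visible\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        error = cudaGetDeviceProperties(&prop, dev);
        if (error != cudaSuccess) {
            fprintf(stderr, "device %d: %s\n", dev, cudaGetErrorString(error));
            continue;
        }
        // kernelExecTimeoutEnabled corresponds to the
        // "Run time limit on kernels" line printed by deviceQuery.
        printf("Device %d: %s, compute %d.%d, run time limit on kernels: %s\n",
               dev, prop.name, prop.major, prop.minor,
               yes_no(prop.kernelExecTimeoutEnabled));
    }
    return 0;
}
```

If the two runs report a different number of devices, that would at least show which GPU the debugger has hidden from the application.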

Thank you in advance,

Alessandro Vincenzi