Newbie struggling to get cuda-gdb to run the example in the CUDA-GDB user manual

jmswanner · October 30, 2011, 11:01pm

I am trying to troubleshoot why I cannot get CUDA-GDB 3.2 to work properly. I’m getting bizarre errors whenever I try to run cuda-gdb. I read in the user manual that X11 cannot be running on the GPU that is used for debugging, because the debugger effectively makes the GPU look hung to the X server, resulting in a deadlock or crash.

I also read the note that "the CUDA driver automatically excludes the device used by X11 from being picked by the application being debugged. This can change the behavior of the application."

I’m wondering if the problem is that the CUDA Driver Version is 4.0 while the CUDA Runtime Version is 3.2.
I’m also wondering if the driver is failing to select the proper device while debugging. Maybe the debugger is choosing the device running the X server?

I checked the NVIDIA X Server Settings and confirmed that the Quadro FX 380 is the GPU for X Screen 0 and has the monitors listed in its Display Devices field.
The GeForce GTX 260 has neither X Screens nor display devices listed.

I’ve also noticed that the debugger outputs “__cuda_0=0x0” when it stops at the bitreverse breakpoint. I was wondering whether this has anything to do with the known issue in Appendix B, which mentions that “debugging applications using textures is not supported on GPUs with sm_type less than sm_20”.

I’ve been able to compile and run the sample programs. I’ve run deviceQuery and bandwidthTest.

The output is below. I have also listed some of the output from trying to use the debugger as described in the CUDA-GDB user manual.
I will continue to research this problem, but any advice or a pointer toward understanding it would be much appreciated.
Thank you!

Device 0: “GeForce GTX 260”
CUDA Driver Version: 4.0
CUDA Runtime Version: 3.20
CUDA Capability Major/Minor version number: 1.3
Total amount of global memory: 939327488 bytes
Multiprocessors x Cores/MP = Cores: 27 (MP) x 8 (Cores/MP) = 216 (Cores)
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.35 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: No
Device has ECC support enabled: No
Device is using TCC driver mode: No

Device 1: “Quadro FX 380”
CUDA Driver Version: 4.0
CUDA Runtime Version: 3.20
CUDA Capability Major/Minor version number: 1.1
Total amount of global memory: 267714560 bytes
Multiprocessors x Cores/MP = Cores: 2 (MP) x 8 (Cores/MP) = 16 (Cores)
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.10 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: No
Device has ECC support enabled: No
Device is using TCC driver mode: No

I’ve run the bandwidth test:
Running on…

Device 0: GeForce GTX 260
Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4337.3

Device to Host Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3781.1

Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 96744.1

[bandwidthTest] - Test results:
PASSED

[jjackson@l-lnx101 CUDA-GDB]$ nvcc -g -G bitreverse.cu -o bitreverse

[jjackson@l-lnx101 CUDA-GDB]$ cuda-gdb bitreverse
NVIDIA (R) CUDA Debugger
3.2 release
GNU gdb 6.6
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type “show copying” to see the conditions.
There is absolutely no warranty for GDB. Type “show warranty” for details.
This GDB was configured as “x86_64-unknown-linux-gnu”…
Using host libthread_db library “/lib64/libthread_db.so.1”.
(cuda-gdb)
(cuda-gdb) b main
Breakpoint 1 at 0x400c30: file bitreverse.cu, line 28.
(cuda-gdb) b bitreverse
Breakpoint 2 at 0x401ec7: file bitreverse.cu, line 28.
(cuda-gdb) b 21
Breakpoint 3 at 0x400c24: file bitreverse.cu, line 21.
(cuda-gdb) r
Starting program: /mnt/nfs/netapp2/grad/jjackson/cuda-examples-3.2/CUDA-GDB/bitreverse
BFD: /lib64/libc.so.6: invalid relocation type 37
BFD: BFD 2.17.50 assertion fail /home/buildmeister/build/rel/gpgpu/toolkit/r3.2/debugger/cuda-gdb/bfd/elf64-x86-64.c:259
BFD: /lib64/libc.so.6: invalid relocation type 37
BFD: BFD 2.17.50 assertion fail /home/buildmeister/build/rel/gpgpu/toolkit/r3.2/debugger/cuda-gdb/bfd/elf64-x86-64.c:259
[Thread debugging using libthread_db enabled]
[New process 4922]
[New Thread 139669044221760 (LWP 4922)]
[Switching to Thread 139669044221760 (LWP 4922)]
Breakpoint 3, main () at bitreverse.cu:24
24 int main(void) {
(cuda-gdb) c
Continuing.
Breakpoint 3, main () at bitreverse.cu:24
24 int main(void) {
(cuda-gdb) c
Continuing.
Breakpoint 3, main () at bitreverse.cu:24
24 int main(void) {
(cuda-gdb) c
Continuing.
Breakpoint 1, main () at bitreverse.cu:28
28 for (i = 0; i < N; i++)
(cuda-gdb) c
Continuing.
Breakpoint 1, main () at bitreverse.cu:28
28 for (i = 0; i < N; i++)
(cuda-gdb) c
Continuing.
Breakpoint 2, 0x0000000000401ec7 in bitreverse (__cuda_0=0x0)
    at bitreverse.cu:28
28 for (i = 0; i < N; i++)
Breakpoint 2, 0x0000000000401ec7 in bitreverse (__cuda_0=0x0)
    at bitreverse.cu:28
28 for (i = 0; i < N; i++)
(cuda-gdb) c
Continuing.
0 → 0
1 → 128
2 → 64
3 → 192
4 → 32
5 → 160
6 → 96
… (edited to shorten)
253 → 191
254 → 127
255 → 255
Program exited normally.

mkaushik · November 1, 2011, 6:16pm

Since the assertion is in elf64-x86-64.c, relocation type 37 is most likely R_X86_64_IRELATIVE. It looks like that was introduced fairly recently in /usr/include/elf.h on most Linux distros. So this looks like a case of running cuda-gdb 3.2 on a newer distro, one that was unsupported at the time of the 3.2 release. So, question 1: what distribution are you running?

The second issue could be caused by the first issue, though.

Upgrading to the latest cuda-gdb will probably fix these items.
