Getting into a CUDA subprogram: why is the debugger stepping over CUDA subprograms?

#include <stdio.h>
#include <stdlib.h>

// Simple 8-bit bit reversal Compute test

#define N 256

__global__ void bitreverse(unsigned int *data)
{
    unsigned int *idata = data;

    unsigned int x = idata[threadIdx.x];

    x = ((0xf0f0f0f0 & x) >> 4) | ((0x0f0f0f0f & x) << 4);
    x = ((0xcccccccc & x) >> 2) | ((0x33333333 & x) << 2);
    x = ((0xaaaaaaaa & x) >> 1) | ((0x55555555 & x) << 1);

    idata[threadIdx.x] = x;
}

int main(void)
{
    unsigned int *d = NULL; int i;
    unsigned int idata[N], odata[N];

    for (i = 0; i < N; i++)
        idata[i] = (unsigned int)i;

    cudaMalloc((void**)&d, sizeof(int)*N);
    cudaMemcpy(d, idata, sizeof(int)*N,
               cudaMemcpyHostToDevice);

    bitreverse<<<1, N>>>(d);

    cudaMemcpy(odata, d, sizeof(int)*N,
               cudaMemcpyDeviceToHost);

    for (i = 0; i < N; i++)
        printf("%u -> %u\n", idata[i], odata[i]);

    return 0;
}

The above code is from the CUDA-GDB manual V2.3, June 2009. It is designed so a student can step through the execution lines and understand what goes on in a CUDA program. I cannot get into the bitreverse subprogram. The command

break bitreverse

if it is the only break command in the program, does not stop the program's execution in the bitreverse subprogram. What am I doing wrong? The CUDA-GDB manual states clearly that you cannot step over subroutines, yet that is apparently what I am doing. Even when I step through the program one executable line at a time, I still do not go into the bitreverse subprogram. I want to learn about CUDA code, and that is why I want this. I already know about C code. The debugger goes into C subprograms, but not CUDA subprograms.

I compile with

nvcc -g -G -o bitreverse

and then run

cuda-gdb bitreverse

then once in the debugger I type

break bitreverse

It responds as the manual says it should, but it still does not go into CUDA subprograms.

Why is this?
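
One way to tell whether the kernel ran at all, independent of the debugger, is to compare the program's printed output with a host-only version of the same computation. The following is a minimal sketch in plain C, assuming the three swap steps are copied unchanged from the kernel above. The names bitreverse_host and print_expected are hypothetical helpers introduced here for illustration; they are not part of the manual's example.

```c
#include <stdio.h>

/* Hypothetical host-side helper (not in the manual's example):
   applies the same three swap steps as the bitreverse kernel to one value. */
unsigned int bitreverse_host(unsigned int x)
{
    x = ((0xf0f0f0f0u & x) >> 4) | ((0x0f0f0f0fu & x) << 4); /* swap the two nibbles of each byte */
    x = ((0xccccccccu & x) >> 2) | ((0x33333333u & x) << 2); /* swap bit pairs within each nibble */
    x = ((0xaaaaaaaau & x) >> 1) | ((0x55555555u & x) << 1); /* swap adjacent bits */
    return x;
}

/* Hypothetical helper: prints the expected "input -> output" lines for
   inputs 0..n-1, in the same format as the example's printf. */
void print_expected(unsigned int n)
{
    unsigned int i;
    for (i = 0; i < n; i++)
        printf("%u -> %u\n", i, bitreverse_host(i));
}
```

Calling print_expected(256) from a small main gives the lines the CUDA program should print if the kernel executed; for example, input 1 should map to 128. If the CUDA program prints something else for odata, that suggests the kernel never ran, which would also explain a breakpoint in it never being hit.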



You have to start the program inside cuda-gdb. Try typing


I believe that is what I am doing. I compile

nvcc -g -G -o cuda


cuda-gdb cuda


breakpoint bitreverse



Note that bitreverse is the CUDA subprogram. If I put a breakpoint in the CUDA subprogram, it is ignored. The debugger stops in main and in all of the C subprograms, but not in the CUDA subprograms.

I just do not know why.



It works for me.

avid@cuda:~$ /opt/cuda/bin/nvcc -g -G -o yunker
avid@cuda:~$ LD_LIBRARY_PATH=/opt/cuda/lib64 /opt/cuda/bin/cuda-gdb ./yunker
NVIDIA (R) CUDA Debugger
BETA release
Portions Copyright (C) 2008,2009 NVIDIA Corporation
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
Using host libthread_db library "/lib/".
(cuda-gdb) list 
14	x = ((0xf0f0f0f0 & x) >> 4) | ((0x0f0f0f0f & x) << 4);
15	x = ((0xcccccccc & x) >> 2) | ((0x33333333 & x) << 2);
16	x = ((0xaaaaaaaa & x) >> 1) | ((0x55555555 & x) << 1);
18	idata[threadIdx.x] = x;
19	}
21	int main(void)
22	{
23	unsigned int *d = NULL; int i;
24	unsigned int idata[N], odata[N];
26	for (i = 0; i < N; i++)
27	idata[i] = (unsigned int)i;
29	cudaMalloc((void**)&d, sizeof(int)*N);
30	cudaMemcpy(d, idata, sizeof(int)*N,
31	cudaMemcpyHostToDevice);
33	bitreverse<<<1, N>>>(d);
(cuda-gdb) break bitreverse
Breakpoint 1 at 0x40ee29: file, line 10.
(cuda-gdb) run
Starting program: /home/avid/yunker 
[Thread debugging using libthread_db enabled]
[New process 32208]
[New Thread 140040012605184 (LWP 32208)]
Warning: a GPU was made unavailable to the application due to debugging
constraints.  This may change the application behaviour!
[Switching to Thread 140040012605184 (LWP 32208)]
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
Breakpoint 1, bitreverse () at
10	unsigned int *idata = data;
Current language:  auto; currently c++
(cuda-gdb) list
6	#define N 256
8	__global__ void bitreverse(unsigned int *data)
9	{
10	unsigned int *idata = data;
12	unsigned int x = idata[threadIdx.x];
14	x = ((0xf0f0f0f0 & x) >> 4) | ((0x0f0f0f0f & x) << 4);


The GPU you run cuda-gdb on must not have a display manager running on it. Are you trying to run the debugger on a card shared with X11?

It happens to me too. A quick workaround is to break on line numbers; that is, type

b filename:linenumber

into gdb.