cuda-gdb misses breakpoints depending on "compute capability"

Basically, depending on a single compilation flag (--gpu-architecture compute_13), cuda-gdb will miss breakpoints in kernels.

I have a trivial CUDA program with a single kernel.

#include <stdlib.h>
#include <stdio.h>

__global__ void kernel()
{
	int a = 1;
	a = a + 2;
}

int main(void)
{
	kernel<<<1, 1>>>();
	return 0;
}

I compile it in two different ways (one with compute capability 1.3 turned on, and one without):

nvcc -g -G -o test
nvcc -g -G --gpu-architecture compute_13 -o test_err

If I debug ‘test’ with cuda-gdb and place a breakpoint in the kernel, all is well.

If I do the same in ‘test_err’, the debugger will miss the breakpoint.

That is:

> cuda-gdb ./test

cuda-gdb> break

cuda-gdb> run

--- the debugger stops at the breakpoint as intended ---

cuda-gdb> q

Same thing, but with ‘test_err’:

> cuda-gdb ./test_err

cuda-gdb> break

cuda-gdb> run

--- the debugger goes past the breakpoint and consequently finishes execution ---

cuda-gdb> q


My system is Linux IA64 with a Tesla C1060 card in it.

> nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver

Copyright (c) 2005-2009 NVIDIA Corporation

Built on Thu_Jul_30_09:24:36_PDT_2009

Cuda compilation tools, release 2.3, V0.2.1221
> cuda-gdb --version

NVIDIA (R) CUDA Debugger

BETA release

Portions Copyright (C) 2008,2009 NVIDIA Corporation

GNU gdb 6.6

Copyright (C) 2006 Free Software Foundation, Inc.

GDB is free software, covered by the GNU General Public License, and you are

welcome to change it and/or distribute copies of it under certain conditions.

Type "show copying" to see the conditions.

There is absolutely no warranty for GDB.  Type "show warranty" for details.

This GDB was configured as "x86_64-unknown-linux-gnu".
> cat /proc/driver/nvidia/version

NVRM version: NVIDIA UNIX x86_64 Kernel Module  190.18  Wed Jul 22 15:36:09 PDT 2009

GCC version:  gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu12)

(Please tell me if you need more information.)


Thankful for any ideas.

Your example isn’t a very useful one - Open64 has a very aggressive dead code removal optimizer and it is likely that your kernel is getting optimized to a null kernel. I wouldn’t expect gdb to work in that case.

Also, I am a bit surprised you can run CUDA on Itanium, or do you mean you are using EM64T/x86_64?

> uname -a

Linux tesla 2.6.27-7-server #1 SMP Tue Nov 4 20:16:57 UTC 2008 x86_64 GNU/Linux

So I guess x86_64. I thought IA64 meant “(generic) Intel Architecture (64-bit)”.

Regarding your other point: the example is of course distilled from a non-trivial example which cannot have been compiled to a null-kernel.

In any case, the breakpoint gets hit in ‘test’, which I cannot imagine gets less attention from the code removal optimizer. (But I don’t know for sure, of course.)

Do you think the code removal optimizer works differently just because I add “--gpu-architecture compute_13” when I compile?

(That’s the only difference between ‘test’ and ‘test_err’.)

If so, I would of course be happy to supply a less trivial example.

It looks like a compiler/debugger bug. We are investigating the problem.

With --gpu-architecture compute_13, only the PTX is generated, with no debugging information.
The fix is to replace compute_13 with sm_13.
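
For anyone who wants to retry this, here is a rough sketch of the reproducer with both suggestions from this thread applied: the kernel writes its result to global memory so it cannot be reduced to a null kernel, and the build line in the comment uses sm_13 instead of compute_13. The file name repro.cu, the out parameter, and the d_out/h_out variables are my own assumptions for illustration and are not from the original program.

```cuda
// Build (assumed file name repro.cu):
//   nvcc -g -G --gpu-architecture sm_13 -o repro repro.cu

#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical variant of the original kernel: the result is stored to
// global memory, so the arithmetic has an observable side effect and
// cannot be discarded as dead code.
__global__ void kernel(int *out)
{
	int a = 1;
	a = a + 2;
	*out = a;	// convenient line for a breakpoint
}

int main(void)
{
	int *d_out = NULL;	// device buffer for the result
	int h_out = 0;		// host copy of the result

	if (cudaMalloc((void **)&d_out, sizeof(int)) != cudaSuccess) {
		fprintf(stderr, "cudaMalloc failed\n");
		return 1;
	}

	kernel<<<1, 1>>>(d_out);

	// A blocking cudaMemcpy waits for the kernel to finish before copying.
	cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(int),
	                             cudaMemcpyDeviceToHost);
	cudaFree(d_out);

	if (err != cudaSuccess) {
		fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
		return 1;
	}

	printf("result = %d\n", h_out);
	return 0;
}
```

Printing the value on the host also gives a quick check that the kernel really ran, independent of whether the debugger stopped in it.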