Possible debugger bug? Debugger doesn't recognize functions instantiated from templates

PTThompson · August 3, 2009, 1:50am

Ok, so my code uses function templates, and when I try to run it on the debugger, if I try to set a break point, e.g. break dlsoda_<myFex, myJex> or even just break dlsoda_, cuda-gdb complains that the function doesn’t exist. I thought at first it might be something to do with having my code split into multiple files, but even after combining all my code into one file, the problem persists. cuda-gdb works properly with the bitreverse.cu sample code included in the instruction manual, but fails anytime I am using a template. Also, and I’m not sure how this is supposed to work, but when I step into a kernel call, I can step up to the point where it makes the first call to one of my template functions, then it just skips over that template function. I’ve found a partial workaround by manually setting a breakpoint by line number (in my all-in-one version of my code), but I’m still confused by this behaviour. I have CUDA installed on a tesla cluster running ( I think) RHEL, and cuda 2.3. I can provide more details, including the full code I’m trying to run, if necessary.

So anyone know if I just have my HUA, or is this a legit bug/absence of feature? If the former, then how would I set a breakpoint for such a function?

Thanks,

Paul

PTThompson · August 3, 2009, 2:12am

Ok, so my code uses function templates, and when I try to run it on the debugger, if I try to set a break point, e.g. break dlsoda_<myFex, myJex> or even just break dlsoda_, cuda-gdb complains that the function doesn’t exist. I thought at first it might be something to do with having my code split into multiple files, but even after combining all my code into one file, the problem persists. cuda-gdb works properly with the bitreverse.cu sample code included in the instruction manual, but fails anytime I am using a template. Also, and I’m not sure how this is supposed to work, but when I step into a kernel call, I can step up to the point where it makes the first call to one of my template functions, then it just skips over that template function. I’ve found a partial workaround by manually setting a breakpoint by line number (in my all-in-one version of my code), but I’m still confused by this behaviour. I have CUDA installed on a tesla cluster running ( I think) RHEL, and cuda 2.3. I can provide more details, including the full code I’m trying to run, if necessary.

So anyone know if I just have my HUA, or is this a legit bug/absence of feature? If the former, then how would I set a breakpoint for such a function?

Thanks,

Paul

Um, a bit more trouble with the debugger: When I’m inside one of my template functions (called from the kernel), and I try to print either the address or the value of one of the pointers that has been passed as a parameter, cuda-gdb crashes to heck.

Breakpoint 1, cuLsoda<myFex, myJex> () at test.cu:1316

1316			goto L601;

Current language:  auto; currently c++

(cuda-gdb) p threadIdx 

$1 = {x = 0, y = 0, z = 0}

(cuda-gdb) p *istate

Assertion failure at /home/buildmeister/build/sw/rel/gpu_drv/r190/r190_00/drivers/gpgpu/cuda/src/debugger/cudbgtarget.c, line 2278: cuda-gdb internal error

Aborted

[ptthomps@adroit-001 gdbtest]$

istate is one of the arguments that I pass to the kernel (copied from host to device using cudaMemcpy), and which then gets passed to my template functions. Furthermore, if I try to print any of the locals that get declared at the start of the function, cuda-gdb complains that ‘No symbol “lf0” in current context.’ (replace lf0 with whatever variable I may try).

Any ideas?

PTThompson · August 3, 2009, 2:38pm

Ok, so I tried stripping out all of the template stuff and just hardcoding for one particular type, recompiled, ran it in the debugger, and it still gives me these errors when it’s running on the device. The code being run is at http://culsoda.googlecode.com/files/testnotemp.cu

Compiling with ‘nvcc testnotemp.cu -arch=sm_13 -o testntdev -g -G’ for the device and ‘nvcc testnotemp.cu -arch=sm_13 -o testnt -g -G -deviceemu’ for emulation.

Running on the device we have:

[codebox][ptthomps@adroit-001 gdbtest]$ cuda-gdb testntdev

NVIDIA ® CUDA Debugger

BETA release

GNU gdb 6.6

GDB is free software, covered by the GNU General Public License, and you are

welcome to change it and/or distribute copies of it under certain conditions.

Type “show copying” to see the conditions.

There is absolutely no warranty for GDB. Type “show warranty” for details.

This GDB was configured as “x86_64-unknown-linux-gnu”…

Using host libthread_db library “/lib64/libthread_db.so.1”.

(cuda-gdb) break 275

Breakpoint 1 at 0x4021da: file testnotemp.cu, line 275.

(cuda-gdb) break 1312

Breakpoint 2 at 0x4021f4: file testnotemp.cu, line 1312.

(cuda-gdb) r

Starting program: /home/ptthomps/gdbtest/testntdev

[Thread debugging using libthread_db enabled]

[New process 12478]

[New Thread 47515754291120 (LWP 12478)]

[Switching to Thread 47515754291120 (LWP 12478)]

[Current CUDA Thread <<<(0,0),(0,0,0)>>>]

Breakpoint 1, cuLsoda () at testnotemp.cu:275

275 int kgo = 0;

Current language: auto; currently c++

(cuda-gdb) l

270 double rh = 0.;

271 int mu = 0;

272 double tp = 0.;

273 int lf0 = 0;

274 double big = 0.;

275 int kgo = 0;

276 double ayi = 0.;

277 double hmx = 0.;

278 double tol = 0.;

279 double sum = 0.;

(cuda-gdb) p tp

Assertion failure at /home/buildmeister/build/sw/rel/gpu_drv/r190/r190_00/drivers/gpgpu/cuda/src/debugger/cudbgtarget.c, line 2278: cuda-gdb internal error

Aborted

[ptthomps@adroit-001 gdbtest]$ cuda-gdb testntdev

NVIDIA ® CUDA Debugger

BETA release

GNU gdb 6.6

GDB is free software, covered by the GNU General Public License, and you are

welcome to change it and/or distribute copies of it under certain conditions.

Type “show copying” to see the conditions.

There is absolutely no warranty for GDB. Type “show warranty” for details.

This GDB was configured as “x86_64-unknown-linux-gnu”…

Using host libthread_db library “/lib64/libthread_db.so.1”.

(cuda-gdb) break 1312

Breakpoint 1 at 0x4021f4: file testnotemp.cu, line 1312.

(cuda-gdb) r

Starting program: /home/ptthomps/gdbtest/testntdev

[Thread debugging using libthread_db enabled]

[New process 12483]

[New Thread 47320958222256 (LWP 12483)]

[Switching to Thread 47320958222256 (LWP 12483)]

[Current CUDA Thread <<<(0,0),(0,0,0)>>>]

Breakpoint 1, cuLsoda () at testnotemp.cu:1314

1314 goto L601;

Current language: auto; currently c++

(cuda-gdb) l

1309 /* If ISTATE .gt. 1 but the flag INIT shows that initialization has */

1310 /* not yet been done, an error return occurs. */

1311 /* If ISTATE = 1 and TOUT = T, return immediately. */

1312 /* ----------------------------------------------------------------------- */

1313 if (*istate < 1 || *istate > 3) {

1314 goto L601;

1315 }

1316 if (*itask < 1 || *itask > 5) {

1317 goto L602;

1318 }

(cuda-gdb) p *istate

Assertion failure at /home/buildmeister/build/sw/rel/gpu_drv/r190/r190_00/drivers/gpgpu/cuda/src/debugger/cudbgtarget.c, line 2278: cuda-gdb internal error

Aborted

[ptthomps@adroit-001 gdbtest]$

in the second run, *istate should have a value of 1.[/codebox]

running on emulation we have:

[codebox][ptthomps@adroit-001 gdbtest]$ cuda-gdb testnt

NVIDIA ® CUDA Debugger

BETA release

GNU gdb 6.6

GDB is free software, covered by the GNU General Public License, and you are

welcome to change it and/or distribute copies of it under certain conditions.

Type “show copying” to see the conditions.

There is absolutely no warranty for GDB. Type “show warranty” for details.

This GDB was configured as “x86_64-unknown-linux-gnu”…

Using host libthread_db library “/lib64/libthread_db.so.1”.

(cuda-gdb) break 275

Breakpoint 1 at 0x40cf72: file testnotemp.cu, line 275.

(cuda-gdb) break 1312

Breakpoint 2 at 0x40d05d: file testnotemp.cu, line 1312.

(cuda-gdb) r

Starting program: /home/ptthomps/gdbtest/testnt

[Thread debugging using libthread_db enabled]

[New process 12489]

[New Thread 47549481721776 (LWP 12489)]

[New Thread 1101580608 (LWP 12492)]

[Switching to Thread 1101580608 (LWP 12492)]

Breakpoint 1, dlsoda_ (f={__dummy = 0 ‘\0’}, neq=0x11d94c00, y=0x11d94a00, t=0x11d94900, tout=0x11d92f00,

itol=0x11d92b00, rtol=0x11d92d00, atol=0x11d92a00, itask=0x11d93000, istate=0x11d93500, iopt=0x11d92c00, 

rwork=0x11d93200, lrw=0x11d94e00, iwork=0x11d93100, liw=0x11d94d00, jac={__dummy = 0 '\0'}, jt=0x11d94b00, 

common=0x11d96400) at testnotemp.cu:275

275 int kgo = 0;

Current language: auto; currently c++

(cuda-gdb) l

270 double rh = 0.;

271 int mu = 0;

272 double tp = 0.;

273 int lf0 = 0;

274 double big = 0.;

275 int kgo = 0;

276 double ayi = 0.;

277 double hmx = 0.;

278 double tol = 0.;

279 double sum = 0.;

(cuda-gdb) p tp

$1 = 0

(cuda-gdb) c

Continuing.

Breakpoint 2, dlsoda_ (f={__dummy = 0 ‘\0’}, neq=0x11d94c00, y=0x11d94a00, t=0x11d94900, tout=0x11d92f00,

itol=0x11d92b00, rtol=0x11d92d00, atol=0x11d92a00, itask=0x11d93000, istate=0x11d93500, iopt=0x11d92c00, 

rwork=0x11d93200, lrw=0x11d94e00, iwork=0x11d93100, liw=0x11d94d00, jac={__dummy = 0 '\0'}, jt=0x11d94b00, 

common=0x11d96400) at testnotemp.cu:1313

1313 if (*istate < 1 || *istate > 3) {

(cuda-gdb) p *istate

$2 = 1

(cuda-gdb) p istate

$3 = (int *) 0x11d93500

(cuda-gdb) q

The program is running. Exit anyway? (y or n) y

[ptthomps@adroit-001 gdbtest]$[/codebox]

It’s running on

[codebox]CUDA Device Query (Runtime API) version (CUDART static linking)

There are 4 devices supporting CUDA

Device 0: “Tesla C1060”

CUDA Driver Version: 2.30

CUDA Runtime Version: 2.30

CUDA Capability Major revision number: 1

CUDA Capability Minor revision number: 3

Total amount of global memory: 4294705152 bytes

Number of multiprocessors: 30

Number of cores: 240

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 16384 bytes

Total number of registers available per block: 16384

Warp size: 32

Maximum number of threads per block: 512

Maximum sizes of each dimension of a block: 512 x 512 x 64

Maximum sizes of each dimension of a grid: 65535 x 65535 x 1

Maximum memory pitch: 262144 bytes

Texture alignment: 256 bytes

Clock rate: 1.44 GHz

Concurrent copy and execution: Yes

Run time limit on kernels: No

Integrated: No

Support host page-locked memory mapping: Yes

Compute mode: Default (multiple host threads can use this device simultaneously)

Device 1: “Tesla C1060”

CUDA Driver Version: 2.30

CUDA Runtime Version: 2.30

CUDA Capability Major revision number: 1

CUDA Capability Minor revision number: 3

Total amount of global memory: 4294705152 bytes

Number of multiprocessors: 30

Number of cores: 240

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 16384 bytes

Total number of registers available per block: 16384

Warp size: 32

Maximum number of threads per block: 512

Maximum sizes of each dimension of a block: 512 x 512 x 64

Maximum sizes of each dimension of a grid: 65535 x 65535 x 1

Maximum memory pitch: 262144 bytes

Texture alignment: 256 bytes

Clock rate: 1.44 GHz

Concurrent copy and execution: Yes

Run time limit on kernels: No

Integrated: No

Support host page-locked memory mapping: Yes

Compute mode: Default (multiple host threads can use this device simultaneously)

Device 2: “Tesla C1060”

CUDA Driver Version: 2.30

CUDA Runtime Version: 2.30

CUDA Capability Major revision number: 1

CUDA Capability Minor revision number: 3

Total amount of global memory: 4294705152 bytes

Number of multiprocessors: 30

Number of cores: 240

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 16384 bytes

Total number of registers available per block: 16384

Warp size: 32

Maximum number of threads per block: 512

Maximum sizes of each dimension of a block: 512 x 512 x 64

Maximum sizes of each dimension of a grid: 65535 x 65535 x 1

Maximum memory pitch: 262144 bytes

Texture alignment: 256 bytes

Clock rate: 1.44 GHz

Concurrent copy and execution: Yes

Run time limit on kernels: No

Integrated: No

Support host page-locked memory mapping: Yes

Compute mode: Default (multiple host threads can use this device simultaneously)

Device 3: “Tesla C1060”

CUDA Driver Version: 2.30

CUDA Runtime Version: 2.30

CUDA Capability Major revision number: 1

CUDA Capability Minor revision number: 3

Total amount of global memory: 4294705152 bytes

Number of multiprocessors: 30

Number of cores: 240

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 16384 bytes

Total number of registers available per block: 16384

Warp size: 32

Maximum number of threads per block: 512

Maximum sizes of each dimension of a block: 512 x 512 x 64

Maximum sizes of each dimension of a grid: 65535 x 65535 x 1

Maximum memory pitch: 262144 bytes

Texture alignment: 256 bytes

Clock rate: 1.44 GHz

Concurrent copy and execution: Yes

Run time limit on kernels: No

Integrated: No

Support host page-locked memory mapping: Yes

Compute mode: Default (multiple host threads can use this device simultaneously)

Test PASSED

[/codebox]

Any clue what’s going on?

Thanks,

Paul

tmurray · August 3, 2009, 4:47pm

oh yeah, I think I recognize that assertion. debugger guys know about it and are working on a fix.

PTThompson · August 3, 2009, 5:03pm

New developements: if I delete everything from line 1313 to the end of the function dlsoda_, and insert at line 1313 the following:

if (*istate < 1 || *istate > 3) {

		return 0;

	}

	return 1;

} /* dlsoda_ */

then it runs no problem, and in the debugger I can do ‘p *istate’ with the correct result of 1. Is it possible that nvcc and the debugger are choking on all the goto statements that I have in my code?

PTThompson · August 3, 2009, 7:28pm

ok, I don’t know if this is causing the trouble that I’ve been having with the istate variable, but I managed to get the debugger to crash reproducibly using this code:

[codebox]#include <stdio.h>

#include <math.h>

device int dlsoda_(int *istate)

{

int mxhnl0 = 10;

int i__1 = 0;

return 0;

}

global void cuLsoda( int *istate)

{

dlsoda_( istate);

}

int main(void) /* Main program */

{

int *istate;

cudaMallocHost((void**)&istate,sizeof(int));

int	*_Distate;

*istate = 1;

cudaMalloc((void**)&_Distate,sizeof(int));

cudaMemcpy(_Distate,istate,sizeof(int),cudaMemcpyHostToDevic

e);

for (int iout = 1; iout <= 12; ++iout) 

{

	cuLsoda<<<1,1>>>( _Distate);

	cudaMemcpy(istate,_Distate,sizeof(int),cudaMemcpyDeviceToHos

t);

	printf("Exit:\n");

}

}[/codebox]

I’ve put it into a file called gorp2.cu. here is the output from the compile and the crash:

[codebox][ptthomps@adroit-001 gdbtest]$ nvcc gorp2.cu -arch=sm_13 -o gorp2.o -g -G

gorp2.cu(7): warning: variable “mxhnl0” was declared but never referenced

gorp2.cu(8): warning: variable “i__1” was declared but never referenced

gorp2.cu(7): warning: variable “mxhnl0” was declared but never referenced

gorp2.cu(8): warning: variable “i__1” was declared but never referenced

[ptthomps@adroit-001 gdbtest]$ cuda-gdb gorp2.o

NVIDIA ® CUDA Debugger

BETA release

GNU gdb 6.6

GDB is free software, covered by the GNU General Public License, and you are

welcome to change it and/or distribute copies of it under certain conditions.

Type “show copying” to see the conditions.

There is absolutely no warranty for GDB. Type “show warranty” for details.

This GDB was configured as “x86_64-unknown-linux-gnu”…

Using host libthread_db library “/lib64/libthread_db.so.1”.

(cuda-gdb) b 7

Breakpoint 1 at 0x400e0a: file gorp2.cu, line 7.

(cuda-gdb) r

Starting program: /home/ptthomps/gdbtest/gorp2.o

[Thread debugging using libthread_db enabled]

[New process 16809]

[New Thread 47052810411952 (LWP 16809)]

[Switching to Thread 47052810411952 (LWP 16809)]

[Current CUDA Thread <<<(0,0),(0,0,0)>>>]

Breakpoint 1, cuLsoda () at gorp2.cu:7

7 int mxhnl0 = 10;

Current language: auto; currently c++

(cuda-gdb) p *istate

$1 = 1

(cuda-gdb) r

The program being debugged has been started already.

Start it from the beginning? (y or n) y

`/tmp/tmp_cudagdb_11358_16808_1383576172.o’ has disappeared; keeping its symbols.

Starting program: /home/ptthomps/gdbtest/gorp2.o

[Thread debugging using libthread_db enabled]

[New process 16813]

[New Thread 47441196830640 (LWP 16813)]

[Switching to Thread 47441196830640 (LWP 16813)]

[Current CUDA Thread <<<(0,0),(0,0,0)>>>]

Breakpoint 1, cuLsoda () at gorp2.cu:7

7 int mxhnl0 = 10;

(cuda-gdb) r

The program being debugged has been started already.

Start it from the beginning? (y or n) y

`/tmp/tmp_cudagdb_11358_16808_1383576172.o’ has disappeared; keeping its symbols.

`/tmp/tmp_cudagdb_11358_16808_2044862037.o’ has disappeared; keeping its symbols.

Starting program: /home/ptthomps/gdbtest/gorp2.o

[Thread debugging using libthread_db enabled]

[New process 16815]

[New Thread 46979883777968 (LWP 16815)]

Assertion failure at /home/buildmeister/build/sw/rel/gpu_drv/r190/r190_00/drivers/gpgpu/cuda/src/debugger/cudbgtarget.c, line 2342: cuda-gdb internal error

Assertion failure at /home/buildmeister/build/sw/rel/gpu_drv/r190/r190_00/drivers/gpgpu/cuda/src/debugger/cudbgtarget.c, line 2413: cuda-gdb internal error[/codebox]

Note that that last error, at line 2413, pumps out hundreds if not thousands of times, and ends with a Segmentation Fault.

If I exclude the ‘p *istate’ from my inputs when I’m running the debugger, I only get the first assertion failure, 2342, and no segfault.

[codebox][ptthomps@adroit-001 gdbtest]$ nvcc gorp2.cu -arch=sm_13 -o gorp2.o -g -G

gorp2.cu(6): warning: variable “mxhnl0” was declared but never referenced

gorp2.cu(7): warning: variable “i__1” was declared but never referenced

gorp2.cu(6): warning: variable “mxhnl0” was declared but never referenced

gorp2.cu(7): warning: variable “i__1” was declared but never referenced

[ptthomps@adroit-001 gdbtest]$ cuda-gdb gorp2.o

NVIDIA ® CUDA Debugger

BETA release

GNU gdb 6.6

GDB is free software, covered by the GNU General Public License, and you are

welcome to change it and/or distribute copies of it under certain conditions.

Type “show copying” to see the conditions.

There is absolutely no warranty for GDB. Type “show warranty” for details.

This GDB was configured as “x86_64-unknown-linux-gnu”…

Using host libthread_db library “/lib64/libthread_db.so.1”.

(cuda-gdb) b 6

Breakpoint 1 at 0x400e0a: file gorp2.cu, line 6.

(cuda-gdb) r

Starting program: /home/ptthomps/gdbtest/gorp2.o

[Thread debugging using libthread_db enabled]

[New process 16866]

[New Thread 47131675713456 (LWP 16866)]

[Switching to Thread 47131675713456 (LWP 16866)]

[Current CUDA Thread <<<(0,0),(0,0,0)>>>]

Breakpoint 1, cuLsoda () at gorp2.cu:6

6 int mxhnl0 = 10;

Current language: auto; currently c++

(cuda-gdb) r

The program being debugged has been started already.

Start it from the beginning? (y or n) y

`/tmp/tmp_cudagdb_11358_16865_1805747363.o’ has disappeared; keeping its symbols.

Starting program: /home/ptthomps/gdbtest/gorp2.o

[Thread debugging using libthread_db enabled]

[New process 16870]

[New Thread 47323864711088 (LWP 16870)]

r[Switching to Thread 47323864711088 (LWP 16870)]

[Current CUDA Thread <<<(0,0),(0,0,0)>>>]

Breakpoint 1, cuLsoda () at gorp2.cu:6

6 int mxhnl0 = 10;

(cuda-gdb) r

The program being debugged has been started already.

Start it from the beginning? (y or n) y

`/tmp/tmp_cudagdb_11358_16865_1805747363.o’ has disappeared; keeping its symbols.

`/tmp/tmp_cudagdb_11358_16865_2128825692.o’ has disappeared; keeping its symbols.

Starting program: /home/ptthomps/gdbtest/gorp2.o

[Thread debugging using libthread_db enabled]

[New process 16872]

[New Thread 47875464417200 (LWP 16872)]

Assertion failure at /home/buildmeister/build/sw/rel/gpu_drv/r190/r190_00/drivers/gpgpu/cuda/src/debugger/cudbgtarget.c, line 2342: cuda-gdb internal error

Aborted[/codebox]

The machine it’s running on is as in the device query above.

Thanks,

Paul

PTThompson · August 3, 2009, 7:39pm

Sorry, just noticed your reply. Is this purely a debugger problem, or does it affect actual code execution too?

Thanks

Paul

tmurray · August 3, 2009, 8:35pm

Purely a debugger problem.

Topic		Replies	Views
Cuda-gdb internal-error of copy_type on basic fortran example CUDA-GDB cuda-gdb	5	980	January 2, 2024
Warning: Cuda API error detected: cuModuleLoadFatBinary returned (0xd1) CUDA-GDB	25	1098	July 30, 2024
Cuda-gdb deadlock in pgi_managed_new CUDA-GDB	15	1083	November 13, 2023
Cuda-gdb aborted CUDA-GDB	7	145	November 24, 2024
attach cuda-gdb to a running process failed CUDA-GDB	10	3115	November 29, 2017
cuda-gdb detach causes "unspecified launch failure" Jetson TX1	7	1312	October 26, 2016
Anomalies with __device__ functions. Or is cuda-gdb playing stupid? CUDA Programming and Performance	0	3500	November 9, 2011
Cuda-gdb doesn't break and/or step into Kernels CUDA Programming and Performance	26	53761	August 1, 2011
Cudbgprocess CUDA Programming and Performance	22	2877	August 2, 2022
why "all CUDA-capable devices are busy or unavailable" ? CUDA Programming and Performance	34	64341	April 20, 2011

Possible debugger bug? Debugger doesn't recognize functions instantiated from templates

Related topics