OK, so I tried stripping out all of the template code and hardcoding everything for one particular type. I recompiled and ran it in the debugger, and it still gives me these errors when running on the device. The code being run is at http://culsoda.googlecode.com/files/testnotemp.cu
I compiled with 'nvcc testnotemp.cu -arch=sm_13 -o testntdev -g -G' for the device and 'nvcc testnotemp.cu -arch=sm_13 -o testnt -g -G -deviceemu' for emulation.
Running on the device, I get:
[codebox][ptthomps@adroit-001 gdbtest]$ cuda-gdb testntdev
NVIDIA (R) CUDA Debugger
BETA release
Portions Copyright (C) 2008,2009 NVIDIA Corporation
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
Using host libthread_db library "/lib64/libthread_db.so.1".
(cuda-gdb) break 275
Breakpoint 1 at 0x4021da: file testnotemp.cu, line 275.
(cuda-gdb) break 1312
Breakpoint 2 at 0x4021f4: file testnotemp.cu, line 1312.
(cuda-gdb) r
Starting program: /home/ptthomps/gdbtest/testntdev
[Thread debugging using libthread_db enabled]
[New process 12478]
[New Thread 47515754291120 (LWP 12478)]
[Switching to Thread 47515754291120 (LWP 12478)]
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
Breakpoint 1, cuLsoda () at testnotemp.cu:275
275 int kgo = 0;
Current language: auto; currently c++
(cuda-gdb) l
270 double rh = 0.;
271 int mu = 0;
272 double tp = 0.;
273 int lf0 = 0;
274 double big = 0.;
275 int kgo = 0;
276 double ayi = 0.;
277 double hmx = 0.;
278 double tol = 0.;
279 double sum = 0.;
(cuda-gdb) p tp
Assertion failure at /home/buildmeister/build/sw/rel/gpu_drv/r190/r190_00/drivers/gpgpu/cuda/src/debugger/cudbgtarget.c, line 2278: cuda-gdb internal error
Aborted
[ptthomps@adroit-001 gdbtest]$ cuda-gdb testntdev
NVIDIA (R) CUDA Debugger
BETA release
Portions Copyright (C) 2008,2009 NVIDIA Corporation
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
Using host libthread_db library "/lib64/libthread_db.so.1".
(cuda-gdb) break 1312
Breakpoint 1 at 0x4021f4: file testnotemp.cu, line 1312.
(cuda-gdb) r
Starting program: /home/ptthomps/gdbtest/testntdev
[Thread debugging using libthread_db enabled]
[New process 12483]
[New Thread 47320958222256 (LWP 12483)]
[Switching to Thread 47320958222256 (LWP 12483)]
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
Breakpoint 1, cuLsoda () at testnotemp.cu:1314
1314 goto L601;
Current language: auto; currently c++
(cuda-gdb) l
1309 /* If ISTATE .gt. 1 but the flag INIT shows that initialization has */
1310 /* not yet been done, an error return occurs. */
1311 /* If ISTATE = 1 and TOUT = T, return immediately. */
1312 /* ----------------------------------------------------------------------- */
1313 if (*istate < 1 || *istate > 3) {
1314 goto L601;
1315 }
1316 if (*itask < 1 || *itask > 5) {
1317 goto L602;
1318 }
(cuda-gdb) p *istate
Assertion failure at /home/buildmeister/build/sw/rel/gpu_drv/r190/r190_00/drivers/gpgpu/cuda/src/debugger/cudbgtarget.c, line 2278: cuda-gdb internal error
Aborted
[ptthomps@adroit-001 gdbtest]$
[/codebox]
In the second run, *istate should have a value of 1.
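For anyone who wants to cross-check the device-side value without going through cuda-gdb, here is a minimal sketch of the idea. It is not taken from testnotemp.cu: the kernel name probeIstate and the debugOut buffer are made up for illustration, and the kernel only stands in for the real solver entry point. It copies the value seen through the pointer into global memory and reads it back on the host.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the solver entry point: it only records the
// value it sees through the istate pointer.
__global__ void probeIstate(const int *istate, int *debugOut)
{
    // Only one thread writes, so there is no race on debugOut.
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        *debugOut = *istate;
    }
}

int main()
{
    int hostIstate = 1;   // value the solver is expected to see
    int hostSeen = -1;    // sentinel, overwritten by the copy back
    int *devIstate = NULL;
    int *devDebug = NULL;

    cudaMalloc((void **)&devIstate, sizeof(int));
    cudaMalloc((void **)&devDebug, sizeof(int));
    cudaMemcpy(devIstate, &hostIstate, sizeof(int), cudaMemcpyHostToDevice);

    probeIstate<<<1, 1>>>(devIstate, devDebug);

    // A blocking cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(&hostSeen, devDebug, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("istate seen on device: %d\n", hostSeen);

    cudaFree(devIstate);
    cudaFree(devDebug);
    return 0;
}
```

This should build with the same flags as above (for example 'nvcc probe.cu -arch=sm_13 -o probe', where probe.cu is just a placeholder file name). It only tells you what the kernel reads; it says nothing about why the debugger asserts.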
Running in emulation, I get:
[codebox][ptthomps@adroit-001 gdbtest]$ cuda-gdb testnt
NVIDIA (R) CUDA Debugger
BETA release
Portions Copyright (C) 2008,2009 NVIDIA Corporation
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
Using host libthread_db library "/lib64/libthread_db.so.1".
(cuda-gdb) break 275
Breakpoint 1 at 0x40cf72: file testnotemp.cu, line 275.
(cuda-gdb) break 1312
Breakpoint 2 at 0x40d05d: file testnotemp.cu, line 1312.
(cuda-gdb) r
Starting program: /home/ptthomps/gdbtest/testnt
[Thread debugging using libthread_db enabled]
[New process 12489]
[New Thread 47549481721776 (LWP 12489)]
[New Thread 1101580608 (LWP 12492)]
[Switching to Thread 1101580608 (LWP 12492)]
Breakpoint 1, dlsoda_ (f={__dummy = 0 '\0'}, neq=0x11d94c00, y=0x11d94a00, t=0x11d94900, tout=0x11d92f00,
itol=0x11d92b00, rtol=0x11d92d00, atol=0x11d92a00, itask=0x11d93000, istate=0x11d93500, iopt=0x11d92c00,
rwork=0x11d93200, lrw=0x11d94e00, iwork=0x11d93100, liw=0x11d94d00, jac={__dummy = 0 '\0'}, jt=0x11d94b00,
common=0x11d96400) at testnotemp.cu:275
275 int kgo = 0;
Current language: auto; currently c++
(cuda-gdb) l
270 double rh = 0.;
271 int mu = 0;
272 double tp = 0.;
273 int lf0 = 0;
274 double big = 0.;
275 int kgo = 0;
276 double ayi = 0.;
277 double hmx = 0.;
278 double tol = 0.;
279 double sum = 0.;
(cuda-gdb) p tp
$1 = 0
(cuda-gdb) c
Continuing.
Breakpoint 2, dlsoda_ (f={__dummy = 0 '\0'}, neq=0x11d94c00, y=0x11d94a00, t=0x11d94900, tout=0x11d92f00,
itol=0x11d92b00, rtol=0x11d92d00, atol=0x11d92a00, itask=0x11d93000, istate=0x11d93500, iopt=0x11d92c00,
rwork=0x11d93200, lrw=0x11d94e00, iwork=0x11d93100, liw=0x11d94d00, jac={__dummy = 0 '\0'}, jt=0x11d94b00,
common=0x11d96400) at testnotemp.cu:1313
1313 if (*istate < 1 || *istate > 3) {
(cuda-gdb) p *istate
$2 = 1
(cuda-gdb) p istate
$3 = (int *) 0x11d93500
(cuda-gdb) q
The program is running. Exit anyway? (y or n) y
[ptthomps@adroit-001 gdbtest]$[/codebox]
This is running on the following hardware (deviceQuery output):
[codebox]CUDA Device Query (Runtime API) version (CUDART static linking)
There are 4 devices supporting CUDA
Device 0: "Tesla C1060"
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 4294705152 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.44 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Device 1: "Tesla C1060"
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 4294705152 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.44 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Device 2: "Tesla C1060"
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 4294705152 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.44 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Device 3: "Tesla C1060"
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 4294705152 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.44 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Test PASSED
[/codebox]
Any clue what’s going on?
Thanks,
Paul