Anomalies with __device__ functions. Or is cuda-gdb playing stupid?

Dear fellow users and gurus,

I’m porting an application to CUDA C and I’ve come across a strange problem. According to gdb (non-cuda flavour), when my program returns from executing a global kernel, the results of kernel’s calculations are garbage, and the next call to any CUDA function results in a cudaLaunchFailure return value. With some deeper debugging I’ve managed to locate the trouble source, it’s that device functions do not work properly. Seems like they never even execute, and their return value is just substituted with 0. And what is worse, further program flow seems to get completely screwed over.

(cuda-gdb) b RKF_Nonshared							/* RKF_Nonshared is the kernel */

Breakpoint 1 at 0x404a28: file rate.cu, line 33.

(cuda-gdb) r

Starting program: /home/mrhn/Documents/calc/final/kinet-cuda_debug 3000.cfg

[Thread debugging using libthread_db enabled]

[New process 2590]

[New Thread 140433547200288 (LWP 2590)]

[New Thread 140433512212224 (LWP 2596)]

[Context Create of context 0x9929a0 on Device 0]

[New Thread 140433484781312 (LWP 2597)]

[Launch of CUDA Kernel 0 (RKF_Nonshared<<<(1,1,1),(216,1,1)>>>) on Device 0]

[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]

Breakpoint 1, RKF_Nonshared<<<(1,1,1),(216,1,1)>>> (x=0) at rate.cu:37

37		if ( x == 0 )

(cuda-gdb) n

38			i = blockIdx.x * blockDim.x + threadIdx.x;

(cuda-gdb) n

45		Dev_Calc[i].r[0] = Rate_Nonshared( i, 0., 0. );			/* Rate_Nonshared is a __device__ function */

(cuda-gdb) s

[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (96,0,0), device 0, sm 0, warp 3, lane 0]

46		k1 = Msc.v * Msc.h * Dev_Calc[i].r[0];				/* What? Shouldn't the debugger actually step into that function??? */

(cuda-gdb) n

48		temp_extra = k1/4.; new_v = XTR_V(temp_extra);

(cuda-gdb) n

563 return a / b;								/* WHAT. There is NO such statement in my ENTIRE program!!! *

(cuda-gdb) WTF???	/* WTF indeed */				 * Where in the hell did this come from???			*/

(cuda-gdb) n

48		temp_extra = k1/4.; new_v = XTR_V(temp_extra);			/* And why is doing the same statement 48 again? */

(cuda-gdb) n

563 return a / b;								/* This nonexistent **** statement again */

(cuda-gdb) n

48		temp_extra = k1/4.; new_v = XTR_V(temp_extra);			/* More of the same nonsense follows */

(cuda-gdb) n

49		k2 = new_v * Msc.h * Rate_Nonshared( i, temp_extra, new_v );

(cuda-gdb) n

[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (160,0,0), device 0, sm 0, warp 5, lane 0]

51		temp_extra = k1*3./32. + k2*9./32.; new_v = XTR_V(temp_extra);

(cuda-gdb) n

563 return a / b;

(cuda-gdb) n

51		temp_extra = k1*3./32. + k2*9./32.; new_v = XTR_V(temp_extra);

(cuda-gdb) n

563 return a / b;

(cuda-gdb) n

51		temp_extra = k1*3./32. + k2*9./32.; new_v = XTR_V(temp_extra);

(cuda-gdb) ne[Ks

563 return a / b;

(cuda-gdb) s

51		temp_extra = k1*3./32. + k2*9./32.; new_v = XTR_V(temp_extra);

(cuda-gdb) s

52		k3 = new_v * Msc.h * Rate_Nonshared( i, temp_extra, new_v );

(cuda-gdb) s

54		temp_extra = k1*1932./2197. - k2*7200./2197. + k3*7296./2197.; new_v = XTR_V(temp_extra);

(cuda-gdb) s

563 return a / b;

(cuda-gdb) s

54		temp_extra = k1*1932./2197. - k2*7200./2197. + k3*7296./2197.; new_v = XTR_V(temp_extra);

(cuda-gdb) s

563 return a / b;

(cuda-gdb) s

54		temp_extra = k1*1932./2197. - k2*7200./2197. + k3*7296./2197.; new_v = XTR_V(temp_extra);

(cuda-gdb) s

563 return a / b;

(cuda-gdb) s

54		temp_extra = k1*1932./2197. - k2*7200./2197. + k3*7296./2197.; new_v = XTR_V(temp_extra);

(cuda-gdb) s

563 return a / b;

(cuda-gdb) s

54		temp_extra = k1*1932./2197. - k2*7200./2197. + k3*7296./2197.; new_v = XTR_V(temp_extra);

(cuda-gdb) s

55		k4 = new_v * Msc.h * Rate_Nonshared( i, temp_extra, new_v );

(cuda-gdb) s

Program received signal SIGTRAP, Trace/breakpoint trap.

[Switching to Thread 140433547200288 (LWP 2590)]

0x00007fb939cda396 in ?? () from /usr/lib64/libcuda.so

(cuda-gdb) OH GREAT

(cuda-gdb) bt

#0 0x00007fb939cda396 in ?? () from /usr/lib64/libcuda.so

#1 0x00007fb939cc446f in ?? () from /usr/lib64/libcuda.so

#2 0x00007fb939ca653a in ?? () from /usr/lib64/libcuda.so

#3 0x00007fb939c8e990 in ?? () from /usr/lib64/libcuda.so

#4 0x00007fb93b4bfde9 in ?? () from /usr/local/cuda/lib64/libcudart.so.4

#5 0x00007fb93b4f29bc in cudaStreamSynchronize () from /usr/local/cuda/lib64/libcudart.so.4

#6 0x0000000000403d70 in RKF () at integration.cu:391

#7 0x00000000004043ce in Fork (routine_ptr=0x403bd7 <RKF()>) at integration.cu:501

#8 0x00000000004026be in Start_Calculation () at integration.cu:52

#9 0x00000000004014e9 in main (argc=2, argv=0x7fffbdecec18) at main.cu:105

(cuda-gdb) finish

Run till exit from #0 0x00007fb939cda396 in ?? () from /usr/lib64/libcuda.so

[Termination of CUDA Kernel 0 (RKF_Nonshared<<<(1,1,1),(216,1,1)>>>) on Device 0]

0x00007fb939cc446f in ?? () from /usr/lib64/libcuda.so

(cuda-gdb) finish

Run till exit from #0 0x00007fb939cc446f in ?? () from /usr/lib64/libcuda.so

0x00007fb939ca653a in ?? () from /usr/lib64/libcuda.so

(cuda-gdb) finish

Run till exit from #0 0x00007fb939ca653a in ?? () from /usr/lib64/libcuda.so

0x00007fb939c8e990 in ?? () from /usr/lib64/libcuda.so

(cuda-gdb) finish

Run till exit from #0 0x00007fb939c8e990 in ?? () from /usr/lib64/libcuda.so

0x00007fb93b4bfde9 in ?? () from /usr/local/cuda/lib64/libcudart.so.4

(cuda-gdb) finish

Run till exit from #0 0x00007fb93b4bfde9 in ?? () from /usr/local/cuda/lib64/libcudart.so.4

0x00007fb93b4f29bc in cudaStreamSynchronize () from /usr/local/cuda/lib64/libcudart.so.4

(cuda-gdb) bt

#0 0x00007fb93b4f29bc in cudaStreamSynchronize () from /usr/local/cuda/lib64/libcudart.so.4

#1 0x0000000000403d70 in RKF () at integration.cu:391

#2 0x00000000004043ce in Fork (routine_ptr=0x403bd7 <RKF()>) at integration.cu:501

#3 0x00000000004026be in Start_Calculation () at integration.cu:52

#4 0x00000000004014e9 in main (argc=2, argv=0x7fffbdecec18) at main.cu:105

(cuda-gdb) n

Single stepping until exit from function cudaStreamSynchronize, 

which has no line number information.

RKF () at integration.cu:394

394	}

(cuda-gdb) n

Fork (routine_ptr=0x403bd7 <RKF()>) at integration.cu:503

503		if ( (CudaRetVal = cudaMemcpy(	Calc_Array,			/* This copies to Host what Device was supposed to crunch */

(cuda-gdb) n

516	}

(cuda-gdb) p Calc_Array[0]							/* This used to work much better when done on CPUs */

$1 = {n = {0, 0, 0, 0, 0, 0, 0, 0, 0}, r = {-nan(0x8000000000000), 0, 0, 0, 0, 0, 0, 0, 0}, d_n = {-nan(0x8000000000000), 0, 0, 0, 0, 0, 0, 0, 0}, 

 temp_n = 0, temp_r = 0, temp_d_n = 0, predictor = -nan(0x8000000000000), corrector = -nan(0x8000000000000), adaptor = -nan(0x8000000000000)}

(cuda-gdb) SCREW YOU

(cuda-gdb) kill

Kill the program being debugged? (y or n) (cuda-gdb) q

I’ve tried to insert a breakpoint at Rate_Nonshared(). Breakpoint worked. But Rate_Nonshared calls another device function. Trying to debug it leads to results which are very similar to what I’ve just presented.

I’d appreciate some input on what the heck is going on here.

P.s. I won’t rule out the theory that cuda-gdb is off its crock. The other day it gave me this delightful treatment:

NVIDIA (R) CUDA Debugger

4.0 release

Portions Copyright (C) 2007-2011 NVIDIA Corporation

GNU gdb 6.6

Copyright (C) 2006 Free Software Foundation, Inc.

GDB is free software, covered by the GNU General Public License, and you are

welcome to change it and/or distribute copies of it under certain conditions.

Type "show copying" to see the conditions.

There is absolutely no warranty for GDB. Type "show warranty" for details.

This GDB was configured as "x86_64-unknown-linux-gnu"...

Using host libthread_db library "/lib64/libthread_db.so.1".

(cuda-gdb) b integration.cu:496							/* Fun starts here */

No source file named integration.cu.

Make breakpoint pending on future shared library load? (y or [n]) (cuda-gdb) Hmmmmmmm.......

Undefined command: "Hmmmmmmm". Try "help".

(cuda-gdb) b main.cu:1

No source file named main.cu.

Make breakpoint pending on future shared library load? (y or [n]) (cuda-gdb) WTF????

Undefined command: "WTF". Try "help".

(cuda-gdb) b1

Undefined command: "b1". Try "help".

(cuda-gdb) b 1

Breakpoint 1 at 0x4012b4: file main.cu, line 1.

(cuda-gdb) ARE YOU *censored* ME AROUND OR WHAT???!!!

Undefined command: "ARE". Try "help".

(cuda-gdb) screw you

Undefined command: "screw". Try "help".

(cuda-gdb) q