Kernel debugging: digging into kernel code for debugging

Hi,

I am having trouble debugging a kernel. I can run the program fine when it’s been built with emulation, and the results are correct, but when I try to use the GPU I get:

Cuda error: Kernel execution failed in file ‘aes.cu’ in line 133 : unspecified launch failure.

The line referred to is the CUT_CHECK_ERROR right after the kernel call. I have tried putting a breakpoint in the program before the kernel call to look at memory. My debugger (CPU) says that the memory addresses given by the cuda malloc commands are invalid, but I imagine that’s just because it’s a device memory address. Just the same, I tried inserting calls to CUT_CHECK_ERROR after the cudaMalloc and cudaMemcpy calls (already wrapped in CUDA_SAFE_CALL) to no avail.

Does anyone know of a debugger I can use? Should I use more of those macros like CUDA_SAFE_CALL and CUT_CHECK_ERROR (does anyone have any idea where they’re documented?)? Do you have other suggestions?

Thanks so much,
Eli

How many blocks and threads did you use in the kernel launch?

I think you should check that the resources each block uses do not exceed what the device provides.
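One way to make that check concrete is to query the device limits and compare them with the launch configuration. This is a minimal sketch, assuming device 0 and a reasonably recent CUDA runtime; `threadsPerBlock`, `numBlocks` and `sharedBytesPerBlock` are hypothetical placeholders for the values used in the real kernel call.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Hypothetical launch configuration: substitute the values from your own kernel call.
    const int threadsPerBlock = 256;
    const int numBlocks = 64;
    const size_t sharedBytesPerBlock = 4096;

    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);  // device 0 is an assumption
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("max threads per block : %d\n", prop.maxThreadsPerBlock);
    printf("max grid size (x)     : %d\n", prop.maxGridSize[0]);
    printf("shared memory / block : %zu bytes\n", prop.sharedMemPerBlock);

    // Compare the intended launch against the limits the device reports.
    bool fits = threadsPerBlock <= prop.maxThreadsPerBlock &&
                numBlocks <= prop.maxGridSize[0] &&
                sharedBytesPerBlock <= prop.sharedMemPerBlock;
    printf("launch configuration %s the device limits\n",
           fits ? "is within" : "exceeds");
    return fits ? 0 : 1;
}
```

Exceeding these limits usually shows up as a launch configuration error, while an "unspecified launch failure" more often points at a bad memory access inside the kernel, so this check mainly rules out one possible cause.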

Eli

I apologise if you are well past this point, but are you building with “make dbg=1”?

I wasted some time discovering that all the macros, like CUDA_SAFE_CALL() and CUT_CHECK_ERROR(), are defined as no-ops:

#define CUDA_SAFE_CALL(call) call

by default; you need to build with “make dbg=1” to get them to be helpful. (Have a look at the bottom of cutil.h to see what I mean.)

So maybe the macros weren’t checking anything?

Beyond this, my approach is the trivial, but often effective, ‘binary-chop the code’ using “#if 0 … #endif” until I’m staring at the broken line of code, at which point I start to eat chocolate for inspiration.
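For anyone who has not used the binary-chop trick, here is a rough sketch of what it can look like inside a kernel. The kernel `process` and its three stages are hypothetical and only there to show the pattern; the calls assume a reasonably recent CUDA runtime.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel split into stages that can be switched off one at a time.
__global__ void process(float *out, const float *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float v = in[i];              // stage 1: load (always kept)

#if 1                             // stage 2: change 1 to 0 to disable it
    v = v * v + 1.0f;
#endif

#if 0                             // stage 3: disabled for this run
    v = sqrtf(v);
#endif

    out[i] = v;                   // keep the store so the compiler cannot discard the work above
}

int main() {
    const int n = 1024;
    float *d_in = NULL, *d_out = NULL;
    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMalloc((void **)&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    process<<<(n + 255) / 256, 256>>>(d_out, d_in, n);
    // Waiting for the kernel surfaces any failure that happened while it ran.
    cudaError_t err = cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(err));

    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

If the failure disappears when a stage is disabled, the bug is probably in that stage, and the same halving can be repeated inside it.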

(Rant on: I’d prefer to have these macros always check for errors by default unless I do something active to switch them off. I think quick but wrong is slower than ‘slow’ and right! Anyway, NVIDIA, if you’re reading, how about adding another flag ‘quick=1’ which needs to be added to the build in order to void the macros out? Rant off.)
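In the meantime, an always-on check is easy enough to write yourself. The sketch below is not part of cutil: `CHECK_CUDA` and the `scale` kernel are hypothetical names, and it assumes a runtime that provides cudaDeviceSynchronize() (older toolkits used cudaThreadSynchronize() for the same purpose).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical always-on check (not a cutil macro): it is active in every build.
#define CHECK_CUDA(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "CUDA error in %s at line %d: %s\n",          \
                    __FILE__, __LINE__, cudaGetErrorString(err_));        \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Hypothetical kernel used only to have something to launch.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1024;
    float *d_data = NULL;
    CHECK_CUDA(cudaMalloc((void **)&d_data, n * sizeof(float)));
    CHECK_CUDA(cudaMemset(d_data, 0, n * sizeof(float)));

    scale<<<(n + 255) / 256, 256>>>(d_data, n);
    CHECK_CUDA(cudaGetLastError());       // errors from the launch itself
    CHECK_CUDA(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CHECK_CUDA(cudaFree(d_data));
    printf("done\n");
    return 0;
}
```

Checking both cudaGetLastError() and the synchronize call matters because a kernel launch returns before the kernel has finished, so a failure during execution is only reported by a later call.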

Thank you for your help. I isolated the problem by doing basically the same thing - except I just commented out text instead of using macros. It turned out to be some nasty thing about either allocating memory or accessing it - it was unclear which. My solution was to move initializing and allocating that memory to the host program that calls the kernel and pass the kernel a pointer to it :-/.
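For anyone who finds this thread later, the pattern looks roughly like the sketch below. This is not my actual aes.cu code: the `lookup` kernel, the table contents and the sizes are made up for illustration, and it assumes a runtime with cudaDeviceSynchronize().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: it allocates nothing and only reads the table it is handed.
__global__ void lookup(const unsigned char *table, unsigned char *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = table[data[i]];
}

int main() {
    const int tableSize = 256;
    const int n = 1024;
    unsigned char h_table[tableSize];
    unsigned char h_data[n];

    // Initialize everything on the host (placeholder values, not a real S-box).
    for (int i = 0; i < tableSize; ++i) h_table[i] = (unsigned char)(255 - i);
    for (int i = 0; i < n; ++i) h_data[i] = (unsigned char)(i % tableSize);

    // Allocate device memory from the host and copy the initialized data over.
    unsigned char *d_table = NULL, *d_data = NULL;
    cudaMalloc((void **)&d_table, tableSize);
    cudaMalloc((void **)&d_data, n);
    cudaMemcpy(d_table, h_table, tableSize, cudaMemcpyHostToDevice);
    cudaMemcpy(d_data, h_data, n, cudaMemcpyHostToDevice);

    // The kernel just receives the device pointers as arguments.
    lookup<<<(n + 255) / 256, 256>>>(d_table, d_data, n);
    cudaError_t err = cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(err));

    cudaMemcpy(h_data, d_data, n, cudaMemcpyDeviceToHost);
    cudaFree(d_table);
    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```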

It works :)

I have actually been using Visual Studio, and I think when I specify to build in debug configuration those macros are on, but I’ll check.

Thanks again,
Eli