kernel not executed, profiler reports all-zeros

I’m running a fairly simple kernel and the Visual Profiler reports all zeros (except for the occupancy, which is correct at 67%, and the GPU time, which is preposterously high.) The output is also empty and I suspect the kernel isn’t actually executed. In the .cubin file everything seems in order (shMem: 4124, reg:12, threads per block: 256). If I trivialize the kernel some more, it runs and everything starts making sense again. No compilations errors in either case. Does anybody have any idea what could cause this? I’ve learned from the forum that CUDA isn’t particularly good at reporting errors, but never personally experienced anything this puzzling, and I’ve developed apps much more complicated that worked with no problems.
THanks.

Are you using the SDK error-checking macros?

I’m using CUDA_SAFE_CALL for all CUDA functions.

And are you running in Debug mode? The error-checking functions have this wonderful feature where they don’t work in Release.

This is the first time I’m having to run in debug (had to install it first :). After all’s said and done, I get a nondescript “unspecified launch failure”, on the first CUDA_SAFE_CALL line after kernel. If I erase that call (a cudaMemcpy), it will skip onto the next one (a cudaUnbindTexture).

The most common cause of unspecified launch failure is when a kernel writes outside of allocated memory.

I don’t see where something like that might occur, I ran all kinds of sanity checks. For example, it runs in emulation mode. Thanks, BTW, for your efforts.

Run it through valgrind in emulation mode.

can you post the code so we can check?

From what I gather, emulation mode isn’t all that great for debugging. Every time I’ve run into an unspecified launch failure, it’s been an out-of-bounds memory access in a kernel. So much so that I automatically equate “unspecified launch failure” with “segmentation fault”.

Emulation mode lets you use valgrind, which is the easiest way to determine where an unspecified launch error was caused.

Good to know. Haven’t ever used it, but I definitely will next time!

I wasn’t completely knocking emulation mode… I’m sure it can be a useful tool. However, I’ve read on here occasions where the emulator will give correct results, while the release mode doesn’t. I guess I was more making a point that, just because it ran in emulation mode (as the OP stated), doesn’t necessarily mean there aren’t any errors in the code.

Oh sure, it hides all sorts of race conditions, floating point differences, and problems like that from you. But, if you are getting an unspecified launch error and you’re not doing any madness with indirect addressing, emu+valgrind is the tool of choice.

Valgrind gets stuck on a cudaMemcpyToSymbol() call.

I found out what my problem was and thought I should post it so others could learn from my stupidity :( I was passing a structure pointer as a parameter to my kernel. In hindsight, I remember to have read something about this being forbidden, but I don’t remember where. Also, a suggestion for Nvidia: if this is indeed forbidden in release mode, then it would be very easy to point it out in debug mode (more explicit error messages in CUDA_SAFE_CALL(), please!), or in emulation mode. A simple feature like this could have saved me a few good hours of frustration.

It’s not forbidden. You were probably just doing it wrong (some other pointer that sat on the host, unallocated memory, something like that).

Now I’m really confused, because it worked well in emulation mode. I even printed the members of the structure from within the kernel and everything checked out.

Indeed, my pointer sat on the host. Thanks tmurray.

It really shouldn’t be a problem to check if a pointer passed to a kernel is in the range of memory that cudaMalloc handed out. In fact, a lot of interesting error checking can be done in debug and emulation builds.

CUDA error checking and reporting can be improved tremendously. NVIDIA has done so much right to ease the learning curve, and I applaud it, but this whole area sticks out painfully.