Should I assume that any difference in behavior between the EmuDebug and Debug compilation profiles is a compiler bug?
I’ve been getting extremely frustrated trying to debug a kernel that runs in EmuDebug with absolutely no problems (no exceptions, no memory errors), but in regular Debug mode, which uses the hardware, I get billions of these in the output log:
First-chance exception at 0x7c812a5b in cudatest.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fae0…
and the kernel is obviously aborting early because it is finishing much quicker than it normally would. What’s the point of even having a “debug” profile if you can’t use it to debug anything, and what’s the point of the EmuDebug profile if it isn’t even going to mimic the runtime behavior on the hardware?
Never mind, I got it. Turns out this is the extremely descriptive error you get if the compiler can’t figure out (because it is not very bright, mind you) how to allocate storage for all your local variables, either in registers or in local memory, the latter of which is extremely plentiful.
I didn’t really understand what the problem might be until I looked at the PTX intermediate code. Hey NVIDIA, have you guys heard of register renaming? It’s a really cool idea some guy had back in the day, you should look it up some time.
Okay, great, but .cubin files aren’t human readable. How am I supposed to figure out exactly what the compiler is doing to make optimizations if I can’t even see how it’s allocating precious registers?
I apologize for my attitude. I’ve spent a long time porting a complex piece of code, and I’m really feeling frustrated with CUDA in general at this point.
You can use decuda to disassemble the cubin if you want, but that usually isn’t necessary. The most you usually need from the cubin is the number of registers and the shared memory usage, so that you can determine the maximum block size you can run.
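To make that last step concrete, here is a rough host-side sketch of the arithmetic. It assumes you already have the per-thread register count and per-block shared memory figure for your kernel. The names `DeviceLimits`, `kG80Limits` and `maxBlockSize` are made up for illustration, the limits are the ones documented for compute capability 1.0/1.1 parts (check the programming guide for your own card), and it ignores register allocation granularity, so treat the result as an upper bound rather than a guarantee.

```cpp
#include <algorithm>

// Hypothetical helper type: per-multiprocessor limits that bound a single block.
struct DeviceLimits {
    int regsPerMultiprocessor;       // 32-bit registers available to one block
    int sharedMemPerMultiprocessor;  // bytes of shared memory available to one block
    int maxThreadsPerBlock;          // hard cap on threads per block
    int warpSize;                    // threads per warp
};

// Assumed values for compute capability 1.0/1.1 hardware (G80 class).
const DeviceLimits kG80Limits = {8192, 16384, 512, 32};

// Returns the largest block size (rounded down to a whole number of warps)
// that fits the register budget, or 0 if the block cannot launch at all.
// regsPerThread and smemPerBlock are the figures reported for the kernel.
int maxBlockSize(const DeviceLimits& dev, int regsPerThread, int smemPerBlock) {
    // Shared memory is allocated per block, so it either fits or it does not.
    if (smemPerBlock > dev.sharedMemPerMultiprocessor) {
        return 0;
    }
    // Every thread in the block needs its own copy of the registers.
    int byRegisters = (regsPerThread > 0)
                          ? dev.regsPerMultiprocessor / regsPerThread
                          : dev.maxThreadsPerBlock;
    int threads = std::min(byRegisters, dev.maxThreadsPerBlock);
    // Round down to a multiple of the warp size.
    return (threads / dev.warpSize) * dev.warpSize;
}
```

For example, under these assumed limits `maxBlockSize(kG80Limits, 32, 0)` gives 256, so a kernel that needs 32 registers per thread cannot be launched with 512 threads per block. A launch that goes over the limit fails immediately, which would fit the symptom of a kernel that "finishes" much faster than it should.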