Device Emulation

Hello again,

I am having a hard time figuring out exactly what the -deviceemu flag does. I was assuming that it embedded a cubin emulator in the fat binary, and called that emulator instead of the actual device.

It seems my assumption was wrong, because with -deviceemu, nvopencc, ptxas and fatbin are never called. My new assumption is that the device code is instead compiled like normal code and executed directly on the host platform (why call that emulation, then?). Another consequence seems to be that there is no cubin-level emulator.

Does this mean that if I get a compiled CUDA binary, I can extract the cubin file but can’t emulate it?

I would be glad to find any pointers or documentation on this issue, thanks.

Daniel

Yes, I would like to have a cubin-level emulator as well, but I don’t think one exists. I use only the cu* driver API calls and load .cubin files. If you try putting a printf in a kernel, the .cu file won’t compile. When I cat the .cubin file, I get a bunch of bytecode.

Yes, “emulation mode” is somewhat of a misnomer: it compiles to host code that is a sort of “source-level emulation” of the CUDA kernel. The primary use of this mode is to let you verify the logic of your code with standard debugging tools like gdb and valgrind. (In emulation mode, you can also insert printf calls into your kernels, which would normally not be allowed.) Keep in mind that emulation mode won’t necessarily show race conditions, because thread scheduling on the CPU is completely different from thread scheduling on the GPU.
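To make the printf point concrete, here is a minimal sketch of a kernel that only prints when built for emulation. The kernel name `scale` and the array size are made up for illustration, and I am assuming the toolkit defines the `__DEVICE_EMULATION__` macro when -deviceemu is passed, as the toolkits from that period did. Treat it as a sketch, not as an official recipe.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel: multiplies each element by a factor.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
#ifdef __DEVICE_EMULATION__
        // Assumption: this macro is defined only under -deviceemu,
        // where the kernel is compiled as ordinary host code.
        printf("thread %d wrote %f\n", i, data[i]);
#endif
    }
}

int main()
{
    const int n = 8;
    float host[n];
    for (int i = 0; i < n; ++i)
        host[i] = (float)i;

    // Same host-side code for both the emulated and the device build.
    float *dev = 0;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    scale<<<1, n>>>(dev, 2.0f, n);

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; ++i)
        printf("%d: %f\n", i, host[i]);
    return 0;
}
```

On a toolkit that still ships the flag, something like `nvcc -deviceemu scale.cu -o scale_emu` should give a binary you can step through in gdb or run under valgrind. Without the flag, the guarded printf is simply compiled out. Because the emulated threads are scheduled by the host, a clean run here says nothing about races on the real device.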

I suspect this mode was implemented to give an 80% solution with 20% of the effort. A complete hardware simulation would be much slower, making it more awkward to use emulation mode to debug code in many cases. That said, there are reasons why a full simulation would be nice. NVIDIA, however, has not indicated that any such tool will be released.