What is device emulation? Information about CUDA device emulation.

I just read the CUDA C Programming Guide, which says that it supports device emulation, but I don't have a clear idea of what 'device emulation' means. My guess is that it is a tool that simulates the device hardware so that CUDA code can be executed on the simulated platform. Am I right? Where can I find further information about it? Thanks.

Yes, you are right. Device emulation simulates the GPU hardware, with one important difference: it executes the threads serially instead of in parallel as the actual hardware does. If you want to step through kernel code in a debugger, you have to run it in device emulation mode.

If you are using the CUDA build rule, there is an option in the .cu file properties: Emulation mode - No/Yes. It is No by default; set it to Yes to build in emulation mode.

If you are working with a custom build rule, add '-deviceemu' to the nvcc command line.
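For example, with a toolkit old enough to still support emulation mode, the compile step might look like this (the file names here are placeholders, not from the original post):

```shell
# Build in device emulation mode; -g adds host debug symbols so you can
# step through the serialized kernel in a debugger.
# Note: -deviceemu is deprecated and absent from current CUDA toolkits.
nvcc -deviceemu -g kernel.cu -o app
```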

And after you have done all of that, forget about it, because device emulation is deprecated and will disappear from the CUDA toolchain shortly. There are superior alternatives, such as Ocelot, for running and debugging CUDA code on an x86 CPU.

Device emulation does something along the lines of a source-to-source translation from CUDA to C++ with pthreads, and then feeds the result into your host C++ compiler (cl or g++). So you end up executing a C++ program that is supposed to be functionally equivalent to what you wrote in CUDA.

Ocelot is an emulator/JIT for PTX, the virtual instruction set used by NVIDIA GPUs. You compile your program normally with nvcc and then link against libocelot.so instead of libcudart.so. Ocelot intercepts every CUDA kernel call and directs it to the selected device, which can be an emulator, a JIT, or an NVIDIA GPU. If the device is the emulator, it interprets PTX instructions one at a time to execute your kernel, so the kernel executes the same way a GPU would execute it, even though it is actually running on your CPU.
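Concretely, the swap happens at link time. Assuming an object file app.o built with nvcc and Ocelot installed where the linker can find it (names here are illustrative), the link step might look like:

```shell
# Normal CUDA link, against NVIDIA's runtime:
#   g++ app.o -lcudart -o app
# Ocelot link: substitute Ocelot's implementation of the CUDA runtime API.
g++ app.o -locelot -o app
```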

Thanks! It is very clear.
