CUDA Emulation fails in dual core machine

Hi,

I am new to CUDA, and I am running into problems trying to use CUDA in emulation mode on a dual-core machine (Core 2 Duo) that does not have an NVIDIA graphics card. I understand from the manuals that a GPU is not required for emulation mode, so am I missing anything here? Also, when I tried to run the same program on other machines which do have an NVIDIA graphics card, I was able to step through and execute it (SDK 3.1 used). I would appreciate your suggestions and pointers on this issue, thanks.

CUDA 3.1 doesn't support device emulation any longer. I guess you used an older version on your other machines. Use cuda-gdb for debugging now. The old emulation mode wasn't an accurate emulation of any CUDA device at all, and there was no guarantee that code which ran without problems in emulation mode was also free of bugs when executed on a real CUDA device.

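To make that concrete, here is a minimal sketch (illustrative code, not from this thread) of the kind of bug the old emulator tended to hide: device emulation ran the threads of a block one after another on the CPU, so a missing __syncthreads() between a shared-memory write and a read of a neighbouring thread's value often went unnoticed, while on a real GPU threads in different warps run concurrently and the read can race with the write.

```
// Illustrative kernel (hypothetical): each thread stores buf[t] and then
// reads buf[t - 1], a value written by a *different* thread.
__global__ void neighbourSum(const float *in, float *out, int n)
{
    __shared__ float buf[256];
    int t = threadIdx.x;
    int i = blockIdx.x * blockDim.x + t;

    if (i < n)
        buf[t] = in[i];

    // __syncthreads();   // REQUIRED on real hardware: without it, reading
                          // buf[t - 1] across a warp boundary is a race.
                          // The old serialized emulator always ran thread
                          // t - 1 before thread t, so the value was already
                          // there and the bug stayed hidden.

    if (i < n && t > 0)
        out[i] = buf[t] + buf[t - 1];
}
```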

Thanks very much for the input. I guess I cannot use cuda-gdb since I am working on Windows. However, I will try to do the same using Parallel Nsight.

And worse, code which ran properly on the device could fail in the emulator.

Does anybody have an opinion about OCELOT? Or other emulators? I REALLY need that feature, because in our CUDA class, we want students to be able to run CUDA code on their laptops.

THANKS

Lanzcc

Barra could also be an option. BarraWiki
But I'm not sure whether Barra or GPU-Ocelot have implemented an emulation of shared memory (smem) bank conflicts or coalescing. From the wiki it looks like Barra has implemented somewhat more features than Ocelot, but there also doesn't seem to be a way to check for such problems. It looks like you would still need to execute the code on a real GPU and then run the profiler… If someone knows of tools (emulators) capable of that, let us know!

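As an illustration of the kind of access-pattern problem meant here (a minimal sketch, not code from this thread), the classic naive matrix transpose is correct but uncoalesced on the store side, and that is exactly the sort of thing you currently find by running the profiler on real hardware rather than in an emulator:

```
#define TILE 16   // assumed 16x16 thread blocks for this sketch

// Naive transpose of a width x height row-major matrix. The load
// in[y * width + x] is coalesced (consecutive threads read consecutive
// addresses), but the store out[x * height + y] is strided by 'height',
// so each warp's write is split across many memory transactions.
__global__ void transposeNaive(const float *in, float *out,
                               int width, int height)
{
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < width && y < height)
        out[x * height + y] = in[y * width + x];
}
```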

Barra is an architecture simulator of the G80 and Ocelot is a functional emulator of the PTX virtual machine; the difference is that Barra tries to do things the same way they would be done on a G80-series GPU, while Ocelot tries to do things in a generic way. For bank conflicts, Ocelot would just treat shared memory as a flat memory space without any banking structure, whereas Barra would include the same number of banks and model conflicts the same way they would occur on a G80 GPU, but not necessarily on a GT200/GF100/GF104/etc.

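To spell out the banking point with a small example (a hedged sketch, not from the thread; the 16-bank, 4-byte-wide figures apply to G80/GT200-class shared memory): a flat model like Ocelot's would execute both tiles below identically, while a banked model like Barra's would show the first one serializing.

```
#define TILE 16   // assumed: launched with 16 threads per block

__global__ void bankConflictSketch(const float *in, float *out)
{
    // G80-class shared memory has 16 banks, 4 bytes wide each, with
    // bank = (word address) % 16.
    __shared__ float a[TILE][TILE];       // column accesses have a stride of
                                          // 16 words -> every thread of a
                                          // half-warp hits bank 0
    __shared__ float b[TILE][TILE + 1];   // +1 padding shifts each row by
                                          // one bank -> conflict-free

    int t = threadIdx.x;
    a[t][0] = in[t];   // 16-way bank conflict: serialized on real hardware
    b[t][0] = in[t];   // padded: each thread lands in a different bank
    __syncthreads();

    out[t] = a[0][t] + b[0][t];   // row accesses are conflict-free
}
```
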
Ocelot includes add-on modules (trace generators) that model things such as coalescing and bank conflicts, but there are many different protocols for coalescing rules and bank conflicts, so you have to configure them to match some particular GPU.

In general I think that Barra/GPGPU-Sim is a better fit if you are trying to determine how architecture features (cache sizes, banking structure, warp scheduling policies) affect performance and Ocelot is a better fit if you just want to emulate a CUDA program on a CPU, or record a generic metric like the ratio of floating point instructions to memory instructions.

I could give an opinion but it would be incredibly biased. If you end up giving Ocelot a try, I would be interested in any suggestions as to how it could be improved to better suit such uses.

My personal opinion is that Ocelot is great, and you should be using it if you need to run on the CPU.
