CUDA Emulation fails in dual core machine

Hi,

I am new to CUDA, and I am running into problems trying to use CUDA in emulation mode on a dual-core machine (Core 2 Duo) that does not have an NVIDIA graphics card. I understand from the manuals that a GPU is not required for emulation mode, so am I missing anything here? Also, when I tried to run the same program on other machines which do have an NVIDIA graphics card, I was able to step through and execute it (SDK 3.1 used). I would appreciate your suggestions and pointers on this issue, thanks.

CUDA 3.1 doesn't support device emulation any longer. I guess you used an older version on your other machines. Use cuda-gdb for debugging now. The old emulation mode wasn't an accurate emulation of any CUDA device at all, and there was no guarantee that code which ran without problems in emulation mode was also free of bugs when executed on a real CUDA device.

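To make that concrete, here is a minimal sketch (illustrative code, not from this thread) of the kind of bug the old emulator tended to hide: device emulation ran the threads of a block one after another on the CPU, so a missing __syncthreads() between a shared-memory write and a read of a neighbouring thread's value often went unnoticed, while on a real GPU threads in different warps run concurrently and the read can race with the write.

```
// Illustrative kernel (hypothetical): each thread stores buf[t] and then
// reads buf[t - 1], a value written by a *different* thread.
__global__ void neighbourSum(const float *in, float *out, int n)
{
    __shared__ float buf[256];
    int t = threadIdx.x;
    int i = blockIdx.x * blockDim.x + t;

    if (i < n)
        buf[t] = in[i];

    // __syncthreads();   // REQUIRED on real hardware: without it, reading
                          // buf[t - 1] across a warp boundary is a race.
                          // The old serialized emulator always ran thread
                          // t - 1 before thread t, so the value was already
                          // there and the bug stayed hidden.

    if (i < n && t > 0)
        out[i] = buf[t] + buf[t - 1];
}
```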

Thanks very much for the input. I guess I cannot use cuda-gdb since I am working on Windows. However, I will try to do the same using Parallel Nsight.

And worse, code which ran properly on the device could fail in the emulator.

Does anybody have an opinion about OCELOT? Or other emulators? I REALLY need that feature, because in our CUDA class, we want students to be able to run CUDA code on their laptops.

THANKS

Lanzcc

Barra could also be an option. BarraWiki
But I'm not sure whether Barra or GPU-Ocelot have implemented an emulation of shared memory (smem) bank conflicts or coalescing. From the wiki it looks like Barra has implemented somewhat more features than Ocelot, but there also doesn't seem to be a way to check for such problems. It looks like you would still need to execute the code on a real GPU and then run the profiler… If someone knows of tools (emulators) capable of that, let us know!

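As an illustration of the kind of access-pattern problem meant here (a minimal sketch, not code from this thread), the classic naive matrix transpose is correct but uncoalesced on the store side, and that is exactly the sort of thing you currently find by running the profiler on real hardware rather than in an emulator:

```
#define TILE 16   // assumed 16x16 thread blocks for this sketch

// Naive transpose of a width x height row-major matrix. The load
// in[y * width + x] is coalesced (consecutive threads read consecutive
// addresses), but the store out[x * height + y] is strided by 'height',
// so each warp's write is split across many memory transactions.
__global__ void transposeNaive(const float *in, float *out,
                               int width, int height)
{
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < width && y < height)
        out[x * height + y] = in[y * width + x];
}
```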

Barra is an architecture simulator of the G80 and Ocelot is a functional emulator of the PTX virtual machine; the difference is that Barra tries to do things the same way they would be done on a G80-series GPU, while Ocelot tries to do things in a generic way. For bank conflicts, Ocelot would just treat shared memory as a flat memory space without any banking structure, whereas Barra would include the same number of banks and model conflicts the same way they would occur on a G80 GPU, but not necessarily on a GT200/GF100/GF104/etc.

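To spell out the banking point with a small example (a hedged sketch, not from the thread; the 16-bank, 4-byte-wide figures apply to G80/GT200-class shared memory): a flat model like Ocelot's would execute both tiles below identically, while a banked model like Barra's would show the first one serializing.

```
#define TILE 16   // assumed: launched with 16 threads per block

__global__ void bankConflictSketch(const float *in, float *out)
{
    // G80-class shared memory has 16 banks, 4 bytes wide each, with
    // bank = (word address) % 16.
    __shared__ float a[TILE][TILE];       // column accesses have a stride of
                                          // 16 words -> every thread of a
                                          // half-warp hits bank 0
    __shared__ float b[TILE][TILE + 1];   // +1 padding shifts each row by
                                          // one bank -> conflict-free

    int t = threadIdx.x;
    a[t][0] = in[t];   // 16-way bank conflict: serialized on real hardware
    b[t][0] = in[t];   // padded: each thread lands in a different bank
    __syncthreads();

    out[t] = a[0][t] + b[0][t];   // row accesses are conflict-free
}
```
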
Ocelot includes add-on modules (trace generators) that model things such as coalescing and bank conflicts, but there are many different protocols for coalescing rules and bank conflicts, so you have to configure them to match some particular GPU.

In general I think that Barra/GPGPU-Sim is a better fit if you are trying to determine how architecture features (cache sizes, banking structure, warp scheduling policies) affect performance and Ocelot is a better fit if you just want to emulate a CUDA program on a CPU, or record a generic metric like the ratio of floating point instructions to memory instructions.

I could give an opinion but it would be incredibly biased. If you end up giving Ocelot a try, I would be interested in any suggestions as to how it could be improved to better suit such uses.

My personal opinion is that Ocelot is great, and you should be using it if you need to run on the CPU.
