What device does Emulation Mode emulate?

Does anybody know the specifics of the device that Emulation Mode emulates? I.e.

  1. How many multiprocessors?
  2. How many thread processors per multiprocessor?
  3. How much memory per multiprocessor?
  4. Can it emulate multiple GPU devices?
  5. Can it emulate specific cards, e.g. Fermi or Tesla?

The programming examples in the SDK and the docs don’t seem to cover this issue. There are two apps, MonteCarlo and MonteCarloMultiGPU, and the number of GPUs is set to 1 in emulation mode.

It would be nice to have a debuggable emulator for Tesla in the ideal case, i.e. one that allows you to catch errors and see the state of the device.

It doesn’t emulate any real device. The warp size is one, and thread execution is completely sequential. It is incapable of simulating any of the real execution characteristics of any actual CUDA hardware, and it is also incapable of detecting many of the most common programming errors which make code fail on the device. That is why it is deprecated in CUDA 3.0 and scheduled for removal from the toolkit release that will follow 3.0. There are much more useful tools, like Nexus, cuda-gdb, GPU Ocelot and Barra, which either let you inspect the state of code running on an actual device or provide a much richer emulation environment which is more useful for troubleshooting.
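To make the point about hidden errors concrete, here is a rough host-only C++ sketch. It is a toy model, not how the emulator or the hardware is actually implemented: the `Block` struct, the function names, the warp size of 4 and the "last warp runs first" schedule are all assumptions made up for illustration. The toy kernel writes to a shared array and then reads its left neighbour's slot with no barrier in between (the place where `__syncthreads()` would go in real CUDA code).

```cpp
#include <vector>

// Toy "thread block". The kernel body is split into two steps so that a
// scheduler can interleave them. A barrier belongs between the steps and is
// deliberately missing. All names here are hypothetical.
struct Block {
    std::vector<int> in, shared, out;
    explicit Block(int n) : in(n), shared(n, 0), out(n, 0) {
        for (int i = 0; i < n; ++i) in[i] = i + 1;  // input is 1, 2, ..., n
    }
    // Step 1: each thread stores its input element in "shared" memory.
    void store(int tid) { shared[tid] = in[tid]; }
    // Step 2: each thread adds its own slot and its left neighbour's slot.
    void combine(int tid) {
        out[tid] = shared[tid] + (tid > 0 ? shared[tid - 1] : 0);
    }
};

// Result the kernel is meant to produce: in[i] + in[i-1].
std::vector<int> expected_output(int n) {
    std::vector<int> r(n);
    for (int i = 0; i < n; ++i) r[i] = (i + 1) + (i > 0 ? i : 0);
    return r;
}

// Sequential model: thread 0 runs to completion, then thread 1, and so on.
// The left neighbour has always finished already, so the race never shows.
std::vector<int> run_sequential(int n) {
    Block b(n);
    for (int t = 0; t < n; ++t) {
        b.store(t);
        b.combine(t);
    }
    return b.out;
}

// Warp model: threads inside a warp run in lock-step, and warps run in an
// arbitrary order (assumed here: last warp first). n must be a multiple of warp.
std::vector<int> run_warps_reversed(int n, int warp) {
    Block b(n);
    for (int w = n / warp - 1; w >= 0; --w) {
        for (int t = w * warp; t < (w + 1) * warp; ++t) b.store(t);
        for (int t = w * warp; t < (w + 1) * warp; ++t) b.combine(t);
    }
    return b.out;
}
```

With 8 threads, `run_sequential(8)` matches `expected_output(8)`, while `run_warps_reversed(8, 4)` produces a wrong value for thread 4, because that thread reads a slot that the other warp has not written yet. That is the kind of bug a purely sequential run cannot reveal.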

Instead of just removing it from the toolkit it would be better if they replaced it with a better emulator/simulator, maybe one of the ones you mentioned.

The task before me is to convince people that they should invest in CUDA hardware. To do this I have to learn CUDA and try some algorithms before I am in a position to recommend that anyone buy the hardware. Not to mention that for some hardware like the Fermi, the silicon is 2 days old and may be a bit hard to get right away, which makes the pre-purchase decision more dependent on emulation.

So for people in my situation where we are exploring technologies but are not yet in a position to commit to a purchase, it’s really necessary to have a good emulator available, and one which will emulate the specific limits of a particular device and emulate the behavior of the device accurately. I don’t care if this is slow; I can use smaller dimensions in a test.

Thanks for the lecture, but it has nothing to do with me. I just offered an answer to your question.

If you really want to try CUDA development, the cheapest CUDA cards (compute capability 1.2, which has all the same facilities as a current Tesla except double-precision support) cost €50/$50/SEK750. The barriers to development on real hardware are not high.