CUDA Cycle-Accurate Simulator: Is there one?

I’m writing some highly optimised code for a 4000-word essay for my school, but I do not own a CUDA-enabled card (and the one I need to test on is a $500 GTX 580, a bit expensive for a student).

I was wondering whether there is a cycle-accurate simulator for CUDA that would let me debug, test and benchmark my code (obviously in ‘slow motion’, as it were).

While I obviously expect such a thing to be 100-500 times slower than the real deal, it would be really handy if there was one. My code should only take about 10 seconds on real hardware; waiting an hour for the simulation may be boring, but it sure beats $500.

gpuocelot simulates at the PTX code level.

Barra simulates at the G80 (i.e. NVIDIA GeForce 8800 GTX) instruction level.

Both are open source projects, but Barra seems more incomplete (GPU feature-wise) and is a bit orphaned.

Neither one is cycle-accurate in any way. (Also, a cycle-accurate simulator would be at least thousands of times slower than hardware, probably a lot more than that.)

Sorry to hear that.

Actually, Barra is still actively being developed, even though we do not publicize it much. We now have a decent timing model of the SM. It is by no means cycle-accurate (that would require much more reverse-engineering than is reasonable), but it is already usable for hardware feature experimentation and design space exploration.

We focus on improving the accuracy of the simulation and supporting more micro-architectural features, rather than keeping up with the latest CUDA versions and supporting new programmer-visible features (read: no CUDA Toolkit >3.0, no CC 2.x).

If somebody is interested in beta-testing the new timing model, please drop me a line. We always appreciate feedback.

As for simulation time, an instruction-level functional simulator like Barra or Ocelot’s emulator is ~1000 times slower than hardware. For cycle-accurate simulation, add 2 or 3 more orders of magnitude…
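To put those slowdown factors in concrete terms, here is a minimal back-of-the-envelope sketch in Python. It assumes the 10-second hardware runtime mentioned in the original question and treats the slowdowns as plain multipliers (~10^3 for functional simulation, 10^5 to 10^6 for cycle-accurate). The function names and the scenario table are made up for illustration, and the factors are the rough guesses from this thread, not measurements.

```python
# Rough wall-clock estimate for simulating a kernel, given its runtime on real
# hardware and an assumed slowdown factor. The slowdown figures are the
# order-of-magnitude guesses quoted in this thread, not measurements.

def simulated_seconds(hardware_seconds: float, slowdown: float) -> float:
    """Return the estimated simulation wall-clock time in seconds."""
    return hardware_seconds * slowdown


def human(seconds: float) -> str:
    """Format a duration using the largest unit it fills at least once."""
    for unit, size in (("days", 86400), ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"


HARDWARE_SECONDS = 10.0  # assumed runtime on a real GPU (from the question)

# Hypothetical scenario table: label -> assumed slowdown multiplier.
SCENARIOS = {
    "functional (~1e3x)": 1e3,
    "cycle-accurate, low (~1e5x)": 1e5,
    "cycle-accurate, high (~1e6x)": 1e6,
}

if __name__ == "__main__":
    for name, slowdown in SCENARIOS.items():
        estimate = simulated_seconds(HARDWARE_SECONDS, slowdown)
        print(f"{name}: {human(estimate)}")
```

Under these assumptions, a 10-second kernel takes about 2.8 hours in a functional simulator and somewhere between roughly 11.6 and 115.7 days in a cycle-accurate one.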

Thanks for the information :) And the assignment I’m doing can afford a one-week simulation time if needed.

I’ll post back after giving both of these a try.