A while ago I posted about a CUDA/PTX emulator that my research group developed last semester.
We finally released the source code today under the BSD license. The emulator implements the PTX virtual machine and executes programs using a single CPU thread, one instruction at a time. We have verified that all of the examples from CUDA SDK 2.1 and 2.2 run on the emulator, except for the programs that use the driver-level API, which we do not support. Like Barra and GPGPU-sim, we provide a set of libraries that replace libcudart.so, so you should be able to link any CUDA program against the emulator and have it transparently replace the NVIDIA driver and runtime.
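To give a feel for what "a single CPU thread, one instruction at a time" can look like, here is a minimal sketch. It is not the emulator's actual code: the three-opcode instruction set, the `Instr` struct, and the `emulate` function are all made up for illustration, and the lockstep ordering (each instruction applied to every logical thread before moving on) is an assumption rather than a description of how the real interpreter schedules threads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy instruction set; real PTX is far richer.
enum class Op { LoadTid, AddImm, Store };

struct Instr {
    Op op;
    std::int32_t imm;  // immediate operand, only used by AddImm
};

// Runs `program` for `numThreads` logical threads on the calling CPU thread.
// Each instruction is applied to every logical thread before the next
// instruction starts (an assumed lockstep order for this sketch).
// Returns the emulated output buffer, one slot per logical thread.
std::vector<std::int32_t> emulate(const std::vector<Instr>& program,
                                  std::size_t numThreads) {
    assert(numThreads > 0);
    std::vector<std::int32_t> reg(numThreads, 0);  // one register per thread
    std::vector<std::int32_t> out(numThreads, 0);  // stand-in for global memory
    for (const Instr& ins : program) {
        for (std::size_t tid = 0; tid < numThreads; ++tid) {
            switch (ins.op) {
                case Op::LoadTid:
                    reg[tid] = static_cast<std::int32_t>(tid);
                    break;
                case Op::AddImm:
                    reg[tid] += ins.imm;
                    break;
                case Op::Store:
                    out[tid] = reg[tid];
                    break;
            }
        }
    }
    return out;
}
```

Calling `emulate` with the program LoadTid, AddImm 10, Store and four threads fills the output buffer with each thread's id plus 10.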
The emulator has hooks for trace generators that can examine the complete system state after each instruction is executed. We already have several trace generators in place that record all memory traffic and inter-thread communication through shared memory, and it should be fairly easy to add others.
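The hook idea can be sketched roughly as follows. The class and method names here (`TraceEvent`, `TraceGenerator`, `MemoryTraceGenerator`, `Emulator::retire`) are hypothetical and chosen only for this example; check the API documentation for the real interface. The sketch assumes the emulator calls every registered generator once per executed instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of one executed instruction.
struct TraceEvent {
    std::uint64_t pc;       // program counter of the instruction
    bool isMemoryAccess;    // true for loads and stores
    std::uint64_t address;  // meaningful only when isMemoryAccess is true
};

// Hypothetical hook interface: called after each instruction executes.
class TraceGenerator {
public:
    virtual ~TraceGenerator() = default;
    virtual void event(const TraceEvent& e) = 0;
};

// Example generator that records the address of every memory access.
class MemoryTraceGenerator : public TraceGenerator {
public:
    void event(const TraceEvent& e) override {
        if (e.isMemoryAccess) addresses.push_back(e.address);
    }
    std::vector<std::uint64_t> addresses;
};

// Stand-in for the emulator core: it only forwards events to the hooks.
class Emulator {
public:
    void addTraceGenerator(TraceGenerator* g) {
        assert(g != nullptr);
        generators_.push_back(g);  // not owned; caller keeps it alive
    }
    // Called once per executed instruction in this sketch.
    void retire(const TraceEvent& e) {
        for (TraceGenerator* g : generators_) g->event(e);
    }
private:
    std::vector<TraceGenerator*> generators_;
};
```

Adding a new kind of trace in this design means writing one more subclass of the hook interface and registering it, without touching the interpreter loop.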
As part of the code base, we are also releasing a set of program analysis tools for PTX that let you generate control flow graphs, dominator trees, and dataflow graphs, and convert PTX to pure SSA form.
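For readers who have not met dominators before, here is a small self-contained sketch of the textbook iterative algorithm on a control flow graph. It is not taken from our analysis tools: the `dominators` function and the predecessor-list representation are assumptions made for this example, and block 0 is assumed to be the entry block.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Block = std::size_t;

// preds[b] lists the predecessors of basic block b; block 0 is the entry.
// Returns, for every block, the set of blocks that dominate it, using the
// classic iterative dataflow formulation:
//   dom(entry) = {entry}
//   dom(b)     = {b} union (intersection of dom(p) over all predecessors p)
std::vector<std::set<Block>> dominators(
        const std::vector<std::vector<Block>>& preds) {
    const std::size_t n = preds.size();
    assert(n > 0);
    std::set<Block> all;
    for (Block b = 0; b < n; ++b) all.insert(b);

    // Start from "everything dominates everything" and shrink to a fixed point.
    std::vector<std::set<Block>> dom(n, all);
    dom[0] = {0};
    bool changed = true;
    while (changed) {
        changed = false;
        for (Block b = 1; b < n; ++b) {
            std::set<Block> next = all;
            for (Block p : preds[b]) {
                std::set<Block> kept;  // next intersected with dom[p]
                for (Block d : next) {
                    if (dom[p].count(d) != 0) kept.insert(d);
                }
                next = kept;
            }
            next.insert(b);  // every block dominates itself
            if (next != dom[b]) {
                dom[b] = next;
                changed = true;
            }
        }
    }
    return dom;
}
```

On a diamond-shaped graph (0 branches to 1 and 2, which both join at 3), block 3 is dominated only by the entry block and itself, since neither branch is on every path to the join.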
The entire project can be downloaded here: http://gpuocelot.googlecode.com/files/ocelot-0.4.50.tar.gz . API documentation can be found here: http://www.gdiamos.net/classes/translator/api/index.html . We have a mailing list here in case you would like to contribute ideas or hear about updates: http://groups.google.com/group/gpuocelot . Finally, we have put together a quick tutorial for running a CUDA program on the emulator: http://code.google.com/p/gpuocelot/wiki/Installation .
We plan to continue developing this project, with the goal of eventually having a complete compilation chain from CUDA to both x86 CPUs and NVIDIA GPUs, along with analysis tools supporting each path.
Hopefully people here find this useful.