I would like to run CUDA code in emulation mode from time to time, for debugging and other purposes, when CUDA hardware is not available. So far, emulation does not appear to take advantage of my multi-core CPU: I have a quad-core Xeon, and it is only 25% busy while my CUDA program runs, which suggests that just one of the four cores is being used. I also tried the nvcc --multicore option, but it does not seem to be implemented yet. (I am using CUDA version 2.2.)
Any thoughts on taking advantage of my multi-core CPU when emulating?