Our customer wants an idea of how much faster certain algorithms run on a GPU than on a CPU. The numbers don’t need to be exact, just a ballpark figure. I’ve got a CUDA version ready to roll, but to save time on implementing the CPU version I was hoping to force the CUDA code to execute on the CPU and take those numbers. Is this even possible? Any advice or tips on going about this?
You used to be able to run CUDA in emulator mode, but this was dropped once debugging code running on the GPU became possible.
If you get an older version of CUDA (from about two years ago), it would let you do what you want, provided you are not using newer CUDA features.
However, it’s not a fair benchmark, because the approaches that make parallel (GPU) code efficient differ from those that make sequential (CPU) code efficient.
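To illustrate why the two approaches differ, here is a minimal C++ sketch of the same sum written both ways. It is not taken from the CUDA code in question; the function names `sum_sequential` and `sum_tree` are made up for illustration, and the tree version only mimics on the CPU the pass structure a GPU reduction would typically use.

```cpp
#include <cstddef>
#include <vector>

// CPU-shaped: one running accumulator, a single pass over contiguous memory.
long long sum_sequential(const std::vector<int>& data) {
    long long total = 0;
    for (int v : data) total += v;
    return total;
}

// GPU-shaped: pairwise tree reduction. On a GPU each inner-loop iteration
// would be handled by its own thread; here the passes simply run one after
// another, which adds strided memory traffic a CPU does not need.
// Takes its argument by value because it overwrites the buffer as it goes.
long long sum_tree(std::vector<long long> work) {
    if (work.empty()) return 0;
    for (std::size_t stride = 1; stride < work.size(); stride *= 2) {
        for (std::size_t i = 0; i + stride < work.size(); i += 2 * stride) {
            work[i] += work[i + stride];
        }
    }
    return work[0];
}
```

Both functions return the same total, but running the tree version on a CPU says little about what a well-written CPU implementation could do, which is the problem with timing GPU-shaped code on the CPU.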
As kban suggested, this is not the right way to do it, as it will give you only very crude numbers.
Also, if your algorithms are standard ones, you could look through the literature to see whether the community has already published CUDA results for them.
Maybe you can use Ocelot. It can compile PTX (the intermediate code you get after compiling your CUDA program) to CPU (multi-core x86) using LLVM. I don’t know if the result is the most efficient you can get, but at least you get some numbers out of it to compare.
The emulator is NOT a good idea for benchmarking. The performance is terrible, and many people in the past have been misled by using it.
Ocelot is a much better option, although getting it running can be a little involved. I don’t recall whether the current release generates SSE instructions yet, but if it does, I think it would make a reasonably fair benchmark when there isn’t a proper CPU-optimized version of the algorithm to compare against.
Are you kidding? Using the CUDA emulator on a CPU is like using a hypothetical CUDA emulator from Intel: it would make for a “great” consulting service. Besides, algorithms must be designed from the ground up to be efficient on a GPU or on a CPU, and I assure you the most effective algorithms for each platform are vastly different. The only way to compare CPU vs GPU is to compare the performance of the final result. High-quality interactive volume rendering is a good example: there are efficient GPU implementations and efficient CPU implementations, so a comparison is possible.
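For comparing final results, a small timing helper along these lines may be enough for ballpark numbers. This is only a sketch: `best_time_ms` is a hypothetical name, and it assumes each implementation can be wrapped in a callable that does the complete job from input to output.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Runs the callable `repeats` times and returns the fastest wall-clock
// time in milliseconds. Taking the minimum reduces noise from other
// processes and from one-time warm-up costs.
template <typename F>
double best_time_ms(F&& run, int repeats = 5) {
    using clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r) {
        auto start = clock::now();
        run();
        auto stop = clock::now();
        std::chrono::duration<double, std::milli> elapsed = stop - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

For the GPU side, the callable should include the host-to-device and device-to-host copies and end with `cudaDeviceSynchronize()`, since kernel launches return before the work finishes. Otherwise the measured time covers only the launch, not the result.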