I’ve been thinking about GPU vs. CPU benchmarks. I’ve read a lot of papers where people benchmark CUDA software running on a GPU against the same algorithm running on a single CPU core…
I don’t think this is a fair comparison…
What do you think about this?
In your view, what is the right way to do a fair benchmark?
A GPU algorithm is obviously written to expose as much parallelism as possible; the same algorithm on a different architecture should be rewritten for that architecture…
You can’t have a clean comparison between GPU and CPU algorithms because for a start any GPU algorithm still requires a little help from the CPU as well. The appropriate thing to do - when you really want to (or must) compare to CPU implementations - is: take the best implementation of the fastest known CPU algorithm and compare yourself to that.
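To make the point about the CPU's share concrete, here is a minimal sketch of how the reported speedup changes once host-side work and transfers are counted. The function name `speedup`, its parameters, and all the timings are made up for illustration; they are not measurements from any real code.

```python
def speedup(cpu_best_s, gpu_kernel_s, transfer_s=0.0, host_side_s=0.0):
    """Speedup of the GPU version over the best known CPU implementation.

    cpu_best_s   -- time of the fastest CPU implementation (seconds)
    gpu_kernel_s -- time spent in GPU kernels (seconds)
    transfer_s   -- host<->device copy time (seconds)
    host_side_s  -- CPU work the GPU version still needs (seconds)
    """
    gpu_total_s = gpu_kernel_s + transfer_s + host_side_s
    return cpu_best_s / gpu_total_s


# Hypothetical timings, chosen only to show the effect.
kernel_only = speedup(cpu_best_s=2.0, gpu_kernel_s=0.125)
end_to_end = speedup(cpu_best_s=2.0, gpu_kernel_s=0.125,
                     transfer_s=0.25, host_side_s=0.125)

print(f"kernel only: {kernel_only:.1f}x")  # 16.0x
print(f"end to end:  {end_to_end:.1f}x")   # 4.0x
```

With these invented numbers the kernel-only figure is four times larger than the end-to-end one, which is why it matters to say which of the two a paper is reporting.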
But when I’m given a code (the only extant version of its algorithms) which uses BLAS to do a lot of 2x2 matrix multiplications, am I supposed to fix that little oversight prior to benchmarking?
Thank you guys, your ideas are really interesting!!
Jjp’s idea is right in my view, but sadly no good CPU implementation of anything similar to my GPU algorithm exists…
Sorry Eyal… what exactly is “wall time”? I like your idea a lot… :-)
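For reference, "wall time" (wall-clock time) usually means the real elapsed time from start to finish, as you would read it off a clock on the wall, as opposed to the CPU time the process actually consumed. Below is a small sketch of the difference using Python's standard `time` module. The helper names `measure` and `sleepy_sum` are made up for illustration, and the `sleep` call only stands in for time spent waiting on something else, such as a device or a transfer.

```python
import time


def measure(fn, *args):
    """Run fn(*args) once and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()  # real elapsed (wall-clock) time
    cpu_start = time.process_time()   # CPU time used by this process only
    result = fn(*args)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds


def sleepy_sum(n):
    """Hypothetical workload: a wait followed by a little computation."""
    time.sleep(0.2)  # waiting costs wall time but almost no CPU time
    return sum(range(n))


result, wall_s, cpu_s = measure(sleepy_sum, 100_000)
print(f"wall: {wall_s:.3f} s, cpu: {cpu_s:.3f} s")
```

The wall time includes the wait while the CPU time mostly does not, so for an end-to-end GPU vs. CPU comparison the wall time is usually the number that reflects what the user actually experiences.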