How to estimate compute time?

I’m writing an article on CUDA and am having trouble with my numbers. Why are they so bad? Here’s the math:

CPU = 3 GHz
8800 GTX = 575 MHz

The program is 30 ops. Running it 3 million times on my CPU should take 35.5 ms; instead it takes 702 ms.
The data transfer is 8 ops total (read and write). Running it 3 million times on my 8800 GTX should take 42 ms; instead it takes 38 ms.
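
For reference, estimates like these appear to come from a simple cycle-count model: total operations divided by clock rate, assuming one operation per cycle. Here is a minimal Python sketch of that arithmetic. The function name `estimate_ms` and the `ops_per_cycle` parameter are hypothetical, and the model deliberately ignores memory latency, bandwidth, and parallelism.

```python
def estimate_ms(ops_per_item, items, clock_hz, ops_per_cycle=1.0):
    """Naive cycle-count estimate in milliseconds.

    Assumes every op retires in 1/ops_per_cycle clock cycles,
    which real CPUs and GPUs rarely achieve.
    """
    total_ops = ops_per_item * items
    seconds = total_ops / (clock_hz * ops_per_cycle)
    return seconds * 1e3


ITEMS = 3_000_000

cpu_ms = estimate_ms(30, ITEMS, 3.0e9)   # 30 ops per item at 3 GHz
gpu_ms = estimate_ms(8, ITEMS, 575e6)    # 8 ops per item at 575 MHz

print(f"CPU estimate: {cpu_ms:.1f} ms")
print(f"GPU estimate: {gpu_ms:.1f} ms")
```

With the clocks quoted above, this model gives about 30 ms for the CPU and about 41.7 ms for the GPU. The GPU figure matches the 42 ms estimate; the 35.5 ms CPU figure would correspond to a clock nearer 2.5 GHz than 3 GHz.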

It’s one read or write per cycle on the GPU, right? So how am I going faster? And why are the CPU numbers so bad? Are there any articles or discussions about this stuff?

  • Valles

The 8800 GTX has a 1.35 GHz shader clock, not 575 MHz, just to add to the confusion. But here is a hint: memory bandwidth and latency.

Let me clarify: the GPU transfer refers to a transfer from the CPU to the GPU and back, which is much slower than all of the other memory transfers. Do you know of any other developers attempting this level of detail?
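
A host-to-device round trip like this is usually modelled by bytes moved and bus bandwidth rather than by clock cycles. Here is a rough Python sketch of that model. The function names are hypothetical, and the 4-byte element size, the one-element-each-way traffic pattern, and the 4 GB/s bus figure are all assumptions for illustration, not measurements from this thread.

```python
def transfer_ms(num_bytes, bandwidth_bytes_per_s, latency_s=0.0):
    """Bandwidth-bound transfer time in milliseconds: fixed latency plus bytes / bandwidth."""
    return (latency_s + num_bytes / bandwidth_bytes_per_s) * 1e3


def effective_bandwidth(num_bytes, measured_ms):
    """Bandwidth in bytes/s implied by a measured transfer time."""
    return num_bytes / (measured_ms / 1e3)


ITEMS = 3_000_000
BYTES_PER_ITEM = 4   # assumption: one 4-byte value per item
DIRECTIONS = 2       # assumption: copied to the GPU and back once

total_bytes = ITEMS * BYTES_PER_ITEM * DIRECTIONS

# Estimated time at an assumed 4 GB/s bus with no fixed latency.
print(f"estimated: {transfer_ms(total_bytes, 4e9):.1f} ms")

# Bandwidth implied by the 38 ms measurement under the same assumptions.
print(f"implied:   {effective_bandwidth(total_bytes, 38.0) / 1e9:.2f} GB/s")
```

Comparing the implied bandwidth with the bus's rated figure is one way to tell whether a transfer is limited by bandwidth or by per-call overhead.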

The memory bandwidth suggestion was in reference to your “poor” Core i7 results.

As for “level of detail”, I think your problem is that you have far too little, not that you are trying to be too ambitious, at least on the GPU side. If you want to see clear and logical analyses of the GPU architecture and performance deduced by micro-benchmarking, look at some of these.

Thank you, avidday! I thought you were a villain, but you turned out to be a real hero!

  • Valles

I upvoted you; you’re now a 4-star instead of a 3-star.