I’m working on motion estimation, which consists of very regular computation on blocks of data, like a typical image-processing algorithm.

The measured performance on a GeForce 8800GTX is around 40 GFLOPS, about 1/10 of peak. I checked the assembly code and found that a large share of the instructions compute memory indices, handle branches, or do other things I don’t fully understand. Only about 1/3 of the instructions do the floating-point computation I actually want.

So I’m not sure whether the index calculations and other integer operations can run in parallel with the floating-point calculations. Intel CPUs seem to have a superscalar architecture that executes integer and floating-point instructions in parallel. What is the case for the GPU? If it is not superscalar, is it correct to say that non-floating-point operations take up roughly 2/3 of the total time if they account for 2/3 of the total number of instructions?
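
To make the question concrete, here is the back-of-the-envelope model I have in mind, as a rough Python sketch. It assumes every instruction costs one issue slot and ignores memory latency entirely, so it says nothing about how the hardware really behaves. The function name `fp_utilization` and the `dual_issue` flag are made up for illustration.

```python
def fp_utilization(fp_instr, other_instr, dual_issue=False):
    """Fraction of issue slots that carry floating-point work.

    Assumptions (illustrative only, not a description of any real GPU):
    - every instruction occupies exactly one issue slot;
    - memory stalls and latency are ignored;
    - with dual_issue=True, one FP and one non-FP instruction can
      share a slot (an idealized superscalar case).
    """
    if fp_instr < 0 or other_instr < 0 or fp_instr + other_instr == 0:
        raise ValueError("instruction counts must be non-negative and not both zero")
    if dual_issue:
        # Ideal overlap: the longer of the two streams sets the time.
        slots = max(fp_instr, other_instr)
    else:
        # Scalar issue: every instruction takes its own slot.
        slots = fp_instr + other_instr
    return fp_instr / slots


# Instruction mix from my kernel: roughly 1 FP instruction per 2 others.
print(f"scalar issue: {fp_utilization(1, 2):.2f}")
print(f"ideal dual issue: {fp_utilization(1, 2, dual_issue=True):.2f}")
```

Under these assumptions the scalar case caps floating-point work at about 1/3 of the issue slots, and the idealized dual-issue case at 1/2. Neither figure is as low as the roughly 1/10 of peak that I measure, so the instruction mix alone presumably does not explain the whole gap.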

Thanks.