Speed of integer and FP operations on the ALU


I'm new to CUDA and want to benchmark the ALUs on a GPU.
Has anyone successfully benchmarked ADD and MUL?

How many times faster is a 32-bit integer MUL than a 32-bit float MUL on the GPU architecture? Is it the same for ADD?

Does 345 GFLOPS refer to the speed of 32-bit floating-point MUL operations?

For the Tesla C870 or GeForce 8800 GTX, the ALU specs are as follows:

575 MHz core clock (GeForce 8800 GTX).
128 scalar (not vector!) floating-point ALUs
(integer and floating-point formats, 32-bit FP precision to meet the IEEE 754 standard, clock-lossless MAD+MUL).
Doubled ALU clock rate (1.35 GHz for 8800GTX).

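345 GFLOPS corresponds to 345/2 billion MADs per second, because each MAD counts as two floating-point operations (128 ALUs × 1.35 GHz × 2 ≈ 345.6 GFLOPS). Benchmarking just the ALU is challenging. See the benchmark written by Simon Green for a starting point: http://forums.nvidia.com/index.php?showtop…ndpost&p=250179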
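
For comparing float and integer throughput directly, something along these lines might work as a first experiment. This is only a rough sketch and not Simon Green's benchmark: the kernel name `madChain`, the helper `timeKernelMs`, the launch sizes, and the constants are all made up for illustration. Each thread runs a long chain of multiply-adds and writes its result out so the compiler cannot remove the loop. Error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical microbenchmark kernel: every thread executes a dependent chain
// of multiply-adds on type T. a and b are runtime arguments, so the compiler
// cannot precompute the result.
template <typename T>
__global__ void madChain(T *out, T a, T b, int iters)
{
    T x = static_cast<T>(threadIdx.x);
#pragma unroll 16                 // reduce loop overhead relative to the math
    for (int i = 0; i < iters; ++i) {
        x = x * a + b;            // one multiply and one add per iteration
    }
    // Store the result so the loop is not eliminated as dead code.
    out[blockIdx.x * blockDim.x + threadIdx.x] = x;
}

// Times one launch with CUDA events and returns elapsed milliseconds.
template <typename T>
float timeKernelMs(int blocks, int threads, int iters, T a, T b)
{
    T *d_out = nullptr;
    cudaMalloc(&d_out, sizeof(T) * blocks * threads);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time startup costs are not measured.
    madChain<T><<<blocks, threads>>>(d_out, a, b, 16);
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    madChain<T><<<blocks, threads>>>(d_out, a, b, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return ms;
}

int main()
{
    // Assumed launch configuration; tune for the GPU under test.
    const int blocks = 256, threads = 256, iters = 1 << 16;

    // Float constants keep x bounded; unsigned int avoids signed-overflow UB.
    float msF = timeKernelMs<float>(blocks, threads, iters, 0.999f, 0.001f);
    float msI = timeKernelMs<unsigned int>(blocks, threads, iters, 3u, 1u);

    // Two operations (mul + add) per iteration per thread.
    double ops = 2.0 * blocks * threads * (double)iters;
    printf("float:        %.3f ms, %.2f Gops/s\n", msF, ops / (msF * 1e6));
    printf("unsigned int: %.3f ms, %.2f Gops/s\n", msI, ops / (msI * 1e6));
    return 0;
}
```

Build with `nvcc` and compare the two Gops/s lines. The numbers also include loop overhead and depend on how the compiler schedules the chain, so it is worth inspecting the generated PTX before trusting them. As far as I recall, the CUDA Programming Guide for this hardware generation lists 32-bit integer multiply as slower than float multiply and offers `__mul24` as the fast integer path, so that would be worth testing as a third case.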