I’m a newbie with CUDA, and I want to benchmark the ALUs on a GPU.
Has anyone successfully benchmarked ADD and MUL?
How many times faster is a 32-bit integer MUL than a 32-bit float MUL on the GPU architecture? Is it the same for ADD?
Does the 345 GFLOPS figure refer to the speed of MUL on 32-bit floating point?
For the Tesla C870 or GeForce 8800 GTX, the ALU specs are as follows:
575 MHz core clock (GeForce 8800 GTX).
128 scalar (not vector!) floating-point ALUs
(integer and floating-point formats, 32-bit FP precision to meet the IEEE 754 standard, clock-lossless MAD+MUL).
Doubled ALU clock rate (1.35 GHz for the 8800 GTX).
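
As a rough sanity check on the 345 GFLOPS figure, the numbers above can be multiplied out. This assumes each ALU issues one MAD per clock and that a MAD is counted as two floating-point operations (one multiply plus one add); that counting convention is an assumption, not something the spec list states.

```latex
\[
128 \ \text{ALUs} \times 1.35\ \text{GHz} \times 2\ \frac{\text{FLOP}}{\text{MAD}}
  = 345.6\ \text{GFLOPS}
\]
```

Under the same assumption, a MUL-only stream would peak at 128 × 1.35 = 172.8 GFLOPS, and counting the extra co-issued MUL (three operations per clock) would give 128 × 1.35 × 3 = 518.4 GFLOPS. So 345 GFLOPS looks like a peak MAD rate rather than a MUL-only rate.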