Hi all,
I am executing the dependent operation a += b*c to measure pipeline latency.
My GPU is a 9600 GT, and I launch only one thread.
With a, b and c all in registers, the latency is 24 cycles;
with a and c in registers and b in constant memory, the latency is 22 cycles;
with a and c in registers and b in shared memory, the latency is 26 cycles.
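For reference, here is a minimal sketch of this kind of single-thread measurement. It is an assumed setup, not necessarily the exact code behind the numbers above. N_ITERS, the kernel names and the per_op helper are hypothetical. Each kernel times a chain of N_ITERS dependent a += b*c operations with the device clock() counter, and the host divides the total by the chain length. A compute 1.x card like the 9600 GT needs an older CUDA toolkit. The compiler may hoist or fuse parts of the loop, so inspect the generated code (e.g. with cuobjdump -sass) to confirm that each iteration really is one dependent multiply-add.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical chain length; a long chain amortizes clock() overhead
// and the first (possibly cache-missing) read of b.
#define N_ITERS 256

__constant__ float b_const;  // b in constant memory (variant 2)

// Variant 1: a, b, c in registers (b and c arrive as kernel parameters).
__global__ void mad_reg(float *out, unsigned int *cycles, float b, float c)
{
    float a = out[0];                    // read a from memory so it is not constant-folded
    unsigned int start = clock();
    #pragma unroll
    for (int i = 0; i < N_ITERS; ++i)
        a += b * c;                      // each step depends on the previous a
    unsigned int stop = clock();
    out[0] = a;                          // store a so the chain is not eliminated
    cycles[0] = stop - start;            // unsigned subtraction tolerates counter wrap
}

// Variant 2: b read from constant memory.
__global__ void mad_const(float *out, unsigned int *cycles, float c)
{
    float a = out[0];
    unsigned int start = clock();
    #pragma unroll
    for (int i = 0; i < N_ITERS; ++i)
        a += b_const * c;
    unsigned int stop = clock();
    out[0] = a;
    cycles[0] = stop - start;
}

// Variant 3: b read from shared memory; volatile forces a read every iteration.
__global__ void mad_shared(float *out, unsigned int *cycles, float b, float c)
{
    __shared__ volatile float b_sh;
    b_sh = b;
    __syncthreads();
    float a = out[0];
    unsigned int start = clock();
    #pragma unroll
    for (int i = 0; i < N_ITERS; ++i)
        a += b_sh * c;
    unsigned int stop = clock();
    out[0] = a;
    cycles[0] = stop - start;
}

// Host-side helper: average cycles per operation over the timed chain.
static float per_op(unsigned int total_cycles)
{
    return (float)total_cycles / N_ITERS;
}

int main()
{
    float *d_out;
    unsigned int *d_cycles;
    cudaMalloc(&d_out, sizeof(float));
    cudaMalloc(&d_cycles, sizeof(unsigned int));

    float a0 = 0.0f, b = 1.0f, c = 0.5f;
    cudaMemcpyToSymbol(b_const, &b, sizeof(float));

    const char *names[3] = {"register", "constant", "shared"};
    for (int v = 0; v < 3; ++v) {
        cudaMemcpy(d_out, &a0, sizeof(float), cudaMemcpyHostToDevice);
        if (v == 0) mad_reg<<<1, 1>>>(d_out, d_cycles, b, c);    // one thread
        if (v == 1) mad_const<<<1, 1>>>(d_out, d_cycles, c);
        if (v == 2) mad_shared<<<1, 1>>>(d_out, d_cycles, b, c);
        assert(cudaGetLastError() == cudaSuccess);

        unsigned int cycles = 0;                                  // memcpy waits for the kernel
        cudaMemcpy(&cycles, d_cycles, sizeof(unsigned int), cudaMemcpyDeviceToHost);
        printf("%-8s: %.1f cycles per a += b*c\n", names[v], per_op(cycles));
    }

    cudaFree(d_out);
    cudaFree(d_cycles);
    return 0;
}
```

Dividing by a long chain keeps the fixed cost of the two clock() reads small relative to the per-operation figure.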
Is the difference caused by where operand b is fetched from (register, constant memory or shared memory)?
Can anyone give me a detailed explanation, or point me to other related measurement data?
Thanks in advance.