Pipeline latency

Hi all,

I am measuring pipeline latency by executing a chain of dependent a += b*c operations.
My GPU is a 9600GT, and I launch only one thread.
With a, b, and c all in registers, the latency is 24 cycles;
with a and c in registers and b in constant memory, the latency is 22 cycles;
with a and c in registers and b in shared memory, the latency is 26 cycles.
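
For readers who want to try something similar, a measurement of this kind could be sketched roughly as below. This is only an assumed reconstruction, not the exact kernel behind the numbers above: the names fma_latency, run, CHAIN, c_b and s_b are made up for illustration, and the input values are arbitrary. The compiler ultimately decides where each operand lives, so the disassembly (for example from cuobjdump) should be checked before trusting any result. The timed region also includes the small overhead of reading the clock.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define CHAIN 256  // length of the dependent chain (arbitrary, hypothetical choice)

__constant__ float c_b;  // copy of operand b intended to live in constant memory

// MODE 0: b from a register, MODE 1: b from constant memory, MODE 2: b from shared memory.
// Every a += b*c depends on the previous one, so with a single thread the elapsed
// cycles divided by CHAIN approximate the latency of one operation.
template <int MODE>
__global__ void fma_latency(float *result, unsigned int *cycles, const float *in)
{
    volatile __shared__ float s_b;  // volatile so the shared read is not hoisted out of the chain
    s_b = in[1];
    __syncthreads();

    float a = in[0];
    float b_reg = in[1];
    float c = in[2];

    unsigned int start = (unsigned int)clock();
    #pragma unroll
    for (int i = 0; i < CHAIN; ++i) {
        if (MODE == 0)      a += b_reg * c;
        else if (MODE == 1) a += c_b * c;
        else                a += s_b * c;
    }
    unsigned int stop = (unsigned int)clock();

    result[0] = a;             // keep the chain live so it is not optimized away
    cycles[0] = stop - start;  // unsigned subtraction tolerates counter wrap-around
}

template <int MODE>
void run(const char *label, float *d_res, unsigned int *d_cyc, const float *d_in)
{
    fma_latency<MODE><<<1, 1>>>(d_res, d_cyc, d_in);  // one block, one thread
    unsigned int cyc = 0;
    cudaMemcpy(&cyc, d_cyc, sizeof(cyc), cudaMemcpyDeviceToHost);
    printf("%s: %.1f cycles per dependent operation\n", label, (double)cyc / CHAIN);
}

int main()
{
    const float h_in[3] = {1.0f, 1.0001f, 0.5f};  // a, b, c (arbitrary values)
    float *d_in = 0, *d_res = 0;
    unsigned int *d_cyc = 0;

    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_res, sizeof(float));
    cudaMalloc(&d_cyc, sizeof(unsigned int));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(c_b, &h_in[1], sizeof(float));

    run<0>("b in register", d_res, d_cyc, d_in);
    run<1>("b in constant memory", d_res, d_cyc, d_in);
    run<2>("b in shared memory", d_res, d_cyc, d_in);

    cudaFree(d_in);
    cudaFree(d_res);
    cudaFree(d_cyc);
    return 0;
}
```

Dividing by a long chain keeps the clock-read overhead small relative to the total; error checking on the CUDA calls is left out to keep the sketch short.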

Is the difference in where the operands are read from responsible for these results?
Can anyone give me a detailed explanation, or share other related measurements?

Thanks in advance.


Similar measurements are described in Section 3.4 of this paper: http://mc.stanford.edu/cgi-bin/images/6/65…_Volkov_GPU.pdf. Maybe it could help you.

So using a value from the constant buffer is faster than using a value from a register?