On page 90 of the CUDA 4.0 programming guide it says “The number of clock cycles it takes for a warp to be ready to execute its next instruction is called the latency”, and on the following page it says “If some input operand resides in off-chip memory, the latency is much higher: 400 to 800 clock cycles.” Does "clock cycle" here refer to the core clock rather than the memory clock? How are these two terms related to each other? And why is the core clock lower than the memory clock? Thanks for your help!
It refers to the shader clock (2x the core clock).
The core clock (and the shader clock) refer to the computational speed of the device. They determine how many operations the processors perform per second.
The memory clock is the speed at which data is transferred from memory to the cores. If a processor does not have the data it needs in its local registers, it has to fetch it from memory, and this is where the latency usually occurs.
The core clock and the memory clock do not need to be the same for the device to work (I am not sure exactly how this is done in the GPU). Basically, the memory is a separate component from the core, and it runs at a different speed.
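To get a feel for what 400 to 800 cycles means in wall-clock time, here is a rough sketch in Python. The 700 MHz core clock is a made-up value for illustration only (it is not taken from the guide), and the function name `cycles_to_ns` is just something I picked. The only thing carried over from the answer is that the latency is counted in shader clock cycles and that the shader clock is 2x the core clock.

```python
def cycles_to_ns(cycles, clock_hz):
    """Convert a cycle count to nanoseconds for a clock running at clock_hz."""
    return cycles / clock_hz * 1e9

# Hypothetical clock, chosen only for illustration (not from the guide).
CORE_CLOCK_HZ = 700e6
# Shader clock = 2x core clock, as stated in the answer.
SHADER_CLOCK_HZ = 2 * CORE_CLOCK_HZ

# The 400 to 800 cycle range quoted from the programming guide.
for cycles in (400, 800):
    ns = cycles_to_ns(cycles, SHADER_CLOCK_HZ)
    print(f"{cycles} cycles at {SHADER_CLOCK_HZ / 1e6:.0f} MHz = {ns:.0f} ns")
```

With these assumed numbers the off-chip latency comes out at roughly 0.29 to 0.57 microseconds. Plug in the clocks of your own card to get a figure that applies to it.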