Local Memory - What is that? Memory Hierarchies

Yes, I know about the PTX ISA PDF… It is not meant as a hardware description. In one of the first pages it already talks about a ‘virtual machine’: it is meant as a generic description of current and future NVIDIA computing devices. Did you notice it contains things that aren’t actually implemented? One example is the .surface memory space, which AFAIK does not exist on G80.

None of the real hardware descriptions (like the one in the CUDA developer guide) mentions a local memory cache, so you cannot assume local memory is actually cached. Some experiments and timings have also shown that local memory is slow. Also, explicitly declaring things local was deprecated in 1.0. Try to steer clear of it as much as possible.
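For reference, here is a minimal sketch (my own, not from this thread) of how local memory typically shows up even without any explicit local declaration: a per-thread array indexed with a runtime value usually cannot be kept in registers, so the compiler tends to place it in local memory, and nvcc's --ptxas-options=-v output will report the lmem usage. The kernel name and sizes are made up for illustration.

// Minimal sketch: how per-thread local memory can arise implicitly.
// Compile with e.g.: nvcc --ptxas-options=-v local_demo.cu
// ptxas will typically report "lmem" usage for the spilled array.
__global__ void local_demo(const int *in, int *out, int idx)
{
    int scratch[32];                       // per-thread array

    for (int i = 0; i < 32; ++i)
        scratch[i] = in[threadIdx.x * 32 + i];

    // The runtime index usually prevents the compiler from keeping the
    // array in registers, so it is placed in (slow) local memory.
    out[threadIdx.x] = scratch[idx & 31];
}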

Yes, along the same lines: one cannot say that it is NOT cached, or that it resides in global memory.

Sure. It could be that, since this memory is per-thread in nature, accessing it forces the warp into lock-step execution. A memory access that usually completes or stalls in one clock cycle for the entire warp (depending on the kind of memory) would then complete or stall thread by thread, which can drastically slow down performance.
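One rough way to test that hypothesis (a sketch under my own assumptions, not a measurement reported in this thread) is to time accesses to a per-thread array against accesses to shared memory with clock() inside the kernel. The kernel and buffer names below are made up for illustration, and it assumes a launch with 256 threads per block.

// Rough micro-benchmark sketch: cycles spent on a per-thread (likely
// local memory) array versus a shared memory array. Hypothetical code,
// not a definitive measurement methodology.
__global__ void time_local_vs_shared(int *cycles_local, int *cycles_shared, int idx)
{
    __shared__ int smem[256];              // assumes blockDim.x == 256
    int lmem[64];                          // likely placed in local memory

    for (int i = 0; i < 64; ++i) lmem[i] = i;
    smem[threadIdx.x] = threadIdx.x;
    __syncthreads();

    int acc = 0;

    clock_t t0 = clock();
    for (int i = 0; i < 1000; ++i)
        acc += lmem[(i + idx) & 63];       // per-thread / local memory traffic
    clock_t t1 = clock();

    for (int i = 0; i < 1000; ++i)
        acc += smem[(i + idx) & 255];      // shared memory traffic
    clock_t t2 = clock();

    if (threadIdx.x == 0) {
        cycles_local[blockIdx.x]  = (int)(t1 - t0);
        cycles_shared[blockIdx.x] = (int)(t2 - t1);
    }

    // Keep the compiler from optimising the loops away.
    if (acc == -1) cycles_local[0] = acc;
}

Comparing the two cycle counts across a few block sizes would show whether per-thread accesses really behave much worse than shared memory accesses.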

If someone from NVIDIA could weigh in on this, it would be great!

It does reside in global memory, I’m sure of that much. Then again, constant data and shader code also reside in global memory, and that fact alone doesn’t tell you anything about the caching scheme, that’s true.

What do you base your claim on?

By dumping GPU memory. You can find the code and constants for all the kernels by reading the right global memory offsets from inside a kernel.
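For what it’s worth, a dump like that could be approximated with a kernel that copies an arbitrary device-address range into a buffer the host reads back; the kernel name below is made up and the base address is a placeholder, not the actual offsets mentioned above.

// Hypothetical sketch of a memory-dump kernel: copy `len` bytes starting
// at a raw device address `base` into `out`, which the host then reads
// back with cudaMemcpy and inspects. The base address is a placeholder.
__global__ void dump_range(const unsigned char *base, unsigned char *out, int len)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < len)
        out[i] = base[i];   // raw read from the chosen global memory offset
}

The interesting part is choosing base: the claim here is that kernel code and constants turn up at particular global memory offsets, which you would have to locate by scanning.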

Aah. That’s pretty interesting. So, I assume you did that and found it out! Hmm… that sounds cool!

Check the link in his signature! :yes: