PTX .local memory

Hi,

When I check the .ptx file, some arrays I declared in the .cu file, like "int A[3]", are marked as .local (aligned). Are these arrays stored in slow global memory or in fast registers?

Accessing these arrays seems faster than accessing global memory addresses. But when I copy their values to global memory, there seems to be no difference between a normal write and a coalesced write.

I checked the PTX ISA doc, but I am still not sure. The section only mentions that they are private to each thread and that their size is limited.

If an array is reported as .local, it is stored in off-chip device memory (the same physical memory as global memory), because the compiler could not keep it in registers. However, if your GPU has a cache (as Fermi does), local memory accesses may be served from the cache, and you can still get good performance.
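To see the difference yourself, here is a minimal sketch (not taken from your code; the kernel names `static_index` and `dynamic_index` are made up for illustration). It assumes that a common reason for an array ending up in .local is indexing with a value only known at run time, while constant indices usually let the compiler keep the array in registers. The actual placement depends on the compiler version, optimization level, and target architecture, so treat the comments as expectations rather than guarantees.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// All indices are compile-time constants once the loop is unrolled,
// so the compiler can usually keep A[] entirely in registers.
__global__ void static_index(int *out)
{
    int A[3];
    #pragma unroll
    for (int i = 0; i < 3; ++i)
        A[i] = threadIdx.x + i;
    out[threadIdx.x] = A[0] + A[1] + A[2];
}

// The index k is only known at run time. Registers are not addressable,
// so the compiler typically places A[] in .local memory here.
__global__ void dynamic_index(int *out, int k)
{
    int A[3];
    for (int i = 0; i < 3; ++i)
        A[i] = threadIdx.x + i;
    out[threadIdx.x] = A[k % 3];
}

int main()
{
    const int n = 32;
    int h_out[n];
    int *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));

    static_index<<<1, n>>>(d_out);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("static_index:  out[1] = %d\n", h_out[1]);  // 1 + 2 + 3 = 6

    dynamic_index<<<1, n>>>(d_out, 2);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("dynamic_index: out[1] = %d\n", h_out[1]);  // A[2] = 1 + 2 = 3

    cudaFree(d_out);
    return 0;
}
```

Compiling with `nvcc -ptx` lets you look for `.local` declarations in the generated PTX, and `nvcc -Xptxas -v` prints per-kernel resource usage, including any local memory or stack frame bytes. The second is the more reliable check, since PTX is an intermediate form and the final register allocation happens afterwards.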
