memory organization

I am wondering about the relationship between logical and physical memory organization on CUDA-enabled cards. I understand that registers and shared memory are on-chip areas, while local memory, global memory, constant memory and texture memory are on-device memory areas. I guess the latter all live in a single physical memory (what we call the card's 512 MB of memory?). So my questions are:

  • Are registers really registers in the digital-electronics sense, or are they a high-speed, probably multi-ported on-chip memory sharing the same resources as shared memory? If so, is there no access speed difference between them?

  • Are local memory, global memory, constant memory and texture memory all inside the memory chips of the device, so that they share the same resource? If so, why are constant and texture memory read-only? And are they partitioned dynamically (before or during runtime), or do they have fixed sizes, e.g. 1/2 is global memory, 1/2 is texture memory, etc.?

Mete

I found almost all the answers in the programming guide. I guess register access is faster than shared memory access. The only thing I don't know is the size of texture memory and local memory, and whether they are dynamic, etc.

Shared memory, when there are no bank conflicts, is as fast as a register. In the common case of accessing a shared memory array, shared memory can appear slower since it often takes a few cycles to compute the array index before the shared memory read is issued.
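To make the point about index computation concrete, here is a minimal sketch of a conflict-free shared memory access pattern. The kernel name `reverse_tile`, the helper `run_reverse_demo` and the block size of 256 are made up for illustration; they do not come from the programming guide. Each thread first computes its index (in registers) and only then issues the shared memory access, and consecutive threads touch consecutive words, so no two threads of a warp hit the same bank.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlockSize = 256;  // hypothetical block size chosen for this sketch

// Reverses each block-sized tile of `in` into `out`.
__global__ void reverse_tile(const float* in, float* out)
{
    __shared__ float tile[kBlockSize];  // on-chip, one copy per thread block

    int t = threadIdx.x;                 // index values are held in registers
    int base = blockIdx.x * blockDim.x;

    tile[t] = in[base + t];              // consecutive threads write consecutive words
    __syncthreads();                     // wait until the whole tile has been loaded

    // The index must be computed before the shared memory read is issued.
    // Reversed order still maps each thread of a warp to a distinct bank.
    out[base + t] = tile[blockDim.x - 1 - t];
}

// Host helper (hypothetical): runs the kernel on n floats and checks the result.
// n must be a positive multiple of kBlockSize.
bool run_reverse_demo(int n)
{
    if (n <= 0 || n % kBlockSize != 0) return false;

    std::vector<float> h_in(n), h_out(n, 0.0f);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    const size_t bytes = n * sizeof(float);
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    reverse_tile<<<n / kBlockSize, kBlockSize>>>(d_in, d_out);
    // This blocking copy waits for the kernel to finish before reading back.
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    // Element t of every tile should now hold element (kBlockSize - 1 - t).
    for (int i = 0; i < n; ++i) {
        int tile_start = (i / kBlockSize) * kBlockSize;
        int t = i % kBlockSize;
        if (h_out[i] != h_in[tile_start + (kBlockSize - 1 - t)]) return false;
    }
    return true;
}
```

Calling `run_reverse_demo(1024)` from `main` should return true on a working device. A strided index such as `tile[(t * 16) % kBlockSize]` would be one way to provoke bank conflicts for comparison.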

Local memory, global memory, and texture memory all share the full physical memory space. Constant memory is limited to 64 KB. The difference is only in the way they are accessed:

  • Local memory is global memory that has been automatically assigned by the compiler to individual threads. It is usually used as scratch space during a calculation when there are not enough (or it would be worse to use more) registers. It is not cached.

  • Constant memory is read through an 8 KB local cache on each multiprocessor. It is optimized for broadcast reads, where every thread accesses the same value at the same time.

  • Texture memory is global memory that is read through an 8 KB cache (separate from constant cache) that is optimized for spatially related reads that are not necessarily in linear order. The read is also passed through dedicated hardware to do common operations, like normalization and interpolation, on the fly.

  • Global memory is the standard memory on the card and is not cached.
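As a rough illustration of how two of these spaces look in source code, here is a minimal sketch that evaluates a small polynomial. The names `c_coeffs`, `eval_poly` and `run_poly_demo`, and the coefficient values, are assumptions made for this example. The coefficients sit in constant memory and are read in the broadcast pattern described above, while the input and output arrays are ordinary global memory. Local memory is not shown, since the compiler decides on its own when to use it.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical polynomial coefficients, placed in constant memory.
__constant__ float c_coeffs[4];

// x and y point to global memory; each thread handles one element.
__global__ void eval_poly(const float* x, float* y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float xi = x[i];   // global memory read, one element per thread
    float acc = 0.0f;  // scalars like this normally stay in registers

    // Horner's rule: in each step all threads read the same c_coeffs[k],
    // which is the broadcast case the constant cache is optimized for.
    for (int k = 3; k >= 0; --k)
        acc = acc * xi + c_coeffs[k];

    y[i] = acc;        // global memory write
}

// Host helper (hypothetical): fills the constant array, runs the kernel,
// and compares against the same polynomial evaluated on the CPU.
bool run_poly_demo()
{
    const int n = 100;
    const float h_coeffs[4] = {1.0f, 2.0f, 3.0f, 4.0f};  // 1 + 2x + 3x^2 + 4x^3

    std::vector<float> h_x(n), h_y(n, 0.0f);
    for (int i = 0; i < n; ++i) h_x[i] = static_cast<float>(i);

    float *d_x = nullptr, *d_y = nullptr;
    const size_t bytes = n * sizeof(float);
    if (cudaMalloc(&d_x, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_y, bytes) != cudaSuccess) { cudaFree(d_x); return false; }

    // Constant memory is filled from the host, not from device code.
    cudaMemcpyToSymbol(c_coeffs, h_coeffs, sizeof(h_coeffs));
    cudaMemcpy(d_x, h_x.data(), bytes, cudaMemcpyHostToDevice);

    eval_poly<<<(n + 63) / 64, 64>>>(d_x, d_y, n);

    // This blocking copy waits for the kernel to finish before reading back.
    cudaError_t err = cudaMemcpy(h_y.data(), d_y, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    if (err != cudaSuccess) return false;

    // Inputs are small integers, so every intermediate value is exact in float.
    for (int i = 0; i < n; ++i) {
        float x = h_x[i];
        float expected = ((4.0f * x + 3.0f) * x + 2.0f) * x + 1.0f;
        if (h_y[i] != expected) return false;
    }
    return true;
}
```

If each thread instead indexed `c_coeffs` with a different `k` at the same time, the reads would no longer be a broadcast, which is the case the description above suggests avoiding.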

The two cached memories, constant and texture, are read-only in order to ensure cache coherency is maintained without requiring synchronization between multiprocessors, which each have their own caches.
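A small sketch of what read-only means in practice, using made-up names (`c_scale`, `scale_kernel`, `run_scale_demo`): device code only reads the constant, and the host changes it between kernel launches with `cudaMemcpyToSymbol`. Since no kernel writes to it while running, the per-multiprocessor caches never have to be reconciled mid-kernel.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical parameter in constant memory; kernels can read it but not write it.
__constant__ float c_scale;

__global__ void scale_kernel(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= c_scale;  // every thread reads the same cached value
}

// Host helper (hypothetical): scales an array of ones by 2, then by 3,
// updating the constant from the host between the two launches.
bool run_scale_demo()
{
    const int n = 256;
    std::vector<float> h_data(n, 1.0f);

    float* d_data = nullptr;
    const size_t bytes = n * sizeof(float);
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) return false;
    cudaMemcpy(d_data, h_data.data(), bytes, cudaMemcpyHostToDevice);

    float scale = 2.0f;
    cudaMemcpyToSymbol(c_scale, &scale, sizeof(scale));
    scale_kernel<<<n / 64, 64>>>(d_data, n);

    // Issued on the default stream, so this update is ordered after the
    // first launch and before the second one.
    scale = 3.0f;
    cudaMemcpyToSymbol(c_scale, &scale, sizeof(scale));
    scale_kernel<<<n / 64, 64>>>(d_data, n);

    // This blocking copy waits for both kernels to finish.
    cudaError_t err = cudaMemcpy(h_data.data(), d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (h_data[i] != 6.0f) return false;  // 1 * 2 * 3
    return true;
}
```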

Thanks, seibert, for the clear answer.