memory organization

metebalci · March 10, 2008, 9:44am

I am wondering about the relationship between the logical and physical memory organization in CUDA-enabled cards. I understand that registers and shared memory are on-chip areas, while local, global, constant, and texture memory are on-device areas. I guess the latter all live in a single memory structure (what we call the 512 MB of memory?). So my questions are as follows:

Are registers really registers in the digital-electronics sense, or are they a high-speed, probably multi-port on-chip memory sharing the same resources as shared memory? If the latter, is there no difference in access speed between them?
Are local, global, constant, and texture memory all inside the device's memory chips, so that they share the same resource? If so, why are constant and texture memory read-only? And are they partitioned dynamically (before or during runtime), or do they have fixed sizes, e.g. half global memory and half texture memory?

Mete

metebalci · March 10, 2008, 11:06am

I found almost all the answers in the programming guide. I guess register access is faster than shared memory access. The only thing I still don't know is the size of texture memory and local memory, and whether they are allocated dynamically.

seibert · March 10, 2008, 1:52pm

Shared memory, when there are no bank conflicts, is as fast as a register. In the common case of accessing a shared memory array, shared memory can appear slower since it often takes a few cycles to compute the array index before the shared memory read is issued.
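
To make the point about index computation concrete, here is a minimal sketch of a kernel that stages data in shared memory. The kernel name reverse_block, the helper reverse_on_device, and the block size N are made up for illustration. Consecutive threads touch consecutive shared memory words, so there should be no bank conflicts, and the only extra work compared to a register access is computing the array index.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 64  // hypothetical block size chosen for this sketch

// Reverses an N-element array inside a single block using shared memory.
__global__ void reverse_block(const int *in, int *out)
{
    __shared__ int tile[N];   // on-chip, visible to every thread in the block
    int t = threadIdx.x;      // a per-thread value, normally kept in a register

    tile[t] = in[t];          // consecutive threads write consecutive words
    __syncthreads();          // wait until the whole tile has been written

    // The index N - 1 - t is computed first, then the shared memory read is issued.
    out[t] = tile[N - 1 - t];
}

// Hypothetical host helper: copies N ints to the device, runs the kernel,
// and copies the result back. Error handling is kept minimal on purpose.
cudaError_t reverse_on_device(const int *h_in, int *h_out)
{
    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, N * sizeof(int));
    cudaMalloc(&d_out, N * sizeof(int));
    cudaMemcpy(d_in, h_in, N * sizeof(int), cudaMemcpyHostToDevice);

    reverse_block<<<1, N>>>(d_in, d_out);

    // This blocking copy can also report an error from the kernel launch above.
    cudaError_t err = cudaMemcpy(h_out, d_out, N * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err;
}

int main()
{
    int h_in[N], h_out[N];
    for (int i = 0; i < N; ++i) h_in[i] = i;

    if (reverse_on_device(h_in, h_out) != cudaSuccess) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("out[0] = %d, out[%d] = %d\n", h_out[0], N - 1, h_out[N - 1]);
    return 0;
}
```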

Local memory, global memory, and texture memory all share the full, physical memory space. Constant memory is limited to 64 KB. The difference is only in the way they are accessed:

Local memory is global memory that has been automatically assigned by the compiler to individual threads. It is usually used as scratch space during a calculation when there are not enough registers (or when using more registers would hurt performance). It is not cached.
Constant memory is read through an 8 KB local cache on each multiprocessor. It is optimized for broadcast reads, where every thread accesses the same value at the same time.
Texture memory is global memory that is read through an 8 KB cache (separate from constant cache) that is optimized for spatially related reads that are not necessarily in linear order. The read is also passed through dedicated hardware to do common operations, like normalization and interpolation, on the fly.
Global memory is the standard memory on the card and is not cached.
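
As a rough illustration of the access patterns listed above, the sketch below evaluates a small polynomial. The names coeffs, eval_poly, eval_on_device, and the sizes DEGREE and N are all assumptions made for this example. The coefficients sit in constant memory, and every thread reads the same coefficient in the same loop iteration (the broadcast case), while the inputs and outputs go through plain global memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define DEGREE 3   // hypothetical polynomial degree for this sketch
#define N 256      // hypothetical number of input points

// Read-only from kernels; filled from the host with cudaMemcpyToSymbol.
__constant__ float coeffs[DEGREE + 1];

__global__ void eval_poly(const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float xi = x[i];      // global memory read, one element per thread
    float acc = 0.0f;     // accumulator, expected to stay in a register
    for (int k = DEGREE; k >= 0; --k)
        acc = acc * xi + coeffs[k];  // all threads read the same coeffs[k]
    y[i] = acc;           // global memory write
}

// Hypothetical host helper; error handling is kept minimal on purpose.
cudaError_t eval_on_device(const float *h_coeffs, const float *h_x,
                           float *h_y, int n)
{
    float *d_x = nullptr, *d_y = nullptr;
    cudaMemcpyToSymbol(coeffs, h_coeffs, (DEGREE + 1) * sizeof(float));
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    eval_poly<<<blocks, threads>>>(d_x, d_y, n);

    // This blocking copy can also report an error from the kernel launch above.
    cudaError_t err = cudaMemcpy(h_y, d_y, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    return err;
}

int main()
{
    // p(x) = 1 + 2x + 3x^2 + 4x^3, evaluated at x = 2 for every point
    float h_coeffs[DEGREE + 1] = {1.0f, 2.0f, 3.0f, 4.0f};
    float h_x[N], h_y[N];
    for (int i = 0; i < N; ++i) h_x[i] = 2.0f;

    if (eval_on_device(h_coeffs, h_x, h_y, N) != cudaSuccess) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("p(2) = %f\n", h_y[0]);
    return 0;
}
```

If each thread instead indexed coeffs with a different, thread-dependent value, the reads would no longer be broadcasts, which is the case the constant cache is not optimized for.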

The two cached memories, constant and texture, are read-only in order to ensure cache coherency is maintained without requiring synchronization between multiprocessors, which each have their own caches.
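
For the question about sizes, the numbers for a particular card can be read at runtime rather than taken from this post. The following is a small sketch using cudaGetDeviceProperties. The helper name print_memory_sizes and the choice of device 0 are assumptions for the example. Note that it reports only the totals: local, global, and texture memory all come out of the same total global memory figure.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints the memory-related limits of one device.
cudaError_t print_memory_sizes(int device)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) return err;

    printf("device:                  %s\n", prop.name);
    printf("multiprocessors:         %d\n", prop.multiProcessorCount);
    printf("total global memory:     %zu bytes\n", prop.totalGlobalMem);
    printf("total constant memory:   %zu bytes\n", prop.totalConstMem);
    printf("shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
    printf("registers per block:     %d\n", prop.regsPerBlock);
    return cudaSuccess;
}

int main()
{
    // Device 0 is assumed to exist; a real program would call
    // cudaGetDeviceCount first.
    cudaError_t err = print_memory_sizes(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```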

metebalci · March 10, 2008, 4:32pm

Thanks, seibert, for the clear answer.
