shared memory, texture memory, arrays, etc. clarification?

hi all,

i’ve been reading the docs and can’t seem to get my head around some of the memory stuff.

first off, my problem: i need to transfer a fairly large amount of data (specifically arrays of structs) from the host to the device and all threads need read-only access to it. so, any simple solutions to do this?

ok, so my confusion…i understand global memory is limited to 16kb, meaning the sum of global variables can’t exceed 16kb. however, i’ve been reading about cuda’s texture memory and 1- and 2d array memory. is that considered global (thus limited to 16kb), or is this different?

these cards have 320, 512, and 768mb ram, so i’m guessing (hoping) there’s some way to get >16kb memory, especially if it’s read-only.

any answers y’all might have would be fantastic…


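Global memory is not limited to 16 KB. It is read-write, and its size is limited only by the amount of free DRAM on the card, so your arrays of structs can go there: allocate with cudaMalloc and copy from the host with cudaMemcpy. What is limited to 16 KB is the per-block shared memory.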
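
For the original question, here is a minimal sketch of getting an array of structs into global memory so every thread can read it. The `Particle` struct, the `doubleMassKernel` kernel, and the `uploadAndDouble` helper are made up for illustration (the post does not say what the structs contain); the `cudaMalloc`, `cudaMemcpy`, and `cudaFree` calls are the standard runtime API.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical struct standing in for the poster's data.
struct Particle {
    float x, y, z;
    float mass;
};

// The const pointer marks the array as read-only inside the kernel.
// Each thread reads one element here, but any thread may read any index.
__global__ void doubleMassKernel(const Particle* particles, int n, float* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = particles[i].mass * 2.0f;
    }
}

// Copies the host array into global memory, runs the kernel, and copies
// the results back. Returns false if any CUDA call fails.
bool uploadAndDouble(const std::vector<Particle>& host, std::vector<float>& result)
{
    const int n = static_cast<int>(host.size());
    result.assign(n, 0.0f);
    if (n == 0) return true;

    Particle* dParticles = nullptr;
    float* dOut = nullptr;

    bool ok = cudaMalloc(&dParticles, n * sizeof(Particle)) == cudaSuccess
           && cudaMalloc(&dOut, n * sizeof(float)) == cudaSuccess
           && cudaMemcpy(dParticles, host.data(), n * sizeof(Particle),
                         cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        doubleMassKernel<<<blocks, threads>>>(dParticles, n, dOut);
        // This copy waits for the kernel on the default stream to finish.
        ok = cudaMemcpy(result.data(), dOut, n * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dParticles);  // cudaFree accepts a null pointer
    cudaFree(dOut);
    return ok;
}
```

Calling `uploadAndDouble(hostParticles, result)` from host code should fill `result` with the doubled masses. The array size here is bounded by free device memory, not by the 16 KB shared memory limit.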

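And the single most important thing you can learn about memory in CUDA is coalescing, i.e. having consecutive threads read consecutive addresses so the hardware can combine their reads into a few wide memory transactions. Coalesced reads of 32-bit or 64-bit types are needed to attain any reasonable amount of memory throughput on the device.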
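
To make the coalescing point concrete for arrays of structs, here is a rough sketch comparing two layouts. The `Particle` struct, both kernels, and the `layoutsAgree` helper are hypothetical names chosen for illustration. With an array of structs, consecutive threads reading one field touch addresses a whole struct apart; with a separate array per field, thread i reads the i-th consecutive 32-bit word. How much this matters depends on the hardware, so treat it as a starting point for your own timing rather than a guaranteed result.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical array-of-structs layout: 16 bytes per element.
struct Particle { float x, y, z, mass; };

// Consecutive threads read .mass at addresses 16 bytes apart (strided).
__global__ void readAoS(const Particle* p, int n, float* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = p[i].mass;
}

// Struct-of-arrays layout: masses sit in their own float array, so
// consecutive threads read consecutive 32-bit words (contiguous).
__global__ void readSoA(const float* mass, int n, float* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = mass[i];
}

// Runs both kernels on equivalent data and returns true if the outputs
// match. It only checks that the layouts are interchangeable; it does
// not measure throughput.
bool layoutsAgree(int n)
{
    if (n <= 0) return false;

    std::vector<Particle> aos(n);
    std::vector<float> soa(n), outA(n), outS(n);
    for (int i = 0; i < n; ++i) {
        aos[i] = Particle{0.0f, 0.0f, 0.0f, static_cast<float>(i)};
        soa[i] = static_cast<float>(i);
    }

    Particle* dAos = nullptr;
    float* dSoa = nullptr;
    float* dOutA = nullptr;
    float* dOutS = nullptr;
    const size_t fBytes = n * sizeof(float);

    bool ok = cudaMalloc(&dAos, n * sizeof(Particle)) == cudaSuccess
           && cudaMalloc(&dSoa, fBytes) == cudaSuccess
           && cudaMalloc(&dOutA, fBytes) == cudaSuccess
           && cudaMalloc(&dOutS, fBytes) == cudaSuccess
           && cudaMemcpy(dAos, aos.data(), n * sizeof(Particle),
                         cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dSoa, soa.data(), fBytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        readAoS<<<blocks, threads>>>(dAos, n, dOutA);
        readSoA<<<blocks, threads>>>(dSoa, n, dOutS);
        ok = cudaMemcpy(outA.data(), dOutA, fBytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(outS.data(), dOutS, fBytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dAos);
    cudaFree(dSoa);
    cudaFree(dOutA);
    cudaFree(dOutS);
    return ok && outA == outS;
}
```

If your kernels only touch one or two fields of each struct, splitting those fields into their own arrays on the host before the copy is usually the simplest way to get reads shaped like `readSoA`.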