I want to perform read and write operations on an array with width = 256 and height = 128, or even bigger. I know that when the input data is too big, the best choice of memory is texture, or sometimes constant, because shared memory is only 16 KB. But when I need both read/write access and high performance, can I use shared memory?
For example, if I define sharedata[256]
and divide the work as Grid(256,1,1), Block(1,128,1)?
Shared and constant memory are among the fastest memory types on the GPU, I think (I don't know about textures, I've never used them :P). Where you put your data depends on how big the array is. As you mentioned yourself, shared memory is 16 KB, whereas constant memory is 64 KB. Also, constant memory cannot be written to from the device, so if you want to write you need to use local, shared, or global memory (texture fetches are read-only inside a kernel as well). The size of local memory depends on how much video memory you still have.
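To make the read-only versus read/write distinction concrete, here is a minimal sketch. The names `coeff`, `scale`, and `run_scale` are made up for illustration, error checking is omitted, and it assumes a single block of 256 threads. The constant array is filled by the host and only read in the kernel, while the shared array is written and read by the threads of the block.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 256

// Constant memory: written by the host, read-only inside kernels.
__constant__ float coeff[N];

__global__ void scale(const float *in, float *out)
{
    // Shared memory: per-block scratch space that threads can read and write.
    __shared__ float tile[N];

    int i = threadIdx.x;
    tile[i] = in[i];        // global -> shared
    tile[i] *= coeff[i];    // write to shared, read from constant
    out[i] = tile[i];       // shared -> global
    // No __syncthreads() is needed here because each thread only touches
    // its own element of tile.
}

// Hypothetical host helper: returns the number of wrong results.
int run_scale()
{
    float h_in[N], h_coeff[N], h_out[N];
    for (int i = 0; i < N; ++i) { h_in[i] = (float)i; h_coeff[i] = 2.0f; }

    float *d_in, *d_out;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    // The host is the only side that can fill constant memory.
    cudaMemcpyToSymbol(coeff, h_coeff, sizeof(h_coeff));

    scale<<<1, N>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    int mismatches = 0;
    for (int i = 0; i < N; ++i)
        if (h_out[i] != 2.0f * (float)i) ++mismatches;

    cudaFree(d_in);
    cudaFree(d_out);
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", run_scale());
    return 0;
}
```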
The thing is, in the CUDA memory presentations from the ECE 498 course (UIUC), one of the slides mentions that constant memory performance depends heavily on cache locality, so it can vary a lot (from 1 to 100 cycles), whereas with shared memory, as you know, an access only takes a single cycle.
So in effect, constant memory depends a lot on the GPU’s coalescing and caching mechanism?
Constant memory always performs best when all threads in a warp access the same element of constant memory. If threads in a warp access different values in constant memory, then shared memory may be a better option.
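A rough sketch of the two access patterns, with hypothetical names (`table`, `uniform_read`, `scattered_read`, `run_patterns`) and no error checking. In the first kernel every thread of a warp reads the same constant element at each step. In the second, each thread needs a different element, so the table is first staged into shared memory and indexed there. No timing is measured here; it only shows how the code is shaped.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TABLE_SIZE 64
#define THREADS 128

__constant__ float table[TABLE_SIZE];

// Every thread in a warp reads table[k] for the same k at the same time.
__global__ void uniform_read(float *out)
{
    float sum = 0.0f;
    for (int k = 0; k < TABLE_SIZE; ++k)
        sum += table[k];
    out[threadIdx.x] = sum;
}

// Each thread wants a different element, so stage the table in shared
// memory once and do the per-thread indexing there.
__global__ void scattered_read(float *out)
{
    __shared__ float s_table[TABLE_SIZE];
    if (threadIdx.x < TABLE_SIZE)
        s_table[threadIdx.x] = table[threadIdx.x];  // one-time copy
    __syncthreads();  // the whole table must be loaded before anyone reads it

    out[threadIdx.x] = s_table[(threadIdx.x * 7) % TABLE_SIZE];
}

// Hypothetical host helper: returns the number of wrong results.
int run_patterns()
{
    float h_table[TABLE_SIZE], h_out[THREADS];
    float expected_sum = 0.0f;
    for (int i = 0; i < TABLE_SIZE; ++i) {
        h_table[i] = (float)i;
        expected_sum += h_table[i];
    }
    cudaMemcpyToSymbol(table, h_table, sizeof(h_table));

    float *d_out;
    cudaMalloc(&d_out, sizeof(h_out));
    int mismatches = 0;

    uniform_read<<<1, THREADS>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    for (int t = 0; t < THREADS; ++t)
        if (h_out[t] != expected_sum) ++mismatches;

    scattered_read<<<1, THREADS>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    for (int t = 0; t < THREADS; ++t)
        if (h_out[t] != (float)((t * 7) % TABLE_SIZE)) ++mismatches;

    cudaFree(d_out);
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", run_patterns());
    return 0;
}
```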
I understand that constant memory is quick, but it can't be written to.
Maybe shared memory is a better choice, but it's only 16 KB.
The question is: can I still use it (for high performance) when the input data is 128 KB or even bigger?
Shared memory is 16 KB, but that is per block. If you need more read/write memory than that, then your only option is to use device memory, either as global or local. Coalescing will be the most important aspect for performance here.
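One way to combine the two, sketched under some assumptions: the 256 x 128 array is taken to hold floats (128 KB in total), it lives in global memory, and each block stages one 256-element row (1 KB) in shared memory. The row reversal is only a placeholder operation, and `reverse_rows` and `run_reverse_rows` are hypothetical names. Note that this uses one block per row with 256 threads, which is a different launch layout from the Grid(256,1,1), Block(1,128,1) in the question. Consecutive threads touch consecutive global addresses on both the load and the store, so the accesses can coalesce.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define WIDTH  256
#define HEIGHT 128

// One block per row: load the row into shared memory, then write it
// back reversed. The whole array stays in global memory.
__global__ void reverse_rows(float *data)
{
    __shared__ float tile[WIDTH];        // 1 KB per block

    int x = threadIdx.x;
    int row = blockIdx.x;

    tile[x] = data[row * WIDTH + x];     // coalesced load
    __syncthreads();                     // row fully loaded before reuse

    data[row * WIDTH + x] = tile[WIDTH - 1 - x];  // coalesced store
}

// Hypothetical host helper: returns the number of wrong results.
int run_reverse_rows()
{
    const int count = WIDTH * HEIGHT;
    const size_t bytes = count * sizeof(float);   // 128 KB
    float *h_data = (float *)malloc(bytes);
    for (int i = 0; i < count; ++i) h_data[i] = (float)i;

    float *d_data;
    cudaMalloc(&d_data, bytes);
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    reverse_rows<<<HEIGHT, WIDTH>>>(d_data);
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    int mismatches = 0;
    for (int row = 0; row < HEIGHT; ++row)
        for (int x = 0; x < WIDTH; ++x)
            if (h_data[row * WIDTH + x] != (float)(row * WIDTH + (WIDTH - 1 - x)))
                ++mismatches;

    cudaFree(d_data);
    free(h_data);
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", run_reverse_rows());
    return 0;
}
```

Because each block only needs its own tile, the 16 KB limit applies to the tile size, not to the size of the whole array.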