Shared memory variables across multiple kernel invocations?

Hi,

I have to call a kernel several times in my program.

Suppose I have a shared memory array of size 1000x1000. Can I fill in just a few elements in each invocation?

Will the data written in one invocation still be available in the next one? I know this is true for a global memory array.

Can anyone tell me how to do this, because my performance with global memory is not too good…

Thanks

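You can't have a shared memory array of size 1000x1000.
Shared memory is limited to 16KB per block on current hardware (the exact limit for a device is reported in cudaDeviceProp::sharedMemPerBlock), while a 1000x1000 array of floats needs about 4MB. However, you can operate on a 1000x1000 global memory array by staging small tiles of it in shared memory :)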
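As a rough sketch of what "operating on a global array using shared memory" can look like, the kernel below transposes an n x n matrix one 16x16 tile at a time. The transpose itself is only my choice of example, and the names transposeTiled, transposeOnGpu and TILE are made up for illustration. Error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define TILE 16  // a 16x17 float tile is about 1 KB, far below the per-block limit

// Transpose an n x n row-major matrix held in global memory, one tile per block.
__global__ void transposeTiled(const float* in, float* out, int n)
{
    // The extra column of padding avoids bank conflicts on the column-wise read.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n)
        tile[threadIdx.y][threadIdx.x] = in[y * n + x];

    __syncthreads();  // reached by every thread; the tile is complete after this

    // Swap the block coordinates so the write to global memory is row-contiguous.
    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < n && y < n)
        out[y * n + x] = tile[threadIdx.x][threadIdx.y];
}

// Hypothetical host helper: copies the matrix to the GPU, transposes it, copies it back.
std::vector<float> transposeOnGpu(const std::vector<float>& h_in, int n)
{
    size_t bytes = size_t(n) * n * sizeof(float);
    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    transposeTiled<<<grid, block>>>(d_in, d_out, n);

    std::vector<float> h_out(size_t(n) * n);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

With n = 1000 the grid is 63x63 blocks, and the edge blocks are only partly filled, which is why the bounds checks are there.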

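Shared memory has block scope only, and its lifetime ends when the block finishes. This means data written to shared memory by one block is not available to any other block, and nothing in it survives to the next kernel invocation. Anything that must persist between invocations has to be kept in global memory.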
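A small sketch of the usual pattern for repeated launches, assuming the persistent state is a plain float array: the array lives in global memory, and each launch reloads its part into shared memory, works on it there, and writes it back. The names accumulateStep, runSteps and BLOCK are hypothetical, and the "work" is just adding a constant so that the result is easy to check.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK 256

// One step: every element of the persistent state is increased by `increment`.
__global__ void accumulateStep(float* state, int n, float increment)
{
    // The contents of this array are undefined at the start of every launch.
    __shared__ float scratch[BLOCK];

    int i = blockIdx.x * BLOCK + threadIdx.x;
    if (i < n) {
        scratch[threadIdx.x] = state[i];    // reload what earlier launches left behind
        scratch[threadIdx.x] += increment;  // work on the on-chip copy
        state[i] = scratch[threadIdx.x];    // write back so the next launch sees it
    }
}

// Hypothetical host helper: launches the kernel `steps` times on the same global buffer.
std::vector<float> runSteps(int n, int steps, float increment)
{
    size_t bytes = size_t(n) * sizeof(float);
    float* d_state;
    cudaMalloc(&d_state, bytes);
    cudaMemset(d_state, 0, bytes);  // all-zero bytes represent 0.0f

    int grid = (n + BLOCK - 1) / BLOCK;
    for (int s = 0; s < steps; ++s)
        accumulateStep<<<grid, BLOCK>>>(d_state, n, increment);

    std::vector<float> h_state(n);
    cudaMemcpy(h_state.data(), d_state, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_state);
    return h_state;
}
```

Launches issued to the same stream run one after another, so each step sees the values the previous step wrote to d_state. Only the global buffer carries anything from one launch to the next.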

If you are using or learning shared memory for the first time, I recommend reading the following presentation by Mark Harris.

http://www.gpgpu.org/sc2007/SC07_CUDA_5_Optimization_Harris.pdf

Shared memory can be used for the following purposes:

  1. To achieve coalesced global memory access.
  2. To reduce repeated reads of the same data from global memory.
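To make the second point concrete, here is a minimal sketch of a 1D stencil, which is my own choice of example. Each output is the sum of 2*RADIUS+1 neighbouring inputs, so without shared memory every input element would be fetched from global memory up to 7 times. Here each block loads its elements once (plus a small halo), with consecutive threads reading consecutive addresses. The names stencil1d, stencilOnGpu, BLOCK and RADIUS are hypothetical, and elements outside the array are assumed to be zero.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK 256
#define RADIUS 3

// out[i] = in[i-RADIUS] + ... + in[i+RADIUS], with out-of-range inputs treated as 0.
__global__ void stencil1d(const float* in, float* out, int n)
{
    __shared__ float tile[BLOCK + 2 * RADIUS];

    int g = blockIdx.x * BLOCK + threadIdx.x;  // index into global memory
    int l = threadIdx.x + RADIUS;              // index into the shared tile

    // Every thread loads its own element.
    tile[l] = (g < n) ? in[g] : 0.0f;

    // The first RADIUS threads also load the left and right halo.
    if (threadIdx.x < RADIUS) {
        int left = g - RADIUS;
        int right = g + BLOCK;
        tile[l - RADIUS] = (left >= 0 && left < n) ? in[left] : 0.0f;
        tile[l + BLOCK] = (right < n) ? in[right] : 0.0f;
    }
    __syncthreads();  // the whole tile must be loaded before anyone reads neighbours

    if (g < n) {
        float sum = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k)
            sum += tile[l + k];  // neighbours come from shared memory
        out[g] = sum;
    }
}

// Hypothetical host helper: runs the stencil on the GPU and returns the result.
std::vector<float> stencilOnGpu(const std::vector<float>& h_in)
{
    int n = int(h_in.size());
    size_t bytes = size_t(n) * sizeof(float);
    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    stencil1d<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(d_in, d_out, n);

    std::vector<float> h_out(n);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```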

See Fig. 1.5 in NVIDIA_CUDA_Programming_Guide_1.1.pdf
(I think this figure is not included in the 2.0beta2 programming guide.)

Thanks for the tip