Using global memory as shared?

Hello everyone,

I created a kernel that requires more shared memory than actually exists. I was hoping that the excess shared memory would 'spill' into global memory, the same way registers spill into shared memory. I guess that was wrong thinking…

The point is that I need arrays with the properties of shared memory (i.e., shared within the block and with the lifetime of the block), but without necessarily the fast access that actual shared memory has.

On the other hand, replacing these arrays with global memory allocated from the 'outside' means I would need to reserve an amount of memory proportional to the size of the whole grid, to account for every block that will run (which would be far too much).
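To be concrete, the 'outside' approach I mean looks roughly like this rough sketch (names and sizes are made up for illustration): pre-allocate one scratch slice per block of the whole grid, and let each block index its own slice by blockIdx.

```cuda
// Sketch of the workaround: a grid-sized global buffer, with each
// block claiming its own slice as a slow, block-private "shared" array.
__global__ void kernelWithGlobalScratch(float *scratch, int elemsPerBlock)
{
    // This block's private slice of the grid-sized buffer.
    float *myScratch = scratch + (size_t)blockIdx.x * elemsPerBlock;

    for (int i = threadIdx.x; i < elemsPerBlock; i += blockDim.x)
        myScratch[i] = 0.0f;
    __syncthreads();
    // ... use myScratch like shared memory, only slower ...
}

int main()
{
    const int numBlocks = 100000, elemsPerBlock = 4096;
    float *scratch;
    // numBlocks * elemsPerBlock * 4 bytes is roughly 1.6 GB --
    // this grid-proportional reservation is exactly the problem.
    cudaMalloc(&scratch, (size_t)numBlocks * elemsPerBlock * sizeof(float));
    kernelWithGlobalScratch<<<numBlocks, 128>>>(scratch, elemsPerBlock);
    cudaDeviceSynchronize();
    cudaFree(scratch);
    return 0;
}
```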

I would really appreciate any ideas on the issue, and please, if you see anything wrong with the above way of thinking, don't hesitate to correct me.

OK, a few things, though I expect you have found them already.

Registers spill into local memory, not shared memory, and local memory is actually carved out of global memory (which makes sense, as developers need to know how much shared memory they are guaranteed to have).

There is a nice example of what you want to do, called "Per Thread Block Allocation", in section B.17.3.2 of CUDA_C_Programming_Guide.pdf (version 4.2, and maybe earlier ones).
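The gist of that example, sketched from memory (check the guide for the exact code): thread 0 of each block calls malloc() in device code, shares the pointer with the rest of the block through a single shared-memory pointer, and frees it before the block exits. The array then has block scope and block lifetime, but lives in (slow) global memory, and you never reserve more than the resident blocks actually use.

```cuda
#include <cstdio>

__global__ void perBlockScratch(int elemsPerBlock)
{
    __shared__ int *scratch;  // one pointer, shared by the whole block

    if (threadIdx.x == 0)
        scratch = (int *)malloc(elemsPerBlock * sizeof(int));
    __syncthreads();

    if (scratch == NULL)
        return;  // allocation failed; every thread sees NULL and bails

    // Each thread works on its part of the per-block array.
    for (int i = threadIdx.x; i < elemsPerBlock; i += blockDim.x)
        scratch[i] = i;
    __syncthreads();

    if (threadIdx.x == 0)
        free(scratch);
}

int main()
{
    // The device malloc heap has a default size limit;
    // enlarge it if your blocks need more scratch space.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 32u << 20);
    perBlockScratch<<<64, 128>>>(1024);
    cudaDeviceSynchronize();
    return 0;
}
```

Note that device-side malloc requires compute capability 2.0 or higher.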


Thank you very much for the reply. I guess the title of the example should have rung a bell. Well…