Shared memory : shared access

Is it possible to access the same portion of shared memory from all threads on a particular multiprocessor?

Actually I just need more constant space, so I am wondering if it is possible to use shared memory in a similar way.

I’m not sure, but I think that shared memory has what’s called broadcast mode. Download the programming guide from the docs page and look in there.

http://www.nvidia.com/object/cuda_develop.html

I believe that shared memory is local to a block, and a multiprocessor has between 1 and 8 blocks running at any given time, each with its own “piece” of the shared memory (the 16k gets divvied up among the blocks, so if each block wants 16k of shared memory, then the multiprocessor is forced to run them one at a time). However, there is (to my knowledge) no mechanism for a block to read from shared memory not exclusively allocated to it.

So my best answer is this: “sort of”. If the multiprocessor is forced to run one block at a time, either because each block requests the full 16k of shared memory or because the register count is maxed out (not recommended…that means global writes and other high-latency operations can’t be hidden by other blocks, among other bad things), then yes, all threads on the multiprocessor, being in the same block, can access all the shared memory on the multiprocessor that has been allocated to that block. However, if more than one block is running simultaneously, the blocks cannot read each other’s shared memory. (Does that make sense?)

BTW: “Broadcast” mode is a device-level speedup for reducing bank conflicts when threads from within the same block all try to read from the same location in shared memory at the same time. It does not (to my knowledge) allow shared memory to cross-talk between blocks.
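To make the above concrete, here is a rough sketch (not tested on the hardware discussed in this thread) of the usual workaround: each block copies a read-only table from global memory into its own shared memory, after which every thread in that block can read any entry. The names lookup_kernel, run_lookup_demo and TABLE_SIZE are made up for illustration, and the 4 KB table size is an assumption chosen to fit comfortably in 16k. Every thread also reads table[0], which is the same-address case that broadcast is meant to handle.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical table size: 1024 floats = 4 KB of shared memory per block.
constexpr int TABLE_SIZE = 1024;

__global__ void lookup_kernel(const float* table_global, const int* indices,
                              float* out, int n)
{
    // Each block gets its own private copy; other blocks cannot see it.
    __shared__ float table[TABLE_SIZE];

    // Cooperative copy: the threads of the block stride over the table.
    for (int i = threadIdx.x; i < TABLE_SIZE; i += blockDim.x)
        table[i] = table_global[i];
    __syncthreads();  // make the whole table visible to the whole block

    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < n) {
        // table[indices[gid]]: arbitrary per-thread read within the block's copy.
        // table[0]: every thread reads the same address (the broadcast case).
        out[gid] = table[indices[gid]] + table[0];
    }
}

// Hypothetical host helper: returns true if the GPU result matches the CPU.
bool run_lookup_demo()
{
    const int n = 4096;
    std::vector<float> h_table(TABLE_SIZE);
    std::vector<int> h_idx(n);
    for (int i = 0; i < TABLE_SIZE; ++i) h_table[i] = 0.5f * i;
    for (int i = 0; i < n; ++i) h_idx[i] = (i * 7) % TABLE_SIZE;  // always in range

    float *d_table = nullptr, *d_out = nullptr;
    int *d_idx = nullptr;
    cudaMalloc(&d_table, TABLE_SIZE * sizeof(float));
    cudaMalloc(&d_idx, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_table, h_table.data(), TABLE_SIZE * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, h_idx.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    lookup_kernel<<<(n + 255) / 256, 256>>>(d_table, d_idx, d_out, n);

    std::vector<float> h_out(n);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_table);
    cudaFree(d_idx);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (h_out[i] != h_table[h_idx[i]] + h_table[0]) return false;
    return true;
}
```

Call run_lookup_demo() from your own main(). Note that the copy is repeated by every block, so this only pays off if each block reads the table many times.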

Good luck!

Ben

There is no mechanism for shared memory across blocks. One potential solution: run bigger blocks. (Blocks of blocks? That seems like a bad idea for a lot of reasons…)

If you need an emulation of constant memory, you might try textures. They get cached AFAIR so they’re a step better than global memory in that aspect. They are also readable across the whole grid.
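For what it’s worth, here is a minimal sketch of the texture route. It uses the texture object API (cudaCreateTextureObject / tex1Dfetch) from later CUDA releases, which is an assumption on my part and may not be available on the toolkit or hardware this thread was written for. The names tex_lookup_kernel and run_texture_demo and the 256 KB table size are made up for illustration; the size was picked only because it is larger than the 64 KB of constant memory.

```cuda
#include <cuda_runtime.h>
#include <cstring>
#include <vector>

// Every thread in the grid can fetch any element through the texture cache.
__global__ void tex_lookup_kernel(cudaTextureObject_t tex, const int* indices,
                                  float* out, int n)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < n)
        out[gid] = tex1Dfetch<float>(tex, indices[gid]);
}

// Hypothetical host helper: returns true if the GPU result matches the CPU.
bool run_texture_demo()
{
    const int table_size = 1 << 16;  // 65536 floats = 256 KB (assumed size)
    const int n = 4096;
    std::vector<float> h_table(table_size);
    std::vector<int> h_idx(n);
    for (int i = 0; i < table_size; ++i) h_table[i] = static_cast<float>(i);
    for (int i = 0; i < n; ++i) h_idx[i] = (i * 13) % table_size;

    float *d_table = nullptr, *d_out = nullptr;
    int *d_idx = nullptr;
    cudaMalloc(&d_table, table_size * sizeof(float));
    cudaMalloc(&d_idx, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_table, h_table.data(), table_size * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, h_idx.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // Describe the linear device buffer that backs the texture.
    cudaResourceDesc res;
    std::memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_table;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = table_size * sizeof(float);

    // Return raw element values (no normalization or filtering).
    cudaTextureDesc td;
    std::memset(&td, 0, sizeof(td));
    td.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    if (cudaCreateTextureObject(&tex, &res, &td, nullptr) != cudaSuccess) {
        cudaFree(d_table); cudaFree(d_idx); cudaFree(d_out);
        return false;
    }

    tex_lookup_kernel<<<(n + 255) / 256, 256>>>(tex, d_idx, d_out, n);

    std::vector<float> h_out(n);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaDestroyTextureObject(tex);
    cudaFree(d_table);
    cudaFree(d_idx);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (h_out[i] != h_table[h_idx[i]]) return false;
    return true;
}
```

Unlike the shared memory approach, nothing is copied per block here: the table stays in global memory and the texture cache does the rest, so how much it helps depends on how well your access pattern caches.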