Quick Question about __shared__ variables

Hey all,

CUDA newcomer with a quick question about the __shared__ qualifier. I am running some code that has 4 blocks, 64 threads each. Within each block, I want to be able to easily share some data, hence I want to use a shared variable. My question is this: let's say in my kernel I declare

__shared__ float s_my_shared_array[64];

Then within a block, I can have each of my threads (0-63) index and update this array using their TID as the index. However, just to make sure, this s_my_shared_array is only accessed/modified from WITHIN a block, correct? That is, my block 0’s array will not interfere with my block 1’s array, correct? I do not need to make my array of size [4 * 64] and index it by block ID as well as TID, right?

I’m sure this is a stupid question, I just want to make sure! Thanks!

Yeah, it's only per block. That is why the number of active blocks on a multiprocessor depends on the amount of shared memory used by each block (among other things).
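If you want to check this yourself, here is a minimal sketch using the 4 blocks x 64 threads setup from the question. The kernel name per_block_shared, the out buffer, and the "block * 1000 + tid" tagging scheme are made up for illustration and are not from the original code. Each thread writes a value tagged with its block ID into the shared array using only its TID, then reads a neighbour's slot. If blocks were sharing a single array, you would expect to see values tagged with the wrong block ID.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define NUM_BLOCKS 4
#define THREADS_PER_BLOCK 64

// Hypothetical test kernel: every block gets its own copy of s_my_shared_array.
__global__ void per_block_shared(float *out)
{
    __shared__ float s_my_shared_array[THREADS_PER_BLOCK];

    int tid = threadIdx.x;

    // Index by TID only; the stored value is tagged with the block ID.
    s_my_shared_array[tid] = blockIdx.x * 1000.0f + tid;

    // Make sure every thread in THIS block has written its slot
    // before any thread reads a slot written by someone else.
    __syncthreads();

    // Read the neighbouring thread's slot and copy it to global memory.
    int neighbour = (tid + 1) % THREADS_PER_BLOCK;
    out[blockIdx.x * THREADS_PER_BLOCK + tid] = s_my_shared_array[neighbour];
}

int main()
{
    const int n = NUM_BLOCKS * THREADS_PER_BLOCK;
    float h_out[n];
    float *d_out = nullptr;

    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }

    per_block_shared<<<NUM_BLOCKS, THREADS_PER_BLOCK>>>(d_out);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Every value read in block b should carry block b's tag.
    int errors = 0;
    for (int b = 0; b < NUM_BLOCKS; ++b) {
        for (int t = 0; t < THREADS_PER_BLOCK; ++t) {
            float expected = b * 1000.0f + (t + 1) % THREADS_PER_BLOCK;
            if (h_out[b * THREADS_PER_BLOCK + t] != expected) {
                ++errors;
            }
        }
    }

    printf("%d mismatches\n", errors);
    return errors != 0;
}
```

Note the __syncthreads() call: it only synchronizes threads within the same block, which fits the per-block scope of the array. A clean run is consistent with each block having its own copy rather than a formal proof, but it does show that a [64] array indexed by TID alone is enough here.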