Newbie doubt on CUDA

Suppose I have an array, say data[3], that is in global memory and needs to be accessed by all blocks simultaneously. Do I have to create a separate data[3] for each block, or will the blocks be able to read it without any problem with just one copy?

Every block will be able to read it. (If the array is small and read-only, and every thread in a warp reads the same element at the same time, you should also look at constant memory.)

No, all blocks can access the same global memory.
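A minimal sketch of the single-copy approach: the host allocates one data[3] in global memory and passes the same pointer to a kernel launched with several blocks, and every block reads it directly. The kernel (evalPoly), the helper runGlobalExample, the polynomial it computes and the coefficient values are all hypothetical, chosen only to give each thread something to do with data[3].

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread computes data[0]*i*i + data[1]*i + data[2]
// for its global index i. `data` points to ONE device copy shared by all blocks.
__global__ void evalPoly(const int *data, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = data[0] * i * i + data[1] * i + data[2];
    }
}

// Hypothetical helper: launches evalPoly over n elements using several blocks
// and checks the results on the host. Returns the number of wrong outputs,
// or -1 if a CUDA call fails.
int runGlobalExample(int n)
{
    const int h_data[3] = {1, 2, 3};   // the single copy of data[3]
    int *d_data = nullptr;
    int *d_out = nullptr;
    int result = -1;

    if (cudaMalloc(&d_data, sizeof(h_data)) == cudaSuccess &&
        cudaMalloc(&d_out, n * sizeof(int)) == cudaSuccess &&
        cudaMemcpy(d_data, h_data, sizeof(h_data), cudaMemcpyHostToDevice) == cudaSuccess) {
        const int threads = 128;
        const int blocks = (n + threads - 1) / threads;   // 8 blocks for n = 1000
        evalPoly<<<blocks, threads>>>(d_data, d_out, n);  // every block gets the same pointer

        std::vector<int> h_out(n);
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess) {
            result = 0;
            for (int i = 0; i < n; ++i) {
                if (h_out[i] != h_data[0] * i * i + h_data[1] * i + h_data[2]) ++result;
            }
        }
    }
    cudaFree(d_data);   // cudaFree(nullptr) is a no-op
    cudaFree(d_out);
    return result;
}

int main()
{
    int bad = runGlobalExample(1000);
    std::printf("mismatches: %d\n", bad);
    assert(bad == 0);
    return bad == 0 ? 0 : 1;
}
```

No per-block copy is needed: blocks only read data, so sharing one allocation is safe. Duplicating it would only be necessary if blocks wrote to it independently.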

But when all the blocks try to access it at the same time, won’t that create overhead?

That’s true, though the effect depends on how much data there is. If the rest of the kernel takes a long time, and you only read data once, then it won’t be a big deal.

If you read data over and over again in the same thread, you should definitely look at constant memory.
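A minimal sketch of the constant-memory variant, assuming the same hypothetical polynomial kernel as a stand-in for real work: data[3] is declared __constant__ (here named c_data) and filled from the host with cudaMemcpyToSymbol, so the kernel no longer takes a pointer. The names evalPolyConst and runConstantExample and the coefficient values are illustrative only.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One copy of data[3] in constant memory, visible to every block.
__constant__ int c_data[3];

// Hypothetical kernel: all threads in a warp read c_data[k] at the same time,
// so the constant cache can broadcast each value to the whole warp.
__global__ void evalPolyConst(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = c_data[0] * i * i + c_data[1] * i + c_data[2];
    }
}

// Hypothetical helper: returns the number of wrong outputs, or -1 on a CUDA error.
int runConstantExample(int n)
{
    const int h_data[3] = {1, 2, 3};
    if (cudaMemcpyToSymbol(c_data, h_data, sizeof(h_data)) != cudaSuccess) return -1;

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    evalPolyConst<<<blocks, threads>>>(d_out, n);

    std::vector<int> h_out(n);
    int result = -1;
    if (cudaGetLastError() == cudaSuccess &&
        cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess) {
        result = 0;
        for (int i = 0; i < n; ++i) {
            if (h_out[i] != h_data[0] * i * i + h_data[1] * i + h_data[2]) ++result;
        }
    }
    cudaFree(d_out);
    return result;
}

int main()
{
    int bad = runConstantExample(1000);
    std::printf("mismatches: %d\n", bad);
    assert(bad == 0);
    return bad == 0 ? 0 : 1;
}
```

Constant memory is limited (64 KB in total) and read-only from the kernel, which suits a tiny array like data[3]. The broadcast benefit applies when threads of a warp read the same address; divergent addresses are serialized.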

Thread blocks aren’t synchronized at the instruction level, so it’s quite unlikely that they will all request the same data at once. I would really not worry about this.

Paulius