I wrote a small program like this in device-emulation mode.
__device__ __shared__ int num;
printf("GPU Hello World! Num = %d\n", num);
I invoked it as:
grid.x = 2;
grid.y = 2;
block.x = 16;
binomial_kernel <<< block , grid , 16 >>>();
When I run the app, I see that the numbers from 1 to 64 are printed.
Now, shared memory is shared only among the threads of a single block, am I right?
How is it that all the blocks are able to see the same variable in device-emulation mode?
I would expect the numbers 1 to 16 to be printed 4 times.
Is my understanding of shared memory wrong?
OK, if multiple blocks are scheduled on the same multiprocessor, do they share the shared-memory variables? I would NOT think so.
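To make the expectation concrete, here is a minimal sketch of the per-block behaviour described above. It is not the original program: the kernel name count_kernel, the helper block_totals, and the per_block_total output buffer are hypothetical names introduced for illustration. It assumes a real device (not device emulation) that supports shared-memory atomics and in-kernel printf. Each block zeroes its own copy of num, every thread increments it atomically, and the final count of each block is copied back to the host.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration; not the original binomial_kernel.
__global__ void count_kernel(int *per_block_total)
{
    // Each block gets its own copy of `num`; shared memory starts uninitialized.
    __shared__ int num;

    if (threadIdx.x == 0) num = 0;   // one thread per block zeroes the counter
    __syncthreads();                 // make the zero visible to the whole block

    // atomicAdd returns the previous value, so +1 is this thread's own count.
    int mine = atomicAdd(&num, 1) + 1;
    printf("GPU Hello World! Block (%d,%d) Num = %d\n",
           blockIdx.x, blockIdx.y, mine);

    __syncthreads();                 // wait until every thread has incremented
    if (threadIdx.x == 0)
        per_block_total[blockIdx.y * gridDim.x + blockIdx.x] = num;
}

// Hypothetical host helper: launches 2x2 blocks of 16 threads and copies the
// four per-block totals into h_totals. Returns false on any CUDA error.
bool block_totals(int h_totals[4])
{
    dim3 grid(2, 2);   // 4 blocks
    dim3 block(16);    // 16 threads per block

    int *d_totals = nullptr;
    if (cudaMalloc(&d_totals, 4 * sizeof(int)) != cudaSuccess) return false;

    // Execution configuration order is <<<grid, block>>>: grid dimensions first.
    count_kernel<<<grid, block>>>(d_totals);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    ok = ok && (cudaMemcpy(h_totals, d_totals, 4 * sizeof(int),
                           cudaMemcpyDeviceToHost) == cudaSuccess);
    cudaFree(d_totals);
    return ok;
}

int main()
{
    int h_totals[4] = {0, 0, 0, 0};
    if (!block_totals(h_totals)) return 1;
    for (int b = 0; b < 4; ++b)
        printf("block %d total = %d\n", b, h_totals[b]);
    return 0;
}
```

Under these assumptions every block should report a total of 16, because no block can see another block's copy of num. The order in which the individual threads print is not guaranteed.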
Thanks for any inputs.