Hello,
there are examples in the docs that show that I can also declare dynamically allocated shared memory at global (file) scope so that all CUDA kernels can access it:
__shared__ float Data1[];
The size of the shared memory is derived from the kernel's execution configuration. But what happens if I have two such declarations?
__shared__ float Data1[];
__shared__ float Data2[];
The compiler, of course, does not complain. But what is the behavior here?
Also, what happens if I declare shared memory globally and two kernels are executed in parallel with different shared memory configurations?
Thanks for clarifying this!
Martin
You may want to read the documentation:
[url]http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#shared[/url]
Dynamically allocated shared memory must be declared with the extern keyword, so your examples are not syntactically correct.
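For reference, a minimal sketch of the correct form, using illustrative names (sdata, scaleKernel, threadsPerBlock are not from the thread):

// Dynamically allocated shared memory: extern and unsized; the actual size
// comes from the third parameter of the execution configuration.
extern __shared__ float sdata[];

__global__ void scaleKernel(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        sdata[threadIdx.x] = in[i];          // stage the value in per-block shared memory
        out[i] = 2.0f * sdata[threadIdx.x];  // read it back
    }
}

// Host side (illustrative launch):
// scaleKernel<<<blocks, threadsPerBlock, threadsPerBlock * sizeof(float)>>>(d_out, d_in, n);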
In answer to your question about having two (or more) such definitions, the documentation states:
“All variables declared in this fashion, start at the same address in memory,”
so the behavior is that Data1 and Data2 will point to the same location.
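A minimal sketch of what that aliasing means, plus the offset technique the programming guide recommends for carving several arrays out of the single dynamic allocation (names are illustrative):

extern __shared__ float Data1[];
extern __shared__ float Data2[];

__global__ void aliasDemo()   // launched with at least sizeof(float) bytes of dynamic shared memory
{
    // Data1 and Data2 both start at the base of the dynamic shared allocation,
    // so a write through one is visible through the other.
    if (threadIdx.x == 0) {
        Data1[0] = 1.0f;     // Data2[0] now also reads 1.0f
    }
}

// To get two logically separate arrays, manage the layout with explicit offsets:
__global__ void twoArrays(int n)   // launched with 2 * n * sizeof(float) bytes
{
    extern __shared__ float smem[];
    float *a = smem;       // first n floats
    float *b = smem + n;   // next n floats
    if (threadIdx.x < n) {
        a[threadIdx.x] = 1.0f;
        b[threadIdx.x] = 2.0f;
    }
}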
Shared memory cannot be a global definition. It can only have scope within a specific kernel definition. If you define a shared memory location at global scope, it is as if you wrote that definition in each kernel in the compilation unit. The behavior would therefore be sorted out according to the rules already given in the programming guide.
Hi txbob, thanks for the info
You are right, and I have already read it. But it is often not obvious from the docs how things really work or how the text should be interpreted. Sometimes it helps to get a push in the right direction.
Did you also consider the question of what happens if two kernels are executed in parallel with different shared memory configurations?
Thanks
Martin
Yes, and I responded in my posting starting with:
“Shared memory cannot be a global definition…”
Two kernels running in parallel can have different shared memory configurations. __shared__ memory is by definition local to a given kernel definition, so there is no connection between the shared memory of different kernels.
If kernel A requires 4 KB per threadblock and kernel B requires 6 KB per threadblock, there is no conflict or confusion that I can see. Your original statement in this regard mentioned “global” shared memory, which is an invalid concept.
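A minimal sketch of that situation, with illustrative kernel names and sizes: each launch gets its own per-block dynamic shared memory allocation, sized by its own execution configuration, even if the two kernels overlap on different streams.

#include <cuda_runtime.h>

// File-scope declaration: behaves as if it were written inside each kernel.
extern __shared__ float smem[];

__global__ void kernelA()        // launched with 4 KB per block
{
    smem[threadIdx.x] = (float)threadIdx.x;
}

__global__ void kernelB()        // launched with 6 KB per block
{
    smem[threadIdx.x] = 2.0f * (float)threadIdx.x;
}

int main()
{
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // The dynamic shared memory size is specified per launch;
    // the two configurations are completely independent.
    kernelA<<<10, 256, 4 * 1024, s1>>>();
    kernelB<<<10, 256, 6 * 1024, s2>>>();

    cudaDeviceSynchronize();
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    return 0;
}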