I am running into a problem with shared memory on my GTX 470 (CUDA 3.2).
When I allocate shared memory statically, like this:
__shared__ float2 coor_shared[MAX_THREADS];
everything works fine. But strangely, I get NaN and very large numbers (as if I were reading unallocated memory) when I use dynamically allocated shared memory:
You can only have a single extern __shared__ array.
If you need more, allocate one array large enough for all data and manually segment it into multiple non-overlapping regions. See Appendix B.2.3 of the Programming Guide for the technique to do this.
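As a rough illustration of that segmentation, here is a minimal sketch. The kernel name, the three array names (coor, vel, mass) and the sizes are made up for the example and are not taken from the original code. It assumes each array holds one element per thread, and it places the float2 arrays ahead of the float array so every sub-array stays properly aligned.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: three per-block arrays carved out of ONE dynamic
// shared-memory buffer. All names here are illustrative assumptions.
__global__ void segmented(const float2 *in, float *out, int n)
{
    // The single extern array; its size in bytes is the third launch parameter.
    extern __shared__ float2 smem[];

    // Non-overlapping regions, each blockDim.x elements long.
    float2 *coor = smem;                          // first blockDim.x float2
    float2 *vel  = coor + blockDim.x;             // next blockDim.x float2
    float  *mass = (float *)(vel + blockDim.x);   // then blockDim.x float

    int t = threadIdx.x;
    int i = blockIdx.x * blockDim.x + t;
    if (i < n) {
        coor[t] = in[i];
        vel[t]  = make_float2(2.0f * in[i].x, 2.0f * in[i].y);
        mass[t] = coor[t].x + vel[t].y;
        out[i]  = mass[t];
    }
}

int main()
{
    const int n = 256, threads = 128;
    float2 h_in[n];
    float  h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = make_float2((float)i, 1.0f);

    float2 *d_in;
    float  *d_out;
    cudaMalloc((void **)&d_in,  n * sizeof(float2));
    cudaMalloc((void **)&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float2), cudaMemcpyHostToDevice);

    // Total bytes for all three regions: two float2 arrays plus one float array.
    size_t smemBytes = threads * (2 * sizeof(float2) + sizeof(float));
    segmented<<<n / threads, threads, smemBytes>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Each output should equal in.x + 2 * in.y, i.e. i + 2.
    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != (float)i + 2.0f) ++bad;
    printf("mismatches: %d\n", bad);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

The byte count passed at launch has to cover every region. If it only covers the first one, the other pointers land past the end of the allocation, which would be consistent with the garbage values described in the question.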
How are you defining all three array pointers in the dynamic case? You need to offset them all from each other. Your quoted line of code only shows one definition, not all three.
Finally, I know it’s not your question, but why are you using shared memory at all? Your code snippet shows that every thread only accesses its own indices. It’s both easier code-wise and faster execution-wise to simply keep them as per-thread locals, held in fast registers.
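To make the per-thread-local suggestion concrete, here is a small sketch. The original kernel is not shown in the thread, so the kernel name and the arithmetic are placeholders. The only point is that a value touched by a single thread can live in an ordinary local variable, which the compiler will normally keep in a register.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread works only on its own element, so a plain
// local variable replaces the coor_shared[threadIdx.x] slot.
__global__ void scaleLocal(const float2 *in, float2 *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float2 c = in[i];   // per-thread local, no shared memory involved
        c.x *= 2.0f;        // placeholder computation
        c.y *= 2.0f;
        out[i] = c;
    }
}

int main()
{
    const int n = 256, threads = 128;
    float2 h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = make_float2((float)i, 1.0f);

    float2 *d_in, *d_out;
    cudaMalloc((void **)&d_in,  n * sizeof(float2));
    cudaMalloc((void **)&d_out, n * sizeof(float2));
    cudaMemcpy(d_in, h_in, n * sizeof(float2), cudaMemcpyHostToDevice);

    // No third launch parameter is needed: the kernel uses no dynamic shared memory.
    scaleLocal<<<n / threads, threads>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float2), cudaMemcpyDeviceToHost);

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i].x != 2.0f * i || h_out[i].y != 2.0f) ++bad;
    printf("mismatches: %d\n", bad);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Shared memory only pays off when threads in a block need to read values written by other threads in that block. Without that, it adds bookkeeping and no benefit.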