Hi Everyone,
When I use shared memory within a kernel, is a separate copy of the shared memory variable created for each block of threads?
For example:
__global__ void SOMEKERNEL(double *a, double *b)
{
    // statically sized shared array: 512 doubles = 4 KB
    __shared__ double c[512];
}
Does each block get its own c[512]? So if I launched 10 blocks, would there be 10 separate copies of c[512], each of which I could load with data from global memory? I'm trying to implement the reduction code from the SDK in one of my kernels, but my program is crashing, and I want to make sure I've understood how shared memory works.
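To make the question concrete, here is a minimal sketch of the pattern I have in mind, assuming each block really does get its own private c[]. It is not the SDK reduction code and not my actual kernel: blockSum, sumOnDevice, THREADS_PER_BLOCK, and the extra n parameter are names made up for illustration. It also assumes the kernel is launched with exactly THREADS_PER_BLOCK threads per block (a power of two) and that n > 0.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 512  // assumed power of two; must match the launch configuration

// Hypothetical kernel: each block sums its own slice of a[] and writes
// one partial sum to b[blockIdx.x].
__global__ void blockSum(const double *a, double *b, int n)
{
    // Assumption being tested: one private copy of c[] per block.
    __shared__ double c[THREADS_PER_BLOCK];

    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + threadIdx.x;

    // Load from global memory; threads past the end of a[] contribute 0.
    c[tid] = (i < n) ? a[i] : 0.0;
    __syncthreads();  // all loads must finish before any thread reads c[]

    // Tree reduction within the block.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            c[tid] += c[tid + stride];
        __syncthreads();  // kept outside the if so every thread reaches it
    }

    if (tid == 0)
        b[blockIdx.x] = c[0];
}

// Hypothetical host helper: copies the input to the device, launches the
// kernel, and adds up the per-block partial sums on the CPU.
// Error checking is omitted to keep the sketch short.
double sumOnDevice(const double *hostA, int n)
{
    int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;

    double *dA = nullptr, *dB = nullptr;
    cudaMalloc(&dA, n * sizeof(double));
    cudaMalloc(&dB, blocks * sizeof(double));
    cudaMemcpy(dA, hostA, n * sizeof(double), cudaMemcpyHostToDevice);

    blockSum<<<blocks, THREADS_PER_BLOCK>>>(dA, dB, n);

    std::vector<double> partial(blocks);
    cudaMemcpy(partial.data(), dB, blocks * sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);

    double total = 0.0;
    for (double p : partial)
        total += p;
    return total;
}
```

If the assumption holds, calling sumOnDevice on an array of 1300 ones should launch 3 blocks, each filling its own c[], and return 1300. The two things I am least sure about in my real kernel are indexing c[] past its 512 elements when the block size is larger, and missing a __syncthreads() between the load and the reduction.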
Thanks in advance for any help!