Suppose the kernel nullKernel declares some static shared memory, here the array u at line 5.
Then, for each block, the shared memory is loaded (lines 10-15).
Then the helper device function trashKernelFn() is called at line 18.
When we arrive in trashKernelFn, we discover that, regardless of which block
we are in, the address of the shared memory is the same.
I was hoping each thread block would have its own private static shared memory u
behind the pointer u that is passed in. But the blocks all seem to receive the same pointer value AND
the contents of u are corrupt inside trashKernelFn. They are OK at line 14,
where they are set, differently for every block.
I was counting on each separate instantiation of nullKernel by a thread block
having its own distinct allocation at line 5.
1 __global__ void nullKernel(struct Pdef *P)
3 int bid = blockIdx.x;
4 int tid = threadIdx.x;
5 __shared__ struct PbyM u[nMonths];
8 if (tid == 0)
10 struct Pdef *pb = P + bid;
11 struct PbyM *pm = pb->M; /* load the input state */
12 int c;
13 for (c = 0; c < nMonths; c++)
14 u[c] = pm[c];
23 __host__ __device__ void trashKernelFn(struct PbyM *u)
25 int bid = blockIdx.x;
26 int tid = threadIdx.x;
27 printf("%s:%d bid=%d tid=%d u=%p\n", __FILE__, __LINE__, bid, tid, (void *)u);
Of course the idea is that every thread block has its own
array u, filled with distinct values for that thread block, and all the
threads in that thread block can share it.
The print at line 27 shows that each block does NOT get a distinct shared memory
address, as I had hoped it would.
The shared memory is declared early inside nullKernel so that
I can have a clean, non-GPU-specific declaration of trashKernelFn
that will work on either the GPU or the CPU.
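To make the "works for GPU or CPU" goal concrete, here is a rough sketch of how such a function might be written. The struct layout and the name firstMonth are hypothetical stand-ins, not my real code. It relies on the __CUDA_ARCH__ macro, which is only defined during the device compilation pass, to keep blockIdx and threadIdx out of the host version.

```cuda
#include <cstdio>

struct PbyM { float v; };   /* hypothetical stand-in for the real struct */

/* Hypothetical helper: compiled once for the host and once for the device. */
__host__ __device__ float firstMonth(const struct PbyM *u)
{
#ifdef __CUDA_ARCH__
    /* Device pass: the built-in index variables exist here. */
    printf("gpu bid=%d tid=%d u=%p\n",
           (int)blockIdx.x, (int)threadIdx.x, (const void *)u);
#else
    /* Host pass: no block or thread indices, so print only the pointer. */
    printf("cpu u=%p\n", (const void *)u);
#endif
    return u[0].v;
}
```

On the host this can be called with an ordinary array; on the device it can be handed the shared array u from inside a kernel.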
Even though the %p print shows formally the same passed-in shared address for u, I was hoping that
things would be sane and that the context within a thread block would make it work out, but
the contents of u are corrupt inside the called function.
The thread blocks inside trashKernelFn seem to have sane u contents, except for the first block (bid=0).
They all share that same address, but I was hoping that was an artifact of the thread blocks being really
separate, and that the same %p address would not be significant. The crazy numbers for block 0 make me think I
am fundamentally mistaken on that.
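
For reference, here is a minimal self-contained sketch of the pattern I am after. N_MONTHS, the struct layouts, sumMonths, demoKernel and runDemo are all hypothetical stand-ins, since my real definitions are not shown above. The __syncthreads() after the load is not in my snippet; I add it here on the assumption that threads other than tid 0 must not read u until thread 0 has finished filling it. As far as I understand, a shared-memory address is relative to each block's own shared window, so equal %p values across blocks would not by themselves mean the blocks share storage.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N_MONTHS 4   /* hypothetical size; the real nMonths is not shown */

struct PbyM { float v; };                  /* hypothetical stand-in */
struct Pdef { struct PbyM M[N_MONTHS]; };  /* hypothetical: one input state per block */

/* Reads the block's shared array through the passed-in pointer. */
__device__ float sumMonths(const struct PbyM *u)
{
    float s = 0.0f;
    for (int c = 0; c < N_MONTHS; c++)
        s += u[c].v;
    return s;
}

__global__ void demoKernel(const struct Pdef *P, float *out)
{
    int bid = blockIdx.x;
    int tid = threadIdx.x;
    __shared__ struct PbyM u[N_MONTHS];    /* one copy per thread block */

    if (tid == 0) {
        const struct PbyM *pm = P[bid].M;  /* load the input state for this block */
        for (int c = 0; c < N_MONTHS; c++)
            u[c] = pm[c];
    }
    __syncthreads();  /* assumption: barrier so all threads see thread 0's writes */

    out[bid * blockDim.x + tid] = sumMonths(u);
}

/* Returns the number of threads that saw wrong values, or -1 on a CUDA error. */
int runDemo(void)
{
    const int nBlocks = 3, nThreads = 32;
    struct Pdef hP[nBlocks];
    float hOut[nBlocks * nThreads];
    for (int b = 0; b < nBlocks; b++)
        for (int c = 0; c < N_MONTHS; c++)
            hP[b].M[c].v = (float)(b + 1);  /* block b holds b+1 in every month */

    struct Pdef *dP = NULL;
    float *dOut = NULL;
    if (cudaMalloc(&dP, sizeof(hP)) != cudaSuccess) return -1;
    if (cudaMalloc(&dOut, sizeof(hOut)) != cudaSuccess) { cudaFree(dP); return -1; }
    cudaMemcpy(dP, hP, sizeof(hP), cudaMemcpyHostToDevice);
    demoKernel<<<nBlocks, nThreads>>>(dP, dOut);
    cudaError_t err = cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    cudaFree(dP);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int b = 0; b < nBlocks; b++)
        for (int t = 0; t < nThreads; t++)
            if (hOut[b * nThreads + t] != (float)(N_MONTHS * (b + 1)))
                bad++;
    return bad;
}
```

Calling runDemo() from main and checking for a return value of 0 would confirm that every thread in every block read its own block's values through the passed pointer.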