Pass shared mem address to helper function

LHickey · April 4, 2011, 6:28pm

Suppose a kernel, nullKernel, declares some shared memory, here the array u at line 5.
Then for each block the shared memory is loaded (lines 10-15),
and the helper device function trashKernelFn() is called at line 18.

When we arrive in trashKernelFn, we discover that regardless of the block
we are in, the address of the shared memory is the same.
I was hoping each thread block would have its own private static shared memory u
behind the pointer u that is passed in. The blocks all seem to get the same pointer value, AND
the contents of u are corrupt inside trashKernelFn. They are OK at line 14,
where they are set differently for every block.

I was counting on each separate instantiation of nullKernel by a thread block
having its own distinct allocation at line 5.

1 __global__ void nullKernel(struct Pdef *P)
2 {
3 int bid = blockIdx.x;
4 int tid = threadIdx.x;
5 __shared__ struct PbyM u[nMonths];
6
7
8 if( tid == 0)
9 {
10 struct Pdef *pb = P+bid;
11 struct PbyM *pm = pb->M; /* load the input state */
12 int c;
13 for ( c=0;c<nMonths;c++)
14 u[c] = pm[c];
15 }
16
17 __syncthreads();
18 trashKernelFn(u);
19 }
20
21
22
23 __host__ __device__ void trashKernelFn( struct PbyM *u)
24 {
25 int bid = blockIdx.x;
26 int tid = threadIdx.x;
27 printf("%s:%d bid=%d tid=%d u=%p\n", __FILE__, __LINE__, bid, tid, u);
28 }

Of course the idea is that every distinct thread block can share the same
array u filled with distinct values for that thread block, and all the
threads in that thread block can share it.

The print at line 27 shows that each block does NOT get a distinct shared memory
address, as I had hoped it would.

The shared memory is declared early inside nullKernel so that
I can have a clean, non-GPU-specific declaration of trashKernelFn
that will work for GPU or CPU.
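
One way such a portable helper could look, as a minimal sketch rather than the poster's actual code: the helper takes only a plain pointer and a length, avoids blockIdx and threadIdx, and has a prototype ahead of the kernel that calls it. The names monthsTotal, totalKernel, gpuMatchesCpu and N_MONTHS are hypothetical, and plain int stands in for struct PbyM.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define N_MONTHS 12  // hypothetical stand-in for nMonths

// Prototype placed before the kernel so the call below sees a declaration.
__host__ __device__ int monthsTotal(const int *u, int n);

__global__ void totalKernel(const int *in, int *out)
{
    __shared__ int u[N_MONTHS];  // one copy per thread block

    // The threads of this block cooperatively copy the block's record.
    for (int c = threadIdx.x; c < N_MONTHS; c += blockDim.x)
        u[c] = in[blockIdx.x * N_MONTHS + c];
    __syncthreads();

    if (threadIdx.x == 0)
        out[blockIdx.x] = monthsTotal(u, N_MONTHS);
}

// No blockIdx/threadIdx here, so the same source compiles for host and device.
__host__ __device__ int monthsTotal(const int *u, int n)
{
    int s = 0;
    for (int c = 0; c < n; ++c)
        s += u[c];
    return s;
}

// Runs the kernel and compares every block's result with the CPU call.
bool gpuMatchesCpu(int numBlocks, int threadsPerBlock)
{
    std::vector<int> hIn(numBlocks * N_MONTHS), hOut(numBlocks, -1);
    for (int i = 0; i < numBlocks * N_MONTHS; ++i)
        hIn[i] = i;

    int *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, hIn.size() * sizeof(int)) != cudaSuccess)
        return false;
    if (cudaMalloc(&dOut, hOut.size() * sizeof(int)) != cudaSuccess) {
        cudaFree(dIn);
        return false;
    }
    cudaMemcpy(dIn, hIn.data(), hIn.size() * sizeof(int), cudaMemcpyHostToDevice);
    totalKernel<<<numBlocks, threadsPerBlock>>>(dIn, dOut);
    cudaError_t err = cudaMemcpy(hOut.data(), dOut, hOut.size() * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess)
        return false;

    for (int b = 0; b < numBlocks; ++b)
        if (hOut[b] != monthsTotal(&hIn[b * N_MONTHS], N_MONTHS))
            return false;
    return true;
}
```

Calling gpuMatchesCpu(4, 8) from main runs four blocks of eight threads. Keeping the block and thread indices in the kernel and out of the helper is what lets one definition serve both sides.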

Even though the %p print formally shows the same passed-in shared address u for every block, I was hoping that
things would be sane and that the context within a thread block would make things work out, but
the contents of u are corrupt inside the called function.

The thread blocks inside trashKernelFn seem to have sane u contents, except for the first block (block 0).
They all share the same address, but I was hoping that was an artifact of the thread blocks being really
separate, and that the same %p address would not be significant. Crazy numbers for block 0 make me think I
am fundamentally mistaken on that.

hyqneuron · April 5, 2011, 4:22am

Very confused by your typos :(

10 struct Pdef *pb = P+bid * sizeof(Pdef); may give you the right u in trash

I haven’t checked any PTX that involves shared memory before, though I guess shared memory has its own memory instructions and a non-unified address space between different MPs. So what’s the problem with the addresses being the same if the same address refers to different things on each MP?

hyqneuron · April 5, 2011, 4:29am

If I were you I would do this:
if (tid < nMonths)
u[tid] = P[bid].M[tid];
I am on a mobile device so did not type things out fully.

tera · April 5, 2011, 4:46am

It might help to reverse the order of nullKernel and trashKernelFn because, as the code is, trashKernelFn is called before its declaration.

LHickey · April 5, 2011, 3:54pm

The stuff in P is stacked up so that the first thread block gets the first P record, the second gets the second P rec… All the threads in the first thread block use the first P rec. All the threads in the second thread block use the second, etc. So rather than make the threads in a given block read from global memory indexed by block, I thought it would be good to use shared memory in each individual block to hold the policy stuff. The policy stuff itself is an array indexed from 0 to nMonths, so I want to move this into shared mem.
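
The layout described above (one P record per block, copied into that block's shared memory) can be sketched roughly as follows. This is an illustration under assumptions, not the original code: the field rate inside PbyM, the constant N_MONTHS, and the names stageKernel and stagedTotalsMatch are all hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define N_MONTHS 12  // hypothetical stand-in for nMonths

struct PbyM { float rate; };        // hypothetical per-month field
struct Pdef { PbyM M[N_MONTHS]; };  // one record per thread block

__global__ void stageKernel(const Pdef *P, float *out)
{
    __shared__ PbyM u[N_MONTHS];  // private to each thread block

    // Block b reads only record P[b]; a strided loop covers the case
    // where the block has fewer threads than N_MONTHS.
    for (int c = threadIdx.x; c < N_MONTHS; c += blockDim.x)
        u[c] = P[blockIdx.x].M[c];
    __syncthreads();

    // From here on every thread in the block can read u instead of global memory.
    if (threadIdx.x == 0) {
        float total = 0.0f;
        for (int c = 0; c < N_MONTHS; ++c)
            total += u[c].rate;
        out[blockIdx.x] = total;
    }
}

// Fills one record per block, runs the kernel, and checks each block's total.
bool stagedTotalsMatch(int numBlocks, int threadsPerBlock)
{
    std::vector<Pdef> hP(numBlocks);
    for (int b = 0; b < numBlocks; ++b)
        for (int c = 0; c < N_MONTHS; ++c)
            hP[b].M[c].rate = float(b + c);  // small integers, exact in float

    Pdef *dP = nullptr;
    float *dOut = nullptr;
    if (cudaMalloc(&dP, numBlocks * sizeof(Pdef)) != cudaSuccess)
        return false;
    if (cudaMalloc(&dOut, numBlocks * sizeof(float)) != cudaSuccess) {
        cudaFree(dP);
        return false;
    }
    cudaMemcpy(dP, hP.data(), numBlocks * sizeof(Pdef), cudaMemcpyHostToDevice);
    stageKernel<<<numBlocks, threadsPerBlock>>>(dP, dOut);

    std::vector<float> hOut(numBlocks, -1.0f);
    cudaError_t err = cudaMemcpy(hOut.data(), dOut, numBlocks * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dP);
    cudaFree(dOut);
    if (err != cudaSuccess)
        return false;

    for (int b = 0; b < numBlocks; ++b) {
        float expected = 0.0f;
        for (int c = 0; c < N_MONTHS; ++c)
            expected += hP[b].M[c].rate;
        if (hOut[b] != expected)
            return false;
    }
    return true;
}
```

Calling stagedTotalsMatch(3, 8) uses fewer threads than months, so the strided loop runs twice for some threads.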

What's going on, the real question, is independent of how the shared memory is loaded, though. If you take the address of the shared memory u using %p in nullKernel, then pass it to a called device function
trashKernelFn, is the passed-in address u supposed to be different when the block id of the caller is different?
I thought that because the shared memory was distinct in each instance of nullKernel, the address of u would be different too. When it is passed to the worker trashKernelFn and we look at u, the block id in trashKernelFn can change but the address of the passed-in u does not. This might be a case where the address of u being the same does not matter because the address space is different for different thread blocks.
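
A small experiment along these lines can separate the two questions (pointer value versus contents). In this sketch each block fills its own u with its block index, a helper that only receives the pointer sums the array, and the host checks every block's sum. The names sumMonths, perBlockKernel, countCorruptBlocks and N_MONTHS are hypothetical. The printed pointer value may well be identical for every block; that by itself would not indicate sharing, since __shared__ storage is allocated per thread block.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <vector>

#define N_MONTHS 12  // hypothetical stand-in for nMonths

// Helper that only sees a plain pointer, like trashKernelFn in the thread.
__device__ int sumMonths(const int *u)
{
    int s = 0;
    for (int c = 0; c < N_MONTHS; ++c)
        s += u[c];
    return s;
}

__global__ void perBlockKernel(int *sums)
{
    __shared__ int u[N_MONTHS];

    // Thread 0 gives this block's copy of u a block-specific value.
    if (threadIdx.x == 0)
        for (int c = 0; c < N_MONTHS; ++c)
            u[c] = blockIdx.x;
    __syncthreads();

    int s = sumMonths(u);  // pass the shared array as an ordinary pointer
    if (threadIdx.x == 0) {
        printf("block %d: u=%p sum=%d\n", blockIdx.x, (void *)u, s);
        sums[blockIdx.x] = s;
    }
}

// Returns how many blocks saw wrong contents (0 means each block had its own u),
// or -1 on a CUDA error.
int countCorruptBlocks(int numBlocks, int threadsPerBlock)
{
    int *dSums = nullptr;
    if (cudaMalloc(&dSums, numBlocks * sizeof(int)) != cudaSuccess)
        return -1;
    perBlockKernel<<<numBlocks, threadsPerBlock>>>(dSums);

    std::vector<int> hSums(numBlocks, -1);
    cudaError_t err = cudaMemcpy(hSums.data(), dSums, numBlocks * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dSums);
    if (err != cudaSuccess)
        return -1;

    int bad = 0;
    for (int b = 0; b < numBlocks; ++b)
        if (hSums[b] != b * N_MONTHS)
            ++bad;
    return bad;
}
```

Calling countCorruptBlocks(4, 32) from main and comparing the printed u values against the per-block sums shows whether equal addresses go together with distinct contents on a given device.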