Odd memory usage for pointer swap.

I’m not sure why local memory is being used for this pointer swap. In my kernel I have the following arrays and pointers declared:

[codebox]
int H_left2[4], H_left[4];
int *h_l1 = H_left;
int *h_l2 = H_left2;
int *tmp;
[/codebox]

The arrays are initialized to 0, and inside the main loop of my kernel, values are read from h_l1 and written to h_l2. At the very end of each loop iteration I perform a swap:

[codebox]
tmp = h_l1;
h_l1 = h_l2;
h_l2 = tmp;
[/codebox]

When compiling with --ptxas-options=-v, I get the following:

ptxas info : Used 31 registers, 64+0 bytes lmem, 9296+16 bytes smem, 12 bytes cmem[1]

When I comment out only the three lines of the swap (I’m still reading from h_l1 and writing to h_l2 as before) and recompile, I get:

ptxas info : Used 25 registers, 32+0 bytes lmem, 9296+16 bytes smem, 16 bytes cmem[1]

So, my question is: what is really going on here? When I comment out the swap, the only variable that is no longer used is tmp. If the compiler can determine that, with no swap occurring, it doesn’t need to reference H_left and H_left2 through the pointers, then it doesn’t need space for the pointers either. So I can see why register usage decreases, but why does local memory usage decrease as well?

My interpretation of lmem in this context is memory which is declared local to a kernel but for one reason or another is being stored in global storage. I cannot figure out why the swap is using more local memory than when commented out unless the compiler is forcing a deep copy of the arrays and is using global storage for the copy. Can anyone clarify what is really going on here?

Registers aren’t addressable, so once a pointer can refer to either array at run time, the only thing the compiler can do is keep the arrays in local memory. What about using shared memory instead?

Ahhh :) Thanks! That explains it. We were trying to keep these values in registers. We may try an alternative to shared memory, perhaps using a 2D register array and switching an integer index to select which column to access on a given iteration.

Just keep in mind that the value of the integer will have to be known at compile time (perhaps via loop unrolling if the compiler doesn’t do it automatically), otherwise you’ll have the exact same problem with local memory usage.
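One way to follow the advice above is to drop the pointers entirely and unroll the main loop by two, so the two arrays trade roles in the source code instead of at run time. The following is only a minimal sketch under some assumptions: the update rule in step(), the names step, run and demo_kernel, and the macros HD and UNROLL are all hypothetical stand-ins, since the original loop body isn't shown in this thread. It is written so that it builds with nvcc or with a plain host C++ compiler.

```cpp
// Qualifiers expand to nothing under a host compiler, so the same source
// builds with nvcc or g++. HD and UNROLL are names made up for this sketch.
#ifdef __CUDACC__
#define HD __host__ __device__ __forceinline__
#define UNROLL _Pragma("unroll")
#else
#define HD inline
#define UNROLL
#endif

constexpr int N = 4;  // matches H_left[4] / H_left2[4] in the question

// Hypothetical update rule standing in for the real loop body: each output
// element reads two elements of the previous row, so updating in place would
// give wrong results and two buffers are genuinely needed.
HD void step(const int (&src)[N], int (&dst)[N], int x)
{
    UNROLL
    for (int i = 0; i < N; ++i)
        dst[i] = src[i] + src[(i + 1) % N] + x;  // indices are constants once unrolled
}

// Runs `iters` steps and returns the sum of the final row.
HD int run(int iters, int x)
{
    int a[N] = {0, 0, 0, 0};
    int b[N] = {0, 0, 0, 0};

    // Two steps per trip: a -> b, then b -> a. The roles are swapped in the
    // source text, so no pointer ever has to choose an array at run time.
    int it = 0;
    for (; it + 1 < iters; it += 2) {
        step(a, b, x);
        step(b, a, x);
    }

    // Odd iteration count: one leftover step, and the result ends up in b.
    const bool odd = it < iters;
    if (odd)
        step(a, b, x);

    int sum = 0;
    UNROLL
    for (int i = 0; i < N; ++i)
        sum += odd ? b[i] : a[i];  // a select between two values, not an address computation
    return sum;
}

#ifdef __CUDACC__
// Thin kernel wrapper; `out` needs one int per launched thread.
__global__ void demo_kernel(int *out, int iters, int x)
{
    out[blockIdx.x * blockDim.x + threadIdx.x] = run(iters, x);
}
#endif
```

Whether the arrays actually end up in registers still depends on the compiler fully unrolling the inner loops, so it is worth rebuilding with --ptxas-options=-v and checking the lmem figure rather than assuming it.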

Thanks, will keep that in mind. Your replies have cured more than one headache :)