Performance penalty for using a local pointer variable that points to global data

Hi All,

I have a question regarding the following:

Let's say I do something like this inside my kernel:

int *ptr = &global[tid];

ptr[0] = 23;
ptr[1] = 0;

instead of just:

global[tid] = 23;
global[tid+1] = 0;

My question is:
Would int *ptr = &global[tid]; be considered a memory read, or can the compiler optimize the two code fragments into the same code?

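They're going to be the same. &global[tid] is just address arithmetic (it is equivalent to global + tid), so forming the pointer does not read memory.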

If you aren't sure whether the compiler is doing what you want, it's best to either inspect the generated code (PTX or SASS) directly or write a test for at least partial confirmation. For example, compile these two kernels and compare their output:

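// extern "C" keeps the kernel names unmangled, so they are easy to find in the PTX/SASS.
#define KERNEL_QUALIFIERS   extern "C" __global__

// AAA: form a pointer once, then index relative to it.
KERNEL_QUALIFIERS
void AAA(int* const out)
{
  int* const ptr = out + threadIdx.x;  // pointer arithmetic only, no memory read

  ptr[0] = 23;
  ptr[1] = 0;
}

// BBB: index the output array directly.
KERNEL_QUALIFIERS
void BBB(int* const out)
{
  out[threadIdx.x  ] = 23;
  out[threadIdx.x+1] = 0;
}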

Thank you, allanmac!