Performance penalty of using threadIdx.x

What is the performance penalty of using threadIdx.x and other similar built-in variables? Consider a scenario where I use threadIdx.x many times in a kernel. Which is the better option:

  1. Declare a variable, copy the value of threadIdx.x into it, and use this variable everywhere, OR
  2. Use threadIdx.x as it is everywhere.

I am not sure whether somebody has already posted this question; apologies if so.

Thank you.

– Mandar Gurav

Define this in the kernel:

int myid = threadIdx.x;

This means that myid will normally be kept in a register, so it is very fast to read and is private to each thread.
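To show the local copy in context, here is a minimal sketch of a kernel that uses myid several times. The kernel name reverse_chunks, the host helper run_reverse_chunks, and the 256-element tile size are made up for illustration, and error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Reverses each block-sized chunk of `in` into `out`.
// Assumes blockDim.x <= 256 and that the array length is a multiple of blockDim.x.
__global__ void reverse_chunks(const int *in, int *out)
{
    __shared__ int tile[256];
    const int myid = threadIdx.x;              // local copy, private to this thread
    const int base = blockIdx.x * blockDim.x;  // first element of this block's chunk

    tile[myid] = in[base + myid];
    __syncthreads();                           // wait until the whole tile is loaded
    out[base + myid] = tile[blockDim.x - 1 - myid];
}

// Hypothetical host helper: fills an array with 0..n-1, runs the kernel,
// and returns the result. n must be a multiple of threads_per_block.
std::vector<int> run_reverse_chunks(int n, int threads_per_block)
{
    std::vector<int> host(n);
    for (int i = 0; i < n; ++i) host[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    reverse_chunks<<<n / threads_per_block, threads_per_block>>>(d_in, d_out);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(host.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host;
}
```

Calling run_reverse_chunks(8, 4) launches two blocks of four threads, so each group of four values comes back reversed.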

It doesn’t matter how you access threadIdx.x in your code. The optimizing compiler will almost certainly do the smart thing, so write your code in a way that is easy to read.

In case you are curious what happens at a low level:

  • When you access threadIdx.x in your C code, nvcc compiles that to a read of %tid.x, which is a PTX "special register."
  • Depending on what you do with threadIdx.x, a type conversion (a "cvt" instruction) might be required, but the compiler is smart enough to not do it twice if possible.
  • There is basically no correlation between hardware registers and C variables in any modern compiler. You can affect things somewhat using keywords like "volatile", but it is usually counterproductive.
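If you want to check this yourself, one way is to write the same kernel both ways and compare them. The sketch below is only an illustration: the kernel names scale_with_copy and scale_direct and the host helper run_variant are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Variant 1: copy threadIdx.x into a local variable and reuse the copy.
__global__ void scale_with_copy(float *data, float factor)
{
    const unsigned int tid = threadIdx.x;
    const unsigned int i = blockIdx.x * blockDim.x + tid;
    data[i] = factor * data[i] + tid;
}

// Variant 2: spell out threadIdx.x every time it is needed.
__global__ void scale_direct(float *data, float factor)
{
    data[blockIdx.x * blockDim.x + threadIdx.x] =
        factor * data[blockIdx.x * blockDim.x + threadIdx.x] + threadIdx.x;
}

// Hypothetical host helper: fills an array with 0..n-1, runs one of the
// two variants with factor 2, and returns the result.
std::vector<float> run_variant(bool use_copy, int blocks, int threads)
{
    const int n = blocks * threads;
    std::vector<float> host(n);
    for (int i = 0; i < n; ++i) host[i] = static_cast<float>(i);

    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemcpy(d_data, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    if (use_copy)
        scale_with_copy<<<blocks, threads>>>(d_data, 2.0f);
    else
        scale_direct<<<blocks, threads>>>(d_data, 2.0f);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(host.data(), d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return host;
}
```

Both variants should return the same values. To look at the low-level side, compile the file with nvcc -ptx and compare the PTX generated for the two kernels. I would expect an optimized build to show no meaningful difference in how %tid.x is read, but it is worth confirming on your own toolchain.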