Cost of accessing built-in variables

Hello everyone!

I am sorry if this is documented somewhere, but I did not find it:

Accessing global memory costs roughly 400 to 600 cycles, accessing shared memory costs about 4 cycles, and registers add no overhead to the computation cost. Where do the built-in variables like threadIdx, blockDim, etc. reside, and what does it cost to access them?



It looks like this is discussed in a few other threads: …#entry314099

They come from shared memory.
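If the access cost is a concern, one common habit is to read each built-in variable once into a local variable and reuse the copy, so that repeated uses inside a loop can stay in registers. The compiler may well do this on its own, so treat the following as a minimal sketch rather than a required optimization. The kernel name `write_global_index` and the host helper `run_index_check` are made up for illustration; only the built-in variables and the CUDA runtime calls are real.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread copies the built-in variables into
// locals once, then reuses the local copies inside a grid-stride loop.
__global__ void write_global_index(int *out, int n)
{
    const int tid  = threadIdx.x;  // thread index within the block
    const int bid  = blockIdx.x;   // block index within the grid
    const int bdim = blockDim.x;   // threads per block
    const int gdim = gridDim.x;    // blocks in the grid

    for (int i = bid * bdim + tid; i < n; i += bdim * gdim)
        out[i] = i;                // element i receives its own index
}

// Hypothetical host helper: launches the kernel and returns the number of
// elements whose value differs from their index, or -1 on a CUDA error.
int run_index_check(int blocks, int threads, int n)
{
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess)
        return -1;

    write_global_index<<<blocks, threads>>>(d_out, n);
    cudaError_t err = cudaGetLastError();  // catches launch failures

    std::vector<int> h_out(n);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                         cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d_out);
    if (err != cudaSuccess)
        return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != i)
            ++mismatches;
    return mismatches;
}

int main()
{
    const int mismatches = run_index_check(4, 64, 1000);
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

This only demonstrates the usage pattern; it does not measure the access cost itself, which depends on the hardware generation.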

Thank you!