blockDim question: where is it stored?

I have a for loop inside my kernel that, in an attempt to make things dynamic, iterates based on blockDim. Obviously, if blockDim is stored in global (or local) memory, I would be better off moving this value into a register or shared memory, since it will be accessed over and over. Where are the values for blockDim (and the others, such as threadIdx, etc.) stored?
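For concreteness, here is a minimal sketch of the kind of loop described above. The kernel name blockSum, the buffer names, and the launch sizes are all hypothetical, chosen only for illustration; the point is just that blockDim.x appears in the loop condition and so is read on every iteration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: thread 0 of each block sums that block's
// blockDim.x consecutive inputs. The loop bound is the built-in blockDim.x.
__global__ void blockSum(const float *in, float *out)
{
    if (threadIdx.x == 0) {
        float sum = 0.0f;
        for (unsigned int i = 0; i < blockDim.x; ++i)
            sum += in[blockIdx.x * blockDim.x + i];
        out[blockIdx.x] = sum;
    }
}

int main()
{
    const int blocks = 4, threads = 64, n = blocks * threads;  // assumed sizes
    float h_in[n], h_out[blocks];
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

    float *d_in = NULL, *d_out = NULL;
    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMalloc((void **)&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<blocks, threads>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);

    // With all inputs set to 1.0, each block's sum equals its thread count.
    for (int b = 0; b < blocks; ++b)
        printf("block %d sum = %.1f\n", b, h_out[b]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Whether the compiler re-reads blockDim.x each time or hoists it out of the loop depends on the toolkit and optimization level, so treat this as an illustration of the question rather than a statement about the generated code.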

Another question I’ve been asking myself (and not quite found a definitive answer on) is where are variables declared within my kernel stored? For instance, in the following code, where is ‘a’ stored?

static __global__ void myKernel()
{
    int a = 5;
    ...
}

My instinct says it’s stored in local memory, which is actually global memory. This is bad if the value is being accessed over and over, correct? It would be much better if these values could be stored in the registers, but how exactly is that accomplished? And is it possible to do that without sharing the variable between all the threads in a block?

Thanks!

Bryan

edit: had some syntax wrong… shared memory on the brain I guess!

Shared memory, AFAIK, along with the function parameters.

From the Programmer’s Guide:

It appears that the parameters are passed and accessed using shared memory, which would be beneficial to performance, right?

Yep! I can vaguely imagine that hacking block mgmt data into constant memory might be useful in some weird edge cases, but I think the parameters are as well configured as they can be in 99% of cases. If I’m not mistaken, which has happened before.

I found the answer to my second question. From the Programmer’s Guide (guess I missed it before):

However, I’m still unsure of where blockDim, threadIdx, etc. are stored…

They are stored in special read-only registers. Check the PTX guide for more info.

threadIdx is in a register (at least I am certain about threadIdx.x). blockIdx, blockDim and gridDim are in shared memory, actually.

Thanks a ton! I guess I didn’t see the PTX documentation.

So using the values explicitly is actually a better option than copying them into a variable declared within the kernel itself, because a variable declared in the kernel may or may not be stored in a register, whereas these constants always are.

Interesting. This is, of course, unless one declared these variables with the __constant__ qualifier, in which case they would be stored in constant memory.

Thanks again!

I guess that would make sense, since those are the same value in each thread within a block. That allows you to store the value in shared memory rather than taking up a register in each thread.

The special registers are:

%tid (Thread ID within a CTA)
%ntid (Number of thread IDs per CTA)
%ctaid (CTA id within a grid)
%nctaid (Number of CTA ids per grid)
%gridid (Grid ID)
%clock (A predefined, read-only 32-bit unsigned cycle counter)
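
One way to see how these registers relate to the CUDA built-ins is to read them directly with inline PTX and compare. The following is only a sketch, assuming a toolkit that supports inline PTX in device code; the kernel name readSpecialRegs and the mismatch counter are hypothetical. It checks the .x components only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: reads the .x components of four special registers
// via inline PTX and counts threads where they differ from the built-ins.
__global__ void readSpecialRegs(unsigned int *mismatches)
{
    unsigned int tid, ntid, ctaid, nctaid;
    asm("mov.u32 %0, %%tid.x;"    : "=r"(tid));     // compare with threadIdx.x
    asm("mov.u32 %0, %%ntid.x;"   : "=r"(ntid));    // compare with blockDim.x
    asm("mov.u32 %0, %%ctaid.x;"  : "=r"(ctaid));   // compare with blockIdx.x
    asm("mov.u32 %0, %%nctaid.x;" : "=r"(nctaid));  // compare with gridDim.x

    if (tid != threadIdx.x || ntid != blockDim.x ||
        ctaid != blockIdx.x || nctaid != gridDim.x)
        atomicAdd(mismatches, 1u);
}

int main()
{
    unsigned int *d_mismatches = NULL, h_mismatches = 0;
    cudaMalloc((void **)&d_mismatches, sizeof(unsigned int));
    cudaMemset(d_mismatches, 0, sizeof(unsigned int));

    readSpecialRegs<<<3, 32>>>(d_mismatches);  // arbitrary launch shape
    cudaMemcpy(&h_mismatches, d_mismatches, sizeof(unsigned int),
               cudaMemcpyDeviceToHost);

    printf("threads with a mismatch: %u\n", h_mismatches);
    cudaFree(d_mismatches);
    return 0;
}
```

If the mapping holds, the counter stays at zero. Note that this only shows what the values are at the PTX level; where the hardware actually keeps them (a register, shared memory, or elsewhere) can differ between architectures and is not visible from PTX.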