I have a for loop inside my kernel that, in an attempt to make things dynamic, is iterated based on blockDim. Obviously, if blockDim is stored in global (or local) memory, I would be better off moving this value into a register or shared memory, since it will be accessed over and over. Where are the values for blockDim (and the others, such as threadIdx, etc.) stored?
Another question I’ve been asking myself (and haven’t quite found a definitive answer to) is where variables declared within my kernel are stored. For instance, in the following code, where is ‘a’ stored?
static __global__ void myKernel()
{
    int a = 5;
...
}
My instinct says it’s stored in local memory, which physically lives in the same off-chip device memory as global memory. That would be bad if the value is accessed over and over, correct? It would be much better if these values could be kept in registers, but how exactly is that accomplished? And is it possible to do that without sharing the variable between all the threads in a block?
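For what it’s worth, here is a small sketch of the usual behavior (the kernel and helper names are made up for illustration). A scalar local such as a is private to each thread and is normally kept in a register; nothing is shared between threads. A per-thread array indexed with a value known only at run time usually cannot live in registers, so the compiler tends to place it in local memory. Placement is ultimately the compiler’s call; compiling with nvcc -Xptxas -v prints per-kernel register and stack-frame (local memory) usage, so you can check instead of guessing.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Scalar local: every thread gets its own private copy, normally held in a register.
__global__ void scalarLocal(int *out)
{
    int a = 5;              // per-thread, not shared with other threads
    a += threadIdx.x;       // each thread modifies only its own copy
    out[threadIdx.x] = a;
}

// Per-thread array read with a run-time index: registers cannot be indexed
// dynamically, so the compiler typically places this array in local memory.
__global__ void indexedLocal(int *out, int k)
{
    int buf[32];
    for (int i = 0; i < 32; ++i)
        buf[i] = i * (int)threadIdx.x;
    out[threadIdx.x] = buf[k & 31];   // index unknown at compile time
}

// Hypothetical host helper: runs scalarLocal on one block of n threads and
// returns out[idx], or -1 if a CUDA call fails.
int runScalarLocal(int n, int idx)
{
    assert(n > 0 && n <= 1024 && idx >= 0 && idx < n);
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;
    scalarLocal<<<1, n>>>(d_out);
    int h = -1;
    cudaError_t err = cudaMemcpy(&h, d_out + idx, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? h : -1;
}
```

For example, runScalarLocal(64, 10) returns 15: thread 10 started from its own a = 5.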
Thanks!
Bryan
edit: had some syntax wrong… shared memory on the brain I guess!
Yep! I can vaguely imagine that hacking block management data into constant memory might be useful in some weird edge cases, but I think the parameters are as well configured as they can be in 99% of cases. That is, if I’m not mistaken (which has happened before).
Thanks a ton! I guess I didn’t see the PTX documentation.
So using the built-in values directly is actually a better option than copying them into a variable declared within the kernel, because a variable declared in the kernel may or may not be kept in a register, whereas these built-ins always are.
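A sketch of the loop from the original question written both ways (sumDirect, sumCached and runSum are hypothetical names). Whether the compiler re-reads the built-in on every iteration or keeps it in a register is its own decision; in practice both versions usually end up similar, and dumping the machine code with cuobjdump -sass is the reliable way to compare them.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Block-stride loop: thread t handles elements t, t + blockDim.x, t + 2*blockDim.x, ...
// The built-in is read directly in the loop.
__global__ void sumDirect(const int *in, int n, int *out)
{
    int acc = 0;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        acc += in[i];
    atomicAdd(out, acc);
}

// Same loop with blockDim.x copied into a scalar local first; that local is
// normally kept in a register.
__global__ void sumCached(const int *in, int n, int *out)
{
    const int stride = blockDim.x;
    int acc = 0;
    for (int i = threadIdx.x; i < n; i += stride)
        acc += in[i];
    atomicAdd(out, acc);
}

// Hypothetical host helper: sums 0..n-1 using one block of `threads` threads.
// `cached` selects the kernel. Returns -1 if a CUDA call fails.
int runSum(int n, int threads, bool cached)
{
    assert(n > 0 && threads > 0 && threads <= 1024);
    std::vector<int> h_in(n);
    for (int i = 0; i < n; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) { cudaFree(d_in); return -1; }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(int));

    if (cached) sumCached<<<1, threads>>>(d_in, n, d_out);
    else        sumDirect<<<1, threads>>>(d_in, n, d_out);

    int result = -1;
    cudaError_t err = cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? result : -1;
}
```

Both runSum(1000, 128, false) and runSum(1000, 128, true) should return 499500, the sum of 0..999.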
Interesting. That is, of course, unless one declares these variables __constant__, in which case they would be stored in constant memory (such variables have to be declared at file scope, not inside the kernel).
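A minimal sketch of the constant-memory route, assuming the value is set from the host with cudaMemcpyToSymbol before the launch (c_scale, scaleKernel and runScale are illustrative names). Note that c_scale is declared outside the kernel and holds one value for the whole grid, not one per block.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Constant-memory variable: declared at file scope, read-only from kernels.
// Reads are cached, and a read by a whole warp of the same address is broadcast.
__constant__ int c_scale;

__global__ void scaleKernel(int *out)
{
    out[threadIdx.x] = c_scale * (int)threadIdx.x;
}

// Hypothetical host helper: sets c_scale, launches one block of n threads,
// and returns out[idx], or -1 if a CUDA call fails.
int runScale(int scale, int n, int idx)
{
    assert(n > 0 && n <= 1024 && idx >= 0 && idx < n);
    if (cudaMemcpyToSymbol(c_scale, &scale, sizeof(int)) != cudaSuccess) return -1;
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;
    scaleKernel<<<1, n>>>(d_out);
    int h = -1;
    cudaError_t err = cudaMemcpy(&h, d_out + idx, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? h : -1;
}
```

For instance, runScale(3, 32, 7) returns 21.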
I guess that makes sense, since those values are the same for every thread within a block. That allows them to be stored in shared memory rather than taking up a register in each thread.
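If you do want one copy of a value per block rather than one per thread, a __shared__ variable written by a single thread is the usual pattern. A sketch (perBlockShared and runPerBlock are illustrative names); whether it actually beats a register depends on register pressure, since every use then becomes a shared-memory load.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// One copy of blockValue exists per block and is visible to all its threads.
// Thread 0 writes it; the barrier makes the write visible before anyone reads it.
__global__ void perBlockShared(const int *perBlockInput, int *out)
{
    __shared__ int blockValue;
    if (threadIdx.x == 0)
        blockValue = perBlockInput[blockIdx.x];
    __syncthreads();
    out[blockIdx.x * blockDim.x + threadIdx.x] = blockValue + (int)threadIdx.x;
}

// Hypothetical host helper: launches `blocks` blocks of `threads` threads, where
// block b receives the input 100 * b. Returns out[i], or -1 if a CUDA call fails.
int runPerBlock(int blocks, int threads, int i)
{
    assert(blocks > 0 && threads > 0 && threads <= 1024 && i >= 0 && i < blocks * threads);
    std::vector<int> h_in(blocks);
    for (int b = 0; b < blocks; ++b) h_in[b] = 100 * b;

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, blocks * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, blocks * threads * sizeof(int)) != cudaSuccess) { cudaFree(d_in); return -1; }
    cudaMemcpy(d_in, h_in.data(), blocks * sizeof(int), cudaMemcpyHostToDevice);

    perBlockShared<<<blocks, threads>>>(d_in, d_out);

    int h = -1;
    cudaError_t err = cudaMemcpy(&h, d_out + i, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? h : -1;
}
```

With 4 blocks of 64 threads, runPerBlock(4, 64, 2 * 64 + 5) returns 205: block 2’s shared value 200 plus thread index 5.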
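PTX special registers (a CTA, or cooperative thread array, is a thread block):
%tid (Thread ID within a CTA; threadIdx in CUDA C)
%ntid (Number of thread IDs per CTA; blockDim)
%ctaid (CTA ID within a grid; blockIdx)
%nctaid (Number of CTA IDs per grid; gridDim)
%gridid (Grid ID)
%clock (A predefined, read-only 32-bit unsigned cycle counter)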
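These registers can also be read directly with inline PTX, which makes it easy to confirm how they map to the CUDA C built-ins. A small sketch, assuming compilation with nvcc (readSpecialRegs and runSpecialRegs are made-up names); it counts threads whose special-register values differ from threadIdx.x, blockDim.x, blockIdx.x and gridDim.x, so the expected count is 0.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Reads %tid.x, %ntid.x, %ctaid.x and %nctaid.x via inline PTX ("%%" escapes the
// register-name prefix) and compares them with the CUDA C built-ins.
__global__ void readSpecialRegs(int *mismatches)
{
    unsigned int tid, ntid, ctaid, nctaid;
    asm volatile("mov.u32 %0, %%tid.x;"    : "=r"(tid));
    asm volatile("mov.u32 %0, %%ntid.x;"   : "=r"(ntid));
    asm volatile("mov.u32 %0, %%ctaid.x;"  : "=r"(ctaid));
    asm volatile("mov.u32 %0, %%nctaid.x;" : "=r"(nctaid));

    bool same = tid == threadIdx.x && ntid == blockDim.x &&
                ctaid == blockIdx.x && nctaid == gridDim.x;
    if (!same)
        atomicAdd(mismatches, 1);
}

// Hypothetical host helper: returns the number of mismatching threads,
// or -1 if a CUDA call fails.
int runSpecialRegs(int blocks, int threads)
{
    assert(blocks > 0 && threads > 0 && threads <= 1024);
    int *d_count = nullptr;
    if (cudaMalloc(&d_count, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_count, 0, sizeof(int));
    readSpecialRegs<<<blocks, threads>>>(d_count);
    int h = -1;
    cudaError_t err = cudaMemcpy(&h, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_count);
    return err == cudaSuccess ? h : -1;
}
```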