A novice question, but when I declare variables inside a __global__ or __device__ function, e.g.
__global__ void myfunc()
{
int me;
float and_me;
char etc;
}
In which memory are they stored?
If they're in shared memory or some other cache, what happens if you define more than 16 KB worth of local variables? Presumably putting anything in global memory would be too slow…
Hopefully, a more experienced person than I will give you a better answer. But until that happens… my understanding is that the compiler will first attempt to store them in on-chip registers, which makes fetches extremely fast (as fast as it gets!). But if the specified maximum register count is reached, it will then store them in global memory, which is tremendously slower. I don’t think the compiler will ever put them in shared memory.
In general, they'll be in registers. However, nvcc doesn't necessarily map each local variable to its own register (it rearranges your code, and multiple variables can share a register). You may want to add the "volatile" qualifier to enforce a one-to-one mapping (sometimes useful to keep control of things).
When too many registers are needed, local variables go into "local memory" (never into shared memory). Local memory is private to each thread, but it physically resides in device memory, so in this case it is really just another name for global memory (google "register spilling").
Register spilling is decided at compile time, based on the maximum number of registers per thread you allow (via the --maxrregcount option), or when nvcc decides you're using too many registers.
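As an illustration of local memory, here is a minimal sketch (the kernel and helper names are hypothetical). Scalars such as t and x will normally sit in registers, while the small per-thread array buf is indexed with a value only known at run time. Registers cannot be indexed dynamically, so the compiler will usually place buf in local memory. The exact placement depends on the GPU architecture and the nvcc version, so treat this as something to check rather than a guarantee: compiling with nvcc --ptxas-options=-v should report any lmem usage, and lowering --maxrregcount can push further values out of registers.

```cuda
#include <cuda_runtime.h>

// Hypothetical example kernel: each thread keeps a small private array.
__global__ void localArrayKernel(const float *in, const int *idx, float *out, int n)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;   // scalar: normally a register
    if (t >= n) return;

    float x = in[t];                                 // scalar: normally a register
    float buf[32];                                   // per-thread private array
    for (int i = 0; i < 32; ++i)
        buf[i] = x + (float)i;

    // Runtime index: registers cannot be addressed dynamically, so `buf`
    // usually ends up in local memory (check the lmem figure from ptxas).
    out[t] = buf[idx[t] & 31];
}

// Hypothetical host helper: copies inputs to the GPU, runs the kernel on
// n elements and copies the results back. Returns false on any CUDA error.
bool runLocalArray(const float *hIn, const int *hIdx, float *hOut, int n)
{
    float *dIn = nullptr, *dOut = nullptr;
    int *dIdx = nullptr;
    bool ok = cudaMalloc(&dIn, n * sizeof(float)) == cudaSuccess
           && cudaMalloc(&dIdx, n * sizeof(int)) == cudaSuccess
           && cudaMalloc(&dOut, n * sizeof(float)) == cudaSuccess;
    if (ok) {
        cudaMemcpy(dIn, hIn, n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dIdx, hIdx, n * sizeof(int), cudaMemcpyHostToDevice);
        const int threads = 128;
        localArrayKernel<<<(n + threads - 1) / threads, threads>>>(dIn, dIdx, dOut, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(hOut, dOut, n * sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dIn);   // cudaFree(nullptr) is a no-op
    cudaFree(dIdx);
    cudaFree(dOut);
    return ok;
}
```

With in[i] = i and idx[i] = 3*i, each out[i] should equal i + ((3*i) & 31); the result is the same wherever buf lives, only the speed differs.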
You can also keep your variables in shared memory, but only by doing it manually. This is sometimes a good strategy when you have shared memory to spare but are running out of registers (e.g. to achieve better occupancy).
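A minimal sketch of doing this by hand, assuming a fixed block size of 128 threads (the BLOCK_SIZE macro and the kernel and helper names are hypothetical): each thread owns one slot of a __shared__ array and keeps its accumulator there instead of in a dedicated register. Because no thread touches another thread's slot, no __syncthreads() is needed. The array costs 128 × 4 = 512 bytes of shared memory per block. nvcc is still free to cache the value in a register while it works on it, so check the ptxas report to see whether register usage actually drops.

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE 128   // hypothetical block size; the launch must match it

// Hypothetical kernel: the per-thread accumulator lives in a shared-memory
// slot owned by this thread rather than in a register of its own.
__global__ void sharedScratchKernel(const float *in, float *out, int n, int reps)
{
    __shared__ float acc[BLOCK_SIZE];          // one slot per thread in the block
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= n) return;                        // safe: no __syncthreads() below

    acc[threadIdx.x] = 0.0f;
    for (int r = 0; r < reps; ++r)
        acc[threadIdx.x] += in[t];             // read-modify-write of this thread's slot
    out[t] = acc[threadIdx.x];
}

// Hypothetical host helper: runs the kernel on n elements with BLOCK_SIZE
// threads per block. Returns false on any CUDA error.
bool runSharedScratch(const float *hIn, float *hOut, int n, int reps)
{
    float *dIn = nullptr, *dOut = nullptr;
    size_t bytes = n * sizeof(float);
    bool ok = cudaMalloc(&dIn, bytes) == cudaSuccess
           && cudaMalloc(&dOut, bytes) == cudaSuccess;
    if (ok) {
        cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice);
        int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
        sharedScratchKernel<<<blocks, BLOCK_SIZE>>>(dIn, dOut, n, reps);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dIn);   // cudaFree(nullptr) is a no-op
    cudaFree(dOut);
    return ok;
}
```

With in[i] = i and reps = 4, each out[i] should be 4*i.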
Look at the --ptxas-options=-v option: it reports the actual memory usage of your kernels (smem, lmem, cmem, registers).