Local memory / register bank

Hello.

I’m not sure, but I think I heard somewhere that there’s a way to force static ‘local’ vectors to be kept in registers instead of global memory?

Does this possibility really exist?

John.

The trick I remember being mentioned in a thread a few months ago was to put the keyword volatile in front of variables that you don’t want to remain in registers.

A search for volatile in the forums might turn up the earlier discussion.

If you have a local array and you index into it with variable (at runtime) indices, then the local array will be in device memory. It will be cached on a Fermi device, but it’s still device memory.

If you index into a local array using only indices known at compile time, then the compiler might hold the elements in registers. It’s up to the compiler, based on its own register budget (especially if there’s a -maxrregcount limit).
Similarly, non-array local variables might also be held in registers, depending on the compiler’s mood.
You can examine the PTX to see whether a variable is in local memory or registers.
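
For what it's worth, here is a minimal sketch of the two cases. The function names `sum_static` and `pick_dynamic` are made up for illustration, and the `HD` macro is only an assumption so that the same file also builds as plain C++ on the host (under nvcc it expands to `__host__ __device__`). Whether the first array really ends up in registers is still the compiler's call; compiling with `nvcc -ptx` and looking for `.local` declarations in the output should show what it decided.

```cpp
// Hypothetical helper macro: device-callable under nvcc, plain C++ otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Every index below is a compile-time constant, so the compiler is free
// to keep v[0..3] in registers (it may still choose not to).
HD inline float sum_static(float a, float b, float c, float d) {
    float v[4];
    v[0] = a; v[1] = b; v[2] = c; v[3] = d;
    return v[0] + v[1] + v[2] + v[3];
}

// The index k is only known at run time, so on the GPU the array
// generally has to live in addressable local memory.
HD inline float pick_dynamic(float a, float b, float c, float d, int k) {
    float v[4] = {a, b, c, d};
    return v[k & 3];  // the mask keeps the index inside the array
}
```

Both functions can be called from a kernel or from host code; only the second one forces the array to be addressable.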

Is the CUDA compiler (nvcc) smart enough to unroll the following loop, or something like it (assuming there are enough resources per thread/block)?

#define MAX 10

for (int i = 0; i < MAX; i++) {
    vet[i] = …
}

#pragma unroll
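
A rough sketch of how that could look with the loop from the question. The function name `fill_and_sum` and the value `base + i` are placeholders of mine (the original post elides what is stored in `vet[i]`), and the `HD` macro is again just an assumption so the snippet also builds as plain C++, where a host compiler may ignore the pragma with a warning. With the trip count fixed at compile time, the unrolled loop only uses constant indices, which gives the compiler the chance to keep `vet` in registers, though it is not obliged to.

```cpp
// Hypothetical helper macro: device-callable under nvcc, plain C++ otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

#define MAX 10

HD inline int fill_and_sum(int base) {
    int vet[MAX];

    // Ask for full unrolling; MAX is known at compile time.
    #pragma unroll
    for (int i = 0; i < MAX; i++) {
        vet[i] = base + i;  // placeholder value, not from the original post
    }

    int total = 0;
    #pragma unroll
    for (int i = 0; i < MAX; i++) {
        total += vet[i];
    }
    return total;
}
```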
