I want to access (read/write) a local (per-thread) variable from anywhere inside my kernel, i.e. both from inside the global function and from any device functions it happens to call. In other words, I want a local variable with global scope. The variable in question is the per-thread seed for my random number generator.
I suppose I could do something like
__device__ float foo(int*seed,...)
{
...
}
called like this:
foo(&seed,...)
but the pointer referencing/dereferencing disturbs me. Taken literally, it would force a write to addressable local memory (read: slow) followed by a read from the same location (read: slow, plus latency from the dependency on the write). Is the compiler smart enough to optimize the referencing out?
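For concreteness, here is a minimal sketch of the pointer-passing pattern being asked about. It assumes a simple linear congruential generator; the constants, the unsigned seed type (the post uses int, this uses an unsigned type to avoid signed overflow), the body of foo() and the draw_two() helper are all hypothetical and not from the original post. The CUDA qualifiers are defined away when nvcc is not the compiler, so the same code can also be checked on the host.

```cpp
#include <cstdint>

// When not compiled by nvcc, make the CUDA qualifiers no-ops so this
// sketch can also be built and checked as plain host C++.
#ifndef __CUDACC__
#define __device__
#define __host__
#endif

// Hypothetical stand-in for foo(): advances the per-thread seed in place
// and returns a float in [0, 1). The LCG constants are illustrative only.
__host__ __device__ inline float foo(std::uint32_t *seed)
{
    *seed = *seed * 1664525u + 1013904223u;      // update persists in the caller
    return (*seed >> 8) * (1.0f / 16777216.0f);  // top 24 bits -> [0, 1)
}

// Hypothetical caller: the seed lives in a local (per-thread) variable
// and is handed down by address, as in foo(&seed, ...).
__host__ __device__ inline float draw_two(std::uint32_t *seed)
{
    float a = foo(seed);
    float b = foo(seed);  // sees the seed left behind by the first call
    return a + b;
}
```

Whether the address-of actually ends up as a local-memory round trip depends on what the compiler does with the call, which is exactly the open question here; the generated PTX is the place to check.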
Short answer: no, the compiler is not smart enough. Just compile with --keep (or --ptx) and look at the generated PTX. Humorous answer: calling NVCC a compiler is like calling an AMC Gremlin a monster truck. Think of it as an assembler.
Long answer: when you write your code, be explicit about temporary calculations, variable types, pointers, etc. For example:
do:
i = (j * 4) % 6;
a[i] = b[j];
a[i + 32] = b[j + 32];
do not:
a[(j * 4) % 6] = b[j];
a[((j * 4) % 6) + 32] = b[j + 32];
do:
for(i = 0; i < 32; i++)
a[i] = b[i];
do not (unrolls but does explicit increments at each step of the unroll):
for(i = 0; i < 32; i++)
*(a++) = *(b++);
It doesn't appear to use tri-graphs or any form of common statement collapsing (example #1), its loop unrolling is somewhat lacking (example #2), and it is missing many other things found in compilers from the last decade. Play around with various code and inspect the generated PTX (nvcc --keep or --ptx), and you'll get the hang of how to tweak performance.
The problem with this is that I want to continue reseeding the RNG each time I call it. Otherwise, it won’t be very random! Thus, seed needs to retain whatever changes happened to it in foo() since it will certainly be used again later.
Thankfully, in this case, foo() is heavyweight enough that the cost of the pointer reference is completely hidden.
I suppose that I could make foo() a macro, though. The only real reason I have it as a function is for code readability.
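A rough sketch of what the macro variant might look like, again assuming a hypothetical LCG update (the FOO name, the constants and the sum_of_two_draws() helper are illustrative only, not from the original post). Because the macro names the seed variable directly, no address of the local is ever taken:

```cpp
#include <cstdint>

// When not compiled by nvcc, make the CUDA qualifiers no-ops so this
// sketch can also be built and checked as plain host C++.
#ifndef __CUDACC__
#define __device__
#define __host__
#endif

// Hypothetical macro form of foo(): updates the named seed variable in
// place, then yields a float in [0, 1). The argument is evaluated more
// than once, so it must be a plain variable with no side effects.
#define FOO(seed) \
    ((seed) = (seed) * 1664525u + 1013904223u, \
     ((seed) >> 8) * (1.0f / 16777216.0f))

// Hypothetical caller: seed is an ordinary local, modified directly.
__host__ __device__ inline float sum_of_two_draws(std::uint32_t seed)
{
    float a = FOO(seed);
    float b = FOO(seed);  // uses the seed as updated by the first expansion
    return a + b;
}
```

The price is the usual one for macros: no type checking, and readability suffers once the body grows, which is the reason given above for keeping foo() a function.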