Could someone explain, or post an example of, how to use the memory usage output from nvcc's --ptxas-options=-v to guide optimization efforts?
I see it lists register usage, shared memory usage, and constant memory usage for each of my kernels. Perhaps not coincidentally, my slowest kernel has the highest register count. How do I know how many registers is too many? And what do smem values such as “32+16” mean?
A possibly related question: some of my kernels have signatures like this:
__global__ void foo(const Complex * ptrA, float * ptrB);
foo() is called hundreds of times, and the arguments are the same on every call. Does this mean it’s a good idea to move ptrA and ptrB to device constant memory and omit the arguments? Would this reduce my register usage (at the expense of more cmem being used)? Could this improve performance?
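To make the second question concrete, here is a rough sketch of the two variants I have in mind. It is only an illustration, not my real code: the Complex layout (two floats), the kernel bodies, the names foo_args, foo_const, c_ptrA, c_ptrB, and the size N are all placeholders I made up for this post. It also uses cudaFuncGetAttributes to print the per-kernel register, smem, and cmem numbers at run time, which I assume should correspond to what ptxas reports.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumption: Complex is a plain pair of floats (the real type is not shown in this post).
struct Complex { float re, im; };

// Hypothetical problem size, fixed at compile time so neither kernel needs a size argument.
constexpr int N = 1024;

// Variant 1: the pointers are passed as kernel arguments on every launch.
__global__ void foo_args(const Complex *ptrA, float *ptrB)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)
        ptrB[i] = ptrA[i].re * ptrA[i].re + ptrA[i].im * ptrA[i].im;
}

// Variant 2: the same pointer values are stored once in device constant memory.
__constant__ const Complex *c_ptrA;
__constant__ float *c_ptrB;

__global__ void foo_const()
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)
        c_ptrB[i] = c_ptrA[i].re * c_ptrA[i].re + c_ptrA[i].im * c_ptrA[i].im;
}

// Print the resource usage the runtime reports for a kernel.
template <typename Kernel>
void report(const char *name, Kernel kernel)
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, kernel) == cudaSuccess)
        printf("%s: %d registers, %zu bytes smem, %zu bytes cmem\n",
               name, attr.numRegs, attr.sharedSizeBytes, attr.constSizeBytes);
}

int main()
{
    Complex *dA = nullptr;
    float *dB = nullptr;
    cudaMalloc(&dA, N * sizeof(Complex));
    cudaMalloc(&dB, N * sizeof(float));
    cudaMemset(dA, 0, N * sizeof(Complex));

    // Copy the pointer values themselves (not the data they point to)
    // into the __constant__ symbols, once, before the launch loop.
    const Complex *hA = dA;
    cudaMemcpyToSymbol(c_ptrA, &hA, sizeof(hA));
    cudaMemcpyToSymbol(c_ptrB, &dB, sizeof(dB));

    const int threads = 256;
    const int blocks = (N + threads - 1) / threads;
    for (int call = 0; call < 100; ++call) {
        foo_args<<<blocks, threads>>>(dA, dB);  // arguments repeated on each call
        foo_const<<<blocks, threads>>>();       // no arguments at all
    }
    cudaDeviceSynchronize();

    report("foo_args", foo_args);
    report("foo_const", foo_const);

    cudaFree(dA);
    cudaFree(dB);
    return 0;
}
```

I would build this with something like nvcc --ptxas-options=-v sketch.cu (the file name is arbitrary) and compare the numbers for the two kernels. What I can't tell is whether any difference in register count between the variants would actually translate into a speedup.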