I know that certain compilation flags can be set to report shared memory and register usage per kernel. Along with the CUDA lore, I currently feed this information into my code via #defines, and the code uses it to compute thread and block dimensions and allocations at run time.
- Can a kernel's shared memory and register usage be obtained at run time, so that the dimensions can be calculated dynamically?
- Is this a customary part of the development flow for GPU / CUDA?
Basically, I am trying to adapt my code to make effective use of whatever CUDA-enabled device is available on a given machine, and as the code evolves, I don’t want to constantly reinsert these descriptions. I have figured out how to compute shared memory usage to the byte, but I am not sure how registers are consumed.
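To make the question concrete, here is a rough sketch of the kind of calculation I would like to drive from runtime values instead of #defines. It is plain host-side C++; the struct names, field names, and `pickBlockSize` are hypothetical placeholders, not part of any CUDA API. It also assumes registers are consumed strictly per thread with no allocation granularity, which may well be wrong, and `regsPerThread` is exactly the input I do not know how to obtain at run time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical per-device limits (placeholder values would be filled in per device).
struct DeviceLimits {
    int regsPerBlock;               // 32-bit registers available to one block
    std::size_t sharedMemPerBlock;  // shared memory available to one block, in bytes
    int maxThreadsPerBlock;         // hard upper bound on threads per block
    int warpSize;                   // threads per warp
};

// Hypothetical per-kernel usage; today these numbers come from compiler output via #defines.
struct KernelUsage {
    int regsPerThread;                 // registers used by each thread
    std::size_t staticSharedBytes;     // fixed shared memory per block
    std::size_t sharedBytesPerThread;  // shared memory that grows with block size
};

// Largest block size (a multiple of the warp size) that fits the register and
// shared memory budgets. Returns 0 if the kernel cannot fit at all.
// Assumption: no register allocation granularity or per-SM occupancy effects.
int pickBlockSize(const DeviceLimits& dev, const KernelUsage& k) {
    assert(dev.warpSize > 0);
    if (k.staticSharedBytes > dev.sharedMemPerBlock) {
        return 0;  // fixed shared memory alone exceeds the per-block budget
    }
    std::size_t limit = static_cast<std::size_t>(dev.maxThreadsPerBlock);
    if (k.regsPerThread > 0) {
        // Register budget: threads * regsPerThread must not exceed regsPerBlock.
        limit = std::min(limit,
                         static_cast<std::size_t>(dev.regsPerBlock / k.regsPerThread));
    }
    if (k.sharedBytesPerThread > 0) {
        // Shared memory budget left after the fixed portion is reserved.
        limit = std::min(limit,
                         (dev.sharedMemPerBlock - k.staticSharedBytes) / k.sharedBytesPerThread);
    }
    // Round down to a whole number of warps.
    const std::size_t warp = static_cast<std::size_t>(dev.warpSize);
    return static_cast<int>((limit / warp) * warp);
}
```

With made-up limits of 65536 registers, 49152 bytes of shared memory, 1024 threads, and a warp size of 32, a kernel using 80 registers per thread and no shared memory would be capped at 800 threads per block under these assumptions.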