How can I find the register usage of my kernel?
I think this will help in figuring out the best block and grid dimensions for running this kernel.
My line of thought is like this:
- The number of threads (irrespective of the blocks) that one can run on a multi-processor depends on the register usage of the kernel.
a) For an MP that has 8192 registers, 512 threads using 16 registers
each will use up the entire register file (8192 / 16 = 512).
b) I would ideally like to place at least 2 blocks on this MP.
So, having 256 threads per block would be ideal in this scenario.
c) I would also know that 512 threads correspond to 16 warps. The remaining
8 warps of the MP are unused. They JUST CANNOT be used and they ARE
At this point, I can think about what I can do to optimize my kernel so that more threads fit on the multi-processor.
This is why I would like to know the register usage of the kernel and how it can be optimized to get the maximum concurrency.
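
To make the arithmetic in the steps above concrete, here is a rough sketch of the register-limited estimate. It is only a back-of-the-envelope model under stated assumptions: the function name `estimate_occupancy` and its parameters are made up for illustration, the limit of 24 warps per MP is inferred from the 16 + 8 warps mentioned above, and the model ignores shared memory usage, register allocation granularity, and any cap on the number of resident blocks per MP.

```python
WARP_SIZE = 32  # threads per warp


def estimate_occupancy(registers_per_mp, registers_per_thread,
                       threads_per_block, max_warps_per_mp):
    """Hypothetical helper: estimate resident blocks and warps on one MP.

    Only the register file and the warp limit are considered. Shared
    memory, allocation granularity and per-MP block caps are ignored.
    """
    # Warps needed by one block, rounded up to a whole warp.
    warps_per_block = -(-threads_per_block // WARP_SIZE)

    # How many whole blocks fit in the register file.
    registers_per_block = registers_per_thread * threads_per_block
    blocks_by_registers = registers_per_mp // registers_per_block

    # How many whole blocks fit within the warp limit.
    blocks_by_warps = max_warps_per_mp // warps_per_block

    resident_blocks = min(blocks_by_registers, blocks_by_warps)
    active_warps = resident_blocks * warps_per_block

    return {
        "resident_blocks": resident_blocks,
        "active_warps": active_warps,
        "unused_warps": max_warps_per_mp - active_warps,
    }


# The scenario from the post: 8192 registers, 16 registers per thread,
# 256 threads per block, 24 warps per MP (assumed from 16 + 8).
scenario = estimate_occupancy(8192, 16, 256, 24)
```

For the numbers in the post this gives 2 resident blocks, 16 active warps and 8 unused warps. Under the same simplified model, dropping the kernel to 10 registers per thread would let a third 256-thread block fit, which fills all 24 warps.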