Register usage of my program (to optimize its scheduling)

How can I find the register usage of my kernel?

I think this will help in figuring out the best block and grid dimensions for running this kernel.

My line of thought is like this:

  1. The number of threads (irrespective of the blocks) that one can run on a multi-processor depends on the register usage of the kernel.
    a) For an MP that has 8192 registers, it will take 512 threads using 16 registers
    each to use up the whole register file.
    b) I would ideally like to place at least 2 blocks on this MP.
    So, having 256 threads per block would be ideal in this scenario.
    c) I would also know that 512 threads correspond to 16 warps. The remaining
    8 warps of the MP are unused. They simply cannot be used and are
    being wasted.
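The arithmetic in the list above can be sketched as a small host-side C++ helper. This is only a rough estimate under stated assumptions: the 8192 registers and the 24-warp limit (16 used plus 8 unused) come from the numbers in this post, the warp size of 32 follows from 512 threads being 16 warps, and the names `MpLimits`, `Occupancy` and `estimate_occupancy` are made up for illustration. It ignores shared memory, register allocation granularity and any per-MP block limit, which the occupancy calculator does take into account.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical description of one multiprocessor (values supplied by the caller).
struct MpLimits {
    int registers_per_mp;   // e.g. 8192, as in the post
    int max_warps_per_mp;   // e.g. 24 (16 used + 8 unused in the post)
    int warp_size;          // e.g. 32
};

// Hypothetical result record for the estimate.
struct Occupancy {
    int resident_blocks;    // whole blocks that fit on the MP
    int resident_threads;   // threads in those blocks
    int resident_warps;     // warps in those blocks
    int idle_warps;         // warp slots left unused
};

// Simplified estimate: only registers and the warp limit are considered.
Occupancy estimate_occupancy(const MpLimits& mp, int regs_per_thread,
                             int threads_per_block) {
    assert(regs_per_thread > 0 && threads_per_block > 0);

    // Threads the register file can hold, then whole blocks out of those.
    int threads_by_regs = mp.registers_per_mp / regs_per_thread;
    int blocks_by_regs = threads_by_regs / threads_per_block;

    // Warps needed per block (rounded up), then whole blocks by warp limit.
    int warps_per_block = (threads_per_block + mp.warp_size - 1) / mp.warp_size;
    int blocks_by_warps = mp.max_warps_per_mp / warps_per_block;

    Occupancy result;
    result.resident_blocks = std::min(blocks_by_regs, blocks_by_warps);
    result.resident_threads = result.resident_blocks * threads_per_block;
    result.resident_warps = result.resident_blocks * warps_per_block;
    result.idle_warps = mp.max_warps_per_mp - result.resident_warps;
    return result;
}
```

Calling `estimate_occupancy(MpLimits{8192, 24, 32}, 16, 256)` reproduces the case described above: the register file limits the MP to 2 blocks of 256 threads, which is 16 warps, leaving 8 warp slots idle.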

At this point, I can think of what I can do to optimize my kernel so that I can stuff in more threads inside the multi-processor.

This is why I would like to know the register usage of the kernel and how it can be optimized to get the maximum concurrency.


See the sticky

Don’t worry too much about the wasted warps. You can easily max out the performance of your kernel with only 50% warp occupancy.

Nice. So, how do I find the “registers per thread”?

Yes, I found that out. It's there on the same CUDA occupancy page. Thanks

Did you check out how to use the occupancy calculator?

I think you can find out there.

Yes, I did, Mr. Bangalore, and I found out how to get my register usage from that post by Mark Harris. That is what I had posted above.

For using the CUDA occupancy calculator, you first need to know your register usage. The XLS sheet does NOT do anything fancy. You need to feed in the right data for it.

Harris’ post basically says that you need to use the “-cubin” option to generate the cubin file that has the info about your program. Just look for the “registers=xxx” line and you will know how many registers you are using.

Alternatively, my own approach: use the -keep option and count the registers in the generated PTX assembly file. :D

dlmeetei - (Dalai Lama in a meeting?? huh…) ,

I'm from Bangalore too, man. Nice to know there's another guy out there.