According to the occupancy calculator, the total kernel register usage is defined as:
Registers = CEILING(CEILING(MyWarpsPerBlock,myWarpAllocationGranularity)*MyRegCount*32,myAllocationSize)
I’d have thought it should simply be:
Registers = MyWarpsPerBlock*32*MyRegCount
As such, I have 2 questions:
1. What is ‘warp allocation granularity’ and why is it 2?
2. What is ‘allocation size’? Why does the register usage need to be a multiple of 512?
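
To make the gap between the two formulas concrete, here is a minimal Python sketch of both. It assumes the spreadsheet's CEILING rounds up to the nearest multiple of its second argument, and it uses the granularity of 2 and allocation size of 512 mentioned above. The function names and the example kernel (5 warps per block, 21 registers per thread) are made up purely for illustration.

```python
def ceiling(value, significance):
    """Round value up to the nearest multiple of significance,
    mirroring the spreadsheet CEILING function for positive integers."""
    return -(-value // significance) * significance


def calculator_registers(warps_per_block, regs_per_thread,
                         warp_granularity=2, allocation_size=512):
    """Register usage as the occupancy calculator formula computes it.
    The defaults (2 and 512) are the values quoted in the questions."""
    # Warps are first rounded up to the warp allocation granularity.
    warps = ceiling(warps_per_block, warp_granularity)
    # The register total is then rounded up to the allocation size.
    return ceiling(warps * regs_per_thread * 32, allocation_size)


def naive_registers(warps_per_block, regs_per_thread):
    """The simple product I expected: warps * 32 threads * registers."""
    return warps_per_block * 32 * regs_per_thread


if __name__ == "__main__":
    # Hypothetical kernel: 5 warps per block, 21 registers per thread.
    print(naive_registers(5, 21))       # 3360
    print(calculator_registers(5, 21))  # 4096
```

With these made-up numbers the calculator's formula reports 736 more registers than the simple product, and that difference is what the two questions are about.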