Hi, I have been experimenting with workgroup size and got an unexpected result. The profiler reports a register ratio of 1 with all 32768 registers occupied. However, with 4 workgroups of 160 work-items active and each work-item using 38 registers, the total register usage should be 4 x 160 x 38 = 24320, not 32768. In fact, I assumed that 5 workgroups could be active at a time: total register usage would then be 30400, still within the limit, and that would be one more warp than with 3 x 256 work-items. Does anyone have an explanation for these weird results?
Kernel details : Grid size: 5284 x 1, Block size: 160 x 1 x 1
Register Ratio = 1 ( 32768 / 32768 ) [38 registers per thread]
Shared Memory Ratio = 0.75 ( 36864 / 49152 ) [8976 bytes per Block]
Active Blocks per SM = 4 : 8
Active threads per SM = 640 : 1536
Occupancy = 0.416667 ( 20 / 48 )
Achieved occupancy = 0.416667 (on 16 SMs)
Occupancy limiting factor = Block-Size
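To make my expectation explicit, here is a minimal Python sketch of the naive estimate I am using. It assumes every work-item gets exactly 38 registers with no rounding or allocation granularity, which may well not be how the hardware or the profiler counts. The function name is just illustrative.

```python
# Naive estimate: blocks x work-items per block x registers per work-item.
# Assumption: no rounding or allocation granularity is applied anywhere.
REGISTERS_PER_SM = 32768  # limit taken from the profiler output


def naive_register_usage(blocks, block_size, regs_per_thread):
    """Registers needed by `blocks` resident workgroups, without any rounding."""
    return blocks * block_size * regs_per_thread


for blocks in (4, 5):
    used = naive_register_usage(blocks, 160, 38)
    fits = used <= REGISTERS_PER_SM
    print(f"{blocks} x 160 x 38 = {used} (fits in {REGISTERS_PER_SM}: {fits})")
```

By this estimate both 4 and 5 workgroups fit (24320 and 30400 registers), which is why the reported 32768 surprises me.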
Now that I have looked at the 256 work-item example, the counts do not add up there either: 256 x 3 x 38 = 29184, not 30720. Are the remaining 1536 registers on holiday?
Kernel details : Grid size: 5408 x 1, Block size: 256 x 1 x 1
Register Ratio = 0.9375 ( 30720 / 32768 ) [38 registers per thread]
Shared Memory Ratio = 0.90625 ( 44544 / 49152 ) [14352 bytes per Block]
Active Blocks per SM = 3 : 8
Active threads per SM = 768 : 1536
Occupancy = 0.5 ( 24 / 48 )
Achieved occupancy = 0.5 (on 16 SMs)
Occupancy limiting factor = Registers , Shared-memory
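For comparison, a small Python sketch of the gap between the reported and the naively expected register counts in both runs, together with the warp counts behind the occupancy figures. It assumes a warp size of 32, which is consistent with 640 threads showing up as 20 warps. The helper names are hypothetical.

```python
WARP_SIZE = 32  # assumed; matches 640 threads -> 20 warps in the output


def warps(active_threads):
    """Number of whole warps covered by the active threads."""
    return active_threads // WARP_SIZE


def unexplained_registers(reported, blocks, block_size, regs_per_thread):
    """Reported register usage minus the naive blocks x size x regs estimate."""
    return reported - blocks * block_size * regs_per_thread


# (reported registers, active blocks, block size, registers per thread)
runs = {
    "160 work-items": (32768, 4, 160, 38),
    "256 work-items": (30720, 3, 256, 38),
}

for name, (reported, blocks, size, regs) in runs.items():
    gap = unexplained_registers(reported, blocks, size, regs)
    print(f"{name}: {warps(blocks * size)} warps, {gap} registers unaccounted for")
```

The warp counts (20 and 24) agree with the occupancy lines, so only the register totals disagree with my arithmetic: by 8448 in the first run and by 1536 in the second.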
Thanks