Hi! I am new to CUDA. What decides the maximum number of threads per block for a GPU? Why hasn't there been any increase in the maximum limit with increasing compute capability?
Only a GPU architect could answer that authoritatively, and they don’t normally frequent this forum.
However, practical programming experience with CUDA would seem to indicate that allowing larger thread blocks does not provide a significant benefit. At the same time, increasing the maximum block size would increase hardware complexity, whereas the overarching goal of GPU hardware design is to keep individual processing elements simple while providing more of them. In other words, with high likelihood, the trade-offs simply do not justify a larger maximum block size.
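Note that you don't have to guess at the limit: it is exposed as a device property that can be queried at run time. A minimal sketch (assuming a single GPU at device ordinal 0):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    // Query the properties of device 0 (assumed to be present)
    cudaGetDeviceProperties(&prop, 0);

    // maxThreadsPerBlock has been 1024 on every architecture
    // since compute capability 2.0 (it was 512 on 1.x devices)
    printf("maxThreadsPerBlock: %d\n", prop.maxThreadsPerBlock);
    printf("max block dims: %d x %d x %d\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    return 0;
}
```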
From a performance perspective, the GPU hardware is usually utilized most efficiently when using a smaller granularity, i.e. medium-sized blocks, and simply launching enough of them to cover the data. A reasonable starting point when designing CUDA code is to plan for 128 to 256 threads per block, then adjust up or down only where a use case requires it; see the sketch below.
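To make "enough medium-sized blocks to cover the data" concrete, here is a sketch of the standard launch idiom. The kernel `scale` and the problem size `n` are just placeholders for illustration, not anything prescribed:

```cpp
#include <cuda_runtime.h>

// Hypothetical example kernel: scales each element of x by a.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                      // guard: the rounded-up grid may overshoot n
        x[i] = a * x[i];
}

int main() {
    int n = 1 << 20;                // example problem size (assumption)
    float *d_x;
    cudaMalloc(&d_x, n * sizeof(float));

    int blockSize = 256;            // medium-sized block, per the advice above
    // Round up so the grid covers all n elements even when n % blockSize != 0
    int gridSize = (n + blockSize - 1) / blockSize;
    scale<<<gridSize, blockSize>>>(d_x, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d_x);
    return 0;
}
```

The point of the idiom is that the block size stays fixed at a hardware-friendly value while the grid size scales with the data, which is exactly why a larger maximum block size buys you little in practice.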