Hi, I’m not familiar with the underlying design of the memory-access hardware in the A100.
I’m curious whether the theoretical global memory bandwidth (i.e., 1555 GB/s) can be achieved using only one SM.
Also, what does the memory bandwidth look like for each SM when multiple SMs are active during kernel execution? Do they access memory without contending with each other, or do they share the same bus for memory requests (meaning that bandwidth available to one SM can be taken by other SMs)?
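
In case it helps frame the question, here is a minimal sketch of how I would try to measure this myself. It is only an assumption about a reasonable test, not a validated benchmark: the names `readKernel`, `measure`, and `gbPerSec` are made up for illustration, error checking is mostly omitted, and launching N blocks of 1024 threads with N no larger than the SM count is assumed (not guaranteed) to place one block on each of N different SMs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Converts a byte count and an elapsed time in milliseconds to GB/s (1 GB = 1e9 bytes).
static double gbPerSec(size_t bytes, float ms) {
    return (double)bytes / 1e9 / ((double)ms / 1e3);
}

// Each thread streams through the buffer with a grid-stride loop and
// accumulates a sum so the compiler cannot discard the loads.
__global__ void readKernel(const float4* __restrict__ in, size_t n, float* out) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    float acc = 0.0f;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        float4 v = in[i];
        acc += v.x + v.y + v.z + v.w;
    }
    // Data-dependent store keeps the loads live; the buffer is all zeros,
    // so this branch is not expected to be taken.
    if (acc == 12345.0f) *out = acc;
}

// Times one read pass over the buffer using `blocks` thread blocks.
static float measure(int blocks, const float4* in, size_t n, float* out) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    readKernel<<<blocks, 1024>>>(in, n, out);   // warm-up launch
    cudaEventRecord(start);
    readKernel<<<blocks, 1024>>>(in, n, out);   // timed launch
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const size_t bytes = (size_t)1 << 30;       // 1 GiB, chosen to be far larger than the L2 cache
    const size_t n = bytes / sizeof(float4);
    float4* in = nullptr;
    float* out = nullptr;
    if (cudaMalloc(&in, bytes) != cudaSuccess ||
        cudaMalloc(&out, sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(in, 0, bytes);

    int numSMs = 0;
    cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, 0);

    // Sweep the block count: 1, 2, 4, ... and finally one block per SM.
    for (int blocks = 1; blocks < numSMs; blocks *= 2) {
        float ms = measure(blocks, in, n, out);
        printf("%3d block(s): %8.1f GB/s\n", blocks, gbPerSec(bytes, ms));
    }
    float ms = measure(numSMs, in, n, out);
    printf("%3d block(s): %8.1f GB/s\n", numSMs, gbPerSec(bytes, ms));

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

If the number reported for a single block is far below 1555 GB/s and grows as more blocks are added, that would suggest a single SM cannot saturate the memory system on its own, but I would still like to understand the hardware reason behind it.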