Hello all, I'm not clear about this. According to the programming guide, each CUDA block can use at most 16 KB of shared memory (on compute capability 1.1 or earlier). Does that mean that, however much shared memory I use below 16 KB, the number of blocks that can execute concurrently equals the number of SMs on the CUDA device?
Where can I find this information?
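
To make the question concrete, here is a rough host-side sketch of the arithmetic as I understand it. It is not taken from the guide. The helper name `maxResidentBlocksPerSM` is hypothetical, and the per-SM limits (16 KB shared memory, 8 resident blocks, 768 resident threads) are my assumptions for compute capability 1.0/1.1 and should be checked against the programming guide's appendix. Register usage is ignored, as is the small amount of shared memory that kernel arguments take on these devices, so the result is only an upper bound.

```cpp
#include <algorithm>
#include <cassert>

// Assumed per-SM limits for compute capability 1.0/1.1.
// Verify these against the programming guide for the actual device.
constexpr int kSharedMemPerSM  = 16 * 1024; // bytes of shared memory per SM
constexpr int kMaxBlocksPerSM  = 8;         // resident blocks per SM
constexpr int kMaxThreadsPerSM = 768;       // resident threads per SM

// Hypothetical helper: upper bound on the blocks that can be resident on one SM,
// given the shared memory and thread count each block needs.
// Register limits are deliberately ignored in this sketch.
int maxResidentBlocksPerSM(int sharedBytesPerBlock, int threadsPerBlock) {
    assert(sharedBytesPerBlock >= 0 && threadsPerBlock > 0);

    // With no shared memory in use, shared memory does not limit the count.
    int bySharedMem = sharedBytesPerBlock > 0
                          ? kSharedMemPerSM / sharedBytesPerBlock
                          : kMaxBlocksPerSM;
    int byThreads = kMaxThreadsPerSM / threadsPerBlock;

    // The tightest of the three limits wins.
    return std::min({kMaxBlocksPerSM, bySharedMem, byThreads});
}
```

Under these assumptions, `maxResidentBlocksPerSM(4096, 64)` evaluates to 4, so a device with N SMs could hold up to 4N such blocks at once, whereas a block that uses the full 16 KB would leave room for only one block per SM.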