I’m trying to learn how the GPU splits the computation between the multiprocessors.
I wrote a program with two kernels that run concurrently, and the Visual Profiler showed me the following results:
the kernels do indeed run concurrently (figure 1)
the distribution of kernel ker_1 across the SMs (figure 2)
the distribution of kernel ker_2 across the SMs (figure 3)
This means that together the kernels occupy more than 100% of the GPU's SMs, which should be impossible because the kernels run concurrently.
Furthermore, ker_2 has 32 blocks of 265 threads per block. According to the profiler, every thread uses 38 registers, so ker_2 should occupy about 50% of the hardware resources, not 100% as figure 3 claims (see figure 4).
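One way to cross-check the 50% estimate might be to ask the CUDA runtime how many blocks of this size can be resident on one SM and compare that capacity with the 32 blocks actually launched. The following is only a minimal sketch: `ker_2_placeholder` is a hypothetical stand-in (the real ker_2 would have to be substituted, since register usage depends on the kernel body), and device 0 and zero dynamic shared memory are assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for ker_2; replace with the real kernel,
// because the result depends on its actual register usage.
__global__ void ker_2_placeholder(float *data) {
    data[blockIdx.x * blockDim.x + threadIdx.x] *= 2.0f;
}

int main() {
    const int num_blocks = 32;          // launch configuration from the post
    const int threads_per_block = 265;  // as stated in the post

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {  // device 0 is an assumption
        fprintf(stderr, "cudaGetDeviceProperties failed\n");
        return 1;
    }

    // Maximum number of blocks of this size that can be resident on one SM,
    // given the kernel's register and shared-memory usage.
    int max_blocks_per_sm = 0;
    cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &max_blocks_per_sm, ker_2_placeholder, threads_per_block, 0);
    if (err != cudaSuccess || max_blocks_per_sm == 0) {
        fprintf(stderr, "occupancy query failed or the block does not fit on an SM\n");
        return 1;
    }

    // Per-SM occupancy: resident warps divided by the SM's warp limit.
    int warps_per_block = (threads_per_block + prop.warpSize - 1) / prop.warpSize;
    int max_warps_per_sm = prop.maxThreadsPerMultiProcessor / prop.warpSize;
    float occupancy = (float)(max_blocks_per_sm * warps_per_block) / max_warps_per_sm;

    // GPU-wide view: launched blocks versus the number that could be resident at once.
    int resident_capacity = max_blocks_per_sm * prop.multiProcessorCount;
    float fraction_used = (float)num_blocks / resident_capacity;

    printf("SMs: %d, max resident blocks per SM: %d\n",
           prop.multiProcessorCount, max_blocks_per_sm);
    printf("theoretical occupancy per SM: %.2f\n", occupancy);
    printf("launched blocks / resident capacity: %.2f\n", fraction_used);
    return 0;
}
```

This only gives the theoretical capacity; it says nothing about where the scheduler actually places the blocks.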
So how can I see the real distribution between the SMs?
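Independently of the profiler, one possible approach is to have each block record the ID of the SM it runs on by reading the PTX special register `%smid`. The sketch below assumes a hypothetical kernel `record_sm` and helper `get_smid` (neither is part of my program); the same recording could be added to ker_1 and ker_2. Note that PTX documents `%smid` as the location at the moment it is read, so it is a hint rather than a guarantee.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the ID of the SM the calling thread is
// running on at the moment of the read (PTX special register %smid).
__device__ unsigned int get_smid() {
    unsigned int smid;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

// Hypothetical kernel: thread 0 of every block stores the block's SM ID.
__global__ void record_sm(unsigned int *block_sm) {
    if (threadIdx.x == 0) {
        block_sm[blockIdx.x] = get_smid();
    }
}

int main() {
    const int num_blocks = 32;          // same launch shape as ker_2 in the post
    const int threads_per_block = 265;  // as stated in the post

    unsigned int h_block_sm[num_blocks];
    unsigned int *d_block_sm = nullptr;

    if (cudaMalloc(&d_block_sm, sizeof(h_block_sm)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    record_sm<<<num_blocks, threads_per_block>>>(d_block_sm);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_block_sm, d_block_sm, sizeof(h_block_sm),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_block_sm);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print the block-to-SM mapping observed for this launch.
    for (int b = 0; b < num_blocks; ++b) {
        printf("block %2d -> SM %u\n", b, h_block_sm[b]);
    }
    return 0;
}
```

Running such a recording in both concurrent kernels would show whether blocks of ker_1 and ker_2 share the same SMs at the same time.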