I accelerated an image-processing application, and I would like to know how many threads actually execute at the same time.
I’m currently developing on the Tegra K1 SoC. The TK1 has 1 SMX, which contains 192 CUDA cores.
I’m processing a 1024x1024 image. I decided to create 1024 blocks of 1024 threads each, so that the number of threads equals the number of pixels:
-> gridSize(32, 32);  // 32 x 32 = 1024 blocks
-> blockSize(32, 32); // 32 x 32 = 1024 threads per block
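For context, the launch looks roughly like the sketch below. The kernel name `processImage` and the per-pixel inversion are placeholders chosen for illustration, not the real processing; only the grid and block dimensions come from the configuration above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel (hypothetical): one thread per pixel,
// inverts an 8-bit grayscale value in place.
__global__ void processImage(unsigned char *img, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        img[y * width + x] = 255 - img[y * width + x];
}

int main()
{
    const int width = 1024, height = 1024;

    // Device buffer standing in for the image, zero-initialized.
    unsigned char *d_img = nullptr;
    cudaMalloc(&d_img, width * height);
    cudaMemset(d_img, 0, width * height);

    dim3 blockSize(32, 32);                                    // 1024 threads per block
    dim3 gridSize(width / blockSize.x, height / blockSize.y);  // 32 x 32 = 1024 blocks

    processImage<<<gridSize, blockSize>>>(d_img, width, height);

    // Wait for the kernel and report any launch or execution error.
    cudaError_t err = cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(err));

    cudaFree(d_img);
    return 0;
}
```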
My understanding is that only 1 block can run at a time because there is only 1 SMX. The GPU can’t execute all 1024 threads at once because it has only 192 cores. Does the GPU execute 192 threads at a time while the others wait?
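One way to check these assumptions instead of guessing is to ask the runtime for the device limits. The sketch below is only an illustration: `dummyKernel` is a hypothetical stand-in for the real kernel, and device 0 is assumed to be the TK1 GPU. Note that it reports how many threads and blocks can be resident on a multiprocessor, which is a separate figure from the number of cores.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real image kernel; the occupancy query
// needs an actual kernel symbol to inspect its resource usage.
__global__ void dummyKernel(unsigned char *img)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    img[y * gridDim.x * blockDim.x + x] = 0;
}

int main()
{
    // Assumption: device 0 is the GPU of interest.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    printf("multiprocessors:             %d\n", prop.multiProcessorCount);
    printf("warp size:                   %d\n", prop.warpSize);
    printf("max threads per block:       %d\n", prop.maxThreadsPerBlock);
    printf("max resident threads per SM: %d\n", prop.maxThreadsPerMultiProcessor);

    // How many 32x32 blocks of this kernel can be resident on one SM
    // at the same time (no dynamic shared memory requested).
    int blocksPerSM = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, dummyKernel,
                                                  32 * 32, 0);
    printf("resident 32x32 blocks per SM: %d\n", blocksPerSM);

    return 0;
}
```

The printed values depend on the device and on the kernel's register and shared-memory usage, so the numbers for the real kernel may differ from those for this stand-in.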