I ran a simple CUDA test program with a <<<256,256>>> execution configuration on a GTX 1080 (20 SMs, 128 cores per SM)
and Nsight shows that 160 of the 256 blocks are “running”, which means 160 * 256 = 40960 threads are running on 128 * 20 = 2560 CUDA cores.
The spec sheet gives the maximum number of threads per SM as 2048, so the result is consistent: 2048 * 20 = 160 * 256.
So my guess is that 2048 / 128 = 16 threads are “running” on each CUDA core, and I think “running” means the threads are actually active on, or occupying, a CUDA core.
Am I correct?