I’m trying to understand how the “cta launched” counter in the CUDA Visual Profiler works.
I wrote a small program with a block size of 16x16 threads.
My grid has a size of 10x10 (= 100 blocks).
According to the specifications, my graphics card (GeForce 8600M GT) has 4 multiprocessors (MPs).
When I run the program in Profiler.app (v1.1.7), the “cta launched” column shows “50”.
I know that this counter only reflects the activity of one MP, but 100 blocks divided by 4 MPs is “25”, not “50”. Where does the “50” come from?
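
To make my expectation concrete, here is a minimal host-side sketch of the arithmetic (plain C++, no CUDA calls). All names in it are hypothetical, and it assumes the blocks are spread evenly across the MPs, which may not be how the hardware or the counter actually behaves.

```cpp
// Host-side arithmetic only; nothing here touches the GPU.
// All names are made up for illustration.
struct Dim2 { int x; int y; };

// Number of threads in one block (block.x * block.y).
constexpr int threads_per_block(Dim2 block) { return block.x * block.y; }

// Number of blocks (CTAs) in the whole grid (grid.x * grid.y).
constexpr int total_blocks(Dim2 grid) { return grid.x * grid.y; }

// ASSUMPTION: blocks are distributed evenly across all multiprocessors.
constexpr int expected_blocks_per_mp(Dim2 grid, int mp_count) {
    return total_blocks(grid) / mp_count;
}

constexpr Dim2 kBlock{16, 16};  // 16x16 threads per block
constexpr Dim2 kGrid{10, 10};   // 10x10 blocks
constexpr int kMpCount = 4;     // GeForce 8600M GT, per the specifications

static_assert(threads_per_block(kBlock) == 256, "16x16 block has 256 threads");
static_assert(total_blocks(kGrid) == 100, "10x10 grid has 100 blocks");
static_assert(expected_blocks_per_mp(kGrid, kMpCount) == 25, "what I expected");
```

Under that assumption I would expect 25 per MP, so the reported 50 is twice what this sketch predicts.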