Is there any tool that can show the actual workload of the GPU multiprocessors?
I know that I can see the occupancy in the CUDA Visual Profiler, but that value is computed on a per-warp basis, which is not exactly what I would like to see.
It is possible, for example, that one thread in a block needs a lot more time than the others. In that case the multiprocessor would have a lot of idle time, and that is the information I would like to see.
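To make the scenario concrete, here is a minimal sketch of the kind of imbalance I mean. The kernel name imbalancedKernel, the iteration counts, and the launch configuration are all made up for illustration; the only device feature it relies on is clock64(), which is not available on the oldest GPUs. Each thread records how many clock cycles its own loop took, and the host prints the shortest and longest thread per block. This is only a rough manual measurement, not a replacement for a profiler counter.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: thread 0 of every block does far more work than the others.
__global__ void imbalancedKernel(long long *cycles, float *sink)
{
    long long start = clock64();

    // Assumed iteration counts, chosen only to exaggerate the imbalance.
    int iters = (threadIdx.x == 0) ? 100000 : 1000;
    float acc = 0.0f;
    for (int i = 0; i < iters; ++i)
        acc = acc * 0.999f + 1.0f;

    long long stop = clock64();

    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    cycles[gid] = stop - start;  // cycles this thread spent in the loop
    sink[gid] = acc;             // store the result so the loop is not optimized away
}

int main()
{
    const int blocks = 4, threads = 64, n = blocks * threads;

    long long *dCycles = nullptr;
    float *dSink = nullptr;
    cudaMalloc(&dCycles, n * sizeof(long long));
    cudaMalloc(&dSink, n * sizeof(float));

    imbalancedKernel<<<blocks, threads>>>(dCycles, dSink);

    // cudaMemcpy waits for the kernel to finish before copying.
    std::vector<long long> hCycles(n);
    cudaError_t err = cudaMemcpy(hCycles.data(), dCycles, n * sizeof(long long),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Report the fastest and slowest thread of each block.
    for (int b = 0; b < blocks; ++b) {
        long long lo = hCycles[b * threads], hi = lo;
        for (int t = 1; t < threads; ++t) {
            long long c = hCycles[b * threads + t];
            if (c < lo) lo = c;
            if (c > hi) hi = c;
        }
        std::printf("block %d: min %lld cycles, max %lld cycles\n", b, lo, hi);
    }

    cudaFree(dCycles);
    cudaFree(dSink);
    return 0;
}
```

Depending on the architecture, the other threads in the same warp as thread 0 may report the long time as well, because a diverged warp may only reach the second clock64() after it has reconverged. The gap between the minimum and maximum per block is the idle time I would like a tool to show me directly, without instrumenting the kernel by hand.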
Thanks for your help
Chris