Preemption and occupancy values

Hello, is there any way to get the actual occupancy on a V100 GPU after a kernel is launched? For instance, I launch kernel<<<X,Y,…>>>(…) and would like to know how many blocks out of X are actually running in parallel after the launch, rather than just calculating the occupancy beforehand. Is that possible from the CUDA API on the host?

Another question is about preemption. Is it possible on a V100? That is, can a running block be preempted so that a waiting block runs before the previous one finishes?

Run your code under the Visual Profiler to see both theoretical and achieved occupancy.
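For example, assuming your application binary is called ./app (a hypothetical name), achieved occupancy can also be collected from the command line with nvprof, which underlies the Visual Profiler:

```shell
# Report per-kernel achieved occupancy (requires a CUDA-capable GPU).
# nvprof covers architectures up through Volta; later GPUs use Nsight Compute.
nvprof --metrics achieved_occupancy ./app
```

The reported value is an average over the kernel's execution, not an instantaneous block count.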

Devices since Pascal support preemption. However, for performance reasons it is used only sparingly, for example during single-GPU debugging.

I would actually like to find it via the API, not via external tools. Would that be possible?

There isn’t any ordinary CUDA API (e.g. the CUDA Runtime API or the CUDA Driver API) that will tell you this.
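What the runtime API *can* report is the theoretical upper bound on resident blocks, via cudaOccupancyMaxActiveBlocksPerMultiprocessor. A minimal sketch (the kernel here is a hypothetical stand-in for yours):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Dummy kernel standing in for the real one (hypothetical)
__global__ void kernel(float *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = 0.0f;
}

int main() {
    // Theoretical maximum resident blocks per SM for a given
    // block size and dynamic shared-memory usage.
    int blocksPerSM = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocksPerSM, kernel, /*blockSize=*/256, /*dynamicSMemSize=*/0);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Upper bound on blocks resident across the whole GPU; the
    // instantaneous count during execution may be lower and is not
    // exposed by the Runtime or Driver API.
    printf("Max resident blocks: %d per SM x %d SMs = %d\n",
           blocksPerSM, prop.multiProcessorCount,
           blocksPerSM * prop.multiProcessorCount);
    return 0;
}
```

This is an upper bound computed from resource usage, not a measurement of what is actually running.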

If you want to explore CUPTI, it might be possible, but I don’t have a recipe for you. This:

“I would like to know how many blocks out of X are actually running in parallel after the launch”

is a number that can change from one instant to the next. It will generally vary over the duration of the kernel's execution.
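If you still want an empirical number, one rough sketch (my own workaround, not an official technique) is to have each block increment a global counter on entry and decrement it on exit, tracking the observed peak with atomics:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int g_active = 0;  // blocks currently resident
__device__ int g_peak = 0;    // peak concurrency observed

__global__ void kernel(float *out, int n) {
    // Thread 0 of each block registers the block as "running".
    if (threadIdx.x == 0) {
        int now = atomicAdd(&g_active, 1) + 1;
        atomicMax(&g_peak, now);
    }
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 0.0f;  // the real work would go here

    __syncthreads();
    if (threadIdx.x == 0) atomicSub(&g_active, 1);
}

int main() {
    const int n = 1 << 20;
    float *out;
    cudaMalloc(&out, n * sizeof(float));

    kernel<<<(n + 255) / 256, 256>>>(out, n);
    cudaDeviceSynchronize();

    int peak = 0;
    cudaMemcpyFromSymbol(&peak, g_peak, sizeof(int));
    printf("Peak concurrently running blocks: %d\n", peak);

    cudaFree(out);
    return 0;
}
```

Be aware the atomics add overhead and can perturb scheduling, and the peak is still just one sample of a quantity that fluctuates throughout the launch.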