Hello, is there any way to get the actual occupancy on a V100 GPU after a kernel is launched? For instance, I launch kernel<<<X,Y,…>>>(…), and I would like to know how many of the X blocks are actually running in parallel after the launch, rather than just calculating the theoretical occupancy beforehand. Is that possible from the CUDA API on the host?
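To illustrate what I mean, here is a rough device-side sketch of the kind of measurement I am after (the kernel and the instrumentation are hypothetical, not something I have working): each block atomically increments a global counter on entry, records the high-water mark, and decrements on exit, so the observed maximum approximates how many blocks were resident at the same time.

```cuda
#include <cstdio>

// Hypothetical instrumentation, just to show the measurement I want:
// g_active tracks blocks currently running, g_max_active the high-water mark.
__device__ int g_active = 0;
__device__ int g_max_active = 0;

__global__ void kernel(/* ... */)
{
    if (threadIdx.x == 0) {
        int now = atomicAdd(&g_active, 1) + 1;  // this block is now resident
        atomicMax(&g_max_active, now);          // record the peak
    }
    __syncthreads();

    // ... actual work of the kernel ...

    __syncthreads();
    if (threadIdx.x == 0)
        atomicSub(&g_active, 1);                // this block is done
}

int main()
{
    kernel<<<1024, 256>>>();
    cudaDeviceSynchronize();

    int max_active = 0;
    cudaMemcpyFromSymbol(&max_active, g_max_active, sizeof(int));
    printf("max concurrently resident blocks: %d\n", max_active);
    return 0;
}
```

This only gives an estimate (blocks may retire and be replaced between reads), which is why I am asking whether the CUDA API exposes this directly from the host instead.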
The other question is about preemption. Is it possible on a V100? That is, can a running block be preempted so that another waiting block starts executing before the previous one finishes?