Measuring GPU streaming multiprocessor (SM) utilization while running games

Is there any way to measure the occupancy of a GPU's SMs while running games, along with the number of blocks and warps? For CUDA applications, such as those in benchmark suites like Parboil2 and Rodinia2, we can measure it with a profiler like nvprof (quite well known).
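To be concrete about the metric I mean, here is a minimal sketch. It assumes occupancy is the ratio of warps resident on an SM to the maximum number of warps the SM supports. The function names and all the numbers (4 blocks per SM, 256 threads per block, 64 warp slots) are made up for illustration and are not measurements from any real GPU.

```python
def warps_per_block(threads_per_block: int, warp_size: int = 32) -> int:
    """Number of warps needed for one block (ceiling division)."""
    return -(-threads_per_block // warp_size)


def occupancy(active_warps_per_sm: int, max_warps_per_sm: int) -> float:
    """Fraction of an SM's warp slots that are occupied."""
    if max_warps_per_sm <= 0:
        raise ValueError("max_warps_per_sm must be positive")
    return active_warps_per_sm / max_warps_per_sm


# Illustrative values only (assumptions, not taken from a real device).
blocks_per_sm = 4
threads_per_block = 256
max_warps_per_sm = 64

active_warps = blocks_per_sm * warps_per_block(threads_per_block)
print(occupancy(active_warps, max_warps_per_sm))
```

What I am after is these same quantities (resident blocks, resident warps, and the resulting ratio) for the shader workloads a game launches, rather than for CUDA kernels.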

Just as we use nvprof to profile .cu programs during execution, can we do the same while running multiple games?

Can anyone help with this?