How to monitor the number of SMs used by a CUDA kernel in an MPS SM-oversubscribed environment?

I have a question about monitoring the number of SMs used by a CUDA kernel in a Multi-Process Service (MPS) environment.
With a single CUDA context, the maximum number of SMs is static, so we can estimate the SM count available to each CUDA kernel.
Under an oversubscribed MPS environment, the maximum number of SMs per context is dynamic. This is because each context sets its own SM count at creation, and the sum across contexts can exceed the device, as in the following example.

My example case is as follows:

  Context A (10 SMs) + Context B (20 SMs) + Context C (30 SMs) > GPU device (40 SMs)
  (unit: SMs, set per context via CU_EXEC_AFFINITY_TYPE_SM_COUNT)

Each context runs multiple CUDA kernels, and the number of SMs actually used differs per kernel.

For example, assume the SM usage of the kernels in context C is as follows:

  C-1: 30 SMs
  C-2: 20 SMs
  C-3: 10 SMs

When we run context B in parallel, is there any method to monitor the number of SMs still available?
For example, while C-1 is running, can we know that only 10 SMs remain?

I want to monitor the number of SMs allocated to the currently running CUDA kernel, because the processing resource a kernel consumes is the number of SMs multiplied by its execution time.
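To make the question concrete, here is a minimal sketch of the bookkeeping I would like to do. As far as I know the driver exposes no "remaining SMs" query, so this assumes each client could somehow publish the SM count of its currently running kernel; `DEVICE_SMS` and `remaining_sms` are hypothetical names, not any real API.

```python
# Hypothetical user-level bookkeeping: each MPS client publishes the SM
# count of its currently running kernel, and a monitor sums them up.
DEVICE_SMS = 40  # total SMs on the device (the "40" in the example above)

def remaining_sms(active_allocations):
    """SMs still free, given the SM counts of currently running kernels.

    active_allocations: dict mapping kernel name -> SMs it occupies.
    Clamped at 0 because oversubscription can push the sum past the device.
    """
    used = sum(active_allocations.values())
    return max(DEVICE_SMS - used, 0)

# Phase C-1 from the example: kernel C-1 occupies 30 of 40 SMs,
# so a kernel launched in parallel from context B can get at most 10.
print(remaining_sms({"C-1": 30}))           # -> 10
# If A (10 SMs) also runs alongside C-1 (30 SMs), nothing is left.
print(remaining_sms({"A": 10, "C-1": 30}))  # -> 0
```

This is exactly the quantity I would like to read out at runtime instead of tracking it by hand.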

Related API: cuCtxCreate_v3 / cuCtxCreate_v4 with CU_EXEC_AFFINITY_TYPE_SM_COUNT

cuCtxCreate_v4:
https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1gd84cbb0ad9470d66dc55e0830d56ef4d
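For reference, this is how I set the per-context SM limit. A sketch assuming CUDA 11.4+ and a GPU (error checks omitted): it creates a context requesting 10 SMs and queries back what was granted via cuCtxGetExecAffinity. Note that this only reports the current context's own allocation, not how many SMs remain free device-wide, which is the part I am missing.

```c
/* Create a context limited to 10 SMs and read back the granted affinity.
   Compile with: gcc sm_affinity.c -lcuda  (error checking omitted) */
#include <cuda.h>
#include <stdio.h>

int main(void) {
    CUdevice dev;
    CUcontext ctx;
    cuInit(0);
    cuDeviceGet(&dev, 0);

    /* Request 10 SMs for this context (context A in the example).
       Under MPS the driver may grant a different count. */
    CUexecAffinityParam req;
    req.type = CU_EXEC_AFFINITY_TYPE_SM_COUNT;
    req.param.smCount.val = 10;
    cuCtxCreate_v3(&ctx, &req, 1, 0, dev);

    /* Query what this context actually received. This reports only the
       current context's own SM allocation; I have found no driver API
       that reports the SMs remaining free on the device. */
    CUexecAffinityParam got;
    cuCtxGetExecAffinity(&got, CU_EXEC_AFFINITY_TYPE_SM_COUNT);
    printf("SMs granted to this context: %u\n", got.param.smCount.val);

    cuCtxDestroy(ctx);
    return 0;
}
```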