Question about block scheduling on an NVIDIA GPU's SMs

I am launching a kernel (in OpenCL, but I assume SM scheduling works the same as in CUDA) and I observe that during kernel execution the SM utilization percentage looks something like this: 12% 12% 12% 25% 25% 4% 4% 4% 4% 4%.

These are 10 samples of SM utilization reported by "nvidia-smi dmon -c 1 -s u" at different times while the OpenCL kernel is running.
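For context, here is a minimal Python sketch of how such samples can be pulled out of the dmon text output. The column layout (gpu, sm, mem, enc, dec) is an assumption based on typical `nvidia-smi dmon -s u` output and may differ across driver versions; the sample text below is illustrative, not my actual output.

```python
# Sketch: extract the SM-utilization column from `nvidia-smi dmon -s u` text.
# Assumes the usual layout: two '#' header lines, then rows of
#   gpu  sm  mem  enc  dec   (utilization in %).

def parse_sm_utilization(dmon_output: str) -> list[int]:
    """Return the SM-utilization values (%) found in dmon text output."""
    samples = []
    for line in dmon_output.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and the header lines
        fields = line.split()
        samples.append(int(fields[1]))  # column 1 is the "sm" percentage
    return samples

# Illustrative dmon output (not real measurements):
sample = """\
# gpu    sm   mem   enc   dec
# Idx     %     %     %     %
    0    12     0     0     0
    0    25     0     0     0
    0     4     0     0     0
"""
print(parse_sm_utilization(sample))  # -> [12, 25, 4]
```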

I am wondering why the number of the working SMs changes.

In particular, why does it go from 12% to 25%? I can understand the 25% -> 4% drop, since many blocks could finish earlier, but not the 12% -> 25% increase. How can something like that happen when launching just one kernel?
Additionally, since the NVIDIA GPU I am using is used only by my application, I was expecting it to use 25% of the SMs (as they are available) from the beginning.

I know there are not many resources that explain the GPU scheduler in detail, but it would be helpful if someone could give me an idea of what might be happening. Probably I am missing something.

Is it possible that nvidia-smi gives me incorrect monitoring results because I am running an OpenCL application?