What are possible reasons for heavy kernel launch latency?

It is not a case of one particular GPU behaving badly; rather, occasionally and seemingly at random, one of the GPUs in the system experiences an unexpected delay in launching a kernel although it is idle.

That is exactly what I am trying to ask!
Sorry about the font size. The program uses 4 GPUs, and to show the timeline view of all of them, I kept the zoom at 1x. Furthermore, what the ops are does not really matter, IMHO. Just let me know if you need a more detailed view; I can zoom in and re-capture it.
The program runs in a container and does NOT use MPS.

I’ve searched the forum and read some posts about similar issues, but I could not find anything helpful. I have also raised a question in the Nsys forum (FYI: "Kernel operation delays when gpu is idle").