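It might be that you have a misplaced cudaStreamSynchronize() that is preventing your higher-priority stream kernel from getting scheduled, as your picture seems to suggest. That call blocks the host thread until all work in the given stream has finished, so a launch that comes after it in the host code is not even issued until then. There is really no way to confirm this from a static picture of the profiler output; it would take a look at the host code around the launches.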
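To make the suspected pattern concrete, here is a minimal sketch of how a synchronize on the low-priority stream can delay a high-priority launch. It is not your code: scaleKernel, launchBoth, the misplacedSync flag and the problem size are hypothetical names and values chosen for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for either workload.
__global__ void scaleKernel(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Launches a kernel into a low-priority stream, then one into a
// high-priority stream. If misplacedSync is true, the host blocks on the
// low-priority stream before it even issues the high-priority launch.
cudaError_t launchBoth(bool misplacedSync) {
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *dLow = nullptr, *dHigh = nullptr;
    cudaMalloc(&dLow, n * sizeof(float));
    cudaMalloc(&dHigh, n * sizeof(float));
    cudaMemset(dLow, 0, n * sizeof(float));
    cudaMemset(dHigh, 0, n * sizeof(float));
    cudaDeviceSynchronize();  // make sure the memsets are done first

    // Numerically lower values mean higher priority, so greatestPriority
    // is the smallest number in the range.
    int leastPriority = 0, greatestPriority = 0;
    cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);

    cudaStream_t low, high;
    cudaStreamCreateWithPriority(&low, cudaStreamNonBlocking, leastPriority);
    cudaStreamCreateWithPriority(&high, cudaStreamNonBlocking, greatestPriority);

    scaleKernel<<<blocks, threads, 0, low>>>(dLow, n, 2.0f);

    if (misplacedSync) {
        // The host waits here until the low-priority kernel is finished,
        // so the launch below cannot overlap with it, whatever its priority.
        cudaStreamSynchronize(low);
    }

    scaleKernel<<<blocks, threads, 0, high>>>(dHigh, n, 3.0f);

    // Wait for everything and pick up any asynchronous launch error.
    cudaError_t err = cudaDeviceSynchronize();

    cudaStreamDestroy(low);
    cudaStreamDestroy(high);
    cudaFree(dLow);
    cudaFree(dHigh);
    return err;
}

int main() {
    printf("without misplaced sync: %s\n", cudaGetErrorString(launchBoth(false)));
    printf("with misplaced sync:    %s\n", cudaGetErrorString(launchBoth(true)));
    return 0;
}
```

Profiling both variants should show the difference on the timeline: with the synchronize in place, the high-priority kernel can only start after the low-priority one has ended.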
It doesn’t seem to apply to your case, but you should also be aware that, depending on the GPU, blocks from high-priority streams must wait for available space on the GPU SMs. If blocks from low-priority streams get scheduled first (e.g. because they were launched first) and they fully occupy an SM, then subsequently issued blocks from higher-priority streams may have to wait until resources (registers, shared memory, block and thread slots) are free on that SM before they can be scheduled.
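As a rough way to judge whether a low-priority kernel can fill the SMs on its own, you can compare its grid size against the number of blocks the device can hold at once. The sketch below is only an estimate under stated assumptions: lowPriorityKernel, canFillDevice, the block size of 256 and the problem size are hypothetical, and the occupancy figure is a theoretical maximum for that kernel alone.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical low-priority kernel; only its resource usage matters here.
__global__ void lowPriorityKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

// Returns true if the grid has at least as many blocks as the device can
// keep resident at once, i.e. the kernel alone can occupy every SM.
bool canFillDevice(int gridBlocks, int blocksPerSM, int numSMs) {
    return gridBlocks >= blocksPerSM * numSMs;
}

int main() {
    int device = 0;
    cudaGetDevice(&device);

    int numSMs = 0;
    cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, device);

    const int threads = 256;  // assumed block size
    int blocksPerSM = 0;
    // Theoretical number of resident blocks per SM for this kernel,
    // with no dynamic shared memory.
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocksPerSM, lowPriorityKernel, threads, 0);

    const int n = 1 << 24;  // assumed problem size
    const int gridBlocks = (n + threads - 1) / threads;

    printf("SMs: %d, resident blocks per SM: %d, grid blocks: %d\n",
           numSMs, blocksPerSM, gridBlocks);
    if (canFillDevice(gridBlocks, blocksPerSM, numSMs)) {
        printf("The low-priority grid can fill the device; high-priority "
               "blocks may have to wait for resident blocks to retire.\n");
    } else {
        printf("The low-priority grid leaves room on the device.\n");
    }
    return 0;
}
```

If the low-priority grid does fill the device, keeping its blocks short-running gives higher-priority blocks more frequent opportunities to be scheduled as resident blocks retire.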