How do I measure parallel execution when I have multiple streams?

Here is a profile that I generated with nsys. I have multiple streams and I want to measure the parallel execution with a metric (maybe just time). Which metric should I use, and how do I quantify the improvement from using multiple streams?

One possible approach: Just measure overall application run time, or the overall run time of the section you are interested in. The improvement in performance (if any) will be evident in reduced run time for the application or section. You can use simple ratios for measuring improvement.

tOLD/tNEW = speedup factor
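A minimal sketch of this approach follows. It assumes the code can be organized so that the number of streams is a parameter. `busyKernel` and `runSection` are hypothetical placeholders for your own kernels and section. The section is timed on the host with `std::chrono`, with `cudaDeviceSynchronize()` before and after, so the measured wall-clock time covers the work in all streams. Running the same section with 1 stream and with 4 streams then gives tOLD and tNEW. Error checking is omitted for brevity.

```cuda
#include <chrono>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in workload: a deliberately slow kernel, so that
// overlap between streams shows up in the timing. Replace with your kernels.
__global__ void busyKernel(float* data, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k) v = v * 0.999f + 0.001f;
        data[i] = v;
    }
}

// Runs the same total work, split into numChunks pieces distributed
// round-robin over numStreams streams. With numStreams == 1, all pieces are
// serialized in a single stream, which gives the baseline (tOLD).
double runSection(int numStreams, int numChunks = 8,
                  int chunkSize = 1 << 14, int iters = 20000) {
    std::vector<cudaStream_t> streams(numStreams);
    for (auto& s : streams) cudaStreamCreate(&s);

    float* d = nullptr;
    size_t bytes = sizeof(float) * numChunks * chunkSize;
    cudaMalloc(&d, bytes);
    cudaMemset(d, 0, bytes);
    cudaDeviceSynchronize();  // keep setup out of the timed region

    auto t0 = std::chrono::steady_clock::now();
    for (int c = 0; c < numChunks; ++c) {
        cudaStream_t s = streams[c % numStreams];
        int threads = 256;
        int blocks = (chunkSize + threads - 1) / threads;
        busyKernel<<<blocks, threads, 0, s>>>(d + c * chunkSize, chunkSize, iters);
    }
    cudaDeviceSynchronize();  // wait until every stream has finished
    auto t1 = std::chrono::steady_clock::now();

    cudaFree(d);
    for (auto& s : streams) cudaStreamDestroy(s);
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Speedup factor as defined above: tOLD / tNEW.
double speedupFactor(double tOld, double tNew) { return tOld / tNew; }

int main() {
    runSection(1);  // warm-up run: excludes context creation and module load
    double tOld = runSection(1);  // single-stream baseline
    double tNew = runSection(4);  // multi-stream version
    std::printf("1 stream: %.2f ms, 4 streams: %.2f ms, speedup = %.2fx\n",
                tOld, tNew, speedupFactor(tOld, tNew));
    return 0;
}
```

Both runs share all other code, so the ratio isolates the effect of distributing work across streams. Whether kernels actually overlap depends on each kernel leaving enough free GPU resources for the others. The nsys timeline can confirm whether overlap happens.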

Thanks Robert.
Actually, I don't have the single-stream version to compare against right now. Can I force the program to run in single-stream mode, to avoid rewriting a single-stream version?

I don’t know of any way to do that.