How to use Nsys to get the exact kernel cost without stream sync in the source code?

Hi guys, I am trying to use Nsys to capture the pipeline performance, but I am not sure about the cost of cuLaunchKernel & cudaLaunchKernel in the profiling report.

Like this (screenshot of the profiling report):

I think the reported kernel cost may not be exact when there is no stream sync in the source code?
And how can I get the exact kernel cost using Nsys, without adding code to the source?

@jasoncohen