Hi guys, I'm trying to use Nsys to capture the pipeline performance, but I'm not sure how to interpret the cost of cuLaunchKernel and cudaLaunchKernel in the profiling report.
Like this:
I think the reported kernel cost may not be exact when there is no stream sync in the source code. Is that right?
And how can I get the exact kernel cost using Nsys, without adding code to the source?