Memory Transfer activities during cudaStreamSynchronize


I am new to CUDA programming and performance testing, and I would like to profile the memory activity of my program. Right now I am testing it on the Jetson AGX Xavier platform, and here is what I’ve got.

The method I use for testing is the Nsight Systems command line with the nvprof command (nsys nvprof --profile-child-processes [my_program] [arguments]…); I then import the report into Nsight Systems.

I find that the majority of the time is spent in cudaStreamSynchronize, and there is no record for this part in the memory section. Could anyone suggest how to get memory transfer information for this part?

Thanks for your time!

Moved to Nsight Systems forum