Memory Transfer activities during cudaStreamSynchronize

Hi,

I am new to CUDA programming and performance testing and would like to run a memory test on my program. Right now I am testing it on the Jetson AGX Xavier platform, and here is what I’ve got.


The method I use for testing is the Nsight Systems command line with the nvprof command (nsys nvprof --profile-child-processes [my_program] [arguments]…), and then I import the report into Nsight Systems.

I find that the majority of the time is spent in cudaStreamSynchronize, and there is no record related to this part in the memory section. Could anyone suggest how to get memory transfer information for this part?

Thanks for your time!
Sam

Moved to Nsight Systems forum

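The synchronization call is only waiting for the kernel functions to finish executing, so it just looks as if the synchronization itself is what takes the time.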
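To illustrate the point, here is a minimal sketch (not taken from the original program) that enqueues an asynchronous copy, a kernel, and a copy back on one stream, then times the enqueue step and the cudaStreamSynchronize call separately on the host. The kernel busyKernel, the buffer size, and the iteration count are all hypothetical and chosen only to keep the GPU busy for a measurable time. On most setups I would expect the enqueue time to be small and nearly all of the wall time to land inside cudaStreamSynchronize, even though the real work is the copies and the kernel.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s failed: %s\n", #call,                   \
                    cudaGetErrorString(err_));                          \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Hypothetical kernel: does arbitrary arithmetic so the GPU stays busy.
__global__ void busyKernel(float *data, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k) {
            v = v * 1.000001f + 0.5f;
        }
        data[i] = v;
    }
}

// Milliseconds elapsed on the host since t0.
static double msSince(std::chrono::steady_clock::time_point t0) {
    auto dt = std::chrono::steady_clock::now() - t0;
    return std::chrono::duration<double, std::milli>(dt).count();
}

int main() {
    const int n = 1 << 20;                 // assumed problem size
    const size_t bytes = n * sizeof(float);
    float *hostBuf = nullptr;
    float *devBuf = nullptr;
    cudaStream_t stream;

    CHECK(cudaStreamCreate(&stream));
    // Pinned host memory, so the async copies can overlap with host code.
    CHECK(cudaMallocHost((void **)&hostBuf, bytes));
    CHECK(cudaMalloc((void **)&devBuf, bytes));
    for (int i = 0; i < n; ++i) hostBuf[i] = 1.0f;

    // Step 1: enqueue the work. These calls return before the work is done.
    auto t0 = std::chrono::steady_clock::now();
    CHECK(cudaMemcpyAsync(devBuf, hostBuf, bytes,
                          cudaMemcpyHostToDevice, stream));
    busyKernel<<<(n + 255) / 256, 256, 0, stream>>>(devBuf, n, 10000);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(hostBuf, devBuf, bytes,
                          cudaMemcpyDeviceToHost, stream));
    double enqueueMs = msSince(t0);

    // Step 2: block the host until everything queued on the stream finishes.
    auto t1 = std::chrono::steady_clock::now();
    CHECK(cudaStreamSynchronize(stream));
    double syncMs = msSince(t1);

    printf("enqueue: %.3f ms, cudaStreamSynchronize: %.3f ms\n",
           enqueueMs, syncMs);

    CHECK(cudaFree(devBuf));
    CHECK(cudaFreeHost(hostBuf));
    CHECK(cudaStreamDestroy(stream));
    return 0;
}
```

It should build with something like nvcc -o sync_demo sync_demo.cu (the file name is just an example). The copies here are explicit cudaMemcpyAsync calls; if your program does not issue explicit copies in that part of the code, that may be why nothing shows up in the memory section for it.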