CUDA memory copy throughput on Jetson devices

Jetson devices use unified memory. Is there any reason why a host-to-device memory copy (even when pinned memory is used) is slower than a device-to-device copy?

There has been no update from you for a while, so we assume this is no longer an issue.
We are therefore closing this topic. If you need further support, please open a new one.


Have you maximized the device performance before benchmarking?
If yes, could you share the time you measured with us first?
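
If you do not have numbers yet, a minimal sketch like the one below could be used to collect them. It is only an illustration, not an official benchmark: the 64 MiB buffer size, the 20 iterations, and the helper name copy_gbps are arbitrary choices made for this example. It times host-to-device copies from pinned memory and device-to-device copies with CUDA events. On Jetson, clocks are usually maximized first with `sudo jetson_clocks` (after selecting a suitable nvpmodel power mode for your module).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,          \
                    cudaGetErrorString(err_));                          \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Hypothetical helper: run `iters` copies of `bytes` bytes and return the
// average throughput in GB/s (bytes copied per second, counted once).
static double copy_gbps(void* dst, const void* src, size_t bytes,
                        cudaMemcpyKind kind, int iters) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaMemcpy(dst, src, bytes, kind));  // warm-up, not timed

    CHECK(cudaEventRecord(start));
    for (int i = 0; i < iters; ++i) {
        CHECK(cudaMemcpyAsync(dst, src, bytes, kind, 0));
    }
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));

    double total_gb = static_cast<double>(bytes) * iters / 1e9;
    return total_gb / (ms / 1e3);
}

int main() {
    const size_t bytes = 64u << 20;  // 64 MiB, arbitrary choice
    const int iters = 20;            // arbitrary choice

    void *h_pinned = nullptr, *d_a = nullptr, *d_b = nullptr;
    CHECK(cudaMallocHost(&h_pinned, bytes));  // pinned (page-locked) host memory
    CHECK(cudaMalloc(&d_a, bytes));
    CHECK(cudaMalloc(&d_b, bytes));
    memset(h_pinned, 1, bytes);

    double h2d = copy_gbps(d_a, h_pinned, bytes, cudaMemcpyHostToDevice, iters);
    double d2d = copy_gbps(d_b, d_a, bytes, cudaMemcpyDeviceToDevice, iters);

    printf("Host (pinned) -> device: %.2f GB/s\n", h2d);
    printf("Device -> device:        %.2f GB/s\n", d2d);

    CHECK(cudaFree(d_a));
    CHECK(cudaFree(d_b));
    CHECK(cudaFreeHost(h_pinned));
    return 0;
}
```

Assuming the file is saved as memcpy_bench.cu, it can be built with `nvcc -O2 memcpy_bench.cu -o memcpy_bench`. Sharing both printed figures together with the Jetson module and JetPack version would make the comparison easier to discuss.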