MemcpyAsync DtoD transformed into DtoH

Hi all,

I’m working on an Orin Nano with Jetpack 5.1.1 [L4T 35.3.1].
In my code, I have a cudaMemcpy2DAsync call with the kind set to cudaMemcpyDeviceToDevice. However, when I profile the execution with $ nsys profile ..., this memcpy is reported as DeviceToHost (device to pinned memory).
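
For reference, a standalone version of the same kind of call might look like the sketch below. It is only an assumption about what the real code does: the helper name copy2dDeviceToDevice, the 0x5A fill value, and the use of cudaMallocPitch for both buffers are hypothetical. Profiling a tiny program built around it with nsys and comparing how its copy is labeled against the real application could help separate a profiler issue from an issue with the pointers themselves.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: issues one pitched device-to-device 2D copy on a
// non-default stream, then copies the result back to the host to verify it.
// Returns the first CUDA error encountered (or cudaSuccess) and sets *ok
// to true only when every destination byte holds the fill value.
cudaError_t copy2dDeviceToDevice(size_t widthBytes, size_t height, bool* ok) {
    *ok = false;
    void* src = nullptr;
    void* dst = nullptr;
    size_t srcPitch = 0, dstPitch = 0;
    cudaStream_t stream = nullptr;
    std::vector<unsigned char> host(widthBytes * height, 0);

    // Both buffers are plain device allocations with a driver-chosen pitch.
    cudaError_t err = cudaMallocPitch(&src, &srcPitch, widthBytes, height);
    if (err == cudaSuccess) err = cudaMallocPitch(&dst, &dstPitch, widthBytes, height);
    if (err == cudaSuccess) err = cudaStreamCreate(&stream);

    // Fill every source row with a known byte value.
    if (err == cudaSuccess)
        err = cudaMemset2DAsync(src, srcPitch, 0x5A, widthBytes, height, stream);

    // The copy under investigation: device to device, asynchronous.
    if (err == cudaSuccess)
        err = cudaMemcpy2DAsync(dst, dstPitch, src, srcPitch, widthBytes, height,
                                cudaMemcpyDeviceToDevice, stream);
    if (err == cudaSuccess) err = cudaStreamSynchronize(stream);

    // Verification copy: this one is a genuine device-to-host transfer
    // into a tightly packed (pitch == widthBytes) host buffer.
    if (err == cudaSuccess)
        err = cudaMemcpy2D(host.data(), widthBytes, dst, dstPitch, widthBytes, height,
                           cudaMemcpyDeviceToHost);

    if (err == cudaSuccess) {
        *ok = true;
        for (unsigned char b : host) {
            if (b != 0x5A) { *ok = false; break; }
        }
    }

    if (stream) cudaStreamDestroy(stream);
    cudaFree(src);  // cudaFree(nullptr) is a harmless no-op
    cudaFree(dst);
    return err;
}
```

Note that the final cudaMemcpy2D is an intentional device-to-host copy, so it is expected to appear as DtoH in the trace; only the cudaMemcpy2DAsync before it is the one to compare with the real code.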

I couldn’t create a minimal reproducible example, but does anyone have an idea of what might be happening?
Could it be that Nsight is misinterpreting the cudaMemcpy?
Or is NVCC perhaps doing something unexpected with Jetson’s unified memory?
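
One cheap thing to rule out first is whether both pointers really are device memory. The cudaMemcpyKind argument only states what the caller expects; if the destination was, for example, allocated with cudaMallocHost or cudaHostAlloc, a "device to pinned memory" label in the trace would match the buffers actually involved. The sketch below uses hypothetical helper names (memoryTypeOf, bothAreDevice) around cudaPointerGetAttributes and reads the type field of cudaPointerAttributes, which is available since CUDA 11 (JetPack 5.x ships CUDA 11.4). Managed memory from cudaMallocManaged reports cudaMemoryTypeManaged, so bothAreDevice deliberately treats it as "not device".

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: returns the memory type the CUDA runtime reports for ptr:
//   cudaMemoryTypeUnregistered - ordinary malloc/new memory
//   cudaMemoryTypeHost         - pinned host memory (cudaMallocHost, cudaHostAlloc)
//   cudaMemoryTypeDevice       - device memory (cudaMalloc, cudaMallocPitch)
//   cudaMemoryTypeManaged      - managed memory (cudaMallocManaged)
cudaMemoryType memoryTypeOf(const void* ptr) {
    cudaPointerAttributes attr{};
    if (cudaPointerGetAttributes(&attr, ptr) != cudaSuccess) {
        cudaGetLastError();  // reset the last-error state so it does not leak into later checks
        return cudaMemoryTypeUnregistered;
    }
    return attr.type;
}

// Hypothetical helper: true only when both pointers are reported as device
// memory, i.e. when cudaMemcpyDeviceToDevice matches what the buffers really are.
bool bothAreDevice(const void* dst, const void* src) {
    return memoryTypeOf(dst) == cudaMemoryTypeDevice &&
           memoryTypeOf(src) == cudaMemoryTypeDevice;
}
```

Calling bothAreDevice on the exact dst and src pointers right before the cudaMemcpy2DAsync in the real code, and printing memoryTypeOf for each when it returns false, would show whether the DeviceToDevice kind matches the buffers.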

Thank you in advance,

There has been no update from you for a while, so we assume this is no longer an issue.
Hence, we are closing this topic. If you need further support, please open a new one.
Thanks

Hi,

We need more information about the issue.
Could you share a snapshot of the cudaMemcpy2DAsync call and the nsys output with us?

Thanks.