Is the Bus Accessed with pinned cudaMemcpyAsync?

Let’s assume I’ve allocated pinned memory (with the usual cudaMallocHost) and device memory (with cudaMalloc, of course), and I perform a normal cudaMemcpyAsync between the two (for simplicity, just consider cudaMemcpyHostToDevice).
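A minimal sketch of the setup described above, assuming a single GPU (device 0) and an arbitrary 4 MiB buffer. The names `h_src`, `h_dst`, `d_buf` and the helper `same_contents` are illustrative, not from the original post. The copy back into a second pinned buffer is only there so the result can be checked on the host.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Illustrative helper: true if the first n floats of a and b are bitwise identical.
static bool same_contents(const float* a, const float* b, size_t n) {
  return std::memcmp(a, b, n * sizeof(float)) == 0;
}

int main() {
  const size_t n = 1 << 20;                  // 1 Mi floats = 4 MiB (arbitrary)
  const size_t bytes = n * sizeof(float);

  float* h_src = nullptr;  // pinned (page-locked) host buffer, cudaMallocHost
  float* h_dst = nullptr;  // second pinned buffer, only used to check the result
  float* d_buf = nullptr;  // ordinary device buffer, cudaMalloc
  cudaStream_t stream;

  if (cudaMallocHost((void**)&h_src, bytes) != cudaSuccess ||
      cudaMallocHost((void**)&h_dst, bytes) != cudaSuccess ||
      cudaMalloc((void**)&d_buf, bytes) != cudaSuccess ||
      cudaStreamCreate(&stream) != cudaSuccess) {
    std::fprintf(stderr, "allocation failed\n");
    return 1;
  }

  for (size_t i = 0; i < n; ++i) h_src[i] = static_cast<float>(i);

  // Asynchronous host-to-device copy from pinned memory, queued on `stream`.
  cudaMemcpyAsync(d_buf, h_src, bytes, cudaMemcpyHostToDevice, stream);
  // Copy back on the same stream so it is ordered after the first copy.
  cudaMemcpyAsync(h_dst, d_buf, bytes, cudaMemcpyDeviceToHost, stream);
  // Nothing is guaranteed complete until the stream is synchronized.
  if (cudaStreamSynchronize(stream) != cudaSuccess) return 1;

  std::printf("round trip %s\n", same_contents(h_src, h_dst, n) ? "ok" : "MISMATCH");

  cudaStreamDestroy(stream);
  cudaFree(d_buf);
  cudaFreeHost(h_dst);
  cudaFreeHost(h_src);
  return 0;
}
```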

Is the bus involved in such an operation?

Since all the memory resides in the same DRAM, my intuition tells me it should not, but I can’t find any proof of this or a good way to verify it.
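One rough empirical check is to time the pinned host-to-device copy with CUDA events. This is a sketch under stated assumptions (device 0, arbitrary transfer sizes, and a hypothetical helper `gib_per_s`). A copy time that grows with the transfer size shows the data is physically read and written rather than simply remapped, but timing alone cannot tell you which interconnect inside the SoC carried it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: convert a byte count and elapsed milliseconds to GiB/s.
static double gib_per_s(size_t bytes, float ms) {
  const double seconds = static_cast<double>(ms) / 1000.0;
  return static_cast<double>(bytes) / seconds / (1024.0 * 1024.0 * 1024.0);
}

int main() {
  const size_t sizes[] = {1u << 20, 1u << 24, 1u << 26};  // 1, 16, 64 MiB (arbitrary)
  const size_t max_bytes = 1u << 26;

  void* h_src = nullptr;  // pinned host source
  void* d_dst = nullptr;  // device destination
  if (cudaMallocHost(&h_src, max_bytes) != cudaSuccess ||
      cudaMalloc(&d_dst, max_bytes) != cudaSuccess) {
    std::fprintf(stderr, "allocation failed\n");
    return 1;
  }

  cudaStream_t stream;
  cudaEvent_t start, stop;
  cudaStreamCreate(&stream);
  cudaEventCreate(&start);
  cudaEventCreate(&stop);

  // Warm-up copy so one-time setup costs are not included in the timings.
  cudaMemcpyAsync(d_dst, h_src, max_bytes, cudaMemcpyHostToDevice, stream);
  cudaStreamSynchronize(stream);

  for (size_t bytes : sizes) {
    // Events recorded on the same stream bracket exactly this one copy.
    cudaEventRecord(start, stream);
    cudaMemcpyAsync(d_dst, h_src, bytes, cudaMemcpyHostToDevice, stream);
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    std::printf("%zu bytes: %.3f ms, %.2f GiB/s\n", bytes, ms, gib_per_s(bytes, ms));
  }

  cudaEventDestroy(stop);
  cudaEventDestroy(start);
  cudaStreamDestroy(stream);
  cudaFree(d_dst);
  cudaFreeHost(h_src);
  return 0;
}
```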

Any sources that point to the solution, or even just hints, would be much appreciated!


1) You don’t need to do a memcpy for pinned memory. The memory is directly accessible to both the CPU and the GPU.

2) Pinned memory is not I/O coherent on devices with compute capability lower than 7.2 (the TX2 is 6.2).

3) The transfer is done by the CUDA driver when an access occurs.
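A sketch of the zero-copy usage in point 1, assuming device 0 supports mapped host memory: the buffer is allocated with `cudaHostAlloc(..., cudaHostAllocMapped)`, a kernel writes through the pointer returned by `cudaHostGetDevicePointer`, and the CPU reads the result without any `cudaMemcpy`. The kernel `add_one` and the helper `below_cc_7_2` are hypothetical; the helper only encodes the 7.2 threshold stated in point 2.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if compute capability major.minor is below 7.2,
// the threshold given in the answer for pinned memory not being I/O coherent.
static bool below_cc_7_2(int major, int minor) {
  return major < 7 || (major == 7 && minor < 2);
}

// Add 1 to each element in place.
__global__ void add_one(int* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] += 1;
}

int main() {
  // Request mapped pinned memory support; called before other CUDA work.
  cudaSetDeviceFlags(cudaDeviceMapHost);

  cudaDeviceProp prop;
  if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;
  std::printf("%s: cc %d.%d, integrated=%d, canMapHostMemory=%d\n",
              prop.name, prop.major, prop.minor, prop.integrated,
              prop.canMapHostMemory);
  if (below_cc_7_2(prop.major, prop.minor))
    std::printf("cc < 7.2: pinned memory not I/O coherent (per the answer)\n");
  if (!prop.canMapHostMemory) {
    std::fprintf(stderr, "device cannot map host memory\n");
    return 1;
  }

  const int n = 1024;
  int* h_data = nullptr;
  if (cudaHostAlloc((void**)&h_data, n * sizeof(int), cudaHostAllocMapped) != cudaSuccess)
    return 1;
  for (int i = 0; i < n; ++i) h_data[i] = i;

  // Device-side alias of the same allocation: no cudaMemcpy is involved.
  int* d_alias = nullptr;
  if (cudaHostGetDevicePointer((void**)&d_alias, h_data, 0) != cudaSuccess)
    return 1;

  add_one<<<(n + 255) / 256, 256>>>(d_alias, n);
  if (cudaDeviceSynchronize() != cudaSuccess) return 1;  // wait for the kernel

  bool ok = true;
  for (int i = 0; i < n; ++i) ok = ok && (h_data[i] == i + 1);
  std::printf("zero-copy update %s\n", ok ? "ok" : "MISMATCH");

  cudaFreeHost(h_data);
  return ok ? 0 : 1;
}
```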

Not sure if I answered your question correctly.
Please let me know if I can help further.

Here is a tutorial for your reference:

Thank you for the hints. The reference is helpful for better understanding how the Tegra memory system works. But I don’t think I understand point 3:

Which kind of transfer are you referring to?

Moreover, I’ve probably set up the wrong example in the original post. Consider this new setup:

There are two device memory regions allocated with cudaMalloc, and I want to copy from one to the other using cudaMemcpy (i.e. with the cudaMemcpyDeviceToDevice kind).

Is the bus actively involved in this operation?
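The device-to-device case can be sketched like this, assuming device 0 and an arbitrary 16 MiB size; `d_src`, `d_dst` and the helper `all_equal` are illustrative names. Because the two `cudaMalloc` regions are distinct allocations, the copy has to read every byte from one and write it to the other.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Illustrative helper: true if every one of the n bytes in p equals v.
static bool all_equal(const unsigned char* p, size_t n, unsigned char v) {
  for (size_t i = 0; i < n; ++i)
    if (p[i] != v) return false;
  return true;
}

int main() {
  const size_t bytes = 16u << 20;  // 16 MiB (arbitrary)
  unsigned char* d_src = nullptr;
  unsigned char* d_dst = nullptr;
  if (cudaMalloc((void**)&d_src, bytes) != cudaSuccess ||
      cudaMalloc((void**)&d_dst, bytes) != cudaSuccess) {
    std::fprintf(stderr, "cudaMalloc failed\n");
    return 1;
  }

  cudaMemset(d_src, 0xAB, bytes);  // fill the source with a known pattern
  cudaMemset(d_dst, 0x00, bytes);  // clear the destination

  // Device-to-device copy between two separate allocations.
  cudaMemcpy(d_dst, d_src, bytes, cudaMemcpyDeviceToDevice);

  // Read the destination back to verify the copy.
  std::vector<unsigned char> h(bytes);
  if (cudaMemcpy(h.data(), d_dst, bytes, cudaMemcpyDeviceToHost) != cudaSuccess)
    return 1;
  std::printf("device-to-device copy %s\n",
              all_equal(h.data(), bytes, 0xAB) ? "ok" : "MISMATCH");

  cudaFree(d_dst);
  cudaFree(d_src);
  return 0;
}
```

A device-to-device `cudaMemcpy` may return before the copy has finished, but the following device-to-host `cudaMemcpy` on the same (default) stream is ordered after it and blocks, so the check sees the completed copy.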


1) I/O synchronization

2) Yes. You will get two different physical addresses if the memory type is not zero-copy (e.g. pinned or unified), and the transfer will use the bus.

I agree with you. This is confirmed in the documentation for the Jetson Xavier (I assume the TX2 behaves the same way). See its Technical Reference Manual (page 109): $product,jetson_agx_xavier