Copy from pinned memory to host is 3x slower than copy from CUDA memory to host, why?

heyworld · October 18, 2018, 5:43am

My platform is TX2.

Method1
I copied data from CUDA memory to the host using cudaMemcpy().
The CUDA memory is allocated with cudaMalloc and the host memory with new. The copy takes about 10 ms.

Method2
Then I tried another method: copying data from pinned memory to the host using memcpy().
The pinned memory is allocated with cudaMallocHost and the host memory with new. The copy takes about 30 ms.

I am confused here. The GPU in the TX2 doesn’t have its own memory, so all memory can be regarded as CPU memory, and method 2 should take at most 10 ms. If anything, method 1 has to go GPU mapping -> pinned -> host, while method 2 only needs pinned -> host.
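
For anyone who wants to reproduce the two measurements, here is a minimal sketch of how they could be timed side by side. It is not the code from the post. The names average_ms and compare_copies and the buffer size used in the note below are hypothetical. The host buffer is a std::vector rather than a raw new, which is an assumption made only to keep cleanup simple. The CUDA calls are compiled only when the file is built with nvcc (for example as a .cu file); with a plain C++ compiler only the heap-to-heap baseline runs.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>
#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// Average wall-clock milliseconds over `iters` calls of `copy`.
// One untimed warm-up call absorbs page faults and lazy CUDA initialisation.
template <typename Copy>
double average_ms(Copy copy, int iters) {
    copy();
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) copy();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count() / iters;
}

// Times the copies and returns true if every copy delivered the expected bytes.
bool compare_copies(std::size_t bytes, int iters) {
    assert(bytes > 0 && iters > 0);
    std::vector<char> src(bytes, 1);   // ordinary pageable source (baseline only)
    std::vector<char> host(bytes, 0);  // pageable destination, as in both methods

    // Baseline: pageable heap memory -> pageable heap memory with memcpy().
    double heap_ms = average_ms(
        [&] { std::memcpy(host.data(), src.data(), bytes); }, iters);
    bool ok = host[bytes - 1] == 1;
    std::printf("heap   -> host, memcpy:     %.2f ms\n", heap_ms);

#ifdef __CUDACC__
    void* dev = nullptr;
    void* pinned = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    if (cudaMallocHost(&pinned, bytes) != cudaSuccess) {
        cudaFree(dev);
        return false;
    }
    cudaMemset(dev, 2, bytes);
    std::memset(pinned, 3, bytes);

    // Method 1: cudaMalloc memory -> host with cudaMemcpy().
    double m1_ms = average_ms(
        [&] { cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost); }, iters);
    ok = ok && host[bytes - 1] == 2;

    // Method 2: cudaMallocHost (pinned) memory -> host with plain memcpy().
    double m2_ms = average_ms(
        [&] { std::memcpy(host.data(), pinned, bytes); }, iters);
    ok = ok && host[bytes - 1] == 3;

    std::printf("device -> host, cudaMemcpy: %.2f ms\n", m1_ms);
    std::printf("pinned -> host, memcpy:     %.2f ms\n", m2_ms);
    cudaFreeHost(pinned);
    cudaFree(dev);
#endif
    return ok;
}
```

Calling compare_copies(32u << 20, 10) from main would average ten copies of a 32 MiB buffer for each case. The heap-to-heap baseline is included so the pinned-memory number can be compared against an ordinary CPU copy of the same size on the same board.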

saulocpp · October 18, 2018, 6:51am

TX2 forum:
https://devtalk.nvidia.com/default/board/188/jetson-tx2/
