Issue with CUDA pinned memory on Tegra K1 (Xiaomi Pad)

Hi all,

I ran into a problem when using CUDA pinned memory (allocated with cudaHostAlloc or cudaMallocHost) on the Xiaomi Pad, which uses a Tegra K1. The performance of copying data from pinned memory to pageable memory is very poor!
I ran some tests; the detailed steps are described below:
Test1 (a minimal sketch of the timing code follows this list):

  1. Use malloc to allocate pageable memory.
  2. Use cudaHostAlloc or cudaMallocHost to allocate pinned memory.
  3. Copy 4704000 bytes from the pageable memory to the pinned memory: about 7 ms.
  4. Copy the same amount of data from the pinned memory back to the pageable memory: about 95 ms.

pageable memory → pinned memory: about 7 ms
pinned memory → pageable memory: about 95 ms
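For reference, here is a minimal sketch of the kind of timing code behind Test1. The std::chrono timing and the lack of error checking are simplifications of mine, not necessarily what my real application does; the buffer size matches the tests above:

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <chrono>
#include <cuda_runtime.h>

// Time a single host-to-host memcpy in milliseconds.
static double time_memcpy(void *dst, const void *src, size_t bytes) {
    auto t0 = std::chrono::high_resolution_clock::now();
    memcpy(dst, src, bytes);
    auto t1 = std::chrono::high_resolution_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const size_t bytes = 4704000;  // same buffer size as in the tests above

    // Step 1: pageable host memory via plain malloc.
    void *pageable = malloc(bytes);

    // Step 2: pinned host memory via cudaMallocHost
    // (cudaHostAlloc with cudaHostAllocDefault behaves the same here).
    void *pinned = nullptr;
    cudaMallocHost(&pinned, bytes);

    memset(pageable, 1, bytes);
    memset(pinned, 2, bytes);

    // Step 3: pageable -> pinned (~7 ms on the Xiaomi Pad).
    printf("pageable -> pinned: %.2f ms\n", time_memcpy(pinned, pageable, bytes));

    // Step 4: pinned -> pageable (~95 ms on the Xiaomi Pad).
    printf("pinned -> pageable: %.2f ms\n", time_memcpy(pageable, pinned, bytes));

    cudaFreeHost(pinned);
    free(pageable);
    return 0;
}
```

Note that the slow direction is the one that *reads* from the pinned buffer, while writing into it is fast.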

Test2: copy a 4704000-byte buffer between two pageable buffers on the Xiaomi Pad (Tegra K1):
pageable memory → pageable memory: about 7 ms
pageable memory ← pageable memory: about 7 ms

Test3: the same application on a GTX 650:
pageable memory → pinned memory: about 1.9 ms
pinned memory → pageable memory: about 1.9 ms
pageable memory → pageable memory: about 1.9 ms
pageable memory ← pageable memory: about 1.9 ms

My question is: why is the performance of pinned memory so different between the Tegra K1 (Xiaomi Pad) and the GTX 650? Since pageable memory and pinned memory both reside on the host, they should have similar performance for memcpy or cudaMemcpy(HostToHost), just as they do on the GTX 650.

I'm not sure whether this is an issue with the Tegra K1 in general or only with the Xiaomi Pad. I would appreciate it if someone could tell me what I should do; I'm in a hurry to solve this!

BTW, if anyone else has run into this problem, please let me know.

Thanks

I’ve encountered a similar issue on the Jetson TK1. I asked a similar question on StackOverflow and later answered my own question. The short answer is that data allocated with cudaHostAlloc() is not cached in the CPU caches, so host-side accesses to it are very slow. Here’s the link:
http://stackoverflow.com/questions/27972491/cpu-memory-access-latency-of-data-allocated-with-malloc-vs-cudahostalloc-on
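You can see the cache effect directly with a plain CPU read loop, without memcpy involved. The following is an illustrative sketch of mine, not the exact code from the linked answer; on the Tegra K1 the read over the cudaHostAlloc buffer should come out far slower than the read over the malloc buffer:

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <chrono>
#include <cuda_runtime.h>

// Sum every byte through a volatile pointer so the reads cannot be elided.
static double time_read(const volatile unsigned char *p, size_t bytes) {
    unsigned long sum = 0;
    auto t0 = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < bytes; ++i) sum += p[i];
    auto t1 = std::chrono::high_resolution_clock::now();
    printf("(checksum %lu) ", sum);  // use the result so it is observable
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const size_t bytes = 4704000;

    // malloc'd memory is CPU-cached as usual.
    unsigned char *cached = (unsigned char *)malloc(bytes);

    // cudaHostAlloc'd memory is not CPU-cached on the Tegra K1.
    unsigned char *pinned = nullptr;
    cudaHostAlloc((void **)&pinned, bytes, cudaHostAllocDefault);

    memset(cached, 1, bytes);
    memset(pinned, 1, bytes);

    printf("read malloc buffer:        %.2f ms\n", time_read(cached, bytes));
    printf("read cudaHostAlloc buffer: %.2f ms\n", time_read(pinned, bytes));

    cudaFreeHost(pinned);
    free(cached);
    return 0;
}
```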