CPU operation is very slow on memory allocated by cudaMallocHost

heyworld · October 8, 2018, 7:57am

I originally posted this in the thread linked below, but it may be more appropriate for the TensorRT board.
https://devtalk.nvidia.com/default/topic/1042530/cpu-operation-is-very-slow-on-memory-allocated-by-cudamallochost-/#5288277

Copying data between the GPU and CPU is faster when I allocate host memory (call it hostMem) with cudaMallocHost rather than malloc.

However, CPU operations on hostMem are much slower. Is there a way to allocate memory that makes copying faster but doesn’t slow down CPU operations?

I found in some other topics that pinned memory (allocated by cudaMallocHost) doesn’t use the CPU cache, which is why CPU operations on pinned memory are slow.

Is there a faster way to do CPU operations on pinned memory allocated by cudaMallocHost?

NVES · October 8, 2018, 3:25pm

Hello,

I think this is more suitable for discussion in “CUDA Programming and Performance”:
https://devtalk.nvidia.com/default/board/57/cuda-programming-and-performance/

The cudaMallocHost overhead may be due to memory allocated with cudaHostAlloc() being marked “uncacheable”.
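
To check whether this applies on a given system, a rough sketch like the one below can time the same CPU loop over three kinds of host buffer: plain malloc, cudaMallocHost, and malloc followed by cudaHostRegister. The buffer size, the helper names (fill_and_sum, time_ms) and the CHECK macro are assumptions made for illustration, and cudaHostRegister is not supported on every platform, so the code reports a failure there instead of aborting.

```cuda
#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>
#include <cstdlib>

// Abort with a message if a CUDA runtime call fails (illustrative helper).
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         cudaGetErrorString(err_));                  \
            std::exit(1);                                            \
        }                                                            \
    } while (0)

// Write every element, then read it back, so the CPU touches the whole buffer.
double fill_and_sum(float* buf, size_t n) {
    for (size_t i = 0; i < n; ++i) buf[i] = 1.0f;
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) sum += buf[i];
    return sum;
}

// Time one pass over the buffer in milliseconds, after an untimed warm-up
// pass so that first-touch page faults are not counted.
double time_ms(float* buf, size_t n) {
    volatile double sink = fill_and_sum(buf, n);  // warm-up
    auto t0 = std::chrono::steady_clock::now();
    sink = fill_and_sum(buf, n);
    auto t1 = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const size_t n = size_t(1) << 24;        // 16M floats (arbitrary test size)
    const size_t bytes = n * sizeof(float);

    // 1) Ordinary pageable memory.
    float* pageable = static_cast<float*>(std::malloc(bytes));
    if (!pageable) return 1;
    std::printf("malloc:                    %.2f ms\n", time_ms(pageable, n));

    // 2) Pinned memory allocated by the CUDA runtime.
    float* pinned = nullptr;
    CHECK(cudaMallocHost(reinterpret_cast<void**>(&pinned), bytes));
    std::printf("cudaMallocHost:            %.2f ms\n", time_ms(pinned, n));

    // 3) Pageable memory pinned after the fact with cudaHostRegister.
    float* registered = static_cast<float*>(std::malloc(bytes));
    if (!registered) return 1;
    cudaError_t err = cudaHostRegister(registered, bytes, cudaHostRegisterDefault);
    if (err == cudaSuccess) {
        std::printf("malloc + cudaHostRegister: %.2f ms\n", time_ms(registered, n));
        CHECK(cudaHostUnregister(registered));
    } else {
        std::printf("cudaHostRegister not usable here: %s\n",
                    cudaGetErrorString(err));
    }

    std::free(registered);
    CHECK(cudaFreeHost(pinned));
    std::free(pageable);
    return 0;
}
```

This can be built with something like `nvcc -O2 pinned_cpu.cu -o pinned_cpu` (the file name is hypothetical). If the cudaMallocHost timing is much worse than the malloc timing on your platform, one option to try is to do the CPU work in the malloc buffer and memcpy the result into the pinned buffer just before the transfer; whether that is a net win depends on how much CPU work is done per byte.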
