It’s true that you get good performance by using pinned (page-locked) host memory for host-to-device transfers. However, my data lives in pageable memory, so I still have to copy it into page-locked memory first.
If I use a function like “memcpy”, the copy is synchronous and costs 5-10 ms every frame.
And if I use a CPU worker thread to do the memcpy asynchronously, it still takes several milliseconds to synchronize with that thread.
cuMemcpyAsync can transfer from host to host, but it can only be used when unified addressing is enabled, which is not possible on Windows XP.
So am I wrong? Is there any other CUDA function or way to transfer from pageable host memory to page-locked host memory asynchronously?
Copying from pageable to pinned memory and from there to the device is exactly what the driver does for host->device copies from pageable memory, so you won’t gain any speedup by explicitly coding that yourself.
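For illustration, the path the driver takes internally looks roughly like the sketch below (simplified: the driver presumably chunks the copy through a pool of pinned staging buffers and overlaps the CPU memcpy with the DMA, rather than staging the whole transfer at once):

```c
#include <string.h>
#include <cuda_runtime.h>

/* Simplified sketch of the driver's internal staging path for a
 * host->device copy from pageable memory: copy into a pinned
 * staging buffer on the CPU, then DMA from there to the device. */
void staged_h2d(void *dev, const void *pageable, size_t size)
{
    void *staging;
    cudaMallocHost(&staging, size);         /* pinned staging buffer */
    memcpy(staging, pageable, size);        /* pageable -> pinned (CPU copy) */
    cudaMemcpy(dev, staging, size,
               cudaMemcpyHostToDevice);     /* pinned -> device (DMA) */
    cudaFreeHost(staging);
}
```

Coding this explicitly in your application just duplicates the work the driver already does, which is why it buys you nothing.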
What you can do with newer CUDA toolkits is to pin down pageable memory (without performing a memcpy) using cudaHostRegister() / cuMemHostRegister() before copying from there to the device.
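A minimal sketch of that approach (runtime API, CUDA 4.0+), assuming the buffer sizes are placeholders for your own frame data:

```c
#include <stdlib.h>
#include <cuda_runtime.h>

int main(void)
{
    size_t size = 16 << 20;         /* example: a 16 MB frame */
    void *host = malloc(size);      /* the pageable buffer you already have */
    /* note: some platforms/toolkit versions require the pointer and
     * size to be page-aligned for registration */

    void *dev;
    cudaMalloc(&dev, size);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    /* pin the existing pages in place; no memcpy happens here */
    cudaHostRegister(host, size, cudaHostRegisterDefault);

    /* the transfer can now run asynchronously and overlap other work */
    cudaMemcpyAsync(dev, host, size, cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream);

    cudaHostUnregister(host);       /* unpin when done */
    cudaStreamDestroy(stream);
    cudaFree(dev);
    free(host);
    return 0;
}
```

Registration itself has some cost, so for per-frame buffers it pays to register once up front and unregister only at shutdown, rather than every frame.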
EDIT: It is reasonable to assume that the newest or future versions of the driver will take the cuMemHostRegister() path themselves, so explicitly coding a memcpy into your application could even slow things down under future drivers.
Any idea what happens on systems with dual Intel Xeon CPUs? In these configurations it’s common to have one PCIe x16 slot directly connected to each CPU, and of course each CPU has its own memory controller and memory, so presumably there is a performance benefit to having the pinned memory local to whichever CPU is connected to the GPU. What is the best way of making sure this happens?
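On Linux, one approach I could imagine is allocating the host buffer on a specific NUMA node with libnuma and then pinning it with cudaHostRegister(). This is just a sketch under those assumptions; node 0 below is a placeholder for whichever CPU is actually wired to the GPU’s PCIe slot:

```c
#include <string.h>
#include <numa.h>           /* link with -lnuma */
#include <cuda_runtime.h>

int main(void)
{
    size_t size = 64 << 20; /* example: 64 MB */
    int node = 0;           /* placeholder: the NUMA node local to the GPU */

    if (numa_available() < 0)
        return 1;           /* no NUMA support on this system */

    /* page-aligned allocation bound to the chosen node */
    void *buf = numa_alloc_onnode(size, node);
    memset(buf, 0, size);   /* touch the pages so they are faulted in */

    /* pin the node-local pages for fast DMA */
    cudaHostRegister(buf, size, cudaHostRegisterDefault);

    void *dev;
    cudaMalloc(&dev, size);
    cudaMemcpy(dev, buf, size, cudaMemcpyHostToDevice);

    cudaFree(dev);
    cudaHostUnregister(buf);
    numa_free(buf, size);
    return 0;
}
```

Whether cudaMallocHost() itself honors the calling thread’s NUMA policy is something I don’t know for sure, which is why the sketch allocates explicitly and registers afterwards.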