Zero Copy Access CUDA Pipeline

mk_opc · June 16, 2017, 8:44pm

Hello all,

I have some code which executes the same function on the host, and then on the device, and outputs each to a file (it’s just some array arithmetic, nothing exciting).

I was using the standard method of allocating memory on the device, and then copying the array contents from the host (see “Standard CUDA pipeline”: http://arrayfire.com/zero-copy-on-tegra-k1/)
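For reference, a minimal sketch of that standard pipeline. The original code was not posted, so the kernel `scale_add_copy` and the helper `run_standard_pipeline` are hypothetical stand-ins for the actual array arithmetic; only the CUDA runtime calls are real API.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the poster's array arithmetic.
__global__ void scale_add_copy(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i] + 1.0f;
}

// Standard pipeline: allocate device buffers, copy in, launch, copy out.
// Returns the number of elements that differ from the CPU result,
// or -1 on a CUDA error. Assumes n > 0.
int run_standard_pipeline(int n) {
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    size_t bytes = n * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc((void**)&d_in, bytes) != cudaSuccess) return -1;
    if (cudaMalloc((void**)&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }

    // Explicit host-to-device copy of the input.
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_add_copy<<<blocks, threads>>>(d_in, d_out, n);

    // Explicit device-to-host copy; it also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    // Compare against the same arithmetic done on the CPU.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != 2.0f * h_in[i] + 1.0f) ++mismatches;
    return mismatches;
}
```

Calling `run_standard_pipeline(1 << 16)` from `main` should return 0 when the GPU and CPU results agree.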

I was unhappy with the results, as it was not much faster than the CPU, so I implemented the “Zero Copy Access” method and got a tremendous improvement in speed (and verified that the two outputs, from the CPU and the GPU, were the same).
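A rough sketch of what the zero-copy variant looks like, under the same assumptions as before: `scale_add_mapped` and `run_zero_copy` are hypothetical names, and the arithmetic is a placeholder. The host buffers are allocated as mapped pinned memory with `cudaHostAlloc`, and the kernel works on device aliases of those same buffers, so there is no `cudaMemcpy`.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the poster's array arithmetic.
__global__ void scale_add_mapped(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i] + 1.0f;
}

// Zero-copy pipeline: CPU and GPU share the same pinned host buffers.
// Returns the number of mismatching elements, or -1 on a CUDA error.
// Assumes n > 0.
int run_zero_copy(int n) {
    // Older toolkits/devices may need this before the context is created;
    // the result is ignored here because it can fail harmlessly if a
    // context already exists.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    size_t bytes = n * sizeof(float);
    float *h_in = nullptr, *h_out = nullptr;
    if (cudaHostAlloc((void**)&h_in, bytes, cudaHostAllocMapped) != cudaSuccess)
        return -1;
    if (cudaHostAlloc((void**)&h_out, bytes, cudaHostAllocMapped) != cudaSuccess) {
        cudaFreeHost(h_in);
        return -1;
    }

    // The CPU writes directly into the pinned memory.
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    // Device-side pointers that alias the pinned host buffers.
    float *d_in = nullptr, *d_out = nullptr;
    cudaHostGetDevicePointer((void**)&d_in, h_in, 0);
    cudaHostGetDevicePointer((void**)&d_out, h_out, 0);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_add_mapped<<<blocks, threads>>>(d_in, d_out, n);

    // No copy back: wait for the kernel, then read h_out on the CPU.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    int mismatches = -1;
    if (err == cudaSuccess) {
        mismatches = 0;
        for (int i = 0; i < n; ++i)
            if (h_out[i] != 2.0f * h_in[i] + 1.0f) ++mismatches;
    }
    cudaFreeHost(h_in);
    cudaFreeHost(h_out);
    return mismatches;
}
```

Note that every CPU loop in this sketch reads or writes the pinned buffers directly, which is the access pattern the question below is about.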

HOWEVER: This DRASTICALLY slowed down the HOST (CPU) execution speed… by an order of magnitude!!!

Why is the CPU execution on the host arrays so slow when using this zero copy access method???

Thanks for the help

AastaLLL · June 19, 2017, 3:18am

Hi,

cudaHostAlloc doesn’t guarantee fast access from both the CPU and the GPU. Please check the CUDA Programming Guide (Programming Guide :: CUDA Toolkit Documentation), which says:

Unified Memory offers a “single-pointer-to-data” model that is conceptually similar to CUDA’s zero-copy memory. One key difference between the two is that with zero-copy allocations the physical location of memory is pinned in CPU system memory such that a program may have fast or slow access to it depending on where it is being accessed from. Unified Memory, on the other hand, decouples memory and execution spaces so that all data accesses are fast.
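As an illustration of the Unified Memory model described in that passage, here is a minimal sketch using `cudaMallocManaged`. The kernel `scale_add_managed` and the helper `run_managed` are hypothetical names, and the arithmetic is only a placeholder for the original poster's code, which was not shown. Whether this is faster than zero-copy on a given Jetson board is something to measure, not something this sketch demonstrates.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the poster's array arithmetic.
__global__ void scale_add_managed(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i] + 1.0f;
}

// Unified Memory pipeline: one pointer per buffer, used by CPU and GPU.
// Returns the number of mismatching elements, or -1 on a CUDA error.
// Assumes n > 0.
int run_managed(int n) {
    size_t bytes = n * sizeof(float);
    float *in = nullptr, *out = nullptr;
    if (cudaMallocManaged((void**)&in, bytes) != cudaSuccess) return -1;
    if (cudaMallocManaged((void**)&out, bytes) != cudaSuccess) {
        cudaFree(in);
        return -1;
    }

    // The CPU fills the input through the same pointer the kernel will use.
    for (int i = 0; i < n; ++i) in[i] = static_cast<float>(i);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_add_managed<<<blocks, threads>>>(in, out, n);

    // The CPU must not touch managed memory until the kernel has finished.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    int mismatches = -1;
    if (err == cudaSuccess) {
        mismatches = 0;
        for (int i = 0; i < n; ++i)
            if (out[i] != 2.0f * in[i] + 1.0f) ++mismatches;
    }
    cudaFree(in);
    cudaFree(out);
    return mismatches;
}
```

Timing the CPU fill and compare loops here against the same loops on `cudaHostAlloc` memory is one way to check whether the slowdown in the original question goes away on a particular device.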

mk_opc · June 19, 2017, 6:42am

extremely helpful, thank you!!
