Dear all,
I am exploring the differences between the classic way of transferring data, pinned memory, unified memory, zero-copy memory, and UVA.
I observe that, apart from the data transfer time (which obviously changes), the execution time of the kernels that use the transferred data also differs between the transfer methods.
I cannot see why, as I am not sure whether global memory is cached.
Are the L1 and L2 caches on the GPU used for data?
If so, is the number of cache hits the only factor that makes the kernel execution time differ between transfer methods?
Is there an explanation of why each way of transferring data caches its data differently?
I am using a Tegra X1, where the CPU and GPU share a common memory.
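For reference, a minimal sketch of the four allocation/transfer paths I am comparing (variable names and the buffer size are illustrative, not from my actual code):

```cuda
#include <cuda_runtime.h>
#include <cstdlib>

int main() {
    const size_t bytes = (1 << 20) * sizeof(float);
    float *h_page, *h_pin, *h_map, *d_buf, *d_map, *managed;

    // 1. Classic: pageable host memory + explicit cudaMemcpy
    h_page = (float*)malloc(bytes);
    cudaMalloc(&d_buf, bytes);
    cudaMemcpy(d_buf, h_page, bytes, cudaMemcpyHostToDevice);

    // 2. Pinned: page-locked host memory, faster explicit copy
    cudaMallocHost(&h_pin, bytes);
    cudaMemcpy(d_buf, h_pin, bytes, cudaMemcpyHostToDevice);

    // 3. Zero-copy (mapped pinned): the kernel accesses host memory directly
    cudaHostAlloc(&h_map, bytes, cudaHostAllocMapped);
    cudaHostGetDevicePointer(&d_map, h_map, 0);  // pass d_map to the kernel

    // 4. Unified memory: a single pointer valid on both CPU and GPU
    cudaMallocManaged(&managed, bytes);

    // ... launch the same kernel on d_buf, d_map, and managed, and time it ...

    cudaFree(d_buf); cudaFreeHost(h_pin); cudaFreeHost(h_map);
    cudaFree(managed); free(h_page);
    return 0;
}
```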
Thank you in advance!!
Any help will be very useful to me!
This is not an obvious topic and there may be more recent info, but you can start with this.
Not sure, but I think that caching is not enabled for pinned/zero-copy memory. It can be a good scheme if you want to do simple operations (read the input, compute only in registers without accessing memory, then write the output) on the GPU, on buffers available from the CPU without a copy.
But if you intend to do more complex processing on the GPU that requires storing data and re-reading it, then caching would probably help, and you would use unified memory for that.
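To illustrate the two access patterns I mean (kernel names and the exact arithmetic are just examples):

```cuda
#include <cuda_runtime.h>

// Streams each element once: one read, compute in registers, one write.
// A reasonable fit for (uncached) zero-copy buffers.
__global__ void stream_once(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f + 1.0f;
}

// Re-reads neighbouring elements: each input element is read by up to
// three threads, so this pattern benefits from caching, and unified
// (cacheable) memory is likely the better choice here.
__global__ void average_neighbours(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
}
```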
Someone with better knowledge may comment further.
There are two major memory types on Jetson: pinned memory and unified memory.
You can find the main difference in our CUDA documentation: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-introduction

"Unified Memory offers a 'single-pointer-to-data' model that is conceptually similar to CUDA's zero-copy memory. One key difference between the two is that with zero-copy allocations the physical location of memory is pinned in CPU system memory such that a program may have fast or slow access to it depending on where it is being accessed from. Unified Memory, on the other hand, decouples memory and execution spaces so that all data accesses are fast."
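A minimal sketch of the "single-pointer-to-data" model described above (the kernel and values are illustrative):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void increment(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    int n = 256, *data;
    cudaMallocManaged(&data, n * sizeof(int));  // one allocation, one pointer
    for (int i = 0; i < n; ++i) data[i] = i;    // CPU writes through the pointer
    increment<<<1, n>>>(data, n);               // GPU uses the same pointer
    cudaDeviceSynchronize();                    // required before the CPU reads again
    printf("data[0] = %d\n", data[0]);
    cudaFree(data);
    return 0;
}
```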
Thank you for your answers!!
They helped me a lot!
I would like to ask: are unified and pinned memory available on Jetson because of its common memory between the CPU and GPU, or because of its compute capability?
If it is because of the common memory, what other advantages does the common memory offer us?
Unified and pinned memory are available on both Jetson and x86 machines.
The difference in physical memory is handled by the CUDA driver, so users can implement their programs without managing it.
Since it has a shared physical memory, Jetson does have some benefit in transferring data between the CPU and GPU.
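You can query whether the GPU shares physical memory with the CPU at runtime; on Jetson the `integrated` property is 1, while on a typical x86 machine with a discrete GPU it is 0:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("integrated GPU: %d\n", prop.integrated);
    printf("can map host memory: %d\n", prop.canMapHostMemory);
    return 0;
}
```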
Thanks.