I am comparing several ways of making data available to the GPU: the classic explicit copy, pinned memory, unified memory, zero-copy memory, and UVA.
I observe that, apart from the data transfer time (which obviously changes), the execution time of the kernels that use the data also changes depending on which transfer method I used.
I cannot see why this happens, since I am not sure whether global memory is cached at all.
Are the L1 and L2 caches on the GPU data caches?
If they are, is the number of cache hits under each transfer method the only factor that makes the kernel execution time differ?
Is there an explanation for why each transfer method would cache its data differently?
I am using a Tegra X1, where the CPU and GPU share a common memory.
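
To make the comparison concrete, here is a minimal sketch of the kind of measurement I mean. It is not my exact code: the kernel `scale`, the helper `timeKernel`, and the buffer size are hypothetical and chosen only for illustration, and error checking is left out for brevity. It runs the same kernel on a `cudaMalloc` buffer, a mapped (zero-copy) host buffer, and a managed buffer, and times only the kernel with CUDA events.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration: doubles every element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Runs one untimed warm-up launch, then times a second launch in milliseconds.
static float timeKernel(float *devPtr, int n) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(devPtr, n);   // warm-up, not timed
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    scale<<<blocks, threads>>>(devPtr, n);   // timed launch
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const int n = 1 << 22;                   // arbitrary size for the sketch
    const size_t bytes = n * sizeof(float);

    // Allow mapped host allocations; set before any other CUDA call.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // 1) Classic: pageable host buffer copied into device memory.
    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;
    float *d = nullptr;
    cudaMalloc((void **)&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    printf("cudaMalloc buffer: %.3f ms\n", timeKernel(d, n));

    // 2) Zero-copy: the kernel accesses mapped pinned host memory directly.
    float *hz = nullptr, *dz = nullptr;
    cudaHostAlloc((void **)&hz, bytes, cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) hz[i] = 1.0f;
    cudaHostGetDevicePointer((void **)&dz, hz, 0);
    printf("zero-copy buffer:  %.3f ms\n", timeKernel(dz, n));

    // 3) Unified memory: one pointer usable from both CPU and GPU.
    float *m = nullptr;
    cudaMallocManaged((void **)&m, bytes);
    for (int i = 0; i < n; ++i) m[i] = 1.0f;
    printf("managed buffer:    %.3f ms\n", timeKernel(m, n));

    cudaFree(m);
    cudaFreeHost(hz);
    cudaFree(d);
    free(h);
    return 0;
}
```

With pinned memory used only as the source of a `cudaMemcpy`, the kernel still reads a `cudaMalloc` buffer, so I would expect that case to match case 1; the differences I am asking about are between buffers like the three above.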
Thank you in advance!!
Any help will be very useful to me!