On the Jetson kit, we have observed that allocating pinned memory to speed up H2D/D2H transfers makes host-side access to that memory slower.
For example, if I allocate memory with malloc and do a simple computation such as vector addition on the CPU, it takes less time than the same computation on pinned memory.
The slowdown is as high as 2x, so speeding up the GPU side ends up costing time on the host side. This holds even for small allocations of a few KB, which should rule out the OS not getting enough pages.
for (int i = 0; i < N; i++)
{
    ptr[i] = (float)i;
    temp[i] = (float)i;
}
// Time this part
for (int i = 0; i < N; i++)
{
    ptr[i] = temp[i];
}
When ptr is allocated as pinned memory, the timed loop is about twice as slow as when it is allocated with malloc. We are using gettimeofday to time this.
This makes sense if pinned memory is not CPU cached. But is there any way to get the benefits of pinned memory (fast transfers … ) without sacrificing CPU performance, i.e. keep it CPU cached?
The reason for asking is that we want to balance the workload between the GPU and the CPU, so computation has to run on both. If I use pinned memory, the CPU part of the computation suffers.
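
In case it helps anyone reproduce the measurement, here is a minimal sketch of the timing harness in plain C. It is not our exact code: the helper names (now_ms, fill, time_copy_ms) and the repetition count are made up for illustration, and it uses clock_gettime(CLOCK_MONOTONIC) instead of gettimeofday so that wall-clock adjustments cannot skew the result. The buffers are passed in by the caller, so the same loop can be timed on memory from malloc and on pinned memory (for example from cudaMallocHost, which is an assumption about how the pinned buffer is obtained). The pinned allocation itself is left out so the sketch compiles without CUDA.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime on strict compilers */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Monotonic time in milliseconds (hypothetical helper). */
double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Same initialisation as the first loop in the post. */
void fill(float *ptr, float *temp, int n)
{
    for (int i = 0; i < n; i++) {
        ptr[i] = (float)i;
        temp[i] = (float)i;
    }
}

/*
 * Average time in ms of one pass of the copy loop, taken over `reps`
 * passes. `ptr` is volatile so the compiler cannot drop repeated stores.
 */
double time_copy_ms(volatile float *ptr, const float *temp, int n, int reps)
{
    double t0 = now_ms();
    for (int r = 0; r < reps; r++)
        for (int i = 0; i < n; i++)
            ptr[i] = temp[i];
    double t1 = now_ms();
    return (t1 - t0) / reps;
}
```

Call fill(ptr, temp, N) once, then time_copy_ms(ptr, temp, N, reps) with ptr coming from malloc in one run and from the pinned allocation in the other. Averaging over several passes matters for buffers of a few KB, where a single pass is too short to time reliably.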