I have a lens correction kernel using OpenCV (OpenCV4Tegra) running on an NVIDIA TX1.
I have tested the kernel with two memory models:
1- Pinned memory allocated using GpuMat: upload the data to it, process it, then download the result.
2- Zero-copy mapped memory: no uploading, no downloading, just processing.
Since the TX1 has an integrated GPU that shares the same memory space as the host, I shouldn’t have to “upload” to device memory before processing. If I understand it correctly, there is no device memory per se.
I ran my tests, and approach 1 is twice as fast as approach 2, even with the uploading and downloading.
So when we “upload” to a GpuMat, what exactly is happening, and why is it faster?
Similarly, why is processing on zero-copy data slower?
What is meant by “read once write once”? Is it with respect to the whole matrix, or is it about indexing, e.g. read index 0 only once and do not go back to index 0 again?
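To make the indexing reading of that question concrete, here is a minimal CPU-only sketch in plain C++. It is not the actual kernel: `AccessCounts`, `countAccesses`, and the "pull toward the centre by `scale`" mapping are hypothetical stand-ins for a lens-correction map, with nearest-neighbour lookup assumed for simplicity. It only counts how often each source index is read and each destination index is written in a remap-style gather.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-pixel access counts for a remap-style (gather) loop.
struct AccessCounts {
    std::vector<int> reads;   // how often each source pixel was read
    std::vector<int> writes;  // how often each destination pixel was written
};

// Hypothetical stand-in for a lens-correction map (an assumption, not the
// real kernel): each destination pixel fetches the source pixel found by
// moving its coordinates toward the image centre by `scale`.
AccessCounts countAccesses(int width, int height, double scale)
{
    const std::size_t total = static_cast<std::size_t>(width) * height;
    AccessCounts counts;
    counts.reads.assign(total, 0);
    counts.writes.assign(total, 0);

    const double cx = (width - 1) / 2.0;
    const double cy = (height - 1) / 2.0;

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Nearest-neighbour source coordinates, clamped to the image.
            int sx = static_cast<int>(std::lround(cx + (x - cx) * scale));
            int sy = static_cast<int>(std::lround(cy + (y - cy) * scale));
            sx = std::min(std::max(sx, 0), width - 1);
            sy = std::min(std::max(sy, 0), height - 1);

            ++counts.reads[static_cast<std::size_t>(sy) * width + sx];
            ++counts.writes[static_cast<std::size_t>(y) * width + x];
        }
    }
    return counts;
}
```

With `scale` equal to 1.0 every source index is read exactly once. With any other `scale`, each destination index is still written once, but some source indices are read several times and others not at all. So under the per-index reading, a remap-style kernel would not be "read once" even though the matrix as a whole is processed in a single pass.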
I have already gone through the documentation, but I haven’t been able to figure out why there is a performance loss instead of a gain.