If I use the OpenCV class cv::gpu::CudaMem with page-locked memory on the TX1, the software crashes on the last two commands below. Is this because the CPU and the GPU share the same unified memory?

using namespace cv;

// allocate page locked memory
gpu::CudaMem page_locked(1024, 1024, CV_16UC1, gpu::CudaMem::ALLOC_PAGE_LOCKED);
gpu::CudaMem zero_copy(1024, 1024, CV_16UC1, gpu::CudaMem::ALLOC_ZEROCOPY);
gpu::CudaMem write_combined(1024, 1024, CV_16UC1, gpu::CudaMem::ALLOC_WRITE_COMBINED);

// connect header with page-locked memory, don't copy data
gpu::GpuMat header_zero_copy = zero_copy.createGpuMatHeader();

// program fails on both of the following lines
gpu::GpuMat header_page_locked = page_locked.createGpuMatHeader();
gpu::GpuMat header_write_combined = write_combined.createGpuMatHeader();



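You need to allocate with ‘gpu::CudaMem::ALLOC_ZEROCOPY’ if you want to use memory mapping. createGpuMatHeader() maps the host buffer into the GPU address space, and that is only possible for zero-copy allocations, so the calls on the page-locked and write-combined buffers fail.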

By the way, here is the related information reported by deviceQuery from the CUDA samples:


Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = NVIDIA Tegra X1
Result = PASS