Unable to create PAGE_LOCKED or SHARED host memory using the Python binding to OpenCV's HostMem

I’m having trouble with the Python binding to OpenCV’s HostMem class and can’t create PAGE_LOCKED or SHARED host memory. I’m not sure I’m using it correctly and haven’t been able to find any examples. I’ve tried two different ways of creating page-locked host memory so that I can call cv2.cuda image processing methods. Here’s what I’ve tried:

a_mem = cv2.cuda_HostMem(cv2.cuda.HostMem_PAGE_LOCKED)
a_mem.create(num_rows, num_cols, cv2.CV_8UC1)
a_host = a_mem.createMatHeader()
a_dev = cv2.cuda_GpuMat(a_host)

- or -

a_mem = cv2.cuda_HostMem(num_rows, num_cols, cv2.CV_8UC1, cv2.cuda.HostMem_PAGE_LOCKED)
a_host = a_mem.createMatHeader()
a_dev = cv2.cuda_GpuMat(a_host)

In both cases I get Mat and GpuMat references that I can successfully use to make CUDA calls:

a_dev.upload(a_host)
cv2.cuda.add(a_dev, b_dev, c_dev)
c_dev.download(c_host)

But when I use NVIDIA Visual Profiler to examine the uploads and downloads, it tells me that my host memory is Pageable, not Pinned as I would expect for page-locked host memory. I have been able to use cv2.cuda.registerPageLocked() to create Pinned host memory, so I believe what Visual Profiler is telling me. I’ve tried this same test with cv2.cuda.HostMem_SHARED and get the same results.

Can someone please tell me if I’m creating the host memory and the Mat and GpuMat references incorrectly?

Also, when I do succeed in creating SHARED host memory, how do I get a GpuMat reference to it? I feel like I should be using HostMem’s createGpuMatHeader method for this but it doesn’t have a Python binding.

Thanks for any help I can get. I’ve been stuck on this for three days.

Hello @jim1,

Working with PAGE_LOCKED and SHARED host memory in OpenCV can be tricky. You need to pay special attention to the cv::cuda::HostMem data type.

CUDA streams let you build an execution pipeline: while a host-to-device copy is in progress, another kernel can run, and the same applies to device-to-host copies. It is in this context that pinned memory really pays off.

You need to allocate all the references for the pinned memory objects in the same scope, before using them. For example:

// Create pinned memory (PAGE_LOCKED) arrays
std::shared_ptr<std::vector<cv::cuda::HostMem>> srcMemArray = std::make_shared<std::vector<cv::cuda::HostMem>>();

You will also need GpuMats to operate on that pinned memory:

// Create GpuMat arrays to use with OpenCV CUDA methods
std::shared_ptr<std::vector<cv::cuda::GpuMat>> gpuSrcArray = std::make_shared<std::vector<cv::cuda::GpuMat>>();
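
To give an idea of how these pieces fit together, here is a rough, minimal sketch of the pattern on a single stream. It is only a sketch: pinnedPipeline is a made-up name, and the CV_8UC1 type and the cv::cuda::add call are placeholders for whatever methods you actually need.

#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaarithm.hpp>

void pinnedPipeline(int rows, int cols)
{
    // Pinned (PAGE_LOCKED) host buffers for the input and the result
    cv::cuda::HostMem srcHost(rows, cols, CV_8UC1, cv::cuda::HostMem::PAGE_LOCKED);
    cv::cuda::HostMem dstHost(rows, cols, CV_8UC1, cv::cuda::HostMem::PAGE_LOCKED);
    cv::cuda::GpuMat  srcDev, dstDev;
    cv::cuda::Stream  stream;

    cv::Mat srcMat = srcHost.createMatHeader();  // zero-copy view of the pinned buffer
    srcMat.setTo(cv::Scalar(1));                 // placeholder: fill it with your data

    srcDev.upload(srcHost, stream);              // H2D copy, asynchronous because the source is pinned
    cv::cuda::add(srcDev, srcDev, dstDev, cv::noArray(), -1, stream);  // placeholder CUDA method
    dstDev.download(dstHost, stream);            // D2H copy back into pinned memory
    stream.waitForCompletion();

    cv::Mat result = dstHost.createMatHeader();  // zero-copy view of the pinned result
}

Because the host buffers are PAGE_LOCKED, the upload and download issued on the stream are asynchronous, which is what allows the copies to overlap with kernel execution.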

You can then analyze the pipelined structure in NVIDIA Nsight, for example.

You can find a minimal working example in the following wiki page:

Regards,
Fabian
www.ridgerun.com