I’ve been working with OpenCV 3 and CUDA, and I may get access to a TX1 soon. I wanted to see if using unified buffers / shared memory (via the UMA, I assume) would give any sort of performance increase on that device, but I’m not really sure how to allocate shared buffers and use them as Mats.
I see in the documentation that there’s a data structure called CudaMem. That structure has the option to allocate zero-copy memory, but to be honest I don’t know which header file to include in order to gain access to that type. So I guess my first question is: what do I have to include to get access to the CudaMem type?
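For what it’s worth, my unverified guess is that it lives in the core CUDA module header, and that in newer 3.x releases the type may have been renamed to cv::cuda::HostMem. Something like:

```cpp
// Unverified guess: CudaMem seems to live in OpenCV's core CUDA module header.
#include <opencv2/core/cuda.hpp>

int main() {
    // If I'm reading the docs right, the SHARED allocation type should
    // map to zero-copy / mapped memory.
    cv::cuda::CudaMem mem(cv::cuda::CudaMem::SHARED);
    mem.create(480, 640, CV_8UC1); // hypothetical size, just for illustration
    return 0;
}
```

If someone can confirm (or correct) the header and the CudaMem vs. HostMem naming, that would already help a lot.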
Assuming I do find the header and can allocate a SHARED CudaMem object, what would be the protocol for using it? I see that I can create a GpuMat header that maps host memory into the GPU’s address space, so I assume it would go something like this:
```cpp
using namespace cv::cuda;
...
// Allocate shared (zero-copy) memory
CudaMem data(CudaMem::SHARED);

// Use h_DataHeader in host operations
cv::Mat h_DataHeader = data.createMatHeader();

// Use d_DataHeader in device operations
GpuMat d_DataHeader = data.createGpuMatHeader();
```
But I’m not really sure if this is actually how it’s done, or whether it would require any sort of device synchronization, the way buffers allocated via cudaMallocManaged do.
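To be concrete about the managed-memory pattern I mean, here’s a plain CUDA sketch (no OpenCV) where the host must synchronize before touching the buffer after a kernel launch, at least on pre-Pascal parts like the TX1. The kernel and sizes are made up for illustration:

```cpp
// Sketch of the cudaMallocManaged pattern I'm comparing against.
#include <cuda_runtime.h>

__global__ void addOne(unsigned char* p, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] += 1;
}

int main() {
    const int n = 1 << 20;
    unsigned char* data = nullptr;
    cudaMallocManaged(&data, n);     // one pointer visible to host and device

    addOne<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();         // required before the host reads 'data'
                                     // again (on pre-Pascal devices)

    unsigned char first = data[0];   // safe only after the sync above
    cudaFree(data);
    return (int)first;
}
```

My question is whether a SHARED CudaMem imposes the same host-side synchronization burden, or whether zero-copy mappings behave differently.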
Finally, I’m not sure how (or if) shared CudaMem objects would work with streams, or whether streams are reserved for page-locked memory. Does anyone know anything about this?
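What I’d like to be able to write is something like the following, which is completely unverified; cv::cuda::threshold is just a stand-in for any stream-aware CUDA routine:

```cpp
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaarithm.hpp>

using namespace cv::cuda;

int main() {
    // Hypothetical sizes, just for illustration
    CudaMem src(CudaMem::SHARED), dst(CudaMem::SHARED);
    src.create(480, 640, CV_8UC1);
    dst.create(480, 640, CV_8UC1);

    GpuMat d_src = src.createGpuMatHeader();
    GpuMat d_dst = dst.createGpuMatHeader();

    // Would an async op on a zero-copy-backed GpuMat actually overlap,
    // or does the stream API assume PAGE_LOCKED memory plus explicit
    // upload/download calls?
    Stream stream;
    cv::cuda::threshold(d_src, d_dst, 128, 255, cv::THRESH_BINARY, stream);
    stream.waitForCompletion();
    return 0;
}
```

If this pattern is wrong, I’d appreciate a pointer to the intended one.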
Thanks for your help. I know I asked a lot, but if you have any input on the above I’d love to hear it. Also feel free to yell at me for not being specific enough.
EDIT: StackOverflow user talonmies got mad at me for calling them shared buffers, which is fair, since that’s easily confused with the on-chip shared memory used within CUDA kernels. I suppose I meant “zero-copy” or “unified” memory. My mistake.