Strange issue with ALLOC_ZEROCOPY

Hi all,

Can you help me figure out why this code does not work?

[buf is a buffer of src_W x src_H x 3 bytes, acquired by a frame grabber]

cv::Mat res(480,640,CV_8UC3);

if( buf )
{
   cv::Mat ocvAcqFrame( src_H, src_W, CV_8UC3, buf );

   cv::gpu::CudaMem gpu_frame_mem( ocvAcqFrame, cv::gpu::CudaMem::ALLOC_ZEROCOPY );
   cv::gpu::CudaMem gpu_res_mem( res, cv::gpu::CudaMem::ALLOC_ZEROCOPY );

   cv::gpu::GpuMat gpu_frame = gpu_frame_mem.createGpuMatHeader();
   cv::gpu::GpuMat gpu_res = gpu_res_mem.createGpuMatHeader();

   cv::gpu::resize( gpu_frame, gpu_res, cv::Size(640,480) );

   // gpu_res.download( res ); <--- This works
   res = gpu_res_mem.createMatHeader(); // <--- This crashes
}

With download() the code works, but it is much slower than doing the resize on the CPU.
With createMatHeader() it crashes without any error message.

Thank you
Walter
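Because ocvAcqFrame wraps the grabber's buf without copying it, a cv::Mat built with the default step reads rows x cols x 3 bytes starting at that pointer. If the grabber's buffer is smaller than that, or its rows are padded, the Mat reads out of bounds, so (as the reply below also suggests) the allocation is worth ruling out first. A minimal sanity check is sketched here in plain C++; bufferFitsFrame is a hypothetical helper, and it assumes 8-bit channels and that you know the buffer length in bytes reported by the grabber.

```cpp
#include <cstddef>

// Hypothetical helper: returns true if a buffer of bufBytes bytes can back a
// frame of rows x cols pixels with `channels` 8-bit channels per pixel and a
// row stride of stepBytes. For cv::Mat(rows, cols, CV_8UC3, ptr) with the
// default step, stepBytes == cols * 3.
bool bufferFitsFrame(std::size_t bufBytes, int rows, int cols, int channels,
                     std::size_t stepBytes) {
    if (rows <= 0 || cols <= 0 || channels <= 0) {
        return false;
    }
    const std::size_t rowBytes = static_cast<std::size_t>(cols) * channels;
    if (stepBytes < rowBytes) {
        return false;  // consecutive rows would overlap
    }
    // Every row except the last occupies a full stride; the last row only
    // needs its pixel bytes.
    const std::size_t needed =
        (static_cast<std::size_t>(rows) - 1) * stepBytes + rowBytes;
    return bufBytes >= needed;
}
```

For a 640 x 480 three-channel frame with no row padding the minimum is 640 * 480 * 3 = 921600 bytes, so bufferFitsFrame(921600, 480, 640, 3, 1920) is true, while the same call with one byte less is false.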

Hi,

Thanks for your question.

I tried your code and was able to run it without errors (I replaced ocvAcqFrame with a plain cv::Mat).
Could you check whether something is wrong with the ocvAcqFrame allocation?

#include <iostream>
#include <opencv2/core/core.hpp>
#include "opencv2/gpu/gpu.hpp"

int main()
{
    cv::Mat buf(480,620,CV_8UC3);
    cv::Mat res(480,640,CV_8UC3);

    cv::gpu::CudaMem gpu_frame_mem( buf, cv::gpu::CudaMem::ALLOC_ZEROCOPY );
    cv::gpu::CudaMem gpu_res_mem( res, cv::gpu::CudaMem::ALLOC_ZEROCOPY );

    cv::gpu::GpuMat gpu_frame = gpu_frame_mem.createGpuMatHeader();
    cv::gpu::GpuMat gpu_res = gpu_res_mem.createGpuMatHeader();

    cv::gpu::resize( gpu_frame, gpu_res, cv::Size(640,480) );

    // gpu_res.download( res ); <--- This works
    res = gpu_res_mem.createMatHeader();
    std::cout << "All good" << std::endl;
    return 0;
}
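One possible explanation for the crash, not confirmed in this thread: as far as I know, in OpenCV 2.4 CudaMem::createMatHeader() returns a cv::Mat that points into the CudaMem buffer without taking ownership of it. In the test program above, gpu_res_mem lives until the end of main(), so res stays valid; in the original snippet, gpu_res_mem is destroyed at the closing brace of the if block, which would leave res pointing at freed memory. The general rule (a non-owning view must not outlive its owner) can be sketched without OpenCV; OwningBuffer, FrameView and makeFrameView are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an owning allocation such as CudaMem:
// it frees its bytes when the last owner goes away.
class OwningBuffer {
public:
    explicit OwningBuffer(std::size_t bytes) : bytes_(bytes, 0) {}
    std::uint8_t* data() { return bytes_.data(); }
    std::size_t size() const { return bytes_.size(); }

private:
    std::vector<std::uint8_t> bytes_;
};

// Hypothetical stand-in for a header like the one createMatHeader() returns:
// a raw pointer plus a size. On its own that does not keep the memory alive,
// so this sketch pairs it with a shared_ptr to the owner.
struct FrameView {
    std::shared_ptr<OwningBuffer> owner;  // keeps the bytes alive
    std::uint8_t* data = nullptr;         // non-owning pointer into owner
    std::size_t size = 0;
};

// Creates the owner and a view onto it; the view can safely leave the
// creating scope because it shares ownership of the buffer.
FrameView makeFrameView(std::size_t bytes) {
    auto owner = std::make_shared<OwningBuffer>(bytes);
    FrameView view;
    view.data = owner->data();
    view.size = owner->size();
    view.owner = std::move(owner);
    return view;
}
```

If this is indeed the cause, the equivalent change in the original snippet would be to declare gpu_res_mem (and gpu_frame_mem) outside the if block so they outlive res, or to copy the result out with clone(), which gives up part of the zero-copy benefit.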

For the zero-copy performance issue, this topic may give you some hints:
https://devtalk.nvidia.com/default/topic/810053/jetson-tk1/opencv-performance-tk1/post/4479121/#4479121

I'm writing a benchmark application to better understand this issue, and I'm seeing some really strange behaviour.

For example:
- resize takes about 2 ms in a pipeline where the GpuMat is initialized with upload();
- resize takes about 32 ms in a pipeline where the GpuMat is initialized from ALLOC_ZEROCOPY memory.

The benchmark is available on GitHub:

I replied to the topic you suggested with the results of one of my tests…
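When comparing numbers like the 2 ms and 32 ms above, it helps to exclude one-time costs (the first CUDA call in a process typically pays for context creation) and to average over many iterations. A minimal timing helper using only std::chrono is sketched below; meanMillis is a hypothetical name, and it assumes the timed work has finished when fn returns. If the GPU work is queued on a cv::gpu::Stream, call waitForCompletion() on that stream inside fn, otherwise only the launch is measured.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical benchmark helper: runs fn `warmup` times untimed (to absorb
// one-time initialisation), then returns the mean wall-clock time of
// `iterations` further calls, in milliseconds.
double meanMillis(const std::function<void()>& fn, std::size_t iterations,
                  std::size_t warmup = 3) {
    for (std::size_t i = 0; i < warmup; ++i) {
        fn();
    }
    if (iterations == 0) {
        return 0.0;
    }
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / static_cast<double>(iterations);
}
```

For example, meanMillis([&] { cv::gpu::resize(gpu_frame, gpu_res, cv::Size(640, 480)); }, 100) would time the resize step alone, so the upload() and zero-copy pipelines can be compared on the same footing.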

Hi,

Thanks for your feedback.

Please remember to maximize the CPU/GPU clock frequencies for a fair comparison:

sudo ./jetson_clocks.sh

Sure… I have this in my rc.local ;)

# Maximize performance
( sleep 60 && /home/ubuntu/jetson_clocks.sh )&

exit 0

Hi,

Thanks for your feedback.
Could you tell us which device you are targeting? You mentioned that you will try both the TX1 and the TX2.
Once the target device is clear, we can look into the performance issue together.

And of course, the TX2 is currently our fastest device.