Translating CPU-based OpenCV code to GPU-based OpenCV code

Hi Folks,

I am trying to apply some basic operations to frames captured from a camera. With the CPU OpenCV functions (e.g. cv::absdiff) this is quite slow. To reduce the runtime, I am trying to use the CUDA-based calls (e.g. cv::cuda::absdiff) instead. I read frames from the camera as cv::Mat using the following code snippet:

// Create a pitch-linear YUV420 NvBuffer for the captured frame; fd is its dmabuf handle.
int fd = iImageNativeBuffer->createNvBuffer(Argus::Size {m_framesize.width, m_framesize.height},
        NvBufferColorFormat_YUV420, NvBufferLayout_Pitch, &status);
TEST_ERROR_RETURN(status != STATUS_OK, "Failed to create a native buffer");

NvBufferParams params;
NvBufferGetParams(fd, &params);

// Map only plane 0 (luma) into the CPU address space; each row is pitch[0] bytes long.
int fsize = params.pitch[0] * m_framesize.height;
char *data_mem = (char*)mmap(NULL, fsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, params.offset[0]);
if (data_mem == MAP_FAILED)
    printf("mmap failed : %s\n", strerror(errno));

// Log the time elapsed since the previous frame, in milliseconds.
struct timeval tp;
gettimeofday(&tp, NULL);
long start = tp.tv_sec * 1000 + tp.tv_usec / 1000;
cout << "Time at frame capture : " << start - prevTime << " ms " << endl;
prevTime = start;

// Wrap the mapped luma plane as a single-channel Mat without copying (step = pitch).
cv::Mat imgbuf = cv::Mat(m_framesize.height, m_framesize.width, CV_8UC1, data_mem, params.pitch[0]);
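
For anyone wondering why the pitch is passed as the last argument to cv::Mat: rows in a pitch-linear buffer can be longer than the visible width, so row r starts at data + r * pitch rather than data + r * width. Here is a minimal plain C++ sketch of that addressing. PitchedPlane and packTight are hypothetical names made up for illustration; they are not part of Argus, NvBuffer or OpenCV.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (illustration only): a non-owning view over one
// pitched 8-bit plane, such as the luma plane mapped above.
struct PitchedPlane {
    const std::uint8_t* data;  // first byte of row 0
    int width;                 // visible pixels per row
    int height;                // number of rows
    std::size_t pitch;         // bytes from one row start to the next

    // Row r starts r * pitch bytes into the buffer, not r * width.
    const std::uint8_t* row(int r) const {
        return data + static_cast<std::size_t>(r) * pitch;
    }
    std::uint8_t at(int r, int c) const { return row(r)[c]; }
};

// Copies only the visible pixels into a tightly packed buffer
// (width * height bytes), skipping the padding at the end of each row.
std::vector<std::uint8_t> packTight(const PitchedPlane& p) {
    assert(p.pitch >= static_cast<std::size_t>(p.width));
    std::vector<std::uint8_t> out;
    out.reserve(static_cast<std::size_t>(p.width) * p.height);
    for (int r = 0; r < p.height; ++r) {
        const std::uint8_t* src = p.row(r);
        out.insert(out.end(), src, src + p.width);
    }
    return out;
}
```

Passing the pitch as the step lets cv::Mat do the same row arithmetic internally, which is why no packing copy is needed in the snippet above.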

cv::cuda::absdiff takes cv::cuda::GpuMat arguments, so I have to convert the cv::Mat to a cv::cuda::GpuMat using the upload function, as shown below:

cv::cuda::GpuMat gimgCurrFrame;

gimgCurrFrame.upload(currFrameGray);

where currFrameGray is a single-channel cv::Mat image.

The upload call takes a long time, so the conversion defeats the whole purpose of running the code on the GPU.
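
To put a number on the upload cost instead of eyeballing it, something like the following could be used. This is only a sketch: meanMillis is a hypothetical helper name, and it uses std::chrono::steady_clock rather than gettimeofday so that wall-clock adjustments do not affect the result.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (illustration only): calls fn() `iterations` times
// and returns the mean wall-clock time per call in milliseconds.
template <typename Fn>
double meanMillis(Fn&& fn, int iterations) {
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}
```

The upload would then be wrapped in a lambda, e.g. meanMillis([&] { gimgCurrFrame.upload(currFrameGray); }, 100). It is probably worth making one untimed call first, since the first CUDA call in a process also pays for context initialization.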

I have the following questions in this context:

  1. As far as I know, the TX1 shares memory between the CPU and GPU. If that is the case, why is this conversion required (if it is)?

  2. Is there any other way of using the CUDA-based OpenCV functions without converting from cv::Mat to cv::cuda::GpuMat?

  3. Based on my research, I have found a way to read frames in CUeglFrame format. Is there a way to use CUDA-based OpenCV functions such as cv::cuda::absdiff with this format?

  4. Is there a way to read camera frames directly into a cv::cuda::GpuMat instead of a cv::Mat?

Thanks

Hi Folks,

I found this thread very helpful.

https://devtalk.nvidia.com/default/topic/998962/jetson-tx1/cuda-zero-copy-on-tx1/2

However, I am not able to fully comprehend it. Could someone please explain the following steps?

I found there is a way to pass a dmabuf fd to V4L2. So could I get a device (CUDA) address from an allocated NvBuffer?

Like this.
1. NvBufferCreate(&fd, w, h, NvBufferLayout_Pitch, get_nvbuff_color_fmt(ctx->cam_pixfmt))
2. cudaAddrFromFD(fd, &d_a) 

Or this:

Yes, this is a good way to avoid memory copy.

The flow looks like this:
V4L2_buffer  ->  EGLImageKHR                   ->  CUDA-Array
(dmabuf_fd)      (cuGraphicsEGLRegisterImage)      (pDevPtr)

Thanks,

Hi dumbogeorge,

Please refer to the MMAPI backend sample.

This function call maps an EGLImage to a CUDA buffer.

/**
  * Maps an EGL image into CUDA memory.
  *
  * @param pEGLImage: EGL image
  * @param width: Image width
  * @param height: Image height
  * @param cuda_buf: destination CUDA address
  */
void mapEGLImage2Float(void* pEGLImage, int width, int height, void* cuda_buf);

An EGLImage can be created with this function:
 EGLImageKHR NvEGLImageFromFd(EGLDisplay display, int dmabuf_fd);