Translating CPU based OpenCV code to GPU based OpenCV code

dumbogeorge · July 5, 2017, 2:41pm

Hi Folks,

I am trying to apply some basic operations to frames captured by the camera. When I do so with the CPU OpenCV functions (e.g. absdiff), it is quite slow. To reduce the runtime, I am trying to use the CUDA-based calls (e.g. cv::cuda::absdiff) instead. I read frames from the camera as cv::Mat using the following code snippet:

// Create a pitch-linear YUV420 NvBuffer for the captured frame; fd is its dmabuf file descriptor.
int fd = iImageNativeBuffer->createNvBuffer(Argus::Size {m_framesize.width, m_framesize.height},
        NvBufferColorFormat_YUV420, NvBufferLayout_Pitch, &status);
if (status != STATUS_OK)
    TEST_ERROR_RETURN(status != STATUS_OK, "Failed to create a native buffer");

NvBufferParams params;
NvBufferGetParams(fd, &params);

// Map only plane 0 (luma); each row occupies params.pitch[0] bytes.
int fsize = params.pitch[0] * m_framesize.height;
char *data_mem = (char*)mmap(NULL, fsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, params.offset[0]);
if (data_mem == MAP_FAILED)
    printf("mmap failed : %s\n", strerror(errno));

// Log the interval since the previous frame, in milliseconds.
struct timeval tp;
gettimeofday(&tp, NULL);
long start = tp.tv_sec * 1000 + tp.tv_usec / 1000;
cout << "Time at frame capture : " << start - prevTime << " ms " << endl;

prevTime = start;

// Wrap the mapped luma plane as a single-channel Mat without copying the pixels.
cv::Mat imgbuf = cv::Mat(m_framesize.height, m_framesize.width, CV_8UC1, data_mem, params.pitch[0]);
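
To make the role of params.pitch[0] in the cv::Mat constructor above concrete, here is a minimal CPU-only sketch of how a pitched 8-bit plane is addressed and how an absolute difference over two such planes could be computed. PitchedPlane, planeBytes and absdiffPitched are hypothetical names introduced only for this illustration; they are not part of OpenCV, Argus or nvbuf_utils, and the sketch does not touch the GPU.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not an OpenCV or NvBuffer type): a read-only view of one
// 8-bit plane whose rows are padded to `pitch` bytes, as in a pitch-linear buffer.
struct PitchedPlane {
    const std::uint8_t* data;  // first byte of row 0
    int width;                 // valid pixels per row
    int height;                // number of rows
    std::size_t pitch;         // bytes from the start of one row to the next

    // Row y starts y * pitch bytes after `data`, not y * width.
    const std::uint8_t* row(int y) const {
        return data + static_cast<std::size_t>(y) * pitch;
    }
};

// Number of bytes occupied by one plane, padding included
// (the same arithmetic as pitch[0] * height in the snippet above).
inline std::size_t planeBytes(std::size_t pitch, int height) {
    return pitch * static_cast<std::size_t>(height);
}

// CPU reference for a per-pixel absolute difference of two pitched planes.
// The result is tightly packed (width bytes per row); padding bytes are never read.
std::vector<std::uint8_t> absdiffPitched(const PitchedPlane& a, const PitchedPlane& b) {
    if (a.width != b.width || a.height != b.height) {
        throw std::invalid_argument("planes must have the same dimensions");
    }
    std::vector<std::uint8_t> out(static_cast<std::size_t>(a.width) * a.height);
    for (int y = 0; y < a.height; ++y) {
        const std::uint8_t* ra = a.row(y);
        const std::uint8_t* rb = b.row(y);
        for (int x = 0; x < a.width; ++x) {
            const int d = static_cast<int>(ra[x]) - static_cast<int>(rb[x]);
            out[static_cast<std::size_t>(y) * a.width + x] =
                static_cast<std::uint8_t>(d < 0 ? -d : d);
        }
    }
    return out;
}
```

With the names from the snippet above, a view over the mapped luma plane would be PitchedPlane{reinterpret_cast<const std::uint8_t*>(data_mem), m_framesize.width, m_framesize.height, params.pitch[0]}. Passing the same pitch as the step argument of cv::Mat is what lets OpenCV skip the row padding in the same way.
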

cv::cuda::absdiff takes cv::cuda::GpuMat arguments, so I have to convert the cv::Mat to a cv::cuda::GpuMat with the upload function, as shown below:

cv::cuda::GpuMat gimgCurrFrame;

gimgCurrFrame.upload(currFrameGray);

where currFrameGray is a single-channel cv::Mat image.

The upload call takes a lot of time, so the conversion defeats the whole purpose of running the code on the GPU.

I have the following questions in this context:

As far as I know, memory on the TX1 is shared between the CPU and the GPU. If so, why is this conversion required (if it is)?
Is there any other way to use the CUDA-based OpenCV functions without converting from cv::Mat to cv::cuda::GpuMat?
From my research, I have found a way to read frames in CUeglFrame format. Is there a way to use CUDA-based OpenCV functions such as cv::cuda::absdiff with this format?
Is there a way to read camera frames as cv::cuda::GpuMat instead of cv::Mat?

Thanks

dumbogeorge · July 6, 2017, 7:48am

Hi Folks,

I found this thread very helpful.

https://devtalk.nvidia.com/default/topic/998962/jetson-tx1/cuda-zero-copy-on-tx1/2

However, I am not able to fully comprehend it. Could someone please explain the following steps?

I found there is a way to pass a dmabuf fd to V4L2. So could I get a device (CUDA) address from an allocated NvBuffer?

Like this.
1. NvBufferCreate(&fd, w, h, NvBufferLayout_Pitch, get_nvbuff_color_fmt(ctx->cam_pixfmt))
2. cudaAddrFromFD(fd, &d_a) 

or these:

Yes, this is a good way to avoid memory copy.

The flow looks like this:
V4L2_buffer -> EGLImageKHR                  -> CUDA-Array
(dmabuf_fd)    (cuGraphicsEGLRegisterImage)    (pDevPtr)

Thanks,

WayneWWW · July 7, 2017, 2:52am

Hi dumbogeorge,

Please refer to the MMAPI backend sample.

This function maps an EGLImage to a CUDA buffer.

/**
 * Maps an EGL image into CUDA memory.
 *
 * @param pEGLImage: EGL image
 * @param width: Image width
 * @param height: Image height
 * @param cuda_buf: destination CUDA address
 */
void mapEGLImage2Float(void* pEGLImage, int width, int height, void* cuda_buf);

An EGLImage can be created with this function:
EGLImageKHR NvEGLImageFromFd(EGLDisplay display, int dmabuf_fd);
