Handing off cudaImage object to OpenCV CUDA function? (expects CV::MAT)

willemvdkletersteeg · February 18, 2021, 2:00pm

I’m creating a computer vision application of sorts and for one of the features (besides object recognition which functions perfectly) I need to do motion detection. I use OpenCV for this.

I have it functioning perfectly if I use jetson.utils (python, by the way) to crop the image and then hand it off via jetson.utils.cudaToNumpy() to create the numpy array OpenCV needs.

However, the OpenCV functions I use (equalizeHist, absdiff, threshold, dilate, etc.) all run on the CPU, which I think is a waste of performance. These functions are all available on GPU/CUDA as well, but they require a cv::Mat object instead of a NumPy array. Is there a way to hand off a pointer to the cudaImage object as a cv::Mat, without all sorts of matrix conversions and copies between GPU and CPU RAM? Or is there some conversion available that does the work on the GPU?

I can’t find anything about this in the documentation or on the forum. I must be missing something, because I can’t believe I’m the first person doing this… ;)

dusty_nv · February 18, 2021, 4:45pm

Hi @willemvdkletersteeg, do you mean using the C++ interface of OpenCV with cv::Mat? I have only seen numpy arrays used with cv2 from Python. If there is a cv::Mat in Python, I bet there is also a way to create it from the numpy array.

If you are using cv::Mat in C++, you should be able to create it from the CUDA pointer like so:

cv::Mat cv_image(cv::Size(imgWidth, imgHeight), CV_8UC3, imgCUDA);

Since images are allocated in jetson-inference in shared CPU/GPU memory, you can use the CUDA pointer directly on the CPU.
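The same idea can be sketched from Python. This is only an illustration under stated assumptions: it assumes cudaImage.ptr is a plain integer address and that the image was allocated with cudaAllocMapped, so the CPU is allowed to read that address. An ordinary ctypes buffer stands in for the mapped memory so the snippet runs without a Jetson, and view_bytes_at is a hypothetical helper, not part of jetson.utils.

```python
import ctypes


def view_bytes_at(address, size):
    """Return a ctypes uint8 array that aliases `size` bytes at `address` (no copy)."""
    return (ctypes.c_uint8 * size).from_address(address)


# Stand-in for a mapped allocation: a host buffer sized for a 4x2 BGR image.
width, height, channels = 4, 2, 3
backing = (ctypes.c_uint8 * (width * height * channels))()

# Plays the role of cudaImage.ptr (assumed to be an integer address).
address = ctypes.addressof(backing)

pixels = view_bytes_at(address, len(backing))
pixels[0] = 255  # a write through the view lands in the original buffer
print(backing[0])
```

On a real mapped image the call would presumably look like view_bytes_at(img.ptr, img.size). cudaToNumpy() remains the supported way to get a CPU-side view, and the address must never be dereferenced on the CPU for memory that is not mapped.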

Honey_Patouceul · February 18, 2021, 10:35pm

If you want to investigate using cv::GpuMat with jetson-utils, you may check this topic.
Note that nothing is moved to the GPU automatically: you would have to rewrite your processing with the OpenCV CUDA functions.
Also note that the CUDA backend only provides a subset of the OpenCV CPU functions. Where a function is available its API may differ, and where the API is similar the results may still differ. But if you get it working, it may be faster.
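Since CPU and CUDA results may differ, a tiny reference to compare against can help. The following is a pure-Python sketch of what three of the steps mentioned in this thread compute (absolute difference, binary threshold, one 3x3 dilation). It is not OpenCV's implementation, and the function names are made up for illustration. It assumes the OpenCV defaults: THRESH_BINARY keeps pixels strictly greater than the threshold, and dilate with no kernel uses a 3x3 rectangle whose out-of-image neighbours do not contribute.

```python
def absdiff(a, b):
    """Per-pixel absolute difference of two equally sized grayscale images."""
    return [[abs(x - y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def threshold_binary(img, thresh, maxval=255):
    """Pixels strictly above `thresh` become `maxval`, all others become 0."""
    return [[maxval if v > thresh else 0 for v in row] for row in img]


def dilate3x3(img):
    """Each output pixel is the maximum over its in-bounds 3x3 neighbourhood."""
    h, w = len(img), len(img[0])
    return [
        [
            max(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


# One pixel changes between two 4x3 frames.
previous = [[10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10]]
current = [[10, 10, 10, 10], [10, 200, 10, 10], [10, 10, 10, 10]]

mask = dilate3x3(threshold_binary(absdiff(previous, current), 128))
for row in mask:
    print(row)
```

Running a small synthetic frame pair like this through both the CPU and the CUDA version makes it easier to spot where the two disagree.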

willemvdkletersteeg · February 19, 2021, 8:02am

Thank you both for your response. Much appreciated. As said, I work in Python. But I may not have been clear what I’m trying to do. Excuse me. This is the code that currently works like a charm: (as in: it produces the results I want)

subFrame = jetson.utils.cudaAllocMapped(width=area.width, height=area.height, format=frame.format)
jetson.utils.cudaCrop(frame, subFrame, area.roi)

# First convert the frame (in GPU memory) to something OpenCV can use
cvFrame = jetson.utils.cudaAllocMapped(width=frame.width, height=frame.height, format="bgr8")

# TODO: convert to grayscale with CUDA/in GPU MEM?
jetson.utils.cudaConvertColor(frame, cvFrame)

# make sure the GPU is done working before we convert to cv2
jetson.utils.cudaDeviceSynchronize()

# convert to cv2 image (cv2 images are numpy arrays)
cvFrame = jetson.utils.cudaToNumpy(cvFrame)

# Convert to grayscale - TODO: do this sooner in the process
cvFrame = cv2.cvtColor(cvFrame, cv2.COLOR_BGR2GRAY)
cvFrame = cv2.equalizeHist(cvFrame)

frameDelta = cv2.absdiff(area.previous_frame, cvFrame)
# cv2.threshold() returns a (retval, image) tuple; keep only the image
_, thresh = cv2.threshold(frameDelta, 128, 255, cv2.THRESH_BINARY)
thresh = cv2.dilate(thresh, None, iterations=2)

But I have to run this every single frame (albeit, the regions that are cropped to are quite small) so I would like to optimize this and run everything in/on GPU. The OpenCV functions that I use are - as far as I know - all available in CUDA. So I wanted to change it to:

subFrame = jetson.utils.cudaAllocMapped(width=area.width, height=area.height, format=frame.format)
jetson.utils.cudaCrop(frame, subFrame, area.roi)

# First convert the frame (in GPU memory) to something OpenCV can use
cvFrame = jetson.utils.cudaAllocMapped(width=frame.width, height=frame.height, format="gray8")

# TODO: convert to grayscale with CUDA/in GPU MEM?
jetson.utils.cudaConvertColor(frame, cvFrame)

# make sure the GPU is done working before we convert to cv2
jetson.utils.cudaDeviceSynchronize()

# This doesn't work:
# cvFrame = jetson.utils.cudaToNumpy(cvFrame)

cvFrame = cv2.cuda.equalizeHist(cvFrame)

# Detect areas with motion
frameDelta = cv2.cuda.absdiff(area.previous_frame, cvFrame)
thresh = cv2.cuda.threshold(frameDelta, 128, 255, cv2.THRESH_BINARY)
thresh = cv2.cuda.dilate(thresh, None, iterations=2)

But this doesn’t work because the cudaImage object that cudaConvertColor() produces can’t be given to the CV2 function(s). Also, converting to a Numpy array doesn’t work. The cv2.cuda.* functions expect a GpuMat object as input. How do I go about this efficiently?

dusty_nv · February 19, 2021, 2:07pm

OK, gotcha. I haven’t used the Python API for OpenCV’s CUDA functions before (cv2.cuda), but first try this:

gpu_frame = cv2.cuda_GpuMat()
gpu_frame.upload(numpy_array)    # numpy_array is from cudaToNumpy()

Ideally you could use this constructor for GpuMat instead, which takes a user pointer and in theory would avoid the upload - however I can’t find a reference to this being done from Python since OpenCV has non-existent Python documentation.

My cudaImage object has a .ptr member with the CUDA memory address, in case you are able to use the above constructor from Python. Then you could skip the whole numpy part.

Also, if you are running your code above in a loop (i.e. processing a video stream), you will not want to allocate the data each frame - instead allocate it beforehand, or allocate it on the first iteration of the loop.
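A minimal sketch of the "allocate on the first iteration" pattern follows. BufferCache and fake_alloc are hypothetical names introduced for illustration. The allocator is passed in so the snippet runs without a Jetson; on the device it would presumably be jetson.utils.cudaAllocMapped, called with the same keyword arguments used earlier in this thread.

```python
class BufferCache:
    """Allocate one buffer per (width, height, format) and reuse it afterwards."""

    def __init__(self, allocate):
        self._allocate = allocate  # e.g. jetson.utils.cudaAllocMapped (assumption)
        self._buffers = {}

    def get(self, width, height, format):
        key = (width, height, format)
        if key not in self._buffers:
            # Only reached the first time this size/format is requested.
            self._buffers[key] = self._allocate(width=width, height=height, format=format)
        return self._buffers[key]


# Stand-in allocator that records how often it is called.
calls = []


def fake_alloc(width, height, format):
    calls.append((width, height, format))
    return bytearray(width * height)


cache = BufferCache(fake_alloc)
for _ in range(3):  # stands in for the per-frame processing loop
    buf = cache.get(64, 48, "gray8")

print(len(calls))
```

Keying on size and format means a region of interest that changes shape gets a new buffer, while the common case of a fixed region reuses the same allocation every frame.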

willemvdkletersteeg · February 22, 2021, 11:00am

Thanks! That actually works. What I do now is:

# GpuMat takes (rows, cols, type), so the height comes first; type 0 is CV_8UC1
self.cv_frame = cv2.cuda_GpuMat(self.height, self.width, 0)

in the object’s constructor and then, in the loop, I run:

area.cv_frame.upload(jetson.utils.cudaToNumpy(area.bgr_sub_frame))

This is probably not ideal, because it technically downloads to host memory and then re-uploads, but I’m hoping it is still quite fast because the memory is mapped (zero-copy)?

Anyhow, I can’t find any way to supply the GpuMat() constructor with a data pointer in Python. As far as I know, only the C++ implementation accepts such a pointer…

Thank you for the extra tip regarding the allocations, you are totally right! I now allocate everything beforehand and only do:

jetson.utils.cudaCrop(frame, area.sub_frame, area.roi)
jetson.utils.cudaConvertColor(area.sub_frame, area.bgr_sub_frame)
jetson.utils.cudaDeviceSynchronize()

in the processing loop before going to OpenCV. I hope I’m implementing this the most efficient way. Anyhow: it works! Thanks!

brking · June 7, 2021, 12:08pm

Hi, is this still the correct method to load a CUDA frame into OpenCV?

dusty_nv · June 7, 2021, 3:11pm

I’m not sure about the GpuMat() interface, but if you wanted the pointer from the cudaImage object, you can get it from cudaImage.ptr member:

# https://github.com/dusty-nv/jetson-inference/blob/master/docs/aux-image.md#image-capsules-in-python
<jetson.utils.cudaImage>
  .ptr      # memory address (not typically used)
  .size     # size in bytes
  .shape    # (height,width,channels) tuple
  .width    # width in pixels
  .height   # height in pixels
  .channels # number of color channels
  .format   # format string
  .mapped   # true if ZeroCopy
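The .format and .channels members are enough to work out the OpenCV type code that a Mat or GpuMat of the same layout would need, such as the literal 0 (CV_8UC1) passed to the GpuMat constructor earlier in this thread. The sketch below mirrors OpenCV's CV_MAKETYPE arithmetic in plain Python so it runs without cv2. The helper names and the format table are illustrative assumptions, and the SimpleNamespace object only stands in for a real cudaImage.

```python
from types import SimpleNamespace

# OpenCV depth codes (CV_8U and CV_32F).
CV_8U, CV_32F = 0, 5


def opencv_type(depth, channels):
    """Same arithmetic as OpenCV's CV_MAKETYPE(depth, channels) macro."""
    return depth + ((channels - 1) << 3)


# jetson-utils format string -> (depth, channels); illustrative, not exhaustive.
FORMAT_TABLE = {
    "gray8": (CV_8U, 1),
    "rgb8": (CV_8U, 3),
    "bgr8": (CV_8U, 3),
    "rgba8": (CV_8U, 4),
    "bgra8": (CV_8U, 4),
    "gray32f": (CV_32F, 1),
    "rgb32f": (CV_32F, 3),
    "rgba32f": (CV_32F, 4),
}


def opencv_type_for(image):
    """Return the OpenCV type code matching image.format, checking image.channels."""
    depth, channels = FORMAT_TABLE[image.format]
    if channels != image.channels:
        raise ValueError("format %r does not match %d channels" % (image.format, image.channels))
    return opencv_type(depth, channels)


# Stand-in for a cudaImage holding a 640x480 BGR frame.
fake_image = SimpleNamespace(format="bgr8", channels=3, width=640, height=480)
print(opencv_type_for(fake_image))
```

With cv2 installed, the result for a bgr8 image should equal cv2.CV_8UC3, which is the type a three-channel 8-bit GpuMat would be created with.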

dusty_nv · June 7, 2021, 3:12pm

Hi @brking, yes the cudaToNumpy() method is still the way.

brking · June 9, 2021, 9:21pm

Thank you. I’m getting an error with cv2.cvtColor(array, cv2.COLOR_RGB2BGR). It doesn’t seem to find that color code. Do I need to install something additional for cv2?

brking · June 11, 2021, 3:40pm

I think this was just me screwing up my python packages in general, nevermind.
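For anyone hitting a similar "constant not found" error, a quick way to see which package Python is actually importing is sketched below. inspect_module is a hypothetical helper that uses only the standard library. It reports where the module would be loaded from and whether the attribute exists, and it does not fail if the module is missing or broken.

```python
import importlib
import importlib.util


def inspect_module(name, attribute):
    """Report where `name` is loaded from and whether it exposes `attribute`."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "has_attribute": False}
    try:
        module = importlib.import_module(name)
    except ImportError:
        # The package is present but cannot be imported (e.g. a broken install).
        return {"found": True, "origin": spec.origin, "has_attribute": False}
    return {"found": True, "origin": spec.origin, "has_attribute": hasattr(module, attribute)}


report = inspect_module("cv2", "COLOR_RGB2BGR")
print(report)
```

If origin points at an unexpected location, such as a local file named cv2.py or a second site-packages directory, that is a likely sign of the kind of package mix-up described above.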

dusty_nv · June 11, 2021, 5:33pm

OK gotcha - no worries. Glad you got it working.
