Copy DeepStream frame to cv2.cuda_GpuMat object


• Hardware Platform (Jetson / GPU) Jetson Nano
• DeepStream Version DeepStream 6.0
• JetPack Version (valid for Jetson only) 4.6.2
• TensorRT Version
• OpenCV Version 4.6.0 (compiled with CUDA)
• Issue Type( questions, new requirements, bugs) Questions
I'm using the DeepStream Python bindings (pyds). In the probe callback function, I copy the frame from the DeepStream pipeline to the CPU and perform some image processing on it. To speed this up, I use cv2.cuda functions. This is my code:

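```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import numpy as np
import cv2
import pyds

# Homography for the perspective warp (defined once, not on every frame)
WARP_MATRIX = np.array([[-1.73323209e+00, 3.06689479e+00, 2.62390280e+03],
                        [-6.86363713e-16, 5.28105496e+00, -9.71431119e+02],
                        [-5.07596244e-19, 3.19468212e-03, 1.00000000e+00]])


def get_new_frame(pad, info, u_data):  # probe callback function
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer ")
        return Gst.PadProbeReturn.OK
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break
        # Frame as a NumPy (RGBA) array accessible from the CPU
        n_frame = pyds.get_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)
        # Copy it (back) to the GPU for cv2.cuda processing
        GPU_raw = cv2.cuda_GpuMat()
        GPU_raw.upload(n_frame)
        im_res = cv2.cuda.warpPerspective(GPU_raw, WARP_MATRIX, (1920, 1080))
        GPU_frame = cv2.cuda.resize(im_res, (717, 403), interpolation=cv2.INTER_LINEAR)
        # ROI is (x, y, width, height), where (x, y) is the top-left corner
        GPU_crop = cv2.cuda_GpuMat(GPU_frame, (78, 34, 560, 336))
        # Convert to float32 and scale pixel values from [0, 255] to [0, 1]
        cuMat = cv2.cuda_GpuMat(336, 560, cv2.CV_32FC4)
        GPU_crop_scale = GPU_crop.convertTo(cv2.CV_32FC4, 1 / 255, cv2.cuda.Stream_Null(), cuMat)
        # Download the 336x560x4 float32 result to the CPU (the original line
        # was cut off; a download() is assumed here)
        frame = GPU_crop_scale.download()
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
```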
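To see which part of the original frame ends up in the 560x336 crop, the three geometric steps in the callback (warpPerspective with the 3x3 matrix, resize from 1920x1080 to 717x403, crop at (78, 34)) can be traced for a single point. This is a plain-Python sketch under stated assumptions: it uses OpenCV's default forward mapping for warpPerspective (no WARP_INVERSE_MAP flag), and it approximates the resize as a pure scale, ignoring the half-pixel center offset. The helper names apply_homography and source_to_crop are hypothetical.

```python
# Hypothetical helpers: trace one source pixel through warp -> resize -> crop.

WARP_MATRIX = [[-1.73323209e+00, 3.06689479e+00, 2.62390280e+03],
               [-6.86363713e-16, 5.28105496e+00, -9.71431119e+02],
               [-5.07596244e-19, 3.19468212e-03, 1.00000000e+00]]


def apply_homography(m, x, y):
    """Map (x, y) through a 3x3 homography (source -> destination, default flags)."""
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / w)


def source_to_crop(m, x, y, warped_size=(1920, 1080),
                   resized_size=(717, 403), crop=(78, 34, 560, 336)):
    """Return (cx, cy, inside): position in the crop and whether it falls inside."""
    wx, wy = apply_homography(m, x, y)
    # Resize approximated as a pure scale (half-pixel offset ignored)
    rx = wx * resized_size[0] / warped_size[0]
    ry = wy * resized_size[1] / warped_size[1]
    # Cropping is a shift by the ROI's top-left corner
    cx, cy = rx - crop[0], ry - crop[1]
    inside = 0 <= cx < crop[2] and 0 <= cy < crop[3]
    return cx, cy, inside
```

For example, source_to_crop(WARP_MATRIX, x, y) tells whether a given input pixel survives the crop, which can help check that the warp matrix and ROI were chosen consistently.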

As you can see, I copy the frame from the GPU to the CPU (n_frame = pyds.get_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)) and then back to the GPU (GPU_raw.upload(n_frame)). Is there a way to avoid the round trip through the CPU? Transferring data from the CPU to the GPU is costly in terms of time.
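To put a rough number on that cost, the amount of data GPU_raw.upload(n_frame) moves per frame can be estimated from the frame size. The sketch assumes 1920x1080 input frames in 8-bit RGBA (4 bytes per pixel, the format get_nvds_buf_surface is used with in the DeepStream Python samples) at 30 fps; both the resolution and the frame rate are assumptions, so substitute the stream's real values.

```python
def frame_bytes(width, height, channels=4, bytes_per_channel=1):
    """Size of one uncompressed frame in bytes (default: 8-bit RGBA)."""
    return width * height * channels * bytes_per_channel


def upload_rate(width, height, fps, channels=4, bytes_per_channel=1):
    """Bytes per second copied host -> device if every frame is uploaded once."""
    return frame_bytes(width, height, channels, bytes_per_channel) * fps


per_frame = frame_bytes(1920, 1080)       # assumed 1920x1080 RGBA input
per_second = upload_rate(1920, 1080, 30)  # assumed 30 fps
print(f"{per_frame / 2**20:.1f} MiB per frame, {per_second / 2**20:.1f} MiB/s")
```

With these assumptions it prints "7.9 MiB per frame, 237.3 MiB/s". On a Jetson Nano the CPU and GPU share the same physical memory, so this is a copy within DRAM rather than a PCIe transfer, but it still consumes memory bandwidth and time on every frame.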

Could you try to use pyds.get_nvds_buf_surface_gpu instead of get_nvds_buf_surface?
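On x86, where this binding is available, the DeepStream Python sample deepstream-imagedata-multistream-cupy unpacks its result as data_type, shape, strides, dataptr, size and, as far as can be told from that sample, dataptr is a PyCapsule wrapping the frame's CUDA device address. Below is a minimal sketch of extracting that address with ctypes. The helpers capsule_to_address and describe_gpu_frame are hypothetical, and treating strides[0] as the row pitch in bytes is an assumption based on the C-order layout used in that sample, not something stated in the pyds documentation.

```python
import ctypes

# Call the CPython C-API function PyCapsule_GetPointer through ctypes and
# return the stored pointer as a Python int.
_capsule_get_pointer = ctypes.pythonapi.PyCapsule_GetPointer
_capsule_get_pointer.restype = ctypes.c_void_p
_capsule_get_pointer.argtypes = [ctypes.py_object, ctypes.c_char_p]


def capsule_to_address(capsule):
    """Return the raw address held by an unnamed PyCapsule (hypothetical helper)."""
    # Raises ValueError if the object is not a capsule or its name is not NULL.
    return _capsule_get_pointer(capsule, None)


def describe_gpu_frame(shape, strides, dataptr):
    """Hypothetical helper: summarize a (height, width, channels) C-order frame."""
    height, width, channels = shape
    return {
        "rows": height,
        "cols": width,
        "channels": channels,
        "step": strides[0],  # assumed: row pitch in bytes
        "address": capsule_to_address(dataptr),
    }
```

The address, rows, cols and step are what a zero-copy wrapper needs: the cupy sample wraps the pointer with cupy.cuda.UnownedMemory, and newer OpenCV releases offer cv2.cuda.createGpuMatFromCudaMemory, which may not be available in OpenCV 4.6.0, so check your build before relying on it.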

Thank you for the quick reply.
When I do this:
data_type, shape, strides, dataptr, size = pyds.get_nvds_buf_surface_gpu(hash(gst_buffer), frame_meta.batch_id)
I get the following error:
RuntimeError: get_nvds_buf_Surface: Currently we only support x86.

Currently, there is no binding for get_nvds_buf_surface_gpu on Jetson. You can try to create the binding yourself, and we'll check whether we can support it later.
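Until such a binding exists, one way to keep a single callback that runs on both platforms is to pick the accessor by CPU architecture, similar to the is_aarch64() check in the DeepStream Python samples' common module. In this sketch get_frame_surface is a hypothetical helper, and pyds is passed in as a parameter (an illustrative choice) so the dispatch logic can be exercised without DeepStream installed.

```python
import platform


def is_aarch64(machine=None):
    """True on 64-bit ARM hosts such as Jetson."""
    return (machine or platform.machine()) == "aarch64"


def get_frame_surface(pyds, gst_buffer, batch_id, machine=None):
    """Return ("cpu", array) on Jetson or ("gpu", result) on x86 (hypothetical helper)."""
    buf = hash(gst_buffer)
    if is_aarch64(machine):
        # Jetson: get_nvds_buf_surface_gpu raises RuntimeError, so fall back to
        # the CPU-accessible NumPy array and upload it to the GPU afterwards.
        return "cpu", pyds.get_nvds_buf_surface(buf, batch_id)
    # x86: returns (data_type, shape, strides, dataptr, size) for device memory.
    return "gpu", pyds.get_nvds_buf_surface_gpu(buf, batch_id)
```

Returning a tag alongside the result lets the caller decide whether a GpuMat upload is still needed, so only the Jetson path pays for the extra copy.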

Writing Python bindings is new to me. Could you provide an example?

You can refer to the link below, since the binding code is open source.

Thank you very much. I'll take a look at it. In the meantime, I found other ways to speed up the code.