Copy DeepStream frame to cv2.cuda_GpuMat object

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) Jetson Nano
• DeepStream Version Deepstream 6.0
• JetPack Version (valid for Jetson only) 4.6.2
• TensorRT Version 8.2.1.8
• OpenCV Version 4.6.0 (compiled with CUDA)
• Issue Type( questions, new requirements, bugs) Questions
I’m using DeepStream Python code. In the probe callback function, I copy the frame from the DeepStream pipeline to the CPU and perform some image processing on it. In order to accelerate it, I use cv2.cuda functions. This is my code:

import numpy as np
import cv2
import pyds
from gi.repository import Gst

def get_new_frame(pad, info, u_data):  # probe callback function
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer")
        return Gst.PadProbeReturn.OK
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break
        n_frame = pyds.get_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)
        GPU_raw = cv2.cuda_GpuMat()
        GPU_raw.upload(n_frame)
        warp_matrix = np.array([[-1.73323209e+00, 3.06689479e+00, 2.62390280e+03],
                                [-6.86363713e-16, 5.28105496e+00, -9.71431119e+02],
                                [-5.07596244e-19, 3.19468212e-03, 1.00000000e+00]])
        im_res = cv2.cuda.warpPerspective(GPU_raw, warp_matrix, (1920, 1080))
        GPU_frame = cv2.cuda.resize(im_res, (717, 403), interpolation=cv2.INTER_LINEAR)
        GPU_crop = cv2.cuda_GpuMat(GPU_frame, (78, 34, 560, 336))  # ROI rect: (x, y, width, height)
        cuMat = cv2.cuda_GpuMat(336, 560, cv2.CV_32FC4)
        GPU_crop_scale = GPU_crop.convertTo(cv2.CV_32FC4, 1 / 255, cv2.cuda.Stream_Null(), cuMat)
        frame = np.zeros([336, 560, 4])
        frame = GPU_crop_scale.download(frame)
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK

As you can see, I copy the frame from GPU to CPU (via n_frame = pyds.get_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)) and in the next line (GPU_raw.upload(n_frame)) copy it back to the GPU. Is there a way to avoid going through the CPU? Transferring between CPU and GPU is, from a timing perspective, a costly operation.
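For context, warpPerspective maps pixel coordinates through the 3x3 homography H with a perspective divide: (x, y) goes to ((h11·x + h12·y + h13)/w, (h21·x + h22·y + h23)/w), where w = h31·x + h32·y + h33. A quick NumPy sketch (no GPU needed, names are illustrative) of what the matrix above does to a few points:

```python
import numpy as np

# The homography used in the probe callback above.
H = np.array([[-1.73323209e+00, 3.06689479e+00, 2.62390280e+03],
              [-6.86363713e-16, 5.28105496e+00, -9.71431119e+02],
              [-5.07596244e-19, 3.19468212e-03, 1.00000000e+00]])

def apply_homography(H, pts):
    """Map Nx2 pixel coordinates through a 3x3 homography."""
    pts = np.asarray(pts, dtype=np.float64)
    ones = np.ones((pts.shape[0], 1))
    homog = np.hstack([pts, ones]) @ H.T   # lift to homogeneous coordinates
    return homog[:, :2] / homog[:, 2:3]    # perspective divide

# Example: where do the corners of a 1920x1080 frame land?
corners = [(0, 0), (1919, 0), (0, 1079), (1919, 1079)]
print(apply_homography(H, corners))
```

Since w = 1 at the origin, the top-left corner (0, 0) simply lands at (h13, h23) = (2623.9, -971.4).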

Could you try to use pyds.get_nvds_buf_surface_gpu instead of get_nvds_buf_surface?

Thank you for the quick reply.
When I do this:

data_type, shape, strides, dataptr, size = pyds.get_nvds_buf_surface_gpu(hash(gst_buffer), frame_meta.batch_id)

I get the following error:

RuntimeError: get_nvds_buf_Surface: Currently we only support x86.

Currently, no binding has been made for get_nvds_buf_surface_gpu on Jetson. You can try writing the binding yourself, and we’ll check whether we can support this later.

Making Python bindings is new to me. Could you provide me an example?

You can refer to the link below as the binding is open source code.
https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/tree/master/bindings

Thank you very much. I’ll take a look at it. In the meantime, I found other ways of accelerating the code.
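For readers landing here: one generic way to trim GPU work in a warp → resize → crop chain like the one above (a sketch of a possible optimization, not necessarily what the poster did) is to fold the resize and crop into the homography itself, so a single warpPerspective produces the final 560x336 patch instead of three kernels and two intermediate images:

```python
import numpy as np

# Original warp from the probe callback.
H = np.array([[-1.73323209e+00, 3.06689479e+00, 2.62390280e+03],
              [-6.86363713e-16, 5.28105496e+00, -9.71431119e+02],
              [-5.07596244e-19, 3.19468212e-03, 1.00000000e+00]])

# Resizing 1920x1080 -> 717x403 is a pure scaling...
S = np.diag([717 / 1920, 403 / 1080, 1.0])
# ...and cropping at (78, 34) is a pure translation.
T = np.array([[1.0, 0.0, -78.0],
              [0.0, 1.0, -34.0],
              [0.0, 0.0, 1.0]])

# Single combined homography: crop(resize(warp(x))).
H_combined = T @ S @ H

# With OpenCV this would then be one call on the GPU, e.g.:
#   out = cv2.cuda.warpPerspective(GPU_raw, H_combined, (560, 336))

def project(M, x, y):
    """Map one point through a 3x3 homography."""
    v = M @ np.array([x, y, 1.0])
    return v[:2] / v[2]

# Sanity check: push a point through the three steps separately
# and through the combined matrix; the results should agree.
x, y = 1000.0, 500.0
step = project(H, x, y)                        # after warp (1920x1080 space)
step = step * [717 / 1920, 403 / 1080]         # after resize
step = step - [78.0, 34.0]                     # after crop
print(np.allclose(step, project(H_combined, x, y)))  # True
```

Note the output is not pixel-identical to the original chain, since only one interpolation pass happens instead of two, but geometrically it is the same mapping with fewer kernel launches and no intermediate allocations.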

Hi @jan.anthonis ,
Do you still need help on this topic? Please let us know whether this topic can be closed, thank you.

Dear Yingliu,

Thank you very much for the rapid response and support.

The topic can be closed.

Best regards,

Jan

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.