Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU) Jetson Nano
• DeepStream Version DeepStream 6.0
• JetPack Version (valid for Jetson only) 4.6.2
• TensorRT Version 8.2.1.8
• OpenCV 4.6.0, compiled with CUDA
• Issue Type( questions, new requirements, bugs) Questions
I’m using the DeepStream Python bindings. In the probe callback function, I copy the frame from the DeepStream pipeline to the CPU and perform some image processing on it. To accelerate the processing, I use cv2.cuda functions. This is my code:
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import numpy as np
import cv2
import pyds

def get_new_frame(pad, info, u_data):  # probe callback function
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer")
        return Gst.PadProbeReturn.OK
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break
        # Frame as a NumPy array on the host (CPU) side
        n_frame = pyds.get_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)
        # Upload the frame back to the GPU for the cv2.cuda calls
        GPU_raw = cv2.cuda_GpuMat()
        GPU_raw.upload(n_frame)
        # 3x3 perspective transform
        warp_matrix = np.array([[-1.73323209e+00, 3.06689479e+00, 2.62390280e+03],
                                [-6.86363713e-16, 5.28105496e+00, -9.71431119e+02],
                                [-5.07596244e-19, 3.19468212e-03, 1.00000000e+00]])
        im_res = cv2.cuda.warpPerspective(GPU_raw, warp_matrix, (1920, 1080))
        GPU_frame = cv2.cuda.resize(im_res, (717, 403), interpolation=cv2.INTER_LINEAR)
        # ROI is (top-left x, top-left y, width, height)
        GPU_crop = cv2.cuda_GpuMat(GPU_frame, (78, 34, 560, 336))
        # Convert to float32 and scale pixel values to [0, 1]
        cuMat = cv2.cuda_GpuMat(336, 560, cv2.CV_32FC4)
        GPU_crop_scale = GPU_crop.convertTo(cv2.CV_32FC4, 1/255, cv2.cuda.Stream_Null(), cuMat)
        # Download the result to the host; dtype matches CV_32FC4
        frame = np.zeros((336, 560, 4), dtype=np.float32)
        frame = GPU_crop_scale.download(frame)
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
As you can see, I copy the frame from GPU to CPU (n_frame = pyds.get_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)) and, in the next step, upload it back to the GPU (GPU_raw.upload(n_frame)). Is there a way to avoid going through the CPU? The CPU-to-GPU transfer is a costly operation in terms of time.
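To put a number on that cost, the transfer step can be timed on its own. This is only a minimal sketch, not part of the pipeline above: `time_call` is a hypothetical helper, the 1920x1080 four-channel frame shape is an assumption based on the sizes used in the code, and the host-side `np.copy` is a placeholder workload so that the snippet runs without a GPU.

```python
import time

import numpy as np


def time_call(fn, *args, repeats=20):
    """Return the mean wall-clock time in milliseconds of fn(*args).

    Hypothetical helper for illustration; not part of DeepStream or OpenCV.
    """
    fn(*args)  # warm-up run, so one-time initialisation is not counted
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) * 1000.0 / repeats


# Stand-in for one frame; the 1080x1920x4 uint8 shape is an assumption.
dummy_frame = np.zeros((1080, 1920, 4), dtype=np.uint8)

# Placeholder workload: a host-side copy of the frame.
copy_ms = time_call(np.copy, dummy_frame)
print(f"mean host copy time: {copy_ms:.3f} ms")
```

On the Jetson itself, the placeholder could be swapped for the real step, for example `time_call(GPU_raw.upload, n_frame)` inside the probe, to see how much of the per-frame budget the upload takes.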