Post-processing without loading frames into CPU

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): Jetson
• DeepStream Version: 5.1
• Issue Type (questions, new requirements, bugs): Question

I am planning to do post-processing on nvinfer results. I have an object detection TLT model running on an nvinfer element. For instance, I would like to crop the frames using the model’s output bounding boxes. I could, of course, write a custom plugin that loads the frame and metadata from the buffer and produces the crops. However, this incurs the overhead of copying the entire frame to the CPU, which is inefficient. How can I improve this? How can I avoid copying the entire frame from the buffer to the CPU? Is there a plugin for this?

Do you mean you want to copy part of the frame into a CPU buffer?
What frame format do you want?
What region of the frame do you want to copy out? The region inside the bbox?

Yes, for instance, only the detected bounding boxes. Imagine that I want to send the crops to a web API. I don’t need the entire frame, but only the crops.

To load the frame from the buffer with get_nvds_buf_surface, I had to convert it to RGBA format with nvvideoconvert. However, the format I generally use is RGB.

Exactly.

You can use the cudaMemcpy2D*() APIs to copy the region from GPU memory to CPU memory.
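To illustrate what such a 2D copy does, here is a minimal host-only sketch in Python. It does not touch the GPU: memcpy2d_host is a hypothetical NumPy stand-in that mirrors the argument order of cudaMemcpy2D (dst, dpitch, src, spitch, width in bytes, height), and the frame is synthetic. It assumes a pitch-linear RGBA layout (4 bytes per pixel). The point is only the arithmetic: the crop starts at byte offset top * pitch + left * 4, and each row copies width * 4 bytes.

```python
import numpy as np

def memcpy2d_host(dst, dpitch, src, spitch, width_bytes, height):
    """Host-side stand-in for cudaMemcpy2D (hypothetical helper): copy
    `height` rows of `width_bytes` bytes each, advancing the source by
    `spitch` bytes and the destination by `dpitch` bytes per row."""
    for row in range(height):
        s = row * spitch
        d = row * dpitch
        dst[d:d + width_bytes] = src[s:s + width_bytes]

def crop_bbox(frame_bytes, pitch, left, top, width, height, bpp=4):
    """Copy one bounding box out of a pitch-linear frame into a packed array."""
    offset = top * pitch + left * bpp   # first byte of the crop in the frame
    row_bytes = width * bpp             # bytes copied per row
    out = np.empty(height * row_bytes, dtype=np.uint8)
    memcpy2d_host(out, row_bytes, frame_bytes[offset:], pitch, row_bytes, height)
    return out.reshape(height, width, bpp)

# Synthetic 6x8 RGBA frame with 16 padding bytes per row (pitch > width * 4).
frame_h, frame_w, bpp = 6, 8, 4
pitch = frame_w * bpp + 16
pixels = np.arange(frame_h * frame_w * bpp, dtype=np.uint8).reshape(frame_h, frame_w * bpp)
frame = np.zeros((frame_h, pitch), dtype=np.uint8)
frame[:, :frame_w * bpp] = pixels

# Crop a 3x4 (width x height) box whose top-left corner is at (x=2, y=1).
crop = crop_bbox(frame.ravel(), pitch, left=2, top=1, width=3, height=4)
```

Only height * width * 4 bytes move to the destination, rather than the whole frame. With the real API, the source would be a device pointer and the copy kind would be device-to-host.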

Thank you for your help. I will definitely try this out. In the meantime, is there any way to do this in Python (in the chain function of a custom plugin written in Python)?

There is the PyCUDA API, which you may want to try. See "Device Interface" in the PyCUDA 2021.1 documentation.
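As a rough sketch of how this might look with PyCUDA, the helper below fills in a pycuda.driver.Memcpy2D descriptor for a single bounding box. Several things here are assumptions rather than anything confirmed in this thread: configure_crop_copy is a hypothetical helper, dev_ptr and src_pitch are assumed to describe pitch-linear RGBA memory that CUDA can address, and how to obtain such a pointer from the DeepStream buffer (especially on Jetson) is not covered. The descriptor is passed in as an argument, so the helper itself does not import PyCUDA.

```python
import numpy as np

def configure_crop_copy(desc, dev_ptr, src_pitch, left, top, width, height, bpp=4):
    """Configure a pycuda.driver.Memcpy2D-style descriptor to copy one
    bounding box from device memory into a new host array.

    dev_ptr and src_pitch are assumed inputs (pitch-linear, bpp bytes per
    pixel). Returns the host array that the copy will fill."""
    host = np.empty((height, width, bpp), dtype=np.uint8)
    # Source: device memory, offset to the top-left corner of the box.
    desc.set_src_device(dev_ptr)
    desc.src_pitch = src_pitch
    desc.src_x_in_bytes = left * bpp
    desc.src_y = top
    # Destination: tightly packed host array.
    desc.set_dst_host(host)
    desc.dst_pitch = width * bpp
    # Extent of the copy: row width in bytes, number of rows.
    desc.width_in_bytes = width * bpp
    desc.height = height
    return host

# Intended usage with PyCUDA and an active CUDA context (not executed here):
#   import pycuda.driver as cuda
#   desc = cuda.Memcpy2D()
#   crop = configure_crop_copy(desc, dev_ptr, pitch, left, top, w, h)
#   desc(aligned=True)  # performs the device-to-host copy into `crop`
```

Because only the box region is described in the descriptor, the transfer size scales with the crop rather than with the full frame.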