Post-processing without loading frames into CPU

Mohammad_Amin_Parchami · May 12, 2021, 10:17am

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) Jetson
• DeepStream Version 5.1
• Issue Type( questions, new requirements, bugs) Question

I am planning to do post-processing on nvinfer results. I have an object detection TLT model running on an nvinfer element. For instance, I would like to crop the frames using the model’s output bounding boxes. For this, of course, I can write a custom plugin that loads the frame and metadata from the buffer and produces the crops. However, this has the overhead of copying the entire frame to the CPU, which is inefficient. How can I avoid copying the entire frame from the buffer to the CPU? Is there a plugin for this?

mchi · May 13, 2021, 2:02am

Do you mean you want to copy only part of the frame into a CPU buffer?
What frame format do you want?
Which region of the frame do you want to copy out? The region inside the bbox?

Mohammad_Amin_Parchami · May 15, 2021, 7:33am

Yes, for instance, only the detected bounding boxes. Imagine that I want to send the crops to a web API. I don’t need the entire frame, but only the crops.

To load the frame from the buffer with get_nvds_buf_surface, I had to convert it to RGBA format with nvvideoconvert. However, the format I generally use is RGB.

Exactly, the region inside the bbox.

mchi · May 15, 2021, 7:41am

You can use the cudaMemcpy2D*() APIs to copy just that region from GPU memory to CPU memory.
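To make the idea concrete, here is a minimal host-side sketch in plain Python of what a pitched 2D copy does with a bounding box. It does not call CUDA; it only emulates the row-by-row behaviour and the arguments such a copy takes (source offset, source pitch, destination pitch, width in bytes, height). The names memcpy2d and crop_region, the 48-byte pitch, and the 8x4 RGBA frame are all hypothetical and chosen only for illustration.

```python
def memcpy2d(dst, dpitch, src, spitch, width_bytes, height, src_offset=0):
    """Emulate a pitched 2D copy on host memory (illustration only).

    For each of `height` rows, `width_bytes` bytes are copied. Source rows
    are `spitch` bytes apart and destination rows are `dpitch` bytes apart.
    """
    for row in range(height):
        s = src_offset + row * spitch
        d = row * dpitch
        dst[d:d + width_bytes] = src[s:s + width_bytes]


def crop_region(frame, pitch, bbox, bytes_per_pixel=4):
    """Copy only the bbox (left, top, width, height) out of a pitched frame."""
    left, top, width, height = bbox
    width_bytes = width * bytes_per_pixel
    # Byte offset of the top-left pixel of the bbox inside the frame.
    src_offset = top * pitch + left * bytes_per_pixel
    # Tightly packed destination: its pitch equals the crop's row size.
    crop = bytearray(width_bytes * height)
    memcpy2d(crop, width_bytes, frame, pitch, width_bytes, height, src_offset)
    return crop


# Hypothetical 8x4 RGBA frame whose rows are padded to a 48-byte pitch.
FRAME_W, FRAME_H, BPP, PITCH = 8, 4, 4, 48
frame = bytearray(PITCH * FRAME_H)
for y in range(FRAME_H):
    for x in range(FRAME_W):
        base = y * PITCH + x * BPP
        frame[base:base + BPP] = bytes([x, y, 0, 255])  # encode (x, y) in R, G

# Crop a 3x2 region whose top-left corner is at pixel (2, 1).
crop = crop_region(frame, PITCH, bbox=(2, 1, 3, 2))
```

Only width * height * bytes_per_pixel bytes end up in the destination, no matter how large the full frame is, which is the saving being asked about. With the real CUDA call the same offset and pitch values would be passed, with the source pointer referring to GPU memory.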

Mohammad_Amin_Parchami · May 21, 2021, 7:26pm

Thank you for your help. I will definitely try this out. In the meantime, is there any way to do this in Python (in the chain function of a custom plugin written in Python)?

mchi · May 25, 2021, 5:41am

There is the PyCUDA API, which you could try: Device Interface - pycuda 2022.1 documentation
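As a rough sketch of how that could look, the snippet below uses PyCUDA's Memcpy2D object to copy only a bounding-box region from device memory into a NumPy array. Several things here are assumptions not covered in this thread: that a CUDA context is already active, that dev_ptr is a plain CUDA device pointer to the frame with row pitch pitch (how to obtain these from the DeepStream buffer in Python is left open, and on Jetson the surface memory type may need extra mapping first), and that the frame is RGBA with 4 bytes per pixel. The helper names clamp_bbox and copy_crop_to_host are hypothetical.

```python
def clamp_bbox(bbox, frame_w, frame_h):
    """Clip a (left, top, width, height) box to the frame bounds."""
    left, top, width, height = (int(v) for v in bbox)
    x0 = min(max(left, 0), frame_w)
    y0 = min(max(top, 0), frame_h)
    x1 = min(max(left + width, x0), frame_w)
    y1 = min(max(top + height, y0), frame_h)
    return x0, y0, x1 - x0, y1 - y0


def copy_crop_to_host(dev_ptr, pitch, bbox, bytes_per_pixel=4):
    """Copy one bbox from a pitched device frame into a NumPy array.

    Assumes an active CUDA context and that `dev_ptr` is an ordinary CUDA
    device pointer (assumption: obtaining it is outside this sketch).
    """
    # Imported lazily so the module loads on machines without a GPU.
    import numpy as np
    import pycuda.driver as cuda

    left, top, width, height = bbox
    host = np.empty((height, width, bytes_per_pixel), dtype=np.uint8)

    copy = cuda.Memcpy2D()
    copy.set_src_device(dev_ptr)
    copy.src_pitch = pitch                        # bytes per source row
    copy.src_x_in_bytes = left * bytes_per_pixel  # horizontal offset
    copy.src_y = top                              # vertical offset in rows
    copy.set_dst_host(host)
    copy.dst_pitch = width * bytes_per_pixel      # tightly packed rows
    copy.width_in_bytes = width * bytes_per_pixel
    copy.height = height
    copy(aligned=False)                           # run the 2D copy
    return host


# Boxes from the detector can extend past the frame, so clip them first.
safe_bbox = clamp_bbox((-5, 10, 20, 1000), frame_w=100, frame_h=50)
```

In a chain function one would call clamp_bbox on each detected box and then copy_crop_to_host with the clipped box, so that only the crop, not the full frame, crosses from GPU to CPU memory.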