Inference performance measurement

• Hardware Platform: GPU
• DeepStream Version: 5.0.0
• TensorRT Version:
• NVIDIA GPU Driver Version (valid for GPU only): 460.32.03

Hi, just started using DS SDK and facing some questions right now:

Question: Using deepstream-app I set up a config based on the objectDetection_SSD example. How can I measure the time needed for an inference? I'm not sure about the terminology here, but what I mean is the time it takes for my model to be executed.

You can use trtexec to measure the model’s performance in isolation.
If you need to measure the latency of individual DeepStream components, you can refer to the DeepStream SDK FAQ.
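A sketch of both approaches, assuming an ONNX model and a typical TensorRT install path (the model file and config file names below are placeholders; the objectDetection_SSD sample actually ships a UFF model, so adjust the trtexec input flag accordingly):

```shell
# Measure raw engine performance with trtexec (ships with TensorRT,
# usually under /usr/src/tensorrt/bin). Reports per-inference latency
# (mean/median) and throughput.
/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --fp16

# Measure per-component latency inside the DeepStream pipeline,
# as described in the DeepStream SDK FAQ: export these before running.
export NVDS_ENABLE_LATENCY_MEASUREMENT=1
export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
deepstream-app -c my_ssd_config.txt   # your deepstream-app config file
```

With the two environment variables set, deepstream-app prints the latency contributed by each plugin (including the primary GIE) per frame.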


That works great, thanks. A follow-up question for my understanding: the primary GIE component in my pipeline is in charge of doing the inference, and I would like to make sure this is actually being performed on the GPU. A component latency of 8 ms to 10 ms for the primary GIE is a sound argument for that, I guess. However, following the objectDetection_SSD example and its custom bounding box parser, which I customized to suit my model, I can see in the Makefile that it is compiled with g++, not nvcc, as I would have expected. So my question is: is this still being executed on the GPU, and is there any resource I can read up on for a better understanding? I expected only CUDA-compiled code to be executed on the GPU, or is this some nvinfer or TensorRT voodoo?

There is only the post-process parser code under that directory. The typical flow in the gst-nvinfer plugin is preprocess -> inference -> postprocess, and by the time the postprocess runs, the output has already been copied from device memory to host memory. For the inference itself, the plugin calls the TensorRT library, so the inference is always executed on the device.
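In other words, the parser that g++ compiles is plain host-side code: it just walks a float array that TensorRT has already copied back to host memory and turns rows into boxes. A simplified, self-contained sketch of that idea (the real DeepStream entry point is the NvDsInferParseCustomFunc signature declared in nvdsinfer_custom_impl.h; the struct and function names below are illustrative, not the SDK API):

```cpp
#include <cstddef>
#include <vector>

// Illustrative stand-in for the DeepStream types; a real parser receives
// NvDsInferLayerInfo / NvDsInferObjectDetectionInfo instead.
struct Box { float left, top, width, height, confidence; int classId; };

// Decode an SSD-style output tensor laid out as rows of
// [classId, confidence, x1, y1, x2, y2] in normalized coordinates.
// This is ordinary CPU code: the GPU work (the actual inference) already
// happened inside TensorRT before this buffer reached host memory.
std::vector<Box> parseDetections(const float* data, std::size_t numRows,
                                 int frameW, int frameH, float threshold) {
    std::vector<Box> boxes;
    for (std::size_t i = 0; i < numRows; ++i) {
        const float* row = data + i * 6;
        float conf = row[1];
        if (conf < threshold) continue;          // drop weak detections
        Box b;
        b.classId    = static_cast<int>(row[0]);
        b.confidence = conf;
        b.left   = row[2] * frameW;              // scale to pixel coords
        b.top    = row[3] * frameH;
        b.width  = (row[4] - row[2]) * frameW;
        b.height = (row[5] - row[3]) * frameH;
        boxes.push_back(b);
    }
    return boxes;
}
```

Nothing in this step needs nvcc, which is why the sample Makefile can build the parser library with g++ alone.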

I see. Can you tell me where in the pipeline the bounding boxes are actually drawn onto the frames, and how they get forwarded there?

The OSD component (nvdsosd) does the drawing, based on the display metadata attached to each frame.
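At its core that drawing step is just rasterizing the metadata's rectangles into the frame buffer. A minimal, self-contained illustration of the idea in plain C++ (this is not the DeepStream API; the real nvdsosd element reads NvDsDisplayMeta/NvDsObjectMeta and can offload the drawing to the GPU):

```cpp
#include <cstdint>
#include <vector>

// Draw a 1-pixel rectangle outline into a packed RGB frame buffer.
// Illustrates the kind of per-frame work an on-screen-display stage does:
// take box coordinates from metadata and write pixels into the frame.
void drawRect(std::vector<std::uint8_t>& rgb, int frameW,
              int left, int top, int w, int h,
              std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    auto put = [&](int x, int y) {
        std::size_t idx = (static_cast<std::size_t>(y) * frameW + x) * 3;
        rgb[idx] = r; rgb[idx + 1] = g; rgb[idx + 2] = b;
    };
    for (int x = left; x < left + w; ++x) { put(x, top); put(x, top + h - 1); }
    for (int y = top; y < top + h; ++y)   { put(left, y); put(left + w - 1, y); }
}
```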