Print inference time in DeepStream 5.1 on TX2NX


Here is my environment:

• Hardware Platform (Jetson TX2NX)
• DeepStream 5.1
• JetPack Version 4.5.1
• TensorRT Version 7.1.3

I am trying to print the inference time of my model. I followed the instructions in this post:

However, after I recompiled libs/nvdsinfer/, installed the files, and ran my pipeline:

$ gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4 ! \
qtdemux ! h264parse ! nvv4l2decoder ! m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! \
nvvideoconvert ! nvinfer config-file-path=config_infer.txt ! perf ! fakesink

I got a message saying that inference in my model took approximately 4415 us. But an external tool such as gst-perf reports that my pipeline is running at 12 FPS, which does not match the printed message (without nvinfer, gst-perf reports a high FPS).
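To put the mismatch in numbers, here is a small Python sketch that uses only the two figures quoted above (4415 us from the patched nvdsinfer and 12 FPS from gst-perf). The variable names are my own, and the script measures nothing; it only converts units.

```python
# Figures quoted in this thread; nothing here is measured by the script.
reported_inference_us = 4415  # printed by the patched nvdsinfer
pipeline_fps = 12             # reported by gst-perf

# Time available per frame at the observed pipeline rate.
frame_period_ms = 1000.0 / pipeline_fps
reported_inference_ms = reported_inference_us / 1000.0

# Part of each frame period that the printed inference time does not explain.
unaccounted_ms = frame_period_ms - reported_inference_ms

print(f"frame period:       {frame_period_ms:.1f} ms")
print(f"reported inference: {reported_inference_ms:.1f} ms")
print(f"unaccounted:        {unaccounted_ms:.1f} ms")
```

On these figures the printed inference time covers only about 5% of each frame period.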

In the past I ran the model using the Python TensorRT API, and inference took around 83 ms.

I tested this approach with the same model on JetPack 4.4 on a TX2 and measured 83-90 ms of inference time.

What is your problem?

Hello Fiona,

My problem is that the “inference time” measured in NvDsInferContextImpl::queueInputBatch (sources/libs/nvdsinfer/nvdsinfer_context_impl.cpp) is incorrect. I am getting an inference time of 4.4 ms, but my pipeline runs at 12 FPS (83 ms per frame). I know for sure that the inference time is much larger than 4.4 ms.

Do you have another method to measure the inference time in nvinfer using DeepStream?

The pipeline does many things besides inferencing. Even within nvinfer, the time measured in NvDsInferContextImpl::queueInputBatch is just the model inferencing time; nvinfer also performs preprocessing and postprocessing.

If what you want is just the inference time, the result is correct. Pipeline speed is determined by all the processing that happens in it (video reading, video decoding, video scaling, batching, display composition, …). The FPS never reflects inferencing speed.

Please disable sink synchronization by setting “sync=0” on the fakesink plugin.

I set a timer in gst-nvinfer's get_converted_buffer to measure the pre-processing and got 0.244 ms. For the postprocessing I am using a custom function that returns an empty vector. I know that using Python and TensorRT I have an inference time of 80 ms.

Here is my configuration file for nvinfer:

#0=RGB, 1=BGR
# Pathname of the ONNX model file (Our current network uses this type of format)
# Pathname of the serialized model engine file
# Pathname of a text file containing the labels for the model
# Number of frames or objects to be inferred together in a batch
## 0=FP32, 1=INT8, 2=FP16 mode
# Unique ID to be assigned to the GIE to enable the application and other elements to identify detected bounding boxes and labels
# Name of the custom bounding box parsing function. If not specified, Gst-nvinfer uses the internal function for the resnet model provided by the SDK.
# Enable tensor metadata output

I noticed that with output-tensor-meta=1 I get 12 ms of inference time, but with output-tensor-meta=0 I get 70 ms, which seems more accurate given my experience running the TRT engine in Python.

How can output-tensor-meta change the result of the inference time?

Please refer to the gst-nvinfer source code: /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvinfer

DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums

output-tensor-meta=1 does not change the inference time directly. But it outputs the raw TensorRT output and skips the post-processing, so the processing time will be less than with output-tensor-meta=0.

But the post-processing is performed in dequeueOutputBatch, right?
Or is there any special post-processing performed in queueInputBatch?

If I measure the latency in dequeueOutputBatch I get 88 ms with both output-tensor-meta=0 and output-tensor-meta=1.

dequeueOutputBatch runs in another thread. Because the two calls are asynchronous, the total time cannot be calculated by simply adding the two measurements.
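As a rough illustration of this point, the Python sketch below times an asynchronous "submit" call separately from the wait for its result. It is a toy analogy, not nvinfer code: `fake_inference`, `WORK_SECONDS` and `measure` are hypothetical names, and the 80 ms sleep only stands in for the inference time mentioned earlier in the thread. Whether queueInputBatch behaves this way in a given configuration should be checked against the nvdsinfer sources.

```python
import time
from concurrent.futures import ThreadPoolExecutor

WORK_SECONDS = 0.08  # illustrative stand-in for ~80 ms of inference


def fake_inference():
    """Pretend to run a model by sleeping on a worker thread."""
    time.sleep(WORK_SECONDS)
    return "done"


def measure():
    """Return (time spent queuing the work, time until the result is ready)."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        t0 = time.perf_counter()
        future = pool.submit(fake_inference)  # returns once the work is queued
        t1 = time.perf_counter()
        future.result()                       # blocks until the work finishes
        t2 = time.perf_counter()
    return t1 - t0, t2 - t0


enqueue_s, total_s = measure()
print(f"time around submit: {enqueue_s * 1000:.3f} ms")
print(f"time until result:  {total_s * 1000:.3f} ms")
```

A timer placed only around the submitting call sees the first, much smaller number, while the work itself completes on the other thread.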

Hello, thank you for this clarification.

It is still not clear how output-tensor-meta can change the time measured in queueInputBatch if it affects the post-processing, which runs in the dequeueOutputBatch thread.

Do you have a more precise way to measure the time spent in nvinfer before it produces an output?

The time is available by enabling latency measurement. See: DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums

Thank you Fiona, this is precisely what I was looking for. I would like to add that, to get a more precise inference time, I had to add the parameter buffer-pool-size=1 to nvstreammux.
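For anyone landing here later, the sketch below assembles the pipeline from the first post with the two settings mentioned in this thread applied: sync=0 on fakesink and buffer-pool-size=1 on nvstreammux. It only builds the argument list and does not run GStreamer. `build_pipeline` is a hypothetical helper name, `nvvideoconvert` is assumed to be the intended converter element, and `perf` is the gst-perf element used in the original command.

```python
import shlex

STREAM = "/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4"


def build_pipeline(config_path="config_infer.txt"):
    """Return the gst-launch-1.0 argument list; nothing is executed here."""
    description = (
        f"filesrc location={STREAM} ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_0 "
        # buffer-pool-size=1 is the nvstreammux setting reported in this thread
        "nvstreammux name=m batch-size=1 width=1280 height=720 buffer-pool-size=1 ! "
        f"nvvideoconvert ! nvinfer config-file-path={config_path} ! perf ! "
        # sync=0 disables sink synchronization, as suggested earlier
        "fakesink sync=0"
    )
    return ["gst-launch-1.0"] + shlex.split(description)


args = build_pipeline()
print(" ".join(args))
```

On the Jetson itself, the resulting list can be passed to `subprocess.run(args, check=True)` or the printed line pasted into a shell.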