How to use enable_perf_measurement=1 with Docker

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) A6000
• DeepStream Version 6.4
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type (questions, new requirements, bugs)
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file content, the command line used, and other details for reproducing.)
• Requirement details (This is for new requirements. Include the module name, i.e. which plugin or which sample application, and the function description.)

Hi, I am using Docker to run my DeepStream application. I would like to know the latency of each plugin and the average FPS of the application. How can I set the relevant environment variable in my docker compose file so that it takes effect in my DeepStream container?

I have tried the command `sh -c "export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1 && python3 deepstream_lpr_app.py"`, but it only prints my environment variables and then the container exits with code 0.
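Rather than wrapping an export in a shell command, the variable can usually be set in the compose file itself so that it is part of the main process environment from the start. A minimal sketch (the service name, image tag, and command are placeholders based on this setup, not verified values):

```yaml
services:
  deepstream:
    image: nvcr.io/nvidia/deepstream:6.4-triton-multiarch
    environment:
      - NVDS_ENABLE_LATENCY_MEASUREMENT=1
      - NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
    command: python3 deepstream_lpr_app.py
```

With `environment:` there is no need for an `sh -c "export … && …"` wrapper at all.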

If you want to get the latency of each element in Python, please refer to this FAQ.

This doesn’t seem to be an official sample. It requires a small modification to your code.

Thanks for the resources, I am able to get the latency now. Do you mind sharing ideas on how to reduce the latency of each plugin in the DeepStream pipeline? Also, for enable-perf-measurement, can we directly use `export enable-perf-measurement=1`?

What is your pipeline and which elements do you use? For the commonly used elements, see below:

  1. For nvv4l2decoder, the drop-frame-interval property is used to drop frames. For example, a value of 5 means every 5th frame is output by the decoder and the rest are dropped.

  2. For nvstreammux, use the original frame width and height to avoid scaling.

  3. For nvinfer, the interval property specifies the number of consecutive batches to be skipped for inference.
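As a sketch of how the three suggestions above look from the Python bindings (the element objects come from a live GStreamer pipeline, which is assumed here; the values 5 and 2 are illustrative, not recommendations):

```python
# Illustrative property values for the elements mentioned above.
# The elements themselves come from a live GStreamer pipeline (gi/pyds),
# which this sketch only assumes exists.
TUNING = {
    "nvv4l2decoder": {"drop-frame-interval": 5},  # emit every 5th frame
    "nvinfer": {"interval": 2},  # skip 2 consecutive batches between inferences
}

def apply_tuning(element, element_name):
    """Apply the suggested properties to a pipeline element, if any are defined."""
    for prop, value in TUNING.get(element_name, {}).items():
        element.set_property(prop, value)
```

Both properties trade accuracy/coverage for throughput, so the right values depend on how many frames your downstream logic can afford to skip.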

enable-perf-measurement is only for deepstream-app; it is a config-file setting, not an environment variable.

We have tried what you mentioned before. However, without dropping frames or setting the interval property on nvinfer, we can handle around 10 CCTV streams at 25 fps. We noticed that GPU utilisation is at 100% while decoder utilisation is only around 10%. Our current DeepStream pipeline uses 7 models to infer the attributes of a vehicle. Any suggestions to reduce the GPU utilisation?

Besides, looking at the per-plugin latency, nvstreammux contributes the most, ~210 ms. Any suggestion to reduce this latency?

Try to optimize the model first. Is the model’s precision fp32 or int8? int8 may lose some accuracy, but can improve performance.

Did you set a longer timeout for nvstreammux? If no scaling is done, nvstreammux is usually not heavily loaded.

Yup, we are proceeding to this now.

The configuration we set for nvstreammux is as below:

streammux.set_property("width", 1920)
streammux.set_property("height", 1080)
streammux.set_property("batch-size", number_sources)
streammux.set_property("batched-push-timeout", 40000)

The width and height match the original resolution of our RTSP stream; will scaling still be carried out when the input resolution is the same as the nvstreammux settings? As for the timeout, it is set to 40000 µs (40 ms, one frame period at 25 fps) because the livestreams run at 25 fps.

You are right. I think there should be no problem with this configuration. Try to optimize the model first. I have no better suggestions.

I don’t think the models are much of an issue, as they do not contribute much latency: all within 1-25 ms each. Only nvstreammux contributes 250-300 ms. Any ideas, or is there an alternative to using nvstreammux?

300 milliseconds is indeed too long. Have you configured any additional parameters? Try adding live-source=1 as below, and see the following FAQ:

streammux.set_property("live-source", 1)

nvstreammux is required because the images need to be formed into a batch.

You can follow the guidance of the FAQ and try to use the new nvstreammux.
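If I read the FAQ correctly, the new nvstreammux is opted into via an environment variable set before launching the application (sketch below; the app name is just this thread's example):

```shell
# Opt in to the new nvstreammux. Note that it performs no scaling,
# so the width/height properties of the legacy mux no longer apply.
export USE_NEW_NVSTREAMMUX=yes
# then launch the application from the same shell, e.g.:
# python3 deepstream_lpr_app.py
```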

Thanks for the hint. I had left live-source at 0 because I was testing with video files only. My average latency per batch is around 11 ms now when running live RTSP sources. However, there are spikes where the latency climbs to as much as 72 ms:

Source id = 0 Frame_num = 6725 Frame latency = 11.65185546875 (ms) 
Source id = 0 Frame_num = 6726 Frame latency = 11.420166015625 (ms) 
Source id = 0 Frame_num = 6727 Frame latency = 11.5380859375 (ms) 
Source id = 0 Frame_num = 6728 Frame latency = 11.59814453125 (ms) 
Source id = 0 Frame_num = 6729 Frame latency = 11.676025390625 (ms) 
Source id = 0 Frame_num = 6730 Frame latency = 11.515869140625 (ms) 
Source id = 0 Frame_num = 6731 Frame latency = 11.284912109375 (ms) 
Source id = 0 Frame_num = 6732 Frame latency = 11.794921875 (ms) 
Source id = 0 Frame_num = 6733 Frame latency = 11.387939453125 (ms) 
Source id = 0 Frame_num = 6734 Frame latency = 11.636962890625 (ms) 
Source id = 0 Frame_num = 6735 Frame latency = 11.486083984375 (ms) 
Source id = 0 Frame_num = 6736 Frame latency = 11.14599609375 (ms) 
Source id = 0 Frame_num = 6737 Frame latency = 11.1728515625 (ms) 
Source id = 0 Frame_num = 6738 Frame latency = 11.73486328125 (ms) 
Source id = 0 Frame_num = 6739 Frame latency = 11.711181640625 (ms) 
Source id = 0 Frame_num = 6740 Frame latency = 11.695068359375 (ms) 
Source id = 0 Frame_num = 6741 Frame latency = 11.60205078125 (ms) 
Source id = 0 Frame_num = 6742 Frame latency = 11.556884765625 (ms) 
Source id = 0 Frame_num = 6743 Frame latency = 11.219970703125 (ms) 
Source id = 0 Frame_num = 6744 Frame latency = 11.4208984375 (ms) 
Source id = 0 Frame_num = 6745 Frame latency = 11.469970703125 (ms) 
Source id = 0 Frame_num = 6746 Frame latency = 11.572021484375 (ms) 
Source id = 0 Frame_num = 6747 Frame latency = 13.178955078125 (ms) 
Source id = 0 Frame_num = 6748 Frame latency = 16.723876953125 (ms) 
Source id = 0 Frame_num = 6749 Frame latency = 20.635009765625 (ms) 
Source id = 0 Frame_num = 6750 Frame latency = 23.602783203125 (ms) 
Source id = 0 Frame_num = 6751 Frame latency = 26.8818359375 (ms) 
Source id = 0 Frame_num = 6752 Frame latency = 30.07080078125 (ms) 
Source id = 0 Frame_num = 6753 Frame latency = 33.703857421875 (ms) 
Source id = 0 Frame_num = 6754 Frame latency = 37.684814453125 (ms) 
Source id = 0 Frame_num = 6755 Frame latency = 42.52294921875 (ms) 
Source id = 0 Frame_num = 6756 Frame latency = 45.56396484375 (ms) 
Source id = 0 Frame_num = 6757 Frame latency = 50.10498046875 (ms) 
Source id = 0 Frame_num = 6758 Frame latency = 52.7958984375 (ms) 
Source id = 0 Frame_num = 6759 Frame latency = 62.435791015625 (ms) 
Source id = 0 Frame_num = 6760 Frame latency = 64.0498046875 (ms) 
Source id = 0 Frame_num = 6761 Frame latency = 66.85888671875 (ms) 
Source id = 0 Frame_num = 6762 Frame latency = 72.3369140625 (ms) 
Source id = 0 Frame_num = 6763 Frame latency = 66.775146484375 (ms) 
Source id = 0 Frame_num = 6764 Frame latency = 10.366943359375 (ms) 
Source id = 0 Frame_num = 6765 Frame latency = 9.678955078125 (ms) 
Source id = 0 Frame_num = 6766 Frame latency = 10.300048828125 (ms) 
Source id = 0 Frame_num = 6767 Frame latency = 16.987060546875 (ms) 
Source id = 0 Frame_num = 6768 Frame latency = 11.588134765625 (ms) 
Source id = 0 Frame_num = 6769 Frame latency = 11.590087890625 (ms) 
Source id = 0 Frame_num = 6770 Frame latency = 11.429931640625 (ms) 
Source id = 0 Frame_num = 6771 Frame latency = 11.4638671875 (ms) 
Source id = 0 Frame_num = 6772 Frame latency = 11.30810546875 (ms) 
Source id = 0 Frame_num = 6773 Frame latency = 11.06982421875 (ms) 
Source id = 0 Frame_num = 6774 Frame latency = 11.60400390625 (ms) 
Source id = 0 Frame_num = 6775 Frame latency = 11.093994140625 (ms) 
Source id = 0 Frame_num = 6776 Frame latency = 11.593994140625 (ms) 
Source id = 0 Frame_num = 6777 Frame latency = 11.488037109375 (ms) 

This spike happens every 28-35 frames.
Besides, I noticed that the spikes are caused by nvstreammux. On average it takes 0.2 ms, but it can suddenly spike over a few frames, reaching up to 56 ms.

Any idea why this spike happens?

1. This problem may be caused by high GPU load. If the GPU load is too high, nvinfer will consume each batch slowly.
The batch is allocated from the memory pool of nvstreammux; if there is no buffer available in the pool, nvstreammux stalls. Therefore, you need to optimize the models first to reduce GPU load.
Increasing the value of the buffer-pool-size property of nvstreammux can alleviate this problem, but it does not address the root cause.

2. For LPR, the number of vehicles and license plates in the scene also affects GPU load.
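Regarding point 1, a minimal sketch of raising the pool size from Python (the value 8 is an example only, and `streammux` is assumed to be the nvstreammux element of the pipeline):

```python
def enlarge_mux_pool(streammux, pool_size=8):
    """Give nvinfer more headroom before nvstreammux blocks on a full pool.

    pool_size=8 is illustrative, not a recommendation; as noted above,
    this only masks the symptom, and reducing GPU load is the real fix.
    """
    streammux.set_property("buffer-pool-size", pool_size)
```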

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.