DeepStream 6 Python app performance degradation

I have developed a custom Python app using DeepStream: a primary detector plus a classifier with custom post-processing (output-tensor-meta=1).
With the DS 5.1 NGC container my app runs at ~250 FPS; with DS 6.0 it runs at ~140 FPS.

For the DS 6.0 container I installed pyds from https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/releases/download/v1.1.0/pyds-1.1.0-py3-none-linux_x86_64.whl

I tried two sample apps from deepstream_python_apps, deepstream-test3 and deepstream-imagedata-multistream, with one file source (sample_720p.mp4) and one small patch: nveglglessink replaced with fakesink sync=0.

  • deepstream-test3.py runs at 350 FPS in DS 5.1 and 340 FPS in DS 6.0.
  • deepstream-imagedata-multistream (without frame saving) runs at 225 FPS in DS 5.1 and 216 FPS in DS 6.0.
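For reference, a minimal average-FPS counter of the kind typically ticked from a pad-probe callback in these apps (a sketch only; the thread doesn't say exactly how FPS was measured, and the injectable `clock` parameter is my addition for testability, not a pyds or sample-app API):

```python
import time

class FPSCounter:
    """Average-FPS counter, meant to be ticked once per frame
    (e.g. from a GStreamer buffer-pad probe callback)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock   # injectable time source, for testing
        self._start = None
        self._frames = 0

    def tick(self):
        # Call once per processed frame.
        if self._start is None:
            self._start = self._clock()
        self._frames += 1

    def fps(self):
        if self._start is None:
            return 0.0
        elapsed = self._clock() - self._start
        return self._frames / elapsed if elapsed > 0 else 0.0
```

In a real pipeline you would call `counter.tick()` inside the probe attached to the sink pad and print `counter.fps()` periodically or at EOS.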

I checked resnet10.caffemodel_b1_gpu0_int8.engine (the primary detector for the sample apps) with trtexec. With TRT 7.2.2 (DS 5.1) I got an end-to-end host latency of 0.815338 ms at the 99th percentile and a throughput of 2153.88 qps; with TRT 8.0.1 (DS 6.0) the 99th-percentile end-to-end host latency was 0.802185 ms and the throughput was 2155.33 qps. The engine is therefore even slightly faster in DS 6.0.
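A quick back-of-the-envelope check, using only the figures reported in this thread, makes the point numerically: the engine-level throughput change is negligible, while the app-level drop is large, so the regression cannot come from TensorRT itself.

```python
# Figures taken from the measurements reported above.
trt_722_qps = 2153.88    # trtexec throughput under TRT 7.2.2 (DS 5.1)
trt_801_qps = 2155.33    # trtexec throughput under TRT 8.0.1 (DS 6.0)
app_ds51_fps = 250.0     # custom app FPS in the DS 5.1 container
app_ds60_fps = 140.0     # custom app FPS in the DS 6.0 container

# Relative change in raw engine throughput (positive = faster in DS 6.0).
engine_change = (trt_801_qps - trt_722_qps) / trt_722_qps * 100

# Relative drop in end-to-end app throughput.
app_drop = (app_ds51_fps - app_ds60_fps) / app_ds51_fps * 100

print(f"engine throughput change: {engine_change:+.2f}%")  # about +0.07%
print(f"app FPS drop:             {app_drop:.1f}%")        # 44.0%
```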

I tried downgrading pyds in the DS 6.0 container (1.1.0 → 1.0.2):

  • deepstream-test3 - no effect
  • deepstream-imagedata-multistream - 225 FPS (= DS 5.1)
  • my app - 250 FPS (= DS 5.1)

Can you explain the reason for the performance degradation and help fix it?
Thanks


Sorry for the late response; is this still an issue that needs support? Thanks

Yes, pyds 1.1.0 is noticeably slower than 1.0.2 under heavy use.

I observed the same effect for the DeepStream primary detector and various custom models.

Hi @metarefl,
Of the three cases (your app, deepstream-test3, deepstream-imagedata-multistream), it seems only your app is seriously affected by the pyds version (250 vs 140 FPS), while the other two show a much smaller performance drop, right?

Is your detection running with the TensorRT backend?
Is it possible for you to narrow down which components cause the perf drop, referring to DeepStream SDK FAQ - #12 by bcao?

Yes, my app is affected more seriously. The Python sample apps are very simple and make few binding calls.

“Is your detection running with TensorRT backend?” - yes, it is a TRT engine with tensor-output post-processing that ends by adding object meta to the frame.

“Is it possible for you to narrow down which components cause the perf drop referring to DeepStream SDK FAQ - #12 by bcao?” - no; it’s a Python app, and pyds doesn’t provide bindings for the latency measurement API.
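As a workaround when the C latency-measurement API is not exposed through bindings, the Python probe callbacks themselves can be timed directly to see where the wall-clock time goes. A minimal sketch in plain Python; `timed_probe` and the probe name are hypothetical helpers of my own, not part of pyds:

```python
import time
from collections import defaultdict
from functools import wraps

probe_time = defaultdict(float)   # cumulative seconds spent per probe
probe_calls = defaultdict(int)    # number of invocations per probe

def timed_probe(name):
    """Decorator that accumulates wall-clock time spent in a probe callback."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            t0 = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                probe_time[name] += time.perf_counter() - t0
                probe_calls[name] += 1
        return wrapper
    return decorate

# Usage: decorate each pad-probe callback, then dump the totals at EOS.
@timed_probe("tensor_postproc")
def tensor_postproc_probe(pad, info):
    ...  # parse output-tensor-meta, add object meta to the frame, etc.
    # In a real pipeline, return Gst.PadProbeReturn.OK here.
```

Comparing the accumulated per-probe times between the pyds 1.0.2 and 1.1.0 runs would show whether the extra time is spent inside the binding-heavy callbacks.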

Thanks for reporting the issue. Can you share more info on what your app does and which bindings you are using?

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Hi @metarefl, thanks for bringing this to our attention. Can you share some more info on your use case, especially how it deviates from our sample apps? We haven’t seen this with the sample apps, so we would like to understand how to replicate the issue.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.