How can I configure the maximum number of detections per frame to reduce resource usage?

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
dGPU 1660s
• DeepStream Version
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs)
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

  • I have tried performance testing on my dGPU (1660 SUPER) and it seems to work smoothly. My question is: how many objects can my dGPU detect (multiple streams, with many objects per stream)?
  • Example 1: I duplicated the source 6 times (source0 → source5) with the same input; my video has 10 objects.
    => Result: the output video with detections lags and the FPS drops => very bad result.
  • Example 2: I duplicated the source 6 times (as above), but the video has only 1 object to detect.
    => Result: the output video looks perfect, with no issue.
    ===> So I think DeepStream gets overloaded by the detections, which makes the output video very bad. I tried to find a way to limit the number of detections, but it doesn't seem to be supported, is it? Please give me a solution.
    Thank you very much.

• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)
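
As context for limiting detections: DeepStream's `nvinfer` config does expose per-class post-processing caps, including a `topk` setting that keeps only the K highest-confidence objects. A hedged sketch of the relevant section, assuming a `deepstream-app`-style inference config file (the values here are illustrative, not tuned):

```ini
# Hypothetical fragment of an nvinfer config (e.g. config_infer_primary.txt).
# These keys bound per-frame post-processing work when frames contain many objects.
[class-attrs-all]
pre-cluster-threshold=0.4   # discard low-confidence candidates before clustering
topk=20                     # keep at most the 20 highest-scoring objects per class per frame
```

Whether this actually helps depends on where the pipeline's bottleneck is, which the reply below asks about.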

Hi @kylenyan1215 ,
Normally, the inference time (if there is no NMS in the model) is independent of the number of objects; the post-processing time may be related to the object count.
Have you found out which component of your pipeline is affected by the object count and could become the bottleneck of your pipeline?

Yes, there was another problem. I spent some hours checking for the bottleneck component. With multiple streams I was using a tiled display at a very high resolution (2 rows, 3 columns), so I had set height = 1080 * 2 = 2160 and width = 1920 * 3 = 5760. That was my mistake. I changed it to width=1920, height=720 (the same aspect ratio) and it now looks like it performs much better.
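
The fix above can be sketched as a `deepstream-app` `[tiled-display]` section, using the dimensions mentioned (assuming the standard 2x3 grid described):

```ini
[tiled-display]
enable=1
rows=2
columns=3
# Output canvas size; the tiler scales each stream into its cell.
# 5760x2160 (one full-resolution cell per stream) was too heavy on the 1660 SUPER;
# 1920x720 keeps the same 8:3 aspect ratio at 1/9 the pixel count.
width=1920
height=720
```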

But another question, about deploying my code to a Jetson Nano: how can I estimate how my code will perform on a Jetson Nano (how many objects it can detect, how many streams it can handle)? I know the Jetson Nano can decode and encode at most 2 camera streams, and I would like to reach the maximum number of cameras it can handle.
But compare the platforms: my code currently runs on a dGPU, a 1660 SUPER with 1408 CUDA cores, and handles only 6 cameras (30 fps), versus a Jetson Nano with 128 CUDA cores. Does that mean my code performs badly?
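
For what a pure CUDA-core comparison would say, here is a naive back-of-the-envelope scaling. This ignores DDR bandwidth, CPU capability, the dedicated decoder hardware, and clock speeds, so treat it as a rough sanity check, not a prediction; the stream and FPS figures come from the post above.

```python
# Naive scaling from GTX 1660 SUPER to Jetson Nano by CUDA core count only.
# This is a deliberately simplistic estimate; real throughput depends on
# memory bandwidth, CPU, decoder limits, and GPU clocks as well.

dgpu_cores = 1408        # GTX 1660 SUPER
nano_cores = 128         # Jetson Nano
streams_on_dgpu = 6      # streams handled at 30 fps on the dGPU

ratio = nano_cores / dgpu_cores              # ~0.09
est_streams = streams_on_dgpu * ratio        # streams at 30 fps, by this naive model
est_fps_one_stream = 30 * streams_on_dgpu * ratio  # fps for a single stream

print(f"core ratio: {ratio:.3f}")
print(f"estimated streams at 30 fps: {est_streams:.2f}")
print(f"estimated fps for a single stream: {est_fps_one_stream:.1f}")
```

By this crude measure the Nano would struggle to run even one stream at 30 fps, which is why benchmarking on the actual device (as suggested in the reply below) is the only reliable answer.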

Do you mean this issue has been solved?

It’s hard to estimate the performance by calculation alone, since there are many differences besides the GPU, such as DDR bandwidth, CPU capability, etc. Since a DeepStream application is easy to port from an x86+dGPU platform to a Jetson platform, you can port it to the Nano and benchmark it.

Actually, no. I still need to reduce something. Limiting the number of detected objects seems hard to do; as another solution, can I reduce the FPS, e.g. to only 10 fps instead of the original 30 fps?
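
Two stock `deepstream-app` knobs can cut load in roughly this way without changing the model: the source's `drop-frame-interval` (decode fewer frames) and `nvinfer`'s `interval` (skip inference on some batches, with a tracker bridging the gaps). A hedged sketch, with illustrative values targeting ~10 fps from a 30 fps input:

```ini
# Hypothetical deepstream-app config fragments; section names depend on your config.

[source0]
# ... existing source settings ...
# Deliver every 3rd decoded frame: a 30 fps input becomes ~10 fps downstream.
drop-frame-interval=3

[primary-gie]
# ... existing gie settings ...
# Skip inference on 2 of every 3 batches; pair this with a tracker
# (nvtracker) so objects keep their IDs on skipped frames.
interval=2
```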

I’m sure the CPU, memory, and DDR bandwidth all have plenty of headroom; the performance depends on the GPU’s CUDA capability.

Here is the nvtop output for my dGPU:

As said above, I don’t think it can be calculated, at least not by me, since I have too little information about your application: no info about the pipeline, the bottleneck of the pipeline, how you handle the objects, etc.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.