How can I adjust the parameters to accelerate YOLOv7 on DeepStream? I only get 8 fps, so I think I must have done something wrong. This is my setup:

TESLA T4
Deepstream-6.1.1 docker environment
[NVIDIA-AI-IOT - yolo_deepstream]
chose yolov7.onnx → fp16.engine
batch-size = 16
ran detection on sample_1080p_h264.mp4
model = yolov7.onnx
deepstream_test_config.txt (3.9 KB)

In the end, the max fps is 8.
(upload://1TdFWiMOdOAx5oNtJNPSizOBL24.txt) (3.1 KB)
command = deepstream-app -c deepstream_test_config.txt

Can you refer to GitHub - NVIDIA-AI-IOT/yolo_deepstream: yolo model qat and deploy with deepstream&tensorrt?

Yes, I followed yolo_deepstream/deepstream_yolo at main · NVIDIA-AI-IOT/yolo_deepstream · GitHub

But the sample in the project can get hundreds of fps. What's the difference between your setup and the project's?

I also want to know …

Do you mean you didn't make any changes to the GitHub code?

I did not change the code. I have uploaded the two config txt files.

Can you use trtexec to run fp16.engine and check the QPS?

/usr/src/tensorrt/bin/trtexec --loadEngine=fp16.engine

Is the total_fps equal to batch_number * batch/second?

Is 8 the total_fps or the batch/second?

First, the QPS from trtexec:
Throughput: 115.914 qps
Latency: min = 9.33789 ms, max = 14.3026 ms, mean = 9.67278 ms, median = 9.61938 ms, percentile(99%) = 10.2032 ms
[11/22/2022-02:23:05] [I] Enqueue Time: min = 0.906738 ms, max = 2.44945 ms, mean = 1.70993 ms, median = 1.74048 ms, percentile(99%) = 2.29266 ms
[11/22/2022-02:23:05] [I] H2D Latency: min = 0.406738 ms, max = 0.47998 ms, mean = 0.42492 ms, median = 0.422363 ms, percentile(99%) = 0.471191 ms
[11/22/2022-02:23:05] [I] GPU Compute Time: min = 8.26126 ms, max = 13.2157 ms, mean = 8.58368 ms, median = 8.53056 ms, percentile(99%) = 9.12329 ms
[11/22/2022-02:23:05] [I] D2H Latency: min = 0.653076 ms, max = 0.677811 ms, mean = 0.664174 ms, median = 0.663818 ms, percentile(99%) = 0.671997 ms
[11/22/2022-02:23:05] [I] Total Host Walltime: 3.02811 s
[11/22/2022-02:23:05] [I] Total GPU Compute Time: 3.01287 s
[11/22/2022-02:23:05] [W] * GPU compute time is unstable, with coefficient of variance = 3.40474%.
[11/22/2022-02:23:05] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[11/22/2022-02:23:05] [I] Explanations of the performance metrics are printed in the verbose logs.
Second, the total_fps = 8 * 16 = 128.
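
The arithmetic in this reply can be sketched in a few lines of Python. This is only an illustration, not part of the yolo_deepstream project: the function names are hypothetical, and it assumes that the 8 fps shown by deepstream-app is a per-stream figure across 16 parallel streams (as the post states) and that inferences run back to back, so that 1000 / mean GPU compute time roughly approximates the trtexec throughput.

```python
import re


def parse_throughput_qps(log_text: str) -> float:
    """Extract the 'Throughput: <x> qps' value from trtexec output text."""
    match = re.search(r"Throughput:\s*([0-9.]+)\s*qps", log_text)
    if match is None:
        raise ValueError("no 'Throughput: ... qps' entry found")
    return float(match.group(1))


def total_fps(per_stream_fps: float, num_streams: int) -> float:
    """Aggregate fps, assuming every stream runs at the same per-stream fps."""
    return per_stream_fps * num_streams


def qps_from_mean_compute_ms(mean_compute_ms: float) -> float:
    """Rough throughput estimate, assuming back-to-back inferences."""
    return 1000.0 / mean_compute_ms


if __name__ == "__main__":
    # Values copied from the trtexec output quoted in this thread.
    log_text = "Throughput: 115.914 qps"
    print("trtexec throughput (qps):", parse_throughput_qps(log_text))
    print("estimate from mean GPU compute time (qps):",
          round(qps_from_mean_compute_ms(8.58368), 1))
    # 8 fps per stream, 16 streams (assumed per-stream reading).
    print("total fps:", total_fps(8, 16))
```

Under these assumptions, the aggregate of 128 fps is in the same range as the 115.914 qps that trtexec reports.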

Did you boost the T4 clocks?

With the YoloV7.onnx from the project, I can get ~137 fps as below.

Sorry, I do not know about T4 clocks. Can you explain more about them?

Boost the GPU frequency:

$ sudo nvidia-smi -pm ENABLED -i 0    # enable persistence mode; assumes the T4 is GPU 0
$ sudo nvidia-smi -ac 5001,1590 -i 0  # set the memory clock and the graphics clock
$ nvidia-smi -q -d CLOCK -i 0         # confirm

Also, since the fps you got is lower than what I got in the screenshot above (115 vs 137), CPU capability may be another possible reason besides the GPU clock.

Hi @383796283
Do you have any other questions about this?


Nope, thanks for your help.