I was testing a project's performance on a system with a Titan RTX.
The frame rate drops to ~3 fps even though the GPU still has plenty of VRAM available, the CPU and RAM are not exhausted, and the GPU isn't at 100% utilization.
What could cause that?
The model used is a converted YOLOv4.
• Hardware Platform: Titan RTX GPU
• DeepStream Version 5.0
• TensorRT Version 7.0
• NVIDIA GPU Driver Version: 460.91.03
• Issue Type: question
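For reference, one way to capture the utilization numbers described above is to watch per-second GPU stats while the app runs. A sketch (flags assume a reasonably recent `nvidia-smi`):

```shell
# Per-second streaming of GPU utilization (sm) and memory usage while
# the DeepStream app runs in another terminal.
nvidia-smi dmon -s um

# Or a one-shot CSV query of utilization and VRAM, repeated every second,
# suitable for logging alongside the measured fps:
nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv -l 1
```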
mchi
October 13, 2021, 1:36am
Also, you can extract trtexec from the TensorRT tar package, which can be downloaded from https://developer.nvidia.com/nvidia-tensorrt-7x-download .
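As a sketch, the extraction looks like this (the exact tarball name depends on the OS/CUDA/cuDNN variant you download, so treat the filename below as a placeholder):

```shell
# Extract the TensorRT tarball (hypothetical filename; use the one you downloaded)
tar -xzf TensorRT-7.0.0.11.Ubuntu-18.04.x86_64-gnu.cuda-10.2.cudnn7.6.tar.gz

# trtexec ships as a prebuilt binary under bin/ in the extracted tree
./TensorRT-7.0.0.11/bin/trtexec --help
```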
I used explicit batch since it's a dynamic model. Here's the result of running it.
Is this what's needed, and what does it indicate?
```
root@e06e7092f0cf:/workspace/pytorch-YOLOv4# trtexec --explicitBatch --workspace=15120 --fp16 --optShapes=input:3x3x608x608 --maxShapes=input:30x3x608x608 --minShapes=input:1x3x608x608 --shapes=input:30x3x608x608 --useSpinWait --loadEngine=yolov4-dynamic.engine
&&&& RUNNING TensorRT.trtexec # trtexec --explicitBatch --workspace=15120 --fp16 --optShapes=input:3x3x608x608 --maxShapes=input:30x3x608x608 --minShapes=input:1x3x608x608 --shapes=input:30x3x608x608 --useSpinWait --loadEngine=yolov4-dynamic.engine
[10/18/2021-09:17:04] [I] === Model Options ===
[10/18/2021-09:17:04] [I] Format: *
[10/18/2021-09:17:04] [I] Model:
[10/18/2021-09:17:04] [I] Output:
[10/18/2021-09:17:04] [I] === Build Options ===
[10/18/2021-09:17:04] [I] Max batch: explicit
[10/18/2021-09:17:04] [I] Workspace: 15120 MB
[10/18/2021-09:17:04] [I] minTiming: 1
[10/18/2021-09:17:04] [I] avgTiming: 8
[10/18/2021-09:17:04] [I] Precision: FP16
[10/18/2021-09:17:04] [I] Calibration:
[10/18/2021-09:17:04] [I] Safe mode: Disabled
[10/18/2021-09:17:04] [I] Save engine:
[10/18/2021-09:17:04] [I] Load engine: yolov4-dynamic.engine
[10/18/2021-09:17:04] [I] Inputs format: fp32:CHW
[10/18/2021-09:17:04] [I] Outputs format: fp32:CHW
[10/18/2021-09:17:04] [I] Input build shape: input=1x3x608x608+3x3x608x608+30x3x608x608
[10/18/2021-09:17:04] [I] === System Options ===
[10/18/2021-09:17:04] [I] Device: 0
[10/18/2021-09:17:04] [I] DLACore:
[10/18/2021-09:17:04] [I] Plugins:
[10/18/2021-09:17:04] [I] === Inference Options ===
[10/18/2021-09:17:04] [I] Batch: Explicit
[10/18/2021-09:17:04] [I] Iterations: 10
[10/18/2021-09:17:04] [I] Duration: 3s (+ 200ms warm up)
[10/18/2021-09:17:04] [I] Sleep time: 0ms
[10/18/2021-09:17:04] [I] Streams: 1
[10/18/2021-09:17:04] [I] ExposeDMA: Disabled
[10/18/2021-09:17:04] [I] Spin-wait: Enabled
[10/18/2021-09:17:04] [I] Multithreading: Disabled
[10/18/2021-09:17:04] [I] CUDA Graph: Disabled
[10/18/2021-09:17:04] [I] Skip inference: Disabled
[10/18/2021-09:17:04] [I] Inputs:
[10/18/2021-09:17:04] [I] === Reporting Options ===
[10/18/2021-09:17:04] [I] Verbose: Disabled
[10/18/2021-09:17:04] [I] Averages: 10 inferences
[10/18/2021-09:17:04] [I] Percentile: 99
[10/18/2021-09:17:04] [I] Dump output: Disabled
[10/18/2021-09:17:04] [I] Profile: Disabled
[10/18/2021-09:17:04] [I] Export timing to JSON file:
[10/18/2021-09:17:04] [I] Export output to JSON file:
[10/18/2021-09:17:04] [I] Export profile to JSON file:
[10/18/2021-09:17:04] [I]
[10/18/2021-09:17:09] [I] Warmup completed 0 queries over 200 ms
[10/18/2021-09:17:09] [I] Timing trace has 0 queries over 3.51998 s
[10/18/2021-09:17:09] [I] Trace averages of 10 runs:
[10/18/2021-09:17:09] [I] Average on 10 runs - GPU latency: 144.241 ms - Host latency: 198.804 ms (end to end 296.445 ms)
[10/18/2021-09:17:09] [I] Average on 10 runs - GPU latency: 142.773 ms - Host latency: 196.592 ms (end to end 285.503 ms)
[10/18/2021-09:17:09] [I] Host latency
[10/18/2021-09:17:09] [I] min: 195.419 ms (end to end 283.243 ms)
[10/18/2021-09:17:09] [I] max: 220.26 ms (end to end 381.42 ms)
[10/18/2021-09:17:09] [I] mean: 197.52 ms (end to end 290.225 ms)
[10/18/2021-09:17:09] [I] median: 196.625 ms (end to end 285.56 ms)
[10/18/2021-09:17:09] [I] percentile: 220.26 ms at 99% (end to end 381.42 ms at 99%)
[10/18/2021-09:17:09] [I] throughput: 0 qps
[10/18/2021-09:17:09] [I] walltime: 3.51998 s
[10/18/2021-09:17:09] [I] GPU Compute
[10/18/2021-09:17:09] [I] min: 141.602 ms
[10/18/2021-09:17:09] [I] max: 159.055 ms
[10/18/2021-09:17:09] [I] mean: 143.377 ms
[10/18/2021-09:17:09] [I] median: 142.808 ms
[10/18/2021-09:17:09] [I] percentile: 159.055 ms at 99%
[10/18/2021-09:17:09] [I] total compute time: 3.29768 s
&&&& PASSED TensorRT.trtexec # trtexec --explicitBatch --workspace=15120 --fp16 --optShapes=input:3x3x608x608 --maxShapes=input:30x3x608x608 --minShapes=input:1x3x608x608 --shapes=input:30x3x608x608 --useSpinWait --loadEngine=yolov4-dynamic.engine
```
mchi
October 21, 2021, 3:57am
From the log line "[10/18/2021-09:17:09] [I] mean: 143.377 ms", its inference throughput would be (1000 / 143) * batch_size = (1000 / 143) * 30 ≈ 209 fps.
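The arithmetic above can be checked with a short snippet (numbers taken straight from the trtexec log; `batch` matches the batch dimension of 30 used in --shapes):

```python
# Throughput estimate from trtexec's mean GPU compute latency.
mean_latency_ms = 143.377   # "GPU Compute ... mean" from the log
batch = 30                  # batch dimension used in --shapes

# One batched inference takes mean_latency_ms, and each inference
# processes `batch` frames, so:
fps = (1000.0 / mean_latency_ms) * batch
print(f"~{fps:.0f} fps")    # prints: ~209 fps
```

Note this is a GPU-compute upper bound for the engine alone; the full DeepStream pipeline adds decode, pre/post-processing, and transfer overhead on top of it.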
mchi
October 21, 2021, 3:59am
How do you measure the fps, and what does your pipeline look like? Could you follow DeepStream SDK FAQ - #10 by mchi to dump the pipeline graph?
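For reference, the generic GStreamer mechanism behind pipeline-graph dumps is the GST_DEBUG_DUMP_DOT_DIR environment variable. A sketch (the config path is a placeholder, and rendering the graph needs Graphviz installed):

```shell
# Any GStreamer/DeepStream pipeline writes .dot graph files on state
# changes when this directory is set before launching the app.
export GST_DEBUG_DUMP_DOT_DIR=/tmp/pipeline-dot
mkdir -p "$GST_DEBUG_DUMP_DOT_DIR"

# Run the app as usual (placeholder config path):
deepstream-app -c source1_config.txt

# Render a dumped graph to PNG (requires graphviz):
dot -Tpng "$GST_DEBUG_DUMP_DOT_DIR"/*.dot -o pipeline.png
```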
system
Closed
November 9, 2021, 1:34am
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.