What kind of hardware rig can support video analytics on 100+ streams using DeepStream?

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) GPU
• DeepStream Version 5.0.1

I’m working on a hardware plan for my client, who has asked me to deploy 10 DeepStream object detection models to handle video analytics for 100+ IP cameras, basically one model per 10 cameras.

I’m not sure what kind of hardware rig can support analytics on 100+ video streams. Here are my assumptions:

  • Assume all OD models are YOLOv4, each consuming about 1.8 GB of GPU memory; 10 models need roughly 18 GB, so I would have to choose 2x NVIDIA T4 or 1x V100.
  • 100 1080p streams need about 400 Mbps of network bandwidth, which a 1 GbE NIC can handle.
  • I don’t know how many FPS YOLOv4 can achieve on an NVIDIA T4 or V100.
  • CPU: would 2x Xeon E5 (20 cores / 40 threads) be enough?
  • Memory: 64 GB should be OK?
  • My budget is about $4,500.
  • Although NVIDIA doesn’t want us to use gaming GPUs, 4 gaming GPUs seem really tempting.
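As a sanity check, the arithmetic behind the memory and bandwidth bullets above can be sketched in a few lines (the 1.8 GB per model and ~4 Mbps per 1080p camera figures are the assumptions from this post, not measured values):

```python
# Back-of-envelope sizing for 10 YOLOv4 models serving 100 IP cameras.
# Both constants below are assumptions from the post, not measurements.
MODELS = 10
STREAMS = 100
GPU_MEM_PER_MODEL_GB = 1.8   # assumed YOLOv4 memory footprint
MBPS_PER_1080P_STREAM = 4    # assumed per-camera H.264 bitrate

total_gpu_mem_gb = MODELS * GPU_MEM_PER_MODEL_GB        # ~18 GB
total_bandwidth_mbps = STREAMS * MBPS_PER_1080P_STREAM  # 400 Mbps

# ~18 GB exceeds a single 16 GB T4, hence 2x T4 (or one 32 GB V100);
# 400 Mbps fits comfortably within a 1 GbE NIC.
print(total_gpu_mem_gb, total_bandwidth_mbps)
```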

Thanks for sharing your thoughts.

Hi CoderJustin,

We have the YoloV4 perf data on T4 and Xavier@Max-N.

YOLOv4 416x416 on Tesla T4 (16 GB)

Batch size   FP16 (fps)   INT8 (fps)
1            164          250
4            225          370
8            232          396
16           234          412
32           234          412

YOLOv4 416x416 on AGX Xavier (MAX-N)

Batch size   FP16 (fps)   INT8 (fps)
1            60           95
4            72           124
8            75           131
16           77           135

We can achieve close to 400 fps end-to-end with batch size > 8 (INT8). If your streams run at 30 fps, you can run about 13 streams per T4, so for 100 streams you would need around 8 T4s. Here’s a GitHub link that shows how to run YOLOv4 with DeepStream.
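That stream-count estimate follows directly from the table; a minimal sketch, assuming ~400 fps engine throughput and 30 fps cameras:

```python
# Streams per GPU = engine throughput / per-stream frame rate.
ENGINE_FPS = 400     # approx. end-to-end INT8 throughput at batch size >= 8
STREAM_FPS = 30      # per-camera frame rate
TOTAL_STREAMS = 100

streams_per_gpu = ENGINE_FPS // STREAM_FPS          # 13
gpus_needed = -(-TOTAL_STREAMS // streams_per_gpu)  # ceiling division -> 8

print(streams_per_gpu, gpus_needed)
```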

What is the application that you are building for your client?

If you are trying to increase the number of channels per GPU, we have the Transfer Learning Toolkit (TLT), which offers training on many model architectures, from high accuracy to high performance. It also supports pruning models to improve inference performance. I would suggest taking a look at the models in TLT.


Thank you @kayccc! I’ve tried running the YOLOv4 benchmark on a T4, but I can’t run with batch size > 4. Can you please check my issue here? Iplugin tensorrt engine error for ds5.0

I’ve set the batch size in [primary-gie] equal to the one in [property] of the pgie configuration file (both 8), but it keeps telling me that the max batch size is 1. If I export the model engine file with a static batch size of 4, it runs successfully.

The number of sources in [source0] and batch size in [streammux] are set to 8.

My second question: can a T4 decode over 100 live camera streams simultaneously?

Hi @CoderJustin,
Could you share the pgie config with me?

Thanks!

@mchi here is my app config and pgie config:

  • app config:
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl

[tiled-display]
enable=0
rows=1
columns=1
width=1280
height=720
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=3
uri=file:/opt/nvidia/deepstream/deepstream-5.0/samples/streams/sample_1080p_h264.mp4
#uri=file:/opt/nvidia/deepstream/deepstream-5.0/samples/streams/sample_720p.h264
num-sources=16
gpu-id=0
cudadec-memtype=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=1
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0

[osd]
enable=0
gpu-id=0
border-width=1
text-size=12
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=16
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1280
height=720
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
gpu-id=0
model-engine-file=yolov4-hat_8.engine
labelfile-path=hat_labels.txt
batch-size=8

#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV4.txt

[tracker]
enable=0
tracker-width=512
tracker-height=320
ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_mot_klt.so

[tests]
file-loop=0
  • pgie config:
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
#0=RGB, 1=BGR
model-color-format=0
model-engine-file=yolov4-hat_8.engine
labelfile-path=hat_labels.txt
batch-size=8
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=2
gie-unique-id=1
network-type=0
is-classifier=0
## 0=Group Rectangles, 1=DBSCAN, 2=NMS, 3= DBSCAN+NMS Hybrid, 4 = None(No clustering)
cluster-mode=2
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseCustomYoloV4
custom-lib-path=libnvdsinfer_custom_impl_Yolo.so
#scaling-filter=0
#scaling-compute-hw=0

[class-attrs-all]
nms-iou-threshold=0.6
pre-cluster-threshold=0.4
  • exporting darknet weights to onnx (batch size set to 8):
python demo_darknet2onnx.py cfg/yolov4-hat.cfg yolov4-hat_7000.weights 233.png 8

and I get the yolov4_8_3_416_416_static.onnx model file.

  • generate the engine file on the T4:
/usr/src/tensorrt/bin/trtexec --onnx=yolov4_8_3_416_416_static.onnx --explicitBatch --workspace=4096 --saveEngine=yolov4-hat_8.engine --fp16
  • run the deepstream-app: deepstream-app -c deepstream_app_config_yoloV4.txt
Unknown or legacy key specified 'is-classifier' for group [property]
WARNING: ../nvdsinfer/nvdsinfer_func_utils.cpp:36 [TRT]: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
0:00:04.538091375 18351 0x55709d497610 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1701> [UID = 1]: deserialized trt engine from :/home/ubuntu/projects/deepstream-apps/deepstream-test5-safety-helmet/configs/yolov4-hat_8.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:685 [Implicit Engine Info]: layers num: 3
0   INPUT  kFLOAT input           3x416x416       
1   OUTPUT kFLOAT boxes           10647x1x4       
2   OUTPUT kFLOAT confs           10647x2         

0:00:04.538178936 18351 0x55709d497610 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1805> [UID = 1]: Use deserialized engine model: /home/ubuntu/projects/deepstream-apps/deepstream-test5-safety-helmet/configs/yolov4-hat_8.engine
0:00:04.540627042 18351 0x55709d497610 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/home/ubuntu/projects/deepstream-apps/deepstream-test5-safety-helmet/configs/config_infer_primary_yoloV4.txt sucessfully

Runtime commands:
	h: Print this help
	q: Quit

	p: Pause
	r: Resume

** INFO: <bus_callback:181>: Pipeline ready

WARNING: nvdsinfer_backend.cpp:162 Backend context bufferIdx(0) request dims:2x3x416x416 is out of range, [min: 8x3x416x416, max: 8x3x416x416]
ERROR: nvdsinfer_backend.cpp:425 Failed to enqueue buffer in fulldims mode because binding idx: 0 with batchDims: 2x3x416x416 is not supported 
ERROR: nvdsinfer_context_impl.cpp:1532 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_INVALID_PARAMS
0:00:04.784615682 18351 0x55708ed23cf0 WARN                 nvinfer gstnvinfer.cpp:1216:gst_nvinfer_input_queue_loop:<primary_gie> error: Failed to queue input batch for inferencing
ERROR from primary_gie: Failed to queue input batch for inferencing
Debug info: gstnvinfer.cpp(1216): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie
Quitting
WARNING: nvdsinfer_backend.cpp:162 Backend context bufferIdx(0) request dims:6x3x416x416 is out of range, [min: 8x3x416x416, max: 8x3x416x416]
ERROR: nvdsinfer_backend.cpp:425 Failed to enqueue buffer in fulldims mode because binding idx: 0 with batchDims: 6x3x416x416 is not supported 
ERROR: nvdsinfer_context_impl.cpp:1532 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_INVALID_PARAMS
0:00:04.822287531 18351 0x55708ed23cf0 WARN                 nvinfer gstnvinfer.cpp:1216:gst_nvinfer_input_queue_loop:<primary_gie> error: Failed to queue input batch for inferencing
ERROR from primary_gie: Failed to queue input batch for inferencing
Debug info: gstnvinfer.cpp(1216): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie
App run failed

Hi @CoderJustin,
Please try adding “force-implicit-batch-dim=1” in the pgie config.
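For reference, a minimal sketch of what that change would look like in the [property] group shown earlier (only the new key is added; force-implicit-batch-dim makes nvinfer treat the engine in implicit-batch mode):

```
[property]
# ...keep the existing keys as-is...
force-implicit-batch-dim=1
```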

Thanks!

Thanks for your reply @mchi, I found the problem.
It turns out that my pytorch-YOLOv4 code was not the latest; after pulling the latest code, I can export the ONNX model with a dynamic batch size:

# set the last argument to 0 to get a dynamic batch size
python demo_darknet2onnx.py cfg/yolov4-hat.cfg yolov4-hat_7000.weights 233.png 0

This gives me the yolov4_-1_3_416_416_dynamic.onnx model file.

And then generate the engine on the T4:

/usr/src/tensorrt/bin/trtexec --onnx=yolov4_-1_3_416_416_dynamic.onnx \
--minShapes=input:1x3x416x416 --optShapes=input:32x3x416x416 --maxShapes=input:32x3x416x416 \
--workspace=4096 --saveEngine=yolov4-hat-dynamic.engine --fp16

Make sure minShapes, optShapes, and maxShapes are specified for the dynamic batch size.

Finally, running the app with the configs mentioned above, I got what I wanted:

Unknown or legacy key specified 'is-classifier' for group [property]
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_mot_klt.so
gstnvtracker: Optional NvMOT_RemoveStreams not implemented
gstnvtracker: Batch processing is OFF
gstnvtracker: Past frame output is OFF
0:00:02.829977566 13834 0x5645bdda2330 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1701> [UID = 1]: deserialized trt engine from :/home/ubuntu/projects/deepstream-apps/deepstream-test5-safety-helmet/configs/yolov4-hat-dynamic.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:685 [FullDims Engine Info]: layers num: 3
0   INPUT  kFLOAT input           3x416x416       min: 1x3x416x416     opt: 32x3x416x416    Max: 32x3x416x416    
1   OUTPUT kFLOAT boxes           10647x1x4       min: 0               opt: 0               Max: 0               
2   OUTPUT kFLOAT confs           10647x2         min: 0               opt: 0               Max: 0               

0:00:02.830075040 13834 0x5645bdda2330 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1805> [UID = 1]: Use deserialized engine model: /home/ubuntu/projects/deepstream-apps/deepstream-test5-safety-helmet/configs/yolov4-hat-dynamic.engine
0:00:02.838679323 13834 0x5645bdda2330 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/home/ubuntu/projects/deepstream-apps/deepstream-test5-safety-helmet/configs/config_infer_primary_yoloV4.txt sucessfully

Runtime commands:
	h: Print this help
	q: Quit

	p: Pause
	r: Resume

** INFO: <bus_callback:181>: Pipeline ready
** INFO: <bus_callback:167>: Pipeline running

**PERF:  FPS 0 (Avg)	FPS 1 (Avg)	FPS 2 (Avg)	FPS 3 (Avg)	FPS 4 (Avg)	FPS 5 (Avg)	FPS 6 (Avg)	FPS 7 (Avg)	FPS 8 (Avg)	FPS 9 (Avg)	FPS 10 (Avg)	FPS 11 (Avg)	FPS 12 (Avg)	FPS 13 (Avg)	FPS 14 (Avg)	FPS 15 (Avg)	
**PERF:  8.89 (8.69)	8.89 (8.69)	9.35 (9.19)	9.35 (9.19)	9.86 (9.68)	9.07 (8.85)	8.60 (8.46)	9.24 (9.08)	8.60 (8.46)	9.35 (9.19)	9.35 (9.19)	9.35 (9.19)	9.07 (8.85)	9.88 (9.68)	9.35 (9.19)	8.89 (8.69)	
**PERF:  9.01 (8.93)	9.01 (8.93)	9.01 (9.05)	9.01 (9.05)	9.01 (9.17)	9.01 (8.97)	9.01 (8.85)	9.01 (9.02)	9.01 (8.85)	9.01 (9.05)	9.01 (9.05)	9.01 (9.05)	9.01 (8.97)	9.01 (9.15)	9.01 (9.05)	9.01 (8.93)	
**PERF:  8.94 (8.87)	8.94 (8.87)	8.94 (8.94)	8.94 (8.94)	8.94 (9.01)	8.94 (8.89)	8.94 (8.83)	8.94 (8.92)	8.94 (8.83)	8.94 (8.94)	8.94 (8.94)	8.94 (8.94)	8.94 (8.89)	8.94 (9.00)	8.94 (8.94)	8.94 (8.87)

But the top aggregate FPS is just 144 (9 FPS/stream x 16 streams); it seems I cannot reach the performance numbers @kayccc posted. What’s the best config for the best performance? @mchi
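For context, the gap described here can be quantified, taking the ~230 fps FP16 figure at batch size 8 from the earlier table as the reference point:

```python
# Observed aggregate throughput vs. the benchmark figure.
observed_per_stream_fps = 9.0
streams = 16
observed_total = observed_per_stream_fps * streams   # 144 fps aggregate

benchmark_fp16_fps = 230   # approx. T4 FP16 number at batch size 8 (table above)
ratio = observed_total / benchmark_fp16_fps          # ~0.63 of the benchmark

print(observed_total, round(ratio, 2))
```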

What’s your GPU device? The perf data was tested on Tesla T4.

I use a T4, on an AWS g4dn.xlarge instance.

Try these commands:

sudo nvidia-smi -pm ENABLED -i $GPU_ID   # change GPU_ID accordingly
sudo nvidia-smi -ac "5001,1590" -i $GPU_ID
nvidia-smi -q -d CLOCK -i 0              # share me the output

(base) ubuntu@ip-172-31-13-21:~$ sudo nvidia-smi -pm ENABLED -i 0
Persistence mode is already Enabled for GPU 00000000:00:1E.0.
All done.

(base) ubuntu@ip-172-31-13-21:~$ sudo nvidia-smi -ac "5001,1590" -i 0
Applications clocks set to "(MEM 5001, SM 1590)" for GPU 00000000:00:1E.0
All done.
(base) ubuntu@ip-172-31-13-21:~$ nvidia-smi -q -d CLOCK -i 0

==============NVSMI LOG==============

Timestamp                           : Thu Oct 29 04:46:05 2020
Driver Version                      : 440.33.01
CUDA Version                        : 10.2

Attached GPUs                       : 1
GPU 00000000:00:1E.0
    Clocks
        Graphics                    : 300 MHz
        SM                          : 300 MHz
        Memory                      : 405 MHz
        Video                       : 540 MHz
    Applications Clocks
        Graphics                    : 1590 MHz
        Memory                      : 5001 MHz
    Default Applications Clocks
        Graphics                    : 585 MHz
        Memory                      : 5001 MHz
    Max Clocks
        Graphics                    : 1590 MHz
        SM                          : 1590 MHz
        Memory                      : 5001 MHz
        Video                       : 1470 MHz
    Max Customer Boost Clocks
        Graphics                    : 1590 MHz
    SM Clock Samples
        Duration                    : Not Found
        Number of Samples           : Not Found
        Max                         : Not Found
        Min                         : Not Found
        Avg                         : Not Found
    Memory Clock Samples
        Duration                    : Not Found
        Number of Samples           : Not Found
        Max                         : Not Found
        Min                         : Not Found
        Avg                         : Not Found
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A

What’s the inference perf output from this command line?

The latest command:

/usr/src/tensorrt/bin/trtexec --onnx=yolov4_-1_3_416_416_dynamic.onnx \
--minShapes=input:1x3x416x416 --optShapes=input:32x3x416x416 --maxShapes=input:32x3x416x416 \
--workspace=4096 --saveEngine=yolov4-hat-dynamic.engine --fp16

Outputs:

[10/29/2020-06:07:05] [I] === Model Options ===
[10/29/2020-06:07:05] [I] Format: ONNX
[10/29/2020-06:07:05] [I] Model: yolov4_-1_3_416_416_dynamic.onnx
[10/29/2020-06:07:05] [I] Output:
[10/29/2020-06:07:05] [I] === Build Options ===
[10/29/2020-06:07:05] [I] Max batch: explicit
[10/29/2020-06:07:05] [I] Workspace: 4096 MB
[10/29/2020-06:07:05] [I] minTiming: 1
[10/29/2020-06:07:05] [I] avgTiming: 8
[10/29/2020-06:07:05] [I] Precision: FP16
[10/29/2020-06:07:05] [I] Calibration: 
[10/29/2020-06:07:05] [I] Safe mode: Disabled
[10/29/2020-06:07:05] [I] Save engine: yolov4-hat-dynamic.engine
[10/29/2020-06:07:05] [I] Load engine: 
[10/29/2020-06:07:05] [I] Inputs format: fp32:CHW
[10/29/2020-06:07:05] [I] Outputs format: fp32:CHW
[10/29/2020-06:07:05] [I] Input build shape: input=1x3x416x416+32x3x416x416+32x3x416x416
[10/29/2020-06:07:05] [I] === System Options ===
[10/29/2020-06:07:05] [I] Device: 0
[10/29/2020-06:07:05] [I] DLACore: 
[10/29/2020-06:07:05] [I] Plugins:
[10/29/2020-06:07:05] [I] === Inference Options ===
[10/29/2020-06:07:05] [I] Batch: Explicit
[10/29/2020-06:07:05] [I] Iterations: 10
[10/29/2020-06:07:05] [I] Duration: 3s (+ 200ms warm up)
[10/29/2020-06:07:05] [I] Sleep time: 0ms
[10/29/2020-06:07:05] [I] Streams: 1
[10/29/2020-06:07:05] [I] ExposeDMA: Disabled
[10/29/2020-06:07:05] [I] Spin-wait: Disabled
[10/29/2020-06:07:05] [I] Multithreading: Disabled
[10/29/2020-06:07:05] [I] CUDA Graph: Disabled
[10/29/2020-06:07:05] [I] Skip inference: Disabled
[10/29/2020-06:07:05] [I] Inputs:
[10/29/2020-06:07:05] [I] === Reporting Options ===
[10/29/2020-06:07:05] [I] Verbose: Disabled
[10/29/2020-06:07:05] [I] Averages: 10 inferences
[10/29/2020-06:07:05] [I] Percentile: 99
[10/29/2020-06:07:05] [I] Dump output: Disabled
[10/29/2020-06:07:05] [I] Profile: Disabled
[10/29/2020-06:07:05] [I] Export timing to JSON file: 
[10/29/2020-06:07:05] [I] Export output to JSON file: 
[10/29/2020-06:07:05] [I] Export profile to JSON file: 
[10/29/2020-06:07:05] [I] 
----------------------------------------------------------------
Input filename:   yolov4_-1_3_416_416_dynamic.onnx
ONNX IR version:  0.0.4
Opset version:    11
Producer name:    pytorch
Producer version: 1.3
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[10/29/2020-06:07:07] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/29/2020-06:07:07] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/28/2020-12:55:30] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[10/28/2020-13:12:28] [I] [TRT] Detected 1 inputs and 8 output network tensors.
[10/28/2020-13:12:34] [I] Warmup completed 0 queries over 200 ms
[10/28/2020-13:12:34] [I] Timing trace has 0 queries over 3.65968 s
[10/28/2020-13:12:34] [I] Trace averages of 10 runs:
[10/28/2020-13:12:34] [I] Average on 10 runs - GPU latency: 224.415 ms - Host latency: 236.691 ms (end to end 455.919 ms)
[10/28/2020-13:12:34] [I] Host latency
[10/28/2020-13:12:34] [I] min: 233.869 ms (end to end 440.656 ms)
[10/28/2020-13:12:34] [I] max: 240.808 ms (end to end 518.528 ms)
[10/28/2020-13:12:34] [I] mean: 236.161 ms (end to end 453.082 ms)
[10/28/2020-13:12:34] [I] median: 235.465 ms (end to end 446.822 ms)
[10/28/2020-13:12:34] [I] percentile: 240.808 ms at 99% (end to end 518.528 ms at 99%)
[10/28/2020-13:12:34] [I] throughput: 0 qps
[10/28/2020-13:12:34] [I] walltime: 3.65968 s
[10/28/2020-13:12:34] [I] GPU Compute
[10/28/2020-13:12:34] [I] min: 218.841 ms
[10/28/2020-13:12:34] [I] max: 228.87 ms
[10/28/2020-13:12:34] [I] mean: 223.998 ms
[10/28/2020-13:12:34] [I] median: 223.521 ms
[10/28/2020-13:12:34] [I] percentile: 228.87 ms at 99%
[10/28/2020-13:12:34] [I] total compute time: 3.35997 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=yolov4_-1_3_416_416_dynamic.onnx --minShapes=input:1x3x416x416 --optShapes=input:32x3x416x416 --maxShapes=input:32x3x416x416 --workspace=4096 --saveEngine=yolov4-hat-dynamic.engine --fp16

The perf we got is based on the ONNX model exported from https://github.com/Tianxiaomo/pytorch-YOLOv4.
We will check again and get back to you.

Thank you. If you can provide your app config and pgie config, it would be appreciated. I’m not sure whether my model export settings are the same as yours.

please check deepstream_yolov4

Yeah, I’ve read that repo many times; my config doesn’t seem to have settings much different from it.

Hey @kayccc, how do you create the INT8 TensorRT model? I have tried using NVIDIA’s DeepStream YOLOv4 GitHub repo and followed the instructions, but it just says “If you want to use int8 mode in conversion, extra int8 calibration is needed.”

When I run it, I get: “Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32.”

Hi @gabe_ddi,
INT8 calibration needs to be generated from a set of representative images.
The INT8 calibration table is generated the same way as in TensorRT: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#optimizing_int8_c
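As a sketch of the DeepStream side, once a calibration table has been generated, the pgie [property] group would point at it roughly like this (key names are from the nvinfer config reference; the table file name here is a placeholder):

```
[property]
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
int8-calib-file=yolov4-calib.table
```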

So I would need to create the calibration file based on images from the cameras I will be using? I don’t have much experience with TensorRT. Do you have the steps you used for the performance results above, or the code you used to create the INT8 calibration files for your testing?