TRT for yolov3: FP16 and INT8 optimization failed

chris-gun-detection · October 22, 2018, 1:50pm

Using the following repo: https://github.com/vat-nvidia/deepstream-plugins, I was able to get an optimized model for the default YOLOv3 model with FP32 precision (kFLOAT). However, it fails when I try to use other precisions in trt-yolo-app:
a) kHALF

Platform doesn't support this precision.
trt-yolo-app: yolo.cpp:150: void Yolo::createYOLOEngine(int, std::__cxx11::string, std::__cxx11::string, std::__cxx11::string, nvinfer1::DataType, Int8EntropyCalibrator*): Assertion `0' failed.

b) kINT8

I'm currently trying to get this working with the default calibration table, as the app throws an exception:

Using cached calibration table to build the engine
trt-yolo-app: ../builder/cudnnBuilder2.cpp:1227: nvinfer1::cudnn::Engine* nvinfer1::builder::buildEngine(nvinfer1::CudaEngineBuildConfig&, const nvinfer1::cudnn::HardwareContext&, const nvinfer1::Network&): Assertion `it != tensorScales.end()' failed.

Also, a few questions:

1. If kSAVE_DETECTIONS is configured as true, the images appear in the folder, but there are no bounding boxes drawn. Is that how it's supposed to be?

2. Is there a tool to perform the INT8 calibration on a custom dataset? I see some related classes and the calibration table for the default YOLO, but not a complete tool. There are related bits and pieces in the README for the NvYolo GStreamer plugin, but we are not testing DeepStream SDK just yet.

3. The batch_size parameter in the sample app: I would assume that it specifies how many images are sent to the GPU at once (as a batch), which should be faster than processing images one by one. But using a batch_size of 4 shows an increase in the reported frame processing time, to about 17 ms per image.

4. The GitHub repo specifies CUDA 9.2 and TensorRT 4.x as requirements. However, since darknet uses CUDA 9.0, that's the version I used. Could this cause issues or reduce performance?
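
To make the batch comparison in the batch_size question less ambiguous, it can help to separate the time for a whole batch from the time attributed to a single image. The sketch below is only arithmetic, with hypothetical helper names; it does not reflect how trt-yolo-app computes its reported number, which isn't stated in this thread.

```cpp
#include <cassert>

// Hypothetical helpers for comparing batched and unbatched timings.

// Average time attributed to one image when a whole batch takes batchMs.
double perImageMs(double batchMs, int batchSize) {
    assert(batchSize > 0);
    return batchMs / batchSize;
}

// Images processed per second for a given per-image time in milliseconds.
double imagesPerSecond(double msPerImage) {
    assert(msPerImage > 0.0);
    return 1000.0 / msPerImage;
}
```

If the 17 ms figure really is per image, a batch of 4 would take about 68 ms in total, and `imagesPerSecond(17.0)` gives roughly 59 images per second. Comparing that against the same calculation for batch_size 1 shows whether batching helped.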

Software and hardware used:
Ubuntu 16.04.5, NVIDIA graphics driver 380.134, CUDA 9.0, cuDNN 7.1.3, TensorRT 4.0.1.6,
Asus GTX 1080 Ti Turbo at default clocks.
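
For the kHALF failure, it may help to note that NVIDIA's TensorRT support matrix lists reduced-precision support per CUDA compute capability, and the GTX 1080 Ti is a compute capability 6.1 device. The sketch below is only an illustrative lookup with a handful of entries transcribed from that matrix; the struct and function names are hypothetical and are not part of trt-yolo-app or TensorRT. On a real system the TensorRT builder can be asked directly via `IBuilder::platformHasFastFp16()` and `IBuilder::platformHasFastInt8()`.

```cpp
#include <cstddef>

// Hypothetical table entry: reduced-precision support for one compute capability.
struct PrecisionSupport {
    int major;
    int minor;
    bool fastFp16;
    bool fastInt8;
};

// A few rows from the hardware section of the TensorRT support matrix.
// Not exhaustive; unknown devices are treated as unsupported below.
static const PrecisionSupport kSupportTable[] = {
    {5, 3, true,  false},  // e.g. Jetson TX1
    {6, 0, true,  false},  // e.g. Tesla P100
    {6, 1, false, true},   // e.g. GTX 10-series, Tesla P4/P40
    {6, 2, true,  false},  // e.g. Jetson TX2
    {7, 0, true,  true},   // e.g. Tesla V100
    {7, 5, true,  true},   // e.g. Tesla T4
};

// Returns the table entry for a compute capability, or nullptr if not listed.
const PrecisionSupport* findSupport(int major, int minor) {
    const std::size_t count = sizeof(kSupportTable) / sizeof(kSupportTable[0]);
    for (std::size_t i = 0; i < count; ++i) {
        if (kSupportTable[i].major == major && kSupportTable[i].minor == minor) {
            return &kSupportTable[i];
        }
    }
    return nullptr;
}

// True only if the capability is listed and marked as supporting fast FP16.
bool supportsFastFp16(int major, int minor) {
    const PrecisionSupport* entry = findSupport(major, minor);
    return entry != nullptr && entry->fastFp16;
}

// True only if the capability is listed and marked as supporting fast INT8.
bool supportsFastInt8(int major, int minor) {
    const PrecisionSupport* entry = findSupport(major, minor);
    return entry != nullptr && entry->fastInt8;
}
```

For compute capability 6.1 this lookup reports no fast FP16 but fast INT8, which would be consistent with the "Platform doesn't support this precision." message for kHALF, and suggests the kINT8 assertion has a different cause than missing hardware support.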

NVES · October 22, 2018, 6:55pm

Hello,

Your questions are specific to the deepstream-plugins repo. Please raise them with the maintainers at https://github.com/vat-nvidia/deepstream-plugins.
