TRT for yolov3: FP16 and INT8 optimization failed

Using the following repo: https://github.com/vat-nvidia/deepstream-plugins, I was able to get an optimized model for the default YOLOv3 model with FP32 precision (kFLOAT). However, it fails when I try to use other precisions in trt-yolo-app:
a) kHALF

Platform doesn’t support this precision.

trt-yolo-app: yolo.cpp:150: void Yolo::createYOLOEngine(int, std::__cxx11::string, std::__cxx11::string, std::__cxx11::string, nvinfer1::DataType, Int8EntropyCalibrator*): Assertion `0' failed.

b) kINT8

I’m currently trying to get this working with the default calibration table, as the app throws an exception:

Using cached calibration table to build the engine
trt-yolo-app: ../builder/cudnnBuilder2.cpp:1227: nvinfer1::cudnn::Engine* nvinfer1::builder::buildEngine(nvinfer1::CudaEngineBuildConfig&, const nvinfer1::cudnn::HardwareContext&, const nvinfer1::Network&): Assertion `it != tensorScales.end()' failed.

Also, a few questions:

  1. If kSAVE_DETECTIONS is configured as true, the images appear in the folder, but there are no bounding boxes drawn. Is that how it’s supposed to be?
  2. Is there a tool to perform INT8 calibration on a custom dataset? I see some related classes and the calibration table for the default YOLO, but not a complete tool.
    There are related bits and pieces in the README for the NvYolo GStreamer plugin, but we are not testing the DeepStream SDK just yet.
  3. Batch_size parameter in the sample app - I would assume that it is used to send multiple images to the GPU at once (as a batch), which should be faster than processing images one by one.
    But using a batch_size of 4 shows an increase in the reported frame processing time, to about 17 ms per image.
  4. The GitHub repo specifies CUDA 9.2 and TensorRT 4.x as requirements. However, since darknet uses CUDA 9.0, that’s the version I used. Could this cause issues or lead to a performance decrease?

Software and hardware used:
Ubuntu 16.04.5, Nvidia graphics driver 380.134, CUDA 9.0, CUDNN 7.1.3, TensorRT 4.0.1.6,
Asus GTX 1080 Ti Turbo at default clocks.

Answers to your questions below:
a) and b) We have tested with a GTX 1080 Ti with CUDA 9.0 and TensorRT 4, and you should not be seeing those errors. Do you happen to have multiple versions of the CUDA toolkit / TensorRT on your system?

  1. Please make sure you have OpenCV 3.4 installed and try a clean build of the trt-yolo-app. The images should have bounding boxes overlaid on them if the kSAVE_DETECTIONS parameter is set to true.

  2. There is no separate tool for INT8 calibration. You just need to add the absolute paths of your calibration images to calibration_images.txt, delete any previously present calibration tables, and set the PRECISION to kINT8. The lib will calibrate and generate the engine, and it will also save the calibration table for subsequent runs (see the sketch after this list).

  3. Increasing batch sizes should provide improvements in inference speeds in general. Do you see any increase in inference speeds at even higher batch sizes as well?

  4. TensorRT 4 and CUDA 9.0 should be fine if you are using just the standalone trt-yolo-app and don’t have any DeepStream dependencies. The CUDA toolkit version the weights were trained with does not matter to the inference pipeline.
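
For reference, the INT8 path inside the lib boils down to enabling INT8 mode on the builder and attaching the calibrator; a minimal sketch against the TensorRT 4 builder API (Int8EntropyCalibrator is the repo’s calibrator class, and the wiring here is illustrative rather than the lib’s exact code):

#include "NvInfer.h"

// Hedged sketch of the INT8 build path, assuming TensorRT 4.
// The calibrator reads the image paths listed in calibration_images.txt,
// runs inference over them to collect activation statistics, and writes
// out the calibration table for reuse on subsequent runs.
void enableInt8(nvinfer1::IBuilder* builder, nvinfer1::IInt8Calibrator* calibrator)
{
    builder->setInt8Mode(true); // requires fast-INT8 support on the GPU
    builder->setInt8Calibrator(calibrator);
}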

@chris-gun-detection: An update regarding the answers to the first couple of questions:
a) kHALF is not supported on GTX 1080 Ti

b) kINT8 with CUDA 9.0 and TensorRT 4 should work fine.
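
You can verify this on your own hardware by querying the builder before picking a precision; a minimal sketch against the TensorRT 4/5 API (the Logger class is just the boilerplate required by createInferBuilder):

#include <iostream>
#include "NvInfer.h"

// Minimal logger required by createInferBuilder.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity != Severity::kINFO) std::cerr << msg << std::endl;
    }
};

int main()
{
    Logger logger;
    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
    // On a GTX 1080 Ti (Pascal) fast FP16 is reported as unsupported,
    // which is why kHALF trips the assertion; fast INT8 is supported.
    std::cout << "Fast FP16: " << builder->platformHasFastFp16() << std::endl;
    std::cout << "Fast INT8: " << builder->platformHasFastInt8() << std::endl;
    builder->destroy();
    return 0;
}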

Let us know if you have any other questions.

b) I was able to optimize the default YOLOv2 model for INT8 (with both the default and custom calibration tables). However, YOLOv3 still throws exceptions. If I try to use the default calibration table from the repo, I get the error from the first post:

Using cached calibration table to build the engine
trt-yolo-app: ../builder/cudnnBuilder2.cpp:1227: nvinfer1::cudnn::Engine* nvinfer1::builder::buildEngine(nvinfer1::CudaEngineBuildConfig&, const nvinfer1::cudnn::HardwareContext&, const nvinfer1::Network&): Assertion `it != tensorScales.end()' failed.

I see others have a similar issue with other neural nets:
https://devtalk.nvidia.com/default/topic/1037060/tensorrt/trt-4-0-sampleuffssd-int8-calibration-failing/
https://devtalk.nvidia.com/default/topic/1015387/tensorrt-fails-to-build-fasterrcnn-gie-model-with-using-int8/
Maybe yolov3-calibration.table just isn’t compatible? I took the config and the weights from here: YOLO: Real-Time Object Detection

If I remove the default calibration table and try to build a custom one (filling calibration_images.txt), it throws another exception:

New calibration table will be created to build the engine
trt-yolo-app: ../builder/cudnnBuilder2.cpp:685: virtual std::vector<nvinfer1::query::Ports<nvinfer1::query::TensorRequirements> > nvinfer1::builder::Node::getSupportedFormats(const nvinfer1::query::Ports<nvinfer1::query::AbstractTensor>&, const nvinfer1::cudnn::HardwareContext&, nvinfer1::builder::Format::Type, const nvinfer1::builder::FormatTypeHack&) const: Assertion `sf' failed.

It appears that others experience the same issue: https://devtalk.nvidia.com/default/topic/1043046/tensorrt/-tensorrt-for-yolo-v3-int8-optimization-failed-/

There’s only one CUDA/TensorRT version installed:

deepstream-plugins$ dpkg -l | grep cuda
ii  cuda-command-line-tools-9-0                                9.0.176-1                                    amd64        CUDA command-line tools
ii  cuda-core-9-0                                              9.0.176-1                                    amd64        CUDA core tools
ii  cuda-cublas-9-0                                            9.0.176-1                                    amd64        CUBLAS native runtime libraries
ii  cuda-cublas-dev-9-0                                        9.0.176-1                                    amd64        CUBLAS native dev links, headers
ii  cuda-cudart-9-0                                            9.0.176-1                                    amd64        CUDA Runtime native Libraries
ii  cuda-cudart-dev-9-0                                        9.0.176-1                                    amd64        CUDA Runtime native dev links, headers
ii  cuda-cufft-9-0                                             9.0.176-1                                    amd64        CUFFT native runtime libraries
ii  cuda-cufft-dev-9-0                                         9.0.176-1                                    amd64        CUFFT native dev links, headers
ii  cuda-curand-9-0                                            9.0.176-1                                    amd64        CURAND native runtime libraries
ii  cuda-curand-dev-9-0                                        9.0.176-1                                    amd64        CURAND native dev links, headers
ii  cuda-cusolver-9-0                                          9.0.176-1                                    amd64        CUDA solver native runtime libraries
ii  cuda-cusolver-dev-9-0                                      9.0.176-1                                    amd64        CUDA solver native dev links, headers
ii  cuda-cusparse-9-0                                          9.0.176-1                                    amd64        CUSPARSE native runtime libraries
ii  cuda-cusparse-dev-9-0                                      9.0.176-1                                    amd64        CUSPARSE native dev links, headers
ii  cuda-demo-suite-9-0                                        9.0.176-1                                    amd64        Demo suite for CUDA
ii  cuda-documentation-9-0                                     9.0.176-1                                    amd64        CUDA documentation
ii  cuda-driver-dev-9-0                                        9.0.176-1                                    amd64        CUDA Driver native dev stub library
ii  cuda-drivers                                               384.81-1                                     amd64        CUDA Driver meta-package
ii  cuda-libraries-9-0                                         9.0.176-1                                    amd64        CUDA Libraries 9.0 meta-package
ii  cuda-libraries-dev-9-0                                     9.0.176-1                                    amd64        CUDA Libraries 9.0 development meta-package
ii  cuda-license-9-0                                           9.0.176-1                                    amd64        CUDA licenses
ii  cuda-misc-headers-9-0                                      9.0.176-1                                    amd64        CUDA miscellaneous headers
ii  cuda-npp-9-0                                               9.0.176-1                                    amd64        NPP native runtime libraries
ii  cuda-npp-dev-9-0                                           9.0.176-1                                    amd64        NPP native dev links, headers
ii  cuda-nvgraph-9-0                                           9.0.176-1                                    amd64        NVGRAPH native runtime libraries
ii  cuda-nvgraph-dev-9-0                                       9.0.176-1                                    amd64        NVGRAPH native dev links, headers
ii  cuda-nvml-dev-9-0                                          9.0.176-1                                    amd64        NVML native dev links, headers
ii  cuda-nvrtc-9-0                                             9.0.176-1                                    amd64        NVRTC native runtime libraries
ii  cuda-nvrtc-dev-9-0                                         9.0.176-1                                    amd64        NVRTC native dev links, headers
ii  cuda-repo-ubuntu1604-9-0-local                             9.0.176-1                                    amd64        cuda repository configuration files
ii  cuda-runtime-9-0                                           9.0.176-1                                    amd64        CUDA Runtime 9.0 meta-package
ii  cuda-samples-9-0                                           9.0.176-1                                    amd64        CUDA example applications
rc  cuda-toolkit-9-0                                           9.0.176-1                                    amd64        CUDA Toolkit 9.0 meta-package
rc  cuda-visual-tools-9-0                                      9.0.176-1                                    amd64        CUDA visual tools
ii  graphsurgeon-tf                                            4.1.2-1+cuda9.0                              amd64        GraphSurgeon for TensorRT package
ii  libcuda1-384                                               384.130-0ubuntu0.16.04.1                     amd64        NVIDIA CUDA runtime library
ii  libcudnn7                                                  7.0.5.15-1+cuda9.0                           amd64        cuDNN runtime libraries
ii  libcudnn7-dev                                              7.0.5.15-1+cuda9.0                           amd64        cuDNN development libraries and headers
ii  libcudnn7-doc                                              7.0.5.15-1+cuda9.0                           amd64        cuDNN documents and samples
ii  libnvinfer-dev                                             4.1.2-1+cuda9.0                              amd64        TensorRT development libraries and headers
ii  libnvinfer-samples                                         4.1.2-1+cuda9.0                              amd64        TensorRT samples and documentation
ii  libnvinfer4                                                4.1.2-1+cuda9.0                              amd64        TensorRT runtime libraries
ii  nv-tensorrt-repo-ubuntu1604-cuda9.0-ga-trt4.0.1.6-20180612 1-1                                          amd64        nv-tensorrt repository configuration files
ii  nvinfer-runtime-trt-repo-ubuntu1404-3.0.4-ga-cuda9.0       1.0-1                                        amd64        nvinfer-runtime-trt repository configuration files
ii  python3-libnvinfer                                         4.1.2-1+cuda9.0                              amd64        Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev                                     4.1.2-1+cuda9.0                              amd64        Python 3 development package for TensorRT
ii  python3-libnvinfer-doc                                     4.1.2-1+cuda9.0                              amd64        Documention and samples of python bindings for TensorRT
ii  tensorrt                                                   4.0.1.6-1+cuda9.0                            amd64        Meta package of TensorRT
ii  uff-converter-tf                                           4.1.2-1+cuda9.0                              amd64        UFF converter for TensorRT package

I packed the test project with all the files; maybe it will help to reproduce the issue: Dropbox - File Deleted
Here’s how it’s built and run:

deepstream-plugins$ cd sources/apps/trt-yolo/
deepstream-plugins/sources/apps/trt-yolo$ make clean && make
deepstream-plugins/sources/apps/trt-yolo$ cd ../../..
deepstream-plugins$ ./sources/apps/trt-yolo/trt-yolo-app

Please note that calibration_images.txt and test_images.txt need to be updated with absolute paths.
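
For reference, both files are plain lists with one absolute image path per line; the paths below are purely illustrative:

/home/user/data/calibration/image_0001.jpg
/home/user/data/calibration/image_0002.jpg
/home/user/data/calibration/image_0003.jpg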

@chris-gun-detection

Can you try to generate the engine with the following calibration table for YOLOv3 INT8 precision?
https://drive.google.com/open?id=1c9ov5ZaPnHSdVISx1cD8DCdXNOq1dwuz
This should work with your current TensorRT / cuda setup.

You can also upgrade to TensorRT 5 and use the above calibration file, or generate your own, without any issues. You will need to make a small change in the yolo lib to compile with TRT 5: the compute(…) function in nvinfer1::IOutputDimensionsFormula is now const, so you will need to add the const keyword in the trt_utils.h file. The updated class after the change would look like:

class YoloTinyMaxpoolPaddingFormula : public nvinfer1::IOutputDimensionsFormula
{

private:
    std::set<std::string> m_SamePaddingLayers;

    nvinfer1::DimsHW compute(nvinfer1::DimsHW inputDims, nvinfer1::DimsHW kernelSize,
                             nvinfer1::DimsHW stride, nvinfer1::DimsHW padding,
                             nvinfer1::DimsHW dilation, const char* layerName) const override
    {
        assert(inputDims.d[0] == inputDims.d[1]);
        assert(kernelSize.d[0] == kernelSize.d[1]);
        assert(stride.d[0] == stride.d[1]);
        assert(padding.d[0] == padding.d[1]);

        int outputDim;
        // Only layer maxpool_12 makes use of same padding
        if (m_SamePaddingLayers.find(layerName) != m_SamePaddingLayers.end())
        {
            outputDim = (inputDims.d[0] + 2 * padding.d[0]) / stride.d[0];
        }
        // Valid Padding
        else
        {
            outputDim = (inputDims.d[0] - kernelSize.d[0]) / stride.d[0] + 1;
        }
        return nvinfer1::DimsHW{outputDim, outputDim};
    }

public:
    void addSamePaddingLayer(std::string input) { m_SamePaddingLayers.insert(input); }
};
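
For context, this formula is registered on the network before the pooling layers are added, roughly along these lines (a sketch; the variable names are illustrative, not the lib’s exact code):

// Register the custom output-dimensions formula so TensorRT uses it
// when computing pooling output shapes (maxpool_12 needs same padding).
auto paddingFormula = std::make_unique<YoloTinyMaxpoolPaddingFormula>();
paddingFormula->addSamePaddingLayer("maxpool_12");
network->setPoolingOutputDimensionsFormula(paddingFormula.get());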

Please let us know if you continue to see any errors after the above changes.

Thanks, updated TensorRT to 5.0.0.10 RC; now both calibration tables (the one from Google Drive in the previous post and a custom one) work.

Can you please share a link explaining how you were able to convert YOLOv2 to INT8 weights?