Assertion failed using tlt-converter for RetinaNet

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc) : T4
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) : RetinaNet
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

When converting the .etlt RetinaNet model to a TensorRT engine, I get the error "Assertion failed: numPriors * numLocClasses * nbBoxCoordinates == inputDims[param.inputOrder[0]].d[0]". I am following these instructions and running tlt-converter inside the Docker image nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3.

root@fe4184f4919f:/usr/lib/x86_64-linux-gnu# tlt-converter -k nvidia_tlt \
    -d 3,384,1248 \
    -o NMS \
    -c /deepstream_tlt_apps_TRT7.2.1/models/retinanet/cal.bin \
    -e retinanet_resnet18_trt.int8.engine \
    -b 8 \
    -m 1 \
    -t int8 \
    -i nchw \
    /models/retinanet/retinanet_resnet18.etlt
[INFO] Reading Calibration Cache for calibrator: EntropyCalibration2
[INFO] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[INFO] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 2 output network tensors.
[INTERNAL_ERROR] Assertion failed: numPriors * numLocClasses * nbBoxCoordinates == inputDims[param.inputOrder[0]].d[0]
/home/bcao/code/github/TRT7.2/TensorRT/plugin/nmsPlugin/nmsPlugin.cpp:244
Aborting…
Aborted (core dumped)
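
The assertion compares the number of prior (anchor) boxes the NMS plugin expects with the length of the box-regression tensor it actually receives. The following is a minimal sketch of how the two can disagree, assuming the prior count was fixed for one input size while the regression tensor follows the size passed with -d. The pyramid strides and anchors-per-cell values are hypothetical placeholders, not this model's real settings; 384x1248 comes from the command above and 544x960 is the size given further down in this thread.

```python
import math

NB_BOX_COORDINATES = 4  # a box is described by 4 coordinates


def count_priors(height, width, strides=(8, 16, 32, 64, 128), anchors_per_cell=9):
    """Count anchor boxes for one input size.

    strides and anchors_per_cell are hypothetical placeholders,
    not values read from the RetinaNet model.
    """
    cells = sum(math.ceil(height / s) * math.ceil(width / s) for s in strides)
    return cells * anchors_per_cell


def nms_inputs_consistent(num_priors, num_loc_classes, loc_length):
    """Mirror the equality tested by the assertion in the log."""
    return num_priors * num_loc_classes * NB_BOX_COORDINATES == loc_length


# Assumption: priors were fixed for 544x960, regression output follows -d 3,384,1248.
priors_in_model = count_priors(544, 960)
priors_from_cli = count_priors(384, 1248)
loc_length = priors_from_cli * 1 * NB_BOX_COORDINATES

print(nms_inputs_consistent(priors_in_model, 1, loc_length))  # False with these placeholders
```

The exact counts depend on the real anchor configuration, but any two input sizes that produce different feature-map grids would fail the same equality.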

You mention running tlt-converter via the Docker image nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3. Can you give more detail? Did you log in to the 3.0-py3 container and use its default tlt-converter?

Correct. From the above error, it seems the plugin is pointing to your working directory? /home/bcao/code/github/TRT7.2/TensorRT/plugin/nmsPlugin/nmsPlugin.cpp:244

You are using TLT 3.0, so please download the model from the 3.0 branch: https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/tree/release/tlt3.0/

The Docker image nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3 is based on TRT 7.2.1, which is why I was using the tlt 2.0.1 branch. Now, with the tlt 3.0 branch (based on TRT 7.2.2), I am getting the same error:

root@4f875ca46c9d:/usr/lib/x86_64-linux-gnu# tlt-converter -k nvidia_tlt \
    -d 3,384,1248 \
    -o NMS \
    -c /deepstream_tlt_apps_TRT7.2.2/models/retinanet/cal.bin \
    -e retinanet_resnet18_trt.int8.engine \
    -b 8 \
    -m 1 \
    -t int8 \
    -i nchw \
    /deepstream_tlt_apps_TRT7.2.2/models/retinanet/retinanet_resnet18.etlt

Error:
[INFO] Reading Calibration Cache for calibrator: EntropyCalibration2
[INFO] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[INFO] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 2 output network tensors.
[INTERNAL_ERROR] Assertion failed: numPriors * numLocClasses * nbBoxCoordinates == inputDims[param.inputOrder[0]].d[0]
/home/bcao/code/gitlab/TRT7.2/oss/plugin/nmsPlugin/nmsPlugin.cpp:244
Aborting...

Aborted (core dumped)

Although I have updated the plugin libnvinfer_plugin.so* with the prebuilt library here, the stdout error still points to /home/bcao/code/gitlab/TRT7.2/oss/plugin/nmsPlugin/nmsPlugin.cpp:244

Sorry for the late reply. How did you download /deepstream_tlt_apps_TRT7.2.2/models/retinanet/retinanet_resnet18.etlt?
Is it from https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/tree/release/tlt2.0/models/retinanet?

Please note that for that model, its input is 960x544.
See https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/release/tlt2.0/pgie_retina_tlt_config.txt

uff-input-dims=3;544;960;0

So please modify your tlt-converter command so that the -d argument matches these dimensions (3,544,960).
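
As a small illustration of that mapping, here is a sketch that turns the uff-input-dims value from the DeepStream config into the string passed to -d. The helper name is made up for this example, and it only handles the channel;height;width;0 (NCHW) form shown above.

```python
def uff_dims_to_converter_arg(uff_input_dims: str) -> str:
    """Convert 'C;H;W;order' (DeepStream config) to 'C,H,W' (tlt-converter -d).

    Hypothetical helper for illustration; only input order 0 (NCHW) is handled.
    """
    parts = uff_input_dims.strip().split(";")
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {uff_input_dims!r}")
    channels, height, width, order = (int(p) for p in parts)
    if order != 0:
        raise ValueError("only input order 0 (NCHW) is handled in this sketch")
    return f"{channels},{height},{width}"


print(uff_dims_to_converter_arg("3;544;960;0"))  # 3,544,960
```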

Hi @Morganh, thanks for your support. I am seeing performance degradation when I deploy the optimized model retinanet-resnet18 in INT8 precision mode with DeepStream-Triton, see below:

Reference baseline inference with trtexec:
throughput: 306.362 qps

DeepStream only:
**PERF: 320.73 (294.19)

DeepStream-Triton integration with strict_model_config: true
**PERF: 233.16 (200.13)

DeepStream-Triton integration with strict_model_config: false
**PERF: 319.06 (129.13)
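
To make the gap easier to read, here is a rough sketch that expresses each reported figure relative to the trtexec baseline. It assumes one frame per query, so that qps and the **PERF figures are comparable, and it reports the numbers inside and outside the parentheses separately.

```python
BASELINE_QPS = 306.362  # trtexec throughput reported above

# (number outside parentheses, number inside parentheses) from the **PERF lines
PERF = {
    "DeepStream only": (320.73, 294.19),
    "DeepStream-Triton, strict_model_config: true": (233.16, 200.13),
    "DeepStream-Triton, strict_model_config: false": (319.06, 129.13),
}


def relative_to_baseline(value, baseline=BASELINE_QPS):
    """Return value as a fraction of the trtexec baseline."""
    return value / baseline


for name, (outside, inside) in PERF.items():
    print(f"{name}: outside={relative_to_baseline(outside):.2f}x, "
          f"inside={relative_to_baseline(inside):.2f}x")
```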

Questions:
1- Which number is the actual performance for DeepStream and DeepStream-Triton: the one inside or the one outside the parentheses?
2- What is the correct configuration for the config.pbtxt file below when using strict_model_config: true?
3- Where is the config file saved when strict_model_config: false?
4- Why does the optimized INT8 model still show output data_type: TYPE_FP32 when inspected with the polygraphy tool? I have also seen some Triton INT8 samples with data_type: TYPE_FP32 in config.pbtxt.

$ polygraphy inspect model samples/models/TLT_pre-trained_models_resnet18_backbone/retinanet/retinanet_resnet18.etlt_b1_gpu0_int8.engine

[I] Loading bytes from /workspace/Deepstream_6.0_Triton/samples/models/TLT_pre-trained_models_resnet18_backbone/retinanet/retinanet_resnet18.etlt_b1_gpu0_int8.engine
[I] ==== TensorRT Engine ====
    Name: Unnamed Network 0 | Implicit Batch Engine (109 layers)
---- 1 Engine Input(s) ----
{Input [dtype=float32, shape=(3, 544, 960)]}

---- 2 Engine Output(s) ----
{NMS [dtype=float32, shape=(1, 250, 7)],
 NMS_1 [dtype=float32, shape=(1, 1, 1)]}

---- Memory ----
Device Memory: 21437440 bytes

---- 1 Profile(s) (3 Binding(s) Each) ----
- Profile: 0
    Binding Index: 0 (Input)  [Name: Input] | Shapes: min=(3, 544, 960), opt=(3, 544, 960), max=(3, 544, 960)
    Binding Index: 1 (Output) [Name: NMS]   | Shape: (1, 250, 7)
    Binding Index: 2 (Output) [Name: NMS_1] | Shape: (1, 1, 1)

config.pbtxt with strict_model_config: true:

name: "retinanet_resnet18"
platform: "tensorrt_plan"
default_model_filename: "retinanet_resnet18.etlt_b1_gpu0_int8.engine"
max_batch_size: 1
input [
  {
    name: "Input"
    format: FORMAT_NCHW
    data_type: TYPE_FP32
    dims: [ 3, 544, 960 ]
  }
]
output [
  {
    name: "NMS"
    data_type: TYPE_FP32
    dims: [ 1, 250, 7 ]
  },
  {
    name: "NMS_1"
    data_type: TYPE_FP32
    dims: [ 1, 1, 1 ]  
  }
]

instance_group {
  count: 1
  gpus: 0
  kind: KIND_GPU
}
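
As a quick sanity check on the config above, here is a sketch that compares the tensors declared in config.pbtxt against the bindings polygraphy reported. Both tables are copied by hand from this post rather than parsed from the files, and the dtype mapping covers only a few common Triton types.

```python
# Bindings as reported by `polygraphy inspect model` above.
ENGINE_BINDINGS = {
    "Input": ("float32", (3, 544, 960)),
    "NMS": ("float32", (1, 250, 7)),
    "NMS_1": ("float32", (1, 1, 1)),
}

# Tensors as declared in config.pbtxt above.
CONFIG_TENSORS = {
    "Input": ("TYPE_FP32", (3, 544, 960)),
    "NMS": ("TYPE_FP32", (1, 250, 7)),
    "NMS_1": ("TYPE_FP32", (1, 1, 1)),
}

# Partial mapping from Triton data_type names to the dtype names polygraphy prints.
TRITON_TO_DTYPE = {
    "TYPE_FP32": "float32",
    "TYPE_FP16": "float16",
    "TYPE_INT8": "int8",
    "TYPE_INT32": "int32",
}


def find_mismatches(engine, config):
    """Return a list of human-readable differences between engine and config."""
    problems = []
    for name in sorted(set(engine) | set(config)):
        if name not in engine or name not in config:
            problems.append(f"{name}: present on one side only")
            continue
        engine_dtype, engine_shape = engine[name]
        config_type, config_dims = config[name]
        if TRITON_TO_DTYPE.get(config_type) != engine_dtype:
            problems.append(f"{name}: dtype {config_type} vs {engine_dtype}")
        if tuple(config_dims) != tuple(engine_shape):
            problems.append(f"{name}: dims {config_dims} vs {engine_shape}")
    return problems


print(find_mismatches(ENGINE_BINDINGS, CONFIG_TENSORS))  # []
```

An empty list here only says that names, types and dims line up with the inspected engine; it does not say anything about the throughput difference.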

I have created a separate issue here, in case this is not a topic for this forum.

Hi,
The original issue is solved on your side.
Let's track the new question in the topic here.