Cannot convert model with dynamic input shape to TRT engine

Please provide the following information when requesting support.

• Hardware: Jetson Nano 4GB
• Network Type: Detectnet_v2
• Jetpack: 4.6
• TensorRT: 8.0.1
• How to reproduce the issue?

When I run the following command to convert my DetectNet_v2 TLT model, trained for LPD, to a TensorRT engine:
./tao-converter -k nvidia_tlt -o output_cov/Sigmoid,output_bbox/BiasAdd -p image_input,1x3x480x640,4x3x480x640,16x3x480x640 …/repos/lpd/res/dnn/lpd/models/resnet18_detector.etlt -t fp16 -e …/repos/lpd/res/dnn/lpd/models/tensorrt/resnet18_detector_lpd.engine

I get:
Error: no input dimensions given

However, if I specify the input size with the -d flag, the command executes without errors, but no optimization profiles are generated.

Am I doing something wrong?
Thank you for your attention

Which version of TLT (now TAO) did you use for training?
$ tlt info --verbose

Was …/repos/lpd/res/dnn/lpd/models/resnet18_detector.etlt trained by you, or downloaded from the LPD model card?

Hi,

I ran the training with v3.21.08. The output of the given command is:

Configuration of the TAO Toolkit Instance
dockers:
nvidia/tao/tao-toolkit-tf:
docker_registry: nvcr.io
docker_tag: v3.21.08-py3
tasks:
1. augment
2. bpnet
3. classification
4. detectnet_v2
5. dssd
6. emotionnet
7. faster_rcnn
8. fpenet
9. gazenet
10. gesturenet
11. heartratenet
12. lprnet
13. mask_rcnn
14. multitask_classification
15. retinanet
16. ssd
17. unet
18. yolo_v3
19. yolo_v4
20. converter
nvidia/tao/tao-toolkit-pyt:
docker_registry: nvcr.io
docker_tag: v3.21.08-py3
tasks:
1. speech_to_text
2. speech_to_text_citrinet
3. text_classification
4. question_answering
5. token_classification
6. intent_slot_classification
7. punctuation_and_capitalization
nvidia/tao/tao-toolkit-lm:
docker_registry: nvcr.io
docker_tag: v3.21.08-py3
tasks:
1. n_gram
format_version: 1.0
toolkit_version: 3.21.08
published_date: 08/17/2021

The model was trained by me using NGC Detectnet_V2 as the baseline model.
Regards

Any other information that I can supply?
Best regards

Please add “-d 3,480,640”.
It is a required argument.
See
https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/detectnet_v2.html#using-the-tao-converter

Hi. As mentioned, I ran the following command with the -d flag specified:

./tao-converter -k tlt_encode -o output_cov/Sigmoid,output_bbox/BiasAdd -p image_input,1x3x480x640,4x3x480x640,16x3x480x640 …/repos/lpr/res/dnn/lpd/models/resnet18_detector.etlt -t fp16 -e …/repos/lpr/res/dnn/lpd/models/tensorrt/resnet18_detector_lpd.engine -d 3,480,640

I got this output:

[INFO] [MemUsageChange] Init CUDA: CPU +203, GPU +0, now: CPU 221, GPU 2264 (MiB)
[INFO] [MemUsageSnapshot] Builder begin: CPU 237 MiB, GPU 2290 MiB
[INFO] ---------- Layers Running on DLA ----------
[INFO] ---------- Layers Running on GPU ----------
[INFO] [GpuLayer] conv1/convolution + activation_1/Relu
[INFO] [GpuLayer] block_1a_conv_1/convolution + block_1a_relu_1/Relu
[INFO] [GpuLayer] block_1a_conv_shortcut/convolution
[INFO] [GpuLayer] block_1a_conv_2/convolution + add_1/add + block_1a_relu/Relu
[INFO] [GpuLayer] block_1b_conv_1/convolution + block_1b_relu_1/Relu
[INFO] [GpuLayer] block_1b_conv_2/convolution + add_2/add + block_1b_relu/Relu
[INFO] [GpuLayer] block_2a_conv_1/convolution + block_2a_relu_1/Relu
[INFO] [GpuLayer] block_2a_conv_shortcut/convolution
[INFO] [GpuLayer] block_2a_conv_2/convolution + add_3/add + block_2a_relu/Relu
[INFO] [GpuLayer] block_2b_conv_1/convolution + block_2b_relu_1/Relu
[INFO] [GpuLayer] block_2b_conv_2/convolution + add_4/add + block_2b_relu/Relu
[INFO] [GpuLayer] block_3a_conv_1/convolution + block_3a_relu_1/Relu
[INFO] [GpuLayer] block_3a_conv_shortcut/convolution
[INFO] [GpuLayer] block_3a_conv_2/convolution + add_5/add + block_3a_relu/Relu
[INFO] [GpuLayer] block_3b_conv_1/convolution + block_3b_relu_1/Relu
[INFO] [GpuLayer] block_3b_conv_2/convolution + add_6/add + block_3b_relu/Relu
[INFO] [GpuLayer] block_4a_conv_1/convolution + block_4a_relu_1/Relu
[INFO] [GpuLayer] block_4a_conv_shortcut/convolution
[INFO] [GpuLayer] block_4a_conv_2/convolution + add_7/add + block_4a_relu/Relu
[INFO] [GpuLayer] block_4b_conv_1/convolution + block_4b_relu_1/Relu
[INFO] [GpuLayer] block_4b_conv_2/convolution + add_8/add + block_4b_relu/Relu
[INFO] [GpuLayer] output_bbox/convolution
[INFO] [GpuLayer] output_cov/convolution
[INFO] [GpuLayer] PWN(output_cov/Sigmoid)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +159, now: CPU 403, GPU 2449 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +241, GPU +239, now: CPU 644, GPU 2688 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 2 output network tensors.
[INFO] Total Host Persistent Memory: 34048
[INFO] Total Device Persistent Memory: 10447360
[INFO] Total Scratch Memory: 0
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 11 MiB, GPU 1032 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 896, GPU 2705 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 896, GPU 2705 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 896, GPU 2705 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 895, GPU 2705 (MiB)
[INFO] [MemUsageSnapshot] Builder end: CPU 887 MiB, GPU 2705 MiB

However, when I select the optimization profile I want during inference with
self.context.active_optimization_profile = 1

I get:
[TensorRT] ERROR: 3: [executionContext.cpp::setOptimizationProfileInternal::753] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::setOptimizationProfileInternal::753, condition: profileIndex >= 0 && profileIndex < mEngine.getNbOptimizationProfiles()

It seems that no optimization profiles were found in the TensorRT engine. What can I do?
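The parameter check in that error message is just a bounds test on the profile index; a minimal sketch of the condition, assuming the built engine ended up with a single profile (values inferred from this thread, not read from a real engine):

```python
# Hypothetical values based on this thread.
nb_profiles = 1    # what mEngine.getNbOptimizationProfiles() would return
profile_index = 1  # self.context.active_optimization_profile = 1

# The TensorRT parameter check quoted in the error:
valid = 0 <= profile_index < nb_profiles
# valid is False, so the runtime raises Error Code 3 (parameter check failed);
# only index 0 would pass.
```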
Thank you for your attention.

Can you try -p image_input,1x3x480x640,1x3x480x640,1x3x480x640

Same thing. When I set profile 0, this error shows up:

ValueError: could not broadcast input array from shape (921600) into shape (14745600)

It seems the engine expects a batch of 16 images (the default max_batch value of the -m flag in tao-converter). And when I set the optimization profile to a value greater than 0:
self.context.active_optimization_profile = 1

The same error appears:
[TensorRT] ERROR: 3: [executionContext.cpp::setOptimizationProfileInternal::753] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::setOptimizationProfileInternal::753, condition: profileIndex >= 0 && profileIndex < mEngine.getNbOptimizationProfiles()
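The sizes in the broadcast error are consistent with that reading; a quick check of the arithmetic (shapes taken from the commands in this thread):

```python
C, H, W = 3, 480, 640
single_image = C * H * W              # what the inference code supplied
max_batch_buffer = 16 * single_image  # buffer for the default -m 16 in tao-converter

# single_image is the 921600 in the ValueError's source shape;
# max_batch_buffer is the 14745600 destination shape.
```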

Do you have any other suggestions?
Thank you so much.

The detectnet_v2 engine does not support explicit batch.
Refer to Convert detectnet_v2 model to engine - #6 by Morganh
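With an implicit-batch engine, the host buffer is sized for the -m maximum and a smaller batch is copied into its leading slice, rather than broadcast over the whole buffer. A minimal NumPy sketch of that fix (sizes assumed from this thread, variable names hypothetical):

```python
import numpy as np

MAX_BATCH, C, H, W = 16, 3, 480, 640
# Host buffer sized for the engine's maximum batch (the -m value).
host_buffer = np.zeros(MAX_BATCH * C * H * W, dtype=np.float32)

frame = np.ones((C, H, W), dtype=np.float32)  # one preprocessed image
actual_batch = 1

# Copy into the leading slice only; assigning the single image to the full
# buffer is what raised "could not broadcast input array from shape
# (921600) into shape (14745600)".
host_buffer[: actual_batch * C * H * W] = frame.ravel()

# Inference would then use the implicit-batch API, e.g.
# context.execute(batch_size=actual_batch, bindings=bindings)
```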