Please provide the following information when requesting support.
• Hardware: training on a T4 (AWS g4dn instance), inference on a Jetson Xavier NX
• Network Type: Yolo_v4_tiny (LPDNet)
• TLT Version: toolkit_version: 6.0.0, published_date: 07/11/2025; I also used v4.0.1 for training, as proposed in this post ( Key used to load the model is incorrect - #2 by Morganh )
• Training spec file attached
yolo_v4_tiny_train_kitti.txt (2.0 KB)
I retrained the LPDNet model on my dataset with the TAO Toolkit and got an .etlt model.
Then I converted the .etlt model to a TensorRT .engine with tao-converter on my Jetson Xavier NX, using this command:
./tao-converter yolov4_cspdarknet_tiny_epoch_070.etlt -k nvidia_tlt -d 3,480,640 -p Input,1x3x480x640,8x3x480x640,16x3x480x640, -c cal.bin -b 1 -m 16 -t int8 -o 'output_bbox/BiasAdd','output_cov/Sigmoid' -e yolov4_tiny_lpdnet_elevated_b16_int8.engine
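For reference, I can dump the bindings of the resulting engine directly on the Jetson with a short script (a minimal sketch, assuming the standard TensorRT Python API that ships with JetPack; the engine filename is the one produced above):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("yolov4_tiny_lpdnet_elevated_b16_int8.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# print every binding with its I/O direction and shape (-1 marks a dynamic dimension)
for i in range(engine.num_bindings):
    kind = "INPUT " if engine.binding_is_input(i) else "OUTPUT"
    print(i, kind, engine.get_binding_name(i), engine.get_binding_shape(i))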
However, when I run inference with this retrained model, I get an error because of a wrong input binding size:
[TRT] CUDA engine context initialized on device GPU:
[TRT] -- layers 96
[TRT] -- maxBatchSize 1
[TRT] -- deviceMemory 100868608
[TRT] -- bindings 5
[TRT] binding 0
-- index 0
-- name 'Input'
-- type FP32
-- in/out INPUT
-- # dims 4
-- dim #0 -1
-- dim #1 3
-- dim #2 480
-- dim #3 640
[TRT] binding 1
-- index 1
-- name 'BatchedNMS'
-- type INT32
-- in/out OUTPUT
-- # dims 2
-- dim #0 -1
-- dim #1 1
[TRT] binding 2
-- index 2
-- name 'BatchedNMS_1'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 -1
-- dim #1 200
-- dim #2 4
[TRT] binding 3
-- index 3
-- name 'BatchedNMS_2'
-- type FP32
-- in/out OUTPUT
-- # dims 2
-- dim #0 -1
-- dim #1 200
[TRT] binding 4
-- index 4
-- name 'BatchedNMS_3'
-- type FP32
-- in/out OUTPUT
-- # dims 2
-- dim #0 -1
-- dim #1 200
[TRT]
[TRT] binding to input 0 Input binding index: 0
[TRT] binding to input 0 Input dims (b=1 c=4294967295 h=3 w=480) size=18446744073705865216
[cuda] cudaMalloc((void**)&inputCUDA, inputSize)
[cuda] out of memory (error 2) (hex 0x02)
[cuda] /home/artem/Projects/jetson-inference/c/tensorNet.cpp:1583
[TRT] failed to alloc CUDA device memory for tensor input, 18446744073705865216 bytes
[TRT] device GPU, failed to create resources for CUDA engine
[TRT] failed to load yolov4-tiny_elevated/yolov4_tiny_lpdnet_elevated_b16_int8.engine
[TRT] detectNet -- failed to initialize.
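The numbers in that failure suggest the dynamic batch dimension (-1) of the new engine is being read as an unsigned value and shifted into the channel slot. A quick arithmetic check (plain Python, just reproducing the sizes from the log above) gives exactly the values reported:

print(2**32 - 1)                       # 4294967295 -> the bogus "c=" value (-1 read as unsigned 32-bit)
signed_size = -1 * 3 * 480 * 640 * 4   # FP32 input volume with the batch dim kept as -1: -3,686,400 bytes
print(signed_size % 2**64)             # 18446744073705865216 -> the failed cudaMalloc size (wrapped to unsigned 64-bit)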
For comparison, below is the TRT output of the original LPDNet model:
[TRT] CUDA engine context initialized on device GPU:
[TRT] -- layers 4
[TRT] -- maxBatchSize 16
[TRT] -- deviceMemory 40550400
[TRT] -- bindings 3
[TRT] binding 0
-- index 0
-- name 'input_1'
-- type FP32
-- in/out INPUT
-- # dims 3
-- dim #0 3
-- dim #1 480
-- dim #2 640
[TRT] binding 1
-- index 1
-- name 'output_bbox/BiasAdd'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 4
-- dim #1 30
-- dim #2 40
[TRT] binding 2
-- index 2
-- name 'output_cov/Sigmoid'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 1
-- dim #1 30
-- dim #2 40
[TRT]
[TRT] binding to input 0 input_1 binding index: 0
[TRT] binding to input 0 input_1 dims (b=16 c=3 h=480 w=640) size=58982400
[TRT] binding to output 0 output_cov/Sigmoid binding index: 2
[TRT] binding to output 0 output_cov/Sigmoid dims (b=16 c=1 h=30 w=40) size=76800
[TRT] binding to output 1 output_bbox/BiasAdd binding index: 1
[TRT] binding to output 1 output_bbox/BiasAdd dims (b=16 c=4 h=30 w=40) size=307200
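For context, this is roughly how I load the retrained engine through jetson-inference (a sketch using the Python bindings; the labels file and image names are placeholders, and the blob names are the engine's input binding plus the -o outputs I passed to tao-converter):

from jetson_inference import detectNet
from jetson_utils import loadImage

# placeholder labels file; blob names follow the tao-converter command above
net = detectNet(argv=[
    "--model=yolov4-tiny_elevated/yolov4_tiny_lpdnet_elevated_b16_int8.engine",
    "--labels=labels.txt",
    "--input-blob=Input",
    "--output-cvg=output_cov/Sigmoid",
    "--output-bbox=output_bbox/BiasAdd",
])
img = loadImage("test_image.jpg")
detections = net.Detect(img)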
How do I get correct bindings in the retrained model, so that the engine loads and runs for inference the way the original LPDNet engine does?