Incorrect bindings after retraining LPDNet model

Please provide the following information when requesting support.

• Hardware: training - T4 (g4dn at AWS), inference - Jetson Xavier NX
• Network Type: Yolo_v4_tiny (LPDNet)
• TLT Version: toolkit_version: 6.0.0, published_date: 07/11/2025; also used v4.0.1 for training as proposed in this post (Key used to load the model is incorrect - #2 by Morganh)
• Training spec file attached

yolo_v4_tiny_train_kitti.txt (2.0 KB)

I retrained the LPDNet model on my dataset using the TAO Toolkit and got an .etlt model.
Then I converted the .etlt model to a .engine file using tao-converter on my Jetson Xavier NX with this command:

./tao-converter yolov4_cspdarknet_tiny_epoch_070.etlt -k nvidia_tlt -d 3,480,640 -p Input,1x3x480x640,8x3x480x640,16x3x480x640, -c cal.bin -b 1 -m 16 -t int8 -o 'output_bbox/BiasAdd','output_cov/Sigmoid' -e yolov4_tiny_lpdnet_elevated_b16_int8.engine

However, when I run inference with this retrained model, I get an error because of a wrong input binding size:

[TRT]    CUDA engine context initialized on device GPU:
[TRT]       -- layers       96
[TRT]       -- maxBatchSize 1
[TRT]       -- deviceMemory 100868608
[TRT]       -- bindings     5
[TRT]       binding 0
                -- index   0
                -- name    'Input'
                -- type    FP32
                -- in/out  INPUT
                -- # dims  4
                -- dim #0  -1
                -- dim #1  3
                -- dim #2  480
                -- dim #3  640
[TRT]       binding 1
                -- index   1
                -- name    'BatchedNMS'
                -- type    INT32
                -- in/out  OUTPUT
                -- # dims  2
                -- dim #0  -1
                -- dim #1  1
[TRT]       binding 2
                -- index   2
                -- name    'BatchedNMS_1'
                -- type    FP32
                -- in/out  OUTPUT
                -- # dims  3
                -- dim #0  -1
                -- dim #1  200
                -- dim #2  4
[TRT]       binding 3
                -- index   3
                -- name    'BatchedNMS_2'
                -- type    FP32
                -- in/out  OUTPUT
                -- # dims  2
                -- dim #0  -1
                -- dim #1  200
[TRT]       binding 4
                -- index   4
                -- name    'BatchedNMS_3'
                -- type    FP32
                -- in/out  OUTPUT
                -- # dims  2
                -- dim #0  -1
                -- dim #1  200
[TRT]    
[TRT]    binding to input 0 Input  binding index:  0
[TRT]    binding to input 0 Input  dims (b=1 c=4294967295 h=3 w=480) size=18446744073705865216
[cuda]   cudaMalloc((void**)&inputCUDA, inputSize)
[cuda]      out of memory (error 2) (hex 0x02)
[cuda]      /home/artem/Projects/jetson-inference/c/tensorNet.cpp:1583
[TRT]    failed to alloc CUDA device memory for tensor input, 18446744073705865216 bytes
[TRT]    device GPU, failed to create resources for CUDA engine
[TRT]    failed to load yolov4-tiny_elevated/yolov4_tiny_lpdnet_elevated_b16_int8.engine
[TRT]    detectNet -- failed to initialize.

Below is the TRT output of the original LPDNet model:

[TRT]    CUDA engine context initialized on device GPU:
[TRT]       -- layers       4
[TRT]       -- maxBatchSize 16
[TRT]       -- deviceMemory 40550400
[TRT]       -- bindings     3
[TRT]       binding 0
                -- index   0
                -- name    'input_1'
                -- type    FP32
                -- in/out  INPUT
                -- # dims  3
                -- dim #0  3
                -- dim #1  480
                -- dim #2  640
[TRT]       binding 1
                -- index   1
                -- name    'output_bbox/BiasAdd'
                -- type    FP32
                -- in/out  OUTPUT
                -- # dims  3
                -- dim #0  4
                -- dim #1  30
                -- dim #2  40
[TRT]       binding 2
                -- index   2
                -- name    'output_cov/Sigmoid'
                -- type    FP32
                -- in/out  OUTPUT
                -- # dims  3
                -- dim #0  1
                -- dim #1  30
                -- dim #2  40
[TRT]    
[TRT]    binding to input 0 input_1  binding index:  0
[TRT]    binding to input 0 input_1  dims (b=16 c=3 h=480 w=640) size=58982400
[TRT]    binding to output 0 output_cov/Sigmoid  binding index:  2
[TRT]    binding to output 0 output_cov/Sigmoid  dims (b=16 c=1 h=30 w=40) size=76800
[TRT]    binding to output 1 output_bbox/BiasAdd  binding index:  1
[TRT]    binding to output 1 output_bbox/BiasAdd  dims (b=16 c=4 h=30 w=40) size=307200

How do I make the bindings correct in the retrained model?

Your command is not the expected one. The model you generated is actually a YOLO_v4_tiny model.

Please refer to the command in https://docs.nvidia.com/tao/tao-toolkit-archive/tao-40-1/text/tao-converter/tao_converter_yolo_v4_tiny.html#sample-output-log.

More,

It is an out-of-memory error. Please set a larger -w. Refer to Converting .etlt file to trt engine using tao-converter - #3 by Morganh.
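Roughly, following that sample, the command should drop the -d and -o options and add -w for a larger workspace. A sketch only (the -w value below is an illustrative assumption):

./tao-converter yolov4_cspdarknet_tiny_epoch_070.etlt -k nvidia_tlt -p Input,1x3x480x640,8x3x480x640,16x3x480x640 -c cal.bin -m 16 -t int8 -w 1073741824 -e yolov4_tiny_lpdnet_elevated_b16_int8.engine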

Following your response, I changed the command to:

tao-converter yolov4_cspdarknet_tiny_epoch_070.etlt -k nvidia_tlt -p Input,1x3x480x640,8x3x480x640,16x3x480x640 -c cal.bin -m 16 -t int8 -w 100000000 -e 2025_08_18_yolov4_tiny_lpdnet_elevated_b16_int8.engine

But the problem remains.
The out-of-memory error seems to stem from the incorrect dim #0 of the input binding:

-- dim #0  -1

It equals -1 and is interpreted as the unsigned value 4294967295, so TRT calculates an enormous Input size:

[TRT]    binding to input 0 Input  dims (b=1 c=4294967295 h=3 w=480) size=18446744073705865216

That reported size, 18446744073705865216 bytes, is exactly the 64-bit unsigned wraparound of a negative value (2^64 - 3 × 480 × 640 × 4). So the question is: how do I get correct dims at binding 0?
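In case it helps with diagnosis, the engine's binding shapes could also be inspected outside jetson-inference with trtexec (the path below assumes a standard JetPack install; --shapes supplies a runtime value for the dynamic batch dimension):

/usr/src/tensorrt/bin/trtexec --loadEngine=2025_08_18_yolov4_tiny_lpdnet_elevated_b16_int8.engine --shapes=Input:1x3x480x640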

There are two kinds of LPDNet models. One is trained on the detectnet_v2 network; the other is trained on the YOLO_v4_tiny network.

For the detectnet_v2 network, the bindings are input_1, output_bbox/BiasAdd, and output_cov/Sigmoid.

For the YOLO_v4_tiny network, the bindings are Input, BatchedNMS, BatchedNMS_1, BatchedNMS_2, and BatchedNMS_3.

So your model has correct bindings.

Please retry with a lower batch size:

-p Input,1x3x480x640,1x3x480x640,1x3x480x640
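That is, keeping the rest of your last command unchanged, something along these lines (the output filename is just a suggestion):

tao-converter yolov4_cspdarknet_tiny_epoch_070.etlt -k nvidia_tlt -p Input,1x3x480x640,1x3x480x640,1x3x480x640 -c cal.bin -m 16 -t int8 -w 100000000 -e yolov4_tiny_lpdnet_elevated_b1_int8.engine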
