Does TAO 5.0 support exporting a model trained by TAO 3.0?

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc): GPU
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc): Yolo_v4
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here): NA
• Training spec file(If have, please share here): NA
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.): NA

I need to deploy a Yolo_v4 model trained with TAO 3.0 in DeepStream 8.0. Since tao-converter is deprecated and the exported file is a .etlt, I let nvinfer build the engine, but I got this error:

WARNING: ../nvdsinfer/nvdsinfer_model_builder.cpp:1261 Deserialize engine failed because file path: /home/me/detector/detector.engine open error
0:00:00.321296740 3265927 0x5b20590ec6e0 WARN                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<detector> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2097> [UID = 1]: deserialize engine from file :/home/me/detector/detector.engine failed
0:00:00.321311770 3265927 0x5b20590ec6e0 WARN                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<detector> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2202> [UID = 1]: deserialize backend context from engine from file :/home/me/detector/detector.engine failed, try rebuild
0:00:00.321319070 3265927 0x5b20590ec6e0 INFO                 nvinfer gstnvinfer.cpp:685:gst_nvinfer_logger:<detector> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2123> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: onnxOpImporters.cpp:6521: Attribute caffeSemantics not found in plugin node! Ensure that the plugin creator has a default value defined or the engine may fail to build.
WARNING: [TRT]: BatchedNMSPlugin is deprecated since TensorRT 9.0. Use INetworkDefinition::addNMS() to add an INMSLayer OR use EfficientNMS plugin.
WARNING: [TRT]: Calibration Profile is not defined. Calibrating with Profile 0
ERROR: [TRT]: Unexpected exception _Map_base::at
Segmentation fault (core dumped)

It seems that the model was exported with BatchedNMSPlugin, which has been deprecated since TensorRT 9.0. DeepStream 8.0 ships with TensorRT 10.9, so nvinfer cannot build an engine from this .etlt file.

Can I re-export the original .tlt file (trained using TAO 3.0) to .onnx using TAO 5.0? If not, how do I deploy a model trained using TAO 3.0 to Deepstream 8.0?

Please try either of the options below.

  1. Please try to change the .etlt file to a .onnx file. Reference: tao_toolkit_recipes/tao_forum_faq/FAQ.md at main · NVIDIA-AI-IOT/tao_toolkit_recipes · GitHub.
  2. Please try to change the .tlt file to a .hdf5 file. Reference: tao_toolkit_recipes/tao_forum_faq/FAQ.md at main · NVIDIA-AI-IOT/tao_toolkit_recipes · GitHub. Then export it to a .onnx file using TAO 5.0.
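
For option 2, the final export step would look roughly like the command below. This is a sketch only, not the definitive syntax: the exact launcher form and flags should be verified against the TAO 5.0 yolo_v4 docs, and the key, spec file, and paths are placeholders.

```shell
# Export the converted .hdf5 checkpoint to ONNX with TAO 5.x
# (sketch — verify flags against the TAO 5.0 yolo_v4 export docs;
# $KEY and all paths below are placeholders)
tao model yolo_v4 export \
    -m /workspace/yolov4.hdf5 \
    -k $KEY \
    -e /workspace/export_spec.txt \
    -o /workspace/yolov4.onnx
```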

@Morganh

I converted the .etlt file (exported with TAO 3.0) to a .onnx file following the link you provided, then used trtexec to build an INT8 engine from it.

The calibration file was generated for the original .etlt export; I reused that same calibration file when building the engine from the .onnx file.

I got this error:

[11/07/2025-13:22:05] [I] [TRT] No checker registered for op: BatchedNMSDynamic_TRT. Attempting to check as plugin.
[11/07/2025-13:22:05] [I] [TRT] No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[11/07/2025-13:22:05] [I] [TRT] Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace: 
[11/07/2025-13:22:05] [W] [TRT] onnxOpImporters.cpp:6521: Attribute caffeSemantics not found in plugin node! Ensure that the plugin creator has a default value defined or the engine may fail to build.
[11/07/2025-13:22:05] [W] [TRT] BatchedNMSPlugin is deprecated since TensorRT 9.0. Use INetworkDefinition::addNMS() to add an INMSLayer OR use EfficientNMS plugin.
[11/07/2025-13:22:05] [I] [TRT] Successfully created plugin: BatchedNMSDynamic_TRT
[11/07/2025-13:22:05] [I] Finished parsing network model. Parse time: 0.015883
[11/07/2025-13:22:05] [I] Set shape of input tensor Input for optimization profile 0 to: MIN=1x3x832x4096 OPT=3x3x832x4096 MAX=3x3x832x4096
[11/07/2025-13:22:05] [I] FP32 and INT8 precisions have been specified - more performance might be enabled by additionally specifying --fp16 or --best
[11/07/2025-13:22:05] [I] Set calibration profile for input tensor Input to 3x3x832x4096
[11/07/2025-13:22:05] [I] [TRT] Calibration table does not match calibrator algorithm type.
[11/07/2025-13:22:05] [I] [TRT] Perform graph optimization on calibration graph.
[11/07/2025-13:22:05] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[11/07/2025-13:22:06] [I] [TRT] Compiler backend is used during engine build.
[11/07/2025-13:22:07] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[11/07/2025-13:22:07] [I] [TRT] Total Host Persistent Memory: 430976 bytes
[11/07/2025-13:22:07] [I] [TRT] Total Device Persistent Memory: 0 bytes
[11/07/2025-13:22:07] [I] [TRT] Max Scratch Memory: 55355904 bytes
[11/07/2025-13:22:07] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 284 steps to complete.
[11/07/2025-13:22:07] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 29.9706ms to assign 19 blocks to 284 nodes requiring 1458625536 bytes.
[11/07/2025-13:22:07] [I] [TRT] Total Activation Memory: 1458625536 bytes
[11/07/2025-13:22:07] [I] [TRT] Total Weights Memory: 6216192 bytes
[11/07/2025-13:22:07] [I] [TRT] Compiler backend is used during engine execution.
[11/07/2025-13:22:07] [I] [TRT] Engine generation completed in 1.58926 seconds.
[11/07/2025-13:22:07] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1391, now: CPU 0, GPU 1397 (MiB)
[11/07/2025-13:22:07] [I] [TRT] Starting Calibration.
[11/07/2025-13:22:07] [E] Error[3]: IExecutionContext::executeV2: Error Code 3: API Usage Error (Parameter check failed, condition: nullPtrAllowed. Tensor "Input" is bound to nullptr, which is allowed only for an empty input tensor, shape tensor, or an output tensor associated with an IOuputAllocator.)
[11/07/2025-13:22:07] [E] Error[2]: [calibrator.cpp::calibrateEngine::1236] Error Code 2: Internal Error (Assertion context->executeV2(bindings.data()) failed. )
[11/07/2025-13:22:07] [E] Engine could not be created from network
[11/07/2025-13:22:07] [E] Building engine failed
[11/07/2025-13:22:07] [E] Failed to create engine from model or file.
[11/07/2025-13:22:07] [E] Engine set up failed

trtexec command:

/usr/src/tensorrt/bin/trtexec \
                --onnx=/home/me/Downloads/8.0-engine-build-test/models/detector/yolov4.onnx \
                --saveEngine=/home/me/Downloads/8.0-engine-build-test/models/detector/yolov4.engine \
                --minShapes=Input:1x3x832x4096 \
                --optShapes=Input:3x3x832x4096 \
                --maxShapes=Input:3x3x832x4096 \
                --int8 \
                --calib=/home/me/Downloads/8.0-engine-build-test/models/detector/yolov4.bin \
                --memPoolSize=workspace:1G
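
One thing worth checking on the reused cache: the log reports “Calibration table does not match calibrator algorithm type”, which suggests TensorRT rejected the old table and fell back to live calibration, and that then fails because trtexec was given no calibration data for the Input tensor. The first line of a TensorRT calibration cache records which calibrator algorithm produced it, so it can be inspected directly. A minimal sketch — the example cache content below is fabricated for illustration; run the `head` line against the real yolov4.bin instead:

```shell
# A TensorRT calibration cache starts with a header of the form
# "TRT-<build version>-<calibrator algorithm>". Create a tiny example
# cache to demonstrate the check (substitute your yolov4.bin for $CACHE).
CACHE=$(mktemp)
printf 'TRT-7000-EntropyCalibration2\nInput: 3c008912\n' > "$CACHE"
head -n 1 "$CACHE"   # -> TRT-7000-EntropyCalibration2
```

If the algorithm named in the header differs from the calibrator the new TensorRT expects, the cache cannot be reused and the model needs re-calibration against real data.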

At FP16 precision, I can build the engine from the .onnx file, but these warnings remain:

[11/07/2025-13:44:43] [I] [TRT] No checker registered for op: BatchedNMSDynamic_TRT. Attempting to check as plugin.
[11/07/2025-13:44:43] [I] [TRT] No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[11/07/2025-13:44:43] [I] [TRT] Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace: 
[11/07/2025-13:44:43] [W] [TRT] onnxOpImporters.cpp:6521: Attribute caffeSemantics not found in plugin node! Ensure that the plugin creator has a default value defined or the engine may fail to build.
[11/07/2025-13:44:43] [W] [TRT] BatchedNMSPlugin is deprecated since TensorRT 9.0. Use INetworkDefinition::addNMS() to add an INMSLayer OR use EfficientNMS plugin.
[11/07/2025-13:44:43] [I] [TRT] Successfully created plugin: BatchedNMSDynamic_TRT
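
For completeness, once a usable engine exists (the FP16 one here), DeepStream can be pointed at it through the nvinfer config. A minimal sketch of the `[property]` group — the paths, class count, and label file are placeholders, and TAO YOLOv4 models additionally need the custom bbox parser from NVIDIA-AI-IOT/deepstream_tao_apps:

```ini
[property]
gpu-id=0
onnx-file=yolov4.onnx
model-engine-file=yolov4.engine
# network-mode: 0=FP32, 1=INT8, 2=FP16
network-mode=2
labelfile-path=labels.txt
num-detected-classes=3
# Custom output parser for TAO YOLOv4 (library path is an example;
# build it from NVIDIA-AI-IOT/deepstream_tao_apps)
parse-bbox-func-name=NvDsInferParseCustomBatchedNMSTLT
custom-lib-path=/opt/nvidia/deepstream/deepstream/lib/libnvds_infercustomparser_tao.so
```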