Please provide the following information when requesting support.
• Hardware (DGPU)
• Network Type (Classification TF_2)
• TLT Version (4.0.0)
• Training spec file(default)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
When training a classification model, we normally use the TF_1 version, but we noticed that the TF_2 version supports QAT. After training, there is no option to output a cache/calibration file. With TF_1 we use the calibration file to convert the model to INT8 on our NX devices. Do we still need a calibration file to generate a TRT model for the Jetson platform? If so, can you point out how to create one?
For Classification TF_2, there is a qat parameter in the training section of the spec file, which can be set to true or false. You can download the notebook and refer to the specs folder. If the model is trained with QAT enabled, there is no need to calibrate again when exporting the model.
Thanks Morghan! When converting the etlt on GPU with TAO-deploy, all is well.
When using the TAO converter (3.2) on Jetson, it gives me the following error:
xxxx@host1:/runtime/models# sudo ./tao-converter /runtime/models/primary/final_model.etlt -k -o predictions/Softmax -d 3,224,224 -i nchw -m 64 -t int8 -e /runtime/models/primary/final_model.trt -b 64
[INFO] [MemUsageChange] Init CUDA: CPU +188, GPU +0, now: CPU 213, GPU 4469 (MiB)
[INFO] [MemUsageChange] Init builder kernel library: CPU +107, GPU +116, now: CPU 342, GPU 4604 (MiB)
[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:1: Interpreting non ascii codepoint 137.
[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:1: Expected identifier, got: �
[ERROR] ModelImporter.cpp:688: Failed to parse ONNX model from file: /tmp/filexRAUBs
[ERROR] Failed to parse the model, please check the encoding key to make sure it’s correct
[INFO] Model has no dynamic shape.
[ERROR] 4: [network.cpp::validate::2770] Error Code 4: Internal Error (Network must have at least one output)
[ERROR] Unable to create engine
Segmentation fault
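When reading a converter log like the one above, it helps to look at the first error rather than the last, since the later messages (no network output, engine not created) appear to follow from the initial ONNX parse failure. A minimal sketch of pulling the error lines out of such a log is shown here; error_lines is a hypothetical helper written for illustration and is not part of TAO or tao-converter, and the sample text is a shortened copy of the log above.

```python
import re

# Matches lines such as "[ERROR] message" or
# "[libprotobuf ERROR file:line] message" and captures the message part.
ERROR_LINE = re.compile(r"^\[(?:\w+ )?ERROR[^\]]*\]\s*(.*)$")


def error_lines(log_text):
    """Return the message part of every error line, in log order.

    Hypothetical helper for illustration only.
    """
    found = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            found.append(match.group(1))
    return found


# Shortened copy of the converter log from this thread.
sample_log = """\
[INFO] [MemUsageChange] Init CUDA: CPU +188, GPU +0, now: CPU 213, GPU 4469 (MiB)
[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:1: Interpreting non ascii codepoint 137.
[ERROR] ModelImporter.cpp:688: Failed to parse ONNX model from file: /tmp/filexRAUBs
[ERROR] 4: [network.cpp::validate::2770] Error Code 4: Internal Error (Network must have at least one output)
[ERROR] Unable to create engine
"""

errors = error_lines(sample_log)
print("first error:", errors[0])
print("total errors:", len(errors))
```

The first reported error is the protobuf parse failure, so that is the one to investigate before the later messages.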
2023-05-29 14:43:08,993 [INFO] root: Registry: ['nvcr.io']
2023-05-29 14:43:09,028 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf2.9.1
2023-05-29 12:43:10.427480: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
Log file already exists at /workspace/tao-experiments/classification_tf2/output_retrain_qat/status.json
Starting classification export.
Signatures found in model: [serving_default].
Output names: ['predictions']
Using tensorflow=2.9.1, onnx=1.12.0, tf2onnx=1.12.0/ddca3a
Using opset <onnx, 13>
Computed 0 values for constant folding
Optimizing ONNX model
After optimization: BatchNormalization -42 (49->7), Cast -1 (33->32), Const -378 (569->191), GlobalAveragePool +16 (0->16), Identity -2 (2->0), ReduceMean -16 (16->0), Reshape -16 (33->17), Transpose -17 (17->0), Unsqueeze -64 (64->0)
The etlt model is saved at /workspace/tao-experiments/classification_tf2/export_qat/efficientnet-b0.qat.etlt
Export finished successfully.
Sending telemetry data.
Telemetry data couldn’t be sent, but the command ran successfully.
[Error]: <urlopen error [Errno -2] Name or service not known>
Execution status: PASS
2023-05-29 14:43:27,299 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.