Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc) V100
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) faster_rcnn
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) v3.22.05-tf1.15.5-py3
• Training spec file (if available, please share it here)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
DeepStream: nvcr.io/nvidia/deepstream:6.1.1-devel and 6.1.1-base
Hi All,
A while back I trained a QAT faster_rcnn model using TAO and successfully integrated it with DeepStream 5.1.
Output of the old tao info was:
Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 1.0
toolkit_version: 3.21.08
published_date: 08/17/2021
Now I would like to migrate that model and use it with DS 6.1.1, as per the Readme First — DeepStream 6.1.1 Release documentation, but unfortunately there is not much migration guidance provided there.
My understanding is that I have to re-run tao faster_rcnn export to export the .tlt model to .etlt and regenerate the INT8 calibration cache.
I have done so with the command below:
!tao faster_rcnn export --gpus 1 --gpu_index 1 \
    -m /workspace/tao-experiments/input_model_qat.tlt \
    -o /workspace/tao-experiments/detector_qat.etlt -k $KEY \
    -e /workspace/tao-experiments/specs/my_spec.txt \
    --data_type int8 \
    --cal_cache_file /workspace/tao-experiments/calibration_qat.bin \
    --gen_ds_config \
    -v
The export ends with the following log:
NOTE: UFF has been tested with TensorFlow 1.14.0.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
Warning: No conversion function registered for layer: NMS_TRT yet.
Converting NMS as custom op: NMS_TRT
Warning: No conversion function registered for layer: Proposal yet.
Converting proposal as custom op: Proposal
DEBUG: convert reshape to flatten node
Warning: No conversion function registered for layer: CropAndResize yet.
Converting roi_pooling_conv_1/CropAndResize_new as custom op: CropAndResize
DEBUG [/usr/local/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['NMS'] as outputs
2022-11-09 10:58:54,167 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
The TAO container that was pulled this time is:
!tao info --verbose
Configuration of the TAO Toolkit Instance
dockers:
    nvidia/tao/tao-toolkit-tf:
        v3.22.05-tf1.15.5-py3:
            docker_registry: nvcr.io
            tasks:
                1. augment
                2. bpnet
                3. classification
                4. dssd
                5. faster_rcnn
                6. emotionnet
                7. efficientdet
                8. fpenet
                9. gazenet
                10. gesturenet
                11. heartratenet
                12. lprnet
                13. mask_rcnn
                14. multitask_classification
                15. retinanet
                16. ssd
                17. unet
                18. yolo_v3
                19. yolo_v4
                20. yolo_v4_tiny
                21. converter
        v3.22.05-tf1.15.4-py3:
            docker_registry: nvcr.io
            tasks:
                1. detectnet_v2
    nvidia/tao/tao-toolkit-pyt:
        v3.22.05-py3:
            docker_registry: nvcr.io
            tasks:
                1. speech_to_text
                2. speech_to_text_citrinet
                3. speech_to_text_conformer
                4. action_recognition
                5. pointpillars
                6. pose_classification
                7. spectro_gen
                8. vocoder
        v3.21.11-py3:
            docker_registry: nvcr.io
            tasks:
                1. text_classification
                2. question_answering
                3. token_classification
                4. intent_slot_classification
                5. punctuation_and_capitalization
    nvidia/tao/tao-toolkit-lm:
        v3.22.05-py3:
            docker_registry: nvcr.io
            tasks:
                1. n_gram
format_version: 2.0
toolkit_version: 3.22.05
published_date: 05/25/2022
I have built and supplied the new libnvinfer_plugin.so (TensorRT OSS) as described in the FasterRCNN — TAO Toolkit 3.22.05 documentation.
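For completeness, the build steps I used were roughly the following (reconstructed from memory, so treat it as a sketch: the release/8.4 branch matches the TensorRT 8.4.x that ships with DeepStream 6.1.1, and GPU_ARCHS=70 targets my V100):

# Build the TensorRT OSS plugin library against the TRT version in DS 6.1.1
git clone -b release/8.4 https://github.com/NVIDIA/TensorRT.git
cd TensorRT && git submodule update --init --recursive
mkdir -p build && cd build
# GPU_ARCHS=70 is the SM version of a V100; change it for other GPUs
cmake .. -DGPU_ARCHS=70 -DTRT_LIB_DIR=/usr/lib/x86_64-linux-gnu/ -DTRT_BIN_DIR=`pwd`/out
make nvinfer_plugin -j$(nproc)
# Overwrite the stock plugin inside the DeepStream container (after backing it up)
cp out/libnvinfer_plugin.so.8.4.* /usr/lib/x86_64-linux-gnu/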
I also built and copied libnvds_infercustomparser_tao.so from GitHub - NVIDIA-AI-IOT/deepstream_tao_apps (sample apps demonstrating how to deploy TAO-trained models on DeepStream), and changed config_infer.txt accordingly.
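For reference, the relevant nvinfer settings in my config_infer_tao.txt look roughly like this (the key is redacted and paths abbreviated; NvDsInferParseCustomNMSTLT is, if I recall correctly, the bbox parser deepstream_tao_apps uses for frcnn):

[property]
tlt-encoded-model=/root/ds_app_config_files/detector_qat.etlt
tlt-model-key=<my_key>
int8-calib-file=/root/ds_app_config_files/calibration_qat.bin
# network-mode=1 selects INT8 precision
network-mode=1
# custom output parser built from deepstream_tao_apps
parse-bbox-func-name=NvDsInferParseCustomNMSTLT
custom-lib-path=/root/ds_app_config_files/libnvds_infercustomparser_tao.so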
However, when I run my app with the new .etlt and calibration cache, it fails like so:
0:00:00.222066423 80 0x7fb254002390 WARN nvinfer gstnvinfer.cpp:643:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1170> [UID = 1]: Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5
ERROR: [TRT]: 1: [stdArchiveReader.cpp::StdArchiveReader::30] Error Code 1: Serialization (Serialization assertion magicTagRead == kMAGIC_TAG failed.Magic tag does not match)
ERROR: [TRT]: 4: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:1528 Deserialize engine failed from file: /root/ds_app_config_files/detector_qat.etlt
0:00:01.734298285 80 0x7fb254002390 WARN nvinfer gstnvinfer.cpp:643:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1897> [UID = 1]: deserialize engine from file :/root/cameraModel/ds611_tao/detector_qat.etlt failed
0:00:01.782751325 80 0x7fb254002390 WARN nvinfer gstnvinfer.cpp:643:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: deserialize backend context from engine from file :/root/cameraModel/ds611_tao/detector_qat.etlt failed, try rebuild
0:00:01.782779680 80 0x7fb254002390 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1923> [UID = 1]: Trying to create engine from model files
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:860 failed to build network since there is no model file matched.
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:799 failed to build network.
0:00:02.723256048 80 0x7fb254002390 ERROR nvinfer gstnvinfer.cpp:640:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1943> [UID = 1]: build engine file failed
0:00:02.742307267 80 0x7fb254002390 ERROR nvinfer gstnvinfer.cpp:640:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2029> [UID = 1]: build backend context failed
0:00:02.742368341 80 0x7fb254002390 ERROR nvinfer gstnvinfer.cpp:640:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1266> [UID = 1]: generate backend failed, check config file settings
0:00:02.742675571 80 0x7fb254002390 WARN nvinfer gstnvinfer.cpp:846:gst_nvinfer_start:<primary_gie> error: Failed to create NvDsInferContext instance
0:00:02.742686352 80 0x7fb254002390 WARN nvinfer gstnvinfer.cpp:846:gst_nvinfer_start:<primary_gie> error: Config file path: /root/ds_app_config_files/config_infer_tao.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
** ERROR: <main:1326>: Failed to set pipeline to PAUSED
Quitting
ERROR from primary_gie: Failed to create NvDsInferContext instance
Debug info: gstnvinfer.cpp(846): gst_nvinfer_start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie:
Config file path: /root/ds_app_config_files/config_infer_tao.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
App run failed
What is this magic tag and how do I verify it? Did I miss something in the migration steps? Is my understanding of how to migrate a QAT model correct?
I am aware of this post with a similar issue, ERROR: [TRT] stdArchiveReader ... Serialization assertion - #10 by Morganh, but it does not seem to answer my problem.