Migrating a QAT faster_rcnn model to DS 6.1.1

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc) V100
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) faster_rcnn
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) v3.22.05-tf1.15.5-py3

• Training spec file (if you have one, please share it here)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
DeepStream: nvcr.io/nvidia/deepstream:6.1.1-devel & base

Hi All,

A while back I trained a QAT faster_rcnn model using TAO and successfully integrated it with DS 5.1.
The output of the old tao info was:

Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 1.0
toolkit_version: 3.21.08
published_date: 08/17/2021

Now I would like to migrate that model and use it with DS 6.1.1 as per the Readme First — DeepStream 6.1.1 Release documentation, but unfortunately there is not much guidance provided.

My understanding is that I have to use tao faster_rcnn export again to export the .tlt model to an .etlt file plus a calibration cache.

I have done so with the command below:

!tao faster_rcnn export --gpus 1 --gpu_index 1 \
-m /workspace/tao-experiments/input_model_qat.tlt \
-o /workspace/tao-experiments/detector_qat.etlt -k $KEY \
-e /workspace/tao-experiments/specs/my_spec.txt \
--data_type int8 \
--cal_cache_file /workspace/tao-experiments/calibration_qat.bin \
--gen_ds_config \
-v

Export ends with the following:

NOTE: UFF has been tested with TensorFlow 1.14.0.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
Warning: No conversion function registered for layer: NMS_TRT yet.
Converting NMS as custom op: NMS_TRT
Warning: No conversion function registered for layer: Proposal yet.
Converting proposal as custom op: Proposal
DEBUG: convert reshape to flatten node
Warning: No conversion function registered for layer: CropAndResize yet.
Converting roi_pooling_conv_1/CropAndResize_new as custom op: CropAndResize
DEBUG [/usr/local/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['NMS'] as outputs
2022-11-09 10:58:54,167 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
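
As a side note, the exported .etlt and calibration cache can be sanity-checked outside DeepStream with the converter task that ships in the same container (listed in the tao info output below). A rough sketch - the input dimensions here are placeholders and must match the training spec, and NMS is the output node reported by the export log:

# hypothetical sanity check: build an INT8 engine directly from the .etlt
!tao converter -k $KEY \
    -d 3,384,1248 \
    -o NMS \
    -t int8 \
    -c /workspace/tao-experiments/calibration_qat.bin \
    -e /workspace/tao-experiments/detector_qat.engine \
    /workspace/tao-experiments/detector_qat.etlt

If this builds an engine cleanly, the export artifacts themselves are fine and any remaining problem is on the DeepStream side.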

The TAO container that was pulled this time is:

!tao info --verbose
Configuration of the TAO Toolkit Instance

dockers: 		
	nvidia/tao/tao-toolkit-tf: 			
		v3.22.05-tf1.15.5-py3: 				
			docker_registry: nvcr.io
			tasks: 
				1. augment
				2. bpnet
				3. classification
				4. dssd
				5. faster_rcnn
				6. emotionnet
				7. efficientdet
				8. fpenet
				9. gazenet
				10. gesturenet
				11. heartratenet
				12. lprnet
				13. mask_rcnn
				14. multitask_classification
				15. retinanet
				16. ssd
				17. unet
				18. yolo_v3
				19. yolo_v4
				20. yolo_v4_tiny
				21. converter
		v3.22.05-tf1.15.4-py3: 				
			docker_registry: nvcr.io
			tasks: 
				1. detectnet_v2
	nvidia/tao/tao-toolkit-pyt: 			
		v3.22.05-py3: 				
			docker_registry: nvcr.io
			tasks: 
				1. speech_to_text
				2. speech_to_text_citrinet
				3. speech_to_text_conformer
				4. action_recognition
				5. pointpillars
				6. pose_classification
				7. spectro_gen
				8. vocoder
		v3.21.11-py3: 				
			docker_registry: nvcr.io
			tasks: 
				1. text_classification
				2. question_answering
				3. token_classification
				4. intent_slot_classification
				5. punctuation_and_capitalization
	nvidia/tao/tao-toolkit-lm: 			
		v3.22.05-py3: 				
			docker_registry: nvcr.io
			tasks: 
				1. n_gram
format_version: 2.0
toolkit_version: 3.22.05
published_date: 05/25/2022

I have built and supplied the new libnvinfer_plugin.so (the TensorRT OSS plugin) as described in the FasterRCNN — TAO Toolkit 3.22.05 documentation,
and built and copied libnvds_infercustomparser_tao.so from GitHub - NVIDIA-AI-IOT/deepstream_tao_apps (sample apps to demonstrate how to deploy models trained with TAO on DeepStream), changing config_infer.txt accordingly.
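
For reference, the parser hookup in config_infer.txt now looks roughly like this (paths shortened to examples, and the parse function name is the one the faster_rcnn sample config in deepstream_tao_apps uses - double-check it against your checkout):

[property]
# custom bbox parser built from deepstream_tao_apps/post_processor (example path)
parse-bbox-func-name=NvDsInferParseCustomNMSTLT
custom-lib-path=/opt/deepstream_tao_apps/post_processor/libnvds_infercustomparser_tao.so
# output node the export log marked as output
output-blob-names=NMS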

However, when I run my app with the new .etlt and calibration cache, it fails like so:

0:00:00.222066423    80 0x7fb254002390 WARN                 nvinfer gstnvinfer.cpp:643:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1170> [UID = 1]: Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5
ERROR: [TRT]: 1: [stdArchiveReader.cpp::StdArchiveReader::30] Error Code 1: Serialization (Serialization assertion magicTagRead == kMAGIC_TAG failed.Magic tag does not match)
ERROR: [TRT]: 4: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:1528 Deserialize engine failed from file: /root/ds_app_config_files/detector_qat.etlt
0:00:01.734298285    80 0x7fb254002390 WARN                 nvinfer gstnvinfer.cpp:643:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1897> [UID = 1]: deserialize engine from file :/root/cameraModel/ds611_tao/detector_qat.etlt failed
0:00:01.782751325    80 0x7fb254002390 WARN                 nvinfer gstnvinfer.cpp:643:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: deserialize backend context from engine from file :/root/cameraModel/ds611_tao/detector_qat.etlt failed, try rebuild
0:00:01.782779680    80 0x7fb254002390 INFO                 nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1923> [UID = 1]: Trying to create engine from model files
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:860 failed to build network since there is no model file matched.
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:799 failed to build network.
0:00:02.723256048    80 0x7fb254002390 ERROR                nvinfer gstnvinfer.cpp:640:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1943> [UID = 1]: build engine file failed
0:00:02.742307267    80 0x7fb254002390 ERROR                nvinfer gstnvinfer.cpp:640:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2029> [UID = 1]: build backend context failed
0:00:02.742368341    80 0x7fb254002390 ERROR                nvinfer gstnvinfer.cpp:640:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1266> [UID = 1]: generate backend failed, check config file settings
0:00:02.742675571    80 0x7fb254002390 WARN                 nvinfer gstnvinfer.cpp:846:gst_nvinfer_start:<primary_gie> error: Failed to create NvDsInferContext instance
0:00:02.742686352    80 0x7fb254002390 WARN                 nvinfer gstnvinfer.cpp:846:gst_nvinfer_start:<primary_gie> error: Config file path: /root/ds_app_config_files/config_infer_tao.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
** ERROR: <main:1326>: Failed to set pipeline to PAUSED
Quitting
ERROR from primary_gie: Failed to create NvDsInferContext instance
Debug info: gstnvinfer.cpp(846): gst_nvinfer_start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie:
Config file path: /root/ds_app_config_files/config_infer_tao.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
App run failed

What is this magic tag, and how do I verify it? Did I miss something in the migration steps? Is my understanding of how to migrate a QAT model correct?

I am aware of this post with a similar issue, ERROR: [TRT] stdArchiveReader ... Serialization assertion - #10 by Morganh, but it does not seem to answer my problem.

Please comment out the model-engine-file line in your config file and retry.

Thanks @Morganh, I was able to run the model after your comment - it was a silly mistake after all.

To anyone else looking at this post: double-check how you provide the model in the inference config. model-engine-file is for, well, model.engine files, while to use an .etlt file you need to provide it under tlt-encoded-model :)
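
In other words, the model section of the config should look something like this sketch (key and paths are placeholders for your own):

[property]
# the .etlt goes here, together with its key and the QAT calibration cache
tlt-encoded-model=/root/ds_app_config_files/detector_qat.etlt
tlt-model-key=<your_key>
int8-calib-file=/root/ds_app_config_files/calibration_qat.bin
# 1 = INT8 precision
network-mode=1
# model-engine-file is only for an already-serialized TensorRT engine;
# leave it commented on the first run and DeepStream will build the engine itself
# model-engine-file=/root/ds_app_config_files/detector_qat.etlt_b1_gpu0_int8.engine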

Just a last set of questions:

  • I see quite a few warnings like Missing scale and zero-point for tensor ... - this is expected, since not all layers have int8 alternatives, correct?
  • Should users take any action regarding Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5, now that OpenCV is deprecated in DeepStream? Is it confirmed that this change does not affect the quality of detections?

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

  1. Yes.
  2. Yes, it is confirmed. Actually, this is a DeepStream topic; more info can be found in Application Migration to DeepStream 6.0 from DeepStream 5.X — DeepStream 6.0 Release documentation. For what the NMS clustering maps to in the nvinfer config, see the sketch below.
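
A sketch of the nvinfer settings that correspond to the values printed in that warning (tune them for your model):

[property]
# 2 = NMS clustering; the old cv::groupRectangles path (cluster-mode=0) is deprecated in DS 6.x
cluster-mode=2

[class-attrs-all]
nms-iou-threshold=0.5
topk=20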

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.