Migrating a QAT faster_rcnn model to DS 6.1.1

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc) V100
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) faster_rcnn
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) v3.22.05-tf1.15.5-py3

• Training spec file (if you have one, please share it here)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
DeepStream: nvcr.io/nvidia/deepstream:6.1.1-devel & base

Hi All,

A while back I trained a QAT faster_rcnn model using TAO and successfully integrated it with DS 5.1.
The output of the old tao info was:

Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 1.0
toolkit_version: 3.21.08
published_date: 08/17/2021

Now I would like to migrate that model and use it with DS 6.1.1 as per the Readme First — DeepStream 6.1.1 Release documentation, but unfortunately there is not much guidance provided.

My understanding is that I have to use tao faster_rcnn export again to export the .tlt model to an .etlt file plus a calibration cache.

I have done so with the command below:

!tao faster_rcnn export --gpus 1 --gpu_index 1 \
-m /workspace/tao-experiments/input_model_qat.tlt \
-o /workspace/tao-experiments/detector_qat.etlt -k $KEY \
-e /workspace/tao-experiments/specs/my_spec.txt \
--data_type int8 \
--cal_cache_file /workspace/tao-experiments/calibration_qat.bin \
--gen_ds_config \
-v

Export ends with the following:

NOTE: UFF has been tested with TensorFlow 1.14.0.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
Warning: No conversion function registered for layer: NMS_TRT yet.
Converting NMS as custom op: NMS_TRT
Warning: No conversion function registered for layer: Proposal yet.
Converting proposal as custom op: Proposal
DEBUG: convert reshape to flatten node
Warning: No conversion function registered for layer: CropAndResize yet.
Converting roi_pooling_conv_1/CropAndResize_new as custom op: CropAndResize
DEBUG [/usr/local/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['NMS'] as outputs
2022-11-09 10:58:54,167 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
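
As a side note, the exported .etlt and calibration cache can be sanity-checked outside DeepStream with the converter task that ships in the same container (listed in the tao info output below). A rough sketch - the input dimensions here are placeholders and must match the training spec, and NMS is the output node reported by the export log:

# hypothetical sanity check: build an INT8 engine directly from the .etlt
!tao converter -k $KEY \
    -d 3,384,1248 \
    -o NMS \
    -t int8 \
    -c /workspace/tao-experiments/calibration_qat.bin \
    -e /workspace/tao-experiments/detector_qat.engine \
    /workspace/tao-experiments/detector_qat.etlt

If this builds an engine cleanly, the export artifacts themselves are fine and any remaining problem is on the DeepStream side.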

The TAO container that was pulled this time is:

!tao info --verbose
Configuration of the TAO Toolkit Instance

dockers: 		
	nvidia/tao/tao-toolkit-tf: 			
		v3.22.05-tf1.15.5-py3: 				
			docker_registry: nvcr.io
			tasks: 
				1. augment
				2. bpnet
				3. classification
				4. dssd
				5. faster_rcnn
				6. emotionnet
				7. efficientdet
				8. fpenet
				9. gazenet
				10. gesturenet
				11. heartratenet
				12. lprnet
				13. mask_rcnn
				14. multitask_classification
				15. retinanet
				16. ssd
				17. unet
				18. yolo_v3
				19. yolo_v4
				20. yolo_v4_tiny
				21. converter
		v3.22.05-tf1.15.4-py3: 				
			docker_registry: nvcr.io
			tasks: 
				1. detectnet_v2
	nvidia/tao/tao-toolkit-pyt: 			
		v3.22.05-py3: 				
			docker_registry: nvcr.io
			tasks: 
				1. speech_to_text
				2. speech_to_text_citrinet
				3. speech_to_text_conformer
				4. action_recognition
				5. pointpillars
				6. pose_classification
				7. spectro_gen
				8. vocoder
		v3.21.11-py3: 				
			docker_registry: nvcr.io
			tasks: 
				1. text_classification
				2. question_answering
				3. token_classification
				4. intent_slot_classification
				5. punctuation_and_capitalization
	nvidia/tao/tao-toolkit-lm: 			
		v3.22.05-py3: 				
			docker_registry: nvcr.io
			tasks: 
				1. n_gram
format_version: 2.0
toolkit_version: 3.22.05
published_date: 05/25/2022

I have built and supplied the new libnvinfer_plugin.so (the TensorRT OSS plugin) as described in the FasterRCNN — TAO Toolkit 3.22.05 documentation,
and built and copied libnvds_infercustomparser_tao.so from GitHub - NVIDIA-AI-IOT/deepstream_tao_apps (sample apps to demonstrate how to deploy models trained with TAO on DeepStream), changing config_infer.txt accordingly.
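
For reference, the parser hookup in config_infer.txt now looks roughly like this (paths shortened to examples, and the parse function name is the one the faster_rcnn sample config in deepstream_tao_apps uses - double-check it against your checkout):

[property]
# custom bbox parser built from deepstream_tao_apps/post_processor (example path)
parse-bbox-func-name=NvDsInferParseCustomNMSTLT
custom-lib-path=/opt/deepstream_tao_apps/post_processor/libnvds_infercustomparser_tao.so
# output node the export log marked as output
output-blob-names=NMS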

However, when I run my app with the new .etlt and calibration cache, it fails like so:

0:00:00.222066423    80 0x7fb254002390 WARN                 nvinfer gstnvinfer.cpp:643:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1170> [UID = 1]: Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5
ERROR: [TRT]: 1: [stdArchiveReader.cpp::StdArchiveReader::30] Error Code 1: Serialization (Serialization assertion magicTagRead == kMAGIC_TAG failed.Magic tag does not match)
ERROR: [TRT]: 4: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:1528 Deserialize engine failed from file: /root/ds_app_config_files/detector_qat.etlt
0:00:01.734298285    80 0x7fb254002390 WARN                 nvinfer gstnvinfer.cpp:643:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1897> [UID = 1]: deserialize engine from file :/root/cameraModel/ds611_tao/detector_qat.etlt failed
0:00:01.782751325    80 0x7fb254002390 WARN                 nvinfer gstnvinfer.cpp:643:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: deserialize backend context from engine from file :/root/cameraModel/ds611_tao/detector_qat.etlt failed, try rebuild
0:00:01.782779680    80 0x7fb254002390 INFO                 nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1923> [UID = 1]: Trying to create engine from model files
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:860 failed to build network since there is no model file matched.
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:799 failed to build network.
0:00:02.723256048    80 0x7fb254002390 ERROR                nvinfer gstnvinfer.cpp:640:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1943> [UID = 1]: build engine file failed
0:00:02.742307267    80 0x7fb254002390 ERROR                nvinfer gstnvinfer.cpp:640:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2029> [UID = 1]: build backend context failed
0:00:02.742368341    80 0x7fb254002390 ERROR                nvinfer gstnvinfer.cpp:640:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1266> [UID = 1]: generate backend failed, check config file settings
0:00:02.742675571    80 0x7fb254002390 WARN                 nvinfer gstnvinfer.cpp:846:gst_nvinfer_start:<primary_gie> error: Failed to create NvDsInferContext instance
0:00:02.742686352    80 0x7fb254002390 WARN                 nvinfer gstnvinfer.cpp:846:gst_nvinfer_start:<primary_gie> error: Config file path: /root/ds_app_config_files/config_infer_tao.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
** ERROR: <main:1326>: Failed to set pipeline to PAUSED
Quitting
ERROR from primary_gie: Failed to create NvDsInferContext instance
Debug info: gstnvinfer.cpp(846): gst_nvinfer_start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie:
Config file path: /root/ds_app_config_files/config_infer_tao.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
App run failed

What is this magic tag, and how do I verify it? Did I miss something in the migration steps? Is my understanding of how to migrate a QAT model correct?

I am aware of this post with a similar issue, ERROR: [TRT] stdArchiveReader ... Serialization assertion - #10 by Morganh, but it does not seem to answer my problem.

Please comment out the model-engine-file line in your config file and retry.

Thanks @Morganh, I was able to run the model after your comment - it was a silly mistake after all.

To anyone else looking at this post: double-check how you provide the model in the inference config. model-engine-file is for, well, model.engine files, while to use an .etlt file you need to provide it under tlt-encoded-model :)
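
In other words, the model section of the config should look something like this sketch (key and paths are placeholders for your own):

[property]
# the .etlt goes here, together with its key and the QAT calibration cache
tlt-encoded-model=/root/ds_app_config_files/detector_qat.etlt
tlt-model-key=<your_key>
int8-calib-file=/root/ds_app_config_files/calibration_qat.bin
# 1 = INT8 precision
network-mode=1
# model-engine-file is only for an already-serialized TensorRT engine;
# leave it commented on the first run and DeepStream will build the engine itself
# model-engine-file=/root/ds_app_config_files/detector_qat.etlt_b1_gpu0_int8.engine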

Just a last set of questions:

  • I see quite a few warnings like Missing scale and zero-point for tensor ... - this is expected, since not all layers have int8 alternatives, correct?
  • Should users take any action regarding Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5, now that OpenCV is deprecated in DeepStream? Is it confirmed that this change does not affect the quality of detections?

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

  1. Yes.
  2. Yes, it is confirmed. Actually, this is a DeepStream topic; more info can be found in Application Migration to DeepStream 6.0 from DeepStream 5.X — DeepStream 6.0 Release documentation. For what the NMS clustering maps to in the nvinfer config, see the sketch below.
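
A sketch of the nvinfer settings that correspond to the values printed in that warning (tune them for your model):

[property]
# 2 = NMS clustering; the old cv::groupRectangles path (cluster-mode=0) is deprecated in DS 6.x
cluster-mode=2

[class-attrs-all]
nms-iou-threshold=0.5
topk=20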

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.