Deploy Object Detection TF-TRT INT8 with DS Triton

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) T4
• DeepStream Version DeepStream Triton via container deepstream:5.0.1-20.09-triton
• JetPack Version (valid for Jetson only)
• TensorRT Version TensorRT 7.0
• NVIDIA GPU Driver Version (valid for GPU only) 450.51.06
• Issue Type( questions, new requirements, bugs)

I need to deploy the optimized TF-TRT INT8 model faster_rcnn_inception_v2_coco_2018_01_28 using the DeepStream-Triton container. I am using this blog as an example: https://developer.nvidia.com/blog/deploying-models-from-tensorflow-model-zoo-using-deepstream-and-triton-inference-server/, but the referenced script doesn't include an option to optimize the model as TF-TRT INT8.

What script is recommended to convert the model to TF-TRT INT8? I have used this script https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/object_detection/object_detection.py and am seeing performance degradation.

Hi @virsg,
DS-Triton doesn't support online TF-TRT INT8 builds; only FP32/FP16 are supported.
But DS-Triton can use offline prebuilt TF-TRT INT8 model files. That is, you can refer to https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html to build an INT8 SavedModel and pass that SavedModel to DS-Triton (dsnvinferserver).
Note, current DS (DS 5.x) only supports TF 1.x.
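
For reference, a rough TF 1.x sketch of such an offline INT8 build (the paths, the calibration feed, and the tensor names are placeholders based on the standard TF Object Detection API exports; see the TF-TRT user guide linked above for details):

import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Build a TF-TRT INT8 SavedModel offline, then point the Triton model
# repository used by DS-Triton at the saved output directory.
converter = trt.TrtGraphConverter(
    input_saved_model_dir="faster_rcnn_inception_v2/1",        # placeholder path
    precision_mode="INT8",
    max_batch_size=8,
    maximum_cached_engines=1,
    use_calibration=True)
converter.convert()

def feed_dict_fn():
    # Replace the random data with real, representative frames for calibration.
    batch = np.random.randint(0, 255, (8, 600, 1024, 3), dtype=np.uint8)
    return {"image_tensor:0": batch}

converter.calibrate(
    fetch_names=["detection_boxes:0", "detection_scores:0",
                 "detection_classes:0", "num_detections:0"],
    num_runs=10,
    feed_dict_fn=feed_dict_fn)
converter.save("faster_rcnn_inception_v2_trt_int8/1")          # placeholder path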

Hi @mchi, in fact I created the offline prebuilt TF-TRT INT8 model and passed the saved model to DS-Triton (dsnvinferserver), but I am seeing performance degradation. To build the INT8 model I used this script, https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/object_detection/object_detection.py, which implements the build step, together with the docker image nvcr.io/nvidia/tensorflow:20.02-tf2-py3 (since with TF 1.x the script threw errors).

What do you think could be the reason for the performance degradation? Building the model with TF 2.x instead of TF 1.x? See the deployment performance below, with Streams=1, BS=4, Count instance=1:
TF FP32: 21 fps
TF-TRT FP16: 55 fps
TF-TRT INT8: 34 fps

How about the perf if you just use TF-TRT to do the inference?

The script I am using, https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/object_detection/object_detection.py, shows this performance for TF-TRT INT8: images/sec: 45.

What script do you recommend to convert the object detection model to TF-TRT INT8 with NMS implementation?

The script should be fine.

I think the possible reason INT8 is slower than FP32 is that, with INT8 running on TRT and FP32 on TF, there are extra format conversions compared with running FP32 on both TRT and TF.

To dig out more clues about the perf difference, I think you could:

  1. use TensorBoard to check whether the same layers are running on TF and on TRT for INT8 and FP32; you may also find that information in the verbose build log (see the sketch after this list),
  2. use Nsight Systems to profile the inference part and find out in detail where INT8 is slower than FP32.
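
For the first point, a quick sketch (assuming TF 1.x SavedModels tagged with "serve"; the paths are placeholders) to compare how many segments actually became TRTEngineOp nodes in the INT8 build vs. the FP32 build:

import collections
import tensorflow as tf

def summarize(saved_model_dir):
    # Load the converted SavedModel and count TRTEngineOp segments vs. nodes left on TF.
    with tf.compat.v1.Session(graph=tf.Graph()) as sess:
        meta_graph = tf.compat.v1.saved_model.loader.load(sess, ["serve"], saved_model_dir)
    ops = collections.Counter(node.op for node in meta_graph.graph_def.node)
    print(saved_model_dir,
          "- TRTEngineOp segments:", ops["TRTEngineOp"],
          "- other TF nodes:", sum(ops.values()) - ops["TRTEngineOp"])

summarize("faster_rcnn_trt_fp32/1")   # placeholder paths
summarize("faster_rcnn_trt_int8/1")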

And, note

  1. As the perf data in https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#verified-models shows, TF-TRT INT8 is not always faster than FP32.
  2. Current DeepStream only supports TF 1.x.
  3. TRT supports NMS; is it possible to convert your model to ONNX and run it with TRT?

Hi @mchi, I was able to optimize the faster_rcnn_inception_v2 model to TF-TRT INT8 with NMS enabled (ops placed on the CPU) using TF 1.5.2 and the script https://github.com/tensorflow/tensorrt/tree/r1.14+/tftrt/examples/object_detection. So I got a performance improvement with NMS enabled vs. NMS disabled:
TF-TRT INT8 (NMS enabled): ~96 FPS
TF-TRT INT8 (no NMS): ~43 FPS

The model was optimized with batch_size=8, image_shape=[600, 600], and minimum_segment_size=50. For the DS-Triton deployment, max_batch_size=8.
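
For reference, a minimal Triton config.pbtxt sketch for that deployment with max_batch_size=8; the tensor names follow the standard TF Object Detection API exports, and the data types and dims are assumptions to adapt to your exported graph:

name: "faster_rcnn_inception_v2"
platform: "tensorflow_savedmodel"
max_batch_size: 8
input [
  {
    name: "image_tensor"
    data_type: TYPE_UINT8
    format: FORMAT_NHWC
    dims: [ 600, 1024, 3 ]   # assumed; match what DeepStream feeds the model
  }
]
output [
  {
    name: "detection_boxes"
    data_type: TYPE_FP32
    dims: [ 100, 4 ]
  },
  {
    name: "detection_scores"
    data_type: TYPE_FP32
    dims: [ 100 ]
  },
  {
    name: "detection_classes"
    data_type: TYPE_FP32
    dims: [ 100 ]
  },
  {
    name: "num_detections"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]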

The issue now is that when deploying the model to DeepStream-Triton, I get the error below, Input shape axis 0 must equal 8, got shape [5,600,1024,3] (even though the model was optimized with BS=8):

I0112 01:06:22.313573 2643 model_repository_manager.cc:837] successfully loaded 'faster_rcnn_inception_v2' version 13
INFO: infer_trtis_backend.cpp:206 TrtISBackend id:1 initialized model: faster_rcnn_inception_v2
2021-01-12 01:06:36.202139: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for TRTEngineOp_0 input shapes: [[8,600,1024,3]]
2021-01-12 01:06:36.202311: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.7
2021-01-12 01:06:36.203128: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.7
2021-01-12 01:09:20.678239: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:37] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2021-01-12 01:09:20.709545: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for TRTEngineOp_1 input shapes: [[800,14,14,576]]
2021-01-12 01:10:01.273658: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:37] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles

Runtime commands:
        h: Print this help
        q: Quit

        p: Pause
        r: Resume


**PERF:  FPS 0 (Avg)
**PERF:  0.00 (0.00)
** INFO: <bus_callback:181>: Pipeline ready

** INFO: <bus_callback:167>: Pipeline running

ERROR: infer_trtis_server.cpp:276 TRTIS: failed to get response status, trtis_err_str:INTERNAL, err_msg:2 root error(s) found.
  (0) Invalid argument: Input shape axis 0 must equal 8, got shape [5,600,1024,3]
         [[{{node Preprocessor/unstack}}]]
  (1) Invalid argument: Input shape axis 0 must equal 8, got shape [5,600,1024,3]
         [[{{node Preprocessor/unstack}}]]
         [[ExpandDims_4/_199]]
0 successful operations.
0 derived errors ignored.
ERROR: infer_trtis_backend.cpp:532 TRTIS server failed to parse response with request-id:1 model:
0:03:46.539871495  2643 0x7f0cf80022a0 WARN           nvinferserver gstnvinferserver.cpp:519:gst_nvinfer_server_push_buffer:<primary_gie> error: inference failed with unique-id:1
ERROR from primary_gie: inference failed with unique-id:1
Debug info: gstnvinferserver.cpp(519): gst_nvinfer_server_push_buffer (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInferServer:primary_gie
Quitting
ERROR: infer_trtis_server.cpp:276 TRTIS: failed to get response status, trtis_err_str:INTERNAL, err_msg:2 root error(s) found.
  (0) Invalid argument: Input shape axis 0 must equal 8, got shape [5,600,1024,3]
         [[{{node Preprocessor/unstack}}]]
  (1) Invalid argument: Input shape axis 0 must equal 8, got shape [5,600,1024,3]
         [[{{node Preprocessor/unstack}}]]
         [[ExpandDims_4/_199]]
0 successful operations.
0 derived errors ignored.
ERROR: infer_trtis_backend.cpp:532 TRTIS server failed to parse response with request-id:2 model:
ERROR from qtdemux0: Internal data stream error.
Debug info: qtdemux.c(6073): gst_qtdemux_loop (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstBin:src_sub_bin0/GstURIDecodeBin:src_elem/GstDecodeBin:decodebin0/GstQTDemux:qtdemux0:
streaming stopped, reason custom-error (-112)
I0112 01:10:01.644682 2643 model_repository_manager.cc:708] unloading: faster_rcnn_inception_v2:13
I0112 01:10:01.917792 2643 model_repository_manager.cc:816] successfully unloaded 'faster_rcnn_inception_v2' version 13
I0112 01:10:01.918447 2643 server.cc:179] Waiting for in-flight inferences to complete.
I0112 01:10:01.918460 2643 server.cc:194] Timeout 30: Found 0 live models and 0 in-flight requests
App run failed

Any recommendation on how to fix the input shape issue?

Sorry for the delay!
I still don't have clear clues about this issue; I will continue to look into it.
BTW, this error can also be found reported elsewhere on the web.

Hi @mchi, it seems there is an issue with the TensorFlow Object Detection API producing incomplete input shapes when exporting the graph; reported issue at https://github.com/tensorflow/models/issues/6159.
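
For example, a small sketch to check what input shape the exported graph actually carries (frozen_inference_graph.pb is a placeholder for your own export; the same check works on the SavedModel's graph_def):

import tensorflow as tf

# Print the shape attribute of every Placeholder to see whether the batch
# dimension (and the spatial dims) were frozen during export.
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("frozen_inference_graph.pb", "rb") as f:   # placeholder path
    graph_def.ParseFromString(f.read())

for node in graph_def.node:
    if node.op == "Placeholder":
        print(node.name, node.attr["shape"].shape)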

I need to optimize the model as INT8 with the NMS ops placed on the CPU and deploy it with DS-Triton; what do you recommend?