Deploy Object Detection TF-TRT INT8 with DS Triton

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) T4
• DeepStream Version DeepStream Triton via container deepstream:5.0.1-20.09-triton
• JetPack Version (valid for Jetson only)
• TensorRT Version TensorRT 7.0
• NVIDIA GPU Driver Version (valid for GPU only) 450.51.06
• Issue Type( questions, new requirements, bugs)

I need to deploy the TF-TRT INT8 optimized model faster_rcnn_inception_v2_coco_2018_01_28 using the DeepStream-Triton container. I am following this blog as an example: https://developer.nvidia.com/blog/deploying-models-from-tensorflow-model-zoo-using-deepstream-and-triton-inference-server/, but the referenced script doesn’t include an option to optimize the model as TF-TRT INT8.

What script is recommended to convert the model to TF-TRT INT8? I have used this script https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/object_detection/object_detection.py and am seeing performance degradation.

Hi @virsg,
DS-Triton doesn’t support building TF-TRT INT8 engines online; only FP32/FP16 are supported online.
However, DS-Triton does support offline prebuilt TF-TRT INT8 model files. That is, you can refer to https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html to build an INT8 saved model and pass this saved model to DS-Triton (dsnvinferserver).
Note that the current DS (DS5.x) only supports TF1.X.
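
As a rough illustration of the offline build described above, here is a minimal sketch using the TF 1.x TF-TRT converter. The function name, its parameters, and the default values are assumptions for illustration (the defaults mirror the batch size and segment size mentioned later in this thread), the tensor names are taken from the logs in this thread, and the calibration feed function must be supplied by the caller. TensorFlow is imported lazily, so the sketch only needs it when the function is actually called.

```python
# Output tensor names as they appear in this thread's conversion logs.
DETECTION_FETCH_NAMES = [
    "detection_boxes:0",
    "detection_scores:0",
    "num_detections:0",
    "detection_classes:0",
]


def build_int8_saved_model(input_dir, output_dir, feed_dict_fn,
                           fetch_names=None, max_batch_size=8,
                           minimum_segment_size=50, num_calib_runs=10):
    """Convert a TF1 SavedModel to a calibrated TF-TRT INT8 SavedModel.

    feed_dict_fn is a caller-supplied function returning a dict such as
    {"image_tensor:0": <uint8 batch>} for each calibration run.
    """
    # Lazy import: requires a TF 1.14+/1.15 build with TensorRT support.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    converter = trt.TrtGraphConverter(
        input_saved_model_dir=input_dir,
        precision_mode="INT8",
        max_batch_size=max_batch_size,
        minimum_segment_size=minimum_segment_size,
        use_calibration=True,
    )
    converter.convert()  # build the TF-TRT graph with calibration nodes
    converter.calibrate(
        fetch_names=fetch_names or DETECTION_FETCH_NAMES,
        num_runs=num_calib_runs,
        feed_dict_fn=feed_dict_fn,
    )
    converter.save(output_dir)  # write the calibrated INT8 SavedModel
    return output_dir
```

The saved directory would then be placed in the Triton model repository used by dsnvinferserver; the exact repository layout and config are not covered by this sketch.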

Hi @mchi, in fact I did create the offline prebuilt TF-TRT INT8 model and passed the saved model to DS-Triton (dsnvinferserver), but I am seeing performance degradation. To build the INT8 model I used the script https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/object_detection/object_detection.py, which implements the build step, together with the docker image nvcr.io/nvidia/tensorflow:20.02-tf2-py3 (since the script threw errors with TF1.X).

What do you think could be the reason for the performance degradation? Could it be building the model with TF2.X instead of TF1.X? See the deployment performance below with Streams=1, BS=4, Count instance=1:
TF FP32: 21 fps
TF-TRT FP16: 55 fps
TF-TRT INT8: 34 fps
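
To put these numbers side by side, the following small sketch computes the relative speedups from the three FPS figures above. The helper name `speedups` is made up for illustration; only the FPS values come from this post.

```python
# FPS figures reported above (Streams=1, BS=4, Count instance=1).
fps = {"TF FP32": 21.0, "TF-TRT FP16": 55.0, "TF-TRT INT8": 34.0}


def speedups(results, baseline):
    """Return each result divided by the baseline result."""
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


vs_fp32 = speedups(fps, "TF FP32")
vs_fp16 = speedups(fps, "TF-TRT FP16")

for name in fps:
    print(f"{name}: {vs_fp32[name]:.2f}x vs FP32, {vs_fp16[name]:.2f}x vs FP16")
```

With these figures, INT8 comes out at about 1.62x the FP32 throughput but only about 0.62x the FP16 throughput, so the degradation in question is relative to FP16.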

How is the performance if you just use TF-TRT to do the inference?

The script I am using, https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/object_detection/object_detection.py, reports this performance for TF-TRT INT8: images/sec: 45.

What script do you recommend to convert the object detection model to TF-TRT INT8 with NMS implementation?

The script should be fine.

I think a possible reason for INT8 being slower than FP32 is that, with INT8 on TRT and FP32 on TF, there are extra format conversions compared to running FP32 on both TRT and TF.

To dig out more clues about the perf difference, I suggest:

  1. Use TensorBoard to check whether the same layers run on TF and TRT for INT8 and FP32;
    you may also find this information in the verbose build log.
  2. Use Nsight Systems to profile the inference part and find out in detail why INT8 is slower than FP32.

Also, note:

  1. As the perf data in https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#verified-models shows, TF-TRT INT8 is not always faster than FP32.
  2. Current DeepStream only supports TF1.x.
  3. TRT supports NMS. Is it possible to convert your model to ONNX and run it with TRT?

Hi @mchi, I was able to optimize the faster_rcnn_inception_v2 model to TF-TRT INT8 with NMS enabled (ops placed on the CPU) using TF 1.5.2 and the script https://github.com/tensorflow/tensorrt/tree/r1.14+/tftrt/examples/object_detection. I got a performance improvement with NMS enabled vs. NMS disabled:
TF-TRT-INT8 (nms enabled): ~96FPS
TF-TRT-INT8 (no nms): ~43 FPS

The model was optimized with batch_size=8, image_shape=[600, 600], and minimum_segment_size=50. For the DS-Triton deployment, max_batch_size=8.

The issue now is that when deploying the model to DeepStream-Triton, I get the error below, "Input shape axis 0 must equal 8, got shape [5,600,1024,3]" (even though the model was optimized with BS=8):

I0112 01:06:22.313573 2643 model_repository_manager.cc:837] successfully loaded 'faster_rcnn_inception_v2' version 13
INFO: infer_trtis_backend.cpp:206 TrtISBackend id:1 initialized model: faster_rcnn_inception_v2
2021-01-12 01:06:36.202139: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for TRTEngineOp_0 input shapes: [[8,600,1024,3]]
2021-01-12 01:06:36.202311: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.7
2021-01-12 01:06:36.203128: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.7
2021-01-12 01:09:20.678239: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:37] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2021-01-12 01:09:20.709545: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for TRTEngineOp_1 input shapes: [[800,14,14,576]]
2021-01-12 01:10:01.273658: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:37] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles

Runtime commands:
        h: Print this help
        q: Quit

        p: Pause
        r: Resume


**PERF:  FPS 0 (Avg)
**PERF:  0.00 (0.00)
** INFO: <bus_callback:181>: Pipeline ready

** INFO: <bus_callback:167>: Pipeline running

ERROR: infer_trtis_server.cpp:276 TRTIS: failed to get response status, trtis_err_str:INTERNAL, err_msg:2 root error(s) found.
  (0) Invalid argument: Input shape axis 0 must equal 8, got shape [5,600,1024,3]
         [[{{node Preprocessor/unstack}}]]
  (1) Invalid argument: Input shape axis 0 must equal 8, got shape [5,600,1024,3]
         [[{{node Preprocessor/unstack}}]]
         [[ExpandDims_4/_199]]
0 successful operations.
0 derived errors ignored.
ERROR: infer_trtis_backend.cpp:532 TRTIS server failed to parse response with request-id:1 model:
0:03:46.539871495  2643 0x7f0cf80022a0 WARN           nvinferserver gstnvinferserver.cpp:519:gst_nvinfer_server_push_buffer:<primary_gie> error: inference failed with unique-id:1
ERROR from primary_gie: inference failed with unique-id:1
Debug info: gstnvinferserver.cpp(519): gst_nvinfer_server_push_buffer (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInferServer:primary_gie
Quitting
ERROR: infer_trtis_server.cpp:276 TRTIS: failed to get response status, trtis_err_str:INTERNAL, err_msg:2 root error(s) found.
  (0) Invalid argument: Input shape axis 0 must equal 8, got shape [5,600,1024,3]
         [[{{node Preprocessor/unstack}}]]
  (1) Invalid argument: Input shape axis 0 must equal 8, got shape [5,600,1024,3]
         [[{{node Preprocessor/unstack}}]]
         [[ExpandDims_4/_199]]
0 successful operations.
0 derived errors ignored.
ERROR: infer_trtis_backend.cpp:532 TRTIS server failed to parse response with request-id:2 model:
ERROR from qtdemux0: Internal data stream error.
Debug info: qtdemux.c(6073): gst_qtdemux_loop (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstBin:src_sub_bin0/GstURIDecodeBin:src_elem/GstDecodeBin:decodebin0/GstQTDemux:qtdemux0:
streaming stopped, reason custom-error (-112)
I0112 01:10:01.644682 2643 model_repository_manager.cc:708] unloading: faster_rcnn_inception_v2:13
I0112 01:10:01.917792 2643 model_repository_manager.cc:816] successfully unloaded 'faster_rcnn_inception_v2' version 13
I0112 01:10:01.918447 2643 server.cc:179] Waiting for in-flight inferences to complete.
I0112 01:10:01.918460 2643 server.cc:194] Timeout 30: Found 0 live models and 0 in-flight requests
App run failed

Any recommendations on how to fix the input shape issue?

Sorry for the delay!
I still don’t have clear clues about this issue; I will continue to check it.
By the way, reports of this error can also be found online.

Hi @mchi, it seems there is an issue with the TensorFlow Object Detection API producing incomplete input shapes when exporting the graph, reported at https://github.com/tensorflow/models/issues/6159
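
The failure mode in the log can be illustrated with a purely conceptual sketch: a graph whose Preprocessor/unstack step has the batch size fixed at 8 rejects a partial batch of 5, while padding the batch to 8 and dropping the padded results afterwards would satisfy it. The helper names below are hypothetical, the frames are stand-in strings rather than real tensors, and whether dsnvinferserver can be made to always deliver full batches is not established in this thread.

```python
FIXED_BATCH = 8  # batch size baked into the exported graph, per the error message


def unstack(batch, num):
    """Mimic an unstack with a static count: axis 0 must equal num."""
    if len(batch) != num:
        raise ValueError(
            f"Input shape axis 0 must equal {num}, got {len(batch)}")
    return list(batch)


def pad_batch(frames, batch_size, filler):
    """Pad a partial batch up to batch_size; also return the real frame count."""
    if len(frames) > batch_size:
        raise ValueError("batch is larger than the fixed batch size")
    padded = list(frames) + [filler] * (batch_size - len(frames))
    return padded, len(frames)


frames = [f"frame{i}" for i in range(5)]  # a partial batch, as in the log

try:
    unstack(frames, FIXED_BATCH)
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", err)

padded, real = pad_batch(frames, FIXED_BATCH, filler="blank")
outputs = unstack(padded, FIXED_BATCH)[:real]  # drop results for padding
print(len(padded), len(outputs))
```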

I need to optimize the model as INT8 with the NMS ops placed on the CPU and deploy it with DS-Triton. What do you recommend?

Hi @virsg,
Thanks for the info!
If it’s related to API compatibility, maybe it’s because you are using TF2.x while the current DS uses TF1.x.
So you need to use TF1.x to create the INT8 model.

Hi @mchi, sorry, but I ran the new tests using TF 1.5.2, and it seems there is an issue with the TensorFlow Object Detection API. What approach do you recommend to run the optimized model with native TRT?

In your model, is there any layer that TRT and the TRT plugins don’t support?

TRT supported layer - https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#layers-matrix
TRT plugins - https://github.com/NVIDIA/TensorRT/tree/master/plugin

The recommended way to run with native TRT is:

  1. Convert your model to ONNX - https://elinux.org/TensorRT/ONNX
  2. Deploy the ONNX model with TRT. ONNX is well supported by TRT.
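
One way to answer the unsupported-layer question before attempting a conversion is to list the op types in the frozen graph and compare them against a set of ops known to be supported. The sketch below assumes a TF 1.x frozen GraphDef; the function names are made up for illustration, and the small "supported" set is only a placeholder, not the real TRT support matrix, which has to be filled in from the links above.

```python
def load_op_types(frozen_graph_path):
    """Return the set of op types used by the nodes of a frozen GraphDef (.pb)."""
    import tensorflow as tf  # lazy import; only needed when reading a real graph

    graph_def = tf.compat.v1.GraphDef()
    with open(frozen_graph_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return {node.op for node in graph_def.node}


def find_unsupported(op_types, supported_ops):
    """Return the op types that are not in the supported set, sorted."""
    return sorted(set(op_types) - set(supported_ops))


# Placeholder data for illustration only; not the actual support matrix.
example_ops = {"Conv2D", "Relu", "TensorArrayGatherV3"}
example_supported = {"Conv2D", "Relu"}

missing = find_unsupported(example_ops, example_supported)
print(missing)
```

On a real model, `load_op_types("frozen_inference_graph.pb")` would supply the first argument. TF op names do not map one-to-one onto TRT layer names, so the result is a starting point for manual review rather than a definitive verdict.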

Hi @mchi, I am using the faster_rcnn_inception_v2_coco model from the TensorFlow 1 Detection Model Zoo. It has the layer TensorArrayGatherV3, which is included neither in the TRT supported layers matrix nor in the TRT plugins.

I have tried several methods with native TRT without success:
Method 1: Converting the model to ONNX, then converting the ONNX model to a TensorRT engine
1.1 Model converted to ONNX
1.2 Generating the TRT engine
$ trtexec --onnx=//faster_rcnn_inceptionv2_coco_updated_model_opset12.onnx --explicitBatch
Error:
Unsupported ONNX data type: UINT8 (2)
ERROR: image_tensor:0:189 In function importInput:
[8] Assertion failed: convertDtype(onnxDtype.elem_type(), &trtDtype)
[01/19/2021-14:52:42] [E] Failed to parse onnx file
[01/19/2021-14:52:42] [E] Parsing model failed
[01/19/2021-14:52:42] [E] Engine creation failed
[01/19/2021-14:52:42] [E] Engine set up failed

1.3 After applying a patch to the model to solve the Unsupported ONNX data type: UINT8 (2) issue, I got a new error:
Error:
While parsing node number 7 [Loop]:
ERROR: ModelImporter.cpp:92 In function parseGraph:
[8] Assertion failed: convertOnnxWeights(initializer, &weights, ctx)
[01/19/2021-20:35:59] [E] Failed to parse onnx file
[01/19/2021-20:35:59] [E] Parsing model failed
[01/19/2021-20:35:59] [E] Engine creation failed
[01/19/2021-20:35:59] [E] Engine set up failed
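
For reference, a commonly used patch for the UINT8 input error rewrites the ONNX graph inputs to float32 with ONNX GraphSurgeon. The post does not show which patch was actually applied, so the sketch below is an assumption; the function name and file paths are hypothetical. It requires the onnx, onnx_graphsurgeon, and numpy packages, which are imported lazily.

```python
def cast_uint8_inputs_to_float32(src_path, dst_path):
    """Rewrite UINT8 graph inputs of an ONNX model as float32 and save a copy."""
    # Lazy imports: these packages are only needed when the function runs.
    import numpy as np
    import onnx
    import onnx_graphsurgeon as gs

    graph = gs.import_onnx(onnx.load(src_path))
    for inp in graph.inputs:
        if inp.dtype == np.uint8:
            inp.dtype = np.float32  # the TRT ONNX parser rejected UINT8 inputs
    onnx.save(gs.export_onnx(graph), dst_path)
    return dst_path
```

This only addresses the input data type; it does not help with the Loop node failure shown above.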

Method 2: Converting the model to UFF, then running the model with TRT
2.1 Converting the model to UFF format:
$ python3 /usr/lib/python3.6/dist-packages/uff/bin/convert_to_uff.py /faster_rcnn_inception_v2_coco_2018_01_28/frozen_inference_graph.pb -o faster_rcnn_inception_v2_coco.uff
Error:
Using output node detection_boxes
Using output node detection_scores
Using output node num_detections
Using output node detection_classes
Converting to UFF graph
Warning: No conversion function registered for layer: TensorArrayGatherV3 yet.

Converting Preprocessor/map/while/TensorArrayReadV3/Enter as custom op: Enter
Traceback (most recent call last):
  File "/usr/lib/python3.6/dist-packages/uff/bin/convert_to_uff.py", line 96, in <module>
    main()
  File "/usr/lib/python3.6/dist-packages/uff/bin/convert_to_uff.py", line 92, in main
    debug_mode=args.debug
  File "/usr/lib/python3.6/dist-packages/uff/bin/../../uff/converters/tensorflow/conversion_helpers.py", line 229, in from_tensorflow_frozen_model
    return from_tensorflow(graphdef, output_nodes, preprocessor, **kwargs)
  File "/usr/lib/python3.6/dist-packages/uff/bin/../../uff/converters/tensorflow/conversion_helpers.py", line 178, in from_tensorflow
    debug_mode=debug_mode)
  File "/usr/lib/python3.6/dist-packages/uff/bin/../../uff/converters/tensorflow/converter.py", line 94, in convert_tf2uff_graph
    uff_graph, input_replacements, debug_mode=debug_mode)
  File "/usr/lib/python3.6/dist-packages/uff/bin/../../uff/converters/tensorflow/converter.py", line 79, in convert_tf2uff_node
    op, name, tf_node, inputs, uff_graph, tf_nodes=tf_nodes, debug_mode=debug_mode)
  File "/usr/lib/python3.6/dist-packages/uff/bin/../../uff/converters/tensorflow/converter.py", line 41, in convert_layer
    fields = cls.parse_tf_attrs(tf_node.attr)
  File "/usr/lib/python3.6/dist-packages/uff/bin/../../uff/converters/tensorflow/converter.py", line 222, in parse_tf_attrs
    return {key: cls.parse_tf_attr_value(val) for key, val in attrs.items() if val is not None and val.WhichOneof('value') is not None}
  File "/usr/lib/python3.6/dist-packages/uff/bin/../../uff/converters/tensorflow/converter.py", line 222, in <dictcomp>
    return {key: cls.parse_tf_attr_value(val) for key, val in attrs.items() if val is not None and val.WhichOneof('value') is not None}
  File "/usr/lib/python3.6/dist-packages/uff/bin/../../uff/converters/tensorflow/converter.py", line 218, in parse_tf_attr_value
    return cls.convert_tf2uff_field(code, val)
  File "/usr/lib/python3.6/dist-packages/uff/bin/../../uff/converters/tensorflow/converter.py", line 190, in convert_tf2uff_field
    return TensorFlowToUFFConverter.convert_tf2numpy_dtype(val)
  File "/usr/lib/python3.6/dist-packages/uff/bin/../../uff/converters/tensorflow/converter.py", line 103, in convert_tf2numpy_dtype
    return tf.as_dtype(dtype).as_numpy_dtype
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py", line 126, in as_numpy_dtype
    return _TF_TO_NP[self._type_enum]
KeyError: 20

What possible solution can I apply? Should I use another model? Which pre-trained object detection model do you recommend for optimization as TRT INT8 with NMS ops placed on the CPU and deployment with DS-Triton? I was following this blog https://developer.nvidia.com/blog/deploying-models-from-tensorflow-model-zoo-using-deepstream-and-triton-inference-server/ as an example, but for some reason it doesn’t include INT8 precision with NMS ops placed on the CPU.

Confirmed internally: this tf.op is not supported by TRT.
If you do not implement it as a TRT plugin or replace it with another op that TRT supports, you have to run the model with Triton.

Thanks!

Hi @mchi, correct, I don’t see the tf.op TensorArrayGatherV3 in the TRT plugins list. Does implementing it as a TRT plugin mean writing the code from scratch, like the other plugins at https://github.com/NVIDIA/TensorRT/tree/master/plugin?

Yes.