DeepStream, Tensorflow Model Zoo - Incompatibility

I am having difficulty training with the TensorFlow Object Detection API and deploying directly to DeepStream because of the input data type of TensorFlow’s models.

Jetson TX1
DeepStream 5.0
JetPack 4.4
TensorRT 7

• Issue Type:
Compatibility between Tensorflow 2.0 model zoo and DeepStream.

• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

  1. Download a Model From Here.
    https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md

  2. Train Model or use Pre-Trained

  3. Convert to ONNX using tf2onnx

  4. Load model into trtexec.

Unsupported ONNX data type: UINT8
ERROR: input_tensor:0:188 in function importInput:

[E] Engine set up failed
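
Steps 3 and 4 can be sketched as two command lines built from Python. This is only a rough sketch, not the exact commands I ran: the paths and the opset value are placeholders, and it assumes tf2onnx is installed and trtexec is on the PATH. `--saved-model`, `--output` and `--opset` are tf2onnx.convert options; `--onnx` and `--saveEngine` are the trtexec options that also appear in the logs further down.

```python
import shlex


def tf2onnx_command(saved_model_dir, onnx_path, opset=11):
    """Build the tf2onnx command for step 3 (paths and opset are placeholders)."""
    return [
        "python3", "-m", "tf2onnx.convert",
        "--saved-model", saved_model_dir,
        "--output", onnx_path,
        "--opset", str(opset),
    ]


def trtexec_command(onnx_path, engine_path, trtexec="trtexec"):
    """Build the trtexec command for step 4."""
    return [trtexec, f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]


if __name__ == "__main__":
    # Hypothetical paths; replace them with your own export directory.
    commands = (
        tf2onnx_command("exported/saved_model", "model.onnx"),
        trtexec_command("model.onnx", "engine.trt"),
    )
    for cmd in commands:
        # Only print here; pass cmd to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```

Keeping each command as an argument list avoids shell quoting problems with paths that contain spaces.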

Hi,

Your ONNX model uses the UINT8 format for its image input, as the error message shows.
However, TensorRT needs a floating-point buffer as its input.
(DeepStream uses TensorRT as its backend inference engine.)

To handle this incompatibility, please modify the model with the ONNX GraphSurgeon API.
Please check the following comment for the detailed steps:

Thanks.
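
For readers following along, the input retyping suggested above might look roughly like the sketch below. It is not the code from the linked comment. It assumes the `onnx`, `onnx_graphsurgeon` and `numpy` packages are installed and that onnx-graphsurgeon reports input dtypes as NumPy dtypes. The function names `retype_uint8_inputs` and `patch_model` are made up for illustration.

```python
from types import SimpleNamespace


def retype_uint8_inputs(graph, new_dtype):
    """Switch every uint8 graph input to new_dtype and return the names changed."""
    changed = []
    for tensor in graph.inputs:
        # NumPy dtypes expose .name; fall back to str() for anything else.
        if getattr(tensor.dtype, "name", str(tensor.dtype)) == "uint8":
            tensor.dtype = new_dtype
            changed.append(tensor.name)
    return changed


def patch_model(src_path, dst_path):
    """Load an ONNX file, retype its uint8 inputs to float32, and save a copy."""
    # Imported lazily so the helper above can be tried without these packages.
    import numpy as np
    import onnx
    import onnx_graphsurgeon as gs

    graph = gs.import_onnx(onnx.load(src_path))
    changed = retype_uint8_inputs(graph, np.float32)
    onnx.save(gs.export_onnx(graph), dst_path)
    return changed


if __name__ == "__main__":
    # Stand-in graph for a dry run; a real graph comes from gs.import_onnx().
    fake = SimpleNamespace(
        inputs=[SimpleNamespace(name="input_tensor:0", dtype="uint8")]
    )
    print(retype_uint8_inputs(fake, "float32"))
```

Retyping the input only changes what the parser sees. The application feeding the engine still has to supply float data in the value range the model was trained on.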

Thank you. Is Triton with DeepStream also a viable option?

I followed your instructions on ONNX-GraphSurgeon. This is my trtexec output.

[01/07/2021-14:03:44] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/07/2021-14:03:44] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[01/07/2021-14:03:44] [I] [TRT] ModelImporter.cpp:135: No importer registered for op: NonMaxSuppression. Attempting to import as plugin.
[01/07/2021-14:03:44] [I] [TRT] builtin_op_importers.cpp:3659: Searching for plugin: NonMaxSuppression, plugin_version: 1, plugin_namespace:
[01/07/2021-14:03:44] [E] [TRT] INVALID_ARGUMENT: getPluginCreator could not find plugin NonMaxSuppression version 1
ERROR: builtin_op_importers.cpp:3661 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
[01/07/2021-14:03:44] [E] Failed to parse onnx file
[01/07/2021-14:03:44] [E] Parsing model failed
[01/07/2021-14:03:44] [E] Engine creation failed
[01/07/2021-14:03:44] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # ./trtexec --onnx=/media/AF68-D504/Tf2TRT/SSD_trained_alpha_trt/updated_SSD_tf2.onnx --saveEngine=/media/AF68-D504/Tf2TRT/SSD_train_alpha_trt/engine.trt

Error for Faster RCNN after onnx-graphsurgeon

[01/07/2021-14:12:51] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/07/2021-14:12:51] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[01/07/2021-14:12:51] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
ERROR: builtin_op_importers.cpp:1554 In function importIf:
[8] Assertion failed: cond.is_weights() && cond.weights().count() == 1 && "If condition must be a initializer!"
[01/07/2021-14:12:51] [E] Failed to parse onnx file
[01/07/2021-14:12:51] [E] Parsing model failed
[01/07/2021-14:12:51] [E] Engine creation failed
[01/07/2021-14:12:51] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # ./trtexec --onnx=/media/AF68-D504/Tf2TRT/RCNN_trained_alpha_trt/updated_RCNN.onnx --saveEngine=/media/AF68-D504/Tf2TRT/RCNN_trained_alpha_trt/engine.trt
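
Since these logs are long, a small script can pull out the parts that matter. The sketch below uses only the standard library and matches the message formats quoted above; other TensorRT versions may word these messages differently, so treat the patterns as assumptions. The function name `summarize_trtexec_log` is made up for illustration.

```python
import re

# Patterns taken from the trtexec messages quoted above.
MISSING_PLUGIN = re.compile(r"could not find plugin (\w+) version (\d+)")
FAILING_IMPORTER = re.compile(r"in function (\w+):", re.IGNORECASE)


def summarize_trtexec_log(text):
    """Return the missing plugin names and failing importer functions in a log."""
    plugins = {match.group(1) for match in MISSING_PLUGIN.finditer(text)}
    importers = set(FAILING_IMPORTER.findall(text))
    return {
        "missing_plugins": sorted(plugins),
        "failing_importers": sorted(importers),
    }


if __name__ == "__main__":
    sample = (
        "[E] [TRT] INVALID_ARGUMENT: getPluginCreator could not find plugin "
        "NonMaxSuppression version 1\n"
        "ERROR: builtin_op_importers.cpp:3661 In function "
        "importFallbackPluginImporter:\n"
        "ERROR: builtin_op_importers.cpp:1554 In function importIf:\n"
    )
    print(summarize_trtexec_log(sample))
```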

Hi,

The log for updated_SSD_tf2.onnx indicates that the NonMaxSuppression layer is not supported by TensorRT.
There has been some discussion of this layer; it needs to be implemented as a plugin layer.

The latter error seems to hit a validation check in the ONNX parser,
which has some limitations when the ‘If’ layer is used.

Please note that TensorRT does not support all the layers used in TensorFlow.
You can check our documentation for the latest support matrix:
https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html
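
One way to compare a model against that matrix is to list the op types it contains. The sketch below is illustrative only: the `supported` set is a placeholder that has to be filled in by hand from the support matrix, `load_nodes` assumes the `onnx` package is installed, and it only reads top-level nodes, so ops nested inside If or Loop subgraphs are not counted.

```python
from collections import Counter
from types import SimpleNamespace


def count_op_types(nodes):
    """Count how often each op_type occurs in an iterable of ONNX nodes."""
    return Counter(node.op_type for node in nodes)


def unsupported_ops(nodes, supported):
    """Return the op types found in nodes that are absent from supported."""
    return sorted(set(count_op_types(nodes)) - set(supported))


def load_nodes(onnx_path):
    """Return the top-level nodes of an ONNX file."""
    import onnx  # lazy import: only needed when reading a real model

    return list(onnx.load(onnx_path).graph.node)


if __name__ == "__main__":
    # Stand-in nodes for a dry run; use load_nodes("model.onnx") for a real file.
    nodes = [
        SimpleNamespace(op_type="Conv"),
        SimpleNamespace(op_type="Conv"),
        SimpleNamespace(op_type="NonMaxSuppression"),
    ]
    supported = {"Conv", "Relu"}  # placeholder, not the real support matrix
    print(count_op_types(nodes))
    print(unsupported_ops(nodes, supported))
```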

Thanks.

@AastaLLL

I appreciate your reply. I have done some digging and found some helpful notebooks and examples on what kinds of TensorFlow models are supported. I am working through this notebook as a starting point for understanding how to handle this problem.

Notebook mentioned: https://colab.research.google.com/drive/10ah6t0I2-MV_3uPqw6J_WhMHlfLflrr8

Particularly the section on replacing nodes using graphsurgeon so that the network can be parsed with the NMS plugin.

I fully intend to make a forum post about the resources I have found for porting TensorFlow models to TensorRT.

@AastaLLL Do you know which APIs of TensorFlow will support the operations found in these links:

Open Source Sample SSD UFF: https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleUffSSD#tensorrt-api-layers-and-ops

Notebook mentioned: https://colab.research.google.com/drive/10ah6t0I2-MV_3uPqw6J_WhMHlfLflrr8

As in, which version of TensorFlow would I have to check out to be able to use them? I found your GitHub repository: https://github.com/AastaNV/TRT_object_detection

I have looked at some of the recent TensorFlow model configs and found that they are not too different from the 2018 SSD_Mobilenet that is used in the colab notebook.

I think it would help if someone could figure out how to use graphsurgeon like this (from the Colab notebook):

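import ctypes

# numpy and tensorflow are used below (np.float32, tf.float32) and must be imported.
import numpy as np
import tensorflow as tf
import uff
import tensorrt as trt
import graphsurgeon as gs
import pycuda.driver as cuda
import pycuda.autoinit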

ctypes.CDLL("build/libflattenconcat.so")

# Preprocess function to convert TF model to UFF
def ssd_mobilenet_v2_unsupported_nodes_to_plugin_nodes(ssd_graph, input_shape):
    """Makes ssd_graph TensorRT compatible using graphsurgeon.

    This function takes ssd_graph, which contains graphsurgeon
    DynamicGraph data structure. This structure describes frozen Tensorflow
    graph, that can be modified using graphsurgeon (by deleting, adding,
    replacing certain nodes). The graph is modified by removing
    Tensorflow operations that are not supported by TensorRT's UffParser
    and replacing them with custom layer plugin nodes.

    Note: This specific implementation works only for
    ssd_mobilenet_v2_coco_2018_03_29 network.

    Args:
        ssd_graph (gs.DynamicGraph): graph to convert
        input_shape: input shape in CHW format
    Returns:
        gs.DynamicGraph: UffParser compatible SSD graph
    """

    channels, height, width = input_shape

    Input = gs.create_plugin_node(name="Input",
        op="Placeholder",
        dtype=tf.float32,
        shape=[1, channels, height, width])
    PriorBox = gs.create_plugin_node(name="GridAnchor", op="GridAnchor_TRT",
        minSize=0.2,
        maxSize=0.95,
        aspectRatios=[1.0, 2.0, 0.5, 3.0, 0.33],
        variance=[0.1,0.1,0.2,0.2],
        featureMapShapes=[19, 10, 5, 3, 2, 1],
        numLayers=6
    )
    NMS = gs.create_plugin_node(
        name="NMS",
        op="NMS_TRT",
        shareLocation=1,
        varianceEncodedInTarget=0,
        backgroundLabelId=0,
        confidenceThreshold=1e-8,
        nmsThreshold=0.6,
        topK=100,
        keepTopK=100,
        numClasses=91,
        inputOrder=[1, 0, 2],
        confSigmoid=1,
        isNormalized=1
    )
    concat_priorbox = gs.create_node(
        "concat_priorbox",
        op="ConcatV2",
        dtype=tf.float32,
        axis=2
    )
    concat_box_loc = gs.create_plugin_node(
        "concat_box_loc",
        op="FlattenConcat_TRT",
        dtype=tf.float32,
        axis=1,
        ignoreBatch=0
    )
    concat_box_conf = gs.create_plugin_node(
        "concat_box_conf",
        op="FlattenConcat_TRT",
        dtype=tf.float32,
        axis=1,
        ignoreBatch=0
    )

    # Create a mapping of namespace names -> plugin nodes.
    namespace_plugin_map = {
        "MultipleGridAnchorGenerator": PriorBox,
        "Postprocessor": NMS,
        "Preprocessor/map": Input,
        "ToFloat": Input,
        # "image_tensor": Input,
        "Concatenate": concat_priorbox,
        "concat": concat_box_loc,
        "concat_1": concat_box_conf
    }
    for node in ssd_graph.graph_inputs:
        namespace_plugin_map[node.name] = Input

    # Create a new graph by collapsing namespaces
    ssd_graph.collapse_namespaces(namespace_plugin_map)
    # Remove the outputs, so we just have a single output node (NMS).
    # If remove_exclusive_dependencies is True, the whole graph will be removed!
    ssd_graph.remove(ssd_graph.graph_outputs, remove_exclusive_dependencies=False)
    # Disconnect the Input node from NMS, as it expects to have only 3 inputs.
    ssd_graph.find_nodes_by_op("NMS_TRT")[0].input.remove("Input")
    
    return ssd_graph

  
# Simple helper data class that's a little nicer to use than a 2-tuple.
class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()
      
      
def allocate_buffers(engine):
    """Allocates host and device buffer for TRT engine inference.

    This function is similar to the one in ../../common.py, but
    converts network outputs (which are np.float32) appropriately
    before writing them to Python buffer. This is needed, since
    TensorRT plugins don't support output type description, and
    in our particular case, we use NMS plugin as network output.

    Args:
        engine (trt.ICudaEngine): TensorRT engine

    Returns:
        inputs [HostDeviceMem]: engine input memory
        outputs [HostDeviceMem]: engine output memory
        bindings [int]: buffer to device bindings
        stream (cuda.Stream): cuda stream for engine inference synchronization
    """
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()

    # Current NMS implementation in TRT only supports DataType.FLOAT but
    # it may change in the future, which could break this sample here
    # when using lower precision [e.g. NMS output would not be np.float32
    # anymore, even though this is assumed in binding_to_type]
    binding_to_type = {"Input": np.float32, "NMS": np.float32, "NMS_1": np.int32}

    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = binding_to_type[str(binding)]
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream
  
# Export UFF model file
ssd_mobilenet_v2_pb_path = "ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.pb"
output_uff_filename = "ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.uff"
input_shape = (3, 300, 300)

dynamic_graph = gs.DynamicGraph(ssd_mobilenet_v2_pb_path)
dynamic_graph = ssd_mobilenet_v2_unsupported_nodes_to_plugin_nodes(dynamic_graph, input_shape)

uff.from_tensorflow(dynamic_graph.as_graph_def(), output_nodes=["NMS"], output_filename=output_uff_filename)

If someone could figure out how to do this for the recent TensorFlow Model Zoo and make it available for a handful of networks, there would be huge progress in opening up compatibility between TensorFlow 2 and TensorRT.

The main issue is: how would I get a TensorFlow 2 saved_model.pb into a graph?

Anyway, thanks for the response @AastaLLL. Looking forward to what kind of information or help you can give!

Hi,

If you use TensorFlow 1.15, the workflow usually looks like this: .pb -> .uff -> .engine
So you can find the supported operations in the ‘TensorFlow’ section below:

If you use TensorFlow 2.x, you will need to go through .pb -> .onnx -> .engine,
since the UFF parser doesn’t support TF 2.x.

Then you can find the .pb -> .onnx support matrix here.
And the .onnx -> .engine support matrix is in the ‘ONNX’ section below:
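
The choice between the two routes can be written down as a small lookup. This is only a sketch of the rule described above; the function name `conversion_path` is made up for illustration and it knows nothing beyond the TensorFlow major version.

```python
def conversion_path(tf_version):
    """Return the file-format chain described above for a TensorFlow version."""
    major = int(tf_version.split(".")[0])
    if major == 1:
        # TensorFlow 1.x: frozen graph -> UFF parser -> TensorRT engine
        return [".pb", ".uff", ".engine"]
    if major == 2:
        # TensorFlow 2.x: the UFF parser is not an option, so go through ONNX
        return [".pb", ".onnx", ".engine"]
    raise ValueError(f"no conversion path known for TensorFlow {tf_version}")


if __name__ == "__main__":
    for version in ("1.15", "2.4"):
        print(version, " -> ".join(conversion_path(version)))
```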

Thanks.

@AastaLLL

Thank you very much. It appears that there are a lot of options for compatibility between TensorFlow and TensorRT. I keep searching for beginner-friendly ways of training TensorFlow 2 models with the TensorFlow 2 API and then deploying them to TensorRT. Your responses are helpful.

In TensorFlow 1:
e.g. https://jkjung-avt.github.io/ for UFF
(he appears to use 1.12 or 1.8)

Mr. Jung is kind enough to give us code showing how to use graphsurgeon to modify models so that they are compatible with the TensorRT engine generator. Are there any samples of using onnx-graphsurgeon to modify the ops of a TensorFlow 2 model exported as ONNX?

As for other routes to compatibility, I have read that DeepStream with Triton can load and run inference on .pb model files. Does that use the same engine builder as trtexec and onnx2trt? Or is a pre-trained TensorFlow 2 model deployable on Triton directly from .pb? That is: select model -> retrain with the TensorFlow 2 API -> DeepStream-Triton. The forum post I read on Triton suggested that TensorFlow ops are compatible with Triton. I have had some errors when attempting to retrain and deploy the example model mentioned in that forum post.

For training and deploying Tensorflow 2 right now I am looking at:
Monk Object Detection API
Easing up the Process of Tensorflow 2.0 Object Detection API and TensorRT

The Monk author appears to have discovered how to properly train TensorFlow 1 and 2 models for deployment to TensorRT. I am still exploring whether Monk Object Detection is a viable path, though.

Regardless of the issues I have, I am satisfied with my experience of learning to develop with NVIDIA’s products. It has been a humbling, steep, and difficult learning curve, albeit one rewarded with knowledge. I am very happy to learn this kind of software engineering this way. I look forward to NVIDIA and others creating more compatibility in the future.

Lastly, I want to say that I am grateful that NVIDIA offers so many tools for computer vision and offers this forum as a place for discussion.