Getting up to the Jetson inference performance

Hi everyone,

I’m posting this topic because I’m working on building my own object detection engine like the ones in the Jetson-inference repo. For the moment, I’ve used the TF-TRT API because I was stuck on a TRT error in the PB->UFF->TRT workflow, and that’s why I’m here. I ran a lot of tests with that repo’s engines, and on the Xavier some models go beyond 150 or even 200 FPS. With the TF-TRT API I’m “only” able to get my model to 75 FPS, which is already good, but I know I can do better, so why not! I have successfully converted my pb file to UFF, but when I try to convert the UFF file to TRT, I always get this error:

[TensorRT] ERROR: UffParser: Graph error: Cycle graph detected

[TensorRT] ERROR: Network must have at least one output

I’ve searched a lot online for what this “Cycle graph detected” error means and how to solve it, but nothing has worked so far, so I hope someone here can help me resolve it.

Best regards

PS: I’m using TF 1.15.2 with protobuf 3.8.0 compiled from C++ source.

Hi,

When building an engine, TensorRT first finds a shortest path from the input layer to the output layer.
The “Cycle graph detected” error indicates that this path contains a cycle.
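
To make the idea of a cycle concrete, here is a minimal sketch of a depth-first search that reports one cycle in a graph described as "node name -> names of its inputs". This is not how the UFF parser is implemented; the function name find_cycle and the toy graph are made up for illustration and do not come from the model in this thread.

```python
from typing import Dict, List, Optional


def find_cycle(inputs: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of node names, or None if there is none.

    `inputs` maps each node name to the names of the nodes it reads from.
    Recursive for brevity, so it is only suited to small graphs.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {name: WHITE for name in inputs}
    path: List[str] = []

    def visit(name: str) -> Optional[List[str]]:
        color[name] = GRAY
        path.append(name)
        for src in inputs.get(name, []):
            state = color.get(src, WHITE)
            if state == GRAY:
                # src is already on the current path, so we closed a loop.
                return path[path.index(src):] + [src]
            if state == WHITE:
                found = visit(src)
                if found:
                    return found
        path.pop()
        color[name] = BLACK
        return None

    for name in list(inputs):
        if color[name] == WHITE:
            found = visit(name)
            if found:
                return found
    return None


# Hypothetical toy graph: GridAnchor reads from NMS, and NMS reads from GridAnchor.
toy = {
    "NMS": ["concat_box_loc", "GridAnchor"],
    "concat_box_loc": ["Input"],
    "GridAnchor": ["NMS"],
    "Input": [],
}
print(find_cycle(toy))
```

With a real model, the same kind of check could be run on a mapping built from each node's name and input list, which may help narrow down which nodes form the loop after namespaces are collapsed.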

Would you mind attaching the graph data from TensorBoard for your model?
We can give more specific suggestions once we have the graph information.

Thanks.

Not sure if this is what you wanted, but I’m not familiar with TensorBoard… Tell me if it’s not.

Hi,

Could you share the output layer name?
The input layer should be image_tensor, is that correct?

Thanks

Hi,

The output layer is the NMS layer

Hi! I know you have a lot to do, but I just wanted to know if you had any new information about my TRT error?

Thanks again

Hi,

Sorry to keep you waiting.

We found this issue might be related to the cuDNN version.
Do you convert your model to UFF and build the TensorRT engine both on the Jetson?

If not, we recommend doing so to avoid compatibility issues.
Thanks.

Hi,

Thanks for your reply, and no problem about the delay. Yes, I’m doing all my work on my Xavier, so there should be no compatibility problem. I’m on JetPack 4.3 with TF 1.15.2, in case you know of any compatibility issue between these.

Thanks again !

Hi,

Would you mind sharing the .pb file so we can check it?
Thanks.

Hi, no problem. Is it possible to share it only with you?

Is there a platform you prefer for sharing the pb file, since it is not possible here? Thanks again!

Hi,

You can send the link through a private message so it won’t be public.
Thanks.

Hi,

Sorry for keeping you waiting.

We can convert your model into a TensorRT engine with the following steps and config.py file.

$ sudo python3 /usr/lib/python3.6/dist-packages/uff/bin/convert_to_uff.py [your/pb/file] -o test.uff -O NMS -p config.py
$ /usr/src/tensorrt/bin/trtexec --uff=test.uff --uffInput=Input,3,300,300 --output=NMS

config.py

import graphsurgeon as gs
import tensorflow as tf
import numpy as np

Input = gs.create_node("Input",
    op="Placeholder",
    dtype=tf.float32,
    shape=[1, 3, 300, 300])
PriorBox = gs.create_plugin_node(name="GridAnchor", op="GridAnchor_TRT",
    numLayers=6,
    minSize=0.2,
    maxSize=0.95,
    aspectRatios=[1.0, 2.0, 0.5, 3.0, 0.33],
    variance=[0.1,0.1,0.2,0.2],
    featureMapShapes=[19, 10, 5, 3, 2, 1])
NMS = gs.create_plugin_node(name="NMS", op="NMS_TRT",
    shareLocation=1,
    varianceEncodedInTarget=0,
    backgroundLabelId=0,
    confidenceThreshold=1e-8,
    nmsThreshold=0.6,
    topK=100,
    keepTopK=100,
    numClasses=3,
    inputOrder= [0, 2, 1],
    confSigmoid=1,
    isNormalized=1)
concat_priorbox = gs.create_node(name="concat_priorbox", op="ConcatV2", dtype=tf.float32, axis=2)
concat_box_loc = gs.create_plugin_node("concat_box_loc", op="FlattenConcat_TRT", dtype=tf.float32, axis=1, ignoreBatch=0)
concat_box_conf = gs.create_plugin_node("concat_box_conf", op="FlattenConcat_TRT", dtype=tf.float32, axis=1, ignoreBatch=0)
dummy_const = gs.create_node(name="dummy_const", op="Const", dtype=tf.float32, value=np.array([1, 1], dtype=np.float32))

namespace_plugin_map = {
    "Concatenate": concat_priorbox,
    "MultipleGridAnchorGenerator": PriorBox,
    "Postprocessor": NMS,
    "image_tensor": Input,
    "Cast": Input,
    "ToFloat": Input,
    "Preprocessor": Input,
    "concat": concat_box_loc,
    "concat_1": concat_box_conf
}

namespace_remove = {
    "ToFloat",
    "image_tensor",
    "Preprocessor/map/TensorArrayStack_1/TensorArrayGatherV3",
}

def preprocess(dynamic_graph):
    dynamic_graph.remove(dynamic_graph.find_nodes_by_path(namespace_remove), remove_exclusive_dependencies=False)
    # Now create a new graph by collapsing namespaces
    dynamic_graph.collapse_namespaces(namespace_plugin_map)
    # Remove the outputs, so we just have a single output node (NMS).
    dynamic_graph.remove(dynamic_graph.graph_outputs, remove_exclusive_dependencies=False)
    dynamic_graph.append(dummy_const)
    dynamic_graph.find_nodes_by_op("GridAnchor_TRT")[0].input.append("dummy_const")

Thanks.


Thanks for your reply, and no problem about the delay. I can now convert my model to UFF, but when I try to use the trtexec program I get this error:

[05/16/2020-13:23:13] [E] [TRT] UffParser: Validator error: TRTEngineOp_0: Unsupported operation _TRTEngineOp
[05/16/2020-13:23:13] [E] Failed to parse uff file
[05/16/2020-13:23:13] [E] Parsing model failed
[05/16/2020-13:23:13] [E] Engine could not be created
&&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --uff=test.uff --uffInput=Input,3,300,300 --output=NMS

Any idea why this is happening? Is it because I’m running this on the Xavier? I will try to find out why while waiting for your answer, thanks again!

Edit: It seems this error occurs because I’m using a TF-TRT model. But I also tried with the non-TF-TRT version of my model, and the error is now:

[TRT] UffParser: Validator error: FeatureExtractor/InceptionV2/InceptionV2/Mixed_4e/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNormV3: Unsupported operation _FusedBatchNormV3
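
Both validator errors point at op types the UFF parser does not accept, so it can help to list such nodes before converting. The sketch below is only an illustration: find_ops is a hypothetical helper, the stand-in nodes are invented, and the op names are assumed to be the ones from the errors without the leading underscore. It works on any objects with .name and .op attributes, which is the shape of the nodes in a TensorFlow 1.x GraphDef.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable, List


def find_ops(nodes: Iterable, op_types: Iterable[str]) -> Dict[str, List[str]]:
    """Group node names by op type, keeping only the op types listed."""
    wanted = set(op_types)
    hits: Dict[str, List[str]] = defaultdict(list)
    for node in nodes:
        if node.op in wanted:
            hits[node.op].append(node.name)
    return dict(hits)


# Stand-in nodes for illustration only. With a frozen graph loaded in
# TensorFlow 1.x you would pass graph_def.node instead.
nodes = [
    SimpleNamespace(name="TRTEngineOp_0", op="TRTEngineOp"),
    SimpleNamespace(name="FeatureExtractor/BatchNorm/FusedBatchNormV3",
                    op="FusedBatchNormV3"),
    SimpleNamespace(name="Input", op="Placeholder"),
]

# Assumed op names, taken from the two errors above minus the leading underscore.
print(find_ops(nodes, ["TRTEngineOp", "FusedBatchNormV3"]))
```

An empty result for a given frozen graph would suggest that these two particular ops are not the blocker, though other unsupported ops may still be present.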

EDIT 2: I’ve seen that the FusedBatchNormV3 operation was introduced in TensorFlow 1.15.0 and is not supported by the TensorRT 6 UFF parser. As suggested online, replacing the operation with FusedBatchNorm using Graph Surgeon, as explained here, works for me. If I manage to use my TRT engine with good FPS, I will finally close this issue. I will keep you updated, thanks again! For the moment, the only way for me to convert UFF to TRT is with this script:

"""build_engine.py

This script converts a SSD model (pb) to UFF and subsequently builds
the TensorRT engine.

Input : ssd_mobilenet_v[1|2]_[coco|egohands].pb
Output: TRT_ssd_mobilenet_v[1|2]_[coco|egohands].bin
"""

import os
import ctypes
import argparse

import uff
import tensorrt as trt
import graphsurgeon as gs
import tensorflow as tf

DIR_NAME = os.path.dirname(__file__)
LIB_FILE = os.path.abspath(os.path.join(DIR_NAME, 'libflattenconcat.so'))
MODEL_SPECS = {
    'ssd_mobilenet_v1_coco': {
        'input_pb': os.path.abspath(os.path.join(
            DIR_NAME, 'ssd_mobilenet_v1_coco.pb')),
        'tmp_uff': os.path.abspath(os.path.join(
            DIR_NAME, 'tmp_v1_coco.uff')),
        'output_bin': os.path.abspath(os.path.join(
            DIR_NAME, 'TRT_ssd_mobilenet_v1_coco.bin')),
        'num_classes': 91,
        'min_size': 0.2,
        'max_size': 0.95,
        'input_order': [0, 2, 1],  # order of loc_data, conf_data, priorbox_data
    },
    'ssd_mobilenet_v1_egohands': {
        'input_pb': os.path.abspath(os.path.join(
            DIR_NAME, 'ssd_mobilenet_v1_egohands.pb')),
        'tmp_uff': os.path.abspath(os.path.join(
            DIR_NAME, 'tmp_v1_egohands.uff')),
        'output_bin': os.path.abspath(os.path.join(
            DIR_NAME, 'TRT_ssd_mobilenet_v1_egohands.bin')),
        'num_classes': 2,
        'min_size': 0.05,
        'max_size': 0.95,
        'input_order': [0, 2, 1],  # order of loc_data, conf_data, priorbox_data
    },
    'ssd_mobilenet_v2_coco': {
        'input_pb': os.path.abspath(os.path.join(
            DIR_NAME, 'ssd_mobilenet_v2_coco.pb')),
        'tmp_uff': os.path.abspath(os.path.join(
            DIR_NAME, 'tmp_v2_coco.uff')),
        'output_bin': os.path.abspath(os.path.join(
            DIR_NAME, 'TRT_ssd_mobilenet_v2_coco.bin')),
        'num_classes': 91,
        'min_size': 0.2,
        'max_size': 0.95,
        'input_order': [1, 0, 2],  # order of loc_data, conf_data, priorbox_data
    },
    'ssd_mobilenet_v2_egohands': {
        'input_pb': os.path.abspath(os.path.join(
            DIR_NAME, 'ssd_mobilenet_v2_egohands.pb')),
        'tmp_uff': os.path.abspath(os.path.join(
            DIR_NAME, 'tmp_v2_egohands.uff')),
        'output_bin': os.path.abspath(os.path.join(
            DIR_NAME, 'TRT_ssd_mobilenet_v2_egohands.bin')),
        'num_classes': 2,
        'min_size': 0.05,
        'max_size': 0.95,
        'input_order': [0, 2, 1],  # order of loc_data, conf_data, priorbox_data
    },
    'yolov3_coco': {
        'input_pb': os.path.abspath(os.path.join(
            DIR_NAME, 'yolov3_coco.pb')),
        'tmp_uff': os.path.abspath(os.path.join(
            DIR_NAME, 'yolov3_coco.uff')),
        'output_bin': os.path.abspath(os.path.join(
            DIR_NAME, 'TRT_yolov3_coco.bin')),
        'num_classes': 80,
        'min_size': 0.05,
        'max_size': 0.95,
        'input_order': [0, 2, 1],  # order of loc_data, conf_data, priorbox_data
    },
    'ssd_inception_v2_boats': {
        'input_pb': os.path.abspath(os.path.join(
            DIR_NAME, '/media/nxavier/xavier_ssd/training_boat/hand-detection-tutorial/frozen_inference_graph_v3_haze.pb')),  #'/media/nxavier/xavier_ssd/training_boat/hand-detection-tutorial/model_exported/frozen_inference_graph.pb')),
        'tmp_uff': os.path.abspath(os.path.join(
            DIR_NAME, 'tmp_v3_boats.uff')),
        'output_bin': os.path.abspath(os.path.join(
            DIR_NAME, 'TRT_ssd_inception_v3_boats.bin')),
        'num_classes': 2,
        'min_size': 0.15,
        'max_size': 0.9,
        'input_order': [0, 2, 1],  # order of loc_data, conf_data, priorbox_data
    },
}

INPUT_DIMS2 = (3, 300, 300)
INPUT_DIMS = [3, 300, 300]
DEBUG_UFF = False


def add_plugin(graph, model, spec):
    """add_plugin

    Reference:
    1. https://github.com/AastaNV/TRT_object_detection/blob/master/config/model_ssd_mobilenet_v1_coco_2018_01_28.py
    2. https://github.com/AastaNV/TRT_object_detection/blob/master/config/model_ssd_mobilenet_v2_coco_2018_03_29.py
    3. https://devtalk.nvidia.com/default/topic/1050465/jetson-nano/how-to-write-config-py-for-converting-ssd-mobilenetv2-to-uff-format/post/5333033/#5333033
    """
    numClasses = spec['num_classes']
    minSize = spec['min_size']
    maxSize = spec['max_size']
    inputOrder = spec['input_order']

    # Drop Assert nodes and bypass Identity nodes before inserting plugins.
    all_assert_nodes = graph.find_nodes_by_op("Assert")
    graph.remove(all_assert_nodes, remove_exclusive_dependencies=True)
    all_identity_nodes = graph.find_nodes_by_op("Identity")
    graph.forward_inputs(all_identity_nodes)

    Input = gs.create_plugin_node(
        name="Input",
        op="Placeholder",
        shape=(1,) + INPUT_DIMS2
    )
    PriorBox = gs.create_plugin_node(
        name="MultipleGridAnchorGenerator",
        op="GridAnchor_TRT",
        minSize=minSize,  # was 0.2
        maxSize=maxSize,  # was 0.95
        aspectRatios=[1.0, 2.0, 0.5, 3.0, 0.33],
        variance=[0.1, 0.1, 0.2, 0.2],
        featureMapShapes=[19, 10, 5, 3, 2, 1],
        numLayers=6
    )
    # numClasses and inputOrder are hardcoded below; the values read from
    # spec above are not used for the NMS node.
    NMS = gs.create_plugin_node(
        name="NMS",
        op="NMS_TRT",
        shareLocation=1,
        varianceEncodedInTarget=0,
        backgroundLabelId=0,
        confidenceThreshold=1e-8,
        nmsThreshold=0.6,
        topK=100,
        keepTopK=100,
        numClasses=3,
        inputOrder=[0, 2, 1],
        confSigmoid=1,
        isNormalized=1
    )
    concat_priorbox = gs.create_node(
        "concat_priorbox",
        op="ConcatV2",
        axis=2
    )
    if trt.__version__[0] >= '7':
        concat_box_loc = gs.create_plugin_node(
            "concat_box_loc",
            op="FlattenConcat_TRT",
            axis=1,
            ignoreBatch=0
        )
        concat_box_conf = gs.create_plugin_node(
            "concat_box_conf",
            op="FlattenConcat_TRT",
            axis=1,
            ignoreBatch=0
        )
    else:
        concat_box_loc = gs.create_plugin_node(
            "concat_box_loc",
            op="FlattenConcat_TRT"
        )
        concat_box_conf = gs.create_plugin_node(
            "concat_box_conf",
            op="FlattenConcat_TRT"
        )

    namespace_plugin_map = {
        "MultipleGridAnchorGenerator": PriorBox,
        "Postprocessor": NMS,
        "Preprocessor": Input,
        "ToFloat": Input,
        "image_tensor": Input,
        "MultipleGridAnchorGenerator/Concatenate": concat_priorbox,  # for 'ssd_mobilenet_v1_coco'
        #"MultipleGridAnchorGenerator/Identity": concat_priorbox,
        "Concatenate": concat_priorbox,  # for other models
        "concat": concat_box_loc,
        "concat_1": concat_box_conf
    }

    graph.collapse_namespaces(namespace_plugin_map)
    graph.remove(graph.graph_outputs, remove_exclusive_dependencies=False)
    graph.find_nodes_by_op("NMS_TRT")[0].input.remove("Input")
    if model == 'ssd_mobilenet_v1_coco':
        graph.find_nodes_by_name("Input")[0].input.remove("image_tensor:0")

    return graph


def replace_fusedbnv3(graph):
    """Replace all 'FusedBatchNormV3' in the graph with 'FusedBatchNorm'.

    NOTE: 'FusedBatchNormV3' is not supported by UFF parser.
    https://devtalk.nvidia.com/default/topic/1066445/tensorrt/tensorrt-6-0-1-tensorflow-1-14-no-conversion-function-registered-for-layer-fusedbatchnormv3-yet/post/5403567/#5403567
    """
    for node in graph.find_nodes_by_op('FusedBatchNormV3'):
        gs.update_node(node, op='FusedBatchNorm')
    return graph


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('model', type=str, default="ssd_inception_v2_boats",
                        choices=list(MODEL_SPECS.keys()))
    args = parser.parse_args()

    # initialize
    if trt.__version__[0] < '7':
        ctypes.CDLL(LIB_FILE)
    TRT_LOGGER = trt.Logger(trt.Logger.INFO)
    trt.init_libnvinfer_plugins(TRT_LOGGER, '')

    # compile the model into TensorRT engine
    model = args.model
    spec = MODEL_SPECS[model]
    dynamic_graph = add_plugin(
        gs.DynamicGraph(spec['input_pb']),
        model,
        spec)
    dynamic_graph = replace_fusedbnv3(dynamic_graph)
    _ = uff.from_tensorflow(
        dynamic_graph.as_graph_def(),
        ['NMS'],
        output_filename=spec['tmp_uff'],
        text=True,
        debug_mode=DEBUG_UFF)
    #G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.UffParser() as parser:
        builder.max_workspace_size = 1 << 28
        builder.max_batch_size = 1
        builder.fp16_mode = True
        parser.register_input('Input', INPUT_DIMS)
        parser.register_output('MarkOutput_0')
        parser.parse(spec['tmp_uff'], network)
        #last_layer = network.get_layer(network.num_layers)
        #print(last_layer)
        #network.mark_output(last_layer.get_output(0))
        #engine=trt.utils.uff_to_trt_engine(TRT_LOGGER,_,parser,4,1<<30)
        engine = builder.build_cuda_engine(network)
        buf = engine.serialize()
        with open(spec['output_bin'], 'wb') as f:
            f.write(buf)


if __name__ == '__main__':
    main()
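
One detail worth noting about the script above: its TensorRT version check compares the first character of the version string with '7'. That behaves as intended for TensorRT 6 and 7, but a string comparison would treat a version starting with "10" as older than 7. A minimal sketch of an integer comparison follows; trt_major is a hypothetical helper, and the version strings are examples rather than values read from a device.

```python
def trt_major(version: str) -> int:
    """Return the major version from a string such as '7.1.3.0'."""
    return int(version.split(".")[0])


# Example version strings only; in the script you would pass trt.__version__.
for v in ("6.0.1.10", "7.1.3.0", "10.0.1"):
    # First column: first-character string comparison; second: integer comparison.
    print(v, v[0] >= "7", trt_major(v) >= 7)
```

In the script this would amount to writing trt_major(trt.__version__) >= 7 in place of the first-character comparison.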

EDIT 3 (hopefully the last): I’m now facing this error:

[TRT] Assertion failed: mConcatAxisID == 1 || mConcatAxisID == 2 || mConcatAxisID == 3

I get this error when running the command: /usr/src/tensorrt/bin/trtexec --loadEngine=TRT_ssd_inception_v2_boats.bin

(All the errors listed above also appear if I use TF 1.14.0 with a model trained with it. I tried this because I saw that my version of UFF was tested with TF 1.14.0.)

Is it bad?

PS: In your opinion, what is the best combination of versions for TensorFlow, UFF, etc. if I want to work with the PB -> UFF -> TRT workflow?

Hi AastaLLL,

I know this thread is very long and is asking a lot of work from you… But since my last post I’ve been struggling a lot to get through all these issues. I’m still not able to use an engine the way Nvidia does in the Jetson-Inference repo, and mastering this PB->UFF->Engine workflow is still my highest objective. Could you tell me if you converted my model on a Xavier? If so, with which configuration, please?

Sorry again and thanks a lot.