Getting up to the Jetson inference performance

marconi.k · May 19, 2020, 7:13am

Hi everyone,

I’m posting this topic because I’m working on building my own object detection engine, like the ones in the Jetson-inference repo. For the moment I’ve used the TF-TRT API, because I was stuck with a TRT error in the PB->UFF->TRT workflow, and that’s why I’m here. I ran a lot of tests with that repo’s engines, and on the Xavier some models can go beyond 150 or even 200 FPS. With the TF-TRT API I’m “only” able to get my model to 75 FPS, which is already good, but I know I can do better, so why not! I have successfully converted my pb file to UFF, but when I try to convert the UFF file to TRT, I always get this error:

[TensorRT] ERROR: UffParser: Graph error: Cycle graph detected

[TensorRT] ERROR: Network must have at least one output

I’ve searched a lot online for what this “Cycle graph detected” error means and how to solve it, but nothing has worked so far, so I hope someone here can help me resolve it.

Best regards

Ps : I’m using TF 1.15.2 with protobuf 3.8.0 compiled from C++ source

AastaLLL · May 19, 2020, 8:23am

Hi,

When building an engine, TensorRT first finds a shortest path from the input layer to the output layer.
“Cycle graph detected” indicates that the path contains a cycle.

Would you mind attaching the graph data from TensorBoard for your model?
We can give more suggestions once we have the graph information.

Thanks.
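
Editor's note: to make the “cycle” idea concrete, here is a minimal sketch of cycle detection on a toy graph. It is not TensorRT's implementation; the function name `find_cycle` and the example node names are assumptions chosen for illustration. The graph is a plain dict that maps each node to the nodes it reads from, and a depth-first search reports the first loop it walks into.

```python
from typing import Dict, List, Optional


def find_cycle(inputs: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of node names, or None if the graph is acyclic.

    `inputs` maps each node name to the names of the nodes it reads from.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    state = {name: WHITE for name in inputs}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = GREY
        path.append(node)
        for src in inputs.get(node, []):
            src_state = state.get(src, WHITE)
            if src_state == GREY:
                # src is already on the current path, so we walked in a loop.
                return path[path.index(src):] + [src]
            if src_state == WHITE:
                found = visit(src)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for name in list(inputs):
        if state[name] == WHITE:
            found = visit(name)
            if found:
                return found
    return None


# Hypothetical toy graphs, not taken from the model discussed in this thread.
acyclic = {
    "NMS": ["concat_box_loc", "concat_priorbox"],
    "concat_box_loc": ["Input"],
    "concat_priorbox": ["Input"],
    "Input": [],
}
cyclic = {"NMS": ["Input"], "Input": ["NMS"]}
```

With a real frozen graph, the dict could be built from the node names and inputs of the GraphDef. The recursive search is fine for a toy example but may hit Python's recursion limit on very deep graphs.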

marconi.k · May 19, 2020, 9:47am

Not sure if this is what you wanted but I’m not familiar with tensorboard… Tell me if it’s not

AastaLLL · May 20, 2020, 2:58am

Hi,

Could you share the output layer name?
The input layer should be image_tensor, is that correct?

Thanks

marconi.k · May 20, 2020, 8:44am

Hi,

The output layer is the NMS layer

marconi.k · May 25, 2020, 9:53am

Hi ! I know you have a lot of stuff to do but I just wanted to know if you had any new information about my TRT Error ?

Thanks again

AastaLLL · June 3, 2020, 7:00am

Hi,

Sorry to keep you waiting.

We found that this issue might be related to the cuDNN version.
Did you convert your model to UFF and to a TensorRT engine both on the Jetson?

If not, we recommend doing so to avoid compatibility issues.
Thanks.

marconi.k · June 3, 2020, 12:36pm

Hi,

Thanks for your reply, and no problem about the delay. Yes, I’m doing all my work on my Xavier, so there should be no compatibility problem. I’m on JetPack 4.3 with TF 1.15.2, in case you know of any compatibility issue between these.

Thanks again !
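
Editor's note: when version compatibility is in question, it can help to post the exact versions of every package involved. The following is a small sketch, assuming the packages expose a `__version__` attribute (the helper name `report_versions` is hypothetical); anything that cannot be imported is reported as "not installed" instead of raising.

```python
import importlib


def report_versions(modules=("tensorflow", "tensorrt", "uff", "graphsurgeon")):
    """Return {module name: version string}, or 'not installed' when the import fails."""
    versions = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
            continue
        # Fall back to 'unknown' if the module has no __version__ attribute.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions
```

Calling `report_versions()` on the Jetson and pasting the resulting dict into the thread gives everyone the same picture of the environment.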

AastaLLL · June 4, 2020, 4:40am

Hi,

Would you mind sharing the .pb file so we can check it?
Thanks.

marconi.k · June 4, 2020, 5:31am

Hi, no problem. Is it possible to share it only with you ?

marconi.k · June 4, 2020, 5:47am

Is there any platform you prefer for sharing the pb file since it is not possible here ? Thanks again !

AastaLLL · June 5, 2020, 2:43am

Hi,

You can send the link through a private message so it won’t be public.
Thanks.

AastaLLL · June 16, 2020, 8:43am

Hi,

Sorry for keeping you waiting.

We can convert your model into a TensorRT engine with the following steps and config.py file.

$ sudo python3 /usr/lib/python3.6/dist-packages/uff/bin/convert_to_uff.py [your/pb/file] -o test.uff -O NMS -p config.py
$ /usr/src/tensorrt/bin/trtexec --uff=test.uff --uffInput=Input,3,300,300 --output=NMS

config.py

import graphsurgeon as gs
import tensorflow as tf
import numpy as np

Input = gs.create_node("Input",
    op="Placeholder",
    dtype=tf.float32,
    shape=[1, 3, 300, 300])
PriorBox = gs.create_plugin_node(name="GridAnchor", op="GridAnchor_TRT",
    numLayers=6,
    minSize=0.2,
    maxSize=0.95,
    aspectRatios=[1.0, 2.0, 0.5, 3.0, 0.33],
    variance=[0.1,0.1,0.2,0.2],
    featureMapShapes=[19, 10, 5, 3, 2, 1])
NMS = gs.create_plugin_node(name="NMS", op="NMS_TRT",
    shareLocation=1,
    varianceEncodedInTarget=0,
    backgroundLabelId=0,
    confidenceThreshold=1e-8,
    nmsThreshold=0.6,
    topK=100,
    keepTopK=100,
    numClasses=3,
    inputOrder= [0, 2, 1],
    confSigmoid=1,
    isNormalized=1)
concat_priorbox = gs.create_node(name="concat_priorbox", op="ConcatV2", dtype=tf.float32, axis=2)
concat_box_loc = gs.create_plugin_node("concat_box_loc", op="FlattenConcat_TRT", dtype=tf.float32, axis=1, ignoreBatch=0)
concat_box_conf = gs.create_plugin_node("concat_box_conf", op="FlattenConcat_TRT", dtype=tf.float32, axis=1, ignoreBatch=0)
dummy_const = gs.create_node(name="dummy_const", op="Const", dtype=tf.float32, value=np.array([1, 1], dtype=np.float32))

namespace_plugin_map = {
    "Concatenate": concat_priorbox,
    "MultipleGridAnchorGenerator": PriorBox,
    "Postprocessor": NMS,
    "image_tensor": Input,
    "Cast": Input,
    "ToFloat": Input,
    "Preprocessor": Input,
    "concat": concat_box_loc,
    "concat_1": concat_box_conf
}

namespace_remove = {
    "ToFloat",
    "image_tensor",
    "Preprocessor/map/TensorArrayStack_1/TensorArrayGatherV3",
}

def preprocess(dynamic_graph):
    dynamic_graph.remove(dynamic_graph.find_nodes_by_path(namespace_remove), remove_exclusive_dependencies=False)
    # Now create a new graph by collapsing namespaces
    dynamic_graph.collapse_namespaces(namespace_plugin_map)
    # Remove the outputs, so we just have a single output node (NMS).
    dynamic_graph.remove(dynamic_graph.graph_outputs, remove_exclusive_dependencies=False)
    dynamic_graph.append(dummy_const)
    dynamic_graph.find_nodes_by_op("GridAnchor_TRT")[0].input.append("dummy_const")

Thanks.
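
Editor's note: the two commands above can be wrapped in a small helper so the paths and names are set in one place. This is only a sketch: the function names `build_commands` and `as_shell` are hypothetical, the paths are the ones quoted in this post, and `sudo` is left out (add it if your installation needs it).

```python
import shlex

# Paths as quoted in the post above (JetPack with Python 3.6).
CONVERTER = "/usr/lib/python3.6/dist-packages/uff/bin/convert_to_uff.py"
TRTEXEC = "/usr/src/tensorrt/bin/trtexec"


def build_commands(pb_path, uff_path="test.uff", config="config.py",
                   input_name="Input", input_dims=(3, 300, 300),
                   output_name="NMS"):
    """Return (convert_cmd, build_cmd) as argument lists for subprocess.run."""
    dims = ",".join(str(d) for d in input_dims)
    convert_cmd = ["python3", CONVERTER, pb_path,
                   "-o", uff_path, "-O", output_name, "-p", config]
    build_cmd = [TRTEXEC,
                 "--uff=" + uff_path,
                 "--uffInput={},{}".format(input_name, dims),
                 "--output=" + output_name]
    return convert_cmd, build_cmd


def as_shell(cmd):
    """Render an argument list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(arg) for arg in cmd)
```

Each list can be passed to `subprocess.run(cmd, check=True)`, which stops at the first failing step instead of continuing with a stale UFF file.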

marconi.k · June 16, 2020, 11:24am

Thanks for your reply, and no problem about the delay. I can successfully convert my model to UFF, but when I try to use the trtexec program I get this error:

[05/16/2020-13:23:13] [E] [TRT] UffParser: Validator error: TRTEngineOp_0: Unsupported operation _TRTEngineOp
[05/16/2020-13:23:13] [E] Failed to parse uff file
[05/16/2020-13:23:13] [E] Parsing model failed
[05/16/2020-13:23:13] [E] Engine could not be created
&&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --uff=test.uff --uffInput=Input,3,300,300 --output=NMS

Any idea why this is happening? Is it because I’m running this on the Xavier? I will try to find out why while waiting for your answer, thanks again!

Edit: It seems this error occurs because I’m using a TF-TRT model. I also tried the non-TF-TRT version of my model, and the error is now:

[TRT] UffParser: Validator error: FeatureExtractor/InceptionV2/InceptionV2/Mixed_4e/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNormV3: Unsupported operation _FusedBatchNormV3

EDIT 2: I’ve seen that the FusedBatchNormV3 operation was introduced in TensorFlow 1.15.0 and is not supported by the TensorRT 6 UFF parser. As suggested online, replacing the operation with FusedBatchNorm using Graph Surgeon, as explained here, works for me. If I can use my TRT engine with good FPS, I will finally close this issue. Will keep you updated, thanks again! For the moment, the only way for me to convert UFF to TRT is by using this script:

"""build_engine.py
This script converts a SSD model (pb) to UFF and subsequently builds
the TensorRT engine.
Input : ssd_mobilenet_v[1|2][coco|egohands].pb
Output: TRT_ssd_mobilenet_v[1|2][coco|egohands].bin
"""
import os
import ctypes
import argparse
import uff
import tensorrt as trt
import graphsurgeon as gs
import tensorflow as tf

DIR_NAME = os.path.dirname(__file__)
LIB_FILE = os.path.abspath(os.path.join(DIR_NAME, 'libflattenconcat.so'))
MODEL_SPECS = {
    'ssd_mobilenet_v1_coco': {
        'input_pb': os.path.abspath(os.path.join(
            DIR_NAME, 'ssd_mobilenet_v1_coco.pb')),
        'tmp_uff': os.path.abspath(os.path.join(
            DIR_NAME, 'tmp_v1_coco.uff')),
        'output_bin': os.path.abspath(os.path.join(
            DIR_NAME, 'TRT_ssd_mobilenet_v1_coco.bin')),
        'num_classes': 91,
        'min_size': 0.2,
        'max_size': 0.95,
        'input_order': [0, 2, 1],  # order of loc_data, conf_data, priorbox_data
    },
    'ssd_mobilenet_v1_egohands': {
        'input_pb': os.path.abspath(os.path.join(
            DIR_NAME, 'ssd_mobilenet_v1_egohands.pb')),
        'tmp_uff': os.path.abspath(os.path.join(
            DIR_NAME, 'tmp_v1_egohands.uff')),
        'output_bin': os.path.abspath(os.path.join(
            DIR_NAME, 'TRT_ssd_mobilenet_v1_egohands.bin')),
        'num_classes': 2,
        'min_size': 0.05,
        'max_size': 0.95,
        'input_order': [0, 2, 1],  # order of loc_data, conf_data, priorbox_data
    },
    'ssd_mobilenet_v2_coco': {
        'input_pb': os.path.abspath(os.path.join(
            DIR_NAME, 'ssd_mobilenet_v2_coco.pb')),
        'tmp_uff': os.path.abspath(os.path.join(
            DIR_NAME, 'tmp_v2_coco.uff')),
        'output_bin': os.path.abspath(os.path.join(
            DIR_NAME, 'TRT_ssd_mobilenet_v2_coco.bin')),
        'num_classes': 91,
        'min_size': 0.2,
        'max_size': 0.95,
        'input_order': [1, 0, 2],  # order of loc_data, conf_data, priorbox_data
    },
    'ssd_mobilenet_v2_egohands': {
        'input_pb': os.path.abspath(os.path.join(
            DIR_NAME, 'ssd_mobilenet_v2_egohands.pb')),
        'tmp_uff': os.path.abspath(os.path.join(
            DIR_NAME, 'tmp_v2_egohands.uff')),
        'output_bin': os.path.abspath(os.path.join(
            DIR_NAME, 'TRT_ssd_mobilenet_v2_egohands.bin')),
        'num_classes': 2,
        'min_size': 0.05,
        'max_size': 0.95,
        'input_order': [0, 2, 1],  # order of loc_data, conf_data, priorbox_data
    },
    'yolov3_coco': {
        'input_pb': os.path.abspath(os.path.join(
            DIR_NAME, 'yolov3_coco.pb')),
        'tmp_uff': os.path.abspath(os.path.join(
            DIR_NAME, 'yolov3_coco.uff')),
        'output_bin': os.path.abspath(os.path.join(
            DIR_NAME, 'TRT_yolov3_coco.bin')),
        'num_classes': 80,
        'min_size': 0.05,
        'max_size': 0.95,
        'input_order': [0, 2, 1],  # order of loc_data, conf_data, priorbox_data
    },
    'ssd_inception_v2_boats': {
        'input_pb': os.path.abspath(os.path.join(
            DIR_NAME, '/media/nxavier/xavier_ssd/training_boat/hand-detection-tutorial/frozen_inference_graph_v3_haze.pb')),#'/media/nxavier/xavier_ssd/training_boat/hand-detection-tutorial/model_exported/frozen_inference_graph.pb')),
        'tmp_uff': os.path.abspath(os.path.join(
            DIR_NAME, 'tmp_v3_boats.uff')),
        'output_bin': os.path.abspath(os.path.join(
            DIR_NAME, 'TRT_ssd_inception_v3_boats.bin')),
        'num_classes': 2,
        'min_size': 0.15,
        'max_size': 0.9,
        'input_order': [0, 2, 1],  # order of loc_data, conf_data, priorbox_data
    },
}
INPUT_DIMS2 = (3,300,300)
INPUT_DIMS = [3, 300, 300]
DEBUG_UFF = False

def add_plugin(graph, model, spec):
    """add_plugin
    Reference:
    1. https://github.com/AastaNV/TRT_object_detection/blob/master/config/model_ssd_mobilenet_v1_coco_2018_01_28.py
    2. https://github.com/AastaNV/TRT_object_detection/blob/master/config/model_ssd_mobilenet_v2_coco_2018_03_29.py
    3. how to write config.py for converting ssd-mobilenetv2 to uff format - Jetson Nano - NVIDIA Developer Forums
    """
    numClasses = spec['num_classes']
    minSize = spec['min_size']
    maxSize = spec['max_size']
    inputOrder = spec['input_order']
    all_assert_nodes = graph.find_nodes_by_op("Assert")
    graph.remove(all_assert_nodes, remove_exclusive_dependencies=True)
    all_identity_nodes = graph.find_nodes_by_op("Identity")
    graph.forward_inputs(all_identity_nodes)
    Input = gs.create_plugin_node(
        name="Input",
        op="Placeholder",
        shape=(1,) + INPUT_DIMS2
    )
    PriorBox = gs.create_plugin_node(
        name="MultipleGridAnchorGenerator",
        op="GridAnchor_TRT",
        minSize=minSize,  # was 0.2
        maxSize=maxSize,  # was 0.95
        aspectRatios=[1.0, 2.0, 0.5, 3.0, 0.33],
        variance=[0.1, 0.1, 0.2, 0.2],
        featureMapShapes=[19, 10, 5, 3, 2, 1],
        numLayers=6
    )
    # NOTE: numClasses and inputOrder are hard-coded below; the values read
    # from spec above are not used here.
    NMS = gs.create_plugin_node(name="NMS", op="NMS_TRT",
        shareLocation=1,
        varianceEncodedInTarget=0,
        backgroundLabelId=0,
        confidenceThreshold=1e-8,
        nmsThreshold=0.6,
        topK=100,
        keepTopK=100,
        numClasses=3,
        inputOrder= [0, 2, 1],
        confSigmoid=1,
        isNormalized=1
    )
    concat_priorbox = gs.create_node(
        "concat_priorbox",
        op="ConcatV2",
        axis=2
    )
    if trt.__version__[0] >= '7':
        concat_box_loc = gs.create_plugin_node(
            "concat_box_loc",
            op="FlattenConcat_TRT",
            axis=1,
            ignoreBatch=0
        )
        concat_box_conf = gs.create_plugin_node(
            "concat_box_conf",
            op="FlattenConcat_TRT",
            axis=1,
            ignoreBatch=0
        )
    else:
        concat_box_loc = gs.create_plugin_node(
            "concat_box_loc",
            op="FlattenConcat_TRT"
        )
        concat_box_conf = gs.create_plugin_node(
            "concat_box_conf",
            op="FlattenConcat_TRT"
        )
    namespace_plugin_map = {
        "MultipleGridAnchorGenerator": PriorBox,
        "Postprocessor": NMS,
        "Preprocessor": Input,
        "ToFloat": Input,
        "image_tensor": Input,
        "MultipleGridAnchorGenerator/Concatenate": concat_priorbox,  # for 'ssd_mobilenet_v1_coco'
        #"MultipleGridAnchorGenerator/Identity": concat_priorbox,
        "Concatenate": concat_priorbox,  # for other models
        "concat": concat_box_loc,
        "concat_1": concat_box_conf
    }
    graph.collapse_namespaces(namespace_plugin_map)
    graph.remove(graph.graph_outputs, remove_exclusive_dependencies=False)
    graph.find_nodes_by_op("NMS_TRT")[0].input.remove("Input")
    if model == 'ssd_mobilenet_v1_coco':
        graph.find_nodes_by_name("Input")[0].input.remove("image_tensor:0")
    return graph

def replace_fusedbnv3(graph):
    """Replace all 'FusedBatchNormV3' in the graph with 'FusedBatchNorm'.
    NOTE: 'FusedBatchNormV3' is not supported by UFF parser.
    TensorRT 6.0.1 + TensorFlow 1.14 - No conversion function registered for layer: FusedBatchNormV3 yet - TensorRT - NVIDIA Developer Forums
    """
    for node in graph.find_nodes_by_op('FusedBatchNormV3'):
        gs.update_node(node, op='FusedBatchNorm')
    return graph

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('model', type=str, default = "ssd_inception_v2_boats",choices=list(MODEL_SPECS.keys()))
    args = parser.parse_args()
    # initialize
    if trt.__version__[0] < '7':
        ctypes.CDLL(LIB_FILE)
    TRT_LOGGER = trt.Logger(trt.Logger.INFO)
    trt.init_libnvinfer_plugins(TRT_LOGGER, '')
    # compile the model into TensorRT engine
    model = args.model
    spec = MODEL_SPECS[model]
    dynamic_graph = add_plugin(
        gs.DynamicGraph(spec['input_pb']),
        model,
        spec)
    dynamic_graph = replace_fusedbnv3(dynamic_graph)
    _ = uff.from_tensorflow(
        dynamic_graph.as_graph_def(),
        ['NMS'],
        output_filename=spec['tmp_uff'],
        text=True,
        debug_mode=DEBUG_UFF)
    #G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.UffParser() as parser:
        builder.max_workspace_size = 1 << 28
        builder.max_batch_size = 1
        builder.fp16_mode = True
        parser.register_input('Input', INPUT_DIMS)
        parser.register_output('MarkOutput_0')
        parser.parse(spec['tmp_uff'], network)
        #last_layer = network.get_layer(network.num_layers)
        #print(last_layer)
        #network.mark_output(last_layer.get_output(0))
        #engine=trt.utils.uff_to_trt_engine(TRT_LOGGER,_,parser,4,1<<30)
        engine = builder.build_cuda_engine(network)
        buf = engine.serialize()
        with open(spec['output_bin'], 'wb') as f:
            f.write(buf)

if __name__ == '__main__':
    main()
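
Editor's note: the two validator errors reported earlier in this post (TRTEngineOp and FusedBatchNormV3) could be spotted before running the UFF conversion by scanning the op types in the frozen graph. Below is a minimal sketch in plain Python; the helper name `flag_unsupported` is hypothetical, and the op set contains only the two ops reported in this thread, so it is not a complete list of what the UFF parser rejects.

```python
from collections import Counter

# Ops that the UFF parser rejected in this thread. This is NOT a complete list.
REPORTED_UNSUPPORTED = {"TRTEngineOp", "FusedBatchNormV3"}


def flag_unsupported(node_ops, unsupported=REPORTED_UNSUPPORTED):
    """Count how often each op in `unsupported` appears among (name, op) pairs."""
    return Counter(op for _name, op in node_ops if op in unsupported)


# With TensorFlow available, the pairs could come from a loaded GraphDef:
#   node_ops = [(n.name, n.op) for n in graph_def.node]
# The nodes below are made up for illustration.
example_nodes = [
    ("FeatureExtractor/a/BatchNorm/FusedBatchNormV3", "FusedBatchNormV3"),
    ("FeatureExtractor/b/BatchNorm/FusedBatchNormV3", "FusedBatchNormV3"),
    ("FeatureExtractor/a/Conv2D", "Conv2D"),
]
```

An empty result does not guarantee that the conversion will succeed; it only rules out the ops already known to fail here.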

EDIT 3 (hopefully the last): I’m now facing this error:

[TRT] Assertion failed: mConcatAxisID == 1 || mConcatAxisID == 2 || mConcatAxisID == 3

I get this error when running the command: /usr/src/tensorrt/bin/trtexec --loadEngine=TRT_ssd_inception_v2_boats.bin

(All the errors listed above also appear if I use TF 1.14.0 with a model trained with it. I tried that because I saw that my version of UFF was tested with TF 1.14.0.)

Is it bad?

PS: In your opinion, what is the best combination of versions for TensorFlow, UFF, etc. if I want to work with the PB -> UFF -> TRT workflow?
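
Editor's note: the script in this post decides which FlattenConcat arguments to use by comparing the first character of the TensorRT version string with '7'. That works for single-digit major versions but would misclassify a two-digit one, because the string '1' sorts before '7'. A hedged alternative is to parse the major version as an integer; the helper names below are hypothetical and not part of TensorRT.

```python
def major_version(version):
    """Return the leading integer of a version string such as '7.1.3.0'."""
    return int(version.split(".")[0])


def uses_flatten_concat_args(version):
    """True when the script's 'version 7 or newer' branch should be taken."""
    return major_version(version) >= 7
```

In the script this would be called as `uses_flatten_concat_args(trt.__version__)`.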

marconi.k · July 1, 2020, 8:57am

Hi AastaLLL,

I know this thread is very long and is asking a lot of work from you… But since my last post I’ve been struggling a lot to get through all these issues. I’m still not able to use an engine the way Nvidia does in the Jetson-Inference repo, and mastering this PB->UFF->Engine workflow is still my main objective. Could you tell me if you converted my model using a Xavier? If yes, with which config please?

Sorry again and thanks a lot.

AastaLLL · July 15, 2020, 5:41am

Hi, marconi.k

Thanks for your patience.
Really sorry about the delay. We have been quite busy recently since there is a new release for Jetson.
Let me check this in detail and update you with more information later today.

Thanks.

marconi.k · July 15, 2020, 7:47am

Hi, again no problem :) I guess you have a lot to do besides this forum, that’s normal :) I’ll wait for your update!

Thanks !

AastaLLL · July 15, 2020, 8:26am

Hi,

[TRT] Assertion failed: mConcatAxisID == 1 || mConcatAxisID == 2 || mConcatAxisID == 3

The root cause of this issue is that the model was serialized using the plugin from the Python sample rather than the C++ one.
Please update the plugin to use FlattenConcat.h, which means the UFF file needs to be generated with /usr/lib/python3.6/dist-packages/uff/bin/convert_to_uff.py

Would you mind giving the command a try? (similar to this)

If the issue persists, would you mind sharing the model again?
Thanks.

marconi.k · August 3, 2020, 7:55am

Hi,

Sorry for the late reply. I will give it a try as soon as I can and keep you updated. Thanks
