Don't see any speedups using TensorRT

Hi,

I tried benchmarking inference time on the Jetson TX2 using pure TensorFlow (TF) vs. TensorFlow converted to TensorRT (TRT) via the tensorflow.contrib.tensorrt library.

Here is the link to my public benchmarking gist. To benchmark, I used the MobileNetV2 model, whose frozen TF graph can be downloaded from here.

The average classification time for a single 224x224 image over 100 trials was:
TF: 27ms
TRT: 31ms
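
For context, the core of the benchmark is the standard tensorflow.contrib.tensorrt conversion followed by a timed session loop. This is a minimal sketch rather than the full gist; the file name and the 'input:0' / 'MobilenetV2/Predictions/Reshape_1:0' tensor names are placeholders:

import time

import numpy as np
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# load the frozen TF graph
with tf.gfile.GFile('mobilenet_v2.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# convert the TF graph to a TF-TRT graph
trt_graph = trt.create_inference_graph(
    input_graph_def=graph_def,
    outputs=['MobilenetV2/Predictions/Reshape_1'],
    max_batch_size=1,
    precision_mode='FP16')

image = np.random.rand(1, 224, 224, 3).astype(np.float32)  # stand-in for panda.jpg

with tf.Graph().as_default():
    tf.import_graph_def(trt_graph, name='')
    with tf.Session() as sess:
        # average over 100 forward passes
        start = time.time()
        for _ in range(100):
            sess.run('MobilenetV2/Predictions/Reshape_1:0',
                     feed_dict={'input:0': image})
        print('avg: %.1f ms' % ((time.time() - start) / 100 * 1000))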

My questions are:

  1. Why is there no TensorRT speedup (in fact a slight slowdown)?
  2. I know there is also the option to convert to TensorRT in pure C++ without Python. Is this much faster than TensorRT in Python? (I don't know C++ well, so I would only want to pursue this if there were significant gains.)

Hello,

We are attempting to repro now. What version of JetPack are you running? The performance difference between Python and C++ should be negligible.

Also, just so we have the complete repro, can you share the data/panda.jpg?

Thank you!

panda.jpg is from here: https://upload.wikimedia.org/wikipedia/commons/f/fe/Giant_Panda_in_Beijing_Zoo_1.JPG

I believe I have JetPack 3.3; however, I am not sure how to actually check that…

I am using tensorflow-gpu==1.9.0+nv18.8

@NVES,

I have made some progress on the issue.

First, I tried another *.pb graph from here and had the exact same issue (TF=137ms, TRT=147ms).

I also found that in both cases, if I tried to run the TRT session before the TF session, inference would fail with the error "No attr named 'identical_element_shapes' in NodeDef" (see below for the full error stack). This is the same bug recently reported here, where you said there was a TF-TRT bug that was recently fixed in a new release. Do you know how I can access this fix?

FULL ERROR:

---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
~/Projects/SANATA/trt_test_SSDlite.py in <module>
     73             outputs=output_names,
     74             max_batch_size=1,
---> 75             precision_mode='FP16',  # 'INT8'/'FP16'
     76     )
     77

~/Projects/SANATA/.venv/lib/python3.5/site-packages/tensorflow/contrib/tensorrt/python/trt_convert.py in create_inference_graph(input_graph_def, outputs, max_batch_size, max_workspace_size_bytes, precision_mode, minimum_segment_size)
    113     # pylint: disable=protected-access
    114     raise _impl._make_specific_exception(None, None, ";".join(msg[1:]),
--> 115                                          int(msg[0]))
    116     # pylint: enable=protected-access
    117   output_graph_def = graph_pb2.GraphDef()

NotFoundError: No attr named 'identical_element_shapes' in NodeDef:
         [[Node: Preprocessor/map/TensorArray_2 = TensorArrayV3[clear_after_read=true, dtype=DT_INT32, dynamic_size=false, element_shape=<unknown>, tensor_array_name=""](Preprocessor/map/TensorArrayUnstack/strided_slice)]] for 'Preprocessor/map/TensorArray_2' (op: 'TensorArrayV3') with input shapes: [].

Hello,

Performance would depend on how many ops were converted from TF to TRT. We recommend viewing the network in TensorBoard and comparing the two .pb files (native TF and TF-TRT).

In the TF-TRT network, the portion of the graph that runs on TRT will show up as new nodes named my_trt_op_*. It looks like nothing was converted, hence no speedup.
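
One way to get a converted graph into TensorBoard is to dump it as an event file. This is a minimal sketch, assuming the TF-TRT GraphDef was saved to a file named trt_graph.pb (the filename is a placeholder):

import tensorflow as tf

# load the converted GraphDef from disk
with tf.gfile.GFile('trt_graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# import it into a graph and write an event file TensorBoard can render
with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='')
writer = tf.summary.FileWriter('./tb_logs', graph)
writer.close()
# then inspect with: tensorboard --logdir ./tb_logs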

You can also try something like this to count the number of TRT ops in the converted graph.

trt_engine_ops = len([1 for n in trt_graph.node if str(n.op) == 'TRTEngineOp'])

I’m seeing trt_engine_ops = 0 in your case.

To execute on native TRT, we also have Python APIs that can be used to load a TF network, convert it to UFF, and run it natively on TRT.
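
With the TRT 4-era Python API, that path looks roughly like this. This is a sketch under assumptions: the file name, input/output tensor names, and input shape are placeholders, and any TF ops without UFF/TRT support would have to be removed or replaced first:

import uff
import tensorrt as trt
from tensorrt.parsers import uffparser

# convert the frozen TF graph to UFF
uff_model = uff.from_tensorflow_frozen_model(
    'mobilenet_v2.pb', ['MobilenetV2/Predictions/Reshape_1'])

# parse the UFF model and build a native TRT engine
G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)
parser = uffparser.create_uff_parser()
parser.register_input('input', (3, 224, 224), 0)  # CHW order
parser.register_output('MobilenetV2/Predictions/Reshape_1')
engine = trt.utils.uff_to_trt_engine(
    G_LOGGER, uff_model, parser, 1, 1 << 30)  # batch size 1, 1 GB workspace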

Thank you @NVES

Two followup questions:

  1. Do you know why nothing is being converted?
  2. Based on my previous comment here, it seems like there is a fundamental issue behind the "No attr named 'identical_element_shapes' in NodeDef" error, since converting to TRT without running the TF graph first triggers it. Why do I get this error?

Thanks,
Roman

Hello,

Regarding #2, I can't reproduce it in TRT 4 or TRT 5 RC. The bug I referenced in that post should be available in TRT 5 RC.

I have tensorrt 4.0.2.0-1+cuda9.0 and am seeing the bug. Do you mean the bug exists only in TRT 5 RC (my case proves it doesn't), or that the bug fix is available in TRT 5 RC?

Hello,

Regarding the original case you referenced (https://devtalk.nvidia.com/default/topic/1039216/tensorrt/no-attr-named-identical_element_shapes-in-nodedef/1): the fix was actually upstreamed to TensorFlow. It's not a TRT bug.

Got it. Am I able to upgrade TensorFlow, given that the TF-TRT install instructions provided a specific version of TensorFlow directly from NVIDIA (v1.9)?

I feel that this issue went off track from the original posting and my issue was not resolved, so I reposted here: https://devtalk.nvidia.com/default/topic/1043918/using-tf-trt-to-convert-mobilenet-ssdlite-model-gives-errors/?offset=1#5295982

FYI, I was able to fix the problem using the NVIDIA-IOT codebase, which makes very specific modifications to common networks to get them to run. See this answer: https://devtalk.nvidia.com/default/topic/1043918/jetson-tx2/using-tf-trt-to-convert-mobilenet-ssdlite-model-gives-errors/post/5296261/#5296261
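
For anyone landing here: if I recall the repo correctly, it is NVIDIA-AI-IOT/tf_trt_models, and the detection workflow is roughly the following (treat the exact function and model names as approximate; it rebuilds the graph with the TRT-incompatible pieces patched before conversion):

import tensorflow.contrib.tensorrt as trt
from tf_trt_models.detection import download_detection_model, build_detection_graph

# fetch the model config/checkpoint and rebuild the graph with
# the network-specific modifications applied
config_path, checkpoint_path = download_detection_model('ssd_mobilenet_v2_coco')
frozen_graph, input_names, output_names = build_detection_graph(
    config=config_path, checkpoint=checkpoint_path)

# with the patched graph, conversion actually produces TRTEngineOp nodes
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    precision_mode='FP16')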