I tried benchmarking inference time on the Jetson TX2 using pure TensorFlow (TF) vs. TensorFlow converted to TensorRT (TRT) using the tensorflow.contrib.tensorrt library.
Here is the link to my public benchmarking gist. To benchmark, I used the MobileNetV2 model, whose frozen TF graph can be downloaded from here.
The average classification time for a single 224x224 image over 100 trials was:
TF: 27ms
TRT: 31ms
My questions are:
Why is there no TensorRT speedup (in fact a slight slowdown)?
I know there is also the option to convert to TensorRT in pure C++ without Python. Is this much faster than TensorRT in Python? (I don't know C++ well, so I would only want to pursue this if there were significant gains.)
First, I tried another *.pb graph from here and had the exact same issue (TF=137ms, TRT=147ms).
I also found that in both cases, if I tried to run the TRT session before the TF session, there would be a bug during inference saying: ‘No attr named ‘identical_element_shapes’ in NodeDef’ (see below for the full error stack). This is the same bug recently reported here, where you said there was a bug with TF-TRT which was recently fixed in a new release. Do you know how I can access this fix?
Performance depends on how many ops were converted from TF to TRT. I recommend viewing the network in TensorBoard and comparing the two .pb files (native TF and TF-TRT).
In the case of the TF-TRT network, the portion of the graph being run on TRT will show up as a new node called my_trt_op_*. It looks like nothing is converted, hence no speedup.
You can try something like this to view the number of TRT ops in the converted TRT graph.
trt_engine_ops = len([1 for n in trt_graph.node if str(n.op) == 'TRTEngineOp'])
I’m seeing trt_engine_ops = 0 in your case.
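To go one step beyond that one-liner, a minimal sketch that tallies every op type in a graph can make it obvious at a glance whether any TRTEngineOp nodes exist or whether the graph is still all native TF ops. This is plain Python for illustration; the `ops` list below is a stand-in for `(str(n.op) for n in trt_graph.node)` from a real converted GraphDef:

```python
from collections import Counter

def count_ops(op_names):
    """Tally op types given an iterable of op-name strings,
    e.g. (str(n.op) for n in trt_graph.node) for a converted GraphDef."""
    return Counter(op_names)

# Stand-in op list mimicking a graph where nothing was converted:
ops = ['Placeholder', 'Conv2D', 'Relu6', 'Conv2D', 'Softmax']
counts = count_ops(ops)
print(counts.get('TRTEngineOp', 0))  # 0 here means no TRT segments, so no speedup expected
```

If the TRTEngineOp count is zero, the whole graph fell back to TensorFlow, which matches the slight slowdown observed above (conversion overhead with no TRT-accelerated segments).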
To execute on native TRT, we also have Python APIs which can be used to load a TF network, convert it to UFF, and run it natively on TRT.
Based on my previous comment here, it seems like there is a fundamental issue behind the ‘No attr named ‘identical_element_shapes’ in NodeDef’ error, since if I try to convert to TRT without running the TF graph first, it gives me this error. Why do I get this error?
I have tensorrt 4.0.2.0-1+cuda9.0, and am seeing the bug. Do you mean the bug exists only in TRT 5 RC (my case proves it doesn't) or that the bug fix is available in TRT 5 RC?
Got it. Am I able to upgrade TensorFlow, given that the install instructions for TF-TRT provided a specific version of TensorFlow directly from NVIDIA (which was v1.9)?