TensorRT 3: Faster TensorFlow Inference and Volta Support

jwitsoe · December 4, 2017, 4:20am

Originally published at: TensorRT 3: Faster TensorFlow Inference and Volta Support | NVIDIA Technical Blog

NVIDIA TensorRT™ is a high-performance deep learning inference optimizer and runtime that delivers low-latency, high-throughput inference for deep learning applications. NVIDIA released TensorRT last year with the goal of accelerating deep learning inference for production deployment. Figure 1. TensorRT optimizes trained neural network models to produce a deployment-ready runtime inference engine. In this post we’ll introduce…

anon23134999 · January 4, 2018, 7:22am

Amazing article. I am very interested in trying optimization 1 during training as well. Could I use it in this setting on a CentOS machine?
On the downloads page, only Ubuntu packages are available.

anon48679366 · January 4, 2018, 9:17am

Hi Nitin, thanks! TensorRT is a deployment-only library, so you can't take advantage of these optimizations for training with TensorRT. Currently only Ubuntu packages are tested and officially supported. TensorRT is also available for Jetson TX1 and TX2 embedded platforms.

anon23134999 · January 4, 2018, 9:34am

Great, thank you.
Is there any way, or any future plan, to extract and use parts of this module (like optimization 1)? I am currently facing heavy slowdowns on GPUs and am not able to use their full potential, and I feel layer fusion could speed up training considerably. Something like:

my_raw_tf.graph > tensor_rt3.layer_fusion > op_tf.graph
op_tf.graph.fit(X, Y)

anon48679366 · January 4, 2018, 10:33am

Hi Nitin, I can't comment on future plans. Performance improvements depend on various factors including the specific graph structure and opportunities for fusion. TensorRT is able to deliver overall better performance through a combination of all the optimizations discussed in the post.

Try using the NVIDIA-optimized framework containers from the NGC container registry. You can download and run them locally or on the latest V100 GPUs on AWS:
https://www.nvidia.com/en-u...

anon70019727 · February 8, 2018, 10:46am

Does TensorRT provide a custom layer C++ API for running inference on a TensorFlow UFF file on DrivePX2?

anon2494271 · February 16, 2018, 6:23pm

Is there an example of this with the C++ API?

anon27377761 · February 20, 2018, 10:26pm

What about using it on AWS p2.xlarge or other home-built NVIDIA/CUDA-based systems?

anon76598746 · March 13, 2018, 9:12am

Useful information in this post, can't wait to try.

anon31940740 · April 5, 2018, 4:31pm

Hi,
First of all, this is a very interesting and useful article.
I was wondering whether I can call the inference method (infer) from within a kernel that runs on the GPU.
If not, is there any future plan to support or provide this kind of device API?
It would be great to have it!

Thanks

anon83140942 · April 6, 2018, 10:48pm

Is it possible to share the code inside the load_and_preprocess_images() method?

anon88281670 · July 13, 2018, 9:12am

I am seeing this problem:
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.4/site-packages/tensorflow/python/framework/importer.py", line 489, in import_graph_def
graph._c_graph, serialized, options) # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 2 but is rank 4 for 'import/dense_p7/MatMul' (op: 'MatMul') with input shapes: [1,256,1,1], [256,1]

We can successfully complete the TensorRT subgraph conversion, but we hit this problem during the inference phase. My model is a ResNet-50-based TensorFlow model. Can anyone help me solve this problem? Thanks!
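
The error message indicates that `dense_p7/MatMul` receives a rank-4 input of shape [1,256,1,1] where a rank-2 matrix is expected. The following is a minimal NumPy sketch of that shape mismatch, assuming (this is not confirmed in the thread) that collapsing the trailing unit dimensions before the dense layer is the intended behavior. The array names and values are hypothetical stand-ins, not taken from the model.

```python
import numpy as np

# Hypothetical stand-ins for the two tensors named in the error message.
features = np.ones((1, 256, 1, 1), dtype=np.float32)  # rank-4 input to the dense layer
weights = np.full((256, 1), 0.5, dtype=np.float32)    # rank-2 dense weights

# Collapse every dimension except the batch dimension: [1,256,1,1] -> [1,256].
flat = features.reshape(features.shape[0], -1)

# Both operands are now rank 2, so the matrix product is well defined.
logits = flat @ weights
print(flat.shape, logits.shape)
```

In a TensorFlow graph the equivalent step would be a reshape or flatten op placed before the dense layer; whether that is the right fix here depends on how the converted subgraph produces its output.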

anon91014425 · September 5, 2018, 1:40pm

Using TensorFlow backend.
2018-09-05 18:27:17.202041: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: FMA
2018-09-05 18:27:17.336380: W tensorflow/stream_executor/cuda/cuda_driver.cc:513] A non-primary context 0x60fa250 for device 0 exists before initializing the StreamExecutor. The primary context is now 0x60cc960. We haven't verified StreamExecutor works with that.
2018-09-05 18:27:17.337269: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.70GiB
2018-09-05 18:27:17.337304: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-05 18:27:17.991676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-05 18:27:17.991732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-09-05 18:27:17.991747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2018-09-05 18:27:17.991999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7408 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
model_data/yolo.h5 model, anchors, and classes loaded.
Using output node dense_2/Softmax
Converting to UFF graph
Traceback (most recent call last):
File "demo.py", line 193, in <module>
main(YOLO())
File "demo.py", line 43, in main
uff_model = uff.from_tensorflow_frozen_model("mars-small128.pb", ["dense_2/Softmax"])
File "/usr/lib/python3.5/dist-packages/uff/converters/tensorflow/conversion_helpers.py", line 149, in from_tensorflow_frozen_model
return from_tensorflow(graphdef, output_nodes, preprocessor, **kwargs)
File "/usr/lib/python3.5/dist-packages/uff/converters/tensorflow/conversion_helpers.py", line 120, in from_tensorflow
name="main")
File "/usr/lib/python3.5/dist-packages/uff/converters/tensorflow/converter.py", line 76, in convert_tf2uff_graph
uff_graph, input_replacements)
File "/usr/lib/python3.5/dist-packages/uff/converters/tensorflow/converter.py", line 53, in convert_tf2uff_node
raise UffException(str(name) + " was not found in the graph. Please use the -l option to list nodes in the graph.")
NameError: name 'UffException' is not defined
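
The final `NameError` appears to mask the underlying message, which says that the requested output node was not found in the graph. Below is a small sketch of checking output names before attempting the conversion. The helper `missing_output_nodes` is hypothetical and not part of the `uff` package, and the node list is made up for illustration. With a loaded TensorFlow `GraphDef`, the names could be collected as `[n.name for n in graph_def.node]`.

```python
def missing_output_nodes(node_names, requested_outputs):
    """Return the requested output names that are absent from the graph."""
    available = set(node_names)
    return [name for name in requested_outputs if name not in available]


# Made-up node names standing in for the contents of a frozen graph.
node_names = ["images", "conv1_1/Conv2D", "fc1/MatMul", "features"]

# The output name passed to the converter in the traceback above.
missing = missing_output_nodes(node_names, ["dense_2/Softmax"])
if missing:
    print("Not in graph:", missing)
```

If the check reports a missing name, the output node passed to the converter probably belongs to a different model than the frozen graph being loaded.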

anon941035 · September 5, 2018, 5:15pm

You might try the devtalk forums: https://devtalk.nvidia.com/...

wade.wang · December 5, 2020, 1:35pm

Hi, do you have a usage guide for TensorRT Lite?

jwitsoe · December 7, 2020, 6:04pm

@wade.wang – This might help: tensorrt.lite — TensorRT 3.0.0 documentation.

wade.wang · December 8, 2020, 1:18am

@jwitsoe Yes, it is helpful. Thank you!
