TensorRT Integration Speeds Up TensorFlow Inference

Originally published at: https://developer.nvidia.com/blog/tensorrt-integration-speeds-tensorflow-inference/

Update, May 9, 2018: TensorFlow v1.7 and above integrates with TensorRT 3.0.4. NVIDIA is working on supporting the integration for a wider set of configurations and versions. We’ll publish updates when these become available. Meanwhile, if you’re using pip install tensorflow-gpu, simply download the TensorRT files for Ubuntu 14.04, not 16.04, no matter what version of Ubuntu you’re running.…
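For reference, the conversion entry point discussed in this thread is create_inference_graph from the TF 1.7 contrib module. A minimal sketch follows; the frozen graph, output node names, and precision mode are placeholders, and the import is guarded so the snippet degrades gracefully when the contrib module is not installed:

```python
# Sketch of the TF 1.7 contrib TensorRT conversion API (argument names per
# the tensorflow.contrib.tensorrt docs; inputs here are placeholders).
try:
    import tensorflow.contrib.tensorrt as trt
    HAVE_TRT = True
except ImportError:
    HAVE_TRT = False

# 2147483648 bytes = the 2 GB workspace visible in the logs in this thread.
MAX_WORKSPACE = 1 << 31

def convert(frozen_graph_def, output_names, batch_size=4,
            precision_mode="FP16"):
    """Convert a frozen TensorFlow GraphDef into a TensorRT-optimized one."""
    if not HAVE_TRT:
        raise RuntimeError("tensorflow.contrib.tensorrt is not available")
    return trt.create_inference_graph(
        input_graph_def=frozen_graph_def,
        outputs=output_names,               # names of the output nodes
        max_batch_size=batch_size,          # largest batch served at runtime
        max_workspace_size_bytes=MAX_WORKSPACE,
        precision_mode=precision_mode)      # "FP32", "FP16", or "INT8"
```

The returned GraphDef can then be imported and run like any other TensorFlow graph, with the TensorRT-compatible subgraphs replaced by TRTEngineOp nodes.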

Nice example, but the link of sample code is broken.

Thanks for catching that wstbee. Corrected, please check now.

Thanks! it works now.

Thanks a lot for the example. However when I was trying to run the code, there was an error:

2018-04-03 10:31:34.601137: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2018-04-03 10:31:35.543647: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2624] Max batch size= 4 max workspace size= 2147483648
2018-04-03 10:31:35.543723: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2630] starting build engine
*** Error in `python': munmap_chunk(): invalid pointer: 0x00007ffe075d7300 ***

Any idea?

While running the example code, I receive a stack smash when creating the TensorRT inference graph from the frozen TensorFlow GraphDef:

2018-04-05 13:25:34.652215: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
*** stack smashing detected ***: python3 terminated
Aborted (core dumped)

TensorRT version: 3.0.4
CUDA version: 8.0
TensorFlow version: v1.7

Do you have suggestions what the reason might be?

I am running into the same issue with TensorRT 3.0.4. Version 3.0.1, however, seems to work fine and does not run into this issue. Any idea?


The integration is supported with CUDA 9.0 (not 8.0).

Hope that helps,

@zhihui_guo:disqus Please ensure you are using TensorRT 3.0.4.
@rohith_b:disqus remove TensorRT 4.0 and try again; that should resolve the issue.


@disqus_WolJuWDgTI:disqus Thanks a lot for the tip. But it doesn't seem to help. The same error exists for TensorRT 3.0.4 when running the script run_all.sh. BTW I am using TensorFlow 1.7 with Virtualenv.

That doesn't help. I still see the same error. For TensorRT 3.0.4 and 4.0, it only works for the native data type. For other data types (FP32), I see the error. Thanks, Rohith

@rohith_b:disqus that's weird. We would like to get more info to reproduce it. Can you give an email ID we can follow up with offline?

@zhihui_guo:disqus @rohith_b:disqus If you are using pip install tensorflow-gpu, could you please make sure that you have downloaded TensorRT 3.0.4 for Ubuntu *14.04* and not 16.04. Due to compatibility requirements of TensorFlow, we need to use the 14.04 release of TensorRT when using TensorFlow pip packages. We hope to resolve this in the next TensorRT release. If you are building from source, you can use your native version of TensorRT. If this doesn't help, please provide us a means to contact you. @Rohith B, TensorRT 4 RC should work if you compile from source, but it is not officially supported, so you might encounter problems.


Thank you, Sami. Yes, TensorRT 3.0.4 for Ubuntu *14.04* works. What are the recommended values for the per-process GPU memory and the memory for the TensorRT engine? I am running into an out-of-memory issue while iterating over different batch sizes.

That depends on your network and how you run it. We are working to ease this so you will not have to set the per_process_gpu_memory option, but that will come with TensorRT 4.0. Also note that you can run with batch sizes up to maximum_batch_size, but smaller batch sizes will not be optimal. For now, I would suggest you run different batch sizes in separate processes.
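The one-process-per-batch-size approach suggested here can be sketched with the standard multiprocessing module. A minimal sketch: build_and_run is a hypothetical stand-in for the session setup and TensorRT conversion, which are omitted so the skeleton stays self-contained:

```python
# Sketch: run each batch-size experiment in its own short-lived process so
# GPU memory is fully released when the child exits, avoiding the
# out-of-memory accumulation seen when iterating in a single process.
import multiprocessing as mp

def build_and_run(batch_size):
    # Hypothetical stand-in: in practice this would create a tf.Session
    # limited via tf.GPUOptions(per_process_gpu_memory_fraction=...),
    # call trt.create_inference_graph(max_batch_size=batch_size, ...),
    # and run inference. Here it just reports the batch size it handled.
    return batch_size

if __name__ == "__main__":
    for bs in [1, 2, 4, 8]:
        # One process per batch size; all GPU state dies with the child.
        with mp.Pool(processes=1) as pool:
            print("finished batch size", pool.apply(build_and_run, (bs,)))
```

Each child process gets a clean CUDA context, so the memory set aside for one engine build cannot leak into the next iteration.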

Thank you, Sami. Yes, I figured out that each batch-size needs to be run in a separate process because I was running into memory issues.

Sami helped me with this (see my comments above). It looks like, for now, TensorRT 3.0.4 for Ubuntu 14.04 works. Thanks

I am seeing these errors with TensorRT 3.0.4 (TensorFlow 1.7):

Number of eligible GPUs (core count >= 8): 1
2018-04-22 23:44:21.480797: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:383] MULTIPLE tensorrt candidate conversion: 4
2018-04-22 23:44:21.481948: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: Network.cpp::addInput::287, condition: dims.d[i] > 0
2018-04-22 23:44:21.481972: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:412] subgraph conversion error for subgraph_index:0 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 8 nodes)

Any thoughts?

I got core dumped error when running the sample code.

System: Ubuntu 16.04
TensorRT version:
CUDA version: 9.0
cuDNN version: 7.0.5
TensorFlow version: 1.7
GPU: GTX 1080ti

@pharrellyhy:disqus The integration works with TensorRT 3.