TensorRT Integration Speeds Up TensorFlow Inference

Originally published at: https://developer.nvidia.com/blog/tensorrt-integration-speeds-tensorflow-inference/

Update, May 9, 2018: TensorFlow v1.7 and above integrates with TensorRT 3.0.4. NVIDIA is working on supporting the integration for a wider set of configurations and versions. We’ll publish updates when these become available. Meanwhile, if you’re using pip install tensorflow-gpu, simply download the TensorRT files for Ubuntu 14.04, not 16.04, no matter what version of Ubuntu you’re running.…
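
For reference, here is a minimal conversion sketch against the TensorFlow 1.7 contrib API; the frozen-graph path and output node name below are placeholders, not from the sample code:

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Load a frozen TensorFlow graph (the path is a placeholder).
with tf.gfile.GFile('frozen_model.pb', 'rb') as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# Replace TensorRT-compatible subgraphs with TRT engine ops.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=['logits'],                # placeholder output node name
    max_batch_size=4,
    max_workspace_size_bytes=1 << 30,  # 1 GB of builder workspace
    precision_mode='FP32')             # 'FP32', 'FP16', or 'INT8'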

Nice example, but the link to the sample code is broken.

Thanks for catching that, wstbee. Corrected; please check now.

Thanks! it works now.

Thanks a lot for the example. However, when I tried to run the code, there was an error:

2018-04-03 10:31:34.601137: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2018-04-03 10:31:35.543647: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2624] Max batch size= 4 max workspace size= 2147483648
2018-04-03 10:31:35.543723: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2630] starting build engine
*** Error in `python': munmap_chunk(): invalid pointer: 0x00007ffe075d7300 ***

Any idea?

While running the example code, I receive a stack smash when creating the TensorRT inference graph from the frozen TensorFlow GraphDef:

2018-04-05 13:25:34.652215: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
*** stack smashing detected ***: python3 terminated
Aborted (core dumped)

TensorRT version: 3.0.4
CUDA version: 8.0
TensorFlow version: v1.7

Do you have suggestions what the reason might be?

I am running into the same issue with TensorRT 3.0.4 and 4.0.0.3. But version 3.0.1 seems to work fine and does not run into this issue. Any ideas?

The integration is supported with CUDA 9.0, not 8.0.

hope that helps,
Sid

@zhihui_guo:disqus Please ensure you are using TensorRT 3.0.4.
@rohith_b:disqus Remove TensorRT 4.0 and try again; that should resolve the issue.

best,
Sid

@disqus_WolJuWDgTI:disqus Thanks a lot for the tip, but it doesn't seem to help. The same error still occurs with TensorRT 3.0.4 when running the run_all.sh script. BTW, I am using TensorFlow 1.7 in a virtualenv.

That doesn't help; I still see the same error. With TensorRT 3.0.4 and 4.0, it only works for the native data type. For other data types (FP32), I see the error. Thanks, Rohith

@rohith_b:disqus That's odd. We would like more info to reproduce this. Can you share an email address so we can follow up offline?

@zhihui_guo:disqus @rohith_b:disqus If you are using pip install tensorflow-gpu, could you please make sure that you have downloaded TensorRT 3.0.4 for Ubuntu *14.04* and not 16.04? Due to compatibility requirements of TensorFlow, we need to use the 14.04 release of TensorRT when using the TensorFlow pip packages. We hope to resolve this in the next TensorRT release. If you are building from source, you can use your native version of TensorRT. If this doesn't help, please give us a way to contact you. @rohith_b:disqus, the TensorRT 4 RC should work if you compile from source, but it is not officially supported, so you might encounter problems.
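
As a quick sanity check (just a sketch, not part of the sample code), this import will fail with a library load error if the installed TensorRT does not match what the tensorflow-gpu wheel was linked against:

import tensorflow as tf
# Loading the contrib module pulls in libnvinfer; a mismatched TensorRT
# install typically shows up here as an ImportError / undefined symbol.
import tensorflow.contrib.tensorrt as trt
print(tf.__version__)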

Cheers,
Sami

Thank you, Sami. Yes, TensorRT 3.0.4 for Ubuntu *14.04* works. What are the recommended values for per-process GPU memory and for the TensorRT engine workspace? I am running into out-of-memory issues while iterating over different batch sizes.

That depends on your network and how you run it. We are working to ease this so you will not have to set the per_process_gpu_memory option, but that will come with TensorRT 4.0. Also note that you can run with batch sizes up to maximum_batch_size, but smaller batch sizes will not be optimal. For now I would suggest you handle each batch size in a separate process.
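
For example (an illustrative sketch; the 0.5 fraction is just a placeholder you would tune for your network):

import tensorflow as tf

# Leave part of GPU memory unallocated by TensorFlow so the TensorRT
# engines have room for their workspace (the fraction is a placeholder).
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    pass  # build the converted graph and run inference here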

Thank you, Sami. Yes, I figured out that each batch size needs to be run in a separate process because I was running into memory issues.
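
For anyone else hitting this, here is roughly what I ended up doing (a sketch; the batch sizes and the function body are placeholders):

from multiprocessing import Process

def build_and_run(batch_size):
    # Import TensorFlow inside the child so each process gets its own
    # CUDA context and releases all GPU memory when it exits.
    import tensorflow as tf
    import tensorflow.contrib.tensorrt as trt
    # ... load the frozen graph, call trt.create_inference_graph(
    #     max_batch_size=batch_size, ...), and run inference here.
    print('done with batch size', batch_size)

if __name__ == '__main__':
    for bs in (1, 2, 4, 8):  # placeholder batch sizes
        p = Process(target=build_and_run, args=(bs,))
        p.start()
        p.join()  # run one at a time so GPU memory is freed between runs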

Sami helped me with this (see my comments above). It looks like, for now, TensorRT 3.0.4 for Ubuntu 14.04 works. Thanks!

I am seeing these errors with TensorRT 3.0.4 (TensorFlow 1.7):

Number of eligible GPUs (core count >= 8): 1
2018-04-22 23:44:21.480797: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:383] MULTIPLE tensorrt candidate conversion: 4
2018-04-22 23:44:21.481948: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: Network.cpp::addInput::287, condition: dims.d[i] > 0
2018-04-22 23:44:21.481972: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:412] subgraph conversion error for subgraph_index:0 due to:"Invalid argument: Failed to create Input layer" SKIPPING......( 8 nodes)

Any thoughts?

I got a core dump error when running the sample code.

System: Ubuntu 16.04
TensorRT version: 4.0.0.3
CUDA version: 9.0
cuDNN version: 7.0.5
TensorFlow version: 1.7
GPU: GTX 1080ti

@pharrellyhy:disqus The integration works with TensorRT 3.