I came back from GTC gung ho on using TensorRT to optimize our TensorFlow-based object detection network for better inference performance. Unfortunately, despite tweaking the parameters to the create_inference_graph method, I was only able to get a small performance boost in one case, and in most cases performance actually got worse. I am looking for some insight into why I am not getting a better boost from TensorRT. Some facts (a sketch of my conversion call follows this list):
Object detection network similar to YOLO. (But not quite the same … since our detection task is not looking for cars, cats, etc.)
Hosted in the tensorflow/serving:1.13.1-gpu container (CUDA 10, cuDNN 7.4.1, TensorRT 5.0.2), runtime = nvidia-docker 2.0, driver 410.79
Hosted on AWS p3.2xlarge instance type with Tesla V100 GPU
Set is_dynamic_op to True, since we don’t have fixed-size inputs and outputs to the various layers
TensorFlow SavedModel as input
We typically have a batch size of only one
We definitely use some unsupported operations but TensorRT could still create anywhere from 10 to 40 TRT engines during conversion.
Tried FP32, FP16, and INT8 precision modes
Tried various values of maximum_cached_engines
Tried various values of minimum_segment_size (3, 5, and 10, for example)
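For concreteness, here is a minimal sketch of the kind of create_inference_graph call I am making, using the SavedModel entry point of the TF 1.13 contrib API. The paths are placeholders, not our real model, and the parameter values shown are just one point in the ranges I have been sweeping:

import tensorflow.contrib.tensorrt as trt

# Placeholder paths; the real model is our YOLO-like detector exported as a SavedModel.
converted_graph_def = trt.create_inference_graph(
    input_graph_def=None,                      # not used when converting a SavedModel
    outputs=None,
    input_saved_model_dir="saved_model_dir",   # placeholder
    output_saved_model_dir="trt_saved_model_dir",
    max_batch_size=1,                          # we almost always run batch size 1
    max_workspace_size_bytes=1 << 30,
    precision_mode="FP16",                     # also tried "FP32" and "INT8"
    minimum_segment_size=3,                    # also tried 5 and 10
    is_dynamic_op=True,                        # inputs/outputs are not fixed size
    maximum_cached_engines=1)                  # also tried larger values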
Watching the output of TF Serving, when I do inference I can see that it takes a while to start those TRT engines, which is why I suspect that is_dynamic_op=True is one of the problems. Comments on this?
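To quantify that startup cost, something like the rough sketch below would show the gap between the first request (where the engines get built) and later requests. The model name, REST port, and input shape are placeholders, not our real setup:

import json
import time

import numpy as np
import requests

# Placeholders: adjust the model name, REST port, and input shape to the real
# serving setup (assumes tensorflow_model_server was started with --rest_api_port).
URL = "http://localhost:8501/v1/models/my_detector:predict"
payload = json.dumps({"instances": np.random.rand(1, 416, 416, 3).tolist()})

for i in range(5):
    start = time.time()
    resp = requests.post(URL, data=payload)
    resp.raise_for_status()
    print("request %d: %.3f s" % (i, time.time() - start))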
Is it due to low batch size? Would we expect to see better performance with larger batch sizes?
Also, on one older blog, but nowhere else, I found a mention of limiting TensorFlow itself to a percentage of the GPU memory.
I did not see this reference to limiting TF GPU memory in any other documentation. Should I be limiting TensorFlow to a percentage of GPU memory, and if so, what fraction would you recommend?
(It does look like TF Serving is consuming most of the GPU memory, so I will try limiting it.)
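I assume the blog was referring to the standard TF 1.x GPU options; a minimal sketch of what that looks like in plain TensorFlow would be the snippet below (for TF Serving, the equivalent is the --per_process_gpu_memory_fraction flag I mention further down). The 0.4 value is just an example, not a recommendation:

import tensorflow as tf

# Cap TensorFlow at ~40% of GPU memory, leaving headroom for the TensorRT engines.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.4)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    # ... load and run the converted graph here ...
    pass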
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   36C    P0    39W / 300W |  15718MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     22817      C   tensorflow_model_server                    15708MiB |
+-----------------------------------------------------------------------------+
It seems TRT uses the same memory space when it runs in TF Serving. I didn’t see anything else running on the GPU when I first started my model, and as soon as the TRT engines were created the memory usage jumped dramatically. I did find an option to have TF Serving use less memory: tensorflow_model_server --per_process_gpu_memory_fraction=0.400000, but it didn’t impact performance.
Lastly, I saw no impact when setting maximum_cached_engines, and this surprised me. Perhaps I am misinterpreting what this parameter does … can you explain?