TensorRT 5 and TensorRT 7 conversion discrepancy

Provide details on the platforms you are using:

Linux distro and version - Ubuntu 18.04.4 LTS
GPU type - Tesla T4 (AWS g4dn.xlarge) and Volta V100 (AWS p3.2xlarge)
Nvidia driver version - 450.36.06
CUDA version - 10.0
CUDNN version - 7.6
Python version [if using python] - 3.6
Tensorflow version - 1.14
TensorRT version - 7

Describe the problem

Using TensorRT 5 (TF 1.15) for conversion, results:
numb. of all_nodes in frozen graph: 3408
numb. of trt_engine_nodes in TensorRT graph: 7
numb. of all_nodes in TensorRT graph: 892

Using TensorRT 7.0.0 (TF 1.14) for conversion, results:
numb. of all_nodes in frozen graph: 3408
numb. of trt_engine_nodes in TensorRT graph: 0
numb. of all_nodes in TensorRT graph: 1789

Files

Code used for conversion:
import tensorflow as tf
from tensorflow.python.platform import gfile
from tensorflow.python.compiler.tensorrt import trt_convert as trt

frozen_graph = '/home/ubuntu/Tensorrt/frozen_model.pb'

# Load the frozen graph
with open(frozen_graph, 'rb') as f:
    frozen_graph_gd = tf.GraphDef()
    frozen_graph_gd.ParseFromString(f.read())

if tf.test.gpu_device_name():
    print('Default GPU Device:{}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")

# Convert the frozen graph with TF-TRT
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph_gd,
    is_dynamic_op=True,
    outputs=['num_detections:0', 'detection_boxes:0', 'detection_scores:0', 'detection_classes:0'],
    max_batch_size=32,
    max_workspace_size_bytes=2 * (10 ** 9),
    precision_mode="FP16")

with gfile.FastGFile("/home/ubuntu/Tensorrt/TensorRT_model.pb", 'wb') as f:
    f.write(trt_graph.SerializeToString())
    print("TensorRT model is successfully stored!")

all_nodes = len([1 for n in frozen_graph_gd.node])
print("numb. of all_nodes in frozen graph:", all_nodes)

# Check how many ops were converted into TensorRT engines
trt_engine_nodes = len([1 for n in trt_graph.node if str(n.op) == 'TRTEngineOp'])
print("numb. of trt_engine_nodes in TensorRT graph:", trt_engine_nodes)
all_nodes = len([1 for n in trt_graph.node])
print("numb. of all_nodes in TensorRT graph:", all_nodes)

Include any logs, source, models (.uff, .pb, etc.) that would be helpful to diagnose the problem.

Cannot share .pb due to infosec.
No errors are reported in the logs; they are identical in both cases.

Reproducibility

The issue is reproducible with the above code and any detection model.

Hi @mishall.swain,
Could you please share your model and verbose logs?
Also, is this related to TF-TRT conversion?

Thanks!

I am able to reproduce the same issue with this SSD.pb

Is there any specific command or param to get these logs?

2020-09-22 16:00:31.873374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2020-09-22 16:00:31.873390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2020-09-22 16:00:33.137892: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:716] Optimization results for grappler item: tf_graph
2020-09-22 16:00:33.137946: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] constant folding: Graph size after: 6823 nodes (1), 11475 edges (2), time = 547.209ms.
2020-09-22 16:00:33.137961: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] layout: layout did nothing. time = 4.224ms.
2020-09-22 16:00:33.137972: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] constant folding: Graph size after: 6823 nodes (0), 11475 edges (0), time = 196.495ms.
WARNING:tensorflow:From convert1.py:28: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
numb. of all_nodes in frozen graph: 6822
numb. of trt_engine_nodes in TensorRT graph: 0
numb. of all_nodes in TensorRT graph: 6823

For this model, TRT 7 is unable to create any trt_engine_nodes, and there is almost no reduction in all_nodes.
TRT 5 is able to create engine nodes and also reduces the total node count.

Yes

Please check the below link

Thanks!
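
In case it helps, here is a rough sketch of one way to enable more verbose conversion logs by setting TensorFlow's logging environment variables before importing it (the TF_CPP_VMODULE module names below are a guess based on the TF-TRT source files and may differ between TF versions):

import os

# Raise TensorFlow's C++ VLOG level so the TF-TRT converter prints details.
# These variables must be set before the TensorFlow shared library is loaded.
os.environ["TF_CPP_MIN_VLOG_LEVEL"] = "2"
# Optionally restrict verbose output to the conversion-related modules
# (module names are assumptions and may vary across TF versions).
os.environ["TF_CPP_VMODULE"] = "convert_graph=2,convert_nodes=2,segment=2,trt_engine_op=2"

import tensorflow as tf  # import only after the variables are set
print(tf.__version__)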

Hi @AakankshaS, thanks to the verbose logging I realized that my code was not using the GPU at all.

I was using the deepstream:20.07-triton docker image with CUDA 10.2 and TRT 7.0 installed for the conversion. I had installed TF-GPU 1.15 myself inside the container, but due to incompatibility TF was not utilizing the GPU: it installs correctly yet does not use the GPU because a few libraries are incompatible.
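
For anyone hitting the same thing, this is the quick sanity check I now run before converting (plain TF 1.x APIs, nothing specific to my setup):

import tensorflow as tf
from tensorflow.python.client import device_lib

# List the devices TensorFlow can actually see; if no 'GPU' entry shows up,
# the TF build and the CUDA/cuDNN libraries in the container are incompatible.
print(device_lib.list_local_devices())
print("GPU available:", tf.test.is_gpu_available(cuda_only=True))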

I also tried using the tensorrt:20.01 docker image with CUDA 10.2 and TRT 7.0, but I'm facing a similar issue when installing and using TF-GPU there.

[For TRT-5] I had installed TF and TRT 5 directly on the base instance, hence it was able to use the GPU.

Is there any easy way to test TF-TRT conversion using TRT 7.0? I intend to use Triton Inference Server for my ML pipeline, and I already have trained TF .pb files.
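
In the meantime, this is the variant I plan to try inside one of the NGC TensorFlow containers (where the TF build and the linked TensorRT version are matched), using the TrtGraphConverter API from TF 1.14+. It is only a sketch with the same paths and parameters as my script above, not yet verified in my environment:

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Load the same frozen graph as in my script above.
with open('/home/ubuntu/Tensorrt/frozen_model.pb', 'rb') as f:
    frozen_graph_gd = tf.GraphDef()
    frozen_graph_gd.ParseFromString(f.read())

# TrtGraphConverter replaces create_inference_graph in TF 1.14+;
# the parameters mirror the ones I used above.
converter = trt.TrtGraphConverter(
    input_graph_def=frozen_graph_gd,
    nodes_blacklist=['num_detections', 'detection_boxes',
                     'detection_scores', 'detection_classes'],
    max_batch_size=32,
    max_workspace_size_bytes=2 * (10 ** 9),
    precision_mode='FP16',
    is_dynamic_op=True)
trt_graph = converter.convert()

print('TRTEngineOp nodes:',
      len([1 for n in trt_graph.node if n.op == 'TRTEngineOp']))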