TensorRT model accuracy on different GPUs

Linux distro and version: Ubuntu 16.04.5 LTS
GPU type: GTX 1080TI / Tesla V100-SXM2-16GB
nvidia driver version: 396.26 / 396.54
CUDA version: not sure (7.0?); whichever version is installed in the nvcr.io/nvidia/tensorrt:18.08-py3 container
CUDNN version: V9.0.176
Python version: 3.5.2
TensorFlow version: r1.11, built from source
TensorRT version: 4.0.1.6

I have been testing TensorRT optimization of SSD Inception V2. I have a GTX 1080Ti locally and have run some tests on my own datasets. I am using the same code for converting and running the model, and everything runs inside a Docker container. On my local PC with the GTX 1080Ti the model reaches 75% mAP, but on the V100 it drops to 55% mAP. The speed also got worse, from 0.038 on the GTX 1080Ti to 0.05 on the V100. I tried saving the converted network from my local 1080Ti and launching it on the V100, but it throws an error with customWinograd… From what I have read, TensorRT builds its networks for a specific GPU architecture, so they are not portable between GPUs.
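If I understand correctly, that means the only real options are to rerun the conversion on the V100 itself, or to pass is_dynamic_op=True so the TensorRT engines are built at the first session run on whichever GPU actually serves the model. A minimal sketch of that variant (same frozen_graph / output_names as in my conversion code further down); is this the recommended way to handle different GPUs?

import tensorflow.contrib.tensorrt as trt

# Same conversion call as below, but with is_dynamic_op=True so the TensorRT
# engines are built lazily on the GPU that runs inference instead of being
# baked in for the GPU that ran the conversion.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=8,
    minimum_segment_size=50,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP32',
    is_dynamic_op=True
)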

For the conversion I am using the examples from this repository: https://github.com/NVIDIA-Jetson/tf_trt_models

The code for converting and launching the network is below:

import tensorflow.contrib.tensorrt as trt
import tensorflow as tf
import numpy as np
from tf_trt_models.detection import download_detection_model, build_detection_graph

# MODEL names the detection model to fetch, e.g. 'ssd_inception_v2_coco'
# for the SSD Inception V2 model mentioned above.
config_path, checkpoint_path = download_detection_model(MODEL, 'data')

# Build and freeze the detection graph from the pipeline config and checkpoint.
frozen_graph, input_names, output_names = build_detection_graph(
    config=config_path,
    checkpoint=checkpoint_path
)

# Convert with TF-TRT: FP32 precision, static engines built at conversion time.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=8,
    minimum_segment_size=50,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP32',
    is_dynamic_op=False
)

tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True

tf_sess = tf.Session(config=tf_config)

# Load the converted graph into the session's graph.
tf.import_graph_def(trt_graph, name='')

tf_input = tf_sess.graph.get_tensor_by_name(input_names[0] + ':0')
tf_scores = tf_sess.graph.get_tensor_by_name('scores:0')
tf_boxes = tf_sess.graph.get_tensor_by_name('boxes:0')
tf_classes = tf_sess.graph.get_tensor_by_name('classes:0')

# frames is a batch of preprocessed input images, prepared elsewhere.
scores, boxes, classes = tf_sess.run(
    [tf_scores, tf_boxes, tf_classes],
    feed_dict={tf_input: frames}
)
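To check whether TF-TRT segments the graph the same way on both cards, I plan to run a quick diagnostic right after create_inference_graph on each machine (count_trt_engine_ops is just a small helper I am adding for this check; TRTEngineOp is the op type TF-TRT uses for the fused TensorRT segments):

# Compare how the conversion segmented the graph on the 1080Ti vs the V100.
def count_trt_engine_ops(graph_def):
    return sum(1 for node in graph_def.node if node.op == 'TRTEngineOp')

print('TRTEngineOp segments: %d' % count_trt_engine_ops(trt_graph))
print('total nodes after conversion: %d' % len(trt_graph.node))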

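In case it helps with a repro, per-batch latency can be compared on both GPUs with a simple loop like this (illustrative sketch, using the tf_sess, tensors, and frames set up above):

import time

# Warm-up runs so one-time initialization (including any lazy engine build)
# is not included in the measurement.
for _ in range(10):
    tf_sess.run([tf_scores, tf_boxes, tf_classes], feed_dict={tf_input: frames})

# Average the latency over a fixed number of runs.
n_runs = 100
start = time.time()
for _ in range(n_runs):
    tf_sess.run([tf_scores, tf_boxes, tf_classes], feed_dict={tf_input: frames})
print('mean latency per batch: %.4f s' % ((time.time() - start) / n_runs))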
Any updates on this?

We are reviewing this and will keep you updated. Since you mentioned everything is containerized, can you provide a simple repro, perhaps with a small subset of your dataset, that demonstrates the accuracy and performance changes?

Thanks

Hello,

We are NOT aware of a performance regression on V100 compared to GTX 1080Ti. We would really like to see a simple repro that demonstrates the accuracy and performance changes.