TensorRT model accuracy on different GPUs

Linux distro and version: Ubuntu 16.04.5 LTS
GPU type: GTX 1080TI / Tesla V100-SXM2-16GB
nvidia driver version: 396.26 / 396.54
CUDA version: not sure (7.0?); whichever version is installed in the nvcr.io/nvidia/tensorrt:18.08-py3 container
CUDNN version: V9.0.176
Python version: 3.5.2
TensorFlow version: r1.11, built from source
TensorRT version: 4.0.1.6

I have been testing TensorRT optimization of SSD Inception V2. I have a GTX 1080Ti locally and have run some tests on my own datasets. I am using the same code for converting and running the model, and everything runs inside a Docker container. On my local PC with the GTX 1080Ti the model reaches 75% mAP, but on the V100 it drops to 55% mAP. The speed also got worse, from 0.038 on the GTX 1080Ti to 0.05 on the V100. I tried saving the converted network from my local 1080Ti and launching it on the V100, but it throws an error with customWinograd… From what I have read, TensorRT builds its networks for a specific GPU architecture, so they are not portable between GPUs.
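If I understand correctly, that means the only real options are to rerun the conversion on the V100 itself, or to pass is_dynamic_op=True so the TensorRT engines are built at the first session run on whichever GPU actually serves the model. A minimal sketch of that variant (same frozen_graph / output_names as in my conversion code further down); is this the recommended way to handle different GPUs?

import tensorflow.contrib.tensorrt as trt

# Same conversion call as below, but with is_dynamic_op=True so the TensorRT
# engines are built lazily on the GPU that runs inference instead of being
# baked in for the GPU that ran the conversion.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=8,
    minimum_segment_size=50,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP32',
    is_dynamic_op=True
)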

For the conversion I am using the examples from this repository: https://github.com/NVIDIA-Jetson/tf_trt_models

The code for converting and launching the network is below:

import tensorflow.contrib.tensorrt as trt
import tensorflow as tf
import numpy as np
from tf_trt_models.detection import download_detection_model, build_detection_graph

# MODEL names the detection model to fetch, e.g. 'ssd_inception_v2_coco'
# for the SSD Inception V2 model mentioned above.
config_path, checkpoint_path = download_detection_model(MODEL, 'data')

# Build and freeze the detection graph from the pipeline config and checkpoint.
frozen_graph, input_names, output_names = build_detection_graph(
    config=config_path,
    checkpoint=checkpoint_path
)

# Convert with TF-TRT: FP32 precision, static engines built at conversion time.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=8,
    minimum_segment_size=50,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP32',
    is_dynamic_op=False
)

tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True

tf_sess = tf.Session(config=tf_config)

# Load the converted graph into the session's graph.
tf.import_graph_def(trt_graph, name='')

tf_input = tf_sess.graph.get_tensor_by_name(input_names[0] + ':0')
tf_scores = tf_sess.graph.get_tensor_by_name('scores:0')
tf_boxes = tf_sess.graph.get_tensor_by_name('boxes:0')
tf_classes = tf_sess.graph.get_tensor_by_name('classes:0')

# frames is a batch of preprocessed input images, prepared elsewhere.
scores, boxes, classes = tf_sess.run(
    [tf_scores, tf_boxes, tf_classes],
    feed_dict={tf_input: frames}
)
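To check whether TF-TRT segments the graph the same way on both cards, I plan to run a quick diagnostic right after create_inference_graph on each machine (count_trt_engine_ops is just a small helper I am adding for this check; TRTEngineOp is the op type TF-TRT uses for the fused TensorRT segments):

# Compare how the conversion segmented the graph on the 1080Ti vs the V100.
def count_trt_engine_ops(graph_def):
    return sum(1 for node in graph_def.node if node.op == 'TRTEngineOp')

print('TRTEngineOp segments: %d' % count_trt_engine_ops(trt_graph))
print('total nodes after conversion: %d' % len(trt_graph.node))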

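In case it helps with a repro, per-batch latency can be compared on both GPUs with a simple loop like this (illustrative sketch, using the tf_sess, tensors, and frames set up above):

import time

# Warm-up runs so one-time initialization (including any lazy engine build)
# is not included in the measurement.
for _ in range(10):
    tf_sess.run([tf_scores, tf_boxes, tf_classes], feed_dict={tf_input: frames})

# Average the latency over a fixed number of runs.
n_runs = 100
start = time.time()
for _ in range(n_runs):
    tf_sess.run([tf_scores, tf_boxes, tf_classes], feed_dict={tf_input: frames})
print('mean latency per batch: %.4f s' % ((time.time() - start) / n_runs))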
Any updates on this?

We are reviewing this and will keep you updated. Since you mentioned everything is containerized, can you provide a simple repro, perhaps with a small subset of your dataset, that demonstrates the accuracy and performance changes?

Thanks

Hello,

We are NOT aware of a performance regression on V100 compared to GTX 1080Ti. We would really like to see a simple repro that demonstrates the accuracy and performance changes.