Description
When calibrating for INT8 optimization, I get this error:
F tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:432] Check failed: t.TotalBytes() == device_tensor->TotalBytes() (372 vs. 332)
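For scale, here is the arithmetic on the two sizes in that message. This is a plain-Python sketch. Reading the bytes as float32 elements is my assumption, because the log does not say which tensor or dtype is involved.

```python
# Size in bytes of one calibration batch as fed by the input function:
# batch of 1, 512x512 pixels, 3 channels, uint8 (1 byte per element).
batch, height, width, channels = 1, 512, 512, 3
uint8_bytes = 1
image_input_bytes = batch * height * width * channels * uint8_bytes
print(image_input_bytes)  # 786432

# The two sizes reported by the failed check.
reported = (372, 332)
print(any(size == image_input_bytes for size in reported))  # False

# Assumption: if the mismatched tensor holds float32 values,
# these are the element counts on each side of the check.
float32_bytes = 4
element_counts = [size // float32_bytes for size in reported]
print(element_counts)  # [93, 83]
```

Neither size is anywhere near one 512x512x3 uint8 batch. This suggests (not confirmed) that the check is tripping on an input to a TensorRT segment further down the graph whose size depends on the image content.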
I am able to optimize for FP32 and FP16 with good results, but when I try INT8 I get the error.
The issue seems to be related to the calibration dataset or how it is loaded. I have included the code I run, minus the SavedModel and calibration dataset (for IP reasons). The standard EfficientDet-D0 can be used to reproduce the issue. The calibration dataset is a subset of the validation data saved as JPEGs at the training resolution of 512x512.
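Below is a sketch of one deterministic way to pick the calibration files. The paths are sorted because `glob.glob` order depends on the filesystem. The list is sliced rather than indexed over a fixed range, which raises `IndexError` when there are fewer files than expected. `list_calibration_files` is a hypothetical helper name, not part of TF-TRT.

```python
import glob
import os


def list_calibration_files(image_dir, max_images=500, pattern="*.jpg"):
    """Hypothetical helper: up to max_images paths in a stable order."""
    # glob.glob returns paths in filesystem-dependent order; sort them so
    # every calibration run sees the images in the same sequence.
    filenames = sorted(glob.glob(os.path.join(image_dir, pattern)))
    if not filenames:
        raise FileNotFoundError(
            f"no files matching {pattern!r} in {image_dir!r}")
    # Slicing never raises, even if fewer than max_images files exist.
    return filenames[:max_images]
```

The result can be iterated directly inside a calibration generator in place of a fixed `range(...)`.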
The code below has two input functions, input_fn_works and input_fn_doesnt_work. input_fn_works is meant to feed random data and doesn't throw the error. However, the resulting optimized model doesn't perform at all: I get scores of less than 0.01 for all bboxes. input_fn_doesnt_work fails the check when it reaches the second image in the load sequence. I have tried various ways of loading the data and none of them changes the outcome. Loading the images in a different order only changes the sizes in the error from (372 vs. 332) to some other pair of numbers.
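One detail worth checking in input_fn_works: `np.random.random` returns floats in [0, 1), and casting those to uint8 truncates every value to 0. The calibrator therefore sees only all-black images. The snippet below shows the truncation and one way to draw uniformly random uint8 pixels. It assumes NumPy 1.17+ for `default_rng`. Even genuinely random pixels are unlikely to be representative of real images, and INT8 calibration generally relies on inputs that resemble deployment data.

```python
import numpy as np

shape = (1, 512, 512, 3)

# Floats in [0.0, 1.0) truncated to uint8: every value becomes 0.
truncated = np.random.random(shape).astype(np.uint8)
print(truncated.max())  # 0

# Uniformly random pixel values over the full uint8 range [0, 255]
# (the upper bound 256 is exclusive).
rng = np.random.default_rng(seed=0)
random_pixels = rng.integers(0, 256, size=shape, dtype=np.uint8)
print(random_pixels.dtype, random_pixels.min(), random_pixels.max())
```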
Any ideas of what is going wrong?
Environment
TensorRT Version: 6.0.1
GPU Type: Titan V
Nvidia Driver Version: 455
CUDA Version: 10.1
CUDNN Version: 7.6.5
Operating System + Version: Ubuntu 18
Python Version (if applicable): 3.6
TensorFlow Version (if applicable): TensorFlow 2
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
import glob
import os
import numpy as np
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants
from PIL import Image

def config_gpu_memory(gpu_mem_cap):
    gpus = tf.config.experimental.list_physical_devices('GPU')
    if not gpus:
        return
    print('Found the following GPUs:')
    for gpu in gpus:
        print(' ', gpu)
    for gpu in gpus:
        try:
            # A falsy cap (None or 0) enables on-demand memory growth;
            # otherwise a virtual device with a fixed limit (in MB) is created.
            if not gpu_mem_cap:
                tf.config.experimental.set_memory_growth(gpu, True)
            else:
                tf.config.experimental.set_virtual_device_configuration(
                    gpu,
                    [tf.config.experimental.VirtualDeviceConfiguration(
                        memory_limit=gpu_mem_cap)])
        except RuntimeError as e:
            print('Cannot set GPU memory config:', e)

def get_trt_conversion_params(max_workspace_size_bytes,
                              precision_mode,
                              minimum_segment_size,
                              max_batch_size):
    return trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
        max_workspace_size_bytes=max_workspace_size_bytes,
        precision_mode=precision_mode,
        minimum_segment_size=minimum_segment_size,
        use_calibration=precision_mode == 'INT8',
        max_batch_size=max_batch_size)

def get_func_from_saved_model(saved_model_dir):
    saved_model_loaded = tf.saved_model.load(
        saved_model_dir, tags=[tag_constants.SERVING])
    graph_func = saved_model_loaded.signatures[
        signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
    return graph_func

# Processes the first image, then fails the check on the second one.
def input_fn_doesnt_work():
    input_size = (512, 512)
    # Sorted for a reproducible order; slicing avoids an IndexError when
    # the directory holds fewer than 500 images.
    filenames = sorted(glob.glob('/path/to/saved/images/*.jpg'))
    for filename in filenames[:500]:
        batched_input = tf.io.read_file(filename)
        batched_input = tf.image.decode_jpeg(batched_input, channels=3)
        # tf.image.resize returns float32, so cast back to uint8.
        batched_input = tf.image.resize(batched_input, size=input_size)
        batched_input = tf.cast(batched_input, tf.uint8)
        # Add the batch dimension: (1, 512, 512, 3).
        batched_input = tf.expand_dims(batched_input, axis=0)
        yield (batched_input,)

# Works and doesn't crash, but the results are poor: AP after this is 0 and
# most bboxes have a score of < 0.1.
def input_fn_works():
    input_size = (512, 512)
    for _ in range(500):
        # Note: np.random.random returns floats in [0, 1), so astype(np.uint8)
        # truncates every value to 0; each batch is an all-black image.
        batched_input = np.random.random(
            (1, input_size[0], input_size[1], 3)).astype(np.uint8)
        batched_input = tf.constant(batched_input)
        yield (batched_input,)

# A cap of 0 is falsy, so this enables memory growth instead of a hard limit.
config_gpu_memory(0)
conversion_params = get_trt_conversion_params(
    max_workspace_size_bytes=1 << 30,
    precision_mode='INT8',
    minimum_segment_size=2,
    max_batch_size=1)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='/path/to/effdet-d0/saved_model/',
    conversion_params=conversion_params,
)
converter.convert(calibration_input_fn=input_fn_doesnt_work)
converter.build(input_fn=input_fn_doesnt_work)
converter.save(output_saved_model_dir='/path/to/save/trt_int8/')
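Every calibration image has the same 512x512 shape and max_batch_size is 1. My understanding (an assumption, not verified against the TF-TRT source) is that converter.build then only needs one representative batch, not all 500. A hypothetical wrapper like `first_n` below can limit the generator passed to build while leaving calibration untouched.

```python
import itertools


def first_n(input_fn, n=1):
    """Hypothetical helper: wrap a generator function to yield only n items.

    Intended use (assumption): converter.build(input_fn=first_n(input_fn_doesnt_work))
    """
    def limited():
        # islice stops after n items without consuming the rest.
        return itertools.islice(input_fn(), n)
    return limited
```

The calibration call keeps the full generator; only the build step is shortened.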
Steps To Reproduce
Change the paths and run the code in a Jupyter notebook or as a Python script.