tensorflow.python.framework.errors_impl.OpError: file is too short to be an sstable

I am unable to load a TF-TRT model.

I converted a TensorFlow object detection model to TF-TRT, but I am unable to load the converted model. Loading fails with RuntimeError: file is too short to be an sstable.

Environment

I am using nvcr.io/nvidia/l4t-tensorflow:r32.5.0-tf2.3-py3

Steps to reproduce and Detailed description:

I downloaded a TensorFlow 2 object detection model from this link: ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz.

I am able to load this model on both my laptop and a Jetson Nano.
My target deployment device is the Jetson, but inference there takes ~350 ms per sample, which is too slow. So I decided to optimize the model with TensorRT and followed this official link to convert the downloaded object detection model to TF-TRT.

I then used the same inference script to load the converted model, but I get the error RuntimeError: file is too short to be an sstable.
Here is the complete stack trace.

2021-07-28 07:15:01.711920: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-07-28 07:15:08.336969: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-07-28 07:15:08.349933: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-07-28 07:15:08.350090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1742] Found device 0 with properties: 
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.86GiB deviceMemoryBandwidth: 194.55MiB/s
2021-07-28 07:15:08.350167: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-07-28 07:15:08.354037: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-07-28 07:15:08.372553: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-28 07:15:08.410318: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-28 07:15:08.429792: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-07-28 07:15:08.450635: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-07-28 07:15:08.451750: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-28 07:15:08.452250: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-07-28 07:15:08.452785: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-07-28 07:15:08.452895: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1884] Adding visible gpu devices: 0
2021-07-28 07:16:58.340497: W tensorflow/core/platform/profile_utils/cpu_utils.cc:108] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency
2021-07-28 07:16:58.342337: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1fd38db0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-07-28 07:16:58.342701: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-07-28 07:16:58.429098: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-07-28 07:16:58.429452: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1edd95f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-07-28 07:16:58.429516: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2021-07-28 07:16:58.429944: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-07-28 07:16:58.430078: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1742] Found device 0 with properties: 
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.86GiB deviceMemoryBandwidth: 194.55MiB/s
2021-07-28 07:16:58.430165: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-07-28 07:16:58.430259: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-07-28 07:16:58.430315: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-28 07:16:58.430361: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-28 07:16:58.430404: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-07-28 07:16:58.430447: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-07-28 07:16:58.430505: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-28 07:16:58.430685: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-07-28 07:16:58.430895: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-07-28 07:16:58.430976: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1884] Adding visible gpu devices: 0
2021-07-28 07:16:58.431078: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-07-28 07:17:01.453962: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1283] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-28 07:17:01.454048: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1289]      0 
2021-07-28 07:17:01.454083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1302] 0:   N 
2021-07-28 07:17:01.454496: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-07-28 07:17:01.454822: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-07-28 07:17:01.455006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1428] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 882 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
Loading model...Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 95, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern))
RuntimeError: file is too short to be an sstable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "app.py", line 47, in <module>
    detect_fn = tf.saved_model.load('ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/trt_saved_model')
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/load.py", line 603, in load
    return load_internal(export_dir, tags, options)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/load.py", line 633, in load_internal
    ckpt_options)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/load.py", line 131, in __init__
    self._restore_checkpoint()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/load.py", line 330, in _restore_checkpoint
    load_status = saver.restore(variables_path, self._checkpoint_options)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/util.py", line 1275, in restore
    reader = py_checkpoint_reader.NewCheckpointReader(save_path)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 99, in NewCheckpointReader
    error_translator(e)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 48, in error_translator
    raise errors_impl.OpError(None, None, error_message, errors_impl.UNKNOWN)
tensorflow.python.framework.errors_impl.OpError: file is too short to be an sstable
WARNING:tensorflow:5 out of the last 5 calls to <function recreate_function.<locals>.restored_function_body at 0x7eb16ed730> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
WARNING:tensorflow:6 out of the last 6 calls to <function recreate_function.<locals>.restored_function_body at 0x7eb16c2378> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
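
The traceback above fails inside _restore_checkpoint, when NewCheckpointReader opens the variables files of the SavedModel. One cautious first check is whether the files under the output directory exist and are non-empty. The sketch below uses only the standard library. The helper name inspect_saved_model_dir is hypothetical, and the idea that a missing or empty file is involved is an assumption that the log does not confirm.

```python
import os

# Files a TF2 SavedModel directory normally contains.
EXPECTED_FILES = ("saved_model.pb", os.path.join("variables", "variables.index"))


def inspect_saved_model_dir(export_dir):
    """Return (sizes, problems) for the files under export_dir.

    sizes maps each relative file path to its size in bytes.
    problems lists expected files that are missing and any empty files.
    """
    sizes = {}
    for root, _dirs, files in os.walk(export_dir):
        for name in files:
            path = os.path.join(root, name)
            sizes[os.path.relpath(path, export_dir)] = os.path.getsize(path)

    problems = []
    for rel in EXPECTED_FILES:
        if rel not in sizes:
            problems.append(f"missing: {rel}")
    for rel, size in sorted(sizes.items()):
        if size == 0:
            problems.append(f"empty: {rel}")
    return sizes, problems


if __name__ == "__main__":
    # Path taken from the load call in the traceback above.
    export_dir = "ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/trt_saved_model"
    sizes, problems = inspect_saved_model_dir(export_dir)
    for rel, size in sorted(sizes.items()):
        print(f"{size:>12d}  {rel}")
    for problem in problems:
        print("PROBLEM:", problem)
```

If this reports a missing or empty file, re-running the conversion and watching free disk space and memory during the save step may be worth trying. A file can also be truncated without being empty, so comparing the sizes against the original saved_model directory is a useful second look.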

The script I used to convert the TF SavedModel to TF-TRT is here:

import tensorflow as tf
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

input_saved_model_dir = './ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/saved_model/'
output_saved_model_dir = './ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/trt_saved_model/'
num_runs = 1

conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS
conversion_params = conversion_params._replace(max_workspace_size_bytes=(1<<32))
conversion_params = conversion_params._replace(precision_mode="FP16")
# conversion_params = conversion_params._replace(maximum_cached_engines=100)

converter = trt.TrtGraphConverterV2(input_saved_model_dir=input_saved_model_dir,conversion_params=conversion_params)
converter.convert()

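def my_input_fn():
    # Yield one tuple per run; each tuple holds the model's single input,
    # a batch of one 320x320 RGB image with uint8 pixel values.
    for _ in range(num_runs):
        inp1 = np.random.randint(0, 256, size=(1, 320, 320, 3), dtype=np.uint8)
        yield (inp1,)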
        
converter.build(input_fn=my_input_fn)
converter.save(output_saved_model_dir)
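
A side note on the conversion parameters above, offered as arithmetic rather than a diagnosis: max_workspace_size_bytes=(1<<32) requests 4 GiB, while the log reports deviceMemorySize: 3.86GiB and a TensorFlow GPU device created with 882 MB. The sketch below only compares these numbers. Whether the workspace size has anything to do with the sstable error is not established here, and the smaller value shown is a hypothetical alternative, not a tuned recommendation.

```python
# Workspace size requested in the conversion script above.
requested_bytes = 1 << 32

GIB = 1 << 30
MIB = 1 << 20
requested_gib = requested_bytes / GIB

# Value copied from the log: total device memory reported for the Tegra X1.
device_memory_gib = 3.86

exceeds_device_memory = requested_gib > device_memory_gib
print(f"requested workspace: {requested_gib:.2f} GiB")
print(f"exceeds total device memory: {exceeds_device_memory}")

# Hypothetical smaller workspace (an assumption, not a tuned value).
smaller_workspace_bytes = 1 << 28
print(f"smaller alternative: {smaller_workspace_bytes // MIB} MiB")
```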

The inference snippet that loads the model is here:

# Load saved model and build the detection function
detect_fn = tf.saved_model.load('ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/trt_saved_model')

  1. Am I converting the TF model correctly?
  2. Is my procedure for loading the TF-TRT model correct?

How can I resolve this issue?

Any hints would be appreciated.

Hi,
We recommend checking the sample links below for TF-TRT integration issues.
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#samples
https://docs.nvidia.com/deeplearning/tensorrt/quick-start-guide/index.html#framework-integration
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#integrate-ovr
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#usingtftrt

If the issue persists, we recommend reaching out to the TensorFlow forum.
Thanks!