Error in TF-TRT

I am trying to convert my TensorFlow model to TensorRT, following the object detection sample.
My command is:
python object_detection.py --input_saved_model_dir /workspace/examples/NumPlateDetection/saved_model --output_saved_model_dir /workspace/examples/NumPlateDetection --data_dir /workspace/examples/NumPlateDetection/infer/images --calib_data_dir /workspace/examples/NumPlateDetection/images --optimize_offline --precision INT8 --num_calib_inputs 800 --input_size 736 --batch_size 8 --mode 'inference' --outputimg_path /workspace/examples/NumPlateDetection/outputs --use_trt

What could be the issue?

I get the following errors:

: 164 curr_region_allocation_bytes_: 34359738368
2020-06-17 03:38:17.241209: I tensorflow/core/common_runtime/bfc_allocator.cc:970] Stats: 
Limit:                 23469584548
InUse:                 17155265792
MaxInUse:              19082024704
NumAllocs:                    4030
MaxAllocSize:           3833987072

2020-06-17 03:38:17.241298: W tensorflow/core/common_runtime/bfc_allocator.cc:429] *********_______________******************_***********_______***************************************
2020-06-17 03:38:17.241333: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger Requested amount of GPU memory (4404019200 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
2020-06-17 03:38:17.241362: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger /home/jenkins/workspace/TensorRT/helpers/rel-7.0/L1_Nightly/build/source/rtSafe/resources.h (164) - OutOfMemory Error in GpuMemory: 0
2020-06-17 03:38:17.241455: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger Out of memory error during getBestTactic: (Unnamed Layer* 0) [Shuffle]
2020-06-17 03:38:17.241481: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger Internal error: could not find any implementation for node (Unnamed Layer* 0) [Shuffle], try increasing the workspace size with IBuilder::setMaxWorkspaceSize()
2020-06-17 03:38:17.243854: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger ../builder/tacticOptimizer.cpp (1523) - OutOfMemory Error in computeCosts: 0
2020-06-17 03:38:17.255722: E tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:841] Calibration failed: Internal: Failed to build TensorRT engine
2020-06-17 03:38:17.255955: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Internal: Failed to feed calibration data
	 [[{{node TRTEngineOp_31}}]]
	 [[SecondStagePostprocessor/map/while/Switch_1/_316]]
2020-06-17 03:38:17.256282: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Internal: Failed to feed calibration data
	 [[{{node TRTEngineOp_31}}]]
Traceback (most recent call last):
  File "numplate_detection.py", line 380, in <module>
    optimize_offline=args.optimize_offline)
  File "numplate_detection.py", line 107, in get_graph_func
    input_fn, calib_data_dir, num_calib_inputs//batch_size))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/compiler/tensorrt/trt_convert.py", line 1004, in convert
    self._converted_func(*map(ops.convert_to_tensor, inp))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1551, in __call__
    return self._call_impl(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1591, in _call_impl
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 545, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal:  Failed to feed calibration data
	 [[node TRTEngineOp_31 (defined at numplate_detection.py:107) ]]
	 [[SecondStagePostprocessor/map/while/Switch_1/_316]]
  (1) Internal:  Failed to feed calibration data
	 [[node TRTEngineOp_31 (defined at numplate_detection.py:107) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_pruned_27865]

Function call stack:
pruned -> pruned

terminate called without an active exception
Aborted (core dumped)

You may have to reduce the max workspace size. Also, try the config below to limit GPU memory usage by TensorFlow.
You can set the fraction of GPU memory to be allocated when you construct a tf.Session by passing tf.GPUOptions as part of the optional config argument:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
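
For example, a minimal runnable sketch of that suggestion (TF1-style session API; on a TF 2.x runtime such as your container, the same classes live under tf.compat.v1):

import tensorflow as tf

# Let this process allocate only ~1/3 of the GPU's memory up front.
# (TF1-style config; use the tf.compat.v1 namespace on TensorFlow 2.x.)
gpu_options = tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.compat.v1.Session(
    config=tf.compat.v1.ConfigProto(gpu_options=gpu_options))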

Thanks!

I can't find that config in the file. Please see the program here.

Hi,
Please use the link below for reference.

Also, to set the max workspace size:
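
In the TF 2.0 TF-TRT Python API this goes through the conversion params (a rough sketch; the sample script exposes it as --max_workspace_size, and the path and values below are illustrative):

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Request a smaller TensorRT builder workspace so engine building
# does not compete with TensorFlow for the remaining GPU memory.
conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode='INT8',
    max_workspace_size_bytes=1 << 30)  # 1 GiB, illustrative
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='/workspace/examples/NumPlateDetection/saved_model',
    conversion_params=conversion_params)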


Thanks!

Now I understand the two configuration options in the parser: --gpu_mem_cap and --max_workspace_size. --gpu_mem_cap caps the GPU memory usage for TensorFlow.

It is implemented in the code as:

def config_gpu_memory(gpu_mem_cap):
  # List the physical GPUs visible to TensorFlow.
  gpus = tf.config.experimental.list_physical_devices('GPU')
  if not gpus:
    return
  print('Found the following GPUs:')
  for gpu in gpus:
    print('  ', gpu)
  for gpu in gpus:
    try:
      if not gpu_mem_cap:
        # No cap requested: grow GPU allocations on demand instead of
        # reserving all memory up front.
        tf.config.experimental.set_memory_growth(gpu, True)
      else:
        # Cap TensorFlow at a fixed amount of GPU memory.
        # Note: memory_limit is specified in megabytes (MB).
        tf.config.experimental.set_virtual_device_configuration(
            gpu,
            [tf.config.experimental.VirtualDeviceConfiguration(
                memory_limit=gpu_mem_cap)])
    except RuntimeError as e:
      print('Cannot set GPU memory config:', e)

When I set --gpu_mem_cap=0.3, the same as the fraction you mentioned earlier, I get the following error:

2020-06-17 08:48:04.947505: I tensorflow/core/common_runtime/bfc_allocator.cc:917] Bin for 256B was 256B, Chunk State: 
2020-06-17 08:48:04.947519: I tensorflow/core/common_runtime/bfc_allocator.cc:955]      Summary of in-use Chunks by size: 
2020-06-17 08:48:04.947533: I tensorflow/core/common_runtime/bfc_allocator.cc:962] Sum Total of in-use chunks: 0B
2020-06-17 08:48:04.947546: I tensorflow/core/common_runtime/bfc_allocator.cc:964] total_region_allocated_bytes_: 0 memory_limit_: 0 available bytes: 0 curr_region_allocation_bytes_: 0
2020-06-17 08:48:04.947573: I tensorflow/core/common_runtime/bfc_allocator.cc:970] Stats: 
Limit:                           0
InUse:                           0
MaxInUse:                        0
NumAllocs:                       0
MaxAllocSize:                    0

2020-06-17 08:48:04.947603: W tensorflow/core/common_runtime/bfc_allocator.cc:429] <allocator contains no memory>
2020-06-17 08:48:04.947665: W tensorflow/core/framework/op_kernel.cc:1632] OP_REQUIRES failed at constant_op.cc:79 : Resource exhausted: OOM when allocating tensor of shape [] and type float
2020-06-17 08:48:04.947718: E tensorflow/core/common_runtime/executor.cc:660] Executor failed to create kernel. Resource exhausted: OOM when allocating tensor of shape [] and type float
	 [[{{node dummy_fetch_0}}]]
Traceback (most recent call last):
  File "numplate_detection.py", line 375, in <module>
    optimize_offline=args.optimize_offline)
  File "numplate_detection.py", line 81, in get_graph_func
    graph_func = get_func_from_saved_model(input_saved_model_dir)
  File "numplate_detection.py", line 57, in get_func_from_saved_model
    saved_model_dir, tags=[tag_constants.SERVING])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/saved_model/load.py", line 528, in load
    return load_internal(export_dir, tags)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/saved_model/load.py", line 559, in load_internal
    root = load_v1_in_v2.load(export_dir, tags)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/saved_model/load_v1_in_v2.py", line 254, in load
    return loader.load(tags=tags)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/saved_model/load_v1_in_v2.py", line 225, in load
    local_init_op, _ = initializer._initialize()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/saved_model/load_v1_in_v2.py", line 64, in _initialize
    return self._init_fn(*[path.asset_path for path in self._asset_paths])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1551, in __call__
    return self._call_impl(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1591, in _call_impl
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 545, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.ResourceExhaustedError:  OOM when allocating tensor of shape [] and type float
	 [[{{node dummy_fetch_0}}]] [Op:__inference_pruned_3700]

Function call stack:
pruned


Could you please share your model and environment/setup details so that we can assist you better?
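Also, note that --gpu_mem_cap is passed straight through as memory_limit, which TensorFlow interprets in megabytes, not as a fraction. Setting it to 0.3 therefore leaves TensorFlow with essentially no memory, which matches the "Limit: 0" in your second log. For a cap of roughly one third of a 24 GB card, the call would look something like this (value illustrative):

config_gpu_memory(gpu_mem_cap=8000)  # ~8 GB; memory_limit is in MB, not a fraction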
Thanks!

Hi, thanks.
Please get the TensorFlow saved model from the link.
The file is too big to upload to this page.

My GPU is a Titan RTX (24 GB), and NVIDIA's TensorFlow 2.0 Docker container is used to test TF-TRT.
This object detection program is used.

Did you find any issue with my model?

Any issue with my model?

Hello @edit_or, the team is looking into this.
We will update you soon.
Thanks for your patience.