"Engine buffer is full" with Tensorflow-TensorRT Integration

Linux distro and version Ubuntu 16.04
GPU type 1080ti/11gb
nvidia driver version 396.54
CUDA version 9
CUDNN version 7.1
Python version [if using python] 2.7
Tensorflow version r1.11
TensorRT version 4.0.1.6

Steps to Reproduce:

  1. Clone tensorflow repo, checkout r1.11 branch
  2. Build from source as directed in the documentation, disabling everything except CUDA and TensorRT (4.0.1.6)
  3. Create a TensorRT-optimized .pb file using the following:
trt_graph = trt.create_inference_graph(
    input_graph_def=tf.get_default_graph().as_graph_def(),
    outputs=output_node,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode="FP32",  # TRT engine precision: "FP32", "FP16" or "INT8"
    minimum_segment_size=2  # minimum number of nodes in an engine
)
with open("trt.pb", "wb") as f:  # binary mode: the serialized graph is not text
    f.write(trt_graph.SerializeToString())
  4. Use the TensorFlow C API to run inference on the protobuf file

Issue: Inference time increases by over 3x (200 ms with TF-TRT vs. 60 ms baseline). I get the following debug output when I run; I assume the slowdown is caused by this. Any help would be appreciated.

Related Issues: https://devtalk.nvidia.com/default/topic/1038750/-quot-engine-buffer-is-full-quot-/

2018-09-18 16:31:33.304540: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=22710
2018-09-18 16:31:33.304630: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for anchor_predictor/my_trt_op_6
2018-09-18 16:31:33.304606: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=22710
2018-09-18 16:31:33.310690: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for anchor_predictor/my_trt_op_5
2018-09-18 16:31:33.456561: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=22710
2018-09-18 16:31:33.456829: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for my_trt_op_4
2018-09-18 16:31:33.465307: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300
2018-09-18 16:31:33.465572: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for my_trt_op_0
2018-09-18 16:31:33.467022: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300
2018-09-18 16:31:33.467079: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for my_trt_op_7
2018-09-18 16:31:33.468706: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300
2018-09-18 16:31:33.468724: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300
2018-09-18 16:31:33.468752: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for my_trt_op_1
2018-09-18 16:31:33.468826: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300
2018-09-18 16:31:33.468852: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for nms/my_trt_op_8
2018-09-18 16:31:33.469381: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300
2018-09-18 16:31:33.470291: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for regression/my_trt_op_13
2018-09-18 16:31:33.471093: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300
2018-09-18 16:31:33.471351: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for regression/my_trt_op_14
2018-09-18 16:31:33.471162: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300
2018-09-18 16:31:33.471683: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for regression/my_trt_op_16
2018-09-18 16:31:33.471142: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300
2018-09-18 16:31:33.471937: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for regression/my_trt_op_15
2018-09-18 16:31:33.471166: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300
2018-09-18 16:31:33.472094: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for regression/my_trt_op_17
2018-09-18 16:31:33.472485: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300
2018-09-18 16:31:33.472530: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for regression/my_trt_op_10
2018-09-18 16:31:33.473083: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for my_trt_op_2
2018-09-18 16:31:33.474525: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300
2018-09-18 16:31:33.474687: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for regression/my_trt_op_19
2018-09-18 16:31:33.475114: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300
2018-09-18 16:31:33.475321: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for regression/my_trt_op_18
2018-09-18 16:31:33.475148: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300
2018-09-18 16:31:33.475741: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for regression/my_trt_op_12
2018-09-18 16:31:33.474614: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300
2018-09-18 16:31:33.476118: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for regression/my_trt_op_11
2018-09-18 16:31:33.476429: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300
2018-09-18 16:31:33.476773: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for regression/my_trt_op_9
2018-09-18 16:31:33.476994: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300
2018-09-18 16:31:33.477219: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for regression/my_trt_op_21
2018-09-18 16:31:33.474801: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300
2018-09-18 16:31:33.477605: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for regression/my_trt_op_20
2018-09-18 16:31:33.477052: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300
2018-09-18 16:31:33.477848: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for my_trt_op_3
Inference Time: 222.028

@NVES I was able to reduce the gap in inference times by raising max_batch_size to the largest value I see in the log (22710), but even then the inference time is ~100 ms, about 40 ms above the baseline.
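The largest requested batch can be pulled out of the warnings programmatically instead of by eye. A small sketch (required_max_batch_size is a hypothetical helper, not a TF-TRT API):

```python
import re

def required_max_batch_size(log_lines):
    """Scan TF-TRT 'Engine buffer is full' warnings and return the largest
    requested batch, i.e. the minimum max_batch_size the engine must be
    built for. Returns None if no warnings match."""
    batches = [int(m.group(1))
               for line in log_lines
               for m in re.finditer(r"requested batch=(\d+)", line)]
    return max(batches) if batches else None

# Two representative lines from the log above:
log = [
    "W ... trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=22710",
    "W ... trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=300",
]
print(required_max_batch_size(log))  # -> 22710
```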

You would need to optimize the graph for the batch size that you are going to use in the benchmark, or a batch size that’s larger but not much larger. TensorRT optimizes the graph only for that batch size. You cannot run inference with batch sizes larger than that, but you can run with smaller batch sizes at the cost of losing some performance.

https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html#best-practices
https://docs.nvidia.com/deeplearning/sdk/tensorrt-best-practices/#batching

I fixed this problem by setting max_batch_size to num_images x 300; for instance, if you’re going to process 8 images at a time, set it to 2400.

num_images = 8
trt_graph = trt.create_inference_graph(
    input_graph_def=tf.get_default_graph().as_graph_def(),
    outputs=output_node,
    max_batch_size=num_images * 300,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',  # TRT engine precision: "FP32", "FP16" or "INT8"
    minimum_segment_size=50  # minimum number of nodes in an engine
)

On a V100, the performance gain from FP16 was about 20%; I’m going to try INT8 next.