Very slow inference (1.8 s/image) on Faster RCNN ResNet101, and TF-TRT reports a requested batch of 153216 for 1 image !!!

Ubuntu 16.04
Docker tensorflow/tensorflow 1.13.1 and tensorflow/serving:latest-gpu
NVIDIA TensorRT 5.0.2 (Installation Guide :: NVIDIA Deep Learning TensorRT Documentation)
TensorFlow Object Detection Faster RCNN ResNet101, successfully built with only 2 classes
The model is converted to FP32 (I also tried FP16, with the same issue described below)

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# graph, gdef, outputs and rt_output_file_name_32 are defined earlier (not shown).
with graph.as_default():
    with tf.Session() as sess:
        # Convert the GraphDef into a TF-TRT graph.
        trt_graph = trt.create_inference_graph(
            input_graph_def=gdef,
            outputs=outputs,
            max_batch_size=1,
            max_workspace_size_bytes=4000000000,
            is_dynamic_op=True,
            precision_mode='FP32')  # 'FP16' and 'INT8' are the other options
        # Import the converted graph; its nodes get the default 'import/' prefix.
        output_node = tf.import_graph_def(trt_graph, return_elements=outputs)
        #sess.run(output_node)
        tf.saved_model.simple_save(sess,
            rt_output_file_name_32,
            inputs={'input_image': graph.get_tensor_by_name('{}:0'.format(node.name)) for node in graph.as_graph_def().node if node.op=='Placeholder'},
            outputs={t:graph.get_tensor_by_name('import/'+t) for t in outputs}
        )
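
A minimal sketch for checking what the conversion actually produced, assuming trt_graph is the GraphDef returned by create_inference_graph above. The helper names summarize_ops and trt_engine_names are made up for illustration; they only rely on each node having .op and .name attributes, which GraphDef nodes do.

```python
from collections import Counter


def summarize_ops(nodes):
    """Count nodes per op type.

    `nodes` is any iterable of objects with an `.op` attribute,
    for example trt_graph.node (assumption: trt_graph is a GraphDef).
    """
    return Counter(node.op for node in nodes)


def trt_engine_names(nodes):
    """Return the names of all nodes whose op type is 'TRTEngineOp'."""
    return [node.name for node in nodes if node.op == 'TRTEngineOp']
```

Calling summarize_ops(trt_graph.node)['TRTEngineOp'] gives the number of TensorRT segments, and trt_engine_names(trt_graph.node) lists them, so they can be matched against the segment names in the serving log.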

RUN:

docker kill food_non_food
docker run --runtime=nvidia -p 8501:8501 --mount type=bind,source=/mnt/hatto/food_non_food,target=/models/food_non_food \
    -e MODEL_NAME=food_non_food -t tensorflow/serving:latest-gpu

CLIENT:

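import time

import numpy as np
import PIL.Image
import requests

# IMAGE_PATH is the path to the test image (defined elsewhere).
image = PIL.Image.open(IMAGE_PATH)
image_np = np.array(image)
payload = {"instances": [image_np.tolist()]}  # a single image per request
SERVING_URL = 'http://localhost:8501/v1/models/food_non_food:predict'
start = time.time()
t = requests.post(SERVING_URL, json=payload)
end = time.time()
print('Took', end - start, 'seconds')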
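
The timed span above covers one request and includes the client-side JSON encoding of the pixel list and the HTTP transfer, not only the model. A rough sketch for timing several requests after a few untimed warm-up calls is below. The names time_calls and summarize and the default counts are assumptions for illustration, and send stands for any zero-argument callable.

```python
import statistics
import time


def time_calls(send, n_warmup=2, n_timed=10):
    """Call send() n_warmup times untimed, then n_timed times timed.

    Returns the list of per-call durations in seconds.
    """
    for _ in range(n_warmup):
        send()
    durations = []
    for _ in range(n_timed):
        start = time.perf_counter()
        send()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return min, median and max of the measured durations."""
    return {
        'min': min(durations),
        'median': statistics.median(durations),
        'max': max(durations),
    }
```

With the client code above it could be used as summarize(time_calls(lambda: requests.post(SERVING_URL, json=payload))).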

Consistently received the following WARNING:

2019-07-10 06:52:26.523782: W external/org_tensorflow/tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:264] Engine buffer is full. buffer limit=1, current entries=1, requested batch=153216
2019-07-10 06:52:26.523827: W external/org_tensorflow/tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:281] Failed to get engine batch, running native segment for import/ClipToWindow/Area/TRTEngineOp_0

That is all. The model runs, but it always takes 1.8 seconds per image (size 1024x), which is terrible! The warning above keeps popping up, saying the requested batch is 153216, while I submit only ONE SINGLE image !!!
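
A small sketch for tallying these warnings from a saved serving log, to see which segments fall back to native execution and with which requested batch. The regular expressions are based only on the two log lines quoted above, so other TensorFlow versions may word them differently, and parse_trt_warnings is a made-up helper name. One unverified guess: the segment name (ClipToWindow/Area) suggests the number may be the leading dimension of an intermediate box tensor rather than a count of images.

```python
import re
from collections import Counter

# Patterns derived from the two warning lines quoted in this post (assumption:
# the wording is stable across log lines of the same TensorFlow build).
BATCH_RE = re.compile(
    r'buffer limit=(\d+), current entries=(\d+), requested batch=(\d+)')
NATIVE_RE = re.compile(r'running native segment for (\S+)')


def parse_trt_warnings(lines):
    """Return (requested batch counts, native fallback segment counts)."""
    batches = Counter()
    segments = Counter()
    for line in lines:
        match = BATCH_RE.search(line)
        if match:
            batches[int(match.group(3))] += 1
        match = NATIVE_RE.search(line)
        if match:
            segments[match.group(1)] += 1
    return batches, segments
```

It can be fed an open log file directly, for example parse_trt_warnings(open('serving.log')), where serving.log is a hypothetical file holding the container output.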

Please help.

Thanks,
Steve