TensorFlow-TensorRT inference time and memory consumption on Nano

Hi,

I am trying to run inference with an image classification model (ResNetV2) using a TensorRT-optimized graph (FP32 and FP16) in TensorFlow (TF-TRT) on a Jetson Nano.

When I run the inference script, it waits for several minutes before inference starts, and the first iteration then takes many seconds (the loop contains only sess.run). After the first iteration, inference time drops to milliseconds. Meanwhile, memory consumption is close to 3.4 GB out of 4 GB.

Is this behaviour expected? Does TensorRT optimize only inference time and not memory usage?
What are the best practices to reduce memory consumption on Jetson Nano?
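On Jetson Nano the CPU and GPU share the same 4 GB of physical memory, so GPU allocations made by TensorFlow and TensorRT show up in system-wide figures such as /proc/meminfo (and in NVIDIA's tegrastats utility). A minimal sketch for logging memory in use at different points of the script, assuming a Linux /proc/meminfo; parse_meminfo, used_mib and read_used_mib are hypothetical helper names, not part of TensorFlow:

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}.

    Most values are in kB; a few counters (e.g. HugePages_*) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def used_mib(fields: Dict[str, int]) -> float:
    """Memory in use (MemTotal - MemAvailable), converted from kB to MiB."""
    return (fields["MemTotal"] - fields["MemAvailable"]) / 1024.0


def read_used_mib(path: str = "/proc/meminfo") -> float:
    """Read the live system figure; on Jetson this includes GPU allocations."""
    with open(path) as f:
        return used_mib(parse_meminfo(f.read()))
```

For example, calling read_used_mib() before tf.import_graph_def and again after the first sess.run shows how much of the roughly 3.4 GB is taken by graph loading versus execution.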

I am using the ImageNet-pretrained ResNetV2 frozen graph from here https://github.com/tensorflow/models/tree/r1.13.0/research/tensorrt#model-links for TensorRT conversion, and the official imagenet_preprocessing script from here https://github.com/tensorflow/models/blob/r1.13.0/official/resnet/imagenet_preprocessing.py to preprocess the image.
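For evaluation, the r1.13.0 imagenet_preprocessing script resizes the decoded image so its shorter side is 256 pixels (aspect ratio preserved), takes a central crop of the output size, and subtracts per-channel RGB means without dividing by a standard deviation. Doing this through a separate TensorFlow graph and session can add setup work on the Nano. A minimal NumPy sketch of the size arithmetic, central crop and mean subtraction, assuming the constants _RESIZE_MIN = 256 and _CHANNEL_MEANS = [123.68, 116.78, 103.94] from that file; the resize itself is left to an image library of your choice and may not match TensorFlow's bilinear resize pixel for pixel. The function names are hypothetical:

```python
import numpy as np

# Assumed to match official/resnet/imagenet_preprocessing.py (r1.13.0).
RESIZE_MIN = 256
CHANNEL_MEANS = np.array([123.68, 116.78, 103.94], dtype=np.float32)  # R, G, B


def smallest_size_at_least(height, width, resize_min=RESIZE_MIN):
    """Target (height, width) so the shorter side equals resize_min, aspect preserved."""
    scale = resize_min / float(min(height, width))
    return int(height * scale), int(width * scale)


def central_crop(image, crop_height, crop_width):
    """Take a crop_height x crop_width window from the centre of an H x W x C array."""
    height, width = image.shape[:2]
    top = (height - crop_height) // 2
    left = (width - crop_width) // 2
    return image[top:top + crop_height, left:left + crop_width]


def finish_preprocessing(resized_rgb, output_height=224, output_width=224):
    """Crop and mean-subtract an RGB image already resized to smallest_size_at_least().

    Returns a float32 batch of shape (1, output_height, output_width, 3).
    """
    cropped = central_crop(resized_rgb, output_height, output_width).astype(np.float32)
    normalized = cropped - CHANNEL_MEANS  # broadcasts over the channel axis
    return np.expand_dims(normalized, axis=0)
```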

TensorFlow: 1.13.1
TensorRT: 5.0.6

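Inference snippet:

import time

import numpy as np
import tensorflow as tf
# Importing contrib.tensorrt registers TRTEngineOp, which TF 1.x needs in order
# to import a graph that has been converted with TF-TRT.
import tensorflow.contrib.tensorrt as trt
import imagenet_preprocessing  # official/resnet/imagenet_preprocessing.py (r1.13.0)

# INPUT_IMAGE_PATH, PATH_TO_FROZEN_GRAPH and GPU_MEM_FRACTION are set elsewhere in the script.

def preprocess_image(file_name, output_height=224, output_width=224,
                     num_channels=3):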

  image_buffer = tf.read_file(file_name)
  normalized = imagenet_preprocessing.preprocess_image(
      image_buffer=image_buffer,
      bbox=None,
      output_height=output_height,
      output_width=output_width,
      num_channels=num_channels,
      is_training=False)
  
  with tf.Session() as sess:
    result = sess.run([normalized])

  return result[0]

image = preprocess_image(INPUT_IMAGE_PATH)
image = np.expand_dims(image, axis=0)
print(image.shape)

graph = tf.Graph()
with graph.as_default():

    graph_def = tf.GraphDef()

    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as f:
        serialized_graph = f.read()
        graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(graph_def, name='')

with graph.as_default():

    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = GPU_MEM_FRACTION # 0.5
    with tf.Session(config=config) as sess:

        input_image = graph.get_tensor_by_name('input_tensor:0')
        softmax_predictions = graph.get_tensor_by_name('softmax_tensor:0')

        # Warm-up: the first runs include one-time setup cost and are not averaged
        for i in range(5):
            start = time.time()
            predictions = sess.run(softmax_predictions,
                                   feed_dict={input_image: image})
            end = time.time()
            print(end - start, " seconds")

            idx = np.argmax(predictions[0])

            print(predictions[0][idx])

        time_buffer = []
        for i in range(100):

            start = time.time()
            predictions = sess.run(softmax_predictions,
                                   feed_dict={input_image: image})
            end = time.time()
            print(end - start, " seconds")

            time_buffer.append(end - start)

            idx = np.argmax(predictions[0])

            print(predictions[0][idx])

        print("Average: ", np.mean(np.array(time_buffer)))
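The timing loop above can be factored into a helper that records the first-call latency on its own (it can include one-time work such as memory allocation and, if the graph was converted with is_dynamic_op=True, TensorRT engine building) and reports the median and a 95th percentile alongside the mean. A minimal sketch that works with any zero-argument callable; the name benchmark is hypothetical:

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run: Callable[[], object], warmup: int = 5, iters: int = 100) -> Dict[str, float]:
    """Time `run` and return latencies in milliseconds.

    The first call is reported separately because it usually includes
    one-time setup; the remaining warm-up calls are discarded.
    """
    if warmup < 1 or iters < 1:
        raise ValueError("warmup and iters must both be >= 1")

    start = time.perf_counter()
    run()
    first_ms = (time.perf_counter() - start) * 1000.0

    for _ in range(warmup - 1):
        run()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    rank = (95 * len(samples) + 99) // 100  # nearest-rank 95th percentile (1-based)
    return {
        "first_ms": first_ms,
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "p95_ms": samples[rank - 1],
    }
```

Inside the session block it could be called as benchmark(lambda: sess.run(softmax_predictions, feed_dict={input_image: image})).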

Let me know your thoughts.
Thanks,
Arun.

Hi,

TensorFlow doesn’t apply Jetson-specific optimizations, so it won’t give you the best performance.

We recommend using TensorRT directly (our pure TensorRT API) rather than TF-TRT on a Jetson system.
Here is a tutorial with some benchmark results for your reference:
https://github.com/NVIDIA-AI-IOT/tf_to_trt_image_classification

Thanks.