Converting a TensorFlow .pb model file to TensorRT: GPU memory error

Hi,

I am trying to run facial recognition in real time on a Jetson Nano and have been struggling to get anything to run at a high enough frame rate. I am now trying to use David Sandberg's pretrained FaceNet TensorFlow model:

GitHub: https://github.com/davidsandberg/facenet
Model: https://drive.google.com/uc?id=1R77HmFADxe87GmoLwzfgMu_HY0IhcyBz&export=download

I hope to eventually load this model using a TensorRT-optimized version of the facenet repo:
https://github.com/JerryJiaGit/facenet_trt

However, I cannot even load the model onto the Jetson Nano, as I receive an error saying the GPU memory is full. Here is the code I am trying to run:

from tensorflow.python.platform import gfile
import tensorflow as tf
from tensorflow.contrib import tensorrt as trt

graph_filename ='20180402-114759.pb'

f = gfile.FastGFile(graph_filename, 'rb')

# define graph def object
frozen_graph_def = tf.GraphDef()

# store frozen graph from pb file
frozen_graph_def.ParseFromString(f.read())

# Parameters:
output_node_name = "embeddings"
workspace_size = 1 << 10  # in bytes: 1 << 10 = 1024 bytes (1 KiB)
precision = "FP16"
batch_size = 1

trt_graph = trt.create_inference_graph(
                frozen_graph_def,
                [output_node_name],
                max_batch_size=batch_size,
                max_workspace_size_bytes=workspace_size,
                precision_mode=precision)

# write modified graph def to disk
graph_filename_converted = '20180402-114759_tensorrt.pb'

with gfile.FastGFile(graph_filename_converted, 'wb') as s:
    s.write(trt_graph.SerializeToString())
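
A side note on the workspace_size parameter in the script above: max_workspace_size_bytes is given in bytes, so 1 << 10 requests only 1 KiB. Whether that is related to the failure is not established here. The sketch below only converts a few bit-shift values into readable units; format_bytes is a hypothetical helper written for this illustration and is not part of TensorFlow or TensorRT.

```python
# Sketch: show what bit-shifted workspace sizes amount to in readable units.
# format_bytes is a hypothetical helper, not a TensorFlow/TensorRT function.

def format_bytes(n):
    """Return n bytes as a string in binary units (B, KiB, MiB, GiB)."""
    units = ["B", "KiB", "MiB", "GiB"]
    value = float(n)
    for unit in units:
        # Stop at the first unit where the value drops below 1024,
        # or at the largest unit available.
        if value < 1024 or unit == units[-1]:
            return "{:g} {}".format(value, unit)
        value /= 1024


workspace_size = 1 << 10  # the value used in the script above

if __name__ == "__main__":
    print("script value:", format_bytes(workspace_size))
    # Other shifts are listed for comparison only; none is a recommendation.
    for shift in (20, 25, 30):
        print("1 << {} = {}".format(shift, format_bytes(1 << shift)))
```

The comparison values are arbitrary; the right workspace size for the Nano would have to be found by experiment.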

When I run this (with 8 GB of CPU swap), I get the following error:

2019-10-07 10:30:46.291470: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-10-07 10:31:12.689488: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
WARNING:tensorflow:From testing_pb.py:7: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
WARNING:tensorflow:From testing_pb.py:10: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.

WARNING:tensorflow:TensorRT mismatch. Compiled against version 5.1.6, but loaded 5.0.6. Things may not work
2019-10-07 10:31:41.098282: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-10-07 10:31:41.164094: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-07 10:31:41.164245: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2019-10-07 10:31:41.164467: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2019-10-07 10:31:41.183747: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2019-10-07 10:31:41.184383: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3b53a5d0 executing computations on platform Host. Devices:
2019-10-07 10:31:41.184447: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-07 10:31:41.288047: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-07 10:31:41.288364: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3d1321d0 executing computations on platform CUDA. Devices:
2019-10-07 10:31:41.288421: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2019-10-07 10:31:41.289070: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-07 10:31:41.289191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
2019-10-07 10:31:41.289266: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-10-07 10:31:41.289418: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-10-07 10:31:41.289527: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-10-07 10:31:41.289623: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-10-07 10:31:41.412705: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-10-07 10:31:41.483834: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-10-07 10:31:41.484122: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-10-07 10:31:41.484487: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-07 10:31:41.484821: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-07 10:31:41.484906: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-10-07 10:31:48.577299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-07 10:31:48.577378: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-10-07 10:31:48.577413: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-10-07 10:31:48.577950: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-07 10:31:48.578323: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-10-07 10:31:48.578497: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 63 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2019-10-07 10:32:17.468008: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:460] There are 3158 ops of 15 different types in the graph that are not converted to TensorRT: Mul, RandomUniform, Reshape, Pack, StridedSlice, DataFormatVecPermute, Shape, Merge, Placeholder, NoOp, Switch, FIFOQueueV2, FusedBatchNorm, QueueDequeueUpToV2, Identity, (For more information see https://docs.nvidia.com/deeplearning/dgx/tf-trt-user-guide/index.html#supported-ops).
2019-10-07 10:32:19.668138: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:735] Number of TensorRT candidate segments: 310
2019-10-07 10:33:02.344127: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-10-07 10:33:08.573370: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-10-07 10:35:19.657635: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
python3: serializationUtils.cpp:815: nvinferFlatBuffers::ifb::UnaryOperation nvinfer1::rt::serializeUnaryOp(nvinfer1::UnaryOperation): Assertion `0' failed.
Aborted (core dumped)
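
Three lines in the log above seem worth pulling out: the TensorRT version mismatch warning, the amount of GPU memory TensorFlow reports for its device, and the final abort, which is an assertion in serializationUtils.cpp rather than an explicit out-of-memory message. Which of these, if any, is the actual cause is not known. A minimal sketch for extracting them from a saved log follows; summarize is a hypothetical helper, and the three sample lines are copied verbatim from the output above with the timestamp prefix dropped.

```python
import re

# Three lines copied from the log above (timestamp/source prefix dropped).
LOG = """\
WARNING:tensorflow:TensorRT mismatch. Compiled against version 5.1.6, but loaded 5.0.6. Things may not work
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 63 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
python3: serializationUtils.cpp:815: nvinferFlatBuffers::ifb::UnaryOperation nvinfer1::rt::serializeUnaryOp(nvinfer1::UnaryOperation): Assertion `0' failed.
"""

VERSION = r"(\d+(?:\.\d+)*)"  # dotted version number such as 5.1.6


def summarize(log_text):
    """Collect the version mismatch, TF GPU memory and abort line, if present."""
    summary = {}
    m = re.search(
        "Compiled against version " + VERSION + ", but loaded " + VERSION,
        log_text,
    )
    if m:
        summary["trt_compiled"] = m.group(1)
        summary["trt_loaded"] = m.group(2)
    m = re.search(r"with (\d+) MB memory", log_text)
    if m:
        summary["tf_gpu_mb"] = int(m.group(1))
    m = re.search(r"^(.*Assertion .* failed\.)$", log_text, re.MULTILINE)
    if m:
        summary["abort"] = m.group(1)
    return summary


if __name__ == "__main__":
    for key, value in summarize(LOG).items():
        print(key, "=", value)
```

The loaded version (5.0.6) matches the TensorRT package listed under System Info below, so the mismatch appears to come from the TensorFlow build having been compiled against 5.1.6.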

Please advise on how best to proceed. Are there any one-shot facial recognition models that run in real time on the Nano? I need around 12-15 FPS, including face detection.
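
To make the frame-rate target concrete, here is a rough sketch of the per-frame time budget it implies. The function names and the 40 ms detection time are placeholders chosen for illustration, not measurements from the Nano.

```python
# Sketch: time budget per frame implied by a target frame rate.
# All names and the detection time below are illustrative assumptions.

def frame_budget_ms(fps):
    """Total time available per frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def max_embedding_ms(fps, detection_ms, faces_per_frame=1):
    """Time left per face for the embedding model after detection."""
    if faces_per_frame < 1:
        raise ValueError("faces_per_frame must be at least 1")
    remaining = frame_budget_ms(fps) - detection_ms
    return remaining / faces_per_frame


if __name__ == "__main__":
    detection_ms = 40.0  # placeholder, not a measured value
    for fps in (12, 15):
        print(
            "{} FPS: {:.1f} ms per frame, {:.1f} ms left for one embedding".format(
                fps, frame_budget_ms(fps), max_embedding_ms(fps, detection_ms)
            )
        )
```

In other words, detection plus recognition together have to fit in roughly 67 to 83 ms per frame, and a negative result from max_embedding_ms means detection alone already exceeds the budget.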

System Info:

Jetson Nano
TensorRT: 5.0.6.3-1+cuda10.0
CUDA: 10.0
JetPack: 4.2

Can someone take a look at this?