Run a U-Net segmentation model on Jetson Nano / Convert .pb to TensorRT

Hello,

I trained a U-Net segmentation model with Keras (using the TensorFlow backend).
I am trying to convert its frozen graph (.pb) to TensorRT on the Jetson Nano, but the process gets killed (as shown below). I have seen in other posts that this could be related to an "out of memory" problem.
For context, I already have an SSD MobileNet V2 model running on the Jetson Nano.
If I stop the systemctl service, I can run inference with the U-Net model without converting it to TensorRT (just loading the frozen graph with TensorFlow). Since this no longer works once the service is started (i.e. while the other network is running), I tried to convert the U-Net model to TensorRT to get an optimized version of it, but the conversion fails with a killed process. This may not be the right approach, though.

Is it possible to run two neural networks on a Jetson Nano? Is there another way to do this?

Any help would be greatly appreciated.

Thanks a lot

For information, here is how I try to convert the frozen graph to TensorRT:

import tensorflow.contrib.tensorrt as trt  # TF-TRT lives in tf.contrib on TensorFlow 1.x

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph_gd,  # pass the parsed frozen GraphDef here
    outputs=['conv2d_24/Sigmoid'],    # name of the U-Net output tensor
    max_batch_size=1,
    max_workspace_size_bytes=1 << 32, # I have tried 1 << 25 (32 MiB) and 1 << 32 (4 GiB) here
    precision_mode='FP16'
)
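
For reference, frozen_graph_gd above is the frozen .pb parsed into a GraphDef, along these lines (the file path is only an example):

import tensorflow as tf

pb_path = 'unet_frozen.pb'  # example path to the frozen graph

with tf.io.gfile.GFile(pb_path, 'rb') as f:
    frozen_graph_gd = tf.compat.v1.GraphDef()
    frozen_graph_gd.ParseFromString(f.read())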

And here is the output when the process is killed (during conversion of the U-Net frozen graph to TensorRT):

2020-10-05 16:00:58.200269: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2

WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.

2020-10-05 16:01:11.976893: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7

2020-10-05 16:01:11.994472: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7

WARNING:tensorflow:

The TensorFlow contrib module will not be included in TensorFlow 2.0.

For more information, please see:

* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md

* https://github.com/tensorflow/addons

* https://github.com/tensorflow/io (for I/O related ops)

If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From convert_pb_to_tensorrt.py:14: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.

2020-10-05 16:01:13.678101: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7

2020-10-05 16:01:15.506432: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1

2020-10-05 16:01:15.512224: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero

2020-10-05 16:01:15.512359: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0

2020-10-05 16:01:15.512638: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session

2020-10-05 16:01:15.532712: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency

2020-10-05 16:01:15.533264: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x328fd900 initialized for platform Host (this does not guarantee that XLA will be used). Devices:

2020-10-05 16:01:15.533318: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version

2020-10-05 16:01:15.632451: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero

2020-10-05 16:01:15.632757: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x30d0edb0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:

2020-10-05 16:01:15.632808: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3

2020-10-05 16:01:15.633163: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero

2020-10-05 16:01:15.633276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties: 

name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216

pciBusID: 0000:00:00.0

2020-10-05 16:01:15.633348: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2

2020-10-05 16:01:15.633500: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10

2020-10-05 16:01:15.716786: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10

2020-10-05 16:01:15.903326: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10

2020-10-05 16:01:16.060655: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10

2020-10-05 16:01:16.141950: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10

2020-10-05 16:01:16.142219: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8

2020-10-05 16:01:16.142553: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero

2020-10-05 16:01:16.142878: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero

2020-10-05 16:01:16.142991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1762] Adding visible gpu devices: 0

2020-10-05 16:01:16.143133: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2

2020-10-05 16:01:27.700226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1175] Device interconnect StreamExecutor with strength 1 edge matrix:

2020-10-05 16:01:27.700377: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] 0 

2020-10-05 16:01:27.700417: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] 0: N 

2020-10-05 16:01:27.713559: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero

2020-10-05 16:01:27.713897: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero

2020-10-05 16:01:27.714101: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1320] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 200 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)

Killed

Hi,

Please note that the TensorRT optimizer embedded in the TensorFlow framework consumes a lot of memory.
To allow operation fallback, it creates one engine for TensorFlow and another for TensorRT.
As a result, it is easy to reach the Nano's memory limit, which is 4 GB.

To solve this, would you mind allocating some swap memory to see if it helps?
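For example, a 4 GB swap file can be created like this (the size and path here are only examples):

sudo fallocate -l 4G /mnt/4GB.swap   # reserve 4 GB for the swap file
sudo chmod 600 /mnt/4GB.swap         # restrict permissions as required by swapon
sudo mkswap /mnt/4GB.swap            # format it as swap space
sudo swapon /mnt/4GB.swap            # enable it immediately
# To enable it automatically after reboot, add this line to /etc/fstab:
# /mnt/4GB.swap swap swap defaults 0 0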
For more detail, please check this comment:

Thanks.


Thank you for your answer; it works better with swap memory.