Not able to convert saved_model to TensorRT format on AGX Xavier

While following the tutorial for converting a saved_model to TensorRT:

1] Google Colab - the conversion succeeds
2] AGX Xavier - SDK JetPack 4.5.1 [L4T 32.5.1] - the conversion fails

The code snippet I ran is:

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

input_saved_model_dir = ""
output_saved_model_dir = ""
conversion_params = trt.TrtConversionParams(
    precision_mode=trt.TrtPrecisionMode.FP32)
print("Step2")
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=input_saved_model_dir,
    conversion_params=conversion_params)
print("Step3")

converter.convert()
print("Step4")

# build() expects a generator that yields sample input batches,
# not a directory path; adjust the shape to match your model's input.
def input_fn():
    yield (tf.random.normal([1, 224, 224, 3]),)

converter.build(input_fn=input_fn)
print("Step5")

converter.save(output_saved_model_dir)
print("Step6")

# TensorFlow version - 2.4.0
# TensorRT version - 7.3.1

# The error I faced is logged below:

2021-03-24 17:44:27.137748: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-24 17:44:27.137904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1888] Adding visible gpu devices: 0
2021-03-24 17:44:27.138004: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1287] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-24 17:44:27.138042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1293] 0
2021-03-24 17:44:27.138076: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306] 0: N
2021-03-24 17:44:27.138272: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-24 17:44:27.138492: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-24 17:44:27.138622: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23500 MB memory) → physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2021-03-24 17:44:27.139028: W tensorflow/core/platform/profile_utils/cpu_utils.cc:116] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency
2021-03-24 17:44:28.435188: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:954] Optimization results for grappler item: graph_to_optimize
function_optimizer: Graph size after: 2329 nodes (1835), 3371 edges (2877), time = 189.284ms.
function_optimizer: function_optimizer did nothing. time = 2.898ms.

2021-03-24 17:44:53.904413: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-24 17:44:53.950982: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2021-03-24 17:44:53.968462: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2021-03-24 17:44:54.003785: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-24 17:44:54.004545: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-24 17:44:54.023554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 31.18GiB deviceMemoryBandwidth: 82.08GiB/s
2021-03-24 17:44:54.035675: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-03-24 17:44:54.150092: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-03-24 17:44:54.210758: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-03-24 17:44:54.255626: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-03-24 17:44:54.260365: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-03-24 17:44:54.271984: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-03-24 17:44:54.282752: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-03-24 17:44:54.283181: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-03-24 17:44:54.283454: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-24 17:44:54.283744: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-24 17:44:54.283958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1888] Adding visible gpu devices: 0
2021-03-24 17:44:54.298205: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1287] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-24 17:44:54.298328: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1293] 0
2021-03-24 17:44:54.298361: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306] 0: N
2021-03-24 17:44:54.299098: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-24 17:44:54.299392: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-03-24 17:44:54.299603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23500 MB memory) → physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
Killed

Has anyone faced this issue before?
Please help!

Hi,

The error usually indicates an out-of-memory issue.
Could you double-check the memory status with tegrastats first?

$ sudo tegrastats
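If it helps, the `RAM used/total` field from each tegrastats line can be pulled out programmatically. The sketch below is an editorial illustration (not from the original reply); the sample line follows the JetPack 4.x format, but the exact field layout varies across releases:

```python
import re

def parse_ram(line):
    """Return (used_mb, total_mb) from a tegrastats line, or None if absent."""
    m = re.search(r"RAM (\d+)/(\d+)MB", line)
    return (int(m.group(1)), int(m.group(2))) if m else None

# Example line in the format printed by tegrastats on JetPack 4.x
sample = "RAM 3089/31919MB (lfb 6354x4MB) SWAP 0/15959MB"
print(parse_ram(sample))  # -> (3089, 31919)
```

Logging this value while the conversion runs shows whether memory climbs steadily until the process is killed.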

Thanks.

Out of memory for CPU or GPU?

@AastaLLL then how can I convert the saved_model to TensorRT format without running out of memory?
The same code snippet works on my personal laptop as well as in the Google Colab environment.

Thanks in advance!

Hi,

Which Xavier do you use: 8GB, 16GB, or 32GB?
If you can convert the model to TensorRT in a desktop environment,
could you monitor the memory usage there at the same time?

In general, we recommend using the pure TensorRT API instead.
TF-TRT requires more memory to handle the TensorFlow <-> TensorRT mechanism.
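One possible mitigation (an editorial suggestion, not part of the original reply) is to enable TensorFlow's on-demand GPU memory growth before starting the conversion, so TF does not grab a large allocation upfront on Jetson's shared CPU/GPU memory:

```python
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all at startup.
# Must be called before any GPU op runs.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```

This is a runtime configuration fragment; it reduces peak memory pressure but cannot help if the model itself genuinely needs more memory than the board has.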

Thanks.

@AastaLLL, the Xavier used is the 32GB model, and in the desktop environment the conversion also consumes 95-99% of memory, as viewed in htop.

@AastaLLL also, what do you mean by using the "pure TensorRT API"?

Hi,

Have you run the model on a desktop before?
If yes, please measure the memory usage first,
since it must be a very complicated model to occupy ~99% of 32GB of memory.

Pure TensorRT means running the model with the TensorRT library directly, without the TensorFlow interface.
The guide below explains how to deploy a TF model (in ONNX format):

https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#import_onnx_c
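For reference, a typical version of this path (a sketch; the directory and file names are placeholders, and it assumes the standard tf2onnx package and the trtexec tool shipped with TensorRT) is to export the SavedModel to ONNX and then build a serialized engine:

```shell
# Export the TensorFlow SavedModel to ONNX (requires the tf2onnx package)
python3 -m tf2onnx.convert --saved-model ./saved_model_dir --output model.onnx

# Build and serialize a TensorRT engine from the ONNX file
/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.trt
```

The resulting engine can then be loaded and executed with the TensorRT runtime, with no TensorFlow process (and its memory overhead) involved.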

Thanks.