"Device memory is insufficient to use tactic" error when converting a SavedModel to a TensorRT model on Jetson Nano

Description

A clear and concise description of the bug or issue.

Environment

TensorRT Version: 8.0.1
GPU Type: 128 core Maxwell GPU
Nvidia Driver Version:
CUDA Version: 10.1
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable): 2.5
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Model to be converted to TensorRT:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D

deco = Sequential([
    Conv2D(64,(3,3),activation='relu',padding='same',input_shape=(100,100,64),name='c2'),
    Conv2D(32,(3,3),activation='relu',padding='same'),
    Conv2D(16,(3,3),activation='relu',padding='same',name='c3'),
    Conv2D(1,(3,3),activation='relu',padding='same',name='c4'),
])
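For a sense of scale, the following is a rough back-of-the-envelope sketch (not part of the original report) that counts the parameters and the largest FP32 activation of the four layers above for a batch of one. The layer shapes are read off the model definition; the helper names are hypothetical. It suggests the model's own tensors are only a few megabytes, so the much larger requests in the log are presumably scratch space for the candidate tactics TensorRT tries, not the model data itself.

```python
# (in_channels, out_channels) of each 3x3 Conv2D layer, read off the model above.
LAYERS = [(64, 64), (64, 32), (32, 16), (16, 1)]
KERNEL = 3          # 3x3 kernels
H = W = 100         # spatial size; padding='same' keeps it constant
BYTES_FP32 = 4


def conv_params(c_in, c_out, k=KERNEL):
    """Weights plus biases of one Conv2D layer."""
    return k * k * c_in * c_out + c_out


def activation_bytes(channels, h=H, w=W, dtype_bytes=BYTES_FP32):
    """Size of one activation tensor for a batch of one."""
    return h * w * channels * dtype_bytes


total_params = sum(conv_params(c_in, c_out) for c_in, c_out in LAYERS)
weight_bytes = total_params * BYTES_FP32
largest_activation = max(activation_bytes(c) for pair in LAYERS for c in pair)

print("parameters:", total_params)
print("weights (bytes, FP32):", weight_bytes)
print("largest activation (bytes, FP32):", largest_activation)
```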

Code to convert to TensorRT:
import numpy as np
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

gpu_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpu_devices[0], True)
# Crucial value: set it lower than the available GPU memory
# (note that Jetson shares GPU memory with the CPU).
tf.config.experimental.set_virtual_device_configuration(
    gpu_devices[0],
    [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1800)])

conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS
conversion_params = conversion_params._replace(max_workspace_size_bytes=1500000000)
conversion_params = conversion_params._replace(precision_mode="FP16")
encoder_model = trt.TrtGraphConverterV2(
    input_saved_model_dir='/home/rohan/Desktop/original_models/decoder',
    conversion_params=conversion_params)

def input_fn():
    # Substitute with your input size
    Inp1 = np.random.normal(size=(1, 100, 100, 64)).astype(np.float32)
    yield (Inp1, )

encoder_model.convert()
encoder_model.build(input_fn=input_fn)
encoder_model.save(output_saved_model_dir='/home/rohan/Desktop/converted_models/decoder')

Steps To Reproduce

Output:
2021-08-26 20:05:37.528444: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
2021-08-26 20:05:46.600192: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-08-26 20:05:46.678156: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:05:46.678328: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.86GiB deviceMemoryBandwidth: 194.55MiB/s
2021-08-26 20:05:46.678428: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
2021-08-26 20:05:46.867047: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.10
2021-08-26 20:05:46.867313: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.10
2021-08-26 20:05:46.941017: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-08-26 20:05:47.048284: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-08-26 20:05:47.169099: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.10
2021-08-26 20:05:47.251010: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.10
2021-08-26 20:05:47.254379: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-08-26 20:05:47.254659: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:05:47.254903: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:05:47.255031: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
2021-08-26 20:05:47.973614: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libnvinfer.so.8
2021-08-26 20:05:48.810263: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:05:48.810443: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.86GiB deviceMemoryBandwidth: 194.55MiB/s
2021-08-26 20:05:48.810677: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:05:48.810877: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:05:48.810956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
2021-08-26 20:05:48.811082: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
2021-08-26 20:05:53.353701: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-26 20:05:53.353809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-08-26 20:05:53.353853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-08-26 20:05:53.354184: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:05:53.354487: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:05:53.354712: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:05:53.354853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1800 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2021-08-26 20:06:12.966546: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:13.072032: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2021-08-26 20:06:13.285829: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2021-08-26 20:06:13.800398: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:13.927121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.86GiB deviceMemoryBandwidth: 194.55MiB/s
2021-08-26 20:06:13.927662: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:13.927935: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:13.971009: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
2021-08-26 20:06:14.084061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-26 20:06:14.084206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-08-26 20:06:14.084282: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-08-26 20:06:14.185676: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:14.444806: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:14.450256: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1800 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2021-08-26 20:06:14.898454: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 19200000 Hz
2021-08-26 20:06:16.930787: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1171] Optimization results for grappler item: graph_to_optimize
function_optimizer: Graph size after: 42 nodes (31), 57 edges (46), time = 302.428ms.
function_optimizer: function_optimizer did nothing. time = 0.3ms.

2021-08-26 20:06:18.423304: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:18.430558: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2021-08-26 20:06:18.445344: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2021-08-26 20:06:18.489187: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:18.489404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.86GiB deviceMemoryBandwidth: 194.55MiB/s
2021-08-26 20:06:18.489628: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:18.489817: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:18.489900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
2021-08-26 20:06:18.517767: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-26 20:06:18.517894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-08-26 20:06:18.517956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-08-26 20:06:18.518379: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:18.518895: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-26 20:06:18.519155: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1800 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2021-08-26 20:06:19.126605: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:790] There are 5 ops of 3 different types in the graph that are not converted to TensorRT: Identity, NoOp, Placeholder, (For more information see Accelerating Inference In TF-TRT User Guide :: NVIDIA Deep Learning Frameworks Documentation).
2021-08-26 20:06:19.154807: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:759] Number of TensorRT candidate segments: 1
2021-08-26 20:06:19.158407: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:853] Replaced segment 0 consisting of 27 nodes by TRTEngineOp_0_0.
2021-08-26 20:06:19.334990: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1171] Optimization results for grappler item: tf_graph
constant_folding: Graph size after: 26 nodes (-16), 41 edges (-16), time = 91.128ms.
layout: Graph size after: 30 nodes (4), 45 edges (4), time = 162.196ms.
constant_folding: Graph size after: 30 nodes (0), 45 edges (0), time = 28.984ms.
TensorRTOptimizer: Graph size after: 4 nodes (-26), 3 edges (-42), time = 93.251ms.
constant_folding: Graph size after: 4 nodes (0), 3 edges (0), time = 1.673ms.
Optimization results for grappler item: TRTEngineOp_0_0_native_segment
constant_folding: Graph size after: 29 nodes (0), 36 edges (0), time = 18.628ms.
layout: Graph size after: 29 nodes (0), 36 edges (0), time = 4.067ms.
constant_folding: Graph size after: 29 nodes (0), 36 edges (0), time = 3.565ms.
TensorRTOptimizer: Graph size after: 29 nodes (0), 36 edges (0), time = 0.388ms.
constant_folding: Graph size after: 29 nodes (0), 36 edges (0), time = 3.414ms.

2021-08-26 20:06:23.723168: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-08-26 20:06:24.605054: I tensorflow/compiler/tf2tensorrt/common/utils.cc:58] Linked TensorRT version: 8.0.1
2021-08-26 20:06:25.057608: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libnvinfer.so.8
2021-08-26 20:06:25.103550: I tensorflow/compiler/tf2tensorrt/common/utils.cc:60] Loaded TensorRT version: 8.0.1
2021-08-26 20:06:25.871391: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libnvinfer_plugin.so.8
2021-08-26 20:06:41.975207: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger It is suggested to disable layer timing cache while using AlgorithmSelector. Please refer to the developer guide in Developer Guide :: NVIDIA Deep Learning TensorRT Documentation.
2021-08-26 20:07:57.835013: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger Detected invalid timing cache, setup a local cache instead
2021-08-26 20:08:28.785340: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger Tactic Device request: 533MB Available: 169MB. Device memory is insufficient to use tactic.
2021-08-26 20:08:31.600899: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger Skipping tactic 3 due to oom error on requested size of 533 detected for tactic 4.
2021-08-26 20:08:38.405814: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger Tactic Device request: 530MB Available: 192MB. Device memory is insufficient to use tactic.
2021-08-26 20:08:38.731835: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger Skipping tactic 3 due to oom error on requested size of 530 detected for tactic 4.
2021-08-26 20:08:39.665473: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger Tactic Device request: 271MB Available: 201MB. Device memory is insufficient to use tactic.
2021-08-26 20:08:39.665709: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger Skipping tactic 3 due to oom error on requested size of 271 detected for tactic 4.
2021-08-26 20:08:40.284923: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger Tactic Device request: 270MB Available: 204MB. Device memory is insufficient to use tactic.
2021-08-26 20:08:40.285156: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger Skipping tactic 3 due to oom error on requested size of 270 detected for tactic 4.
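To see how far short the device is for each rejected tactic, the "Tactic Device request" lines can be pulled out of the log with a small script. This is only a sketch: the regular expression is written against the message format shown in the output above, and the function name and sample text are illustrative.

```python
import re

# Matches the TensorRT logger message format seen in the output above.
TACTIC_RE = re.compile(r"Tactic Device request: (\d+)MB Available: (\d+)MB")


def tactic_shortfalls(log_text):
    """Return (requested_mb, available_mb, shortfall_mb) for each matching line."""
    results = []
    for match in TACTIC_RE.finditer(log_text):
        requested = int(match.group(1))
        available = int(match.group(2))
        results.append((requested, available, requested - available))
    return results


# Two of the messages from the log, with timestamps and prefixes trimmed.
SAMPLE = """
DefaultLogger Tactic Device request: 533MB Available: 169MB. Device memory is insufficient to use tactic.
DefaultLogger Tactic Device request: 271MB Available: 201MB. Device memory is insufficient to use tactic.
"""

for requested, available, shortfall in tactic_shortfalls(SAMPLE):
    print(f"requested {requested} MB, available {available} MB, short by {shortfall} MB")
```

In this log each such error is followed by a "Skipping tactic" warning, which indicates that TensorRT moved on from that tactic.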

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

I will move this question over to the Jetson Nano forum; the team there has more experience with the Nano.

Hi,

This is an out-of-memory error.

Please note that the Nano has only 4 GiB of memory, which is shared between the CPU and GPU.
As a result, it is limited in its ability to deploy complex models.

Thanks.
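Because the memory is shared, one thing to try is to check how much is actually free before converting and to size `max_workspace_size_bytes` from that instead of a fixed 1500000000. The sketch below is a heuristic, not a confirmed fix: the 50% fraction, the 1 GiB cap, and the helper names are assumptions, and whether a smaller workspace avoids the tactic messages on a given Nano has not been verified here. It reads `MemAvailable` from `/proc/meminfo`, the standard Linux interface.

```python
def mem_available_bytes(meminfo_text):
    """Parse the MemAvailable field (reported in kB) from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemAvailable not found in meminfo text")


def suggest_workspace_bytes(available_bytes, fraction=0.5, cap_bytes=1 << 30):
    """Heuristic (assumption): use a fraction of free memory, up to a cap."""
    return min(int(available_bytes * fraction), cap_bytes)


def read_meminfo(path="/proc/meminfo"):
    """Read the kernel's memory summary (Linux only)."""
    with open(path) as f:
        return f.read()


# Usage on the device, before building the converter:
#   workspace = suggest_workspace_bytes(mem_available_bytes(read_meminfo()))
#   conversion_params = conversion_params._replace(max_workspace_size_bytes=workspace)
```

Closing other applications and running without a desktop session may also raise the amount reported as available.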
