TensorFlow GPU device created with only 1591MB memory (or is it 3.87GiB?), despite there being over 20GB available

jwalkland · June 17, 2021, 9:36am

I’m trying to run the attached script to convert a TensorFlow SavedModel with TF-TRT, but my device runs out of memory during calibration (see the console output below).

I have the 4GB Jetson Nano, and I’ve also added about 20GB of swap space. As far as I can tell, it should not be running out of memory at all! I’ve tried increasing the ‘max_workspace_size_bytes’ parameter, but all that happens is that the larger workspace is requested and then fails to allocate. In some places the log says 1591MB of memory is available, and in others 3.87GiB. With the additional 20GB of swap, I would expect these figures to be a lot higher!

Any ideas how I can resolve this?

I should add that I’m assuming the conversion problem is GPU memory. If I run the same script but force TensorFlow to use the CPU, it doesn’t fail: I don’t get the “Failed to feed calibration data” error.
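The post doesn’t say how TensorFlow was forced onto the CPU. One common approach, assumed here rather than taken from the attached script, is to hide every CUDA device through the `CUDA_VISIBLE_DEVICES` environment variable before TensorFlow is imported:

```python
import os

# Assumption: this is one common way to force a CPU-only run; the original
# post does not say which method the attached script used.
# The CUDA runtime reads CUDA_VISIBLE_DEVICES when it initialises, so the
# variable must be set before TensorFlow (and therefore CUDA) is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # "-1" is not a valid index, so no GPU is exposed

# From here on, `import tensorflow as tf` would see no GPU devices, and
# tf.config.list_physical_devices("GPU") would return an empty list.
```

Inside a TensorFlow 2.x process, `tf.config.set_visible_devices([], "GPU")` has a similar effect, as long as it is called before any GPU has been initialised.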

tensorrt.py (3.3 KB)


2021-06-17 09:33:09.122823: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-06-17 09:33:18.791085: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-06-17 09:33:18.895112: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-06-17 09:33:18.937051: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:33:18.937206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1747] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2021-06-17 09:33:18.937285: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-06-17 09:33:19.107577: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-06-17 09:33:19.107897: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-06-17 09:33:19.184059: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-17 09:33:19.295113: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-17 09:33:19.434249: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-06-17 09:33:19.790495: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-06-17 09:33:19.887103: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-06-17 09:33:19.887429: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:33:19.887723: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:33:19.887884: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1889] Adding visible gpu devices: 0
2021-06-17 09:33:19.890905: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-06-17 09:33:19.891237: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:33:19.891365: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1747] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2021-06-17 09:33:19.891447: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-06-17 09:33:19.891538: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-06-17 09:33:19.891608: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-06-17 09:33:19.891673: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-17 09:33:19.891732: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-17 09:33:19.891933: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-06-17 09:33:19.892037: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-06-17 09:33:19.892100: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-06-17 09:33:19.892263: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:33:19.892448: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:33:19.892513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1889] Adding visible gpu devices: 0
2021-06-17 09:33:19.892641: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-06-17 09:33:27.089718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1287] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-17 09:33:27.089802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1293]      0
2021-06-17 09:33:27.089838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306] 0:   N
2021-06-17 09:33:27.090199: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:33:27.090608: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:33:27.090859: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:33:27.091007: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1591 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
1 Physical GPUs, 1 Logical GPUs
2021-06-17 09:33:27.575412: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libnvinfer.so.7
2021-06-17 09:36:00.835364: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:36:00.835512: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2021-06-17 09:36:00.835854: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2021-06-17 09:36:00.837119: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-06-17 09:36:00.837657: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:36:00.837799: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1747] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2021-06-17 09:36:00.837893: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-06-17 09:36:00.838049: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-06-17 09:36:00.838156: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-06-17 09:36:00.838252: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-17 09:36:00.838345: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-17 09:36:00.838445: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-06-17 09:36:00.838541: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-06-17 09:36:00.838628: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-06-17 09:36:00.838868: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:36:00.839096: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:36:00.839173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1889] Adding visible gpu devices: 0
2021-06-17 09:36:04.568077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1287] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-17 09:36:04.568165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1293]      0
2021-06-17 09:36:04.568214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306] 0:   N
2021-06-17 09:36:04.568589: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:36:04.568879: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:36:04.569001: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1591 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2021-06-17 09:36:04.569614: W tensorflow/core/platform/profile_utils/cpu_utils.cc:116] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency
2021-06-17 09:36:08.533637: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:954] Optimization results for grappler item: graph_to_optimize
  function_optimizer: Graph size after: 3427 nodes (3069), 7255 edges (6890), time = 576.172ms.
  function_optimizer: function_optimizer did nothing. time = 9.67ms.

2021-06-17 09:36:47.189183: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:36:47.189385: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2021-06-17 09:36:47.189682: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2021-06-17 09:36:47.190252: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-06-17 09:36:47.190615: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:36:47.190890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1747] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2021-06-17 09:36:47.191117: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-06-17 09:36:47.191226: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-06-17 09:36:47.191325: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-06-17 09:36:47.191412: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-17 09:36:47.191503: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-17 09:36:47.191587: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-06-17 09:36:47.191674: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-06-17 09:36:47.191820: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-06-17 09:36:47.192261: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:36:47.192538: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:36:47.192618: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1889] Adding visible gpu devices: 0
2021-06-17 09:36:47.192718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1287] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-17 09:36:47.192756: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1293]      0
2021-06-17 09:36:47.192811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306] 0:   N
2021-06-17 09:36:47.193315: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:36:47.193623: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] ARM64 does not support NUMA - returning NUMA node zero
2021-06-17 09:36:47.193901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1591 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2021-06-17 09:36:59.979422: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:790] There are 245 ops of 37 different types in the graph that are not converted to TensorRT: Sub, Const, AddV2, Reshape, Assert, NonMaxSuppressionV5, Squeeze, ResizeBilinear, Placeholder, Transpose, Pad, Mul, Slice, Cast, TopKV2, Identity, ExpandDims, NoOp, Pack, Split, StridedSlice, Shape, Minimum, Fill, Greater, GatherV2, Size, Unpack, ConcatV2, Exp, Equal, Select, Where, Less, Range, GreaterEqual, Sum, (For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops).
2021-06-17 09:37:02.311509: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:757] Number of TensorRT candidate segments: 5
2021-06-17 09:37:08.664231: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:851] Replaced segment 0 consisting of 4 nodes by StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/Area/TRTEngineOp_0_0.
2021-06-17 09:37:08.664732: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:851] Replaced segment 1 consisting of 4 nodes by StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/ClipToWindow/Area/TRTEngineOp_0_1.
2021-06-17 09:37:08.665044: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:851] Replaced segment 2 consisting of 14 nodes by StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/ClipToWindow/TRTEngineOp_0_2.
2021-06-17 09:37:08.665474: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:851] Replaced segment 3 consisting of 710 nodes by StatefulPartitionedCall/TRTEngineOp_0_3.
2021-06-17 09:37:08.672646: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:851] Replaced segment 4 consisting of 4 nodes by StatefulPartitionedCall/Preprocessor/TRTEngineOp_0_4.
2021-06-17 09:37:11.349979: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:954] Optimization results for grappler item: tf_graph
  constant_folding: Graph size after: 1223 nodes (-1981), 4879 edges (-2208), time = 7209.69824ms.
  layout: Graph size after: 1257 nodes (34), 4913 edges (34), time = 747.827ms.
  constant_folding: Graph size after: 1257 nodes (0), 4913 edges (0), time = 305.519ms.
  TensorRTOptimizer: Graph size after: 526 nodes (-731), 734 edges (-4179), time = 9961.7832ms.
  constant_folding: Graph size after: 525 nodes (-1), 734 edges (0), time = 404.203ms.
Optimization results for grappler item: StatefulPartitionedCall/Preprocessor/TRTEngineOp_0_4_native_segment
  constant_folding: Graph size after: 6 nodes (0), 5 edges (0), time = 1.033ms.
  layout: Graph size after: 6 nodes (0), 5 edges (0), time = 61.231ms.
  constant_folding: Graph size after: 6 nodes (0), 5 edges (0), time = 0.857ms.
  TensorRTOptimizer: Graph size after: 6 nodes (0), 5 edges (0), time = 0.047ms.
  constant_folding: Graph size after: 6 nodes (0), 5 edges (0), time = 0.842ms.
Optimization results for grappler item: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/Area/TRTEngineOp_0_0_native_segment
  constant_folding: Graph size after: 9 nodes (0), 8 edges (0), time = 0.955ms.
  layout: Graph size after: 9 nodes (0), 8 edges (0), time = 0.923ms.
  constant_folding: Graph size after: 9 nodes (0), 8 edges (0), time = 0.89ms.
  TensorRTOptimizer: Graph size after: 9 nodes (0), 8 edges (0), time = 0.057ms.
  constant_folding: Graph size after: 9 nodes (0), 8 edges (0), time = 0.857ms.
Optimization results for grappler item: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/ClipToWindow/Area/TRTEngineOp_0_1_native_segment
  constant_folding: Graph size after: 9 nodes (0), 8 edges (0), time = 0.981ms.
  layout: Graph size after: 9 nodes (0), 8 edges (0), time = 0.86ms.
  constant_folding: Graph size after: 9 nodes (0), 8 edges (0), time = 0.891ms.
  TensorRTOptimizer: Graph size after: 9 nodes (0), 8 edges (0), time = 0.069ms.
  constant_folding: Graph size after: 9 nodes (0), 8 edges (0), time = 0.889ms.
Optimization results for grappler item: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/ClipToWindow/TRTEngineOp_0_2_native_segment
  constant_folding: Graph size after: 19 nodes (0), 22 edges (0), time = 1.692ms.
  layout: Graph size after: 19 nodes (0), 22 edges (0), time = 1.668ms.
  constant_folding: Graph size after: 19 nodes (0), 22 edges (0), time = 1.633ms.
  TensorRTOptimizer: Graph size after: 19 nodes (0), 22 edges (0), time = 0.106ms.
  constant_folding: Graph size after: 19 nodes (0), 22 edges (0), time = 1.603ms.
Optimization results for grappler item: StatefulPartitionedCall/TRTEngineOp_0_3_native_segment
  constant_folding: Graph size after: 714 nodes (0), 727 edges (0), time = 286.253ms.
  layout: Graph size after: 714 nodes (0), 727 edges (0), time = 441.269ms.
  constant_folding: Graph size after: 714 nodes (0), 727 edges (0), time = 271.653ms.
  TensorRTOptimizer: Graph size after: 714 nodes (0), 727 edges (0), time = 47.511ms.
  constant_folding: Graph size after: 714 nodes (0), 727 edges (0), time = 272.264ms.

Grabbing file 5858c03e-23d2-11e8-a6a3-ec086b02610b.jpg
2021-06-17 10:09:59.934488: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-06-17 10:10:04.052857: I tensorflow/compiler/tf2tensorrt/common/utils.cc:58] Linked TensorRT version: 7.1.3
2021-06-17 10:10:04.387159: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libnvinfer.so.7
2021-06-17 10:10:04.437102: I tensorflow/compiler/tf2tensorrt/common/utils.cc:60] Loaded TensorRT version: 7.1.3
2021-06-17 10:10:04.595555: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libnvinfer_plugin.so.7
2021-06-17 10:10:05.030655: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger Int8 support requested on hardware without native Int8 support, performance will be negatively affected.
2021-06-17 10:10:11.045958: W tensorflow/core/common_runtime/bfc_allocator.cc:248] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.00GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-06-17 10:10:11.120619: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger Requested amount of GPU memory (4294967296 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
2021-06-17 10:10:11.157458: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger /home/jenkins/workspace/TensorRT/helpers/rel-7.1/L1_Nightly_Internal/build/source/rtSafe/resources.h (181) - OutOfMemory Error in GpuMemory: 0
2021-06-17 10:10:12.989762: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger /home/jenkins/workspace/TensorRT/helpers/rel-7.1/L1_Nightly_Internal/build/source/rtSafe/resources.h (181) - OutOfMemory Error in GpuMemory: 0
2021-06-17 10:10:13.196058: E tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:991] Calibration failed: Internal: Failed to build TensorRT engine
Traceback (most recent call last):
  File "tensorrt.py", line 100, in <module>
    converter.convert(calibration_input_fn=my_input_fn)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/compiler/tensorrt/trt_convert.py", line 1124, in convert
    self._converted_func(*map(ops.convert_to_tensor, inp))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1669, in __call__
    return self._call_impl(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/wrap_function.py", line 247, in _call_impl
    args, kwargs, cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1687, in _call_impl
    return self._call_with_flat_signature(args, kwargs, cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1736, in _call_with_flat_signature
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1919, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 560, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal:  Failed to feed calibration data
         [[node StatefulPartitionedCall/Preprocessor/TRTEngineOp_0_4 (defined at tensorrt.py:100) ]]
         [[StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/unstack/_32]]
  (1) Internal:  Failed to feed calibration data
         [[node StatefulPartitionedCall/Preprocessor/TRTEngineOp_0_4 (defined at tensorrt.py:100) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_pruned_45577]

Function call stack:
pruned -> pruned
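The three figures in the title are easier to compare when they are expressed in the same unit. The sketch below uses only the standard library. It copies three lines from the output above and converts each figure to GiB. Treating TensorFlow’s “MB” as MiB (2**20 bytes) is an assumption, but the conclusion is the same either way.

```python
import re

# Excerpts copied verbatim from the console output above.
LOG_LINES = [
    "coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s",
    "Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1591 MB memory)",
    "DefaultLogger Requested amount of GPU memory (4294967296 bytes) could not be allocated.",
]

GIB = 1024 ** 3
MIB = 1024 ** 2


def extract_sizes(lines):
    """Return the three memory figures found in the log, all in GiB."""
    text = "\n".join(lines)
    # Total physical memory reported for the Tegra X1 (shared CPU/GPU RAM).
    physical_total = float(re.search(r"deviceMemorySize: ([\d.]+)GiB", text).group(1))
    # Memory TensorFlow's GPU allocator reserved (assumption: "MB" means MiB).
    tf_allocator = int(re.search(r"with (\d+) MB memory", text).group(1)) * MIB / GIB
    # Workspace TensorRT asked for, i.e. the max_workspace_size_bytes setting.
    trt_request = int(re.search(r"\((\d+) bytes\)", text).group(1)) / GIB
    return {
        "physical_total": physical_total,
        "tf_allocator": tf_allocator,
        "trt_request": trt_request,
    }


sizes = extract_sizes(LOG_LINES)
for name, gib in sizes.items():
    print(f"{name:15s} {gib:6.2f} GiB")
```

The script prints about 3.87, 1.55 and 4.00 GiB. The log also contains the warning `Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.00GiB`. Taken together, this suggests that the TensorRT workspace request is served from TensorFlow’s GPU allocator pool, which was capped at 1591 MB, and that a 4 GiB request would not fit even into the Nano’s entire physical memory.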

AastaLLL · June 18, 2021, 3:05am

Hi,

Swap memory is not accessible to the GPU.
On the Nano, GPU memory is limited to 4GiB by the hardware.
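The point about swap can be turned into a rough pre-flight check. This sketch assumes the Nano’s unified-memory model: GPU allocations come out of the same physical RAM that `/proc/meminfo` reports, and swap can never back them. `gpu_usable_bytes` and `workspace_fits` are hypothetical helpers, and the sample figures are hypothetical values chosen to roughly match the 3.87GiB and ~20GB of swap in this thread.

```python
def gpu_usable_bytes(meminfo_text):
    """Estimate how much memory the Nano's GPU could draw on, ignoring swap.

    Hypothetical helper. It assumes GPU allocations come from the same
    physical RAM that /proc/meminfo reports (unified memory).
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0]) * 1024  # /proc/meminfo values are in kB (KiB)
    # SwapTotal/SwapFree are parsed but deliberately never added in.
    return fields["MemAvailable"]


def workspace_fits(meminfo_text, workspace_bytes):
    """Hypothetical check: could a TensorRT workspace of this size fit at all?"""
    return workspace_bytes <= gpu_usable_bytes(meminfo_text)


# Hypothetical /proc/meminfo excerpt: ~3.87 GiB of RAM plus ~21 GiB of swap.
SAMPLE_MEMINFO = """\
MemTotal:        4059196 kB
MemFree:          512000 kB
MemAvailable:    2048000 kB
SwapTotal:      22020096 kB
SwapFree:       22020096 kB
"""

print(workspace_fits(SAMPLE_MEMINFO, 4 << 30))  # 4 GiB workspace
print(workspace_fits(SAMPLE_MEMINFO, 1 << 30))  # 1 GiB workspace
```

On the device itself, you would pass the contents of `/proc/meminfo` instead of the sample. Even a passing result is only an upper bound. With TF-TRT, the workspace also has to fit inside the pool that TensorFlow’s GPU allocator has already reserved.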

TF-TRT is known to consume much more memory than pure TensorRT on Jetson.
Would you mind converting your model to pure TensorRT for better optimization?

https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleUffSSD

Thanks.

system · June 25, 2021, 7:38am

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views	Activity
Tf-trt Jetson Nano - process killed - conversion running out of memory? Jetson Nano tensorrt , tensorflow	5	1339	October 18, 2021
Memory Issues and Conversion issues with TF-TRT on Nano Jetson Nano tensorrt	8	1579	October 18, 2021
Device memory is insufficient to use tactic error when converting a model in SavedModel format to tensorrt model. Jetson Nano Jetson Nano tensorrt	3	2346	January 5, 2022
ResourceExhaustedError: Running TF-TRT integration on Jetson AGX Jetson AGX Xavier	10	1179	October 18, 2021
Error in TFTRT TensorRT	9	3421	June 22, 2020
TensorRT optimization random outcome Jetson Nano	5	835	October 15, 2021
Run a UNet segmentation model on Jetson Nano / Convert pb to TensorRT Jetson Nano tensorrt	3	1821	October 18, 2021
Allocator (GPU_0_bfc) ran out of memory trying to allocate 325.33MiB with freed_by_count=0 Jetson Nano tensorflow , tf-trt , gpu	2	7462	October 15, 2021
Converting tensorflow pb model file to tensorrt GPU memory error TensorRT	2	1167	October 17, 2019
Tf-trt conversion got killed TensorRT tensorrt , tensorflow , jetson-inference	3	762	April 22, 2021
