Allocator (GPU_0_bfc) ran out of memory trying to allocate 325.33MiB with freed_by_count=0

Hello! First of all, I wish you all good health amidst this pandemic!

I came across this problem when trying to run a TF-TRT-optimized model in TensorFlow 2.3. The model architecture is MobileNet-V2-FPNLite.
The model runs on a video without any errors, but very slowly: the FPS count is around 5 or 6.
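For context, my conversion and inference path looks roughly like the sketch below; it just follows the standard TF 2.3 TF-TRT API. The directory names, FP16 precision mode, 320x320 input size and "serving_default" signature key are placeholders, not my exact configuration.

# Rough sketch of the TF-TRT conversion + inference path (TF 2.3 API).
# "saved_model_dir", "trt_model_dir", the FP16 precision mode, the 320x320
# input size and the "serving_default" signature key are placeholders.
import numpy as np
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert the exported SavedModel with TF-TRT.
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode="FP16",
    max_workspace_size_bytes=1 << 28)  # 256 MiB TensorRT workspace
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="saved_model_dir",
    conversion_params=params)
converter.convert()
converter.save("trt_model_dir")

# Load the converted model and run inference on one frame.
model = tf.saved_model.load("trt_model_dir")
infer = model.signatures["serving_default"]
frame = np.zeros((1, 320, 320, 3), dtype=np.uint8)  # stand-in for a video frame
outputs = infer(tf.constant(frame))

Here is the log from one such run: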

2021-01-27 07:13:58.631430: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-01-27 07:15:47.595334: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-01-27 07:15:47.642790: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-01-27 07:15:47.642967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1742] Found device 0 with properties: 
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.86GiB deviceMemoryBandwidth: 194.55MiB/s
2021-01-27 07:15:47.643067: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-01-27 07:15:47.849545: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-01-27 07:15:47.949223: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-01-27 07:15:48.078645: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-01-27 07:15:48.258984: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-01-27 07:15:48.352716: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-01-27 07:15:48.357316: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-01-27 07:15:48.357716: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-01-27 07:15:48.358500: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-01-27 07:15:48.358734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1884] Adding visible gpu devices: 0
2021-01-27 07:15:48.399070: W tensorflow/core/platform/profile_utils/cpu_utils.cc:108] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency
2021-01-27 07:15:48.400503: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x27dca4b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-27 07:15:48.400753: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-01-27 07:15:48.512299: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-01-27 07:15:48.512608: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x27db5bc0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-01-27 07:15:48.512668: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2021-01-27 07:15:48.513217: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-01-27 07:15:48.513341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1742] Found device 0 with properties: 
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.86GiB deviceMemoryBandwidth: 194.55MiB/s
2021-01-27 07:15:48.513437: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-01-27 07:15:48.513629: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-01-27 07:15:48.513741: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-01-27 07:15:48.513830: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-01-27 07:15:48.513915: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-01-27 07:15:48.513999: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-01-27 07:15:48.514083: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-01-27 07:15:48.514395: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-01-27 07:15:48.514645: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-01-27 07:15:48.514716: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1884] Adding visible gpu devices: 0
2021-01-27 07:15:48.514816: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-01-27 07:15:53.875925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1283] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-27 07:15:53.876114: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1289]      0 
2021-01-27 07:15:53.876158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1302] 0:   N 
2021-01-27 07:15:53.884393: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-01-27 07:15:53.884810: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1046] ARM64 does not support NUMA - returning NUMA node zero
2021-01-27 07:15:53.884983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1428] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 225 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2021-01-27 07:19:55.611252: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-01-27 07:20:11.239990: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-01-27 07:20:26.029644: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 290.13MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-01-27 07:20:27.060471: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 798.07MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-01-27 07:20:27.342806: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 320.70MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-01-27 07:20:27.432250: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 457.81MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-01-27 07:20:27.521906: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 473.04MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-01-27 07:20:27.931603: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 215.05MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-01-27 07:20:28.007276: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 220.21MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-01-27 07:20:28.349143: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 220.62MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-01-27 07:20:28.422506: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 223.28MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-01-27 07:20:28.500687: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 325.33MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Gtk-Message: 07:20:35.885: Failed to load module "canberra-gtk-module"

Since the amount of memory it requires is small (325 MiB), is there a way to make this memory available to the TensorFlow engine? How can I use the whole 4 GB of memory?
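For instance, would something along the lines of the snippet below be the right direction? It uses the standard tf.config API; the 3072 MiB cap in option 2 is only an illustrative number for a 4 GB board, not something I have verified.

import tensorflow as tf

# Must run before the GPU is initialized (i.e. before any op touches it).
gpus = tf.config.experimental.list_physical_devices("GPU")
if gpus:
    # Option 1: grow GPU memory on demand instead of pre-allocating it all.
    tf.config.experimental.set_memory_growth(gpus[0], True)

    # Option 2 (alternative, do not combine with option 1): cap TensorFlow
    # to a fixed slice of the shared 4 GB; 3072 MiB is just an example.
    # tf.config.experimental.set_virtual_device_configuration(
    #     gpus[0],
    #     [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=3072)])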

P.S.:
I know that TF-TRT lags behind pure TensorRT in performance, but due to the unavailability of an operator in the current version of TensorRT (7.1.3), I had no choice. (I could retrain the model with PyTorch, but unfortunately that is not an option right now.)

Hi,

Please note that the log indicates TF is trying to allocate 325 MiB for one particular layer.
The amount is for ONE operation rather than the whole model.
It's possible that the memory is already fully occupied by the previous layers, which leads to this warning.

So, could you first monitor the memory usage with tegrastats while the model is running?

$ tegrastats
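
If useful, on recent JetPack releases tegrastats can also log at a fixed interval to a file (please check tegrastats --help for the exact options on your release):

$ tegrastats --interval 1000 --logfile tegrastats.log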

Thanks.