ResourceExhaustedError: OOM when allocating tensor with shape[128,8,21]....

Question: I am not familiar with GPU computing or CUDA, and was wondering if anyone knows how I can resolve this error. Do I need any special code for GPU computing other than my imports?

I was on Epoch 1 / 100 and 2054 / 20736 iterations when it crashed with this message.

OS: Windows 10
CUDA v10
Tensorflow-gpu 2.0.0
Keras 2.2.4

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation
from tensorflow.keras.layers import LSTM, SimpleRNN, GRU
from tensorflow.keras.layers import TimeDistributed
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.models import model_from_json

Error: ResourceExhaustedError: OOM when allocating tensor with shape[128,8,21] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Minimum]


My code in relation to the rest of the traceback:
history = model.fit_generator(generator_train,

My model is running now (I decreased the batch size from 128 to 64).
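
As a rough sanity check on the error message, the size of the tensor it names can be computed from its shape. This is only a sketch: it assumes "type float" means 32-bit floats (4 bytes per element) and that the leading dimension of 128 is the batch size. tensor_bytes is a hypothetical helper written for illustration, not a TensorFlow function.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for a dense tensor (4 bytes per element assumes float32)."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * bytes_per_element


size_batch_128 = tensor_bytes((128, 8, 21))  # 86016 bytes = 84 KiB
size_batch_64 = tensor_bytes((64, 8, 21))    # 43008 bytes = 42 KiB

print(size_batch_128 / 1024, "KiB at batch size 128")
print(size_batch_64 / 1024, "KiB at batch size 64")
```

At about 84 KiB, the tensor that failed to allocate is tiny next to the 6.0 GB of dedicated GPU memory reported in this thread, which suggests the GPU was already almost full when this particular allocation was attempted. Halving the batch size shrinks every tensor whose leading dimension is the batch, not only this one.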

Currently, task manager is reading:
1% utilization
GPU Memory: 5.4 / 22.0 GB
Shared GPU Memory: 0.1 / 16.0 GB
Dedicated GPU memory: 5.3 / 6.0 GB

nvidia-smi reports that no other processes are using the GPU and shows 4% utilization.

Low GPU utilization often indicates that your model is bottlenecked by I/O and/or CPU preprocessing. This can happen if you are preprocessing data in NumPy, for example.
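
One way to check whether the input side is the bottleneck is to time the data generator by itself, with no model attached. The sketch below uses only the standard library; seconds_per_batch is a hypothetical helper, and it assumes the training generator (generator_train in the question) is an ordinary Python iterable that yields one batch per step.

```python
import itertools
import time


def seconds_per_batch(batch_generator, n_batches=100):
    """Average wall-clock seconds the generator needs to produce one batch."""
    start = time.perf_counter()
    produced = 0
    # islice stops after n_batches, so endless generators are safe to pass in.
    for _ in itertools.islice(batch_generator, n_batches):
        produced += 1
    elapsed = time.perf_counter() - start
    if produced == 0:
        raise ValueError("generator produced no batches")
    return elapsed / produced
```

If seconds_per_batch(generator_train) comes out close to the per-step time Keras prints during training, the GPU is probably spending most of each step waiting for data.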

Another possibility is that the GPU is running many small operations. Each GPU kernel launch takes longer than calling a CPU function. For compute-heavy kernels the GPU's higher throughput more than compensates for this initial penalty, but for trivially small kernels it is sometimes better to use a tf.device scope to assign part of the graph to the CPU. For example, the tf.feature_column API tends to generate many tiny operations and is often best assigned to the CPU.
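
A minimal sketch of such a tf.device scope is shown here. The function name, the division by 255 and the argument shapes are made up for illustration and are not taken from the model in the question; batch is assumed to be a 2-D array and weights a float32 matrix with a matching inner dimension.

```python
def scale_on_cpu_then_matmul(batch, weights):
    """Pin a cheap element-wise step to the CPU; leave the matmul unpinned."""
    # Imported inside the function so that defining it does not load TensorFlow.
    import tensorflow as tf

    with tf.device("/CPU:0"):
        # Trivially small op: running it on the CPU avoids a GPU kernel launch.
        scaled = tf.cast(batch, tf.float32) / 255.0

    # No device scope here: TensorFlow picks the device, which is normally
    # the GPU when one is visible.
    return tf.matmul(scaled, weights)
```

Whether pinning a given step to the CPU actually helps depends on the model, so it is worth timing both variants.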

Thanks. I will experiment with that. I tried reinstalling tensorflow-gpu and uninstalling tensorflow but it didn’t seem to do anything.

I am using Jupyter Notebook and running my script using %run *.py

Is there a convenient way to run with CPU vs running with GPU?

My data set is very small for this testing (I am just making sure the models work before I open the data flood gates), so I wonder whether it is due to the small kernel size?


You can export the environment variable CUDA_VISIBLE_DEVICES="" before launching Python; this will hide all GPUs from TensorFlow. To experiment with assigning different parts of your model to the CPU or GPU, you will need to use with tf.device() statements instead.
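
Because the question mentions Windows and Jupyter, where a shell export is not always convenient, the same variable can also be set from Python. This is a sketch: the variable has to be set before TensorFlow initializes the GPU, and the simplest way to guarantee that is to set it before the first import (in Jupyter, restart the kernel first). visible_gpus is a hypothetical helper name.

```python
import os

# An empty value hides every CUDA device from libraries loaded afterwards.
os.environ["CUDA_VISIBLE_DEVICES"] = ""


def visible_gpus():
    """Return the GPUs TensorFlow can see (imports TensorFlow on first call)."""
    # Deliberately imported here, after the variable above has been set.
    import tensorflow as tf

    return tf.config.experimental.list_physical_devices("GPU")
```

Calling visible_gpus() afterwards should return an empty list if the setting took effect. Removing the assignment and restarting the kernel switches back to the GPU.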

The small data set should not affect overall performance as long as you are running enough iterations to amortize one-time tuning costs at the start of training (usually worth running at least a few hundred batches when collecting performance timings). In particular, the amount of work handled per batch will not change as the data set grows.
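
The warm-up advice above can be sketched as a small timing helper that discards the first few hundred steps and averages the rest. mean_step_time and step_fn are hypothetical names; step_fn is assumed to run one complete training step and to return only after that step has finished.

```python
import time


def mean_step_time(step_fn, warmup_steps=200, timed_steps=200):
    """Average seconds per call of step_fn, ignoring the warm-up calls."""
    if timed_steps <= 0:
        raise ValueError("timed_steps must be positive")

    # Warm-up: lets one-time start-up and tuning costs happen untimed.
    for _ in range(warmup_steps):
        step_fn()

    start = time.perf_counter()
    for _ in range(timed_steps):
        step_fn()
    return (time.perf_counter() - start) / timed_steps
```

For a Keras model, step_fn could be something like lambda: model.train_on_batch(x, y) with one fixed batch, which keeps data loading out of the measurement.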