Tensorflow 2.3.1 + CUDA 10.1.105 + cuDNN failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED

I am currently following a tutorial to generate a model for handwritten text recognition at the following link: https://github.com/arthurflor23/handwritten-text-recognition

To do so, I have the following hardware:
CPU: Intel i7-9700K
GPU: RTX 2070 Super
OS: Windows 10

Having read online about using CUDA and cuDNN with TensorFlow 2.3, the generally accepted versions seem to be CUDA 10.1 (I installed 10.1.105 from the NVIDIA downloads) and cuDNN 7.6 (which I also downloaded from NVIDIA).
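Before going further, it can be worth confirming which CUDA/cuDNN versions the installed TensorFlow wheel was actually built against, since a mismatch with the system install is a common source of these errors. A minimal sketch, assuming TF 2.3 or newer (where `tf.sysconfig.get_build_info` is available):

```python
# Sketch: report the CUDA/cuDNN versions this TensorFlow build expects.
# Keys are read with .get() since the dict contents can vary by platform.
import tensorflow as tf

info = tf.sysconfig.get_build_info()
print("Built with CUDA:", info.get("cuda_version"))
print("Built with cuDNN:", info.get("cudnn_version"))
print("CUDA-enabled build:", tf.test.is_built_with_cuda())
```

If the reported CUDA version differs from the 10.1.105 toolkit on the system, that mismatch is worth resolving before debugging anything else.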

The above tutorial uses Google Colab, but I'm relatively certain that my GPU will be faster than the ones available on Google Colab.

To install the CUDA version, I visited the following url: https://developer.nvidia.com/cuda-10.1-download-archive-base?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exenetwork

and to download the cuDNN version, I followed this one: https://developer.nvidia.com/compute/machine-learning/cudnn/secure/

With all this installed and various restarts to be sure, I added the required paths to my PATH variable following the instructions here: https://www.tensorflow.org/install/gpu

I then proceeded to follow the Jupyter notebook instructions provided with the GitHub repository, making sure to install the requirements. One issue is that I needed to install NumPy 1.16.0, as the current 1.19 release is incompatible with TensorFlow 2.3.
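A quick sanity check for the NumPy pin can save a confusing failure later. The 1.16.0/1.19 version numbers below come from the post; the exact bound may differ for other TensorFlow builds:

```python
# Sketch: report whether the installed NumPy predates the 1.19 release
# that was incompatible with TensorFlow 2.3 at the time of the post.
import numpy as np

major, minor = (int(x) for x in np.__version__.split(".")[:2])
if (major, minor) >= (1, 19):
    print(f"NumPy {np.__version__} may be too new for TensorFlow 2.3")
else:
    print(f"NumPy {np.__version__} should work with TensorFlow 2.3")
```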

With all this, I ran the cells (excluding the Google Colab cell, since I am running this locally). I additionally added the following code to the first TensorFlow cell, since I noticed a lot of debate about the memory growth option:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Memory growth must be set before any GPUs have been initialized
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        print(e)
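Since CUBLAS_STATUS_ALLOC_FAILED usually means cuBLAS could not allocate GPU memory for its handle, an alternative to memory growth is to cap TensorFlow's allocation outright so other CUDA libraries keep some headroom. A sketch using TF 2.3's experimental API; the 6144 MB limit is an example value chosen for an 8 GB RTX 2070 Super, not a recommendation:

```python
# Sketch: cap TensorFlow's GPU allocation via a virtual device, leaving
# the rest of the card free for cuBLAS/cuDNN workspaces.
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(
                memory_limit=6144)])  # example cap in MB
    except RuntimeError as e:
        # Virtual devices must be configured before the GPU is initialized
        print(e)
```

Use either memory growth or a fixed cap, not both on the same device; configuring both raises a RuntimeError once the GPU is initialized.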

Everything works fine until the training cell, at which point I get the following message in the console:

I'll spare the details of my attempt at running this outside of Jupyter, which gave the same results.

I've tried to find answers online, but they all revolve around TensorFlow 1.x, which I'm not using. I'm wondering if my GPU simply doesn't have enough memory to run the training.
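One way to test the not-enough-memory theory is to watch actual GPU memory headroom while the notebook is running. A small sketch that shells out to `nvidia-smi` (which ships with the NVIDIA driver) and degrades gracefully if the tool is missing:

```python
# Sketch: query GPU memory usage via nvidia-smi.
# CUBLAS_STATUS_ALLOC_FAILED typically means cuBLAS could not allocate
# GPU memory, so checking free memory during training helps rule that out.
import shutil
import subprocess

def gpu_memory_report():
    """Return nvidia-smi's memory summary as CSV text, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=name,memory.total,memory.used,memory.free",
         "--format=csv"],
        capture_output=True, text=True,
    )
    return out.stdout if out.returncode == 0 else None

report = gpu_memory_report()
print(report if report else "nvidia-smi not found or failed")
```

If free memory is near zero before training even starts, another process (or an earlier TensorFlow session that already grabbed the whole card) is the likelier culprit than the model itself.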

Thanks in advance to anyone who can help!

PS: Yes, I am a new member so I may have numerous changes to make to the post/code

Hi @maxime.h.d.michel,
Which TRT version are you using here?
We recommend using the latest release.
To avoid system dependency issues, we suggest using an NGC container.

Please share the code and model in case the issue persists.