Internal: Blas GEMM launch failed


I started getting the error below after moving to a new GPU, an RTX 3060. I have tried many of the suggested fixes, but none of them worked.
Any suggestions or pointers to resources on how to proceed would be appreciated.

My code and the error are pasted below.

My code:
import tensorflow as tf
import numpy as np
import os
#import tensorflow.compat.v1 as tf

path = r'D:\user\GPU Testing code'  # raw string, so \u is not read as a unicode escape
run_opts = tf.RunOptions(report_tensor_allocations_upon_oom = True)
folder = 'imagefile.npz'

#Load npz files
path2 = os.path.join(path,folder)

for_npz = np.load(path2)
filenames1 = for_npz['filename_heads']
X1 = for_npz['features']

#similariMat = tf.keras.losses.cosine_similarity(y_true=X1, y_pred=X1, axis = 1)

#Once this code starts running, check the performance of GPU whether it increases or only the CPU utilization increases
with tf.compat.v1.Session() as sess:
    for i in range(0, X1.shape[0], 100):
        if i == 0:
            # placeholder for the full feature matrix, created once
            Y_M, Y_N = X1.shape
            Y = tf.placeholder(tf.float32, shape = (Y_M, Y_N))
            Y_normalized = tf.nn.l2_normalize(Y, dim = 1)
        M = X1[i:(i+100)].shape[0]
        N = X1.shape[1]
        X = X1[i:(i+100)]
        # input
        input = tf.placeholder(tf.float32, shape = (M, N))
        # normalize each row
        normalized = tf.nn.l2_normalize(input, dim = 1)
        # multiply row i with row j using transpose
        # element wise product
        prod = tf.matmul(normalized, Y_normalized,
                         adjoint_b = True)  # transpose second matrix
        dist = 1 - prod
        Sim_Mat =, feed_dict = {input:X,


InternalError: 2 root error(s) found.
(0) Internal: Blas GEMM launch failed : a.shape=(100, 2048), b.shape=(301718, 2048), m=100, n=301718, k=2048
[[{{node MatMul}}]]
(1) Internal: Blas GEMM launch failed : a.shape=(100, 2048), b.shape=(301718, 2048), m=100, n=301718, k=2048
[[{{node MatMul}}]]
0 successful operations.
0 derived errors ignored.
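
For scale, here is a rough calculation of the matrix sizes named in the error message. It assumes 4-byte float32 elements, as in the tf.float32 placeholders in the code above; `tensor_bytes` is only a helper name made up for this sketch.

```python
# Sizes implied by the error message, assuming float32 (4 bytes per element),
# which matches the tf.float32 placeholders in the code above.
BYTES_PER_FLOAT32 = 4


def tensor_bytes(rows, cols, itemsize=BYTES_PER_FLOAT32):
    """Return the size in bytes of a dense rows x cols matrix."""
    return rows * cols * itemsize


m, n, k = 100, 301718, 2048  # values reported in the GEMM error

a_bytes = tensor_bytes(m, k)    # the 100-row input block
b_bytes = tensor_bytes(n, k)    # the full feature matrix fed as Y
out_bytes = tensor_bytes(m, n)  # the (100, 301718) product

print(f"a:   {a_bytes / 2**20:.1f} MiB")
print(f"b:   {b_bytes / 2**30:.2f} GiB")
print(f"out: {out_bytes / 2**20:.1f} MiB")
```

The second operand alone comes to roughly 2.3 GiB, and its L2-normalized copy needs about the same again, so the full feature matrix dominates the memory used by each 100-row step.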

My configurations:

Windows 10, RTX 3060, 12 GB GPU memory

Python 3.6.13
cudatoolkit 11.3.1
cudnn 8.2.1 cuda11.3_0 anaconda
pytorch 1.10.2 py3.6_cuda11.3_cudnn8_0 pytorch
h5py 3.1.0 pypi_0 pypi
keras 2.6.0 pypi_0 pypi
keras-applications 1.0.8 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
keras-vggface 0.6 pypi_0 pypi
keras-preprocessing 1.1.2 pyhd3eb1b0_0
python 3.7.6 h0371630_2
tensorboard 1.15.0 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
tensorflow-estimator 1.15.1 pypi_0 pypi
tensorflow-gpu 1.15.0 pypi_0 pypi

When I run my code with a small imagefile.npz of around 32 MB, it executes successfully. When I test with a .npz file of about 1 GB, it crashes with the error shown above.
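
One way to separate a GPU problem from a problem with the computation itself might be to run the same block-wise cosine distance on the CPU. The following is a minimal NumPy sketch of that idea, not the TensorFlow code above: the function names and the tiny random input are made up for illustration, and the `eps` handling only approximates what `tf.nn.l2_normalize` does.

```python
import numpy as np


def l2_normalize_rows(x, eps=1e-12):
    """Scale each row to unit L2 norm (eps guards against all-zero rows)."""
    norms = np.sqrt(np.sum(x * x, axis=1, keepdims=True))
    return x / np.maximum(norms, eps)


def cosine_distance_blocks(features, block_rows=100):
    """Yield (start, dist), where dist is 1 - cosine similarity between
    rows start:start+block_rows and every row of features."""
    normalized = l2_normalize_rows(features.astype(np.float32))
    for start in range(0, normalized.shape[0], block_rows):
        block = normalized[start:start + block_rows]
        yield start, 1.0 - block @ normalized.T


# Tiny made-up input, only to show the call; the real data would be
# the 'features' array loaded from the .npz file.
rng = np.random.RandomState(0)
demo = rng.rand(250, 8).astype(np.float32)
blocks = list(cosine_distance_blocks(demo, block_rows=100))
for start, dist in blocks:
    print(start, dist.shape)
```

If this runs to completion on the large file, the input data and the block logic are probably fine, and the failure is more likely specific to the GPU setup.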

Thank you

I suggest finding a forum for TensorFlow support. This forum isn't intended for TensorFlow support. Furthermore, TensorFlow is not an NVIDIA product.