Error Internal: Blas GEMM launch failed

Hi,

I am encountering an error after moving to a new laptop with an RTX 3070. I am new to the GPU world; I tried following some suggestions to resolve it, but without success. Please let me know of any suggestions or resources on how to proceed.

Error:
2022-04-28 23:09:24.631740: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
0%| | 0/101 [00:57<?, ?it/s]
Traceback (most recent call last):
  File "/home/sahiti/anaconda3/envs/drp2/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/sahiti/anaconda3/envs/drp2/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/sahiti/anaconda3/envs/drp2/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas GEMM launch failed : a.shape=(192, 3), b.shape=(192, 3), m=3, n=3, k=192
[[{{node matrix_solve_ls_45/MatMul}}]]
[[gradients/AddN_685/_1851]]
(1) Internal: Blas GEMM launch failed : a.shape=(192, 3), b.shape=(192, 3), m=3, n=3, k=192
[[{{node matrix_solve_ls_45/MatMul}}]]
0 successful operations.
0 derived errors ignored.

My configurations:

Ubuntu 20.04, RTX 3070, 8 GB GPU memory

cudatoolkit 10.0.130 0
cudnn 7.6.5 cuda10.0_0
h5py 2.10.0 pypi_0
keras-applications 1.0.8 py_1
keras-base 2.3.1 py37_0 anaconda
keras-gpu 2.3.1 0 anaconda
keras-preprocessing 1.1.2 pyhd3eb1b0_0
python 3.7.6 h0371630_2
tensorflow 1.15.0 gpu_py37h0f0df58_0
tensorflow-base 1.15.0 gpu_py37h9dcbed7_0
tensorflow-estimator 1.15.1 pyh2649769_0
tensorflow-gpu 1.15.0 h0d30ee6_0

I configured the GPU so that its memory is allowed to grow, and I monitor it continuously with nvidia-smi from the moment the program starts. Memory usage stays below 1 GB and utilization is low. The program takes nearly 20 minutes to start and crashes soon afterward with the error above.
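For readers unfamiliar with the memory-growth setting mentioned above, this is roughly how it is usually written for TensorFlow 1.15. It is a minimal sketch, not the poster's actual code; the helper name `make_growth_session` is hypothetical, and TensorFlow is imported inside the function so the snippet can be loaded on a machine without it.

```python
def make_growth_session():
    """Create a TF1-style session that allocates GPU memory on demand.

    Hypothetical helper for illustration; assumes TensorFlow 1.15 is installed.
    """
    # Imported lazily so that defining this function does not require TensorFlow.
    import tensorflow as tf

    config = tf.compat.v1.ConfigProto()
    # Grow the GPU allocation as needed instead of reserving
    # nearly all device memory when the session is created.
    config.gpu_options.allow_growth = True
    return tf.compat.v1.Session(config=config)
```

Calling `make_growth_session()` once at program start and running the graph in the returned session applies the setting. Note that this only changes how memory is allocated; it does not make a CUDA 10.0 build compatible with a newer GPU architecture.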

Thank you.

You might consider posting in the Frameworks forum as well → Frameworks - NVIDIA Developer Forums

NVIDIA strongly recommends that anyone using RTX 30x0 GPUs use CUDA 11.1 or newer. Not doing this is almost certainly the reason for the 20 minute start time and might possibly have something to do with the CUBLAS report also.
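The recommendation above can be sketched as a small lookup. The RTX 3070 is an Ampere GPU with compute capability 8.6, and a toolkit released before that architecture ships no machine code for it, so kernels may have to be JIT-compiled from PTX at startup, which is a plausible source of the long start time. The table below lists the first CUDA release with native support for a few architectures as I understand the release notes; treat the values and the function name `has_native_support` as assumptions to verify against NVIDIA's documentation.

```python
# First CUDA toolkit release with native support for a compute capability.
# Values are recalled from NVIDIA release notes; verify before relying on them.
MIN_CUDA_FOR_CC = {
    (7, 0): (9, 0),    # Volta
    (7, 5): (10, 0),   # Turing
    (8, 0): (11, 0),   # Ampere (A100)
    (8, 6): (11, 1),   # Ampere (RTX 30x0, including the RTX 3070)
}


def has_native_support(cuda_version, compute_capability):
    """Return True if the toolkit is new enough to target the GPU natively."""
    minimum = MIN_CUDA_FOR_CC.get(compute_capability)
    if minimum is None:
        raise KeyError("compute capability not in the table: %r" % (compute_capability,))
    # Tuples compare element by element, so (11, 2) >= (11, 1) is True.
    return tuple(cuda_version) >= minimum


if __name__ == "__main__":
    # The environment in the question: cudatoolkit 10.0 with an RTX 3070.
    print(has_native_support((10, 0), (8, 6)))
    # A CUDA 11.2 based stack with the same GPU.
    print(has_native_support((11, 2), (8, 6)))
```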

Moving to CUDA 11.1 will affect the rest of your stack as well, meaning you will need a different version of tensorflow-gpu and a different version of cudnn, at least.

Versions of TF are readily available that are based on CUDA 11.2. This would be a much better choice for your RTX 30x0 GPU.

