OP_REQUIRES failed at matrix_inverse_op.cc:191 : Internal: tensorflow/core/kernels/cuda_solvers.cc:803: cuBlas call failed status = 13

carlkulseng · April 3, 2019, 11:52am

We are attempting to train a network on knee MRI data with NiftyNet. CPU-based training works quite well, but each iteration takes a long time. We have 2 GPUs available (2x GeForce RTX 2080, device IDs 0 and 1), each with 8 GB of memory.
We have the following CUDA version:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2017 NVIDIA Corporation
    Built on Fri_Sep__1_21:08:32_Central_Daylight_Time_2017
    Cuda compilation tools, release 9.0, V9.0.176

However, we are struggling with the following error message: “tensorflow/core/kernels/cuda_solvers.cc:803: cuBlas call failed status = 13”. As far as I understand, status 13 is CUBLAS_STATUS_EXECUTION_FAILED, meaning a GPU kernel failed to execute. The error first occurs while the shuffle buffer is filling.

Also: OP_REQUIRES failed at matrix_inverse_op.cc:191 : Internal: tensorflow/core/kernels/cuda_solvers.cc:803: cuBlas call failed status = 13
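The “status = 13” in these messages is a raw cublasStatus_t code. As a minimal sketch (plain Python, standard library only), the helper below pulls the code out of a TensorFlow error message and maps it to its cuBLAS name. The table follows the cublasStatus_t values defined in cuBLAS's cublas_api.h; `describe_cublas_error` is a hypothetical helper, not part of TensorFlow or NiftyNet.

```python
import re

# cublasStatus_t values as defined in cuBLAS's cublas_api.h
CUBLAS_STATUS_NAMES = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
    16: "CUBLAS_STATUS_LICENSE_ERROR",
}

# Matches the "cuBlas call failed status = N" fragment of TensorFlow's error text
_STATUS_RE = re.compile(r"cuBlas call failed status = (\d+)")


def describe_cublas_error(message):
    """Return the cublasStatus_t name found in an error message, or None if absent."""
    match = _STATUS_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1))
    return CUBLAS_STATUS_NAMES.get(code, "unknown status %d" % code)


msg = ("OP_REQUIRES failed at matrix_inverse_op.cc:191 : Internal: "
       "tensorflow/core/kernels/cuda_solvers.cc:803: cuBlas call failed status = 13")
print(describe_cublas_error(msg))  # CUBLAS_STATUS_EXECUTION_FAILED
```

The code only names the failure; EXECUTION_FAILED is a generic "the GPU program did not run" status, so it points at the driver/CUDA/GPU setup rather than at a specific bug in the model.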

Has anyone else on the forum encountered a similar issue?

cocoa_kang · September 3, 2019, 6:22am

Hi carlkulseng,
I ran into a similar problem and found a way to avoid it. The problematic code was:

    tf.matrix_inverse(view_mat_model)

TensorFlow creates a cuBLAS solver for this op. It works well on a GTX 1080 Ti but crashes on an RTX 2080 Ti. Replace it with:

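    import tensorflow as tf  # TensorFlow 1.x API (tf.matrix_inverse)
    # Pin the inversion to the CPU so it does not go through the GPU cuBLAS solver
    with tf.device("/cpu:0"):
        view_mat_for_normal = tf.matrix_inverse(view_mat_model)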

which moves the computation to the CPU.

This works fine with my code.
Hope this helps.

carlkulseng · September 3, 2019, 12:42pm

Hi,

Good to hear that you got it running. In my case, with an Nvidia RTX 2080 Ti, I needed exactly the right combination of CUDA, cuDNN, display driver and TensorFlow versions to get it to work.

The RTX 2080 Ti requires CUDA 10; it will not work with CUDA 9.
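The reason is the GPU architecture: the RTX 2080 and 2080 Ti are Turing cards (compute capability 7.5), and CUDA 10.0 is the first toolkit that can build native code for them, while the GTX 1080 Ti (Pascal, 6.1) is already supported from CUDA 8.0. A minimal sketch of that rule, limited to the cards mentioned in this thread; `cuda_can_target` is a hypothetical helper, and it ignores PTX JIT compilation, which can sometimes let older builds run on newer GPUs.

```python
# Compute capability of the cards mentioned in this thread
COMPUTE_CAPABILITY = {
    "GeForce GTX 1080 Ti": (6, 1),  # Pascal
    "GeForce RTX 2080": (7, 5),     # Turing
    "GeForce RTX 2080 Ti": (7, 5),  # Turing
}

# First CUDA toolkit release (major, minor) that can compile native code for each capability
MIN_CUDA_FOR_CAPABILITY = {
    (6, 1): (8, 0),
    (7, 5): (10, 0),
}


def cuda_can_target(gpu_name, cuda_version):
    """Return True if the (major, minor) CUDA release can build native code for gpu_name."""
    capability = COMPUTE_CAPABILITY[gpu_name]
    # Tuples compare element by element, so (9, 0) < (10, 0)
    return cuda_version >= MIN_CUDA_FOR_CAPABILITY[capability]


print(cuda_can_target("GeForce RTX 2080 Ti", (9, 0)))   # False
print(cuda_can_target("GeForce RTX 2080 Ti", (10, 0)))  # True
print(cuda_can_target("GeForce GTX 1080 Ti", (9, 0)))   # True
```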

In my case I needed CUDA 10.0, cuDNN 7.6, display driver 419.17 and tensorflow-gpu 1.13.1. (I needed that TensorFlow version because it is compatible with NumPy 1.14.5, which my code requires.)
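To compare your own setup against this combination, a small stdlib-only sketch is below. The `installed` dictionary is assumed to be filled in by hand (for example from the `nvcc` output and your pip package list); `KNOWN_GOOD` and `version_mismatches` are hypothetical names. The display driver is left out because a newer driver is generally fine with an older CUDA toolkit, and NumPy is left out because that pin comes from my own code, not from TensorFlow.

```python
# Versions reported to work in this thread for an RTX 2080 Ti on Windows 10
KNOWN_GOOD = {
    "cuda": "10.0",
    "cudnn": "7.6",
    "tensorflow-gpu": "1.13.1",
}


def version_mismatches(installed, expected=None):
    """Return {package: (installed, expected)} for every package that does not match.

    An installed version matches if it equals the expected one or extends it with
    more components, e.g. cuDNN "7.6.5" matches "7.6".
    """
    if expected is None:
        expected = KNOWN_GOOD
    mismatches = {}
    for name, want in expected.items():
        have = installed.get(name)
        if have is None or not (have == want or have.startswith(want + ".")):
            mismatches[name] = (have, want)
    return mismatches


print(version_mismatches({"cuda": "9.0", "cudnn": "7.6.5", "tensorflow-gpu": "1.13.1"}))
# {'cuda': ('9.0', '10.0')}
```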

Only once all the right versions were installed could the program run on the GPU without errors.

When cuBLAS has been loaded successfully, you should see something like this:

tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library cublas64_100.dll locally
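The DLL name in that line also tells you which CUDA release TensorFlow actually picked up. A minimal sketch that extracts it, assuming the Windows naming seen in this thread (`<library>64_<major><minor>.dll`, so `cublas64_100.dll` means CUDA 10.0 and `cublas64_90.dll` means CUDA 9.0). Later CUDA releases changed some DLL names, so this is not a general rule; `cuda_library_from_log` is a hypothetical helper.

```python
import re

# Assumes the naming seen in this thread: <library>64_<major><minor>.dll.
# cuDNN (cudnn64_7.dll) is deliberately not matched: its suffix is the cuDNN major version.
_DLL_RE = re.compile(
    r"successfully opened CUDA library "
    r"(cublas|cudart|cufft|curand|cusolver|cusparse)64_(\d{2,3})\.dll"
)


def cuda_library_from_log(line):
    """Return (library, "major.minor") from a dso_loader log line, or None."""
    match = _DLL_RE.search(line)
    if match is None:
        return None
    library, digits = match.groups()
    # The last digit is the minor version; everything before it is the major version
    return library, "%s.%s" % (digits[:-1], digits[-1])


log_line = ("tensorflow/stream_executor/dso_loader.cc:152] "
            "successfully opened CUDA library cublas64_100.dll locally")
print(cuda_library_from_log(log_line))  # ('cublas', '10.0')
```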

For others: check which CUDA and cuDNN versions your card requires.

carlkulseng · September 3, 2019, 1:55pm

If you are using Windows 10, you will also need Visual Studio.

I recommend following the instructions here very carefully:
https://docs.nvidia.com/cuda/archive/10.0/

cocoa_kang · September 4, 2019, 1:24am

Hi carlkulseng,
I think choosing the correct combination of CUDA and cuDNN is the best way to solve this problem. Thanks for your reply; I will try this combination later.

cocoa_kang · September 4, 2019, 4:59am

I just tried the combination and it works!
With that combination installed, TensorFlow logs lines like these:
2019-09-04 12:53:10.485346: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library cublas64_100.dll locally
2019-09-04 12:53:12.048907: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0000020FC8BAF950

My environment is:
Python 3.6.8
TensorFlow 1.13.1
CUDA 10.0
cuDNN 7.6.30
Driver 425.25
Windows 10

Thank you again.

carlkulseng · September 4, 2019, 7:14am

Fantastic! Glad to help =)