[ubuntu1404][GTX-1080] Cublas handle: not initialized in driver version 384.111

Hello,

I have a workstation running Ubuntu 14.04 LTS, with:

  • GTX-1080
  • CUDA 8.0
  • NVIDIA driver 384.111
  • Python 2.7.14
  • Tensorflow 1.4.1
  • Keras 2.1.4

When I run a test file to check the installation, I get this error:

E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED

How can I solve this? I have read that I should install driver 375.26, but that driver does not support the GTX-1080, and when I install it I lose the display.
http://www.nvidia.com/download/driverResults.aspx/124091/en-us

See this for possible causes:
https://github.com/tensorflow/tensorflow/issues/9489

I tried all the solutions in that link, and none of them worked.

I forgot to say that I am using an environment with Python 2.7.

More info:

(py27TFGPU14) user1@INVEST3:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

I see that V8.0.61 has a Patch 2 (released Jun 26, 2017): https://developer.nvidia.com/cuda-80-ga2-download-archive

  • cuBLAS Patch Update to CUDA 8: Includes performance enhancements and bug-fixes

I installed it, but that did not make it work.

(py27TFGPU14) user1@INVEST3:~$ conda list cudnn
# packages in environment at /home/user1/anaconda2/envs/py27TFGPU14:
#
# Name                    Version                   Build  Channel
cudnn                     7.0.5                 cuda8.0_0

And the toy program is:

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD

# Generate dummy data
import numpy as np
x_train = np.random.random((1000, 20))
y_train = keras.utils.to_categorical(np.random.randint(10, size=(1000, 1)), num_classes=10)
x_test = np.random.random((100, 20))
y_test = keras.utils.to_categorical(np.random.randint(10, size=(100, 1)), num_classes=10)

model = Sequential()
# Dense(64) is a fully-connected layer with 64 hidden units.
# in the first layer, you must specify the expected input data shape:
# here, 20-dimensional vectors.
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(x_train, y_train)
score = model.evaluate(x_test, y_test, batch_size=128)

The error is:

Using TensorFlow backend.
2018-04-24 16:33:46.190754: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-04-24 16:33:46.462779: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.8225
pciBusID: 0000:01:00.0
totalMemory: 7.92GiB freeMemory: 7.55GiB
2018-04-24 16:33:46.462808: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
Epoch 1/1
2018-04-24 16:33:47.542109: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2018-04-24 16:33:48.015391: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2018-04-24 16:33:48.439726: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2018-04-24 16:33:48.864943: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2018-04-24 16:33:49.288670: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2018-04-24 16:33:49.712441: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2018-04-24 16:33:50.137997: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2018-04-24 16:33:50.559950: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2018-04-24 16:33:51.124191: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2018-04-24 16:33:51.124216: W tensorflow/stream_executor/stream.cc:1901] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "Escritorio/try TF14/prueba.py", line 37, in <module>
    model.fit(x_train, y_train)
  File "/home/user1/anaconda2/envs/py27TFGPU14/lib/python2.7/site-packages/keras/models.py", line 963, in fit
    validation_steps=validation_steps)
  File "/home/user1/anaconda2/envs/py27TFGPU14/lib/python2.7/site-packages/keras/engine/training.py", line 1712, in fit
    validation_steps=validation_steps)
  File "/home/user1/anaconda2/envs/py27TFGPU14/lib/python2.7/site-packages/keras/engine/training.py", line 1235, in _fit_loop
    outs = f(ins_batch)
  File "/home/user1/anaconda2/envs/py27TFGPU14/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2475, in __call__
    **self.session_kwargs)
  File "/home/user1/anaconda2/envs/py27TFGPU14/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/user1/anaconda2/envs/py27TFGPU14/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/user1/anaconda2/envs/py27TFGPU14/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home/user1/anaconda2/envs/py27TFGPU14/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(32, 20), b.shape=(20, 64), m=32, n=64, k=20
	 [[Node: dense_1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_dense_1_input_0_0/_41, dense_1/kernel/read)]]
	 [[Node: metrics/acc/Mean/_83 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_548_metrics/acc/Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op u'dense_1/MatMul', defined at:
  File "Escritorio/try TF14/prueba.py", line 26, in <module>
    model.add(Dense(64, activation='relu', input_dim=20))
  File "/home/user1/anaconda2/envs/py27TFGPU14/lib/python2.7/site-packages/keras/models.py", line 467, in add
    layer(x)
  File "/home/user1/anaconda2/envs/py27TFGPU14/lib/python2.7/site-packages/keras/engine/topology.py", line 617, in __call__
    output = self.call(inputs, **kwargs)
  File "/home/user1/anaconda2/envs/py27TFGPU14/lib/python2.7/site-packages/keras/layers/core.py", line 855, in call
    output = K.dot(inputs, self.kernel)
  File "/home/user1/anaconda2/envs/py27TFGPU14/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 1072, in dot
    out = tf.matmul(x, y)
  File "/home/user1/anaconda2/envs/py27TFGPU14/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1891, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/home/user1/anaconda2/envs/py27TFGPU14/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2437, in _mat_mul
    name=name)
  File "/home/user1/anaconda2/envs/py27TFGPU14/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/user1/anaconda2/envs/py27TFGPU14/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/user1/anaconda2/envs/py27TFGPU14/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(32, 20), b.shape=(20, 64), m=32, n=64, k=20
	 [[Node: dense_1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_dense_1_input_0_0/_41, dense_1/kernel/read)]]
	 [[Node: metrics/acc/Mean/_83 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_548_metrics/acc/Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Try
sudo rm -rf ~/.nv/
see:
https://devtalk.nvidia.com/default/topic/1007071/cuda-setup-and-installation/cuda-error-when-running-matrixmulcublas-sample-ubuntu-16-04/post/5169223/#5169223
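
If you would rather not delete the folder outright, here is a minimal sketch of the same idea that renames it instead, so the CUDA driver recreates its cache (normally under ~/.nv/ComputeCache) on the next run and the old contents stay available for inspection. The helper names cache_is_writable and set_aside_cache are made up for this example. The ownership check rests on an assumption that is not confirmed in this thread: that the folder ended up owned by another user (for example after running a CUDA program with sudo). If the rename fails with a permission error, the sudo command above is still the fallback.

```python
from __future__ import print_function

import os
import shutil
import sys
import time


def cache_is_writable(cache_dir):
    """Hypothetical helper: True if cache_dir is not an existing directory,
    or if it is owned by and writable for the current user."""
    if not os.path.isdir(cache_dir):
        return True
    owned = os.stat(cache_dir).st_uid == os.getuid()
    return owned and os.access(cache_dir, os.W_OK | os.X_OK)


def set_aside_cache(cache_dir):
    """Hypothetical helper: rename cache_dir to a timestamped backup.

    Returns the backup path, or None if cache_dir does not exist.
    """
    if not os.path.isdir(cache_dir):
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = cache_dir.rstrip(os.sep) + ".bak-" + stamp
    shutil.move(cache_dir, backup)
    return backup


if __name__ == "__main__":
    nv_dir = os.path.expanduser("~/.nv")
    print("exists:", os.path.isdir(nv_dir))
    print("owned and writable by current user:", cache_is_writable(nv_dir))
    # Dry run by default; pass --apply to actually rename the directory.
    if "--apply" in sys.argv[1:]:
        print("moved to:", set_aside_cache(nv_dir))
```

Run it once without arguments to see what it reports, then again with --apply to move the folder aside. Once TensorFlow works again, the backup directory can be deleted.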

Thanks, it works!!

Inglorious hidden folders.

Thanks a lot, generix!
It worked for me as well.