simplecublas kernel execution error

hinshelwood · May 17, 2019, 11:15pm

After trying (and failing) to get TensorFlow working on our cluster, I think there is an issue with cuBLAS. Running the simpleCUBLAS sample provided with the CUDA toolkit gives a kernel execution error. I am not sure whether this is a driver installation error or a CUDA library problem. The admins of our cluster recently performed a partial update of the NVIDIA drivers on some nodes, from 396.26 to 410.104. The interesting thing is that the error below occurs on nodes with either driver version, 396.26 or 410.104. Is this a problem with the drivers or with the cuda-toolkit install?
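One quick sanity check for the driver-versus-toolkit question is to compare each driver version against the minimum Linux driver that NVIDIA's CUDA release notes list for a toolkit release. The sketch below is only illustrative: the table holds just three releases, and the names `MIN_LINUX_DRIVER`, `parse_version`, and `driver_supports` are made up for this example. Meeting the minimum does not rule out a partially updated or otherwise inconsistent install.

```python
# Minimum Linux driver versions per CUDA toolkit release, as listed in
# NVIDIA's CUDA release notes. Only a few releases are included here.
MIN_LINUX_DRIVER = {
    "9.0": (384, 81),
    "9.2": (396, 26),
    "10.0": (410, 48),
}


def parse_version(text):
    """Turn a dotted version such as '410.104' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def driver_supports(driver, cuda):
    """Return True if `driver` meets the listed minimum for CUDA `cuda`."""
    return parse_version(driver) >= MIN_LINUX_DRIVER[cuda]


if __name__ == "__main__":
    # The two driver versions mentioned in the post, checked against CUDA 9.0.
    for driver in ("396.26", "410.104"):
        print(driver, "meets CUDA 9.0 minimum:", driver_supports(driver, "9.0"))
```

Versions are compared as integer tuples rather than floats so that 410.104 sorts above 410.48. By this table both drivers in the post meet the CUDA 9.0 minimum, so the version numbers alone would not explain the failure.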

$ /software/cuda-toolkit/9.0.176/samples/7_CUDALibraries/simpleCUBLAS/simpleCUBLAS
GPU Device 0: "Tesla V100-PCIE-16GB" with compute capability 7.0

simpleCUBLAS test running…
!!! kernel execution error.

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     10885      G   /usr/bin/X                                    22MiB |
|    1     10885      G   /usr/bin/X                                    22MiB |
+-----------------------------------------------------------------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     13166      G   /usr/bin/X                                    24MiB |
|    1     13166      G   /usr/bin/X                                    24MiB |
+-----------------------------------------------------------------------------+
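Another way to narrow this down is to ask the CUDA runtime library itself which CUDA version the installed driver supports and which runtime version was actually loaded. The sketch below does that through `ctypes`, using the runtime API calls `cudaDriverGetVersion` and `cudaRuntimeGetVersion`. It assumes that `libcudart.so` from the toolkit is on the loader path (for example via `LD_LIBRARY_PATH`); the helper names `decode` and `query_cuda_versions` are made up for this example.

```python
import ctypes


def decode(value):
    """CUDA encodes versions as 1000*major + 10*minor, e.g. 9000 -> (9, 0)."""
    return (value // 1000, (value % 1000) // 10)


def query_cuda_versions(library="libcudart.so"):
    """Return {'driver': (major, minor), 'runtime': (major, minor)},
    or None if the runtime library cannot be loaded or a call fails."""
    try:
        cudart = ctypes.CDLL(library)
    except OSError:
        return None  # runtime library not found on the loader path

    versions = {}
    for key, func_name in (("driver", "cudaDriverGetVersion"),
                           ("runtime", "cudaRuntimeGetVersion")):
        raw = ctypes.c_int(0)
        status = getattr(cudart, func_name)(ctypes.byref(raw))
        if status != 0:  # cudaSuccess is 0
            return None
        versions[key] = decode(raw.value)
    return versions


if __name__ == "__main__":
    print(query_cuda_versions())
```

If the driver entry comes back lower than the runtime entry, or as (0, 0), that would point toward the driver side rather than the toolkit. Running this on one node of each driver version would show whether they differ.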

Here is the TensorFlow error. It occurs with both versions 1.12.2 and 1.13.1, on CUDA 9.0.

$ python regression.py
1.12.2

Layer (type)                 Output Shape              Param #
dense (Dense)                (None, 64)                640
dense_1 (Dense)              (None, 64)                4160
dense_2 (Dense)              (None, 1)                 65

Total params: 4,865
Trainable params: 4,865
Non-trainable params: 0

None
2019-05-17 13:55:30.451749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Tesla P100-PCIE-12GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:02:00.0
totalMemory: 11.91GiB freeMemory: 11.60GiB
2019-05-17 13:55:30.585480: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties:
name: Tesla P100-PCIE-12GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:82:00.0
totalMemory: 11.91GiB freeMemory: 11.60GiB
2019-05-17 13:55:30.585555: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1
2019-05-17 14:02:32.396076: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-17 14:02:32.396104: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1
2019-05-17 14:02:32.396110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N N
2019-05-17 14:02:32.396113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: N N
2019-05-17 14:02:32.397192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11227 MB memory) → physical GPU (device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:02:00.0, compute capability: 6.0)
2019-05-17 14:02:32.398159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 11227 MB memory) → physical GPU (device: 1, name: Tesla P100-PCIE-12GB, pci bus id: 0000:82:00.0, compute capability: 6.0)
2019-05-17 14:03:39.684369: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "regression.py", line 78, in <module>
    example_result = model.predict(example_batch)
  File "/home/roverst/software/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1878, in predict
    self, x, batch_size=batch_size, verbose=verbose, steps=steps)
  File "/home/roverst/software/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 326, in predict_loop
    batch_outs = f(ins_batch)
  File "/home/roverst/software/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 2988, in __call__
    run_metadata=self.run_metadata)
  File "/home/roverst/software/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/home/roverst/software/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(10, 9), b.shape=(9, 64), m=10, n=64, k=9
     [[{{node dense/MatMul}} = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_dense_input_0_0/_21, dense/MatMul/ReadVariableOp)]]
