Sample MatrixCUBLAS fails on V100

Hello everyone,

I was trying to run the matrixCUBLAS sample from the CUDA Samples SDK on a V100.

   Performance= inf GFlop/s, Time= 0.000 msec, Size= 196608000 Ops

However, it fails to run. The same sample runs very well on other GPUs.

Can anyone relate to this?

Thank you,

I'm not aware of any CUDA sample code called matrixCUBLAS. There is one called matrixMulCUBLAS, and I'm able to run that one on a Tesla V100 without difficulty. You may have a problem with your CUDA install or with some other part of that setup.

$ /usr/local/cuda/samples/bin/x86_64/linux/release/matrixMulCUBLAS
[Matrix Multiply CUBLAS] - Starting...
GPU Device 0: "Tesla V100-PCIE-32GB" with compute capability 7.0

GPU Device 0: "Tesla V100-PCIE-32GB" with compute capability 7.0

MatrixA(640,480), MatrixB(480,320), MatrixC(640,320)
Computing result using CUBLAS...done.
Performance= 6073.35 GFlop/s, Time= 0.032 msec, Size= 196608000 Ops
Computing result using host CPU...done.
Comparing CUBLAS Matrix Multiply with CPU results: PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
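
As a side note on the numbers in the failing output: the "Size= 196608000 Ops" figure matches 2 x 640 x 480 x 320 for the matrix dimensions printed above, and "inf GFlop/s" is what you get when that count is divided by a measured time of 0.000 msec. A zero elapsed time is at least consistent with the GPU work not having actually run, although the original post doesn't include enough output to confirm that. Below is a minimal sketch of the arithmetic in Python. The function names are my own, not taken from the sample source, and the explicit zero check only mimics C floating-point division, since Python would raise ZeroDivisionError instead.

```python
import math


def matmul_flops(m: int, k: int, n: int) -> int:
    """Operation count for an (m x k) by (k x n) product: one multiply and one add per term."""
    return 2 * m * k * n


def gflops(ops: int, msec: float) -> float:
    """GFlop/s for `ops` operations completed in `msec` milliseconds."""
    if msec == 0.0:
        # Assumption: mimic C floating-point division, which yields inf for x / 0.0.
        return math.inf
    return ops * 1e-9 / (msec / 1000.0)


# Dimensions from the output above: MatrixA(640,480), MatrixB(480,320)
ops = matmul_flops(640, 480, 320)
print(ops)                 # operation count, compare with "Size= ... Ops"
print(gflops(ops, 0.032))  # using the rounded time from the working run
print(gflops(ops, 0.0))    # using the 0.000 msec from the failing run
```

The value for 0.032 msec will not match the reported 6073.35 GFlop/s exactly, because the sample prints the time rounded to three decimal places.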