Having trouble with NVBLAS

I’m trying to use NVBLAS as a drop-in replacement for BLAS in things like R or numpy. I installed the CUDA 6 RC on Ubuntu 13.10 without problems, and normal CUDA applications run fine. When I try to run something that calls BLAS, though, I get the following error:
csevers@titan1:~$ LD_PRELOAD=/usr/local/cuda-6.0/lib64/libnvblas.so R
[NVBLAS] Cannot open default config file ‘nvblas.conf’
Segmentation fault (core dumped)

The command is from this presentation by the way:

The CPU BLAS I have installed is OpenBLAS, built from source. That works fine on its own.

Any ideas? I would really love to get this working.


You need to set the environment variable “NVBLAS_CONFIG_FILE” to point to the nvblas.conf file. Instructions on how to create the conf file are on page 9 of the NVBLAS documentation that comes with the CUDA 6 RC toolkit.
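In case it helps, here is a rough sketch of generating a minimal nvblas.conf from Python. The keywords are the ones I recall from the NVBLAS documentation, so check them against the docs shipped with your toolkit. The OpenBLAS path, the temp-directory location, and the helper name write_nvblas_conf are all made up for illustration; point NVBLAS_CPU_BLAS_LIB at wherever your own CPU BLAS actually lives.

```python
import os
import tempfile


def write_nvblas_conf(path, cpu_blas_lib, gpu_list="ALL",
                      tile_dim=2048, logfile="nvblas.log"):
    """Write a minimal nvblas.conf (hypothetical helper, not part of CUDA)."""
    lines = [
        f"NVBLAS_LOGFILE {logfile}",
        # CPU BLAS that NVBLAS falls back to for routines it does not offload.
        f"NVBLAS_CPU_BLAS_LIB {cpu_blas_lib}",
        # Which GPUs take part in the computation.
        f"NVBLAS_GPU_LIST {gpu_list}",
        # Tile size used when splitting matrices across GPUs.
        f"NVBLAS_TILE_DIM {tile_dim}",
        "NVBLAS_AUTOPIN_MEM_ENABLED",
    ]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return path


# Assumed locations: adjust both paths for your own machine.
conf_path = os.path.join(tempfile.gettempdir(), "nvblas.conf")
write_nvblas_conf(conf_path, "/opt/OpenBLAS/lib/libopenblas.so")

# Print the command to launch R with the config file and the preload set.
print(f"NVBLAS_CONFIG_FILE={conf_path} "
      "LD_PRELOAD=/usr/local/cuda-6.0/lib64/libnvblas.so R")
```

The printed line is just the original command with NVBLAS_CONFIG_FILE added in front; you can equally export the variable in your shell profile.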

You can also get it from this link:

Thank you. I should have RTFM first :)

Did you manage to get this working with numpy?

In my attempts, it doesn’t seem to be offloading to the GPU at all. For example, when I time calls to numpy.dot(A, B), there is no change in execution time.

cuBLAS-XT, and NVBLAS, which is built on top of cuBLAS-XT, only accelerate BLAS Level-3 routines (e.g. matrix-matrix operations).

dot (the product of two vectors) is a BLAS Level-1 routine.

As philippev noted, you need Level-3 BLAS calls in order to see a change. I definitely see a difference with numpy/scipy when I do some actual matrix multiplication.
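If you want to check this yourself, something like the following rough timing sketch should do. It assumes numpy is linked against a BLAS and that the arrays are float64, in which case a vector-vector numpy.dot typically maps to a Level-1 routine and a matrix-matrix numpy.dot to a Level-3 GEMM call. The size n and the helper time_call are arbitrary choices of mine; you will probably need a much larger n before any GPU offload becomes visible.

```python
import time

import numpy as np


def time_call(fn, repeats=3):
    """Return the best wall-clock time of fn() over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


n = 1024  # arbitrary size for illustration; increase for a real comparison
rng = np.random.default_rng(0)

# Two square matrices (matrix-matrix product, Level 3).
A = rng.random((n, n))
B = rng.random((n, n))

# Two long vectors (vector-vector product, Level 1).
x = rng.random(n * n)
y = rng.random(n * n)

t_vec = time_call(lambda: np.dot(x, y))
t_mat = time_call(lambda: np.dot(A, B))

print(f"vector dot : {t_vec:.6f} s")
print(f"matrix dot : {t_mat:.6f} s")
```

Run it once normally and once with LD_PRELOAD and NVBLAS_CONFIG_FILE set; only the matrix timing would be expected to move.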