NVBLAS to accelerate Cholesky decomposition (potrf)

Hi,

I’m trying to run a short piece of code in Python (Anaconda).
Anaconda uses libmkl_rt.so for BLAS calls, and I have set NVBLAS_CONFIG_FILE properly.

import numpy as np
from scipy import linalg
A = np.random.rand(10000,100000)
AtA = A.dot(A.T)

linalg.cholesky(AtA) # or linalg.lapack.dpotrf(AtA)

  • np.dot is offloaded to the GPU (Tesla K80).
  • However, cholesky runs only on the CPU.

Doesn’t NVBLAS target potrf? Is there any package in Python or C that can achieve this, or is a direct call to cuBLAS the only option?

Thanks in advance.
Roi

The routines that NVBLAS targets are listed in the documentation:

http://docs.nvidia.com/cuda/nvblas/index.html#routines

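potrf is not listed there; NVBLAS only intercepts Level-3 BLAS routines (gemm, syrk, trsm and so on), which is why np.dot is offloaded but cholesky is not.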

potrf is a LAPACK routine rather than a BLAS routine, so it is not part of cuBLAS either. It is part of cuSOLVER:

http://docs.nvidia.com/cuda/cusolver/index.html#cuds-linearsolver-reference
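If you would rather stay in Python than call cuSOLVER from C, one possible route is CuPy, whose cupy.linalg.cholesky is, as far as I know, backed by cuSOLVER's potrf. The following is only a minimal sketch under assumptions: CuPy is not mentioned anywhere in this thread, the helper name cholesky_lower is made up for illustration, and the matrix is shrunk so it runs quickly. It falls back to NumPy on the CPU when CuPy or a usable GPU is missing.

```python
import numpy as np


def cholesky_lower(a):
    """Return lower-triangular L with a = L @ L.T, on the GPU if CuPy works.

    Hypothetical helper for illustration only.
    """
    try:
        import cupy as cp  # assumption: CuPy is installed (not part of the original post)

        a_gpu = cp.asarray(a)               # copy host -> device
        l_gpu = cp.linalg.cholesky(a_gpu)   # returns the lower-triangular factor
        return cp.asnumpy(l_gpu)            # copy device -> host
    except Exception:
        # CuPy missing or no usable GPU: fall back to the CPU LAPACK path.
        return np.linalg.cholesky(a)


rng = np.random.default_rng(0)
A = rng.random((200, 500))   # much smaller than the 10000 x 100000 case above
AtA = A @ A.T                # symmetric positive definite for full-rank A
L = cholesky_lower(AtA)
print(np.allclose(L @ L.T, AtA))
```

Note that this returns the lower-triangular factor, whereas scipy.linalg.cholesky returns the upper-triangular one by default (lower=False), so transpose the result if you need it to match the SciPy call in the question.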