nvblas + numpy + Intel MKL >= 2018.3 does not work.

Platform: Ubuntu 16.04.5, NVIDIA GTX 1070 GPU, CUDA 9.2, Python 3.5.2.

I use Python with NVBLAS support by compiling numpy against Intel MKL. It works with MKL 2018.1 and 2018.2, but the GPU is not used with MKL 2018.3, 2018.4, or the 2019 preview.

Here is a simple experiment (I have tried different versions of numpy with the same result):

LD_PRELOAD=/usr/local/cuda/targets/x86_64-linux/lib/libnvblas.so python3
[NVBLAS] NVBLAS_CONFIG_FILE environment variable is set to '/home/bernard/.config/nvblas.conf'
Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.__version__
'1.15.2'
>>> np.show_config()
lapack_mkl_info:
    include_dirs = ['/opt/intel/mkl/include']
    library_dirs = ['/opt/intel/mkl/lib/intel64/']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_rt', 'pthread']
blas_mkl_info:
    include_dirs = ['/opt/intel/mkl/include']
    library_dirs = ['/opt/intel/mkl/lib/intel64/']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_rt', 'pthread']
blas_opt_info:
    include_dirs = ['/opt/intel/mkl/include']
    library_dirs = ['/opt/intel/mkl/lib/intel64/']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_rt', 'pthread']
lapack_opt_info:
    include_dirs = ['/opt/intel/mkl/include']
    library_dirs = ['/opt/intel/mkl/lib/intel64/']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_rt', 'pthread']
>>> a = np.random.rand(10000, 10000)
>>> b = np.random.rand(10000, 10000)
>>> a@b

With Intel MKL versions 2018.1 and 2018.2, nvidia-smi shows Volatile GPU-Util at 100%, and nvblas.log shows:
[NVBLAS] Using devices :0 
[NVBLAS] Config parsed
[NVBLAS] dgemm[gpu]: ta=N, tb=N, m=10000, n=10000, k=10000

But with the newer Intel MKL versions (2018.3, 2018.4, and the 2019.0 preview), Volatile GPU-Util stays at 0% and nvblas.log remains blank, so the GPU is not used at all.

Switching back to MKL 2018.2 makes it work again.
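Besides watching nvidia-smi, a rough way to see which backend does the work is to time the multiply under the same LD_PRELOAD with each MKL version; a run that is not offloaded to the GPU should take noticeably longer for large matrices. This is just a sanity-check sketch (matrix size reduced from the 10000x10000 experiment above so it stays quick), not a benchmark:

```python
import time

import numpy as np

# Time a large dgemm. Run once under LD_PRELOAD=.../libnvblas.so with
# each MKL version; a CPU-only run should be noticeably slower for
# large n than a GPU-offloaded one.
n = 2000  # smaller than the 10000x10000 case above, to keep it quick
a = np.random.rand(n, n)
b = np.random.rand(n, n)

start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start
print(f"{n}x{n} dgemm: {elapsed:.3f} s")
```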

I checked my nvblas.conf. Everything seems to be in order (the paths are not tied to a specific Intel MKL version):

# This is the configuration file to use NVBLAS Library
# Setup the environment variable NVBLAS_CONFIG_FILE to specify your own config file.
# By default, if NVBLAS_CONFIG_FILE is not defined, 
# NVBLAS Library will try to open the file "nvblas.conf" in its current directory
# Example : NVBLAS_CONFIG_FILE  /home/cuda_user/my_nvblas.conf
# The config file should have restricted write permissions accesses

# Specify which output log file (default is stderr)
NVBLAS_LOGFILE  nvblas.log

# Enable trace log of every intercepted BLAS calls
NVBLAS_TRACE_LOG_ENABLED

#Put here the CPU BLAS fallback Library of your choice
#It is strongly advised to use full path to describe the location of the CPU Library
#NVBLAS_CPU_BLAS_LIB  /usr/lib/libblas.so
NVBLAS_CPU_BLAS_LIB /opt/intel/mkl/lib/intel64/libmkl_rt.so
#NVBLAS_CPU_BLAS_LIB/home/bernard/opt/openblas-mpi/lib/libopenblas.so

# List of GPU devices Id to participate to the computation 
# Use ALL if you want all your GPUs to contribute
# Use ALL0, if you want all your GPUs of the same type as device 0 to contribute
# However, NVBLAS consider that all GPU have the same performance and PCI bandwidth
# By default if no GPU are listed, only device 0 will be used

#NVBLAS_GPU_LIST 0 2 4
#NVBLAS_GPU_LIST ALL
NVBLAS_GPU_LIST ALL0

# Tile Dimension
NVBLAS_TILE_DIM 2048

# Autopin Memory
NVBLAS_AUTOPIN_MEM_ENABLED

#List of BLAS routines that are prevented from running on GPU (use for debugging purpose
# The current list of BLAS routines supported by NVBLAS are
# GEMM, SYRK, HERK, TRSM, TRMM, SYMM, HEMM, SYR2K, HER2K

#NVBLAS_GPU_DISABLED_SGEMM 
#NVBLAS_GPU_DISABLED_DGEMM 
#NVBLAS_GPU_DISABLED_CGEMM 
#NVBLAS_GPU_DISABLED_ZGEMM 

# Computation can be optionally hybridized between CPU and GPU
# By default, GPU-supported BLAS routines are ran fully on GPU
# The option NVBLAS_CPU_RATIO_<BLAS_ROUTINE> give the ratio [0,1] 
# of the amount of computation that should be done on CPU
# CAUTION : this option should be used wisely because it can actually
# significantly reduced the overall performance if too much work is given to CPU
#NVBLAS_CPU_RATIO_CGEMM 0.07

numpy may choose to use cblas_gemm interface/API:

https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm

rather than the Fortran-style BLAS dgemm or dgemm_ interface/API. If it does, NVBLAS will not intercept the call:

https://docs.nvidia.com/cuda/nvblas/index.html#symbols-interception

I don’t know for sure that this is the issue; you wouldn’t expect that simply changing the linked library would have this effect. However, it may be that numpy inspects the linked BLAS implementation and chooses CBLAS instead of the ordinary Fortran BLAS interface according to some heuristic. This should be testable with a tool like strace.
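Part of this hypothesis can be checked directly: NVBLAS only exports the Fortran-style names (dgemm_ and friends), not the cblas_* ones, so a cblas_dgemm call from numpy would resolve straight into MKL and bypass NVBLAS. A minimal ctypes sketch for checking which gemm symbols a given shared library exports (the commented-out paths are examples; substitute your own):

```python
import ctypes

def exported_symbols(libpath, names):
    """Report which of the given symbols a shared library exports."""
    lib = ctypes.CDLL(libpath)
    found = {}
    for name in names:
        try:
            getattr(lib, name)  # raises AttributeError if absent
            found[name] = True
        except AttributeError:
            found[name] = False
    return found

# Example (paths are assumptions; adjust to your installation):
# exported_symbols("/usr/local/cuda/targets/x86_64-linux/lib/libnvblas.so",
#                  ["dgemm_", "cblas_dgemm"])
# If dgemm_ is present but cblas_dgemm is absent, any cblas_dgemm call
# from numpy would bypass NVBLAS entirely.
```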

Is there any difference in the np.show_config() output in the two cases?
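To compare the two cases mechanically, you could capture the np.show_config() output as text under each MKL version and diff the saved files. A small helper sketch (np.show_config() prints to stdout, so we redirect it):

```python
import contextlib
import io

import numpy as np

def config_text():
    """Capture numpy's build configuration output as a string."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        np.show_config()
    return buf.getvalue()

# Save once per MKL version, e.g.:
#   with open("show_config_mkl2018.2.txt", "w") as f:
#       f.write(config_text())
# then diff the two files.
```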