Recently, I set out to write a kernel that, for each image pixel, performs several matrix operations (matrix-vector multiplication, matrix-matrix multiplication, etc.).
Instead of writing my own macros or inline functions for small matrices (e.g. 3×3), I found that the cuBLAS device API can do the job for me, so I decided to give it a try.
But when I included cublas_v2.h and linked all the required .lib files (cublas.lib, cublas_device.lib, cudadevrt.lib, cudart_static.lib), just like in the CUDA sample simpleDevLibCUBLAS, I got the following error:
ptxas fatal: Unresolved extern function ‘cublasCreate_v2’.
After some googling, I found that I hadn’t set the Generate Relocatable Device Code option to Yes (-rdc=true), which is required for Dynamic Parallelism. I thought this would fix it, but after setting -rdc=true I got 940 errors instead, all like this:
CUDALINK : nvlink error : Undefined reference to ‘maxwell_hgemmBatched_256x128_raggedMn_nn’ in ‘C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/cublas_device.lib:maxwell_sm50_hgemm_batched.obj’ (target: sm_61)
Even if I comment out the entire kernel that calls the cuBLAS device API, the nvlink errors remain.
I am quite confused, since my GPU is a GTX 1050 Ti, which is a Pascal-architecture card (sm_61). What on earth does it have to do with Maxwell sm_50?
Can somebody help me solve this problem? Thanks a lot.