nvlink error : Undefined reference

Recently, I set out to write a kernel in which each image pixel performs several small matrix operations (matrix-vector multiplication, matrix-matrix multiplication, etc.).

Instead of writing my own macros or inline functions for small matrices (like a 3×3 matrix), I found that the cuBLAS device API library can actually do the job for me, so I decided to give it a try.

But when I added the required cublas_v2.h and all the .lib files (cublas.lib, cublas_device.lib, cudadevrt.lib, cudart_static.lib), just like in the CUDA sample simpleDevLibCUBLAS, I got:

ptxas fatal: Unresolved extern function ‘cublasCreate_v2’.
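For context, my device code follows the simpleDevLibCUBLAS pattern; a trimmed sketch looks roughly like this (the names and matrix sizes are just placeholders, not my real code):

#include <cublas_v2.h>

// Illustrative only: each thread calls the device-side cuBLAS API,
// which is what pulls in cublasCreate_v2 at device-link time.
__global__ void perPixelKernel(const float *A, const float *B, float *C, int n)
{
    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS)   // resolves to cublasCreate_v2
        return;

    const float alpha = 1.0f, beta = 0.0f;
    // C = A * B for this thread's small n x n system
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, A, n, B, n, &beta, C, n);

    cublasDestroy(handle);
}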

After a little googling, I found that I had not set the Generate Relocatable Device Code option to Yes (-rdc=true), which is required to enable dynamic parallelism. I thought that would be it, but after I set -rdc=true, I got 940 errors instead, all like this:

CUDALINK : nvlink error : Undefined reference to ‘maxwell_hgemmBatched_256x128_raggedMn_nn’ in ‘C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/cublas_device.lib:maxwell_sm50_hgemm_batched.obj’ (target: sm_61)

Even if I comment out the entire kernel function that calls the cuBLAS device API, the nvlink errors are still there.

I am quite confused: my GPU is a GTX 1050 Ti, which is the Pascal architecture (sm_61), so what does it have to do with Maxwell sm_50?

Can somebody help me solve this problem? Thanks a lot.

Try replacing the code in the simpleDevLibCUBLAS project with yours and see if it compiles cleanly. If it does, then the problem is in your exact project setup.

txbob, thank you very much for your advice. I think I found where the problem is: if I add ‘compute_61,sm_61’ to the Code Generation settings of the simpleDevLibCUBLAS project, it also reports tons of nvlink errors, and if I delete ‘compute_61,sm_61’ from my own project, my project builds successfully. But why?

You should show us how you’re attempting to build your software here.

You definitely have issues with linking and we need to make sure that you’re referencing the proper files.

Btw, be careful. The cuBLAS device API is really just dynamic parallelism, which may not actually give you the performance you’re likely seeking.

Writing matrix math for small matrices is actually relatively straightforward, so don’t be afraid to do it yourself should the need arise.
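For example, a hand-rolled 3×3 routine is only a few lines; here’s a rough sketch assuming row-major storage (the function names are made up):

// y = A * x for a row-major 3x3 matrix
__device__ void matvec3(const float A[9], const float x[3], float y[3])
{
    for (int i = 0; i < 3; ++i)
        y[i] = A[3*i + 0] * x[0] + A[3*i + 1] * x[1] + A[3*i + 2] * x[2];
}

// C = A * B for row-major 3x3 matrices
__device__ void matmul3(const float A[9], const float B[9], float C[9])
{
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            C[3*i + j] = A[3*i + 0] * B[0*3 + j]
                       + A[3*i + 1] * B[1*3 + j]
                       + A[3*i + 2] * B[2*3 + j];
}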

MutantJohn, thank you very much for your reply. The thing is, the CUDA sample simpleDevLibCUBLAS project also reports these nvlink errors with only ‘compute_61,sm_61’ added and nothing else changed, so I think anybody with a GTX 1050 Ti and CUDA 8.0 installed can probably reproduce this error.

Keep in mind that without specifying the architecture, I think CUDA falls back to a JIT model. Once you compiled the code without specifying the architecture, did you actually try to run it? Runtime compilation of the device code is likely to fail.
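An easy way to catch that kind of failure is to check the error status around the launch; here’s a minimal, self-contained sketch (the dummy kernel is just for illustration):

#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummyKernel() {}

int main()
{
    dummyKernel<<<1, 1>>>();
    cudaError_t err = cudaGetLastError();       // reports launch-time (including JIT) failures
    if (err != cudaSuccess)
        printf("launch failed: %s\n", cudaGetErrorString(err));
    err = cudaDeviceSynchronize();              // reports failures during kernel execution
    if (err != cudaSuccess)
        printf("kernel failed: %s\n", cudaGetErrorString(err));
    return 0;
}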

To parrot other users on this forum, it’d help us the most if you could provide a minimal example that exhibits the behavior as well as your compilation commands.

Just write a simple dummy.cu which attempts to call the desired functions and show us how you’re linking to the target libs.
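Something along these lines is what I mean; the path and flags are only illustrative (Linux-style command shown, the Visual Studio project settings map onto the same switches):

nvcc -arch=sm_60 -rdc=true dummy.cu -o dummy -lcublas_device -lcudadevrt -lcublas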

I can run the original CUDA sample simpleDevLibCUBLAS project (provided with the CUDA 8.0 toolkit) successfully on my GTX 1050 Ti. It specifies the architectures as ‘compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60’, but once I add ‘compute_61,sm_61’ to the end, the project reports nvlink errors.

I think this is expected behavior. The cuBLAS device library only supports a limited set of device-linkable architectures; I believe they were limited to avoid code bloat in the library, and because there were no useful differences for the architectures not listed in the sample project.

So if you want to use device-side cuBLAS, limit yourself to the architectures suggested in the sample project. If you include the sm_60 option, the code should run correctly on your cc 6.1 device.
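In other words, keep the code generation list to the architectures the sample ships with, for example (an illustrative command line; the same entries go into the Code Generation field in Visual Studio):

nvcc -rdc=true kernel.cu -o app -lcublas_device -lcudadevrt -lcublas \
    -gencode arch=compute_35,code=sm_35 \
    -gencode arch=compute_50,code=sm_50 \
    -gencode arch=compute_60,code=sm_60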

OK, got it, thank you txbob.

But will performance drop noticeably if sm_60 instead of sm_61 is specified for a cc 6.1 device?