CUDA driver API cuLinkComplete can't find libdevice (nvvm intrinsics bitcode)

uday1 · March 17, 2022, 11:37am

Hello,

I’m using CUDA 11.5 on Ubuntu 20.04. When I call the CUDA driver API’s cuLinkComplete, I get the following link error:

error: cuLinkComplete(linkState, &cubinData, &cubinSize) failed with error code device kernel image is invalid[error : Undefined reference to ‘__nv_expf’ in ..

Now, __nv_expf is an intrinsic that is part of CUDA’s libdevice library, which ships as NVVM bitcode; I can see it at /usr/local/cuda-11.5/nvvm/libdevice/libdevice.10.bc. Nevertheless, the link step above fails. I am using the matching (CUDA 11.5) driver API.

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Sep_13_19:13:29_PDT_2021
Cuda compilation tools, release 11.5, V11.5.50
Build cuda_11.5.r11.5/compiler.30411180_0

Does anyone have insights into why it fails to find/link with libdevice here? I don’t have this issue on another Ubuntu 20.04 system where I use CUDA 11.2.

uday1 · July 1, 2022, 12:37am

A gentle ping on this one. (This isn’t specific to __nv_expf but also other such intrinsics in that library.)

Robert_Crovella · July 1, 2022, 1:57pm

You may wish to provide a complete test case.

uday1 · July 1, 2022, 2:13pm

The link error comes from in-memory NVVM/LLVM IR that is on its way to being JITted using the CUDA driver API. I can’t provide a test case that doesn’t require downloading and building LLVM/MLIR, but here is the call to the CUDA driver API that fails to find the necessary intrinsic:

https://github.com/llvm/llvm-project/blob/befa8cf087dbb8159a4d9dc8fa4d6748d6d5049a/mlir/lib/Dialect/GPU/Transforms/SerializeToCubin.cpp#L121

  RETURN_ON_CUDA_ERROR(cuLinkAddData(
      linkState, CUjitInputType::CU_JIT_INPUT_PTX,
      const_cast<void *>(static_cast<const void *>(isa.c_str())), isa.length(),
      kernelName.c_str(), 0, /* number of jit options */
      nullptr,               /* jit options */
      nullptr                /* jit option values */
      ));

  void *cubinData;
  size_t cubinSize;
  RETURN_ON_CUDA_ERROR(cuLinkComplete(linkState, &cubinData, &cubinSize));

  char *cubinAsChar = static_cast<char *>(cubinData);
  auto result =
      std::make_unique<std::vector<char>>(cubinAsChar, cubinAsChar + cubinSize);

  // This will also destroy the cubin data.
  RETURN_ON_CUDA_ERROR(cuLinkDestroy(linkState));
  RETURN_ON_CUDA_ERROR(cuCtxDestroy(context));

  return result;

@Robert_Crovella - please let me know if it would still help for me to provide a way to reproduce this. I could provide an LLVM IR blob that, when run through mlir-cpu-runner (part of the LLVM project repo), should yield that error.

Robert_Crovella · July 1, 2022, 2:18pm

I wouldn’t be able to spend any time with it without a full test case. If you file a bug, the bug team will certainly ask for a full test case.

Do as you wish, of course.

uday1 · July 5, 2022, 5:46am

I put together a complete test case to easily reproduce this. I’ve also minimized the IR to the part of interest. To reproduce this, the following command on the attached file can be run:

$ bin/mlir-opt expf_reproducer.mlir  -pass-pipeline='gpu.module(strip-debuginfo,gpu-to-cubin{chip=sm_80})'
<unknown>:0: error: cuLinkComplete(linkState, &cubinData, &cubinSize) failed with error code unknown error[error   : Undefined reference to '__nv_expf' in 'exp_kernel']

Building the binary mlir-opt is straightforward from the LLVM trunk:

Check out LLVM from the official repo:
$ git clone git@github.com:llvm/llvm-project.git
Configure and build:

$ mkdir llvm-project/build
$ cd llvm-project/build
$ cmake -G Ninja ../llvm \
   -DLLVM_ENABLE_PROJECTS=mlir \
   -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" \
   -DCMAKE_BUILD_TYPE=Release \
   -DLLVM_ENABLE_ASSERTIONS=ON
# Using clang and lld speeds up the build; we recommend adding:
#  -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_ENABLE_LLD=ON
$ cmake --build . --target mlir-opt
# or ninja mlir-opt
# mlir-opt can be found in the bin/ directory

expf_reproducer.mlir (2.3 KB)

uday1 · July 5, 2022, 9:06am

I dug in a bit more, and it appears that this issue may have nothing to do with cuLinkComplete (though please see the question at the very end). CUDA’s libdevice is an LLVM bitcode library, and it has to be linked in before PTX is generated. (The library resides at /usr/local/cuda-11.6/nvvm/libdevice/libdevice.10.bc, for example.) Linking with libdevice is briefly described here:
https://llvm.org/docs/NVPTXUsage.html#libdevice
and there is a great deal of informative and detailed discussion on this on the LLVM dev list here (up until six months ago):
https://groups.google.com/g/llvm-dev/c/iOCs6HsAMT0

Is it still true that calls to libdevice can’t be resolved after conversion to PTX (i.e., the library can’t be linked in during cuLinkComplete)? If so, the linking of all such math functions (those that are converted to __nv* calls) has to be handled on the LLVM/NVVM side.
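Whatever the answer, one way to make the failure easier to diagnose is to check the PTX for leftover libdevice declarations before it is handed to cuLinkAddData. The sketch below is only an illustration: `unresolvedLibdeviceCalls` is a hypothetical helper, not part of MLIR or the CUDA driver API, and it assumes that unresolved calls show up in the PTX as `.extern .func` declarations whose function name starts with `__nv_`.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// True for characters that may appear inside a PTX identifier.
static bool isIdentChar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '$';
}

// Hypothetical pre-link check: collects the names of functions that the PTX
// declares as `.extern .func` and whose names start with `__nv_`, i.e.
// libdevice calls that were not linked in before PTX generation.
std::set<std::string> unresolvedLibdeviceCalls(const std::string &ptx) {
  const std::string decl = ".extern .func";
  const std::string prefix = "__nv_";
  std::set<std::string> names;
  std::size_t pos = 0;
  while ((pos = ptx.find(decl, pos)) != std::string::npos) {
    pos += decl.size();
    // A declaration is assumed to end at the next ';'.
    std::size_t end = ptx.find(';', pos);
    if (end == std::string::npos)
      end = ptx.size();
    for (std::size_t start = ptx.find(prefix, pos);
         start != std::string::npos && start < end;
         start = ptx.find(prefix, start + 1)) {
      if (start > 0 && isIdentChar(ptx[start - 1]))
        continue; // `__nv_` in the middle of some other identifier
      std::size_t stop = start;
      while (stop < end && isIdentChar(ptx[stop]))
        ++stop;
      names.insert(ptx.substr(start, stop - start));
      break; // first match is the function name; later ones are parameters
    }
    pos = end;
  }
  return names;
}
```

If the returned set is non-empty for the PTX string (`isa` in the snippet earlier in this thread), the caller could report the missing names directly instead of waiting for cuLinkComplete to fail. This is a plain text scan rather than a PTX parser, so it is best treated as a diagnostic aid.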
