CUDA driver API cuLinkComplete can't find libdevice (nvvm intrinsics bitcode)


I’m using CUDA 11.5 on Ubuntu 20.04, and when calling the CUDA driver API’s cuLinkComplete I get the following link error:

error: cuLinkComplete(linkState, &cubinData, &cubinSize) failed with error code device kernel image is invalid[error : Undefined reference to ‘__nv_expf’ in …

Now, __nv_expf is an intrinsic from CUDA’s libdevice library, which ships as NVVM bitcode; I can see it at /usr/local/cuda-11.5/nvvm/libdevice/libdevice.10.bc. Nevertheless, the link step above fails. I am using the right (CUDA 11.5) driver API.
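To double-check which libdevice bitcode files are actually present on a machine, something like the following sketch could be used. The function name find_libdevice and the list of candidate roots are assumptions made for illustration; only the nvvm/libdevice/ layout comes from the path quoted above.

```python
import glob
import os


def find_libdevice(cuda_roots):
    """Return libdevice bitcode files found under each candidate CUDA root."""
    found = []
    for root in cuda_roots:
        # Layout taken from the path in the post: <root>/nvvm/libdevice/libdevice*.bc
        pattern = os.path.join(root, "nvvm", "libdevice", "libdevice*.bc")
        found.extend(sorted(glob.glob(pattern)))
    return found


if __name__ == "__main__":
    # Candidate roots are assumptions; adjust them to the local installation.
    roots = [os.environ.get("CUDA_HOME", ""), "/usr/local/cuda", "/usr/local/cuda-11.5"]
    for path in find_libdevice([r for r in roots if r]):
        print(path)
```

An empty result would only show that the file is missing from those roots; it says nothing about whether the JIT pipeline ever looks there.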

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Sep_13_19:13:29_PDT_2021
Cuda compilation tools, release 11.5, V11.5.50
Build cuda_11.5.r11.5/compiler.30411180_0

Does anyone have insights into why it fails to find/link with libdevice here? I don’t have this issue on another Ubuntu 20.04 system where I use CUDA 11.2.

A gentle ping on this one. (This isn’t specific to __nv_expf; it also affects other intrinsics in that library.)

You may wish to provide a complete test case.

The link error that I get comes from in-memory NVVM/LLVM IR that is on its way to being JITTed using the CUDA driver API. There isn’t a way for me to provide a reproducible test case (without the reproducer having to download and build LLVM/MLIR), but here is the call to the CUDA driver API that fails to find the necessary intrinsic:

@Robert_Crovella - please do let me know if it would still help for me to provide a way to reproduce this. I could provide an LLVM IR blob that, when run through mlir-cpu-runner (part of the LLVM project repo), should yield that error.

I wouldn’t be able to spend any time with it without a full test case. If you file a bug, the bug team will certainly ask for a full test case.

Do as you wish, of course.

I put together a complete test case to easily reproduce this. I’ve also minimized the IR to the part of interest. To reproduce this, run the following command on the attached file:

$ bin/mlir-opt expf_reproducer.mlir  -pass-pipeline='gpu.module(strip-debuginfo,gpu-to-cubin{chip=sm_80})'
<unknown>:0: error: cuLinkComplete(linkState, &cubinData, &cubinSize) failed with error code unknown error[error   : Undefined reference to '__nv_expf' in 'exp_kernel']

Building the binary mlir-opt is straightforward from the LLVM trunk:

  1. Check out LLVM from the official repo:
    $ git clone
  2. Configure and build:
$ mkdir llvm-project/build
$ cd llvm-project/build
$ cmake -G Ninja ../llvm \
# Using clang and lld speeds up the build; we recommend adding:
$ cmake --build . --target mlir-opt
# or ninja mlir-opt
# mlir-opt can be found in the bin/ directory

expf_reproducer.mlir (2.3 KB)

I dug in a bit more and it appears that this issue may have nothing to do with cuLinkComplete (please see the question at the very end below though). CUDA’s libdevice is an LLVM bitcode library and it has to be linked in before PTX is generated. (The library resides at /usr/local/cuda-11.6/nvvm/libdevice/libdevice.10.bc, for example.) Linking with libdevice is briefly described here:
and there is a great deal of informative and detailed discussion on this on the LLVM dev list here (up until six months ago):
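As a rough illustration of linking libdevice at the bitcode level, before any PTX exists, the sketch below assembles an llvm-link command line. It assumes the kernel is available as a standalone bitcode file and that an llvm-link binary is on the PATH. The file names and the helper llvm_link_command are hypothetical, and this is not necessarily how the MLIR pipeline does (or should do) it.

```python
def llvm_link_command(kernel_bc, libdevice_bc, output_bc, llvm_link="llvm-link"):
    """Build an llvm-link invocation that links libdevice into a kernel module."""
    # The kernel module is listed first; --only-needed then applies to the
    # libdevice module, so only functions the kernel references are pulled in.
    return [llvm_link, "--only-needed", kernel_bc, libdevice_bc, "-o", output_bc]


if __name__ == "__main__":
    # All file names here are hypothetical placeholders.
    cmd = llvm_link_command(
        "kernel.bc",
        "/usr/local/cuda-11.5/nvvm/libdevice/libdevice.10.bc",
        "kernel.linked.bc",
    )
    print(" ".join(cmd))
```

The linked bitcode would then go through the NVPTX backend as usual, so that __nv_expf is already defined by the time PTX is emitted.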

Is it still true that calls to libdevice can’t be linked in after conversion to PTX (i.e., that the library can’t be linked in during cuLinkComplete)? If that’s the case, the linking of all such math functions (which are converted to __nv* calls) has to be resolved on the LLVM/NVVM side.
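One way to see whether generated PTX still carries unresolved libdevice calls before it is handed to the driver is to scan it for .extern declarations of __nv_* functions. The following is a minimal sketch; the regular expression assumes the declaration layout typically emitted by the LLVM NVPTX backend, and the helper name and sample PTX are made up for illustration.

```python
import re

# Matches PTX declarations such as:
#   .extern .func (.param .b32 func_retval0) __nv_expf
# The return-parameter group is optional, to also cover void functions.
_EXTERN_NV = re.compile(r"\.extern\s+\.func\s*(?:\([^)]*\)\s*)?(__nv_\w+)")


def unresolved_libdevice_calls(ptx_text):
    """Return the sorted, unique __nv_* names declared .extern in the PTX text."""
    return sorted(set(_EXTERN_NV.findall(ptx_text)))


if __name__ == "__main__":
    # Hand-written sample PTX, not output from the reproducer.
    sample = """
.extern .func  (.param .b32 func_retval0) __nv_expf
(
	.param .b32 __nv_expf_param_0
)
;
.visible .entry exp_kernel()
{
	ret;
}
"""
    print(unresolved_libdevice_calls(sample))
```

Any name reported here is a function that the PTX expects a later link step to define, which is exactly what cuLinkComplete complains about in the error above.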