nvcc (nvlink) not linking against device code library

sixgig · September 30, 2014, 1:35pm

Background
I am trying to create a library with some common CUDA device and host functions that are used in several places throughout my project. I then want to link against this library when building executables and other libraries so I don’t have to repeat the code (a common use case for a library).

Problem
I am able to build my library just fine; however, when I link an executable or another library against it using nvcc (nvlink), I get a linker error:

nvlink error : Undefined reference to ‘_ZN5iwork13CudaUtilities14bearingDiffMaxEdd’ in ‘main.o’

Steps to repeat
I’ve broken the problem down into a simple example that highlights the issue.

Structure:

CudaUtilities.cu/.cuh - common utility code that I want to put into my library
main.cu - code for an executable that uses CudaUtilities
build.sh - simple build script to show the error
libcudautilities.a - a static library that will contain the object code from CudaUtilities.cu (created by build script)
a.out - an executable (main.cu) that uses the CudaUtilities functions and tries to link against libcudautilities.a (created by build script)

My “Cuda Utilities” class that is in my library

CudaUtilities.cuh

#ifndef _CUDAUTILITIES_H_
#define _CUDAUTILITIES_H_

#include <cuComplex.h>

class CudaUtilities
{
 public:

  static __device__ __host__ float cuCanglef(const cuFloatComplex x);
  static __device__ __host__ double cuCangle(const cuDoubleComplex x);
      
 private:
  CudaUtilities();
  ~CudaUtilities();
};

#endif // _CUDAUTILITIES_H_

CudaUtilities.cu

#include "CudaUtilities.cuh"

// static
float CudaUtilities::cuCanglef(const cuFloatComplex x)
{
  return atan2f(x.y, x.x);
}

// static
double CudaUtilities::cuCangle(const cuDoubleComplex x)
{
  return atan2(x.y, x.x);
}

Code for my executable that uses the CudaUtilities class

main.cu

#include "CudaUtilities.cuh"
#include <stdio.h>

__global__ void kernel()
{
  cuDoubleComplex cplx = make_cuDoubleComplex(10, 20);
  double a = CudaUtilities::cuCangle(cplx);
  printf("%0.3f\n", a);
}

int main(int argc, char *argv[])
{
  const static dim3 blocks(1, 1);
  const static dim3 threads(16, 16);
  kernel<<<blocks, threads>>>();

  cudaPeekAtLastError();
  cudaDeviceSynchronize();
}

Simple build script (shell) that executes the commands that cause the issue
build.sh

#!/bin/bash

set -o verbose

# build libcudautilities.a containing CudaUtilities.cu
nvcc -arch=sm_20 -I. -dc CudaUtilities.cu
nvcc -arch=sm_20 -I. -dlink CudaUtilities.o -o CudaUtilities.link.o
/usr/bin/ar cr libcudautilities.a CudaUtilities.o CudaUtilities.link.o
/usr/bin/ranlib libcudautilities.a

# build an executable out of main.cu trying to link against libcudautilities.a
nvcc -arch=sm_20 -I. -dc main.cu
nvcc -arch=sm_20 -I. -L. -lcudautilities -dlink main.o -o main.link.o
g++ -L/usr/local/cuda/lib64 -L. main.o main.link.o -lcudadevrt -lcudart -lcudautilities

Error on build

$ ./build.sh 

# build libcudautilities.a containing CudaUtilities.cu
nvcc -arch=sm_20 -I. -dc CudaUtilities.cu
nvcc -arch=sm_20 -I. -dlink CudaUtilities.o -o CudaUtilities.link.o
/usr/bin/ar cr libcudautilities.a CudaUtilities.o CudaUtilities.link.o
/usr/bin/ranlib libcudautilities.a

# build an executable out of main.cu trying to link against libcudautilities.a
nvcc -arch=sm_20 -I. -dc main.cu
nvcc -arch=sm_20 -I. -L. -lcudautilities -dlink main.o -o main.link.o
nvlink error   : Undefined reference to '_ZN13CudaUtilities8cuCangleE7double2' in 'main.o'

Comments

I am able to run nm on my static library and see that my symbol is in there

$ nm libcudautilities.a 

CudaUtilities.o:
0000000000000000 r _ZL15__module_id_str
0000000000000010 b _ZL20__cudaFatCubinHandle
0000000000000057 t _ZL22____nv_dummy_param_refPv
0000000000000000 b _ZL22__nv_inited_managed_rt
000000000000006c t _ZL26__cudaUnregisterBinaryUtilv
00000000000000a5 t _ZL31__nv_cudaEntityRegisterCallbackPPv
0000000000000008 b _ZL32__nv_fatbinhandle_for_managed_rt
000000000000008b t _ZL32__nv_init_managed_rt_with_modulePPv
0000000000000000 t _ZL37__nv_save_fatbinhandle_for_managed_rtPPv
00000000000000ca t _ZL85__sti____cudaRegisterAll_48_tmpxft_000014aa_00000000_6_CudaUtilities_cpp1_ii_c49f944dv
0000000000000034 T _ZN13CudaUtilities8cuCangleE7double2
0000000000000016 T _ZN13CudaUtilities9cuCanglefE6float2
0000000000000020 b _ZZL22____nv_dummy_param_refPvE5__ref
0000000000000018 b _ZZL31__nv_cudaEntityRegisterCallbackPPvE5__ref
                 U __cudaInitModule
                 U __cudaRegisterLinkedBinary_48_tmpxft_000014aa_00000000_6_CudaUtilities_cpp1_ii_c49f944d
                 U __cudaUnregisterFatBinary
0000000000000000 R __fatbinwrap_48_tmpxft_000014aa_00000000_6_CudaUtilities_cpp1_ii_c49f944d
                 U atan2
                 U atan2f
0000000000000000 r fatbinData

CudaUtilities.link.o:
0000000000000000 r _ZL15__fatDeviceText
0000000000000000 b _ZL20__cudaFatCubinHandle
0000000000000010 b _ZL22__cudaPrelinkedFatbins
000000000000005b t _ZL26__cudaRegisterLinkedBinaryPK19__fatBinC_Wrapper_tPFvPPvES2_
0000000000000000 r _ZL78def_module_id_str_48_tmpxft_000014aa_00000000_6_CudaUtilities_cpp1_ii_c49f944d
0000000000000020 b _ZZ87__cudaRegisterLinkedBinary_48_tmpxft_000014aa_00000000_6_CudaUtilities_cpp1_ii_c49f944dE3__p
0000000000000030 b _ZZL26__cudaRegisterLinkedBinaryPK19__fatBinC_Wrapper_tPFvPPvES2_E16__callback_array
0000000000000028 b _ZZL26__cudaRegisterLinkedBinaryPK19__fatBinC_Wrapper_tPFvPPvES2_E3__i
                 U __cudaRegisterFatBinary
0000000000000015 T __cudaRegisterLinkedBinary_48_tmpxft_000014aa_00000000_6_CudaUtilities_cpp1_ii_c49f944d
0000000000000000 t __cudaUnregisterBinaryUtil
                 U __cudaUnregisterFatBinary
                 U __fatbinwrap_48_tmpxft_000014aa_00000000_6_CudaUtilities_cpp1_ii_c49f944d
                 U atexit
0000000000000000 r fatbinData

However, if I modify my build script to NOT link against libcudautilities.a in the -dlink step and instead pass the object file CudaUtilities.o, everything works just fine.

#!/bin/bash

set -o verbose

# build libcudautilities.a containing CudaUtilities.cu
nvcc -arch=sm_20 -I. -dc CudaUtilities.cu
nvcc -arch=sm_20 -I. -dlink CudaUtilities.o -o CudaUtilities.link.o
/usr/bin/ar cr libcudautilities.a CudaUtilities.o CudaUtilities.link.o
/usr/bin/ranlib libcudautilities.a

# build an executable out of main.cu trying to link against libcudautilities.a
nvcc -arch=sm_20 -I. -dc main.cu
# broken linker line
# nvcc -arch=sm_20 -I. -L. -lcudautilities -dlink main.o -o main.link.o
# hacked linker line
nvcc -arch=sm_20 -I. -L. -dlink CudaUtilities.o main.o -o main.link.o
g++ -L/usr/local/cuda/lib64 -L. main.o main.link.o -lcudadevrt -lcudart -lcudautilities
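
A note on the nm listing above: nm reads the host symbol table, so the "T _ZN13CudaUtilities8cuCangleE7double2" entry is the host compilation of the __host__ __device__ function. The device code that nvlink has to resolve is embedded in the object as a fatbinary, which nm does not look into. A minimal sketch for looking at both sides, assuming the CUDA toolkit's cuobjdump is on PATH (the helper name list_symbols is made up for illustration):

```shell
#!/bin/bash
# Sketch: show host-side and device-side symbols of an object file or
# static library. list_symbols is a hypothetical helper name.

list_symbols() {
  local target="$1"

  echo "== host symbols defined in $target (nm) =="
  # nm only sees the host ELF symbol table.
  nm --defined-only "$target" | grep ' T ' || echo "(none)"

  echo "== device symbols embedded in $target (cuobjdump) =="
  if command -v cuobjdump >/dev/null 2>&1; then
    # -symbols dumps the symbol names of the embedded device ELF images.
    cuobjdump -symbols "$target" || echo "(cuobjdump reported an error)"
  else
    echo "(cuobjdump not found on PATH; skipping)"
  fi
}

# Usage: ./list_symbols.sh libcudautilities.a
if [ "$#" -ge 1 ]; then
  list_symbols "$1"
fi
```

Running it on CudaUtilities.o and on libcudautilities.a should make it easier to tell whether the device-side definition is where nvlink can find it.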

I’m sure I’m doing something wrong, because this has to be a common use case.

Any help is greatly appreciated.

-Nick

Edit:

Ideally I would like my library to be a shared library, not a static one. However, after reading the nvcc docs, I don’t think this is possible, because nvcc (nvlink) ignores .so files and only searches .a files.

http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#libraries

Robert_Crovella · September 30, 2014, 4:07pm

Referring to the examples for linking and library creation in a relocatable device code setting:

http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#examples

Either of the following modified scripts seems to work correctly for me:

#!/bin/bash

set -o verbose

# build libcudautilities.a containing CudaUtilities.cu
nvcc -arch=sm_20 -I. -dc CudaUtilities.cu
nvcc -arch=sm_20 -lib CudaUtilities.o -o libcudautilities.a
# nvcc -arch=sm_20 -I. -dlink CudaUtilities.o -o CudaUtilities.link.o -lcudadevrt

# /usr/bin/ar cr libcudautilities.a CudaUtilities.o CudaUtilities.link.o
# /usr/bin/ranlib libcudautilities.a

# build an executable out of main.cu trying to link against libcudautilities.a
nvcc -arch=sm_20 -I. -dc main.cu
nvcc -arch=sm_20 -I. -L. -lcudautilities -dlink main.o -o main.link.o
g++ -L/usr/local/cuda/lib64 -L. main.o main.link.o -lcudadevrt -lcudart -lcudautilities

or:

#!/bin/bash

set -o verbose

# build libcudautilities.a containing CudaUtilities.cu
nvcc -arch=sm_20 -I. -dc CudaUtilities.cu
# nvcc -arch=sm_20 -lib CudaUtilities.o -o libcudautilities.a
# nvcc -arch=sm_20 -I. -dlink CudaUtilities.o -o CudaUtilities.link.o -lcudadevrt

/usr/bin/ar cr libcudautilities.a CudaUtilities.o
/usr/bin/ranlib libcudautilities.a

# build an executable out of main.cu trying to link against libcudautilities.a
nvcc -arch=sm_20 -I. -dc main.cu
nvcc -arch=sm_20 -I. -L. -lcudautilities -dlink main.o -o main.link.o
g++ -L/usr/local/cuda/lib64 -L. main.o main.link.o -lcudadevrt -lcudart -lcudautilities
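
What the two scripts have in common is that the archive holds only the relocatable object produced by -dc; the device link (-dlink) happens once, for the executable. As a further variation, nvcc can perform the device link implicitly when it does the final link itself, which is the pattern the library example in the nvcc documentation follows. A minimal sketch under that assumption; the ARCH variable and the function name are illustrative, and sm_20 is simply the value used in this thread (recent toolkits no longer accept it, so substitute one yours supports):

```shell
#!/bin/bash
# Sketch: same library layout as the scripts above, but nvcc performs the
# final link, so the device link step is implicit.
# build_with_nvcc_link is a hypothetical helper name.

ARCH="${ARCH:-sm_20}"   # value used in this thread; adjust for your toolkit

build_with_nvcc_link() (
  set -e  # stop at the first failing command (body runs in a subshell)

  # Library: relocatable device code only, no -dlink.
  nvcc -arch="$ARCH" -I. -dc CudaUtilities.cu -o CudaUtilities.o
  nvcc -lib CudaUtilities.o -o libcudautilities.a

  # Executable: compile with -dc, then let nvcc device-link and host-link.
  nvcc -arch="$ARCH" -I. -dc main.cu -o main.o
  nvcc -arch="$ARCH" main.o -L. -lcudautilities -o a.out
)

if command -v nvcc >/dev/null 2>&1 && [ -f CudaUtilities.cu ] && [ -f main.cu ]; then
  build_with_nvcc_link
else
  echo "nvcc or the source files are missing; nothing built"
fi
```

Because nvcc drives the host link here, the explicit g++ line and the -lcudadevrt -lcudart arguments are not spelled out; nvcc adds its runtime libraries itself.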

sixgig · September 30, 2014, 4:41pm

txbob,

Thanks for the quick reply!

I’ve confirmed that the changes to the build script allow me to successfully build my test case.

Now, the next step is to figure out how to get CMake to perform these actions as well.

CMake tries to perform the -dlink step when building the static library, just as I was doing in my build script, and this causes the build to fail.

CMakeLists.txt

# CMakeLists.txt for cuda_link_problem project
project(cuda_link_problem)

# required cmake version
cmake_minimum_required(VERSION 2.8)

set(CMAKE_BUILD_TYPE Debug)
set(CMAKE_VERBOSE_MAKEFILE 1)
link_directories("/usr/local/cuda/lib64")

# packages
find_package(CUDA)
message("CUDA Version: ${CUDA_VERSION}")

# important! don't want -std=c++0x etc. getting added to nvcc
set(CUDA_PROPAGATE_HOST_FLAGS OFF)

# so that we can build source files that have a separate .cu/.cuh
#   https://devtalk.nvidia.com/default/topic/543152/consistency-of-functions-pointer/
set(CUDA_SEPARABLE_COMPILATION ON)

####################################
# libcudautilities.a
set(SOURCES_CUDA_UTILITIES_LIB
  CudaUtilities.cu
)
cuda_add_library(cudautilities ${SOURCES_CUDA_UTILITIES_LIB} STATIC OPTIONS -gencode=arch=compute_20,code=sm_20)

###################################
# cudaexec
set(SOURCES_CUDA_EXECUTABLE
  main.cu
)
cuda_add_executable(cudaexec ${SOURCES_CUDA_EXECUTABLE} OPTIONS -gencode=arch=compute_20,code=sm_20)
target_link_libraries(cudaexec cudautilities)

This cmake build is failing with:

[ 75%] Building NVCC intermediate link file CMakeFiles/cudaexec.dir/./cudaexec_intermediate_link.o
/usr/local/cuda/bin/nvcc -gencode=arch=compute_20,code=sm_20 -m64 -ccbin "/usr/local/gcc-4.6.1/bin/gcc" -dlink /home/sixgig/src/prototype/cuda_link_problem/build/CMakeFiles/cudaexec.dir//./cudaexec_generated_main.cu.o -o /home/sixgig/src/prototype/cuda_link_problem/build/CMakeFiles/cudaexec.dir/./cudaexec_intermediate_link.o
nvlink error   : Undefined reference to '_ZN13CudaUtilities8cuCangleE7double2' in '/home/sixgig/src/prototype/cuda_link_problem/build/CMakeFiles/cudaexec.dir//./cudaexec_generated_main.cu.o'
make[2]: *** [CMakeFiles/cudaexec.dir/./cudaexec_intermediate_link.o] Error 255
make[2]: Leaving directory `/home/sixgig/src/prototype/cuda_link_problem/build'
make[1]: *** [CMakeFiles/cudaexec.dir/all] Error 2
make[1]: Leaving directory `/home/sixgig/src/prototype/cuda_link_problem/build'
make: *** [all] Error 2

CMake Version

$ cmake --version
cmake version 2.8.11.1

Edit: added CMake version

sixgig · September 30, 2014, 5:00pm

It appears I’m not the only one having problems with CMake and separable compilation.

http://public.kitware.com/Bug/view.php?id=15157

Robert_Crovella · September 30, 2014, 5:14pm

I don’t know jack about CMake. Maybe someone else will be able to help.

Morph208 · July 20, 2015, 9:47pm

Has anyone found a solution for separable compilation with CMake? It’s an old post but it seems that the bug is still there. Any workaround?

Bhupesh1990 · May 17, 2018, 1:50pm

I have encountered the same problem. Does anybody know how to solve it?

henry.schreiner · June 20, 2018, 8:21am

If you use CUDA as a first-class language in CMake (CMake 3.8 or 3.9+) instead of FindCUDA, this seems to work correctly now. (For any Googlers coming upon this thread.) I’d still like to know if there’s a FindCUDA fix, but the CUDA language is fine for now.

(This is even mentioned in the newer copy of the issue on the CMake GitLab tracker: "FindCUDA.cmake: separate compilation not working as expected", #15157.)
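
For readers landing here, a minimal sketch of what that could look like for the example in this thread, using the CUDA language (CMake 3.8+) rather than FindCUDA. The target and file names are taken from the CMakeLists.txt earlier in the thread; architecture flags are left out, and note that the script overwrites any CMakeLists.txt in the current directory:

```shell
#!/bin/bash
# Sketch: write a CMakeLists.txt that uses CMake's CUDA language instead of
# FindCUDA, then configure and build if the toolchain and sources are present.

cat > CMakeLists.txt <<'EOF'
# CUDA became a first-class CMake language in 3.8.
cmake_minimum_required(VERSION 3.8)
project(cuda_link_problem LANGUAGES CXX CUDA)

# Static library of relocatable device code (compiled with -dc).
add_library(cudautilities STATIC CudaUtilities.cu)
set_target_properties(cudautilities PROPERTIES CUDA_SEPARABLE_COMPILATION ON)

# main.cu calls a device function defined in the library, so it also
# needs separable compilation.
add_executable(cudaexec main.cu)
set_target_properties(cudaexec PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
target_link_libraries(cudaexec PRIVATE cudautilities)
EOF

if command -v cmake >/dev/null 2>&1 && command -v nvcc >/dev/null 2>&1 \
   && [ -f CudaUtilities.cu ] && [ -f main.cu ]; then
  mkdir -p build
  (cd build && cmake .. && cmake --build .)
else
  echo "cmake, nvcc, or the source files are missing; wrote CMakeLists.txt only"
fi
```

With separable compilation enabled, CMake by default does not device-link a static library and leaves that step to the executable that consumes it, which mirrors the working build scripts above.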
