NVCC creates host symbols for device functions after explicit class template instantiation

I have a template class which I instantiate explicitly. In this case, nvcc creates host symbols for __device__ functions in the object file. Other host code can then link against these symbols, making the program exit with return code 1 immediately.

A reproducer is this test.cu file:

#include <cuda.h>

template <class T> class c
{
public:
  inline int test();
  __device__ int test2();
};

template <class T> int c<T>::test()
{
  return 0;
}

template <class T> __device__ int c<T>::test2()
{
  return 0;
}

template class c<int>;

Compiled with CUDA 12.6.1 via: nvcc -c test.cu -o test.o

Running nm -C -g test.o, I get:

                 U atexit
                 U __cudaInitModule
                 U __cudaRegisterFatBinary
                 U __cudaRegisterFatBinaryEnd
                 U __cudaUnregisterFatBinary
                 U exit
                 U __stack_chk_fail
0000000000000000 W c<int>::test()
0000000000000000 W c<int>::test2()

The test() function should be present as a host symbol, but test2() should not be.


Actually, after checking a bit, I see the same behavior for free functions as well, not just class member functions, as long as they are not defined inline.
E.g. __device__ int foo() { return 0; } creates a host symbol, which will exit the application with return code 1 when called.
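
For illustration, the pattern I am seeing (compiled with the same nvcc command as above):

#include <cuda.h>

__device__ int foo() { return 0; }         // host symbol appears in the object file
__device__ inline int bar() { return 0; }  // no host symbol observed
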
I assume this is intended. Is there a way to disable it?

Not sure what you mean by that. You haven’t actually provided a complete application that I can see here. Do you mean it will exit if you call the __device__ function from host code?

Because in typical usage, that would be a compile error.
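
For example, nvcc rejects a direct call like this at compile time:

#include <cuda.h>

__device__ int foo() { return 0; }

int main()
{
  return foo(); // error: calling a __device__ function from a __host__ function is not allowed
}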

Hi Robert,

sorry if my post was not fully clear. To give a bit more context: sure, calling the device function directly from host code gives a compile error. But in our case the problem arises when linking together object files or libraries that use the same function names for device and for host code.

Here are two full examples that fail: one for the template case using static linking, and one for the non-template case using shared libraries.

For convenience, I have also uploaded all files as a tarball here: link

Example 1:
main.cpp:

template <class T> class c
{
public:
  int foo();
};

int main(int, char**)
{
  c<int> x;
  x.foo();
  return 0;
}

test.cu:

#include <cuda.h>

template <class T> class c
{
public:
  __device__ inline int foo();
};

template <class T> __device__ inline int c<T>::foo()
{
  return 0;
}

template class c<int>;

test2.cpp:

#include <cstdio>

template <class T> class c
{
public:
  inline int foo();
};

template <class T> inline int c<T>::foo()
{
  printf("foo from test2.cpp\n");  
  return 0;
}

template class c<int>;

Compilation and execution: test.sh:

nvcc -c test.cu -o test.o --compiler-bindir gcc-13 -O0 -g
c++ -c test2.cpp -o test2.o -O0 -g
c++ -c main.cpp -o main.o -O0 -g
c++ -o test main.o test.o test2.o -L /opt/cuda/lib64 -lcuda -lcudart -g
./test
echo RETURN VALUE $?

Now, when I run it, I get:

qon@qon ~/tmp4/example_1 $ ./test.sh 
RETURN VALUE 1

And expected would be:

foo from test2.cpp
RETURN VALUE 0

and with nm I get:

qon@qon ~/tmp4/example_1 $ nm -C -U -g test.o
0000000000000000 W c<int>::foo()

Example 2
main.cpp:

int foo();

int main(int, char**)
{
  foo();
  return 0;
}

test.cu:

#include <cuda.h>

__device__ int foo()
{
  return 0;
}

test2.cpp:

#include <cstdio>

int foo()
{
  printf("foo from test2.cpp");  
  return 0;
}

test.sh

nvcc test.cu --shared -o libtest.so --compiler-bindir gcc-13 -O0 -g
c++ --shared test2.cpp -o libtest2.so -O0 -g
c++ main.cpp -o test -O0 -g -L. -ltest -ltest2 -L /opt/cuda/lib64 -lcuda -lcudart
LD_LIBRARY_PATH+=:. ./test
echo RETURN VALUE $?

Here too, the host symbol generated for the device function is called, exiting with return value 1:

qon@qon ~/tmp4/example_2 $ ./test.sh 
RETURN VALUE 1

And from nm I get:

qon@qon ~/tmp4/example_2 $ nm -C -U -g libtest.so
000000000000ac00 T foo()

So in both cases, nvcc created a host-visible symbol for the device function. I disassembled it, and it essentially calls exit() with return code 1, which matches exactly what happens at runtime. When I compile the same code with clang for CUDA instead of nvcc, it behaves as expected.
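
In other words, the host shadow nvcc generates behaves conceptually like this (a sketch based on the disassembly, not the literal generated code):

#include <cstdlib>

// conceptual equivalent of the host shadow emitted for a __device__ function
int foo() { std::exit(1); }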

That’s not permissible in CUDA.

Fair enough, but I am having the problem with different .so files, one of which is not even CUDA. Now, I agree that two identical host functions in two different non-CUDA shared libraries would also collide. But I would not have expected a device function in one library to affect host linking of another library.

Also, the behavior is not consistent: nvcc creates these host symbols for non-inline functions, but not for inline functions. Yet it suddenly creates them for inline template member functions if the class is explicitly instantiated. I don't really understand the purpose of this symbol. I would rather have linking fail when the host symbol is missing than link against something that just exits the application. It actually took me some time to understand what was going on.

A function that can be and is inlined does not behave like an ordinary function in numerous respects, so it doesn't surprise me at all that the symbol disappears in this case.

I suspect that the underlying behavior is arising from needs of the CUDA compilation trajectory. I won’t be able to explain every nuance.

You’re welcome to file a bug. It wouldn’t surprise me if the bug is not actionable due to some specific need of the CUDA compilation trajectory.

Thanks for the explanation. For my use case, I have started to use objcopy -L and a linker version script to demote all symbols except the ones I want to keep to LOCAL, which fully solves the problem for me.
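
For reference, a sketch of the approach (the symbol names and export pattern are illustrative, not my actual script):

# localize the colliding shadow symbol in the object file
# (c<int>::foo() mangles to _ZN1cIiE3fooEv with the GNU ABI)
objcopy -L _ZN1cIiE3fooEv test.o

# for the shared-library case: an illustrative version script that
# defaults everything to local and exports only the real host API
cat > libtest.map << 'EOF'
{ global: my_api_*; local: *; };
EOF
nvcc test.cu --shared -o libtest.so -Xlinker --version-script=libtest.map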

I am still a bit concerned about the inline functions becoming global symbols after explicit template instantiation.
I’ll file a bug report for this as you indicated.

This maps to ticket ID 4853553. We will post the conclusion here once it has been resolved internally.

Due to implementation constraints, ‘shadow’ __device__ functions and function template definitions are present in the code sent to the host compiler. This is required for some corner cases, e.g. when the address of a __device__ function initializes a template default argument (templates are usually preserved in the code sent to the host compiler).
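
A sketch of the kind of corner case meant here (illustrative only):

#include <cuda.h>

__device__ int dev_fn() { return 0; }

// Templates are preserved in the code sent to the host compiler, so the
// host compiler also sees this default argument; it therefore needs a
// (shadow) declaration of dev_fn -- which is why the shadow exists.
template <int (*F)() = dev_fn>
struct user_of { };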

As a result, a program cannot have the same symbol defined as __host__ in one translation unit and __device__ in another; i.e., the execution space annotations (__host__ / __device__) must match for all declarations of the symbol across translation units.

One workaround is to give the device functions internal linkage, by adding ‘static’ or placing them inside an unnamed namespace.
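
Applied to the second example above, that looks like this (sketch):

#include <cuda.h>

// internal linkage: no host-visible shadow symbol is exported, so the
// name can no longer collide with the host foo() in test2.cpp
static __device__ int foo()
{
  return 0;
}

// placing it in an unnamed namespace achieves the same:
// namespace { __device__ int foo() { return 0; } }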

Hope this helps. We are closing this as Not a Bug.