Hi Robert,
Sorry if my post was not fully clear. To give a bit more context: sure, if I call the device function directly from host code, I get a compile error. But in our case the problem appears when linking object files or libraries that use the same function names for device code and for host code.
Here are two complete examples that fail: one for the template case using static linking, and one for the non-template case using shared objects.
For convenience, I have also uploaded all files as a tarball here: link
Example 1:
main.cpp:
template <class T> class c
{
public:
int foo();
};
int main(int, char**)
{
c<int> x;
x.foo();
return 0;
}
test.cu:
#include <cuda.h>
template <class T> class c
{
public:
__device__ inline int foo();
};
template <class T> __device__ inline int c<T>::foo()
{
return 0;
}
template class c<int>;
test2.cpp:
#include <cstdio>
template <class T> class c
{
public:
inline int foo();
};
template <class T> inline int c<T>::foo()
{
printf("foo from test2.cpp\n");
return 0;
}
template class c<int>;
Compilation and execution, test.sh:
nvcc -c test.cu -o test.o --compiler-bindir gcc-13 -O0 -g
c++ -c test2.cpp -o test2.o -O0 -g
c++ -c main.cpp -o main.o -O0 -g
c++ -o test main.o test.o test2.o -L /opt/cuda/lib64 -lcuda -lcudart -g
./test
echo RETURN VALUE $?
Now, when I run it, I get:
qon@qon ~/tmp4/example_1 $ ./test.sh
RETURN VALUE 1
The expected output would be:
foo from test2.cpp
RETURN VALUE 0
And with nm I get:
qon@qon ~/tmp4/example_1 $ nm -C -U -g test.o
0000000000000000 W c<int>::foo()
Example 2:
main.cpp:
int foo();
int main(int, char**)
{
foo();
return 0;
}
test.cu:
#include <cuda.h>
__device__ int foo()
{
return 0;
}
test2.cpp:
#include <cstdio>
int foo()
{
printf("foo from test2.cpp\n");
return 0;
}
test.sh:
nvcc test.cu --shared -o libtest.so --compiler-bindir gcc-13 -O0 -g
c++ --shared test2.cpp -o libtest2.so -O0 -g
c++ main.cpp -o test -O0 -g -L. -ltest -ltest2 -L /opt/cuda/lib64 -lcuda -lcudart
LD_LIBRARY_PATH+=:. ./test
echo RETURN VALUE $?
Here too, the symbol of the device function is called, and the program exits with return value 1:
qon@qon ~/tmp4/example_2 $ ./test.sh
RETURN VALUE 1
And from nm I get:
qon@qon ~/tmp4/example_2 $ nm -C -U -g libtest.so
000000000000ac00 T foo()
So in both cases, nvcc created a host-visible symbol for the device function. I disassembled it, and it essentially calls exit() with return code 1, which is exactly what I observe at run time. I tried the same using clang for CUDA compilation instead of nvcc, and then it behaves as expected.
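In case it helps anyone hitting the same thing: one possible way to sidestep the clash is to give the device-side code a different mangled name, for example by putting it into its own namespace. Below is a minimal sketch of that idea, not a tested fix for the nvcc behaviour itself. The namespace `gpu`, the macro `DEVICE_FN`, and the helper `check_no_clash` are names I made up for illustration. It assumes the device-side class can be renamed, and it is written so that the same file also compiles as plain C++, where `DEVICE_FN` expands to nothing.

```cpp
#include <cassert>
#include <cstdio>

// nvcc defines __CUDACC__ when compiling CUDA sources. In a plain C++ build the
// macro below is empty. DEVICE_FN is a hypothetical name, not from the examples.
#ifdef __CUDACC__
#define DEVICE_FN __device__
#else
#define DEVICE_FN
#endif

// Device-side class, moved into a (hypothetical) namespace "gpu".
// Its member mangles as _ZN3gpu1cIiE3fooEv instead of _ZN1cIiE3fooEv.
namespace gpu {
template <class T> class c
{
public:
    DEVICE_FN inline int foo();
};
template <class T> DEVICE_FN inline int c<T>::foo()
{
    return 0;
}
template class c<int>; // emits gpu::c<int>::foo(), not c<int>::foo()
} // namespace gpu

// Host-side class, unchanged from test2.cpp in Example 1.
template <class T> class c
{
public:
    inline int foo();
};
template <class T> inline int c<T>::foo()
{
    std::printf("foo from test2.cpp\n");
    return 0;
}
template class c<int>; // emits c<int>::foo()

#ifndef __CUDACC__
// Plain C++ sanity check only: under nvcc, calling the device member from
// host code would be a compile error, hence the guard.
inline void check_no_clash()
{
    int host_ret = c<int>().foo();     // host version, prints its message
    int gpu_ret = gpu::c<int>().foo(); // namespaced version, prints nothing
    assert(host_ret == 0 && gpu_ret == 0);
    (void)host_ret;
    (void)gpu_ret;
}
#endif
```

Because the two definitions now have different symbol names, whatever host-visible symbol gets emitted for the device function can no longer be picked up for the host call, regardless of link order. The same renaming applies to the free function in Example 2. Of course this only hides the problem; the stub symbol is still generated.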