I have a CUDA program with one kernel and several device functions called by the kernel. It all compiles (variables all declared and aligned in function declarations and specifications etc).
But I get the following error, and only for one of the device functions:
“error : external calls not supported (found non-inlined call to _Z12 functionname ifPfS_S_S_S_S…”
What does this mean and how can the error be rectified?
Would it be better to ditch the device functions and have one massive kernel?
Well, it looks like one call to a device function did not get inlined, and all functions should get inlined. Why? I have no clue. If you add -keep, you might see a reason in the .ptx file.
Are the device functions in the same compilation unit as the kernel? I would guess that if you compiled multiple .cu files separately with nvcc, then the device functions would be “external”.
OK, I guess my intuition was incorrect. We’ll really need to see a minimal code that produces this error to help you further.
To (1) and (2): just like in normal C
__device__ float myfunc(float a)
{
    return a*2.0f;
}
__global__ void mykernel(...)
{
    ....
    float c = myfunc(d);
    ....
}
The GPU doesn’t have the full stack system that normal CPUs do for passing arguments to functions. There is a call instruction, so functions are not required to be inlined, but inlining opens up a lot of opportunities for register optimization, so nvcc inlines all device function calls by default.
I am encountering exactly the same issue. Did you ever find a solution? I am calling device functions from my kernel and I get the same error message, without any other compilation errors.
My code fails to compile with the same error. I put the kernel in one file (myKernel.cu) and the functions it calls in another file (myFunc.cpp). In myFunc.cpp, I add __device__ and __host__ before those functions since they are called by both the GPU and the CPU. Then the same error appears.
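One possible way around the split-file setup described above, as a minimal sketch: it assumes the toolchain cannot resolve device function calls across separately compiled files, which is what the “external calls” wording suggests. The shared functions are defined in a header that the .cu file includes, so the definition sits in the same compilation unit as the kernel. The names myFunc.h, HOST_DEVICE and scale_by_two, and the kernel's parameter list, are made up for illustration.

```cpp
// myFunc.h (hypothetical name): included by myKernel.cu and by ordinary .cpp files.
#ifndef MYFUNC_H
#define MYFUNC_H

// __CUDACC__ is defined when nvcc compiles CUDA source. A plain C++ compiler
// does not know the CUDA qualifiers, so the macro expands to nothing there.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Defined inline in the header, so every file that includes it (including
// the one holding the kernel) sees the full definition, not just a prototype.
HOST_DEVICE inline float scale_by_two(float a)
{
    return a * 2.0f;
}

#endif // MYFUNC_H

// In myKernel.cu, after #include "myFunc.h" (this part is seen only by nvcc):
#ifdef __CUDACC__
__global__ void mykernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = scale_by_two(in[i]);  // definition is visible, so it can be inlined
}
#endif
```

Host code can call scale_by_two directly as well, which covers the “called by both GPU and CPU” case without a second copy of the function.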
This is a total guess here as I’m not really an accomplished C programmer and even less so a CUDA one, but maybe it’s because this device function is not aware of the TILE_C type because the type declaration is not in the .cu file, even though it is being passed into the function as a parameter?
well i’m not getting the same error message as you guys, but when i call a device function from the kernel, the compiler complains, ‘identifier (function name) is undefined’.
hey i figured out the problem after i posted about it. i hadn’t written the device function prototype before writing the function body. i wasn’t sure device functions required a declaration, but once i wrote the prototype, the code compiled. thanks for your reply.
PS: the above holds only for functions with __device__ qualifiers and not for kernels (__global__).
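The prototype fix described above might look like the sketch below, assuming the “identifier is undefined” error came from calling a function before the compiler had seen any declaration of it. The names HOST_DEVICE, square_plus and iterate are invented for the example, and the macro only exists so the same snippet also builds with a plain C++ compiler.

```cpp
// Expands to the CUDA qualifiers under nvcc and to nothing otherwise.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Prototype: makes the name and signature known before the first call.
HOST_DEVICE float square_plus(float a, float c);

// The caller appears before the definition of square_plus. Without the
// prototype above, the call below would be an undefined identifier.
HOST_DEVICE float iterate(float a, float c, int steps)
{
    for (int i = 0; i < steps; i++)
        a = square_plus(a, c);
    return a;
}

// Definition placed after its first use, in the same file.
HOST_DEVICE float square_plus(float a, float c)
{
    return a * a + c;
}
```

Moving the definition of square_plus above iterate would work equally well. The prototype is only needed when the call comes first.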
I was experiencing this error with Visual Studio 2010 and Visual Studio 2008 when trying to compile samples from “CUDA by Example” that included device function calls. I am using Parallel Nsight 1.5 and was always getting the “EXTERNAL CALLS NOT SUPPORTED” error when compiling. After a long and frustrating voyage of trial and error, the response above gave me the idea that there may be differences between the debugging capability of __global__ kernels and __device__ methods, such as the one below:
for (i=0; i<200; i++) {
a = a * a + c;
if (a.magnitude2() > 1000)
return 0;
}
Basically, I disabled “Generate GPU debug information” for the files where device functions exist, and things work. To disable it, right-click on your code file and choose Properties ==> Configuration Properties. In VS 2010, go to CUDA C++ ==> Device ==> Generate GPU debug information ==> No.
In VS 2008, it is under Runtime API ==> GPU ==> Generate GPU debug information ==> No.