ERROR: EXTERNAL CALLS NOT SUPPORTED

I have a CUDA program with one kernel and several device functions called by the kernel. Everything else compiles cleanly (all variables are declared, and the function declarations and definitions match up).

But I get just the one following error, for only one of the device functions:

“error : external calls not supported (found non-inlined call to _Z12 functionname ifPfS_S_S_S_S…”

What does this mean and how can the error be rectified?

Would it be better to ditch the device functions and have one massive kernel?

Well, it looks like one call to a device function did not get inlined, and all device functions should get inlined. Why it didn't? I have no clue. If you add -keep, you might see a reason in the .ptx file.

Are the device functions in the same compilation unit as the kernel? I would guess that if you compiled multiple .cu files separately with nvcc, then the device functions would be “external”.

Yes, all device functions are in the one .cu file with the kernel and host main function.

How should I

  1. declare and define a device function?

  2. call a device function from the kernel?

Why should a function be inlined explicitly?

OK, I guess my intuition was incorrect. We’ll really need to see a minimal code that produces this error to help you further.

To (1) and (2): just like in normal C

__device__ float myfunc(float a)
{
    return a*2.0f;
}

__global__ void mykernel(...)
{
    ....
    float c = myfunc(d);
    ....
}

The GPU doesn’t have the full stack system that normal CPUs do for passing arguments to functions. There is a call instruction, so functions are not strictly required to be inlined, but inlining opens up a lot of opportunities for register optimization, so nvcc inlines all device function calls by default.
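As a sketch of how that inlining can be controlled explicitly: nvcc accepts the `__forceinline__` and `__noinline__` qualifiers on device functions (the function names here are made up for illustration). On architectures without a real call mechanism, a `__noinline__` function that the compiler cannot inline is exactly what produces this error:

```cuda
// Hypothetical example: controlling inlining of device functions.

// Ask the compiler to inline this call into every caller.
__device__ __forceinline__ float scale2(float a)
{
    return a * 2.0f;
}

// Ask the compiler NOT to inline. On hardware with no proper
// function-call support this can trigger "external calls not supported".
__device__ __noinline__ float scale3(float a)
{
    return a * 3.0f;
}

__global__ void mykernel(float *out, const float *in)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    out[idx] = scale2(in[idx]) + scale3(in[idx]);
}
```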

Hi,

I am encountering exactly the same issue. Did you ever find a solution? I am calling device functions from my kernel and I get the same error message, without any other compilation errors.

Thanks

I also have the same problem.

What should I do? Does someone have a solution?

Can you please post a code segment that reproduces the error?

My code fails to compile with the same error. What I do is put the kernel in one file (myKernel.cu) and the functions it calls in another file (myFunc.cpp). In myFunc.cpp I add __device__ and __host__ before those functions, since they are called from both GPU and CPU code. Then the same error comes out.

//file1: myFunc.cpp

__device__ __host__ int myFunc1( int *a )
{
    return 2 * (*a);
}

....

//file2: myKernel.cu

__global__ void theKernel( int *a )
{
    ...
    myFunc1( a );
    ...
}

I tried adding “extern” before myFunc1, but it does not work. Is there any compiler configuration trick I should know about?
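A common workaround (a sketch, with hypothetical file and function names): older CUDA toolkits have no device-code linker, so every __device__ function must be visible in the same translation unit as the kernel that calls it. Moving the function body into a header that the .cu file includes achieves that without merging the source files:

```cuda
// myFunc.h -- included by the .cu file, so the device code is
// compiled in the same translation unit as the kernel.
#ifndef MYFUNC_H
#define MYFUNC_H

__device__ __host__ inline int myFunc1(int *a)
{
    return 2 * (*a);
}

#endif

// myKernel.cu
// #include "myFunc.h"
__global__ void theKernel(int *a)
{
    *a = myFunc1(a);
}
```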

I have the same error too, but in my case it pops up when I try to create a structure variable inside the device function.

[codebox]

__device__ float waveletTransformMethod2(int i, int j, TILE_C *tileMother, WAVELET_C *wave) {

    float valToReturn = 0.0f;

    TILE_C tileForTransform;

    createTileC(j, i, &tileForTransform);

    ............

[/codebox]

The problem crops up at the line TILE_C tileForTransform… any ideas??

Thanks

This is a total guess, as I’m not really an accomplished C programmer and even less so a CUDA one, but maybe the device function is not aware of the TILE_C type because the type declaration is not in the .cu file, even though a value of that type is being passed into the function as a parameter?

Probably way off, but just in case…

Well, I’m not getting the same error message as you, but when I call a device function from the kernel, the compiler complains: ‘identifier (function name) is undefined’. :blink:

Can someone help me out? :)

Can you elaborate on where you have declared your functions? Are they all in a single file, or in multiple files?

Posting your code will be helpful.

Hey, I figured out the problem after I posted about it. I hadn’t written the device function prototype before the function body. I wasn’t sure device functions required a declaration, but once I wrote the prototype, the code compiled. Thanks for your reply. :thumbup:

PS: the above holds only for functions with __device__ qualifiers and not for kernels (__global__).
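A minimal sketch of that fix (hypothetical names): declare the __device__ prototype before the kernel that calls it, and define the body afterwards:

```cuda
// Forward declaration (prototype) of the device function,
// placed before the kernel that calls it.
__device__ float twice(float a);

__global__ void mykernel(float *data)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    data[idx] = twice(data[idx]);
}

// The definition can follow the kernel once the prototype is visible.
__device__ float twice(float a)
{
    return a * 2.0f;
}
```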

I had exactly the same problem and my solution was: right-click on the project->properties->Runtime-API->GPU.

There are the settings for GPU-Architecture(1),GPU-Architecture(2) and GPU-Architecture(3). They are set to

-sm 1.0 (GPU-Architecture(1))

-sm 2.0 (GPU-Architecture(2))

-0 (GPU-Architecture(3)).

I have only one GPU in my machine with CUDA capability major/minor version number 1.2.

So I set GPU-Architecture(2) to 0 and GPU-Architecture(1) to sm 1.2. Now it works and the problem is solved.
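For those building from the command line, the equivalent (a sketch; the exact flag spelling depends on the toolkit version, and the file names are hypothetical) is to pass a single -arch matching the installed device instead of also targeting sm_20 on hardware that doesn’t support it:

```shell
# Hypothetical build command: target only the compute capability of the
# installed GPU (here 1.2) rather than generating sm_20 code as well.
nvcc -arch=sm_12 myKernel.cu -o myKernel
```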

I was experiencing this error with Visual Studio 2010 and Visual Studio 2008 when trying to compile samples from “CUDA by Example” that included device calls. I am using Parallel Nsight 1.5 and was always getting the “EXTERNAL CALLS NOT SUPPORTED” exception when compiling. After a long and frustrating voyage of trial and error, the response above gave me the idea that there may be differences in debugging capability between __global__ kernels and __device__ functions, such as the one below:

__device__ int julia( int x, int y ) {

    const float scale = 1.5;
    float jx = scale * (float)(DIM/2 - x)/(DIM/2);
    float jy = scale * (float)(DIM/2 - y)/(DIM/2);

    cuComplex c(-0.8, 0.156);
    cuComplex a(jx, jy);

    int i = 0;
    for (i=0; i<200; i++) {
        a = a * a + c;
        if (a.magnitude2() > 1000)
            return 0;
    }

    return 1;
}

Basically, I disabled Generate GPU debug information where device functions exist, and things work. To disable, right-click on your code file, Properties ==> Configuration Properties ==> and in VS 2010 go to CUDA C++ --> Device --> Generate GPU Debug Information --> No.

In VS 2008 it is under Runtime API ==> GPU ==> Generate GPU Debug Information --> No.

Try adding __device__ to the constructor line inside the struct:

struct cuComplex {

    float   r;
    float   i;
    cuComplex( float a, float b ) : r(a), i(b)  {}  // This line
    ...
};

so that it reads:

__device__ cuComplex( float a, float b ) : r(a), i(b)  {}
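Put together, a sketch of the struct with every device-called member function qualified (the member functions shown here follow the shape of the book’s julia example above; treat the details as illustrative):

```cuda
// Sketch: every member function used from device code, including the
// constructor, is marked __device__ so nvcc can inline the calls.
struct cuComplex {
    float r;
    float i;
    __device__ cuComplex( float a, float b ) : r(a), i(b) {}
    __device__ float magnitude2( void ) {
        return r * r + i * i;
    }
    __device__ cuComplex operator*(const cuComplex& a) {
        return cuComplex(r*a.r - i*a.i, i*a.r + r*a.i);
    }
    __device__ cuComplex operator+(const cuComplex& a) {
        return cuComplex(r+a.r, i+a.i);
    }
};
```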

Thanks yiguro

I am having a similar problem. Can you explain further, perhaps with some code?