Non-inlined device functions for compute capability 2.0?

Hello,

does CUDA (nvcc 3.2) support non-inlined device functions for compute capability 2.0?

I could not find an up-to-date answer to that question. Older posts say that Fermi is going to support it…

I use “Cuda compilation tools, release 3.2, V0.2.1221” and compile with the -arch=sm_20 option, and I get the error “Error: External calls are not supported (found non-inlined call to …)”.

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2010 NVIDIA Corporation
Built on Thu_Sep__9_17:06:50_PDT_2010
Cuda compilation tools, release 3.2, V0.2.1221

GPU:

GeForce GTX 470

Example:

kernel.cu

#include "devfunc.cuh"
#include "kernel.cuh"

__global__ void kernel() {
	testfunc();
}

kernel.cuh

#ifndef _MY_KERNEL_CUH_
#define _MY_KERNEL_CUH_

__global__ void kernel();

#endif

devfunc.cu

#include "devfunc.cuh"

__device__ void testfunc() {
}

devfunc.cuh

#ifndef __TEST_FUNC_CUH_
#define __TEST_FUNC_CUH_

__device__ void testfunc();

#endif

main.cu

#include "kernel.cuh"

int main(int argc, char **argv) {
	kernel<<<1,1>>>();
	return 0;
}

nvcc -c -arch=sm_20 kernel.cu

./kernel.cu(5): Error: External calls are not supported (found non-inlined call to _Z8testfuncv)

related posts:

Incremental compilation in nvcc cuda

Calling a class from cuda-kernel

  1. Edit: Example added

  2. Edit: related entries added

AFAIK non-inlined calls are supported on Fermi, but the functions still have to be in the same source file as there is no linker.

Thank you for your answer tera,

could you give an example? And what do you mean by “there is no linker”?

Here is a much shorter equivalent example to the one above with the same error:

kernel.cu

__device__ void testfunc();

__global__ void kernel() {
	testfunc();
}

nvcc -c -arch=sm_20 kernel.cu

./kernel.cu(5): Error: External calls are not supported (found non-inlined call to _Z8testfuncv)

“There is no linker” means what it says: there is no linkage phase for device code. You must declare and define all device symbols and code in the same compilation unit, that is, in one file or in files included into that file.

In your example, the code for testfunc has to be in the same file (or included into the same file) where it is called.
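As an illustration of the point above, here is a minimal sketch of a layout that should compile, assuming the body of testfunc is placed in the same file as the kernel (or in a header that the kernel file includes). The int parameter and return value, the out pointer, and the host helper runKernel are hypothetical additions made only so that the result can be observed; they are not part of the original example.

```cuda
#include <cuda_runtime.h>

// Full definition, visible in the same compilation unit as the caller.
// In the multi-file layout this body would sit in devfunc.cuh, not devfunc.cu.
__device__ int testfunc(int x) {
    return x + 1;
}

__global__ void kernel(int *out, int in) {
    // The call resolves inside this compilation unit, so no device linker is needed.
    *out = testfunc(in);
}

// Hypothetical host helper: launches one thread and returns its result,
// or -1 if a CUDA runtime call fails.
int runKernel(int in) {
    int *d_out = 0;
    int h_out = -1;
    if (cudaMalloc((void **)&d_out, sizeof(int)) != cudaSuccess) {
        return -1;
    }
    kernel<<<1, 1>>>(d_out, in);
    // A blocking cudaMemcpy waits for the preceding kernel on the default stream.
    if (cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) {
        h_out = -1;
    }
    cudaFree(d_out);
    return h_out;
}
```

Built with the same -arch option as in the question, runKernel(in) is expected to return in + 1 on a working device.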

Thank you, avidday.

So there is a difference between non-inlined functions and external functions, which are linked in the linkage phase?

Then the error message “External calls are not supported (found non-inlined call to _Z8testfuncv)” is very misleading, or am I wrong?

It seems like you are confusing several concepts. Whether function pointers are supported and whether functions must be declared inline (in the C++ inline keyword sense) are not mutually exclusive. Similarly, compilation unit and scope limitations and function pointer support are also not mutually exclusive.

You must define device functions within the compilation unit they are called in, and their behavior is similar to functions declared with the C++ inline keyword. And you can have function pointers to device functions. There is no contradiction between the three.
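A minimal sketch of how these pieces can coexist on a compute capability 2.x device: the device functions are defined in the same compilation unit, marked with the __noinline__ qualifier (a hint to the compiler, not a guarantee), and called through a device function pointer. The names IntOp, addOne, timesTwo, applyOp and runApplyOp are hypothetical and chosen only for this illustration.

```cuda
#include <cuda_runtime.h>

typedef int (*IntOp)(int);

// Defined in this compilation unit; __noinline__ asks the compiler not to inline them.
__device__ __noinline__ int addOne(int x)   { return x + 1; }
__device__ __noinline__ int timesTwo(int x) { return 2 * x; }

__global__ void applyOp(int *out, int in, int which) {
    // The address of a device function is taken in device code,
    // and the call goes through the pointer.
    IntOp op = (which == 0) ? addOne : timesTwo;
    *out = op(in);
}

// Hypothetical host helper: runs one thread and returns its result,
// or -1 if a CUDA runtime call fails.
int runApplyOp(int in, int which) {
    int *d_out = 0;
    int h_out = -1;
    if (cudaMalloc((void **)&d_out, sizeof(int)) != cudaSuccess) {
        return -1;
    }
    applyOp<<<1, 1>>>(d_out, in, which);
    if (cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) {
        h_out = -1;
    }
    cudaFree(d_out);
    return h_out;
}
```

Whether a real call instruction is emitted is up to the compiler, but either way the function bodies still have to be visible in the file where they are used.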

You are right, I had actually assumed that separate compilation units and non-inlined calls go together for device code. I was hoping that device functions could be split across different compilation units, compiled to object files, and linked to a kernel later, similar to C++ compiling and linking. But as you already said, that is not supported for device code, since there is no linkage phase and only one compilation unit. So incremental compilation for device code is not possible yet. I assumed that the non-inline functionality for compute capability 2.x devices would bring incremental compilation for device code, but as you already said, these are two different concepts.

Thank you for your help, avidday!