Can __device__ functions be externally defined functions?

My __device__ functions are only declared, not implemented in this library.
When I compile with
$(EXEC) $(NVCC) $(GENCODE_FLAGS) -Xcompiler -fPIC --device-link ./tmp1/*.o --output-file link.o
I get the error: Undefined reference to…

But a __global__ function that is only declared and not implemented is OK, and the nm command shows the __global__ function as a U-type symbol, meaning an externally defined function.

// compile OK
__global__ void kernelA();
void proc()
{
    kernelA<<<1,1>>>();
}

// compile ERROR: Undefined reference to deviceA
__device__ void deviceA();
__global__ void kernelA()
{
    deviceA();
}
void proc()
{
    kernelA<<<1,1>>>();
}

I am not quite sure what you are asking, but I think you want to look into separate compilation and linking of CUDA device code in more detail. A good starting point may be the following blog post:

Note that __global__ functions are called from host code, therefore tools for handling host code, like nm, can be used to inspect the reference. But nm knows nothing about device code.

I read the blog, but my question is this: if a __device__ function is only declared and not implemented, compilation fails with the error: Undefined reference to. I want to solve this problem.

"a __device__ function is only declared and not implemented, compilation fails with the error"

In your example, kernelA() calls deviceA() but deviceA() is not defined in the same compilation unit. Unless you have another separate compilation unit that defines deviceA(), and link the object file generated from that, the program is obviously incomplete, and an error will result. Here is a simple example of working with more than one compilation unit:

A header file my_device_funcs.h to export deviceA():

#ifndef MY_DEVICE_FUNCS_H_
#define MY_DEVICE_FUNCS_H_

__device__ int deviceA (int x);

#endif // MY_DEVICE_FUNCS_H_

Define the function deviceA() in a file my_device_funcs.cu:

#include "my_device_funcs.h"

__device__ int deviceA (int x)
{
    return x * x;
}

Here is the main program, in a file my_main.cu:

#include <stdio.h>
#include <stdlib.h>
#include "my_device_funcs.h"

__global__ void kernel (int x)
{
    printf ("GPU: %d\n", deviceA (x));
}

int main (void)
{
    kernel<<<1,1>>>(5);
    cudaDeviceSynchronize(); // wait for the kernel and flush device-side printf output
    return EXIT_SUCCESS;
}

Compile the device function into an object file:
nvcc -rdc=true -c -o my_device_funcs.obj my_device_funcs.cu

Compile the main program into an executable, linking the previously generated object file:
nvcc -rdc=true -o my_main.exe my_main.cu my_device_funcs.obj

We now have an executable my_main.exe that when invoked prints:
GPU: 25

Note that, for the purpose of linking, multiple object files can also be combined into a static or dynamic library.

Yes, the implementation of the __device__ function is in another library, and this library only has the declarations.
A __global__ function with only a declaration and no implementation compiles OK; a __device__ function with only a declaration and no implementation gives an error.

If your code calls a function (it does not matter whether in host or device code) whose object code is not accessible to the linker, that is an error and an error message should result as it is not possible to complete the building of the executable. That is no different from pure host code. If you try to build an executable from

int main (void)
{
    foo (4);
    return 0;
}

an error message will be emitted, for example:

error LNK2019: unresolved external symbol foo referenced in function main
fatal error LNK1120: 1 unresolved externals

__global__ functions behave no differently in that respect. Example:

__global__ void kernelA(int);
int main (void)
{
    kernelA<<<1,1>>>(5);
    return 0;
}

Trying to compile the above results in an error message:

tmpxft_000023f4_00000000-18_foobar.obj : error LNK2019: unresolved external symbol "void __cdecl kernelA(int)" (?kernelA@@YAXH@Z) referenced in function main
fatal error LNK1120: 1 unresolved externals

References can remain unresolved when an object file is created, but they must be resolved when linking together the executable. For example, in my example above, I could also build the code like this:

nvcc -rdc=true -c -o my_device_funcs.obj my_device_funcs.cu
nvcc -rdc=true -c -o my_main.obj my_main.cu
nvcc -rdc=true -o my_main.exe my_main.obj my_device_funcs.obj

At the second step, deviceA() is an (as of yet) undefined external symbol. This is resolved in the third step which performs linking to create an executable by including my_device_funcs.obj which defines deviceA().
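
To make the unresolved-then-resolved reference more visible, here is a minimal sketch of a variant of the main program. It is only an illustration under stated assumptions: the file name my_main_checked.cu, the CHECK macro, and the result-passing kernel are hypothetical and not part of the example above. It declares deviceA() directly with extern instead of including the header, and it checks the CUDA runtime calls so that launch failures are reported.

```cuda
// my_main_checked.cu (hypothetical file name): variant of my_main.cu.
//
// Build in three steps, assuming my_device_funcs.cu defines deviceA():
//   nvcc -rdc=true -c -o my_device_funcs.obj my_device_funcs.cu
//   nvcc -rdc=true -c -o my_main_checked.obj my_main_checked.cu
//   nvcc -rdc=true -o my_main_checked.exe my_main_checked.obj my_device_funcs.obj
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf (stderr, "CUDA error: %s (line %d)\n",            \
                     cudaGetErrorString (err_), __LINE__);            \
            exit (EXIT_FAILURE);                                      \
        }                                                             \
    } while (0)

// Declaration only. The definition lives in another compilation unit, so this
// remains an unresolved external device symbol until the final link step.
extern __device__ int deviceA (int x);

// Store the result in device memory instead of printing from the GPU.
__global__ void kernel (int x, int *result)
{
    *result = deviceA (x);
}

int main (void)
{
    int *d_result = NULL;
    int h_result = 0;

    CHECK (cudaMalloc ((void **)&d_result, sizeof (int)));
    kernel<<<1,1>>>(5, d_result);
    CHECK (cudaGetLastError ());   // reports launch errors
    // The copy waits for the kernel to finish before reading the result.
    CHECK (cudaMemcpy (&h_result, d_result, sizeof (int),
                       cudaMemcpyDeviceToHost));
    printf ("GPU: %d\n", h_result);
    CHECK (cudaFree (d_result));
    return EXIT_SUCCESS;
}
```

If my_device_funcs.obj is left out of the last build command, the link should fail with an unresolved reference to deviceA(), which is the situation described in the question. With it included, and with deviceA() defined as x * x as above, the program should print GPU: 25.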

nvcc --device-c a.cu b.cu
nvcc --device-link a.o b.o --output-file link.o    # There was an error in this step
ar -r ./tmp1/abc.so a.o b.o link.o

I need to compile it into a shared library.

Do you need a static library or a dynamic library? The use of the archiver ar suggests a static library, but I am confused by abc.so, since the suffix .so suggests a dynamic library to me. Using ar to build a dynamic library seems incorrect to me; gcc -shared or something of that sort should be used.

For a worked example of how to build a static library with CUDA on Linux, see my forum post here

If you need to build a dynamic library, I cannot help. It has probably been ten years or more since I last needed to build a dynamic library on Linux, and I do not have a Linux system at hand right now to refresh my memory. I suspect the title of this thread will not entice many readers to follow it to the end, so consider asking a new question on how to build a dynamic library with CUDA on Linux, as that is the task you are actually trying to accomplish from what I understand now.