NVCC linking __host__ __device__ functions

Hello, I have a program similar to this:

__host__ __device__
void foo();

void foo() {
    // do something
}

int main() {
    foo();
}

The host implementation for foo() is provided in a separate .so.

The problem is that when I compile the program with nvcc, it generates a host stub for foo() that does nothing, instead of linking against the foo() in the external .so. How can I tell nvcc not to generate host functions that don’t exist in the source?

It’s not legal in CUDA to do this:

__host__ void foo() {}

__device__ void foo() {}

or any variant of it. That is effectively what you are trying to do, and it is illegal: you cannot provide the host definition separately from the device definition.


To be more precise, you cannot overload functions based solely on the __host__ and __device__ decorators. For a given function, you may not provide the host definition separately from the device definition.
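One commonly used way to live with this restriction is to keep a single __host__ __device__ definition and branch inside its body on the __CUDA_ARCH__ macro, which nvcc defines only during the device compilation pass. The sketch below is a minimal illustration under stated assumptions, not a drop-in fix for the original program: foo_host_impl is a hypothetical name standing in for the function exported by the external .so, foo takes and returns an int purely so the two paths are distinguishable, and the fallback macros exist only so the file also builds with a plain C++ compiler.

```cpp
// Let the same file build as plain C++ when nvcc is not the compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical stand-in for the host implementation that the original
// program gets from a separate .so. In a real build this would only be
// declared here and resolved at link time.
int foo_host_impl(int x) {
    return x + 1;
}

// One definition for both execution spaces. __CUDA_ARCH__ is defined only
// during the device compilation pass, so each pass compiles one branch.
__host__ __device__ int foo(int x) {
#ifdef __CUDA_ARCH__
    return x * 2;             // device-side behavior (illustrative only)
#else
    return foo_host_impl(x);  // host side: forward to the external function
#endif
}
```

The host-side function needs a different name from foo, since the point of the restriction is that one name cannot have separate host and device definitions. Only the function body depends on __CUDA_ARCH__ here; the signature is identical in both passes.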

That is quite limiting. Are there plans to add support for this?

I’m not aware of any. You can request new capabilities in CUDA using the bug reporting form linked in a sticky post at the top of this forum. Marking the request as “enhancement” or “RFE” will make its intent clearer.

If this really bothers you, you could try compiling your code with clang++ instead. It handles the __host__ and __device__ attributes differently from nvcc and allows functions to be overloaded on them.

It is quite possible, though, that you would then run into issues with other differences between nvcc and clang++. Also, using a toolchain supported by NVIDIA may be important to you.