__host__ __device__
void foo();

__device__
void foo() {
    // do something
}
...
int main() {
    foo();
    ...
}
The host implementation for foo() is provided in a separate .so.
The problem is that when I compile the program with nvcc, it generates a do-nothing stub for foo() in the host code instead of linking against the foo() in the external .so. How can I tell nvcc not to make up host functions that don't exist in the source?
To be more precise, you cannot overload functions based solely on the __host__/__device__ decorators. You may not provide the host function definition separately from the device function definition for the same function.
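A common workaround (a sketch, not something stated in this thread) is to give foo() a single __host__ __device__ definition and branch on the __CUDA_ARCH__ macro, which is defined only during the device compilation pass. The host branch can then forward to an ordinary host function exported by the external .so; the name foo_host() here is hypothetical.

```cuda
// Plain host function; its definition lives in the external .so
// and is resolved by the host linker. (foo_host is a made-up name.)
void foo_host();

__host__ __device__
void foo() {
#ifdef __CUDA_ARCH__
    // Device compilation pass: device-side implementation.
    // do something
#else
    // Host compilation pass: forward to the implementation in the .so.
    foo_host();
#endif
}
```

The trade-off is that both paths now live in one translation unit, but no host/device overloading is required, so this compiles with stock nvcc.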
I’m not aware of any. You can request new capabilities in CUDA via the bug reporting form linked in a sticky post at the top of this forum. You can mark it “enhancement” or “RFE”, which will make the intent clearer.
It is quite possible, though, that you will then run into other differences between nvcc and clang++. Also, staying on a toolchain supported by NVIDIA may be important to you.