Is it possible to compile CUDA kernels in a .cu file that are directly callable by multiple different applications after statically linking?

I am getting errors when I try to call some of my own CUDA kernels that are used by two separate applications.

I declare my CUDA kernels in a shared header using the `__global__` qualifier and define them in a similarly named `.cu` file. Both applications compile and link fine, but when I call the CUDA kernels I get an "invalid configuration argument" error.

Yes, it works if I write a non-CUDA wrapper function to call the CUDA kernels, but I want to call the kernels directly. Is that even possible unless they are in the same file, or something silly like that?
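For context, the wrapper approach I'm using looks roughly like this (a minimal sketch; the names `kernel.cu` and `launch_k` are just placeholders, not my real code):

```cuda
// kernel.cu -- compiled with nvcc; file and function names are hypothetical
#include <cstdio>

__global__ void k() { std::printf("hello from kernel\n"); }

// Plain C-callable wrapper: hides the <<<...>>> launch syntax so that
// host-only translation units (built with a non-CUDA compiler) can call it.
extern "C" void launch_k() {
    k<<<1, 1>>>();
    cudaDeviceSynchronize();
}
```

Each application then just declares `extern "C" void launch_k();` and links against the object file produced by nvcc. That works, but I'd rather launch the kernel directly from each application.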

Thanks.

I don’t know if I have fully understood your question, but I don’t seem to have any difficulty with something like what you describe:

# cat t1.cu
#include <cstdio>
__global__ void k() {printf("hello from kernel\n");}
# cat t1.cuh
__global__ void k();
# cat t2.cu
#include "t1.cuh"

int main(){
        k<<<1,1>>>();
        cudaDeviceSynchronize();
}
# nvcc -o test t1.cu t2.cu
# compute-sanitizer ./test
========= COMPUTE-SANITIZER
hello from kernel
========= ERROR SUMMARY: 0 errors
#

Naturally, I could build another application, similar to test, that also uses the bits from t1.cu.

You might want to give a short but complete example of what is not working for you, just as I have provided a short but complete example of something that does seem to work.
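As a starting point for debugging, "invalid configuration argument" is reported by the launch itself, so checking `cudaGetLastError()` immediately after the launch usually narrows it down. A minimal sketch, reusing the `k()` kernel and `t1.cuh` header from the example above (the deliberately bad launch configuration is mine, for illustration):

```cuda
// t3.cu -- demonstrates catching a bad launch configuration
#include <cstdio>
#include "t1.cuh"   // declares __global__ void k();

int main() {
    // Deliberately bad configuration: 2048 threads per block exceeds the
    // 1024-thread limit on current GPUs, so the launch is rejected with
    // "invalid configuration argument".
    k<<<1, 2048>>>();
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        std::printf("launch failed: %s\n", cudaGetErrorString(err));
    cudaDeviceSynchronize();
    return 0;
}
```

If your launches pass a block or grid dimension computed at runtime, printing the dimensions right before the launch is often enough to spot the problem.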

