Question on using device functions: what is the standard practice for organizing them? I had been writing my device functions in separate files (DeviceFun1.cu, DeviceFun2.cu) and then ran into a problem at compile time: nvcc wanted all my device functions inlined.
So what should I do? As far as I can tell, the options are:
1. Declare everything __global__ and call it from the host.
2. Dump the contents of all my files above my kernel and deal with it.
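To make the second option concrete, here is a rough sketch of what it might look like if the device functions were kept in a header-style block that is textually included above the kernel, so nvcc sees the definitions in the same translation unit. All names here (DEVICE_FN, scale_and_offset, apply_kernel, apply_host, and the DeviceFun1.cuh file name in the comment) are hypothetical and not taken from the actual project. The macro is an assumption added so that the same helper also builds with a plain C++ compiler for host-side testing.

```cpp
// Sketch only: every name below is hypothetical, chosen for illustration.
#include <vector>

// Under nvcc, make helpers callable from both host and device code.
// Under a plain C++ compiler the CUDA qualifiers are dropped, so the
// same helper can be exercised in a host-only unit test.
#ifdef __CUDACC__
#define DEVICE_FN __host__ __device__ inline
#else
#define DEVICE_FN inline
#endif

// ---- would live in a header such as DeviceFun1.cuh (hypothetical) ----
DEVICE_FN float scale_and_offset(float x, float scale, float offset) {
    return x * scale + offset;
}

#ifdef __CUDACC__
// ---- kernel file: the header is included above this point, so the ----
// ---- helper's definition is visible and can be inlined.           ----
__global__ void apply_kernel(float* data, int n, float scale, float offset) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = scale_and_offset(data[i], scale, offset);
    }
}
#endif

// Host-side reference that calls the same helper; handy for checking
// the helper's logic without a GPU.
void apply_host(std::vector<float>& data, float scale, float offset) {
    for (float& v : data) {
        v = scale_and_offset(v, scale, offset);
    }
}
```

This keeps each group of helpers in its own file for maintenance while still giving the compiler the definitions it needs at the call site. The cost is that every kernel file including the header recompiles when a helper changes.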
I don’t like either option. Why can’t the compiler just link everything and figure out the inlining during the linking stage?
Specifically, I receive the compiler error: “Error: External calls are not supported”…
Any help or pointers on how to develop maintainable CUDA kernels that are longer than 500 lines?