You don’t have to use a DLL. What you do is this: write a .cu file containing a C/C++ function that copies the data to the device, launches the kernels, and copies the results back. Then forward declare that function in the file where you want to call CUDA from. After that, all you have to do is compile the .cu file to object code and link the resulting .obj file with the code that calls the function.
I know it’s a messy explanation, but I’m somewhat ill today, so I can’t figure out an easier way to put it.
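To make the steps above concrete, here is a minimal sketch of what such a .cu file could look like. The file name `scale.cu`, the kernel `scale_kernel`, and the wrapper `scale_on_gpu` are made-up names for illustration, and the kernel just multiplies an array by a factor. Treat it as one possible layout, not the only way to do it.

```cuda
// scale.cu (hypothetical file name), compiled with nvcc
#include <cuda_runtime.h>

// Illustrative kernel: multiplies each element of data by factor.
__global__ void scale_kernel(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Host wrapper that ordinary C/C++ code can call.
// extern "C" is a choice made here so the forward declaration on the
// caller side does not depend on C++ name mangling.
// Returns 0 on success, otherwise the CUDA error code as an int.
extern "C" int scale_on_gpu(float* host_data, int n, float factor)
{
    if (n <= 0) {
        return 0;  // nothing to do
    }

    float* dev_data = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Allocate device memory.
    cudaError_t err = cudaMalloc(&dev_data, bytes);
    if (err != cudaSuccess) {
        return static_cast<int>(err);
    }

    // Copy the input to the device.
    err = cudaMemcpy(dev_data, host_data, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        // Launch the kernel with enough blocks to cover n elements.
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scale_kernel<<<blocks, threads>>>(dev_data, n, factor);
        err = cudaGetLastError();  // catches launch errors
    }

    if (err == cudaSuccess) {
        // Copy the result back; this also waits for the kernel to finish.
        err = cudaMemcpy(host_data, dev_data, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(dev_data);
    return static_cast<int>(err);
}
```

In the file that calls it, the only CUDA-related line needed would be the forward declaration `extern "C" int scale_on_gpu(float* host_data, int n, float factor);`. Assuming the caller lives in a hypothetical `main.cpp`, something like `nvcc -c scale.cu -o scale.obj` followed by `nvcc main.cpp scale.obj -o app` should build it. If you link with your normal C++ linker instead, you also need to link against the CUDA runtime library (cudart).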