That walks you through the whole process (assuming you are using Visual Studio).
The CUDA code needs to be in a separate .cu file and called via an extern "C" function. However, you can call cuBLAS, cuFFT or cuSPARSE directly from a mexFunction.
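To make the separate-file arrangement concrete, here is a minimal sketch of what such a .cu file might look like. The file name, `scaleKernel` and `scaleOnGpu` are made-up names for illustration, not anything from the posts above; the only point is that the kernel and its launch stay in code compiled by nvcc, and only a plain C-linkage function is visible to the MEX side.

```cuda
// scale.cu (hypothetical file name): compiled by nvcc, not by mex.
#include <cuda_runtime.h>

// Device kernel: multiply each element by a constant.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Host wrapper with C linkage (hypothetical name), so a mexFunction built by
// an ordinary C/C++ compiler can declare and call it without name mangling.
// Returns 0 on success, otherwise the cudaError_t value cast to int.
extern "C" int scaleOnGpu(float *hostData, int n, float factor)
{
    if (n <= 0) {
        return 0;  // nothing to do
    }

    float *devData = NULL;
    size_t bytes = (size_t)n * sizeof(float);

    cudaError_t err = cudaMalloc((void **)&devData, bytes);
    if (err != cudaSuccess) {
        return (int)err;
    }

    err = cudaMemcpy(devData, hostData, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scaleKernel<<<blocks, threads>>>(devData, n, factor);
        err = cudaGetLastError();  // catches launch failures
    }
    if (err == cudaSuccess) {
        // Blocking copy on the default stream; also waits for the kernel.
        err = cudaMemcpy(hostData, devData, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(devData);
    return (int)err;
}
```

On the MEX side you would then only need a matching declaration, e.g. `extern "C" int scaleOnGpu(float *hostData, int n, float factor);` in a C++ gateway file, and call it from mexFunction on a single-precision buffer obtained with mxGetData (ideally a copy in an output array rather than the input itself). The object file produced by nvcc and the CUDA runtime library have to be passed to the link step as well.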
You can do this as a two-step process: first have nvcc generate the fatbin and stuff it into a .cpp file, and then Matlab's mex will be happy to compile/link that. Here's a slightly genericized snippet from one of my Makefiles:
I've had this complain if cudaCommon.o contains just `__device__` rather than `__device__ inline` functions, though; the linker claimed that I had multiple instances of a function definition.