I am developing an application for multi-view stereo reconstruction and I plan to run some of the algorithms on CUDA for better performance. The front end of the application will be developed in C#.
The issue I am having is calling CUDA code from the C# code/forms. What would be the best way to handle this?
It’s pretty easy, actually. Just decide whether you want to develop your kernels on top of the driver API or the runtime API. Then you use nvcc to compile your kernels to a .cubin, and P/Invoke the driver (or runtime) DLL functions that allocate memory, launch the kernels, etc.
The P/Invoke signature for the driver functions will be like this (for the cuMemAlloc function):
[DllImport("nvcuda")]
public static extern CUResult cuMemAlloc(ref CUdeviceptr dptr, uint bytesize);
Personally, I’d write a little wrapper class that has all the P/Invoke functions that you need in it. Or, you can try GASS’ CUDA.NET library…I think they have all the necessary functionality already in there.
Sarnath, you are correct there. P/Invoke calls incur about a 20-40 ms overhead due to the data marshalling between the managed and unmanaged code. So, if you need to call some very short CUDA kernels repeatedly from C#, your best bet is to make a separate unmanaged DLL in C/C++ that handles that part of things, and then you can P/Invoke that DLL instead of the CUDA driver/runtime itself (though that does take the GASS option off the table).
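To make the wrapper-DLL idea concrete, here is a minimal sketch of what its exported entry point might look like. The names `run_steps` and `launch_step` are hypothetical, and `launch_step` is a CPU stand-in for a real kernel launch so that the sketch builds without the CUDA toolkit. The point is only the shape: the loop over short kernel calls lives on the unmanaged side, so C# crosses the managed/unmanaged boundary once per batch.

```cpp
#include <cstddef>
#include <vector>

// Export macro so the same source builds as a DLL on Windows or as a
// shared library elsewhere.
#if defined(_WIN32)
#define WRAPPER_API extern "C" __declspec(dllexport)
#else
#define WRAPPER_API extern "C"
#endif

// HYPOTHETICAL stand-in for one short kernel launch. A real wrapper would
// launch a CUDA kernel on device memory here; this version scales the
// buffer on the CPU (assumption made so the sketch compiles anywhere).
static void launch_step(float* data, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}

// Single coarse-grained entry point (hypothetical name) for C# to P/Invoke.
// Runs `steps` short steps back to back on the unmanaged side.
// Returns 0 on success, 1 on invalid arguments.
WRAPPER_API int run_steps(const float* input, float* output,
                          int n, int steps, float factor) {
    if (input == nullptr || output == nullptr || n < 0 || steps < 0) {
        return 1;
    }
    // Working copy; in a real wrapper this is where the host-to-device
    // copy would happen.
    std::vector<float> work(input, input + n);
    for (int s = 0; s < steps; ++s) {
        launch_step(work.data(), work.size(), factor);
    }
    // Copy the result back; stands in for the device-to-host copy.
    for (int i = 0; i < n; ++i) {
        output[i] = work[i];
    }
    return 0;
}
```

The C# side would then need a single P/Invoke declaration for this one function rather than one per CUDA call, and the marshalling cost is paid once per batch instead of once per kernel launch.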
You can also try the Kappa library (psilambda.com) and compare. Since it has a scheduler, on Fermi hardware it can give better performance than the CUDA APIs (that's for straight C/C++; if you compare it to using P/Invoke for individual CUDA API calls, the Kappa library will blow your socks off) and it is a lot easier to use (but not free for unlimited use). There are C# and Visual Basic examples in the installer for KappaCUDAnet.