CUDA - C# interop

Hi,
I am developing an application for multi-view stereo reconstruction and I plan to run some of the algorithms on CUDA for better performance. The front end of the application will be developed in C#.
The issue I am having is calling the CUDA code from the C# code/forms. What would be the best way to handle this?

Thanks in advance.

Regards,
iat

It’s pretty easy, actually. Just decide whether you want to develop your kernels on top of the driver only, or the runtime. Then, you just use nvcc to compile your kernels to a .cubin, and P/Invoke the driver (or runtime) DLL functions that allocate memory, launch the kernels, etc.

The P/Invoke signature for the driver functions will be like this (for the cuMemAlloc function):

[DllImport("nvcuda")]
public static extern CUResult cuMemAlloc(ref CUdeviceptr dptr, uint bytesize);

Personally, I’d write a little wrapper class that has all the P/Invoke functions that you need in it. Or, you can try GASS’ CUDA.NET library…I think they have all the necessary functionality already in there.
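To make the wrapper-class idea a bit more concrete, here is a minimal sketch of what it might look like. It assumes the CUDA 3.0-era driver API, where CUdeviceptr is a 32-bit unsigned integer and sizes are unsigned int; on newer toolkits both are wider, so the declarations would need adjusting. The names CudaDriver, Check and CudaDriverExample are made up for illustration, and only the Success value of the result enum is listed.

```csharp
using System;
using System.Runtime.InteropServices;

// Result code returned by driver API calls; only the success value is listed here.
public enum CUResult : int
{
    Success = 0
}

// Assumption: CUDA 3.0-era driver API, where a device pointer is a 32-bit unsigned integer.
[StructLayout(LayoutKind.Sequential)]
public struct CUdeviceptr
{
    public uint Pointer;
}

// Hypothetical wrapper class that keeps all P/Invoke declarations in one place.
public static class CudaDriver
{
    private const string Library = "nvcuda";

    [DllImport(Library)]
    public static extern CUResult cuInit(uint flags);

    [DllImport(Library)]
    public static extern CUResult cuDeviceGet(out int device, int ordinal);

    [DllImport(Library)]
    public static extern CUResult cuCtxCreate(out IntPtr context, uint flags, int device);

    [DllImport(Library)]
    public static extern CUResult cuCtxDestroy(IntPtr context);

    [DllImport(Library)]
    public static extern CUResult cuMemAlloc(ref CUdeviceptr dptr, uint bytesize);

    [DllImport(Library)]
    public static extern CUResult cuMemFree(CUdeviceptr dptr);

    // Turns a non-success result code into an exception so callers cannot ignore it.
    public static void Check(CUResult result)
    {
        if (result != CUResult.Success)
            throw new InvalidOperationException("CUDA driver call failed with code " + (int)result);
    }
}

public static class CudaDriverExample
{
    // Creates a context on device 0, allocates a buffer, then frees everything again.
    public static void AllocateAndFree(uint byteCount)
    {
        CudaDriver.Check(CudaDriver.cuInit(0));

        int device;
        CudaDriver.Check(CudaDriver.cuDeviceGet(out device, 0));

        IntPtr context;
        CudaDriver.Check(CudaDriver.cuCtxCreate(out context, 0, device));
        try
        {
            CUdeviceptr buffer = new CUdeviceptr();
            CudaDriver.Check(CudaDriver.cuMemAlloc(ref buffer, byteCount));
            CudaDriver.Check(CudaDriver.cuMemFree(buffer));
        }
        finally
        {
            CudaDriver.cuCtxDestroy(context);
        }
    }
}
```

The declarations are only bound to nvcuda.dll when they are first called, so the class compiles and loads on a machine without a CUDA driver; the calls themselves need one.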

There is a C# CUDA wrapper library that works fairly well:
http://www.gass-ltd.co.il/en/products/cuda.net/

Yeah. GASS should do the trick.

Alternatively,

You can write your main code in CUDA (as DLL) and use “interop” calls to pass control to un-managed code.

This way C# can live without knowing CUDA.

I have done a similar thing before and it worked.
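For anyone wanting to see the C# side of this approach, here is a rough sketch. Everything named here is hypothetical: StereoKernels.dll, the RunDisparity export and its parameter list are invented for illustration, and the native DLL is assumed to be built with nvcc and to do all the CUDA work (allocation, copies, kernel launches) behind one plain C function that returns 0 on success.

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeStereo
{
    // Hypothetical export from a hypothetical native DLL built with nvcc:
    //   extern "C" __declspec(dllexport)
    //   int RunDisparity(const float* left, const float* right,
    //                    float* disparity, int width, int height);
    // float[] is blittable, so the arrays are pinned for the call rather than copied.
    [DllImport("StereoKernels", CallingConvention = CallingConvention.Cdecl)]
    private static extern int RunDisparity(
        [In] float[] left,
        [In] float[] right,
        [Out] float[] disparity,
        int width,
        int height);

    // Managed entry point: validates the inputs, then makes one call into native code.
    public static float[] ComputeDisparity(float[] left, float[] right, int width, int height)
    {
        if (left == null) throw new ArgumentNullException("left");
        if (right == null) throw new ArgumentNullException("right");

        int count = checked(width * height);
        if (left.Length != count || right.Length != count)
            throw new ArgumentException("Image buffers must hold width * height pixels.");

        float[] disparity = new float[count];
        int status = RunDisparity(left, right, disparity, width, height);
        if (status != 0)
            throw new InvalidOperationException("Native call failed with status " + status);

        return disparity;
    }
}
```

The point of the design is that C# crosses the managed/unmanaged boundary once per image pair instead of once per CUDA API call, and the front end never needs to know anything about CUDA.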

Hi,

Thank you to all for the replies.

I tried out GASS and it seems to do the trick. However, is there any drop in performance when using a wrapper?
I’ll try writing the DLLs. Is there any example or how-to that you can point to?

Thanks

Regards,
iat

There may be a drop if you use the wrappers heavily… (like inside a FOR loop repeatedly many times).

I vaguely remember reading C# documentation that says that managed to un-managed calls (Platform Invoke Services) are costly… Check out MSDN

Sarnath, you are correct there. P/Invoke calls incur a per-call overhead: MSDN puts the fixed cost at roughly 10-30 x86 instructions per call, plus whatever data marshalling is needed between the managed and unmanaged code. That is small for a single call, but it adds up when you make a lot of them. So, if you need to call some very short CUDA kernels repeatedly from C#, your best bet is to make a separate unmanaged DLL in C/C++ that handles that part of things, and then you can P/Invoke that DLL instead of the CUDA driver/runtime itself (though that does take the GASS option off the table).

Profquail,

Thanks for endorsing… ! Same as what I had thought before.

Hi,

I think I’ll go with using a DLL for the moment, give CUDA.NET a try a little later, and do a comparison.

Thanks again for the info.

Regards,
iat

Hi,

I am unable to use cuda.LoadModule(…) even though the .cubin is in the path. I am using VS2008 and the asyncAPI example. Any help, please?

You can also try the Kappa library (psilambda.com) and compare. Since it has a scheduler, on Fermi hardware it can give better performance than the CUDA APIs, even from straight C/C++; if you compare it to using P/Invoke for individual CUDA API calls, the Kappa library will blow your socks off. It is also a lot easier to use (but not free for unlimited use). There are C# and Visual Basic examples in the installer for KappaCUDAnet.

Did the GASS CUDA.NET DLL work for you? I am having trouble using the cuda.loadmodule(…) function. It crashes every time I run the code, throwing cudaexception.

I am using CUDA 3.0, CUDA.NET 2.37 and VS2008. Any help appreciated.

I had the same problem. In my case my new video card (GTX480) was not able to load cubin files compiled for compute capability 1.1. I changed the compiler switch to -arch sm_20 and it worked again.

Strange though, I would have expected it to be backwards compatible.
