I need to display the results of calculations I did with CUDA on the screen.
I have a DLL that uses CUDA to calculate the results. The DLL copies the data from video memory into a CPU memory buffer whose address is passed in through the DLL interface.
The program that uses this DLL is written in C# and uses OpenGL.
Each time I want to draw the results, I recreate the vertex buffer in OpenGL in the C# program.
It would be much faster if I could simply draw the results I calculated in CUDA from the GPU memory, without copying it into CPU memory and then back into GPU memory.
Is there a way to do it when I have my application divided into a dll and a C# program?
The C# program is the GUI.
You can create a pixel buffer object and let your CUDA kernel write into it. Then you can display it. That's what wumpus meant by CUDA<->GL or DX interoperability. That way you can get rid of the copies to and from the CPU.
The boxFilter Example in the SDK makes the interoperability very clear.
While I have not done this myself, you should be able to pass the VBO id from your C# code to a function inside your .dll that does CUDA work. That function should then do the straightforward interop - register, map, calculate, unmap, unregister. Once the function returns control to your C# code, you should be able to issue OpenGL calls using the updated VBO.
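The register, map, calculate, unmap, unregister sequence described above might look roughly like the following on the DLL side. This is only a sketch under some assumptions: it uses the cudaGraphics* interop calls from the CUDA runtime rather than whatever the SDK sample of the time used, the export name UpdateVboWithCuda and the kernel fillVertices are made up for illustration, the VBO is assumed to hold plain floats, and the function is assumed to be called on the thread where the OpenGL context is current.

```cuda
#if defined(_WIN32)
#include <windows.h>              // needed before the GL headers on Windows
#define DLL_EXPORT extern "C" __declspec(dllexport)
#else
#define DLL_EXPORT extern "C"
#endif
#include <cstddef>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

// Placeholder kernel (hypothetical): writes one float per thread.
// The real calculation would go here.
__global__ void fillVertices(float* out, size_t count)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < count) out[i] = 0.001f * (float)i;
}

// Hypothetical export: the C# side passes the OpenGL buffer id it created.
// Returns 0 (cudaSuccess) or the first CUDA error code encountered.
DLL_EXPORT int UpdateVboWithCuda(unsigned int vboId)
{
    cudaGraphicsResource_t resource = nullptr;

    // 1. Register the GL buffer with CUDA (old contents may be discarded).
    cudaError_t err = cudaGraphicsGLRegisterBuffer(
        &resource, vboId, cudaGraphicsRegisterFlagsWriteDiscard);
    if (err != cudaSuccess) return (int)err;

    // 2. Map it so CUDA can obtain a device pointer to the buffer storage.
    err = cudaGraphicsMapResources(1, &resource, 0);
    if (err == cudaSuccess) {
        float* d_vbo = nullptr;
        size_t bytes = 0;
        err = cudaGraphicsResourceGetMappedPointer((void**)&d_vbo, &bytes, resource);

        // 3. Calculate: the kernel writes straight into the buffer.
        size_t count = bytes / sizeof(float);
        if (err == cudaSuccess && count > 0) {
            unsigned int threads = 256;
            unsigned int blocks = (unsigned int)((count + threads - 1) / threads);
            fillVertices<<<blocks, threads>>>(d_vbo, count);
            err = cudaGetLastError();
        }

        // 4. Unmap so OpenGL can use the buffer again.
        cudaError_t unmapErr = cudaGraphicsUnmapResources(1, &resource, 0);
        if (err == cudaSuccess) err = unmapErr;
    }

    // 5. Unregister. Registering once at startup and only mapping and
    //    unmapping per frame would likely be cheaper.
    cudaError_t unregErr = cudaGraphicsUnregisterResource(resource);
    if (err == cudaSuccess) err = unregErr;
    return (int)err;
}
```

On the C# side the export would be declared through P/Invoke and called with the buffer id just before the draw call; no data passes through CPU memory in between.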
Mesher,
What library are you using to deal with OpenGL through C#? I was trying what Paulius mentioned in the last post, but I got stuck when the .NET GL wrapper did not let me use the PBO the way I wanted.
I read the boxFilter example; it seems almost the same as mapping a vertex buffer into CUDA memory.
So I don't see what the difference is.
My solution to my problem is to send the window handle (a DWORD) from the C# executable to my DLL.
The DLL then uses this handle to create a Direct3D device and draws into the window with DirectX,
after filling a DirectX vertex buffer using CUDA.
The problem is even greater because the window procedure is called in a different thread than the GPGPU calculation function.
So I will need to use some sort of a mutex for this to work properly.
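A minimal sketch of that kind of locking, assuming the data shared between the two threads can be guarded by a single mutex. The class SharedFrame and its members are hypothetical, and a plain std::vector stands in for the DirectX vertex buffer; in the real program the locked sections would wrap the CUDA fill and the draw call instead of a copy.

```cpp
#include <mutex>
#include <vector>

// Hypothetical holder for the data both threads touch.
class SharedFrame {
public:
    // Called by the calculation thread once a CUDA pass has finished.
    void publish(const std::vector<float>& vertices) {
        std::lock_guard<std::mutex> lock(mutex_);
        vertices_ = vertices;
        ++version_;
    }

    // Called by the window procedure thread when it needs to draw.
    // Copies the current data out and returns its version number, so the
    // caller can tell whether anything changed since the last draw.
    unsigned snapshot(std::vector<float>& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        out = vertices_;
        return version_;
    }

private:
    mutable std::mutex mutex_;      // guards both members below
    std::vector<float> vertices_;
    unsigned version_ = 0;
};
```

Holding the lock for the whole fill or draw keeps the window procedure from ever seeing a half-written buffer, at the cost of one thread waiting while the other works.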