My system has a Tesla C1060 and a Quadro FX1800. I would like to run a CUDA kernel on the Tesla device and display the results using OpenGL on the Quadro device. The programming guide indicates that this is possible but does not explain how to do it.
Could someone please explain how to do this, or provide a code sample?
This is without a doubt possible. For my cortical neural simulations, I perform all of the simulation calculations on my Tesla C1060 and display the results on my Quadro 3700 graphics board. There are a number of ways to do this. In my case I wrote a Windows application that uses DirectX/Direct3D for all of the display. The application has to identify and open the C1060 device for computing. I use some data structures to move large blocks of image texture data back and forth between the CPU/display GPU and the Tesla GPU. I capture video on the CPU/display GPU, build a texture from the video, send the texture bitmap to the C1060, and copy the bitmap data into C1060 device memory arrays. The host CPU launches several kernels sequentially on the C1060 to run the simulation, and then the bitmap data is copied from C1060 device memory back to CPU host memory, where it is reintegrated into a display-GPU texture object and displayed in 3D by the CPU and the display graphics board. I do not use any special CUDA texture types nor interoperability features in my program.
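The round trip described above can be sketched roughly like this. This is just an illustration, not code from my program: the kernel name `simStepKernel`, the per-frame allocation, and the inverted-pixel "simulation" are placeholders, and error checking is omitted for brevity.

```cpp
#include <cuda_runtime.h>

// Stand-in for the real simulation kernels; here it just inverts each byte.
__global__ void simStepKernel(unsigned char *bitmap, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        bitmap[i] = 255 - bitmap[i];
}

// One frame of the host-side loop: upload the texture bitmap to the compute
// GPU, run the kernels, and copy the result back for the display GPU.
void runFrame(unsigned char *hostBitmap, int nBytes)
{
    unsigned char *devBitmap = 0;
    cudaMalloc(&devBitmap, nBytes);

    // CPU/display side -> Tesla: upload the bitmap built from the texture
    cudaMemcpy(devBitmap, hostBitmap, nBytes, cudaMemcpyHostToDevice);

    // launch the simulation kernel(s) sequentially from the host
    int threads = 256;
    int blocks  = (nBytes + threads - 1) / threads;
    simStepKernel<<<blocks, threads>>>(devBitmap, nBytes);

    // Tesla -> CPU: bring the result back so the display GPU can texture it
    cudaMemcpy(hostBitmap, devBitmap, nBytes, cudaMemcpyDeviceToHost);
    cudaFree(devBitmap);
}
```

In a real program you would allocate the device buffer once at startup rather than every frame; the synchronous `cudaMemcpy` calls also serve to wait for the kernel to finish before the copy back.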
This is fairly involved and thus not conducive to posting code, but it all works great. The same thing could be done with OpenGL. You need to be up to speed on OpenGL or DirectX before trying this, or you will be fighting too many battles at once. In my case I had all of this working on the CPU for years before adapting it to the Tesla C1060. I consider this advanced programming, not a beginner's project.
I used the CUDA demo programs to learn how to do all of this. There is at least one demo (I'm not at my development machine) that shows how to identify all of the CUDA boards in the system; I adapted that technique. You only have to do this once per program, just as you do for a normal graphics device. For the display, you just do the usual OpenGL device-open procedure that you would do for a normal Windows program. I put my kernel calls in a frame loop on the CPU.
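Identifying the boards and selecting the compute device looks roughly like the sketch below, using the standard CUDA runtime enumeration calls. Picking the board by matching "Tesla" in the name is my assumption for illustration; you could also match on compute capability or a fixed index.

```cpp
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Enumerate all CUDA-capable boards and select the Tesla for compute.
// Returns the chosen device index, or -1 if no Tesla was found.
int pickComputeDevice(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s\n", i, prop.name);
        if (strstr(prop.name, "Tesla") != NULL) {
            // all subsequent CUDA calls in this host thread use this device
            cudaSetDevice(i);
            return i;
        }
    }
    return -1;
}
```

Call this once at startup, before any allocations or kernel launches; the Quadro is then left free for the normal OpenGL/DirectX display path.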
To display, I copy my CUDA device bitmap array from the device to a host bitmap pointer. Since this bitmap is associated with a texture (this is all DirectX), I can then display the texture on the graphics board with normal DirectX API calls. I do not use vertex buffers; I just create a square “projection screen” out of six vertices (two triangles) and display the texture on that. In my program, keypresses in the CPU program tell CUDA what data to write into the bitmap, so I can use this texture trick to display whatever parametric data I wish out of my simulation: membrane potentials, conductances, synaptic weights, etc.
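Since you asked about OpenGL rather than DirectX, the equivalent display step there would be to push the host bitmap into an existing texture each frame with `glTexSubImage2D` and then draw your textured quad as usual. A minimal sketch, assuming an RGBA texture already created with `glTexImage2D`:

```cpp
#include <GL/gl.h>

// Per frame, after the CUDA device-to-host copy: replace the texel data of
// an existing texture with the latest simulation output.
void updateDisplayTexture(GLuint tex, const unsigned char *hostBitmap,
                          int width, int height)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0,      // target, mip level
                    0, 0,                  // x/y offset into the texture
                    width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE,
                    hostBitmap);
}
```

Reusing the texture with `glTexSubImage2D` avoids reallocating texture storage every frame, which matters when you are streaming simulation output continuously.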
Like I mentioned, I studied the CUDA demo programs a lot. In fact, I took the turbulent particles demo, built it, and modified it until I was comfortable with how things worked in CUDA. That demo does not use my texture-bitmap method, but it might be more appropriate to your use.