I figured that parts of my code perform way better on classic GPGPU using OpenGL (mainly because I am making use of the blend stage).
So I used the regular OpenGL render-to-texture mechanism with FBOs to perform the OpenGL part of the computation. After that I read the texture back into a buffer object (glBindBuffer, glReadPixels) and map that into CUDA's global memory. However, this breaks down performance a lot, and I suspect the glReadPixels call is responsible.
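For reference, the readback path described above looks roughly like this — a sketch, not a complete program (texture/PBO setup and error checking omitted, `pbo` is the pixel buffer object handle):

```
// Read the FBO-attached framebuffer into a pixel buffer object (PBO)
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT, 0);  // writes into the bound PBO
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

// ...then hand the PBO to CUDA via the CUDA 1.x GL interop API
cudaGLRegisterBufferObject(pbo);
float *d_ptr;
cudaGLMapBufferObject((void **)&d_ptr, pbo);
// launch kernels that read d_ptr ...
cudaGLUnmapBufferObject(pbo);
cudaGLUnregisterBufferObject(pbo);
```

The glReadPixels step is the copy the original poster wants to eliminate — the PBO mapping itself is cheap; it's the framebuffer-to-buffer transfer that costs.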
So I was wondering if there is a way to avoid the glReadPixels step and use OpenGL textures within a CUDA context right away?
Do you have any idea if it will be possible soon? (“Not currently” makes me think positively)
Or is there any possibility to leverage the blending stage directly from CUDA?
My problem is that I have a super large array that I read from and another that I write to. However, reads and writes are completely random, and writes may hit one and the same memory location several times, so race conditions can occur unless you build a proper schedule ahead of time (on the CPU). The blend stage, however, handles these conflicts for free (well, if it weren't for the memory copy further down my pipe…).
Not anytime soon. It’s not in CUDA 1.1, certainly.
Note that in CUDA you could implement the equivalent of blending by doing a read/modify/write to global memory (as long as each thread is only writing to its own location).
The thing is… blending is atomic. However, on the 8800 (compute capability 1.0 hardware), CUDA doesn't support atomic ops yet. For things like dynamic-sparse-matrix × static-dense-vector multiplication, it's really, really handy.
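On hardware with atomic support (compute capability 1.1, which offers integer atomics on global memory), the scatter could be written directly; a kernel sketch, not tied to the original poster's code:

```
__global__ void scatter_add(const int *target, const int *value,
                            int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        // atomicAdd resolves write collisions, like the blend stage does
        atomicAdd(&out[target[i]], value[i]);
}
```

Note this is integer-only on that generation; floating-point accumulation would still need the gather approach or a fixed-point encoding.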
Also, it’s sometimes easier to do list operations using geometry shaders, especially for one-to-many maps.
I’m longing for the day GL and CUDA can be interleaved without a memcpy…