I’ve found that cudaGLRegisterBufferObject/cudaGLUnregisterBufferObject have quite a high overhead, and I’m wondering exactly what they do and under what conditions it is safe to just leave a buffer registered. Taken literally, the documentation (“as a source for OpenGL drawing commands”) seems to indicate that this can only be done for VBOs (and, I guess, some of the new G80 features like PaBOs and TBOs).
I can see that there could be synchronisation problems with having OpenGL write to a registered buffer object, but what about using it as a source in a non-drawing command, e.g.
You can use it with VBOs and PBOs. The postProcessGL sample posted elsewhere in this forum (and included in the upcoming release of the SDK) covers the latter.
Yes, registering is an expensive call, and you usually don’t need to do it every frame. postProcessGL does call it every frame, which is not correct, but there is currently a bug in CUDA that makes it necessary. The bug will be fixed in the next release of the CUDA toolkit, and we will update the sample.
What I really want to do is use glMapBuffer to get a pointer that I can then memcpy to the host from a separate thread, so that one CPU core is launching calculation i+1 while the other core is pulling the results of calculation i. If I understand you correctly, I can’t safely do this directly (without register/unregister), but I could use glTexSubImage to copy the buffer into a texture and then use glGetTexImage to copy that texture to host memory, without a per-frame register/unregister?
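To make the intended overlap concrete, here is a minimal host-only sketch, assuming a two-slot double buffer. The names run_calculation, Slot and run_pipeline are hypothetical, the "calculation" is a plain CPU loop, and the std::vector copy only stands in for the real readback. It illustrates the hand-off between the two threads and says nothing about whether a mapped GL pointer is safe to read from a second thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for launching calculation i: fills the buffer.
static void run_calculation(int i, std::vector<int>& buf) {
    for (std::size_t k = 0; k < buf.size(); ++k)
        buf[k] = i * 1000 + static_cast<int>(k);
}

// One slot of the double buffer. `full` is true while the slot holds
// results that have not been read back yet.
struct Slot {
    std::vector<int> data;
    bool full = false;
    std::mutex m;
    std::condition_variable cv;
};

// Runs `frames` calculations of `n` elements each. The calling thread
// produces results while a second thread reads them back. Returns the
// first element of each result, in frame order.
std::vector<int> run_pipeline(int frames, std::size_t n) {
    assert(frames >= 0 && n > 0);
    Slot slots[2];
    for (Slot& s : slots) s.data.resize(n);
    std::vector<int> first_elems;  // touched only by the reader until join()

    std::thread reader([&] {
        for (int i = 0; i < frames; ++i) {
            Slot& s = slots[i % 2];
            std::unique_lock<std::mutex> lock(s.m);
            s.cv.wait(lock, [&] { return s.full; });  // wait for result i
            std::vector<int> host_copy = s.data;      // stand-in for the readback
            s.full = false;                           // slot may be reused
            lock.unlock();
            s.cv.notify_one();
            first_elems.push_back(host_copy[0]);
        }
    });

    for (int i = 0; i < frames; ++i) {
        Slot& s = slots[i % 2];
        std::unique_lock<std::mutex> lock(s.m);
        s.cv.wait(lock, [&] { return !s.full; });     // wait until slot is free
        run_calculation(i, s.data);                   // "launch" calculation i
        s.full = true;
        lock.unlock();
        s.cv.notify_one();
    }

    reader.join();
    return first_elems;
}
```

With these stand-ins, run_pipeline(4, 8) returns {0, 1000, 2000, 3000}. Because each slot has its own lock, the producer can fill one slot while the reader copies the other, which is the overlap I am after.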
While I’m asking about this, is CUDA thread-friendly enough to let me just do this with cudaMemcpy in a separate thread (obviously with suitable synchronisation)?
Never mind, I missed the bit in the documentation about addresses being specific to a context, so that won’t work.
But I’d still like to know whether glMapBuffer (with GL_READ_ONLY) can be safely used with a registered buffer, since it appears to work, but I don’t want to get bitten by race conditions later.