CUDA Driver API: OpenGL Contexts

I am trying to implement a new version of the CUDA Runtime API in Ocelot on top of the CUDA Driver API 3.0, and I am having problems with OpenGL contexts and cuGLCtxCreate. My first idea was to try to create an OpenGL context for every application and fall back on cuCtxCreate if cuGLCtxCreate failed. However, cuGLCtxCreate segfaults (rather than returning an error) if it is called before glInit() or glutInit() in the host application.

My first question is whether this is how cuGLCtxCreate is supposed to work. It seems kind of fishy for an API call to segfault like this.

To get around this, I tried lazily allocating an OpenGL context on the first OpenGL-related CUDA call. This works for some simple applications (all of the CUDA SDK except for volumeRender), but it fails when some resources were allocated on a regular context, an OpenGL buffer was allocated on another context, and a kernel accesses both. This is because both contexts cannot be active at the same time, and resources cannot be shared between contexts.

My only recourse at this point, I think, is to find some way of having multiple contexts active at the same time (probably not possible), or to manually migrate state from the old context to the new one when an OpenGL call is made (difficult).

Any suggestions?

My experience with X11, threading, GL and CUDA contexts:

Same thread:
Create the X11 Display connection. The glX and gl functions use this X11 connection.
Create the GL context.
Make the GL context current.

The segfault is expected (although the error handling could be better), because cuGLCtxCreate requires a current GL rendering context and there is none. In your case that context was created by glutInit.
To get around this, check for a current GL rendering context first. If there is a valid one, call cuGLCtxCreate.
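
The check described above can be sketched roughly as follows. This is only an illustration of the decision logic, not Ocelot's code: the function-pointer types, create_cuda_context, and the stub_* helpers are hypothetical names, introduced so the sketch builds without GL or CUDA headers. In a real build the probe would wrap glXGetCurrentContext, and the two create callbacks would wrap cuGLCtxCreate and cuCtxCreate, which return 0 (CUDA_SUCCESS) on success.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real entry points. In a real build these
   would wrap glXGetCurrentContext, cuGLCtxCreate and cuCtxCreate. */
typedef void *(*current_gl_context_fn)(void);
typedef int (*create_context_fn)(void **ctx, unsigned int flags, int device);

typedef enum { CONTEXT_NONE, CONTEXT_PLAIN, CONTEXT_GL } context_kind;

/* Create a GL-interop context only when a GL rendering context is current
   on the calling thread; otherwise create a plain context. Returns the kind
   created, or CONTEXT_NONE if the chosen create call reported an error. */
static context_kind create_cuda_context(current_gl_context_fn current_gl,
                                        create_context_fn create_gl,
                                        create_context_fn create_plain,
                                        void **ctx, int device)
{
    assert(ctx != NULL);
    if (current_gl() != NULL) {
        return create_gl(ctx, 0, device) == 0 ? CONTEXT_GL : CONTEXT_NONE;
    }
    return create_plain(ctx, 0, device) == 0 ? CONTEXT_PLAIN : CONTEXT_NONE;
}

/* Stubs for trying the logic without a GPU or an X server. */
static int dummy_handle;
static void *stub_gl_current(void) { return &dummy_handle; }
static void *stub_gl_absent(void) { return NULL; }
static int stub_create_ok(void **ctx, unsigned int flags, int device)
{
    (void)flags;
    (void)device;
    *ctx = &dummy_handle;   /* pretend a context handle was produced */
    return 0;               /* 0 mirrors CUDA_SUCCESS */
}
```

With the stubs, create_cuda_context(stub_gl_absent, stub_create_ok, stub_create_ok, &ctx, 0) takes the plain path, and passing stub_gl_current instead takes the GL path. The point of probing first is that the GL create call is never reached when no rendering context is current.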

I serialize all GL and CUDA calls within the same application. Multiple threads seem to be supported, but the underlying Display must be guarded by a lock. It may work with multiple threads if there are multiple GPUs and Display connections, each with its own XLockDisplay and current context. But you cannot put those contexts in the same context share list, because sharing is set up through an X11/GLX call and the X11 Displays are more or less invisible to each other.
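
One way to picture the serialization is a single lock that every GL or CUDA call goes through. The sketch below is an assumption about how that could look, not the setup described above: it uses a process-wide pthread mutex rather than XLockDisplay, and gpu_lock, run_serialized and bump are hypothetical names. In a real application the callback would also need the right GL or CUDA context current on the calling thread.

```c
#include <assert.h>
#include <pthread.h>

/* One lock shared by every thread that issues GL or CUDA calls. */
static pthread_mutex_t gpu_lock = PTHREAD_MUTEX_INITIALIZER;

typedef void (*gpu_call_fn)(void *arg);

/* Run one GL/CUDA operation while holding the lock, so calls coming from
   different threads never overlap. */
static void run_serialized(gpu_call_fn call, void *arg)
{
    assert(call != NULL);
    pthread_mutex_lock(&gpu_lock);
    call(arg);
    pthread_mutex_unlock(&gpu_lock);
}

/* Placeholder for real work such as a buffer map or a kernel launch. */
static void bump(void *arg)
{
    int *counter = arg;
    ++*counter;
}
```

A caller would wrap each operation, for example run_serialized(bump, &counter). XLockDisplay and XUnlockDisplay serve a similar purpose for a single Display connection, provided XInitThreads was called first.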

Edit: the function for getting the current context is glXGetCurrentContext. glXGetProcAddress may also be useful.

Much appreciated. glXGetCurrentContext seems like a reliable way to determine if cuGLCtxCreate will succeed.