Different threads in the runtime API

Does anybody know whether it is possible to use multiple host threads with the runtime API?

I have the following problem.

I create an OpenGL texture, then I register it and map it into CUDA's address space. This works fine so far. But if that host thread is terminated and another thread is created that performs the same procedure, it fails for some reason,
and even worse: it freezes my computer.

The textures are defined in a separate thread.
Here are some snippets of the code executed by the host threads:

wglMakeCurrent(m_hDC, m_hRC);

// the next lines freeze my computer when they are executed by the second
// host thread, after the first host thread has been destroyed
cudaGLRegisterBufferObject(m_pbuffer);
cudaGLMapBufferObject((void**)&data, m_pbuffer);
runCuda(…); // this function does the actual processing
cudaGLUnmapBufferObject(m_pbuffer);
cudaGLUnregisterBufferObject(m_pbuffer);

SwapBuffers(m_hDC);

wglMakeCurrent(NULL, NULL);

After the first thread is destroyed, the OpenGL texture becomes inaccessible to other host threads.

Each thread gets its own CUDA context. When you terminate the thread in which you initialized CUDA, all the resources it used on the GPU are automatically cleaned up. Even if you do not terminate that thread, device pointers, texture references and other resources are specific to that context and cannot be shared with another (like protected memory on the CPU).
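A minimal sketch of that pitfall (hypothetical names, Win32 threads, CUDA 1.x runtime API; untested here): a device pointer allocated in the main thread belongs to the main thread's context, and a second thread silently gets its own fresh context on its first runtime call, so the pointer is invalid there.

```cuda
#include <windows.h>
#include <cuda_runtime.h>
#include <stdio.h>

static float* d_data; // allocated in the main thread's context

DWORD WINAPI worker(LPVOID)
{
    // The first runtime call in this thread lazily creates a *new* context.
    // d_data belongs to the main thread's context, so this copy should fail
    // (expected: an invalid-device-pointer error on CUDA 1.x).
    float h = 0.0f;
    cudaError_t err = cudaMemcpy(&h, d_data, sizeof(float),
                                 cudaMemcpyDeviceToHost);
    printf("worker thread: %s\n", cudaGetErrorString(err));
    return 0;
}

int main()
{
    cudaMalloc((void**)&d_data, sizeof(float)); // main thread's context
    HANDLE t = CreateThread(NULL, 0, worker, NULL, 0, NULL);
    WaitForSingleObject(t, INFINITE);
    CloseHandle(t);
    cudaFree(d_data);
    return 0;
}
```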

Yes, that's right. But the OpenGL texture, or rather its buffer, is also mapped into the new thread's address space. So far that should be fine, or let me put it this way: I haven't found any sentence in the CUDA manual saying why it should not work.

I guess if you try the same without CUDA you will have the same problem.

Not at all. Without CUDA that works perfectly.

Because of wglMakeCurrent(m_hDC, m_hRC);

I investigated the problem described above a bit further.

Unfortunately it did not result in a solution,

but it now makes me think that it's a bug in CUDA.

What I figured out is that registering OpenGL buffers from different host threads of one process fails.

Within one process, only one host thread is allowed to register OpenGL buffers with CUDA.

Trying to register a buffer from another host thread freezes the computer. (!)

Or is this fact already described somewhere in the CUDA programming guide?
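One thing that might be worth trying (I could not verify it) is to explicitly release the interop resources and destroy the CUDA context with cudaThreadExit() before the first host thread terminates, rather than relying on the implicit cleanup:

```cuda
// At the end of the first host thread, before it terminates:
cudaGLUnmapBufferObject(m_pbuffer);      // in case it is still mapped
cudaGLUnregisterBufferObject(m_pbuffer); // release the interop binding
cudaThreadExit();                        // explicitly tear down this thread's CUDA context
wglMakeCurrent(NULL, NULL);              // detach the GL context as well
```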


By the way, the driver version is 6.14.11.6921 (download name: 169.21_forceware_winxp_64bit_international_whql.exe)

for Windows XP 64-bit edition, and I was using the runtime API from CUDA SDK 1.1. The graphics card is an NVIDIA 8800 GTS.

The toolkit (nvcc.exe, etc.) was version 1.1, and the runtime API calls were compiled with the Visual C++ compiler,

but I don't think that matters, because I don't suppose this error is caused by the compilers!


With a single device, is it possible to have OpenGL and CUDA operate in two different threads?
What I want to do is create two pixel buffers in OpenGL, map one of them into the CUDA context to be filled by a kernel, while the other is displayed with OpenGL.
As I expected, I get errors when I call the cudaGLRegisterBufferObject() and
cudaGLMapBufferObject() functions.

I don't know whether I understood that right.

You want to create two pixel buffers with OpenGL in one host thread and have CUDA, operating in another host thread, process one of them at a time, right?

I think that should work, but truth be told I'm not quite sure.

Do you call wglMakeCurrent before cudaGLRegisterBufferObject is called?

Sorry I cannot help you more, but I no longer have any CUDA-capable equipment.

That sounds about right. But let me be a bit more specific…

Actually I simply want to download and process an image with CUDA and subsequently display it with OpenGL. However, the CUDA stuff takes way too long and for some reason doesn’t run asynchronously. Since I don’t want to block the main render thread, I would like to run the CUDA stuff in a different thread. I don’t think it’s possible to map an OpenGL PBO from my main render thread into the CUDA context associated with my CUDA worker thread. Hence, I thought it might work if I created another OpenGL context in the CUDA worker thread and have it share a display list with the main OpenGL context.

This way I can create a PBO in the OpenGL context that runs in the same thread as CUDA, and I should be able to map it into the CUDA context. Additionally, I should still be able to access the PBO from my main render thread and draw its contents…
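A sketch of that setup (hypothetical handle names; width and height are placeholders; assumes the buffer-object entry points are loaded, e.g. via GLEW, and that wglShareLists is called before either context owns any objects — this is untested, not a verified recipe):

```cuda
// Main render thread: create both contexts up front and share their objects.
HGLRC hrcMain   = wglCreateContext(m_hDC);
HGLRC hrcWorker = wglCreateContext(m_hDC);
wglShareLists(hrcMain, hrcWorker);   // buffer objects become visible to both
wglMakeCurrent(m_hDC, hrcMain);

// CUDA worker thread: make the second context current, then do the interop.
wglMakeCurrent(m_hDC, hrcWorker);
GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER, width * height * 4, NULL, GL_DYNAMIC_DRAW);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);

cudaGLRegisterBufferObject(pbo);     // registered in *this* thread's CUDA context
void* d_ptr;
cudaGLMapBufferObject(&d_ptr, pbo);
runCuda(d_ptr);                      // kernel fills the PBO
cudaGLUnmapBufferObject(pbo);

// The main render thread can later bind the same pbo id (it is shared)
// and draw its contents, e.g. by uploading it into a texture.
```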

Does this sound feasible?
And do I really need to have two OpenGL contexts in this scenario?