Multiple OpenGL windows/contexts

I am trying to create an MFC application in which I can open images and run various image-processing algorithms on them - sort of a mini-Photoshop.
I think, if I want to open multiple images, I need to create multiple OpenGL contexts - one for each window/image.

Can I do that using CUDA? I am trying to use the OpenGL PBO (as in the postProcessGL example in the SDK).
Currently, my first image opens and runs fine, but when I try to open another image, creating the PBO fails and crashes the program.

What might be a better way, if there is one?

The Programming Guide states “A CUDA context may interoperate with only 1 Direct3D device…”, but it does not mention any such limit in the OpenGL notes.

Has anyone tried something like this yet?

Thank you!

Interestingly I can run multiple instances of my application without any problems. But if I try to open a new OGL window in one of those apps, it crashes.

It crashes when I try to register my PBO object using -


Solved! But I cannot explain why, I can tell how -

When I was registering my PBO (GLuint pbo) I was using -



This crashed the application when I attempted to create a new window, with its own PBO, inside the application itself.

I just removed the CUDA_SAFE_CALL macro and now I can create as many new windows as I want inside my application. So essentially I use -


The crash when registering PBOs from multiple contexts was a bug in the OpenGL driver. It has been fixed in driver release 169.


I am using release 169 (1.1 beta?) and all the corresponding supporting files (SDK, toolkit) :unsure:

So, I’d say you can put the CUDA_SAFE_CALL back in. It was the release 169 driver that fixed the OpenGL VBO issue, not the removal of the macro.


Adding the CUDA_SAFE_CALL does not let me create the second window (= GL context) on release 169. I AM using release 169 and I started this code only AFTER installing release 169. I have NOT tried it on the earlier releases.
So my observations are on release 169.

And I am using PBO not VBO, if that makes any difference in this matter.

Hmm. That’s strange, I’ll have to check.

Thank you Paulius!

Hey I also found that the CUT_CHECK_ERROR_GL() macro can create problems when called during the OnDestroy() operation.

If I opened multiple windows with the code posted below, closing one of the child windows would crash my entire app without any error. (There was the message ‘invalid memory at 0xxxxxx’ in the console, which didn't help a lot.)

After I commented out the CUT_CHECK_ERROR_GL() macro, my program runs and shuts down smoothly.

Interestingly, I have the same macro in the CreatePBO and CreateTexture functions (which are same as those in the postProcessGL example). Those calls do not create any problem.

So I assume that I am not closing down OpenGL properly and hence the macro is finding some error but before it can tell me what it found, something is killing it. What might I be doing wrong?

Here is what I had -

void MyView::OnDestroy()
{
    if (wglGetCurrentContext() != NULL)
    {
        // ...
    }

    if (m_hGLContext != NULL)
    {
        // ...
    }

    m_hGLContext = NULL;

    DeletePBO(&pbo_source);
    DeletePBO(&pbo_dest);
    DeleteTexture(&tex_screen);
}

void MyView::DeletePBO(GLuint* pbo)
{
    glBindBuffer(GL_ARRAY_BUFFER, *pbo);
    glDeleteBuffers(1, pbo);

    *pbo = 0;
}

void MyView::DeleteTexture(GLuint* tex)
{
    glDeleteTextures(1, tex);

    *tex = 0;
}

Anything yet paulius?

I’m not able to reproduce the problems you’re describing. The macros are really benign - they simply check for errors and, if one is detected, print a message and exit.

Can you check manually whether any errors (CUDA or GL) have occurred, as opposed to using the macros? You can look at the macro code and simply duplicate it.


Thanks for checking Paulius.

Here is what I did-

I debugged the return value for each cudaGLRegisterBufferObject() call.

cudaError_t errRegisterBuffer = cudaGLRegisterBufferObject(pbo);

The first time I make this call, errRegisterBuffer gets the value ‘cudaSuccess’. But for every subsequent new window I create, it gets the value 10208. I am not sure what that means, as the cudaError_t enum does not define this code (at least not explicitly).

That explains why CUDA_SAFE_CALL fails. But, once I remove that macro, everything runs fine. (which I understand could be dangerous, but it runs!)

Every new window I create has its own instance of the pbo and its own data to render. I do not understand what I am doing wrong here. :mellow: I would certainly appreciate any help.

Here is a brief overview of the sequence of events the program follows :

I follow a similar procedure as in postProcessGL sample -

For every new window (document in MFC) -

  1. create and register a PBO (in doc)

    1.1. Use glGenBuffers() to create the PBO

    1.2. Register PBO using cudaGLRegisterBufferObject().

  2. Process the PBO (in view)

  3. Display the PBO (in view)

(Currently I am just writing to the PBO independent of what it already has and hence I do not have source and destination PBOs as in postProcessGL.)

Thank you!

OK. So maybe the crash is due to the exit() inside the CUDA_SAFE_CALL macro, which is called whenever it detects an error?

You can check what error occurred after the VBO registration by adding the following code immediately after your manual error check:

printf("%s\n", cudaGetErrorString(cudaGetLastError()));

Or the equivalent with the MessageBox. Let me know if that gives you a message that makes sense.
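
The point about exit() inside the macro can be illustrated with a small sketch. This is not the SDK macro code; the status type and both helper names are hypothetical stand-ins, chosen only to contrast a check that terminates the process with one that reports the error and lets the caller carry on.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for an API status code; this is not the real cudaError_t.
enum ToyStatus { kToySuccess = 0, kToyUnspecifiedDriverError = 1 };

// Fatal style, in the spirit of what the reply describes for the SDK macros:
// print a message and terminate the whole process on any error.
inline void check_or_exit(ToyStatus s, const char* what) {
    if (s != kToySuccess) {
        std::fprintf(stderr, "%s failed with status %d\n", what, static_cast<int>(s));
        std::exit(EXIT_FAILURE);
    }
}

// Non-fatal style: report the error and let the caller decide what to do next.
inline bool check_and_report(ToyStatus s, const char* what) {
    if (s != kToySuccess) {
        std::fprintf(stderr, "%s failed with status %d\n", what, static_cast<int>(s));
        return false;
    }
    return true;
}
```

With the fatal variant, a failed registration in a second window would take down every open window of the application; with the non-fatal variant only that one window needs to handle the failure.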


It gives the error string -
‘unspecified driver error’.

Also, I get a first-chance exception in <myProgram.exe>: cudaError_enum at memory location <0x…>, which I assume is because the value 10208 is not found in the enum.

Does your multi-window app work fine without CUDA? Meaning, if you don’t make any CUDA calls, just fill up the PBO with some window-specific color, does the app run as expected?

If you are using glew to create your PBOs, are you linking in the multi-threaded glew library?


Yes, it works without the CUDA calls. Here's the code without CUDA calls that works fine:

void draw(int dataWidth, int dataHeight, int pbo_out, char color)
{
    void* out_data;   // mapped PBO memory; the output data is written here
    char* in_data;    // temporary host buffer holding the fill color

    // fill the temporary buffer with an arbitrary color
    in_data = (char*)malloc(dataHeight * dataWidth * sizeof(char));
    for (int i = 0; i < dataHeight * dataWidth; ++i)
        in_data[i] = color;

    // bind the PBO, allocate its storage, and map it into our address space
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, pbo_out);
    glBufferData(GL_PIXEL_UNPACK_BUFFER_ARB, dataWidth * dataHeight, NULL, GL_STREAM_DRAW);
    out_data = glMapBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY);

    // copy our data into the pbo
    memcpy(out_data, in_data, dataHeight * dataWidth);

    // unmap
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER_ARB);

    free(in_data);
}

I am using the glew library that came with the CUDA SDK.

How do I confirm that the glew library I am using is the multi-threaded version? I am sorry if this is a stupid Q. :mellow:

My guess is that there’s something wrong with PBOs generated in multiple threads. The glew library that comes with CUDA is not multi-threaded. You’ll have to build your own multi-threaded version of glew. Instructions are on the glew site. Let me know if you’re still getting the crash after installing multi-threaded glew.


I just noticed / “Solved” a similar issue with multiple windows and cuda-glbuffer registration (my version of the driver is the linux-64bit 169.12) and oh boy, was I checking for the answer a long time with this.

So what I was doing is the following in pseudo-code:







and for some reason the last registration failed (with the usual unspecified driver error)!

Well, here’s the solution:








What is going on here?

My guess:

cuda has one context per thread

GL creates here two contexts for the same thread (one for each window)

First glGenBuffers()-call gives handle == 1 because it’s the first buffer of the context.

Second glGenBuffers()-call gives also handle == 1, because it’s also the first buffer of the context, because it’s a new context.

Cuda-register checks if handle == 1 has been registered already and returns an error because it has been registered (cuda context is not changed between windows)

Adding the dummy glGenBuffers()-call increments the result-handle to 2 and cuda works a-ok.
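
The guess above can be restated as a toy model. Nothing in this sketch calls OpenGL or CUDA, and every type and function name is hypothetical; it only assumes, as the post does, that each GL context numbers its buffers starting from 1 and that registration is tracked by buffer name alone.

```cpp
#include <cassert>
#include <set>

// Toy model only: these types are hypothetical and call no real GL or CUDA API.

// Each GL context hands out buffer names from its own counter, starting at 1.
struct ToyGLContext {
    unsigned next_name = 1;
    unsigned gen_buffer() { return next_name++; }
};

// A registry that, as guessed in the post, remembers registrations by name only
// and knows nothing about which GL context produced the name.
struct ToyCudaRegistry {
    std::set<unsigned> registered;
    // Returns true on success, false if the name is already registered.
    bool register_buffer(unsigned name) { return registered.insert(name).second; }
};

// Registers one buffer per window. `dummies` extra names are generated (and left
// unused) in the second context before the real one, as in the workaround.
bool second_registration_succeeds(unsigned dummies) {
    ToyGLContext window1, window2;
    ToyCudaRegistry cuda;
    bool first_ok = cuda.register_buffer(window1.gen_buffer());    // name 1
    for (unsigned i = 0; i < dummies; ++i) window2.gen_buffer();   // skip names
    unsigned name2 = window2.gen_buffer();
    return first_ok && cuda.register_buffer(name2);
}
```

In this model the second registration fails with no dummy buffer and succeeds with one, which matches the behaviour described, but it says nothing about whether the real driver works this way.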

Is this even close? Does someone know a nicer solution? Is this fixed in some later driver / cuda - version? I have cuda 1.1, since 2.0 still seems to be in beta stage.

Has anybody found a real resolution to this problem? I need to use either multiple windows or subwindows, and both give me the unspecified driver error when I try to map buffer objects. I tried removing the CUDA_SAFE_CALL, but it didn’t help. I am using Cuda2.0b2 on XP.