cuda+opengl multiple threads

I’m not sure if this is the right place for this question but hopefully someone can help.

I do CFD simulations and am attempting to move to GPUs. Normally I structure my code with two threads: the main thread for graphics (GLUT requires this to be the main thread), and a second “working thread” where all the actual calculations are done. The reason I do this is that I want my calculations to be completely separate from the graphics and not depend on them at all; OpenGL (and any windowed app, for that matter) handles its own main loop, so I have no control over it. I want my calculations to run as fast as they can and not be tied to a “display” callback in OpenGL (hence the working thread).
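For reference, a bare-bones sketch of the structure I mean (classic GLUT on the main thread plus a worker thread; the solver call is just a placeholder, not real code):

```cpp
#include <GL/glut.h>
#include <thread>
#include <atomic>

std::atomic<bool> running{true};

void display()                 // GLUT owns this loop, not me
{
    glClear(GL_COLOR_BUFFER_BIT);
    // ... draw the latest results here ...
    glutSwapBuffers();
    glutPostRedisplay();
}

void solver_loop()             // runs as fast as it can, independent of display
{
    while (running) {
        // step_solver();      // placeholder for one CFD time step
    }
}

int main(int argc, char** argv)
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB);
    glutCreateWindow("cfd");
    glutDisplayFunc(display);

    std::thread worker(solver_loop);   // the “working thread”
    glutMainLoop();                    // classic GLUT never returns from here

    running = false;                   // not reached with plain GLUT
    worker.join();
    return 0;
}
```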

I was wondering if anyone could give me any pointers on how they do this kind of thing, or, if this is the right way to do it, how I would give the secondary thread access to the CUDA context? (I have only briefly read about contexts and am not familiar with them at all.)

Thanks in advance.

Did you find a solution for this problem?

Put simply, you need to use critical sections, mutexes, or some other mechanism to ensure that only one thread owns the context at a time.

Your ‘main’ (display) thread would then attempt to ‘lock’ the CUDA context, and upon gaining the lock, push the context, read your physics results, and pop/unlock the context so your physics thread can continue.
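For what it’s worth, a minimal sketch of that display-side step, assuming the driver API (cuCtxPushCurrent/cuCtxPopCurrent) and a plain mutex; the buffer names are made up:

```cpp
// Sketch only: the context is assumed to have been created once at startup
// (e.g. with cuCtxCreate) and then popped so it "floats", letting any thread
// push it while holding the mutex.
#include <cuda.h>
#include <mutex>

CUcontext   ctx;         // the single shared CUDA context
std::mutex  ctx_mutex;   // whoever holds this may push the context

// Display (main) thread: grab the lock, attach the context, copy the latest
// physics results into the display's own host buffer, then detach and unlock.
void read_results(float* host_buf, CUdeviceptr d_results, size_t bytes)
{
    std::lock_guard<std::mutex> lock(ctx_mutex);
    cuCtxPushCurrent(ctx);                      // context now current to this thread
    cuMemcpyDtoH(host_buf, d_results, bytes);   // read the physics results
    cuCtxPopCurrent(NULL);                      // detach before the mutex is released
}
```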

Your ‘worker’ (physics) thread would do a similar thing in its main loop: attempt to ‘lock’ the context, push once the lock is obtained, calculate physics in CUDA, wait for all calculations to complete, then pop/unlock the context.
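The worker side would look roughly like this (same ctx and ctx_mutex as in the sketch above; the kernel launch is a placeholder):

```cpp
#include <cuda.h>
#include <mutex>

extern CUcontext  ctx;        // same context/mutex as in the display sketch
extern std::mutex ctx_mutex;

// Physics (worker) thread: called in a loop, as fast as it likes.
void physics_step()
{
    std::lock_guard<std::mutex> lock(ctx_mutex);
    cuCtxPushCurrent(ctx);        // attach the context to this thread
    // launch_physics_kernel();   // placeholder for the CFD kernels
    cuCtxSynchronize();           // wait for all calculations to complete
    cuCtxPopCurrent(NULL);        // detach so the display thread can push it
}
```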

The obvious bottleneck here is that your display thread will wait for your physics results to ‘finish’ before it can obtain a lock, and so will be limited to the same rate at which you calculate physics (which might not be 30-60 Hz+). You could consider calculating your physics into a temporary buffer so you don’t have to sync before unlocking, then implementing a ‘request physics data’ function which locks the context, copies from the temporary buffer into some other ‘common’ buffer the display reads from, and unlocks upon completion. That lets your physics keep going at whatever rate it wants, only being interrupted when your display needs data. If that makes sense (it needs more detail, but you should get the general idea from this).
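One way to sketch that idea (all names here, including request_physics_data, d_scratch and d_shared, are made up; same floating context and mutex as before):

```cpp
#include <cuda.h>
#include <mutex>

extern CUcontext  ctx;
extern std::mutex ctx_mutex;

CUdeviceptr d_scratch;   // the solver writes into this every step
CUdeviceptr d_shared;    // the display reads snapshots from this

// Worker thread: launch into the scratch buffer and unlock without syncing,
// so the physics loop is only ever blocked by the lock itself.
void physics_step()
{
    std::lock_guard<std::mutex> lock(ctx_mutex);
    cuCtxPushCurrent(ctx);
    // launch_physics_kernel(d_scratch);   // placeholder
    cuCtxPopCurrent(NULL);                 // no cuCtxSynchronize() here
}

// Display thread: take a snapshot only when it actually needs fresh data.
// Work issued into the same context/stream is ordered, so this copy runs
// after the kernel that filled d_scratch has finished.
void request_physics_data(size_t bytes)
{
    std::lock_guard<std::mutex> lock(ctx_mutex);
    cuCtxPushCurrent(ctx);
    cuMemcpyDtoD(d_shared, d_scratch, bytes);   // scratch -> common buffer
    cuCtxPopCurrent(NULL);
}
```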

The #1 thing to remember is that your display thread will need to ‘copy’ the physics data into its own memory buffers, and that a CUDA context can only be ‘active’ (e.g. pushed onto the context stack) for one thread at a time. CUDA doesn’t take care of this for you (it just detects the conflict and errors out); you have to handle exclusive context/thread ownership yourself via critical sections / mutexes / semaphores / whatever.
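If it helps, the lock-then-push pairing can be bundled into a small scope guard so neither half gets forgotten; this is just a sketch of the same exclusive-ownership idea, not anything CUDA provides:

```cpp
#include <cuda.h>
#include <mutex>

// Holding the guard means: this thread owns the mutex AND has the context
// pushed. Destroying it pops the context before the mutex is released.
class ScopedCudaContext {
public:
    ScopedCudaContext(CUcontext ctx, std::mutex& m) : lock_(m)
    {
        cuCtxPushCurrent(ctx);
    }
    ~ScopedCudaContext()
    {
        cuCtxPopCurrent(NULL);
    }
private:
    std::lock_guard<std::mutex> lock_;   // held for the guard's lifetime
};

// Usage in either thread:
//   {
//       ScopedCudaContext guard(ctx, ctx_mutex);
//       // ... CUDA calls / copies into the display's own buffers ...
//   }   // context popped, mutex released
```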