Threaded CUDA application: using threads in an application that uses the CUDA API

My application is a network simulator that runs in rounds.
I can click a button to start the next round manually,
but I can also loop the simulation until it ends.
The problem is that the loop freezes the application until the simulation finishes.
So I tried the Qt Concurrent library to launch the simulation function in a different thread.
But that thread cannot access memory on the GPU (the memcpy fails), so I guess it has a different context than the main thread.
So my question is: is there a way to transfer the context created in the main thread to the worker thread?
(I want to transfer it because the main thread chooses which GPU the computation will run on.)
How do you deal with this kind of problem?

You either need to move the actual call to cudaSetDevice into your worker thread, or you need to use the CUDA Driver API. The Driver API is much more verbose and complicated than the Runtime API, but allows the CUDA context to be directly manipulated. Using the Driver API, the CUDA context can be migrated from one thread to another using cuCtxPopCurrent() and cuCtxPushCurrent().
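A minimal sketch of the first option (calling cudaSetDevice inside the worker thread), using only the Runtime API. Some assumptions here: std::thread stands in for Qt Concurrent, runSimulation and chosenDevice are hypothetical names, and the copy to and from the GPU is just a placeholder for the real simulation rounds. The point is that the main thread only hands over a plain device index, while the worker selects the device and owns every allocation and memcpy itself.

```cuda
#include <cuda_runtime.h>

#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical stand-in for the simulation loop. Everything that touches
// the GPU happens in the thread that called cudaSetDevice.
static bool runSimulation(int device, std::size_t count)
{
    // Select the GPU from inside the worker thread, not the GUI thread.
    if (cudaSetDevice(device) != cudaSuccess)
        return false;

    const std::size_t bytes = count * sizeof(float);
    float* deviceBuffer = nullptr;
    if (cudaMalloc(&deviceBuffer, bytes) != cudaSuccess)
        return false;

    std::vector<float> host(count, 1.0f);

    // Placeholder work: copy to the GPU and back again.
    cudaError_t err = cudaMemcpy(deviceBuffer, host.data(), bytes,
                                 cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(host.data(), deviceBuffer, bytes,
                         cudaMemcpyDeviceToHost);

    cudaFree(deviceBuffer);
    return err == cudaSuccess;
}

int main()
{
    // Assumption: the main (GUI) thread decides which GPU to use,
    // but passes only the index to the worker.
    const int chosenDevice = 0;
    bool ok = false;

    std::thread worker([&ok, chosenDevice] {
        ok = runSimulation(chosenDevice, 1024);
    });
    worker.join();  // a real GUI would wait for a completion signal instead

    std::printf("simulation %s\n", ok ? "succeeded" : "failed");
    return ok ? 0 : 1;
}
```

With this layout the main thread never needs to touch GPU memory itself; if it does, that data would have to go through the worker or through host buffers.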

The next release makes this easy.

By "next", which release do you mean? :D

And is it possible to use the Driver API just for the context transfer?