MultiGPU Thread Dependent?

I need to use multiple GPUs to solve a problem with a time limit; however, one of my constraints is that I only have one thread.
Is there any way to dynamically dispatch a kernel to a GPU without having to use cudaSetDevice()?

So I can’t use the simpleMultiGPU example…

Can’t you just create a new child thread or two, managed and cleaned up by the single thread you own?

I know it sounds absurd, but it's a limitation of the project that I am working on.
I cannot spawn any threads.

I’ll be happy to give any more clarification if needed.

Yes, you can use multiple GPUs from one thread, but you'll need to use the CUDA driver API rather than the higher-level runtime API.
Look at the Context Management calls in section 3.20 of the reference guide.
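Something like the following sketch shows the pattern (assuming two devices; all error checking omitted for brevity). The idea is to create one context per device up front, then push and pop contexts on the single host thread's context stack to select which GPU subsequent calls target:

```cuda
// Hedged sketch: one host thread driving two GPUs via the CUDA driver API.
// cuCtxPushCurrent/cuCtxPopCurrent bind and unbind a device context on
// this thread explicitly, so no extra threads and no cudaSetDevice() needed.
#include <cuda.h>

int main(void) {
    CUdevice dev[2];
    CUcontext ctx[2];

    cuInit(0);
    for (int i = 0; i < 2; ++i) {
        cuDeviceGet(&dev[i], i);
        // Context creation makes the new context current on this thread,
        // so pop it off the thread's context stack right away.
        cuCtxCreate(&ctx[i], 0, dev[i]);
        cuCtxPopCurrent(NULL);
    }

    for (int i = 0; i < 2; ++i) {
        // Make GPU i's context current on this (single) thread...
        cuCtxPushCurrent(ctx[i]);
        // ...issue allocations, memcpys, and cuLaunchKernel calls
        // for GPU i here, then detach so the next iteration can
        // push the other device's context.
        cuCtxPopCurrent(NULL);
    }

    for (int i = 0; i < 2; ++i)
        cuCtxDestroy(ctx[i]);
    return 0;
}
```

Note that the kernel launches are asynchronous, so you can push a context, queue work, pop, and move on to the next GPU without waiting for the first one to finish.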