Can anyone point me to code examples in which separate CPU threads control multiple kernels on more than one GPU?
Actually any code using CPU threads would be welcome.
Pthreads/POSIX preferred. I’m just getting into it, and the basics don’t seem too bad, but I’d like to see some actual CUDA
code that someone has gotten working before venturing into unknown territory.
In the CUDA model, each GPU is driven by one host thread holding one context for that GPU. And unless you are using a Fermi GPU, the GPU can only execute a single kernel at a time. Fermi can currently run up to 4 kernels simultaneously, but only if there are enough resources and the kernels come from streams attached to the same context (which implies one host thread).
I don’t know exactly what you had in mind, but I am pretty certain you can’t do it with CUDA, based on your description.
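What the model described above does allow is one host thread per GPU, each thread launching kernels on its own device. Here is a minimal sketch of that pattern with pthreads and the CUDA runtime API. The kernel `scale`, the struct `WorkerArgs`, and the helper `run_on_all_gpus` are names made up for this illustration, and it assumes a build along the lines of `nvcc file.cu -lpthread`. Treat it as a starting point rather than tested production code.

```cuda
#include <cstdio>
#include <vector>
#include <pthread.h>
#include <cuda_runtime.h>

// Trivial kernel: multiply every element by a factor.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Per-thread arguments (hypothetical struct for this sketch).
struct WorkerArgs {
    int device;  // GPU this host thread drives
    int n;       // number of elements to process
    int ok;      // set to 1 if the result was verified
};

// Each host thread binds to one GPU and does all of its CUDA work there.
static void *worker(void *p)
{
    WorkerArgs *a = static_cast<WorkerArgs *>(p);
    a->ok = 0;

    // Select the device before any other CUDA call in this thread.
    if (cudaSetDevice(a->device) != cudaSuccess) return NULL;

    std::vector<float> host(a->n, 1.0f);
    size_t bytes = a->n * sizeof(float);
    float *dev = NULL;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return NULL;
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (a->n + threads - 1) / threads;
    scale<<<blocks, threads>>>(dev, a->n, 2.0f);

    // The blocking copy back also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(host.data(), dev, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return NULL;

    int good = 1;
    for (int i = 0; i < a->n; ++i)
        if (host[i] != 2.0f) good = 0;
    a->ok = good;
    return NULL;
}

// Starts one pthread per visible GPU; returns how many GPUs produced
// a verified result (0 if no CUDA device is available).
int run_on_all_gpus(int n)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 1) return 0;

    std::vector<pthread_t> threads(count);
    std::vector<WorkerArgs> args(count);
    std::vector<int> started(count, 0);

    for (int d = 0; d < count; ++d) {
        args[d].device = d;
        args[d].n = n;
        args[d].ok = 0;
        started[d] =
            (pthread_create(&threads[d], NULL, worker, &args[d]) == 0);
    }

    int succeeded = 0;
    for (int d = 0; d < count; ++d) {
        if (started[d]) {
            pthread_join(threads[d], NULL);
            succeeded += args[d].ok;
        }
    }
    return succeeded;
}
```

Calling `run_on_all_gpus(1 << 20)` from `main` and printing the return value shows how many devices completed their work. Each thread touches only its own device, so the threads need no locking among themselves. Anything they share on the host side would need the usual pthreads synchronization.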