Yes, you can explicitly create a CUDA context on each GPU in the system to address them separately. CUDA does not support “SLI” in the graphics sense – that wouldn’t make sense.
If your application is highly parallel and can be divided into blocks that have no dependencies (and is not bottlenecked by PCI-express transfers), then using multiple GPUs can provide very good scaling.
As far as I understand, `__global__` functions are launched synchronously, and each such function can run on only one device at a time. So do you mean those “parallel” calls should be made from different CPU threads, or perhaps different processes?
Yes, exactly. We recommend creating one CPU thread per Quadro GPU. In each thread, create a CUDA context (the CUDA runtime API lets you enumerate the GPUs and choose which one to create the context on).
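Here is a minimal sketch of that one-thread-per-GPU scheme (the kernel and buffer sizes are just placeholders). Each host thread calls `cudaSetDevice` once, and the runtime creates that thread's context on the selected device on first use:

```cuda
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Trivial placeholder kernel: fill an array with this GPU's index.
__global__ void fill(int *out, int value, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Each CPU thread binds to exactly one device and does all of its
// CUDA work there, so the per-thread contexts never interfere.
void worker(int device, int n) {
    cudaSetDevice(device);                  // select this thread's GPU
    int *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));
    fill<<<(n + 255) / 256, 256>>>(d_out, device, n);
    cudaDeviceSynchronize();                // wait for this GPU's work
    cudaFree(d_out);
    std::printf("device %d done\n", device);
}

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);             // enumerate the GPUs
    std::vector<std::thread> threads;
    for (int d = 0; d < count; ++d)
        threads.emplace_back(worker, d, 1 << 20);
    for (auto &t : threads) t.join();
}
```

This only scales well when the per-GPU work items are independent, as noted above; any data that must cross between GPUs has to go over PCI-express through host memory.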
I think that’s not allowed with the runtime API. But you can use the GPUWorker class posted on these forums to delegate work to worker threads from a different thread. I’m not sure whether multiple threads can submit work to a single GPUWorker, but you could probably modify it to allow that.