Can CUBLAS utilize multiple GPU devices?
No.
You can write host wrappers for functions like SGEMM to spread the computation across multiple GPUs, but CUBLAS itself only uses one device.
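For what it's worth, here is a rough sketch of what such a host wrapper could look like. It splits the columns of B and C into one contiguous slice per device, so each device computes its own independent block of the result. The names multi_gpu_sgemm and sgemm_on_device are made up for illustration and are not part of CUBLAS. To keep the sketch self-contained, sgemm_on_device is a plain CPU stand-in; in a real port it would select the device (e.g. with cudaSetDevice), copy the operands over, call the CUBLAS SGEMM routine, and copy the slice of C back, with the slices running concurrently rather than one after another.

```c
#include <assert.h>
#include <stddef.h>

/* CPU stand-in for a single-device SGEMM (column-major, no transpose):
 *   C = alpha * A * B + beta * C,  A is m x k, B is k x n, C is m x n.
 * ASSUMPTION: in a real port this body would be replaced by selecting
 * device `dev`, transferring the operands, and calling CUBLAS SGEMM. */
void sgemm_on_device(int dev, int m, int n, int k, float alpha,
                     const float *A, int lda,
                     const float *B, int ldb,
                     float beta, float *C, int ldc)
{
    (void)dev; /* unused in the CPU stand-in */
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[i + (size_t)p * lda] * B[p + (size_t)j * ldb];
            C[i + (size_t)j * ldc] =
                alpha * acc + beta * C[i + (size_t)j * ldc];
        }
    }
}

/* Hypothetical host wrapper: give each device a contiguous block of
 * columns of B and C. Every device needs all of A, but the column
 * blocks of C are independent, so no reduction step is required. */
void multi_gpu_sgemm(int num_devices, int m, int n, int k, float alpha,
                     const float *A, int lda,
                     const float *B, int ldb,
                     float beta, float *C, int ldc)
{
    assert(num_devices > 0);
    int base  = n / num_devices;  /* columns every device gets */
    int extra = n % num_devices;  /* first `extra` devices get one more */
    int col   = 0;                /* first column of the current slice */

    for (int dev = 0; dev < num_devices; ++dev) {
        int cols = base + (dev < extra ? 1 : 0);
        if (cols == 0)
            continue;             /* more devices than columns */
        sgemm_on_device(dev, m, cols, k, alpha, A, lda,
                        B + (size_t)col * ldb, ldb,
                        beta,
                        C + (size_t)col * ldc, ldc);
        col += cols;
    }
}
```

Splitting by columns of C is only one possible decomposition. It is simple because nothing has to be summed across devices, but it does mean every device needs its own full copy of A, which may or may not be acceptable depending on the matrix shapes.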
Thank you for the input. Decomposing some algorithms is workable, but doing it efficiently adds a lot of work to porting existing applications that use BLAS. As such, I’d like to enter it as a feature request that CUBLAS (and the CUFFT library, for that matter) automatically detect and utilize multiple GPU devices.
We are porting our application with Tesla-type devices in mind, which have many GPU boards. A version of CUBLAS that handles the decomposition itself would certainly give a more efficient result and shorten the time until our application is ready to utilize such devices.