cuBLAS/cuFFT on multiple GPUs?

Two questions actually. Sorry if they seem naive.

(1) Do cuBLAS/cuFFT calls block (tile) the data internally to allow arbitrary problem sizes, or do the functions simply fail when device memory is exhausted (I suspect they fail)? I.e., do they send blocks of data to the GPU to solve a problem when the data is larger than device memory, e.g. calling cublasSgemm with huge matrices?

(2) Do (or will) cuBLAS/cuFFT support multiple GPUs, i.e. run in parallel across GPUs?


  1. No, but for certain operations (GEMM, TRSM) it is easy to write a host-side wrapper that takes care of the splitting.

  2. No.