Two questions actually. Sorry if they seem naive.
(1) Is there any blocking in the cuBLAS/cuFFT calls to allow arbitrary data sizes, or do the functions simply fail when device memory is exhausted (I suspect they fail)? I.e., do they send blocks of data to the GPU in pieces when the data are larger than device memory,
e.g. calling cublasSgemm with huge matrices?
(2) Does (or will) cuBLAS/cuFFT support multiple GPUs, i.e. running a single call in parallel across GPUs?