Hi, I know that cuBLAS-XT can handle out-of-core dense-dense matrix multiplication across multiple GPUs. What if I want to multiply a sparse matrix by a dense one? Basically I need an out-of-core version of csrmm, where neither the sparse nor the dense matrix fits in the device memory of a single GPU. I do have multiple GPUs in the node to leverage.
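To make "out-of-core csrmm" concrete, here is a rough CPU sketch of the decomposition I have in mind. It is my own hypothetical code, not a cuSPARSE or CUSP API, and the `Csr` struct and `blocked_spmm` name are made up for illustration. C = A * B is split into independent (row block of A, column tile of B) pieces. Each piece writes a disjoint tile of C, so in a real multi-GPU version each piece could be copied to whichever GPU is free and handled by one in-core csrmm call. Dense matrices are column-major, the same layout cuSPARSE's csrmm uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side CSR container (not a cuSPARSE or CUSP type).
struct Csr {
    int rows, cols;
    std::vector<int> row_ptr;   // size rows + 1
    std::vector<int> col_idx;   // size nnz
    std::vector<double> vals;   // size nnz
};

// Computes C = A * B (B is A.cols x n, C is A.rows x n, both column-major)
// one (row block, column tile) at a time. In an out-of-core multi-GPU
// version, each tile would be one device's job: the CSR slice for rows
// [r0, r1) plus columns [c0, c1) of B, producing C tile [r0, r1) x [c0, c1).
void blocked_spmm(const Csr& A, const std::vector<double>& B, int n,
                  std::vector<double>& C, int row_block, int col_tile) {
    assert(static_cast<int>(A.row_ptr.size()) == A.rows + 1);
    assert(B.size() == static_cast<std::size_t>(A.cols) * n);
    C.assign(static_cast<std::size_t>(A.rows) * n, 0.0);
    for (int r0 = 0; r0 < A.rows; r0 += row_block) {
        int r1 = std::min(r0 + row_block, A.rows);
        for (int c0 = 0; c0 < n; c0 += col_tile) {
            int c1 = std::min(c0 + col_tile, n);
            // One independent tile: touches only rows [r0, r1) of A and C
            // and columns [c0, c1) of B and C.
            for (int i = r0; i < r1; ++i) {
                for (int p = A.row_ptr[i]; p < A.row_ptr[i + 1]; ++p) {
                    int k = A.col_idx[p];
                    double a = A.vals[p];
                    for (int j = c0; j < c1; ++j) {
                        C[static_cast<std::size_t>(j) * A.rows + i] +=
                            a * B[static_cast<std::size_t>(j) * A.cols + k];
                    }
                }
            }
        }
    }
}
```

The row_block and col_tile sizes would be chosen so that one tile's inputs and output fit in a single GPU's memory. Note that a column tile of B still spans all A.cols rows, so if that alone is too large, the inner dimension would also have to be split and the partial products accumulated.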
Does anybody know whether cusp::multiply can handle the out-of-core case?