One of the listed features of cublasXt is that matrix sizes are limited only by host memory, not GPU memory. I'm curious whether this also applies to the free version running on a single GPU; i.e. if I have a single 6GB Tesla and I try a matrix multiplication with two 10GB matrices (and the host has 64GB of RAM), can I just pass them in, or do I need to tile them manually myself because it's the free version and/or a single GPU?
(I'd try it myself, but I don't have access to that configuration with CUDA 6 at the moment; I'm trying to plan ahead and convince the cluster admins to upgrade from CUDA 5 sooner rather than later.)
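For context, here's a rough sketch of the call I'd be making, assuming the out-of-core behavior does work in this configuration (which is exactly what I'm unsure about). The matrix dimension is my own back-of-envelope number (n*n*4 bytes ≈ 10GB for single precision), and I've left out error checking:

```cpp
// Sketch: cublasXt GEMM with host-resident matrices larger than GPU memory.
// A, B, C are plain host pointers; cublasXt is supposed to tile them to the
// device itself. Whether this works on the free/single-GPU version is the
// open question above.
#include <cublasXt.h>
#include <cstdlib>

int main() {
    // ~10GB single-precision square matrices: 51200^2 * 4 bytes ≈ 10.5GB
    const size_t n = 51200;
    float *A = (float*)malloc(n * n * sizeof(float));
    float *B = (float*)malloc(n * n * sizeof(float));
    float *C = (float*)malloc(n * n * sizeof(float));

    cublasXtHandle_t handle;
    cublasXtCreate(&handle);

    int devices[1] = {0};                    // the single 6GB Tesla
    cublasXtDeviceSelect(handle, 1, devices);

    const float alpha = 1.0f, beta = 0.0f;
    // Host pointers passed directly; no explicit chunking or cudaMemcpy.
    cublasXtSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                  n, n, n, &alpha, A, n, B, n, &beta, C, n);

    cublasXtDestroy(handle);
    free(A); free(B); free(C);
    return 0;
}
```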