I am using cublas<t>gemm (http://docs.nvidia.com/cuda/cublas/#cublas-lt-t-gt-gemm) to do matrix-matrix multiplication.

But my matrix is large and I keep getting out-of-memory errors. Is there an algorithm or some other way to get around this?

I was thinking of splitting the matrices into smaller blocks, multiplying the blocks, and then adding the partial products together. Would that work, or is there a better way to do this?
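
To make the block idea concrete, here is a rough CPU-only sketch of what I have in mind. It is not my actual code: the function name tiled_gemm, the tile parameter, and the row-major layout are all assumptions made for illustration (cuBLAS itself expects column-major storage). In a GPU version, I imagine the three innermost loops would become one cublas<t>gemm call on tiles copied to the device, with beta set to 1 so each partial product accumulates into the output tile.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: C (m x n) += A (m x k) * B (k x n), all row-major,
// processed one tile at a time so only three small blocks are needed at once.
void tiled_gemm(const std::vector<double>& A, const std::vector<double>& B,
                std::vector<double>& C, std::size_t m, std::size_t n,
                std::size_t k, std::size_t tile) {
    assert(tile > 0);
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);

    for (std::size_t i0 = 0; i0 < m; i0 += tile) {          // tile rows of C
        const std::size_t i1 = std::min(i0 + tile, m);
        for (std::size_t j0 = 0; j0 < n; j0 += tile) {      // tile columns of C
            const std::size_t j1 = std::min(j0 + tile, n);
            for (std::size_t p0 = 0; p0 < k; p0 += tile) {  // tiles along the shared dimension
                const std::size_t p1 = std::min(p0 + tile, k);

                // Partial product of one A tile and one B tile, accumulated
                // into the C tile. On the GPU this part would be the gemm call.
                for (std::size_t i = i0; i < i1; ++i) {
                    for (std::size_t j = j0; j < j1; ++j) {
                        double sum = 0.0;
                        for (std::size_t p = p0; p < p1; ++p) {
                            sum += A[i * k + p] * B[p * n + j];
                        }
                        C[i * n + j] += sum;
                    }
                }
            }
        }
    }
}
```

C has to start out zeroed (or hold whatever should be added to the product), because every tile along the shared dimension is summed into it. The tile size would be chosen so that one tile each of A, B, and C fits in device memory together.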