Large Matrix Multiplication and Inversion for Matrices That Don't Fit in GPU Memory

Hi there,

I'm looking for a way to implement an algorithm in CUDA that can calculate the inverse of a matrix and multiply two rectangular matrices. The problem is that the matrices are too big to fit in GPU memory, though we can assume they fit in CPU memory. So I need a block algorithm that copies blocks back and forth between host and device, but I don't know how to do this.
Could anyone help me?

(1) If possible, you should never explicitly invert a matrix. Instead, factorize the matrix you wish to have inverted (LU, Cholesky or QR, for example) and use the factors to solve the linear systems you actually need.
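
To illustrate point (1), here is a minimal sketch of solving A x = b through an LU factorization with partial pivoting instead of forming the inverse. It is plain CPU Python on nested lists, and the names `lu_factor` and `lu_solve` are just made up for this example, not taken from any library. A real out-of-core version would factor the matrix panel by panel, which this sketch does not attempt.

```python
def lu_factor(a):
    """Factor a square matrix as P*A = L*U with partial pivoting.

    Returns (lu, perm): lu holds U on and above the diagonal and the
    multipliers of L (unit diagonal implied) below it; row i of the
    factored matrix corresponds to row perm[i] of the original.
    """
    n = len(a)
    lu = [list(map(float, row)) for row in a]  # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Choose the row with the largest entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Eliminate column k below the pivot.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b using the factors returned by lu_factor."""
    n = len(lu)
    # Forward substitution: L y = P b.
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    # Back substitution: U x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x
```

The factorization is done once; after that, every additional right-hand side costs only the two triangular solves in `lu_solve`. If you really do need the inverse, you can get it column by column by solving against the columns of the identity matrix.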

(2) Work it all out on the CPU first to figure out your algorithm. Then, start worrying about using GPUs.
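
As a starting point for (2), here is a rough CPU-only sketch of a blocked multiplication C = A * B for rectangular matrices. The function name `matmul_blocked` and the `tile` parameter are assumptions for this example. The three outer loops walk over tiles; in a GPU version, my assumption is that each step of the innermost tile loop would become a host-to-device copy of one tile of A and one tile of B (`cudaMemcpy`), followed by a GEMM call such as `cublasSgemm` that accumulates into a C tile kept on the device.

```python
def matmul_blocked(a, b, tile=2):
    """Multiply an m x k matrix by a k x n matrix one tile at a time."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):          # tile row of C
        for j0 in range(0, n, tile):      # tile column of C
            for k0 in range(0, k, tile):  # walk along the shared dimension
                # Only these three tiles are touched here, so only they
                # would have to be resident in GPU memory at this point.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c
```

The `min(...)` bounds handle matrices whose dimensions are not multiples of the tile size. On a GPU you would pick the tile size so that three tiles fit in device memory at once, with extra room if you want to overlap copies and compute.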