Inverse of a 3x3 matrix

Interesting! I can see the usefulness of what you are doing with a highly batched problem. In my case, what I need is a single 3x3 matrix inversion that the entire thread block then uses to do some work. The total number of inversions could, in theory, be in the thousands, but is much more likely to be in the tens to hundreds. The other critical piece of information is that the matrix data is already on the GPU, resident in its DRAM, and the result will be needed by the GPU afterwards. That’s the real reason why I’m not just performing the calculation on the CPU: the transfers alone would cost about as much as the whole round trip of downloading the data, solving the inverse, and putting the result back (though you might correct me on the technical details).
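For the one-matrix-per-block case described above, a hand-written inverse may be all that is needed. Below is a minimal sketch using the adjugate (cofactor) formula. The names `invert3x3` and `useInverse`, the row-major layout, one block per matrix, and the determinant threshold are all assumptions made for illustration, not anything taken from cuBLAS or from the earlier posts.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Hypothetical helper: inverts the row-major 3x3 matrix `m` into `inv` using
// the adjugate formula. Returns false when |det| is below an arbitrary
// threshold (tune or replace this check for your data).
__host__ __device__ inline bool invert3x3(const float m[9], float inv[9])
{
    // Cofactors of the first row, reused for the determinant.
    const float c00 = m[4] * m[8] - m[5] * m[7];
    const float c01 = m[5] * m[6] - m[3] * m[8];
    const float c02 = m[3] * m[7] - m[4] * m[6];
    const float det = m[0] * c00 + m[1] * c01 + m[2] * c02;
    if (fabsf(det) < 1e-12f) return false;

    const float r = 1.0f / det;
    // inverse = adjugate / det, where the adjugate is the transposed
    // cofactor matrix.
    inv[0] = c00 * r;
    inv[1] = (m[2] * m[7] - m[1] * m[8]) * r;
    inv[2] = (m[1] * m[5] - m[2] * m[4]) * r;
    inv[3] = c01 * r;
    inv[4] = (m[0] * m[8] - m[2] * m[6]) * r;
    inv[5] = (m[2] * m[3] - m[0] * m[5]) * r;
    inv[6] = c02 * r;
    inv[7] = (m[1] * m[6] - m[0] * m[7]) * r;
    inv[8] = (m[0] * m[4] - m[1] * m[3]) * r;
    return true;
}

// Assumed launch shape: one block per matrix. Thread 0 inverts into shared
// memory, then every thread in the block can read the result.
__global__ void useInverse(const float* mats, float* out, int* singular)
{
    __shared__ float inv[9];
    __shared__ bool ok;

    if (threadIdx.x == 0) {
        ok = invert3x3(mats + 9 * blockIdx.x, inv);
        if (!ok) singular[blockIdx.x] = 1;
    }
    __syncthreads();      // make inv and ok visible to the whole block
    if (!ok) return;      // uniform across the block, so safe after the barrier

    // Placeholder for the real per-thread work: here the inverse is just
    // copied back to global memory.
    if (threadIdx.x < 9) out[9 * blockIdx.x + threadIdx.x] = inv[threadIdx.x];
}
```

A launch along the lines of `useInverse<<<numMatrices, 128>>>(d_mats, d_out, d_singular)` keeps everything on the device. The arithmetic for one 3x3 inverse is tiny, so doing it serially in one thread and paying a single `__syncthreads()` is probably cheaper than any extra kernel launch or library call, though that is worth measuring on your own workload.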

I’m curious about the cuBLAS setup: doesn’t it require a minimum matrix size of 32? One could pad, or place ten 3x3 matrices as blocks along the diagonal of a 32x32 matrix, but at least in my case the point is to get one step out of the way rather than to compute a huge batch of results.

32 is the maximum matrix size, not the minimum. The documentation states:

This function is a short cut of cublas<t>getrfBatched plus cublas<t>getriBatched. However it doesn’t work if n is greater than 32. If not, the user has to go through cublas<t>getrfBatched and cublas<t>getriBatched.

If n (the matrix side dimension) is larger than 32, you can still do a batched inverse, but the methodology is to use getrfBatched and getriBatched together, like this.
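As a rough sketch of that two-step route, the helper below calls `cublasSgetrfBatched` followed by `cublasSgetriBatched`. The helper name `invertBatchedLU`, the back-to-back storage layout, the single-precision choice, and the simplified error handling are assumptions for illustration. It also assumes the handle uses the default stream, so the blocking copy at the end sees the finished results.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper, not part of cuBLAS. d_A and d_Ainv each hold `batch`
// n x n float matrices stored back to back in device memory. d_A is
// overwritten with its LU factors. On return, info[i] == 0 means matrix i
// was inverted; a nonzero value flags a matrix that could not be inverted.
bool invertBatchedLU(cublasHandle_t handle, float* d_A, float* d_Ainv,
                     int n, int batch, std::vector<int>& info)
{
    // The batched routines take device arrays of per-matrix pointers.
    std::vector<float*> hA(batch), hC(batch);
    for (int i = 0; i < batch; ++i) {
        hA[i] = d_A + static_cast<size_t>(i) * n * n;
        hC[i] = d_Ainv + static_cast<size_t>(i) * n * n;
    }
    const size_t ptrBytes = batch * sizeof(float*);
    float** dAptr = nullptr;
    float** dCptr = nullptr;
    int* dPivot = nullptr;   // n pivot indices per matrix
    int* dInfo = nullptr;    // one status value per matrix
    info.assign(batch, -1);

    bool ok =
        cudaMalloc((void**)&dAptr, ptrBytes) == cudaSuccess &&
        cudaMalloc((void**)&dCptr, ptrBytes) == cudaSuccess &&
        cudaMalloc((void**)&dPivot, sizeof(int) * n * batch) == cudaSuccess &&
        cudaMalloc((void**)&dInfo, sizeof(int) * batch) == cudaSuccess &&
        cudaMemcpy(dAptr, hA.data(), ptrBytes,
                   cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(dCptr, hC.data(), ptrBytes,
                   cudaMemcpyHostToDevice) == cudaSuccess &&
        // Step 1: in-place LU factorization with partial pivoting.
        cublasSgetrfBatched(handle, n, dAptr, n, dPivot, dInfo,
                            batch) == CUBLAS_STATUS_SUCCESS &&
        // Step 2: out-of-place inversion from the LU factors and pivots.
        cublasSgetriBatched(handle, n, dAptr, n, dPivot, dCptr, n, dInfo,
                            batch) == CUBLAS_STATUS_SUCCESS &&
        // Blocking copy of the per-matrix status back to the host.
        cudaMemcpy(info.data(), dInfo, sizeof(int) * batch,
                   cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(dAptr);
    cudaFree(dCptr);
    cudaFree(dPivot);
    cudaFree(dInfo);
    return ok;
}
```

cuBLAS assumes column-major storage, but since the inverse of a transpose is the transpose of the inverse, passing row-major matrices in yields row-major inverses out. In real code the pointer arrays and the pivot and info buffers would likely be allocated once and reused rather than created on every call.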


Thanks, and sorry: I had read that but thought n referred to the batch size, not the matrix size itself. The perils of working on too little sleep…