Using CUTLASS to get the inverse of a matrix


I want to try the CUTLASS library to compute the inverse of a matrix, and I would like some insight into how to use it. The matrices in our case are 2x2 to 4x4 (also 8x8 in some cases, but that is a concern for later). I see the definitions here onwards

but I do not understand how to use it in CUDA code.
I am aware that small matrices are not very interesting for GPUs, but since the full framework runs on GPUs, we are looking for libraries that are fast for small matrices. Any suggestions would be really helpful.

I suggest you move this question to the CUTLASS GitHub issues. Also, please include how many matrices you plan to process per time step.

Also, here’s a link to a CUDA kernel that computes a matrix inverse.

CUTLASS provides a means to take advantage of Tensor Cores. Matrices that small are better off on CUDA cores.
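
To make the "CUDA cores" route concrete, here is a minimal sketch, not CUTLASS code and not the kernel from the link above. It assumes row-major `float` matrices stored back to back and uses one thread per matrix with Gauss-Jordan elimination and partial pivoting. The names `invert_small`, `invert_batch`, and `HD` are made up for this example, and the default tolerance `eps` is an arbitrary choice you would want to tune.

```cpp
#include <math.h>

// HD expands to CUDA qualifiers under nvcc and to nothing under a plain
// C++ compiler, so the same routine can be checked on the host first.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Invert one row-major N x N matrix by Gauss-Jordan elimination with
// partial pivoting. Returns false when the best available pivot is below
// eps, i.e. the matrix is treated as singular. (Hypothetical helper.)
template <int N>
HD bool invert_small(const float* a, float* inv, float eps = 1e-12f) {
    float m[N][N];  // working copy of the input
    float r[N][N];  // starts as identity, ends as the inverse
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            m[i][j] = a[i * N + j];
            r[i][j] = (i == j) ? 1.0f : 0.0f;
        }

    for (int c = 0; c < N; ++c) {
        // Pick the row with the largest magnitude in column c as pivot.
        int p = c;
        for (int i = c + 1; i < N; ++i)
            if (fabsf(m[i][c]) > fabsf(m[p][c])) p = i;
        if (fabsf(m[p][c]) < eps) return false;

        // Swap the pivot row into position c.
        if (p != c)
            for (int j = 0; j < N; ++j) {
                float t = m[c][j]; m[c][j] = m[p][j]; m[p][j] = t;
                t = r[c][j]; r[c][j] = r[p][j]; r[p][j] = t;
            }

        // Scale the pivot row so the pivot becomes 1.
        float d = 1.0f / m[c][c];
        for (int j = 0; j < N; ++j) { m[c][j] *= d; r[c][j] *= d; }

        // Eliminate column c from every other row.
        for (int i = 0; i < N; ++i) {
            if (i == c) continue;
            float f = m[i][c];
            for (int j = 0; j < N; ++j) {
                m[i][j] -= f * m[c][j];
                r[i][j] -= f * r[c][j];
            }
        }
    }

    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            inv[i * N + j] = r[i][j];
    return true;
}

#ifdef __CUDACC__
// One thread per matrix; matrix idx starts at in + idx * N * N.
// ok[idx] is set to 1 on success and 0 if the matrix looked singular.
template <int N>
__global__ void invert_batch(const float* in, float* out, int* ok, int count) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < count)
        ok[idx] = invert_small<N>(in + idx * N * N, out + idx * N * N) ? 1 : 0;
}
#endif
```

A launch could look like `invert_batch<4><<<(count + 127) / 128, 128>>>(d_in, d_out, d_ok, count);`, where the device pointers and the block size of 128 are placeholders. Whether this beats a library call depends a lot on how many matrices you have per time step, which is why that number matters.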