I want to try the CUTLASS library to compute the inverse of a matrix, and I would like some insight into how to use it. The matrices in our case are 2x2 to 4x4 (also 8x8 in some cases, but that is a concern for later). I see the definitions from here onwards,
but I do not understand how to use them in CUDA code.
I am aware that small matrices are not very interesting for GPUs, but since our full framework runs on GPUs, we are still looking for libraries that are fast for small matrices. Any suggestions would be really helpful.
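To make the use case concrete, here is a minimal sketch of the 2x2 case written by hand. It is not CUTLASS code and uses no library; the names `invert2x2` and `invert2x2_kernel`, the row-major layout, the one-thread-per-matrix mapping, and the singularity threshold `eps` are all assumptions made for illustration. It compiles as plain C++ and, under nvcc, the same function should also be callable from a kernel.

```cpp
// Under nvcc, mark the helper as callable from both host and device.
// With a plain C++ compiler the macro expands to nothing.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper (not a CUTLASS API): closed-form inverse of one
// row-major 2x2 matrix m = [a b; c d], written to out.
// Returns false and leaves out untouched if |det| is below eps.
// The default eps is an arbitrary illustrative threshold.
HD inline bool invert2x2(const float* m, float* out, float eps = 1e-12f) {
    const float det = m[0] * m[3] - m[1] * m[2];
    if (det < eps && det > -eps) {
        return false;  // treat as singular
    }
    const float inv_det = 1.0f / det;
    // inverse = (1/det) * [ d -b; -c a ]
    out[0] =  m[3] * inv_det;
    out[1] = -m[1] * inv_det;
    out[2] = -m[2] * inv_det;
    out[3] =  m[0] * inv_det;
    return true;
}

#ifdef __CUDACC__
// Hypothetical kernel: one thread inverts one matrix of a packed batch.
// in and out each hold count matrices, 4 floats per matrix.
// Singular matrices are skipped, so their output slots are left unchanged.
__global__ void invert2x2_kernel(const float* in, float* out, int count) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        invert2x2(in + 4 * i, out + 4 * i);
    }
}
#endif
```

On the host, `invert2x2` can be called directly on a `float[4]`; on the device, the kernel would be launched with enough threads to cover `count` matrices. The 3x3 and 4x4 cases would need their own cofactor or elimination code, which is the part we would prefer to get from a library.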