Matrix inverse from device code

Hi all,

I’m trying to parallelize an algorithm that needs to perform matrix inversion. Each thread has its own matrix, and the sizes vary, so I’m struggling to use batched processing. Instead, I’d like to call a matrix inversion function directly from my kernel. Is there a way to do this? So far my matrices were always Hermitian, so I just implemented a Cholesky decomposition and inverted the resulting triangular factor myself, but I’m now dealing with general non-singular matrices and I don’t want to implement the whole inversion function…
I’m a bit confused about which external libraries I can use from device code… I heard about cuBLAS, but it seems to be deprecated since CUDA 10.0, right…?

Thanks!

Any workaround? I have been struggling with matrix inversion in kernel device code for the past month.
Did you have any success?

No… I did figure out that my matrices were all positive semi-definite, though, so I implemented a Cholesky decomposition followed by inversion of the resulting triangular factors. Both are quite simple to implement, but they only work for very specific data. The question is still open.
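
For reference, here is a minimal sketch of that Cholesky route, in case it helps someone with similar data. It assumes real-valued symmetric positive-definite matrices (a complex Hermitian version would need conjugates), stored row-major, with one matrix per thread. The names spd_inverse and invert_all and the offset/n layout are made up for illustration, nothing here is tuned, and it does not help with general non-symmetric matrices.

```cpp
#include <cmath>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD  // lets the same function compile as plain C++ for host-side testing
#endif

// Inverts a real symmetric positive-definite n x n matrix in place (row-major).
// Returns false if a non-positive pivot shows up, i.e. the matrix is not
// positive definite and this method does not apply.
HD inline bool spd_inverse(double* a, int n) {
    // Step 1: Cholesky. Overwrite the lower triangle with L, where A = L * L^T.
    for (int j = 0; j < n; ++j) {
        double d = a[j * n + j];
        for (int k = 0; k < j; ++k) d -= a[j * n + k] * a[j * n + k];
        if (d <= 0.0) return false;
        d = std::sqrt(d);
        a[j * n + j] = d;
        for (int i = j + 1; i < n; ++i) {
            double s = a[i * n + j];
            for (int k = 0; k < j; ++k) s -= a[i * n + k] * a[j * n + k];
            a[i * n + j] = s / d;
        }
    }
    // Step 2: invert the triangular factor in place, column by column (M = L^-1).
    for (int j = 0; j < n; ++j) {
        a[j * n + j] = 1.0 / a[j * n + j];
        for (int i = j + 1; i < n; ++i) {
            double s = 0.0;
            for (int k = j; k < i; ++k) s -= a[i * n + k] * a[k * n + j];
            a[i * n + j] = s / a[i * n + i];  // a[i][i] still holds L[i][i] here
        }
    }
    // Step 3: A^-1 = M^T * M. Fill the lower triangle row by row, diagonal last,
    // and mirror each entry into the upper triangle.
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j <= i; ++j) {
            double s = 0.0;
            for (int k = i; k < n; ++k) s += a[k * n + i] * a[k * n + j];
            a[i * n + j] = s;
            a[j * n + i] = s;
        }
    }
    return true;
}

#ifdef __CUDACC__
// Hypothetical layout: thread t owns the n[t] x n[t] matrix at data + offset[t].
__global__ void invert_all(double* data, const int* offset, const int* n,
                           bool* ok, int count) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < count) ok[t] = spd_inverse(data + offset[t], n[t]);
}
#endif
```

Since every thread works on its own slice of data, matrices of different sizes can share one launch. The false return is worth checking, because a matrix that is only semi-definite (singular) will fail at the pivot test.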