I am looking for a matrix factorization algorithm for banded matrices that can be implemented efficiently in CUDA. I’ll be using it to solve systems of linear equations. The matrices are about 6000x6000 with a bandwidth of about 60. Judging by vvolkov’s work, QR factorization is the most efficient factorization in terms of flops for dense matrices. Since my matrices are symmetric positive definite, I could also use the Cholesky decomposition, or solve the system iteratively with the conjugate gradient method. I’d appreciate any suggestions as to which method is fastest on a GPU.
If anyone can suggest libraries that supply this functionality, I’d greatly appreciate it.
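To pin down what I mean by a banded Cholesky solve, here is a minimal CPU reference sketch in plain Python (not CUDA, and not tuned in any way). It assumes "bandwidth" means the number of sub-diagonals p, and it stores only the lower band, with band[i][d] holding A[i][i-d]. The function names banded_cholesky and banded_solve are made up for illustration and do not come from any library. With this layout the factorization does work on the order of n*p^2, compared with roughly n^3/3 for a dense Cholesky, which is why exploiting the band should matter for n = 6000 and p around 60.

```python
import math


def banded_cholesky(band, p):
    """Factor an SPD banded matrix A = L * L^T.

    band[i][d] holds A[i][i-d] for d = 0..p (slots with i-d < 0 are ignored).
    Returns L in the same lower-band layout. Hypothetical helper for illustration.
    """
    n = len(band)
    L = [[0.0] * (p + 1) for _ in range(n)]
    for i in range(n):
        lo = max(0, i - p)  # first column inside the band for row i
        for j in range(lo, i + 1):
            s = band[i][i - j]
            # Subtract the dot product of rows i and j of L, restricted to the band.
            for k in range(lo, j):
                s -= L[i][i - k] * L[j][j - k]
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][0] = math.sqrt(s)
            else:
                L[i][i - j] = s / L[j][0]
    return L


def banded_solve(L, p, b):
    """Solve A x = b given the banded Cholesky factor L of A."""
    n = len(L)
    # Forward substitution: L y = b.
    y = [0.0] * n
    for i in range(n):
        s = b[i]
        for k in range(max(0, i - p), i):
            s -= L[i][i - k] * y[k]
        y[i] = s / L[i][0]
    # Back substitution: L^T x = y.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = y[i]
        for k in range(i + 1, min(n, i + p + 1)):
            s -= L[k][k - i] * x[k]
        x[i] = s / L[i][0]
    return x


if __name__ == "__main__":
    # Small tridiagonal example (p = 1): diagonal 2, off-diagonals -1.
    n, p = 5, 1
    band = [[2.0, -1.0] for _ in range(n)]
    rhs = [1.0, 0.0, 0.0, 0.0, 1.0]  # equals A times the all-ones vector
    print(banded_solve(banded_cholesky(band, p), p, rhs))
```

The row-by-row loop above is inherently sequential in i, so a GPU version would presumably need a blocked or batched formulation; I am not sure how well that maps to a band this narrow, which is part of what I am asking.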