This is my first experience with CUDA. As part of a larger program, I need to compute the reduced row echelon form of a matrix stored in column-major format.
I was about to start writing it from scratch, but then I thought I’d ask the experts here whether there is a built-in function or cuBLAS routine that does this. I have not found anything so far.
Vasily Volkov (vvolkov) posted some code he wrote a while back to do LU, QR, and Cholesky factorizations. I think it was written for row-major format, but you could try adapting it to your needs. Search this forum and you should find the post.
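In case it helps as a starting point, here is a minimal CPU-side sketch of Gauss-Jordan elimination with partial pivoting on a column-major array. It is plain C++ rather than CUDA, and it is not vvolkov's code. The function name `rref_colmajor` and the tolerance `eps` are assumptions made up for illustration; a GPU version would still need to parallelize the row updates.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Reference sketch only (hypothetical helper, not a CUDA or cuBLAS API).
// Element (r, c) of an m x n column-major matrix lives at a[r + c * m].
// Reduces `a` in place to reduced row echelon form and returns the rank.
// `eps` is an assumed tolerance below which a pivot candidate counts as zero.
int rref_colmajor(std::vector<double>& a, int m, int n, double eps = 1e-12) {
    int row = 0;  // index of the next pivot row
    for (int col = 0; col < n && row < m; ++col) {
        // Partial pivoting: largest magnitude entry in this column, rows >= row.
        int piv = row;
        for (int r = row + 1; r < m; ++r) {
            if (std::fabs(a[r + col * m]) > std::fabs(a[piv + col * m])) piv = r;
        }
        if (std::fabs(a[piv + col * m]) <= eps) {
            // No usable pivot: flush the remaining tiny entries to exact zero.
            for (int r = row; r < m; ++r) a[r + col * m] = 0.0;
            continue;
        }

        // Swap the pivot row into place and scale it so the pivot becomes 1.
        const double p = a[piv + col * m];
        for (int c = col; c < n; ++c) {
            std::swap(a[row + c * m], a[piv + c * m]);
            a[row + c * m] /= p;
        }

        // Eliminate this column from every other row (above and below).
        for (int r = 0; r < m; ++r) {
            if (r == row) continue;
            const double f = a[r + col * m];
            if (f == 0.0) continue;
            for (int c = col; c < n; ++c) {
                a[r + c * m] -= f * a[row + c * m];
            }
        }
        ++row;
    }
    return row;
}
```

For example, the 2 x 3 matrix with rows (1, 2, 3) and (4, 5, 6) is stored column-major as {1, 4, 2, 5, 3, 6}; calling `rref_colmajor(a, 2, 3)` on it returns rank 2 and leaves the rows (1, 0, -1) and (0, 1, 2).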