I implemented my own CUDA kernel to solve the linear system AX = B using Gaussian elimination. Solving a 1500 x 1500 system takes roughly 1500 milliseconds. Is there a faster method I can exploit? I am using an RTX 3070 Laptop GPU.
Did you try using the cuSOLVER library? A Google search will turn up many questions about solving linear systems with cuSOLVER, including sample codes provided by NVIDIA.
Detailed questions about using cuSOLVER or any of the other CUDA math libraries should be posted on the libraries forum.
If I were working on such a problem, I definitely would not start out by writing my own code, except maybe as a learning exercise. I would investigate high-quality libraries first.
If you are asking “is there any faster method I can exploit while writing my own kernels”, I won’t be able to help there. Perhaps others will have suggestions.
I tried cuSOLVER before, but I wasn’t sure about the results (I probably did not implement it correctly; I will try again anyway). But before going forward, do you think I can get better performance by using this library?
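One way to gain confidence in the results of any solver, whether a library or a hand-written kernel, is to compute the residual AX - B on the host and check that it is small. Below is a rough sketch of such a check, assuming column-major double-precision data; max_residual is a hypothetical helper name, not a library function.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the largest |(A*X - B)[i][j]| over all entries.
// A is n x n, X and B are n x nrhs, all stored column-major.
double max_residual(int n, int nrhs, const std::vector<double>& A,
                    const std::vector<double>& X, const std::vector<double>& B) {
    double worst = 0.0;
    for (int j = 0; j < nrhs; ++j) {          // each right-hand side
        for (int i = 0; i < n; ++i) {         // each row of A
            double sum = 0.0;
            for (int k = 0; k < n; ++k) {
                sum += A[i + static_cast<std::size_t>(k) * n] *
                       X[k + static_cast<std::size_t>(j) * n];
            }
            const double diff = std::fabs(sum - B[i + static_cast<std::size_t>(j) * n]);
            worst = std::max(worst, diff);
        }
    }
    return worst;
}
```

Pass the original A and B here, not copies that a solver has overwritten in place. For a reasonably well-conditioned system, a residual that is tiny relative to the magnitude of the entries of B suggests the solve was set up correctly; a large one often points to a row-major versus column-major mix-up.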