There is already an incomplete sparse LU (ILU) in cuSPARSE, so if your claim holds, you have implemented the complete factorization.
If so, please answer these questions:
- Did you implement partial pivoting? If so, what approach did you use?
- Did you use the ‘left-looking’ or ‘right-looking’ method?
- What pre-processing steps did you use to reduce the number of non-zeros in the factors? AMD? COLAMD? HSL MC64?
- Does your implementation support complex numbers?
- Did you validate your implementation against a reference known to be correct? SuperLU or MATLAB would be good choices.
- Did you add a pivoting threshold input and the logic to decide when to pivot away from the diagonal?
- Do you have an equilibration step that scales the rows and columns to improve the conditioning of the matrix?
- How did you break the workload into independent, ordered chunks, and which graph algorithm did you use to do it?
- Do you have a symbolic pre-processing step, or do you pivot dynamically during the numeric factorization and generate the newly needed non-zero locations on the fly?
- Are your results numerically stable when used to solve AX=B?
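On the chunking question, one common answer is level-set scheduling on the dependency DAG of the triangular factor: row i depends on every earlier row whose column appears in row i, rows within a level are mutually independent, and levels execute in order. A minimal sketch in Python/SciPy (the function name and the toy matrix are illustrative, not taken from any particular implementation):

```python
import numpy as np
import scipy.sparse as sp

def level_schedule(L):
    """Level-set scheduling for a sparse lower-triangular matrix.

    Row i depends on every row j < i with L[i, j] != 0. Rows assigned
    to the same level have no dependencies on each other, so they can
    be processed in parallel; levels must be processed in order.
    """
    L = sp.csr_matrix(L)
    n = L.shape[0]
    level = np.zeros(n, dtype=np.int64)
    for i in range(n):
        cols = L.indices[L.indptr[i]:L.indptr[i + 1]]
        deps = cols[cols < i]                 # off-diagonal dependencies
        if deps.size:
            level[i] = level[deps].max() + 1  # one level after the latest dep
    return [np.flatnonzero(level == k).tolist() for k in range(level.max() + 1)]

# Toy pattern: rows 0 and 2 have no off-diagonal entries, row 1 needs
# row 0, and row 3 needs rows 1 and 2.
L = sp.csr_matrix(np.array([[1., 0., 0., 0.],
                            [1., 1., 0., 0.],
                            [0., 0., 1., 0.],
                            [0., 1., 1., 1.]]))
print(level_schedule(L))  # → [[0, 2], [1], [3]]
```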
I have found that there is a great deal of fraud out there when it comes to amateur GPU-based sparse LU linear algebra subroutines. Before you get too excited about your implementation, compare your results against SuperLU on a large sparse matrix with a high condition number.
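A minimal sketch of such a comparison, using `scipy.sparse.linalg.splu` (which wraps SuperLU) as the reference. The matrix construction and the commented-out `your_gpu_lu_solve` hook are my assumptions for illustration; substitute a genuinely hard matrix (e.g. from the SuiteSparse collection) for a real test:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(0)
n = 1000

# Hypothetical test matrix: random sparse + identity, then rows scaled
# across many orders of magnitude to worsen the conditioning.
A = (sp.random(n, n, density=0.01, random_state=rng) + sp.eye(n)).tocsc()
A = (sp.diags(10.0 ** rng.uniform(-6.0, 6.0, size=n)) @ A).tocsc()

# Manufactured solution so the forward error is measurable.
x_true = rng.standard_normal(n)
b = A @ x_true

x_ref = spla.splu(A).solve(b)      # SuperLU reference solve
# x_gpu = your_gpu_lu_solve(A, b)  # <- hypothetical: your implementation here

# Backward-error style check: a stable solve keeps this near machine epsilon
# even when the forward error is inflated by the condition number.
rel_res = np.linalg.norm(A @ x_ref - b) / (spla.norm(A) * np.linalg.norm(x_ref))
fwd_err = np.linalg.norm(x_ref - x_true) / np.linalg.norm(x_true)
print(rel_res, fwd_err)
```

A sound GPU factorization should match SuperLU's relative residual to within a small factor on the same matrix; a large gap is the red flag this post is warning about.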