# Solving a Linear System of Equation with Very Large Sparse Coefficient Matrix Using SVD

I have about `400,000,000` linear system of equations `Ax=b` in which all `A`s are of size `4096×3` and `b`s are of size `4096×1`.

I have constructed a large sparse matrix, `SX=B`, where S is the sparse coefficient matrix consisting of many `A`s (they are populated at the diagonal of `S`) . The solution vector `X` and right hand side vector `B` are formed by vertically stacking `x`s and `b`s.

``````S = [
A1  0    ... 0
0   A2 0 ... 0
...
0 ...     0 AN
]

B = [
b1
b2
...
bN
]
``````

I need to solve for solution using SVD/pseudoinverse.

Here are my understandings from my research on how to solve the newly formed `SX=B` using SVD/pseudoinverse:

1. `cuSolver` doesn’t allow the matrices to be solved in a sparse format.

2. `cuSolver` only allows the QR or LU factorization in the sparse format.

3. `cuSolver` allows to find the approximate `SVD` over a large batch of matrices, which can be then used to solve the system of equations.

I have implemented the batched approximate SVD and tested it on the `10,000` matrices of size `4096 x 3`. And this took `3.6` s on my laptop. This means to solve all `400,000,000` system of equations, it will take `40 hours`. The CPU version of this using Intel MKL to solve `10,000` systems of `Ax=b` in a sequential manner takes `10,000×60us=60ms` (GPU is 6 times slower).

I would appreciate it if I could have the community’s feedback for this problem. How can I beat CPU speed by taking advantage of Cuda?