I would like to find an existing GPU library / algorithm that can solve relatively small overdetermined dense least squares systems with a non-negativity constraint, i.e. minimize ||Ay - b|| subject to y >= 0, where A is m x n, y is n x 1 and b is m x 1. In my case I might have m ~= 100 and n < 10, and the number of systems like this could be > 10^6.
If I didn’t have the non-negativity constraint on the values in y, it appears I could solve these using cuSolver, or I could use cuBLAS to do a QR decomposition. But with the non-negativity constraint (the problem usually called non-negative least squares, NNLS) I need an iterative method to solve these systems. I would appreciate any help pointing me to an existing library or algorithm for solving this type of system on the GPU using CUDA with C/C++.