Consultation Required: Various Left-hand Side Algorithms

This is a re-post from CUDA jobs. Hopefully it receives a bit more interest here :)
I am looking for someone to code a variety of CUDA kernels that operate from the left-hand side. None of these (to my knowledge) are found in cuSPARSE or cuBLAS to date.


Sparse LHS tridiagonal solve - Y = X * T
Dense * Sparse multiplication - U = D * S - storage of S is up to you (COO, CSR, CSC, whatever suits)
Dense * Dense multiplication - P = M * trans(N)
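
To pin down what the second and third operations are expected to compute, here is a minimal CPU reference sketch that kernel output could be checked against. It is only an illustration under assumptions of mine: row-major dense storage, double precision, CSC storage for S, and the names `CscMatrix`, `dense_times_csc` and `dense_times_dense_transposed` are all hypothetical. The tridiagonal case is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSC container for the sparse matrix S (rows x cols).
struct CscMatrix {
    std::size_t rows, cols;
    std::vector<std::size_t> col_ptr;  // size cols + 1
    std::vector<std::size_t> row_idx;  // size nnz
    std::vector<double> val;           // size nnz
};

// U = D * S, with D dense (m x S.rows) and U dense (m x S.cols), both row-major.
std::vector<double> dense_times_csc(const std::vector<double>& D,
                                    std::size_t m, const CscMatrix& S) {
    assert(D.size() == m * S.rows);
    assert(S.col_ptr.size() == S.cols + 1);
    std::vector<double> U(m * S.cols, 0.0);
    // Column j of U is a weighted sum of the columns of D picked out by
    // the nonzeros in column j of S.
    for (std::size_t j = 0; j < S.cols; ++j) {
        for (std::size_t p = S.col_ptr[j]; p < S.col_ptr[j + 1]; ++p) {
            const std::size_t r = S.row_idx[p];
            const double s = S.val[p];
            for (std::size_t i = 0; i < m; ++i) {
                U[i * S.cols + j] += D[i * S.rows + r] * s;
            }
        }
    }
    return U;
}

// P = M * trans(N), with M dense (m x k), N dense (n x k), P dense (m x n),
// all row-major. Entry (i, j) is the dot product of row i of M and row j of N.
std::vector<double> dense_times_dense_transposed(const std::vector<double>& M,
                                                 std::size_t m,
                                                 const std::vector<double>& N,
                                                 std::size_t n, std::size_t k) {
    assert(M.size() == m * k);
    assert(N.size() == n * k);
    std::vector<double> P(m * n, 0.0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (std::size_t t = 0; t < k; ++t) {
                acc += M[i * k + t] * N[j * k + t];
            }
            P[i * n + j] = acc;
        }
    }
    return P;
}
```

With row-major storage, M * trans(N) reads both operands along contiguous rows, and a CSC layout for S lets each output column of U be built independently. Whether either choice suits the GPU kernels is open.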

The reason for using left-hand-side operations is that, in my experience, they are far better at extracting the most performance out of a card, as there are no excessive memory reads.
Happy to pay by the hour, or a fixed rate for the job.
I look forward to hearing from you :)