Manually allocate cuSPARSE/cuBLAS scratch space

A lot of the cuSPARSE/cuBLAS functions use scratch space (e.g., the tridiagonal solve in cuSPARSE uses scratch space roughly equal to the size of the right-hand side being solved). When this becomes large, it is difficult to manage one's own memory, because we cannot allocate this scratch space ourselves: the library simply tries to allocate it on every function call. This is one of my major issues at the moment in building my own memory manager.

Has anyone found a workaround to this problem?

With the upcoming cuSPARSE release, we are moving in that direction. The user will call a query routine to find out how much workspace memory is needed, and will then have to allocate the GPU memory themselves before calling the actual compute routine. (This is somewhat similar to the LAPACK style.)
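To make the query-then-allocate pattern concrete, here is a minimal host-side C++ sketch. It is not the cuSPARSE API: `tridiag_workspace_bytes`, `tridiag_solve`, and `demo_solve` are hypothetical names, and the solver is a plain CPU Thomas algorithm without pivoting. I picked it only because its scratch requirement (one value per row) is about the size of the right-hand side, like the tridiagonal case mentioned in the question. On the GPU the buffer would come from your own device memory pool instead of a `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; host-only illustration, NOT the cuSPARSE API.

// Step 1 (query): bytes of scratch needed for a system with n unknowns.
std::size_t tridiag_workspace_bytes(std::size_t n) {
    return n * sizeof(double);  // one modified super-diagonal entry per row
}

// Step 2 (compute): Thomas algorithm using caller-owned scratch; it never
// allocates. dl[0] and du[n-1] are ignored. b is overwritten with the solution.
// Returns false if the workspace is missing/too small or a zero pivot is hit
// (b may be partially overwritten in the zero-pivot case).
bool tridiag_solve(std::size_t n, const double* dl, const double* d,
                   const double* du, double* b,
                   void* workspace, std::size_t workspace_bytes) {
    if (n == 0) return true;
    if (workspace == nullptr || workspace_bytes < tridiag_workspace_bytes(n))
        return false;
    double* c = static_cast<double*>(workspace);  // caller guarantees alignment

    if (d[0] == 0.0) return false;
    c[0] = (n > 1) ? du[0] / d[0] : 0.0;
    b[0] /= d[0];
    // Forward elimination.
    for (std::size_t i = 1; i < n; ++i) {
        const double denom = d[i] - dl[i] * c[i - 1];
        if (denom == 0.0) return false;
        c[i] = (i + 1 < n) ? du[i] / denom : 0.0;
        b[i] = (b[i] - dl[i] * b[i - 1]) / denom;
    }
    // Back substitution, from row n-2 down to row 0.
    for (std::size_t i = n - 1; i-- > 0;) b[i] -= c[i] * b[i + 1];
    return true;
}

// Usage: query, allocate from our own pool, then solve.
// System: [2 -1 0; -1 2 -1; 0 -1 2] x = [0 0 4], exact solution x = [1 2 3].
std::vector<double> demo_solve() {
    const std::size_t n = 3;
    const std::vector<double> dl = {0.0, -1.0, -1.0};
    const std::vector<double> d  = {2.0,  2.0,  2.0};
    const std::vector<double> du = {-1.0, -1.0, 0.0};
    std::vector<double> b = {0.0, 0.0, 4.0};

    const std::size_t bytes = tridiag_workspace_bytes(n);
    // Stand-in for a user memory manager; double storage keeps it aligned.
    std::vector<double> pool((bytes + sizeof(double) - 1) / sizeof(double));

    const bool ok = tridiag_solve(n, dl.data(), d.data(), du.data(), b.data(),
                                  pool.data(), pool.size() * sizeof(double));
    assert(ok);  // demo only; real code should handle the failure
    (void)ok;
    return b;
}
```

The point of the split is that the compute routine never allocates, so the same buffer can be sized once for the largest problem and reused across calls.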

However, we have not planned to retrofit the current routines.
Which cuSPARSE routines do you use the most?

Btw, in cuBLAS there are very few routines that allocate scratch space (mainly the in-place triangular BLAS routines like trmv).