Double precision in cuSPARSE and cuBLAS

I’m currently developing a demo for deformable-object simulation using cuSPARSE and cuBLAS.
It runs on my GTX 470, and single-precision performance is fine. However, I’d like to know whether the
precision (double vs. single) will affect performance when it runs on a Quadro 4000 (my university is going to get me one, but I have to wait one or two months).

My demo is targeted at games, so I will use single precision anyway,
but I’m wondering whether cuSPARSE and cuBLAS compute in double precision internally and only store the results in single precision, even when I call the single-precision APIs.
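
To make the question concrete, here is a small CPU-only C++ sketch of the pattern I mean by "double internally, single for storing". The function names are made up for illustration, and this is not a claim about how cuSPARSE or cuBLAS are actually implemented; it only shows the two accumulation strategies I'm asking about.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: inputs stored as float, accumulation also in float.
float dot_float_accum(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Hypothetical helper: inputs stored as float, accumulation in double,
// result rounded back to float only at the end.
float dot_double_accum(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += static_cast<double>(a[i]) * static_cast<double>(b[i]);
    }
    return static_cast<float>(sum);
}
```

If the libraries did something like the second function inside their single-precision routines, the arithmetic would run on the double-precision units even though all my data is float.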

Because Quadro cards handle double precision much better than GTX cards, I would expect my demo to see a big jump in performance on the Quadro if cuSPARSE and cuBLAS use double precision internally.

I would really like to know what precision cuBLAS and cuSPARSE use internally for their single-precision APIs.

Can anyone help?
Thanks a lot!