Too slow...why?

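Maybe your cuBLAS installation is broken somehow. Try benchmarking these functions on their own, outside the rest of your code. And why do you think they are slow? Matrix-vector multiplication should be fast; the only thing I can suggest is that the matrix is stored in a bad layout (cuBLAS expects column-major storage). By the way, are you sure you are calling the single-precision (float) routine, e.g. cublasSgemv rather than cublasDgemv?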
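To make the layout point concrete, here is a minimal CPU-only sketch (plain C++, no GPU needed) of what a column-major float gemv computes. It is not cuBLAS code and not your code: `gemv_colmajor` and `gemv_rowmajor` are hypothetical names made up for illustration, and the `transpose` flag only mimics the role of CUBLAS_OP_T versus CUBLAS_OP_N. The idea is that a rows x cols matrix stored row-major is, byte for byte, the cols x rows transpose when read column-major, so you would ask for the transposed product with swapped dimensions instead of copying the data around.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for y = op(A) * x in single precision, with A stored
// column-major: element (i, j) of the m x n matrix lives at A[i + j * lda].
// `transpose` mimics CUBLAS_OP_T; false corresponds to CUBLAS_OP_N.
std::vector<float> gemv_colmajor(bool transpose, std::size_t m, std::size_t n,
                                 const std::vector<float>& A, std::size_t lda,
                                 const std::vector<float>& x) {
    assert(lda >= m && A.size() >= lda * n);
    assert(x.size() == (transpose ? m : n));
    std::vector<float> y(transpose ? n : m, 0.0f);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            const float a = A[i + j * lda];
            if (transpose) {
                y[j] += a * x[i];  // y = A^T * x
            } else {
                y[i] += a * x[j];  // y = A * x
            }
        }
    }
    return y;
}

// Hypothetical helper: multiply a rows x cols matrix stored ROW-major by x.
// Read column-major, the same buffer is the cols x rows transpose, so we
// request the transposed product with the dimensions swapped.
std::vector<float> gemv_rowmajor(std::size_t rows, std::size_t cols,
                                 const std::vector<float>& A,
                                 const std::vector<float>& x) {
    return gemv_colmajor(true, cols, rows, A, cols, x);
}
```

For the 2 x 3 matrix [[1, 2, 3], [4, 5, 6]], the row-major buffer {1, 2, 3, 4, 5, 6} passed to `gemv_rowmajor(2, 3, ...)` and the column-major buffer {1, 4, 2, 5, 3, 6} passed to `gemv_colmajor(false, 2, 3, ..., 2, ...)` should give the same result vector. If your real code disagrees with a reference like this, the layout or the transpose flag is the first thing I would check.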