Does CUDA support double?

I’m trying to use CUDA to do some numerical computation.
I have tried CUBLAS, and its performance is pretty good.
But CUBLAS supports only float, and that precision is not enough for me.

Will CUBLAS support double in the future?
If so, when will the new version be released?

I’ll decide whether to use CUDA in our project based on this information.

By the way, I once implemented my own BLAS using two floats to represent a higher-precision value,

but the results were disappointing: the performance was poor compared to Intel MKL.

It seems we’d be better off using MKL for our numerical computation.

Any suggestions?

I heard GPUs that support double are promised for December. Unfortunately, there’s no word on how fast it will run :(

Somewhere I read (or heard) that if you want to ensure speed, you should use ‘float’ along with float constants and the associated 32-bit functions, since double precision is likely to be implemented in a way where each double-precision operation expands into multiple 32-bit operations.