http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf

I thought we use vector lengths that are multiples of 32 because a warp consists of 32 CUDA cores.

However, it seems that there are separate units for double precision, and that CUDA cores only handle single-precision operations.

My code consists entirely of double-precision operations.

Do I still have to use a vector length that is a multiple of 32?

If CUDA cores are only for single precision, what do they do during double-precision operations?