Double Precision Units in Kepler?

http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf

I thought we use vector lengths as multiple of 32 because a warp has 32 CUDA cores.

However, it seems that there are separate units for double precision, and CUDA cores seem to handle only single precision operations.

My code consists entirely of double precision operations.

Do I still have to use a vector length that is a multiple of 32?

If CUDA cores are for single precision, what do they do during double precision operations?

I thought we use vector lengths as multiple of 32 because a warp has 32 CUDA cores.

A warp has 32 threads, not cores. Threads, warps, and blocks are programming elements; compute units, cores, and multiprocessors are hardware elements. Multiple warps may be actively running at the same time on different compute elements.
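As a sketch of that distinction: the warp size and the multiprocessor count are hardware properties you can query at runtime, while block and grid sizes are choices you make in code. A minimal host-side example (assumes a CUDA-capable device is installed, so it can't be run without one):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Hardware properties of device 0. warpSize and multiProcessorCount
    // describe the chip; they are not something the programmer sets.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("warp size: %d threads\n", prop.warpSize);
    printf("multiprocessors: %d\n", prop.multiProcessorCount);
    // maxThreadsPerBlock constrains the programming-side choice of block size.
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    return 0;
}
```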

Do I still have to use a vector length that is a multiple of 32?

The floating point precision you’re using has no effect on the vector length. While using double precision may require threads to share resources and thus affect performance, this does not change how you program.
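For instance, a double precision kernel is written and launched exactly like a single precision one; the block size is still chosen as a multiple of the 32-thread warp. A minimal sketch (the kernel name and launch parameters are illustrative, not from the original post):

```cuda
// Double precision AXPY: y = a*x + y. Structurally identical to the
// float version; only the element type differs.
__global__ void daxpy(int n, double a, const double *x, double *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];  // double precision op, scheduled per warp
}

// Launch with 256 threads per block (a multiple of 32), exactly as you
// would for a single precision kernel:
//   daxpy<<<(n + 255) / 256, 256>>>(n, 2.0, d_x, d_y);
```

The scheduler still issues instructions warp by warp; which hardware units execute them (CUDA cores or the dedicated DP units) is invisible at this level.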

If CUDA cores are for single precision, what do they do during double precision operations?

That will depend. If you have another warp performing single precision or integer instructions, then the other compute units will be occupied with these instructions. If all the active warps are only performing double precision, then you’d have inactive elements.

See the section labeled “Streaming Multiprocessor (SMX) Architecture” for more details.

  • Mat

Does CUDA or OpenCL also have separate programming elements?

It’s easy to understand the hierarchy of work-items in the case of single precision, because a work-item and a warp can be mapped directly to an ALU (CUDA core) and a SIMD engine, respectively.

However, in the case of double precision, they are not mapped directly.