Does the GTX1060 support double precision?

Does the GTX 1060 support double precision? What is its double-precision throughput in GFLOPS? I couldn’t find any information on this on the NVIDIA website. Is there a table that details the double-precision capabilities of NVIDIA GPUs?

The only source of information I found is external, on Wikipedia. However, the information there may not be correct.

I intend to use the GTX 1060 for scientific computations which require double precision capabilities.

The double precision throughput is 1/32 the single precision throughput.
This is derived from the compute capability 6.1 entries in this table:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#arithmetic-instructions

(4/128 = 1/32)

According to this particular example:

http://www.gamersnexus.net/hwreviews/2518-nvidia-gtx-1060-review-and-benchmark-vs-rx-480?showall=1

The FP32 throughput is ~4.4 TF, so the DP (FP64) throughput should be around 0.14 TF, or roughly 140 GFLOPS.

It will vary somewhat based on actual clock - which may vary depending on the board you have and boost activity.
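If you want to redo this estimate for your own board, here is a rough sketch of the arithmetic. The per-SM figures (128 FP32 and 4 FP64 results per clock) are the compute capability 6.1 entries from the table linked above. The core count (1280) and boost clock (1708 MHz) are my assumptions taken from commonly published GTX 1060 6GB specs, not something stated in this thread, so substitute the numbers for your card. The function and constant names are just illustrative.

```python
from fractions import Fraction

# Results per SM per clock for compute capability 6.1, as read from the
# arithmetic-instructions table in the CUDA C Programming Guide.
FP32_PER_SM_PER_CLOCK = 128
FP64_PER_SM_PER_CLOCK = 4


def fp64_to_fp32_ratio():
    """Ratio of FP64 to FP32 throughput as an exact fraction."""
    return Fraction(FP64_PER_SM_PER_CLOCK, FP32_PER_SM_PER_CLOCK)


def peak_gflops(cuda_cores, clock_mhz, flops_per_core_per_clock=2):
    """Theoretical FP32 peak in GFLOPS; one FMA is counted as 2 operations."""
    return cuda_cores * flops_per_core_per_clock * clock_mhz / 1000.0


# ASSUMPTION: published GTX 1060 6GB figures, not taken from this thread.
ASSUMED_CORES = 1280
ASSUMED_BOOST_MHZ = 1708

fp32 = peak_gflops(ASSUMED_CORES, ASSUMED_BOOST_MHZ)
fp64 = fp32 * float(fp64_to_fp32_ratio())

print(f"FP64/FP32 ratio: {fp64_to_fp32_ratio()}")
print(f"FP32 peak: {fp32:.0f} GFLOPS, FP64 peak: {fp64:.0f} GFLOPS")
```

These are theoretical peaks only. A different clock (base vs. boost, or a factory-overclocked board) scales both numbers linearly.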

I think the Wikipedia info is pretty accurate also.

Thanks for the input.

In the article
http://www.gamersnexus.net/hwreviews/2518-nvidia-gtx-1060-review-and-benchmark-vs-rx-480?showall=1

it is written “Like the other GTX chips, GP106 dedicates itself to FP32 single precision compute, leaving double precision FP64 to CUDA Cores science-class GPUs.”

This would mean that there is no double precision computation with the GTX 1060.

No, that wouldn’t be the correct interpretation. The idea is that, since FP64 throughput is 1/32 of FP32 throughput (i.e. much smaller), people who are interested in high levels of FP64 performance should probably consider “science class GPUs”, by which they mean various members of the Tesla family of GPUs.

The GFLOPS information for the GTX 1060 in this Wikipedia table seems correct to me:
https://en.wikipedia.org/wiki/GeForce_10_series

In many cases the performance differences between single-precision and double-precision computation are not nearly as severe as the raw throughput ratios would suggest:

(1) On GPUs, it is generally not possible to get more than about 75% of theoretical single-precision peak performance out of compiled code, e.g. due to scheduling and register bank conflicts. But on DP-lite consumer GPUs, it is possible to get 99% of theoretical double-precision performance when that becomes the most severe bottleneck.

(2) Most real-life codes, even when classified as floating-point intensive, execute many non-floating-point operations that aren’t affected by the disparities between SP and DP throughput.
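To put rough numbers on points (1) and (2), here is a toy, Amdahl-style model. It is only a sketch under my own simplifying assumptions: `fp_fraction` is the share of the single-precision run time that is bound by floating-point throughput, the rest of the run time is assumed unchanged when switching to double precision, and effects such as doubled memory traffic for DP data are ignored. The function name and its defaults (raw ratio 32, 75% SP efficiency, 99% DP efficiency) are illustrative, taken from the figures in this thread.

```python
def effective_dp_slowdown(fp_fraction, raw_ratio=32.0,
                          sp_efficiency=0.75, dp_efficiency=0.99):
    """Estimated run-time ratio (DP time / SP time) for a whole program.

    fp_fraction: share of SP run time bound by floating-point throughput.
    raw_ratio:   theoretical FP32/FP64 peak throughput ratio.
    """
    if not 0.0 <= fp_fraction <= 1.0:
        raise ValueError("fp_fraction must be between 0 and 1")
    # Point (1): achieved throughput = peak * efficiency, so the slowdown of
    # the FP-bound part is smaller than the raw peak ratio.
    fp_slowdown = raw_ratio * sp_efficiency / dp_efficiency
    # Point (2): the non-FP part of the run time is assumed not to change.
    return (1.0 - fp_fraction) + fp_fraction * fp_slowdown


for share in (0.1, 0.3, 0.5, 1.0):
    print(f"FP-bound share {share:.0%}: ~{effective_dp_slowdown(share):.1f}x slower in DP")
```

Even in the worst case of this model (a fully FP-bound code) the estimated slowdown stays below the raw factor of 32, and it drops quickly as the FP-bound share shrinks. Real codes should be profiled rather than modeled this way.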