I think his question is whether the double precision throughput is capped to 1/8 the single precision throughput in the same way that the GeForce cards are. I seem to recall a statement that the desktop Quadros run at the Tesla level of double precision at 1/2 the rate of single precision, but I have no idea what the mobile GPUs are set to.
I see. My bad. The 2000/3000/4000/5000 cards use GF104 and GF106, the same chips as the 400 series, which has half the performance in double precision compared to single precision. The Quadro 5010M uses the same chip as the GTX 580, which had equal performance in double and single precision.
where for 5010M it is stated: The Quadro 5010M is the successor of the Quadro 5000M and also offers ECC RAM and double-precision floating point cores
and for the 4000M: Compared to the 5010M, the 4000M does not support ECC memory and DP floating point calculations
So I have some doubts, given that notebookcheck usually has quite good information. Does anybody have experience with double precision on notebook GPUs?
As far as I understand it, if the GPU is compute capability 2.0 or 2.1, then it has to support double precision. There is no equivalent in Fermi to the capability 1.2 mobile chips of the previous generation that were just like capability 1.3, but without the double precision.
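The rule described above can be written down as a small host-side check. This is only a sketch: `supports_double` is a hypothetical helper name, not part of the CUDA toolkit, and it assumes that hardware double precision first appeared with compute capability 1.3. On a real machine the two numbers would come from the `major` and `minor` fields of `cudaDeviceProp`, as filled in by `cudaGetDeviceProperties`.

```cpp
// Hypothetical helper (not a CUDA API): does a given compute capability
// include hardware double precision?
// Assumption: double precision is available from compute capability 1.3 on,
// so 1.0-1.2 lack it and every 2.x (Fermi) device has it.
constexpr bool supports_double(int major, int minor) {
    return major > 1 || (major == 1 && minor >= 3);
}

// Compile-time sanity checks matching the cases discussed in the thread.
static_assert(supports_double(2, 0), "capability 2.0 should report double precision");
static_assert(supports_double(2, 1), "capability 2.1 should report double precision");
static_assert(!supports_double(1, 2), "capability 1.2 should not report double precision");
```

Note that this only answers whether double precision exists at all; it says nothing about whether the rate is 1/2, 1/8 or 1/12 of single precision, which has to be measured.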
Unfortunately, it sounds like the laptop vendor is confused. If they are correct about GF104, then you should have double precision at 1/12 of the maximum single precision throughput.
Quadro 5000M 2.0
Quadro 4000M 2.0
Quadro 3000M 2.0
Quadro 2000M 2.0
I am starting to get confused. These cards are based on the same chips as the 400m series which support double precision at half speed compared to single precision. Same for the 5010M.
Quadro 1000M GF108
Quadro 2000M GF106
Quadro 3000M GF104
Quadro 4000M GF104
Quadro 5000M GF100
Quadro 5010M GF110GLM
All Fermi cards support double precision at a minimum of 1/8 of single precision speed. I have the 2000M and it definitely supports double precision (at 1/8 the performance of single). I don’t know about the mobile GPUs, but the Quadro 4000 and up and the Teslas run at 1/2 single precision speed.
They are talking about double precision at 1/2 the single precision rate, not about whether double precision is supported at all. I know for a fact from first-hand experience that the 2000M, 1000M, GTX 430, GTX 480 and GTX 570 support double precision, and from second-hand experience that every other Fermi GPU does too, possibly excluding the weird laptop Fermis with 16 cores, which I don’t know about. I did read somewhere that the performance is 1/12 rather than 1/8 of single precision, but it’s definitely there, regardless of how NVIDIA words things on the website.
By the way, using CUDA-z at the moment on the Quadro 2000m which is a Compute 2.1 GPU and which you claim has no double precision support, I see:
Well, GPU-z did not really work very well: it shows the first page with info and then hangs.
The nbody example, however, showed 270 SP GFLOPS and 150 DP GFLOPS (a ratio of 1.8), which suggests that it is indeed about a factor of 2 difference.
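To make that comparison explicit, here is a minimal sketch that turns two measured GFLOPS figures into a single-to-double ratio and picks the closest of the ratios mentioned in this thread (2, 8 and 12). The function names are made up for illustration, and comparing on a log scale is just one reasonable choice.

```cpp
#include <cassert>
#include <cmath>

// Ratio of measured single precision to double precision throughput.
double sp_to_dp_ratio(double sp_gflops, double dp_gflops) {
    assert(sp_gflops > 0.0 && dp_gflops > 0.0);
    return sp_gflops / dp_gflops;
}

// Returns whichever candidate ratio (2, 8 or 12) is closest to the
// measured one, comparing on a log scale so that "twice too big" and
// "twice too small" count as equally far off.
int nearest_candidate_ratio(double measured) {
    const int candidates[] = {2, 8, 12};
    int best = candidates[0];
    for (int c : candidates) {
        if (std::fabs(std::log(measured / c)) <
            std::fabs(std::log(measured / best))) {
            best = c;
        }
    }
    return best;
}
```

With the nbody figures above, `sp_to_dp_ratio(270.0, 150.0)` gives 1.8, and `nearest_candidate_ratio` maps that to 2, i.e. the half-rate case rather than 1/8 or 1/12.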
If anybody wants to have some benchmarks run on it (Windows only for the time being), let me know.
I do not have a benchmark, but I have some code that you can test. It does a bunch of real-to-complex FFTs and some additions and multiplications. I have single and double precision versions. The attached codes should run in less than 10 minutes and report a timing at the end.
SPRTinplaceTwoDPFCtwodblock.cu is the single precision code
DPRTinplaceTwoDPFCtwodblock.cu is the double precision code.
For the double precision version, the flag -arch=sm_20 is needed in order to keep the double precision. I am not running Windows, so I am not sure how to compile there, but if you have a command line, the compile line would be: