Nvidia Quadro P4000 and P6000 double precision performance

Testing the MAGMA 2.3.0 libraries and cuBLAS on a Quadro P4000 or a Quadro P6000 card, I get 100-300 Gflop/s in double precision (dgemm routine used for the test), whereas single precision (sgemm routine) reaches a much higher 3000-6000 Gflop/s.

Is this normal behavior, or is there some setting or compilation flag I should specify to get double-precision performance comparable to single precision?


This seems correct and is a function of the hardware. The ratio of single-precision to double-precision units on most Pascal-family parts is 32:1, so peak DP throughput is about 1/32 of peak SP. On my Quadro P2000 I measure peak throughput of 108 GFLOPS DP, 3390 GFLOPS SP.
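
To make the arithmetic concrete, here is a small Python sketch that checks the P2000 numbers above against the 32:1 ratio and applies the same ratio to the 3000-6000 Gflop/s sgemm range from the question. The helper name expected_dp_gflops is made up for illustration, and it assumes DP scales as exactly 1/32 of the SP figure, which is only a rough estimate.

```python
# Measured peaks quoted above for the Quadro P2000, in GFLOPS.
sp_measured = 3390.0
dp_measured = 108.0

# SP:DP unit ratio on most Pascal-family parts, as stated above.
SP_TO_DP_RATIO = 32


def expected_dp_gflops(sp_gflops, ratio=SP_TO_DP_RATIO):
    """Hypothetical helper: rough DP estimate from an SP figure (assumes pure 1/ratio scaling)."""
    return sp_gflops / ratio


# How close is the measured P2000 ratio to the nominal 32?
measured_ratio = sp_measured / dp_measured
print(f"measured SP/DP ratio: {measured_ratio:.1f}")

# Apply the same ratio to the sgemm range reported in the question.
for sp in (3000.0, 6000.0):
    print(f"{sp:.0f} GFLOPS SP -> about {expected_dp_gflops(sp):.0f} GFLOPS DP")
```

The measured ratio comes out a little over 31, and the estimated DP figures land in the same order of magnitude as the 100-300 Gflop/s dgemm results in the question, so no setting or compilation flag is missing.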

It’s pitiful, but that’s a function of the GPU chip.

Unless you are getting a GP100 or GV100, the rest of the lineup doesn’t have any appreciable FP64 performance. In FP64, the rest of the Pascal lineup is worse than a GTX TITAN BLACK.