Need advice on low GFLOP

Hi, I am running a scientific program called Abinit with CUDA enabled. However, the calculation seems to be even slower than a serial run. I then found this in the generated log file:

 ________________________ Graphic Card Properties _______________________________

    Device                0 : Quadro K600
    Revision number:                   3.0
    Total amount of global memory:  1023.3 Mbytes
    Clock rate:                        0.9 GHz
    Max GFLOP:                           7 GFP
    Total  constant memory:          65536 bytes
    Shared memory per block:         49152 bytes
    Number of registers per block:   65536


This seems to be the state of the GPU as used by Abinit. What worries me is the exceptionally low Max GFLOP: 7 GFP.

I asked for advice before and was told that this may be because the card is also used for display. I then rebooted CentOS into pure command-line mode to reduce the display load on the card. Unfortunately, it yields the same result.

Is this normal?

The K600, with a single SMX (192 cores), is a relatively low-end Quadro GPU. I’m not sure where the 7 GFlop number comes from; that may require an explanation from an Abinit expert.

The peak theoretical SP floating point throughput is given by 876 MHz (GPU clock) * 192 (SP cores) * 2 (SP ops per FMA) = 336 GFlops.

The peak theoretical DP floating point throughput is given by 876 MHz (GPU clock) * 8 (DP units) * 2 (DP ops per FMA) = 14 GFlops.

So maybe the 7 GFlop number is the peak DP throughput without counting each FMA as two operations: 876 MHz * 8 (DP units) = 7 GFlops. That is just a guess, however.
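
If it helps to check the arithmetic, here is a minimal Python sketch of the calculation above. It only multiplies the clock, unit count, and ops per cycle quoted in this post; the function and variable names are my own, and it does not query the card or reproduce whatever formula Abinit actually uses.

```python
def peak_gflops(clock_mhz, units, ops_per_cycle):
    """Theoretical peak in GFLOP/s: clock (GHz) * execution units * ops per unit per cycle."""
    return clock_mhz / 1000.0 * units * ops_per_cycle

# Figures quoted in this thread for the K600 (not read from the hardware).
CLOCK_MHZ = 876   # GPU clock
SP_CORES = 192    # single-precision cores in the one SMX
DP_UNITS = 8      # double-precision units

sp_peak = peak_gflops(CLOCK_MHZ, SP_CORES, 2)    # FMA counted as 2 ops
dp_peak = peak_gflops(CLOCK_MHZ, DP_UNITS, 2)    # FMA counted as 2 ops
dp_no_fma = peak_gflops(CLOCK_MHZ, DP_UNITS, 1)  # FMA counted as 1 op (my guess at the log's number)

print(f"SP peak, FMA as 2 ops: {sp_peak:.1f} GFLOP/s")
print(f"DP peak, FMA as 2 ops: {dp_peak:.1f} GFLOP/s")
print(f"DP peak, FMA as 1 op:  {dp_no_fma:.1f} GFLOP/s")
```

The last value rounds to 7, which is why I suspect the log is reporting DP throughput without the factor of 2 for FMA.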

Anyway, the K600 is not a particularly powerful CUDA compute device. The comparable number for Tesla K20 would be ~500 GFlop.