Poor BLAS-3/GEMM performance on GTX 480

Hi folks,

Just wondering if anyone has tried BLAS-3/GEMM yet on the GTX 480. These are the numbers I’m seeing on an EVGA GTX 480:

Single precision (N = square matrix dimension):
N MMADDs/sec
16 1524
32 8695
64 37026
128 62314
256 166200
512 180515
1024 208105
2048 215970
4096 216731

(Multiply by 2 to get the MFLOPS number, since each multiply-add counts as two floating-point operations.)
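To make the conversion concrete, here is a small sketch that turns the SGEMM figures from the table above into GFLOPS. The function name `mmadds_to_gflops` is made up for illustration; the only assumption is the one already stated, that one multiply-add counts as two floating-point operations.

```python
# Convert a GEMM rate in millions of multiply-adds per second (MMADDs/sec)
# to GFLOPS. Hypothetical helper, not part of CUBLAS.

def mmadds_to_gflops(mmadds_per_sec):
    mflops = 2.0 * mmadds_per_sec   # one multiply + one add per multiply-add
    return mflops / 1000.0          # MFLOPS -> GFLOPS

# SGEMM figures from the table above (GTX 480), keyed by matrix dimension N
sgemm_gtx480 = {
    16: 1524, 32: 8695, 64: 37026, 128: 62314, 256: 166200,
    512: 180515, 1024: 208105, 2048: 215970, 4096: 216731,
}

for n, rate in sgemm_gtx480.items():
    print(f"N={n:5d}  {mmadds_to_gflops(rate):8.1f} GFLOPS")
```

With these figures the largest case (N = 4096) works out to roughly 433 GFLOPS single precision.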

These results are almost identical to what I get on the GTX 285, which was a huge surprise to me. Has anyone seen similar results? I am guessing some tuning is needed.

–Eric

Hi,

What kind of numbers (GFLOPS) do you get for DGEMM?

DGEMM on GTX 480:
N MMADDs/sec
16 1259
32 7036
64 23901
128 44004
256 52479
512 55618
1024 80707
2048 81969
4096 82385

That is roughly double the GTX 285 performance. DGEMM on GTX 285:
N MMADDs/sec
16 394
32 2275
64 11197
128 22543
256 31629
512 34955
1024 41659
2048 42496
4096 42900
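To check the "roughly double" claim size by size, the two DGEMM tables can be divided entry by entry. This is only a sketch over the numbers posted above; the helper name `speedup` is made up for illustration.

```python
# Per-size DGEMM speedup of the GTX 480 over the GTX 285,
# using the MMADDs/sec figures posted in this thread.

dgemm_gtx480 = {
    16: 1259, 32: 7036, 64: 23901, 128: 44004, 256: 52479,
    512: 55618, 1024: 80707, 2048: 81969, 4096: 82385,
}
dgemm_gtx285 = {
    16: 394, 32: 2275, 64: 11197, 128: 22543, 256: 31629,
    512: 34955, 1024: 41659, 2048: 42496, 4096: 42900,
}

def speedup(new, old):
    # Ratio new/old for every matrix size present in both tables
    return {n: new[n] / old[n] for n in new if n in old}

ratios = speedup(dgemm_gtx480, dgemm_gtx285)
for n, r in ratios.items():
    print(f"N={n:5d}  {r:.2f}x")
```

On these figures the ratio is about 1.9x for N >= 1024, dips to around 1.6x at N = 256 and 512, and is above 3x for the smallest sizes, where the absolute rates are low on both cards.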

Should have read the release notes:
o CUBLAS issue

  • SGEMM performance on Fermi-based GPU is 30% lower than expected.
    It will be fixed in 3.1.
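As a rough projection only: if "30% lower than expected" is read as measured = 0.7 x expected (an assumption; the release note does not say how the 30% is defined), the peak SGEMM figure from the first post would scale as sketched below. The function name and the 0.7 factor are illustrative, not taken from CUBLAS or the release notes.

```python
# Back-of-the-envelope estimate of the SGEMM rate once the CUBLAS issue is fixed.
# ASSUMPTION: "30% lower than expected" means measured = (1 - 0.30) * expected.

def expected_rate(measured_mmadds, shortfall=0.30):
    # Invert measured = (1 - shortfall) * expected
    return measured_mmadds / (1.0 - shortfall)

measured = 216731                      # SGEMM at N = 4096, from the first post
expected = expected_rate(measured)     # projected MMADDs/sec
print(f"projected: {expected:.0f} MMADDs/sec, "
      f"about {2.0 * expected / 1000.0:.0f} GFLOPS")
```

Under that reading the projected figure is about 310,000 MMADDs/sec, or roughly 620 GFLOPS. Actual results with the 3.1 release may differ.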