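I was checking the peak FP16 GEMM performance of an RTX 3090, and I was surprised to see a drop of roughly 50% for mid-size matrices: from about 128 Tflop/s at size 6144 down to roughly 70 Tflop/s for sizes between 18432 and 32768.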
% M N K GPU Gflop/s (ms) GPU error
%========================================================================================================
1024 1024 1024 2635.23 ( 0.81) ---
2048 2048 2048 58631.08 ( 0.29) ---
3072 3072 3072 92188.92 ( 0.63) ---
4096 4096 4096 113209.10 ( 1.21) ---
5120 5120 5120 123589.45 ( 2.17) ---
6144 6144 6144 128241.71 ( 3.62) ---
7168 7168 7168 115558.98 ( 6.37) ---
8192 8192 8192 104684.95 ( 10.50) ---
9216 9216 9216 93158.09 ( 16.80) ---
10240 10240 10240 87832.27 ( 24.45) ---
11264 11264 11264 84352.97 ( 33.89) ---
12288 12288 12288 107240.38 ( 34.60) ---
13312 13312 13312 96272.52 ( 49.01) ---
14336 14336 14336 103458.65 ( 56.96) ---
15360 15360 15360 99511.91 ( 72.83) ---
16384 16384 16384 72890.14 ( 120.68) ---
17408 17408 17408 87728.52 ( 120.26) ---
18432 18432 18432 69442.68 ( 180.35) ---
19456 19456 19456 69949.03 ( 210.58) ---
20480 20480 20480 68355.85 ( 251.33) ---
21504 21504 21504 67744.31 ( 293.57) ---
22528 22528 22528 67491.17 ( 338.81) ---
23552 23552 23552 66234.61 ( 394.48) ---
24576 24576 24576 70176.79 ( 423.03) ---
25600 25600 25600 72157.42 ( 465.02) ---
26624 26624 26624 73832.88 ( 511.21) ---
27648 27648 27648 78171.55 ( 540.72) ---
28672 28672 28672 71223.29 ( 661.88) ---
29696 29696 29696 70045.31 ( 747.73) ---
30720 30720 30720 69575.75 ( 833.37) ---
31744 31744 31744 69425.44 ( 921.50) ---
32768 32768 32768 69352.25 (1014.66) ---
33792 33792 33792 103900.57 ( 742.77) ---
34816 34816 34816 103183.34 ( 818.01) ---
35840 35840 35840 82927.16 (1110.29) ---
36864 36864 36864 91454.85 (1095.55) ---
37888 37888 37888 80549.89 (1350.42) ---
38912 38912 38912 92057.92 (1280.03) ---
39936 39936 39936 98316.55 (1295.68) ---
40960 40960 40960 97357.48 (1411.69) ---
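As a sanity check on the numbers above, the Gflop/s column can be recomputed from the timings. This is only a sketch: it assumes the usual 2*M*N*K flop count for GEMM (my assumption, not something printed by the tester), and `gemm_gflops` is a helper name I made up for illustration.

```python
def gemm_gflops(m, n, k, ms):
    """Gflop/s for an M x N x K GEMM that took `ms` milliseconds.

    Assumes the conventional 2*M*N*K flop count (one multiply and one add
    per inner-product term).
    """
    flops = 2.0 * m * n * k
    return flops / (ms * 1e-3) / 1e9


# (size, time in ms, reported Gflop/s) copied from a few rows of the table
rows = [
    (4096, 1.21, 113209.10),
    (8192, 10.50, 104684.95),
    (32768, 1014.66, 69352.25),
]

for size, ms, reported in rows:
    recomputed = gemm_gflops(size, size, size, ms)
    rel_diff = abs(recomputed - reported) / reported
    print(f"{size:6d}  reported {reported:10.2f}  recomputed {recomputed:10.2f}  "
          f"rel. diff {rel_diff:.4f}")
```

The recomputed values agree with the reported ones to within the rounding of the printed times, so the drop does not look like a reporting artifact.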
The kernel name for 35k is:
For this test I am using MAGMA and its magma_hgemm routine, which is actually a simple wrapper around cuBLAS.
Is it related to the GPU memory?
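To put the memory question in numbers, here is a rough estimate of how much device memory the three FP16 matrices need at each size. It is a sketch under stated assumptions: only A, B and C are counted (no cuBLAS workspace), 2 bytes per element, and 24 GB as the RTX 3090 memory size. `hgemm_footprint_gib` is a hypothetical helper name, not a MAGMA or cuBLAS function.

```python
def hgemm_footprint_gib(m, n, k, bytes_per_elem=2):
    """Approximate memory for A (M x K), B (K x N) and C (M x N) in GiB.

    Assumes FP16 storage (2 bytes per element) and ignores any workspace
    the library may allocate.
    """
    elems = m * k + k * n + m * n
    return elems * bytes_per_elem / 2**30


# Assumption: RTX 3090 with 24 GB of device memory
DEVICE_MEM_GIB = 24e9 / 2**30

for size in (6144, 16384, 32768, 40960):
    gib = hgemm_footprint_gib(size, size, size)
    share = gib / DEVICE_MEM_GIB
    print(f"{size:6d}: {gib:6.3f} GiB ({share:.0%} of device memory)")
```

By this estimate even the largest case stays well below the card's capacity, so running out of memory by itself would not obviously explain the drop.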