Hello,
This question is about CPU performance (not GPU).
I can’t figure out why the attached OpenMP Fortran code for matrix multiplication runs roughly 10 times slower when compiled with nvfortran or gfortran than with the Intel compiler (about 13.5 times slower on 1 thread and about 10 times slower on 16 threads). The compiler.sh script and the Fortran code are attached.
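For readers who cannot open the attachment, a driver of this kind can be sketched roughly as below. This is an illustrative reconstruction, not the attached compiler.sh: the file names matmul.f90 and matmul.x, the -O2 optimization level, the use of ifort as the Intel driver, and the assumption that the program prints its own timing are all hypothetical. Only the OpenMP flags (-qopenmp, -fopenmp, -mp) are the standard ones for each compiler.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the attached compiler.sh.
# Assumed names: matmul.f90 (source), matmul.x (executable).
# -O2 is an assumed optimization level; the real script may use something else.

# Print the compile command for the chosen compiler.
# Only the compiler driver and its OpenMP flag differ between the three cases.
compile_cmd() {
  case "$1" in
    intel)     echo "ifort -O2 -qopenmp matmul.f90 -o matmul.x" ;;
    gfortran)  echo "gfortran -O2 -fopenmp matmul.f90 -o matmul.x" ;;
    nvfortran) echo "nvfortran -O2 -mp matmul.f90 -o matmul.x" ;;
    *)         echo "unknown compiler: $1" >&2; return 1 ;;
  esac
}

# Compile once, then run the executable with 1..16 OpenMP threads.
# The program is assumed to print its own elapsed time.
run_scan() {
  local cmd n
  cmd=$(compile_cmd "$1") || return 1
  $cmd || return 1
  for n in $(seq 1 16); do
    OMP_NUM_THREADS=$n ./matmul.x
  done
}

# Example (requires the corresponding compiler to be loaded):
#   run_scan gfortran
```

Keeping the optimization level identical across the three compilers, as in this sketch, would make the comparison depend only on what each compiler does with the same loops.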
module load intel nvhpc/21.9 gcc/11.1.0
[ilkhom@t019 ORIG]$ bash ./compiler.sh intel
The code is compiled with Intel
Number of threads | Time (sec)
1 | 2.33
2 | 1.23
3 | 0.85
4 | 0.67
5 | 0.56
6 | 0.46
7 | 0.44
8 | 0.38
9 | 0.37
10 | 0.32
11 | 0.32
12 | 0.28
13 | 0.27
14 | 0.26
15 | 0.24
16 | 0.23
[ilkhom@t019 ORIG]$ bash ./compiler.sh gfortran
The code is compiled with gfortran
Number of threads | Time (sec)
1 | 31.60
2 | 15.89
3 | 11.33
4 | 8.04
5 | 7.01
6 | 5.65
7 | 4.97
8 | 4.33
9 | 4.15
10 | 3.74
11 | 3.54
12 | 3.11
13 | 2.96
14 | 2.67
15 | 2.59
16 | 2.35
[ilkhom@t019 ORIG]$ bash ./compiler.sh nvfortran
The code is compiled with nvfortran
Number of threads | Time (sec)
1 | 31.47
2 | 15.75
3 | 10.73
4 | 8.03
5 | 6.87
6 | 5.67
7 | 4.99
8 | 4.33
9 | 4.19
10 | 3.71
11 | 3.40
12 | 3.10
13 | 3.03
14 | 2.66
15 | 2.75
16 | 2.34
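To put a number on the gap, the slowdown relative to the Intel build can be computed directly from the timings above. The small helper below is only a convenience sketch (the function name slowdown is made up for illustration); the input values are copied from the tables.

```shell
# Ratio of a slower build's time to the Intel build's time, to one decimal place.
# $1 = time of the slower build (sec), $2 = time of the Intel build (sec)
slowdown() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

slowdown 31.60 2.33   # gfortran vs Intel, 1 thread    -> 13.6
slowdown 2.35 0.23    # gfortran vs Intel, 16 threads  -> 10.2
slowdown 31.47 2.33   # nvfortran vs Intel, 1 thread   -> 13.5
slowdown 2.34 0.23    # nvfortran vs Intel, 16 threads -> 10.2
```

All three builds scale similarly with thread count, so the gap appears to be in the single-thread code rather than in the OpenMP parallelization.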
Kind regards,
Ilkhom
compiler.sh (561 Bytes)