HPC SDK 22.9 PGCC compiler extremely slow

Makefile (672 Bytes)
I compiled scimark4 from math.nist.gov with the uploaded Makefile.
The benchmark result is extremely slow.
The same source compiled with the Intel oneAPI compiler 2022.3 is nearly 2 times faster.

Hi fbernasek535,

I downloaded scimark4 from Java SciMark 2.0 (What's New).

I then compared the performance of 22.9 pgcc/nvc to icc 2021.6.0 (I don’t have 2022.3). pgcc gives better performance on an AMD EPYC system, and the two compilers are about the same on a Sandybridge.

Can you provide more details on the system you’re using as well as the Intel compiler flags? Are you on a Sandybridge system? If not, then please remove “-tp sandybridge”.

Note that I reduced the flag set for pgcc to just “-fast -Msafeptr” and for icc to “-O3 -fno-alias”. Auto-parallelization seems to hurt both, so I removed the “-Mconcur” and “-parallel” flags. Also, given there’s no OpenMP in the code, I removed the “-mp” flag.

Results on an AMD EPYC 7742, with pgcc:

scimark4$ ./scimark4
**                                                              **
** SciMark4 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.

FFT             Mflops:  2399.18    (N=1024)
SOR             Mflops:  1536.96    (100 x 100)
MonteCarlo:     Mflops:   623.41
Sparse matmult  Mflops:  2403.33    (N=1000, nz=5000)
LU              Mflops:  8221.71    (M=100, N=100)

************************************
Composite Score:        3036.92
************************************

Then with icc:

scimark4$ ./scimark4
**                                                              **
** SciMark4 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.

FFT             Mflops:  1610.47    (N=1024)
SOR             Mflops:  1299.63    (100 x 100)
MonteCarlo:     Mflops:   607.58
Sparse matmult  Mflops:  2699.92    (N=1000, nz=5000)
LU              Mflops:  6757.08    (M=100, N=100)

************************************
Composite Score:        2594.94
************************************

Results on an Intel i7-3930K (Sandybridge)

With pgcc:

scimark4% ./scimark4
**                                                              **
** SciMark4 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.

FFT             Mflops:  1784.39    (N=1024)
SOR             Mflops:  1776.00    (100 x 100)
MonteCarlo:     Mflops:   599.86
Sparse matmult  Mflops:  1852.61    (N=1000, nz=5000)
LU              Mflops:  4513.50    (M=100, N=100)

************************************
Composite Score:        2105.27
************************************

Then with icc:

$ ./scimark4
**                                                              **
** SciMark4 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.

FFT             Mflops:  1241.45    (N=1024)
SOR             Mflops:  1465.37    (100 x 100)
MonteCarlo:     Mflops:   556.34
Sparse matmult  Mflops:  2048.00    (N=1000, nz=5000)
LU              Mflops:  5630.24    (M=100, N=100)

************************************
Composite Score:        2188.28
************************************
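
For reference, each composite score above matches the arithmetic mean of the five kernel Mflops values. A small Python sketch to check that from the pasted numbers (the dictionary layout and the `composite` helper are just illustrative names, not part of SciMark):

```python
# Kernel Mflops copied from the four runs above, in the order
# FFT, SOR, MonteCarlo, Sparse matmult, LU.
# The dictionary layout and helper name are illustrative, not part of SciMark.
runs = {
    ("EPYC 7742", "pgcc"): [2399.18, 1536.96, 623.41, 2403.33, 8221.71],
    ("EPYC 7742", "icc"): [1610.47, 1299.63, 607.58, 2699.92, 6757.08],
    ("i7-3930K", "pgcc"): [1784.39, 1776.00, 599.86, 1852.61, 4513.50],
    ("i7-3930K", "icc"): [1241.45, 1465.37, 556.34, 2048.00, 5630.24],
}


def composite(mflops):
    """Arithmetic mean of the kernel scores."""
    return sum(mflops) / len(mflops)


for (cpu, compiler), mflops in runs.items():
    print(f"{cpu:10s} {compiler:5s} composite = {composite(mflops):.2f}")
```

This reproduces 3036.92 and 2594.94 on the EPYC and 2105.27 and 2188.28 on the Sandybridge, so a single kernel with a large Mflops value (LU here) carries a lot of weight in the composite.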

-Mat

Hi,

Thanks for your e-mail.

I have now tested scimark4 on the following hardware:

Dell Poweredge R720 Server

CPU: 2x Intel Xeon E5-2680-0 EP Sandybridge, 2.7 GHz

RAM: 136 GB DDR3 ECC (ECC is required for my server apps)

HDD: 26 TB SAS RAID5

Graphics: Matrox G200, NVIDIA GeForce GT 1030, NVIDIA Tesla M40 for scientific/math development

In my opinion, it is not correct to compare a new processor generation, such as AMD EPYC or a newer-generation Sandybridge, with a 12-year-old Xeon E5-2680-0 at 2.7 GHz.

The AMD EPYC, for example, costs approx. $3000 and uses newer DDR4 RAM; my E5-2680 costs $35 on eBay.

That is like comparing a Lamborghini with an old VW Golf.

I have attached my test, with the Makefile, in a zip file.

Thanks and regards

scimark4res.tar.gz (565 KB)

Sorry for the confusion, but the intent here was not to compare processors, but rather to show how the compilers perform relative to each other on two different processors.

Looking at your results, the composite scores are:

NVop: 1911
icc: 1959
icx: 2006

So roughly in line with what I’m seeing. Granted, I don’t know this benchmark, but this doesn’t seem like an extreme difference, nor is Intel 2x faster. Can you please help me understand how you determined this?
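
To put a number on that, here is a quick sketch of the ratios computed from the three composite scores listed above. The variable names are illustrative, and the labels are the ones from the list:

```python
# Composite scores from the attached results (labels as listed above).
scores = {"NVop": 1911, "icc": 1959, "icx": 2006}

baseline = scores["NVop"]
for name, score in scores.items():
    # A ratio of 2.0 would mean "2x faster" than the baseline build.
    print(f"{name:5s} {score:5d}  ratio vs NVop = {score / baseline:.3f}")
```

The ratios come out at about 1.025 for icc and 1.050 for icx, so the Intel builds are roughly 2.5% and 5% ahead in these runs, not 2x.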

Just for another data point, I ran on a Skylake (Xeon Gold 6148). As on the EPYC, pgcc/nvc gives better Mflops than icx. Composite scores:

pgcc/nvc: 2969
icx: 2623

-Mat