Originally published at: Accelerating NVIDIA HPC Software with SVE on AWS Graviton3 | NVIDIA Technical Blog
The NVIDIA HPC SDK 22.7 now supports AWS Graviton3, including auto-vectorization for the Arm Scalable Vector Extension (SVE).
I am trying to reproduce the SPEC CPU2017 fpspeed results from this blog post. Could you please share the build flags used for these runs? Thanks.
All results were measured on a single c7g.16xlarge AWS Graviton3 instance, which has 64 cores.
For the GNU results, I started with the “Example-gcc-linux-aarch64.cfg” config file included with CPU2017 and updated the optimization flags to:
OPTIMIZE = -Ofast -fallow-argument-mismatch -fopenmp -march=armv8.4-a+crypto+rcpc+sha3+sm4+sve+nodotprod -lm -fpermissive
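If it helps when reading the -march value above: GCC's AArch64 syntax is a base architecture followed by +feature modifiers, and a "no" prefix switches a feature off. So this string keeps SVE on and turns dotprod off. The small Python helper below (parse_march is a hypothetical name of mine, not part of any toolchain) just splits the string up. It is a reading aid only: GCC itself also enables features implied by others, which this sketch does not model.

```python
def parse_march(march):
    """Split a GCC AArch64 -march value into (base, enabled, disabled).

    Simplifications: features implied by other features are not modeled,
    and no feature name is assumed to start with "no" on its own.
    """
    base, *modifiers = march.split("+")
    # "+sve" enables a feature; "+nodotprod" disables "dotprod".
    enabled = [m for m in modifiers if not m.startswith("no")]
    disabled = [m[2:] for m in modifiers if m.startswith("no")]
    return base, enabled, disabled


if __name__ == "__main__":
    base, enabled, disabled = parse_march(
        "armv8.4-a+crypto+rcpc+sha3+sm4+sve+nodotprod")
    print(base)      # armv8.4-a
    print(enabled)   # ['crypto', 'rcpc', 'sha3', 'sm4', 'sve']
    print(disabled)  # ['dotprod']
```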
For the NVHPC runs, I used the following config:
#######################################################################
teeout = yes
makeflags=-j 32
label = nvhpc
tune = base,peak
output_format = text
use_submit_for_speed = 1
fpspeed=default=default:
submit = $command
preENV_OMP_STACKSIZE=128M
preENV_MP_BIND=yes
default:
CC = nvc
CXX = nvc++
FC = nvfortran
CC_VERSION_OPTION = -V
CXX_VERSION_OPTION = -V
FC_VERSION_OPTION = -V
#######################################################################
# Optimization
default=base=default:
OPTIMIZE = -w -O3 -Mstack_arrays -static-nvidia -Mfprelaxed -tp neoverse-v1
fpspeed=default=default:
EXTRA_OPTIMIZE += -mp -DSPEC_OPENMP
default=default=default:
PORTABILITY = -DSPEC_LP64
500.perlbench_r,600.perlbench_s:
CPORTABILITY = -DSPEC_LINUX_X64
507.cactuBSSN_r,607.cactuBSSN_s=default:
EXTRA_LDFLAGS+=-fortranlibs
521.wrf_r,621.wrf_s:
PORTABILITY = -DSPEC_CASE_FLAG
FPORTABILITY = -Mbyteswapio
527.cam4_r,627.cam4_s:
PORTABILITY = -DSPEC_CASE_FLAG
526.blender_r:
PORTABILITY += -D__STDC_LIMIT_MACROS
628.pop2_s:
PORTABILITY = -DSPEC_CASE_FLAG
FPORTABILITY = -Mbyteswapio
523.xalancbmk_r,623.xalancbmk_s:
CXXPORTABILITY=-DSPEC_LINUX
557.xz_r,657.xz_s:
CPORTABILITY+=-DSPEC_GCC_INITIALIZER_CAST
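The section markers in this config use the CPU2017 form benchmark=tuning=label:, where omitted fields and "default" act as wildcards and a suite name such as fpspeed selects every benchmark in that suite. Below is a simplified Python model of how the settings above end up applying to a given fpspeed benchmark. The function names are hypothetical helpers, not part of the SPEC tools, and the real runcpu precedence rules are more involved: this sketch just applies matching sections in file order, with "=" replacing a value and "+=" appending to it.

```python
# Excerpt of the NVHPC config above, trimmed to what the example needs.
CONFIG_EXCERPT = """
default:
CC = nvc
default=base=default:
OPTIMIZE = -w -O3 -Mstack_arrays -static-nvidia -Mfprelaxed -tp neoverse-v1
fpspeed=default=default:
EXTRA_OPTIMIZE += -mp -DSPEC_OPENMP
default=default=default:
PORTABILITY = -DSPEC_LP64
521.wrf_r,621.wrf_s:
PORTABILITY = -DSPEC_CASE_FLAG
FPORTABILITY = -Mbyteswapio
"""


def parse_header(line):
    """Split 'bench[,..]=tuning=label:' into three sets; missing fields are 'default'."""
    fields = line.rstrip(":").split("=")
    fields += ["default"] * (3 - len(fields))
    return [set(field.split(",")) for field in fields]


def header_matches(sets, benchmark, suites, tuning, label):
    """'default' is a wildcard; the benchmark field may also name a suite."""
    bench_set, tune_set, label_set = sets
    bench_ok = "default" in bench_set or benchmark in bench_set or bool(bench_set & suites)
    tune_ok = "default" in tune_set or tuning in tune_set
    label_ok = "default" in label_set or label in label_set
    return bench_ok and tune_ok and label_ok


def effective_settings(config, benchmark, suites, tuning, label):
    """Walk sections in file order (a simplification of SPEC's precedence)."""
    settings = {}
    active = True  # lines before the first section header apply globally
    for raw in config.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith(":"):
            active = header_matches(parse_header(line), benchmark, suites, tuning, label)
            continue
        if not active:
            continue
        if "+=" in line:
            key, value = (part.strip() for part in line.split("+=", 1))
            settings[key] = (settings.get(key, "") + " " + value).strip()
        elif "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            settings[key] = value
    return settings


if __name__ == "__main__":
    s = effective_settings(CONFIG_EXCERPT, "619.lbm_s", {"fpspeed"}, "base", "nvhpc")
    for key in ("CC", "OPTIMIZE", "EXTRA_OPTIMIZE", "PORTABILITY"):
        print(f"{key} = {s[key]}")
```

In this model, 619.lbm_s in base picks up the nvc OPTIMIZE line plus -mp -DSPEC_OPENMP from the fpspeed section, while 621.wrf_s additionally gets -Mbyteswapio and has PORTABILITY replaced by -DSPEC_CASE_FLAG.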
Let me know if you have issues or questions.
-Mat