Accelerating NVIDIA HPC Software with SVE on AWS Graviton3

Originally published at: Accelerating NVIDIA HPC Software with SVE on AWS Graviton3 | NVIDIA Technical Blog

The NVIDIA HPC SDK 22.7 now supports the AWS Graviton3 with auto-vectorization for the Scalable Vector Extension to the Arm architecture.

I am trying to reproduce the SPEC CPU2017 fpspeed results in this blog. Can you please share the build flags used for these runs? Thanks.

All results were measured on a single c7g.16xlarge AWS Graviton3 instance, which has 64 cores.

For the GNU results, I started with the “Example-gcc-linux-aarch64.cfg” config file included with CPU2017, updating the optimization flags to:

OPTIMIZE = -Ofast -fallow-argument-mismatch -fopenmp -march=armv8.4-a+crypto+rcpc+sha3+sm4+sve+nodotprod -lm -fpermissive

For the NVHPC runs, I used the following config:

#######################################################################
teeout = yes
makeflags=-j 32

label         = nvhpc
tune          = base,peak
output_format = text
use_submit_for_speed = 1

fpspeed=default=default:
submit = $command
preENV_OMP_STACKSIZE=128M
preENV_MP_BIND=yes

default:
CC           = nvc
CXX          = nvc++
FC           = nvfortran

CC_VERSION_OPTION  = -V
CXX_VERSION_OPTION = -V
FC_VERSION_OPTION  = -V

#######################################################################
# Optimization
default=base=default:
OPTIMIZE     = -w -O3 -Mstack_arrays -static-nvidia -Mfprelaxed -tp neoverse-v1

fpspeed=default=default:
EXTRA_OPTIMIZE += -mp -DSPEC_OPENMP

default=default=default:
PORTABILITY = -DSPEC_LP64

500.perlbench_r,600.perlbench_s:
CPORTABILITY =  -DSPEC_LINUX_X64

507.cactuBSSN_r,607.cactuBSSN_s=default:
EXTRA_LDFLAGS+=-fortranlibs

521.wrf_r,621.wrf_s:
PORTABILITY = -DSPEC_CASE_FLAG
FPORTABILITY = -Mbyteswapio

527.cam4_r,627.cam4_s:
PORTABILITY = -DSPEC_CASE_FLAG

526.blender_r:
PORTABILITY += -D__STDC_LIMIT_MACROS

628.pop2_s:
PORTABILITY = -DSPEC_CASE_FLAG
FPORTABILITY = -Mbyteswapio

523.xalancbmk_r,623.xalancbmk_s:
CXXPORTABILITY=-DSPEC_LINUX

557.xz_r,657.xz_s:
CPORTABILITY+=-DSPEC_GCC_INITIALIZER_CAST
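
If it helps, a run with this config might be launched roughly as follows. This is only a sketch: the config file name is a placeholder, and the thread count of 64 is an assumption (one OpenMP thread per core on this instance), not necessarily what was used for the published numbers.

```shell
#!/bin/sh
# Placeholder name: save the NVHPC config above under $SPEC/config/ with this name.
CONFIG="nvhpc-graviton3.cfg"

# Assumption: one OpenMP thread per core on the 64-core instance.
THREADS=64

# Base-tuned fpspeed run; --config, --tune and --threads are standard runcpu options.
CMD="runcpu --config=$CONFIG --tune=base --threads=$THREADS fpspeed"
echo "$CMD"

# Only launch when the SPEC environment is loaded (e.g. after sourcing shrc
# from the top of the CPU2017 install); otherwise just print the command.
if command -v runcpu >/dev/null 2>&1; then
  $CMD
fi
```

The same command should work for the GNU runs by pointing CONFIG at the modified example config instead.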

Let me know if you have issues or questions.

-Mat