21.11 -tp behaviour


I am testing the new 21.11 HPC compilers, and I am seeing some surprising behaviour with the -tp parameter.

The test program, test.c, just prints whether or not #ifdef __ABM__ evaluates true.

With 21.9, this macro was never automatically defined on our system, which has a Skylake Xeon Gold 6140:

$ nvc++ -V
nvc++ 21.9-0 64-bit target on x86-64 Linux -tp skylake
$ nvc -V21.9 test.c && ./a.out
__ABM__ not defined
$ nvc -V21.9 -tp host test.c && ./a.out
__ABM__ not defined
$ nvc -V21.9 -tp skylake-avx512 test.c && ./a.out
__ABM__ not defined
$ nvc -V21.9 -tp zen2 test.c && ./a.out
__ABM__ defined

With 21.11 the situation has changed. First, it seems this CPU is detected as skylake-avx512 instead of skylake (fair enough):

$ nvc -V
nvc 21.11-0 64-bit target on x86-64 Linux -tp skylake-avx512

but by default the __ABM__ macro is now defined:

$ nvc -V21.11 test.c && ./a.out
__ABM__ defined
$ nvc -V21.11 -tp host test.c && ./a.out
__ABM__ defined

but it is not defined if I explicitly specify the skylake-avx512 architecture that nvc -V tells me was detected:

$ nvc -V21.11 -tp skylake-avx512 test.c && ./a.out
__ABM__ not defined

Is this intentional? It seems surprising that -tp host behaves differently to -tp arch_that_is_apparently_detected_for_host, but maybe I am over-interpreting the -tp X part of the nvc -V output.

The background here is that one of our dependencies assumes that the macro __ABM__ being defined means intrin.h will exist, which now fails on our system with 21.11.

In the meantime, I suppose we can explicitly pass -tp skylake, or patch the dependency.

Best, Olli

Hi Olli,

I took a look, and it seems that in 21.9 we were testing for SSE4a support when deciding whether to enable ABM by default. In 21.11, we instead test directly for ABM, but for some reason this is coming back “true” even on a Skylake. I’ve added a problem report, TPR #31063, and sent it to engineering for investigation. Using “-tp” disables the default checks, which is why “-tp skylake” works around the issue.

Alternatively, you can use the flag “-mno-abm” to disable the macro.

Thanks for the report,

Hi Mat,

Thanks for the quick response. It looks like -mno-abm will be a helpful option in the short term.

Just one other bit of information: I believe that this processor does support ABM (it is listed in the flags section of lscpu), so I am not sure that it is wrong as such to enable it. It’s more the inconsistency with -tp skylake that I found surprising and that seems like a minor bug.

Cheers, Olli

Sorry for my misunderstanding. Engineering noticed this as well and already remarked that the issue is why “-tp skylake” isn’t enabling ABM.

They also noted that gcc has similar behavior: it enables ABM with -march=native but not with -march=skylake-avx512.

Hi Olli,

Apologies for the very late notification, but TPR #31063 was fixed back in our 22.2 release.