A simple source file that contains just a call to xgetbv() fails to link when “-tp px -mxsave” is used on the command line. The compiler doesn’t inline the intrinsic function, even though both gcc and clang do inline it when only “-mxsave” is passed. There is no separate option to enable the “xgetbv” or “xgetbv1” attribute in the LLVM backend.
The same code links correctly if “-tp host” is used, but the resulting binary is useless on older processor architectures that don’t support AVX2, because unsupported instructions can be generated outside the run-time feature checks.
Hi MonniTheCat and welcome!
Do you mind providing a reproducing example to illustrate the issue?
I tried writing a reproducing example (see below), but it doesn’t look like we support “__builtin_ia32_xgetbv”, which is what “_xgetbv” uses in “xsaveintrin.h”, so I’m unable to reproduce the issue you describe.
Here’s my simple example:
% cat test.c
#include <stdio.h>
#include <stdint.h>
#include <immintrin.h>
int main() {
    uint64_t xcr0 = _xgetbv(0);
    printf("XCR0 value: 0x%llx\n", (unsigned long long)xcr0);
    return 0;
}
% nvc -mxsave -tp=host test.c
/usr/bin/ld: /tmp/nvceiN0lmdKU1H29.o: in function `main':
test.c:6: undefined reference to `__builtin_ia32_xgetbv'
-Mat
“-tp=host” and “-mxsave” are mutually exclusive… You can only use “-mxsave” with “-tp px” and “-tp x86-64-v2”, as later micro-architectures should already enable the intrinsic and generate the underlying assembler instruction.
I’m pretty sure the linker error is caused by the header not specifying that the intrinsic must be inlined when the “xsave” attribute is specified. Obviously the compiler should also support the builtin before it can inline it correctly.
Correct, and I can submit a request to add it, but I’m trying to understand how you were able to compile successfully with “-tp host”.
_xgetbv() is not needed at all when compiling for the current processor, as it is only used to detect at run time whether AVX2 or AVX-512 is enabled. Using cpuid() alone is not enough, because it only checks the hard-coded feature bit on the CPU, not whether the vector unit has been disabled in software (that is, whether the operating system has enabled the corresponding register state in XCR0).
When using “-tp host”, the preprocessor gets all the CPUID feature bits as macros, and those don’t include “xgetbv1”, even though the “lscpu” command does list it.
nvcpfe:
-D__MMX__ -D__SSE_MATH__ -D__MMX_WITH_SSE__ -D__SSE__ -D__SSE2__ -D__SSE2_MATH__ -D__SSE3__ -D__SSSE3__ -D__SSE4_1__ -D__SSE4_2__ -D__ABM__ -D__ADX__ -D__AES__ -D__AVX__ -D__AVX2__ -D__BMI__ -D__BMI2__ -D__CLFLUSHOPT__ -D__CX16__ -D__F16C__ -D__FMA__ -D__FSGSBASE__ -D__FXSR__ -D__LZCNT__ -D__MOVBE__ -D__PCLMUL__ -D__POPCNT__ -D__PRFCHW__ -D__RDRND__ -D__RDSEED__ -D__LAHF_SAHF__ -D__XSAVE__ -D__XSAVEC__ -D__XSAVEOPT__ -D__XSAVES__
lscpu:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow ept vpid ept_ad fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves vnmi md_clear flush_l1d arch_capabilities