Nvc++: Undefined reference to '__builtin__pgi_isnormalff'

The following code fails to compile with nvc++ up to version 23.1 with a linker error: Undefined reference to ‘__builtin__pgi_isnormalff’.

__device__
int isnormall(float f) noexcept {
  return isnormal(f);
}

__global__
void test() {
  isnormall(0x1p-149f);
}

int main(){}

Calling isnormal directly in test() solves the problem, but clearly there are circumstances in which this isn’t always convenient.
Out of curiosity, is the issue related to the error issued when compiling with nvcc?

Cheers,
-Nuno

Likely. For reference, here’s the error:

% nvcc -c test.cu
test.cu(3): error: calling a constexpr __host__ function("isnormal") from a __device__ function("isnormall") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

Given “__builtin__pgi_isnormalff” is a host function, there’s likely not a translation to device code available.

Note that nvc++ does not fully support CUDA. It supports some, primarily enough to compile Thrust, which our STDPAR implementation is based. But you shouldn’t expect it to be a drop-in replacement for nvcc.

-Mat

Okay, so what’s happening when I replace the call in test with isnormal?
Does test actually become a host function?
In which case, what happens if I instead write this:

__device__
int isnormall(float f) noexcept {
  return isnormal(f);
}
__device__
int isinfff(float f) noexcept {
  return isinf(f);
}

__global__
void test() {
  isnormal(0x1p-149f);
  isinfff(0);
}

int main(){}

Is test a hybrid between the host and the device? I’m a bit confused, sorry.

I’m not an expert here, but given the nvcc message, it appears that when in a “device” function, these intrinsics, i.e. “isnormal” and “isinf”, do not have a device version. Hence the host version is used. Which of course isn’t available.

It does seem that the translation can occur when use in a “global” routine.

Essentially this means that you can’t use these intrinsics inside of a “device” routine.

C++23 does have a constexpr version of the intrinsics, which are evaluated at compile time, The nvcc experimental flag “–expt-relaxed-constexpr” appears to enable this support. However nvc++ doesn’t have a similar option.

I don’t have any insights as to why the intrinsics are only available in global routines. Hence, you might try posting the question on the CUDA forum.

-Mat

Hi Mat,

Apologies, I should have named the function isinfff, I’ve amended it in the previous post to avoid the conflict. I think isinf does exist and is documented as a device function, nvcc doesn’t seem to complain about that one.

The exercise here was to see what happens when part of the global function test() is using a host function, isnormal, and part using a device function, isinfff.
As you suggest, with nvcc I suspect that can’t happen, and you are forced to use the constexpr alternative, resulting in test() being totally executed on the device as whatever can’t be computed on the device was already sorted at compile time.
With nvc++, it’s a bit confusing, unless __global__ doesn’t have exactly the same meaning there or, alternatively, that the compiler is indeed using the constexpr function silently, but failing to do that when inside a __device__ function…

Does that make sense? I’ll try asking the same question on the CUDA forum anyway. :-)

With nvc++, it’s a bit confusing, unless __global__ doesn’t have exactly the same meaning there or, alternatively, that the compiler is indeed using the constexpr function silently, but failing to do that when inside a __device__ function…

Does that make sense? I’ll try asking the same question on the CUDA forum anyway. :-)

No, __global__ and __device__ mean the same thing, and it does look like the constexpr is applied in when isnormal is used as a const expression. In the device routine it’s used as call given the argument is a variable.

Looking at the generated device LLVM code (i.e. add “-gpu=keep”), you can see that device versions of the calls are created. For “isnan”, the device intrinsic, “__nv_isinff” is used but for isnormal, the host builtin “__builtin__pgi_isnormalff” is used.

Why? Again I’m not an authority here so this is my own reasoning, but in looking at the CUDA docs, there isn’t a device routine for isnormal. See: CUDA Math API :: CUDA Toolkit Documentation

Hence when nvc++ is doing the translation, there’s nothing to replace the builtin isnormal with and hence the host version remains.

Bottom line, it doesn’t appear isnormal can be used as function call in device code, and only can be used as a constant expression.

No, __global__ and __device__ mean the same thing, and it does look like the constexpr is applied in when isnormal is used as a const expression. In the device routine it’s used as call given the argument is a variable.

That’s it, thank you! If we add the constexpr specifier:

__device__
constexpr int isnormall(float f) noexcept {
  return isnormal(f);
}

Then we can call isnormall() from test().
That flag, -gpu=keep, is really useful, thanks for sharing.
I think my last question would then be, is isnormal() ever coming?

Sorry, no idea.