acc_on_device routine with a compile-time constant argument is not evaluated to a constant at compile time

The OpenACC specification (version 1.0 and later) says that if the acc_on_device routine has a compile-time constant argument, it evaluates at compile time to a constant.
However, the NVHPC OpenACC compiler (nvc++ V22.11) does not convert the acc_on_device routine with a compile-time constant argument to a constant at compile time.

The example below fails when compiled by NVHPC:

#include <openacc.h>
#include <stdio.h>

#pragma acc routine seq
void foo() {
  if constexpr(acc_on_device(acc_device_host)) 
    printf("host\n");
  else
    printf("not host\n");
}

int main() {
  #pragma acc parallel num_gangs(1)
  {
    foo();
  }
  foo();
  return 0;
}

Thanks for the report, Seyong. I recreated the issue here and filed a report (TPR #34142). We’ll have engineering investigate.

-Mat

Hi Seyong,

Engineering took a look, but unfortunately it’s not possible to use “acc_on_device” within a constexpr. The problem has to do with phasing. Constexpr is evaluated by the front-end, but “acc_on_device” can’t be evaluated until back-end compilation, which is when code generation for the device, the host, or both occurs.

“acc_on_device” is a compile time constant and should work as expected if removed from the constexpr.
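
The front-end rule behind this can be seen without OpenACC at all. The sketch below is only an illustration: on_host_stub() is a hypothetical stand-in, not the real acc_on_device, and it is deliberately not declared constexpr so that it mimics a call whose value the front-end cannot see. The commented-out if constexpr line would be rejected by a C++17 front-end, while the plain if compiles and leaves any folding to later compilation stages.

```cpp
#include <string>

// Hypothetical stand-in for a query whose result is only known after the
// front-end has finished. This is NOT the real acc_on_device.
// It is deliberately not declared constexpr.
bool on_host_stub() { return true; }

std::string branch_taken() {
  // Rejected by the front-end if uncommented: the condition of
  // `if constexpr` must be a constant expression, and a call to a
  // non-constexpr function is not one.
  // if constexpr (on_host_stub()) return "host";

  // Accepted: an ordinary branch. Whether the untaken side is removed
  // is decided later by the optimizer, not by the front-end.
  if (on_host_stub())
    return "host";
  else
    return "not host";
}
```

Calling branch_taken() from main() and printing the result shows which branch the stub selects; with the stub as written, that is the "host" branch.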

-Mat

Thank you for the explanation.
Then, is there any nvc++ option that forces the compiler to remove the not-taken path of an if-else statement at compile time? In the above example, the device version of foo() will not execute the if-path (printf("host\n");), since acc_on_device(acc_device_host) in the device version of foo() will be evaluated to false at compile time.

In the back-end, “acc_on_device” does get turned into either true or false, depending on whether it’s generating device or host code. Dead code elimination then removes the untaken branch of the conditional. No additional flags are required.

For example, here’s the generated CUDA code. As you can see, there’s no if condition used:

% cat test.cpp
#include <openacc.h>
#include <stdio.h>

#pragma acc routine seq
void foo() {
  if (acc_on_device(acc_device_host))
    printf("host\n");
  else
    printf("not host\n");
}

int main() {
  #pragma acc parallel num_gangs(1)
  {
    foo();
  }
  foo();
  return 0;
}
% nvc++ -acc test.cpp -gpu=nollvm,keep
% cat test.n001.gpu
#include "cuda_runtime.h"
#include "nvhpc_cuda_runtime.h"
#include "test.n001.h"
extern "C" __device__ void
_Z3foov(
)
{
printf((const char*)"not host\n"); /* lilix:6 */
}
extern "C" __global__ __launch_bounds__(1) void
_8test_cpp_main_14_gpu(
)
{
_Z3foov(); /* lilix:2 */
}

Note that the old CUDA code generator (i.e. -gpu=nollvm,keep) isn’t supported any longer. It’s just convenient for showing this. You can see the same thing in the generated LLVM code as well (i.e. -gpu=keep); it’s just harder to read.

I verified the same behavior on a local test too; thanks.

The reason I asked about the behavior of if constexpr(acc_on_device(acc_device_host)) … is that I wanted the compiler front-end to eliminate the if-path code before the device-specific back-end generates the device code.
For instance, in the example below, hostfunc() is defined in an external library that works only on the host, and thus the OpenACC back-end will not be able to handle it properly.

#pragma acc routine seq
void foo() {
  if (acc_on_device(acc_device_host)) 
    hostfunc();
  else
    devicefunc();
}

Is there a way for nvc++ to eliminate certain code in the front-end pass depending on whether it will be executed on the host or device?
(Based on your previous explanation, it seems not, since the nvc++ front-end is unaware of the device-specific behavior in the back-end.)

It still does this; it just does it in the back-end. You shouldn’t need constexpr.

Granted, this is a simple case, but I rewrote your example to use a library for the host call and it works fine. Did you try this with your real code? If you tried it and encountered an issue, can you post a reproducing example?

Here’s the simple test:

% cat test.cpp
#include <openacc.h>
#include <stdio.h>


#pragma acc routine seq
void foo_dev() {
    printf("not host\n");
}

void foo_host();

#pragma acc routine seq
void foo() {
  if (acc_on_device(acc_device_host))
     foo_host();
  else
     foo_dev();
}

int main() {
  #pragma acc parallel num_gangs(1)
  {
    foo();
  }
  foo();
  return 0;
}
% cat test2.cpp
#include <openacc.h>
#include <stdio.h>

void foo_host() {
    printf("host\n");
}

% nvc++ -c test2.cpp -fpic
% nvc++ -shared -o libtest2.so test2.o
% nvc++ -acc test.cpp -L./ -ltest2
% a.out
not host
host