NVHPC (nvc++ V22.11) fails when the `acc routine bind` directive is applied to a template function

The following example, which tests the acc routine bind directive, fails with nvc++ V22.11 and reports:
(error: non variable found in parallel pragma list)
Does this mean that the NVHPC OpenACC C++ compiler (nvc++) does not support the routine bind directive for template functions? Or is there an error in this example?

#include <openacc.h>
#include <stdio.h>

#pragma acc routine seq
template<class T>
void foo_dev(T i) {
  printf("dev%d\n", i); 
}

#pragma acc routine seq bind(foo_dev)
template<class T>
void foo(T i) { 
  printf("host%d\n", i); 
}

int main() {
  #pragma acc parallel num_gangs(1)
  {   
    int i = 0;
    foo(i);
  }
  int i = 1;
  foo(i);
  return 0;
}

This is a good question, and I’m not sure what the answer is. The OpenACC spec does not mention templated functions in the bind clause section. I opened up an issue, FS#34118, and will get back to you when I have some clarification.

I don’t think this will work, at least not without possible changes to the OpenACC standard and significant work in the compiler.

Because “foo_dev” is a template, no definition is created unless it is actually used (instantiated with a concrete type). Hence there’s no device routine to create.

The compiler would need to trigger creation of the definition when the name appears in the bind clause. However, it wouldn’t know which instantiation to create, given there could be multiple. I don’t believe it would be safe to assume that “foo_dev” should inherit the same template arguments as “foo”.
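The instantiation point above can be illustrated in plain C++, with no OpenACC involved. This is only a sketch: here “foo_dev” returns a value instead of printing so the result is easy to check, and the pointer names foo_dev_int and foo_dev_double are hypothetical, introduced just for the illustration.

```cpp
#include <type_traits>

// A function template is only a recipe. No code for foo_dev exists until
// some specialization of it is instantiated.
template <class T>
int foo_dev(T i) {
  // Checked once per instantiation, never for the bare template.
  static_assert(std::is_arithmetic<T>::value, "foo_dev needs an arithmetic T");
  return static_cast<int>(i) + 100;
}

// Explicit instantiation definitions: each one makes the compiler emit
// exactly one concrete function. The bare name `foo_dev` selects none of them.
template int foo_dev<int>(int);
template int foo_dev<double>(double);

// Taking an address likewise needs a concrete T:
//   auto p = &foo_dev;   // ill-formed: which specialization is meant?
int (*foo_dev_int)(int) = &foo_dev<int>;
int (*foo_dev_double)(double) = &foo_dev<double>;
```

A bind clause that names only “foo_dev” is in the same position as the commented-out line: nothing in it says which specialization is meant.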

Brent sent this to the OpenACC standards committee to see what, if anything, could be done.

For now, you’ll need to make “foo_dev” a non-templated function, or use conditional compilation with the _OPENACC macro.

% cat test.cpp
#include <openacc.h>
#include <stdio.h>
#include <nv/target>

#pragma acc routine seq
void foo_dev(int i) {
  printf("dev%d\n", i);
}

#pragma acc routine seq bind(foo_dev)
template<class T>
void foo(T i) {
  printf("host%d\n", i);
}

int main() {
  #pragma acc parallel num_gangs(1)
  {
    int i = 0;
    foo(i);
  }
  int i = 1;
  foo(i);
  return 0;
}
% nvc++ test.cpp -acc ; a.out
dev0
host1
% cat test1.cpp
#include <openacc.h>
#include <stdio.h>

#pragma acc routine seq
template<class T>
void foo_dev(T i) {
  printf("dev%d\n", i);
}

#pragma acc routine seq
template<class T>
void foo(T i) {
  printf("host%d\n", i);
}

int main() {
  #pragma acc parallel num_gangs(1)
  {
    int i = 0;
#ifdef _OPENACC
    foo_dev(i);
#else
    foo(i);
#endif
  }
  int i = 1;
  foo(i);
  return 0;
}
% nvc++ test1.cpp -acc ; a.out
dev0
host1

Thank you for this quick answer.
So if foo/foo_dev are instantiated with more than one data type, is there currently no working solution other than writing each instantiation by hand (e.g., foo_int, foo_float)?

The second suggested workaround, using conditional compilation with the _OPENACC macro and calling the device routine explicitly, should still allow you to use templates for both.

Only if you must use the bind clause do the device routines need to be non-templated.
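For the multi-type case, one way to keep the #ifdef out of every call site might be a small forwarding template. This is a sketch under assumptions, not something confirmed in this thread: the wrapper name foo_in_region is hypothetical, the routines return values instead of printing so that any arithmetic T works, and it is untested against any particular nvc++ release.

```cpp
// Device flavor, templated as in test1.cpp.
#pragma acc routine seq
template <class T>
T foo_dev(T i) {
  return i + T(100);
}

// Host flavor.
#pragma acc routine seq
template <class T>
T foo(T i) {
  return i;
}

// Hypothetical wrapper (not from the thread), meant to be called only from
// inside compute regions. _OPENACC is a compile-time macro for the whole
// translation unit, so with -acc this always forwards to foo_dev, even if
// it were called from host code. Host code should keep calling foo directly.
#pragma acc routine seq
template <class T>
T foo_in_region(T i) {
#ifdef _OPENACC
  return foo_dev(i);
#else
  return foo(i);
#endif
}
```

Inside a `#pragma acc parallel` region you would call foo_in_region(x) for any T, and keep calling foo(x) on the host, as test1.cpp does.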

Thank you for the clarification. Not being able to use templates with the bind clause is still a limitation, so I submitted an issue to the OpenACC standards committee.