Kernels become invisible with -g due to implicit use of llvm

I’ve been experimenting with the radeon backend for some time, and had some great results with it, but have been running into some very odd issues since trying to use it in a larger project. After a great deal of tracking down, the issue seems to be the result of the llvm backend, but I’ll come back to that. First, I’m running version 14.1 x86_64 under linux. The old version is because there has been an issue with our license, waiting for a renewal to percolate through purchasing, so if this has since been fixed feel free to ignore this.

In any event, given this program:

int main(int argc, char const *argv[]) {
  acc_set_device_num(0, acc_device_radeon);
  int *arr = new int[500];

#pragma acc kernels loop independent copy(arr[500]) async(0)
  for (int i = 0; i < 500; ++i) {
    arr[i] = i;
  }
#pragma acc wait(0)
  for (int i = 0; i < 500; ++i) {
    if(arr[i] != i)
      exit(1);
  }

  return 0;
}

And compiled with:

pgCC test.cc --c++11 -acc -ta=radeon,host,nvidia:keep,cc20,cc35,5.5 -Minfo=acc,mp -DCL_USE_DEPRECATED_OPENCL_1_1_APIS --gnu -o test -lOpenCL -v --c++11

The program compiles and runs without issue.

Adding “-g” to the compile command produces an extra warning during compilation:

WARNING: Linking two modules of different data layouts!

Running the resulting binary on a system with only an AMD device returns:

call to clCreateKernel returned error -46: invalid kernel name

On a system with an NVIDIA device:

call to cuModuleGetFunction returned error 500: Not found

Both of which cause the program to exit immediately with no useful debugging information. Running the compile with “-v” the command that produces the fateful warning is the following:

warning:  /opt/pgi/linux86-64/14.1/bin/pgocld -amd=tahiti test.a001.ll -o test.a001.bin
no warning: /opt/pgi/linux86-64/14.1/bin/pgocld -amd=tahiti test.a001.cl -o test.a001.bin

This leads me to believe, based on the extension of the intermediate file and arguments to the nvidia equivalents, that the compiler is now using the llvm backend. Adding the llvm requirement also causes this problem, so that seems to bear out. The question is, why does using the llvm backend in this instance cause all accelerated regions to become inaccessible, and why does the backend change for both NVIDIA and Radeon devices when -g is added?

Hi njustn,

Apologies for the late response. I’ve been traveling.

There’s two separate issues here. First is debugging support isn’t available on Radeon but is for Tesla. Adding “-g” will enable Tesla debugging which in turn uses the LLVM back-end. This works fine if you were targeting a single device, but by targeting both Radeon and Tesla, the debug info is getting into the Radeon and causing the “clCreateKernel”. The work around would be to either just target Radeon or add “nodebug” to the end of your -ta flags so only host debugging info is created. This was cleaned up in 14.4 at the same time we officially added device side debug support.

The second issue is the “WARNING: Linking two modules of different data layouts!” message. I checked with engineering and they say this message can be safely ignored.

  • Mat

Failing Case with 14.1:

% pgCC test.cc --c++11 -acc -ta=radeon,host,nvidia:keep,cc20,cc35,5.5 -Minfo=acc,mp -DCL_USE_DEPRECATED_OPENCL_1_1_APIS --gnu -o test -lOpenCL --c++11 -V14.1 -g ; ./test
main:
      7, Generating copy(arr[0:500])
         Generating Radeon code
         Generating NVIDIA code
     10, Loop is parallelizable
         Accelerator kernel generated
         10, #pragma acc loop gang, vector(128) /* global dim(0) local dim(0) */
WARNING: Linking two modules of different data layouts!
call to clCreateKernel returned error -46: invalid kernel name

14.1 Workarounds:

% pgCC test.cc --c++11 -acc -ta=radeon,host,nvidia:keep,cc20,cc35,5.5,nodebug -Minfo=acc,mp -DCL_USE_DEPRECATED_OPENCL_1_1_APIS --gnu -o test -lOpenCL --c++11 -V14.1 -g ; ./test
main:
      7, Generating copy(arr[0:500])
         Generating Radeon code
         Generating NVIDIA code
     10, Loop is parallelizable
         Accelerator kernel generated
         10, #pragma acc loop gang, vector(128) /* global dim(0) local dim(0) */
% pgCC test.cc --c++11 -acc -ta=radeon -Minfo=acc,mp -DCL_USE_DEPRECATED_OPENCL_1_1_APIS --gnu -o test -lOpenCL --c++11 -V14.1 -g ; ./test
main:
      7, Generating copy(arr[0:500])
         Generating Radeon code
     10, Loop is parallelizable
         Accelerator kernel generated
         10, #pragma acc loop gang, vector(128) /* global dim(0) local dim(0) */
WARNING: Linking two modules of different data layouts!
%

Original failing case cleaned-up in 14.4:

% pgCC test.cc --c++11 -acc -ta=radeon,host,nvidia:keep,cc20,cc35,5.5 -Minfo=acc,mp -DCL_USE_DEPRECATED_OPENCL_1_1_APIS --gnu -o test -lOpenCL --c++11 -V14.4 -g; ./test
main:
      7, Generating copy(arr[:500])
         Generating Radeon code
         Generating Tesla code
     10, Loop is parallelizable
         Accelerator kernel generated
         10, #pragma acc loop gang, vector(128) /* global dim(0) local dim(0) */
%

Thank you for the workaround, and notice about 14.4. I switched to 14.4 today and it does work for me as well.