I’ve been experimenting with the radeon backend for some time, and had some great results with it, but have been running into some very odd issues since trying to use it in a larger project. After a great deal of tracking down, the issue seems to be the result of the llvm backend, but I’ll come back to that. First, I’m running version 14.1 x86_64 under linux. The old version is because there has been an issue with our license, waiting for a renewal to percolate through purchasing, so if this has since been fixed feel free to ignore this.
In any event, given this program:
int main(int argc, char const *argv[]) {
acc_set_device_num(0, acc_device_radeon);
int *arr = new int[500];
#pragma acc kernels loop independent copy(arr[500]) async(0)
for (int i = 0; i < 500; ++i) {
arr[i] = i;
}
#pragma acc wait(0)
for (int i = 0; i < 500; ++i) {
if(arr[i] != i)
exit(1);
}
return 0;
}
And compiled with:
pgCC test.cc --c++11 -acc -ta=radeon,host,nvidia:keep,cc20,cc35,5.5 -Minfo=acc,mp -DCL_USE_DEPRECATED_OPENCL_1_1_APIS --gnu -o test -lOpenCL -v --c++11
The program compiles and runs without issue.
Adding “-g” to the compile command produces an extra warning during compilation:
WARNING: Linking two modules of different data layouts!
Running the resulting binary on a system with only an AMD device returns:
call to clCreateKernel returned error -46: invalid kernel name
On a system with an NVIDIA device:
call to cuModuleGetFunction returned error 500: Not found
Both of which cause the program to exit immediately with no useful debugging information. Running the compile with “-v” the command that produces the fateful warning is the following:
warning: /opt/pgi/linux86-64/14.1/bin/pgocld -amd=tahiti test.a001.ll -o test.a001.bin
no warning: /opt/pgi/linux86-64/14.1/bin/pgocld -amd=tahiti test.a001.cl -o test.a001.bin
This leads me to believe, based on the extension of the intermediate file and arguments to the nvidia equivalents, that the compiler is now using the llvm backend. Adding the llvm requirement also causes this problem, so that seems to bear out. The question is, why does using the llvm backend in this instance cause all accelerated regions to become inaccessible, and why does the backend change for both NVIDIA and Radeon devices when -g is added?