-ta=multicore fails to compile with LLVM code generation

I am also looking for workarounds that still use LLVM code generation.

/**
$ pgc++ -v
Export PGI_CURR_CUDA_HOME=/opt/pgi/linux86-64-llvm/2019/cuda/
Export PGI=/opt/pgi
pgc++-Warning-No files to process


$ pgc++ -c bug.cc -ta=multicore
/opt/pgi/linux86-64-llvm/19.4/share/llvm/bin/opt: /tmp/pgc++lElHCF-OeKc.ll:106:23: error: use of undefined value '%n2.addr'
        %22 = load i32, i32* %n2.addr, align 4, !tbaa !33, !dbg !44
                             ^
 */

void func(float* b, unsigned n) {
  unsigned n2 = n;
  #pragma acc parallel loop independent private(b[0:n2])
  for (int i = 0; i < n; ++i) {}
}

Hi stw,

Thanks for the report. I added TPR#27435 to track this issue.

The problem here is that “n2” isn’t used anywhere except as the array-section bound in the private clause, so the reference to it is getting deleted. I’ve seen similar issues in the past, but it looks like we missed this case. The same code works when targeting Tesla or when “n2” is used in a copy clause.

The workaround would be to use “n” in place of “n2”, or to reference “n2” somewhere in the body of the loop. Something like:

void func(float* b, unsigned n) {
  unsigned n2 = n;
  #pragma acc parallel loop independent private(b[0:n2])
  for (int i = 0; i < n; ++i) {
    for (int j = 0; j < n2; ++j) {
    }
  }
}
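
The other option, using “n” directly in the private clause, might look like the sketch below. The “out” argument, the copyout clause, and the loop body are not part of the original report; they are hypothetical additions so that the function produces something checkable. I have not verified this against 19.4, so treat it as an illustration of the idea rather than a confirmed fix. It would be built the same way as the reproducer, e.g. with pgc++ -c and -ta=multicore.

```cpp
// Workaround sketch: the private clause is bounded by the parameter `n`
// itself, so there is no local copy (`n2`) that is otherwise unused.
// ASSUMPTION: `out`, copyout(...) and the loop body are illustrative only.
void func(float* b, float* out, unsigned n) {
  #pragma acc parallel loop independent private(b[0:n]) copyout(out[0:n])
  for (unsigned i = 0; i < n; ++i) {
    float sum = 0.0f;
    for (unsigned j = 0; j < n; ++j) {
      b[j] = static_cast<float>(i * j);  // b is used as private scratch space
      sum += b[j];
    }
    out[i] = sum;  // out[i] = i * (0 + 1 + ... + (n-1))
  }
}
```

Because “b” is private, its contents after the loop should not be relied on; only “out” carries results back.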

-Mat

Thank you Mat.

The first post was just a minimal working example. The original code looked like this:

unsigned nsq = n * n;
// ... malloc() using nsq
#pragma acc parallel loop independent private(b[0:nsq])
for (int i = 0; i < n; ++i) {
// ...
}

Thanks to your reply, I think the problem can be avoided simply by writing the expression directly in the clause:

#pragma acc parallel loop independent private(b[0:n*n])
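
Put together, a self-contained version of that idea could look roughly like the following. This is only a sketch: the function name “fill_sums”, the “out” array, the copyout clause, and the loop body are made up for illustration, and I have not confirmed that it compiles cleanly with 19.4 and LLVM code generation.

```cpp
#include <cstdlib>

// Sketch of the original pattern with the bound written inline as n*n,
// so no separate `nsq` variable appears only in the private clause.
// ASSUMPTION: `fill_sums`, `out` and the loop body are hypothetical.
void fill_sums(float* out, unsigned n) {
  // Scratch buffer of n*n floats, as in the original code's malloc().
  float* b = static_cast<float*>(std::malloc(sizeof(float) * n * n));
  if (b == nullptr) return;  // allocation failed; out is left untouched

  #pragma acc parallel loop independent private(b[0:n*n]) copyout(out[0:n])
  for (unsigned i = 0; i < n; ++i) {
    float sum = 0.0f;
    for (unsigned k = 0; k < n * n; ++k) {
      b[k] = static_cast<float>(i);  // fill the private scratch buffer
      sum += b[k];
    }
    out[i] = sum;  // out[i] = i * n * n
  }

  std::free(b);
}
```

Since “n” is also the loop bound, it is referenced outside the private clause, which is the condition Mat described for the reference not being dropped.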