Copyin Multidimensional Dynamic Array with PGC++ 20.1

Hello, I am attempting to copy a dynamically allocated 2D array over to the GPU. The actual code is spread across more than 3,700 lines of C++, so I’ll try to summarize my issue.

Data is allocated progressively throughout the code in a manner similar to:

//in class definition
unsigned int **mapCoarseToFinerRows;
//somewhere else in the code
mapCoarseToFinerRows = (unsigned int**)malloc(sizeof(unsigned int*) * k);
//in yet another part of the code
mapCoarseToFinerRows[level] = (unsigned int*)malloc(sizeof(unsigned int) * (num_coarse_vtxs + 1));

When I attempt to copy in the data with a simple:

#pragma acc data copyin(mapCoarseToFinerRows[k][num_coarse_vtxs + 1])

I get an internal compiler error:

PGC++-S-0000-Internal compiler error. pragma: bad ilmopc     307  (csrk.cpp: 192)

Installing a new version of the compiler is not possible because I am compiling my code on the Alabama Supercomputer Authority’s Dense Memory Cluster, so I just need to work around this. I have attempted to move the memory manually with acc_malloc and acc_memcpy_to_device, but I am encountering a runtime error:

FATAL ERROR: variable in data clause is partially present on the device: name=_T30787688_39593
 file:/mnt/beegfs/home/uahpal001/spmv/acc-spmv-csrk/csrk.cpp _ZN10CSRk_GraphC1ElllPjS0_PfSsSsSsbiPi line:364

The error at line 364 seems to correspond to the following OpenACC directive on line 360:

#pragma acc enter data copyin(r_vec[N + 1], c_vec[NNZ], val[NNZ], N, NNZ, this)

This is where I copy in all of the data that I am able to copy in with the pragmas.

Is there an easier workaround for this? What am I doing wrong?

EDIT: In case going back to an older PGI compiler would solve this issue, here are all the versions of PGI installed on the Dense Memory Cluster:

pgi/14.1
pgi/14.9
pgi/15.3
pgi/15.10
pgi/16.5
pgi/17.5
pgi/17.7
pgi/18.1
pgi/18.5
pgi/18.10
pgi/19.5
pgi/20.1

This is a known issue that was fixed in the 20.11 release. However, it’s triggered by incorrect syntax (missing triplet notation), so updating the code to use the following should fix the issue:

#pragma acc data copyin(mapCoarseToFinerRows[0:k][0:num_coarse_vtxs + 1])
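For reference, here is a minimal sketch of the same idea for a jagged array whose rows may have different lengths. It assumes (the post doesn't say) that num_coarse_vtxs changes from level to level, in which case one rectangular [0:k][0:num_coarse_vtxs + 1] shape would read past the end of the shorter rows. The pointer array is copied first and then each row, so the runtime can attach each device row to its slot in the device copy of the pointer array. The helpers copyin_jagged, sum_jagged, delete_jagged and the row_len array are hypothetical. Built without -acc, the pragmas are ignored and everything runs on the host.

```cpp
#include <cassert>

// Copy a jagged array (k rows; row 'level' has row_len[level] entries)
// to the device. Hypothetical helper, not from the original code.
void copyin_jagged(unsigned int **rows, const unsigned int *row_len, int k)
{
    assert(rows != nullptr && row_len != nullptr);
    // 1. Device copy of the k row pointers (their values are host
    //    addresses until each row is attached below).
    #pragma acc enter data copyin(rows[0:k])
    for (int level = 0; level < k; ++level) {
        // 2. Copy one row. rows[0:k] is already present, so the runtime
        //    attaches the device row to the device copy of rows[level].
        #pragma acc enter data copyin(rows[level][0:row_len[level]])
    }
}

// Sum all entries on the device, one row at a time.
unsigned long sum_jagged(unsigned int **rows, const unsigned int *row_len, int k)
{
    unsigned long total = 0;
    for (int level = 0; level < k; ++level) {
        unsigned int *row = rows[level];  // host address of this row
        unsigned int n = row_len[level];
        // present() maps the host row to the copy made in copyin_jagged.
        #pragma acc parallel loop reduction(+:total) present(row[0:n])
        for (unsigned int i = 0; i < n; ++i)
            total += row[i];
    }
    return total;
}

// Remove the device copies: rows first, then the pointer array.
void delete_jagged(unsigned int **rows, const unsigned int *row_len, int k)
{
    for (int level = 0; level < k; ++level) {
        #pragma acc exit data delete(rows[level][0:row_len[level]])
    }
    #pragma acc exit data delete(rows[0:k])
}
```

With the names from the question, this would be called as copyin_jagged(mapCoarseToFinerRows, row_len, k), where row_len[level] holds num_coarse_vtxs + 1 for that level.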

FATAL ERROR: variable in data clause is partially present on the device: name=_T30787688_39593

This indicates that the ‘this’ pointer has already been created on the device, but with a different size than in the previous use.

I’m not sure whether it’s due to your manually allocating device pointers via acc_malloc, to a parent class having already created the child class on the device, or simply to an ordering issue.

This is probably not the issue, but you can try moving ‘this’ to be the first item in the copyin clause. ‘this’ should be created on the device before any of its child variables so the compiler can implicitly attach them. It’s possible to put ‘this’ after them, but you’d then need to go back and attach them manually (via the ‘acc_attach’ API routine or the ‘attach’ clause on a data directive).
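A minimal sketch of the ordering described above, using a hypothetical Graph class loosely modeled on the CSRk_Graph constructor from the question (only N and r_vec are kept; everything else is assumed). ‘this’ is copied first, then the dynamically allocated array, and scalar members such as N are not listed separately because they travel with the copy of the object. Without -acc the pragmas are ignored and the code runs on the host.

```cpp
#include <cassert>

// Hypothetical class; only the order of the data directives matters here.
class Graph {
public:
    long N;               // scalar member: travels with the copy of *this
    unsigned int *r_vec;  // N + 1 row offsets, owned by the caller

    Graph(long n, unsigned int *r) : N(n), r_vec(r) {
        assert(r != nullptr);
        // Parent first: device copy of the object (N and the r_vec pointer).
        #pragma acc enter data copyin(this)
        // Child second: *this is present, so the device copy of the array
        // is attached to this->r_vec. N is not listed again; in this thread
        // copying members like N alongside 'this' led to "partially present".
        #pragma acc enter data copyin(r_vec[0:N+1])
    }

    ~Graph() {
        // Reverse order: child first, then the parent object.
        #pragma acc exit data delete(r_vec[0:N+1])
        #pragma acc exit data delete(this)
    }

    // A copy would delete the device data twice, so forbid copies.
    Graph(const Graph &) = delete;
    Graph &operator=(const Graph &) = delete;

    // Sum of the row offsets, computed on the device.
    unsigned long sum_offsets() {
        unsigned long total = 0;
        #pragma acc parallel loop reduction(+:total) present(this, r_vec[0:N+1])
        for (long i = 0; i <= N; ++i)
            total += r_vec[i];
        return total;
    }
};
```

The exit data directives mirror the enter data ones in reverse, so the child is detached and freed before the parent object disappears from the device.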

Otherwise, I’ll need more details, preferably a small reproducer that shows how you’re doing the data management.

Note, if your class is particularly complex, you might consider using CUDA Unified Memory via the flag “-ta=tesla:managed”. With UM, the CUDA driver will manage the data for you. The caveat is that only dynamically allocated memory is managed; static memory still needs to use the data directives.

Note that when we rebranded PGI as the NVHPC SDK in 20.5, the compilers became available at no cost and no longer need a license server. You might want to let the admins know, in case this helps them update to a more current release of the compilers.

-Mat

Hello,

Thank you very much for your help. Since I knew the mapCoarseToFinerRows array was only 2xN (where N is a pretty large number), I ended up treating it as two 1D arrays of size N. As for the “variable in data clause is partially present on the device” error, I was copying in both members of the class (like this->N) and this itself, which the compiler didn’t like (for obvious reasons). I removed those extra copyins and kept only copyin(this) (as well as the dynamically allocated arrays), and my code works now!
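A minimal sketch of that workaround, assuming both rows have the same length N. The function name sum_two_rows is hypothetical; in the real code the arguments would be mapCoarseToFinerRows[0] and mapCoarseToFinerRows[1]. Each row becomes an ordinary 1D subarray, so no pointer-to-pointer attach step is involved. Without -acc the pragmas are ignored.

```cpp
#include <cassert>

// Hypothetical helper: the two rows of a 2 x N array passed as plain
// 1D arrays instead of through an unsigned int**.
unsigned long sum_two_rows(const unsigned int *row0, const unsigned int *row1, long N)
{
    assert(N >= 0);
    unsigned long total = 0;
    // Two independent subarrays: nothing needs to be attached.
    #pragma acc data copyin(row0[0:N], row1[0:N])
    {
        #pragma acc parallel loop reduction(+:total) present(row0[0:N], row1[0:N])
        for (long i = 0; i < N; ++i)
            total += row0[i] + row1[i];
    }
    return total;
}
```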

Good to know that was a compiler issue. I did encounter one other issue where data in a copyin clause was allocated on the device but never copied over (even though I was not using create clauses anywhere). Downgrading to PGI 19.5 resolved it. I tried to reproduce the issue in a source file that wasn’t almost 4,000 lines long so that I could post a bug report, but was unable to.

The system admins are in the process of installing NVHPC SDK 21.5; they keep PGI versions back to 14.1 for compatibility with old code. I will look at using an updated version of OpenACC once the install is complete next week. Thank you again for your help.