Hello, I am attempting to copy a dynamically allocated 2D array to the GPU. The actual code is spread across more than 3,700 lines of C++, so I’ll attempt to summarize my issue.
Data is allocated progressively throughout the code in a manner similar to:
//in class definition
unsigned int **mapCoarseToFinerRows;
//somewhere else in the code
mapCoarseToFinerRows = (unsigned int**)malloc(sizeof(unsigned int*) * k);
//in yet another part of the code
mapCoarseToFinerRows[level] = (unsigned int*)malloc(sizeof(unsigned int) * (num_coarse_vtxs + 1));
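For anyone who wants to reproduce the layout outside the full code base, the allocation pattern above can be reduced to a standalone sketch along these lines. Only the member name mapCoarseToFinerRows comes from the real code; the Graph struct, the helper functions, and the use of calloc to null-initialize the row pointers are hypothetical simplifications for illustration.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for the real class; only the member name
// mapCoarseToFinerRows is taken from the actual code.
struct Graph {
    unsigned int **mapCoarseToFinerRows = nullptr;
    int k = 0;  // number of levels (outer dimension)
};

// Allocate the outer array of k row pointers. calloc is used here
// (an assumption, the real code uses malloc) so that rows which have
// not been allocated yet are null.
void allocLevels(Graph &g, int k) {
    g.k = k;
    g.mapCoarseToFinerRows =
        (unsigned int **)calloc(k, sizeof(unsigned int *));
}

// Allocate a single row later on; each level can have its own length,
// so the result is a jagged array rather than a contiguous 2D block.
void allocLevel(Graph &g, int level, unsigned int num_coarse_vtxs) {
    assert(level >= 0 && level < g.k);
    g.mapCoarseToFinerRows[level] =
        (unsigned int *)malloc(sizeof(unsigned int) * (num_coarse_vtxs + 1));
}

// Release every row and then the outer pointer array.
void freeLevels(Graph &g) {
    for (int i = 0; i < g.k; ++i) {
        free(g.mapCoarseToFinerRows[i]);  // free(nullptr) is a no-op
    }
    free(g.mapCoarseToFinerRows);
    g.mapCoarseToFinerRows = nullptr;
    g.k = 0;
}
```

The point of the sketch is that the rows are separate heap allocations of possibly different lengths, so the data is not one contiguous k by (num_coarse_vtxs + 1) block.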
When I attempt to copy in the data with a simple directive:
#pragma acc data copyin(mapCoarseToFinerRows[k][num_coarse_vtxs + 1])
I get an internal compiler error:
PGC++-S-0000-Internal compiler error. pragma: bad ilmopc 307 (csrk.cpp: 192)
Installing a new version of the compiler is not possible because I am compiling my code on the Alabama Supercomputer Authority’s Dense Memory Cluster, so I just need to work around this. I have attempted to manually move the memory with acc_malloc and acc_memcpy_to_device, but I am encountering a runtime error:
FATAL ERROR: variable in data clause is partially present on the device: name=_T30787688_39593
file:/mnt/beegfs/home/uahpal001/spmv/acc-spmv-csrk/csrk.cpp _ZN10CSRk_GraphC1ElllPjS0_PfSsSsSsbiPi line:364
The error points to line 364, which seems to correspond to the following OpenACC directive on line 360:
#pragma acc enter data copyin(r_vec[N + 1], c_vec[NNZ], val[NNZ], N, NNZ, this)
This is where I copy in all of the data that I can using the pragmas.
Is there an easier workaround for this? What am I doing wrong?
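For comparison, here is a minimal sketch of the top-down pattern often used for jagged arrays in OpenACC: copy the array of row pointers first, then copy each row separately. In C and C++, OpenACC array sections are written [start:length]. The function names, the len array, and the summing kernel are hypothetical and not taken from the real code. The sketch also assumes the compiler attaches the device row pointers when each row is copied in, as OpenACC 2.6 describes; whether a given PGI release does this has not been verified here. Built without OpenACC enabled, the pragmas are ignored and everything runs on the host.

```cpp
#include <cassert>

// Hypothetical helper: rows has k entries and rows[i] holds len[i] elements.
void copyJaggedToDevice(unsigned int **rows, const int *len, int k) {
    assert(rows != nullptr && len != nullptr && k >= 0);
    // Step 1: copy the array of row pointers itself.
    #pragma acc enter data copyin(rows[0:k])
    for (int i = 0; i < k; ++i) {
        // Step 2: copy each row. Assumption: the runtime attaches the
        // device copy of rows[i] to this newly copied row.
        #pragma acc enter data copyin(rows[i][0:len[i]])
    }
}

// Remove the data in the reverse order: rows first, then the pointer array.
void deleteJaggedFromDevice(unsigned int **rows, const int *len, int k) {
    assert(rows != nullptr && len != nullptr && k >= 0);
    for (int i = 0; i < k; ++i) {
        #pragma acc exit data delete(rows[i][0:len[i]])
    }
    #pragma acc exit data delete(rows[0:k])
}

// Hypothetical kernel that reads the jagged data: sums every element.
unsigned long sumJagged(unsigned int **rows, const int *len, int k) {
    unsigned long total = 0;
    #pragma acc parallel loop present(rows[0:k]) copyin(len[0:k]) reduction(+:total)
    for (int i = 0; i < k; ++i) {
        for (int j = 0; j < len[i]; ++j) {
            total += rows[i][j];
        }
    }
    return total;
}
```

Applied to the snippets above, rows would correspond to mapCoarseToFinerRows and each len[i] to that level's num_coarse_vtxs + 1.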
EDIT: In case going back to an older PGI compiler solves this issue, here are all the versions of PGI installed on the Dense Memory Cluster:
pgi/14.1
pgi/14.9
pgi/15.3
pgi/15.10
pgi/16.5
pgi/17.5
pgi/17.7
pgi/18.1
pgi/18.5
pgi/18.10
pgi/19.5
pgi/20.1