PGCC-S-1000-Call in OpenACC region to procedure 'memmove' which has no acc routine information

My code is on GitHub: AndStorm/openacc (GPU OPENACC PROBLEM).
I cannot compile it to run on the GPU using OpenACC and the PGI 19.4 C++ compiler, pgc++.
CUDA 10.1 is installed properly; the NVIDIA driver version is 418.67 and the cmake version is 3.13.1.
The GPU on my machine is a GeForce GTX 650 Ti. The OS is Fedora 23 x86_64.

  1. The compilation fails with an error (ERRGPU.txt in the repository):
PGCC-S-1000-Call in OpenACC region to procedure 'memmove' which has no acc routine information (/home/70-gaa/NFbuild_script_CHECK_GPU/source/nbody.cpp: 93)
PGCC-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Missing branch target block (/home/70-gaa/NFbuild_script_CHECK_GPU/source/nbody.cpp: 1)
PGCC-F-0704-Compilation aborted due to previous errors. (/home/70-gaa/NFbuild_script_CHECK_GPU/source/nbody.cpp)
PGCC/x86-64 Linux 19.4-0: compilation aborted

I use the following compile line:

cmake . -DCMAKE_C_COMPILER=pgcc -DCMAKE_CXX_COMPILER=pgc++ -DCMAKE_CXX_FLAGS="-acc -mcmodel=medium -ta=tesla:cc30,nollvm,managed -fast -Mcuda=cuda10.1" -DCMAKE_CXX_STANDARD=17

(The "managed" option is an attempt to use CUDA unified memory.)

-DCMAKE_CXX_STANDARD=17 is necessary to compile T3ParticleTable.cpp and T3MaterialTable.cpp.

I also added the #includes needed for cudaMemcpy() at the beginning of T3DataHolder.h (lines 38-43):

#ifdef OPENACC
#include <accelmath.h>
#include <openacc.h>
#include <cuda.h>
#include <cuda_runtime.h>
#endif

What causes this error? And how to avoid it?

  2. In nbody.cpp, which contains main(), I try to allocate a DataHolder on the GPU as follows:
DataHolder<FloatingType> * data;
cudaMalloc((void**) & data, sizeof(DataHolder<FloatingType>));
...
cudaFree(data);

Is this right? Or should I use a #pragma acc data create(d) clause in main:

DataHolder<FloatingType> d;
#pragma acc data create(d)
{
...
}

and copy the object to the GPU?

  3. If I comment out #define OPENACC at line 13 of T3Defs.h (so that the OpenMP #pragmas are used and the code executes on the CPU),
the code compiles and works well on the CPU with the compile line:

cmake . -DCMAKE_C_COMPILER=pgcc -DCMAKE_CXX_COMPILER=pgc++ -DCMAKE_CXX_FLAGS="-acc -ta=multicore -fast" -DCMAKE_CXX_STANDARD=17

But if I uncomment

Floating csBorder[numi+1];

at line 48 of T3MultipleScatteringFSImpl.h and comment out the two lines that follow it:

std::vector<Floating> csBorder;
csBorder.resize(numi+1);

the compilation fails with the following error (ERRCPU.txt in repository):

line 173: internal error: assertion failed at: "../src/statements.c"
...
1 error and 1 catastrophic error detected in the compilation of "/home/70-gaa/NFbuild_script_CHECK_GPU/source/nbody.cpp".
Compilation aborted.

Why?
I have been working on these problems for several days but do not know what to do. Please help me.

Hi @and,

What causes this error?

It took a bit of digging, but I think I understand what’s causing the “memmove” reference error.

First, many system library routines like “memmove” do not have equivalent device routines, so they should be avoided.

In this case, I believe the memmove is coming from implicit copies of a std::vector. Consider your code at line 83 of T3MultipleScatteringCSImpl.h:

  for(size_t i=0; i<numi; ++i)
  {
    csIsotope[i] = GetCS(e,incPDG,aMaterialTable.GetIsotopes(matID)[i]);
  }

Here, “aMaterialTable.GetIsotopes” returns a std::vector. This creates a temporary copy of the vector, and the code that copies a vector (from the system header “bits/stl_vector.h”) includes the memmove call.
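To illustrate the difference between a getter that copies and one that does not, here is a minimal host-only sketch. ToyMaterialTable and both getters are hypothetical stand-ins, not the classes from your repository, and the isotope codes are made-up values. It only shows that a by-value return hands back separately allocated storage, while a const-reference return exposes the stored vector directly.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the material table; not the repository's class.
struct ToyMaterialTable {
    // Made-up isotope codes, two materials with two isotopes each.
    std::vector<std::vector<int>> fIsotopes{{1001, 1002}, {6012, 6013}};

    // Returns by value: each call copy-constructs a temporary vector.
    std::vector<int> GetIsotopesByValue(std::size_t matID) const {
        return fIsotopes[matID];
    }

    // Returns by const reference: no copy, the caller sees the stored vector.
    const std::vector<int>& GetIsotopesByRef(std::size_t matID) const {
        return fIsotopes[matID];
    }
};

// True if the by-value getter handed back storage distinct from the table's own.
bool ByValueMakesCopy(const ToyMaterialTable& t, std::size_t matID) {
    std::vector<int> tmp = t.GetIsotopesByValue(matID);
    return tmp.data() != t.fIsotopes[matID].data();
}

// True if the by-reference getter exposes the table's own storage.
bool ByRefSharesStorage(const ToyMaterialTable& t, std::size_t matID) {
    const std::vector<int>& ref = t.GetIsotopesByRef(matID);
    return ref.data() == t.fIsotopes[matID].data();
}
```

In a loop like the one quoted above, the by-value form repeats that copy on every iteration, which is where the vector copy code gets pulled into the compute region.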

There are other similar cases throughout the code where implicit copies occur when using vectors, as well as other std calls.

And how to avoid it?

Unfortunately, I don’t have a good recommendation for you. In general, I recommend not using vectors or other std routines in device code. Besides not being thread safe, they may contain hidden system calls.

For the vectors, you might be able to work around this issue by directly accessing the underlying data (in this case, the fMaterials and fFractions vector variables when returning the inner vector) so that no copy is made.
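As a rough sketch of that workaround, the idea is to take the raw pointer and the length from the vector once on the host, and to touch only those inside the compute loop. ToyGetCS and FillCrossSections are hypothetical names made up for illustration, and the cross-section formula is a placeholder. I have not tried this against your code, so treat it as a starting point under those assumptions. A compiler without OpenACC enabled ignores the pragmas and runs the loop serially.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-isotope cross-section routine.
#pragma acc routine seq
inline double ToyGetCS(double e, int isotope) {
    return e * static_cast<double>(isotope % 100);  // placeholder formula
}

// Fills csVec[i] for each isotope, using only raw pointers inside the loop.
void FillCrossSections(const std::vector<int>& isotopesVec, double e,
                       std::vector<double>& csVec) {
    const std::size_t n = isotopesVec.size();
    csVec.resize(n);                           // resize on the host, before the region
    const int* isotopes = isotopesVec.data();  // pointer taken once, no vector copy
    double* cs = csVec.data();

    #pragma acc parallel loop copyin(isotopes[0:n]) copyout(cs[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        cs[i] = ToyGetCS(e, isotopes[i]);
    }
}
```

No std::vector member function is called inside the loop, so the compiler has no vector copy or resize code to translate for the device.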

Not sure what to do about other std calls like “get”.

-Mat