How do I allocate and use a class member's dynamic array on the GPU?

I have a simple code example:

#include <iostream>
#include <accelmath.h>
#include <openacc.h>

class Example {
public:
Example()
{
    S=0;
    arr = new int[N];
    for(int i=0; i<N; ++i) arr[i]=i;
#pragma acc enter data copyin(this[0:1], arr[0:N])
}
~Example(){delete [] arr;}
long int count()
{
    S=0;
    //#pragma acc update device(S)    
#pragma acc parallel loop gang vector present(arr[0:N]) reduction(+:S)
  for(int i=0; i<N; ++i) S += arr[i];
  return S;
}
void print()
{
#pragma acc update self(arr[0:N])    
  for(int i=0; i<N; ++i) std::cout<<arr[i]<<"  ";
  std::cout<<std::endl;
}
private:
static const int N = 1000;
int * arr;
long int S;
};

int main(int argc, char **argv)
{
  Example ex;
  std::cout<<ex.count()<<std::endl;
  return 0;
}

It fails at runtime with an error:

Failing in Thread:1
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution

Failing in Thread:1
call to cuMemFreeHost returned error 700: Illegal address during kernel execution

How do I get it to work?

Could you be so kind as to answer some extra questions?
I have been away from my workplace for several months because of the COVID-19 quarantine.
When I came back to ask a question about some source code using OpenACC, I saw that the PGI forums have moved!
I have been a user of the PGI user forum for 2 years already, and some 2 years ago M. Colgrove wrote to me on the old PGI user forum that I could send not-so-small pieces of source code to PGI customer service at trs@pgroup.com, asking them to forward it to him, and he would have a look at it.
But now that e-mail address is invalid. Is there any other address or way to occasionally send pieces of C++ code using OpenACC, which are not convenient to post here, with a question to an expert?
The old forum looked much more convenient and easier to use, and its list of topics was much easier to look through.

There is also a question: how should the pgc++ PGI compiler be used now? The official PGI site says that PGI Compilers & Tools have evolved into the NVIDIA HPC SDK.
So how do I invoke the compiler now?
NVIDIA HPC SDK docs say:

OpenACC directives are enabled by adding the -acc flag to the compiler command line.

So, I should use nvc++ with the -acc compiler flag, shouldn't I? But are the compiler options the same as they were in pgc++ (-ta=tesla:cc70, -mcmodel=medium, -Minfo=accel, -Minline and some others)?

Excuse me for so many questions, but there are so many changes which I did not keep track of.

Hi Andrey,

Yes! There have been a lot of changes in the last few months. PGI was re-branded as the NVIDIA HPC Compilers and is now part of the larger NVIDIA HPC SDK, which bundles the compilers (including nvcc), profilers, math libraries, MPI, and various CUDA versions into a single package.

And yes, the forums have moved as well, but I’m still here. Given that I started the original PGI Forums back in 2004, I was a bit worried about the migration and losing 16 years of knowledge, but our team did a great job with it: all 16 years of history were preserved, users were migrated, and even all internal links still work. They really did a great job and I couldn’t be happier with the transition.

The old PGI support email, trs/support@pgroup.com, has gone away. Technically I was only supposed to use that email for PGI Professional Edition customers, but I did from time to time break that rule and have Community Edition customers send me code. Unlike the PGI UF, though, you can upload files in the NVIDIA Forums, so there is no longer any need to send code via email (look for the “upload” button when posting).

Personally, I like the NVIDIA Forums better, and I hope over time, you will as well.

how to use pgc++ PGI compiler now?

pgc++ is still there, but you might consider using nvc++ instead. They’re really the same compiler, but eventually the old PGI driver names will go away.

So, use nvc++ with -acc compiler flag, isn’t it?

pgc++ used the “-acc” flag as well; it’s just that “-acc” is implied when you use “-ta”. However, there will be some changes to these flags. The changes are not due to the re-branding; rather, it was just a good time to make the switch. Since we have now added C++ standard parallel language support alongside OpenACC, and in the near future will support OpenMP target offload to GPUs as well as native Fortran parallelism (i.e. DO CONCURRENT), the “-ta=tesla” flag is being transitioned to the “-gpu” flag (which takes the same sub-options) and applies to all the GPU offload models and languages. To target OpenACC on a GPU device, you’ll then use “-acc=gpu”. Now, “-ta” will still work, at least for some time, so you can keep using it, but you may want to consider the new flags once you’ve updated to the NVIDIA HPC SDK.
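As a concrete illustration (a sketch only; the exact sub-options depend on your installation and target GPU), the same compile can be spelled the old way and the new way like this:

```shell
# Old PGI-style spelling: -ta implies -acc
pgc++ -fast -ta=tesla:cc70 -Minfo=accel test.cpp

# New unified spelling: -acc=gpu selects OpenACC GPU offload,
# and the common GPU sub-options move to the -gpu flag
nvc++ -fast -acc=gpu -gpu=cc70 -Minfo=accel test.cpp
```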

As for your code, I was not able to reproduce the illegal memory address error. However, members of aggregate types (classes, structs, Fortran user-defined types) are not allowed to be used in reductions. Given that “S” is a member of the class, this is most likely the issue. The simple workaround is to use a local variable, rather than “S”, in the reduction.

% cat test.cpp
#include <iostream>
#include <accelmath.h>
#include <openacc.h>

class Example {
public:
Example()
{
    S=0;
    arr = new int[N];
    for(int i=0; i<N; ++i) arr[i]=i;
#pragma acc enter data copyin(this[0:1], arr[0:N])
}
~Example(){delete [] arr;}
long int count()
{
    S=0;
    long int sum=0;
#pragma acc parallel loop gang vector present(arr[0:N]) reduction(+:sum)
  for(int i=0; i<N; ++i) sum += arr[i];
  S=sum;
  return S;
}
void print()
{
#pragma acc update self(arr[0:N])
  for(int i=0; i<N; ++i) std::cout<<arr[i]<<"  ";
  std::cout<<std::endl;
}
private:
static const int N = 1000;
int * arr;
long int S;
};

int main(int argc, char **argv)
{
  Example ex;
  std::cout<<ex.count()<<std::endl;
  return 0;
}
% nvc++ -acc=gpu test.cpp -Minfo=accel; a.out
Example::Example():
     13, Generating enter data copyin(this[:1],arr[:1000])
Example::count():
     18, Generating present(arr[:1000])
         Generating Tesla code
         20, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
             Generating reduction(+:sum)
     18, Generating implicit copy(sum) [if not already present]
         Generating implicit create(this[:]) [if not already present]
499500

Hope this helps,
Mat

Thank you very much for the answer!
I have some extra questions; could you please be so kind as to answer them.

  1. Is it possible to occasionally upload large source codes (several thousand lines of code) to the NVIDIA forums?

  2. I used the following compile line

    cmake . -DCMAKE_C_COMPILER=pgcc -DCMAKE_CXX_COMPILER=pgc++ \
      -DCMAKE_C_FLAGS="-acc -Minfo=acc -mcmodel=medium -ta=tesla:cc70 -tp=haswell -Mnollvm -Minline -Mcuda=cuda10.1" \
      -DCMAKE_CXX_FLAGS="-acc -Minfo=acc -mcmodel=medium -ta=tesla:cc70 -tp=haswell -Mnollvm -Minline -Mcuda=cuda10.1" \
      -DCMAKE_CXX_STANDARD=17 -DACC=ON -DCUDA=ON
    to compile the code for launching it on an NVIDIA Titan V GPU installed in a 64-core Intel Xeon Phi KNL system (64 Haswell cores => -tp=haswell).
    How should this compile line look if I use the nvc++ compiler instead of pgc++?

  3. I’m willing to take part in the GPU Bootcamp at the end of the month (OpenACC 2020 Summit, 2-day GPU Bootcamp). What will it look like? I have a notebook with Internet access and headphones. Should I have a microphone, or is it not necessary? Will there be a chat to ask questions? It will be a webinar, won’t it?

Thank You very much for support and help!

I’m actually not sure what the size limit is, but I would think even a thousand-line source file would be small enough to upload.

  2. How should this compile line look if I use the nvc++ compiler instead of pgc++?

First, the PGI drivers are still there, at least for the time being, as are all the flags you’re using. Though, there’s no longer a “nollvm” version of the compilers, so “-Mnollvm” will be a no-op.

Given the additional support for standard language parallelism in C++ (and in the future Fortran), as well as the upcoming support for OpenMP target offload to GPUs, we are going through a process of streamlining the GPU flags. The flag “-gpu” has been added for common GPU options, and “-acc” has gained sub-options to control the target device. So in your case, you’ll want to look at using:

-acc=gpu -Minfo=acc -mcmodel=medium -tp=haswell -Minline -cuda -gpu=cuda10.1,cc70

You might try adding a higher level of CPU optimization as well by adding “-O2” or “-fast”.
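Putting those flags back into the CMake line from the question, the updated invocation might look like the following. This is a sketch, not a tested command: it assumes nvc/nvc++ are on your PATH, and -DACC/-DCUDA are options defined by your own project, carried over unchanged from the original line.

```shell
cmake . -DCMAKE_C_COMPILER=nvc -DCMAKE_CXX_COMPILER=nvc++ \
  -DCMAKE_C_FLAGS="-acc=gpu -Minfo=acc -mcmodel=medium -tp=haswell -Minline -cuda -gpu=cuda10.1,cc70" \
  -DCMAKE_CXX_FLAGS="-acc=gpu -Minfo=acc -mcmodel=medium -tp=haswell -Minline -cuda -gpu=cuda10.1,cc70" \
  -DCMAKE_CXX_STANDARD=17 -DACC=ON -DCUDA=ON
```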

I’m willing to take part in the GPU Bootcamp at the end of the month (OpenACC 2020 Summit, 2-day GPU Bootcamp). What will it look like? I have a notebook with Internet access and headphones. Should I have a microphone, or is it not necessary? Will there be a chat to ask questions? It will be a webinar, won’t it?

While I’m aware of the GPU Bootcamp, I’m not involved with it. But I just sent a note to the organizers asking where I can direct your questions.

Thanks a lot. Two small extra questions, please.

  1. Adding the PGI C++ compiler’s -fast flag for GPU code will look like

    -gpu=cuda10.1,cc70,fast
    won’t it?

  2. I have a notebook with an AMD Vega GPU. Does the PGI C++ compiler, using the OpenACC standard, allow launching the code on this GPU, and how do I specify in the compiler options that it is an AMD Radeon Vega GPU, along with its compute capability?

Hi Andrey,

  1. No, just use “-fast” and it will be applied to both CPU and GPU code. The “-gpu=fast” option is an abbreviation for “-gpu=fastmath”, which uses better-performing but less accurate math intrinsics.

  2. No, we don’t support AMD GPUs. You may want to look at using GNU 10.2 to target AMD.

-Mat

Thank you very much!