OpenACC: Copying array, returned by a function, to device

I’m trying to copy an array into device memory, where the array pointer is returned by a function call. As an example, let’s assume I have the following class that defines a dynamically allocated array with a size:

class Array {
    public:
        Array(const size_t num_elements) {
            num_elements_ = num_elements;
            data_ = new float[num_elements];
        }

        ~Array() {
            delete[] data_;
        }

        float* data() {
            return data_;
        }

    private:
        size_t num_elements_;
        float *data_;
};

What I’m trying to do is create an object of the above class in the constructor of another class, and then copy the elements of that array into device memory by calling the data() function (that is, I’d like to be able to keep data_ private). An example of what I’m trying to do is as follows:

class MyClass {
    public:
        MyClass() {
            size_t num_elements = 10;
            Array array(num_elements);

            #pragma acc enter data copyin(this)
            #pragma acc enter data copyin(array.data()[0:num_elements])
        }
};

However, I seem to be getting errors when I try to copy the array into device memory using #pragma acc enter data copyin(array.data()[0:num_elements]).

As such, I was wondering if there is a way for me to do something similar to the above code, where I have the data() function return a pointer to the data_ array, and can then copy all of the array’s elements into device memory.

I’d probably do something more like the following, where the Array class manages its own data and the parent class then attaches it to its data.

FYI, an “attach” basically fills in a pointer in the device container to point at the device memory. This occurs implicitly if the container is created on the device and then its components are created (‘top-down’). But when the components are created first, as is the case here, you need to attach them manually.
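To make the two creation orders concrete, here is a minimal sketch that is separate from the listing that follows. The `Vec` struct and the function names are hypothetical, and the sketch assumes a compiler that supports the OpenACC 2.6 attach/detach behaviour; without OpenACC the pragmas are ignored and the code runs on the host only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical container for this sketch (not from the thread).
struct Vec {
    std::size_t n;
    float* data;
};

// Host-side setup shared by both variants: n elements, all 1.0f.
static Vec make_vec(std::size_t n) {
    assert(n > 0);
    Vec v;
    v.n = n;
    v.data = new float[n];
    for (std::size_t i = 0; i < n; ++i) v.data[i] = 1.0f;
    return v;
}

// Host-side teardown: sum the elements and free the array.
static float sum_and_free(Vec& v) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < v.n; ++i) sum += v.data[i];
    delete[] v.data;
    v.data = nullptr;
    return sum;
}

// Top-down: the container is on the device first, so creating the member
// array afterwards attaches the device pointer implicitly.
float top_down_sum(std::size_t n) {
    Vec v = make_vec(n);
    #pragma acc enter data copyin(v)
    #pragma acc enter data copyin(v.data[0:n])

    #pragma acc parallel loop present(v)
    for (std::size_t i = 0; i < n; ++i) v.data[i] *= 2.0f;

    #pragma acc update self(v.data[0:n])
    #pragma acc exit data delete(v.data[0:n])
    #pragma acc exit data delete(v)
    return sum_and_free(v);
}

// Bottom-up: the array is on the device before its container, so the
// pointer inside the device copy of 'v' has to be attached by hand.
float bottom_up_sum(std::size_t n) {
    Vec v = make_vec(n);
    float* p = v.data;                      // component first, via a plain pointer
    #pragma acc enter data copyin(p[0:n])
    #pragma acc enter data copyin(v)        // device v.data still holds a host address
    #pragma acc enter data attach(v.data)   // now it points at the device array

    #pragma acc parallel loop present(v)
    for (std::size_t i = 0; i < n; ++i) v.data[i] *= 2.0f;

    #pragma acc update self(p[0:n])
    #pragma acc exit data detach(v.data)
    #pragma acc exit data delete(p[0:n])
    #pragma acc exit data delete(v)
    return sum_and_free(v);
}
```

Both functions should return the same sum; the only difference is whether the attach is implicit or written out.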

% cat test.cpp
#include <iostream>
#include <cstdlib>
#include <cstdio>

class Array {
    public:
        Array(const size_t num_elements) {
            num_elements_ = num_elements;
            data_ = new float[num_elements_];
            #pragma acc enter data create(this,data_[:num_elements_])
        }

        ~Array() {
            #pragma acc exit data delete(data_,this)
            delete[] data_;
        }

        float* data() {
            return data_;
        }

#pragma acc routine vector
        void set_data(float val) {
           printf("numele %d\n",num_elements_);
#pragma acc loop vector
            for (int i=0;i<num_elements_;++i) {
                data_[i] = val;
            }
        }

#pragma acc routine vector
        void add_data(float val) {
#pragma acc loop vector
            for (int i=0;i<num_elements_;++i) {
                data_[i] += val;
            }
        }

        void print_data() {
            for (int i=0;i<num_elements_;++i) {
                std::cout << data_[i] << std::endl;
            }
        }

#ifdef _OPENACC
        void acc_update_device() {
           #pragma acc update device(num_elements_,data_[:num_elements_])
        }
        void acc_update_self() {
           #pragma acc update self(data_[:num_elements_])

        }
#endif

    private:
        size_t num_elements_;
        float *data_;
};

class MyClass {
    public:
        MyClass(size_t num_elements=10) : array(num_elements) {
            #pragma acc enter data copyin(this)
            #pragma acc enter data attach(array)
        }

      Array array;
};

int main () {

   int ne = 32;
   MyClass C(ne);
   C.array.set_data(100.0);
#ifdef _OPENACC
      C.array.acc_update_device();
#endif
#pragma acc parallel present(C)
{
      C.array.add_data(200.0);
}
#ifdef _OPENACC
      C.array.acc_update_self();
#endif
      C.array.print_data();
      exit(0);
}

% nvc++ test.cpp -w -acc -Minfo=accel -V21.7; a.out
main:
     79, Generating present(C)
         Generating Tesla code
Array::Array(unsigned long):
     11, Generating enter data create(this[:1],data_[:num_elements_])
Array::~Array():
     15, Generating exit data delete(this[:1],data_[:1])
Array::set_data(float):
     23, Generating Tesla code
         26, #pragma acc loop vector /* threadIdx.x */
     26, Loop is parallelizable
Array::add_data(float):
     32, Generating Tesla code
         34, #pragma acc loop vector /* threadIdx.x */
     34, Loop is parallelizable
Array::acc_update_device():
     48, Generating update device(num_elements_,data_[:num_elements_])
Array::acc_update_self():
     52, Generating update self(data_[:num_elements_])
MyClass::MyClass(unsigned long):
     65, Generating enter data copyin(this[:1])
         Generating enter data attach(array)
numele 32
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300

I’m getting the following error when compiling the code you shared (without using -V21.7 since I only have version 21.3):

NVC++-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): Unexpected name in attach or detach clause

Does attach only work with the latest nvc++ compiler?

Yes, sorry about that. The ‘attach’ clause in C++ had issues in earlier compiler versions. You can instead use the ‘acc_attach’ API call.

% cat test.cpp
#include <iostream>
#include <cstdlib>
#include <cstdio>
#ifdef _OPENACC
#include <openacc.h>
#endif

class Array {
    public:
        Array(const size_t num_elements) {
            num_elements_ = num_elements;
            data_ = new float[num_elements_];
            #pragma acc enter data create(this,data_[:num_elements_])
        }

        ~Array() {
            #pragma acc exit data delete(data_,this)
            delete[] data_;
        }

        float* data() {
            return data_;
        }

#pragma acc routine vector
        void set_data(float val) {
           printf("numele %d\n",num_elements_);
#pragma acc loop vector
            for (int i=0;i<num_elements_;++i) {
                data_[i] = val;
            }
        }

#pragma acc routine vector
        void add_data(float val) {
#pragma acc loop vector
            for (int i=0;i<num_elements_;++i) {
                data_[i] += val;
            }
        }

        void print_data() {
            for (int i=0;i<num_elements_;++i) {
                std::cout << data_[i] << std::endl;
            }
        }

#ifdef _OPENACC
        void acc_update_device() {
           #pragma acc update device(num_elements_,data_[:num_elements_])
        }
        void acc_update_self() {
           #pragma acc update self(data_[:num_elements_])

        }
#endif

    private:
        size_t num_elements_;
        float *data_;
};

class MyClass {
    public:
        MyClass(size_t num_elements=10) : array(num_elements) {
#ifdef _OPENACC
            #pragma acc enter data copyin(this)
            acc_attach((void**) &array);
#endif
        }

      Array array;
};

int main () {

   int ne = 32;
   MyClass C(ne);
   C.array.set_data(100.0);
#ifdef _OPENACC
      C.array.acc_update_device();
#endif
#pragma acc parallel present(C)
{
      C.array.add_data(200.0);
}
#ifdef _OPENACC
      C.array.acc_update_self();
#endif
      C.array.print_data();
      exit(0);
}
% nvc++ -acc test.cpp -V21.3; a.out
numele 32
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300

Thanks Mat, updating to nvc++ 21.7 made your code work. I do have another question however. In my case, I’m not using a constructor and so cannot initialize the Array object in a constructor initializer list. As such, I have to invoke an Array copy assignment operator when initializing the Array object. My problem is that I can’t seem to get a copy assignment operator to work with OpenACC, and always get runtime errors. Below is my attempt at modifying your Array class implementation to include a copy assignment operator:

#include <iostream>
#include <cstdlib>
#include <cstdio>

class Array {
    public:
        Array() {
            data_ = nullptr;

            #pragma acc enter data copyin(this)
        }

        Array(const size_t num_elements) {
            num_elements_ = num_elements;
            data_ = new float[num_elements_];
            #pragma acc enter data create(this,data_[:num_elements_])
        }

        ~Array() {
            #pragma acc exit data delete(data_,this)
            delete[] data_;
        }

        Array& operator=(const Array &array) {
            if(this == &array) {
                return *this;
            }

            num_elements_ = array.num_elements_;

            float *copied_data = new float[num_elements_];
            for(size_t i = 0; i < num_elements_; ++i) {
                copied_data[i] = array.data_[i];
            }

            delete[] data_;
            data_ = copied_data;

            #pragma acc update device(this)
            #pragma acc update device(data_[0:num_elements_])

            return *this;
        }

        float* data() {
            return data_;
        }

#pragma acc routine vector
        void set_data(float val) {
           printf("numele %d\n",num_elements_);
#pragma acc loop vector
            for (int i=0;i<num_elements_;++i) {
                data_[i] = val;
            }
        }

#pragma acc routine vector
        void add_data(float val) {
#pragma acc loop vector
            for (int i=0;i<num_elements_;++i) {
                data_[i] += val;
            }
        }

        void print_data() {
            for (int i=0;i<num_elements_;++i) {
                std::cout << data_[i] << std::endl;
            }
        }

#ifdef _OPENACC
        void acc_update_device() {
           #pragma acc update device(num_elements_,data_[:num_elements_])
        }
        void acc_update_self() {
           #pragma acc update self(data_[:num_elements_])

        }
#endif

    private:
        size_t num_elements_;
        float *data_;
};

class MyClass {
    public:
        MyClass(size_t num_elements=10) {// : array(num_elements) {
            array = Array(num_elements);

            #pragma acc enter data copyin(this)
            #pragma acc enter data attach(array)
        }

      Array array;
};

int main () {

   int ne = 32;
   MyClass C(ne);
   C.array.set_data(100.0);
#ifdef _OPENACC
      C.array.acc_update_device();
#endif
#pragma acc parallel present(C)
{
      C.array.add_data(200.0);
}
#ifdef _OPENACC
      C.array.acc_update_self();
#endif
      C.array.print_data();
      exit(0);
}

However, when running the code with this modified Array class, I get the following runtime error:

Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 6.1, threadid=1
host:0x1ff4a80 device:0x7fda42cfa400 size:128 presentcount:0+1 line:17 name:(null)
host:0x7fff0dbd0b90 device:0x7fda42cfa200 size:16 presentcount:0+1 line:17 name:_T57048928_5859
host:0x7fff0dbd0ca8 device:0x7fda42cfa000 size:16 presentcount:0+1 line:11 name:_T57048928_5857
allocated block device:0x7fda42cfa000 size:512 thread:1
allocated block device:0x7fda42cfa200 size:512 thread:1
allocated block device:0x7fda42cfa400 size:512 thread:1
FATAL ERROR: data in update device clause was not found on device 1: name=(null)
 file:/home/jeff/dev/tests/openacc_attach_test.cpp _ZN5ArrayaSERKS_ line:42

Yes, this can get a little tricky, but essentially you need to create ‘copied_data’ on the device, then, if array.data_ is present, copy array.data_ to copied_data on the device. I did this two ways, via a device memcpy and via a parallel loop. Next, you need to delete the current device ‘data_’ array before you delete it on the host, and then, after assigning copied_data to data_, attach it so the device ‘this’ now points at the new data.

Note that I didn’t update ‘this’ itself, since the update will do a shallow copy of all the class members (meaning data_ will now be a host pointer). It wouldn’t cause a problem here so long as the attach occurs after the update, but it’s best practice not to update aggregate types, and instead to update just their members.
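As a small illustration of the member-only update described in the note above, here is a sketch that is separate from the listing that follows. The `Buffer` struct, its `scale` member and the function name are hypothetical; without OpenACC the pragmas are ignored and the code runs on the host only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical aggregate with one scalar member and one pointer member.
struct Buffer {
    float scale;
    std::size_t n;
    float* data;
};

// Changes a scalar member on the host, then refreshes only that member on
// the device before using it in a compute region.
float scale_after_member_update(std::size_t n, float new_scale) {
    assert(n > 0);
    Buffer b;
    b.scale = 1.0f;
    b.n = n;
    b.data = new float[n];
    for (std::size_t i = 0; i < n; ++i) b.data[i] = 1.0f;

    #pragma acc enter data copyin(b)
    #pragma acc enter data copyin(b.data[0:n])   // top-down, so attached implicitly

    b.scale = new_scale;   // host-side change to a scalar member

    // Update just the member. An 'update device(b)' would copy the whole
    // struct, including the host value of b.data, over the attached pointer.
    #pragma acc update device(b.scale)

    #pragma acc parallel loop present(b)
    for (std::size_t i = 0; i < n; ++i) b.data[i] *= b.scale;

    #pragma acc update self(b.data[0:n])
    #pragma acc exit data delete(b.data[0:n])
    #pragma acc exit data delete(b)

    const float first = b.data[0];
    delete[] b.data;
    return first;
}
```

Calling `scale_after_member_update(8, 3.0f)` should leave every element at 3.0 and return that value.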

% cat test.cpp
#include <iostream>
#include <cstdlib>
#include <cstdio>

#ifdef _OPENACC
#include <openacc.h>
#endif

class Array {
    public:
        Array() {
            data_ = nullptr;

            #pragma acc enter data copyin(this)
        }

        Array(const size_t num_elements) {
            num_elements_ = num_elements;
            data_ = new float[num_elements_];
            #pragma acc enter data create(this,data_[:num_elements_])
        }

        ~Array() {
            #pragma acc exit data delete(data_,this)
            delete[] data_;
        }

        Array& operator=(const Array &array) {
            if(this == &array) {
                return *this;
            }
            num_elements_ = array.num_elements_;
            #pragma acc update device(num_elements_)

            float *copied_data = new float[num_elements_];
            for(size_t i = 0; i < num_elements_ ; ++i) {
                copied_data[i] = array.data_[i];
            }
#ifdef _OPENACC
            // Create the new 'copied_data' on the device
            #pragma acc enter data create(copied_data[:num_elements_])

            // If the array data is present on the device, perform a device memcpy to copied_data
            if (acc_is_present(array.data_,num_elements_*sizeof(float))) {
#ifndef USE_PLOOP
                float *src, *dest;
                dest = (float*) acc_deviceptr(copied_data);
                src =  (float*) acc_deviceptr(array.data_);
                acc_memcpy_device(dest,src,sizeof(float)*num_elements_);
#else
             // alternately if you don't want to use the API routines, you could use a loop
                #pragma acc parallel loop present(copied_data,array)
                for(size_t i = 0; i < num_elements_ ; ++i) {
                   copied_data[i] = array.data_[i];
                }
#endif
            } else {
              // array.data was not present, so update the device value
              #pragma acc update device(copied_data[:num_elements_])
            }
#endif

            // delete the old data from the device
            #pragma acc exit data delete(data_)
            delete[] data_;
            data_ = copied_data;

            // attach the new data array
            #pragma acc enter data attach(data_)

            return *this;
        }

        float* data() {
            return data_;
        }

#pragma acc routine vector
        void set_data(float val) {
           printf("numele %d\n",num_elements_);
#pragma acc loop vector
            for (int i=0;i<num_elements_;++i) {
                data_[i] = val;
            }
        }

#pragma acc routine vector
        void add_data(float val) {
#pragma acc loop vector
            for (int i=0;i<num_elements_;++i) {
                data_[i] += val;
            }
        }

        void print_data() {
            for (int i=0;i<num_elements_;++i) {
                std::cout << data_[i] << std::endl;
            }
        }

#ifdef _OPENACC
        void acc_update_device() {
           #pragma acc update device(num_elements_,data_[:num_elements_])
        }
        void acc_update_self() {
           #pragma acc update self(data_[:num_elements_])

        }
#endif

    private:
        size_t num_elements_;
        float *data_;
};

class MyClass {
    public:
        MyClass(size_t num_elements=10) {// : array(num_elements) {
            array = Array(num_elements);

            #pragma acc enter data copyin(this)
            #pragma acc enter data attach(array)
        }

      Array array;
};

int main () {

   int ne = 32;
   MyClass C(ne);
   C.array.set_data(100.0);
#ifdef _OPENACC
      C.array.acc_update_device();
#endif
#pragma acc parallel present(C)
{
      C.array.add_data(200.0);
}
#ifdef _OPENACC
      C.array.acc_update_self();
#endif
      C.array.print_data();
      exit(0);
}
% nvc++ test.cpp -acc -w ; a.out
numele 32
300
300
..cut ...
300
300
300
300
300
300
% nvc++ test.cpp -acc -w -DUSE_PLOOP ; a.out
numele 32
300
300
300
300
... cut ..
300
300
300
300
300
300

Thanks Mat, that works well. However, I’m running into new issues if I try and create more than one Array object in the MyClass class. This is the runtime error I get:

_T52666088_5890 lives at 0x7fffd89a86f0 size 32 partially present
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 6.1, threadid=1
host:0xaf0290 device:0x7f281ecfa800 size:128 presentcount:0+1 line:44 name:copied_data
host:0x7fffd89a86f0 device:0x7f281ecfa000 size:16 presentcount:0+1 line:15 name:_T52666088_5878
host:0x7fffd89a8700 device:0x7f281ecfa200 size:16 presentcount:0+1 line:15 name:_T52666088_5878
allocated block device:0x7f281ecfa000 size:512 thread:1
allocated block device:0x7f281ecfa200 size:512 thread:1
allocated block device:0x7f281ecfa800 size:512 thread:1
deleted block   device:0x7f281ecfa600 size:512 threadid=1 
deleted block   device:0x7f281ecfa400 size:512 threadid=1 
FATAL ERROR: variable in data clause is partially present on the device: name=_T52666088_5890
 file:/home/jeff/dev/tests/openacc_attach_test.cpp _ZN7MyClassC1Em line:123

This error occurs even if I don’t initialize the Array object, and simply declare it in the private section of the MyClass class.

It’s because they’re static objects, so the ‘this’ pointer and the base address of the data member are the same. To work around it, we’d need to make the arrays pointers and then allocate them.
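The address overlap can be checked on the host without any OpenACC at all. The sketch below uses hypothetical stand-in types (`Inner`, `ByValue`, `ByPointer`) rather than the thread's classes. It is consistent with the present table dump in the error report, where a 16-byte entry already sits at 0x7fffd89a86f0 when the 32-byte parent at the same address is copied in.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for Array: a size plus a pointer.
struct Inner {
    std::size_t n;
    float* data;
};

// Members held by value: the first member starts where the parent starts.
struct ByValue {
    Inner a;
    Inner b;
};

// Members held by pointer: the parent only stores addresses.
struct ByPointer {
    Inner* a;
    Inner* b;
};

// True when the parent object and its first by-value member share an address.
bool by_value_first_member_aliases_parent() {
    ByValue outer{};
    return static_cast<void*>(&outer) == static_cast<void*>(&outer.a);
}

// True when the parent object and the pointed-to member share an address.
bool by_pointer_member_aliases_parent() {
    ByPointer outer{new Inner{}, new Inner{}};
    assert(outer.a != outer.b);   // two separate heap objects
    const bool same =
        static_cast<void*>(&outer) == static_cast<void*>(outer.a);
    delete outer.a;
    delete outer.b;
    return same;
}
```

With by-value members the first function returns true, so a device entry created for the member's `this` already occupies the start of the parent. With pointer members the second function returns false, because the heap objects live at their own addresses.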

% cat test2.cpp
#include <iostream>
#include <cstdlib>
#include <cstdio>

#ifdef _OPENACC
#include <openacc.h>
#endif

class Array {
    public:
        Array() {
            data_ = nullptr;
            #pragma acc enter data copyin(this)
        }

        Array(const size_t num_elements) {
            num_elements_ = num_elements;
            data_ = new float[num_elements_];
            #pragma acc enter data create(this,data_[:num_elements_])
        }

        ~Array() {
            #pragma acc exit data delete(data_,this)
            delete[] data_;
        }

        Array& operator=(const Array &array) {
            if(this == &array) {
                return *this;
            }
            num_elements_ = array.num_elements_;
            #pragma acc update device(num_elements_)

            float *copied_data = new float[num_elements_];
            for(size_t i = 0; i < num_elements_ ; ++i) {
                copied_data[i] = array.data_[i];
            }
#ifdef _OPENACC
            // Create the new 'copied_data' on the device
            #pragma acc enter data create(copied_data[:num_elements_])

            // If the array data is present on the device, perform a device memcpy to copied_data
            if (acc_is_present(array.data_,num_elements_*sizeof(float))) {
#ifndef USE_PLOOP
                float *src, *dest;
                dest = (float*) acc_deviceptr(copied_data);
                src =  (float*) acc_deviceptr(array.data_);
                acc_memcpy_device(dest,src,sizeof(float)*num_elements_);
#else
             // alternately if you don't want to use the API routines, you could use a loop
                #pragma acc parallel loop present(copied_data,array)
                for(size_t i = 0; i < num_elements_ ; ++i) {
                   copied_data[i] = array.data_[i];
                }
#endif
            } else {
              // array.data was not present, so update the device value
              #pragma acc update device(copied_data[:num_elements_])
            }
#endif

            // delete the old data from the device
            #pragma acc exit data delete(data_)
            delete[] data_;
            data_ = copied_data;

            // attach the new data array
            #pragma acc enter data attach(data_)

            return *this;
        }

        float* data() {
            return data_;
        }

#pragma acc routine vector
        void set_data(float val) {
           printf("numele %d\n",num_elements_);
#pragma acc loop vector
            for (int i=0;i<num_elements_;++i) {
                data_[i] = val;
            }
        }

#pragma acc routine vector
        void add_data(float val) {
#pragma acc loop vector
            for (int i=0;i<num_elements_;++i) {
                data_[i] += val;
            }
        }

        void print_data() {
            for (int i=0;i<num_elements_;++i) {
                std::cout << data_[i] << std::endl;
            }
        }

#ifdef _OPENACC
        void acc_update_device() {
           #pragma acc update device(num_elements_,data_[:num_elements_])
        }
        void acc_update_self() {
           #pragma acc update self(data_[:num_elements_])

        }
#endif

    private:
        size_t num_elements_;
        float *data_;
};

class MyClass {
    public:
        MyClass(size_t num_elements=10) {// : array(num_elements) {
            array = new Array(num_elements);
            array2 = new Array(num_elements);
            #pragma acc enter data copyin(this)
            #pragma acc enter data attach(array)
            #pragma acc enter data attach(array2)
        }

      Array *array, *array2;
};

int main () {

   int ne = 32;
   MyClass C(ne);
   C.array->set_data(100.0);
   C.array2->set_data(200.0);
#ifdef _OPENACC
      C.array->acc_update_device();
      C.array2->acc_update_device();
#endif
#pragma acc parallel present(C)
{
      C.array->add_data(200.0);
      C.array2->add_data(200.0);
}
#ifdef _OPENACC
      C.array->acc_update_self();
      C.array2->acc_update_self();
#endif
      C.array->print_data();
      C.array2->print_data();
      exit(0);
}
%
% nvc++ -V21.7 -acc test2.cpp; a.out
numele 32
numele 32
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400

Though, it seems like your classes are going to get very complicated, so it will be more and more tricky to manage the data movement manually. You might consider moving to CUDA Unified Memory so the CUDA driver manages the data for you. The caveat is that only heap memory can currently be managed by UM, so you’d need to allocate the top-level class (C in this case) on the heap, but all child classes will then just work.

UM is typically just as performant as manually managing the data, unless you’re going back and forth between the host and device a lot, and touching the data on both. Also, if you’re using MPI, UM doesn’t work with CUDA Aware MPI so the data gets copied to the host instead of going direct between GPUs.

% cat test_man.cpp
#include <iostream>
#include <cstdlib>
#include <cstdio>

#ifdef _OPENACC
#include <openacc.h>
#endif

class Array {
    public:
        Array() {
            data_ = nullptr;
        }

        Array(const size_t num_elements) {
            num_elements_ = num_elements;
            data_ = new float[num_elements_];
        }

        ~Array() {
            delete[] data_;
        }

        Array& operator=(const Array &array) {
            if(this == &array) {
                return *this;
            }
            num_elements_ = array.num_elements_;

            float *copied_data = new float[num_elements_];
            for(size_t i = 0; i < num_elements_ ; ++i) {
                copied_data[i] = array.data_[i];
            }

            // delete the old data; with managed memory no separate device delete is needed
            delete[] data_;
            data_ = copied_data;

            // no attach is needed here since the managed pointer is valid on the device

            return *this;
        }

        float* data() {
            return data_;
        }

#pragma acc routine vector
        void set_data(float val) {
#pragma acc loop vector
            for (int i=0;i<num_elements_;++i) {
                data_[i] = val;
            }
        }

#pragma acc routine vector
        void add_data(float val) {
#pragma acc loop vector
            for (int i=0;i<num_elements_;++i) {
                data_[i] += val;
            }
        }

        void print_data() {
            for (int i=0;i<num_elements_;++i) {
                std::cout << data_[i] << std::endl;
            }
        }

    private:
        size_t num_elements_;
        float *data_;
};

class MyClass {
    public:
        MyClass(size_t num_elements=10) {// : array(num_elements) {
            array = Array(num_elements);
            array2 = Array(num_elements);
        }

      Array array, array2;
};

int main () {

   int ne = 32;
   MyClass *C;
   C = new MyClass(ne);
   C->array.set_data(100.0);
   C->array2.set_data(200.0);
#pragma acc parallel present(C)
{
      C->array.add_data(200.0);
      C->array2.add_data(200.0);
}
      C->array.print_data();
      C->array2.print_data();
      exit(0);
}
% nvc++ -V21.7 -acc -gpu=managed test_man.cpp ; a.out
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
300
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400
400

Oof, yeah this is getting a little too involved from what I initially set out to do (although your help has been extremely useful in giving me a better understanding of OpenACC!). I think I’ll just keep the data_ pointer public for now so that I can manually manage its device allocation in the class that it is instantiated in, instead of forcing device allocation for each Array instantiation. Most instantiations of the Array class I’m not looking to allocate on the device anyway, so it feels like a bit of an overkill to force device allocation for each Array object. The alternative solution seems to be creating two different Array-like classes for host-only use and host-device use, which will also be a bit too involved.

Going back to my original question, will there be any work in future OpenACC versions towards making it possible to have, for example, the data() function return a pointer to the private data_ array, and then copy all of the data_ array’s elements into device memory using something along the lines of #pragma acc enter data copyin(array.data()[0:num_elements])?

I can run this by the compiler folks, but even if it is possible, I’m not sure it’s useful. Since it’s returning a raw pointer, which causes it to lose its association with the parent class, the compiler might not be able to perform the implicit attachment. That means you won’t be able to access “data” through “array” on the device. You’d need to access it via a temp pointer or only within the array class itself. You might be able to attach it manually, but this would need to be done within the array class given that data is private. In that case, you might as well put the enter data directive in the class so the attach isn’t needed.

You can do this now by using temp pointers, doing something like:

ptr = array.data();
#pragma acc enter data copyin(ptr[:num_elements])
.. later when data is used in a compute region ...
ptr = array.data();
#pragma acc parallel loop present(ptr)
for (int i = ....) {
    ptr[i] = ... something ...
}
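A fuller version of the temp-pointer fragment above might look like the following sketch. The `size()` accessor and the `fill_through_temp_pointer` function are hypothetical additions for illustration, and the copy operations are deleted only to keep the sketch short. Without OpenACC the pragmas are ignored and the loop runs on the host.

```cpp
#include <cassert>
#include <cstddef>

class Array {
public:
    explicit Array(std::size_t num_elements)
        : num_elements_(num_elements), data_(new float[num_elements]()) {}
    ~Array() { delete[] data_; }
    Array(const Array&) = delete;
    Array& operator=(const Array&) = delete;

    float* data() { return data_; }
    std::size_t size() const { return num_elements_; }   // hypothetical accessor

private:
    std::size_t num_elements_;
    float* data_;   // stays private
};

// Moves the array to the device through a temporary pointer, fills it in a
// compute region, copies it back, and returns the last element.
float fill_through_temp_pointer(Array& array, float value) {
    float* ptr = array.data();
    const std::size_t n = array.size();
    assert(ptr != nullptr && n > 0);

    #pragma acc enter data copyin(ptr[0:n])

    // The kernel refers to 'ptr', never to 'array', so no attach is involved.
    #pragma acc parallel loop present(ptr[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        ptr[i] = value + static_cast<float>(i);
    }

    #pragma acc exit data copyout(ptr[0:n])
    return array.data()[n - 1];
}
```

The trade-off is the one described above: the device data is reachable only through `ptr`, not through `array`.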

Another solution I was thinking of, that would allow data_ to be private and still be copied to the GPU/device when an Array object is instantiated in another class, would be to define copy_data_to_gpu() and delete_data_from_gpu() functions in the Array class:

void Array::copy_data_to_gpu() {
    #pragma acc enter data copyin(data_[:num_elements_])
}

void Array::delete_data_from_gpu() {
    #pragma acc exit data delete(data_)
}

And then in the MyClass init() function (or a constructor) that instantiates Array objects I could have:

void init() {
    ...
    #pragma acc enter data copyin(this)
    array.copy_data_to_gpu();
}

And in the MyClass destructor I could have:

~MyClass() {
    array.delete_data_from_gpu();
    #pragma acc exit data delete(this)
}

And then I could still use the original Array object in a loop instead of using an additional ptr pointer variable:

#pragma acc parallel loop present(array)
for(size_t i = ...) {
    ...
}

But I was wondering if this approach might be regarded as not a “best practice”, or if there are some more subtle issues that could arise when doing it this way that I’m not aware of.
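For reference, the fragments above can be assembled into one sketch. The `at()`, `size()` and `update_host()` members and the `fill_and_read_back` function are hypothetical additions so that the loop has something to do. Whether the data_ pointer is attached implicitly inside `copy_data_to_gpu()` depends on the compiler treating the Array as already present within the parent object, and this has not been verified against a particular compiler release. Without OpenACC the pragmas are ignored and the code runs on the host only.

```cpp
#include <cassert>
#include <cstddef>

class Array {
public:
    explicit Array(std::size_t num_elements)
        : num_elements_(num_elements), data_(new float[num_elements]()) {}
    ~Array() { delete[] data_; }
    Array(const Array&) = delete;
    Array& operator=(const Array&) = delete;

    // Device management is opt-in, so host-only Arrays never touch the GPU.
    void copy_data_to_gpu() {
        #pragma acc enter data copyin(data_[0:num_elements_])
    }
    void delete_data_from_gpu() {
        #pragma acc exit data delete(data_[0:num_elements_])
    }
    void update_host() {   // hypothetical helper
        #pragma acc update self(data_[0:num_elements_])
    }

    std::size_t size() const { return num_elements_; }   // hypothetical accessor

    #pragma acc routine seq
    float& at(std::size_t i) { return data_[i]; }         // hypothetical accessor

private:
    std::size_t num_elements_;
    float* data_;   // stays private
};

class MyClass {
public:
    explicit MyClass(std::size_t num_elements) : array(num_elements) {
        #pragma acc enter data copyin(this)
        array.copy_data_to_gpu();
    }
    ~MyClass() {
        array.delete_data_from_gpu();
        #pragma acc exit data delete(this)
    }

    // Fills the array in a compute region and returns the last element.
    float fill_and_read_back(float value) {
        const std::size_t n = array.size();
        assert(n > 0);
        #pragma acc parallel loop present(array)
        for (std::size_t i = 0; i < n; ++i) {
            array.at(i) = value;
        }
        array.update_host();
        return array.at(n - 1);
    }

private:
    Array array;
};
```

In this layout only the parent object enters its `this` pointer, so the Array itself creates no separate device entry at the parent's address.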