How to handle a list of pointers and operate on it on the GPU

Hi,

I copy data to the device in the class constructor and keep a list (std::vector) of pointers to all the objects.
I would like to parallelize a for loop that compares the objects. How do I copy all of the data (every object plus the vector) into GPU memory? Can someone review my code and advise whether my approach is right? Thank you very much!

#include <cstring>   // memcmp
#include <vector>
#include <openacc.h>

// my managed vector datatype
class EMData{
    float* data;
    size_t size;
    bool iscopy;
public:
    EMData( size_t size_ ){ // constructor
        size = size_;
        data = new float[size];
        iscopy = false;
        #pragma acc enter data copyin(this[0:1]) create(data[0:size])
    }
    EMData( const EMData &copyof ){ // copy constructor: shares the original's buffer
        size = copyof.size;
        data = copyof.data;
        iscopy = true;
        #pragma acc enter data copyin(this[0:1])
        acc_attach( (void**)&data );
    }
    void updatehost(){ // update host copy of data
        #pragma acc update self( data[0:size] )
    }
    void updatedev(){ // update device copy of data
        #pragma acc update device( data[0:size] )
    }
    ~EMData(){ // destructor from host
        if( !iscopy ){
            #pragma acc exit data delete( data[0:size] )
            delete[] data; // array form, to match new float[size]
        }
        #pragma acc exit data delete( this[0:1] )
    }
    inline float & operator[](int i) const { return data[i]; }
    // other member functions

    bool Compare(const EMData *c){ // true when the two buffers differ
        // memcmp takes a length in bytes, not in elements
        return memcmp(c->data, data, size * sizeof(float)) != 0;
    }
};

int main()
{
    int i;
    int len = 1000;
    size_t npix = 1024; // elements per image (placeholder value; EMData has no default constructor)

    std::vector<EMData *> v;
    for ( i = 0; i < len; i++) {
        EMData *image = new EMData(npix);
        v.push_back(image);
    }

    EMData *limage = new EMData(npix);

    #pragma acc data present(limage) copyin(v[0:len])
    {
        #pragma acc parallel loop independent
        for (i = 0; i < len; i++) {
            bool result = limage->Compare(v[i]);
        }
    }
}

Hi Po Chun LAI,

Can someone review my code and advise if my idea is right or not?

Since the example isn’t complete, I can’t tell for sure, but the manual deep copy code of the EMData class looks correct.

However, the std::vector is going to be an issue. For vectors, I would suggest using CUDA Unified Memory (-ta=tesla:managed). Otherwise, you’d need to copy “v” to the device and then manually attach each EMData object to the device-side vector. Because the vector holds pointers to EMData, copying “v” copies host addresses, so you then have to go back and fill in the device pointers. That is possible, but tricky, so it would be much easier to use unified memory.
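For reference, the manual route could look roughly like the sketch below. It is only an illustration under several assumptions: Image is a hypothetical, simplified stand-in for EMData, count_different is a made-up helper name, and the attach/detach clauses assume a compiler that implements OpenACC 2.6 or later. It avoids the OpenACC runtime API, so a compiler without OpenACC support ignores the pragmas and runs it on the host. Treat it as a sketch, not as verified device code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for EMData: mirrors itself and its buffer on the device.
struct Image {
    float *data;
    std::size_t size;

    explicit Image(std::size_t n) : data(new float[n]()), size(n) {
        // Same pattern as the EMData constructor in the question.
        #pragma acc enter data copyin(this[0:1]) create(data[0:size])
    }
    void update_device() {
        #pragma acc update device(data[0:size])
    }
    ~Image() {
        #pragma acc exit data delete(data[0:size])
        #pragma acc exit data delete(this[0:1])
        delete[] data;
    }
    Image(const Image &) = delete;
    Image &operator=(const Image &) = delete;
};

// Counts how many images in v differ from ref (hypothetical helper).
int count_different(std::vector<Image *> &v, const Image *ref) {
    assert(ref != nullptr);
    Image **imgs = v.data();                  // raw array the directives can describe
    const int len = static_cast<int>(v.size());

    // Step 1: copy the pointer array. The device copy still holds host addresses.
    #pragma acc enter data copyin(imgs[0:len])
    // Step 2: attach each element, i.e. overwrite the host address in the device
    // array with the device address of the already-present Image object.
    for (int i = 0; i < len; ++i) {
        #pragma acc enter data attach(imgs[i])
    }

    int ndiff = 0;
    #pragma acc parallel loop present(imgs[0:len], ref[0:1]) reduction(+:ndiff)
    for (int i = 0; i < len; ++i) {
        bool differ = imgs[i]->size != ref->size;
        for (std::size_t k = 0; !differ && k < ref->size; ++k) {
            differ = imgs[i]->data[k] != ref->data[k];
        }
        if (differ) ++ndiff;
    }

    // Undo the attaches before releasing the device copy of the pointer array.
    for (int i = 0; i < len; ++i) {
        #pragma acc exit data detach(imgs[i])
    }
    #pragma acc exit data delete(imgs[0:len])
    return ndiff;
}
```

The two-step copy-then-attach is the "go back and fill in the device pointers" part. With unified memory none of these data directives would be needed, which is why that route is simpler.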

Also, I don’t think the C library routine “memcmp” is available on the device. You may need to replace it with an explicit comparison loop.
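A minimal sketch of such a loop, assuming the buffers hold float as in EMData. The name buffers_differ is hypothetical, and the routine directive marks it as callable from device code.

```cpp
#include <cstddef>

// Element-wise replacement for memcmp, callable from inside a compute region.
// Returns true as soon as one pair of elements differs.
#pragma acc routine seq
bool buffers_differ(const float *a, const float *b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) return true;
    }
    return false;
}
```

Note that this compares values rather than bytes, so it is not bit-for-bit equivalent to memcmp: a NaN never compares equal to itself, and +0.0f equals -0.0f.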

-Mat