Hello, I am using OpenACC to accelerate a C++ program written with Eigen, which uses the Eigen classes. How do I transfer the data of a class to the GPU when I use a data copyin clause or "#pragma acc kernels"?
There are multiple ways, depending on whether you want to encapsulate the data management in the class itself or have the main program manage the data.
To encapsulate the data management, I'll point you to some example programs I wrote for the book "Parallel Programming with OpenACC".
In particular, the "accList" class is my simple version of a std::vector-style class.
To manage the data within the main program, I first need to ask whether the Eigen class uses dynamic data members (i.e. pointer members).
If not, then data management is straightforward and you can simply add the class variable to a data clause. The compiler knows the size of a fixed-size object, so it will be able to copy all the data members.
If there are dynamic data members, then things get a bit more complex. The problem is that data clauses perform a shallow copy, so any pointer member will have its host address copied rather than the data it points to.
The simplest thing to do, and the best place to start, is to use CUDA Unified Memory (i.e. add the flag "-ta=tesla:managed"). In this case, all allocations use a shared address space which is accessible from either the host or the device. While you'll still need to manage static data, all dynamic data movement will be handled by the CUDA runtime.
Alternatively, you can do a "manual deep copy", where you walk the data structure and copy each dynamic data member yourself. The "array_of_structs.c" example gives a very basic illustration of how to do this, and since structs and classes are very similar, the same approach applies.
Finally, you can try the new "true deep copy" beta feature that PGI recently added. This is a test implementation of a future OpenACC standard, so some syntax may change, but not by much. For details, please see: https://www.pgroup.com/blogs/posts/deep-copy-beta.htm
Hope this helps,