Memory management for a mesh processing algorithm

My question: what is the best technique for handling global mesh processing algorithms?
The problem is that the changes must be written back to the host data structures (I'm using a CGAL half-edge data structure).

Since the whole mesh will be processed, all of its data must eventually be sent to the device.
I'm considering these options:

  1. Convert the half-edge structure to a point array (copy the vertex data), copy it to the device, process it with a kernel, copy it back to the host, and overwrite the half-edge structure.

  2. Convert the half-edge structure to a point array (copy the vertex data), let the device access it through the host pointer, and overwrite the half-edge structure.

  3. Create an array of pointers to the points, copy it to the device, process the data, and overwrite the half-edge structure. This reduces memory copying, but every work-item has to communicate with host memory.
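The host side of option 1 can be sketched in C++ as follows. The sketch assumes a hypothetical `Vertex`/`Point3` pair standing in for the CGAL half-edge vertex. In CGAL the coordinates would be reached through something like the vertex's `point()`. It also uses a hypothetical `fake_kernel` in place of the device kernel, so the pack, process, and write-back round trip can be checked without a GPU. No OpenCL calls are shown. The packed `std::vector<float>` is what would be copied into a single device buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a CGAL half-edge vertex; only the coordinates
// matter here (incident half-edges etc. are omitted).
struct Point3 { double x, y, z; };
struct Vertex { Point3 point; };

// Gather all vertex coordinates into one contiguous array laid out as
// x0, y0, z0, x1, y1, z1, ... so it can be sent to the device in one copy.
std::vector<float> pack_points(const std::vector<Vertex>& vertices) {
    std::vector<float> buf;
    buf.reserve(vertices.size() * 3);
    for (const Vertex& v : vertices) {
        buf.push_back(static_cast<float>(v.point.x));
        buf.push_back(static_cast<float>(v.point.y));
        buf.push_back(static_cast<float>(v.point.z));
    }
    return buf;
}

// Hypothetical placeholder for the device kernel: scales every coordinate
// on the host so the round trip can be tested without a GPU.
void fake_kernel(std::vector<float>& buf, float scale) {
    for (float& c : buf) c *= scale;
}

// Scatter the processed coordinates back into the mesh. This assumes the
// vertices are visited in the same order as in pack_points().
void unpack_points(std::vector<Vertex>& vertices, const std::vector<float>& buf) {
    assert(buf.size() == vertices.size() * 3);
    for (std::size_t i = 0; i < vertices.size(); ++i) {
        vertices[i].point = { buf[3 * i], buf[3 * i + 1], buf[3 * i + 2] };
    }
}
```

The write-back depends on visiting the vertices in the same order in both passes, so the mesh topology must not change between packing and unpacking. For example, packing the vertices (1, 2, 3) and (4, 5, 6), scaling by 2 and unpacking leaves the second vertex at (8, 10, 12).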

Which one do you think is better, especially for meshes with millions of vertices?
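For a sense of scale, here is a rough estimate of the data involved for $N = 10^6$ vertices. It assumes single-precision coordinates (3 × 4 bytes per vertex) and 64-bit host pointers (8 bytes each). With double coordinates, which CGAL kernels commonly use, the first figure doubles.

$$
\text{option 1 (one direction)}: \quad N \cdot 3 \cdot 4\ \text{B} = 10^6 \cdot 12\ \text{B} = 12\ \text{MB}
$$

$$
\text{option 3 (pointer array)}: \quad N \cdot 8\ \text{B} = 10^6 \cdot 8\ \text{B} = 8\ \text{MB}
$$

Under these assumptions, the pointer array alone is of the same order as the packed coordinates. In option 3, the coordinates themselves still have to be read from host memory by the work-items.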