Non-POD class copy CUDA to CPU

I have a question regarding non-POD type classes being copied from GPU to CPU.

According to the C++ standard, non-POD classes contain any of the following (not fully inclusive list):

  • virtual functions
  • user-defined destructors
  • user-defined constructors

The class I have allocated in GPU memory DOES use inheritance, and is therefore non-POD.

The question is: is it unsafe to cudaMemcpy this class from GPU to CPU, given that it is not POD compliant (or is cudaMemcpy smarter than that)? If it is unsafe, is there a standard way to handle this situation?

The best I can think of is implementing the copy constructor to copy from “this” to the “other”. From the user's standpoint, the user must allocate a CPU-instantiated object and use the copy constructor of the GPU-instantiated object to copy the members over to the CPU object. Internally, the copy function must then deduce the memory location of the “other” object and, if it is on the host, cudaMemcpy each of its members to the CPU side.

(Objects of) classes can be copied and passed between host and device. Full stop. CUDA aims to be a nearly complete implementation of C++, subject to various stated restrictions.

In case that is not clear, your class is safe to copy from host to device, or device to host, subject to the following provisos/limitations/restrictions:

  • Classes can be passed by-value or by-pointer. Passing a class by-value works the way you would pass a class by value to any function: as an argument to that function (i.e. the kernel). But see the note below. Passing by pointer involves the usual cudaMemcpy techniques, coupled with pointer arguments to the kernel, just as with POD types.
  • If you want a constructor or destructor (or any other class method) to be usable in device code, you had better decorate it and design it appropriately.
  • For classes that contain virtual methods, if an object of that class is instantiated in host code, the virtual methods should not be called from device code, or vice-versa. This is a stated limitation in the programming guide. I’m offering a slightly looser description than what is printed there. If you want strict compliance with the documentation, do not pass objects with virtual methods between host and device.
  • Like any other pass-by-value situation, an object-copy will be made for use by the function called. This means that the constructor and destructor will be called. This will trip up naive constructor/destructor designs, and there are many questions on various forums describing those situations.
  • For classes that have explicit constructors/destructors, my personal suggestion is not to instantiate objects of those classes at global scope, if you intend to use CUDA API calls in the constructor or destructor. There are available descriptions that cover the general associated hazards with that.
  • Pass by reference is generally tricky in CUDA, and not only for classes. I generally don’t recommend it, and unless you are using managed memory techniques properly and carefully, pass-by-reference for CUDA kernel arguments is not supported at all.
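
To make the by-pointer and by-value cases above concrete, here is a minimal sketch. The names `Base`, `Derived`, `fill`, `scale`, `check`, `roundTrip` and `scaleOnDevice` are hypothetical and invented for illustration; they are not from the original question. The sketch assumes the class uses inheritance and decorated constructors but has no virtual methods, and it adds a `static_assert` on `std::is_trivially_copyable` as one possible way to confirm that a byte-wise copy of the object is well defined.

```cuda
#include <cstdio>
#include <cstdlib>
#include <type_traits>
#include <cuda_runtime.h>

// Hypothetical classes for illustration: non-POD (inheritance plus
// user-defined constructors), but with no virtual methods.
struct Base {
    int id;
    __host__ __device__ Base(int i = 0) : id(i) {}
};

struct Derived : public Base {
    float value;
    __host__ __device__ Derived(int i = 0, float v = 0.0f) : Base(i), value(v) {}
    __host__ __device__ float scaled(float s) const { return value * s; }
};

// Assumption of this sketch: byte-wise copies (which is what cudaMemcpy does)
// are only relied upon for trivially copyable types.
static_assert(std::is_trivially_copyable<Derived>::value,
              "Derived must be trivially copyable for a raw cudaMemcpy");

// Minimal error check: print the CUDA error string and exit on failure.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// By-pointer: the kernel writes an object that lives in device memory,
// using the __device__-decorated constructor.
__global__ void fill(Derived *d) {
    *d = Derived(7, 1.5f);
}

// By-value: the kernel receives its own copy of the host object.
__global__ void scale(Derived obj, float s, float *out) {
    *out = obj.scaled(s);
}

// Builds an object on the device, then copies it back to the host.
Derived roundTrip() {
    Derived *dObj = nullptr;
    check(cudaMalloc(&dObj, sizeof(Derived)));
    fill<<<1, 1>>>(dObj);
    check(cudaGetLastError());
    Derived hObj;
    // A blocking cudaMemcpy on the default stream waits for the kernel above.
    check(cudaMemcpy(&hObj, dObj, sizeof(Derived), cudaMemcpyDeviceToHost));
    check(cudaFree(dObj));
    return hObj;
}

// Passes a host object to a kernel by value and returns the device result.
float scaleOnDevice(const Derived &obj, float s) {
    float *dOut = nullptr;
    check(cudaMalloc(&dOut, sizeof(float)));
    scale<<<1, 1>>>(obj, s, dOut);
    check(cudaGetLastError());
    float hOut = 0.0f;
    check(cudaMemcpy(&hOut, dOut, sizeof(float), cudaMemcpyDeviceToHost));
    check(cudaFree(dOut));
    return hOut;
}

int main() {
    Derived h = roundTrip();
    printf("id=%d value=%f\n", h.id, h.value);
    printf("scaled=%f\n", scaleOnDevice(h, 2.0f));
    return 0;
}
```

Note that no per-member copy and no custom copy constructor is needed here: a single cudaMemcpy of `sizeof(Derived)` bytes moves the whole object, base-class members included. If the class held pointers to other device allocations, those would still need to be copied separately, which this sketch does not cover.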