OOP Class Design with Device Variables

I’m writing a class with pointer variables to device memory. The class is a copy of an existing class on the host, except it runs the main computational function on the GPU. It looks something like this:

class GPUClass {
public:
   GPUClass() {
      // allocate device buffers (sizes elided)
      cudaMalloc(&device_ptr1,...);
      cudaMalloc(&device_ptr2,...);
      ...
   }

   void GPUComputeFunction(...) {...}

   ~GPUClass() {
      // release device buffers
      cudaFree(device_ptr1);
      cudaFree(device_ptr2);
      ...
   }

private:
   void* device_ptr1;
   void* device_ptr2;
   ...
};

Here, I allocate the device memory in the class constructor and de-allocate it in the destructor. I’m wondering if there are any pitfalls to this approach, or if there are better ways to do it?

  1. If you pass an object of this class to a kernel by value, C++ makes a copy of the object. Because the class defines no copy constructor, the copy holds the same device pointers as the original. When the kernel launch call returns, the copy’s destructor runs on the host and calls cudaFree on those pointers. The original object is left with dangling device pointers, and its own destructor will later try to free them a second time.

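  2. If you declare an object of this class at global scope, its constructor runs before main starts and its destructor runs after main returns. CUDA calls at those points are frowned on, because the CUDA runtime may not be initialized yet or may already be shut down. As your application is quitting, the cudaFree calls in the destructor may return an error (you would see it, for example, if you ran under compute-sanitizer).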

You can find questions on various forums pertaining to both of these issues that have bitten people.
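The first pitfall can be reproduced on the host alone. The following is a minimal sketch, not real CUDA code: fake_device_alloc and fake_device_free are hypothetical stand-ins for cudaMalloc and cudaFree that only record how many times each handle is released, and launch_by_value stands in for a kernel launch that takes the object by value.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for cudaMalloc/cudaFree (assumption: not CUDA APIs).
// They hand out fake handles and count how often each one is released.
static std::map<void*, int> g_free_count;
static std::uintptr_t g_next_handle = 256;

void* fake_device_alloc() {
    void* p = reinterpret_cast<void*>(g_next_handle);
    g_next_handle += 256;
    g_free_count[p] = 0;
    return p;
}

void fake_device_free(void* p) {
    assert(g_free_count.count(p) == 1);  // handle must come from fake_device_alloc
    ++g_free_count[p];
}

// Same shape as the class in the question: allocate in the constructor,
// free in the destructor, implicitly generated copy constructor.
class NaiveGPUClass {
public:
    NaiveGPUClass() : device_ptr(fake_device_alloc()) {}
    ~NaiveGPUClass() { fake_device_free(device_ptr); }
    void* ptr() const { return device_ptr; }

private:
    void* device_ptr;
};

// Stands in for a kernel launch that receives the object by value.
void launch_by_value(NaiveGPUClass obj) { (void)obj; }

// Returns how many times the single allocation ends up being released.
int count_frees_after_by_value_call() {
    void* handle = nullptr;
    {
        NaiveGPUClass original;
        handle = original.ptr();
        launch_by_value(original);  // the copy's destructor releases the handle
        // original still holds the released handle at this point
    }                               // original's destructor releases it again
    return g_free_count[handle];
}
```

Here count_frees_after_by_value_call() returns 2: one release from the temporary copy and one from the original. With the real cudaFree, the second call would be operating on a pointer that was already freed.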

So putting CUDA calls in the destructor is often troublesome. In short, don’t do that. One possible approach is to give the class explicit initialize/deinit methods and call them manually, at points where you know the CUDA runtime is usable.
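A rough sketch of the initialize/deinit idea, under the same host-only assumption: counted_alloc and counted_free are hypothetical stand-ins for cudaMalloc and cudaFree that track the number of live allocations, and the method names init and deinit are illustrative, not taken from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins for cudaMalloc/cudaFree (assumption: not CUDA APIs).
// They use host memory and track how many allocations are currently live.
static int g_live_allocations = 0;

void* counted_alloc(std::size_t bytes) {
    void* p = std::malloc(bytes);
    if (p != nullptr) ++g_live_allocations;
    return p;
}

void counted_free(void* p) {
    if (p != nullptr) {
        std::free(p);
        --g_live_allocations;
    }
}

class ManualGPUClass {
public:
    // The constructor and the implicit destructor make no device calls, so
    // by-value copies and global-scope objects never touch the allocations.
    ManualGPUClass() = default;

    // Call from inside main, once device work may begin. Returns false on failure.
    bool init(std::size_t bytes) {
        assert(device_ptr1 == nullptr && device_ptr2 == nullptr);
        device_ptr1 = counted_alloc(bytes);
        device_ptr2 = counted_alloc(bytes);
        if (device_ptr1 == nullptr || device_ptr2 == nullptr) {
            deinit();  // release whatever was allocated
            return false;
        }
        return true;
    }

    // Call once on the owning object, before main returns.
    void deinit() {
        counted_free(device_ptr1);
        counted_free(device_ptr2);
        device_ptr1 = nullptr;
        device_ptr2 = nullptr;
    }

private:
    void* device_ptr1 = nullptr;
    void* device_ptr2 = nullptr;
};

// Stands in for a kernel launch that receives the object by value.
void launch_by_value(ManualGPUClass obj) { (void)obj; }

// Example lifecycle: returns the number of live allocations after the
// by-value call and before deinit.
int live_allocations_after_by_value_call() {
    ManualGPUClass gpu;
    if (!gpu.init(64)) return -1;
    launch_by_value(gpu);  // the copy's destructor releases nothing
    int live = g_live_allocations;
    gpu.deinit();
    return live;
}
```

The trade-off is that ownership is now a convention rather than something the compiler enforces: exactly one object should call deinit, and any copies made earlier still carry the old pointer values afterwards.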

If you want to follow a high-quality C++ approach, you might wish to study how thrust handles this, although it’s not for the faint of heart. Or just use thrust.