Issues with Classes and Unified Memory

Hi! I’m playing around with unified memory, but I believe I’m overlooking something.

I have a pure virtual class called Hitable and a derived class called Sphere. Hitable is itself derived from a class called Managed, as described here. So, I basically have Managed -> Hitable -> Sphere. Here’s the code for my classes and test kernel:

class Managed {
 public:
  void *operator new(size_t len) {
    void *ptr;
    cudaMallocManaged(&ptr, len);
    cudaDeviceSynchronize();
    return ptr;
  }

  void operator delete(void *ptr) {
    cudaDeviceSynchronize();
    cudaFree(ptr);
  }
};

class Hitable : public Managed {
 public:
  Hitable(int index) : index(index) {}

  __host__ __device__ virtual int sum() const = 0;

  int index;
};

class Sphere : public Hitable {
 public:
  Sphere() : Hitable(0) {}

  Sphere(float radius, int index = 0) : radius(radius), Hitable(index){};

  __host__ __device__ virtual int sum() const { return 45; }

  float radius;
};

__global__ void test(Hitable **hitable) {
  printf("Device Test\n");
  printf("%d\n", (*hitable)->index);
  printf("%d\n\n", (*hitable)->sum());
}

int main() {
  Hitable **hitable;
  checkCudaErrors(cudaMallocManaged(&hitable, sizeof(*hitable)));
  *hitable = new Sphere(1.f);

  printf("Host Test\n");
  printf("%d\n", (*hitable)->index);
  printf("%d\n\n", (*hitable)->sum());

  test<<<1, 1>>>(hitable);
  checkCudaErrors(cudaGetLastError());
  checkCudaErrors(cudaDeviceSynchronize());

  system("PAUSE");

  return 0;
}

The output I get from the code above is:

Host Test
0
45

Device Test
0
CUDA error = 77 at main.cu:357 'cudaDeviceSynchronize()'

(edit: line 357 on my file refers to the cudaDeviceSynchronize right after the kernel call)

So, I can access both the variable and the function on the host side, and I can access the index variable on the device side, but as soon as I try to use the sum() function on a kernel, it throws an Error 77, which means I made an illegal memory access. This makes me wonder:

a) What exactly am I missing or overlooking to make it work?
b) With my current code, why can I access the index variable but not a function like sum()? I would expect both to fail.

I’m not sure if it’s relevant, but I’m on a Windows 10 machine, with a GeForce 1060, compiling on compute/sm 60.

When passing objects from host to device, the use of virtual functions is not supported. This is an enumerated limitation in the programming guide:

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#virtual-functions

The virtual function table for an object initialized on the host will contain host function pointers, which are not usable in device code. This is the reason for the failure in using your sum() function. The object itself is accessible, but attempts to call a function from the virtual function pointer table will fail.

Instead, the programming guide suggests a possible workaround - for objects you intend to use in device code, create/instantiate/initialize those objects in device code.

This mechanism plays a role in object polymorphism, so if you do a google search on “cuda polymorphism” or “cuda virtual functions” you’ll find various questions/answers on the web discussing this and possible alternative realizations.

Note that GeForce 1060 is a cc6.1 device not a cc6.0 device. It’s OK to target compute/sm60 if you wish, but a precise targetting would be compute/sm61