Hi! I’m playing around with unified memory, but I believe I’m overlooking something.
I have a pure virtual class called Hitable and a derived class called Sphere. Hitable is itself derived from a class called Managed, as described here. So, I basically have Managed → Hitable → Sphere. Here’s the code for my classes and test kernel:
class Managed {
public:
void *operator new(size_t len) {
void *ptr;
cudaMallocManaged(&ptr, len);
cudaDeviceSynchronize();
return ptr;
}
void operator delete(void *ptr) {
cudaDeviceSynchronize();
cudaFree(ptr);
}
};
class Hitable : public Managed {
public:
Hitable(int index) : index(index) {}
__host__ __device__ virtual int sum() const = 0;
int index;
};
class Sphere : public Hitable {
public:
Sphere() : Hitable(0) {}
Sphere(float radius, int index = 0) : radius(radius), Hitable(index){};
__host__ __device__ virtual int sum() const { return 45; }
float radius;
};
__global__ void test(Hitable **hitable) {
printf("Device Test\n");
printf("%d\n", (*hitable)->index);
printf("%d\n\n", (*hitable)->sum());
}
int main() {
Hitable **hitable;
checkCudaErrors(cudaMallocManaged(&hitable, sizeof(*hitable)));
*hitable = new Sphere(1.f);
printf("Host Test\n");
printf("%d\n", (*hitable)->index);
printf("%d\n\n", (*hitable)->sum());
test<<<1, 1>>>(hitable);
checkCudaErrors(cudaGetLastError());
checkCudaErrors(cudaDeviceSynchronize());
system("PAUSE");
return 0;
}
The output I get from the code above is:
Host Test
0
45
Device Test
0
CUDA error = 77 at main.cu:357 'cudaDeviceSynchronize()'
(edit: line 357 on my file refers to the cudaDeviceSynchronize right after the kernel call)
So, I can access both the variable and the function on the host side, and I can access the index variable on the device side, but as soon as I try to use the sum() function on a kernel, it throws an Error 77, which means I made an illegal memory access. This makes me wonder:
a) What exactly am I missing or overlooking to make it work?
b) With my current code, why can I access the index variable but not a function like sum()? I would expect both to fail.
I’m not sure if it’s relevant, but I’m on a Windows 10 machine, with a GeForce 1060, compiling on compute/sm 60.