Thank you very much for your quick reply! I was aware that cudaMallocManaged differs from “new” in that the data members of the class object are not initialized, but I did not realize that this causes the segfault when a virtual function is called, even when it is called only on the host.
The excerpt from “Programming Guide :: CUDA Toolkit Documentation” that confuses me is this:
“Starting with CUDA 8.0 and on supporting systems with devices of compute capability 6.x, memory allocated with the default OS allocator (e.g. malloc or new) can be accessed from both GPU code and CPU code using the same pointer. On these systems, Unified Memory is the default: there is no need to use a special allocator or the creation of a specially managed memory pool.”
Now I see that I misunderstood it. Thank you for pointing that out.
Since we have reached this topic, I have another question regarding the difference between cudaMallocManaged and new. Here is another code snippet:
#include <cstdio>
#include <iostream>
using namespace std;

class Simpclass {
private:
    int m = 8;
public:
    __device__ __host__ int get_num() {
        return m;
    }
};

__global__
void simp_kernel_func(Simpclass *p) {
    printf("kernel func called\n");
    printf("m = %d\n", p->get_num());
    printf("m should have been displayed above\n");
    return;
}

int main() {
    Simpclass *p;
    p = new Simpclass();
    simp_kernel_func<<<1, 1>>>(p);
    cudaDeviceSynchronize();
    cout << "after kernel: " << p->get_num() << endl;
    delete p;
    return 0;
}
I was hoping the result would be something like this:
kernel func called
m = 8
m should have been displayed above
after kernel: 8
However, what I got is:
kernel func called
after kernel: 8
Apparently, the kernel function terminated early without giving any warning message. If I replace “new” with “cudaMallocManaged” and “delete” with “cudaFree”, I get this result:
kernel func called
m = 0
m should have been displayed above
after kernel: 0
This result is expected, since cudaMallocManaged does not initialize the data member (i.e. int m = 8;) of the class object. But I do not understand why the kernel function aborts early when “new” is used to allocate the unified memory.
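For reference, here is a minimal sketch of a variant that might help narrow both points down. It assumes two things that are not in my code above: a hypothetical CUDA_CHECK macro that reports any error returned by the runtime, and placement new used to run the constructor on storage obtained from cudaMallocManaged. I have trimmed the kernel to a single printf. This is only a sketch under those assumptions, not a confirmed fix.

```cuda
#include <cstdio>
#include <cstdlib>
#include <new>              // placement new
#include <cuda_runtime.h>

// Hypothetical helper (not part of the original snippet):
// print the failing call and the runtime's error string, then exit.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            std::fprintf(stderr, "%s failed: %s\n", #call,          \
                         cudaGetErrorString(err_));                 \
            std::exit(EXIT_FAILURE);                                \
        }                                                           \
    } while (0)

class Simpclass {
private:
    int m = 8;
public:
    __device__ __host__ int get_num() { return m; }
};

__global__ void simp_kernel_func(Simpclass *p) {
    printf("m = %d\n", p->get_num());
}

int main() {
    // Raw managed storage: no constructor has run on it yet.
    void *raw = nullptr;
    CUDA_CHECK(cudaMallocManaged(&raw, sizeof(Simpclass)));

    // Placement new runs the constructor in that storage,
    // so the default member initializer (m = 8) is applied.
    Simpclass *p = new (raw) Simpclass();

    simp_kernel_func<<<1, 1>>>(p);
    CUDA_CHECK(cudaGetLastError());        // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran

    std::printf("after kernel: %d\n", p->get_num());

    p->~Simpclass();                       // cudaFree does not call the destructor
    CUDA_CHECK(cudaFree(raw));
    return 0;
}
```

If the same checks are added to the version that uses plain “new”, I would expect cudaDeviceSynchronize to return an error (probably an illegal memory access) on a system where the GPU cannot read ordinary host allocations, which would explain the missing output. I have not confirmed this.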
Can you please help? Thank you.
P.S. Thank you for reminding me of the CUDA restrictions on virtual functions. If I pass the kernel a pointer to an object of a class that has virtual functions, but never invoke a virtual function on the device, is that still allowed?