Can CUDA properly handle pure virtual classes?

In my program, I have a base class called “Geometry” which contains some pure virtual functions in it and one implementation of the class called “Sphere.” Whenever I run my program, the kernel never completes because “an illegal memory access was encountered.” Stepping through with the CUDA debugger, I’ve found that the crashing occurs when I call the pure virtual functions (accessed through a Geometry pointer). Is this because CUDA cannot properly handle vtables and is attempting to call a null function?

EDIT: The pointer to my sphere on the GPU shouldn’t be bad, because I verified that it was copied correctly. However, whenever I add the Geometry pointer to the watch window in the debugger, the following error message is shown: “Condition(false) in method: Nvda.CppExpressions.FrontEnd.TypeNodeConvertSymbolicsNodeToTypeChain(Nvda.Symbolics.ISymbolicsNode) device const _ZN3rex8GeometryE* device”. This is what leads me to believe it is a problem with the pure virtual function call.

Are you creating these objects on the host and copying them to the device?

If so, you may be running into this restriction from the CUDA programming guide:

"It is not allowed to pass as an argument to a __global__ function an object of a class with virtual functions."

The reason is that an object instantiated on the host carries a pointer to a virtual function table that lives in host memory and holds host function addresses. When you copy the object to the device, that pointer is copied byte for byte, and it is meaningless in device code.

Just speculating. A more definitive answer might be possible if you show a simple, complete test-case/example.

If you create the objects on the device (you can still configure and populate the objects with data passed from the host) then virtual functions (and polymorphism) should not be a problem.
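To make the device-side construction concrete, here is a minimal sketch. The Geometry and Sphere classes below are hypothetical stand-ins (the area() method, the radius member, and the kernel names are my assumptions, not the original poster's code). The idea is only that the object is built with new inside a kernel, so its vtable pointer refers to device code, while the radius still comes from the host as a plain kernel argument.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-ins for the classes in the question.
class Geometry {
public:
    __device__ virtual ~Geometry() {}
    __device__ virtual float area() const = 0;  // pure virtual
};

class Sphere : public Geometry {
public:
    __device__ explicit Sphere(float r) : radius(r) {}
    __device__ float area() const override {
        return 4.0f * 3.14159265f * radius * radius;
    }
private:
    float radius;
};

// Initialization kernel: the object is constructed in device code,
// so its vtable pointer is valid on the device.
__global__ void createSphere(Geometry** slot, float radius) {
    *slot = new Sphere(radius);
}

// Virtual dispatch through a base-class pointer, entirely on the device.
__global__ void computeArea(Geometry** slot, float* out) {
    *out = (*slot)->area();
}

// Objects created with device-side new must be deleted in device code too.
__global__ void destroyGeometry(Geometry** slot) {
    delete *slot;
}

int main() {
    Geometry** slot = nullptr;  // device memory holding one Geometry*
    float* dOut = nullptr;
    cudaMalloc(&slot, sizeof(Geometry*));
    cudaMalloc(&dOut, sizeof(float));

    createSphere<<<1, 1>>>(slot, 2.0f);  // only plain data crosses from the host
    computeArea<<<1, 1>>>(slot, dOut);

    float area = 0.0f;
    cudaMemcpy(&area, dOut, sizeof(float), cudaMemcpyDeviceToHost);

    destroyGeometry<<<1, 1>>>(slot);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("area = %f\n", area);

    cudaFree(dOut);
    cudaFree(slot);
    return 0;
}
```

Note that the host never touches the object itself, only the device buffer that stores the pointer to it.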

Yup, that’s exactly what I’m doing. I knew that I was copying the host vtable pointer, but I completely overlooked that the pointers in the table would be pointing to things in completely separate address spaces. Thank you!

I have used device-side virtual functions in the past and found them to be very slow compared to a less elegant switch statement over union structures, at least for my situation. That was back on Fermi, so perhaps things have improved since then.
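For anyone curious, the switch-and-union alternative could look roughly like the sketch below. All names here (Shape, ShapeKind, surfaceArea, the box variant) are made up for illustration and are not from any real code base. Because the struct holds only plain data and a type tag, with no vtable pointer, it can be filled in on the host and copied to the device with an ordinary cudaMemcpy.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical tag identifying which union member is active.
enum ShapeKind { SHAPE_SPHERE, SHAPE_BOX };

struct SphereData { float radius; };
struct BoxData    { float w, h, d; };

// Plain data only: no virtual functions, so no vtable pointer to go stale.
struct Shape {
    ShapeKind kind;
    union {
        SphereData sphere;
        BoxData    box;
    };
};

// Manual dispatch on the tag replaces the virtual call.
__host__ __device__ float surfaceArea(const Shape& s) {
    switch (s.kind) {
    case SHAPE_SPHERE:
        return 4.0f * 3.14159265f * s.sphere.radius * s.sphere.radius;
    case SHAPE_BOX:
        return 2.0f * (s.box.w * s.box.h + s.box.h * s.box.d + s.box.w * s.box.d);
    }
    return 0.0f;  // unknown tag
}

__global__ void areas(const Shape* shapes, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = surfaceArea(shapes[i]);
}

int main() {
    const int n = 2;
    Shape hShapes[n];
    hShapes[0].kind = SHAPE_SPHERE;
    hShapes[0].sphere.radius = 1.0f;
    hShapes[1].kind = SHAPE_BOX;
    hShapes[1].box.w = 1.0f;
    hShapes[1].box.h = 2.0f;
    hShapes[1].box.d = 3.0f;

    Shape* dShapes = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dShapes, n * sizeof(Shape));
    cudaMalloc(&dOut, n * sizeof(float));

    // Safe to copy from the host: the struct contains no host pointers.
    cudaMemcpy(dShapes, hShapes, n * sizeof(Shape), cudaMemcpyHostToDevice);
    areas<<<1, 32>>>(dShapes, dOut, n);

    float hOut[n];
    cudaMemcpy(hOut, dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("shape %d: area = %f\n", i, hOut[i]);

    cudaFree(dOut);
    cudaFree(dShapes);
    return 0;
}
```

The trade-off is that every new shape type means touching the enum, the union, and each switch, which is why it feels less elegant than polymorphism.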

I changed all of my virtual code to run only on the device and made an initialization kernel and then everything worked. It was a slight pain to do it, but it all works.