Copying objects to device with virtual functions

adam95 · November 7, 2017, 1:46am

Hi
I’m writing my first CUDA program, and I’m kind of stuck.

The issue is that I need to get a couple of classes that have virtual functions onto the GPU, and I need to call those functions on the GPU.
It took me a while, but I figured out that I was crashing because I can’t simply copy those objects: the virtual function table pointer is stored in the object, so it gets copied too.

To circumvent this, I created a kernel that I call after every single copy, which recreates the object from the copied data.

template<typename T>
__global__ void fixVirtualPointers(T *other) {
	*other = T(other);
}

Unfortunately this didn’t work. I still crash at the same spot, even though that kernel executed properly beforehand.

How am I supposed to solve this?

Robert_Crovella · November 7, 2017, 5:17am

The limitation is pointed out in the programming guide.

One suggestion would be to create your objects on the device, rather than in host code. Whatever method you use to initialize the objects in host code could presumably be reimplemented as a device function. So pass your ordinary initialization data to the device. Create your objects on the device from raw data, similarly to how you are creating them on the host. I acknowledge that this doesn’t allow for easy interoperability of such objects between host and device code, but that is the nature of the limitation.
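
A minimal sketch of this first suggestion, assuming a hypothetical `Shape`/`Square` hierarchy and a hypothetical host helper `areasOnDevice` (none of these names come from the thread). Only the raw `float` data is copied to the device; the objects themselves are constructed there with placement new, so their virtual function pointers are written by device code:

```cuda
#include <new>
#include <cuda_runtime.h>

// Hypothetical hierarchy, for illustration only.
struct Shape {
    __device__ virtual float area() const = 0;
};

struct Square : Shape {
    float side;
    __device__ explicit Square(float s) : side(s) {}
    __device__ float area() const override { return side * side; }
};

// Build each object in place on the device from raw data.
__global__ void constructSquares(Square *storage, const float *sides, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) new (&storage[i]) Square(sides[i]);
}

// Call the virtual function through a base-class pointer on the device.
__global__ void computeAreas(const Square *storage, float *areas, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        const Shape *s = &storage[i];
        areas[i] = s->area();
    }
}

// Hypothetical host helper: copies only plain floats in and out.
bool areasOnDevice(const float *hostSides, float *hostAreas, int n) {
    Square *dStorage = nullptr;
    float *dSides = nullptr, *dAreas = nullptr;
    cudaMalloc(&dStorage, n * sizeof(Square));
    cudaMalloc(&dSides, n * sizeof(float));
    cudaMalloc(&dAreas, n * sizeof(float));
    cudaMemcpy(dSides, hostSides, n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    constructSquares<<<blocks, threads>>>(dStorage, dSides, n);
    computeAreas<<<blocks, threads>>>(dStorage, dAreas, n);

    // This blocking copy also waits for both kernels to finish.
    cudaError_t err = cudaMemcpy(hostAreas, dAreas, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dStorage);
    cudaFree(dSides);
    cudaFree(dAreas);
    return err == cudaSuccess;
}
```

The objects never cross the host/device boundary here, which is the point of the suggestion: the host only ever sees the input sides and the output areas.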

Another obvious suggestion of course, is to redesign your processing flow/approach so that copying of objects with virtual functions between host and device is not needed. The previous paragraph is just a specific possibility within the more general suggestion here.
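
One possible shape for such a redesign, sketched under the assumption that the set of types is small and known up front: replace the virtual call with a type tag and a `switch`. The names `ShapeKind`, `ShapeData`, `shapeArea` and `computeShapeAreas` are hypothetical. Because the struct has no virtual functions, it can be copied in either direction with a plain `cudaMemcpy`:

```cuda
#include <cuda_runtime.h>

// Hypothetical plain-data stand-in for a small class hierarchy.
enum ShapeKind { kCircle, kRect };

struct ShapeData {
    ShapeKind kind;
    float a;  // radius for kCircle, width for kRect
    float b;  // unused for kCircle, height for kRect
};

// Dispatch on the tag instead of through a virtual function table.
__host__ __device__ inline float shapeArea(const ShapeData &s) {
    switch (s.kind) {
        case kCircle: return 3.14159265f * s.a * s.a;
        case kRect:   return s.a * s.b;
    }
    return 0.0f;
}

__global__ void areaKernel(const ShapeData *shapes, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = shapeArea(shapes[i]);
}

// Hypothetical host helper: the objects themselves are copied to the device.
bool computeShapeAreas(const ShapeData *hostShapes, float *hostOut, int n) {
    ShapeData *dShapes = nullptr;
    float *dOut = nullptr;
    cudaMalloc(&dShapes, n * sizeof(ShapeData));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dShapes, hostShapes, n * sizeof(ShapeData),
               cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    areaKernel<<<blocks, threads>>>(dShapes, dOut, n);

    cudaError_t err = cudaMemcpy(hostOut, dOut, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dShapes);
    cudaFree(dOut);
    return err == cudaSuccess;
}
```

The trade-off is that adding a new type means touching the `switch`, which may or may not be acceptable when porting a larger C++ code base.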

Read beyond this point at your own risk.

Since the mechanics of class virtual functions are (AFAIK) implementation-specific, I hesitate to suggest another option. However, the following ugly mod to your fixup code seems to work for me:

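template<typename T>
__global__ void fixVirtualPointers(T *other) {
        // Copy-construct a temporary on the device; its vtable pointer is set by device code.
        T temp = T(*other);
        // Overwrite the original bytes, vtable pointer included, with the temporary's.
        memcpy(other, &temp, sizeof(T));
}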

This is based on my own experimentation, which suggests that an ordinary object copy never copies the virtual function pointer table. A small modification to the above (change __global__ to __host__ __device__) would probably allow for a “fixup” function that could work in either direction (H->D or D->H).
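
A rough sketch of what that `__host__ __device__` variant might look like in the H->D direction. The class `Scaler`, the wrapper kernel `fixVirtualPointersKernel`, and the helper `applyOnDevice` are hypothetical names added for illustration. This relies on the same implementation-specific behavior as the kernel above and is very likely undefined behavior, so treat it as an experiment rather than a recommendation:

```cuda
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical class, for illustration only.
struct Scaler {
    float factor;
    __host__ __device__ explicit Scaler(float f) : factor(f) {}
    __host__ __device__ virtual float apply(float x) const { return factor * x; }
};

// The fixup, callable from host or device code.
// ASSUMPTION: copy construction writes a vtable pointer valid on the side
// that runs it, and memcpy then carries that pointer into *other.
template <typename T>
__host__ __device__ void fixVirtualPointers(T *other) {
    T temp(*other);
    memcpy(static_cast<void *>(other), &temp, sizeof(T));
}

// Kernel wrapper so the fixup can be run on a device-resident object.
template <typename T>
__global__ void fixVirtualPointersKernel(T *obj) {
    fixVirtualPointers(obj);
}

__global__ void applyKernel(const Scaler *s, float x, float *out) {
    *out = s->apply(x);  // virtual call on the device
}

// Hypothetical host helper: create on host, copy, fix up, then use on device.
bool applyOnDevice(float factor, float x, float *result) {
    Scaler host(factor);
    Scaler *dObj = nullptr;
    float *dOut = nullptr;
    cudaMalloc(&dObj, sizeof(Scaler));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dObj, &host, sizeof(Scaler), cudaMemcpyHostToDevice);

    fixVirtualPointersKernel<<<1, 1>>>(dObj);
    applyKernel<<<1, 1>>>(dObj, x, dOut);

    cudaError_t err = cudaMemcpy(result, dOut, sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dObj);
    cudaFree(dOut);
    return err == cudaSuccess;
}
```

For the D->H direction, the same `fixVirtualPointers` would be called directly in host code on the object after it has been copied back.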

Someone may come along and say that this is a really bad idea. Use at your own risk. I’m reasonably sure this suggestion is exploring undefined behavior. The only sense of “correctness” I can attempt to ascribe to this code is that the difference between the host object and device object presumably only exists at the data level, not structural/organization/size level (otherwise a general H->D or D->H cudaMemcpy of an object would not work), and that this idea probably applies to the virtual function pointer table as well (thin ice here), and that the difference in virtual function pointers must be contained within the object itself, not external to the object (seems self-evident). Which is a bunch of hand-waving. YMMV.

adam95 · November 7, 2017, 9:30pm

Thanks, this seems to have solved the issue. I finally managed to get the code to run successfully.
It crashes above a certain thread count for some reason, so I should probably solve this normally, just in case it is related to this.

adam95 · November 8, 2017, 6:13pm

I just realized that I am using a virtual function to copy objects to the device.
Is that an issue too, even if I don’t have a device equivalent, and as such I never call it?

Robert_Crovella · November 8, 2017, 6:23pm

If you never copy a particular object to the device, then there are no (CUDA-imposed) limitations on the usage of that object in host code.

Virtual functions/polymorphism should work in host code exactly as you would expect, for objects created in host code. This should be more or less orthogonal to CUDA.

The concern arises, as stated in the documentation, when you copy an object created on the host to the device, or conversely, when you create an object on the device and then copy it to host code. In either of those cases, the virtual function pointer table in the object is no longer coherent.

After rereading your question, it seems that the above description is not what you were asking. If you are copying an object to the device, but you never invoke a virtual function (*), it should be OK. However, this is a gray area, as the writing in the programming guide does not actually read this way. It simply says:

Programming Guide :: CUDA Toolkit Documentation

“It is not allowed to pass as an argument to a __global__ function an object of a class with virtual functions.”

I cannot abrogate the programming guide. If you want to follow the letter of the law, then do not do this, and I refer you to my original comment previously in this thread.

(*) It also seems to me that, for complex object manipulation code, guaranteeing this (a burden that falls entirely on the programmer) might be difficult, non-obvious, or non-trivial.

adam95 · November 9, 2017, 2:20am

My issue is that I’m trying to port a fairly complex C++ code base over to the GPU. It doesn’t work, and I don’t know why.
I’m really just trying to make sure I didn’t misunderstand anything.

Though now it seems like I have issues with code not related to this, and now I just need to manually debug what I messed up.

Anyway, thanks for the help.
