Unified types would allow you to describe your data structure just once.
Unified types would allocate CUDA and host memory at the same time.
Providing this functionality on the host side could be done with advanced C++ features.
The CUDA side can just have its own data structure as normal.
For now, if you do not have unified types available, the only thing you can do is write the code twice, as follows:
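A minimal sketch of what such doubled definitions might look like. The struct names and the MirrorScalars helper are hypothetical; only the field names follow the rest of this post, and nothing here touches the CUDA runtime.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side description: the pointer refers to host memory.
struct MyHostStructure {
    int         MyHostField;
    float*      MyHostPointer;
    std::size_t MySize;
};

// Hypothetical device-side description: same fields, but the pointer is
// meant to hold a device address and is never dereferenced on the host.
struct MyCudaStructure {
    int         MyCudaField;
    float*      MyCudaPointer;
    std::size_t MySize;
};

// The two descriptions are maintained by hand, so catch drift at compile time.
static_assert(sizeof(MyHostStructure) == sizeof(MyCudaStructure),
              "host and device descriptions have diverged");

// Copies only the plain-value fields; the pointer field is handled separately
// because a host address is meaningless on the device.
inline void MirrorScalars(const MyHostStructure& host, MyCudaStructure& staging) {
    staging.MyCudaField = host.MyHostField;
    staging.MySize      = host.MySize;
}
```

Every new field has to be added in both structs and in the mirroring helper, which is exactly the extra work unified types would remove.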
And so forth…
Now you just need code to copy back and forth between these two.
CopyToDevice( MyHostField, MyCudaField );
CopyFromDevice( MyHostField, MyCudaField );
CopyToDevice( MyHostPointer, MyCudaPointer );
CopyFromDevice( MyHostPointer, MyCudaPointer );
// however these two statements above are not necessary since you should do:
AllocOnHost( MyHostPointer, MySize );
AllocOnCuda( MyCudaPointer, MySize );
This way your data structure is created, allocated and initialized the same way on host and device side.
Now you just need to pass pointers.
This could be done with a pointer-passing structure as well, just to make sure the limit on kernel parameters is not hit.
Parameter = &MyStructure;
Parameter = &MyStructure.MyCudaPointer; // same as .MyCudaArray;
Then pass this parameter to the kernel:
KernelLaunch( ParameterStructure );
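A rough sketch of such a pointer-passing structure. The field layout, the MakeParameters helper, and the 4096-byte budget are assumptions for illustration (4 KB is the long-standing kernel parameter limit; check the documentation of your toolkit for the actual value).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bundle of everything the kernel needs, passed by value.
struct ParameterStructure {
    float*      MyCudaPointer;  // device address, only dereferenced on the device
    std::size_t MySize;         // number of elements behind the pointer
};

// Assumed budget for kernel parameters; adjust to your toolkit and hardware.
constexpr std::size_t kAssumedKernelParamLimit = 4096;

static_assert(sizeof(ParameterStructure) <= kAssumedKernelParamLimit,
              "parameter structure exceeds the assumed kernel parameter limit");

// Fills the bundle on the host; the device pointer is stored as a plain value.
inline ParameterStructure MakeParameters(float* devicePointer, std::size_t size) {
    ParameterStructure p;
    p.MyCudaPointer = devicePointer;
    p.MySize        = size;
    return p;
}
```

Because the structure holds only a pointer and a size, it stays small no matter how large the arrays behind it are.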
Finally, inside the CUDA kernel, initialize the data structure one last time just to be sure… if you need pointers like that:
MyStructure = Parameter;
MyStructure.MyCudaPointer = Parameter;
^ All of these fields would need to be pointers.
This step could be left out if you are sure the data structure on the device has the same layout as on the host… as it was allocated…
Then the only thing necessary is to copy the CUDA pointers to the device.
So if your structure contains CUDA data types/pointers only, this might work. In reality it will probably not work, since some kernel parameters need to be host side, though this could be worked around somewhat by passing CUDA parameters only…
This should give you some idea…
If this is too difficult to follow, here is an alternative, easier solution, though it requires more programming effort every time:
Let’s see if I understand correctly first:
You have host structure with a host pointer which points to another host array.
You have device structure with a device pointer which points to another device array.
You want the device pointer to be initialized properly.
What you would need to do is apply pointer arithmetic to the CUDA pointer that points to the device structure.
According to others, pointer arithmetic on CUDA pointers is possible on the host side.
This requires knowing the offsets of the fields relative to the base address of the structure.
However, if the sizes of the fields of the structures are the same on the host and device side, then all that is necessary is to lay the structures on top of each other.
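The offset idea can be sketched roughly like this. MyStruct and FieldAddress are hypothetical names, and the sketch only computes an address; it never reads or writes through the base pointer, which is the part that stays valid on the host even when the base is a device pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical structure described once and used for both sides.
struct MyStruct {
    int    n;
    float* ptr;
};

// offsetof is only well defined for standard-layout types.
static_assert(std::is_standard_layout<MyStruct>::value,
              "MyStruct must be standard layout for offsetof");

// Returns the address of a field given the base address of the structure
// and the field's offset. Pure arithmetic: nothing is dereferenced.
inline void* FieldAddress(void* structBase, std::size_t fieldOffset) {
    return static_cast<char*>(structBase) + fieldOffset;
}
```

For example, FieldAddress(myStruct_d, offsetof(MyStruct, ptr)) gives the location where the array pointer has to end up.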
So what you can do is quite simply:
You describe your data structure just once like so:
Then you allocate the structure on host side.
Then you allocate the structure on device side.
Then you allocate the array on host side.
Then you allocate the array on cuda side.
OK, you may do it in a slightly different order, host side first and CUDA side after, but that doesn’t matter.
The question is where to store the pointers returned from the CUDA malloc calls.
All you need to do is this:
TypecastCudaPointerToStructureToMyDataStructure( CudaPointerToStructure ).MyArray = CudaArrayPointer;
So long story short your code needs to look something like:
myStruct_h( myStruct_d )->ptr = myArray_d;
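One common way to get the same effect is to fill in a staging copy of the structure on the host and copy it over in one go. Here is a host-only sketch of that idea: CopyToDeviceStandIn is a hypothetical stand-in that uses plain memcpy where real code would call cudaMemcpy with cudaMemcpyHostToDevice, and MyStruct and InitDeviceStruct are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical structure shared by host and device code.
struct MyStruct {
    int    n;
    float* ptr;
};

// Stand-in for a host-to-device copy. In real CUDA code this would be
// cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice); memcpy is used here
// only so the sketch runs without a GPU.
inline void CopyToDeviceStandIn(void* dst, const void* src, std::size_t bytes) {
    std::memcpy(dst, src, bytes);
}

// Builds the structure in host memory with the device array address stored
// in ptr, then copies the whole structure to where myStruct_d points.
inline void InitDeviceStruct(void* myStruct_d, float* myArray_d, int n) {
    MyStruct staging;
    staging.n   = n;
    staging.ptr = myArray_d;  // device address kept as a plain value
    CopyToDeviceStandIn(myStruct_d, &staging, sizeof(staging));
}
```

The host never dereferences myStruct_d or myArray_d here; it only passes their values around, which is why the same shape carries over to real device allocations.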
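The only little problem is with ptr… it’s declared as pointing to a host data type.
But this doesn’t matter if the pointers are the same size.
So you should first check whether your kernel and your host code are both 32-bit, both 64-bit, or mixed.
If the pointer size is the same, then this technique will work, as long as the assignment itself is done with a cudaMemcpy into the address of ptr when the structure lives in cudaMalloc memory, since host code cannot write through such a pointer directly.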
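A small sketch of the pointer-size check on the host side. The 8-byte device pointer size is an assumption (64-bit device code) and the helper names are made up; the device-side size would normally come from how the kernel was compiled.

```cpp
#include <cassert>
#include <cstddef>

// Assumption for this sketch: the device code is built as 64-bit.
constexpr std::size_t kAssumedDevicePointerBytes = 8;

// True when host and device agree on the size of a pointer.
inline bool PointerSizesMatch(std::size_t hostBytes, std::size_t deviceBytes) {
    return hostBytes == deviceBytes;
}

// Compares this host build against the assumed device pointer size.
inline bool HostMatchesAssumedDevice() {
    return PointerSizesMatch(sizeof(void*), kAssumedDevicePointerBytes);
}
```

If HostMatchesAssumedDevice() returns false, the structure layouts differ between the two sides and overlaying them is not safe.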
You can change all your pointer types to void * or some other more “unified looking type”.
How to do these typecasts exactly is something you should be able to figure out yourself… not gonna write that entire boring code, but it’s something like this:
If using a nice pointer type:
PDataStructure( myStruct_d )->ptr = etc;
Else you’ll have to use those nasty asterisks:
((myStruct_h *)myStruct_d)->ptr = etc;
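For the cast syntax itself, here is a small compilable sketch of both spellings. MyStruct and the two helper functions are hypothetical, and the raw pointer in this sketch points at ordinary host memory, because host code can only dereference memory it can actually access.

```cpp
#include <cassert>

// Hypothetical structure with a pointer field.
struct MyStruct {
    int    n;
    float* ptr;
};

// The "nice pointer type": a typedef lets you use a function-style cast.
typedef MyStruct* PDataStructure;

// Cast through the typedef, then assign the field.
inline void SetViaTypedef(void* raw, float* array) {
    PDataStructure(raw)->ptr = array;
}

// Same thing spelled with the asterisk; the cast needs its own parentheses
// so that -> applies to the result of the cast.
inline void SetViaAsterisk(void* raw, float* array) {
    ((MyStruct*)raw)->ptr = array;
}
```

Both functions do the same thing; the typedef version just hides the asterisk.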