Clean way of copying a struct with pointers to the GPU

I have a simple struct that contains other structs that contain pointers to some data (note that I’m only including the relevant stuff):

template<class T>
struct MyArray {
    T* data;
    int elementCount;
};

struct Wrapper {
    MyArray<float> arrayA;
    MyArray<float> arrayB;
};

First I initialize my Wrapper struct (host) and then copy it over to my device:

Wrapper wrapperHost;
// initialize the arrays (fill data arrays with '5.0f') and other related stuff
COMPUTE_SAFE(cudaMemcpyToSymbol(wrapperDevice, &wrapperHost, sizeof(Wrapper))); // Works fine

Device-side code looks like this:

__device__ Wrapper wrapperDevice; 

__global__ void TestKernel(){
    printf("%.2f", wrapperDevice.arrayA.data[0]); // Target result: 5.00
}

When I run my kernel I get a memory access violation error, which is expected since the data pointers still point to host memory (I think). So my question is: how can I copy the Wrapper struct, together with its arrays, over to the device?

Thanks in advance! Also note that this was written in-browser, so there’s a good chance there are typos/minor errors in the code examples.

This is a frequently asked question. It involves a “deep copy”.

There are at least dozens of questions like it already on various forums.

Here is one example.

Here is one possible approach using your code:

$ cat t2102.cu
#include <cstdio>
#define COMPUTE_SAFE(x) x
template<class T>
struct MyArray {
    T* data;
    int elementCount;
};

struct Wrapper {
    MyArray<float> arrayA;
    MyArray<float> arrayB;
};

__device__ Wrapper wrapperDevice;

__global__ void TestKernel(){
    printf("%.2f\n", wrapperDevice.arrayA.data[0]); // Target result: 5.00
}

int main(){

  Wrapper wrapperHost;
  // initialize the arrays (fill data arrays with '5.0f') and other related stuff
  const int ds = 1;
  wrapperHost.arrayA.data = new float[ds];
  wrapperHost.arrayA.data[0] = 5.0f;
  wrapperHost.arrayA.elementCount = ds;
  // build a host-side "deep copy" struct whose pointers will refer to device allocations
  Wrapper wrapperDeep;
  // allocate device storage for arrayA's data and copy the host data into it
  cudaMalloc(&(wrapperDeep.arrayA.data), wrapperHost.arrayA.elementCount*sizeof(wrapperHost.arrayA.data[0]));
  wrapperDeep.arrayA.elementCount = wrapperHost.arrayA.elementCount;
  cudaMemcpy(wrapperDeep.arrayA.data, wrapperHost.arrayA.data, wrapperHost.arrayA.elementCount*sizeof(wrapperHost.arrayA.data[0]), cudaMemcpyHostToDevice);
  // copy the struct (now holding device pointers) to the __device__ symbol
  COMPUTE_SAFE(cudaMemcpyToSymbol(wrapperDevice, &wrapperDeep, sizeof(Wrapper)));
  TestKernel<<<1,1>>>();
  cudaDeviceSynchronize();
}

$ nvcc -o t2102 t2102.cu
$ compute-sanitizer ./t2102
========= COMPUTE-SANITIZER
5.00
========= ERROR SUMMARY: 0 errors
$

It should be evident that I’ve only done the “deep copy work” for arrayA. You would need to handle it similarly for arrayB.
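For reference, the arrayB part would follow the exact same pattern (a sketch, assuming wrapperHost.arrayB has been initialized like arrayA):

  // same deep-copy steps for arrayB (sketch; assumes arrayB was initialized like arrayA)
  cudaMalloc(&(wrapperDeep.arrayB.data), wrapperHost.arrayB.elementCount*sizeof(wrapperHost.arrayB.data[0]));
  wrapperDeep.arrayB.elementCount = wrapperHost.arrayB.elementCount;
  cudaMemcpy(wrapperDeep.arrayB.data, wrapperHost.arrayB.data, wrapperHost.arrayB.elementCount*sizeof(wrapperHost.arrayB.data[0]), cudaMemcpyHostToDevice);
  // do the single cudaMemcpyToSymbol only after both arrays have been set up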


Perfect, this is what I was looking for! I guess I just searched for the wrong terms :D. Thanks!

An alternative that does not require deep copies is to use Managed memory for any data structure you need to access on the GPU. Then the CPU and GPU share the same address space, and the driver handles copying of data chunks between host and device on an as-needed basis. Deep copying won’t be needed as pointers remain valid.

This solution may require the use of custom allocators for non-POD objects.
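For example, a minimal sketch of how that could look with the structs and kernel from the post above (an illustration, not exact code; arrayB would be handled the same way):

  Wrapper wrapperHost;
  const int ds = 1;
  // managed allocation: the same pointer is valid on both host and device
  cudaMallocManaged(&(wrapperHost.arrayA.data), ds * sizeof(float));
  wrapperHost.arrayA.elementCount = ds;
  wrapperHost.arrayA.data[0] = 5.0f;   // filled directly from the host
  // no deep copy of the data is needed; copying the struct to the symbol is enough
  cudaMemcpyToSymbol(wrapperDevice, &wrapperHost, sizeof(Wrapper));
  TestKernel<<<1,1>>>();
  cudaDeviceSynchronize();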

Christian

Thanks Christian, I’ll check it out!

hey, I know this isn’t really related to the question, but would you mind showing me a way of copying the data back from the symbol to host? I’ve been trying to do it myself, but I’m having trouble getting it working.

This is one of my attempts at making it work (I’m also copying only 1 array for now):

Wrapper out;
cudaMemcpy(&out.arrayA.data, deep.arrayA.data, host.arrayA.elementCount * sizeof(host.arrayA.data[0]), cudaMemcpyDeviceToHost);
COMPUTE_SAFE(cudaMemcpyFromSymbol(&out, wrapperDevice, sizeof(void*), 0, cudaMemcpyDeviceToHost));
printf("%.2f, out.arrayA.data[0])

But this causes an Access violation reading location exception.
Thanks in advance.

Wrapper out;

does provide storage for pointers, but doesn’t provide any allocation for them. There is no allocated space to copy an array of data to. Furthermore:

cudaMemcpy(&out.arrayA.data,

that doesn’t make any sense. You’re attempting to copy an array of data to the address where an (unallocated) pointer is stored.

First provide a proper host allocation/storage area for the output data, then copy the device data into it:

Wrapper out;
out.arrayA.elementCount = host.arrayA.elementCount;
out.arrayA.data = new float[out.arrayA.elementCount];   // host storage to receive the results
cudaMemcpy(out.arrayA.data, deep.arrayA.data, out.arrayA.elementCount * sizeof(out.arrayA.data[0]), cudaMemcpyDeviceToHost);
for (int i = 0; i < out.arrayA.elementCount; i++) std::cout << out.arrayA.data[i] << std::endl;
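If you no longer had the deep struct available on the host, you could also fetch the device pointers back from the symbol first and then copy through them, e.g. (a sketch; deepFromSymbol is just an illustrative name):

Wrapper deepFromSymbol;   // will receive the device pointers stored in the symbol
cudaMemcpyFromSymbol(&deepFromSymbol, wrapperDevice, sizeof(Wrapper));
cudaMemcpy(out.arrayA.data, deepFromSymbol.arrayA.data, out.arrayA.elementCount * sizeof(float), cudaMemcpyDeviceToHost);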

I don’t know how I missed that. Thanks a bunch.
