Hi, I’m pretty new to CUDA programming and I’m having trouble using cudaMemcpy correctly to copy a device array back to a host array.
This is the struct that I have:
typedef struct {
int width;
int height;
float* elements;
} Matrix;
The host allocates the struct through a pointer with “new” (it’s a C++ program).
Let’s say:
matrix.cpp
Matrix* C = NULL; // C = A * B
C = new Matrix;
C->height = A->height;
C->width = B->width;
C->elements = new float[C->height * C->width](); // initialize it all to '0'
The problem is in my matrix.cu file. I think I copy the Matrix to the device correctly, but once the kernel has finished, I want to store the result back in “C->elements”.
This is what I did in matrix.cu:
// Matrix* A, Matrix* B, Matrix* C are allocated in the .cpp file w/ new and passed here
Matrix* hostMatrix = C; // let's say we're copying C to the device and then back
Matrix* deviceMatrix = NULL;
float* d_elements;
// allocate the deviceMatrix and d_elements
cudaMalloc(&deviceMatrix, sizeof(Matrix));
int size = hostMatrix->width * hostMatrix->height * sizeof(float);
cudaMalloc(&d_elements, size);
// copy each piece of data separately
cudaMemcpy(deviceMatrix, hostMatrix, sizeof(Matrix), cudaMemcpyHostToDevice);
cudaMemcpy(d_elements, hostMatrix->elements, size, cudaMemcpyHostToDevice);
cudaMemcpy(&(deviceMatrix->elements), &d_elements, sizeof(float*), cudaMemcpyHostToDevice);
// so far so good, no compilation errors from HOST -> DEVICE
// call kernel (let's say it changes deviceMatrix->elements)
...
// now I want to store the new elements to hostMatrix->elements (remember, this was allocated w/ new)
cudaMemcpy((hostMatrix->elements), (deviceMatrix->elements), size, cudaMemcpyDeviceToHost);
// SEGFAULT!!! Program received signal SIGSEGV, Segmentation fault.
// don't forget to free the device pointers
In the snippet above, how can I fix the final cudaMemcpy call (line 23, the one that segfaults) so that it correctly copies deviceMatrix->elements to hostMatrix->elements?
Thanks
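A likely cause, offered as a sketch rather than a definitive diagnosis: deviceMatrix holds a device address, so evaluating deviceMatrix->elements in host code makes the CPU read device memory, which would explain the SIGSEGV. The earlier &(deviceMatrix->elements) only computes an address and does not read anything, which is why that call works. The host already has the device array's address in d_elements, so the copy back can use it directly. In the example below, the addOne kernel, the roundTrip helper, the CUDA_TRY macro, and the 4x3 matrix size are all hypothetical names and values chosen for illustration; they are not from the original post.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef struct {
    int width;
    int height;
    float* elements;
} Matrix;

// Hypothetical kernel for illustration: adds 1.0f to every element.
__global__ void addOne(Matrix* m) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < m->width * m->height) {
        m->elements[i] += 1.0f;
    }
}

// Hypothetical helper macro: record the error and jump to cleanup on failure.
#define CUDA_TRY(call) \
    do { err = (call); if (err != cudaSuccess) goto cleanup; } while (0)

// Copies hostMatrix to the device, runs the kernel, and copies the
// elements back. Returns the first CUDA error seen, or cudaSuccess.
cudaError_t roundTrip(Matrix* hostMatrix) {
    cudaError_t err = cudaSuccess;
    Matrix* deviceMatrix = NULL;
    float* d_elements = NULL;
    int n = hostMatrix->width * hostMatrix->height;
    size_t size = (size_t)n * sizeof(float);
    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    CUDA_TRY(cudaMalloc(&deviceMatrix, sizeof(Matrix)));
    CUDA_TRY(cudaMalloc(&d_elements, size));

    // Copy the struct, then the array, then patch the device-side pointer.
    CUDA_TRY(cudaMemcpy(deviceMatrix, hostMatrix, sizeof(Matrix),
                        cudaMemcpyHostToDevice));
    CUDA_TRY(cudaMemcpy(d_elements, hostMatrix->elements, size,
                        cudaMemcpyHostToDevice));
    // Taking the member's address does not dereference deviceMatrix.
    CUDA_TRY(cudaMemcpy(&(deviceMatrix->elements), &d_elements,
                        sizeof(float*), cudaMemcpyHostToDevice));

    addOne<<<blocks, threads>>>(deviceMatrix);
    CUDA_TRY(cudaGetLastError());
    CUDA_TRY(cudaDeviceSynchronize());

    // Use d_elements (the host's copy of the device address) as the source.
    // Writing deviceMatrix->elements here would read device memory from
    // the host.
    CUDA_TRY(cudaMemcpy(hostMatrix->elements, d_elements, size,
                        cudaMemcpyDeviceToHost));

cleanup:
    cudaFree(d_elements);    // cudaFree(NULL) is a no-op
    cudaFree(deviceMatrix);
    return err;
}

int main() {
    Matrix C;
    C.width = 4;
    C.height = 3;
    C.elements = new float[C.width * C.height]();  // zero-initialized

    cudaError_t err = roundTrip(&C);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        delete[] C.elements;
        return 1;
    }
    printf("C.elements[0] = %f\n", C.elements[0]);
    delete[] C.elements;
    return 0;
}
```

Checking every return value this way should also show whether an earlier call failed silently. The width and height fields never change on the device in this sketch, so only the elements array needs to come back.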