Copying a CUDA float4 array back to the host using memcpy

Hello everyone,

I have a CUDA float4 array on the device that I would like to copy back into my object on the host using memcpy.

So, do I first have to use cudaMemcpy with cudaMemcpyDeviceToHost to get it into a host array and then iterate over this array and copy the values, or is there a faster and more efficient way to do this?

Many thanks for your help.

Cheers,
xarg

Why would you need to iterate over the values? Just copy it straight back into a float4 array on the host. If you wanted to, you could probably cast the resulting float4* to a float (*)[4] on the host, but I haven’t verified this. Doing so would let you access the components of each element by index on the host instead of by field name (.x, .y, .z, .w).

const size_t floatCount(20);

const size_t memoryReq(floatCount * sizeof(float4));

float4* hostDataFloat4 = new float4[floatCount];

float (*hostDataArray)[4] = new float[floatCount][4]; // floatCount rows of 4 floats each

float4* deviceData;

cudaMalloc((void**)&deviceData, memoryReq);

...

cudaMemcpy(hostDataFloat4, deviceData, memoryReq, cudaMemcpyDeviceToHost);

cudaMemcpy(hostDataArray, deviceData, memoryReq, cudaMemcpyDeviceToHost);

cudaFree(deviceData);

...

// Both of the following should be equivalent, I think (again, I haven't tested this code)

float xVal1 = hostDataFloat4[0].x;

float xVal2 = hostDataArray[0][0];

...

delete[] hostDataFloat4;

delete[] hostDataArray;
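
If the goal is to get the data into your own host object without a per-element loop, something along these lines might work. This is only a host-side sketch: Float4 is a stand-in I made up so it compiles without the CUDA headers, and HostObject, copyIntoObject and component are hypothetical names, not anything from your code. It assumes your object can store its data as a flat float buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for CUDA's float4 so this sketch builds without CUDA headers.
struct Float4 { float x, y, z, w; };

// The flat, indexed view only makes sense if the struct is exactly four packed floats.
static_assert(sizeof(Float4) == 4 * sizeof(float), "Float4 must be four packed floats");

// Hypothetical host-side object that keeps its data as a flat float buffer:
// x0, y0, z0, w0, x1, y1, z1, w1, ...
struct HostObject {
    std::vector<float> values;
};

// Copy `count` Float4 elements into the object with a single memcpy (no loop).
inline void copyIntoObject(HostObject& obj, const Float4* src, std::size_t count) {
    obj.values.resize(count * 4);
    if (count == 0) return;  // nothing to copy
    std::memcpy(obj.values.data(), src, count * sizeof(Float4));
}

// Read component `c` (0 = x, 1 = y, 2 = z, 3 = w) of element `i` by index.
inline float component(const HostObject& obj, std::size_t i, std::size_t c) {
    assert(c < 4 && i * 4 + c < obj.values.size());
    return obj.values[i * 4 + c];
}
```

With the real API you could likely skip the intermediate host array altogether: cudaMemcpy takes a void* destination, so passing obj.values.data() with cudaMemcpyDeviceToHost should land the data in the object in one call.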

Hope this helps.