I have a CUDA float4 array on the device that I would like to copy back into my object on the host using memcpy.
So, do I first have to use cudaMemcpy with cudaMemcpyDeviceToHost to get it into a host array and then iterate over this array and copy the values, or is there a faster and more efficient way to do this?
Why would you need to iterate over the values? Just copy it back into a float4 array on the host. If you wanted to, you could probably cast the resulting float4* to a float* on the host, but I haven't verified this. Doing so would let you access the components by index from the host instead of by field name: the x, y, z and w of element i would sit at indices 4*i through 4*i+3.
const size_t floatCount(20);  // number of float4 elements, not individual floats
const size_t memoryReq(floatCount * sizeof(float4));
float4* hostDataFloat4 = new float4[floatCount];
// Each float4 holds four floats, so the flat array needs 4 * floatCount entries
float* hostDataArray = new float[floatCount * 4];
// deviceData is assumed to be the float4* device pointer from the question
cudaMemcpy(hostDataFloat4, deviceData, memoryReq, cudaMemcpyDeviceToHost);
cudaMemcpy(hostDataArray, deviceData, memoryReq, cudaMemcpyDeviceToHost);
// Both of the following should be equivalent, I think (again, I haven't tested this code)
float xVal1 = hostDataFloat4[0].x;
float xVal2 = hostDataArray[0];
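If you want to sanity-check the layout assumption without a GPU, here is a minimal host-only sketch. Float4 is a hypothetical stand-in for CUDA's float4 (same four float fields), and flatten is an illustrative helper name, not part of any CUDA API. It uses std::memcpy rather than a pointer cast, which sidesteps any aliasing questions.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for CUDA's float4 so this compiles without the CUDA headers.
struct Float4 { float x, y, z, w; };
static_assert(sizeof(Float4) == 4 * sizeof(float),
              "Float4 is expected to be four packed floats");

// Illustrative helper: copy `count` Float4 elements into a flat float vector
// laid out as x0 y0 z0 w0 x1 y1 z1 w1 ...
std::vector<float> flatten(const Float4* src, std::size_t count) {
    std::vector<float> flat(count * 4);
    if (count > 0) {
        std::memcpy(flat.data(), src, count * sizeof(Float4));
    }
    return flat;
}
```

With this layout, flat[4 * i] is element i's x and flat[4 * i + 3] is its w. In the real program you could skip the intermediate array entirely and pass the destination buffer inside your object straight to cudaMemcpy, provided it has room for 4 * floatCount floats.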