I am trying to get the hang of cudaMemcpy3D, but I just can’t get it to work.
Here’s the code I am using. It copies linear data on the device into a 3D array on the device:
/* some setups beforehand */
cudaChannelFormatDesc channelDescCoeff = cudaCreateChannelDesc<float>();
cudaExtent extent = make_cudaExtent(X, Y, Z);
/* create arrays, memory */
cudaArray* dst;
cudaMalloc3DArray(&dst, &channelDescCoeff, extent);
float* src;
cudaMalloc((void**)&src, extent.width*extent.height*extent.depth*sizeof(float));
/* do something to src */
/* let's do the 3D copy */
cudaMemcpy3DParms copyParams = {0};
copyParams.extent = extent;
copyParams.kind = cudaMemcpyDeviceToDevice;
copyParams.dstArray = dst;
copyParams.srcPtr = make_cudaPitchedPtr(src, extent.width * sizeof(float), extent.width, extent.height);
cudaMemcpy3D(&copyParams);
Now, this looks fine, doesn’t it? It also works perfectly for X, Y, Z = [32, 32, 32]. However, changing to, say, [33, 33, 33] fails in cudaMemcpy3D with “invalid value”. I am totally confused: how is this supposed to work? What am I missing, and how can I get this to work with arbitrary X, Y, Z where src is a linear device pointer?
Needless to say, the same copy using cudaMemcpyHostToDevice works, but then I have to allocate memory on the host, do the calculations, copy the result back to the host, and copy it “back” to the GPU… :s
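One variation that may be worth trying is sketched below. It rests on an unconfirmed assumption: that the failure comes from the row pitch of the source (33 * sizeof(float) = 132 bytes, versus 128 bytes in the working case) not meeting an alignment requirement for device-to-device copies into a cudaArray. Instead of cudaMalloc, the sketch allocates the source with cudaMalloc3D so the runtime chooses the pitch. The helper name copyLinearToArray is hypothetical and not part of the original code.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): copies a pitched linear
// device buffer into a 3D cudaArray and returns the status of the last call.
// Assumption: a runtime-chosen pitch avoids the "invalid value" error.
cudaError_t copyLinearToArray(size_t X, size_t Y, size_t Z)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();

    // For a cudaArray, the extent is given in elements.
    cudaExtent arrayExtent = make_cudaExtent(X, Y, Z);
    cudaArray* dst = 0;
    cudaError_t err = cudaMalloc3DArray(&dst, &desc, arrayExtent);
    if (err != cudaSuccess) return err;

    // For linear memory, cudaMalloc3D takes the width in bytes and
    // returns the pitch it actually used in src.pitch.
    cudaPitchedPtr src;
    err = cudaMalloc3D(&src, make_cudaExtent(X * sizeof(float), Y, Z));
    if (err != cudaSuccess) { cudaFreeArray(dst); return err; }

    /* do something to src; rows are src.pitch bytes apart, so element
       (x, y, z) lives at (char*)src.ptr + (z * Y + y) * src.pitch
       + x * sizeof(float) */

    cudaMemcpy3DParms copyParams = {0};
    // Same xsize/ysize convention as the original code, but with the
    // pitch reported by cudaMalloc3D instead of width * sizeof(float).
    copyParams.srcPtr   = make_cudaPitchedPtr(src.ptr, src.pitch, X, Y);
    copyParams.dstArray = dst;
    copyParams.extent   = arrayExtent; // a cudaArray is involved: elements
    copyParams.kind     = cudaMemcpyDeviceToDevice;
    err = cudaMemcpy3D(&copyParams);

    cudaFree(src.ptr);
    cudaFreeArray(dst);
    return err;
}
```

Calling copyLinearToArray(33, 33, 33) and passing the result to cudaGetErrorString would show whether the copy still fails with the padded source. Note that any kernel writing to src then has to index rows through src.pitch rather than assuming tightly packed data.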