cudaMemcpy3D device-to-device keeps returning an “invalid value” error

I’m trying to get the hang of cudaMemcpy3D, but it just won’t work for me.

Here’s the code I’m using. It copies some linear data in device memory into a 3D CUDA array, also on the device:

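	/* some setup beforehand: X, Y, Z are the volume dimensions */

	cudaChannelFormatDesc channelDescCoeff = cudaCreateChannelDesc<float>();
	cudaExtent extent = make_cudaExtent(X, Y, Z);   // in elements, since a cudaArray is involved

	/* create the destination array and the linear source buffer */
	cudaArray* dst;
	cudaMalloc3DArray(&dst, &channelDescCoeff, extent);

	float* src;
	cudaMalloc((void**)&src, extent.width * extent.height * extent.depth * sizeof(float));

	/* do something to src */

	/* let's do the 3D copy */
	cudaMemcpy3DParms copyParams = {0};
	copyParams.extent   = extent;
	copyParams.kind     = cudaMemcpyDeviceToDevice;
	copyParams.dstArray = dst;
	// pitch in bytes, then logical width and height in elements (rows are tightly packed)
	copyParams.srcPtr   = make_cudaPitchedPtr(src, extent.width * sizeof(float), extent.width, extent.height);

	cudaError_t err = cudaMemcpy3D(&copyParams);   // this is where "invalid value" shows up for e.g. 33x33x33
	if (err != cudaSuccess)
		printf("cudaMemcpy3D failed: %s\n", cudaGetErrorString(err));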
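
To narrow the failure down, here is a self-contained sketch that repeats the steps above for two sizes, checks every return code, and prints the device’s texturePitchAlignment next to the source pitch. The hypothesis it probes is an assumption, not something confirmed here: that a device-to-device cudaMemcpy3D may reject a tightly packed source pitch that is not suitably aligned. 32 floats give a 128-byte pitch, while 33 floats give 132 bytes. The helper name tryLinearToArrayCopy is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: repeats the question's steps for one volume size and
// returns the status of cudaMemcpy3D (or of the first call that failed).
cudaError_t tryLinearToArrayCopy(size_t X, size_t Y, size_t Z)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaExtent extent = make_cudaExtent(X, Y, Z);  // elements: a cudaArray is involved

    cudaArray* dst = nullptr;
    cudaError_t err = cudaMalloc3DArray(&dst, &desc, extent);
    if (err != cudaSuccess) return err;

    float* src = nullptr;
    size_t bytes = X * Y * Z * sizeof(float);
    err = cudaMalloc((void**)&src, bytes);
    if (err != cudaSuccess) { cudaFreeArray(dst); return err; }
    cudaMemset(src, 0, bytes);  // stand-in for "do something to src"

    cudaMemcpy3DParms p = {0};
    p.extent   = extent;
    p.kind     = cudaMemcpyDeviceToDevice;
    p.dstArray = dst;
    // Tightly packed rows: the pitch is exactly X * sizeof(float) bytes.
    p.srcPtr   = make_cudaPitchedPtr(src, X * sizeof(float), X, Y);
    err = cudaMemcpy3D(&p);

    cudaFree(src);
    cudaFreeArray(dst);
    return err;
}

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    size_t align = prop.texturePitchAlignment;
    printf("texturePitchAlignment = %zu bytes\n", align);

    const size_t sizes[] = {32, 33};
    for (size_t n : sizes) {
        size_t pitch = n * sizeof(float);  // 128 bytes for 32, 132 bytes for 33
        bool aligned = align != 0 && pitch % align == 0;
        cudaError_t err = tryLinearToArrayCopy(n, n, n);
        printf("%zux%zux%zu: pitch %zu bytes, %s, cudaMemcpy3D -> %s\n",
               n, n, n, pitch, aligned ? "aligned" : "NOT aligned",
               cudaGetErrorString(err));
    }
    return 0;
}
```

If the 33 case fails only when the pitch is reported as not aligned, that points at the pitch rather than at the extent or the array allocation.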


This looks right, doesn’t it? It also works perfectly for X, Y, Z = 32, 32, 32. However, changing the sizes to, say, 33, 33, 33 makes cudaMemcpy3D fail with “invalid value”. I’m totally confused: how is this supposed to work? What am I missing, and how can I get this to work for arbitrary X, Y, Z when src is a linear device pointer?
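
One common way around this, sketched here under the assumption that the tightly packed source pitch is the problem, is to let cudaMalloc3D choose the pitch of the source buffer and have the kernel that computes src write through that pitch. Note the two different units: cudaMalloc3D takes the width in bytes, while cudaMemcpy3D measures the extent in elements as soon as a cudaArray takes part in the copy. fillPitched and fillAndCopyToArray are hypothetical names, and the kernel just writes x + y + z as placeholder data.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder for "do something to src": writes one value per element of a
// pitched 3D float volume. Rows are p.pitch bytes apart, slices p.pitch * Y.
__global__ void fillPitched(cudaPitchedPtr p, int X, int Y, int Z)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z;
    if (x >= X || y >= Y || z >= Z) return;

    char* slice = (char*)p.ptr + (size_t)z * p.pitch * Y;
    float* row  = (float*)(slice + (size_t)y * p.pitch);
    row[x] = (float)(x + y + z);
}

// Hypothetical helper: allocates the source with cudaMalloc3D so the driver
// picks the pitch, fills it on the device and copies it into a 3D cudaArray.
cudaError_t fillAndCopyToArray(int X, int Y, int Z)
{
    // For cudaMalloc3D the width is given in BYTES.
    cudaPitchedPtr src;
    cudaError_t err = cudaMalloc3D(&src, make_cudaExtent(X * sizeof(float), Y, Z));
    if (err != cudaSuccess) return err;

    dim3 block(16, 16, 1);
    dim3 grid((X + block.x - 1) / block.x, (Y + block.y - 1) / block.y, Z);
    fillPitched<<<grid, block>>>(src, X, Y, Z);
    err = cudaGetLastError();
    if (err != cudaSuccess) { cudaFree(src.ptr); return err; }

    // With a cudaArray in the copy, the extent is in ELEMENTS.
    cudaExtent extent = make_cudaExtent(X, Y, Z);
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray* dst = nullptr;
    err = cudaMalloc3DArray(&dst, &desc, extent);
    if (err != cudaSuccess) { cudaFree(src.ptr); return err; }

    cudaMemcpy3DParms p = {0};
    p.srcPtr   = src;  // carries the pitch chosen by cudaMalloc3D
    p.dstArray = dst;
    p.extent   = extent;
    p.kind     = cudaMemcpyDeviceToDevice;
    err = cudaMemcpy3D(&p);  // ordered after the kernel on the default stream

    cudaFreeArray(dst);
    cudaFree(src.ptr);
    return err;
}

int main()
{
    cudaError_t err = fillAndCopyToArray(33, 33, 33);
    printf("33x33x33 via cudaMalloc3D: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

The trade-off is that whatever computes src must index rows through the pitch instead of assuming the packed layout x + X * (y + Y * z).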

Needless to say, the same copy with cudaMemcpyHostToDevice works, but then I have to allocate memory on the host, do the calculations on the GPU, copy the result back to the host and then copy it “back” to the GPU again… :s
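
If src has to stay a linear, tightly packed buffer, a host round trip can be avoided by repacking it on the device into a cudaMalloc3D allocation, whose pitched pointer can then serve as copyParams.srcPtr. The sketch below assumes that a driver-chosen pitch is acceptable to cudaMemcpy3D, which is the same unverified assumption as above. linearToPitched and repackAndCheck are hypothetical names. The host upload and download exist only to produce test data and to check the repacked layout; the repack itself runs entirely on the GPU.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Copies a tightly packed X*Y*Z float volume into a pitched allocation,
// entirely on the device. Row (y, z) starts at (z * Y + y) * out.pitch bytes.
__global__ void linearToPitched(const float* lin, cudaPitchedPtr out, int X, int Y, int Z)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z;
    if (x >= X || y >= Y || z >= Z) return;

    float* row = (float*)((char*)out.ptr + ((size_t)z * Y + y) * out.pitch);
    row[x] = lin[((size_t)z * Y + y) * X + x];
}

// Hypothetical helper: fills a linear buffer, repacks it on the device and
// checks the pitched copy element by element. Returns true if all match.
bool repackAndCheck(int X, int Y, int Z)
{
    size_t n = (size_t)X * Y * Z;
    std::vector<float> h(n);
    for (size_t i = 0; i < n; ++i) h[i] = (float)i;  // test data only

    float* lin = nullptr;
    if (cudaMalloc((void**)&lin, n * sizeof(float)) != cudaSuccess) return false;
    cudaMemcpy(lin, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    cudaPitchedPtr pitched;  // width in BYTES for cudaMalloc3D
    if (cudaMalloc3D(&pitched, make_cudaExtent(X * sizeof(float), Y, Z)) != cudaSuccess) {
        cudaFree(lin);
        return false;
    }

    dim3 block(16, 16, 1);
    dim3 grid((X + block.x - 1) / block.x, (Y + block.y - 1) / block.y, Z);
    linearToPitched<<<grid, block>>>(lin, pitched, X, Y, Z);

    // Verification: read the Y * Z pitched rows back into a packed host buffer.
    std::vector<float> back(n);
    cudaError_t err = cudaMemcpy2D(back.data(), X * sizeof(float),
                                   pitched.ptr, pitched.pitch,
                                   X * sizeof(float), (size_t)Y * Z,
                                   cudaMemcpyDeviceToHost);

    cudaFree(pitched.ptr);
    cudaFree(lin);
    return err == cudaSuccess && back == h;
}

int main()
{
    printf("33x33x33 repack %s\n", repackAndCheck(33, 33, 33) ? "OK" : "FAILED");
    return 0;
}
```

The extra device-side copy costs one pass over the volume, which is usually far cheaper than going through host memory twice.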