cudaMemcpy3D invalid argument

Hi,

When I run the following code I get an invalid argument error at the line:

cutilSafeCall(cudaMemcpy3D(&copyParams2) );

What am I doing wrong?

Here is the code:

[codebox]
const size_t nx = 541;
const size_t ny = 541;
const size_t nz = 114;
const size_t pad = 4;

dim3 dimBlock(16, 16, 1);
dim3 dimGrid(((nx-2*pad)*(ny-2*pad) -1)/ dimBlock.x+1, (nz-1)/ dimBlock.y+1, 1);

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
const cudaExtent textSize = make_cudaExtent(nx, ny, nz+2*pad);
cutilSafeCall( cudaMalloc3DArray(&d_volumeArray, &channelDesc, textSize) );

float* d_res;
cutilSafeCall(cudaMalloc((void**)&d_res, (nz+2*pad)*nx*ny*sizeof(float)));
cutilSafeCall(cudaMemset(d_res, 0, nx*ny*(nz+2*pad)*sizeof(float)));

setValKernel<<< dimGrid, dimBlock, 0 >>>(nx, ny, nz, d_res);
cutilSafeCall(cudaThreadSynchronize());

cudaMemcpy3DParms copyParams2 = {0};
copyParams2.srcPtr.ptr   = d_res;
copyParams2.srcPtr.pitch = nx*sizeof(float);
copyParams2.srcPtr.xsize = nx;
copyParams2.srcPtr.ysize = ny;
copyParams2.dstArray = d_volumeArray;
copyParams2.extent   = textSize;
copyParams2.kind     = cudaMemcpyDeviceToDevice;
cutilSafeCall(cudaMemcpy3D(&copyParams2) );
[/codebox]

Here are my device properties:

Clock rate=1296000
Device overlap=1
Major=1
Minor=3
Max grid size=65535 65535 1
maxThreadsDim=512 512 64
maxThreadsPerBlock=512
memPitch=262144
multiProcessorCount=30
name=Tesla C1060
regsPerBlock=16384
sharedMemPerBlock=16384
textureAlignment=256
totalConstMem=65536
totalGlobalMem=4294705152
warpSize=32

Thanks for the help.

sd.

Edit: Sorry, I misread your code (I hate those code boxes with scroll bars). My guess is that because d_res is allocated with cudaMalloc rather than cudaMalloc3D, you should pass it as a naked pointer rather than in a cudaPitchedPtr.

I forgot to say that everything is fine in emulation mode.

Thanks for your answer.

Unfortunately, cudaMemcpy3DParms only accepts a cudaPitchedPtr or a cudaArray as source and destination, so I cannot pass a naked pointer.

Apparently, cudaMemcpy3D requires the src and dst memory to be aligned, although the exact requirements are not clear to me yet.
Linear src or dst memory should therefore be allocated with cudaMallocPitch or cudaMalloc3D rather than plain cudaMalloc.
I hope this helps anybody who has the same problem.
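
For anyone who wants to try this, here is a minimal sketch of the idea, assuming the same volume layout as in the original post. The function name copyPitchedToArray and its pitchOut parameter are made up for illustration and do not come from the original code. The source buffer is allocated with cudaMalloc3D, so the runtime chooses the row pitch, and the resulting cudaPitchedPtr is handed to cudaMemcpy3D unchanged. I have not verified that this removes the error on every driver version, so treat it as a starting point only.

[codebox]
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): allocates a pitched
// source buffer with cudaMalloc3D, zeroes it, and copies it into a newly
// allocated 3D array. Returns the first CUDA error seen, or cudaSuccess.
cudaError_t copyPitchedToArray(size_t nx, size_t ny, size_t depth, size_t* pitchOut)
{
    cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();

    // Array extents are given in elements, linear-memory extents in bytes.
    const cudaExtent arrayExtent  = make_cudaExtent(nx, ny, depth);
    const cudaExtent linearExtent = make_cudaExtent(nx * sizeof(float), ny, depth);

    cudaArray* d_volumeArray = 0;
    cudaPitchedPtr d_res = make_cudaPitchedPtr(0, 0, 0, 0);

    cudaError_t err = cudaMalloc3DArray(&d_volumeArray, &channelDesc, arrayExtent);
    if (err == cudaSuccess) err = cudaMalloc3D(&d_res, linearExtent);
    if (err == cudaSuccess) err = cudaMemset3D(d_res, 0, linearExtent);

    // A kernel filling d_res would go here. It must step from row to row
    // by d_res.pitch bytes, not by nx * sizeof(float).

    if (err == cudaSuccess) {
        cudaMemcpy3DParms copyParams = {0};
        copyParams.srcPtr   = d_res;          // pitch was chosen by the runtime
        copyParams.dstArray = d_volumeArray;
        copyParams.extent   = arrayExtent;    // in elements, since an array is involved
        copyParams.kind     = cudaMemcpyDeviceToDevice;
        err = cudaMemcpy3D(&copyParams);
    }

    if (err == cudaSuccess && pitchOut) *pitchOut = d_res.pitch;

    cudaFree(d_res.ptr);                      // freeing a null pointer is a no-op
    if (d_volumeArray) cudaFreeArray(d_volumeArray);
    return err;
}
[/codebox]

With the sizes from the original post the call would be copyPitchedToArray(541, 541, 114 + 2*4, &pitch). The main consequence of this approach is that any kernel writing to the buffer has to index rows through the returned pitch instead of assuming tightly packed rows.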

I have tried aligning the data on the host side with align(64).
The two functions you refer to allocate memory on the device side only, for which I use cudaMalloc3DArray.
Is there a similar function to use for the host side?

MW