I need to copy (device to device) a large memory area allocated with cudaMallocPitch into a cudaArray after one of my kernels has run.
Unfortunately, the pitch seems too large (4*640*480 bytes), since cudaMemcpy2DToArray returns cudaErrorInvalidPitchValue (!)
What is the maximum pitch that CUDA allows, and why?
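A minimal sketch of how the limits could be queried at runtime, assuming device 0 and a toolkit whose cudaDeviceProp exposes the memPitch, maxTexture1D and maxTexture2D fields. memPitch is documented as the maximum pitch in bytes allowed by memory copies; the texture fields give the maximum texture dimensions in elements.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Assumption: the device of interest is device 0.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // Maximum pitch, in bytes, accepted by the memory copy functions.
    printf("memPitch     : %zu bytes\n", prop.memPitch);

    // Maximum 1D and 2D texture dimensions, in elements.
    printf("maxTexture1D : %d\n", prop.maxTexture1D);
    printf("maxTexture2D : %d x %d\n",
           prop.maxTexture2D[0], prop.maxTexture2D[1]);
    return 0;
}
```

Comparing the pitch passed to cudaMemcpy2DToArray against the printed memPitch value should show whether the pitch itself is what the runtime rejects.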
I have now tried cudaMemcpyToArray with cudaMalloc instead, but I get cudaErrorInvalidValue… What does it mean? Is there any documentation that describes the cudaError codes?
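A small sketch of how the runtime itself can describe an error code, assuming a toolkit that provides both cudaGetErrorName and cudaGetErrorString. It only prints the runtime's own name and description for the two codes mentioned here; it does not explain which argument was rejected.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // The two error codes returned by the copies described above.
    const cudaError_t codes[] = { cudaErrorInvalidValue,
                                  cudaErrorInvalidPitchValue };

    for (cudaError_t code : codes) {
        // cudaGetErrorName returns the enum name as a string,
        // cudaGetErrorString returns a short human-readable description.
        printf("%d  %s: %s\n", static_cast<int>(code),
               cudaGetErrorName(code), cudaGetErrorString(code));
    }
    return 0;
}
```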
Similarly, cudaMallocArray( &pFooArray, &cf, 640*480, 1 ) doesn’t work, but cudaMallocArray( &pFooArray, &cf, 640, 480 ) is OK.
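For comparison, a minimal sketch of the 640 x 480 layout that does allocate, with the source buffer also allocated as 640 x 480 so that the pitch is one row rather than the whole image. The float element type and the CHECK macro are assumptions made for illustration and are not taken from my actual code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with the runtime's description on any error.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t e_ = (call);                                     \
        if (e_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s failed: %s\n", #call,                \
                    cudaGetErrorString(e_));                         \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

int main() {
    const size_t width  = 640;   // elements per row
    const size_t height = 480;   // rows
    const size_t rowBytes = width * sizeof(float);  // assumed 4-byte elements

    // Pitched source buffer: the returned pitch is the padded size of ONE row.
    float* dSrc = nullptr;
    size_t pitch = 0;
    CHECK(cudaMallocPitch((void**)&dSrc, &pitch, rowBytes, height));
    CHECK(cudaMemset2D(dSrc, pitch, 0, rowBytes, height));

    // Destination array with the same 2D extent, given in elements.
    cudaChannelFormatDesc cf = cudaCreateChannelDesc<float>();
    cudaArray_t pFooArray = nullptr;
    CHECK(cudaMallocArray(&pFooArray, &cf, width, height));

    // Device-to-device copy: source pitch and copy width are in bytes.
    CHECK(cudaMemcpy2DToArray(pFooArray, 0, 0, dSrc, pitch,
                              rowBytes, height, cudaMemcpyDeviceToDevice));

    printf("row pitch returned by cudaMallocPitch: %zu bytes\n", pitch);

    CHECK(cudaFreeArray(pFooArray));
    CHECK(cudaFree(dSrc));
    return 0;
}
```

With this layout the pitch passed to cudaMemcpy2DToArray is on the order of 640 * 4 bytes instead of 4*640*480 bytes.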
So, where can I find the CUDA specification that would save me from losing time on such things? (These limits are neither in the programming guide nor in the reference manual.)
EDIT (second): I use an NVIDIA Tesla with 1.5 GB and I am sure I am not overloading the GPU LMEM (!)