I have an involved application, which runs perfectly with driver 256.35 (Linux FC10, CUDA 3.1, GTX480).
After upgrading to driver 260.19.21, I found that the function “cudaMemcpy3DAsync” returns the error code “invalid argument”. Upgrading CUDA to 3.2 does not fix the issue.
Some more details on the setup of the call to “cudaMemcpy3DAsync”:
- copyParams.dstPtr - 3D array of size (COL x ROW x DEPTH)
copyParams.dstPos.x = 0
copyParams.dstPos.y = 0
copyParams.dstPos.z = z, where z < DEPTH
- copyParams.srcPtr = make_cudaPitchedPtr((void*)data_ptr,COL*sizeof(float),COL,ROW);
data_ptr - pointer to GPU memory. The memory allocated and initialized is large enough for the transfer
copyParams.srcPos.x = 0
copyParams.srcPos.y = 0
copyParams.srcPos.z = 0
- copyParams.extent = make_cudaExtent(COL,ROW,10);
where z + 10 < DEPTH
- copyParams.kind = cudaMemcpyDeviceToDevice
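For reference, the setup above can be condensed into a small standalone program. This is only a sketch under assumptions, not the code from my application: the sizes COL, ROW and DEPTH and the value of z are made-up placeholders, and the destination is assumed to be a cudaArray allocated with cudaMalloc3DArray and passed through dstArray (if the destination were pitched linear memory instead, it would go through dstPtr and the extent width would be in bytes).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder sizes, chosen only for illustration.
#define COL   64
#define ROW   64
#define DEPTH 32
#define SLAB  10   // number of z-slices copied per call

// Print the CUDA error string for a failed call.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess)
        printf("%s failed: %s\n", what, cudaGetErrorString(err));
}

int main()
{
    // Destination: assumed to be a 3D cudaArray of floats (COL x ROW x DEPTH).
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray *dst = NULL;
    check(cudaMalloc3DArray(&dst, &desc, make_cudaExtent(COL, ROW, DEPTH)),
          "cudaMalloc3DArray");

    // Source: linear device memory holding SLAB tightly packed slices.
    size_t bytes = (size_t)COL * ROW * SLAB * sizeof(float);
    float *data_ptr = NULL;
    check(cudaMalloc((void**)&data_ptr, bytes), "cudaMalloc");
    check(cudaMemset(data_ptr, 0, bytes), "cudaMemset");

    int z = 4;  // placeholder; any z with z + SLAB < DEPTH

    cudaMemcpy3DParms copyParams = {0};
    copyParams.srcPtr   = make_cudaPitchedPtr((void*)data_ptr,
                                              COL * sizeof(float), COL, ROW);
    copyParams.srcPos   = make_cudaPos(0, 0, 0);
    copyParams.dstArray = dst;                       // assumption, see above
    copyParams.dstPos   = make_cudaPos(0, 0, z);
    copyParams.extent   = make_cudaExtent(COL, ROW, SLAB);  // elements, since a cudaArray is involved
    copyParams.kind     = cudaMemcpyDeviceToDevice;

    // The call that reports "invalid argument" in my application.
    check(cudaMemcpy3DAsync(&copyParams, 0), "cudaMemcpy3DAsync");

    // cudaThreadSynchronize is the synchronization call in CUDA 3.x;
    // later toolkits call it cudaDeviceSynchronize.
    check(cudaThreadSynchronize(), "cudaThreadSynchronize");

    cudaFree(data_ptr);
    cudaFreeArray(dst);
    return 0;
}
```

If the async call succeeds, the program prints nothing. I have not confirmed that this reduced version triggers the error on its own.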
I found three modifications of the code, each of which independently eliminates the error:
- Changing “cudaMemcpy3DAsync” to “cudaMemcpy3D”, preceded by “cudaThreadSynchronize”. The code runs with no errors and produces the correct result, at some performance penalty of course
- Changing: copyParams.extent = make_cudaExtent(COL,1,1);
The resulting transfer is not correct for me but the code runs without errors
- Changing data_ptr to point to host memory and changing copyParams.kind = cudaMemcpyHostToDevice.
Again, this is not what I need, but it shows that the parameters of the transfer are set correctly; the code runs with no errors
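The first modification can be wrapped in a small helper so that the asynchronous path is still used wherever it works. This is a stopgap sketch only; the function name copy3DWithFallback is hypothetical and not part of the CUDA runtime. It relies on “invalid argument” being the string for cudaErrorInvalidValue.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: try the asynchronous copy first and, only if it is
// rejected with cudaErrorInvalidValue ("invalid argument"), fall back to
// cudaThreadSynchronize followed by the synchronous cudaMemcpy3D.
cudaError_t copy3DWithFallback(const cudaMemcpy3DParms *p, cudaStream_t stream)
{
    cudaError_t err = cudaMemcpy3DAsync(p, stream);
    if (err != cudaErrorInvalidValue)
        return err;                 // success, or an unrelated error

    cudaGetLastError();             // reset the recorded error before retrying

    err = cudaThreadSynchronize();  // wait for all previously issued work
    if (err != cudaSuccess)
        return err;

    return cudaMemcpy3D(p);         // blocking copy with the same parameters
}
```

Note that the fallback also hides a genuinely wrong parameter set until cudaMemcpy3D rejects it, so the return value still needs to be checked by the caller.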
Therefore, the problem occurs only when all of the following hold:
Device-to-device transfer AND a memory block with each of its 3 dimensions > 1 AND Async mode
I suspect that there is a bug in driver 260.19.21. Could anyone from NVIDIA address this?