I encountered something a bit curious. I used to declare an array like this:
channelDesc = cudaCreateChannelDesc(32, 32, 0, 0, cudaChannelFormatKindSigned);
cudaMallocArray(&array_gpu, &channelDesc, Size1, Size2);
cudaMemcpyToArray(array_gpu, 0, 0, Array, 2*Size1*Size2*sizeof(int), cudaMemcpyHostToDevice);
As my model grew, Size2 reached the height limit of 32768, so cudaMallocArray returned ‘invalid argument’. Fair enough, I knew about that limit, so I decided to swap width and height (Size1 is much smaller). So now I have this:
channelDesc = cudaCreateChannelDesc(32, 32, 0, 0, cudaChannelFormatKindSigned);
cudaMallocArray(&array_gpu, &channelDesc, Size2, Size1);
cudaMemcpyToArray(array_gpu, 0, 0, Array, 2*Size1*Size2*sizeof(int), cudaMemcpyHostToDevice);
As expected, cudaMallocArray no longer returns an error. But this time I get ‘invalid argument’ from cudaMemcpyToArray. Any ideas? I may be missing something obvious, but swapping width and height doesn’t change the size of the array, so it shouldn’t change anything for cudaMemcpyToArray. No?
FYI, Array is allocated on the CPU like this:
int * Array = new int[2*Size1*Size2];