Problem with copying a 2D array

I am currently working on my first CUDA project and ran into a problem with 2D arrays. Consider the following code snippet:

cudaStatus = cudaMallocPitch(&dev_traces, &pitch, N * sizeof(int), M);
cudaStatus = cudaMemcpy2D(dev_traces, pitch, traces, N * sizeof(int), N * sizeof(int), M, cudaMemcpyHostToDevice);

traces has dimensions N and M, both of which should be 10000. With N=M=480 it seems to work fine, but as soon as I use larger values (e.g., N=M=500), it crashes with an access violation while reading in cudaMemcpy2D().

Initialization of traces:

int **traces;
traces = new int *[POINTS_PER_TRACE];
for (int i = 0; i < POINTS_PER_TRACE; i++)
     traces[i] = new int[NUMBER_OF_TRACES];

Do you have any idea what could be wrong here?

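cudaMemcpy2D() expects an array that is stored contiguously, as either linear memory or pitch-linear memory. Your host data structure, however, is an array of pointers to separately allocated column (or row) vectors, which could be located anywhere in memory. Passing traces as the source therefore makes the copy read from the pointer array itself and run past its end.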

There are basically two ways to address this mismatch in data structure representation: (1) change the host-side allocation to a single contiguous allocation of (row * column) elements, or (2) use one-dimensional cudaMemcpy() to copy each individual column (or row) vector to the device. In terms of performance, the first option is preferable.
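Here is a rough sketch of both options, in case it helps. The helper names (make_traces, at, copy_traces_to_device, copy_rows_to_device) are made up for illustration and are not from the original code. It assumes row-major storage with cols ints per row. The CUDA part is only compiled under nvcc; the host-side part builds with any C++ compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// Option 1 (hypothetical helper): one contiguous host block of rows * cols
// ints, replacing the int** array of separately allocated vectors.
std::vector<int> make_traces(std::size_t rows, std::size_t cols) {
    return std::vector<int>(rows * cols, 0);
}

// Element (r, c) of a row-major block with `cols` elements per row.
inline int& at(std::vector<int>& t, std::size_t cols,
               std::size_t r, std::size_t c) {
    assert(r * cols + c < t.size());
    return t[r * cols + c];
}

#ifdef __CUDACC__
// Option 1: allocate pitched device memory and copy the contiguous block
// with a single cudaMemcpy2D(). The source pitch equals the row width in
// bytes because the host rows are packed back to back.
cudaError_t copy_traces_to_device(const std::vector<int>& traces,
                                  std::size_t rows, std::size_t cols,
                                  int** dev_traces, std::size_t* pitch) {
    cudaError_t status = cudaMallocPitch(reinterpret_cast<void**>(dev_traces),
                                         pitch, cols * sizeof(int), rows);
    if (status != cudaSuccess) return status;
    return cudaMemcpy2D(*dev_traces, *pitch,
                        traces.data(), cols * sizeof(int),
                        cols * sizeof(int), rows,
                        cudaMemcpyHostToDevice);
}

// Option 2: keep the int** host layout and copy one vector at a time into
// device memory previously allocated with cudaMallocPitch().
cudaError_t copy_rows_to_device(int* const* host_rows,
                                std::size_t rows, std::size_t cols,
                                int* dev_traces, std::size_t pitch) {
    for (std::size_t r = 0; r < rows; ++r) {
        // The pitch is in bytes, so do the row offset on a char pointer.
        char* dst = reinterpret_cast<char*>(dev_traces) + r * pitch;
        cudaError_t status = cudaMemcpy(dst, host_rows[r],
                                        cols * sizeof(int),
                                        cudaMemcpyHostToDevice);
        if (status != cudaSuccess) return status;
    }
    return cudaSuccess;
}
#endif
```

With the first option, traces[i][j] in the existing host code becomes at(traces, cols, i, j), and the whole array goes to the device in one call instead of one call per vector.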

Thank you, njuffa! I chose the first option and it solved the problem.