Dear all, while practicing CUDA I ran into a problem with the following code.

```
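#include <cstdio>
#include <cuda_runtime.h>

// A and B are N x N host matrices, stored as arrays of N row pointers.
void testCUDA(float** A, float** B, int N){
    float *d_A = NULL;
    size_t pitch_A;                          // row stride in bytes, set by cudaMallocPitch
    size_t col_size_A = N * sizeof(float);   // bytes of data in one row
    cudaError_t err = cudaMallocPitch(&d_A, &pitch_A, col_size_A, N);
    // Copy A to the device one row at a time, then copy each row back into B.
    for (int i = 0; i < N && err == cudaSuccess; i++)
        err = cudaMemcpy((char*)d_A + i*pitch_A, A[i], col_size_A, cudaMemcpyHostToDevice);
    for (int i = 0; i < N && err == cudaSuccess; i++)
        err = cudaMemcpy(B[i], (char*)d_A + i*pitch_A, col_size_A, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess)
        printf("CUDA error: %s\n", cudaGetErrorString(err));
    cudaFree(d_A);   // release the device buffer
}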
```

After calling testCUDA(A, B, N), matrix B should be identical to matrix A.
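To make "B is the same as A" concrete, here is a minimal host-side sketch of one way the comparison could be done. It is only an illustration: the helper name `firstMismatchedRow` is hypothetical and not part of the code above, and it assumes both matrices are stored as arrays of N row pointers, as in testCUDA. Since the data is only copied and never computed on, a byte-for-byte comparison per row should be enough.

```cpp
#include <cstring>

// Hypothetical helper (not from the original post): returns the index of the
// first row in which B differs from A, or -1 if all N rows match byte for byte.
int firstMismatchedRow(float** A, float** B, int N) {
    for (int i = 0; i < N; i++) {
        // Each row holds N floats; compare the raw bytes of row i.
        if (std::memcmp(A[i], B[i], N * sizeof(float)) != 0)
            return i;
    }
    return -1;
}
```

Knowing which row first goes wrong (row 0, or only rows past some index) may help narrow down whether the problem is in the copies or in how A and B were allocated on the host.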

This works as long as N is at most 1000. When I try N > 1000, B no longer matches A, and I don't know what is going wrong. Any advice would be appreciated. Thanks, all!