I've encountered a strange problem in a simple situation: the program randomly hangs on the cudaMemcpy() call. Here is the code:
#include <cuda_runtime.h>

const int width = 256;
const int height = 256;

int main()
{
    // Host matrix: an array of row pointers, each row allocated separately
    float** A = new float*[width];
    int i = 0;
    int j = 0;
    for (i = 0; i < width; i++)
    {
        A[i] = new float[height];
        for (j = 0; j < height; j++)
            A[i][j] = 2.0;
    }
    float** d_A = new float*[width];
    size_t size = width * height * sizeof(float);
    cudaMalloc(&d_A, size);
    // Copy size bytes starting at A to the device
    cudaMemcpy(d_A, A, size, cudaMemcpyHostToDevice);
}
This code compiles correctly, but the result is unpredictable: sometimes it runs correctly, and sometimes it crashes with exception code c0000005 (access violation). Can someone tell me what's wrong?
Configuration used: Core i7 930 @ 4200 MHz, Eclipse SLI X58, 6 GB RAM (3 x 2 GB Kingston 1600 MHz), 2 x GTX470 in SLI, Windows 7 x64 SP1, MS VS 2008, CUDA SDK 3.2, ForceWare 266.58 and 263.06 (developer driver).
Yes. And once you are done allocating the device memory row by row, you will have to repeat the same per-row loop to populate it with data from the host, and then repeat it again to copy the data back from the device to the host. That is a lot of extra loops and copies, which is why indexing into linear memory, rather than using arrays of pointers, is usually the better choice for multidimensional data.
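A minimal sketch of the linear-memory approach, under stated assumptions: the helper names idx, make_matrix and copy_matrix are hypothetical, and std::memcpy stands in for cudaMemcpy so the example runs on the host without a GPU. It also shows why the original call misbehaves. A points to an array of 256 row pointers, which is only 256 * sizeof(float*) bytes (2048 on a 64-bit build, 1024 on a 32-bit one), but cudaMemcpy(d_A, A, size, ...) is asked to read size = 256 * 256 * 4 = 262144 bytes starting at that address. It copies pointer values instead of floats and reads far past the end of the pointer array, so whether it runs into unmapped memory (c0000005) depends on the heap layout. With one contiguous buffer, a single copy in each direction is enough.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Row-major offset of element (i, j) in a matrix with `height` entries per row,
// matching the A[i][j] layout in the question: row i starts at i * height.
inline std::size_t idx(int i, int j, int height)
{
    return static_cast<std::size_t>(i) * height + j;
}

// One contiguous host buffer of width * height floats, all set to `value`,
// replacing the `width` separately allocated rows of the float** version.
std::vector<float> make_matrix(int width, int height, float value)
{
    return std::vector<float>(static_cast<std::size_t>(width) * height, value);
}

// Copies the whole matrix in a single call, which works because the data is
// contiguous. std::memcpy stands in for the device transfer; with the CUDA
// runtime the equivalent would be
//   cudaMemcpy(d_A, A.data(), bytes, cudaMemcpyHostToDevice);
// where d_A is a float* obtained from cudaMalloc((void**)&d_A, bytes).
void copy_matrix(float* dst, const float* src, int width, int height)
{
    assert(dst != nullptr && src != nullptr);
    const std::size_t bytes =
        static_cast<std::size_t>(width) * height * sizeof(float);
    std::memcpy(dst, src, bytes);
}
```

In a kernel, each thread would derive i and j from blockIdx and threadIdx and access d_A[i * height + j]. The device pointer itself would be a plain float*, not a float** initialised with new as in the question, since cudaMalloc simply overwrites it (leaking that host allocation). If padded rows are wanted on the device, cudaMallocPitch and cudaMemcpy2D are the runtime's 2D alternatives.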