Application randomly hangs on cudaMemcpy()

Hi everyone!

I’ve run into a strange problem in a simple situation: the program randomly hangs in cudaMemcpy(). Here is the code:

const int width = 256;
const int height = 256;

int main()
{
	float** A = new float*[width];
	int i = 0;
	int j = 0;
	for (i = 0; i < width; i++)
	{
		A[i] = new float[height];
		for (j = 0; j < height; j++)
			A[i][j] = 2.0;
	}

	float** d_A = new float*[width];
	size_t size = width * height * sizeof(float);
	cudaMalloc(&d_A, size);
	cudaMemcpy(d_A, A, size, cudaMemcpyHostToDevice);
}

This code compiles without errors, but the result of running it is unpredictable: sometimes it runs correctly, and sometimes it crashes with exception code c0000005. Can someone tell me what’s wrong? Configuration used: Core i7 930 @ 4200 MHz, Eclipse SLI X58, 6 GB RAM (3x2 Kingston 1600 MHz), 2 x GTX 470 in SLI, Windows 7 x64 SP1, MS VS 2008, CUDA SDK 3.2, ForceWare 266.58 and 263.06 (developer driver).

Thanks.

The block A points to is sizeof(float*) * width bytes, not width * height * sizeof(float) bytes, so cudaMemcpy() reads past the end of it.

Could you please explain why? A has to hold a 2D array in memory (256 * 256 * 4 bytes of floats).

No, A is a pointer array of size width, each pointer pointing to an array of size height.
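To make the size mismatch concrete, here is a small host-only C++ sketch. The helper names (pointer_table_bytes, requested_bytes) are made up for illustration and do not appear in the original program.

```cpp
#include <cassert>
#include <cstddef>

// Bytes in the block that A itself points to: `width` pointers and
// nothing else. The float rows live in separate allocations.
// (Hypothetical helper, for illustration only.)
std::size_t pointer_table_bytes(int width)
{
    assert(width >= 0);
    return static_cast<std::size_t>(width) * sizeof(float*);
}

// Bytes the original code asks cudaMemcpy() to read, starting at A.
// (Hypothetical helper, for illustration only.)
std::size_t requested_bytes(int width, int height)
{
    assert(width >= 0 && height >= 0);
    return static_cast<std::size_t>(width) * height * sizeof(float);
}
```

With width = height = 256 and 8-byte pointers, the first value is 2048 bytes and the second is 262144 bytes. The copy therefore runs far past the end of the pointer table, which would explain why it sometimes appears to work and sometimes ends in an access violation.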

So I must first allocate memory for the row of pointers and then allocate memory for each column?

Yes. And once you are done allocating the memory row by row, you will have to walk the rows the same way to populate the device memory with data from the host, and then once more to copy the data back from the device to the host. That is a lot of looping, and it is why indexing into linear memory, rather than using pointers for multidimensional data, is usually the better choice.
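A minimal sketch of the linear-memory approach might look like the following. The names flat_index, make_matrix and upload_matrix are hypothetical, row-major layout is an assumption, and the CUDA part is guarded so that it is only compiled by nvcc; the host-side indexing builds with any C++ compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// Offset of element (row, col) in a row-major buffer with `width` columns.
inline std::size_t flat_index(int row, int col, int width)
{
    assert(row >= 0 && col >= 0 && col < width);
    return static_cast<std::size_t>(row) * width + col;
}

// Builds the matrix as one contiguous block instead of an array of
// row pointers, and fills every element with `value`.
std::vector<float> make_matrix(int width, int height, float value)
{
    std::vector<float> m(static_cast<std::size_t>(width) * height);
    for (int row = 0; row < height; ++row)
        for (int col = 0; col < width; ++col)
            m[flat_index(row, col, width)] = value;
    return m;
}

#ifdef __CUDACC__
// Uploads the whole matrix with one allocation and one copy.
// Returns the device pointer, or NULL on failure; the caller is
// responsible for releasing it with cudaFree().
float* upload_matrix(const std::vector<float>& m)
{
    if (m.empty())
        return NULL;

    float* d_m = NULL;
    const std::size_t bytes = m.size() * sizeof(float);

    if (cudaMalloc(reinterpret_cast<void**>(&d_m), bytes) != cudaSuccess)
        return NULL;

    if (cudaMemcpy(d_m, &m[0], bytes, cudaMemcpyHostToDevice) != cudaSuccess)
    {
        cudaFree(d_m);
        return NULL;
    }
    return d_m;
}
#endif
```

Because the host buffer is contiguous, the size passed to cudaMalloc() and cudaMemcpy() matches what the host pointer actually covers, and a kernel can address element (row, col) with the same row * width + col arithmetic.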

Thanks a lot! Now it’s clear to me.