cudaMemcpy not copying?


I usually use cudaMemcpy without any problem, but here I have run into one. For some reason, the values in matrixGPU (on the GPU) do not seem to be copied into matrix (on the CPU).

While debugging and inspecting the memory in compute_kernel, I can see that nodeGPU holds the right values (so the first memcpy works), and so does matrixGPU inside the kernel. Back in compute_distances, however, matrix holds no values.

If I replace my kernel with CPU code, and matrixGPU with matrix (and nodeGPU with node), then the values in matrix are correct, even in the compute_distances function.

It looks as if cudaMemcpy is not copying anything… Do you have any idea why?

Can you please help me quickly? I need to sort this out today…

Thanks a lot.

Here is my code (important parts only). No errors are reported for the memory allocations or the memory copies.

extern long int ** compute_distances(struct point *node)
{
	long int **matrixGPU;
	long int **matrix;
	struct point *nodeGPU;
	cudaError_t cudaStatus;
	int mem_size, mem_size2;

	mem_size = // VALUE // ;
	mem_size2 = // VALUE // ;

	if((matrix = (long **) malloc(mem_size)) == NULL){
		/* ... */
	}

	cudaMalloc ( (void **) &matrixGPU, mem_size);
	cudaMalloc ( (void **) &nodeGPU, mem_size2);

	cudaMemcpy(nodeGPU, node, mem_size2, cudaMemcpyHostToDevice);

	compute_kernel<<<1,1>>>(matrixGPU, nodeGPU);

	cudaStatus = cudaMemcpy(matrix, matrixGPU, mem_size, cudaMemcpyDeviceToHost);

	if (cudaStatus != cudaSuccess) {
		fprintf(stderr, "cudaMemcpy failed!");
	}

	return matrix;
}

The kernel header is

__global__ void compute_kernel(long int **matrix, struct point * node)

Replace all the double pointers (**) with single pointers (*).
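To make the single-pointer suggestion concrete, here is a minimal host-side sketch in plain C of a matrix kept in one contiguous block and addressed as i * cols + j. The names FlatMatrix, flat_index, flat_alloc, flat_set and flat_get are hypothetical helpers invented for this illustration; they do not appear in the original code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper type: a rows x cols matrix stored in ONE block. */
typedef struct {
    size_t rows;
    size_t cols;
    long *data;   /* single pointer to rows * cols elements */
} FlatMatrix;

/* Row-major offset of element (i, j). */
size_t flat_index(size_t i, size_t j, size_t cols)
{
    return i * cols + j;
}

/* Allocate a zero-initialised matrix; data is NULL if allocation fails. */
FlatMatrix flat_alloc(size_t rows, size_t cols)
{
    FlatMatrix m = { rows, cols, calloc(rows * cols, sizeof(long)) };
    return m;
}

/* Store v at (i, j), checking the bounds in debug builds. */
void flat_set(FlatMatrix *m, size_t i, size_t j, long v)
{
    assert(i < m->rows && j < m->cols);
    m->data[flat_index(i, j, m->cols)] = v;
}

/* Read the element at (i, j). */
long flat_get(const FlatMatrix *m, size_t i, size_t j)
{
    assert(i < m->rows && j < m->cols);
    return m->data[flat_index(i, j, m->cols)];
}
```

Because the block contains only values and no addresses, a buffer of rows * cols * sizeof(long) bytes laid out this way can be allocated with cudaMalloc and copied in either direction with a single cudaMemcpy, and the kernel can take a plain long * and use the same index formula.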

OK, thanks for answering so quickly. It appears I have now run into another problem, this time inside the kernel.

To avoid the double pointer, I defined (simplified code):

typedef struct Matrix
{
	long int ** content;
} Matrix;

and used:

Matrix matrixGPU;
Matrix matrix;

matrix.content = (long **) malloc(mem_size);
cudaMalloc ( (void **) &(matrixGPU.content), mem_size);

cudaStatus = cudaMemcpy(&matrix, &matrixGPU, mem_size, cudaMemcpyDeviceToHost);

and, in the kernel:

Matrix *matrix; // function parameter

matrix->content[i] = (long int*) (matrix->content + numbers);

Same story: this works on the CPU but crashes on the GPU.

Is this not legal?

While debugging, Nsight reported an “access violation” error on this line.

Thanks again for your help and your fast answer.
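One likely explanation, sketched here in plain host-side C: wrapping the long ** in a struct does not remove the double pointer, and a block that stores row pointers aimed at itself does not survive a byte-for-byte copy. The copied pointers still hold addresses from the original block. With cudaMemcpy those would be device addresses that the host cannot dereference (or host addresses that the kernel cannot dereference). Note also that &matrix and &matrixGPU in the snippet above are addresses of host-side structs, not of the buffers held in content. The functions build_self_pointing, raw_copy and rebase_rows are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Bytes needed for `rows` row pointers followed by rows * cols longs. */
size_t block_bytes(size_t rows, size_t cols)
{
    return rows * sizeof(long *) + rows * cols * sizeof(long);
}

/* Aim every row pointer at its row inside THIS block. */
void rebase_rows(long **block, size_t rows, size_t cols)
{
    assert(block != NULL);
    long *data = (long *)(block + rows);   /* data follows the pointer table */
    for (size_t i = 0; i < rows; ++i)
        block[i] = data + i * cols;
}

/* One allocation: pointer table first, zeroed data after it. */
long **build_self_pointing(size_t rows, size_t cols)
{
    long **block = malloc(block_bytes(rows, cols));
    if (block == NULL)
        return NULL;
    memset(block, 0, block_bytes(rows, cols));
    rebase_rows(block, rows, cols);
    return block;
}

/* Byte-for-byte copy, which is all memcpy (or cudaMemcpy) does. */
long **raw_copy(long **src, size_t rows, size_t cols)
{
    long **dst = malloc(block_bytes(rows, cols));
    if (dst != NULL)
        memcpy(dst, src, block_bytes(rows, cols));
    return dst;
}
```

After raw_copy, the row pointers in the copy still equal the row pointers of the original, so indexing through the copy silently reads the original block. Calling rebase_rows on the copy repairs them on the host. Across the CPU/GPU boundary, storing offsets instead of addresses (or using one flat long * buffer) avoids the problem altogether.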

I’m not sure what you are trying to achieve, but I get the impression that you need to read up on basic pointer use in C and on the difference between a pointer and an array.

Just guessing at the reason for the confusion: int **matrix and int matrix[NX][NY] are very different things in C. Only the latter is a two-dimensional array that could be used as a matrix.
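The difference can be checked directly in plain C. The sketch below uses illustrative dimensions NX = 3 and NY = 4 (chosen here, not taken from the thread) and hypothetical function names: the two-dimensional array is one contiguous object with no pointers in it, whereas an int ** is a single pointer that only works with m[i][j] syntax once a separate table of row pointers exists.

```c
#include <assert.h>
#include <stddef.h>

enum { NX = 3, NY = 4 };   /* illustrative sizes only */

/* A true 2D array occupies NX * NY ints in one contiguous object. */
size_t array2d_bytes(void)
{
    int grid[NX][NY];
    return sizeof grid;
}

/* Consecutive rows sit exactly NY ints apart, with nothing in between. */
size_t array2d_row_stride_bytes(void)
{
    int grid[NX][NY];
    return (size_t)((char *)grid[1] - (char *)grid[0]);
}

/* An int ** is just one pointer, whatever the matrix dimensions are. */
size_t double_pointer_bytes(void)
{
    int **rows = NULL;
    return sizeof rows;
}

/* Indexing through int ** needs an explicit table of row pointers. */
int read_via_row_table(void)
{
    int grid[NX][NY] = {{0}};
    int *table[NX];
    for (int i = 0; i < NX; ++i)
        table[i] = grid[i];          /* each entry is the address of one row */
    int **m = table;
    assert(m[0] == &grid[0][0]);
    grid[2][1] = 5;
    return m[2][1];                  /* two loads: table entry, then element */
}
```

The row table is the part that causes trouble with cudaMemcpy: it holds addresses, and addresses are only meaningful in the memory space where they were created.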

Yes, I know the difference, don’t worry.
A double pointer is only a 2D table if memory is allocated for that purpose.
Mine was a linear table meant to be used as a 2D one.

Thanks anyway. I could not fix the problem, but I’m still working on it.