Hi,

I usually use cudaMemcpy without any problem, but here I am facing one. For some unknown reason, the values in matrixGPU (on the GPU) do not seem to be copied into matrix (on the CPU).

While debugging and inspecting the memory in compute_kernel, I can see that nodeGPU holds the right values (so the first memcpy works), and so does matrixGPU inside the kernel. But back in compute_distances, matrix has no values.

If I replace the kernel with CPU code, using matrix instead of matrixGPU and node instead of nodeGPU, then the values in matrix are correct, even in the compute_distances function.

It looks as if cudaMemcpy is not copying anything… Do you have any idea why?

Could you please help me quickly? I need to sort this out today…

Thanks a lot.

Here is my code (only the important parts). No errors are reported for the memory allocations or the memory copies.

```
extern long int **compute_distances(struct point *node)
{
    long int **matrixGPU;     /* device buffer written by the kernel */
    long int **matrix;        /* host buffer returned to the caller */
    struct point *nodeGPU;    /* device copy of node */
    cudaError_t cudaStatus;
    int mem_size, mem_size2;

    mem_size = // VALUE // ;
    mem_size2 = // VALUE // ;

    if ((matrix = (long int **) malloc(mem_size)) == NULL) {
        exit(1);
    }

    cudaMalloc((void **) &matrixGPU, mem_size);
    cudaMalloc((void **) &nodeGPU, mem_size2);

    /* Host -> device: input points */
    cudaMemcpy(nodeGPU, node, mem_size2, cudaMemcpyHostToDevice);

    compute_kernel<<<1,1>>>(matrixGPU, nodeGPU);

    /* Device -> host: this is the copy that seems to return nothing */
    cudaStatus = cudaMemcpy(matrix, matrixGPU, mem_size, cudaMemcpyDeviceToHost);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed!");
    }

    cudaFree(matrixGPU);
    cudaFree(nodeGPU);

    return matrix;
}
```

The kernel signature is:

```
__global__ void compute_kernel(long int **matrix, struct point * node)
```
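
Since the code above only tests the status of the last cudaMemcpy, here is a minimal sketch of how every step of such a round trip could be checked. Kernel launches do not return a status themselves, so a launch error only shows up through cudaGetLastError or a later synchronizing call. Everything named here (`CUDA_CHECK`, `fill_kernel`, `checked_roundtrip`) is hypothetical and not part of the original code, and the sketch deliberately uses a flat `long *` buffer instead of the `long int **` from the post, purely to keep the example short.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): print the failing
// call with its error string and stop the program.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s failed at %s:%d: %s\n", #call,         \
                    __FILE__, __LINE__, cudaGetErrorString(err_));     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Stand-in kernel (assumption): one thread fills the buffer in a loop,
// mirroring the <<<1,1>>> launch used in the post.
__global__ void fill_kernel(long *out, int n)
{
    for (int i = 0; i < n; ++i) {
        out[i] = 2L * i;
    }
}

// Runs the kernel, copies the result back, and returns the number of
// elements that differ from the expected values (0 means all matched).
int checked_roundtrip(int n)
{
    size_t bytes = (size_t)n * sizeof(long);
    long *host = (long *)malloc(bytes);
    if (host == NULL) {
        exit(EXIT_FAILURE);
    }

    long *dev = NULL;
    CUDA_CHECK(cudaMalloc((void **)&dev, bytes));

    fill_kernel<<<1, 1>>>(dev, n);
    CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dev));

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (host[i] != 2L * i) {
            ++mismatches;
        }
    }
    free(host);
    return mismatches;
}

int main(void)
{
    int mismatches = checked_roundtrip(16);
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Checking the status right after the launch and again after the synchronization separates a bad launch configuration from a fault inside the kernel, which a single check on the final cudaMemcpy cannot do.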