Hi,
I usually use cudaMemcpy without any problems, but here I have run into one: for an unknown reason, the values in matrixGPU (on the GPU) do not seem to be copied into matrix (on the CPU).
While debugging and inspecting memory in compute_kernel, I can see that nodeGPU holds the right values (so the first cudaMemcpy works), and so does matrixGPU inside the kernel. Back in compute_distances, however, matrix contains no values.
If I replace my kernel with CPU code, matrixGPU with matrix, and nodeGPU with node, then matrix holds the correct values, even in compute_distances.
It looks as if cudaMemcpy is not copying anything. Do you have any idea why?
Could you please help me quickly? I need to get this working today.
Thanks a lot.
No errors are reported for the memory allocations or the memory copies. Here is my code (only the important parts):
```c
extern long int **compute_distances(struct point *node)
{
    long int **matrixGPU;
    long int **matrix;
    struct point *nodeGPU;
    cudaError_t cudaStatus;
    int mem_size, mem_size2;

    mem_size = // VALUE // ;
    mem_size2 = // VALUE // ;

    // host buffer that receives the result
    if ((matrix = (long **) malloc(mem_size)) == NULL) {
        exit(1);
    }

    // device buffers for the result and the input points
    cudaMalloc((void **) &matrixGPU, mem_size);
    cudaMalloc((void **) &nodeGPU, mem_size2);

    // copy the input points to the device
    cudaMemcpy(nodeGPU, node, mem_size2, cudaMemcpyHostToDevice);

    // launch the kernel with a single block of a single thread
    compute_kernel<<<1, 1>>>(matrixGPU, nodeGPU);

    // copy the result back to the host
    cudaStatus = cudaMemcpy(matrix, matrixGPU, mem_size, cudaMemcpyDeviceToHost);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed!");
    }

    cudaFree(matrixGPU);
    cudaFree(nodeGPU);
    return matrix;
}
```
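One thing worth ruling out: matrix and matrixGPU are both declared as long int **, i.e. arrays of row pointers. Assuming mem_size is the size of that top-level pointer array (its actual value is elided above), cudaMemcpy copies only the pointer values stored in matrixGPU, not the rows they point to, and any row pointers created on the device would in general be device addresses the host cannot dereference. The plain-C sketch below shows the same shallow-copy effect on the host, with memcpy standing in for cudaMemcpy so it runs without a GPU; the helper names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Build an n x n matrix as an array of row pointers, like long int **. */
static long **alloc_row_pointer_matrix(int n)
{
    long **m = malloc(n * sizeof *m);
    for (int i = 0; i < n; i++) {
        m[i] = malloc(n * sizeof **m);
        for (int j = 0; j < n; j++)
            m[i][j] = (long)i * n + j;
    }
    return m;
}

/* Copy only the top-level array of n pointers, byte for byte.
   memcpy stands in for cudaMemcpy here: both copy raw bytes and
   never follow the pointers stored in the buffer. */
static long **shallow_copy(long **src, int n)
{
    long **dst = malloc(n * sizeof *dst);
    memcpy(dst, src, n * sizeof *src);
    return dst;
}

/* Free every row, then the top-level pointer array. */
static void free_row_pointer_matrix(long **m, int n)
{
    for (int i = 0; i < n; i++)
        free(m[i]);
    free(m);
}
```

After shallow_copy, dst[i] == src[i] for every row, so a change made through src is visible through dst: the row data itself was never duplicated. Only the top-level array of dst should be freed with free(dst).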
The kernel signature is:

```c
__global__ void compute_kernel(long int **matrix, struct point *node)
```
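A common way around pointer-to-pointer matrices is to store the matrix in one contiguous row-major buffer and index element (i, j) as i * n + j, so a single copy of n * n * sizeof(long) bytes moves every element. The sketch below assumes a square n x n matrix (n is hypothetical, since mem_size is elided in the post) and again uses memcpy in place of cudaMemcpy; idx, fill_flat, and copy_flat are illustrative names, and the values written are placeholders rather than real distances.

```c
#include <stdlib.h>
#include <string.h>

/* Row-major index of element (i, j) in an n x n matrix stored in one buffer. */
static size_t idx(int i, int j, int n)
{
    return (size_t)i * n + j;
}

/* Fill a flat n x n buffer with placeholder values; a kernel taking
   `long int *matrix` could use the same indexing. */
static void fill_flat(long *matrix, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            matrix[idx(i, j, n)] = (long)i * n + j;
}

/* Copy the whole matrix in one call. memcpy stands in for
   cudaMemcpy(host, device, bytes, cudaMemcpyDeviceToHost): because the
   data is contiguous, n * n * sizeof(long) bytes cover every element. */
static long *copy_flat(const long *src, int n)
{
    size_t bytes = (size_t)n * n * sizeof *src;
    long *dst = malloc(bytes);
    if (dst != NULL)
        memcpy(dst, src, bytes);
    return dst;
}
```

On the CUDA side this would roughly mean cudaMalloc-ing n * n * sizeof(long) bytes for matrixGPU, declaring the kernel parameter as long int *matrix, and passing the same byte count to the device-to-host cudaMemcpy. Calling cudaGetLastError() right after the launch also reports launch failures at that point instead of leaving them to a later call.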