A strange question about cudaMemcpy2D

I recently tried to copy a 2D matrix from the device to the host.

But the program fails with: Error: test_2D_matrix.cu:43, code: 77, reason: an illegal memory access was encountered.

Maybe I am overlooking something, so here is my code:

#include <cuda_runtime.h>
#include <stdio.h>
#include <sys/time.h>
#include <cuda.h>
#include "common.h"

#include <cuda_device_runtime_api.h>

__global__ void initial(double **d_state)
{

int idx = threadIdx.x;
int idy = blockIdx.x;
d_state[idy][idx]=1.11;

}


int main(void)
{
	int N=90;
	int M=180;
	
	double h_state[N][M];


	double **d_state;
	size_t d_pitch;

	CHECK(cudaMallocPitch( &d_state,&d_pitch,sizeof(double)*M,N) );
	
	CHECK(cudaMemset2D(d_state,d_pitch,10,sizeof(double)*M,N) );
	cudaDeviceSynchronize();
	
	dim3 grid(N);
	dim3 block(M);

	initial<<<grid,block>>>(d_state);
	
	cudaDeviceSynchronize();

	CHECK(cudaMemcpy2D(h_state,sizeof(double)*M,d_state,d_pitch,sizeof(double)*M,N,cudaMemcpyDeviceToHost) );
	
	cudaDeviceSynchronize();
	printf("%f",h_state[1][3]);

	cudaFree(d_state);

	return(0);
}

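cudaMemcpy2D, cudaMemset2D, and cudaMallocPitch are not designed to handle doubly-subscripted arrays (arrays of row pointers, accessed as d_state[i][j]). They operate on a single flat allocation in which consecutive rows are separated by a pitch given in bytes.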

Note in the API documentation that the first parameter of cudaMallocPitch is a double-pointer argument (**).

But when you do this:

double **d_state;
	size_t d_pitch;

	CHECK(cudaMallocPitch( &d_state,&d_pitch,sizeof(double)*M,N) );
                               ^^^^^^^^

You are actually passing a triple-pointer (***) argument. That could be your first clue that you are headed down the wrong path.

You’re confused about what the functions are for. There are numerous questions on the web just like this one, so I’ll not elaborate further.
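For reference, here is a minimal sketch of how the pitched allocation could be used with a single pointer instead of a double pointer. It is not the original poster's code: the helper name fill_and_copy, the extra kernel parameters, and the bounds check are my own assumptions for illustration, and it assumes M does not exceed the maximum number of threads per block (1024 on current GPUs).

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// One block per row, one thread per column.
// d_state is a flat pitched allocation; pitch is the row stride in BYTES.
__global__ void initial(double *d_state, size_t pitch, int rows, int cols)
{
    int col = threadIdx.x;
    int row = blockIdx.x;
    if (row < rows && col < cols) {
        // Advance to the start of the row in bytes, then index by column.
        double *row_ptr = (double *)((char *)d_state + row * pitch);
        row_ptr[col] = 1.11;
    }
}

// Hypothetical helper (not part of the CUDA API): fills an N x M pitched
// device matrix with 1.11 and copies it into the contiguous host buffer
// h_state, which must hold N * M doubles.
cudaError_t fill_and_copy(double *h_state, int N, int M)
{
    double *d_state = NULL;   // single pointer, so &d_state is a double-pointer
    size_t d_pitch = 0;

    cudaError_t err = cudaMallocPitch((void **)&d_state, &d_pitch,
                                      sizeof(double) * M, N);
    if (err != cudaSuccess) return err;

    initial<<<N, M>>>(d_state, d_pitch, N, M);
    err = cudaGetLastError();                 // launch errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();        // execution errors

    if (err == cudaSuccess) {
        // Host rows are tightly packed (pitch = M doubles);
        // device rows use the pitch returned by cudaMallocPitch.
        err = cudaMemcpy2D(h_state, sizeof(double) * M,
                           d_state, d_pitch,
                           sizeof(double) * M, N,
                           cudaMemcpyDeviceToHost);
    }

    cudaFree(d_state);
    return err;
}
```

Called with N=90, M=180 and a host buffer of N*M doubles, element (1,3) is then read on the host as h_state[1*M + 3]. The key design point is that both host and device sides treat the matrix as one flat block and do the row arithmetic themselves; there is no array of row pointers anywhere.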

Thanks for this explanation. I am new to CUDA programming.

The answer cleared up my confusion.