How to copy a double-pointer (**) variable from GPU to CPU

I tried to use the cudaMemcpy2D function, as in the code below, but it doesn’t work. Can anyone help me? Thank you in advance.

BYTE **gpu_h; //hashtable
cudaMalloc((void**) &gpu_h, sizeof(BYTE*) * curlen);
BYTE **h_q;
h_q = (BYTE**)malloc(sizeof(BYTE*) * 10);


find_matchlen_test <<<grid, THREAD_NUM>>>(gpu_out, gpu_p, gpu_h);

int nx = 4, ny = 10; // sizeof(BYTE*) is 4, so I set nx (the width) to 4.
size_t d_pitchBytes;
size_t h_pitchBytes = ny*sizeof(BYTE*);
cudaMallocPitch( (void**) &gpu_h, &d_pitchBytes, nx*sizeof(byte), ny);
cudaMemcpy2D( h_q, h_pitchBytes, gpu_h, d_pitchBytes, nx * sizeof(BYTE), ny, cudaMemcpyDeviceToHost);

Thanks
Rock

You’re using cudaMemcpy2D on a buffer you allocated with cudaMalloc; for that you should just use cudaMemcpy() instead. It looks like you might have meant to allocate the array with cudaMallocPitch, though. Change one or the other so that the allocation and the copy match.
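As a rough illustration of the plain cudaMalloc / cudaMemcpy pairing, here is a minimal sketch. The names fill_table and copy_table are hypothetical, the kernel is only a stand-in for find_matchlen_test, and BYTE is assumed to be an unsigned char.

```cuda
#include <cuda_runtime.h>

typedef unsigned char BYTE;  // assumption: BYTE is an 8-bit unsigned type

// Hypothetical stand-in for find_matchlen_test: stores one device address per slot.
__global__ void fill_table(BYTE **table, BYTE *base, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) table[i] = base + i;
}

// Copies a table of n device pointers back to the host with plain cudaMemcpy.
// Returns 0 on success, -1 on any CUDA error.
int copy_table(BYTE **h_q, int n)
{
    BYTE **gpu_h = NULL;
    BYTE *gpu_p = NULL;
    size_t bytes = sizeof(BYTE *) * n;  // same byte count on host and device

    if (cudaMalloc((void **)&gpu_h, bytes) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&gpu_p, n * sizeof(BYTE)) != cudaSuccess) {
        cudaFree(gpu_h);
        return -1;
    }

    fill_table<<<(n + 127) / 128, 128>>>(gpu_h, gpu_p, n);

    // A linear allocation is copied with a linear copy; no pitch is involved.
    cudaError_t err = cudaMemcpy(h_q, gpu_h, bytes, cudaMemcpyDeviceToHost);

    cudaFree(gpu_p);
    cudaFree(gpu_h);
    return err == cudaSuccess ? 0 : -1;
}
```

Note that the values that arrive in h_q are still device addresses. They can be inspected or passed back to the GPU, but they must not be dereferenced on the host.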

curlen must equal 10; otherwise the device buffer (sizeof(BYTE*) * curlen) and the host buffer (sizeof(BYTE*) * 10) differ in size.

size_t h_pitchBytes = ny*sizeof(BYTE*);

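The pitch is the width in bytes of one row, so the host pitch should be nx * sizeof(BYTE) rather than ny * sizeof(BYTE*). Make sure d_pitchBytes is correct in this respect as well; it should be the value cudaMallocPitch returned for the buffer you are copying from.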
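For the pitched route, a minimal sketch could look like the following. The names fill_pitched and copy_pitched are hypothetical, BYTE is assumed to be an unsigned char, and the host buffer is assumed to be tightly packed, so its pitch equals the row width.

```cuda
#include <cuda_runtime.h>

typedef unsigned char BYTE;  // assumption: BYTE is an 8-bit unsigned type

// Hypothetical kernel: writes row * 16 + col into each element of a pitched array.
__global__ void fill_pitched(BYTE *dev, size_t pitch, int nx, int ny)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < nx && row < ny) {
        BYTE *row_ptr = dev + row * pitch;  // pitch is measured in bytes
        row_ptr[col] = (BYTE)(row * 16 + col);
    }
}

// Allocates an nx-by-ny pitched array, fills it, and copies it into host_out.
// Returns 0 on success, -1 on any CUDA error.
int copy_pitched(BYTE *host_out, int nx, int ny)
{
    BYTE *dev = NULL;
    size_t d_pitchBytes = 0;                 // filled in by cudaMallocPitch
    size_t widthBytes = nx * sizeof(BYTE);   // bytes of real data per row
    size_t h_pitchBytes = widthBytes;        // host rows have no padding

    if (cudaMallocPitch((void **)&dev, &d_pitchBytes, widthBytes, ny) != cudaSuccess)
        return -1;

    dim3 block(16, 16);
    dim3 grid((nx + 15) / 16, (ny + 15) / 16);
    fill_pitched<<<grid, block>>>(dev, d_pitchBytes, nx, ny);

    // Arguments: dst, dst pitch, src, src pitch, width in bytes, height in rows.
    cudaError_t err = cudaMemcpy2D(host_out, h_pitchBytes, dev, d_pitchBytes,
                                   widthBytes, ny, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess ? 0 : -1;
}
```

The device pitch is usually larger than the row width because of alignment padding, which is why the kernel indexes rows with the pitch and the copy passes both pitches separately.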