cudaMemcpy from device to host and Segmentation Fault cudaMemcpy Segmentation Fault

Hi all,

here I encounter a problem of segmentation fault when calling the statement of cudaMemcpy in the following:


int* d_startArray;

CUDA_SAFE_CALL( cudaMalloc( (void**) (&d_startArray), sizeof(int)*rLen) );


int lastValue;

printf(“debug: d_startArray=%p rlen=%d”, d_startArray, rLen);

CUDA_SAFE_CALL( cudaMemcpy( &lastValue, d_startArray+(rLen-1), sizeof(int), cudaMemcpyDeviceToHost) );

printf(“reach here? no!”);


The printf before it shows that the d_startArray and rLen have some proper values. But the process cannot run to the printf after, which means it is useless to check the return value of cudaMemcpy.

I though it may be the problem of (d_startArrary+(rLen-1)), since this is a device pointer calculation. So i change the program to:



int *h_startArray;

printf("debug: d_startArray=%p rlen=%d", d_startArray, rLen);

cudaMallocHost((void **)&startArray, sizeof(int)*rLen);

printf("reach here? yes.");

cudaError_t ret=cudaMemcpy( startArray, d_startArray, sizeof(int)*rLen, cudaMemcpyDeviceToHost);

printf("reach here? No.");



But the segmentation fault is still generated in the cudaMemcpy() statement. I have no idea why this can happen. I read from the cuda’s manual that dereferencing a device pointer in the host side may lead to a segmentation fault. But here I have tried to avoid that.

Does anybody have any idea about this? Thanks very much!


Can you upload full source?

Hi, Acutually I am using the Mars (version1). It can be downloaded from cuda zone or here (

I am trying to use the sample application PageViewCount. The seg fault happens in the function of getChunkBoundary() in