A weird problem with cudaMemcpy

Dear all,

I have run into a weird problem when using the function “cudaMemcpy”. The code is shown below:

extern "C"
void functionA(...){
......
//mmmmm
cudaMemcpy(TransformedGradientx,    gpuTransformedGradientx,    sizeof(float)*Vsize, cudaMemcpyDeviceToHost);
cudaMemcpy(TransformedGradienty,    gpuTransformedGradienty,    sizeof(float)*Vsize, cudaMemcpyDeviceToHost);
cudaMemcpy(TransformedGradientz,    gpuTransformedGradientz,    sizeof(float)*Vsize, cudaMemcpyDeviceToHost);
cudaMemcpy(FixedTransformedImage,   gpuFixedTransformedImage,   sizeof(float)*Vsize, cudaMemcpyDeviceToHost);
cudaMemcpy(ForwardTransformedImage, gpuForwardTransformedImage, sizeof(float)*Vsize, cudaMemcpyDeviceToHost);
.....
}

int main() {
    .....
    for (int ii = 0; ii < iteration; ii++) {
        functionA(...);
    }
    ....
}

The program runs successfully for the first 55 iterations, but at the 56th iteration it hangs and never continues. The hang occurs at the point marked “mmmmm”. How can I solve this problem?
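A blocking cudaMemcpy on the default stream waits for kernels launched earlier to finish, so a hang at the copy may actually originate in a preceding kernel. Below is a minimal sketch of how the copies could be checked to narrow this down. The names cudaOk, copyToHostChecked, fillKernel and demoRoundTrip are hypothetical and not part of my real code; fillKernel only stands in for whatever kernel runs before “mmmmm”.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: print the CUDA error string and return false on failure.
bool cudaOk(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical wrapper around a device-to-host copy. It synchronizes first,
// so a failing or never-ending kernel shows up at cudaDeviceSynchronize
// instead of inside cudaMemcpy.
bool copyToHostChecked(float *dst, const float *src, size_t count) {
    if (!cudaOk(cudaGetLastError(), "kernel launch")) return false;
    if (!cudaOk(cudaDeviceSynchronize(), "kernel execution")) return false;
    return cudaOk(cudaMemcpy(dst, src, sizeof(float) * count,
                             cudaMemcpyDeviceToHost), "cudaMemcpy");
}

// Stand-in kernel (assumption): writes one value into every element.
__global__ void fillKernel(float *data, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Small round trip: allocate, run the kernel, copy back with checks, free.
bool demoRoundTrip() {
    const int n = 1024;
    float *gpuData = nullptr;
    std::vector<float> hostData(n, 0.0f);
    if (!cudaOk(cudaMalloc((void **)&gpuData, sizeof(float) * n), "cudaMalloc"))
        return false;
    fillKernel<<<(n + 255) / 256, 256>>>(gpuData, n, 1.0f);
    bool copied = copyToHostChecked(hostData.data(), gpuData, n);
    cudaFree(gpuData);
    return copied && hostData[0] == 1.0f && hostData[n - 1] == 1.0f;
}
```

If the program then stops at the cudaDeviceSynchronize call rather than at cudaMemcpy, the kernel launched before the copy would be the more likely place to look.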

Thanks a lot.