cudaMemcpy synchronization problem

Hi all,

I’ve come across a problem I’ve not yet understood. I have a device array (d_data), allocated with cudaMalloc(). I read this out to a host array (h_data), manipulate it and copy it back. Unfortunately, the data is never changed in d_data.

I’ve tried synchronizing with cudaThreadSynchronize() and cuCtxSynchronize(), and even moved the segment of code to another function, but nothing changes. Does anyone have a clue? Maybe there is an error somewhere that I’ve failed to find after staring at the code for hours.

/ Mårten Björkman aka Celebrandil of Phenomena


int sz = sizeof(float)*numPts;
float *h_data = (float *)malloc(sz);

// Read the device array out to the host
CUDA_SAFE_CALL(cudaMemcpy(h_data, d_data, sz, cudaMemcpyDeviceToHost));

for (int i=0;i<numPts;i++) {
    printf("%d %.2f\n", i, h_data[i]);
    h_data[i] = 100.0f;
}

// Copy the modified values back to the device
CUDA_SAFE_CALL(cudaMemcpy(d_data, h_data, sz, cudaMemcpyHostToDevice));



And how did you check whether d_data is changed?

I’ve had an issue like this once, where it seemed that a cudaMemcpy didn’t do what it was supposed to do and ‘forgot’ the first part of the buffer it was told to copy. Then I found out that my kernel was writing outside of the allocated memory somewhere. Buffer overflows cause really nondeterministic and weird behaviour, so look for those.
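
For what it’s worth, here is a minimal sketch of how one might guard a kernel against out-of-bounds writes and check for errors after the launch. The kernel fillKernel and the helper launchAndCheck are hypothetical names made up for illustration, not something from the code above, and the sketch uses cudaDeviceSynchronize(), the current replacement for the deprecated cudaThreadSynchronize().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (not from the original post). The bounds guard keeps
// surplus threads in the last block from writing past the end of the array.
__global__ void fillKernel(float *data, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = value;
}

// Hypothetical helper: launches the kernel and reports any error.
// Returns true if both the launch and the execution succeeded.
bool launchAndCheck(float *d_data, int n, float value)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up to cover all n elements

    fillKernel<<<blocks, threads>>>(d_data, n, value);

    // Errors in the launch itself (e.g. an invalid configuration) show up here.
    cudaError_t err = cudaGetLastError();

    // Errors raised while the kernel was running show up once we wait for it.
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();

    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

Note that a small overrun does not always raise an error; it can silently land in a neighbouring allocation, which is exactly the kind of weird behaviour described above. Running the program under compute-sanitizer (cuda-memcheck in older toolkits) is a more reliable way to catch those.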

I wrote a program and ran a test according to your description, and everything works fine. Do you memset your array on every copy operation?
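
A test along those lines could look roughly like the sketch below. This is not the actual program mentioned above, just one way to write it: roundTripOk is a hypothetical helper, the device array is zeroed with cudaMemset as an assumed starting state, and plain return-code checks stand in for the CUDA_SAFE_CALL macro. The key step is reading d_data back into a second host buffer, so the check does not rely on the buffer that was just written.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: returns true if every CUDA call succeeded and the
// device array really holds the new values after the host-to-device copy.
bool roundTripOk(int numPts)
{
    size_t sz = sizeof(float) * numPts;
    float *h_data  = (float *)malloc(sz);   // buffer that gets modified
    float *h_check = (float *)malloc(sz);   // separate buffer for the read-back
    float *d_data  = NULL;
    bool ok = (h_data != NULL) && (h_check != NULL);

    // Assumed starting state: a zeroed device array.
    ok = ok && cudaMalloc((void **)&d_data, sz) == cudaSuccess;
    ok = ok && cudaMemset(d_data, 0, sz) == cudaSuccess;

    // Same sequence as in the question: read out, modify, copy back.
    ok = ok && cudaMemcpy(h_data, d_data, sz, cudaMemcpyDeviceToHost) == cudaSuccess;
    if (ok) {
        for (int i = 0; i < numPts; i++)
            h_data[i] = 100.0f;
    }
    ok = ok && cudaMemcpy(d_data, h_data, sz, cudaMemcpyHostToDevice) == cudaSuccess;

    // Verify by reading the device array back into the second buffer.
    ok = ok && cudaMemcpy(h_check, d_data, sz, cudaMemcpyDeviceToHost) == cudaSuccess;
    if (ok) {
        for (int i = 0; i < numPts; i++) {
            if (h_check[i] != 100.0f) {
                printf("mismatch at %d: %.2f\n", i, h_check[i]);
                ok = false;
                break;
            }
        }
    }

    cudaFree(d_data);   // freeing a NULL pointer is a no-op
    free(h_data);
    free(h_check);
    return ok;
}
```

If a call such as roundTripOk(1024) returns true in isolation but the real program still shows stale data, the copies themselves are probably fine and the cause is more likely elsewhere, for example a kernel overwriting d_data or a different pointer being inspected.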