I’ve come across a problem I’ve not yet understood. I have a device array (d_data), allocated with cudaMalloc(). I read this out to a host array (h_data), manipulate it and copy it back. Unfortunately, the data is never changed in d_data.
I’ve tried synchronizing with cudaThreadSynchronize() and cuCtxSynchronize(), and even moved the segment of code to another function, but nothing changed. Does anyone have a clue? Maybe there is an error somewhere that I’ve failed to find after staring at the code for hours.
I’ve had an issue like this once, where it seemed that a cudaMemcpy didn’t do what it was supposed to do and ‘forgot’ the first part of the buffer it was told to copy. Then I found out that my kernel was writing outside of the allocated memory somewhere.
Buffer overflows cause really nondeterministic and weird behaviour, so look for those.
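In case it helps, here is a minimal sketch of how I would go hunting for this. It is not your code: the CUDA_CHECK macro, the scale kernel, the roundtrip_mismatches function and the sizes are all made up for illustration. The idea is to check the return value of every runtime call, guard the kernel against out-of-range indices, and verify the round trip by copying the buffer back once more. I use cudaDeviceSynchronize() here, which replaces the deprecated cudaThreadSynchronize() in newer toolkits.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file/line if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Hypothetical kernel. The guard stops the surplus threads of the
// last block from writing past the end of the allocation.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Runs kernel -> read back -> modify on host -> write back, then
// returns how many elements on the device differ from the host copy.
int roundtrip_mismatches(int n)
{
    const size_t bytes = n * sizeof(float);
    float *h_data  = (float *)malloc(bytes);
    float *h_check = (float *)malloc(bytes);
    float *d_data  = NULL;
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    CUDA_CHECK(cudaMalloc((void **)&d_data, bytes));
    CUDA_CHECK(cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice));

    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);
    CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    // Read out, manipulate on the host, copy back.
    CUDA_CHECK(cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost));
    for (int i = 0; i < n; ++i) h_data[i] += 1.0f;
    CUDA_CHECK(cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice));

    // Copy back once more into a separate buffer and compare.
    CUDA_CHECK(cudaMemcpy(h_check, d_data, bytes, cudaMemcpyDeviceToHost));
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_check[i] != h_data[i]) ++mismatches;

    CUDA_CHECK(cudaFree(d_data));
    free(h_data);
    free(h_check);
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", roundtrip_mismatches(1000));
    return 0;
}
```

If one of the checks fires on a cudaMemcpy that looks perfectly fine, the real culprit is often an earlier kernel, because errors from a kernel can surface on a later call. Running the program under compute-sanitizer (cuda-memcheck on older toolkits) should point at the out-of-bounds access directly.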