cudaMemcpy causing strange behaviour

I call a kernel A that uses the variable d_x as a texture, then cudaMemcpy some data (not including d_x) from the device to the host and back to the device, and then call a different kernel B that also uses d_x as a texture. The second kernel, B, fails with an unspecified launch failure. While debugging I found that leaving out the cudaMemcpy calls makes the launch failure go away (but the copy is required for the algorithm).

So why would the cudaMemcpy cause such a runtime error?

Uh, are you sure it's not just a case of asynchronous errors, with the implicit synchronization in cudaMemcpy making errors show up earlier than they do when you just have one big cudaThreadSynchronize at the end (where you check errors… right?)?
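For reference, a minimal sketch of the kind of per-launch check being suggested. Everything here is an assumption for illustration: the CUDA_CHECK macro, the stand-in kernelA, and the buffer names are hypothetical, and the real kernel reads d_x through a texture rather than a plain pointer. It uses cudaDeviceSynchronize, the newer name for the deprecated cudaThreadSynchronize.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: print file/line and abort if a runtime call fails.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                    cudaGetErrorString(err_));                      \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// Stand-in for kernel A (assumption: the real one samples a texture).
__global__ void kernelA(const float *d_x, float *d_y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_y[i] = 2.0f * d_x[i];
}

int main()
{
    const int n = 1 << 10;
    float *d_x = NULL, *d_y = NULL;
    float *h_y = (float *)malloc(n * sizeof(float));
    if (h_y == NULL) return EXIT_FAILURE;

    CUDA_CHECK(cudaMalloc(&d_x, n * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&d_y, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_x, 0, n * sizeof(float)));

    kernelA<<<(n + 255) / 256, 256>>>(d_x, d_y, n);
    CUDA_CHECK(cudaGetLastError());       // launch errors (bad configuration, etc.)
    CUDA_CHECK(cudaDeviceSynchronize());  // faults raised while kernelA was running

    // Without the two checks above, this copy would be the first call to
    // report a fault from kernelA, because it waits for the kernel to finish.
    CUDA_CHECK(cudaMemcpy(h_y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaFree(d_x));
    CUDA_CHECK(cudaFree(d_y));
    free(h_y);
    return 0;
}
```

Checking after every launch costs a synchronization each time, so it is probably best kept to debug builds; the point is only that it pins an error to the kernel that caused it rather than to whichever call happens to synchronize next.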

You were partly correct, in that the error had nothing to do with the copy.

But even more worrying, perhaps, is that the error was due to the compiler!

The compiler allowed the declaration and definition of a function to be different, so that what was supposed to be an array was in fact passed to the kernel as an integer constant.

Let me explain.

In my main CUDA function I call a function funcA, which is declared in a .h header file and defined in a different .cu file from the one containing the main CUDA function. That .cu file contains extern "C" functions. funcA, itself an extern "C" function, calculates the thread grid definition for the kernel kA. The parameter list in the call to funcA from the main CUDA function matched that of the declaration, but (here's the problem) the parameter list in the definition did not match that of the declaration. So I thought the last parameter I was passing was a float array, but because of the mismatched parameter lists the last parameter of funcA was actually an integer. When that got passed on to the kernel kA, the kernel was told it was a float array when what it received was actually an integer constant. Hence the launch failure.

But this raises another question: why would the compiler allow the parameter lists for the same function to not match? This is not the first time I've seen this.
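One possible explanation, assuming the .cu file that defines funcA did not include the header that declares it: an extern "C" name is not mangled, so the symbol carries no parameter information and the linker cannot tell the two signatures apart, while the compiler never sees declaration and definition in the same translation unit. The sketch below shows the arrangement that lets the compiler catch it. The signature of funcA, the body of kA, and the header name are hypothetical; only the names funcA and kA come from the posts above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Normally this line lives in a shared header, e.g. funcA.h (hypothetical
// name), included by BOTH the calling file and the defining file.
extern "C" void funcA(int n, float *d_out);

// Stand-in kernel (assumption): writes one value per element.
__global__ void kA(int n, float *d_out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_out[i] = 2.0f * i;
}

// Definition. Because the declaration above is visible here, changing the
// last parameter to `int` is rejected at compile time as a conflicting
// declaration, instead of compiling and linking silently.
extern "C" void funcA(int n, float *d_out)
{
    int threads = 128;
    int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
    kA<<<blocks, threads>>>(n, d_out);
}

int main()
{
    const int n = 1000;
    float *d_out = NULL;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) return 1;

    funcA(n, d_out);

    // Report any error from the launch or from the kernel's execution.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

Dropping extern "C" would also surface the problem, though later and less clearly: with C++ linkage the parameter types are encoded in the symbol name, so a mismatched definition shows up as an unresolved symbol at link time.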