My CUDA program runs to completion, but its output changes slightly from run to run: it takes anywhere from about 86 to 90 timesteps to reach the final time. I haven't seen this kind of error before, so I'm wondering what could cause it. My guess is that multiple threads are writing to the same memory address on the GPU, but I haven't had any luck catching such a thing so far. Are there other ways a CUDA implementation can be nondeterministic? Any tips on catching them would also be great. Right now I'm just staring at everything I cudaMalloced.
Sorry for not having code to post, but I suppose the error could be anywhere in a large codebase.
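To make my guess concrete, here is a minimal sketch of the kind of race I have been looking for. It is not taken from my code: the kernel names `racyWrite` and `safeWrite`, the launch sizes, and the buffers are all made up for illustration, and error checking is left out for brevity. In the first kernel every thread stores to the same global address with no synchronization, so which store lands last is unspecified and the value read back may differ between runs. In the second kernel each thread writes only its own element, so the result is the same every time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread stores its own index into the same
// global address with no synchronization. Which store lands last is
// unspecified, so the final value can differ from run to run.
__global__ void racyWrite(int *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    *out = tid;  // data race: all threads write the same location
}

// Race-free counterpart: each thread writes only its own element.
__global__ void safeWrite(int *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) out[tid] = tid;
}

int main()
{
    const int blocks = 64, threads = 256, n = blocks * threads;
    int *dRacy = nullptr, *dSafe = nullptr;
    cudaMalloc(&dRacy, sizeof(int));
    cudaMalloc(&dSafe, n * sizeof(int));

    racyWrite<<<blocks, threads>>>(dRacy);
    safeWrite<<<blocks, threads>>>(dSafe, n);

    // cudaMemcpy on the default stream waits for the kernels above to finish.
    int racy = -1, last = -1;
    cudaMemcpy(&racy, dRacy, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&last, dSafe + (n - 1), sizeof(int), cudaMemcpyDeviceToHost);

    printf("racy result: %d (any value in [0, %d) is possible)\n", racy, n);
    printf("safe result, last element: %d\n", last);

    cudaFree(dRacy);
    cudaFree(dSafe);
    return 0;
}
```

If something like `racyWrite` were feeding a timestep calculation, I would expect exactly the kind of small run-to-run variation I am seeing, which is why this was my first suspect.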
EDIT: The issue appears to be a bug when using constant memory as input for intrinsic functions. Developing a workaround now.