Hello everyone, I am having a weird problem. If someone could read this and give me some ideas, it would be extremely helpful. Thanks.
I am running a kernel where each thread reads several positions in an array (its assigned position and some neighbours). The kernel receives the array that the threads read from as a parameter.
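To make the access pattern concrete, here is a minimal sketch of this kind of kernel. It is not my real kernel: the names read_neighbours and run_on_zeroed_input, the float element type, and the 1D neighbourhood of one element on each side are all assumptions for illustration. Error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: each thread reads its own
// element plus its left and right neighbours from `in` and writes to `out`.
__global__ void read_neighbours(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;                            // guard the grid tail
    float left  = (i > 0)     ? in[i - 1] : 0.0f;  // guard the left edge
    float right = (i < n - 1) ? in[i + 1] : 0.0f;  // guard the right edge
    out[i] = left + in[i] + right;
}

// Runs the kernel on a zeroed input and returns how many outputs are not 0.
// A NaN compares unequal to 0.0f, so NaNs are counted as well.
int run_on_zeroed_input(int n) {
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    read_neighbours<<<blocks, threads>>>(d_in, d_out, n);

    std::vector<float> h_out(n);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    int bad = 0;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != 0.0f) ++bad;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return bad;
}

int main() {
    printf("%d unexpected values\n", run_on_zeroed_input(1000));
    return 0;
}
```

The edge guards matter in a sketch like this: without them, thread 0 would read in[-1] and the last thread would read in[n], which is outside the allocation.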
The input array is allocated, and then I use cudaMemset to set every element to 0 (it should be all zeros on the first call to the kernel).
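The allocation and zeroing step looks roughly like the sketch below. The float element type, the array size, and the helper names CUDA_CHECK and alloc_zeroed are assumptions for illustration, not my exact code. One detail worth double-checking in this step is that cudaMemset takes a size in bytes, so the count has to be the number of elements times sizeof(float).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Allocates n floats on the device and sets every byte to 0.
// An all-zero bit pattern is 0.0f for IEEE 754 floats.
float *alloc_zeroed(size_t n) {
    float *d_ptr = nullptr;
    CUDA_CHECK(cudaMalloc(&d_ptr, n * sizeof(float)));
    // The third argument is a byte count, not an element count.
    CUDA_CHECK(cudaMemset(d_ptr, 0, n * sizeof(float)));
    return d_ptr;
}

int main() {
    const size_t n = 1 << 20;  // assumed array size
    float *d_in = alloc_zeroed(n);
    CUDA_CHECK(cudaFree(d_in));
    return 0;
}
```

If the byte count were only n instead of n * sizeof(float), just the first quarter of the array would be zeroed and the rest would hold whatever was in that memory before.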
I have discovered that, for some positions, the threads read a NaN instead of 0 when I call the kernel.
I have already checked that the size of the array is correct, so no thread reads past the end of the array. I have also checked the array in two different ways just before the kernel call: first, I copied the array back to the CPU and checked for NaN there; then I ran a kernel that just checks for NaN at each position (all of this just for testing). Neither check found a NaN, but when I call the real kernel right after the checks, the NaNs appear.
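The two checks described above can be sketched as follows. The names count_nans_host, count_nans_kernel and count_nans_device, the float element type, and the array size are assumptions for illustration; my test code differs in the details.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Check 1: copy the array back to the host and count the NaNs there.
size_t count_nans_host(const float *d_in, size_t n) {
    std::vector<float> h(n);
    cudaMemcpy(h.data(), d_in, n * sizeof(float), cudaMemcpyDeviceToHost);
    size_t count = 0;
    for (float v : h) {
        if (std::isnan(v)) ++count;
    }
    return count;
}

// Check 2: each thread tests one element and bumps a shared counter on NaN.
__global__ void count_nans_kernel(const float *in, size_t n,
                                  unsigned int *count) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n && isnan(in[i])) atomicAdd(count, 1u);
}

unsigned int count_nans_device(const float *d_in, size_t n) {
    unsigned int *d_count = nullptr;
    cudaMalloc(&d_count, sizeof(unsigned int));
    cudaMemset(d_count, 0, sizeof(unsigned int));

    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);
    count_nans_kernel<<<blocks, threads>>>(d_in, n, d_count);

    unsigned int h_count = 0;
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(&h_count, d_count, sizeof(h_count), cudaMemcpyDeviceToHost);
    cudaFree(d_count);
    return h_count;
}

int main() {
    const size_t n = 1 << 20;  // assumed array size
    float *d_in = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    printf("host check: %zu NaNs, device check: %u NaNs\n",
           count_nans_host(d_in, n), count_nans_device(d_in, n));

    cudaFree(d_in);
    return 0;
}
```

Both checks look only at the pointer and length they are given, so they say nothing about addresses the real kernel might compute outside that range.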
I don’t really know why these NaNs are appearing, because when I look at the exact same position just before the kernel I get the right value, 0.
Could it be that, because two threads read from the same position (although at different points in the kernel), there is some kind of memory conflict?
I hope I can get some ideas from you to try and solve the problem.
Thanks in advance.